This is a scary find. Props to the Cloudflare team for the fix and Google for cleaning up their search results. But since Google isn’t the only scraper on the internet and other parties might have discovered this already, you might want to seriously consider informing your clients if you’re behind a Cloudflare proxy.
I’ll quote the scariest parts of the find below. Read the full articles for more context.
It looked like that if an html page hosted behind cloudflare had a specific combination of unbalanced tags, the proxy would intersperse pages of uninitialized memory into the output (kinda like heartbleed, but cloudflare specific and worse for reasons I’ll explain later).
For a sense of scale and impact:
We keep finding more sensitive data that we need to cleanup. I didn’t realize how much of the internet was sitting behind a Cloudflare CDN until this incident.
[…]
I’m finding private messages from major dating sites, full messages from a well-known chat service, online password manager data, frames from adult video sites, hotel bookings. We’re talking full https requests, client IP addresses, full responses, cookies, passwords, keys, data, everything.
In other words: any site behind Cloudflare might have leaked info from other sites behind Cloudflare. Even if your own HTML was perfectly balanced and strict, another site with unbalanced tags could have triggered the bug and leaked your data.
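To make the cross-site part concrete, here is a toy model in Python. It is a sketch and not Cloudflare's actual parser: `render`, `SECRET`, and the "skip two bytes on a dangling tag" rule are all made up for illustration. The one detail borrowed from reality is that, as far as I understand the incident report linked below, the end-of-buffer check used equality instead of a bounds comparison, so a cursor that jumped past the end was never caught.

```python
# Hypothetical leftover data from some other site's request, sitting in
# the proxy's memory right after the page currently being processed.
SECRET = b"Cookie: session=abc123"


def render(page: bytes, neighbour: bytes, strict: bool) -> bytes:
    """Copy `page` to the output the way a rewriting proxy might.

    strict=False stops only when the cursor equals the end (the buggy check).
    strict=True stops when the cursor is at or past the end (the safe check).
    """
    memory = page + neighbour  # one shared buffer, many sites
    end = len(page)
    out = bytearray()
    pos = 0
    while pos < len(memory):
        at_end = (pos >= end) if strict else (pos == end)
        if at_end:
            break
        out.append(memory[pos])
        # Made-up rule: a tag opened on the very last byte makes the
        # cursor advance by two, stepping over the end marker.
        if memory[pos] == ord("<") and pos + 1 == end:
            pos += 2
        else:
            pos += 1
    return bytes(out)


if __name__ == "__main__":
    print(render(b"<p>ok</p>", SECRET, strict=False))  # balanced page
    print(render(b"<p>oops<", SECRET, strict=False))   # dangling tag, buggy check
    print(render(b"<p>oops<", SECRET, strict=True))    # dangling tag, safe check
```

The balanced page comes out clean in both modes. The page with the dangling tag comes out clean only with the bounds check; with the equality check the output keeps going into the neighbour's data, and the site that owns that data did nothing wrong.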
Extra details: Cloudflare Incident report on memory leak caused by Cloudflare parser bug