Cache rules everything around me (C.R.E.A.M.)

Reverse proxies and Internet hegemony

Jun 04, 2021

I’m a researcher at the UC Berkeley Center for Long-Term Cybersecurity, where I direct the Daylight Lab. This is a newsletter about cybersecurity and politics—my work as I do it; more than half-baked, less than peer-reviewed. To follow along, subscribe here.

Imagine you wake up and try to Google something, but you get an error message. “Your IP address has been associated with suspicious activity.” Fine. You use DuckDuckGo. But on every site you visit, you get stopped. You have to complete a CAPTCHA. Browsing the web becomes onerous. Nearly impossible.

This is the experience of hundreds of thousands of Internet users. People from countries like Ghana or Uganda are regularly prevented from accessing websites thanks to “security” measures from caching services, also known as reverse proxies.

These reverse proxies effectively decide who gets to see what online.

Reverse proxies are invisible Internet infrastructure. But their power is concentrated in the hands of a few US-based corporations. With their dominance, these companies could—at the request of a US court order, for example—block pretty much any content on the web. But it doesn’t need to be this way. I’ll explain why.

What are reverse proxies?

I’ve written before about how US-domiciled corporations dominate the provision of basic Internet infrastructure. I specifically called out certificate authorities: as of 29 May 2021, US-domiciled certificate authorities hold 96.4% of the market.1

But, amazingly, certificate authorities aren’t the most centralized of the bunch. 97.6% of websites that use reverse proxies use a US-based provider.

The proportion of core Internet infrastructure provisioned by United States-domiciled companies, by marketshare (as of 29 May 2021).

So, what are reverse proxies? Well, if you run a popular website, you might want to put someone in between you and the public. For example, when someone requests spotify.com, Spotify directs them to a reverse proxy like Cloudflare to handle the incoming traffic.

Why? Well, incoming requests can sometimes be hostile. Distributed denial of service (DDoS) attacks bombard websites with requests, effectively making them unavailable. In DDoS attacks, Cloudflare can help block bad actors by tracking their behavior across the web.

Meanwhile, for well-intentioned users, Cloudflare caches frequently-accessed content, placing copies in servers around the world. When you request that content, it’ll deliver you the version closest to you geographically, boosting speed and minimizing energy use.
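To make those two jobs concrete, here is a minimal sketch of a reverse proxy that rate-limits each client and caches responses from the origin. Everything in it is an assumption for illustration: the class name, the per-client budget, and the `fetch_from_origin` callback are hypothetical, and it ignores geography entirely. Real providers use far richer signals than a simple request count.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToyReverseProxy:
    """Hypothetical stand-in for a reverse proxy sitting in front of one origin."""
    fetch_from_origin: Callable[[str], str]  # placeholder for a real request to the origin
    max_requests: int = 5                    # assumed per-client budget...
    window_seconds: float = 1.0              # ...inside this sliding time window
    cache: Dict[str, str] = field(default_factory=dict)
    history: Dict[str, List[float]] = field(default_factory=dict)

    def handle(self, client_ip: str, path: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        # Keep only this client's requests that still fall inside the window.
        recent = [t for t in self.history.get(client_ip, [])
                  if now - t < self.window_seconds]
        if len(recent) >= self.max_requests:
            self.history[client_ip] = recent
            return "429 blocked"  # the proxy, not the origin, refuses the request
        recent.append(now)
        self.history[client_ip] = recent
        # Serve from the cache when possible; contact the origin only on a miss.
        if path not in self.cache:
            self.cache[path] = self.fetch_from_origin(path)
        return self.cache[path]


origin_hits: List[str] = []

def origin(path: str) -> str:
    origin_hits.append(path)
    return f"content of {path}"

proxy = ToyReverseProxy(fetch_from_origin=origin, max_requests=2)
proxy.handle("198.51.100.7", "/home", now=0.0)  # cache miss: the origin is contacted
proxy.handle("203.0.113.9", "/home", now=0.1)   # cache hit: the origin never sees it
```

Note where the decision lives: the origin never learns about a request the proxy refuses, which is the same property the rest of this post worries about.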

Reverse proxy providers’ marketshare by jurisdiction. As of 29 May 2021, companies domiciled in the US provision 97.6% of service by marketshare.

There’s nothing intrinsically wrong with reverse proxies. They provide collective defense against DDoS attacks by using large-scale observation to identify and block decentralized threats. (In the early 2010s, the heyday of Anonymous’s DDoS-oriented “hacktivism,” some thought DDoS attacks would be an existential threat to the Internet. Services like Cloudflare have largely neutralized that threat.) Reverse proxies also provide efficiency and speed gains for the average web user. Those gains translate to lower energy use and reduced greenhouse gas emissions.

The problem with reverse proxies is that they can effectively censor the Internet.

Reverse proxies can block anything

If you’ve ever used Tor, you’ll have noticed that the Internet is all but unusable. You get a CAPTCHA every fifteen seconds. That’s because reverse proxies—most notably Cloudflare—treat traffic as suspicious simply because it comes from Tor.

When used correctly, Tor provides meaningful privacy guarantees. It helps people worldwide evade state censorship. But Tor is much less widely used than it ought to be, in part because Cloudflare makes it such a pain to browse the Internet with.2

Now, Cloudflare has its reasons for blocking Tor. The question isn’t whether Cloudflare is ‘right’ or ‘wrong’ to block Tor—it’s that their opinion matters above everyone else’s. When Cloudflare decides to stop Tor traffic, that’s the end of it—since so much of the web relies on Cloudflare, Tor users really can’t use the (regular, muggle) Internet in any meaningful way. That leaves Tor little practical use outside of the so-called “dark web,” with which it’s popularly associated today.

I use this case to illustrate that these reverse proxies are a hidden vector of extreme, highly centralized power. Few Internet users even know that reverse proxies exist, let alone that a tremendous proportion of their traffic routes through them.3 But since they sit at a critical position in the network infrastructure, between DNS queries (“give me nytimes.com, please”) and servers’ responses (“here’s the New York Times”), reverse proxies can effectively block everything on the Internet.4

“But wait”—an astute reader might interject—“companies like Spotify choose to use reverse proxies. If one started misbehaving and censoring content, wouldn’t they change reverse proxy providers?”

It depends on how significant the effect is—and who feels that effect. Providers like Cloudflare regularly block traffic originating in countries like Ghana, treating it as intrinsically suspicious. How many companies have stopped using Cloudflare in response? Very few, likely because few large tech companies have engineers who would even notice.

Likewise, how many companies have stopped using Cloudflare because they block Tor—itself a serious issue for people trying to circumvent Internet censorship globally? Again, given Cloudflare’s persistent dominance, the answer seems to be “very few.”5

By extension, the US can block anything

My doomsday scenario is not so much that Cloudflare starts blocking content. It’s more that the US government compels them to. Reverse proxy providers would probably have to comply with a US court order.

Imagine the providers behind 97.6% of the reverse proxy market blocking US-originating traffic to Signal messenger. Imagine them blocking Iranian-originating traffic to WhatsApp.

Or, imagine a more targeted scenario. Imagine getting a phone call from your uncle. He says he can’t access Google, he gets weird messages everywhere. He probably got some spyware on his Windows PC, right? Or maybe his Internet is down. Who knows. Little would you know he’d been placed (perhaps mistakenly) on a digital do-not-fly list, passed on to Cloudflare and similar services, instructing them to limit his access.

If this scenario sounds far-fetched, remember: this court-order mechanism is conceptually similar to how the US already blocks content on the Internet.

A better way

Caching content is an environmental imperative. And without collective defense, DDoS attacks would break the Internet for everyone—except those who can afford millions of dollars in hosting bills. I love those services.

I don’t love that a small handful of US-domiciled companies provide them. These companies—and, through its legal and economic levers, the US—wield tremendous power over the Internet.

So, how can we do this better? I think reverse proxies are a perfect use case for a decentralized autonomous organization or DAO. DAOs use distributed mechanisms to perform community governance actions—things like voting or passing network rules. These networks themselves represent computer programs, which could (in theory!) replicate the functionality of a reverse proxy.6
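As a rough illustration of what “voting on network rules” could look like, here is a minimal sketch of a proposal that members approve or reject. It assumes one member, one vote, a simple majority, and a participation quorum. Those are my assumptions for the example, not a description of how any existing DAO works, and the class and member names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RuleProposal:
    """A proposed traffic rule that members vote on (hypothetical structure)."""
    description: str
    votes: Dict[str, bool] = field(default_factory=dict)  # member id -> approve?

    def cast(self, member: str, approve: bool) -> None:
        # A member's later vote replaces their earlier one.
        self.votes[member] = approve

    def passes(self, total_members: int, quorum: float = 0.5) -> bool:
        # Assumed rule: at least `quorum` of all members must vote,
        # and approvals must strictly outnumber rejections.
        if total_members <= 0 or len(self.votes) < quorum * total_members:
            return False
        yes = sum(self.votes.values())
        return yes > len(self.votes) - yes


proposal = RuleProposal("Stop challenging traffic solely because of its country of origin")
for member, approve in [("accra", True), ("kampala", True),
                        ("almaty", True), ("berkeley", False)]:
    proposal.cast(member, approve)

adopted = proposal.passes(total_members=6)  # 4 of 6 voted, 3 approvals against 1 rejection
```

The hard questions sit outside the code: who counts as a member, and how to keep votes from concentrating in a few hands.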

Community governance mechanisms could make reverse proxies fairer. They could give decision-making power to those Cloudflare doesn’t hear—particularly people in Africa, Central Asia, and elsewhere in the “Global South.” Remember, it’s those users whose requests are often getting blocked in the name of security. They should get a say over what counts as security—security for whom?7

Decentralized reverse proxies might even be more effective, more responsive to emergent threats.8

Centralization vs. security: a false trade?

More broadly, reverse proxies are one of those cases where decentralization and security appear to be in tension—but probably aren’t. Superficially, it seems like we gain more from the centralization of reverse proxies than we lose from their potential abuse. And, since we need “big data” to block DDoS attacks, centralized surveillance is a necessary evil. Right?

I wonder what Ghanaians would say about that tradeoff. Besides, recent technical advances9 throw into question the assumption that we need a centralized surveillor in the first place. In short, I’m not convinced that there is a fundamental or non-negotiable tradeoff between centralization and security for reverse proxies. Decentralized versions could be fairer and better.

I think we just haven’t tried hard enough.

Thanks to the Internet Society for funding this work.

Between centralization among certificate authorities and weaknesses in both the CA system and the domain name system, a sufficiently-resourced actor could redirect a request (e.g., nytimes.com) and issue a valid certificate for it (i.e., the lock symbol would appear in your URL bar). Imagine going to the New York Times and seeing a near-perfect forgery, a few words strategically off-kilter. Would you even notice?

The guarantees Tor provides are what most people think happens when they use a private browser window. In reality, a private browser window hides the cookies sites have placed in the past and discards any new ones when you close it. In practice, that alone provides minimal privacy. Someone on your network (like your boss or your company) can still see what you’re doing online. And sites can use means other than cookies to track you, like browser fingerprinting.

In contrast, Tor (at least as implemented by the Tor browser bundle) prevents sites from knowing who you are and someone with access to your network from knowing what you’re doing. That doesn’t mean Tor provides perfect anonymity—far from it. But, it’s the best option available for most people, and I think it matches people’s idea of what’s supposed to happen when they use incognito mode in their browsers. In my opinion, Chrome’s incognito mode ought to use Tor by default. (Brave has an optional Tor private browsing mode. I wish they’d replace their regular incognito mode with it.)

If the website you’re communicating with uses TLS and the reverse proxy does not terminate that connection itself, the proxy won’t be able to see the content of your requests. (Many providers do terminate TLS at their edge, in which case they can.) Either way, it will know that you’re making requests, and that metadata itself can be revealing enough. As General Michael Hayden, former director of the NSA and the CIA, once said, “We kill people based on metadata.”

Remember, in the case of DDoS defense, the fact that reverse proxies can block any incoming traffic is exactly why companies use them.

Besides, switching reverse proxy providers isn’t as easy as you might think. If you’ve ever used a ‘standard’ web library like Bootstrap or jQuery on your webpage, you probably used a version hosted on a reverse proxy, most likely Cloudflare. Even if you stop using Cloudflare, the libraries you depend on might still rely on it. Switching all that stuff over can be a pain at best and, at worst, could temporarily break your website. The momentum to stick with one’s provider is high.

For DDoS protection, DAO members could vote on rules and algorithms for blocking traffic. Meanwhile, a system like IPFS could handle the content delivery layer. My questions are: (1) could this system scale well enough to challenge something like Cloudflare? (2) how would we equitably distribute ownership, raising marginalized voices while minimizing the potential for re-centralization and abuse?

Who protects whom from what? The core truth of cybersecurity is that this question is (and always has been) political.

Imagine if the system rewarded people monetarily for identifying threats that the rest of the system ended up agreeing were legitimate. That could block the right thing more readily than relying on a single provider’s (potentially faulty, likely biased) algorithms.
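A minimal sketch of that incentive, under assumptions I am making up for the example: a report pays out only once a threshold of other members independently confirm it, and the reporter cannot confirm their own report. The function name, threshold, and reward size are all hypothetical.

```python
from typing import Dict, Set


def settle_reports(
    reports: Dict[str, str],             # suspicious source -> member who reported it
    confirmations: Dict[str, Set[str]],  # suspicious source -> members who agree
    threshold: int = 3,                  # assumed number of independent confirmations
    reward: int = 10,                    # assumed payout per confirmed report
) -> Dict[str, int]:
    """Return how much each reporter earns for reports the network confirmed."""
    payouts: Dict[str, int] = {}
    for source, reporter in reports.items():
        # The reporter's own confirmation does not count toward the threshold.
        independent = confirmations.get(source, set()) - {reporter}
        if len(independent) >= threshold:
            payouts[reporter] = payouts.get(reporter, 0) + reward
    return payouts


payouts = settle_reports(
    reports={"192.0.2.1": "ama", "192.0.2.2": "kofi"},
    confirmations={
        "192.0.2.1": {"ama", "esi", "yaw", "abena"},  # three independent confirmations
        "192.0.2.2": {"kofi", "esi"},                 # only one
    },
)
```

Here only the first report clears the threshold, so only its reporter is paid. A real design would also need to guard against colluding confirmers, which this sketch does not attempt.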

Namely advances in distributed systems, differential privacy, and homomorphic encryption.

elsehow