Architecting Websites For The HTTP/2 Era

Mattias Geniar, Tuesday, January 6, 2015 - last modified: Monday, September 7, 2015


The arrival of HTTP/2 will require us to rethink how we build websites (and run our webservers). That makes this a good time to reflect on what those changes may bring.

This post is based entirely on theory (the HTTP/2 spec), as HTTP/2 is hard to test today. Major browsers support HTTP/2, but very few servers do. And often, it's unclear which draft of the HTTP/2 spec they support.

The entire HTTP/2 Spec is available for reading on Github and is highly recommended. It covers a lot more edge-cases than this article does.

And if you're up for a bit more reading, the HTTP/1.1 spec is also worth it, if only for comparison's sake.

Recent benchmarks have shown that changes are in fact needed in order to fully benefit from HTTP/2. Not optimising the way data is transferred could end up hurting performance on the HTTP/2 protocol.

Table of Contents

  1. Some notes
  2. Presentation: HTTP/2, The Next Version of the Internet
  3. An introduction to HTTP/2
  4. Less domain sharding
  5. Less concatenation
  6. Is HTTPS/TLS required?
  7. Compression
  8. Server-side push
  9. Request priorities
  10. HTTP methods and status codes
  11. HTTP/2 and Varnish
  12. The rise of alternative webservers
  13. When will we see HTTP/2?
  14. References
  15. Comments

Some notes

This post took a while to research and write, and is, so far, based entirely on theory. It's my plan to keep it updated as A) the spec progresses and B) some of these theories can be benchmarked and put to the test.

For that to work, please let me know (in the comments at the bottom or online) what is wrong, what should be expanded upon and how you think the HTTP/2 protocol is going to evolve the web.

(Note: even though HTTP/2 is based on SPDY, I don't feel benchmarking SPDY would accurately reflect the way HTTP/2 would perform; I therefore consider HTTP/2 "untestable" for the moment.)

HTTP/2 Presentation: The Next Version of the Internet

I've given a talk on the state of HTTP/2; if you prefer slides over the long wall of text below, have a look at this presentation.

If that's not your cup of tea, read on below. If you'd like me to give this presentation at your local User Group, get in touch. ;-)

An introduction to HTTP/2

Before I go deeper into what HTTP/2 can change for the web, it's important to know what HTTP/2 is. First and foremost, it builds upon the SPDY protocol that Google designed, incorporating the lessons learned from that protocol.

Where HTTP/1.0 and HTTP/1.1 were plain-text protocols, HTTP/2 isn't. It's entirely binary and based on a concept of streams, messages and frames -- adding considerable complexity to the protocol.

-- The stream is a virtual channel within a connection, which carries bidirectional messages. Each stream has a unique integer identifier (1, 2, ..., N).

-- The message is a logical HTTP message, such as a request, or response, which consists of one or more frames.

-- The frame is the smallest unit of communication, which carries a specific type of data—e.g., HTTP headers, payload, and so on.

HTTP/2 streams, messages and frames

Bottom line: you can't telnet into an HTTP/2 webserver and expect to write plain-text headers to make a request. You'll need tools that translate the HTTP protocol into HTTP/2's binary form (think curl, wget, your browser, ...).

Chances are, you'll use those tools the same way you use them today. Behind the scenes they'll translate your HTTP requests into the binary message frame format that HTTP/2 expects.

So a curl request like the one below will work for HTTP/1.0, HTTP/1.1 and HTTP/2 servers. It will be curl that handles the connection and encodes your request transparently to meet HTTP/2's requirements.

$ curl -I -H "Accept-Encoding: gzip" -H "User-Agent: YourCustomUA" http://192.168.1.5/mypage.html
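To make the binary framing concrete, here is a short Python sketch that decodes the fixed 9-octet header preceding every HTTP/2 frame, following the layout in the spec (24-bit payload length, 8-bit type, 8-bit flags, 31-bit stream identifier). The sample bytes are hand-crafted for illustration:

```python
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
               0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING",
               0x7: "GOAWAY", 0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION"}

def parse_frame_header(data):
    """Decode the fixed 9-octet header that precedes every HTTP/2 frame."""
    if len(data) < 9:
        raise ValueError("need at least 9 octets")
    length = int.from_bytes(data[0:3], "big")       # 24-bit payload length
    ftype = FRAME_TYPES.get(data[3], "UNKNOWN")     # 8-bit frame type
    flags = data[4]                                 # 8-bit flags
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFFFFFF  # 31 bits, reserved bit masked
    return length, ftype, flags, stream_id

# A HEADERS frame on stream 1, END_HEADERS flag (0x4), 16-octet payload:
raw = (16).to_bytes(3, "big") + bytes([0x1, 0x4]) + (1).to_bytes(4, "big")
print(parse_frame_header(raw))  # (16, 'HEADERS', 4, 1)
```

This is the layer your curl or browser hides from you: every request and response you type in plain text ends up as a sequence of these frames on the wire.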

HTTP/2 is the first major change to the HTTP protocol since 1999. That means it can learn from 15 years of experience and from watching the web evolve (and my-oh-my, has the web evolved in the last 15 years).

So HTTP/2, what can you bring to the table?

Less domain sharding

In HTTP/1.1 there is a problem known as "concurrent connections per domain". A browser will open 4 to 8 TCP connections to a given host, and request individual resources (stylesheets, images, javascript, ...) one by one. To circumvent this, websites nowadays use multiple domains to load their resources (like static1.domain.tld, static2.domain.tld, ...).

The reason for this kind of domain sharding is to allow more concurrent downloads of resources. Each connection would otherwise block until one of the previous requests is done.

HTTP/2 introduces multiplexing, which allows one TCP/IP connection to request and receive multiple resources, intertwined. Requests won't be blocking anymore, so there is no need for multiple TCP connections on multiple domain names.

In fact, opening multiple connections would hurt performance in HTTP/2. Each connection would have to go through the SYN -> SYN-ACK -> ACK three-way handshake, wasting round-trips. The HTTP/2 spec describes it like this.

Clients SHOULD NOT open more than one HTTP/2 connection to a given host and port pair, where host is derived from a URI, a selected alternative service [ALT-SVC], or a configured proxy.
9.1 -- connection management

This would mean that HTTP resources, such as CSS, JavaScript, images, ... no longer need to come from other (sub)domains, but can all come from the same domain as the parent resource. This would also make it easier to implement protocol-relative URLs. Update: protocol-relative URLs are now considered an anti-pattern; you should -- if available -- always use the https:// protocol.
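A quick back-of-envelope calculation shows why the extra connections hurt. The numbers below are illustrative assumptions (one round-trip for the TCP three-way handshake, roughly two more for a full TLS handshake), not measurements:

```python
def setup_cost_ms(connections, rtt_ms, tls=False):
    """Rough connection setup cost: one round-trip for the TCP three-way
    handshake, plus roughly two more for a full TLS handshake, per
    connection. Ignores slow-start, parallel opens and session resumption."""
    rtts_per_conn = 1 + (2 if tls else 0)
    return connections * rtts_per_conn * rtt_ms

# Six sharded HTTPS connections vs. one multiplexed HTTP/2 connection, 50 ms RTT:
print(setup_cost_ms(6, 50, tls=True))  # 900
print(setup_cost_ms(1, 50, tls=True))  # 150
```

Even with these crude assumptions, sharding across six hosts burns hundreds of milliseconds in handshakes that a single multiplexed connection simply doesn't pay.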

Less concatenation

With HTTP/1.1 there was always a difficult trade-off between domain sharding, as explained above, and resource concatenation.

Since HTTP requests are fairly expensive, they were reduced to a minimum: separate JavaScript and CSS files were concatenated into a single file, and CSS sprites were used to reduce the number of individual image resources.

Stylesheets would be (partly) inlined, to avoid additional requests to the server for more CSS files (even if there were always arguments against inlining). The inlining of content has mostly been superseded by server-side push in HTTP/2; more on that later.

For HTTP/2, part of that workflow can be undone. Looking at CSS sprites for instance: they would commonly include images that are needed on the site, but perhaps not on the page currently being browsed. Yet they were sent to the client in the "large" sprite. Since HTTP requests are becoming less expensive, it can become acceptable to separate those images again and not bundle them in one large file.

The same would apply to CSS and JavaScript as well. Instead of having a single monolithic file with all content, it can be split into chunks that are then only loaded on the pages that need them.

There will, as always, be a tradeoff between making an additional HTTP call and bundling all resources into single files -- that's what the benchmarks will have to decide for us.

Is HTTPS/TLS required?

HTTP/2 is based on SPDY. And SPDY required a TLS (https) connection in order to use the SPDY protocol.

However, HTTP/2 doesn't require a secure connection, unlike SPDY. It's possible to use HTTP/2 on a plain, non-secure HTTP connection. Having said that, it looks like major browsers (Firefox & Chrome) may limit their HTTP/2 support to TLS connections only, in order to push for a more secure web (SSL/TLS everywhere).

Firefox will only be implementing HTTP/2 over TLS -- and so far that means for https:// schemed URLs. It does enforce the protocol's >= TLS 1.2 requirement -- if a server negotiates HTTP/2 with a lower TLS version it is treated as a protocol error.

Networking/http2 on Mozilla.org

So even though the spec says HTTP/2 is possible on plain HTTP, chances are we'll only be using it on HTTPS websites.

I believe it's safe to say the web built on HTTP/2 will be a web built on TLS (1.2 and higher). With free certificate authorities backed by eff.org and cheaper certificates all around, I don't think there are many compelling reasons left not to go SSL/HTTPS in the future (but beware of bad SSL/HTTPS implementations).

Compression

HTTP/2 discourages the use of compression in security-sensitive contexts. Just like in HTTP/1, compression is discouraged in situations where security or privacy is of the utmost importance.

Implementations communicating on a secure channel MUST NOT compress content that includes both confidential and attacker-controlled data unless separate compression dictionaries are used for each source of data.
HTTP/2: Use of Compression

HTTP compression (gzip, deflate, ...) has been known to compromise SSL/TLS security in the BREACH and CRIME attacks.

These attacks exist on HTTP/1.0 and HTTP/1.1 infrastructure and will also be possible on HTTP/2.

HTTP/2 enables greater use of compression for both header fields (Section 4.3) and entity bodies. Compression can allow an attacker to recover secret data when it is compressed in the same context as data under attacker control.

10.6 Use of Compression

For any secure site, where the SSL/TLS connection is used to protect user data, compression should be disabled. For sites that use SSL/TLS only to ensure the validity of the data that is being sent, I believe compression will still be an option -- as long as no secret or sensitive information is shown. This is the same today in HTTP/1.1 as well.

HTTP/2 will support the compression of HTTP headers, which is not possible in HTTP/1.1 (where HTTP headers are always sent uncompressed). This is especially useful for sites shipping a lot of cookies (sites ship with 1MB worth of cookies, really?). That content can now be reliably compressed.

The HTTP header compression doesn't use the known gzip/deflate algorithms and is therefore not vulnerable to CRIME-style attacks. It uses a custom compression method, known as HPACK, to compress the HTTP headers.

In all likelihood, HTTP/2 will not change the way we handle the compression of data compared to HTTP/1.1. It does offer a great improvement for the compression of HTTP headers.
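To illustrate the indexing idea behind HPACK, here's a deliberately simplified Python sketch -- not the real algorithm, which adds Huffman coding, table size limits and eviction, and whose static table has 61 fixed entries rather than the three stand-ins below. The point is that a repeated header field (a big cookie, say) shrinks to a single table index on the next request:

```python
# Three stand-in entries; the spec's real static table has 61.
STATIC_TABLE = [(":method", "GET"), (":path", "/"),
                ("accept-encoding", "gzip, deflate")]

class HpackishEncoder:
    """Toy illustration of HPACK's indexing: the first time a header field
    is sent it goes out as a literal and enters a dynamic table; repeats
    are sent as a single table index."""
    def __init__(self):
        self.dynamic = []

    def encode(self, headers):
        out = []
        for field in headers:
            table = STATIC_TABLE + self.dynamic
            if field in table:
                out.append(("index", table.index(field)))
            else:
                out.append(("literal", field))
                self.dynamic.insert(0, field)  # newest entry gets the lowest dynamic index
        return out

enc = HpackishEncoder()
first = enc.encode([(":method", "GET"), ("cookie", "session=abc123")])
second = enc.encode([(":method", "GET"), ("cookie", "session=abc123")])
print(first)   # [('index', 0), ('literal', ('cookie', 'session=abc123'))]
print(second)  # [('index', 0), ('index', 3)]
```

The cookie costs its full literal length exactly once; every follow-up request pays only a tiny index. That's where the big win for cookie-heavy sites comes from.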

Server-side push

In HTTP/1.1, the only way for a browser (or "client") to retrieve data from a server is to first request it from the server.

In HTTP/2, the server can send extra resources along with the response to the first HTTP request, thus avoiding additional network round-trips for follow-up HTTP requests.

This is especially useful for those first requests where the browser would ask for the HTTP resource of the page (say, /somepage.html), only to parse the DOM and figure out it needs to request additional CSS/JavaScript/images/... resources as a result.

How will this feature work with today's code, written in PHP, Ruby or .NET? Hard to say. In the end, it's the HTTP/2 webserver (Nginx, Apache, ...) that needs to push the additional resources to the client.

Will the HTTP/2 webserver determine on its own which extra resources to send? Will there be a way to instruct the HTTP/2 webserver from within your programming code? Hopefully, although the syntax and methods for doing so are still unclear and could be highly dependent on the chosen webserver.

At the moment, I would treat this feature as an "obscure black box" that we will have little or no control over. Here are a few suggestions on how to handle these server-side pushes from within your application code.

1. The application can explicitly initiate server push within its application code. (example in NodeJS)

2. The application can signal to the server the associated resources it wants pushed via an additional HTTP header. (ref.: X-Associated-Content header)

3. The server can automatically learn the associated resources without relying on the application.

Implementing HTTP 2.0 server push

Ilya Grigorik (@igrigorik) has some examples based on NodeJS code that demonstrate this powerful feature (examples where you do have full control over server-side pushes).
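As a thought experiment, option 2 above could look roughly like this on the server side. Both the X-Associated-Content header handling and the function are hypothetical -- no webserver exposes exactly this API today:

```python
def resources_to_push(response_headers, client_cache):
    """Hypothetical server-side push logic: the application lists associated
    resources in an X-Associated-Content style header, and the webserver
    pushes whatever the client hasn't already cached. Purely illustrative."""
    listed = response_headers.get("X-Associated-Content", "")
    wanted = [p.strip() for p in listed.split(",") if p.strip()]
    return [p for p in wanted if p not in client_cache]

print(resources_to_push(
    {"X-Associated-Content": "/css/site.css, /js/app.js"},
    client_cache={"/js/app.js"},
))  # ['/css/site.css']
```

The cache check matters: in the actual protocol the client can decline a PUSH_PROMISE for a resource it already has, so a smart server would avoid promising it in the first place.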

Update: the first HTTP/2 server side test results are in, and it looks very promising!

Request priorities

An equally "obscure" feature in HTTP/2 is the prioritisation of HTTP requests. Each request can be given a priority (0 being the highest priority, much like MX DNS records) and will be processed accordingly.

It'll be up to the browser to specify the priority of each HTTP resource. The HTTP/2 protocol allows the priority to be given, so blocking resources can be given a higher processing priority than non-blocking resources. It's up to the HTTP/2 webserver to process those priority requests accordingly.

As it looks now, this will be a feature of HTTP/2 that we, developers/users, won't have a say in. We will most likely not be able to assign priorities to HTTP resources ourselves. This may be a good thing, as browsers will be far more intelligent in figuring out which resources should get which priority.
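For intuition: later drafts of the spec express priorities as stream dependencies plus weights (1-256), and a server can divide its bandwidth in proportion to those weights. A minimal sketch, ignoring the dependency tree and using made-up weights:

```python
def split_bandwidth(streams, available=1.0):
    """Divide available bandwidth proportionally to stream weights
    (1-256 in the spec). Ignores stream dependencies, which the real
    prioritisation tree also takes into account."""
    total = sum(streams.values())
    return {name: available * weight / total for name, weight in streams.items()}

# Made-up weights -- real browsers choose their own:
shares = split_bandwidth({"page.html": 128, "app.js": 64, "hero.jpg": 32})
print(shares["page.html"])  # page.html gets 128/224 of the bandwidth, ~0.57
```

However the browser picks the weights, the effect is the same: blocking resources like the HTML and critical scripts get served ahead of decorative images over the single shared connection.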

HTTP methods and status codes

All HTTP status codes defined for HTTP/1.1 remain in HTTP/2. We'll still have the HTTP 200 OK responses, the 301 permanent redirects and the 404 Not Founds.

The same goes for all methods defined in HTTP/1.1: GET, POST, PATCH, PUT, DELETE, ... all these methods are still here.

Since HTTP/2 builds further upon HTTP/1.1, all status codes and methods remain the same.

HTTP/2 and Varnish

I'm a huge fan of Varnish, the HTTP accelerator/cache/load balancer/router. Varnish has historically only supported HTTP/1.1, and plain HTTP only. It never implemented SSL/TLS.

For sites to use Varnish with HTTPS, they would use Pound / HAProxy / Nginx as an "SSL offloader" in front of their Varnish configuration. That service would handle all the SSL/TLS encryption and pass the requests to Varnish in plain HTTP for caching.

However, it looks like support for HTTP/2 may be coming to Varnish after all. It's no secret that Poul-Henning Kamp, author of Varnish, doesn't like HTTP/2, or at least -- the first drafts -- but at the same time he says "if that's what the people want, I'll do it".

At the end of the day, a HTTP request or a HTTP response is just some metadata and an optional chunk of bytes as body, and if it already takes 700 pages to standardise that, and HTTP/2.0 will add another 100 pages to it, we're clearly doing something wrong.
Poul-Henning Kamp

And in a more recent (May 2014) mailing list post, Poul-Henning Kamp confirms his opinion again.

Isn't publishing HTTP/2.0 as a "place-holder" is just a waste of everybodys time, and a needless code churn, leading to increased risk of security exposures and failure for no significant gains ?

[...]

Please admit defeat, and Do The Right Thing.

Poul-Henning Kamp

And even more recently (January 2015), the HTTP/2 rant got an update.

HTTP/2.0 is not a technical masterpiece. It has layering violations, inconsistencies, needless complexity, bad compromises, misses a lot of ripe opportunities, etc.
Poul-Henning Kamp

Maybe we'll see HTTP/2 support in Varnish in the 4.x releases, maybe we'll have to wait for the 5.x release. As far as I can tell, there is no "official" statement from the Varnish community yet.

Update 5/3/2015: Varnish will support HTTP/2 with a minimal TLS proxy.

Until that time (because as of yet, it's not clear when Varnish will implement HTTP/2), we'll be running SSL/TLS proxies.

For me personally, I believe (at least in the short-term) our server setups will look like this.

port :80   --> Varnish HTTP accelerator
port :443  --> Nginx SSL HTTP/2 + SPDY  offloading, proxy all to Varnish on :80
port :8080 --> The actual webserver (Nginx/Apache/...) parsing the requests

If HTTP/2 does catch on for plain HTTP connections and not only for TLS sessions, and Varnish turns out not to support HTTP/2 at all, the setup would be slightly different.

port :80   --> Nginx running HTTP/1.1 and HTTP/2, proxy all to Varnish on :8080
port :443  --> Nginx SSL offloading, proxy all to Varnish on :8080

port :8080 --> Varnish serving the cached requests, proxy all not in the cache to :8081
port :8081 --> The actual webserver (Nginx/Apache/...) parsing the requests
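As a sketch of the :443 line in the first diagram, an nginx server block along these lines could do the SSL offloading. The domain and certificate paths are placeholders, and `spdy` stands in until nginx ships its HTTP/2 module:

```nginx
server {
    listen 443 ssl spdy;              # 'spdy' for now; swap for http2 once available
    server_name example.com;          # placeholder domain

    ssl_certificate     /etc/nginx/ssl/example.com.crt;  # placeholder paths
    ssl_certificate_key /etc/nginx/ssl/example.com.key;
    ssl_protocols       TLSv1.2;      # HTTP/2 requires >= TLS 1.2

    location / {
        proxy_pass http://127.0.0.1:80;                  # hand off to Varnish, plain HTTP
        proxy_set_header Host              $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For   $remote_addr;
    }
}
```

The X-Forwarded-Proto header is what lets Varnish and the backend distinguish offloaded HTTPS traffic from genuine plain-HTTP requests on :80.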

Time will tell. Whether the backend serving the actual PHP/Ruby/Node/... requests will be Nginx or Apache will depend on the sysadmin and their familiarity with each webserver.

The rise of alternative webservers

HTTP/2 may not be 100% new (after all, it's based on the HTTP/1.1 spec), but it does change a few important paradigms in how we think about webservers and sites. That means the webservers we're using today may not be the best fit for the HTTP/2 world.

Websites are designed and architected with the best user-experience in mind. That means they're optimized for the browsers rendering them, not the servers serving them. We can easily swap out the webserver (they're in our control), but we can't change the browsers clients are using.

So in the HTTP/2 era, we may see H2O as a new rising star, next to the proven webservers like Nginx and Apache. And I don't think H2O will be alone. It already shows impressive improvements over Nginx, and the HTTP/2 race has only just begun.

When will we see HTTP/2?

This is very hard to say. The timeline for HTTP/2 has a major milestone set for February 2015 for the RFC. The RFC is the moment when the IETF working group has finished the proposal and it has been reviewed.

So at the earliest, HTTP/2 will be "finalised" in February 2015. We can expect the final implementations in major webservers soon thereafter (especially since Nginx already fully supports SPDY and HTTP/2 is based partly on SPDY). Most modern browsers already support SPDY, making the change to HTTP/2 less of a hurdle (but by no means an easy task).

Both Firefox and Chrome already support HTTP/2, albeit in a limited form in Firefox, and it needs to be enabled explicitly in Chrome.

The HTTP/2 spec won't change much compared to the version currently published. Anyone could already implement the current spec and update their implementation should any changes still be approved.

2015 will be the year we see HTTP/2 reach general availability.

References

In no particular order, but all worthy of your time and attention.

If you have any more feedback, please let me know in the comments below. I'd love to hear what you think --- even if you disagree with me entirely!


Hi! My name is Mattias Geniar. I'm a Support Manager at Nucleus Hosting in Belgium, a general web geek, public speaker and podcaster. Currently working on DNS Spy. Follow me on Twitter as @mattiasgeniar.


Comments

Jeremy Monday, January 19, 2015 at 18:45

It’s “Poul-Henning Kamp”, not “Paul-Henning Kamp”, just fyi.



Glen A. Saturday, January 31, 2015 at 20:48

Nice post, thanks Mattias.

I have two questions:

1. If compression over SSL is known to be insecure, then why isn’t a new compression algorithm being developed for HTTP/2? I don’t like the idea that I can’t compress resources without sacrificing security. This is not good.

2. How can a server know to push an associated resource if it might already be cached at the client? I can think of two ways that server-side push could work:

a. [Server-side]: The server keeps track of the latest version of each file that it serves (with file modification timestamps, for example). If an associated resource has been updated since it was last served, the updated version is pushed to the client. This wouldn’t respect the usual cache headers though, so it may not be ideal. It would also be more difficult to implement in a distributed system.

b. [Client-side]: Once a resource has been downloaded the first time, the client stores information about its associated resources. For example, index.html is downloaded along with a.js, b.js, and c.js. When index.html is again requested, the client can include headers that indicate which associated resources it will accept directly (pushed). Something like: Accept-Associated: /js/a.js, /js/b.js

These things have probably already been discussed a million times, but I don’t monitor the relevant groups, so excuse me if I’m stating the obvious. :-)



    Mattias Geniar Sunday, February 1, 2015 at 14:39

    Hi Glen,

    Thanks for replying!

    In regards to compression: you are right, this is annoying. The same problem exists in HTTP/1.1 and hasn’t really been solved.

    As for the HTTP PUSH support, it’s very well explained here.

    Once the client receives a PUSH_PROMISE frame, it has the option to decline the stream if it wants to (e.g., the resource is already in cache), which is an important improvement over HTTP 1.x. Resource inlining, which is a popular “optimization” for HTTP 1.x, is equivalent to a “forced push”: the client cannot cancel it, and it also cannot individually cache the inlined resource.

    In other words: the server indicates it will send a resource named “X.something” to the client. Before the client actually receives the content, it has the ability to block the request – because it may already be in the cache.



      Glen A. Sunday, February 1, 2015 at 17:09

      Ah, that works as well. :-)

      BTW, do you know if there has been any discussion regarding a new compression algorithm, or a solution to the current security issues? If you have any links I would like to add my “vote”/+1.



Johnny Robeson Wednesday, May 6, 2015 at 07:28

I don’t know if you plan on continuing to edit this article, but if so, then you should probably strikethrough this point: “This would also make it easier to implement protocol-relative URLs.” because the linked article suggests not to use protocol relative URLs anymore, and no major browsers support http/2 without TLS.



    Mattias Geniar Wednesday, May 6, 2015 at 08:07

    Hi Johnny,

    That’s indeed the plan.

    Thanks for the feedback, I’ve updated the article!



Billy Hoffman Tuesday, May 12, 2015 at 19:34

“HTTP/2 actively discourages the use of compression for secure websites. HTTP compression (gzip, deflate, …) has been known to compromise the SSL/TLS security in the “breach” and “CRIME” attacks.”

This is a patently wrong statement in an otherwise good article.

There are no known security vulnerabilities with HTTP/1.x’s style of compression. That’s because HTTP/1.x only supports compressing response bodies. During SPDY development, the same compression algorithms used for compressing HTTP responses (gzip/deflate) were applied to compressing request and response headers. This is what led to the CRIME vulnerability. The solution was to still use compression, but with a different compression scheme, HPACK, which, glossing over a ton of technical details, allows compression while avoiding CRIME because “separate compression dictionaries are used for each source of data.” HTTP/2 uses HPACK.

Use of TLS compression is not recommended, but that has always been a performance best practice, since TLS compression is not context aware.

For the love of god, keep using compression with your websites, whether HTTP/1.x, TLS + HTTP/1.x, or HTTP/2.



    Mattias Geniar Tuesday, May 12, 2015 at 22:25

    You are indeed right, I’ve corrected the original post.

    Thanks for letting me know!



Frakturfreund Thursday, May 14, 2015 at 23:01

Google’s Go programming language has a built-in HTTP(S) server, and they plan to extend it to speak HTTP/2 without breaking existing code. There’s a cool demo site already online, and the code is on GitHub.



Nehorap Thursday, February 11, 2016 at 16:24

Hi,
For some reason Chrome (v48) doesn’t support http2. I’m trying to run some demos about the difference between http 1.1 vs http 2 via the following sites: https://http2.golang.org/ and https://http2.akamai.com/demo, but got an error.
I thought that Chrome supported it automatically……



    Mattias Geniar Sunday, February 14, 2016 at 16:32

    It is indeed automatically supported. What error are you getting? Maybe that can shed some light on the situation.



Olle Wednesday, March 2, 2016 at 15:28

Thanks for a great blog! :-)
With the introduction of HTTP/2.0, what do you think about the future of web server? Will traditional servers like nginx and Apache be replaced with new alternatives like Caddy or ShimmerCat? Is it worth looking into the new stuff or is it better to wait until they are more robustly tested?


