Architecting Websites For The HTTP/2 Era

Mattias Geniar, Tuesday, January 6, 2015 - last modified: Monday, September 7, 2015


The arrival of HTTP/2 will require us to rethink how we build websites (and run our webservers). That makes this a good time to reflect on what those changes may bring.

This post is based entirely on theory (the HTTP/2 spec), as HTTP/2 is hard to test today. Major browsers support HTTP/2, but very few servers do. And often, it's unclear which draft of the HTTP/2 spec they support.

The entire HTTP/2 Spec is available for reading on Github and is highly recommended. It covers a lot more edge-cases than this article does.

And if you're up for a bit more reading, the HTTP/1.1 spec is also worth it, if only for comparison's sake.

Recent benchmarks have shown that changes are in fact needed in order to fully benefit from HTTP/2. Not optimising the way data is transferred could end up hurting performance on the HTTP/2 protocol.

Table of Contents

  1. Some notes
  2. Presentation: HTTP/2, The Next Version of the Internet
  3. An introduction to HTTP/2
  4. Less domain sharding
  5. Less concatenation
  6. Is HTTPS/TLS required?
  7. Compression
  8. Server-side push
  9. Request priorities
  10. HTTP methods and status codes
  11. HTTP/2 and Varnish
  12. The rise of alternative webservers
  13. When will we see HTTP/2?
  14. References
  15. Comments

Some notes

This post took a while to research and write, and is, so far, based entirely on theory. It's my plan to keep it updated as A) the spec progresses and B) some of these theories can be benchmarked and put to the test.

For that to work, please let me know (in the comments at the bottom or online) what is wrong, what should be expanded upon and how you think the HTTP/2 protocol is going to evolve the web.

(Note: even though HTTP/2 is based on SPDY, I don't feel benchmarking SPDY would accurately reflect the way HTTP/2 would perform; I therefore consider HTTP/2 "untestable" for the moment.)

HTTP/2 Presentation: The Next Version of the Internet

I've given a talk on the state of HTTP/2; if you prefer slides over the long wall of text below, have a look at this presentation.

If that's not your cup of tea, read on below. If you'd like me to give this presentation at your local User Group, get in touch. ;-)

An introduction to HTTP/2

Before I go deeper into what HTTP/2 can change for the web, it's important to know what HTTP/2 is. First and foremost, it builds upon the SPDY protocol that Google designed, incorporating the lessons learned from that protocol.

Where HTTP/1.0 and HTTP/1.1 were plain-text protocols, HTTP/2 isn't. It's entirely binary and based on a concept of streams, messages and frames -- adding considerable complexity to the protocol.

-- The stream is a virtual channel within a connection, which carries bidirectional messages. Each stream has a unique integer identifier (1, 2, ..., N).

-- The message is a logical HTTP message, such as a request, or response, which consists of one or more frames.

-- The frame is the smallest unit of communication, which carries a specific type of data—e.g., HTTP headers, payload, and so on.

HTTP/2 streams, messages and frames

Bottom line: you can't telnet into an HTTP/2 webserver and expect to write plain-text headers to make a request. You'll need tools that translate the HTTP protocol into HTTP/2's binary form (think curl, wget, your browser, ...).

Chances are, you'll use those tools the same way you use them today. Behind the scenes they'll translate your HTTP requests into the binary message frame format that HTTP/2 expects.

So a curl request like the one below will work for HTTP/1.0, HTTP/1.1 and HTTP/2 servers. It will be curl that handles the connection and encodes your request transparently to meet HTTP/2's requirements.

$ curl -I -H "Accept-Encoding: gzip" -H "User-Agent: YourCustomUA" http://192.168.1.5/mypage.html
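To make the binary framing concrete, here is a short Python sketch that decodes the fixed 9-octet header preceding every HTTP/2 frame, following the layout in the spec (24-bit payload length, 8-bit type, 8-bit flags, 31-bit stream identifier). The sample bytes are hand-crafted for illustration:

```python
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
               0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING",
               0x7: "GOAWAY", 0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION"}

def parse_frame_header(data):
    """Decode the fixed 9-octet header that precedes every HTTP/2 frame."""
    if len(data) < 9:
        raise ValueError("need at least 9 octets")
    length = int.from_bytes(data[0:3], "big")       # 24-bit payload length
    ftype = FRAME_TYPES.get(data[3], "UNKNOWN")     # 8-bit frame type
    flags = data[4]                                 # 8-bit flags
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFFFFFF  # 31 bits, reserved bit masked
    return length, ftype, flags, stream_id

# A HEADERS frame on stream 1, END_HEADERS flag (0x4), 16-octet payload:
raw = (16).to_bytes(3, "big") + bytes([0x1, 0x4]) + (1).to_bytes(4, "big")
print(parse_frame_header(raw))  # (16, 'HEADERS', 4, 1)
```

This is the layer your curl or browser hides from you: every request and response you type in plain text ends up as a sequence of these frames on the wire.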

HTTP/2 is the first major change to the HTTP protocol since 1999. That means it can learn from 15 years of experience and from watching the web evolve (and my-oh-my, has the web evolved in the last 15 years).

So HTTP/2, what can you bring to the table?

Less domain sharding

In HTTP/1.1 there is a problem known as "concurrent connections per domain". A browser will open 4 to 8 TCP connections to a given host, and request individual resources (stylesheets, images, javascript, ...) one by one. To circumvent this, websites nowadays use multiple domains to load their resources (like static1.domain.tld, static2.domain.tld, ...).

The reason for this kind of domain sharding is to allow more concurrent downloads of resources. Each connection would otherwise block until one of the previous requests is done.

HTTP/2 introduces multiplexing, which allows one TCP/IP connection to request and receive multiple resources, intertwined. Requests won't be blocking anymore, so there is no need for multiple TCP connections on multiple domain names.

In fact, opening multiple connections would hurt performance in HTTP/2. Each connection would have to go through the SYN -> SYN-ACK -> ACK three-way handshake, wasting round-trips. The HTTP/2 spec describes it like this.

Clients SHOULD NOT open more than one HTTP/2 connection to a given host and port pair, where host is derived from a URI, a selected alternative service [ALT-SVC], or a configured proxy.
9.1 -- connection management

This would mean that HTTP resources, such as CSS, JavaScript, images, ... no longer need to come from other (sub)domains, but can all come from the same domain as the parent resource. This would also make it easier to implement protocol-relative URLs. Update: protocol-relative URLs are now considered an anti-pattern; you should -- if available -- always use the https:// protocol.
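A quick back-of-envelope calculation shows why the extra connections hurt. The numbers below are illustrative assumptions (one round-trip for the TCP three-way handshake, roughly two more for a full TLS handshake), not measurements:

```python
def setup_cost_ms(connections, rtt_ms, tls=False):
    """Rough connection setup cost: one round-trip for the TCP three-way
    handshake, plus roughly two more for a full TLS handshake, per
    connection. Ignores slow-start, parallel opens and session resumption."""
    rtts_per_conn = 1 + (2 if tls else 0)
    return connections * rtts_per_conn * rtt_ms

# Six sharded HTTPS connections vs. one multiplexed HTTP/2 connection, 50 ms RTT:
print(setup_cost_ms(6, 50, tls=True))  # 900
print(setup_cost_ms(1, 50, tls=True))  # 150
```

Even with these crude assumptions, sharding across six hosts burns hundreds of milliseconds in handshakes that a single multiplexed connection simply doesn't pay.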

Less concatenation

With HTTP/1.1 there was always a difficult trade-off between domain sharding, as explained above, and resource concatenation.

Since HTTP requests are fairly expensive, they were reduced to a minimum: separate JavaScript and CSS files were concatenated into a single file, and CSS sprites were used to reduce the number of individual image resources.

Stylesheets would be (partly) inlined, to avoid additional requests to the server for more CSS files (even if there were always arguments against inlining). The inlining of content has mostly been superseded by server-side push in HTTP/2; more on that later.

For HTTP/2, part of that workflow can be undone. Looking at CSS sprites for instance: they would commonly include images that are needed on the site, but perhaps not on the page currently being browsed. Yet they were sent to the client in the "large" sprite. Since HTTP requests are becoming less expensive, it can become acceptable to separate those images again and not bundle them in one large file.

The same would apply to CSS and JavaScript as well. Instead of having a single monolithic file with all content, it can be split into chunks that are then only loaded on the pages that need them.

There will, as always, be a tradeoff between making an additional HTTP call and bundling all resources into single files -- that's what the benchmarks will have to decide for us.

Is HTTPS/TLS required?

HTTP/2 is based on SPDY. And SPDY required a TLS (https) connection in order to use the SPDY protocol.

However, HTTP/2 doesn't require a secure connection, unlike SPDY. It's possible to use HTTP/2 on a plain, non-secure HTTP connection. Having said that, it looks like major browsers (Firefox & Chrome) may limit their HTTP/2 support to TLS connections only, in order to push for a more secure web (SSL/TLS everywhere).

Firefox will only be implementing HTTP/2 over TLS -- and so far that means for https:// schemed URLs. It does enforce the protocol's >= TLS 1.2 requirement -- if a server negotiates HTTP/2 with a lower TLS version it is treated as a protocol error.

Networking/http2 on Mozilla.org

So even though the spec says HTTP/2 is possible on plain HTTP, chances are we'll only be using it on HTTPS websites.

I believe it's safe to say the web built on HTTP/2 will be a web built on TLS (1.2 and higher). With free certificate authorities backed by eff.org and cheaper certificates all around, I don't think there are many compelling reasons left not to go SSL/HTTPS in the future (but beware of bad SSL/HTTPS implementations).

Compression

HTTP/2 discourages the use of compression in security-sensitive contexts. Just like in HTTP/1, compression is discouraged in situations where security or privacy is of the utmost importance.

Implementations communicating on a secure channel MUST NOT compress content that includes both confidential and attacker-controlled data unless separate compression dictionaries are used for each source of data.
HTTP/2: Use of Compression

HTTP compression (gzip, deflate, ...) has been known to compromise SSL/TLS security in the BREACH and CRIME attacks.

These attacks exist on HTTP/1.0 and HTTP/1.1 infrastructure and will also be possible on HTTP/2.

HTTP/2 enables greater use of compression for both header fields (Section 4.3) and entity bodies. Compression can allow an attacker to recover secret data when it is compressed in the same context as data under attacker control.

10.6 Use of Compression

For any secure site, where the SSL/TLS connection is used to protect user data, compression should be disabled. For sites that use SSL/TLS only to ensure the validity of the data that is being sent, I believe compression will still be an option -- as long as no secret or sensitive information is shown. This is the same today in HTTP/1.1 as well.

HTTP/2 will support the compression of HTTP headers, which is not possible in HTTP/1.1 (where HTTP headers are always sent uncompressed). This is especially useful for sites shipping a lot of cookies (sites ship with 1MB worth of cookies, really?). That content can now be reliably compressed.

The HTTP header compression doesn't use the known gzip/deflate algorithms and is therefore not vulnerable to CRIME-style attacks. It uses a custom compression method, known as HPACK, to compress the HTTP headers.

In all likelihood, HTTP/2 will not change the way we handle the compression of data compared to HTTP/1.1. It does offer a great improvement for the compression of HTTP headers.
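To illustrate the indexing idea behind HPACK, here's a deliberately simplified Python sketch -- not the real algorithm, which adds Huffman coding, table size limits and eviction, and whose static table has 61 fixed entries rather than the three stand-ins below. The point is that a repeated header field (a big cookie, say) shrinks to a single table index on the next request:

```python
# Three stand-in entries; the spec's real static table has 61.
STATIC_TABLE = [(":method", "GET"), (":path", "/"),
                ("accept-encoding", "gzip, deflate")]

class HpackishEncoder:
    """Toy illustration of HPACK's indexing: the first time a header field
    is sent it goes out as a literal and enters a dynamic table; repeats
    are sent as a single table index."""
    def __init__(self):
        self.dynamic = []

    def encode(self, headers):
        out = []
        for field in headers:
            table = STATIC_TABLE + self.dynamic
            if field in table:
                out.append(("index", table.index(field)))
            else:
                out.append(("literal", field))
                self.dynamic.insert(0, field)  # newest entry gets the lowest dynamic index
        return out

enc = HpackishEncoder()
first = enc.encode([(":method", "GET"), ("cookie", "session=abc123")])
second = enc.encode([(":method", "GET"), ("cookie", "session=abc123")])
print(first)   # [('index', 0), ('literal', ('cookie', 'session=abc123'))]
print(second)  # [('index', 0), ('index', 3)]
```

The cookie costs its full literal length exactly once; every follow-up request pays only a tiny index. That's where the big win for cookie-heavy sites comes from.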

Server-side push

In HTTP/1.1, the only way for a browser (or "client") to retrieve data from a server is to first request it from the server.

In HTTP/2, the server can send extra resources along with the response to the first HTTP request, thus avoiding additional network round-trips for follow-up HTTP requests.

This is especially useful for those first requests where the browser would ask for the HTTP resource of the page (say, /somepage.html), only to parse the DOM and figure out it needs to request additional CSS/JavaScript/images/... resources as a result.

How will this feature work with today's code, written in PHP, Ruby or .NET? Hard to say. In the end, it's the HTTP/2 webserver (Nginx, Apache, ...) that needs to push the additional resources to the client.

Will the HTTP/2 webserver determine on its own which extra resources to send? Will there be a way to instruct the HTTP/2 webserver from within your programming code? Hopefully, although the syntax and methods for doing so are still unclear and could be highly dependent on the chosen webserver.

At the moment, I would treat this feature as an "obscure black box" that we will have little or no control over. Here are a few suggestions on how to handle these server-side pushes from within your application code.

1. The application can explicitly initiate server push within its application code. (example in NodeJS)

2. The application can signal to the server the associated resources it wants pushed via an additional HTTP header. (ref.: X-Associated-Content header)

3. The server can automatically learn the associated resources without relying on the application.

Implementing HTTP 2.0 server push

Ilya Grigorik (@igrigorik) has some examples based on NodeJS code that demonstrate this powerful feature (examples where you do have full control over server-side pushes).
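As a thought experiment, option 2 above could look roughly like this on the server side. Both the X-Associated-Content header handling and the function are hypothetical -- no webserver exposes exactly this API today:

```python
def resources_to_push(response_headers, client_cache):
    """Hypothetical server-side push logic: the application lists associated
    resources in an X-Associated-Content style header, and the webserver
    pushes whatever the client hasn't already cached. Purely illustrative."""
    listed = response_headers.get("X-Associated-Content", "")
    wanted = [p.strip() for p in listed.split(",") if p.strip()]
    return [p for p in wanted if p not in client_cache]

print(resources_to_push(
    {"X-Associated-Content": "/css/site.css, /js/app.js"},
    client_cache={"/js/app.js"},
))  # ['/css/site.css']
```

The cache check matters: in the actual protocol the client can decline a PUSH_PROMISE for a resource it already has, so a smart server would avoid promising it in the first place.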

Update: the first HTTP/2 server side test results are in, and it looks very promising!

Request priorities

An equally "obscure" feature in HTTP/2 is the prioritisation of HTTP requests. Each request can be given a priority (0 being the highest priority, much like MX DNS records) and will be processed accordingly.

It'll be up to the browser to specify the priority of each HTTP resource. The HTTP/2 protocol allows the priority to be given, so blocking resources can be given a higher processing priority than non-blocking resources. It's up to the HTTP/2 webserver to process those priority requests accordingly.

As it looks now, this will be a feature of HTTP/2 that we, developers/users, won't have a say in. We will most likely not be able to assign priorities to HTTP resources ourselves. This may be a good thing, as browsers will be far more intelligent in figuring out which resources should get which priority.
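For intuition: later drafts of the spec express priorities as stream dependencies plus weights (1-256), and a server can divide its bandwidth in proportion to those weights. A minimal sketch, ignoring the dependency tree and using made-up weights:

```python
def split_bandwidth(streams, available=1.0):
    """Divide available bandwidth proportionally to stream weights
    (1-256 in the spec). Ignores stream dependencies, which the real
    prioritisation tree also takes into account."""
    total = sum(streams.values())
    return {name: available * weight / total for name, weight in streams.items()}

# Made-up weights -- real browsers choose their own:
shares = split_bandwidth({"page.html": 128, "app.js": 64, "hero.jpg": 32})
print(shares["page.html"])  # page.html gets 128/224 of the bandwidth, ~0.57
```

However the browser picks the weights, the effect is the same: blocking resources like the HTML and critical scripts get served ahead of decorative images over the single shared connection.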

HTTP methods and status codes

All HTTP status codes defined for HTTP/1.1 remain in HTTP/2. We'll still have the HTTP 200 OK responses, the 301 permanent redirects and the 404 Not Founds.

The same goes for all methods defined in HTTP/1.1: GET, POST, PATCH, PUT, DELETE, ... all these methods are still here.

Since HTTP/2 builds further upon HTTP/1.1, all status codes and methods remain the same.

HTTP/2 and Varnish

I'm a huge fan of Varnish, the HTTP accelerator/cache/load balancer/router. Varnish has historically only supported HTTP/1.1, and plain HTTP only. It never implemented SSL/TLS.

For sites to use Varnish with HTTPS, they would use Pound / HAProxy / Nginx as an "SSL offloader" in front of their Varnish configuration. That service would handle all the SSL/TLS encryption and pass the requests to Varnish in plain HTTP for caching.

However, it looks like support for HTTP/2 may be coming to Varnish after all. It's no secret that Poul-Henning Kamp, author of Varnish, doesn't like HTTP/2, or at least -- the first drafts -- but at the same time he says "if that's what the people want, I'll do it".

At the end of the day, a HTTP request or a HTTP response is just some metadata and an optional chunk of bytes as body, and if it already takes 700 pages to standardise that, and HTTP/2.0 will add another 100 pages to it, we're clearly doing something wrong.
Poul-Henning Kamp

And in a more recent (May 2014) mailing list post, Poul-Henning Kamp confirms his opinion again.

Isn't publishing HTTP/2.0 as a "place-holder" is just a waste of everybodys time, and a needless code churn, leading to increased risk of security exposures and failure for no significant gains ?

[...]

Please admit defeat, and Do The Right Thing.

Poul-Henning Kamp

And even more recently (January 2015), the HTTP/2 rant got an update.

HTTP/2.0 is not a technical masterpiece. It has layering violations, inconsistencies, needless complexity, bad compromises, misses a lot of ripe opportunities, etc.
Poul-Henning Kamp

Maybe we'll see HTTP/2 support in Varnish in the 4.x releases, maybe we'll have to wait for the 5.x release. As far as I can tell, there is no "official" statement from the Varnish community yet.

Update 5/3/2015: Varnish will support HTTP/2 with a minimal TLS proxy.

Until that time (because as of yet, it's not clear when Varnish will implement HTTP/2), we'll be running SSL/TLS proxies.

For me personally, I believe (at least in the short-term) our server setups will look like this.

port :80   --> Varnish HTTP accelerator
port :443  --> Nginx SSL HTTP/2 + SPDY  offloading, proxy all to Varnish on :80
port :8080 --> The actual webserver (Nginx/Apache/...) parsing the requests

If HTTP/2 does catch on for plain HTTP connections and not only for TLS sessions, and Varnish turns out not to support HTTP/2 at all, the setup would be slightly different.

port :80   --> Nginx running HTTP/1.1 and HTTP/2, proxy all to Varnish on :8080
port :443  --> Nginx SSL offloading, proxy all to Varnish on :8080

port :8080 --> Varnish serving the cached requests, proxy all not in the cache to :8081
port :8081 --> The actual webserver (Nginx/Apache/...) parsing the requests
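As a sketch of the :443 line in the first diagram, an nginx server block along these lines could do the SSL offloading. The domain and certificate paths are placeholders, and `spdy` stands in until nginx ships its HTTP/2 module:

```nginx
server {
    listen 443 ssl spdy;              # 'spdy' for now; swap for http2 once available
    server_name example.com;          # placeholder domain

    ssl_certificate     /etc/nginx/ssl/example.com.crt;  # placeholder paths
    ssl_certificate_key /etc/nginx/ssl/example.com.key;
    ssl_protocols       TLSv1.2;      # HTTP/2 requires >= TLS 1.2

    location / {
        proxy_pass http://127.0.0.1:80;                  # hand off to Varnish, plain HTTP
        proxy_set_header Host              $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For   $remote_addr;
    }
}
```

The X-Forwarded-Proto header is what lets Varnish and the backend distinguish offloaded HTTPS traffic from genuine plain-HTTP requests on :80.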

Time will tell. Whether the backend serving the actual PHP/Ruby/Node/... requests will be Nginx or Apache will depend on the sysadmin and their familiarity with each webserver.

The rise of alternative webservers

HTTP/2 may not be 100% new (after all, it's based on the HTTP/1.1 spec), but it does change a few important paradigms in how we think about webservers and sites. That means the webservers we're using today may not be the best fit for the HTTP/2 world.

Websites are designed and architected with the best user-experience in mind. That means they're optimized for the browsers rendering them, not the servers serving them. We can easily swap out the webserver (they're in our control), but we can't change the browsers clients are using.

So in the HTTP/2 era, we may see H2O as a new rising star, next to the proven webservers like Nginx and Apache. And I don't think H2O will be alone. It already shows impressive improvements over Nginx, and the HTTP/2 race has only just begun.

When will we see HTTP/2?

This is very hard to say. The timeline for HTTP/2 has a major milestone set for February 2015 for the RFC. The RFC is the moment when the IETF working group has finished the proposal and it has been reviewed.

So at the earliest, HTTP/2 will be "finalised" in February 2015. We can expect the final implementations in major webservers soon thereafter (especially since Nginx already fully supports SPDY and HTTP/2 is based partly on SPDY). Most modern browsers already support SPDY, making the change to HTTP/2 less of a hurdle (but by no means an easy task).

Both Firefox and Chrome already support HTTP/2, albeit in a limited form in Firefox, and it needs to be enabled explicitly in Chrome.

The HTTP/2 spec won't change much compared to the version currently published. Anyone could already implement the current spec and update their implementation should any changes still be approved.

2015 will be the year we see HTTP/2 reach general availability.

References

In no particular order, but all worthy of your time and attention.

If you have any more feedback, please let me know in the comments below. I'd love to hear what you think --- even if you disagree with me entirely!


Hi! My name is Mattias Geniar. I'm a Support Manager at Nucleus Hosting in Belgium, a general web geek, public speaker and podcaster. Currently working on DNS Spy. Follow me on Twitter as @mattiasgeniar.


Comments

Jeremy Monday, January 19, 2015 at 18:45

It’s “Poul-Henning Kamp”, not “Paul-Henning Kamp”, just fyi.



Glen A. Saturday, January 31, 2015 at 20:48

Nice post, thanks Mattias.

I have two questions:

1. If compression over SSL is known to be insecure, then why isn’t a new compression algorithm being developed for HTTP/2? I don’t like the idea that I can’t compress resources without sacrificing security. This is not good.

2. How can a server know to push an associated resource if it might already be cached at the client? I can think of two ways that server-side push could work:

a. [Server-side]: The server keeps track of the latest version of each file that it serves (with file modification timestamps, for example). If an associated resource has been updated since it was last served, the updated version is pushed to the client. This wouldn’t respect the usual cache headers though, so it may not be ideal. It would also be more difficult to implement in a distributed system.

b. [Client-side]: Once a resource has been downloaded the first time, the client stores information about its associated resources. For example, index.html is downloaded along with a.js, b.js, and c.js. When index.html is again requested, the client can include headers that indicate which associated resources it will accept directly (pushed). Something like: Accept-Associated: /js/a.js, /js/b.js

These things have probably already been discussed a million times, but I don’t monitor the relevant groups, so excuse me if I’m stating the obvious. :-)



    Mattias Geniar Sunday, February 1, 2015 at 14:39

    Hi Glen,

    Thanks for replying!

    In regards to compression: you are right, this is annoying. The same problem exists in HTTP/1.1 and hasn’t really been solved.

    As for the HTTP PUSH support, it’s very well explained here.

    Once the client receives a PUSH_PROMISE frame, it has the option to decline the stream if it wants to (e.g., the resource is already in cache), which is an important improvement over HTTP 1.x. Resource inlining, which is a popular “optimization” for HTTP 1.x, is equivalent to a “forced push”: the client cannot cancel it, and it also cannot individually cache the inlined resource.

    In other words: the server indicates it will send a resource named “X.something” to the client. Before the client actually receives the content, it has the ability to block the request – because it may already be in the cache.



      Glen A. Sunday, February 1, 2015 at 17:09

      Ah, that works as well. :-)

      BTW, do you know if there has been any discussion regarding a new compression algorithm, or a solution to the current security issues? If you have any links I would like to add my “vote”/+1.



Johnny Robeson Wednesday, May 6, 2015 at 07:28

I don’t know if you plan on continuing to edit this article, but if so, then you should probably strikethrough this point: “This would also make it easier to implement protocol-relative URLs.” because the linked article suggests not to use protocol relative URLs anymore, and no major browsers support http/2 without TLS.



    Mattias Geniar Wednesday, May 6, 2015 at 08:07

    Hi Johnny,

    That’s indeed the plan.

    Thanks for the feedback, I’ve updated the article!



Billy Hoffman Tuesday, May 12, 2015 at 19:34

“HTTP/2 actively discourages the use of compression for secure websites. HTTP compression (gzip, deflate, …) has been known to compromise the SSL/TLS security in the “breach” and “CRIME” attacks.”

This is a patently wrong statement in an otherwise good article.

There are no known security vulnerabilities with HTTP/1.x’s style of compression. That’s because HTTP/1.x only supports compressing response bodies. During SPDY development, the same compression algorithms used for compressing HTTP responses (gzip/deflate) were applied to compressing request and response headers. This is what led to the CRIME vulnerability. The solution was to still use compression, but with a different compression scheme, HPACK, which, glossing over a ton of technical details, allows compression while avoiding CRIME because “separate compression dictionaries are used for each source of data.” HTTP/2 uses HPACK.

Use of TLS compression is not recommended, but that has always been a performance best practice, since TLS compression is not context aware.

For the love of god, keep using compression with your websites, whether HTTP/1.x, TLS + HTTP/1.x, or HTTP/2.



    Mattias Geniar Tuesday, May 12, 2015 at 22:25

    You are indeed right, I’ve corrected the original post.

    Thanks for letting me know!



Frakturfreund Thursday, May 14, 2015 at 23:01

Google’s Go programming language has a built-in HTTP(S) server, and they plan to extend it to speak HTTP/2 without breaking existing code. There’s a cool demo site already online, and the code is on GitHub.



Nehorap Thursday, February 11, 2016 at 16:24

Hi,
For some reason Chrome (v48) doesn’t support http2. I’m trying to run some demos about the difference between http 1.1 vs http 2 via the following sites: https://http2.golang.org/ and https://http2.akamai.com/demo, but got an error.
I thought that Chrome supported it automatically……



    Mattias Geniar Sunday, February 14, 2016 at 16:32

    It is indeed automatically supported. What error are you getting? Maybe that can shed some light on the situation.



Olle Wednesday, March 2, 2016 at 15:28

Thanks for a great blog! :-)
With the introduction of HTTP/2.0, what do you think about the future of web server? Will traditional servers like nginx and Apache be replaced with new alternatives like Caddy or ShimmerCat? Is it worth looking into the new stuff or is it better to wait until they are more robustly tested?


