Standardising the “URL”

Mattias Geniar, February 01, 2017

You’d think that the concept of “a URL” would be pretty clearly defined by now, with the internet being what it is today. Well, turns out – it isn’t.

But Daniel Stenberg, of curl fame, is trying to fix that.

This document is an attempt to describe where and how RFC 3986 (86), RFC 3987 (87) and the WHATWG URL Specification (TWUS) differ. This might be useful input when trying to interop with URLs on the modern Internet.

This document focuses on network-using URL schemes (http, https, ftp, etc) as well as ‘file’.

URL Interop
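To make one of these differences concrete, here is a small sketch in Python. It uses the standard library's urllib.parse.urlsplit as a rough stand-in for a parser in the RFC 3986 tradition. It is not a strict RFC 3986 implementation, so treat the results as illustrative only. The browser behaviour mentioned in the comments is what the WHATWG URL Standard specifies for "special" schemes such as http; the snippet does not run a WHATWG parser itself.

```python
from urllib.parse import urlsplit

# Variants of the "://" divider. Per the WHATWG URL Standard, a browser
# parses every one of these with host "www.example.com", because http is a
# "special" scheme. urlsplit only extracts a netloc (host part) when the
# text after "http:" starts with "//".
variants = [
    "http://www.example.com/",    # the usual form
    "http:/www.example.com/",     # a single slash
    "http:\\\\www.example.com/",  # two backslashes instead of slashes
    "http:www.example.com/",      # no slashes at all
]

for url in variants:
    parts = urlsplit(url)
    # An empty netloc means urlsplit left the host text inside the path.
    print(f"{url!r:32} netloc={parts.netloc!r:20} path={parts.path!r}")
```

Only the first variant yields a non-empty netloc here; in the other three, urlsplit leaves the host text in the path. A program that validates a URL with one kind of parser and fetches it with another can therefore disagree about which host it is talking to.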

What really strikes me as odd is the interoperability comparison for each component of the URL:

<table>
  <tr>
    <th>Component</th>
    <th>Value</th>
    <th>Known interop issues exist</th>
  </tr>
  <tr>
    <td>scheme</td>
    <td>http</td>
    <td>no</td>
  </tr>
  <tr>
    <td>divider</td>
    <td>://</td>
    <td>YES</td>
  </tr>
  <tr>
    <td>userinfo</td>
    <td>user:password</td>
    <td>YES</td>
  </tr>
  <tr>
    <td>hostname</td>
    <td>www.example.com</td>
    <td>YES</td>
  </tr>
  <tr>
    <td>port number</td>
    <td>80</td>
    <td>YES</td>
  </tr>
  <tr>
    <td>path</td>
    <td>index.html</td>
    <td>YES</td>
  </tr>
  <tr>
    <td>fragment</td>
    <td>top</td>
    <td>no</td>
  </tr>
</table>
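As a minimal sketch, assuming Python's urllib.parse as the parser, this splits a sample URL into the same components the table lists. The exact URL string is an assumption for illustration, stitched together from the table's example values.

```python
from urllib.parse import urlsplit

# Hypothetical sample URL built from the example values in the table.
url = "http://user:password@www.example.com:80/index.html#top"
parts = urlsplit(url)

components = {
    "scheme": parts.scheme,                              # 'http'
    "userinfo": f"{parts.username}:{parts.password}",    # 'user:password'
    "hostname": parts.hostname,                          # 'www.example.com'
    "port number": parts.port,                           # parsed to the int 80
    "path": parts.path,                                  # '/index.html', with leading slash
    "fragment": parts.fragment,                          # 'top'
}

for name, value in components.items():
    print(f"{name:12} {value!r}")
```

Notice that the "divider" never shows up as a field of its own: urlsplit consumes the "//" while locating the host, and it keeps the leading "/" on the path, which the table leaves out.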

It’s amazing a “URL” even works.

I’ve said it before and I’ll say it again: the internet is held together with duct tape. I hope this proposal gets somewhere; it would make parsing URLs a whole lot easier and more reliable.

Source: URL Interop


