Poul-Henning Kamp (PHK) gave a talk at FOSDEM titled “Ntimed an NTPD replacement”.
There was no intro for the talk. None whatsoever. Just the title. And as expected, a completely full room. The man is famous.
Here are some of my notes.
- Ntimed consists of a client (consumer), a slave (relaying time) and a master (primary service)
- Ntimed-master: not for normal pedestrians. Talks the NTP/PTP protocols. Used by those who run “root” servers for NTP time sync.
  - Uses Python for high-level science work
  - Real-time protocol bits are written in C for security & performance
  - First version expected late 2015
- Ntimed-slave: would replace the ntpd servers you run now (~2-3 per datacenter)
  - Less than 20,000 lines of code
  - Around 30% done
  - Also uses the Ntimed-client code
  - Has a CLI for monitoring/debugging
  - Uses one thread per interface
- Ntimed-client: sets the clock on the system
  - Speaks NTP; PTP support follows at a later stage
  - Less than 10,000 lines of code
  - Is in a pre-release state now
As usual with PHK, one of the main focuses is security.
- There is no sandboxing in ntimed-client, as the attack surface is very small (NTP uses a fixed set of UDP packets: no buffer overflows, low DoS surface).
- Sandboxing scales very badly with portability (supporting multiple OSes)
- Ntimed-client would be easier/safer if kernel support for time-related calls were better
- It uses a statistical approach to time synchronisation. Because really, time sync isn’t easy …
- Ntimed-client wants to query all NTP servers found in a DNS response (like pool.ntp.org, e.g. 10 IP addresses) and use their median (sort of; there is more math involved to weigh the statistical evidence) as the reference for the correct time
- Clock keeping isn’t easy, as most computer hardware puts very cheap timing components right next to heat generators, and that heat influences how fast the clock runs.
- The kernel interface is really simple: it can get() the time, step() the time to set the correct time, steer() the time (i.e. adjust the rate at which time passes) and sleep().
- Leap smears (stretching a leap second over a day to limit the impact, like Google does) would be implemented in the ntimed-slave service, never in the client.
- Air-traffic control systems are not leap-second aware. The next leap second will happen during the working day in Tokyo (around 9 AM local time). That’ll be a fun day for traveling …
- PHK suggested a DNS-based solution for distributing the bulletins that announce leap seconds. The offset can be packed into a DNS A record, so the time offset is hidden in the IP address. Confusing? Yes. Creative? Yes.
- A glibc pun of course couldn’t be left out, since fetching that bulletin would then be a gethostbyname() call on Linux. :-)
- “Green computing”: Ntimed aims to be as green as possible, since it may end up running on millions of servers. At that scale every CPU cycle and every watt saved adds up, so the impact cannot be ignored.
- Ntimed code is available on GitHub
- The current code is “very defensively written”, Varnish style
- FlexeLint shoutout: “it’s probably the tool that improved my C coding the most over the last 10 years”, dixit PHK
- Why Ntimed?
  - Short answer: Heartbleed
  - Long answer: critical FOSS projects are understaffed or overworked
- Why not fix NTPD? PHK tried, but NTPD has more than 360,000 lines of code.
- “NTPD is doomed”: refactoring it would not be time- or resource-efficient, hence Ntimed was started from scratch, with a good security architecture from the ground up.
- Is NTPD safe? Right now: yes, most likely. Long term: no.
- NTP is probably the oldest code still running on the internet
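The “fixed set of UDP packets” point from the notes is easy to see in code. The sketch below uses Python's standard struct module to build a minimal client request and read the transmit timestamp from a reply. It assumes the bare 48-byte NTPv4 header from RFC 5905 with no extension fields or MAC; it is an illustration only, not Ntimed's C parser, and the helper names are hypothetical.

```python
import struct

# 48-byte NTPv4 header (no extension fields, no MAC):
# LI/VN/Mode, stratum, poll (signed), precision (signed),
# then 11 big-endian 32-bit words: root delay, root dispersion,
# reference ID, and four 64-bit timestamps split into sec/frac.
NTP_PACKET = struct.Struct("!BBbb11I")
NTP_TO_UNIX = 2208988800  # seconds from 1900-01-01 to 1970-01-01

def client_request():
    """Minimal client request: LI=0, VN=4, Mode=3 (client), rest zero."""
    first = (0 << 6) | (4 << 3) | 3
    return NTP_PACKET.pack(first, 0, 0, 0, *([0] * 11))

def transmit_time(reply):
    """Return the server's transmit timestamp as Unix seconds.

    This toy parser rejects anything that is not exactly the bare
    header, which is what makes defensive parsing so simple.
    """
    if len(reply) != NTP_PACKET.size:
        raise ValueError("unexpected packet length")
    fields = NTP_PACKET.unpack(reply)
    tx_sec, tx_frac = fields[-2], fields[-1]
    return tx_sec - NTP_TO_UNIX + tx_frac / 2**32

# Synthetic server reply (Mode=4, stratum 2) whose transmit
# timestamp is 1000.5 seconds after the Unix epoch.
reply = NTP_PACKET.pack(0x24, 2, 6, -20, *([0] * 9), NTP_TO_UNIX + 1000, 2**31)
print(transmit_time(reply))  # 1000.5
```

Every field sits at a fixed offset, so there is no length-driven copying for an attacker to abuse; real NTP does allow optional extension fields, which this sketch deliberately ignores.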
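The median-based server selection mentioned in the notes can be sketched in a few lines. This is a simplified illustration, not Ntimed's actual algorithm (which, per the talk, adds more statistics on top): it assumes we already have one clock-offset measurement per server address returned by DNS, in seconds. The function name median_offset and the sample values are hypothetical.

```python
from statistics import median

def median_offset(offsets):
    """Return the median of per-server clock offsets (seconds).

    Hypothetical helper: each offset is (server time - local time)
    measured against one address from a DNS lookup. The median is
    unaffected by a minority of wildly wrong servers.
    """
    if not offsets:
        raise ValueError("need at least one offset")
    return median(offsets)

# Ten hypothetical offsets; one server is off by an hour (a "falseticker").
samples = [0.012, 0.010, 0.011, 0.013, 0.009,
           0.011, 0.012, 3600.0, 0.010, 0.011]
print(median_offset(samples))  # 0.011
```

The mean of the same samples would be dragged to roughly 360 seconds by the single bad server, which is why a median is the more robust starting point.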
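To make the four kernel operations from the notes (get, step, steer, sleep) concrete, here is a toy simulated clock exposing the same verbs. It is a hypothetical model for illustration, not Ntimed's code or any real kernel API. It also models the heat-induced hardware drift mentioned above as a fixed error in parts per million (ppm).

```python
class SimulatedClock:
    """Toy model of a get/step/steer/sleep clock interface (hypothetical).

    The local clock advances at (1 + (drift_ppm + steer_ppm) * 1e-6)
    times the rate of true time.
    """

    def __init__(self, offset=0.0, drift_ppm=0.0):
        self.true_time = 0.0        # "real" time, seconds
        self.local = offset         # local clock reading, seconds
        self.drift_ppm = drift_ppm  # hardware rate error, e.g. from heat
        self.steer_ppm = 0.0        # rate correction set via steer()

    def get(self):
        """Read the local clock."""
        return self.local

    def step(self, delta):
        """Jump the local clock by delta seconds (for large errors)."""
        self.local += delta

    def steer(self, ppm):
        """Set a rate correction so the clock slews gradually."""
        self.steer_ppm = ppm

    def sleep(self, seconds):
        """Let `seconds` of true time pass."""
        rate = 1.0 + (self.drift_ppm + self.steer_ppm) * 1e-6
        self.true_time += seconds
        self.local += seconds * rate

clock = SimulatedClock(offset=2.5, drift_ppm=50.0)
clock.step(-2.5)     # remove the large initial error in one jump
clock.steer(-50.0)   # cancel the 50 ppm hardware drift
clock.sleep(3600.0)
print(clock.get() - clock.true_time)  # 0.0
```

The split mirrors the usual design choice: step() for big, one-off corrections, steer() for small ongoing ones, so that time does not jump under running applications.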
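The leap-smear idea from the notes can be sketched as a function of time. This assumes a simple linear smear of one positive leap second over a 24-hour window; Google's exact smear curve and window may differ, and the names here are hypothetical.

```python
SMEAR_WINDOW = 86400.0  # assumption: one leap second spread over 24 hours

def smear_offset(t, start, window=SMEAR_WINDOW):
    """Part of a positive leap second absorbed at time t (seconds).

    Hypothetical linear smear: nothing is absorbed before `start`,
    the full second is absorbed after `start + window`, and the
    amount grows linearly in between. A smearing server would let
    its clock lag a plain count of elapsed seconds by this amount,
    so its clients never see a 23:59:60.
    """
    if t <= start:
        return 0.0
    if t >= start + window:
        return 1.0
    return (t - start) / window

start = 0.0
for t in (0.0, 21600.0, 43200.0, 86400.0):
    print(t, smear_offset(t, start))
```

Over a 24-hour window each smeared second is longer by 1/86400, about 11.6 ppm, a rate change small enough for clients that only steer() their clocks to follow.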
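PHK's trick of hiding leap-second information in a DNS A record can be illustrated with a made-up encoding. The talk notes do not specify a format, so everything here is an assumption: the first octet is an arbitrary marker byte (127), the next two octets hold the Modified Julian Date from which the new offset applies, and the last octet holds TAI-UTC in seconds. The bulletin hostname passed to lookup_leap_info is likewise hypothetical.

```python
import ipaddress
import socket
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # Modified Julian Date day 0
MARKER = 127                    # hypothetical "this is leap data" byte

def encode_leap_info(leap_date, tai_utc):
    """Pack a leap-second date and TAI-UTC offset into an IPv4 address."""
    mjd = (leap_date - MJD_EPOCH).days
    packed = bytes([MARKER]) + mjd.to_bytes(2, "big") + bytes([tai_utc])
    return str(ipaddress.IPv4Address(packed))

def decode_leap_info(address):
    """Inverse of encode_leap_info: return (leap_date, tai_utc)."""
    b = ipaddress.IPv4Address(address).packed
    if b[0] != MARKER:
        raise ValueError("not a leap-second record")
    mjd = int.from_bytes(b[1:3], "big")
    return MJD_EPOCH + timedelta(days=mjd), b[3]

def lookup_leap_info(hostname):
    """Resolve a (hypothetical) bulletin hostname and decode its A record."""
    return decode_leap_info(socket.gethostbyname(hostname))

addr = encode_leap_info(date(2015, 7, 1), 36)
print(addr)                    # 127.223.116.36
print(decode_leap_info(addr))  # (datetime.date(2015, 7, 1), 36)
```

The appeal of the scheme is that every host already has a resolver, and DNS caching distributes the bulletin for free.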
Despite my rants on this topic, Ntimed looks promising. It’s not done yet, but it’s something to keep an eye on, before the ticking time bomb of NTPD bites us all.