At FOSDEM there was a talk about live migration of containers using CRIU (Checkpoint/Restore In Userspace).
CRIU comes from the OpenVZ container space, which is backed by Parallels (makers of Virtuozzo, Plesk, …).
Checkpoint/Restore In Userspace, or CRIU (pronounced kree-oo, IPA: /krɪʊ/, Russian: криу), is a software tool for Linux operating system. Using this tool, you can freeze a running application (or part of it) and checkpoint it to a hard drive as a collection of files. You can then use the files to restore and run the application from the point it was frozen at. The distinctive feature of the CRIU project is that it is mainly implemented in user space.
If you’re interested in migrating containers “live” (well, sort-of live, with a small hiccup during the freeze), keep an eye on this project.
CRIU is already used in the latest version of OpenVZ.
How it works seems dangerous or magical (whichever term you prefer): it injects code into the running containers so it can dump their state. The wiki has an example of how CRIU works, using a simple bash loop inside a container.
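To make the freeze-and-restore cycle a bit more concrete, here is a minimal sketch of driving the criu command line from Python. It assumes criu is installed, that you run it as root, and that the target process was started from a terminal (which is what --shell-job is for). The helper names and the /tmp/ckpt directory are made up for illustration and are not part of CRIU.

```python
import subprocess
from pathlib import Path
from typing import List


def dump_cmd(pid: int, images_dir: str) -> List[str]:
    """Build a `criu dump` command for the process tree rooted at pid."""
    return ["criu", "dump", "--tree", str(pid),
            "--images-dir", images_dir, "--shell-job"]


def restore_cmd(images_dir: str) -> List[str]:
    """Build the matching `criu restore` command for the same image directory."""
    return ["criu", "restore", "--images-dir", images_dir, "--shell-job"]


def checkpoint(pid: int, images_dir: str) -> None:
    """Checkpoint the process tree to disk (hypothetical helper, needs root)."""
    Path(images_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(dump_cmd(pid, images_dir), check=True)


def restore(images_dir: str) -> None:
    """Restore the process tree from a previous checkpoint (needs root)."""
    subprocess.run(restore_cmd(images_dir), check=True)
```

Calling checkpoint(1234, "/tmp/ckpt") and later restore("/tmp/ckpt") would roughly mirror the wiki's bash loop example. By default the dumped process is stopped once the dump completes, so the restore is what brings it back.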
There are some caveats that apply after a checkpoint/restore that you should be aware of, and there are a variety of resources that cannot be checkpointed inside a container. To resume TCP connections, at least kernel 3.5 is needed for TCP_REPAIR support.
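A quick way to see whether a host meets that kernel requirement is to compare its release string against 3.5. The sketch below is a rough check only: the function name is made up, and a version number says nothing about how a particular distribution kernel was configured or patched, so treat a positive result as necessary rather than sufficient.

```python
import platform
import re
from typing import Tuple


def kernel_at_least(release: str, minimum: Tuple[int, int] = (3, 5)) -> bool:
    """Return True if a release string like '3.5.0-17-generic' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    version = (int(match.group(1)), int(match.group(2)))
    return version >= minimum


if __name__ == "__main__":
    release = platform.release()
    if kernel_at_least(release):
        print(release, "is new enough for TCP_REPAIR")
    else:
        print(release, "is too old for TCP_REPAIR; TCP connections cannot be resumed")
```

Even on a new enough kernel, criu only checkpoints established TCP connections when asked to, via its --tcp-established option on both dump and restore.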
CRIU is not yet integrated into Docker, but that should be only a matter of time.
Are there alternative solutions to the whole live migration of containers issue?