An introduction to Docker with Nils De Moor

WhisperX large-v3 + pyannote diarization, lightly edited.

Mattias Geniar

In this episode of SysCast, I talk to Nils De Moor about Docker and using containers in your infrastructure as both a sysadmin and a developer. This is SysCast, episode 2. Hi there, and welcome to the second episode of SysCast, the podcast where we talk Linux, sysadmin stuff, and web development in general.

Today, I am very happy to have Nils De Moor as my guest. He will be talking to us about Docker and everything related to Docker. Hi, Nils.

How are you?

Nils De Moor

Mattias Geniar

It is my pleasure. Thanks for coming here. I first got to know you through the Belgian Docker User Group.

I think that’s probably what you’re most known for. Could you introduce yourself? What do you do in day-to-day business?

How did you get started with the Docker User Group?

Nils De Moor

Sure. I’m Nils, and I think five years ago I co-founded a company called WooRank. We’re based in Brussels, and in a nutshell we built a software-as-a-service tool for marketing and web agencies, so online marketers basically. With our tool we try to offer a system where they can easily monitor their websites, or just their brands in general, and their online presence. How are these brands moving in Google, and on what keywords? How is social media looking at them? Are all the basics correct, like is my HTML correctly built up for Google, for instance, so that Google’s bots can easily access it? So that’s the kind of tool we built and sell for monthly and yearly subscriptions.

I’m the CTO and, since the beginning, the technical co-founder. I’ve seen the company grow from just three people, the three of us who started the company, to about 25 people now. So still a small company, but already significant enough, and we’re kind of tech heavy: half of it is tech profiles. So it was already significant enough, and I’m speaking of a couple of years ago, to need the right tools at hand to make workflows easy for everyone, not only technical people.

So in terms of developing applications locally on a machine, pushing them to a test and a production environment, onboarding someone new, making sure that within a couple of hours this person can get started building our applications: these are things that are really key to me. It’s my job, obviously, and it’s what I’ve been trying to find the right tools for. Docker was one of them when it popped up, I guess, three years ago and got a lot of traction from the beginning. We were, I guess, one of the early users in production, and we still are, and we never regretted this choice.

Mattias Geniar

Okay, I’m glad to hear it. You’ve been running Docker for quite a while. I think that’s perhaps one of the reasons you started the Belgian Docker User Group, of which you’re now the, what’s the name? Chairman?

Nils De Moor

Mattias Geniar

Chairman sounds too CEO-like. So that group has grown tremendously in the last three years. I think last time you mentioned over a thousand members already?

Yeah, indeed. That’s crazy. That’s absolutely crazy.

So I invited you here to talk about Docker. Could you describe Docker in the simplest terms you can find? What is Docker?

Nils De Moor

Yeah, sure. I’m going to preface my answer with the fact that Docker is not a crazy new idea. Basically, what Docker has done is take primitives, Linux kernel primitives that already existed for years, and build a nicer tool around them to make them accessible and usable for people who are not experts in the Linux kernel, Linux commands and stuff like that. So they basically built a wrapper around that, which makes it easy for developers, operations, full technical teams, to work with applications, databases, workers, processes: to run them and to push them into the many environments that a company has.

So what does Docker do? Basically it will isolate, or will allow you to isolate, applications from each other. So if I want to run a Node application next to a Java and a Python application, and let’s throw a Postgres database into the mix, I can easily run them all on one operating system. It can be on a virtual machine, but within one operating system. So I don’t have to allocate a full operating system to one application.

But I can run those applications within one operating system. And they won’t interfere with each other. So they’re nicely isolated from each other.

They don’t know about each other. And they won’t get in the way of each other. That’s one important part.

So it isolates things. On the other hand, it also makes things really easy to automate. If I wear my operations hat and, say, there’s a new Postgres version I want to push out, I can easily, in an automatable way, update the container that I’m running.

So this Postgres container that I’m running on server X, I can very easily automate it to run the new version, again without interfering with the rest. On top of that, it also makes things versionable, because I can write down the configuration that needs to be done. And if tomorrow I make a change that’s breaking, it allows me to easily roll back to my previous version, of which I’m sure: okay, this was a thing that worked. So those are the key things, I guess. Simplicity is probably the best word.

Docker makes it really simple for everyone in a technical team, and among technical teams, to understand. It’s versionable, and it isolates everything nicely from each other.

Mattias Geniar

Okay, that makes a lot of sense. If I look at Docker, as you mentioned, it’s not something new, but it’s a more convenient way to use existing tools. In a way, it’s been improving on what tools like OpenVZ have done for years, running multiple containers on a single machine, but taking it to the next level with a lot more stability and more of an ease-of-mind approach. You mentioned that it’s easy to use. In a way, I agree. At the same time, it’s very hard to get into, especially if you’re a developer with a bit of knowledge behind the terminal. Getting started with Docker, I find, is a bit of a burden.

You have to overcome some kind of mental barrier where you no longer have a single server that runs everything, but several smaller isolated machines, containers in this case. Microservices is perhaps the wrong term if you’re thinking of databases. I think that’s one of the hardest things when getting started with Docker: getting your head around its concepts.

If we look at that, let’s take a classic LAMP stack. Let’s say you’re running a WordPress website.

How would you go about doing that with a Docker setup in mind?

Nils De Moor

Okay. So, speaking as someone who’s been working with Docker for three years, I’ve seen a lot of improvements, and it’s hard to put myself in a beginner’s shoes. I can’t imagine anymore how I felt about Docker.

I’ve seen the tool, let’s call it that, grow over the years, and I understand what you’re saying.

Especially since it’s grown so quickly and so many bells and whistles have been added on top, it might indeed look a bit daunting to get started with. But the good thing is, I must say, that the documentation they provide is really complete.

It takes you by the hand and guides you along the way of setting up your first container and then building up from there. One of the nice improvements of the last year, maybe a bit longer, is that Docker kept adding new tools to its ecosystem. One of those tools is Docker Machine, which is a kind of provisioning tool, a light configuration management tool, that makes it really easy to prepare a host.

It can be a server, it can be your laptop, it can be a VM, it doesn’t matter. Docker Machine makes sure that that host can run Docker, so that Docker is installed, without you even having to know too much about that host. Which Linux distribution? You don’t care, the tool will figure that out for itself. Again, my local machine or something that’s installed in the US or in Australia, it doesn’t matter.

With Docker Machine, you just run one command against that host, or even a group of hosts. You can say: okay, here are a thousand servers in my AWS account.

Make sure they all run Docker. So that’s one tool. Docker Machine allows you to do that very simply.

So Docker Machine was, let’s say, more on the ops side, wearing the ops hat. Now let’s switch hats and flip over to the development side of things.

So you mentioned a WordPress application. There will be a PHP part, an application that runs PHP code. There will be a MySQL database.

There will be an Apache in front of that. Maybe something like a log shipping container, because I want my logs to be centralized somewhere. Maybe even a cache, let’s throw in a cache, a Memcache. So let’s say this application stack has four to five different components. Well, there’s a second tool that is natively provided with Docker these days, so when you install Docker you get it for free, and it’s called Docker Compose. What Docker Compose allows you to do is define your full application, very simply, in a YAML file. You say: okay, I have my application container, that’s my PHP code.

It needs to run this PHP container. Then underneath I define MySQL, and within the part for my application container I say: link on the network to my MySQL container. So a very basic, light kind of service discovery already happens in that part. You don’t have to configure too much within your containers; you can just define in the Docker Compose file that container A connects to container B, and that’s it. And so on.

Your log shipping container and all the rest go in that same file. In the end, you end up with a file that defines your four, five, whatever applications. Then it’s just a matter of running docker-compose up on your command line. Docker Compose will then see: okay, I have to run all these applications, what are the dependencies, which one comes first? In the end, it will run all the containers that you’ve defined, make sure they connect to each other on the network, maybe add in a volume here and there, and within a couple of seconds you have your full application stack running on your local machine from one command.
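
The dependency ordering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Docker Compose is implemented: the service names, the `links` field and the `example/log-shipper` image are assumptions made up for this sketch, and the stack is a plain dictionary rather than a real Compose file.

```python
from graphlib import TopologicalSorter

# Hypothetical WordPress-style stack; service and image names are illustrative.
# Each entry lists the services it links to, i.e. the ones that must start first.
STACK = {
    "app": {"image": "php", "links": ["db", "cache"]},
    "web": {"image": "httpd", "links": ["app"]},
    "db": {"image": "mysql", "links": []},
    "cache": {"image": "memcached", "links": []},
    "logs": {"image": "example/log-shipper", "links": []},
}


def startup_order(stack):
    """Return service names so that every service comes after the ones it links to."""
    graph = {name: set(spec["links"]) for name, spec in stack.items()}
    return list(TopologicalSorter(graph).static_order())


order = startup_order(STACK)
print(order)
```

The database and cache always come before the PHP application, and the application before the web server in front of it; services without links can start at any point.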

Mattias Geniar

Nils De Moor

No, sorry. I probably didn’t make myself clear there. So Docker Compose will purely define an application stack.

Now, the thing is, and here we dig even further into the ecosystem: in its most basic form, it will just run what I just described on my local machine. All the containers run on my local machine, they connect to each other, and I can start developing on that application.

Now, as you say, okay, let’s bring this to production now. So we have a bunch of servers, a bunch of instances, doesn’t matter where. It can be a cloud provider, it can be on-premise, bare metal, doesn’t matter.

I want to run my application in there. Then there’s a third tool that Docker has thrown into the mix, and that’s Docker Swarm.

Docker Swarm is basically an orchestrator. Again with simple CLI tools, you can say: okay, all these servers underneath me here, I’m going to throw them all in a bag, I just want them to act as resources, and I don’t even care or want to know what runs on what machine. But I can instruct it through my Docker Compose file. I can kind of extend the Docker Compose file that I have for my local machine and add some extra parameters on top, saying: okay, I want my PHP application to run in at least five containers, preferably within this data center (I can tag servers by data center), with this CPU and memory provisioning.

You can then shoot that Docker Compose file into Docker Swarm, and Docker Swarm will orchestrate it for you and look for servers that have capacity available. Say you ask, for your MySQL, for at least two gigabytes of memory. Then Docker Swarm will look: okay, underneath me here, I have all these servers.

Which one has two gigabytes free? Server X. Okay, I’m going to deploy the container there. And I have to deploy the application containers, so I’m going to distribute them here, here and here. That’s where Docker Compose and Docker Swarm work together to make sure that the application, as you have built it on your local machine, gets pushed into a production environment and deployed in a redundant way. Again, in the way you define it, obviously. But you create a superset on top of the simple Docker Compose file that you use in your local environment.
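
The placement decision described above ("which server has two gigabytes free?") can be illustrated with a toy scheduler in Python. This is a sketch under stated assumptions, not Docker Swarm's actual algorithm: the `Host` class, the `place` function, the server names and the memory figures are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    datacenter: str
    free_mem_mb: int


def place(containers, hosts):
    """Assign each (name, mem_mb, datacenter) request to a host with enough free memory.

    Greedy: picks the candidate host with the most free memory; raises if nothing fits.
    """
    placement = {}
    for name, mem_mb, datacenter in containers:
        candidates = [
            h for h in hosts
            if h.free_mem_mb >= mem_mb and (datacenter is None or h.datacenter == datacenter)
        ]
        if not candidates:
            raise RuntimeError(f"no host has {mem_mb} MB free for {name}")
        target = max(candidates, key=lambda h: h.free_mem_mb)
        target.free_mem_mb -= mem_mb  # reserve the memory on the chosen host
        placement[name] = target.name
    return placement


hosts = [
    Host("server-x", "brussels", 4096),
    Host("server-y", "brussels", 1024),
    Host("server-z", "brussels", 1024),
]
# One MySQL container needing 2 GB, plus five PHP containers of 512 MB each.
requests = [("mysql", 2048, "brussels")] + [(f"php-{i}", 512, "brussels") for i in range(5)]
placement = place(requests, hosts)
print(placement)
```

Only one host can fit the 2 GB database, so it lands there; the five application containers are then spread over whatever capacity remains.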

Mattias Geniar

I love the way you talk about this; you make it all sound so easy. At the same time, I’m very sure that there are listeners who may have heard about Docker, but have never played with it, and are thinking: holy crap, that sounds complex. All I want to do is run a website.

So I think hosting a WordPress website is probably not the best example to run in Docker. The use case you are using it for, going out to scrape websites, running them through analytics and generating reports, with an entire business built on top of it, makes a lot more sense than just running a simple LAMP stack on top of Docker. Perhaps there’s a part of the introduction that we may have missed here. The very moment you run a Docker instance, you download some sort of base box that it is built on top of.

That could be built on, say, an Ubuntu or a CentOS, or a very minimal installation of another distribution. You can’t say that it’s virtualized, because it isn’t virtualized as such, but it runs in some sort of virtual instance on your server, on your laptop, on your computer or whatever, and the applications run on top of that. So if you want to run multiple PHP versions, or test multiple Postgres versions, Docker makes a lot of sense, because you can run them next to each other. Whereas in a classical distribution, if you apt-get install a MySQL or a Postgres, it’s usually very hard to run a second instance next to it.

So from that point of view, I think Docker makes a lot of sense. Docker Compose and Docker Swarm are tools you’ve been using, I think, for a couple of years now. If you’re just getting started with Docker, is Docker Swarm or Docker Compose something you would actively run into, or does it fill more of a particular niche, a problem that you’re hitting but cannot easily solve with the basic tools?

Nils De Moor

Okay. So, that’s kind of a long question. Maybe first, for context, very briefly: you mentioned virtualization, and in the end Docker is indeed not virtualization, but the concepts are quite similar, even the terminology they use. What you do in Docker, as you said, is always start from a Docker image. A Docker image is basically something provided by, for instance, an official provider. It can be Ubuntu, it can be CentOS, or it can be one level higher, I’d say, like Node or PHP; they also build images, on top of whatever Linux distribution they want underneath. So you build images.

The difference with virtualization is that in virtualization you kind of cut the server into a couple of parts and really install full operating systems next to each other. In the Docker world, that doesn’t happen.

You just have one operating system as the base layer, and from those images you run containers on that same operating system. So you don’t have the overhead of a full operating system layer per application. The containers all run on the same operating system, but isolated, so they don’t interfere with each other.

So, trickling a bit further: the initial question, I guess, was that for you it makes sense in your company because you do a lot of stuff, but would it make sense for someone just playing around and building a small website? And I’d say yes, fully, 100%. For instance, one of the things I almost never do anymore is install things on my computer. Node switches versions like every hour, I think. That’s the ecosystem I know best, but I’m fed up with always installing Node-specific tools just to make sure versions can be switched.

If someone in my team is building their application against Node, I don’t know what version number they’re at today, and another developer is building another application against another version, I can easily run those two on my same machine. For instance, last week I had to write some kind of data job, which is usually much easier in Python. I didn’t install Python on my computer.

I just pulled in the official Docker Python image, and I wrote all my Python code in Sublime, nicely on my computer here. Then I just do a docker run of the Python image with my script, and it runs without me having to install anything. It’s installed in a container, and when I’m done with my job and I don’t need Python anymore for a couple of days or weeks, I just throw away the container and there’s no trace on my computer that I ever ran a Python application.

So it really keeps my local environment very, very sane, because throwing around dependencies here and there has always been a nightmare. Now it’s just: okay, I want MySQL, I want Postgres, I want Memcache, just install the container. When I’m done with my experiment, I throw them away and my system stays clean.
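
The throwaway-container workflow described above could look roughly like the command built by this Python sketch. The script name, the local directory and the `/work` mount point are assumptions for the example; the `--rm`, `-v` and `-w` flags of `docker run` and the official `python` image are real. The function only builds the command, it does not call Docker.

```python
import shlex


def throwaway_python_cmd(script, workdir, image="python:3"):
    """Build a `docker run` command that runs `script` in a disposable container.

    --rm removes the container when the script exits; -v mounts the local
    working directory into the container; -w makes it the working directory.
    """
    mount_point = "/work"  # assumed path inside the container
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:{mount_point}",
        "-w", mount_point,
        image,
        "python", script,
    ]


# Hypothetical script and directory, for illustration only.
cmd = throwaway_python_cmd("job.py", "/home/me/data-job")
print(shlex.join(cmd))
```

Because of `--rm`, nothing of the container is left behind once the script finishes; only the files in the mounted directory remain on the local machine.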

Mattias Geniar

Nils De Moor

No, exactly. Exactly, exactly. So if I have to run multiple versions next to each other, it’s no problem, because the containers don’t look into each other, so they don’t care.

And then to finally get to the actual question: okay, if today I want to start with Docker, first of all I’d say just go to the getting started guide in the docs. It’s really very well explained and takes you by the hand through running that first container and going further. The tools you’ll definitely run into: obviously Docker itself, and they call it Docker Engine now, by the way, to use the right terminology. Docker Engine is the core of Docker, the actual daemon that runs on your system. So that one you will obviously need.

Docker Machine and Docker Compose are also tools you should learn to work with from the beginning. Docker Machine, well, again, the getting started guide will probably tell you to use that to install Docker on whatever you have: a Mac machine, a Windows machine, a Linux machine, it doesn’t matter.

With Docker Machine, you can easily do that. And then Docker Compose usually comes down to a file in whatever application you have. You have a Docker Compose file that defines the stack that you’re building at that time.

So, especially on the local machine, you will actually interface way more often with Docker Compose, because you always want to work with your full stack, I guess. Docker itself really only works container by container.

That’s why Docker Compose makes more sense for application development. So Docker Engine, Docker Machine, and Docker Compose are the three tools you will always be working with.

Mattias Geniar

Nils De Moor

Okay, so… They’re stopping and killing. That’s for a fact.

And this has indeed been a topic of many discussions for almost three years now, and they’re still ongoing. Persistent data is seen as a problem in the Docker world, but it doesn’t have to be. The thing is, Docker containers are indeed not very tangible, in the sense that if you stop one and throw it away, you lose the data. And Docker is such an easy tool for starting, stopping and throwing away containers that, if you don’t pay attention, you can throw away a MySQL container full of data just like that, with one command. That’s what probably scares most people. On the other hand, there are primitives and tools around Docker to make sure that data can be persistent. First of all, you can add a volume: you can mount a volume from the system the container is running on into that container. Let’s say that, in the case of a MySQL database, I mount the data folder as a volume. That means that if I connect to the database which runs in the container, every piece of data that’s written will be written to the mounted volume, so it ends up on the host that’s sitting underneath. That’s the simplest form of making sure that, if the container goes away, at least the data is written on the host instance, the host server, so it isn’t gone.

And then if I run a new MySQL container and mount that same data volume, I’m back in service as I was before. That’s, by the way, a very well-known kind of update mechanism you can put in place. Obviously, you have to be careful with backwards compatibility, but if I have to go from one MySQL version to another, I can just throw the container away.

Well, I make sure that the data is written to the underlying host, to a mounted volume, throw away the container with the old version, bring in a container with the new version, mount that same data folder into it, and I’m running MySQL as a new version with my original data.
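
The upgrade-by-replacement steps described above can be written out as a short Python sketch that lists the commands involved. The container name, host directory and version tag are hypothetical; `/var/lib/mysql` is the data directory of the official `mysql` image. The sketch only builds the commands and leaves out the backup and compatibility checks a real upgrade would need.

```python
def mysql_upgrade_commands(name, host_dir, new_tag):
    """Return the docker commands that replace a MySQL container but keep its data.

    The data lives in `host_dir` on the host, mounted at /var/lib/mysql
    (the data directory used by the official mysql image).
    """
    return [
        ["docker", "stop", name],   # stop the container running the old version
        ["docker", "rm", name],     # throw the old container away; host_dir is untouched
        ["docker", "run", "-d", "--name", name,
         "-v", f"{host_dir}:/var/lib/mysql",
         f"mysql:{new_tag}"],       # new version, same data folder
    ]


# Hypothetical container name, host path and version tag.
steps = mysql_upgrade_commands("blog-db", "/srv/blog-db/data", "5.7")
for step in steps:
    print(" ".join(step))
```

The key point is that the same host directory is mounted into both the old and the new container, so removing the old container never touches the data.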

Mattias Geniar

That’s probably the easiest way to do a MySQL upgrade indeed. That kind of shared storage, where you mount the host’s file system or a directory on the host inside a container, is best compared to what Vagrant does by default with the Vagrant directory that you mount from, say, your Mac or your Windows machine. Same idea.

Yeah. Okay.

Nils De Moor

And then to take it even further, because everything is all about cloud-native apps. Let’s say it’s easy to play around with containers, and I want to throw them from one server to another. That’s something that isn’t easily solved with the native volume mounting. Because if the container moves from one hardware instance to another, the data needs to follow, obviously.

Otherwise, if I start the container somewhere else and mount a volume there, the data is not there. It’s an empty database.

That’s why Docker has made its volume drivers pluggable, meaning that third-party providers, and there are already a few big ones backed by a lot of money, are working on systems where the data can follow the container. So if I kick out a container on one host and boot it on a new one, there are mechanisms to have the data follow that container. There are plenty of mechanisms, and it really depends on what kind of infrastructure you’re deployed on.

For instance, on Amazon’s cloud, on AWS, those plugins can say: okay, the network disk is attached to instance A; if the container moves to instance B, we just attach that disk to instance B, and the data follows that way. But there are also plugins that can do the replication for you, where you move the container and the data is synced to the new server. There are network file systems, all sorts of stuff. There are already lots of community-backed plugins that will help you migrate data from one instance to another without too much downtime. But all those plugins are still at an early stage, even the ones from companies backed by a lot of money. There’s always a big line of warning on their websites, like: don’t use this in production, or at least be very careful when you do. So it’s still early stage, and I think in the next couple of months or years we’ll see great improvements, so that it just works out of the box. But the primitives are there, so it’s getting there, I’d say.

Mattias Geniar

Nils De Moor

Yeah, of course, of course. So to get started, say you want to run your personal blog, what I would do is run it all on the same host and just write the persistent data to a volume. And again, it’s the same mantra as earlier: Docker didn’t introduce this as a new problem.

If I install MySQL natively on a server and the server goes bust, I lose the data too. It’s just that Docker made it so easy to play with and kick around applications that people might forget to move their data with the application.

With all the tools that are available, we’re moving to a world where even that’s something we don’t have to be worrying about anymore. But right now, it’s not the case yet. So indeed, when there’s data involved, you still need to be a bit careful.

Okay, where do I store the data? Because if I just keep it local within the container, I might lose it if I throw away the container. That’s it.

So you need to be careful with that. But other than that, it’s just as you were working before. You install MySQL.

It writes data to disk. If you’re not careful with that disk, you also can lose the data.

Mattias Geniar

Yes, indeed. There’s one particular data pattern that I’ve seen come up a lot when talking about Docker. I wonder if you’ve heard about it and know the advantages of it.

By default, you could just mount a file system from the server within a Docker container. But I hear a lot of people that are using the file system inside another Docker container. I’m not sure if that’s common practice or what the benefits would be with doing that particular move.

Nils De Moor

Yeah, so indeed, it kind of stems from the idea that everything in an operating system can, and ideally for the Docker lovers needs to, be containerized. And indeed, there was a time when I think it was even a best practice, but I’m not sure.

I hope I’m not saying anything wrong here. I mean, there’s nothing wrong with it, but it’s not as heavily advertised anymore. But indeed, there was a time when people said: okay, I have a data container.

So actually just a very stupid, simple container that doesn’t even run processes or anything, which you then attach to your MySQL container, which just runs the MySQL process. Again, to make things easier: everything that’s containerized is easy to move around.

I think that was the basic idea behind it. But it’s been a while since I’ve read about people doing that in production. With the whole volume plugin system, you get much more fine-grained control over what you do with the data.

So I think everything is moving more in that direction, using the volume drivers natively to be smart about the way you manage your data.

Mattias Geniar

Nils De Moor

Mattias Geniar

Okay, clear. Besides Docker, there are also other container runtimes that are becoming more and more popular. One of them is called RKT, or however you pronounce it, Rocket.

What’s the difference between Rocket and Docker? When should I use one over the other?

Nils De Moor

So Rocket is built and maintained by CoreOS. By the way, CoreOS is one of the new Linux operating systems that evolved out of the whole Docker world, the Docker ecosystem. Basically, within CoreOS, everything runs in a container.

When the kernel starts, there’s just a simple process to run Docker, and then everything on top of that runs in Docker containers. So when you start a CoreOS instance, your kernel is doing almost nothing: it’s just making sure that Docker can run, and all the rest is run in Docker. They’ve been around almost as long as Docker, so they kind of evolved and grew together. It’s a separate company, also heavily backed with a lot of money. But at one point they kind of disagreed. I’m not very familiar with the details, but they had another vision on how containerization should be done, and that’s how Rocket, so RKT, was built by them. It’s built on the same principles, and I think the difference is in the details. Not that it’s not important, but especially on security, authorization and things like that, they do things differently. Again, I never really worked with Rocket, so I can’t give the nitty-gritty details. What I can say is that there was some beef in the community between Docker and Rocket when Rocket was announced.

But in the months after that, they sat down together as grown-up people. What came out of those conversations was the Open Container Initiative, which is basically a manifest they’ve written on how containerization should be done. Both Docker and Rocket are just implementations of that manifest, basically.

And it’s in the open source community. It’s backed by a lot of companies that have all put their shoulders under it.

The idea is to keep containerization, and how it should be done, open source and not owned by any one company. I think that was also one of the reasons why Rocket came out: CoreOS thought Docker was becoming a bit too commercial.

So they wanted to keep it open. And the underlying layer now, the OCI, the Open Container Initiative, kind of makes sure that containers will always be open source. So if tomorrow Docker closes its ecosystem, at least the primitives are open source, and someone can easily pick them up and build a backwards-compatible tool on top.

Mattias Geniar

Nils De Moor

Mattias Geniar

That’s a good thing indeed. So Docker versus Rocket, it’s a bit like Red Hat versus Ubuntu. It’s flavors.

It’s just about what you’re used to, how you want to work. Going into the practical details of Docker, if you’re an existing organization and you haven’t touched Docker yet, introducing it isn’t usually something that happens in about five minutes. It requires an effort from the development team on the one hand, the operations team, because it’s a new way of running your instances.

You’ve been running this for a couple of years. How did you experience this? What can we do to make the burden of Docker easier to overcome?

Nils De Moor

Okay, yeah, so I’m the CTO and technical co-founder, so basically I could use whatever I want. But it’s a question I get a lot. People say: well, I work in, let’s say, a bank, and it’s heavily regulated, and Docker is just out of the question.

I cannot sell this. The first thing I say is that Docker should be a conversation starter, in a sense.

If you think it’s going to be too hard to implement on an organizational level, why don’t you just start using it locally? You can commit the Docker Compose file to your code repository. I don’t think it will hurt anyone, or that anyone will look at you like: hey, you’re doing something crazy here. It makes it easy for you to just write the application, and that’s where you can get started.

And usually what happens is that… it starts evolving within the development so one one person starts with with introducing a couple of simple things or or just one application into docker and then Usually it’s quite an easy sell to the rest of the team to say, hey, guys, we’ve been tossing around virtual images here to run virtual machines. So the gigabytes are flowing around. Why don’t we just use this Docker container?

It’s like 100 megabytes big. It’s much easier to work with. And you get the exact same package as you do.

You just have to install Docker. It’s a simple installer that you download. Do a docker-compose up or a docker run of our application and it’s running. So then it already spreads out to a development team, and that’s not a tough sell. Then, obviously depending on the organization, and I have to say I don’t have much experience in big organizations with large IT stacks, comes pushing the idea to an ops team, saying: okay, we don’t need you to make sure that all the versions of all the dependencies we need are installed on that machine. Just give us a machine, or a couple of machines, that are able to run Docker. Then we can push the application containers to you and they will run out of the box.

You don’t have to do anything. So basically what Docker has done is create this nice level or this layer of indirection between the operations side and the development side. And I don’t say it needs to be two silos.

It can just be the same person who develops and is then also responsible for running the production environment. But it’s just that layer of indirection. On one side, you’re the person, or you wear the hat, responsible for provisioning a couple of instances, a couple of servers that are capable of running Docker. That’s it.

That’s where your job stops. And then you put on the developer hat and you make sure that you have an application that runs within one container. It doesn’t matter what gets installed in that container.

In the end, the artifact that comes out of what you’ve done is just one container. And then you can push that to your infrastructure. But again, in some organizations those decisions aren’t easily made, no matter how simple they are.

But that’s what I would call a good flow for convincing people of the simplicity of things.

Mattias Geniar

I like that flow. It makes it indeed very simple to convey the message from dev to ops. At the same time, I think, speaking as mostly an ops person here, it shifts a bit of what we’re used to doing with Puppet or Chef or Ansible, the config management aspect of things.

We’re used to configuring a database, configuring a web server. That’s within our realm of scripts and commands. And now developers are creating the Docker instances.

Since at that point they’re creating them, they’re quote unquote responsible for the configuration. As an operations person, you’re sort of giving control of your servers away. But that also means you’re giving up a bit of the ability to intervene when there’s an issue, or to modify configurations as you see fit.

I think Docker sort of forces a very close cooperation between dev and ops, because if they are two silos, this idea of Docker can never work. Then operations gets something, blindly has to accept it as is, and pray that everything works. In reality, that doesn’t work. So there needs to be a very tight cooperation between both teams, I think.

Nils De Moor

Yeah, obviously. And yeah, that’s the key of all of it, I guess. When we started with Docker at that time, we were looking for a tool that makes conversation easy among everyone.

Not that we had big problems with conversation, but you want something that’s easy to understand on all sides, even the business side. Even business people in our company kind of understand what Docker does, because it’s easy to explain to them. Like, if you want to run two Java applications on separate versions, you can do that with Docker. That’s usually how I explain it, because at some point in school they all had to build a Java application.

And indeed, it’s true that from an ops perspective, part of the configuration is… well, I’m going to say it the way we experienced it. We’re Ansible users, and we had kind of a big repository of Ansible scripts. By introducing Docker, that repository is now a fraction of what it was before. That being said, it’s not like you take away the problem.

Still, your applications have to be able to find each other. Service discovery, load balancing: those problems still exist. You just push them a level higher. So it just becomes the question, okay, is it really the developer or ops that needs to be aware of this?

But now you basically create a new problem, one that’s easier to solve: service discovery gets pushed a level higher, and there are other tools already available for that. So it’s not like the job or the responsibility of an ops person is taken away. No, not at all.

Quite the opposite. It’s a tool that allows you to gain much more speed in terms of deployment, or just in workflows altogether. For instance, two, two and a half years ago, when we did a deploy, it took about an hour, I’d say, to get something from development into production.

That was because we worked with virtual machines: they had to be built, they had to be thrown around to the right places, and they had to boot in a production environment. We run the application on different servers, so we always brought in new servers from an image and then pointed all the traffic from the old version to the new one. That took about an hour, which, I mean, everyone was cool with, and it worked really well. It was nicely automated. Now, with Docker, we deploy in a matter of minutes. And sometimes people in our team complain that a deployment took more than five minutes. The problem has entirely shifted, and it’s a good thing. They should be complaining about those things, and we should be finding ways to improve that even further.

But it has just allowed us to gain much more speed and to put our focus on other problems in making life easier. So as I said, service discovery and load balancing are now much more interesting problems to solve, from both an ops perspective and a development perspective.

Mattias Geniar

Nils De Moor

Yeah, exactly. And there are also almost no bottlenecks or points in a workflow where things can get blocked. Because again, from an ops perspective, it’s your job to make sure that basically anyone can deploy containers with the click of a button.

And from a developer perspective, when you have that ability to just press a play button or merge in a commit, and it gets built and pushed to production, that’s really the nice thing. Again, it’s that level of indirection that gets created by running Docker containers. You don’t have to worry too much about all the dependencies that point to each other and mingle with each other and get problematic.

No, you just make sure that you have an easy-to-understand infrastructure on the one hand and applications on the other. And you can really focus on the points where your application shines, without having to worry about the infrastructure underneath.

Mattias Geniar

Yes, exactly. Switching back a bit to the configuration management point of view, you’re using Ansible. It doesn’t really matter what the listener might be using.

In the end, the Docker images, or the Dockerfile that builds the Docker images, sort of become your config management. You’re shifting away from Ansible towards a more Dockerized set of…

Nils De Moor

Yeah, so the thing is that, I mean, don’t get me wrong, there’s still configuration management needed. You still need that infrastructure layer underneath.

So you still need nodes that can run Docker. You still need application services that monitor those instances. And ideally, I mean, some of them you can already put in a container.

So then even your processes… for instance, we run crons, and everything is also in Docker containers in our system. But you might end up in a situation where your Docker daemon is failing, where your server isn’t serving Docker containers anymore. Obviously you still need to be aware of that and have mechanisms in place to alert you or to try to recover from it. It’s not that Docker takes away all the responsibility of the operations side.

No, definitely not. So we still have Ansible scripts that do all this monitoring, set up watchdogs, and have everything in place to make sure that the applications running on top are doing what they’re supposed to do. And logs are pushed, for instance, through Logstash into Elasticsearch, an ELK stack.

So, again, it’s not like Docker takes away all the problems. No, the problems that you had before are still there, but now it’s easier to think about those problems and solve those problems because they’re more isolated and you don’t have to worry that some things are going to interfere with each other.
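The "be aware when the Docker daemon is failing" point can be sketched as a very small watchdog. This is an illustration under assumptions, not the Ansible-based setup described in the episode: it assumes the docker CLI is on the PATH and treats a non-zero exit status of docker info as "daemon not reachable". The alert callback and the function names are hypothetical.

```python
import subprocess
from typing import Callable, Sequence


def default_runner(cmd: Sequence[str]) -> int:
    """Run a command quietly and return its exit status."""
    try:
        return subprocess.call(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        # The docker binary is not installed on this host.
        return 127


def docker_daemon_healthy(run: Callable[[Sequence[str]], int] = default_runner) -> bool:
    # Assumption: `docker info` exits non-zero when it cannot reach the daemon.
    return run(["docker", "info"]) == 0


def check_and_alert(
    alert: Callable[[str], None],
    run: Callable[[Sequence[str]], int] = default_runner,
) -> bool:
    """Call `alert` with a message when the daemon does not respond."""
    healthy = docker_daemon_healthy(run)
    if not healthy:
        alert("Docker daemon is not responding on this host")
    return healthy


if __name__ == "__main__":
    check_and_alert(print)
```

In practice a check like this would be run periodically and wired into whatever alerting is already in place; the runner is injectable so the logic can be exercised without Docker installed.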

Mattias Geniar

Okay, I gotcha. You’ve been running Docker for quite a while. So what’s your flow from, say, fixing bugs inside of your code to having that code running in production?

What does that pipeline look like?

Nils De Moor

Yeah, so the way we work: it starts in a Git repository. You make your fix or your feature or whatever, and you push it to a branch. We use GitHub, but GitLab or your own Git repository, it doesn’t matter.

So we push it in there. We do a pull request to pull that branch into our master branch, and we ask the team to do a code review. People put in their remarks, or when everything’s good you get a thumbs up, and you merge. At the time of the merge to master, our build system picks it up. It could be a Jenkins, but we use a third-party provider, CircleCI; it doesn’t matter.

Your build system can listen to those changes. It will see: ah, there’s something new on master. So I’m going to check out the code, build a Docker image from that new code, and run all the tests and the regressions and all sorts of stuff against this container. And then, when everything is green, we push that Docker image to our Docker repository.

So then it sits in the repository. It’s available for basically whatever environment to pick it up. And then there’s still something we do manually: the actual deploy of the image.

We could automate it, but I still like the human factor. I don’t want to take the human out of the whole workflow, just to be able to roll back easily or to monitor what’s happening in production.

So when that process is done, we get another notification, and then someone can just run, at the click of a button, an Ansible deploy script, which is fairly easy. It will instruct our production servers to pull that new image and run it on top of the old version.
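The pipeline described above can be summarized as two lists of commands: what the build system runs after a merge to master, and what a deploy then runs on a production host. The sketch below only builds those command lists and does not execute anything. The image name, the test entry point (make test) and the container name are hypothetical, and the episode’s actual deploy goes through an Ansible script rather than these raw docker commands.

```python
from typing import List

REGISTRY_IMAGE = "example/webapp"  # hypothetical image name


def image_ref(commit_sha: str) -> str:
    """Tag the image with a shortened commit hash so every build is traceable."""
    return f"{REGISTRY_IMAGE}:{commit_sha[:12]}"


def ci_steps(commit_sha: str) -> List[List[str]]:
    """Commands a build system might run after a merge to master."""
    ref = image_ref(commit_sha)
    return [
        ["docker", "build", "-t", ref, "."],             # build image from checked-out code
        ["docker", "run", "--rm", ref, "make", "test"],  # assumed test entry point
        ["docker", "push", ref],                         # publish only when tests are green
    ]


def deploy_steps(commit_sha: str, container_name: str = "webapp") -> List[List[str]]:
    """Commands a manually triggered deploy might run on a production host."""
    ref = image_ref(commit_sha)
    return [
        ["docker", "pull", ref],
        ["docker", "stop", container_name],
        ["docker", "rm", container_name],
        ["docker", "run", "-d", "--name", container_name, ref],
    ]


if __name__ == "__main__":
    for step in ci_steps("abcdef1234567890") + deploy_steps("abcdef1234567890"):
        print(" ".join(step))
```

Keeping the deploy as a separate, explicitly triggered list mirrors the choice described in the episode to keep a human in the loop for the final step.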

Mattias Geniar

Nils De Moor

So we use the Docker Hub, which is the official repository. As the name probably says, it’s like GitHub, but for Docker images.

Docker Hub has a similar pricing model: open source or public images you can host for free, and for private repositories you have to pay a certain amount. So we use that.

We pay Docker Hub for private repositories. But Docker Hub is a Docker registry, just the official one.

And the Docker registry code is open source, so you can run it on premise or anywhere you want it to run. You can completely own that process and say, okay, we have our own registry, which is actually what most companies are doing. That’s understandable, because one of the downsides of using Docker Hub is that from time to time it might be slow. Again, I like this comparison with GitHub, because today was one of those days again: if GitHub is down and you use it a lot, your technical teams will get very grumpy. The same happens from time to time with Docker Hub. It gets slow, and your deploys start getting slow and building up. Ideally that’s something you don’t want, and you want to improve. So in our case it’s something we could indeed start taking in-house, but for now it luckily doesn’t happen that much that it will…

Mattias Geniar

Nils De Moor

Let me think. That’s a hard question, isn’t it? Well…

Times are a bit different already because there are now lots of good tools. Let me think. Well, one of the things I would definitely recommend is to keep things simple and build from…

Because one of the mistakes that was made in the beginning, by probably everyone coming out of a Vagrant kind of world, a virtualization kind of world, was that everyone wanted to run their applications entirely in one container. And that’s something you ideally don’t want to do. As in the example we gave: run your application in one container, run your database in another container. Don’t cram it all into one, because then you create the kind of dependencies that you initially didn’t want, or wanted to get fixed anyway. So that’s, I guess, the biggest tip I can give to someone starting out and coming from a more virtualized way of working: keep applications separated in different containers. That’s the best tip I can give, I’d say.

Mattias Geniar

Okay, this sort of comes back to what most people say a lot about Docker. The idea is to have one process running within one container, or at least that’s the theory. A lot of people are discussing whether that’s a good rule, or whether multiple processes are allowed inside one container.

What’s your take on this? How many processes should we have or how many functions should one container ideally have?

Nils De Moor

Yeah, so that’s indeed one of those other things that have kind of created their own story and shouldn’t have. In the beginning, and I shouldn’t be saying the wrong thing, but in the beginning, everyone said one process in one container. That’s the golden rule.

Now, that’s actually not a good rule, or at least it’s not something I would recommend you live by. The only thing is that Docker, when it runs a container, will kind of look at the main process that’s running in it. So when you run a container, you run a process.

Well, the first process that starts is monitored: the Docker daemon will have a look at it and see, okay, it’s still running or it’s not running. Meaning that if you have other processes and your main process dies, then they will stop along with the container. I mean, that’s normal, I guess. But does that mean that you shouldn’t be running more than one process in one container? No, not at all. We have applications that have to spawn other processes within the same container, because that’s just the way they work.

It’s kind of a distributed way of doing certain things. There’s no problem at all with that, but just keep in mind that it’s the main parent process that gets monitored, and that’s what will make the Docker daemon kill the container if there’s something wrong with it. But how many processes can you run? As many as you want, basically. It comes down to what works in your organization, and using a bit of common sense. Yeah, exactly. But as a high-level rule or tip, I’d say: just keep logical components in different containers. So your application in a container, your DB in a container, your log shipper in a container. Those are all logical components that are important in my application, and I don’t want to have dependencies between them.

They just need to connect to each other, but there’s no actual dependency. My application can still run if my log shipper is down, and my application can even provide some service if the DB is down. Okay, it won’t be ideal, but at least it can stay up. All those components, try to keep them logically separated. I think that’s the best summary you can give: one component per container. Exactly. That makes more sense than having one single process, because like you mentioned, one component doesn’t necessarily mean one process. That’s a good idea.
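The "main process decides the fate of the container" behaviour can be imitated outside Docker with ordinary processes. The sketch below is an analogy, not Docker’s implementation: a hypothetical run_container_like function starts some helper processes, waits only for the main process, and stops the helpers as soon as the main process exits, whatever state they are in.

```python
import subprocess
import sys
from typing import List, Sequence


def run_container_like(main_cmd: Sequence[str], helper_cmds: List[Sequence[str]]) -> int:
    """Start helpers, wait for the main process, then stop the helpers.

    Only the main process is watched; its exit status becomes the result,
    much like a container stops when its first process exits.
    """
    helpers = [subprocess.Popen(cmd) for cmd in helper_cmds]
    try:
        exit_code = subprocess.call(main_cmd)
    finally:
        for helper in helpers:
            if helper.poll() is None:   # helper is still running
                helper.terminate()
            helper.wait()
    return exit_code


if __name__ == "__main__":
    # A helper that would happily run for a minute, and a main process that exits at once.
    helper = [sys.executable, "-c", "import time; time.sleep(60)"]
    main = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_container_like(main, [helper]))
```

The helper never gets to finish its sleep: when the main process exits, everything else goes with it, which is the reason to pick the main process of a multi-process container with care.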

Mattias Geniar

Nils De Moor

OK, yeah, so this is something I probably need to keep short, because it’s a topic we could do a whole other podcast on. But one of the things that has popped up quite recently is orchestration tools: tools that allow you to manage a, I’d say, production infrastructure more easily, in the sense that they put all of your infrastructure in one big bag. One of those tools, for instance, is Apache Mesos, from the Apache ecosystem. Similar, although not exactly the same, is Google’s Kubernetes, and then Docker Swarm is also in the same space. These are orchestrators where you can basically define your applications, which technically, depending on which of the three tools I just mentioned you use, don’t necessarily have to be containers, but the tools are natively built with the idea of containers in mind. You can just define: okay, run my application container at least 10 times somewhere in my infrastructure, I don’t care where. With this provisioning of memory, like one gigabyte of memory per container, a certain CPU priority, some network provisioning, like at least one Mbit per container, and optionally also a volume that you can define, like I need this amount of data available for my container. So you can set all those constraints, which are basically your application configuration, in one of those tools, so Mesos, Kubernetes, or Docker Swarm. And those tools will decide for you: okay, where am I going to run those?

Where do I have resources available? And I will run them and make sure they always keep running. So if one goes down, I will bring up a new one to be sure that your application is always in the state that was defined in that particular configuration file.
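The idea of declaring a desired state and letting the orchestrator keep the infrastructure in that state can be sketched as a small reconciliation function. This is a toy model under assumptions, not how Mesos, Kubernetes or Docker Swarm are implemented: the Node and AppSpec classes, the "most free memory first" placement rule and the fail_container helper are all hypothetical, and only the memory constraint is modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    free_memory_mb: int
    containers: List[str] = field(default_factory=list)


@dataclass
class AppSpec:
    name: str
    replicas: int    # "run my application container 10 times"
    memory_mb: int   # "one gigabyte of memory per container"


def reconcile(spec: AppSpec, nodes: List[Node]) -> int:
    """Start containers until the running count matches the spec.

    Returns the number of containers started in this pass.
    """
    running = sum(node.containers.count(spec.name) for node in nodes)
    started = 0
    while running + started < spec.replicas:
        candidates = [n for n in nodes if n.free_memory_mb >= spec.memory_mb]
        if not candidates:
            break  # not enough resources anywhere; try again on a later pass
        target = max(candidates, key=lambda n: n.free_memory_mb)
        target.containers.append(spec.name)
        target.free_memory_mb -= spec.memory_mb
        started += 1
    return started


def fail_container(node: Node, spec: AppSpec) -> None:
    """Simulate one container of the application going down on a node."""
    node.containers.remove(spec.name)
    node.free_memory_mb += spec.memory_mb


if __name__ == "__main__":
    nodes = [Node("node-a", 8192), Node("node-b", 4096)]
    spec = AppSpec("web", replicas=10, memory_mb=1024)
    print("started:", reconcile(spec, nodes))
    fail_container(nodes[0], spec)
    print("restarted:", reconcile(spec, nodes))
```

The caller never says which node to use. It only states the desired count and the per-container constraint, and each reconciliation pass closes the gap between that and what is actually running.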

Those are up-and-coming tools these days, because everyone has now run Docker on their local machine and maybe pushed some containers with some scripts to a production server. But if you really want to go full-on Docker, with all your applications in your full infrastructure, you’re going to need one of those tools to manage the underlying infrastructure, or basically to take away the management and make sure that your infrastructure is available as one big computer, I’d say, like one big operating system. Your data center becomes the operating system, and you can toss Docker containers at it and they run wherever. I mean, you don’t really care, you don’t really have to know, but they run as you want them to run within that infrastructure. Totally, Docker on steroids.

Mattias Geniar

Nils De Moor

So the way we did it was initially to just have stateless applications in that system, so we didn’t have to come up with a solution for volumes. But now the whole system of volume plug-ins and third-party volume plug-ins has been growing. And now it will indeed also allow us to think in terms of, okay, where do we run Redis? By the way, Redis, MySQL, all those things, we run them in Docker containers, but on fixed hosts.

And that’s ideally something we also want to make more flexible and just say, okay, we shoot them into the same Mesos, Kubernetes, Docker Swarm, whatever, but with a certain volume constraint, so that we know that our data isn’t going away, or is at least following our database container. Again, they all say it’s too early for production, but we’re already experimenting with it, and we want to move from only stateless applications to also running stateful applications within the same piece of infrastructure.

Mattias Geniar

Okay, that’s fascinating. Nils, I think this is time to wrap up. I want to thank you very much for your time and a very interesting conversation.

If people would like to get in touch with you, how can they find you online?

Nils De Moor

Mattias Geniar

I’ll add all those links in the show notes. Just sort of a mandatory plug here. If you liked listening to this episode, I would really appreciate it if you left some feedback or a rating in iTunes.

I know it’s iTunes, but apparently if you’re podcasting, that’s about the only directory that really matters. I’ll add links in the show notes as well. So if you have five minutes, I’d appreciate it if you could just write a small summary and hit those five stars, with a big thanks on my part.

Okay, Nils, thank you very much. I think you mentioned it earlier. The idea of orchestrators is a topic on its own, and I would happily have you as a guest when we discuss that.

Nils De Moor

All right, thank you. Thank you for having me. Take care, Nils.

Take care. Bye-bye.

Shownotes for episode 2, published Wednesday, 26 May 2016
