Config Management Camp: Kubernetes, Sysdig & Mgmt

WhisperX large-v3 + pyannote diarization, lightly edited.

Mattias Geniar

Hi there and welcome to another episode of Syscast. My name is Mattias Geniar and this is a different episode. It’s not the usual one where I talk with a guest and I interview them.

It’s me having one big monologue. So if you don’t like to hear me talk, this is a very good time to be skipping this episode because it’s only going to be me talking. Bit of an egocentric episode in that sense.

But the reason I’m doing this is I spend a lot of time inside of a car listening to podcasts, and I’m trying something else. I’m now recording a podcast inside of a car. If you can listen to one, why not record one?

The difficulty is having a guest here, so it’s just going to be me. But there’s another good reason I’m doing this: a few days ago there was a very interesting conference for Linux sysadmins in Ghent, called Config Management Camp. I saw a lot of interesting presentations and projects there, and I’d like to give a bit of a summary so you can get up to speed as well. First of all, a very big thanks to the organizers of Config Management Camp. This was, I think, the third or fourth year in a row, and it exceeds expectations every time.

It’s amazing what kind of speakers they have. It’s config management, so you have entire rooms dedicated to talks about Puppet or Chef or Ansible, but you also have a lot of talks about the container orchestration space, like Kubernetes. You have the people behind Foreman and Rudder sponsoring and giving talks.

Lots of interesting geeky things. Now, one of the things that definitely stuck with me from the conference was a very interesting presentation about Sysdig. Sysdig is what I’d like to call strace on steroids.

I think their catchphrase is a bit different. But it essentially is a tool to debug applications. Now, the demo they gave was Sysdig working in combination with a Kubernetes cluster.

Kubernetes is an orchestration tool, so you run containers with it. But that means that when you spawn a Docker or a rkt container, you don’t really know which hardware it’s going to be running on. So to troubleshoot that, you first have to troubleshoot Kubernetes, then the container layer, and then you have to look inside the container to see what your application is doing. That’s not making things easier for you. A tool like Sysdig appears to help there. It loads an additional kernel module on each of your hosts, and that module intercepts, in a non-blocking way, all of the system calls happening between the kernel and everything in user space. What’s interesting is that it is aware of things like containers and namespacing and everything that goes on in there. So you get a command-line tool with a lot of parameters to filter out only the things you’d like to see. The demo they gave, which kind of blew me away, was one where they ran curl on one server against a web server running as a container. Sysdig was able to trace the client, so the curl command, all the way through the Kubernetes iptables mangling and NATing and into the container. It looked at what nginx was effectively doing and then traced that all the way back: nginx generating its response, handing it back off to the container and to Kubernetes, and that response being sent towards the client. Now, if you’ve ever had to troubleshoot something like that, or anything behind a load balancer and a couple of web servers, it gets tricky really fast. You have lots of different servers to hop into, or correlations you have to make in your Kibana interface. It gets tricky.

And Sysdig was able to make that really easy and really straightforward. It allowed you to see, in a single command, everything that curl did and everything that iptables and its NATing did. It went all the way inside the container, inside that container’s namespace, and showed what files nginx was loading, what sockets it was opening and what kind of DNS resolving it did. You see the entire front-to-back strace, essentially, with a slightly better interface and layout, and for debugging, a tool like that is immensely powerful. So I’ll be looking at Sysdig a bit more when I’m troubleshooting. The only thing I have reservations about (“issue” is perhaps a strong word) is the fact that it has to load an additional kernel module. That’s something that can go wrong. It works in a non-blocking way, just sending out and receiving events, and it’s probably stable. Still, my gut feeling says it’s a bit tricky to be loading additional kernel modules. Having said that, the demo was immensely powerful. If I had to debug that kind of thing myself, I’d be spending hours if not days finding the root cause. Their obviously well-prepared demo showed the results, and what happened, in a matter of minutes.
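To make the container-aware filtering idea concrete, here is a toy model in Python. It is not Sysdig and doesn’t call it. It assumes every captured system call already carries container and process metadata, which is the job of the kernel module. It then filters a made-up list of events the way a single filter expression would. The field names echo Sysdig’s `container.name`, `proc.name` and `evt.type` filter fields. The `Event` record, the helper functions and the events themselves are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Event:
    """One captured system call (hypothetical, heavily simplified record)."""
    container_name: str  # "host" for processes outside any container
    proc_name: str
    evt_type: str        # e.g. "connect", "open", "sendto"
    arg: str


Predicate = Callable[[Event], bool]


def field_eq(attr: str, value: str) -> Predicate:
    """Like a `field=value` comparison in a Sysdig-style filter."""
    return lambda e: getattr(e, attr) == value


def any_of(*preds: Predicate) -> Predicate:
    return lambda e: any(p(e) for p in preds)


def all_of(*preds: Predicate) -> Predicate:
    return lambda e: all(p(e) for p in preds)


def trace(events: Iterable[Event], pred: Predicate) -> List[Event]:
    """Keep only matching events, preserving capture order."""
    return [e for e in events if pred(e)]


# Made-up capture: curl on the host talking to nginx in container "web-1".
captured = [
    Event("host", "curl", "connect", "10.0.0.5:80"),
    Event("host", "sshd", "read", "fd=3"),
    Event("web-1", "nginx", "accept", "10.0.0.1:51234"),
    Event("web-1", "nginx", "open", "/usr/share/nginx/html/index.html"),
    Event("web-1", "nginx", "sendto", "HTTP/1.1 200 OK"),
    Event("db-1", "mysqld", "read", "fd=12"),
]

# Roughly: proc.name=curl or (container.name=web-1 and proc.name=nginx)
request_path = any_of(
    field_eq("proc_name", "curl"),
    all_of(field_eq("container_name", "web-1"), field_eq("proc_name", "nginx")),
)

for e in trace(captured, request_path):
    print(f"{e.container_name:6} {e.proc_name:6} {e.evt_type:8} {e.arg}")
```

The point of the design is that one predicate spans both the host and the container. Curl’s `connect` and nginx’s `accept`, `open` and `sendto` come out as a single ordered trace, while unrelated processes (sshd, mysqld) are filtered away.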

Obviously it was prepared, so they had all their commands ready, and it’s a staged environment. But still, Sysdig is a really powerful debugging tool. And if you have Kubernetes or any other orchestration tool running, I think it can be very powerful in tandem with that.

Especially being aware of namespacing, being aware of Kubernetes, etc. So have a look at that one. It’s definitely looking good.

Speaking of Kubernetes, there are actually a lot of container orchestration tools out there, and it looks like Kubernetes is going to be the clear winner. Kubernetes comes out of Google and is heavily inspired by Borg, the cluster manager Google built internally.

So it’s an open source project built on a lot of proven ideas, and it looks like everyone’s starting to implement Kubernetes and dropping other container orchestration tools. So if you’re looking at running lots of Docker containers and you’re stuck with the default Docker Swarm or Docker Compose tools, I think Kubernetes is going to be a solid and reliable extension to your container infrastructure.

So let’s declare Kubernetes the clear winner here. Obviously, this is all personal, but I think Kubernetes is the clear winner. Switching topics a bit, another set of interesting presentations that I saw were around a new config management tool called MGMT.

MGMT stands for management. It’s pitched as a next-generation management tool, which is a bit of a hype term. But it learns from other tools like Ansible, Puppet and Chef, tries to combine the best parts of those, and removes some of the bottlenecks those config management tools have.

Two things really stuck out to me. The first is that MGMT, the new tool, is able to run config management in parallel. With something like Puppet today, resources are checked and validated against your config one at a time. So if you want to install 15 packages, manage a couple thousand config files and some more services, it’ll check those one at a time to see if they’re in line with what you want. That obviously doesn’t scale very well once you start managing a lot of services on a system.

What’s interesting about MGMT is that it builds a graph of all the resources you want to manage. It then determines which ones it can handle in parallel and which ones it has to do one at a time because they depend on each other. For instance, there isn’t much point in managing a couple of config files at the same time if one config depends on the other, so there’s still dependency ordering going on. But it can speed things up significantly, especially for things like package installs. Other config management tools install one package at a time. They install the Apache web server, and the package manager stops again. Then you want to install another tool, so the package manager starts again, installs the tool and stops again. That doesn’t really help performance. What MGMT does instead is bundle all of those package installs together and install them in one big group install, so the package manager only needs to start once.
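The dependency graph and the package bundling described above can be sketched in a few lines of Python. This is not MGMT’s engine (MGMT itself is written in Go). It’s a minimal illustration of splitting resources into dependency "waves" whose members could run in parallel, and then merging all package resources in a wave into one install. It assumes a hypothetical `pkg:`/`file:`/`svc:` naming scheme and a hypothetical `install` step.

```python
from typing import Dict, List, Set

# Hypothetical resources: name -> names it depends on.
# Prefixes mark the kind: "pkg:" package, "file:" config file, "svc:" service.
deps: Dict[str, Set[str]] = {
    "pkg:httpd": set(),
    "pkg:git": set(),
    "pkg:vim": set(),
    "file:/etc/motd": set(),
    "file:/etc/httpd/conf/httpd.conf": {"pkg:httpd"},
    "file:/etc/httpd/conf.d/site.conf": {"file:/etc/httpd/conf/httpd.conf"},
    "svc:httpd": {"file:/etc/httpd/conf.d/site.conf"},
}


def waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Split resources into waves, level by level (Kahn's algorithm).

    Every resource in a wave only depends on resources from earlier
    waves, so the members of one wave could be applied in parallel.
    """
    remaining = {node: set(parents) for node, parents in graph.items()}
    result: List[List[str]] = []
    while remaining:
        ready = sorted(n for n, parents in remaining.items() if not parents)
        if not ready:
            # A cycle, or a dependency on a resource missing from the graph.
            raise ValueError("unresolvable dependencies: " + ", ".join(sorted(remaining)))
        result.append(ready)
        for n in ready:
            del remaining[n]
        for parents in remaining.values():
            parents.difference_update(ready)
    return result


def bundle_packages(wave: List[str]) -> List[str]:
    """Collapse all package resources in one wave into a single install step."""
    pkgs = [r[len("pkg:"):] for r in wave if r.startswith("pkg:")]
    others = [r for r in wave if not r.startswith("pkg:")]
    return ([f"install {' '.join(pkgs)}"] if pkgs else []) + others


for number, wave in enumerate(waves(deps), start=1):
    print(number, bundle_packages(wave))
```

With this graph, the first wave holds the three packages plus the unrelated `/etc/motd`, so the package manager would run once for git, httpd and vim. The two httpd config files and the service then follow one wave at a time, because each depends on the previous one.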

These were staged demos again, but they show there’s a lot of potential in parallelizing most of that config management. Now, I mostly run Puppet myself. Whether you’re familiar with it or not, when you run your Puppet agent to apply your config, it’ll most likely consume only a single CPU core. If you’re lucky, it’ll consume 100% of that core, so it runs single-threaded and consumes as much CPU as possible within that thread. With a tool like MGMT on an 8-core machine, if all goes well, you’ll be consuming all 8 of those cores.

So there’s a potential high CPU impact there for applying your configuration. But MGMT has another fix for that, which is the second amazing part of this good looking tool. And it’s that once the run is completed, so once your MGMT run has defined or indexed all the resources it wants to manage and it has started getting everything in line, it’ll keep running as a daemon on your system.

Not just a daemon like your Chef or Puppet agents that run every 30 minutes. No, it’s a daemon that hooks into all of the files, resources and packages you want to manage and subscribes to events emitted by those resources. That means it uses something like the inotify system. Every time a file changes, whether that’s its permissions, ownership or content, that emits an event, and MGMT subscribes to those events.

So if you remove a file or change the config in a file, that event gets sent to MGMT, and it’ll respond within milliseconds to get that file back into the state you wanted it to be. That also applies to packages and to services. Every time a package is installed or removed, that emits an event. MGMT catches it and determines that this was not supposed to happen, that this is not what the system was supposed to look like. It then automatically reinstalls, removes or changes the package back to what your config management wanted it to be. This also means that if something like this can run eternally in the background, just listening to events, there is no more reason to do an agent run every 30 minutes or every hour to get your system checked or back in line. It just happens constantly. You no longer have to check each and every individual resource every 30 minutes, which can be very time- and resource-consuming. You just run a daemon that listens to events and fixes everything as soon as something changes that is not supposed to. So I think especially the second part, where it runs as a daemon listening to events, is a very powerful idea.
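Here is a toy version of that event-driven loop in Python. It shows the shape of the idea, not how MGMT implements it. Some assumptions need stating up front. A real daemon would get file events from inotify, and package or service events from elsewhere. Here, a hypothetical `FakeFS` class puts an event on a queue for every change so the sketch runs anywhere. Only file resources (content, mode, owner) are modeled.

```python
import queue
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileState:
    content: str
    mode: int   # permission bits, e.g. 0o644
    owner: str


# Desired state as config management would declare it (hypothetical example).
desired: Dict[str, FileState] = {
    "/etc/motd": FileState("Managed by config management\n", 0o644, "root"),
}


class FakeFS:
    """In-memory stand-in for the filesystem that emits an event per change.

    A real daemon on Linux would receive such events from inotify instead.
    """

    def __init__(self, events: queue.Queue) -> None:
        self.files: Dict[str, FileState] = {}
        self.events = events

    def write(self, path: str, state: Optional[FileState]) -> None:
        if state is None:
            self.files.pop(path, None)  # None means "delete the file"
        else:
            self.files[path] = state
        self.events.put(path)  # every change, including our own repairs, emits an event


def converge(fs: FakeFS, path: str, log: List[str]) -> None:
    """Put one changed path back into its desired state if it drifted."""
    want = desired.get(path)
    if want is None:
        return  # not managed: ignore the event
    if fs.files.get(path) != want:
        log.append(f"repair {path}")
        fs.write(path, want)  # emits another event; re-checking it is a no-op


def drain(fs: FakeFS, events: queue.Queue, log: List[str]) -> None:
    """Handle queued events. A real daemon would block on events.get() forever."""
    while True:
        try:
            path = events.get_nowait()
        except queue.Empty:
            return
        converge(fs, path, log)


events: queue.Queue = queue.Queue()
fs = FakeFS(events)
log: List[str] = []
fs.write("/etc/motd", FileState("hacked\n", 0o777, "nobody"))  # someone edits it
fs.write("/etc/motd", None)                                     # ...then deletes it
fs.write("/tmp/scratch", FileState("x", 0o600, "alice"))        # unmanaged file
drain(fs, events, log)
print(log)
```

Note that the repair itself emits an event too. Handling that event is harmless, because `converge` is idempotent and finds nothing left to fix. Unmanaged paths like `/tmp/scratch` are simply ignored.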

I hope it scales to servers with a couple of thousand, if not tens of thousands, of resources to be managed. Time will tell. Right now, MGMT is, I think, in sort of a technology preview state.

It works, but there are still some rough edges and some resources you can’t manage yet. There’s also work being done to have your existing Puppet infrastructure, so all of the Puppet code you already have, be interpreted by MGMT. What actually happens is that all of the Puppet code is exported as YAML.

That YAML is fed into MGMT, and that will eventually get your system into a state that works. I haven’t tested this yet, but it’s one of those things that looks great in theory. Your existing code base gets exported as YAML and fed into MGMT, which can then be your config management. James, the person who originally started the project, also highlighted that MGMT is now being integrated into tools like GlusterFS. So you may be working with, or developing, a project where you need a lot of orchestration, if not coordination, between tools. In that case, MGMT can also help to keep the state of your distributed system in check.

So I think one possible future for MGMT is as a library that can be included in different tools like GlusterFS. It could perhaps even replace traditional setups like Corosync or etcd for service discovery and state management. Maybe MGMT won’t succeed as traditional config management, the way we use Chef or Puppet today. Even so, I think it’ll definitely succeed in the space where it can be integrated as a library into GlusterFS, into things like failover setups, etc. By the way, MGMT is a very interesting tool being developed by Red Hat. It’s something to keep an eye on, because it might blow your mind if you see the demos.

I hope they manage to keep supporting it, and I think it’ll partly depend on us to contribute to it as well. So much for MGMT. Another couple of interesting topics I saw were related to config management in general. What I mean is this: if you look at config management today, it’s most likely managing your own servers or your own infrastructure. You’ll be installing your web servers, your Docker engines, your MySQL servers, etc. from an Ansible playbook, a Chef cookbook, a Puppet module, whatever. But more and more, our businesses rely on external systems. We’re using external APIs that are crucial to the business we want to support, whether that’s your email in Google’s cloud or third-party APIs you use for push notifications or for sending emails.

All of those systems add complexity, but they also add reliability issues. You’re relying on external third-party resources that, if they don’t work, can severely impact your business. If you’re a sysadmin, your task is probably to support the business, which you usually do by writing config management to keep your servers in check. But you can also use config management to keep external resources in check, to prevent configuration drift and to automate new setups. A very interesting tool being used here is Terraform. Terraform is traditionally used to spin up AWS or DigitalOcean instances based on a declarative configuration (written in HCL or JSON) that determines how those servers should look.

But you can take that a few steps further and also manage third-party API configurations. And if you really want to go all out, you can use this for redundancy. Say you’re using an API from company X, but you don’t want to rely only on company X and also want a backup from company Y. Terraform can be the system in which you define how those APIs should look, and you just plug in multiple providers. Just like you can say your VM needs to run on AWS and also on DigitalOcean, you could do the same for APIs.

If you have an API for push notifications, you can define the state for that configuration, your credentials, your configs, perhaps your rate limiting, all of those things. And instead of using provider X, you can use provider Y and it’s just a different provider within Terraform. So it’s interesting to see that config management is extending beyond just managing servers and services.
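Terraform is configured in its own language (HCL), so this doesn’t guess at any real provider’s resource names. Instead, here is a minimal Python sketch of the underlying idea. You declare the desired state of an external API once and compute a plan, which is the drift between the desired and live settings. You then apply that plan through interchangeable providers. `InMemoryProvider`, the vendor names and the config keys are all hypothetical.

```python
from typing import Dict, Protocol

Config = Dict[str, object]


class Provider(Protocol):
    """Anything that can read and update the live settings of an external API."""
    name: str

    def read(self) -> Config: ...

    def update(self, changes: Config) -> None: ...


class InMemoryProvider:
    """Hypothetical stand-in for one vendor's settings API."""

    def __init__(self, name: str, live: Config) -> None:
        self.name = name
        self._live = dict(live)

    def read(self) -> Config:
        return dict(self._live)

    def update(self, changes: Config) -> None:
        self._live.update(changes)


def plan(desired: Config, live: Config) -> Config:
    """Desired keys whose live value differs (the drift to fix)."""
    return {k: v for k, v in desired.items() if live.get(k) != v}


def apply(desired: Config, provider: Provider) -> Config:
    """Push only the drifted settings; returns what was changed."""
    changes = plan(desired, provider.read())
    if changes:
        provider.update(changes)
    return changes


# Declared once, independent of which vendor ends up serving it (illustrative values).
push_config: Config = {
    "rate_limit_per_minute": 600,
    "webhook_url": "https://example.com/hooks/push",
    "enabled": True,
}

vendor_x = InMemoryProvider("vendor-x", {"rate_limit_per_minute": 100, "enabled": True})
vendor_y = InMemoryProvider("vendor-y", {})  # the backup provider, not configured yet

for provider in (vendor_x, vendor_y):
    print(provider.name, apply(push_config, provider))
```

Running `apply` a second time against the same provider returns an empty dict. That’s the config-management property you want: once converged, nothing changes until something drifts.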

It’s going to be used more and more to do the entire stack. By the entire stack I mean the business, everything that goes around it, your client support to your third-party APIs to your servers being managed. Config management is going to be more than just DevOps where you’re managing just your own stack.

You’re going to be managing other people’s stack as well and providing failover in that sense. Those three topics are what really stuck out to me. So that was the Sysdig example to debug Kubernetes clusters.

Then there’s MGMT, a really interesting tool for parallelizing your config management and for running as an eternal daemon that listens to events and gets the state back in line. And finally, config management no longer covering just your own stack, but external stacks as well. So those three topics stood out.

I saw a lot more presentations. Some were good, some were really good, and some perhaps didn’t quite match my expectations. But I’m really happy I went.

There were a lot of different discussions going on. I hope Config Management Camp can happen again next year, and if so, I’ll make sure to spread the word and get as many people as possible there.

So that was it for this rather short summary of Config Management Camp. I mentioned it at the beginning. I’m recording this inside of a car.

That means background noise, etc. And I’m very curious to hear your thoughts on whether this format is worth it. So are you listening to this podcast?

If so, send me a tweet, send me an email. And if you’re listening to it, can you bear with it for a few more episodes? If the answer is no, then there’s really no point in me recording anything in the car. I’d be better off listening to podcasts and just chilling. But if this is a format you can appreciate, and you can hear past the noise, I’d love to hear it, and I plan to keep this up. So I’m looking for feedback on whether this works or not. If I get absolutely zero feedback, I’ll probably just keep recording, assuming that no news is good news. Or there are just no listeners, which is also a possibility. Either way, in that case it doesn’t really matter what kind of quality this is, does it? But I’d love to hear your feedback.

Should I be doing this? Yes or no? Drop me an email, drop me a tweet.

I’ll add my contact details again in the show notes. I’ll also add links to all of the different tools and, if possible, the presentations and videos I saw. Hopefully they can blow your mind as well. So if you haven’t unsubscribed from this podcast just yet, I’ll talk to you next time.

Take care. Bye bye.

Shownotes for episode 7, published Wednesday, 9 Feb 2017

Shownotes

Transcript