3 Years of Puppet Config Management: lessons learned

Mattias Geniar, Monday, November 17, 2014 - last modified: Tuesday, January 20, 2015

A little over 3 years ago, I started using Puppet as a config management system on my own servers. I should've started much sooner, but back then I didn't see the "value" in it. How foolish ...

Around 2011, when this all started, Puppet was at version 2.7. This means I've skipped the "hard years" of Puppet that I hear a lot of pre-2.7 complaints about. So what did I take away from Puppet after 3 years of heavy usage?

Be careful with Puppetlabs Repositories

The first lesson I learned fairly quickly was that the Puppetlabs Yum/Apt repositories don't pin their versions. That means if you install the puppet package, in 2011 you got version 2.7 -- right now you get version 3.7.3. So imagine my surprise when working on Puppet code compatible with the 2.7 release, only to find a "yum update" upgrading my Puppet client package to a new major version.

Once in the 3.x series, this wasn't much of a problem anymore. There were new features and bugfixes, but nothing that broke backwards compatibility. I'm curious to see how Puppet Labs will handle the release of Puppet 4.x, and whether a yum update will blindly upgrade a Puppet 3.x package to the new Puppet 4.x version.

In the PHP community this has been "solved" (yes, quotes, as it's not a real fix) by appending the version number to the package name. The PHP 5.2 to PHP 5.3 upgrade was pretty major and broke a lot of code, even though the version number suggests a minor upgrade. Since then, most PHP packages are called php52, php53, php54, ... This way, the package you install will always stay on the same major version and only receive bugfixes or security updates. Perhaps Puppet Labs can start a puppet3 package (for the current release) and a new puppet4 package for the new release?

Or use a different yum or apt repository location for the Puppet 4.x release, so that current users don't accidentally upgrade their Puppet 3.x client to an incompatible Puppet 4.x release.
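
In the meantime, a defensive workaround is to pin the agent package yourself. Here's a minimal sketch, assuming you manage the agent package from Puppet itself (the version string is purely illustrative and has to match a version that's actually available in your repository):

# Pin the Puppet agent to a known-good version so a repository bump
# doesn't silently pull in a new major release.
package { 'puppet':
  ensure => '3.7.3',
}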

Performance matters

When you just start out using Puppet, the typical Puppet run will take between 5 and 15 seconds. You're only managing a few resources, after all. But as your configs grow, and you add more and more resources to your config management, things can start to slow down. You can use the Puppet profiler (the --profile flag, available in puppet apply) to find the bottlenecks.

Once a run starts to take more than 3 minutes, you should start looking into optimising your code. Keep in mind that Puppet is single-threaded and usually runs at a full 100% of a CPU core while checking/verifying/managing its resources: files, packages, services, ... it all takes some effort.

The use of concat only made matters worse for my code: instead of managing a single File resource to handle the configuration of a service, you split it up into multiple fragments. This makes sense from a code-reuse perspective, to keep things organised, but it also means that a single configuration file that previously consisted of 1 File resource can now consist of 10 or more resources used to build the final configuration. If you're using this for webserver vhosts, where you can easily have a few hundred of them, this starts to add up.
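
As a rough illustration, here's what a fragment-based vhost could look like, assuming the puppetlabs/concat module (the vhost name and template paths are made up for the example):

# One logical config file, but three resources for Puppet to evaluate
# on every run.
concat { '/etc/nginx/conf.d/example.com.conf':
  ensure => present,
}

concat::fragment { 'example.com-listen':
  target  => '/etc/nginx/conf.d/example.com.conf',
  content => template('nginx/vhost/listen.erb'),
  order   => '01',
}

concat::fragment { 'example.com-locations':
  target  => '/etc/nginx/conf.d/example.com.conf',
  content => template('nginx/vhost/locations.erb'),
  order   => '50',
}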

So be careful where you add resources in Puppet. Wherever possible, fall back to the OS's built-in tools for managing resources (e.g. logrotate, cron jobs, ...). While Puppet is an amazingly powerful tool, too much simply is too much.

Always purge conf.d directories

In hindsight, this sounds obvious. If you're creating a module for managing a service that uses a typical conf.d directory where configurations are dropped, make your Puppet definition of the directory purge unwanted configs. Here's an example of what happened to my Nginx module.

file { '/etc/nginx/conf.d':
  ensure => directory,
}

The above was the simplest possible definition of a directory. I dropped my Nginx configurations into the conf.d folder, reloaded Nginx, and done. But whenever you perform a yum update of nginx, the RPM package can decide to drop additional config files into its conf.d directory (like a default.conf file), config files you may not have anticipated (like default vhost configs that bind to port :80). So always purge your conf.d folders.

file { '/etc/nginx/conf.d':
  ensure  => directory,
  recurse => true,
  purge   => true,
}

This way, any file in that directory that isn't managed by Puppet will be automatically removed (note that purge only takes effect when recurse is enabled as well).

Reload services on conf.d changes

This is a follow-up to the previous remark: if Puppet automatically removes unwanted files from the conf.d folder, the service should be reloaded as well. Otherwise, configs that Puppet just removed may still be in active use by the running service. So, to expand the example above, add a notify to the File resource that manages the directory. This way, the service does a clean reload whenever Puppet removes files.

file { '/etc/nginx/conf.d':
  ensure  => directory,
  recurse => true,
  purge   => true,
  notify  => Service['something'],
}

Renaming conf.d to puppet.d

When you're just starting out with Puppet, you probably don't have your server(s) 100% managed by Puppet. So you'll end up with some services that are "puppet managed" and others that are not. How do you make this clear?

One way would be to use the naming scheme of your files and directories to also include the Puppet name. For instance, if you have a Puppet-generated list of configs that would normally live in a conf.d folder, make a new one with the name puppet.d and include that in your Service configs.

It's obvious that if a directory is named puppet.d, its contents are managed by Puppet. This also gives you the ability to "partly manage" a service: you can have a conf.d directory as well as a puppet.d directory, where both directories have their configs included. You just have to settle on a sane order in which they're loaded.
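
Here's a minimal sketch of the Puppet side of this, assuming an Nginx setup (the paths and service name are illustrative). Only the Puppet-managed directory gets purged; the hand-managed conf.d directory is left alone:

# The service config then includes both directories, e.g. in nginx.conf:
#   include /etc/nginx/conf.d/*.conf;
#   include /etc/nginx/puppet.d/*.conf;
file { '/etc/nginx/puppet.d':
  ensure  => directory,
  recurse => true,
  purge   => true,
  notify  => Service['nginx'],
}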

Input validation with stdlib

My first modules were ridiculously naive: they accepted the user input without looking at it. And in the early days, there wasn't much you could do for input validation either. But now there's Puppet Labs' stdlib module, which offers a great set of functions you can use in your Puppet modules and manifests to validate user input.

You can use regexes, validate file paths, ... While it isn't perfect, and you sometimes have to get creative with how you validate things, it's a must-have in your modules. If the input doesn't match the expected values, call fail() so the Puppet catalog compilation halts (before the Puppet run has even started). In the error message, try to show the user input: this way, the user can clearly see what they entered and why it didn't match the module's expectations.

if ! ($phpversion in ['5.3', '5.4', '5.5', '5.6']) {
  fail("php_fpm module: only PHP 5.3 - 5.6 is supported. Your version ${phpversion} is not.")
}

The validate_* functions that stdlib offers fail automatically if the input doesn't match. So there's no need for an if statement; you can just use the functions as-is.

validate_re($ensure, '^present$|^absent$')

The above validates that the $ensure parameter is either "present" or "absent", and nothing else. If it's anything else, the catalog compilation will halt.
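
If you want the error message to show the user's input, as suggested above, validate_re also accepts an optional message as a third argument (the wording here is just an example):

validate_re($ensure, '^present$|^absent$',
  "ensure must be 'present' or 'absent', got '${ensure}'")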

Remember that the input validation happens server-side (the Puppet server).

When you write a module, you write an API interface

I already mentioned my first modules weren't much to look at. They worked, but they accepted so many parameters that only I understood them -- no one else. Since then, I've learned that when you create a Puppet Module, you're doing more than creating a Puppet Module.

You're trying to make the management of a particular service easier. That means you, the Puppet Module Writer, have to fully understand the service you're trying to automate. You have to know which direction you want to push the user in. Your Puppet Module will have fewer parameters than all the options the service has to offer, as you're using sane defaults and want to present the User of your module with an easy-to-use interface to it.

You can accept a few parameters that each trigger other logic. For instance, if you have an Apache module and you want to allow the user to change the Processing Model (prefork, itk, worker), you can have a simple parameter in your module.

class httpd (
  $model = $::httpd::params::model,
) inherits httpd::params {
...
}

class httpd::params {
  $model = 'prefork'
}

The Apache configs look different for the prefork model than for the itk model, but the user of your module shouldn't have to know this. You've abstracted away the complexities of running a prefork or an itk version of Apache. All your user needs is a simple parameter, without inside knowledge of how Apache works.

The parameter will install a different Apache package and remove the other ones; it will generate different Apache configs with additional code in them to assign users to the itk model; ... All these things happen without the user noticing.
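
A rough sketch of how the module could act on that parameter (the package names are illustrative and differ per distribution):

# Map the requested processing model to a package name. A complete module
# would also ensure the other package variants are absent.
$httpd_package = $model ? {
  'itk'    => 'httpd-itk',
  'worker' => 'httpd-worker',
  default  => 'httpd',
}

package { $httpd_package:
  ensure => installed,
}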

Puppet modules will over-simplify things

With the example above, it's very easy to change a fundamental part of an Apache config. The person who wrote the module knows exactly what happens when you modify that parameter. The old httpd package will be removed and the httpd-itk package will be installed. Each request in Apache will be assigned its own user/group for execution.

There are a lot of fine details that go into making such a config and making sure it works 100%. The person who wrote the module knows those details: they spent time debugging them and making sure the same problems don't happen again.

But the person who uses the module has no idea what goes on behind the scenes (assuming they don't, you know, read the module code). This means the user of your module doesn't have the same level of knowledge of Apache as you do. It also means that they don't get to experience the troubleshooting marathon you've had to go through to get to the knowledge level you're at now.

Puppet Modules are designed to simplify things, that's their ultimate purpose. But keep in mind that you need internal training to keep everyone on the same level, so that whenever the need to debug the application or service arises, your team knows what to do and doesn't depend 100% on the person who wrote the module.

Everything is a Puppet module

Even your manifests.

For reusability, your manifests should be as small as possible. The concept of Puppet Roles and Profiles comes from this simple methodology as well. As soon as you realise that everything you write in Puppet code should be a module, you'll achieve much greater reusability. You won't have much code duplication in your manifests anymore; everything will live in clean modules.
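
A bare-bones sketch of that idea, with made-up class and node names (and assuming an nginx module exists): node definitions only include a role, roles only compose profiles, and profiles wrap the actual modules.

# The node only knows its role; the role composes profiles; the profile
# configures the underlying module.
node 'web01.example.com' {
  include role::webserver
}

class role::webserver {
  include profile::nginx
}

class profile::nginx {
  include nginx
}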

Code style matters

Puppet Labs has a valuable Style Guide for your code. It becomes increasingly valuable as your team grows and more people contribute code. It keeps things organised and styled properly.

There's a Ruby gem called puppet-lint that can validate your code to see if it adheres to the style guide. Since the latest release, there's also an experimental "auto-fix" feature that corrects the most common style guide violations. For that, you can use the puppet-lint --fix modules/nginx/manifests/init.pp command.

Pre-commit hooks save the day

Since your code lives in a repository of some sort (whether that's git, svn or mercurial -- it doesn't matter), any change goes through version control. To help you with that, you can add pre-commit hooks to your version control system that first validate your code and, if it's invalid, refuse the commit. This way, you're sure to be committing code that at the very least compiles and follows the style guide.

I've got a pre-commit hook for your Puppet code (git) on my Github page.

Rspec testing

Another Ruby gem that helps save the day is rspec-puppet. It allows you to write rspec tests for your Puppet code. This is again something that shows its value when working with multiple people on the same codebase, to keep things backwards compatible. Your rspec tests make sure that the code you write isn't just valid Puppet code, but that the result of your code is what you expect it to be.

Sublime Text vs. Geppetto

Puppet Labs has taken over the Geppetto project, an IDE based on Eclipse for writing Puppet code. I've given it numerous tries, but for me -- it just didn't work. It was either too slow (and that's just Eclipse) or crashed too often.

I still stick to using Sublime Text 2 with the additional Puppet Syntax Highlighting. I don't have code auto-completion or other fancy features, but the speed of Sublime Text and the simple syntax highlighting are all I need.

Conclusion

I've only been using Puppet for 3 years (give or take), and I've seen a lot of tools released that help you manage, organise and write Puppet code. I can only imagine what the next 3 years will bring!

If you have any other valuable tips, please feel free to post them in the comments.

And if you liked this article, maybe you'll like "Automating the Unknown": a config management story on using a CMS to automate an unknown technology.



Hi! My name is Mattias Geniar. I'm a Support Manager at Nucleus Hosting in Belgium, a general web geek & public speaker. Currently working on DNS Spy & Oh Dear!. Follow me on Twitter as @mattiasgeniar.


Comments

Lennie Monday, November 17, 2014 at 15:34 - Reply

Have you tried Ansible or SaltStack yet?


    Mattias Geniar Monday, November 17, 2014 at 15:41 - Reply

    No, I’ve not given them a try. Once you’ve made a (rather large) investment in one config management system, it’s hard to switch. The modules and logic you’ve built into Puppet are not always portable to other config management solutions.

    It’s on the todo-list though, as I’m sure that using other Config Management tools can teach us all new tricks on getting things done.


Lennie Monday, November 17, 2014 at 16:04 - Reply

Maybe I should explain:

Most of the time what I actually need is a provisioning system.

So what I end up creating is a provisioning system which can handle some of the configuration management as a side job.


ben Tuesday, November 18, 2014 at 05:52 - Reply

Even better, use a community supported module so you don’t have to reinvent the wheel. https://github.com/jfryman/puppet-nginx/blob/master/manifests/config.pp#L161

With regards to “I think Puppet always runs all the modules,” it always enforces all the configuration you’ve defined for that node. How else could you be confident that everything was configured the way you expect it to be?


    Mattias Geniar Tuesday, November 18, 2014 at 09:47 - Reply

    Very true indeed, you can always check the already-developed modules. But how are you supposed to learn Puppet module development if you never write a module on your own? I think in some cases it’s OK to reinvent the wheel – but you have to admit defeat if your own module is lacking in functionality and security features compared to other community-supported modules.


Sergio Tuesday, November 18, 2014 at 07:27 - Reply

Neat write-up !!! I really like that you mentioned you started with your own servers. I started with Puppet the same way, but it turned out that if I wasn’t working on my servers for a long while (let’s say months) I’d forget the Puppet syntax.

Then I started using Ansible for my personal servers and tinkering with Vagrant, this worked for me. I mean, even after long periods without doing anything, I can read YAML, and it doesn’t take that long to understand what I was doing. Anyway, I have a question for you:

How do you manage system tools or packages that are not services?

Let’s say you ssh into a server, you realize that tcpdump is not installed. Should you always use Puppet? Do you have a puppet module like essential tools ?

I’ve tried to force myself to do this, but I always end up installing by hand. Any recommendation on this?

In any case, I guess the important thing is to use any configuration management tool that fits your needs. Once again, nice review; if I ever go back to Puppet, I will get back to this article.


    Mattias Geniar Tuesday, November 18, 2014 at 09:50 - Reply

    How do you manage system tools or packages that are not services?

    In my configs, we use a “default” module that we include on all systems, which contains the base set of packages we need: tcpdump, strace, lsof, dig, … Consider it a “base module” that gets loaded every time. If the package doesn’t need a config, I find it overkill to make a separate module out of it.

    Let’s say you ssh into a server, you realize that tcpdump is not installed. Should you always use Puppet? Do you have a puppet module like essential tools ?

    For me, this is a grey area: should Puppet check on every run if tcpdump is indeed installed? Or should it just be installed once, upon provisioning, and then be forgotten? Because every resource Puppet has to check takes time and server resources. And chances are, nobody is going to delete the tcpdump package.

    In our case, we let Puppet handle the packages, so it gets checked on every run – but it’s one of the first things we would reconsider if performance became too big of an issue.


Joaquin Menchaca Tuesday, May 15, 2018 at 17:37 - Reply

Just a clarification, rspec doesn’t validate puppet code, but rather the behavior (using ruby).


Jeferson Lemos Wednesday, January 30, 2019 at 13:19 - Reply

In my mind, many times we compare Ansible vs Puppet; however, for me they have different purposes and one doesn’t exclude the other. Maybe you can use both in your environment.


brian Friday, April 12, 2019 at 21:21 - Reply

I feel like there is a lot of wisdom in the knowledge, and the way you shared it. Thanks a lot for this, it’s very helpful, as a new puppet user.

