3 Years of Puppet Config Management: lessons learned

Mattias Geniar, Monday, November 17, 2014 - last modified: Tuesday, January 20, 2015

A little over 3 years ago, I started using Puppet as a config management system on my own servers. I should've started much sooner, but back then I didn't see the "value" in it. How foolish ...

Around 2011, when this all started, Puppet was at version 2.7. This means I've skipped the "hard years" of Puppet that I hear a lot of pre-2.7 complaints about. So what did I take away from Puppet after 3 years of heavy usage?

Be careful with Puppetlabs Repositories

The first lesson I learned fairly quickly was that the Puppetlabs Yum/Apt repositories don't pin their versions. That means if you install the puppet package, in 2011 you got version 2.7 -- right now you get version 3.7.3. So imagine my surprise when working on Puppet code compatible with the 2.7 release, only to find a "yum update" upgrading my Puppet client package to a new major version.

Once in the 3.x series, this wasn't much of a problem anymore. There were new features and bugfixes, but nothing that broke backwards compatibility. I'm curious to see how Puppet Labs will handle the release of Puppet 4.x, and whether a yum update will blindly upgrade a Puppet 3.x package to the new Puppet 4.x version.

In the PHP community this has been "solved" (yes, quotes, as it's not a real fix) by appending the version number to the package name. The PHP 5.2 to PHP 5.3 upgrade was pretty major and broke a lot of code, even though the version number suggests a minor upgrade. Since then, most PHP packages are called php52, php53, php54, ... This way, the package you install will always stay on the same major version and only receive bugfixes or security updates. Perhaps Puppet Labs can start a puppet3 package (for the current release) and a new puppet4 package for the new release?

Or use a different yum or apt repository location for the Puppet 4.x release, so that current users don't accidentally upgrade their Puppet 3.x client to an incompatible Puppet 4.x release.
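
In the meantime, a defensive workaround is to pin the agent package yourself. Here's a minimal sketch, assuming you manage the agent package from Puppet itself (the version string is purely illustrative and has to match a version that's actually available in your repository):

# Pin the Puppet agent to a known-good version so a repository bump
# doesn't silently pull in a new major release.
package { 'puppet':
  ensure => '3.7.3',
}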

Performance matters

When you just start out using Puppet, the typical Puppet run will take between 5 and 15 seconds. You're only managing a few resources, after all. But as your configs grow, and you add more and more resources to your config management, things can start to slow down. You can use the Puppet profiler (the --profile flag, available in puppet apply) to find the bottlenecks.

Once a run starts to take more than 3 minutes, you should start looking into optimising your code. Keep in mind that Puppet is single-threaded and usually runs at a full 100% of a CPU core while checking/verifying/managing its resources: files, packages, services, ... it all takes some effort.

The use of concat only made matters worse for my code: instead of managing a single File resource to handle the configuration of a service, you split it up into multiple fragments. This makes sense from a code-reuse perspective, to keep things organised, but it also means that a single configuration file that previously consisted of 1 File resource can now consist of 10 or more resources used to build the final configuration. If you're using this for webserver vhosts, where you can easily have a few hundred of them, this starts to add up.
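
As a rough illustration, here's what a fragment-based vhost could look like, assuming the puppetlabs/concat module (the vhost name and template paths are made up for the example):

# One logical config file, but three resources for Puppet to evaluate
# on every run.
concat { '/etc/nginx/conf.d/example.com.conf':
  ensure => present,
}

concat::fragment { 'example.com-listen':
  target  => '/etc/nginx/conf.d/example.com.conf',
  content => template('nginx/vhost/listen.erb'),
  order   => '01',
}

concat::fragment { 'example.com-locations':
  target  => '/etc/nginx/conf.d/example.com.conf',
  content => template('nginx/vhost/locations.erb'),
  order   => '50',
}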

So be careful where you add resources in Puppet. Wherever possible, fall back to the OS's built-in tools for managing resources (e.g. logrotate, cron jobs, ...). While Puppet is an amazingly powerful tool, too much simply is too much.

Always purge conf.d directories

In hindsight, this sounds obvious. If you're creating a module for managing a service that uses a typical conf.d directory where configurations are dropped, make your Puppet definition of the directory purge unwanted configs. Here's an example of what happened to my Nginx module.

file { '/etc/nginx/conf.d':
  ensure => directory,
}

The above was the simplest possible definition of a directory. I dropped my Nginx configurations into the conf.d folder, reloaded Nginx, and done. But whenever you perform a yum update of nginx, the RPM package can decide to drop additional config files into its conf.d directory (like a default.conf file), config files you may not have anticipated (like default vhost configs that bind to port :80). So always purge your conf.d folders.

file { '/etc/nginx/conf.d':
  ensure  => directory,
  recurse => true,
  purge   => true,
}

This way, any file in that directory that isn't managed by Puppet will be automatically removed (note that purge only takes effect when recurse is enabled as well).

Reload services on conf.d changes

This is a follow-up to the previous remark: if Puppet automatically removes unwanted files from the conf.d folder, the service should be reloaded as well. Otherwise, configs that Puppet just removed may still be in active use by the running service. So, to expand the example above, add a notify to the File resource that manages the directory. This way, the service does a clean reload whenever Puppet removes files.

file { '/etc/nginx/conf.d':
  ensure  => directory,
  recurse => true,
  purge   => true,
  notify  => Service['something'],
}

Renaming conf.d to puppet.d

When you're just starting out with Puppet, you probably don't have your server(s) 100% managed by Puppet. So you'll end up with some services that are "puppet managed" and others that are not. How do you make this clear?

One way would be to use the naming scheme of your files and directories to also include the Puppet name. For instance, if you have a Puppet-generated list of configs that would normally live in a conf.d folder, make a new one with the name puppet.d and include that in your Service configs.

It's obvious that if a directory is named puppet.d, its contents are managed by Puppet. This also gives you the ability to "partly manage" a service: you can have a conf.d directory as well as a puppet.d directory, where both directories have their configs included. You just have to settle on a sane order in which they're loaded.
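
Here's a minimal sketch of the Puppet side of this, assuming an Nginx setup (the paths and service name are illustrative). Only the Puppet-managed directory gets purged; the hand-managed conf.d directory is left alone:

# The service config then includes both directories, e.g. in nginx.conf:
#   include /etc/nginx/conf.d/*.conf;
#   include /etc/nginx/puppet.d/*.conf;
file { '/etc/nginx/puppet.d':
  ensure  => directory,
  recurse => true,
  purge   => true,
  notify  => Service['nginx'],
}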

Input validation with stdlib

My first modules were ridiculously naive: they accepted the user input without looking at it. And in the early days, there wasn't much you could do for input validation either. But now there's Puppet Labs' stdlib module, which offers a great set of functions you can use in your Puppet modules and manifests to validate user input.

You can use regexes, validate file paths, ... While it isn't perfect, and you sometimes have to get creative with how you validate things, it's a must-have in your modules. If the input doesn't match the expected values, call fail() so the Puppet catalog compilation halts (before the Puppet run has even started). In the error message, try to show the user input: this way, the user can clearly see what they entered and why it didn't match the module's expectations.

if ! ($phpversion in ['5.3', '5.4', '5.5', '5.6']) {
  fail("php_fpm module: only PHP 5.3 - 5.6 is supported. Your version ${phpversion} is not.")
}

The validate_* functions that stdlib offers fail automatically if the input doesn't match. So there's no need for an if statement; you can just use the functions as-is.

validate_re($ensure, '^present$|^absent$')

The above validates that the $ensure parameter is either "present" or "absent", and nothing else. If it's anything else, the catalog compilation will halt.
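
If you want the error message to show the user's input, as suggested above, validate_re also accepts an optional message as a third argument (the wording here is just an example):

validate_re($ensure, '^present$|^absent$',
  "ensure must be 'present' or 'absent', got '${ensure}'")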

Remember that the input validation happens server-side (the Puppet server).

When you write a module, you write an API interface

I already mentioned my first modules weren't much to look at. They worked, but they accepted so many parameters that only I understood them -- no one else. Since then, I've learned that when you create a Puppet Module, you're doing more than creating a Puppet Module.

You're trying to make the management of a particular service easier. That means you, the Puppet Module Writer, have to fully understand the service you're trying to automate. You have to know which direction you want to push the user in. Your Puppet Module will have fewer parameters than all the options the service has to offer, as you're using sane defaults and want to present the User of your module with an easy-to-use interface to it.

You can accept a few parameters that each trigger other logic. For instance, if you have an Apache module and you want to allow the user to change the Processing Model (prefork, itk, worker), you can have a simple parameter in your module.

class httpd (
  $model = $::httpd::params::model,
) inherits httpd::params {
...
}

class httpd::params {
  $model = 'prefork'
}

The Apache configs look different for the prefork model than for the itk model, but the user of your module shouldn't have to know this. You've abstracted away the complexities of running a prefork or an itk version of Apache. All your user needs is a simple parameter, without inside knowledge of how Apache works.

The parameter will install a different Apache package and remove the other ones; it will generate different Apache configs with additional code in them to assign users to the itk model; ... All these things happen without the user noticing.
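
A rough sketch of how the module could act on that parameter (the package names are illustrative and differ per distribution):

# Map the requested processing model to a package name. A complete module
# would also ensure the other package variants are absent.
$httpd_package = $model ? {
  'itk'    => 'httpd-itk',
  'worker' => 'httpd-worker',
  default  => 'httpd',
}

package { $httpd_package:
  ensure => installed,
}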

Puppet modules will over-simplify things

With the example above, it's very easy to change a fundamental part of an Apache config. The person who wrote the module knows exactly what happens when you modify that parameter. The old httpd package will be removed and the httpd-itk package will be installed. Each request in Apache will be assigned its own user/group for execution.

There are a lot of fine details that go into making such a config and making sure it works 100%. The person who wrote the module knows those details: they spent time debugging them and making sure the same problems don't happen again.

But the person who uses the module has no idea what goes on behind the scenes (assuming they don't, you know, read the module code). This means the user of your module doesn't have the same level of knowledge of Apache as you do. It also means that they don't get to experience the troubleshooting marathon you've had to go through to get to the knowledge level you're at now.

Puppet Modules are designed to simplify things, that's their ultimate purpose. But keep in mind that you need internal training to keep everyone on the same level, so that whenever the need to debug the application or service arises, your team knows what to do and doesn't depend 100% on the person who wrote the module.

Everything is a Puppet module

Even your manifests.

For reusability, your manifests should be as small as possible. The concept of Puppet Roles and Profiles comes from this simple methodology as well. As soon as you realise that everything you write in Puppet code should be a module, you'll achieve much greater reusability. You won't have much code duplication in your manifests anymore; everything will live in clean modules.
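
A bare-bones sketch of that idea, with made-up class and node names (and assuming an nginx module exists): node definitions only include a role, roles only compose profiles, and profiles wrap the actual modules.

# The node only knows its role; the role composes profiles; the profile
# configures the underlying module.
node 'web01.example.com' {
  include role::webserver
}

class role::webserver {
  include profile::nginx
}

class profile::nginx {
  include nginx
}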

Code style matters

Puppet Labs has a valuable Style Guide for your code. It becomes increasingly valuable as your team grows and more people contribute code. It keeps things organised and styled properly.

There's a Ruby gem called puppet-lint that can validate your code to see if it adheres to the style guide. Since the latest release, there's also an experimental "auto-fix" feature that corrects the most common style guide violations. For that, you can use the puppet-lint --fix modules/nginx/manifests/init.pp command.

Pre-commit hooks save the day

Since your code lives in a repository of some sort (whether that's git, svn or mercurial -- it doesn't matter), any change goes through version control. To help you with that, you can add pre-commit hooks to your version control system that first validate your code and, if it's invalid, refuse the commit. This way, you're sure to be committing code that at the very least compiles and follows the style guide.

I've got a pre-commit hook for your Puppet code (git) on my Github page.

Rspec testing

Another Ruby gem that helps save the day is rspec-puppet. It allows you to write rspec tests for your Puppet code. This is again something that shows its value when working with multiple people on the same codebase, to keep things backwards compatible. Your rspec tests make sure that the code you write isn't just valid Puppet code, but that the result of your code is what you expect it to be.

Sublime Text vs. Geppetto

Puppet Labs has taken over the Geppetto project, an IDE based on Eclipse for writing Puppet code. I've given it numerous tries, but for me -- it just didn't work. It was either too slow (and that's just Eclipse) or crashed too often.

I still stick to using Sublime Text 2 with the additional Puppet Syntax Highlighting. I don't have code auto-completion or other fancy features, but the speed of Sublime Text and the simple syntax highlighting are all I need.

Conclusion

I've only been using Puppet for 3 years (give or take), and I've seen a lot of tools released that help you manage, organise and write Puppet code. I can only imagine what the next 3 years will bring!

If you have any other valuable tips, please feel free to post them in the comments.

And if you liked this article, maybe you'll like "Automating the Unknown": a config management story on using a CMS to automate an unknown technology.



Hi! My name is Mattias Geniar. I'm a Support Manager at Nucleus Hosting in Belgium, a general web geek & public speaker. Currently working on DNS Spy & Oh Dear!. Follow me on Twitter as @mattiasgeniar.


Comments

Lennie Monday, November 17, 2014 at 15:34 - Reply

Have you tried Ansible or SaltStack yet?


    Mattias Geniar Monday, November 17, 2014 at 15:41 - Reply

    No, I’ve not given them a try. Once you’ve made a (rather large) investment in one config management system, it’s hard to switch. The modules and logic you’ve built into Puppet are not always portable to other config management solutions.

    It’s on the todo-list though, as I’m sure that using other Config Management tools can teach us all new tricks on getting things done.


Lennie Monday, November 17, 2014 at 16:04 - Reply

Maybe I should explain:

Most of the time what I actually need is a provisioning system.

So what I end up creating is a provisioning system which can handle some of the configuration management as a side job.


ben Tuesday, November 18, 2014 at 05:52 - Reply

Even better, use a community supported module so you don’t have to reinvent the wheel. https://github.com/jfryman/puppet-nginx/blob/master/manifests/config.pp#L161

With regards to “I think Puppet always runs all the modules,” it always enforces all the configuration you’ve defined for that node. How else could you be confident that everything was configured the way you expect it to be?


    Mattias Geniar Tuesday, November 18, 2014 at 09:47 - Reply

    Very true indeed, you can always check the already-developed modules. But how are you supposed to learn Puppet module development if you never write a module on your own? I think in some cases it’s OK to reinvent the wheel – but you have to admit defeat if your own module is lacking in functionality and security features compared to other community-supported modules.


Sergio Tuesday, November 18, 2014 at 07:27 - Reply

Neat write-up !!! I really like that you mentioned you started with your own servers. I started with Puppet the same way, but it turned out that if I wasn’t working on my servers for a long while (let’s say months) I’d forget the Puppet syntax.

Then I started using Ansible for my personal servers and tinkering with Vagrant, this worked for me. I mean, even after long periods without doing anything, I can read YAML, and it doesn’t take that long to understand what I was doing. Anyway, I have a question for you:

How do you manage system tools or packages that are not services?

Let’s say you ssh into a server, you realize that tcpdump is not installed. Should you always use Puppet? Do you have a puppet module like essential tools ?

I’ve tried to force myself to do this, but I always end up installing by hand. Any recommendation on this?

In any case, I guess the important thing is to use any configuration management tool that fits your needs. Once again, nice review; if I ever go back to Puppet, I will get back to this article.


    Mattias Geniar Tuesday, November 18, 2014 at 09:50 - Reply

    How do you manage system tools or packages that are not services?

    In my configs, we use a “default” module that we include on all systems, which contains the base set of packages we need: tcpdump, strace, lsof, dig, … Consider it a “base module” that gets loaded every time. If the package doesn’t need a config, I find it overkill to make a separate module out of it.

    Let’s say you ssh into a server, you realize that tcpdump is not installed. Should you always use Puppet? Do you have a puppet module like essential tools ?

    For me, this is a grey area: should Puppet check on every run if tcpdump is indeed installed? Or should it just be installed once, upon provisioning, and then be forgotten? Because every resource Puppet has to check takes time and server resources. And chances are, nobody is going to delete the tcpdump package.

    In our case, we let Puppet handle the packages, so it gets checked on every run – but it’s one of the first things we would reconsider if performance became too big of an issue.


Joaquin Menchaca Tuesday, May 15, 2018 at 17:37 - Reply

Just a clarification, rspec doesn’t validate puppet code, but rather the behavior (using ruby).


Jeferson Lemos Wednesday, January 30, 2019 at 13:19 - Reply

In my mind, many times we compare Ansible vs Puppet; however, for me they have different purposes and one doesn’t exclude the other. Maybe you can use both in your environment.


brian Friday, April 12, 2019 at 21:21 - Reply

I feel like there is a lot of wisdom in the knowledge, and the way you shared it. Thanks a lot for this, it’s very helpful, as a new puppet user.

