The async Puppet pattern

Mattias Geniar, Tuesday, May 17, 2016

I'm pretty sure this isn't tied to Puppet and is probably widely used by everyone else, but it only occurred to me recently what the structural benefits of this pattern are.

Async Puppet: stop fixing things in one Puppet run

This has always been a bit of a debated topic, both for me internally as well as in the Puppet community at large: should a Puppet run be 100% complete after the first run?

I'm starting to back away from that idea, having spent countless hours optimising my Puppet code to achieve the "one-puppet-run-to-rule-them-all" scenario. It's much easier to build your Puppet logic gradually in steps, with each step activating once the previous one has reached its final state.

Where I mostly see this approach shine is in automatically adding monitoring from within your Puppet code. There's support for Nagios out of the box, and I contributed to the zabbixapi Ruby gem to facilitate managing Zabbix hosts and templates from within Puppet.

Monitoring should only be added to a server when there's something to monitor. And there's only something to monitor once Puppet has done its thing and brought the server into its expected state.

Custom facts for async behaviour

So here's a pattern I particularly like. There are many alternatives to this one, but it's simple, straightforward and super easy to understand -- even for beginning Puppeteers.

  1. A first Puppet run starts and installs Apache with all its vhosts
  2. The second Puppet run starts and gets a fact called "apache_vhost_count", a simple integer that counts the number of vhosts configured
  3. When that fact is a positive integer (aka: there are vhosts configured), monitoring is added

This pattern takes 2 Puppet runs to be completely done: the first gets everything up-and-running, the second detects that there are things up-and-running and adds the monitoring.
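
The two-run behaviour can be sketched as a toy model in plain Ruby. This is not Puppet code: the node hash, gather_facts and puppet_run are made-up stand-ins, and the only point is that facts are gathered at the start of a run, so they describe what the previous run left behind.

```ruby
# Toy model of consecutive Puppet runs on one node (not real Puppet code).
# All names here are hypothetical and exist only for this illustration.

# Node state: the vhosts on disk and whether monitoring has been added.
node = { vhosts: [], monitored: false }

# Stand-in for the custom fact: counts the configured vhosts.
def gather_facts(node)
  { apache_vhost_count: node[:vhosts].size }
end

# Stand-in for applying a catalog that was compiled from the given facts.
def puppet_run(node, facts)
  # Apache and its vhosts are always managed.
  node[:vhosts] = %w[vhost-a.conf vhost-b.conf]
  # Monitoring is only added once the fact reports something to monitor.
  node[:monitored] = true if facts[:apache_vhost_count] > 0
  node
end

# Run three times and record whether monitoring exists after each run.
history = 3.times.map do
  facts = gather_facts(node) # facts reflect the state BEFORE this run
  puppet_run(node, facts)
  node[:monitored]
end

puts history.inspect # prints [false, true, true]
```

Monitoring shows up after the second run and nothing changes on the third, which is the sense in which the pattern is "done" after two runs.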

Monitoring wrappers around existing Puppet modules

You've probably done this: you get a cool module from the Forge (Apache, MySQL, Redis, ...), you implement it and want to add your monitoring to it. But how? It's not cool to hack away in the modules themselves, since those come in via r10k or librarian-puppet.

Here's my take on it:

  1. Create a new module, call it "monitoring"
  2. Add custom facts in there, called has_mysql, has_apache, ... for all the services you want
  3. If you want to go further, create facts like apache_vhost_count, mysql_databases_count, ... to count the specific instances of each service and determine whether it's actually being used.
  4. Use those facts to determine whether to add monitoring or not:
    if $::has_apache and ($::apache_vhost_count > 0) {
      @@zabbix_template_link { "zbx_application_apache_${::fqdn}":
        ensure   => present,
        template => 'Application - PHP-FPM',
        host     => $::fqdn,
        require  => Zabbix_host[$::fqdn],
      }
    }

Is this perfect? Far from it. But it's pragmatic and it gets the job done.

The facts are easy to write and understand, too.

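Facter.add(:apache_vhost_count) do
  confine :kernel => :linux
  setcode do
    # Only report a count when the Apache config directory exists
    if File.directory?('/etc/httpd/conf.d')
      # Count the entries matching 'vhost-' and return the result as an integer
      Facter::Core::Execution.exec("ls /etc/httpd/conf.d | grep 'vhost-' | wc -l").to_i
    else
      nil
    end
  end
end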

It's mostly bash (which most sysadmins understand) -- and very little Ruby (which few sysadmins understand).
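
If you'd rather not shell out at all, the same count can be sketched in plain Ruby. This is a minimal sketch, not a drop-in replacement: the top-level helper method and its conf_dir parameter are assumptions made for illustration, and the '*vhost-*' glob only approximates the grep above by matching on file names. The fact is only registered when Facter is loaded, so the file can also be run on its own.

```ruby
# Count vhost files without shelling out.
# The helper name and its conf_dir parameter are hypothetical; the default
# directory and the 'vhost-' naming convention come from the fact above.
def apache_vhost_count(conf_dir = '/etc/httpd/conf.d')
  # No config directory means there is nothing to report.
  return nil unless File.directory?(conf_dir)

  # Count the entries whose file name contains 'vhost-'.
  Dir.glob(File.join(conf_dir, '*vhost-*')).size
end

# Only register the fact when Facter is loaded, so this file also runs standalone.
if defined?(Facter)
  Facter.add(:apache_vhost_count) do
    confine :kernel => :linux
    setcode { apache_vhost_count }
  end
end
```

Taking the directory as a parameter makes the counting logic easy to try against a temporary directory, without touching a real Apache install.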

The biggest benefit I see to it is that whoever implements the modules and creates the server manifests doesn't have to toggle a parameter called enable_monitoring (been there, done that) to decide whether or not that particular service should be monitored. Puppet can now figure that out on its own.

Detecting Puppet-managed services

Because some services get installed as dependencies of other packages, the custom facts need to be clever enough to understand when a service is actually being managed by Puppet. For instance, when you install the package "httpd-tools" because it contains the useful htpasswd tool, most package managers will automatically install the "httpd" (Apache) package, too.

Having that package present shouldn't trigger your custom facts to automatically enable monitoring; that should probably only happen when the service is being managed by Puppet.

A very simple workaround (up for debate whether it's a good one) is to have each Puppet module write a simple marker file, named after the service, into /etc/puppet-managed.

$ ls /etc/puppet-managed
apache mysql php postfix ...

Now you can extend your custom facts to check for the presence of that file and determine A) whether a service is Puppet managed and B) whether monitoring should be added.

Facter.add(:has_apache) do
  confine :kernel => :linux
  setcode do
    if File.exist? "/sbin/httpd"
      if File.exist? "/etc/puppet-managed/apache"
        # Apache installed and Puppet managed
        true
      else
        # Apache is installed, but isn't Puppet managed
        nil
      end
    else
      # Apache isn't installed
      nil
    end
  end
end

(example explicitly split up in order to add comments)
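
Once you have more than a couple of these has_* facts, the check can be pulled into one helper. The following is a sketch under assumptions: puppet_managed?, MARKER_DIR and the SERVICES table are made-up names, and the mysqld path is a guess that will differ per distribution.

```ruby
# Hypothetical helper: a service counts as "installed and Puppet managed"
# only when both its binary and its marker file exist.
MARKER_DIR = '/etc/puppet-managed'

def puppet_managed?(binary, marker, marker_dir = MARKER_DIR)
  File.exist?(binary) && File.exist?(File.join(marker_dir, marker))
end

# Fact name => [binary path, marker file name].
# The binary paths are assumptions; adjust them for your distribution.
SERVICES = {
  has_apache: ['/sbin/httpd', 'apache'],
  has_mysql:  ['/usr/sbin/mysqld', 'mysql'],
}.freeze

# Only register the facts when Facter is loaded, so this file also runs standalone.
if defined?(Facter)
  SERVICES.each do |fact_name, (binary, marker)|
    Facter.add(fact_name) do
      confine :kernel => :linux
      # Return nil rather than false so the fact stays undefined, as above.
      setcode { puppet_managed?(binary, marker) ? true : nil }
    end
  end
end
```

Adding monitoring detection for another service then comes down to adding one line to the table.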

You may also be tempted to use the defined() function to check if Apache has been declared in your Puppet code and then add monitoring. However, its result depends on the order in which your code is evaluated.

Your code may look like this:

if defined(Service['httpd']) {
  # Apache is managed by Puppet, add monitoring?
}

Puppet's manual explains the big caveat though:

Puppet depends on the configuration’s evaluation order when checking whether a resource is declared.

In other words: if your monitoring code is evaluated before your Apache code, that defined() will always return false.
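
The ordering problem is easy to see in a toy model. The Ruby below is not Puppet's compiler; compile and the two lambdas are hypothetical stand-ins that only mimic the idea that a defined()-style check sees what has been declared so far.

```ruby
# Toy model of order-dependent evaluation (not Puppet's actual compiler).
# Each "class" is a lambda that may declare resources in a shared catalog.
def compile(classes)
  catalog = []
  classes.each { |klass| klass.call(catalog) }
  catalog
end

# Stand-in for the Apache module: declares the httpd service.
apache = lambda do |catalog|
  catalog << "Service[httpd]"
end

# Stand-in for the monitoring module. The include? check mimics
# defined(Service['httpd']): it only sees what was declared before it ran.
monitoring = lambda do |catalog|
  catalog << "Monitoring[apache]" if catalog.include?("Service[httpd]")
end

puts compile([apache, monitoring]).inspect
# prints ["Service[httpd]", "Monitoring[apache]"]

puts compile([monitoring, apache]).inspect
# prints ["Service[httpd]"]
```

Same two classes, different order, different result: in the second case the monitoring is silently skipped.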

Working with Facter circumvents this: facts are resolved on the node before the catalog is compiled, so they don't depend on evaluation order.

Again, this pattern isn't perfect, but it allows for a clean separation of logic. And if your team grows, it gives you an easier way to split responsibilities: the monitoring team and the implementation team can each have their own modules.



Hi! My name is Mattias Geniar. I'm a Support Manager at Nucleus Hosting in Belgium, a general web geek & public speaker. Currently working on DNS Spy & Oh Dear!. Follow me on Twitter as @mattiasgeniar.

Comments

Tom Wednesday, May 18, 2016 at 11:40

Most of what you’re trying to fix this way can be solved more elegantly using the roles & profiles pattern and relationship metaparameters.

My 2cts.


    Mattias Geniar Wednesday, May 18, 2016 at 11:51

    I pondered this particular case too while writing the post, but that ties everything, especially the monitoring example, heavily into the roles & profiles. Nothing wrong with that, but having independent classes being triggered by Puppet facts gives it a more loosely-coupled feeling.

    This is sort of the microservices of Puppet. :P


Tom Wednesday, May 18, 2016 at 11:49

Also check out (/var/lib/puppet/cache|/opt/puppetlabs/puppet/cache/state)/classes.txt.
Anything managed by puppet is already listed there. No need to reinvent the wheel. :)


Rhommel Thursday, May 19, 2016 at 02:12

I quite agree with your approach, although tbh this can trigger so many anti-patterns, and it breaks Puppet idempotency, which is something that I don’t want to lose.


Vasili Monday, December 18, 2017 at 03:16

This is very cool, and I like the idea of folder presence to determine if a service is Puppet managed.

But this pattern is not async at all: you’re waiting for one thing to run before starting the other, and that’s 100% synchronous.

Asynchronously running Puppet code would be quite a feat. You would have to run N manifests all at the same time, and have every resource in every manifest run at the same time, waiting for dependencies where necessary.

Ideally, this would cut the time to apply state down to that of whichever resource takes the longest, which could be a significant improvement.

SaltStack operates this way by default, I recommend checking it out.

