Recovering from a DRBD split-brain scenario in heartbeat

Mattias Geniar, March 24, 2011

It’s a dreadful thing, but sooner or later you’ll run into a split-brain scenario with heartbeat and DRBD, where resources are started on both nodes. With DRBD, the result looks something like this.

On Host #1

# cat /proc/drbd
0: cs:StandAlone st:Primary/Secondary ds:UpToDate/Unknown C r---

On Host #2

# cat /proc/drbd
0: cs:WFConnection st:Primary/Secondary ds:UpToDate/Unknown C r---

Before moving on, decide which host will be the “survivor” and which one will be “sacrificed” as the victim. This depends on your heartbeat setup, which determines which server holds the active resources. If data was still being written to the DRBD device on Host #1, that host should be the survivor and stay primary.

The following actions are performed on the victim, the host you are making secondary. Any changes made on that host will be lost. Once you have decided which node will be secondary, stop the heartbeat service on it.

host2 # /etc/init.d/heartbeat stop

Explicitly demote the DRBD resource to “secondary”.

host2 # drbdadm secondary [sharename]

Make sure the resource is disconnected.

host2 # drbdadm disconnect [sharename]

Then reconnect it, telling it to discard its local data, since the data from the other node will be used.

host2 # drbdadm -- --discard-my-data connect [sharename]
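The victim-side steps so far can be collected in a small script. This is only a sketch: the function names `run` and `recover_victim` and the `DRY_RUN` switch are my own inventions for illustration, and stopping at the first failing command is a choice you may want to adapt. By default it only prints the commands, so you can review them before touching a real node.

```shell
#!/bin/sh
# Sketch of the victim-side recovery steps described above.
# run, recover_victim and DRY_RUN are hypothetical names, not part of DRBD.

# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 is the DRBD resource name ([sharename] in the commands above).
recover_victim() {
    res=$1
    run /etc/init.d/heartbeat stop &&
    run drbdadm secondary "$res" &&
    run drbdadm disconnect "$res" &&
    run drbdadm -- --discard-my-data connect "$res"
}
```

Calling `recover_victim [sharename]` with `DRY_RUN=0` set would execute the commands for real. Only do that on the host you have decided to sacrifice.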

After having done that, tell the active node to reconnect.

host1 # drbdadm connect [sharename]

That should put both DRBD instances back into sync.
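To confirm that the nodes found each other again, you can look at the `cs:` field in /proc/drbd. Here is a minimal sketch that extracts it, assuming the status lines have the same layout as the output shown earlier. The function name `drbd_cs` and its optional file argument are made up for this example.

```shell
#!/bin/sh
# Print the connection state (the "cs:" field) of every DRBD device line.
# Reads /proc/drbd unless another file is given as $1; the optional file
# argument is an assumption of this sketch, added to make it easy to test.
drbd_cs() {
    sed -n 's/^ *[0-9][0-9]*: cs:\([A-Za-z]*\).*/\1/p' "${1:-/proc/drbd}"
}

# Only query the real status file when it exists on this host.
if [ -r /proc/drbd ]; then
    drbd_cs
fi
```

During the split brain this prints states such as StandAlone or WFConnection, as in the output at the top of this post.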

If you have not already done so, you can modify your DRBD config to define “handlers”: special scripts that are executed on certain events (such as split brain, out of sync, …). The config could look like this.

# cat /etc/drbd.conf
resource resourcename {
   [snip]
   handlers {
      out-of-sync "/root/scripts/drbd_out_of_sync.sh $DRBD_RESOURCE";
      split-brain "/root/scripts/drbd_split_brain.sh $DRBD_RESOURCE";
   }
   [snip]
}

Point the handlers at your own scripts, and you can have them execute different actions straight away (host isolation, automated recovery, alerting, …).
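As an illustration of the alerting case, a split-brain handler could be as simple as the sketch below. It assumes the resource name arrives as the first argument, as in the config above. The function name `split_brain_message`, the message text and the `drbd-split-brain` syslog tag are hypothetical choices of mine.

```shell
#!/bin/sh
# Hypothetical /root/scripts/drbd_split_brain.sh (path from the config above).

# Build a one-line alert for the resource name given as $1.
split_brain_message() {
    printf 'DRBD split brain detected on resource %s\n' "$1"
}

# Run only when a resource name was passed, as the handler line does.
if [ -n "${1:-}" ]; then
    msg=$(split_brain_message "$1")
    # Send the alert to syslog, or to stderr if logger is unavailable.
    if command -v logger >/dev/null 2>&1; then
        logger -t drbd-split-brain "$msg"
    else
        echo "$msg" >&2
    fi
fi
```

From here you could swap the `logger` call for a mail command or a call to your monitoring system, whichever suits your setup.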

More information is available on the “Manual split brain recovery” page in the DRBD documentation.
