CentOS, Heartbeat, DRBD & NFS: umount failed, device is busy

Mattias Geniar, February 03, 2011

If you’re playing around with Heartbeat, DRBD & NFS, you might run into the following annoying error in your log files when Heartbeat tries to stop the Filesystem resource.

Filesystem[5133]:INFO: Running stop for /dev/drbd0 on /nfs

Filesystem[5133]:INFO: Trying to unmount /nfs

lrmd[4616]:info: RA output: (FS_repdata:stop:stderr) umount: /nfs: device is busy

lrmd[4616]:info: RA output: (FS_repdata:stop:stderr) umount: /nfs: device is busy

Filesystem[5133]:ERROR: Couldn’t unmount /nfs; trying cleanup with SIGTERM

Filesystem[5133]:INFO: No processes on /nfs were signalled

lrmd[4616]: info: RA output: (FS_repdata:stop:stderr) umount: /nfs: device is busy

umount: /nfs: device is busy

Filesystem[5133]: ERROR: Couldn’t unmount /nfs; trying cleanup with SIGTERM

Filesystem[5133]: INFO: No processes on /nfs were signalled

lrmd[4616]:info: RA output: (FS_repdata:stop:stderr) umount: /nfs: device is busy

Filesystem[5133]: ERROR: Couldn’t unmount /nfs; trying cleanup with SIGTERM

Filesystem[5133]: INFO: No processes on /nfs were signalled

lrmd[4616]: info: RA output: (FS_repdata:stop:stderr) umount: /nfs: device is busy

Filesystem[5133]: ERROR: Couldn’t unmount /nfs; trying cleanup with SIGKILL

Filesystem[5133]: INFO: No processes on /nfs were signalled

lrmd[4616]: info: RA output: (FS_repdata:stop:stderr) umount: /nfs: device is busy

umount: /nfs: device is busy

In my case, the cause was that the nfs service (/etc/init.d/nfs) was told to stop but didn’t terminate all of its processes. The unmount of the DRBD-backed filesystem then failed, because the leftover NFS processes still held file handles on that mount.

I had to edit the init.d script to fix this. You can test it yourself by running /etc/init.d/nfs stop and then searching for any remaining nfs child processes that are still alive and holding a lock on your DRBD device.
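The check described above could look roughly like the sketch below. The function names filter_nfs_procs and leftover_nfs_procs are made up for this example, and /nfs is simply the mount point from the log output. Keep in mind that nfsd workers are typically kernel threads, so fuser may report nothing even while they keep the mount busy.

```shell
#!/bin/sh
# Sketch: find NFS-related processes that survived "/etc/init.d/nfs stop".
# Function names are hypothetical helpers, not part of any init script.

# Reduce a "PID COMMAND" listing (as printed by: ps -e -o pid= -o comm=)
# to the nfsd workers and rpc.mountd.
filter_nfs_procs() {
    awk '$2 == "nfsd" || $2 == "rpc.mountd" { print $1, $2 }'
}

# List the leftovers on the running system.
leftover_nfs_procs() {
    ps -e -o pid= -o comm= | filter_nfs_procs
}

# Usage (as root, on the node that should release the mount):
#   /etc/init.d/nfs stop
#   leftover_nfs_procs     # anything printed here is still alive
#   fuser -vm /nfs         # userspace processes using the mount, if any
```

If leftover_nfs_procs still prints nfsd entries after the stop, the init script did not shut everything down.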

This was the original /etc/init.d/nfs script:

[snip]

echo -n $"Shutting down NFS mountd: "

killproc rpc.mountd

echo

echo -n $"Shutting down NFS daemon: "

killproc nfsd -2

echo

That didn’t kill every nfs process, which made every dependent action fail. Here’s the changed version.

echo -n $"Shutting down NFS mountd: "

killproc rpc.mountd

echo

echo -n $"Shutting down NFS daemon: "

killproc nfsd -2

# Force kill if the above didn’t go as planned

sleep 5

killproc nfsd -KILL

echo

This will properly kill all your nfs processes, so Heartbeat can unmount the filesystem and stop the rest of its resources.
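The same "ask nicely, wait, then force" pattern can be written without a fixed sleep 5, by polling until the process is gone and only escalating when the timeout runs out. This is a generic sketch and not a drop-in replacement for killproc: stop_with_escalation is a hypothetical helper that works on a single PID, and the 5 second default only mirrors the sleep used above.

```shell
#!/bin/sh
# Sketch: send SIGINT, poll for up to TIMEOUT seconds, then send SIGKILL.
# stop_with_escalation is a hypothetical helper, not part of the init scripts.
#
# Usage: stop_with_escalation PID [TIMEOUT_SECONDS]
stop_with_escalation() {
    pid=$1
    timeout=${2:-5}

    # Polite request first; if the signal cannot be delivered,
    # the process is already gone and there is nothing to do.
    kill -INT "$pid" 2>/dev/null || return 0

    # "kill -0" sends no signal, it only checks that the PID still exists.
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done

    # Still there after the timeout: force it.
    kill -KILL "$pid" 2>/dev/null || true
    return 0
}
```

Compared to the fixed sleep, this returns as soon as the process exits, which shortens the stop when everything shuts down cleanly.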
