Zabbix: debugging ‘Lock wait timeout exceeded; try restarting transaction’ on the ‘ids’ table

Want to help support this blog? Try out Oh Dear, the best all-in-one monitoring tool for your entire website, co-founded by me (the guy that wrote this blogpost). Start with a 10-day trial, no strings attached.

We offer uptime monitoring, SSL checks, broken links checking, performance & cronjob monitoring, branded status pages & so much more. Try us out today!

Mattias Geniar, April 12, 2012

Follow me on Twitter as @mattiasgeniar

I recently had fun troubleshooting an issue on a Zabbix server, where the logs would start to fill up with reports such as these.

...
23966:20120412:234746.094 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid']
23942:20120412:234750.128 slow query: 50.982490 sec, "update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid'"
zabbix_server [23942]: ERROR [file:db.c,line:1582] Something impossible has just happened.
...
23969:20120412:234752.153 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid']
23969:20120412:234752.153 slow query: 50.206625 sec, "update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid'"
zabbix_server [23969]: ERROR [file:db.c,line:1582] Something impossible has just happened.

The problem itself is the “Lock wait timeout exceeded; try restarting transaction", which is actually a MySQL problem in this case. It’s caused in part by InnoDB row-level locks on the very same row that the DB Syncers in Zabbix are trying to reach.

The query in particular, “update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid'”, is Zabbix’s way of insuring a unique ID for each event. Instead of using a custom method per database backend (sequences for PostgreSQL or auto_increment in MySQL), this implementation was chosing. In race conditions where a lot of events are fired at once, this causes quite long row-level locks on the very same row (‘events’) making the DBSyncer processes or others re-try the transaction over and over again, never completing.

A sort of workaround was to transform our database from a (relatively) slow innodb to a (relatively) faster MyISAM for the ‘ids’ table. Seeing as this table usually only contains around 40 rows, that are frequently updated, this seemed like a good solution.

mysql> ALTER TABLE ids ENGINE = MyISAM;
Query OK, 44 rows affected (0.06 sec)
Records: 44  Duplicates: 0  Warnings: 0

I believe a long-term solution would be for Zabbix Server to have knowledge of which database backend is being used (MySQL/PostgreSQL/…) and use the ID-incrementer provided by that database itself, instead of using a generic workaround such as this.

Want to subscribe to the cron.weekly newsletter?

I write a weekly-ish newsletter on Linux, open source & webdevelopment called cron.weekly.

It features the latest news, guides & tutorials and new open source projects. You can sign up via email below.

No spam. Just some good, practical Linux & open source content.