Zabbix: debugging ‘Lock wait timeout exceeded; try restarting transaction’ on the ‘ids’ table

Oh Dear monitors your entire site, not just the homepage. We crawl and search for broken pages and mixed content, send alerts when your site is down and notify you on expiring SSL certificates.

Start your free 10 day trial! »

Profile image of Mattias Geniar

Mattias Geniar, April 12, 2012

Follow me on Twitter as @mattiasgeniar

I recently had fun troubleshooting an issue on a Zabbix server, where the logs would start to fill up with reports such as these.

...
23966:20120412:234746.094 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid']
23942:20120412:234750.128 slow query: 50.982490 sec, "update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid'"
zabbix_server [23942]: ERROR [file:db.c,line:1582] Something impossible has just happened.
...
23969:20120412:234752.153 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid']
23969:20120412:234752.153 slow query: 50.206625 sec, "update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid'"
zabbix_server [23969]: ERROR [file:db.c,line:1582] Something impossible has just happened.

The problem itself is the “Lock wait timeout exceeded; try restarting transaction", which is actually a MySQL problem in this case. It’s caused in part by InnoDB row-level locks on the very same row that the DB Syncers in Zabbix are trying to reach.

The query in particular, “update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid'”, is Zabbix’s way of insuring a unique ID for each event. Instead of using a custom method per database backend (sequences for PostgreSQL or auto_increment in MySQL), this implementation was chosing. In race conditions where a lot of events are fired at once, this causes quite long row-level locks on the very same row (‘events’) making the DBSyncer processes or others re-try the transaction over and over again, never completing.

A sort of workaround was to transform our database from a (relatively) slow innodb to a (relatively) faster MyISAM for the ‘ids’ table. Seeing as this table usually only contains around 40 rows, that are frequently updated, this seemed like a good solution.

mysql> ALTER TABLE ids ENGINE = MyISAM;
Query OK, 44 rows affected (0.06 sec)
Records: 44  Duplicates: 0  Warnings: 0

I believe a long-term solution would be for Zabbix Server to have knowledge of which database backend is being used (MySQL/PostgreSQL/…) and use the ID-incrementer provided by that database itself, instead of using a generic workaround such as this.



Want to subscribe to the cron.weekly newsletter?

I write a weekly-ish newsletter on Linux, open source & webdevelopment called cron.weekly.

It features the latest news, guides & tutorials and new open source projects. You can sign up via email below.

No spam. Just some good, practical Linux & open source content.