
MySQL Phantom Locks

Created by William Moran, last modified on Aug 21, 2013



We've recently seen MySQL fail to release locks. There are two theories as to why this is happening:
1. MySQL just sucks and gets its lock structures confused.
2. Our applications are starting transactions, taking out locks, then leaving the transactions uncommitted.

The next time this comes up, it's probably worth capturing the output of SHOW ENGINE INNODB STATUS to determine whether a long-running transaction is causing the problem.
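A minimal way to capture that output from the mysql command-line client; the output file path is hypothetical. `tee` is a mysql client command that copies everything the session prints into a file, and the `\G` terminator prints the status as one vertical block instead of a table.

```sql
-- Copy everything this mysql client session prints into a file
-- (hypothetical path; any writable location works).
tee /tmp/innodb_status_capture.txt

-- Dump InnoDB's internal state; \G prints it vertically.
SHOW ENGINE INNODB STATUS\G

-- Stop copying output to the file.
notee
```

In the TRANSACTIONS section of that output, each open transaction typically appears as a `---TRANSACTION ...` line with an `ACTIVE N sec` age, followed by the MySQL thread id that owns it. A transaction with a large age whose thread has no query running would point toward the second theory.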

The initial symptom is a flood of queries failing with lock wait timeout errors. To trace the issue, first use innotop to check for locks that have been held for a long time:

____________________ InnoDB Locks ____________________
ID      Type    Waiting  Wait   Active  Mode  DB        Table    Index    Ins Intent  Special
818823  RECORD  1        00:51  00:51   X     ImdxTest  FAXLog   PRIMARY  0           rec but not gap
818823  TABLE   0        00:51  00:51   IX    ImdxTest  FAXLog            0
818823  RECORD  1        00:51  00:51   X     ImdxTest  FAXLog   PRIMARY  0           rec but not gap
818827  RECORD  1        00:47  00:47   X     ImdxTest  Tablets  PRIMARY  0           rec but not gap
818827  TABLE   0        00:47  00:47   IX    ImdxTest  Tablets           0
818827  RECORD  1        00:47  00:47   X     ImdxTest  Tablets  PRIMARY  0           rec but not gap
818187  RECORD  1        00:18  00:18   X     ImdxTest  FAXLog   PRIMARY  0           rec but not gap
818187  TABLE   0        00:18  00:18   IX    ImdxTest  FAXLog            0
818187  RECORD  1        00:18  00:18   X     ImdxTest  FAXLog   PRIMARY  0           rec but not gap
818829  TABLE   0        00:00  05:01   IX    ImdxTest  AppLogs           0
818829  TABLE   0        00:00  05:01   IX    ImdxTest  Tablets           0
818829  RECORD  0        00:00  05:01   X     ImdxTest  Tablets  PRIMARY  0           rec but not gap
818829  RECORD  0        00:00  05:01   X     ImdxTest  Tablets  PRIMARY  0           rec but not gap
818829  TABLE   0        00:00  05:01   IX    ImdxTest  FAXLog            0
818829  RECORD  0        00:00  05:01   X     ImdxTest  FAXLog   PRIMARY  0           rec but not gap
818829  RECORD  0        00:00  05:01   X     ImdxTest  Tablets  PRIMARY  0           rec but not gap
818829  RECORD  0        00:00  05:01   X     ImdxTest  Tablets  PRIMARY  0           rec but not gap
818829  RECORD  0        00:00  05:01   X     ImdxTest  Tablets  PRIMARY  0           rec but not gap
818829  RECORD  0        00:00  05:01   X     ImdxTest  Tablets  PRIMARY  0           rec but not gap

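Notice, in this example, that process 818829 has been holding locks on the Tablets table for over 5 minutes (Active 05:01) while not waiting on anything itself. It also holds locks on FAXLog, the table that 818823 and 818187 are waiting on. This lines up with the errors complaining that they timed out trying to get a lock on the Tablets table.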
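Roughly the same picture is available without innotop. The sketch below assumes MySQL 5.5 through 5.7, where information_schema has INNODB_LOCK_WAITS (it was removed in 8.0, where the sys schema's `innodb_lock_waits` view covers the same ground). Each row pairs a blocked thread with the thread blocking it; in an incident like this one, the blocker would be expected to show an old `trx_started` and a NULL `blocking_query`.

```sql
-- How long a statement waits for a row lock before failing (default: 50 seconds).
SELECT @@innodb_lock_wait_timeout;

-- Who is waiting on whom: one row per blocked lock request.
SELECT
  r.trx_mysql_thread_id AS waiting_thread,
  r.trx_wait_started    AS waiting_since,
  r.trx_query           AS waiting_query,
  b.trx_mysql_thread_id AS blocking_thread,
  b.trx_started         AS blocking_trx_started,
  b.trx_query           AS blocking_query  -- NULL when the blocker is idle
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
ORDER BY b.trx_started;
```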

Next, see if that process is actually doing anything that would justify holding the
lock that long:

mysql> show processlist;


+--------+-------------+---------------------+----------+-------------+---------+-----------------------------------------------------------------------+------------------+
| Id     | User        | Host                | db       | Command     | Time    | State                                                                 | Info             |
+--------+-------------+---------------------+----------+-------------+---------+-----------------------------------------------------------------------+------------------+
|      4 | replication | 10.129.18.36:53927  | NULL     | Binlog Dump | 2091922 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL             |
|      5 | replication | 10.129.18.33:50822  | NULL     | Binlog Dump | 2091922 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL             |
|      6 | replication | 10.129.18.34:53558  | NULL     | Binlog Dump | 2091922 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL             |
| 519335 | emr_jsp     | 10.129.18.196:44609 | ImdxTest | Sleep       |       4 |                                                                       | NULL             |
| 519339 | emr_jsp     | 10.129.18.196:44617 | ImdxTest | Sleep       |       3 |                                                                       | NULL             |
...
| 818827 | emr_jsp     | 10.129.18.164:51204 | ImdxTest | Sleep       |       1 |                                                                       | NULL             |
| 818828 | emr_jsp     | 10.129.18.164:51205 | ImdxTest | Sleep       |     351 |                                                                       | NULL             |
| 818829 | emr_jsp     | 10.129.18.164:51206 | ImdxTest | Sleep       |     401 |                                                                       | NULL             |
| 818830 | emr_jsp     | 10.129.18.164:51207 | ImdxTest | Sleep       |     175 |                                                                       | NULL             |
| 818831 | emr_jsp     | 10.129.18.164:51208 | ImdxTest | Sleep       |     407 |                                                                       | NULL             |
| 818928 | emr_jsp     | 10.129.18.161:53978 | ImdxTest | Sleep       |      88 |                                                                       | NULL             |
| 818930 | jolson      | 10.51.4.33:60027    | ImdxTest | Sleep       |       0 |                                                                       | NULL             |
| 818999 | emr_jsp     | 10.129.18.193:53468 | ImdxTest | Sleep       |       3 |                                                                       | NULL             |
| 819002 | root        | localhost           | NULL     | Query       |       0 | NULL                                                                  | show processlist |
| 819003 | root        | localhost           | NULL     | Sleep       |       1 |                                                                       | NULL             |
+--------+-------------+---------------------+----------+-------------+---------+-----------------------------------------------------------------------+------------------+

Notice that process 818829 is in the Sleep state, meaning it isn't running any statement, and has been idle for 401 seconds while still holding those locks. Suspicious.
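Cross-referencing the two outputs by eye works, but the same check can be run as a single query. A sketch, assuming MySQL 5.5 or later: it joins information_schema.PROCESSLIST to INNODB_TRX and returns only connections that are idle (Command = Sleep) yet still own an open InnoDB transaction, which is the pattern the second theory predicts.

```sql
-- Idle connections that still own an open InnoDB transaction, oldest first.
SELECT
  p.id               AS thread_id,
  p.user,
  p.host,
  p.db,
  p.time             AS idle_seconds,   -- seconds since the last command
  t.trx_started,
  t.trx_tables_locked,
  t.trx_rows_locked
FROM information_schema.processlist p
JOIN information_schema.innodb_trx t ON t.trx_mysql_thread_id = p.id
WHERE p.command = 'Sleep'
ORDER BY t.trx_started;
```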

Killing 818829 appears to release the locks and return things to normal. However, that doesn't help us understand the problem.
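For reference, a sketch of the kill itself (818829 is the thread ID from this incident; substitute the current one). Ending the connection makes InnoDB roll back its uncommitted transaction, which releases the locks but also discards whatever that transaction had changed.

```sql
-- Terminate the connection. InnoDB rolls back its open transaction,
-- releasing all of its locks and discarding its uncommitted changes.
KILL 818829;

-- Not sufficient here: KILL QUERY aborts only the statement currently
-- executing. A Sleep connection has none, so its transaction and its
-- locks would survive.
-- KILL QUERY 818829;
```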

The next time this situation occurs, we should capture the output of SHOW ENGINE INNODB STATUS and determine whether the offending process is stuck because of a bug in MySQL, or whether the application has started a transaction and then forgotten about it. I suspect it's a bug in MySQL, since the last time this happened, @Jon Burkhart investigated and determined that the queries under suspicion weren't running in explicit transactions.
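One caveat on that evidence, offered as something to check rather than a finding: when a session runs with autocommit disabled, InnoDB opens a transaction implicitly at its first statement, so queries that never issue BEGIN or START TRANSACTION can still leave a transaction open until the next COMMIT or ROLLBACK. Whether the emr_jsp connections ever turn autocommit off isn't known from this page. The hypothetical two-session reproduction below uses an invented scratch table, `lock_demo`, and assumes a throwaway database is selected. If the assumption holds, it produces the same picture as above: an idle connection holding an X "rec but not gap" lock on PRIMARY while another session times out.

```sql
-- Setup (hypothetical scratch table; never run this against ImdxTest).
CREATE TABLE lock_demo (id INT PRIMARY KEY, v INT NOT NULL) ENGINE=InnoDB;
INSERT INTO lock_demo VALUES (1, 0);

-- Session A: no BEGIN anywhere, but with autocommit off the UPDATE
-- implicitly opens a transaction that stays open until COMMIT/ROLLBACK.
SET autocommit = 0;
UPDATE lock_demo SET v = v + 1 WHERE id = 1;
-- Stop here. Session A now shows as Sleep in SHOW PROCESSLIST while holding
-- an IX table lock and an X record lock on the PRIMARY index entry for id = 1.

-- Session B (a second connection): blocks on the same row, then fails with
-- ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
SET SESSION innodb_lock_wait_timeout = 5;
UPDATE lock_demo SET v = v + 1 WHERE id = 1;

-- Back in session A: ending the transaction releases the lock.
COMMIT;  -- or ROLLBACK;
```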
