IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-11, NO. 1, JANUARY 1985


A Priority Based Distributed Deadlock Detection Algorithm

MUKUL K. SINHA AND N. NATARAJAN

Abstract-Deadlock handling is an important component of transaction management in a database system. In this paper, we contribute to the development of techniques for transaction management by presenting an algorithm for detecting deadlocks in a distributed database system. The algorithm uses priorities for transactions to minimize the number of messages initiated for detecting deadlocks. It does not construct any wait-for graph but detects cycles by an edge-chasing method. It does not detect any phantom deadlock (in the absence of failures), and for the resolution of deadlocks it does not need any extra computation. The algorithm also incorporates a post-resolution computation that leaves information characterizing dependence relations of remaining transactions of the deadlock cycle in the system, and this will help in detecting and resolving deadlocks which may arise in the future. An interesting aspect of this algorithm is that it is possible to compute the exact number of messages generated for a given deadlock configuration. The complexity is comparable to the best algorithm reported. We first present a basic algorithm and then extend it to take into account shared and exclusive lock modes, simultaneous acquisition of multiple locks, and nested transactions.

Index Terms-Deadlock, deadlock detection, distributed database, nested transaction, priority, timestamp, transaction.

Manuscript received February 25, 1984; revised August 28, 1984.
The authors are with the National Centre for Software Development and Computing Techniques, Tata Institute of Fundamental Research, Bombay 400 005, India.

0098-5589/85/0100-0067$01.00 © 1985 IEEE

I. INTRODUCTION

In a database system, accesses to data items by concurrent transactions must be synchronized to preserve the consistency of the database. Locking is the most common mechanism used for access synchronization. When locking is used, a group of transactions (two or more) may sometimes get involved in a deadlock [5]: this is a situation in which each member of the group waits (indefinitely) for a data item locked by some member transaction of the group. Deadlocks can be resolved by aborting at least one of the transactions involved. A simple scheme that can be used to break a deadlock is to use timeouts and abort transactions when they have waited for more than a specified time interval after issuing a lock request. Alternatively, a deadlock can be detected using a specific algorithm for this purpose and resolved by aborting at least one of the transactions involved in the deadlock.

Using timeouts to handle deadlocks is only a brute force technique. Since, in practice, it is very difficult to choose a proper timeout interval, this technique may result in unnecessary transaction aborts. Another major drawback of this scheme is that it cannot avoid cyclic restarts [16]; i.e., a transaction may repeatedly be aborted and restarted. In contrast



to the timeout technique, a deadlock detection scheme aborts a transaction only when the transaction is involved in a deadlock. Most deadlock detection schemes [8], [9], [12], [15] detect deadlocks by finding cycles in a transaction wait-for graph, in which each node represents a transaction, and a directed edge from one transaction to another indicates that the former is waiting for a data item locked by the latter transaction. In a distributed database system, the problem is, in essence, of finding cycles in a distributed graph where no single site knows the entire graph.

The deadlock detection scheme presented in this paper does not construct any transaction wait-for graph, but follows the edges of the graph to search for a cycle (called an edge-chasing algorithm by Moss [13]). It is assumed that each transaction is assigned a priority in such a way that priorities of all transactions are totally ordered. When a transaction waits for a data item locked by a lower priority transaction, we say that an antagonistic conflict has occurred. When an antagonistic conflict occurs for a data item, the waiting transaction initiates a message to find cycles of transactions, in which each transaction is waiting for a data item locked by the next. If the message comes back to the initiating transaction, a deadlock cycle is detected.

Our algorithm presumes a point-to-point network with a reliable message communication facility, and it is not applicable for detecting communication deadlocks [4], [14].

The distinguishing features of the proposed deadlock detection scheme are as follows.

1) For a given deadlock cycle, it is possible to compute the exact number of messages that have been generated for the purpose of deadlock detection. If the number of messages generated is used as a complexity measure, the proposed algorithm is not inferior to any of the other algorithms reported in the literature.

2) When a deadlock is detected, the detector has information about the highest and the lowest priority transactions of the cycle, and this can be used for deadlock resolution. Thus, resolution does not need any new computation.

3) In the absence of failures (site failures or explicit abort of a waiting transaction by the user), it does not detect any phantom deadlock.

4) Even after a transaction is aborted to resolve a deadlock, other member transactions of the cycle continue to retain information about the remaining transactions. This, in turn, helps to detect, with fewer messages, deadlocks in which the remaining transactions (or any subset of them) may get involved in the future.

5) The resolution scheme adopted guarantees progress of computation, and avoids the problem of cyclic restart.

6) The basic algorithm can be easily extended to a locking scheme that provides both share locks and exclusive locks, and the scheme in which a transaction can acquire several locks simultaneously.

7) It can also be extended to detect and resolve deadlocks which may occur in an environment where transactions can be nested within other transactions.

In the literature, several authors have proposed algorithms for deadlock detection in which the wait-for graph is not constructed explicitly [3], [13], [14]. In comparison to the algorithm of Chandy and Misra [3], our algorithm has the following advantages.

1) In our scheme, a deadlock computation is initiated only when an antagonistic conflict occurs. In contrast, in their scheme, a computation is initiated whenever a transaction begins to wait for another. Hence, our algorithm generates fewer messages to detect a deadlock.

2) In our scheme, there is no separate phase for deadlock resolution.

Our scheme has some similarities (e.g., initiation of deadlock computation only when an antagonistic conflict occurs) with the algorithm proposed by Moss [13]. However, in comparison to his scheme, our algorithm has the following advantages.

1) In Moss' scheme, a transaction does not maintain any information regarding transactions that wait for it, directly or indirectly. Hence, his scheme requires transactions to initiate deadlock detection computations periodically. Thus, his scheme would, in general, require more messages, and it is not possible to compute the exact number of messages generated before a deadlock is detected.

2) In our scheme, a transaction continues to retain the above information even after the resolution of a deadlock, and this in turn speeds up detection and resolution of future deadlocks.

3) Our algorithm is less prone to detect phantom deadlocks that may involve nested transactions than Moss' scheme. In our scheme, a detected deadlock is made phantom only when a waiting transaction aborts, either explicitly or implicitly. In contrast, in Moss' scheme, sometimes a detected deadlock is made phantom even when an active transaction aborts, say due to some application considerations. We discuss this further in Section VI-C.

4) In our scheme all messages have an identical short length, whereas Moss' scheme has messages of varying lengths.

In the following section, we introduce a distributed database model in order to set the context, and in Section III we describe the basic distributed deadlock detection algorithm. We analyze the cost of the algorithm in Section IV. The basic algorithm is applicable when only exclusive locks are used. However, it has been reported in the literature [9] that 80 percent of access is only for reading data. Taking this into account, we show in Section V how the basic algorithm can be modified to include share locks as well as simultaneous acquisition of multiple locks. In Section VI, we describe a nested transaction model and extend the algorithm to detect and resolve deadlocks taking into account nested transactions. We conclude the paper with suggestions for further improving the algorithm.

II. THE DISTRIBUTED DATABASE MODEL

A database is a structured collection of information. In a distributed database system, the information is spread across a collection of nodes (or sites) interconnected through a communication network. Each node has a system-wide, unique identifier, called the site-identification-number (site id, in short), and nodes communicate through messages.

All messages sent arrive at their destinations in finite time, and the network filters duplicate messages and guarantees that messages are error-free. The site-to-site communication is
pipelined, i.e., the receiving site gets messages in the same order that the sending site has transmitted them.

Within a node, there are several processes and data items (or objects). A process is an autonomous active entity that is scheduled for execution. Every process has a system-wide unique name, called process-id, and processes communicate with each other through messages. To access one or more data items, which may be distributed over several nodes, a user creates a transaction process at the local node. The transaction process coordinates actions on all data items participating in the transaction and preserves the consistency of the database. Henceforth, we use the term transaction to denote the corresponding transaction process.

Data items are passive entities that represent some independently accessible piece of information. Each data item is maintained by a data manager which has the exclusive right to operate on a data item. If a transaction wants to operate on a data item, it must send a request to the data manager that manages the data item. A data manager can maintain several data items simultaneously. However, to simplify the exposition, we shall assume that a data manager maintains only one data item.

In addition to data manipulation operations, a data manager provides two primitives to control access to the data item that it maintains: Lock(data_item) and Un_Lock(data_item). A transaction must lock a data item before accessing it, and it must unlock the data item when it no longer needs to access it. A data item can be in one of two lock modes, null or free (N, i.e., absence of a lock), and exclusive (X, i.e., presence of a lock). A data manager honors the lock request of a transaction if the data item is free; otherwise it keeps the lock request pending in a queue, called request_Q. A transaction which has locked the data item is called the holder of the data, whereas a transaction which is waiting in the request_Q is called a requester of the data item. When a holder unlocks the data item, the data manager chooses a lock request from the request_Q, and grants the lock to that requester. The scheduling scheme followed by the data manager does not guarantee avoidance of deadlocks [5], e.g., it may follow an arrival order scheduling scheme.

Transactions can be in one of two states: active or wait. If a transaction waits in a request_Q of a data manager, it is in wait state; otherwise it is in active state. An active transaction process may or may not be running on a processor. The state of a transaction changes from active to wait when its lock request for a data item is queued by the data manager in its request_Q. The state of the transaction changes from wait to active when the data manager schedules its pending lock request. In either case, the manager informs the transaction of its change of state. We assume that a transaction acquires locks one after another (i.e., at any time it has only one outstanding lock request), and it follows the two-phase lock protocol [7].

Each transaction is assigned a priority in such a way that priorities of all transactions are totally ordered. To assign priorities to transactions, we use the timestamp mechanism. When a transaction is initiated, it is assigned a unique timestamp. Timestamps induce priorities in the following manner: a transaction is of higher priority than another if the timestamp of the former is less than that of the latter. Unlike the timestamp synchronization scheme [2], which uses timestamps to schedule lock requests of transactions (and in turn, prevents deadlocks), here timestamps are used only to assign priorities to transactions.

For generating timestamps, we assume that every node has a logical clock (or counter) that is monotonically increasing, and the various clocks are loosely synchronized [11]. A timestamp generated by a node i is a pair (C, i), where C is the current value of the local clock and i is the site-id of the node i. Greater than (>) and less than (<) relations for timestamps are defined as follows.

Let t1 = (C1, i1) and t2 = (C2, i2) be two timestamps. Then

t1 > t2 iff C1 > C2 or (C1 = C2 and i1 > i2);
t1 < t2 iff C1 < C2 or (C1 = C2 and i1 < i2).

Each transaction is denoted by an ordered pair of the form (p, t), where p is the process-id of the corresponding transaction process, and t is the timestamp of the transaction. The process-id is used for communication purposes.

If two transactions T1 and T2 are denoted by the pairs (p1, t1) and (p2, t2), respectively, we say that T1 > T2, i.e., the priority of T1 is higher than that of T2, if t1 < t2.

Further, we say that there is an antagonistic conflict at a data item if the item is locked, and there is a requester of higher priority than the holder. In such a case, we also say that the requester faces the antagonistic conflict.

III. DISTRIBUTED DEADLOCK DETECTION AND RESOLUTION

In this algorithm, a deadlock is detected by circulating a message, called probe, through the deadlock cycle. The occurrence of an antagonistic conflict at a data site triggers initiation of a probe. A probe is an ordered pair (initiator, junior), where initiator denotes the requester which faced the antagonistic conflict, triggering the deadlock detection computation, and initiating this probe. The element junior denotes the transaction whose priority is the least among transactions through which the probe has traversed.

A data manager sends a probe only to the holder of its data, while a transaction process sends a probe only to the data manager from which it is waiting to receive the lock grant. Transaction processes (or data managers) never communicate among themselves for purposes of deadlock detection.

A. The Basic Deadlock Detection Algorithm

The basic deadlock detection algorithm has three steps.

1) A data manager initiates a probe in the following two situations.

a) When the data item is locked by a transaction, if a lock request arrives from another transaction, and requester > holder, the data manager initiates a probe and sends it to the holder.

b) When a holder releases the data item, the data manager schedules a waiting lock request. If there are more lock requests still in the request_Q, then for each lock request for which requester > new holder, the data manager initiates a probe and sends it to the new holder.
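The timestamp and priority rules above can be rendered as a short Python sketch (our own illustration; the class and function names are not from the paper):

```python
# Illustrative sketch (names are ours, not the paper's): timestamps (C, i)
# with the ordering of Section II, and priorities derived from them.
from dataclasses import dataclass

@dataclass(frozen=True)
class Timestamp:
    clock: int  # C: current value of the node's logical clock
    site: int   # i: site-id of the node, used to break clock ties

    def __lt__(self, other):
        # t1 < t2 iff C1 < C2 or (C1 = C2 and i1 < i2)
        return (self.clock, self.site) < (other.clock, other.site)

@dataclass(frozen=True)
class Transaction:
    pid: str        # process-id, used only for communication
    ts: Timestamp   # assigned at initiation; unchanged across restarts

    def higher_priority_than(self, other):
        # a transaction is of higher priority iff its timestamp is smaller
        return self.ts < other.ts

def antagonistic_conflict(requester, holder):
    # the item is locked and the requester outranks the holder
    return requester.higher_priority_than(holder)

t1 = Transaction("p7", Timestamp(5, 1))
t2 = Transaction("p9", Timestamp(5, 3))  # equal clocks: site-id decides
assert t1.higher_priority_than(t2)
assert antagonistic_conflict(t1, t2)     # t1 waiting on t2 triggers a probe
```

Because site-ids are unique, the (clock, site) pair totally orders all transactions, which is exactly the property the algorithm needs.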

When a data manager initiates a probe it sets

initiator := requester;
junior := holder;

We shall presently assume that a data manager sends a probe as soon as the above situations occur. However, as we shall elaborate in Section VII, in order to improve performance, a data manager can wait for a while before sending a probe.

2) Each transaction maintains a queue, called probe_Q, where it stores all probes received by it. The probe_Q of a transaction contains information about the transactions which wait for it, directly or transitively. Since we have assumed that a transaction follows the two phase lock protocol, the information contained in the probe_Q of a transaction remains valid until it aborts or commits.

After a transaction enters the second phase of the two phase lock protocol, it can never get involved in a deadlock. Hence, when it enters the second phase, it discards the probe_Q. During the second phase, any probe or clean message (discussed later in this section) received is ignored.

A transaction sends a probe to the data manager where it is waiting in the following two cases.

a) When a transaction T receives probe(initiator, junior), it performs the following.

if junior > T then junior := T;
save the probe in the probe_Q;
if T is in wait state
then transmit a copy of the saved probe to the data manager where it is waiting;

b) When a transaction issues a lock request to a data manager and waits for the lock to be granted (i.e., it goes from active to wait state), it transmits a copy of each probe stored in its probe_Q to that data manager.

3) When a data manager receives probe(initiator, junior) from one of its requesters, it performs the following.

if holder > initiator
then discard the probe
else if holder < initiator
then propagate the probe to the holder
else declare deadlock and initiate deadlock resolution;

When a deadlock is detected, the detecting data manager has the identities of two members of the deadlock cycle, initiator and junior, i.e., the highest and the lowest priority transactions, respectively. In order to guarantee progress, we choose to abort junior, i.e., the lowest priority transaction (hereafter called victim). When victim restarts, its priority does not change, i.e., it uses the same timestamp that was assigned to it when it was initiated.

B. The Deadlock Resolution and Post-Resolution Computation

This consists of the following three steps.

1) To abort the victim, the data manager that detects the deadlock sends an abort signal to the victim. The identity of the initiator is also sent along with the abort signal: abort(victim, initiator). Since victim is aborted, it is necessary to discard those probes (from probe_Qs of various transactions) that have victim as their junior or initiator. Hence, on receiving an abort signal, the victim does the following.

a) It initiates a message, clean(victim, initiator), sends it to the data manager where it is waiting, and enters the abort phase. Since initiator is the highest priority transaction of the deadlock cycle, its probe_Q will never contain any probe generated by other members of the cycle. Consequently, probe_Qs of transactions, from initiator to victim in the direction of probe traversal, will not contain a probe having victim either as junior or as initiator. And hence, the clean message carries the identity of initiator, beyond which it need not traverse.

b) In the abort phase, the victim releases all locks it held, withdraws its pending lock request, and aborts. During this phase, it discards any probe or clean message that it receives.

2) When a data manager receives a clean(victim, initiator) message, it propagates the message to its holder.

3) When a transaction T receives a clean(victim, initiator) message, it acts as follows.

purge from the probe_Q every probe that has victim as its junior or initiator;
if T is in wait state
then if T = initiator
then discard the clean message
else propagate the clean message to the data manager where it is waiting
else discard the clean message;

A transaction discards a clean message in the following two situations: 1) the transaction is in active state, or 2) the transaction is the same as the initiator of the clean message received.

After "cleaning" up its probe_Q as described above, each member transaction of the deadlock cycle continues to retain the remaining probes in its probe_Q. In the future, if the remaining members (or any subset of them) get involved in a deadlock cycle, it will be detected with fewer messages, since probes have already traversed some edges of the cycle.

IV. THE COST OF DEADLOCK DETECTION

To compare our algorithm to other deadlock detection and resolution algorithms, we consider three factors which determine the cost of any deadlock detection algorithm:

1) Communication Cost: the number of messages that must be exchanged to detect a deadlock;

2) Delay: the time needed to detect a deadlock once the deadlock cycle is formed (presuming that every message exchange, whether it is an intersite communication or an intrasite communication, takes equal time); and

3) Storage Cost: the amount of storage needed by transactions and data managers specifically for purposes of deadlock detection and resolution.

In our scheme, the communication and the delay costs of detecting a deadlock depend on the configuration of a deadlock cycle. The configuration indicates which transaction waits for which other transaction. We describe a configuration using a transaction wait-for graph (TWFG) [10] with the following convention.

In a TWFG, nodes and edges are associated with transactions and data items, respectively. The direction of an edge from one transaction to another indicates that the former is waiting for the latter.

Fig. 1. An edge of a TWFG.

Fig. 2. Deadlock cycle: best configuration.
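To make the three steps of the basic algorithm concrete, the following self-contained Python simulation (our own sketch, not the authors' code; all names are hypothetical) runs edge-chasing on a best-configuration cycle of the kind shown in Fig. 2 and counts the messages exchanged:

```python
# Sketch of the basic edge-chasing algorithm on the best configuration:
# T1 holds Obj1, each T_{i+1} waits for T_i, and T1 waits for T_N, closing
# the cycle. Priorities: T_i > T_j iff i < j, so T1 is the highest.
N = 5
priority = {f"T{i}": -i for i in range(1, N + 1)}   # larger value = higher

holder = {f"Obj{i}": f"T{i}" for i in range(1, N + 1)}    # D_i's holder is T_i
waits_at = {f"T{i+1}": f"Obj{i}" for i in range(1, N)}    # T_{i+1} waits at D_i
waits_at["T1"] = f"Obj{N}"                                # the antagonistic edge

probe_q = {t: [] for t in priority}   # each transaction's probe_Q
messages = 0                          # every send, to manager or transaction

def send_to_manager(obj, probe):
    """Step 3: the data manager compares its holder with the initiator."""
    global messages
    messages += 1
    initiator, junior = probe
    h = holder[obj]
    if priority[h] > priority[initiator]:
        return None                              # discard the probe
    if priority[h] < priority[initiator]:
        return send_to_transaction(h, probe)     # propagate to the holder
    return junior            # holder == initiator: deadlock; abort junior

def send_to_transaction(t, probe):
    """Step 2a: update junior, store the probe, forward it if T is waiting."""
    global messages
    messages += 1
    initiator, junior = probe
    if priority[junior] > priority[t]:
        junior = t
    probe_q[t].append((initiator, junior))
    if t in waits_at:                            # T is in wait state
        return send_to_manager(waits_at[t], (initiator, junior))
    return None

# Step 1a: the only antagonistic conflict is T1 (requester) vs T_N (holder),
# so D_N initiates probe(initiator=T1, junior=T_N) and sends it to T_N.
victim = send_to_transaction(f"T{N}", ("T1", f"T{N}"))
print(victim, messages)   # prints: T5 8, i.e., 2 * (N - 1) messages
```

Swapping two adjacent transactions in waits_at (the intermediate configuration of Fig. 3) adds a second antagonistic conflict whose probe dies after two messages, matching the 2 * (N - 1) + 2 count derived in the text.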
For example, Fig. 1 indicates a conflict where the data item Obji is locked by a transaction Ti and the transaction Tj is waiting to acquire the lock. We shall call the data manager of Obji as Di. If Tj > Ti, the conflict is antagonistic, and the data manager Di will initiate a deadlock detection computation by initiating probe(Tj, Ti), and sending it to the transaction Ti.

A data item can have many requesters but only one holder, and hence, in a TWFG, a node can have several incoming edges but at most one outgoing edge.

A. The Communication Cost

We analyze the communication cost of our algorithm by considering three kinds of configurations of a deadlock cycle. The order of priority among transactions is assumed as follows: Ti > Tj if i < j.

The Best Configuration: For our algorithm, the best deadlock configuration, i.e., the configuration for which the deadlock is detected with the minimum number of messages, is the one in which only one edge of the cycle causes an antagonistic conflict.

For example, consider the configuration illustrated in Fig. 2. Except at the site of ObjN, where T1 waits for TN and T1 > TN, there is no antagonistic conflict at any other site. The data manager DN initiates probe(T1, TN) and sends it to the transaction TN. On receiving the probe, TN stores it in its probe_Q, and propagates it to DN-1. In two steps, a probe travels from one data manager to the next data manager of the TWFG.

On receiving probe(T1, TN), the data manager DN-1 compares its holder TN-1 to the initiator T1 of the probe. Since T1 > TN-1, it propagates probe(T1, TN) to its holder, i.e., TN-1. The transaction TN-1, in turn, stores probe(T1, TN) in its probe_Q, and propagates it to the data manager DN-2, and so on.

When the data manager D1 finally receives probe(T1, TN) from the requester T2, it finds that its holder is the same as the initiator of the probe, and hence, it detects the deadlock. In this case, the total number of messages generated is 2 * (N - 1).

Fig. 3. Deadlock cycle: intermediate configuration.

An Intermediate Configuration: Consider the deadlock configuration of Fig. 3. In comparison to the previous configuration, the positions of T2 and T3 are swapped at the Obj2 site. Thus, apart from the data item ObjN, the cycle has one more antagonistic conflict, at data item Obj2. Similar to DN, the data manager D2 also initiates probe(T2, T3), and sends it to the transaction T3. T3 stores it in its probe_Q, and since it has an outstanding lock request for data item Obj1, it propagates the probe to D1. When the data manager D1 receives probe(T2, T3), it discards it since initiator < holder (i.e., T2 < T1). Hence, the probe initiated at the Obj2 site dies after two steps.

As in the previous configuration, in this configuration as well, the deadlock will be detected only when the probe initiated by the data manager DN traverses through the entire cycle, and eventually reaches D1 after 2 * (N - 1) steps. Hence, in this case, the total number of messages generated is 2 * (N - 1) + 2.

Fig. 4. Deadlock cycle: worst configuration.

The Worst Configuration: By induction, we can infer that the worst deadlock configuration, i.e., the one which will generate the maximum number of messages before the deadlock is detected, is the one in which each edge of the cycle except one causes an antagonistic conflict.

For example, consider Fig. 4, in which there are (N - 1) antagonistic conflicts. All data managers, except D1, initiate a probe. All probes traverse up to the data manager D1 and terminate, except the probe initiated by DN, which leads to the detection of a deadlock. Hence, the total number of messages generated will be

2 * (N - 1) + 2 * (N - 2) + 2 * (N - 3) + ... + 2 = N * (N - 1).

In general, for a deadlock cycle of length N there are (N - 1)! possible deadlock configurations. For a specific deadlock configuration, the total number of messages generated will be

2 * (N - 1) + CN-2 * 2 * (N - 2) + CN-3 * 2 * (N - 3) + ... + C2 * 2

where CI is 1 if an antagonistic conflict exists at data item ObjI, and 0 otherwise. For the above expression, the maximum and minimum values are N * (N - 1) and 2 * (N - 1),

respectively. For N = 2, the maximum and the minimum are identical, namely 2.

B. The Delay

The delay is defined to be the time taken to detect the deadlock after the deadlock cycle is formed. Note that, irrespective of the configuration of a deadlock cycle of length N (best, worst, or any intermediate), the maximum amount of delay is the time taken to exchange 2 * (N - 1) messages. The delay is maximum if the highest priority transaction of the cycle is the last transaction to enter the wait state, closing the deadlock cycle. If a transaction other than the highest priority transaction is the last to enter the wait state, the delay is less. This is because the probe initiated by the highest priority transaction would have traversed part of the cycle before the cycle is formed.

Suppose, in the configuration shown in Fig. 2 (prior to the formation of a deadlock cycle), all edges except the edge TJ+1-TJ (where 1 ≤ J ≤ N - 1) are formed, i.e., TJ+1 is still active. When TJ+1 requests a lock on data item ObjJ held by TJ, it enters the wait state, closing the deadlock cycle.

Case 1: If probe(T1, TN), initiated due to the antagonistic conflict T1-TN, has reached the transaction TJ+1 before it entered the wait state, the delay to detect the deadlock will be equal to the time taken to exchange (2 * J - 1) messages.

Case 2: If probe(T1, TN) is yet to reach the transaction TN, i.e., transactions T1 and TJ+1 entered the wait state in quick succession (closing the deadlock cycle), and the time gap was too small compared to the time taken to exchange one message, then the delay to detect the deadlock will be equal to the time taken to exchange 2 * (N - 1) messages.

Hence, if a deadlock cycle is closed by transaction TJ+1, then the time taken to detect the deadlock will be anywhere between (2 * J - 1) and 2 * (N - 1) message exchanges, for J = 1, ..., (N - 1).

For the configuration given in Fig. 2, the delay will be minimum (i.e., the time taken to exchange one message) if 1) the cycle is closed by transaction T2 by waiting for T1, the highest priority transaction of the cycle, and 2) the probe initiated due to the antagonistic conflict T1-TN has reached T2 before the latter entered the wait phase.

From this result we can generalize that for any configuration the minimum time taken to detect a deadlock is the time taken for the exchange of one message, and this can happen only when 1) the cycle is closed by a transaction waiting for the highest priority transaction of the configuration, and 2) the probe initiated by the highest priority transaction had reached the cycle-closing transaction before the latter entered the wait phase.

C. The Storage Cost

In this algorithm, each transaction requires storage space to maintain its probe_Q, and a probe_Q exists until the transaction enters the second phase of the two phase lock protocol. If the number of transactions concurrently in the system is N, then the length of a probe_Q can grow at most up to (N - 1).

D. Costwise Comparison to Other Algorithms

In comparison to the algorithm of Chandy and Misra [3], our algorithm has less communication cost since it initiates a deadlock computation only upon the occurrence of antagonistic conflicts, but not otherwise. Furthermore, the resolution of deadlock does not involve any extra cost.

Unlike Moss' algorithm [13], we have separated the cost of reliable network communication from that of deadlock detection. Incorporation of this distinction in our algorithm enables us to compute exact communication and delay costs of deadlock detection, for a given configuration.

In the distributed database model considered by Obermarck [15], transactions migrate from one data site to another, and there is a deadlock detector at each site which builds a transaction wait-for graph for that site (by extracting information from lock tables, and other resource allocation tables and queues). In computing the communication cost to detect a deadlock cycle (which is N * (N - 1)/2 exchanges of messages, in the worst case, among deadlock detectors), he does not include the expenses of transaction migration and construction of a TWFG by deadlock detectors in terms of messages. In contrast, in our model, the transmission of information from a transaction to a data manager and from a data manager to a transaction costs one message each. If the above two expenses are also included in terms of messages, the communication cost for his algorithm will become equal to that of ours.

V. EXTENSIONS TO THE DEADLOCK DETECTION ALGORITHM

In this section, we extend the algorithm to take care of two refinements:

1) availability of a share lock (S_lock) mode as well, and

2) allowing a transaction to acquire locks on more than one data item simultaneously, either in share mode or in exclusive mode.

A. Share and Exclusive Locks

The Distributed Database Model with Share and Exclusive Locks: We extend the basic model, discussed in Section II, by distinguishing a share lock (S_lock) request from an exclusive lock (X_lock) request. Correspondingly, a locked data item can be either in S_mode or in X_mode. The desired lock mode is specified as a parameter of the lock request primitive: Lock(data_item, mode). In order to distinguish between the two kinds of lock requests, a data manager splits its request_Q into Srequest_Q and Xrequest_Q, for storing pending S_lock and X_lock requests, respectively.

If a data item is free, a transaction can lock it in any mode. When a transaction has locked a data item in X_mode, and become the X_holder, no other transaction can lock the data item
The size of a probe_Q depends upon the number higher of in any mode. A transaction can lock a data item in S_mode,
priority transactions which wait for it directly transitively.
or and become an S_holder even if the item is already locked in
A probe_Q shrinks only when the transaction receives a clean S_mode. Thus, a data item in S_mode can have several S_
message, but not otherwise. holders whereas it can have only one X_holder. When the
If maximum number of transactions that can
the run concur- X_holder releases the lock, if the data manager decides to
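The detection-delay bounds derived in this section can be collected in a small sketch; the function name is illustrative, and it simply tabulates the (2*J - 1) to 2*(N - 1) message-exchange range for each possible cycle-closing transaction in the configuration of Fig. 2.

```python
def detection_delay_bounds(n):
    """For a cycle of n transactions closed by T(J+1), the detection
    delay lies between (2*J - 1) and 2*(n - 1) message exchanges.
    Returns {J: (best, worst)} for J = 1 .. n-1."""
    return {j: (2 * j - 1, 2 * (n - 1)) for j in range(1, n)}

bounds = detection_delay_bounds(5)
# Best case over all J: the cycle is closed by waiting for the highest
# priority transaction, and detection costs a single message exchange.
assert bounds[1] == (1, 8)
assert min(best for best, worst in bounds.values()) == 1
```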
SINHA AND NATARAJAN: DISTRIBUTED DEADLOCK DETECTION ALGORITHM 73

Fig. 5. (a) A TWFG where a probe gets discarded. (b) A deadlock caused by an incremental share lock remains undetected by the basic algorithm.

honor S_lock requests, we assume that all S_lock requests queued in Srequest_Q are scheduled simultaneously.

We note that with this scheduling policy, it is possible that an X_requester may starve. Hence, this policy is unfair. We shall discuss this issue later in Section VII.

Since the S_holders of a data item can be many, an X_requester may now wait for more than one transaction simultaneously, i.e., in a TWFG, a node can have several incoming as well as outgoing edges.

Deadlock Detection and Resolution: With the availability of S_locks, it is now possible that the S_holders of a data item may increase incrementally. Consequently, it is possible that antagonistic conflicts for data items may occur incrementally. To take this into account, a data manager has to initiate a probe in one more situation apart from those discussed in the basic algorithm [refer to Section III-A, step 1)].

When a data manager grants a lock to an additional S_holder Ts, it performs the following.

if Xrequest_Q is not empty
then for each X_requester, Tx,
  if Tx > Ts
  then initiate probe(Tx, Ts) and send it to Ts;

However, this modification alone is not enough since it does not take into account transactions that wait transitively (now) for the additional S_holder. We shall elaborate on this through an example.

Consider the scenario shown in Fig. 5(a), where Ti > Tj for all i < j. The data item Obj1 is share locked by T1, and the data items Obj4 and Obj2 are exclusively locked by T4 and T2, respectively. Transactions T4 and T2 wait for exclusive locks to be granted on data items Obj1 and Obj4, respectively. Unlike the data manager D1, D4 finds the antagonistic conflict T2-T4, initiates probe(T2, T4), and sends it to its holder T4. T4 saves the probe in its probe_Q, and propagates it to D1, where it is an X_requester. On receiving probe(T2, T4), D1 discards it since its holder T1 is of higher priority than T2, the initiator of the probe.

Some time later, another transaction T3 requests an S_lock on the data item Obj1. Since Obj1 is in S_mode, D1 grants the S_lock request of T3 immediately. T3 is the additional S_holder of Obj1, and now T4 waits for T3 as well. Since T3 > T4, D1 does not initiate any probe. Later, T3 requests an X_lock for data item Obj2 (held by T2), and waits. As illustrated in Fig. 5(b), this request forms the deadlock cycle T3-T2-T4-T3, which has only one antagonistic conflict, i.e., T2-T4. But the probe(T2, T4) initiated due to this conflict was discarded by D1, before T3 requested the S_lock on Obj1. And hence, this deadlock will remain undetected.

To handle such cases, a data manager, when it grants an S_lock to an additional S_holder Ts, needs to propagate to Ts copies of the probes received (may be only some of them), prior to granting the S_lock to Ts. However, in the basic scheme, a data manager does not preserve the probes it receives.

There are two possible solutions to this problem.
1) When a data manager schedules an additional S_holder Ts, it asks all X_requesters queued in its Xrequest_Q to retransmit their probe_Q elements so that relevant probes can be propagated to Ts.
2) Alternatively, a data manager keeps all probes received in its own probe_Q, and later, when it schedules an additional S_holder Ts, it checks, for each probe in its probe_Q, whether the initiator of the probe is of greater priority than that of Ts, and if so, propagates that probe to Ts.

The former scheme adds complexity since a data manager must keep track of its requests for probe retransmission and distinguish an original probe from a retransmitted duplicate probe. Further, the communication cost for a given configuration cannot be specified exactly. The latter scheme necessitates storage space within each data manager, but the algorithm remains simple, and the communication cost of a deadlock configuration can be specified exactly. Hence, we use the latter scheme and modify the basic algorithm as follows.

1) When a data manager receives probe(initiator, junior) from one of its requesters, it performs the following.

if the data item is in S_mode
then save the probe in the probe_Q;
for each holder
do
  if holder = initiator
  then declare deadlock and initiate deadlock resolution
  else if holder < initiator
  then propagate a copy of the probe to the holder;

2) When a data manager grants a lock to an additional S_holder Ts, it performs the following.

if Xrequest_Q is not empty
then for each X_requester, Tx,
do
  if Tx > Ts
  then initiate probe(Tx, Ts) and send it to Ts;
if the probe_Q is not empty
then for each probe, P, in its probe_Q
do
  if Ts < P.initiator
  then propagate a copy of P to Ts;

3) When a data manager exits from S_mode, it discards its probe_Q.

Post-Resolution Computation: The provision of S_mode requires only a minor modification in the deadlock resolution and post-resolution computation. Step 2) of Section III-B is modified as follows.
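The probe bookkeeping chosen above for the share-lock extension can be sketched as follows; class and field names are illustrative, and priorities are modeled as integers (larger means higher priority). The data manager keeps received probes in its own probe_Q and forwards the relevant ones when it schedules an additional S_holder.

```python
# A minimal sketch (illustrative names) of steps 1) and 2) above:
# the data manager saves probes while the item is in S_mode, and
# copies the relevant ones to each additional S_holder it schedules.

class DataManager:
    def __init__(self):
        self.probe_q = []      # probes (initiator, junior) saved while in S_mode
        self.x_request_q = []  # transactions waiting for an X_lock
        self.s_holders = []    # current S_holders

    def receive_probe(self, probe, prio):
        """Step 1): save the probe; detect a deadlock if a holder is the
        initiator, else forward to holders junior to the initiator."""
        initiator, _junior = probe
        self.probe_q.append(probe)
        deadlocked = [h for h in self.s_holders if h == initiator]
        forwarded = [(h, probe) for h in self.s_holders
                     if h != initiator and prio[h] < prio[initiator]]
        return deadlocked, forwarded

    def grant_additional_s_holder(self, ts, prio):
        """Step 2): probe senior X_requesters, and copy to Ts every saved
        probe whose initiator has higher priority than Ts."""
        self.s_holders.append(ts)
        probes_sent = [(tx, ts) for tx in self.x_request_q
                       if prio[tx] > prio[ts]]
        copies = [p for p in self.probe_q if prio[p[0]] > prio[ts]]
        return probes_sent, copies

# The scenario of Fig. 5: T1 > T2 > T3 > T4; D1 holds Obj1 for T1,
# with T4 queued as an X_requester.
prio = {"T1": 4, "T2": 3, "T3": 2, "T4": 1}
d1 = DataManager()
d1.s_holders = ["T1"]
d1.x_request_q = ["T4"]

# probe(T2, T4) arrives; T1 outranks the initiator, so it is not forwarded.
dead, fwd = d1.receive_probe(("T2", "T4"), prio)
assert dead == [] and fwd == []

# Granting the S_lock to T3 initiates no new probe (T3 > T4), but the
# saved probe(T2, T4) is copied to T3, so the cycle remains detectable.
sent, copies = d1.grant_additional_s_holder("T3", prio)
assert sent == [] and copies == [("T2", "T4")]
```

With the basic scheme, the probe would have been discarded on arrival and the incremental cycle of Fig. 5(b) would go undetected; keeping the probe_Q in the data manager is what makes the copy to T3 possible.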

When a data manager receives a clean message

if the data item is in X_mode
then propagate the clean message to the X_holder
else for each S_holder, Ts
do propagate a copy of the clean message to Ts;
if probe_Q is not empty
then purge every probe that has the victim as junior or initiator;

Storage Cost: This extended algorithm requires extra storage within each data manager for maintaining its own probe_Q. The probe_Q within a data manager exists as long as the data item is in S_mode. As soon as the data item becomes free or enters X_mode, the probe_Q is discarded.

Delay and Communication Cost: In the original database model, if a transaction enters the wait state, it can close at most one deadlock cycle (in a TWFG, a node can have at most one outgoing edge). But, in a TWFG for the extended model, a node can have several incoming and outgoing edges, and the formation of an edge may simultaneously close many cycles.

Given a TWFG of n nodes that is acyclic, the maximum number of cycles (say M) which can be closed simultaneously by the formation of a single edge is expressed by the following equation:

M = C(n-1, 1) + C(n-1, 2) + ... + C(n-1, n-1)

where C(n-1, 1) is the number of cycles of length 2, C(n-1, 2) is the number of cycles of length 3, etc.

Depending upon the type of configuration for each cycle, we can calculate the delay and the communication cost based on the formula given in Section IV-A.

For example, consider the TWFG given in Fig. 6(a). All data items are locked by S_lock requests and all edges are due to waiting X_lock requests. Ti > Tj for all i < j. Obj1 is share locked by T1; Obj2 is share locked by T1 and T2; Obj3 by T1, T2, and T3; and Obj4 by T2, T3, and T4. The X_lock requests of T2, T3, and T4 wait for data items Obj1, Obj2, and Obj3, respectively. Until now, there is no deadlock cycle in the TWFG.

When T1 issues an X_lock request for the data item Obj4 and waits, as illustrated in Fig. 6(b), it simultaneously closes seven cycles. (The number can be derived from the above equation.) Three cycles of length 2 (viz. T2-T1-T2, T3-T1-T3, and T4-T1-T4); three cycles of length 3 (viz. T3-T2-T1-T3, T4-T2-T1-T4, and T4-T3-T1-T4); and one cycle of length 4 (viz. T4-T3-T2-T1-T4) are formed. Though there are seven cycles in the TWFG, there exist only three antagonistic conflicts: T1-T2, T1-T3, and T1-T4. Hence, only three probes will originate.

Since T1 is the highest priority transaction of every cycle, all probes will have T1 as their initiator, and all deadlock cycles will be independently detected by the various data managers for which T1 is an S_holder. Since the algorithm chooses the lowest priority transaction as the victim, all transactions except T1 will be junior in at least one of the three probes, and hence, in the worst case, all transactions except T1 may get aborted. On the contrary, if the initiator is chosen to be the victim, then all cycles can be broken simultaneously by aborting only T1.

Fig. 6. (a) A TWFG with multiple outgoing edges. (b) An X_lock request by T1 simultaneously closes seven cycles.

However, this latter scheme may result in cyclic restart for the transaction T1.

In the case of multiple cycles, the early abortion of one transaction may resolve many cycles simultaneously. For example, if T2 gets aborted on detection of the T2-T1-T2 cycle, the cycles T3-T2-T1-T3, T4-T2-T1-T4, and T4-T3-T2-T1-T4 will also get resolved simultaneously. This may result in the discarding of many probes and clean messages. Hence, in this case, we can compute only the limits (best and worst) of the delay and communication cost for a specific configuration. The exact cost will depend upon many other factors such as the scheduling policy of data managers, characteristics of the communication substrate, etc.

B. Simultaneous Acquisition of Multiple Locks

Let us now consider the refinement which allows a transaction to issue more than one lock request simultaneously. If its requests are not granted immediately, a transaction simultaneously waits for a number of transactions (in a TWFG, a node will have several outgoing edges).

The modification needed for step 2) of the basic algorithm of Section III-A is as follows.

When a transaction issues more than one lock request simultaneously, if all lock requests are not granted immediately (i.e., it waits for multiple locks), it sends a copy of each probe stored in its probe_Q to all data managers for which it is a requester.

Now, in the TWFG, a transaction can be the tail of multiple edges. The nature of this wait-for graph is the same as that caused by multiple S_holders. And hence, its characteristics will also be the same.

From the above argument, we can deduce that, in a model
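The cycle count M for the extended model can be checked numerically: the sum of binomial coefficients collapses to 2^(n-1) - 1, and for the four-node TWFG of Fig. 6 it yields the seven cycles enumerated in the example. A quick check (Python; the function name is illustrative):

```python
from math import comb

def max_cycles_closed(n):
    """Maximum number of cycles closed simultaneously by one new edge
    in an acyclic TWFG of n nodes: sum of C(n-1, k) for k = 1..n-1,
    where C(n-1, k) counts the cycles of length k + 1."""
    return sum(comb(n - 1, k) for k in range(1, n))

# Fig. 6: n = 4 nodes; 3 cycles of length 2, 3 of length 3, 1 of length 4.
assert [comb(3, k) for k in (1, 2, 3)] == [3, 3, 1]
assert max_cycles_closed(4) == 7

# The sum collapses to 2**(n-1) - 1.
assert all(max_cycles_closed(n) == 2 ** (n - 1) - 1 for n in range(2, 12))
```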

that provides share as well as exclusive lock requests, and also allows a transaction to issue more than one lock request simultaneously, the characteristics of the graph as well as the complexity of deadlock detection will be similar to those described in the previous subsection.

VI. HANDLING NESTED TRANSACTIONS

Fig. 7. A transaction tree.

We shall now discuss the applicability of our algorithm to detect deadlocks that may occur in an environment where a transaction can be nested within another transaction. The concept of a nested transaction permits a transaction to decompose its task into several subtasks and initiate a new transaction (called a nested transaction or subtransaction) to perform each of the subtasks. A nested transaction, in turn, may initiate its own set of nested transactions, thus giving rise to a hierarchy (or tree) of transactions. Since the nesting of transactions follows a tree structure, we use the terms root, leaf, parent, child, ancestor, and descendant with the usual connotations. Using nested transactions, it is possible to achieve higher concurrency and a higher degree of resilience against failures [6].

A. A Model for Nested Transactions

During its execution, a transaction can create a set of nested transactions, which will be its children, simultaneously. After creating its children, a parent transaction cannot resume execution until all its children commit or abort. However, a (parent) transaction may abort at any time, either explicitly because a child aborted, or implicitly because an ancestor aborted. A transaction, whether nested or not, always has the properties of failure atomicity and concurrency transparency. However, a nested transaction has an additional property: even if a nested transaction commits, this commitment is only conditional, and the commitment of its effects, i.e., the installation of the new states of the objects modified by it, is dependent on whether its parent transaction commits or not. This commit dependency follows from the property of atomicity. We allow arbitrary nesting of transactions, and hence the commit dependency is transitive.

Consider the transaction tree shown in Fig. 7. If A, B, and F are three transactions such that A created B, which then created F, the effects of F must be committed only when both B and A commit. It should be noted that the commit dependency relation is asymmetric: only children are dependent on their parents and not vice versa. Thus, a transaction may commit even if some (or all) of its children are aborted.

Once all its children commit or abort, a parent transaction can resume execution, and it can create a new set of children. A transaction is in the wait state if either
1) it is waiting for locks to be granted on some data items, or
2) it is waiting for its children to commit or abort.
Note that a transaction never runs concurrently with its children.

The commit dependency described above necessitates new locking rules. This is required because it is not the case that when a transaction commits, its effects become visible to every transaction. The visibility of the effects of a transaction is governed by the following rule [14].

The Visibility Rule: When a transaction A commits, the effects of the transaction tree rooted at A are visible to a transaction X that is external to the transaction tree only if either
1) the root transaction has no parent, or
2) the parent of the root is either X or an ancestor of X.

As an example, consider the transaction tree illustrated in Fig. 7. The effects of D will be invisible to F when D commits. Only when C commits do the cumulative effects of C, D, and E become visible to F. When C aborts, the transaction tree rooted at C has no effect, even if D and E have committed earlier.

In order to implement the above visibility rule through a locking scheme, we introduce the notion of a retainer of a data item, through the following set of rules [13], [14].
1) When an S_holder (X_holder) of a data item commits, it releases the lock it held, and the parent of the holder, if any, becomes an S_retainer (X_retainer) of the data item, unless it is already an S_retainer (X_retainer) of that item.
2) When an S_holder or the X_holder of a data item aborts, it releases the lock it held, and no new retainer is introduced.
3) When an S_retainer (X_retainer) of a data item commits, the parent of that retainer, if any, becomes an S_retainer (X_retainer) of that item, if it is not already one.
4) When an S_retainer (X_retainer) commits or aborts, it ceases to be an S_retainer (X_retainer) for that data item.

As an example, consider the transaction tree of Fig. 7. When E commits, C becomes an S_retainer (X_retainer) for all data items for which E was an S_holder (X_holder). When C commits, A becomes an S_retainer (X_retainer) for all data items for which C was an S_holder (X_holder) or S_retainer (X_retainer).

Note that there can be several S_retainers and X_retainers for a data item simultaneously. Even though there can be only one X_holder for a data item at any time, multiple X_retainers arise because a transaction tree grows and shrinks dynamically when nested transactions are created, committed, or aborted. Because of this, it is also possible that a transaction is a retainer as well as a holder of a data item simultaneously.

With the introduction of retainers, we can now restate the rules for granting locks as follows.
1) If a transaction T requests an S_lock on a data item, it can be granted if there is no X_holder for the item, and either
a) there is no X_retainer for the item, or
b) each X_retainer is either T, or an ancestor of T.
The presence of an S_holder or an S_retainer does not forbid the grant of an S_lock.
2) If a transaction T requests an X_lock on a data item, it

can be granted if there is no S_holder or X_holder for the item, and either
a) there is no S_retainer or X_retainer for the item, or
b) each S_retainer (X_retainer) is either T, or an ancestor of T.

For example, suppose that in the transaction tree of Fig. 7, F requests an S_lock for a data item for which E is an X_holder. The S_lock can be granted to F only when either there is no X_retainer or X_holder for the item, or A becomes the only X_retainer for the item, i.e., when C, D, and E commit or abort.

When an S_holder releases the lock and it introduces an S_retainer for the data item, this may result in the simultaneous scheduling of a descendant X_requester (if any). Similarly, when an X_holder releases the lock and it introduces an X_retainer for the data item, this may result in the simultaneous scheduling of a descendant X_requester (if any), or of one or more descendant S_requesters (if any).

B. Nested Transactions and Deadlock Detection and Resolution

We shall now discuss the scheme for detecting deadlocks that can arise in the nested transaction model described above. The basic detection algorithm needs to be modified in order to take into account the fact that a transaction now waits for its descendants to commit/abort. As in the basic algorithm, we shall use priorities for transactions in order to determine when to initiate a deadlock computation, as well as for deadlock resolution. Timestamps induce priorities among transactions as described earlier. However, the scheme for assigning timestamps needs to be modified to take into account nested transactions.

When a nonnested transaction (i.e., the root of a tree) is created, a (C, i) pair is generated as described in Section II, and this pair is assigned as the timestamp of the transaction. When a nested transaction is created, a (C, i) pair is generated, and a timestamp is generated for the transaction by concatenating this (C, i) pair with the timestamp of the parent transaction. Thus, the timestamp of a nested transaction is a sequence of (C, i) pairs, the length of the sequence being determined by the depth of nesting. Based on the ordering on (C, i) pairs described in Section II, timestamps of transactions are totally ordered in the following way.

Given two timestamps, X and Y, of the form X1 X2 ... Xm and Y1 Y2 ... Yn, respectively, where each Xi or Yi is a (C, i) pair, their relations are defined as follows. X is greater than Y if either
1) m > n, and for all i, 1 <= i <= n, Xi = Yi, or
2) for some i, 1 <= i <= min(m, n), X1 = Y1, X2 = Y2, ..., X(i-1) = Y(i-1), and Xi > Yi.

Note that in this order, the priority of a transaction is higher than that of its descendants.

Deadlock Detection: We now extend the deadlock detection algorithm described in Section V-A to take into account nested transactions.

The probe_Q of a data manager is split into an S_probe_Q and an X_probe_Q: the former stores the probes received from S_requesters, and the latter stores the probes received from X_requesters. A transaction has only one probe_Q.

1) If a data manager cannot grant a lock requested by a transaction, it acts as follows.

if the lock request of a transaction, T, cannot be honored
then begin
  for each X_retainer and the X_holder (if any), Tx,
  do
    if Tx < T
    then initiate probe(T, Tx) and send it to Tx;
  if an X_lock was requested
  then for each S_retainer and each S_holder, Ts,
  do
    if Ts < T
    then initiate probe(T, Ts) and send it to Ts
end;

Note that in no case will a transaction send a probe to its ancestor, since an ancestor always has higher priority.

2) When a transaction begins to wait for a data item, or for its children to commit/abort, it transmits each probe in its probe_Q to the data manager, or to its children.

3) When a transaction T receives a probe P, it performs the following.

if P.junior > T then P.junior := T;
save P in the probe_Q;
if T is waiting for its children to commit/abort
then transmit a copy of the saved probe to each child
else if T is waiting for a data item
then transmit a copy of the saved probe to the data manager;

4) When a data manager receives a probe P from a transaction T, it acts as follows.

if T is waiting for an S_lock
then save the probe in S_probe_Q
else save the probe in X_probe_Q;
if P.initiator is either a retainer or the holder, or
P.initiator is a descendant of a retainer or of the holder
then declare deadlock and initiate deadlock resolution
else begin
  for each X_retainer and the X_holder (if any), Tx,
  do
  begin
    if P.initiator > Tx
    then propagate the probe P to Tx
  end;
  if T is waiting for an X_lock
  then for each S_retainer and each S_holder (if any), Ts,
  do
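The timestamp order for nested transactions defined above (a proper extension is greater; otherwise the first differing pair decides) is exactly lexicographic comparison of (C, i) sequences, so a sketch can lean on Python's tuple ordering; the pair values used here are illustrative.

```python
# Nested-transaction timestamps as sequences of (C, i) pairs.
# Python's tuple comparison is lexicographic and treats a proper
# extension as greater, matching rules 1) and 2) above.

def greater(x, y):
    """True if timestamp x is greater than timestamp y."""
    return tuple(x) > tuple(y)

root = [(5, 1)]             # a nonnested (root) transaction
child = [(5, 1), (7, 2)]    # its child: the parent's timestamp plus a new pair
other = [(6, 1)]            # an unrelated root transaction

# Rule 1): a descendant's timestamp is greater than its ancestor's,
# i.e., the ancestor has the higher priority.
assert greater(child, root) and not greater(root, child)

# Rule 2): the first differing (C, i) pair decides.
assert greater(other, child)
```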

    if P.initiator > Ts
    then propagate the probe P to Ts
end;

5) When a new retainer or holder is introduced for a data item, the data manager acts as follows. (Note that when a new retainer is introduced, the data manager may have simultaneously scheduled a descendant X_requester, or one or more descendant S_requesters, i.e., the introduction of a new retainer may result in the simultaneous introduction of new holder(s) as well.)

if an S_holder or an S_retainer, Ts, is introduced then
begin
  for each requester, T, in X_request_Q
  do
    if T > Ts
    then initiate probe(T, Ts) and send it to Ts;
  for each probe, P, in X_probe_Q
  do
    if P.initiator > Ts
    then send a copy of P to Ts
end
else % an X_holder or an X_retainer, Tx, is introduced
begin
  for each requester, T, in S_request_Q or X_request_Q
  do
    if T > Tx
    then initiate probe(T, Tx) and send it to Tx;
  for each probe, P, in S_probe_Q or X_probe_Q
  do
    if P.initiator > Tx
    then send a copy of P to Tx
end;

In this extended algorithm, it is possible that a transaction may receive more than one probe with the same value for initiator. This may arise because the transaction, as well as some of its ancestors, may be retainers or holders for a data item simultaneously. In such cases, the transaction needs to process only the probe that it receives first, and it may discard the others. In Section VII, we discuss this issue again.

Deadlock Resolution and Post-Resolution Computation: As in the basic algorithm, we abort only the lowest priority transaction to resolve the deadlock. However, the scheme for handling clean messages requires some modifications, as given below.

1) When a transaction T receives a clean message, it acts as follows.

if T is in the wait state
then if T = initiator
  then discard the clean message
  else if T is waiting for its children
    then propagate a copy of the clean message to every child
    else propagate the clean message to the data manager where it is waiting.

2) When a data manager receives a clean message, it updates its S_probe_Q and X_probe_Q, and propagates the message to all holders and retainers.

Fig. 8. A deadlock cycle with nested transactions.

An Illustrative Example: Let us illustrate the working of this extended algorithm for detecting deadlocks through an example.

Consider the scenario shown in Fig. 8. A transaction T1 requests an X_lock for the data item Obj1. The lock cannot be granted since another transaction, T2, is an X_holder for Obj1. T2 has created a child T21 and is waiting for T21 to commit. T21 is waiting for an S_lock on another data item, Obj2, which has T1 as an X_retainer. (T1 had earlier created a child T11 which held the item Obj2 in X_mode, and it has committed.)

In the above situation, a deadlock T1-T2-T21-T1 occurs when T1 begins to wait for Obj1. Let us illustrate how this deadlock is detected. We consider two possible cases.

Case 1: T1 > T2. By definition, it follows that T1 > T21. When the data manager of Obj1, D1, receives the lock request from T1, it originates probe(T1, T2) and sends it to T2. When T2 receives this probe, it saves the probe in its probe_Q and propagates it to its child T21.

When T21 receives probe(T1, T2), it modifies it to probe(T1, T21), saves it in its probe_Q, and propagates it to D2, the data manager of Obj2.

When D2 receives probe(T1, T21), it detects a deadlock since the initiator of the probe, T1, is an X_retainer for the item. The deadlock is resolved by aborting T21.

Case 2: T2 > T1. By definition, it follows that T21 > T1. Before T1 issues its X_lock request for the data item Obj1, its probe_Q contains probe(T21, T1). This is due to the fact that when D2 cannot grant the S_lock to T21, it initiates probe(T21, T1) and sends it to T1. Upon receiving this probe, T1 saves it in its probe_Q.

When T1 waits for an X_lock on Obj1, it propagates probe(T21, T1), contained in its probe_Q, to D1.

Upon receiving probe(T21, T1), D1 detects a deadlock since the initiator of the probe, T21, is a descendant of T2, which is the X_holder of Obj1. The deadlock is resolved by aborting T1.

C. Comparison to Related Work

Moss [13] has also proposed an edge-chasing algorithm for detecting deadlocks taking into account nested transactions. As described earlier, a major difference between his algorithm and ours is that in Moss' scheme, probes are not stored within transactions and data managers, and his scheme relies on periodic retransmission of probes to ensure the eventual detection of deadlocks. Apart from this, in Moss' scheme, a data manager sends a probe not to the holders of the item, but always to the "potential" retainers. Because of this, his algorithm is prone to detect phantom or false deadlocks.

For example, consider the scenario shown in Fig. 9. There are two transactions, T1 and T2, where T1 > T2. T2 has created two children, T21 and T22. T1 waits for an X_lock on an item
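The retainer-aware lock-granting rules of Section VI-A can be condensed into two predicates; the tree encoding and the names below are illustrative, following the transaction tree of Fig. 7 (A is the root, B and C its children, F a child of B, D and E children of C).

```python
# Sketch of the retainer-aware grant rules; names are illustrative.
parent = {"A": None, "B": "A", "C": "A", "F": "B", "D": "C", "E": "C"}

def ancestors(t):
    """All proper ancestors of transaction t in the tree."""
    out = set()
    while parent[t] is not None:
        t = parent[t]
        out.add(t)
    return out

def can_grant_s(t, x_holder, x_retainers):
    """S_lock: no X_holder, and every X_retainer is t or an ancestor of t.
    S_holders and S_retainers do not forbid the grant."""
    return x_holder is None and all(
        r == t or r in ancestors(t) for r in x_retainers)

def can_grant_x(t, s_holders, x_holder, s_retainers, x_retainers):
    """X_lock: no holders at all, and every retainer (of either kind)
    is t or an ancestor of t."""
    return not s_holders and x_holder is None and all(
        r == t or r in ancestors(t) for r in s_retainers | x_retainers)

# The example from Section VI-A: F may be granted an S_lock once A is
# the only X_retainer, since A is an ancestor of F ...
assert can_grant_s("F", None, {"A"})
# ... but not while C, which is not an ancestor of F, still retains.
assert not can_grant_s("F", None, {"A", "C"})
```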

Obj1, which has T21 as the X_holder. T22 is waiting for an X_lock on another item, Obj2, which has T1 as the X_holder. T21 is active.

Fig. 9. Moss' scheme: phantom deadlock.

Given this situation, a deadlock occurs only when T21 commits. If T21 aborts of its own accord, say due to some application considerations, no deadlock results. However, in Moss' scheme, when T1's request arrives, D1 sends a probe to T2 even if T21 is active. T2 propagates this probe to T21 and T22. T21 ignores this probe since it is active. But T22 propagates it to D2, which detects a deadlock. Meanwhile, if T21 aborts, the deadlock detected is a false deadlock. In our scheme, no such false deadlock will be detected, since D1 sends a probe to T2 only when it becomes an X_retainer (i.e., when T21 commits).

In general, however, our scheme may also detect phantom deadlocks, but such deadlocks become false only if waiting transactions abort, explicitly (on a user's request) or implicitly (due to a site crash), after the cycle-detecting probe has traversed through them, but not otherwise.

VII. DISCUSSION

A. Delaying the Initiation of a Probe

Currently, in our algorithm, a data manager initiates a probe as soon as it finds an antagonistic conflict at its site. But an antagonistic conflict is a potential deadlock situation only if the holder transaction is in the wait state, but not otherwise. Hence, the initiation and the propagation of the probe can be delayed until the holder enters the wait state. We suggest that a data manager, upon the occurrence of an antagonistic conflict, should wait for a specific time period and only then initiate the probe and send it to the holder. Similarly, the propagation of probes received by a data manager can be delayed.

B. Dynamic Assignment of Priorities

Another orthogonal technique that can be incorporated to improve performance is to assign a priority to a transaction only on a demand basis and not a priori. As long as a transaction does not get into conflict with a transaction in the wait state, it need not be assigned a priority. Whenever a conflict arises with a waiting transaction, transactions must be assigned priorities, if possible, in such a way that the conflict is nonantagonistic. Otherwise, an antagonistic conflict has occurred and a probe is initiated. Now, a transaction for which a priority has not been assigned never causes an antagonistic conflict. Thus, by employing a scheme for dynamic assignment of priorities [1], the occurrence of antagonistic conflicts, and consequently the initiation of probes, can be reduced still further.

C. Other Mechanisms for Assigning Priorities

In our algorithm, we have used timestamps for assigning priorities. However, our scheme is applicable even if some other mechanism is used for assigning priorities. The only requirement is that the mechanism must induce a total order on transactions. For example, the number of resources held by a transaction can be used to assign a priority to it. To guarantee uniqueness, we may append the timestamp of the transaction to the number of resources held. Notice that in this scheme, the priority of an active transaction changes dynamically as it acquires resources, but if a transaction is in the wait state its priority does not change. Because of this, the nature of a conflict (antagonistic or otherwise) does not change dynamically, and hence, our algorithm is applicable to this dynamic priority scheme as well.

D. Avoidance of Phantom Deadlocks

In our algorithm, if a waiting transaction which is a component of a deadlock cycle aborts (either due to a site crash, the abort of its parent or a child, or a user request) after the detecting probe has traversed through it, we may find a phantom deadlock. Since a situation of this kind is unpredictable, our algorithm comes about as close as possible to avoiding the detection of phantom deadlocks. The possibility of a phantom deadlock can be reduced even further if the victim transaction does not abort itself until the clean message initiated by it comes back to it after circulating through the entire deadlock cycle. This requires a clean message to traverse beyond the initiator (note that in the algorithm described in Section III, the clean message does not go beyond the initiator).

E. Discarding Duplicate Probes

In our basic algorithm, there is a possibility that some probes may circulate through a deadlock cycle more than once. Suppose, for example, a transaction which is not part of a deadlock cycle, but waits (perhaps transitively) for a member transaction of a cycle, inserts a probe into the deadlock cycle. If the outside transaction is of lower priority than the highest priority transaction of the cycle, the inserted probe ceases to propagate at some point in the cycle. On the other hand, if the outside transaction is of higher priority than the highest priority transaction of the cycle, the inserted probe propagates through the entire cycle, and keeps circulating until the deadlock cycle is broken. (Note that a probe never propagates through the entire cycle if its initiator is a member of the cycle.)

For example, consider the configuration (an extension of the configuration given in Fig. 2) shown in Fig. 10. Here the transaction T2 has acquired X_locks on data items Obj2 and Objx before it entered the wait state. A transaction Tx, which is not a member of the deadlock cycle (called an external transaction), requests a lock on Objx and waits. For simplicity, we assume that Tx enters the wait state after the deadlock cycle T1-TN-T1 is formed.

If Tx > T2 (but not otherwise), the data manager Dx will initiate probe(Tx, T2) and send it to the holder T2. Now, a probe ini-
SINHA AND NATARAJAN: DISTRIBUTED DEADLOCK DETECTION ALGORITHM 79

Fig. 10. Propagation of an external probe in a deadlock cycle. [Figure: the chain TN, TN-1, ..., T2, T1 linked through data items ObjN-1, ObjN-2, ..., Obj2, Obj1 and ObjN, with the external transaction Tx waiting for Objx.]

tiated by an external transaction (called an external probe) enters the deadlock cycle. T2 will save the probe in its probe_Q, and since it is waiting for Obj1, will propagate probe(Tx, T2) to D1.

If T1 > Tx, i.e., the external transaction's priority is lower than that of the highest priority transaction of the cycle, D1 will discard the probe. On the other hand, if Tx > T1, D1 will propagate the probe to T1. Once this probe has crossed over the highest priority transaction of the deadlock cycle, it will cover the entire cycle and will be saved in the probe_Qs of all member transactions (and data managers). This is correct, since the external transaction Tx waits directly or transitively on all member transactions of the deadlock cycle. But since Tx > T1, the probe will keep circulating in the cycle indefinitely (until the cycle is broken), and a member transaction may receive a probe whose initiator is the initiator of some probe already stored in its probe_Q. Such a probe can be considered a duplicate, and it should be discarded. To discard these duplicate probes, the following modification to the basic algorithm is needed.

When a transaction receives a probe from a data manager, it discards the probe if there exists a probe in its probe_Q which has an identical initiator.

F. Fair Scheduling of Exclusive Locks

The policy discussed in Section V, of granting an S_lock request when an X_lock request is already pending, is unfair to X_requesters. A fair scheduling policy would be as follows.

When a transaction T requests an S_lock, it is granted if there is no X_holder, and no X_requester of higher priority than T.

Such a scheme ensures that an X_requester will never encounter antagonistic conflicts incrementally. However, even in this case, S_holders are introduced incrementally, and to take into account transitive waits on these additional S_holders, we need to maintain probe_Qs within data managers. Further, now an S_requester may encounter antagonistic conflicts with some S_holders, and in such cases probes must be sent to those S_holders.

We must point out here that this fair scheduling policy is not directly applicable in the case of nested transactions, since we also have to take retainers into account. For example, suppose that for some data item there is a retainer Tr and an X_requester Tx, and let us assume that Tx > Tr. Now, when a descendant of Tr requests an S_lock, it must be granted, even though its priority is less than that of Tx. Otherwise, Tx waits for Tr, which waits for its descendant to commit, and the latter waits for Tx, resulting in a deadlock. Hence, in the case of nested transactions, the above fair scheduling policy can be enforced only when no ancestor of the requesting transaction is a retainer (S_retainer or X_retainer) of the data item. Thus, in this case, an X_requester may encounter antagonistic conflicts incrementally.

G. Computation of Cycle Length

Since we use an edge-chasing algorithm, it is quite simple to compute the length of a deadlock cycle. For this purpose, a probe should have an additional parameter, say length (l), which is set to one to start with. When a transaction receives a probe P, it increments P.l by one before saving it in its probe_Q. On receiving a probe P, if a data manager detects a deadlock, then the value of P.l gives the length of the deadlock cycle.

H. Voluntary Abort by a Transaction

Though the algorithm is designed for the detection and resolution of deadlocks, it can be used by transactions to abort voluntarily rather than wait until a deadlock cycle is formed, detected, and resolved. When a transaction receives a probe, it can decide to abort voluntarily on either of two conditions: 1) a transaction with very high priority waits for it directly or transitively, or 2) the value of P.l is very high, i.e., a long wait-for chain has already formed.

ACKNOWLEDGMENT

The authors thank the referee for his comments and suggestions. They are also thankful to Prof. K. Mani Chandy and Prof. M. Stonebraker for their helpful discussions.

REFERENCES

[1] R. Bayer, K. Elhardt, J. Heigert, and A. Reiser, "Dynamic timestamp allocation for transactions in database systems," in Distributed Databases, H. J. Schneider, Ed. Amsterdam, The Netherlands: North-Holland, 1982, pp. 9-20.
[2] P. A. Bernstein and N. Goodman, "Concurrency control in distributed database systems," ACM Comput. Surveys, vol. 13, pp. 185-221, June 1981.
[3] K. M. Chandy and J. Misra, "A distributed algorithm for detecting resource deadlocks in distributed systems," in Proc. ACM SIGACT-SIGOPS Symp. Principles of Distributed Computing, Ottawa, Ont., Canada, Aug. 1982.
[4] K. M. Chandy, J. Misra, and L. M. Haas, "Distributed deadlock detection," ACM Trans. Comput. Syst., vol. 1, pp. 144-156, May 1983.
[5] E. G. Coffman, Jr., M. J. Elphick, and A. Shoshani, "System deadlocks," ACM Comput. Surveys, vol. 3, pp. 66-78, June 1971.
[6] C. T. Davies, "Recovery semantics for a DB/DC system," in Proc. ACM Nat. Conf., vol. 28, 1973, pp. 136-141.
[7] K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger, "The notion of consistency and predicate locks in a database system," Commun. ACM, vol. 19, pp. 624-633, Nov. 1976.
[8] V. D. Gligor and S. H. Shattuck, "On deadlock detection in distributed systems," IEEE Trans. Software Eng., vol. SE-6, pp. 435-440, Sept. 1980.
[9] J. N. Gray, "Notes on database operating systems," in Operating Systems, An Advanced Course (Lecture Notes in Computer Science 60). Berlin, Germany: Springer-Verlag, 1978, pp. 398-481.
[10] R. C. Holt, "Some deadlock properties of computer systems," ACM Comput. Surveys, vol. 4, pp. 179-195, Dec. 1972.
[11] L. Lamport, "Time, clocks, and the ordering of events in a distributed system," Commun. ACM, vol. 21, pp. 558-565, July 1978.
[12] D. A. Menasce and R. R. Muntz, "Locking and deadlock detection in distributed databases," IEEE Trans. Software Eng., vol. SE-5, pp. 195-202, May 1979.
[13] J. E. B. Moss, "Nested transactions: An approach to reliable distributed computing," Lab. Comput. Sci., Massachusetts Inst. Technol., Cambridge, MA, Tech. Rep. 260, Apr. 1981.
[14] N. Natarajan, "Communication and synchronization in distributed programs," Ph.D. dissertation, National Centre for Software Development and Computing Techniques, Tata Inst. Fundamental Res., Bombay, India, Nov. 1983.
[15] R. Obermarck, "Distributed deadlock detection algorithm," ACM Trans. Database Syst., vol. 7, pp. 187-208, June 1982.
[16] D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis, "System level concurrency control for distributed database systems," ACM Trans. Database Syst., vol. 3, pp. 178-198, June 1978.

Mukul K. Sinha was born in Patna, India, on September 27, 1950. He received the B.Sc. (Engineering) degree in electrical engineering from Bihar Institute of Technology, Sindri, India, in 1968, the M.Tech. degree in electrical engineering from Indian Institute of Technology, Kanpur, India, in 1971, and the Ph.D. degree in computer science from the University of Bombay, Bombay, India, in 1983.
He is currently working as a Scientific Officer at the National Centre for Software Development and Computing Techniques, Bombay. From September 1979 to August 1980, he was a Visiting Engineer in the Computer Systems Research Group at the Massachusetts Institute of Technology, where he worked on concurrency control problems in distributed systems. He has designed and implemented various systems which include compilers, general purpose graphics systems, multiprocessor operating systems, and a file server for a local area network. His current research interests are operating systems, database concurrency control, and local area networks.

N. Natarajan was born in Madras, India, on June 28, 1950. He received the B.E. (Hons.) degree in electronics and communication engineering from the University of Madras, Madras, in 1972, the M.E. degree in automation from the Indian Institute of Science, Bangalore, India, in 1974, and the Ph.D. degree in computer science from the University of Bombay, Bombay, India, in 1983.
He has been working with the National Centre for Software Development and Computing Techniques, Tata Institute of Fundamental Research, Bombay, since 1974, where he has worked on compilers, an operating system for a multiprocessor, and the design of a local area network. He visited the Laboratory for Computer Science, Massachusetts Institute of Technology, during 1979-1980. His research interests include operating systems, programming languages, computer networks, and distributed systems.
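The probe bookkeeping of Section VII can be gathered into a short sketch: duplicate probes are discarded by initiator (Section VII-E), the length field is incremented on receipt (Section VII-G), and a very long wait-for chain can trigger a voluntary abort (Section VII-H). The paper specifies only these rules, so every class, method, and threshold name below is a hypothetical illustration, not the authors' implementation:

```python
class Probe:
    """A probe carries its initiator and, per Section VII-G, a length counter."""
    def __init__(self, initiator, holder):
        self.initiator = initiator  # transaction that caused the probe
        self.holder = holder        # transaction the probe is sent to
        self.length = 1             # set to one to start with (VII-G)

class Transaction:
    def __init__(self, tid, priority):
        self.tid = tid
        self.priority = priority
        self.probe_q = []           # probes saved while this transaction waits

    def receive_probe(self, probe, length_threshold=100):
        # VII-E: discard the probe if one with an identical initiator is saved.
        if any(p.initiator == probe.initiator for p in self.probe_q):
            return "discarded"
        # VII-G: one more wait-for edge has been traversed.
        probe.length += 1
        self.probe_q.append(probe)
        # VII-H: optionally abort voluntarily on a very long wait-for chain
        # (the threshold is an assumed tuning parameter).
        if probe.length > length_threshold:
            return "abort"
        return "saved"
```

For example, if T2 receives probe(Tx, T2) twice, the second copy is discarded by the VII-E rule, which is what stops an external probe from circulating a broken cycle forever.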

Timing Constraints of Real-Time Systems: Constructs for Expressing Them, Methods of Validating Them

B. DASARATHY, MEMBER, IEEE

Abstract-This paper examines timing constraints as features of real-time systems. It investigates the various constructs required in requirements languages to express timing constraints and considers how automatic test systems can validate systems that include timing constraints. Specifically, features needed in test languages to validate timing constraints are discussed. One of the distinguishing aspects of three tools developed at GTE Laboratories for real-time systems specification and testing is their extensive ability to handle timing constraints. Thus, the paper highlights the timing constraint features of these tools.

Index Terms-Real-time systems, requirements specification, test generation, test language, timing constraints, validation.

INTRODUCTION

DURING the past decade there has been great progress in the development of requirements languages; that is, formal languages for expressing the requirements of systems [9]. In particular, researchers have shown an interest in languages for expressing the requirements of real-time systems. Examples of such languages are REVS' RSL [1], [2], [7], CCITT's System Description Language (SDL) [5], Zave's PAISLey [13], and GTE Laboratories' Real-Time Requirements Language (RTRL) [10].

SDL, RSL, and RTRL share a common view of real-time systems. They hold that a real-time system (or the ports it serves) can be modeled as finite-state machines (FSM's) in which a response at any instance is completely determined by the system's present state and the stimulus that has arrived. The behavior of the system is captured in transitions made from one state to another state on a stimulus. PAISLey has a more general view of a real-time system in that it allows both the system and its environment to be modeled as interacting

Manuscript received July 29, 1983.
The author is with GTE Laboratories, Inc., Waltham, MA 02254.
0098-5589/85/0100-0080$01.00 © 1985 IEEE
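The FSM view shared by SDL, RSL, and RTRL — the next state and the response are completely determined by the present state and the arriving stimulus — can be sketched as a deterministic transition table. The states, stimuli, and responses below are invented for illustration (a fragment of telephone call processing) and are not taken from any of the cited languages:

```python
# Hypothetical transition table: (state, stimulus) -> (next_state, response).
# That this mapping is a function is exactly the determinism the FSM view
# described above requires.
TRANSITIONS = {
    ("idle", "off_hook"): ("await_digit", "apply_dial_tone"),
    ("await_digit", "digit"): ("collecting", "remove_dial_tone"),
    ("collecting", "on_hook"): ("idle", "release"),
}

def step(state, stimulus):
    """Return (next_state, response) for a stimulus arriving in a state."""
    return TRANSITIONS[(state, stimulus)]
```

A timing constraint would then bound, for instance, how long the machine may take to emit the response of a transition after the stimulus arrives — the kind of requirement the paper's constructs are designed to express.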