February 9, 2011
Gerard Tel
Introduction to Distributed Algorithms (2nd edition)
Cambridge University Press, 2000
Practical Information
12 lectures: Wan Fokkink
room T443
w.j.fokkink@vu.nl
exam: written
you may use Tel’s book
homepage: http://www.cs.vu.nl/~tcs/da/
copies of the slides
exercise sheets
schedule of lectures and exercise classes
pointers to relevant papers
Distributed Systems
Motivation:
I information exchange (WAN)
I resource sharing (LAN)
I multicore programming
I parallelization to increase performance
I replication to increase reliability
I modularity to improve design
Distributed Versus Uniprocessor
Formal Framework
γ ∈ C is terminal if γ → δ for no δ ∈ C.
Executions
And k = log_2(2^k).
Big O Notation
Let f, g : N → R>0.
f ∈ O(g) if, for some C > 0, f(n) ≤ C·g(n) for all n ∈ N.
f ∈ Θ(g) if f ∈ O(g) and g ∈ O(f).
By definition, a^(log_a n) = n.
a^(log_a b · log_b n) = (a^(log_a b))^(log_b n) = b^(log_b n) = n
Therefore, log_a b · log_b n = log_a n.
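The base-change identity can be checked numerically; a quick Python sketch (the sample values a = 2, b = 8, n = 64 are arbitrary):

```python
import math

# Check the base-change identity log_a n = log_a b * log_b n
# for sample values a = 2, b = 8, n = 64.
a, b, n = 2, 8, 64

lhs = math.log(n, a)                   # log_2 64 = 6
rhs = math.log(b, a) * math.log(n, b)  # 3 * 2 = 6
print(lhs, rhs)
```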
Assumptions
I asynchronous communication
Remarks:
I Algorithms for undirected channels usually include ack’s.
I Acyclic graphs are usually undirected
(else the graph would not be strongly connected).
Snapshots
(figure: the snapshot algorithm sends marker messages ⟨mkr⟩)
Chandy-Lamport Algorithm - Example
(figures: three stages of a snapshot; marker messages ⟨mkr⟩ and basic
messages m1, m2 travel between three processes, and the recorded channel
state is {m2})
Neither the send of m1 nor the internal event from red to yellow is
causally before the send of m2. So the snapshot is a configuration of
an execution in the same computation as the actual execution.
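The snapshot construction can be illustrated with a small simulation; in the Python sketch below (process names, message values and field names are made up, and delivery is sequentialized) the snapshot records exactly the kind of channel state discussed above:

```python
from collections import deque

# Minimal sketch of the Chandy-Lamport snapshot algorithm on two
# processes p and q connected by FIFO channels. A process state is an
# integer; a basic message ('m', k) adds k to the receiver's state.
channels = {('p', 'q'): deque(), ('q', 'p'): deque()}

class Process:
    def __init__(self, pid):
        self.pid, self.state = pid, 0
        self.recorded = None   # recorded local state, None = not yet
        self.chan = {}         # sender -> messages recorded in transit
        self.open = {}         # sender -> still recording that channel?

    def snapshot(self, others):
        # Record the own state; send a marker into every outgoing channel.
        self.recorded = self.state
        for q in others:
            self.open[q], self.chan[q] = True, []
            channels[(self.pid, q)].append('mkr')

    def deliver(self, sender, others):
        msg = channels[(sender, self.pid)].popleft()
        if msg == 'mkr':
            if self.recorded is None:
                self.snapshot(others)
                self.chan[sender] = []   # channel recorded as empty
            self.open[sender] = False    # marker closes this channel
        else:
            self.state += msg[1]
            if self.recorded is not None and self.open.get(sender):
                self.chan[sender].append(msg)

p, q = Process('p'), Process('q')
channels[('p', 'q')].append(('m', 1))  # m1 is in transit when...
p.snapshot(['q'])                      # ...p initiates the snapshot
channels[('q', 'p')].append(('m', 2))  # q sends m2 before seeing a marker
q.deliver('p', ['p'])                  # q receives m1
q.deliver('p', ['p'])                  # q receives the marker, snapshots
p.deliver('q', ['q'])                  # p receives m2: recorded in transit
p.deliver('q', ['q'])                  # p receives q's marker

# Recorded states plus in-transit messages match the conserved total.
print(p.recorded, q.recorded, p.chan['q'])
```

The recorded states (0 and 1) plus the recorded in-transit message m2 sum to the same total as the final states, so the snapshot is consistent.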
Lai-Yang Algorithm
G = (V , E ) is an undirected graph.
R1 A node never forwards the token through the same channel twice.
(figures: two traversals of a graph with nodes u, v, w, x, y; the numbers
give the order in which the token traverses the edges)
Depth-First Search with Neighbor Knowledge
I A node holding the token for the first time informs all neighbors
except its father and the node to which it forwards the token.
I The token is only forwarded to nodes that were not yet visited
by the token (except when a node sends the token to its father).
Awerbuch’s Algorithm - Complexity
At least once per time unit, a token is forwarded through a tree edge.
Each tree edge carries 2 forwarded tokens.
Cidon’s Algorithm - Example
(figure: a run of Cidon's algorithm on a graph with nodes u, v, w, x, y;
the numbers give the order of the token forwardings)
Tree Algorithm
(figure: in the tree algorithm, two neighbouring nodes decide)
Question
Adapt the echo algorithm to compute the sum of these integer values.
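One possible answer, as a sketch: each node reports to its father the sum of its own value and the sums reported by its children. The Python below sequentializes the wave with recursion (the spanning tree it builds is a depth-first tree rather than the echo wave's tree, but the computed sum is the same); the graph and values are made up:

```python
# Sketch of the echo algorithm adapted to sum integer values: the
# initiator floods a wave; every other node adopts the first sender as
# its father and, once all neighbours have replied, reports the sum of
# its subtree to its father. Recursion stands in for message passing.
def echo_sum(graph, values, initiator):
    visited = set()

    def visit(node):
        visited.add(node)
        total = values[node]
        for nb in graph[node]:
            if nb not in visited:
                total += visit(nb)   # nb's reply carries its subtree sum
        return total

    return visit(initiator)

# Example: a small undirected graph (illustrative values).
graph = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b', 'd'], 'd': ['c']}
values = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
print(echo_sum(graph, values, 'a'))   # 10
```

Any node can act as initiator; the result is always the sum of all values.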
Termination Detection
(figure: state diagram; an active process can perform send and internal
events, and a receive event can make a passive process active)
A logical clock provides (basic and control) events with a time stamp.
The time stamp of a process is the time stamp of its last event
(initially it is 0).
Only quiet processes that have been quiet from some time ≤ t onward
take part in the wave.
When p takes part in the wave, its logical clock becomes > t.
So q will not take part in the wave (because it is tagged with t).
Weight-Throwing Termination Detection
(figure: processes p0, q, r, s)
s becomes passive.
Solution: Processes are colored white or black. Initially they are white,
and a process that sends a basic message becomes black.
At each round trip, the token carries the sum of the counters of
the processes it has traversed.
Still the token may compute a negative sum for a round trip, when
a visited passive process receives a basic message, becomes active,
and sends basic messages that are received by an unvisited process.
Safra’s Algorithm
- If the token and p0 are white and the sum of all counters is zero,
p0 calls Announce.
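A round trip of the token can be sketched as follows (an illustrative Python simulation, not Tel's pseudocode; all names are made up, and message delivery is shown as immediate):

```python
# Sketch of Safra's termination detection on a ring (illustrative names).
class Proc:
    def __init__(self):
        self.active = False
        self.count = 0       # basic messages sent minus received
        self.color = 'white'

def basic_send(p, q):
    p.count += 1
    p.color = 'black'        # sending a basic message blackens the sender
    q.count -= 1             # (delivery shown immediately for brevity)
    q.active = True

def token_round(procs):
    """p0 = procs[0] sends the token around the ring; returns True iff
    p0 may call Announce after the round trip (in the real algorithm the
    token waits at an active process; here we just abort the round)."""
    if procs[0].active:
        return False
    color, total = procs[0].color, procs[0].count
    procs[0].color = 'white'
    for p in reversed(procs[1:]):   # token travels pN-1, ..., p1
        if p.active:
            return False
        total += p.count
        if p.color == 'black':
            color = 'black'
        p.color = 'white'           # whiten after forwarding the token
    return color == 'white' and total == 0

procs = [Proc() for _ in range(3)]
r1 = token_round(procs)         # all passive, counters sum to 0: True
basic_send(procs[1], procs[2])  # p1 activates p2
procs[2].active = False         # p2 becomes passive again
r2 = token_round(procs)         # p1 is black: no announce yet
r3 = token_round(procs)         # clean white round trip, sum 0: announce
print(r1, r2, r3)
```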
Drawbacks:
I Reference counting cannot reclaim cyclic garbage.
I In a distributed setting, creating, duplicating or deleting
a remote reference usually induces a message.
Reference Counting - Race Conditions
Examples:
I A process is waiting for one message from a group of processes:
N=1
I A database transaction first needs to lock several files: N = M.
Wait-For Graph
(figure: a wait-for graph)
Static Analysis - Example 1
(figure: this wait-for graph cannot be reduced)
Deadlock
Static Analysis - Example 2
(figures: this wait-for graph is reduced step by step)
No deadlock
Bracha-Toueg Deadlock Detection Algorithm - Snapshot
Then it computes:
Out_u : the nodes it has sent a request to (not granted)
In_u : the nodes it has received a request from (not purged)
Bracha-Toueg Deadlock Detection Algorithm
When the initiator has received DONE from all nodes in its Out set,
it checks the value of its free field.
u is the initiator.
Bracha-Toueg Deadlock Detection Algorithm - Example
(figures: a run on a wait-for graph with nodes u, v, w, x, with u the
initiator; NOTIFY, GRANT, ACK and DONE messages are exchanged, the
request counters nu, nv, nw drop to 0, and each node awaits DONE and ACK
replies before reporting back)
In their original algorithm, Bracha and Toueg do not take into account
channel states in the snapshot of the wait-for graph.
Assumptions:
(figure: a directed ring with processes 0, 1, …, N−1)
clockwise: N(N+1)/2 messages
In each update round, at least half of the active nodes become passive.
So there are at most log2 |V | rounds.
(figure: the same ring)
after 1 round only node 0 is active
(figure: election on a graph with nodes u, v, w, x, y; the node with
id 0 becomes the leader)
Election in Acyclic Graphs
(figures: election in an acyclic graph with ids 1–6; in successive
rounds the smallest id 1 spreads through the graph, and a node with
id 1 becomes the leader)
Echo Algorithm with Extinction
Election for undirected graphs, based on the echo algorithm:
I Each initiator starts a wave, tagged with its id.
I At any time, each node takes part in at most one wave.
I Suppose a node u in wave v is hit by a wave w :
- if v > w , then u changes to wave w
(it abandons all earlier messages);
- if v < w , then u continues with wave v
(it purges the incoming message);
- if v = w , then the incoming message is treated according to
the echo algorithm of wave w .
I If wave u executes a decide event (at u), u becomes leader.
(Or weighted edges can be totally ordered by also taking into account
the id’s of endpoints of an edge, and using a lexicographical order.)
(figure: a weighted graph with edge weights 1, 2, 3, 8, 9, 10)
Fragments
Lemma: Let e be the lowest-weight outgoing edge of a fragment. Then e is in M.
Example:
(figure: a weighted graph with edge weights 1, 3, 8, 9, 10)
Each fragment has a unique name. Its level is the maximum number
of joins any node in the fragment has experienced.
L < L′ ∧ F →e F′ : F ∪ F′ = (L′, FN′)
L = L′ ∧ eF = eF′ : F ∪ F′ = (L+1, weight eF)
The core edge of a fragment is the last edge that connected two
sub-fragments at the same level; its end points are the core nodes.
Parameters of a node
Its state:
I sleep (for non-initiators)
I find (looking for a lowest-weight outgoing edge)
I found (reported a lowest-weight outgoing edge to the core edge)
At reception of hinitiate, L, FN, found i, a node stores L and FN,
and adopts the sender as its father.
It passes on the message through its other branch edges.
Computing the lowest-weight outgoing edge
A core node receives reports from all neighbors, including its father.
(figure: a weighted graph on nodes u, v, w, x, y with edge weights
including 3, 7, 9, 11)
uv vu ⟨connect, 0⟩   vw ⟨connect, 1⟩
uv vu ⟨initiate, 1, 5, find⟩   xu wv ⟨test, 1, 3⟩
ux vw ⟨test, 1, 5⟩   ux vw accept
yv ⟨connect, 0⟩   wx ⟨report, 7⟩
vy ⟨initiate, 1, 5, find⟩   xw ⟨report, 9⟩
yv ⟨report, ∞⟩   wv ⟨connect, 1⟩
wx xw ⟨connect, 0⟩   wv vu vy vw wx ⟨initiate, 2, 7, find⟩
wx xw ⟨initiate, 1, 3, find⟩   uw ⟨test, 2, 7⟩
xu wv accept   wu reject
uv ⟨report, 9⟩   yv uv vw xw wv ⟨report, ∞⟩
vu ⟨report, 7⟩
Anonymous Networks
Assumptions:
I All processes have the same local algorithm.
I Processes (and edges) have no (unique) id.
I The initiators are any non-empty set of processes.
Example:
(figure: processes u, v, w, x with u < v and v < w, x; a message
(v, 1, true) is sent)
Election in Arbitrary Anonymous Networks
Each message sent upwards in the constructed tree reports the size
of its subtree. All other messages report 0.
u < v < w < x < y. Only waves that complete are shown.
The process at the left computes size 6, and becomes the leader.
Computing the Size of a Network
Example:
(figure: messages (u, 2) and (v, 2) are exchanged)
Question
Each round it sends out one message, which takes at most N steps.
Itai-Rodeh Ring Size Algorithm - Example
(figure: ring with estimates (x, 3), (u, 4), (z, 4), (z, 4))
FireWire Election Algorithm
(figure: a square of nodes u, v, w, x with edge weights uv = 4, vw = 1,
wx = 1, xu = 1)
pivot u:  Dx[v] := 5  Dv[x] := 5  Nx[v] := u  Nv[x] := u
pivot v:  Du[w] := 5  Dw[u] := 5  Nu[w] := v  Nw[u] := v
pivot w:  Dx[v] := 2  Dv[x] := 2  Nx[v] := w  Nv[x] := w
pivot x:  Du[w] := 2  Dw[u] := 2  Nu[w] := x  Nw[u] := x
          Du[v] := 3  Dv[u] := 3  Nu[v] := x  Nv[u] := w
Question
Answer:
I All nodes must pick the pivots in the same order.
I Each round, nodes need the routing table of the pivot
to compute their own routing table.
Toueg’s Algorithm
Assumption: Each node knows from the start the id’s of all nodes in V .
(Because pivots must be picked uniformly at all nodes.)
Initialization:
S := ∅
forall v ∈ V do
  if u = v then Du[v] := 0; Nbu[v] := ⊥
  else if v ∈ Neighu then Du[v] := ωuv; Nbu[v] := v
  else Du[v] := ∞; Nbu[v] := ⊥
Toueg’s Algorithm - Local Algorithm at Node u
while S ≠ V do
  pick w from V \ S
  forall x ∈ Neighu do
    if Nbu[w] = x then send ⟨ys, w⟩ to x
    else send ⟨nys, w⟩ to x
  num_recu := 0
  while num_recu < |Neighu| do
    receive a ⟨ys, w⟩ or ⟨nys, w⟩ message;
    num_recu := num_recu + 1
  if Du[w] < ∞ then
    if u ≠ w then receive Dw from Nbu[w]
    forall x ∈ Neighu that sent ⟨ys, w⟩ do send Dw to x
    forall v ∈ V do
      if Du[w] + Dw[v] < Du[v] then
        Du[v] := Du[w] + Dw[v]; Nbu[v] := Nbu[w]
  S := S ∪ {w}
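The table updates above amount to Floyd-Warshall with next-hop tables. Below is a sequential Python mirror of just the arithmetic (the ys/nys message exchange is omitted; the example graph is a hypothetical four-node square):

```python
INF = float('inf')

def toueg_tables(nodes, weight):
    """Sequential mirror of the table updates in Toueg's algorithm:
    each node u keeps distance estimates D[u][v] and next hops Nb[u][v],
    and the pivots w are processed in a fixed order known to all nodes."""
    D = {u: {v: 0 if u == v else weight.get((u, v), INF) for v in nodes}
         for u in nodes}
    Nb = {u: {v: v if (u, v) in weight else None for v in nodes}
          for u in nodes}
    for w in nodes:                  # pivot round for w
        for u in nodes:
            if D[u][w] == INF:
                continue             # u does not take part in this round
            for v in nodes:
                if D[u][w] + D[w][v] < D[u][v]:
                    D[u][v] = D[u][w] + D[w][v]
                    Nb[u][v] = Nb[u][w]
    return D, Nb

# A square u-v-w-x with weights uv = 4, vw = 1, wx = 1, xu = 1.
wt = {}
for a, b, c in [('u', 'v', 4), ('v', 'w', 1), ('w', 'x', 1), ('x', 'u', 1)]:
    wt[(a, b)] = wt[(b, a)] = c
D, Nb = toueg_tables(['u', 'v', 'w', 'x'], wt)
print(D['u']['v'], Nb['u']['v'])   # 3 x
```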
Toueg’s Algorithm - Complexity + Drawbacks
Drawbacks:
(figure: graph with channels v0–u weight 4, v0–w weight 6, v0–x weight 1,
u–w weight 1, u–x weight 1, w–x weight 1)
Dv0 := 0  Nbv0 := ⊥
Dw := 6  Nbw := v0
Du := 7  Nbu := w
Dx := 8  Nbx := u
Dx := 7  Nbx := w
Du := 4  Nbu := v0
Dw := 5  Nbw := u
Dx := 6  Nbx := w
Dx := 5  Nbx := u
Dx := 1  Nbx := v0
Dw := 2  Nbw := x
Du := 3  Nbu := w
Du := 2  Nbu := x
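The flow of distance messages can be simulated centrally; a Python sketch (the graph mirrors the example above; for brevity a message reads the sender's current estimate instead of carrying a copy, which only speeds up convergence):

```python
from collections import deque

INF = float('inf')

def chandy_misra(nodes, weight, root):
    """Sketch of the Chandy-Misra message flow: the root starts with
    distance 0; a node that improves its estimate via a neighbour
    records that neighbour in Nb and informs all its neighbours."""
    D = {v: INF for v in nodes}
    Nb = {v: None for v in nodes}
    D[root] = 0
    msgs = deque((root, v) for v in nodes if (root, v) in weight)
    while msgs:
        sender, v = msgs.popleft()       # distance message arrives at v
        d = D[sender] + weight[(sender, v)]
        if d < D[v]:
            D[v], Nb[v] = d, sender
            msgs.extend((v, u) for u in nodes if (v, u) in weight)
    return D, Nb

wt = {}
for a, b, c in [('v0', 'w', 6), ('v0', 'u', 4), ('v0', 'x', 1),
                ('u', 'w', 1), ('w', 'x', 1), ('u', 'x', 1)]:
    wt[(a, b)] = wt[(b, a)] = c
D, Nb = chandy_misra(['v0', 'u', 'w', 'x'], wt, 'v0')
print(D)   # shortest distances from v0
```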
Chandy-Misra Algorithm - Complexity
For each root, the algorithm requires at most O(|V |·|E |) messages.
Merlin-Segall Algorithm
v0 starts a new update round after receiving mydist from all neighbors.
Merlin-Segall Algorithm - Termination + Complexity
Merlin-Segall Algorithm - Example (Round 1)
(figures: distance estimates and father pointers during update round 1)
Merlin-Segall Algorithm - Example (Round 2)
(figures: distance estimates and father pointers during update round 2)
Merlin-Segall Algorithm - Example (Round 3)
(figures: distance estimates and father pointers during update round 3)
Merlin-Segall Algorithm - Topology Changes
A number is attached to mydist messages.
When a channel fails or becomes operational, adjacent nodes
send the number of their update round to v0 via the sink tree.
(If the message meets a failed tree link, it is discarded.)
If the failed channel is part of the sink tree, the remaining tree
is extended to a sink tree.
When v0 receives such a message, it starts a new set of
|V |−1 update rounds, with a higher number.
Example:
(figure: graph with nodes u, v0, w, x, y, z; a channel fails)
y signals to z that the failed channel was part of the sink tree.
Link-State Routing for Dynamic Topologies
Why is that?
The Link-State Protocol is the basis for the OSPF (Open Shortest
Path First) and IS-IS (Intermediate System to Intermediate System)
protocols for routing on the Internet.
(figure: levels f and f+1 of the tree; forward/reverse messages travel
along tree edges, and explore/reverse messages probe the next level)
Breadth-First Search - A “Simple” Algorithm
Each round, tree edges carry at most 1 forward and 1 replying reverse.
Initially, the initiator is at level 0, and all other nodes are at level ∞.
In round k+1:
I ⟨forward, k⟩ travels down the tree, from the initiator
to the nodes at level k.
(forward messages from non-fathers are purged.)
Frederickson’s Algorithm
Worst-case time complexity: O(|V|²/ℓ)
Possible events:
I Generation: A new packet is placed in an empty buffer.
I Forwarding: A packet is forwarded to an empty buffer
of the next node on its route.
I Consumption: A packet at its destination node is removed
from the buffer.
(figure: acyclic orientations G1, G2, G3)
Acyclic Orientation Cover Controller
I Let vw be an edge in Gi .
The ith buffer of v is linked to the ith buffer of w , and
if i < n, the ith buffer of w is linked to the (i+1)th buffer of v .
Acyclic Orientation Cover Controller - Correctness
Suppose all (i+1)th buffers are empty in δ, for some i < n. Then
all ith buffers must also be empty in δ. For else, since Gi is acyclic,
some packet in an ith buffer could be forwarded or consumed.
Let N = 4 and t = 1.
Since one process may crash, processes can only wait for three votes.
(figure: four processes with initial values 1, 0, 1, 0)
The left nodes might all the time receive two 1-votes and one 0-vote,
while the right nodes receive two 0-votes and one 1-vote.
With a (very small) positive probability they all choose the same value.
Bracha-Toueg Crash Consensus Algorithm
(figure: example run; votes ⟨3, 0, 1⟩ and ⟨4, 0, 2⟩ with weights 1 and 2;
a process decides 0)
Bracha-Toueg Crash Consensus Algorithm - Correctness
So in round k, each correct process receives a ⟨q, b, w⟩ with w > N/2.
After round k+1, all correct processes have value b with weight N−t.
Let t < N/3.
A correct process decides b upon receiving > (N−t)/2 + t = (N+t)/2
b-votes in one round. (Note that (N+t)/2 < N−t.)
Echo Mechanism
(figures: without an echo mechanism, a Byzantine process can make the
correct processes decide differently: one correct process decides 1
while another decides 0)
If > (N+t)/2 of the accepted votes are for b, then p decides b.
(figure: first round; a Byzantine process sends conflicting votes, and
a correct process decides 0)
In the first round, the left bottom node does not accept vote 1 by
the Byzantine process, since none of the other two correct processes
confirm this vote. So it waits for (and accepts) vote 0 by
the right bottom node, and thus does not decide 1 in the first round.
(figures: second round; all correct processes decide 0)
Bracha-Toueg Byzantine Consensus Alg. - Correctness
Proof: Each round, the correct processes eventually accept N−t votes,
since there are ≥ N−t correct processes. (Note that N−t > (N+t)/2.)
∀τ ∀p, q ∉ F(τ): p ∉ H(q, τ)
Assumptions:
I Each correct process broadcasts alive every σ time units.
I µ is a (known) upper bound on communication delay.
Each process from which no message is received for σ+µ time units
has crashed.
∃p ∀τ ∀q ∉ F(τ): p ∉ H(q, τ)
After round N, each correct process decides for its value at that time.
Assumptions:
I Each correct process broadcasts alive every σ time units.
I There is an (unknown) upper bound on communication delay.
Each process q initially guesses the communication delay is µq = 1.
If q receives no message from p for σ+µq time units,
then q suspects that p has crashed.
When q receives a message from a suspected process p,
then p is no longer suspected and µq := µq +1.
This failure detector is complete and eventually strongly accurate.
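The adaptive timeout can be sketched in a few lines of Python (event times and names are illustrative):

```python
# Sketch of the eventually strongly accurate failure detector: process q
# times out on p after sigma + mu_q quiet time units, and increases mu_q
# whenever a suspicion turns out to be premature.
def monitor(events, sigma):
    """events: list of (time, kind) with kind 'alive' (message from p
    arrives at q) or 'tick' (q checks its timeout). Returns the history
    of (time, suspected, mu_q)."""
    mu, last_alive, suspected = 1, 0, False
    history = []
    for t, kind in events:
        if kind == 'alive':
            if suspected:
                suspected = False
                mu += 1            # the guessed delay bound was too small
            last_alive = t
        elif t - last_alive > sigma + mu:
            suspected = True
        history.append((t, suspected, mu))
    return history

# sigma = 2: p's message is late once, so q briefly suspects p,
# then permanently enlarges its timeout.
h = monitor([(0, 'alive'), (4, 'tick'), (5, 'alive'), (9, 'tick')], sigma=2)
print(h)
```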
Chandra-Toueg Algorithm
∃p ∃τ ∀τ′ ≥ τ ∀q ∉ Crash(F): p ∉ H(q, τ′)
(figures: a run with N = 3 and t = 1; processes send their values with
time stamps (e.g. ⟨2, 1, 0⟩) to the coordinator, which broadcasts ⟨2, 0⟩
and collects ⟨ack, 2⟩ messages; one process crashes, ⟨decide, 0⟩ is
broadcast, and the correct processes decide 0)
Messages and ack’s that a process sends to itself
are omitted
Chandra-Toueg Algorithm - Correctness
Theorem: Let t < N/2. The Chandra-Toueg algorithm is
a terminating algorithm for t-crash consensus.
Proof (part I): If the coordinator in some round k receives > t ack’s,
then (for some b ∈ {0, 1}):
(1) there are > t processes q with ts q ≥ k, and
(2) ts q ≥ k implies value q = b.
Properties (1) and (2) are preserved in rounds ` > k.
This follows by induction on `−k.
By (1), in round ` the coordinator receives at least one message
with time stamp ≥ k.
Hence, by (2), the coordinator of round ` sets its value to b,
and broadcasts h`, bi.
So from round k onward, processes can only decide b.
(1/(1+ρ))·(τ2 − τ1) ≤ Cp(τ2) − Cp(τ1) ≤ (1+ρ)(τ2 − τ1)
|Cp(τ) − Cq(τ)| ≤ δ
|Cp(τ) − Cq(τ)| ≤ δ0 immediately after a synchronization
So there should be a synchronization every (δ−δ0)/(2ρ) time units.
Mahaney-Schneider Synchronizer
|ap − aq| ≤ 2δ
Mahaney-Schneider Synchronizer - Correctness
Theorem: Let t < N/3. The Mahaney-Schneider synchronizer is
t-Byzantine.
Proof: Let apr (resp. aqr ) be the value that correct process p (resp. q)
accepted from or assigned to process r , in some synchronization round.
By the lemma, for all r , |apr − aqr | ≤ 2δ.
Moreover, apr = aqr for all correct r .
Hence, for all correct p and q,
|(1/N)·Σr apr − (1/N)·Σr aqr| ≤ (1/N)·t·2δ < (2/3)δ
where the sums range over all processes r.
So we can take δ0 = (2/3)δ.
There should be a synchronization every δ/(6ρ) time units.
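The acceptance test and averaging step can be sketched as follows (a Python illustration from one correct process's point of view; the collected clock readings and parameter values are made up):

```python
# One synchronization round of the Mahaney-Schneider synchronizer, seen
# from one correct process: a reported clock value is accepted only if
# at least N - t processes report a value within 2*delta of it;
# rejected values are replaced by 0 before averaging.
def ms_round(reports, t, delta):
    N = len(reports)
    accepted = [a if sum(1 for b in reports if abs(a - b) <= 2 * delta) >= N - t
                else 0
                for a in reports]
    return sum(accepted) / N

# N = 4, t = 1, delta = 1: the outlier 100 is supported only by itself,
# so every correct process replaces it by 0 and computes the same average.
avg = ms_round([10, 10.5, 11, 100], t=1, delta=1)
print(avg)   # 7.875
```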
Impossibility of ⌈N/3⌉-Byzantine Synchronizers
Theorem: Let t ≥ N/3. There is no t-Byzantine synchronizer.
Let the local clock of p run faster than the local clock of q.
1. sends messages
2. receives messages
For simplicity, let the maximum network delay δmax , and the time
to perform internal events in a pulse, be much smaller than δ.
(figures: clocks Cp and Cq plotted against real time; the drift bound
yields skew bounds of the form (1+ρ)δ and (1+ρ)²δ between pulses)
Byzantine Broadcast
(figures: after pulse 1 of Broadcast_g; the general g sends 1, while
Byzantine processes spread 0)
Let t > 0.
Proof: By induction on t.
Let g be Byzantine (so t > 0). Then ≤ t−1 lieutenants are Byzantine.
Since t−1 < (N−1)/3, by induction, for every lieutenant q, all correct
lieutenants take in Broadcast_q(N−1, t−1) the same decision vq.
So in pulse t+1, all correct lieutenants decide the same value major (M).
Incorrectness for t ≥ N/3 - Example
N = 3 and t = 1.
(figure: general g with lieutenants p and q, one of them Byzantine)
g decides 1.
(figures: groups of processes S, T, U; in Scenario 0 the processes in U
are Byzantine, in Scenario 1 the processes in T are Byzantine)
In Scenario 0, the processes in S and T decide 0; in Scenario 1, the
processes in S and U decide 1.
(figure: Scenario 2, in which the processes in S are Byzantine)
The processes in T decide 0 and the processes in U decide 1.
Partial Synchrony
(figure: processes q and r, one of them Byzantine)
pulse 3: q broadcasts ⟨1, (Sg(1), g) : (Sr(1), r) : (Sq(1), q)⟩
Wp = Wq = {0, 1}
p and q decide ⊥
Mutual Exclusion
Queue maintenance:
I When a process wants to enter the critical section,
it adds its id to its own queue.
I When a process that is not the root gets a new head at
its (non-empty) queue, it asks its father for the token.
I When a process receives a request for the token from a child,
it adds this child to its queue.
Raymond’s Algorithm
When the root exits the critical section (and its queue is non-empty),
it sends the token to the process q at the head of its queue,
makes q its father, and removes q from the head of its queue.
(figure: a binary tree with root 1, inner nodes 2 and 3, and leaves
4, 5, 6, 7)
Raymond's Algorithm - Example
(figures: processes request the token by appending ids to the local
queues; the token travels from the root towards the requesters, and the
father pointers are redirected along its path)
Raymond’s Algorithm - Correctness
Example: Let N = 7.
(figure: a binary tree with root 1, children 2 and 3, and leaves 4, 5, 6, 7)
Question: What are the quorums if 1,2 crashed? And if 1,2,3 crashed?
Agrawal-El Abbadi Algorithm - Correctness
We prove, by induction on k, that each pair of quorums has
a non-empty intersection, and so mutual exclusion is guaranteed.
Advantages:
I fault tolerance
I robustness for dynamic topologies
I straightforward initialization
Self-Stabilization: Communication
(figures: example configurations of processes p0, p1, p3 and their values)
Dijkstra’s Token Ring - Correctness
The next time p0 becomes privileged, clearly σi = σ0 for all 0 < i < N.
So then mutual exclusion has been achieved.
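The K-state token ring can be simulated directly; a Python sketch (the initial configuration below is an arbitrary, illegitimate one):

```python
# Sketch of Dijkstra's K-state token ring (K >= N): p0 is privileged
# when its value equals pN-1's, and then increases its value mod K;
# pi (i > 0) is privileged when its value differs from pi-1's, and then
# copies that value. From any configuration, the system stabilizes to
# exactly one privilege.
def privileged(sigma):
    N = len(sigma)
    priv = [i for i in range(1, N) if sigma[i] != sigma[i - 1]]
    if sigma[0] == sigma[N - 1]:
        priv.insert(0, 0)
    return priv

def step(sigma, K, i):
    sigma = sigma[:]
    if i == 0:
        sigma[0] = (sigma[0] + 1) % K
    else:
        sigma[i] = sigma[i - 1]
    return sigma

sigma, K = [2, 1, 0], 3   # arbitrary (illegitimate) start, N = 3
counts = []
for _ in range(9):
    priv = privileged(sigma)
    counts.append(len(priv))
    sigma = step(sigma, K, priv[0])  # first privileged process moves
print(counts)   # [2, 1, 1, 1, 1, 1, 1, 1, 1]
```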
Question
Since p1 , . . . , pN−1 only copy values, they will hold (a subset of)
these N−2 values as long as σ0 equals σi for some i = 1, . . . , N−1.
(figure: a ring of processes p0, …, pN−2 and their values)
I leader_i < i, or
I dist_i ≥ K.
Arora-Gouda Election Algorithm
K =5
leader = 8
p2 father = 7
dist = 4
leader = 8 leader = 8
father = 3 p7 p3 father = 2
dist = 3 dist = 2
leader = 8 leader = 8
father = 1 p5 p1 father = 3
dist = 4 dist = 3
Arora-Gouda Election Algorithm - Example
K =5
leader = 8
p2 father = 7
dist = 4
leader = 8 leader = 8
father = 3 p7 p3 father = 2
dist = 3 dist = 5
leader = 8 leader = 8
father = 1 p5 p1 father = 3
dist = 4 dist = 3
Arora-Gouda Election Algorithm - Example
K =5
leader = 8
p2 father = 7
dist = 4
leader = 8 leader = 3
father = 3 p7 p3 father = −
dist = 3 dist = 0
leader = 8 leader = 8
father = 1 p5 p1 father = 3
dist = 4 dist = 3
Arora-Gouda Election Algorithm - Example
K =5
leader = 3
p2 father = 7
dist = 2
leader = 3 leader = 3
father = 3 p7 p3 father = −
dist = 1 dist = 0
leader = 3 leader = 3
father = 1 p5 p1 father = 3
dist = 2 dist = 1
Arora-Gouda Election Algorithm - Example
K =5
leader = 3
p2 father = 7
dist = 2
leader = 3 leader = 3
father = 3 p7 p3 father = −
dist = 1 dist = 0
leader = 5 leader = 3
father = − p5 p1 father = 3
dist = 0 dist = 1
Arora-Gouda Election Algorithm - Example
K =5
leader = 5
p2 father = 7
dist = 4
leader = 5 leader = 5
father = 3 p7 p3 father = 1
dist = 3 dist = 2
leader = 5 leader = 5
father = − p5 p1 father = 5
dist = 0 dist = 1
Arora-Gouda Election Algorithm - Example
K =5
leader = 5
p2 father = 7
dist = 4
leader = 7 leader = 5
father = − p7 p3 father = 1
dist = 0 dist = 2
leader = 5 leader = 5
father = − p5 p1 father = 5
dist = 0 dist = 1
Arora-Gouda Election Algorithm - Example
K =5
leader = 7
p2 father = 7
dist = 1
leader = 7 leader = 7
father = − p7 p3 father = 7
dist = 0 dist = 1
leader = 7 leader = 7
father = 1 p5 p1 father = 3
dist = 3 dist = 2
Arora-Gouda Election Algorithm - Correctness
Why can some process on such a cycle always declare itself leader?
(figure: an embedded system; the application runs on processors and
uses resources such as memory)
Jobs are divided over processors, and are competing for resources.
I execution time e
(figure: periodic tasks and aperiodic jobs are offered to the processor;
sporadic jobs are accepted or rejected)
Assumptions:
I Jobs are preemptive: they can be suspended at any time of
their execution
I No resource competition
(figure: RM, EDF and LST schedules on a timeline from 0 to 12)
EDF Scheduler
Earliest Deadline First: The earlier the deadline, the higher the priority.
(figure: an EDF schedule of jobs J1 and J2 with release times r1, r2
and deadlines d2 < d1, and a non-EDF schedule)
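A unit-time EDF scheduler can be sketched in Python (the job parameters below are made up for illustration):

```python
# Sketch of a single-processor EDF scheduler: at every time unit, run
# the released, unfinished job with the earliest deadline. Jobs are
# tuples (name, release, execution, deadline); preemption falls out
# of the per-time-unit choice for free.
def edf(jobs, horizon):
    remaining = {name: e for name, r, e, d in jobs}
    deadline = {name: d for name, r, e, d in jobs}
    finish, timeline = {}, []
    for t in range(horizon):
        ready = [(deadline[name], name) for name, r, e, d in jobs
                 if r <= t and remaining[name] > 0]
        if not ready:
            timeline.append(None)
            continue
        _, name = min(ready)          # earliest deadline wins
        remaining[name] -= 1
        if remaining[name] == 0:
            finish[name] = t + 1
        timeline.append(name)
    met = all(n in finish and finish[n] <= deadline[n] for n in deadline)
    return timeline, met

# J2 is released later but has the earlier deadline, so it preempts J1.
timeline, met = edf([('J1', 0, 2, 4), ('J2', 1, 1, 3)], horizon=4)
print(timeline, met)   # ['J1', 'J2', 'J1', None] True
```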
LST Scheduler
Slack is the available idle time of a job until the next deadline.
Least Slack Time first: less slack gives a job higher priority.
(figure: an LST schedule of jobs J1, J2, J3 with release times
r1, r2, r3 and deadlines d1, d3, d2)
At the start of a new period, the first es time units can be used
to execute aperiodic jobs.
Fix an allowed utilization rate ũs for the server, such that
Σ(k=1..n) ek/pk + ũs ≤ 1
Aperiodic jobs can now be treated in the same way as periodic jobs,
by the EDF scheduler.
In the absence of sporadic jobs, aperiodic jobs meet the deadlines
assigned to them (which may differ from their actual soft deadlines).
Example: T1 = (0, 2, 1) and T2 = (0, 3, 1). We fix ũs = 1/6.
(figure: a timeline from 0 to 12 with aperiodic job releases r, r′, r″)
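The utilization bound for this example can be checked exactly (a small Python sketch using rationals):

```python
from fractions import Fraction

# Utilization check for the example: periodic tasks T1 = (0, 2, 1) and
# T2 = (0, 3, 1) (phase, period, execution time) together with a server
# rate of 1/6 use the processor fully: 1/2 + 1/3 + 1/6 = 1.
def total_utilization(tasks, u_server):
    return sum(Fraction(e, p) for phase, p, e in tasks) + u_server

u = total_utilization([(0, 2, 1), (0, 3, 1)], Fraction(1, 6))
print(u)   # 1
```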
(figures: schedules in which J1 requires the resource "blue"; jobs
Jk, …, J1 run to completion)
Priority Ceiling
Note that when no resources are in use, all arrived jobs are released.
(figure: J2 and J1 run to completion)