Outline
Introduction
Distributed Mutual Exclusion
Election Algorithms
Group Communication
Consensus and Related Problems
Introduction
NFS servers are stateless: they do not maintain client state and therefore
do not lock files on behalf of clients. A distributed mutual exclusion
mechanism is required to ensure consistency and prevent interference.
Ethernet and Wi-Fi adapters coordinate access to a shared transmission
medium, so mutual exclusion is again required.
Any application with distributed processes needs it as well, for example a
car-park management application whose entrance and exit processes run
separately.
Main Assumptions
(1)
[Figure: processes 2, 3, ..., n access a shared resource through a critical section bracketed by Enter() and Exit().]
Prevent interference
Ensure consistency when accessing the resources
(2)
(3)
Ordering: If one request to enter the CS happened before another, then entry to the CS is granted in
that order.
Ring-Based Algorithm
[Figure: token handling among P1 to P4 in three steps: 1) request token, 2) release token, 3) grant token. Legend: holds the token, waiting.]
Ring-Based Algorithm
(1)
A group of unordered processes in a network
[Figure: processes P1, P2, P3, P4, ..., Pn attached to an Ethernet.]
Ring-Based Algorithm
(2)
[Figure: processes P1, P2, P3, P4, ..., Pn arranged in a logical ring. The token navigates around the ring; the critical section is bracketed by Enter() and Exit().]
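The token circulation can be sketched as follows. This is a minimal single-threaded simulation, not the deck's own code; `ring_mutex`, `wants` and `start` are hypothetical names, and one full circuit of the token is assumed.

```python
def ring_mutex(n, wants, start=0):
    """Circulate a single token once around a ring of n processes.

    wants: set of process indices that want the critical section.
    Returns the order in which processes enter the critical section.
    """
    order = []
    holder = start
    for _ in range(n):
        if holder in wants:
            # Enter(): only the current token holder may run the critical section.
            order.append(holder)
            # Exit(): nothing to undo in this sketch; the token simply moves on.
        holder = (holder + 1) % n  # pass the token to the ring neighbour
    return order


entry_order = ring_mutex(5, {3, 1}, start=2)
print(entry_order)
```

Entry order follows ring position from the current token holder, not the time at which a request was made, which is why the ring does not give the happened-before ordering guarantee.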
The basic idea is that processes that require entry to a critical section
multicast a request message, and can enter it only when all the other
processes have replied to this message
Each process i is required to receive permission from Si only. Correctness requires
that multiple processes will never receive permission from all members of their
respective subsets.
[Figure: three overlapping subsets.]
S0 = {0, 1, 2}
S1 = {1, 3, 5}
S2 = {2, 4, 5}
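[Figure: P1 and P2 request entry to the critical section simultaneously, with timestamps 19 and 23. P3 replies OK to both. The request with timestamp 23 is placed in a waiting queue; the request with timestamp 19 collects every OK and proceeds through Enter(), the critical section and Exit().]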
Initialization
State := RELEASED;
Process pi requests entry to the critical section
State := WANTED;
T := request's timestamp;
Multicast request <T, pi> to all processes;
Wait until (number of replies received = (N - 1));
State := HELD;
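A minimal sketch of the multicast scheme, with message passing simulated by direct method calls. The slide shows only the entry protocol; the request-handling and exit rules below follow the standard Ricart-Agrawala algorithm and are labelled as assumptions. The assignment of timestamps 19 and 23 to particular processes is also assumed.

```python
RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"


class Process:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.state = RELEASED
        self.request = None   # (timestamp, pid) of our own pending request
        self.replies = 0
        self.deferred = []    # requesters we will answer on exit

    def want(self, timestamp):
        self.state = WANTED
        self.request = (timestamp, self.pid)
        self.replies = 0
        return self.request   # to be multicast to all other processes

    def on_request(self, req):
        """Assumed rule: return True to reply now, False to defer the reply."""
        busy = self.state == HELD or (self.state == WANTED and self.request < req)
        if busy:
            self.deferred.append(req[1])
        return not busy

    def on_reply(self):
        self.replies += 1
        if self.replies == self.n - 1:
            self.state = HELD

    def exit(self):
        """Assumed rule: release and answer every deferred request."""
        self.state = RELEASED
        waiting, self.deferred = self.deferred, []
        return waiting


procs = {i: Process(i, 3) for i in (1, 2, 3)}
requests = [procs[1].want(19), procs[2].want(23)]
for req in requests:
    for q in procs.values():
        if q.pid != req[1] and q.on_request(req):
            procs[req[1]].on_reply()
states_before = {i: p.state for i, p in procs.items()}

for pid in procs[1].exit():
    procs[pid].on_reply()
states_after = {i: p.state for i, p in procs.items()}
print(states_before, states_after)
```

Ties between equal timestamps are broken by process identifier because requests are compared as `(timestamp, pid)` pairs.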
Analysis
(1)
(fairness) Each process pj is contained in M of the voting sets Vi
Maekawa's algorithm
Example. Let there be seven processes 0, 1, 2, 3, 4, 5, 6. The optimal
solution suggests the following set for each process:
S0 = {0, 1, 2}
S1 = {1, 3, 5}
S2 = {2, 4, 5}
S3 = {0, 3, 4}
S4 = {1, 4, 6}
S5 = {0, 5, 6}
S6 = {2, 3, 6}
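The two properties the sets must have can be checked mechanically. The sketch below takes the seven sets from the example and verifies that every pair of sets intersects and that each process appears in the same number of sets; the helper names are hypothetical.

```python
from itertools import combinations

S = {
    0: {0, 1, 2}, 1: {1, 3, 5}, 2: {2, 4, 5}, 3: {0, 3, 4},
    4: {1, 4, 6}, 5: {0, 5, 6}, 6: {2, 3, 6},
}


def pairwise_intersect(sets):
    """True if any two sets share at least one process."""
    return all(sets[i] & sets[j] for i, j in combinations(sets, 2))


def membership_counts(sets):
    """For each process, the number of sets that contain it."""
    return {p: sum(p in s for s in sets.values()) for p in sets}


print(pairwise_intersect(S), membership_counts(S))
```

For these sets each process belongs to exactly three sets and each set has three members, so every process carries the same share of the work.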
Maekawa's algorithm
Version 1 {Life of process i}
1. Send a timestamped request to each process in Si.
2. On receiving a request, send an ack to the process with the
lowest timestamp. Thereafter, "lock" (i.e. commit)
yourself to that process, and keep the others waiting.
3. Enter the CS if you receive an ack from each member
of Si.
4. To exit the CS, send a release to every process in Si.
5. On receiving a release, unlock yourself. Then send an
ack to the next process with the lowest timestamp.
(2)
Initialization
state := RELEASED;
voted := FALSE;
For pi to enter the critical section
state := WANTED;
Multicast request to all processes in Vi - {pi};
Wait until (number of replies received = K - 1);
state := HELD;
(3)
voted := TRUE;
For pi to exit the critical section
state := RELEASED;
Multicast release to all processes in Vi - {pi};
(4)
Maekawa's algorithm-version 1
ME1. At most one process can enter its critical
section at any time. Any two sets Si and Sj share at least one process,
and that process is locked to only one requester at a time.
Maekawa's algorithm-version 1
ME2. No deadlock. Unfortunately, deadlock is
possible! Assume 0, 1, 2 want to enter their
critical sections.
Now, 0 waits for 1 (to send a release), 1 waits for 2 (to send a
release), and 2 waits for 0 (to send a release). So deadlock
is possible!
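The circular wait can be reproduced with a small sketch. It assumes one particular arrival order (the `first_seen` table is hypothetical): each voter that receives two competing requests locks itself to whichever arrived first, as version 1 allows when messages do not arrive in timestamp order.

```python
# Sets of the three requesters, taken from the example.
S = {0: {0, 1, 2}, 1: {1, 3, 5}, 2: {2, 4, 5}}

# Assumed arrival order: the request each contested voter happens to see first.
first_seen = {1: 1, 2: 0, 5: 2}

# Every voter locks itself to one requester.
locked_to = {}
for requester, voters in S.items():
    for v in voters:
        locked_to.setdefault(v, first_seen.get(v, requester))

# Acks collected by each requester, and whom it is left waiting for.
acks = {r: {v for v in S[r] if locked_to[v] == r} for r in S}
waits_for = {r: {locked_to[v] for v in S[r] - acks[r]} for r in S}
deadlocked = all(acks[r] != S[r] for r in S)
print(acks, waits_for, deadlocked)
```

No requester holds a full set of acks, and the wait-for relation forms the cycle 0, 1, 2, 0.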
Maekawa's algorithm-Version 2
Avoiding deadlock
If processes always receive messages in
increasing order of timestamp, then
deadlock could be avoided. But this is too
strong an assumption.
Version 2 uses three additional messages:
- failed
- inquire
- relinquish
Maekawa's algorithm-Version 2
New features in version 2
Analysis
Fault Tolerance
Outline
Introduction
Distributed Mutual Exclusion
Election Algorithms
Group Communication
Consensus and Related Problems
Election Algorithms
(1)
Election Algorithms
(2)
Bully Algorithm
[Figure: election among processes with identifiers 3, 9, 16 and 25; process 5 starts the election.]
(1)
(2)
Initialization
Participant_i := FALSE;
Elected_i := NIL
pi starts an election
Participant_i := TRUE;
Send the message <election, pi> to its
neighbor
Receipt of a message <elected, pj> at pi
Participant_i := FALSE;
Elected_i := pj;
If pi ≠ pj
Then Send the message <elected, pj> to its neighbor
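A minimal simulation of this ring election with a single starter. The slide shows only how an election starts and how an elected message is handled; the handling of election messages below (forward the larger identifier, announce when your own identifier returns) is the standard rule and is an assumption here. The ring order of the identifiers is also assumed.

```python
def ring_election(ids, starter):
    """Simulate one election on a unidirectional ring with one starter.

    ids: distinct process identifiers in ring order.
    starter: index of the process that starts the election.
    Returns (coordinator identifier, number of messages sent).
    """
    n = len(ids)
    participant = [False] * n
    elected = [None] * n
    participant[starter] = True
    msg = ("election", ids[starter])
    pos = (starter + 1) % n
    sent = 1
    while True:
        kind, pid = msg
        if kind == "election":
            if pid > ids[pos]:
                participant[pos] = True          # forward the larger identifier
            elif pid < ids[pos]:
                participant[pos] = True          # substitute our own identifier
                msg = ("election", ids[pos])
            else:
                participant[pos] = False         # our identifier came back: we win
                elected[pos] = ids[pos]
                msg = ("elected", ids[pos])
        else:
            if pid == ids[pos]:
                break                            # elected message went full circle
            participant[pos] = False
            elected[pos] = pid
        pos = (pos + 1) % n
        sent += 1
    return elected[pos], sent


result = ring_election([16, 9, 25, 3, 5], starter=4)
print(result)
```

In this run the election message needs 3 hops to reach identifier 25, then 5 hops to return to it, and the elected message needs another 5 hops, for 13 messages in total.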
(3)
Analysis
Bully Algorithm
(1)
Characteristic: allows processes to crash during an election
Hypotheses:
Reliable transmission
Synchronous system
T = 2 × Delay_trans + Delay_proc, where Delay_trans is the transmission delay and Delay_proc is the processing delay
Bully Algorithm
(2)
Hypotheses (cont'd):
Bully Algorithm
(3)
[Figure: process 5 is the first to detect that the coordinator has failed and starts an election. Election, OK and Coordinator messages are exchanged until a new coordinator is announced.]
Bully Algorithm
(4)
Initialization
Elected_i := NIL
pi starts the election
Send the message (Election, pi) to every pj such that pj > pi;
Wait until all messages (OK, pj) from pj are received;
If no message (OK, pj) arrives during T
Then Elected_i := pi;
Send the message (Coordinator, pi) to every pj such that pj < pi
Else wait until receipt of the message (Coordinator)
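The election logic can be sketched sequentially. This is an illustration under simplifying assumptions, not a networked implementation: a missing OK within the timeout T is modelled by the process not being in `alive`, and the function and parameter names are hypothetical.

```python
def bully_election(ids, alive, starter):
    """Return the coordinator chosen when `starter` begins a bully election.

    ids: all process identifiers; alive: the subset that has not crashed.
    """
    current = starter
    while True:
        # (Election, current) goes to every process with a higher identifier;
        # only the live ones answer with (OK, pj).
        responders = [pj for pj in ids if pj > current and pj in alive]
        if not responders:
            # No OK arrived during T: current elects itself and would send
            # (Coordinator, current) to every process with a lower identifier.
            return current
        # Every responder starts its own election. Following the smallest one
        # is enough here because all of them reach the same outcome.
        current = min(responders)


coordinator = bully_election(range(1, 8), {1, 2, 3, 4, 5, 6}, starter=5)
print(coordinator)
```

The highest live identifier always wins, whichever process starts the election.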
Bully Algorithm
(5)
It is impossible for two processes to decide that they are the coordinator,
since the process with the lower identifier will discover that the other
exists and defer to it.
But the algorithm is not guaranteed to meet safety property 1 if
processes that have crashed are replaced by processes with the same
identifier.
A process that replaces a crashed process p may decide that it has the highest
identifier just as another process (which has detected p's crash) decides that it has
the highest identifier.
Two processes will therefore announce themselves as the coordinator concurrently.
Since there are no guarantees on message delivery order, the recipients of
these messages may reach different conclusions on which is the coordinator
process.
Analysis
Bandwidth consumption
In the best case, the process with the second-highest identifier notices the coordinator's failure.
Then it can immediately elect itself and send N - 2
coordinator messages.
The bully algorithm requires O(N^2) messages in
the worst case, that is, when the process with the
lowest identifier first detects the coordinator's
failure.
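The two cases can be compared by counting messages in a simple model. The sketch assumes processes numbered 1..N with process N (the old coordinator) crashed, counts election messages sent to the crashed process as well, and lets the second-highest process elect itself without sending election messages; `bully_message_count` is a hypothetical helper.

```python
def bully_message_count(n, detector):
    """Count messages in one bully election among processes 1..n, where
    process n has crashed and `detector` notices the failure first."""
    top_live = n - 1
    started = set()
    pending = [detector]
    total = 0
    while pending:
        p = pending.pop()
        if p in started:
            continue
        started.add(p)
        if p == top_live:
            total += n - 2               # coordinator message to every lower process
            continue
        higher = range(p + 1, n + 1)
        total += len(higher)             # election messages (one reaches the crashed process)
        live_higher = [q for q in higher if q != n]
        total += len(live_higher)        # OK replies
        pending.extend(live_higher)      # each live higher process starts its own election
    return total


best = bully_message_count(5, detector=4)
worst = bully_message_count(5, detector=1)
print(best, worst)
```

With N = 5 the best case costs N - 2 = 3 messages, while the worst case costs 18, and the gap widens quadratically as N grows.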
Consensus
Example
(1) Disseminate the updates to the nodes that have a copy of the service
state.
(2) Apply the updates in the same order to each copy.
Link Failures
[Figure: processors p1 to p5 connected by non-faulty links.]
[Figure: the same processors with one faulty link.]
Crash Failures
[Figure: processors p1 to p5, all non-faulty.]
[Figure: the same processors with one faulty (crashed) processor.]
[Figure: timeline of successive rounds for processors p1 to p5, with a failure marked in one round.]
Consensus Problem
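Agreement
[Figure: start and finish states illustrating agreement.]
Validity
If everybody starts with the same value,
then the non-faulty processors must decide that value.
[Figure: start and finish states illustrating validity; everybody starts with 1.]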
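Each processor:
1. Broadcasts its value to all processors.
2. Decides on the minimum.
[Figure: start. The processors hold the values 0, 1, 2, 3, 4.]
[Figure: broadcast values. Every processor receives 0,1,2,3,4.]
[Figure: decide on minimum. Every processor decides 0.]
[Figure: finish. All processors hold 0.]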
[Figure: start. The processor holding 0 fails while broadcasting.]
[Figure: broadcasted values. Some processors receive 0,1,2,3,4; the others receive only 1,2,3,4.]
[Figure: decide on minimum. Processors that received 0 decide 0; the others decide 1.]
[Figure: finish.]
No Consensus!!!
An f-resilient algorithm
Round 1:
Broadcast my value
Round 2 to round f+1:
Broadcast any new received values
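A minimal round-by-round simulation of this scheme. The final step (decide on the minimum of all values seen) is carried over from the one-round algorithm and is an assumption, as are the function name and the `crashes` failure schedule.

```python
def flood_consensus(values, f, crashes=None):
    """Run f + 1 rounds of flooding and decide on the minimum value seen.

    values: initial value of each processor, indexed by processor number.
    crashes: {processor: (round, receivers)}; the processor crashes in that
             round after reaching only `receivers` (hypothetical schedule).
    Returns {processor: decision} for the processors that never crashed.
    """
    crashes = crashes or {}
    n = len(values)
    known = [{v} for v in values]    # every value seen so far
    new = [{v} for v in values]      # values to broadcast in the next round
    dead = set()
    for rnd in range(1, f + 2):
        inbox = [set() for _ in range(n)]
        for p in range(n):
            if p in dead:
                continue
            receivers = range(n)
            if p in crashes and crashes[p][0] == rnd:
                receivers = crashes[p][1]   # partial broadcast, then crash
                dead.add(p)
            for q in receivers:
                inbox[q] |= new[p]
        for q in range(n):
            new[q] = inbox[q] - known[q]
            known[q] |= new[q]
    return {p: min(known[p]) for p in range(n) if p not in dead}


schedule = {0: (1, [1])}   # processor 0 crashes in round 1, reaching only processor 1
one_round = flood_consensus([0, 1, 2, 3, 4], 0, schedule)
two_rounds = flood_consensus([0, 1, 2, 3, 4], 1, schedule)
print(one_round, two_rounds)
```

With a single round the survivors disagree, because only processor 1 saw the value 0. With f + 1 = 2 rounds processor 1 rebroadcasts 0 as a new value and everyone decides 0.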
[Figure: a processor fails while broadcasting in round 1, so some processors hold 0,1,2,3,4 and the others hold 1,2,3,4. In round 2 the new values are rebroadcast and all remaining processors hold 0,1,2,3,4.]
[Figure: example with two failures. After Failure 1 and Failure 2 some processors still hold only 1,2,3,4; after a further round all remaining processors hold 0,1,2,3,4.]
Example: 5 failures, 6 rounds
[Figure: round timeline in which at least one round has no failure.]
Byzantine Failures
[Figure: processors p1 to p5, all non-faulty, exchanging messages.]
Byzantine Failures
[Figure: a faulty processor among p1 to p5 sends garbled messages to some processors.]
A faulty processor sends arbitrary messages, and some messages may not be sent at all.
[Figure: timeline of successive rounds for processors p1 to p5, with failures marked in two rounds.]
Byzantine scenario
A timeline perspective
[Figure: timelines of processes p, q, r, s and t exchanging votes.]
Outcomes?
Traitor double-votes
[Figure: the loyal generals each announce "Attack!" and conclude "We attack!", while the traitor remarks "Damn! They're on to me".]
Story:
Reliable messages
It is possible to show that no protocol can tolerate
f failures if N ≤ 3f
BGA Algorithm
Phase 2:
BGA Algorithm
A Consensus Algorithm
The King algorithm
Solves consensus with n processors and f failures, where f < n/4
Assumptions:
1. Number f must be known to processors;
2. Processor ids are in {1,...,n}.
There are f + 1 phases
Each processor pi has a preferred value vi
Phase k
Round 1, processor pi:
Broadcast preferred value vi
Let a be the majority of received values (including vi)
(in case of tie pick an arbitrary value)
Set vi := a
Phase k
Round 2, king pk:
Broadcast new preferred value vk
Round 2, process pi:
If vi had majority of less than n/2 + f + 1
then set vi := vk
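A compact simulation of the phases described above. It is a sketch under assumptions: the faulty processors are driven by a caller-supplied `lie` function (a hypothetical adversary), n/2 is read as integer division, and the value held after the last phase is taken as the decision.

```python
from collections import Counter


def phase_king(values, faulty, lie):
    """Simulate the King algorithm.

    values: {processor id: initial value} for ids 1..n.
    faulty: set of faulty processor ids; f = len(faulty) is assumed known.
    lie(sender, receiver, phase, rnd): value a faulty sender shows a receiver.
    Returns the final preferred value of every non-faulty processor.
    """
    n, f = len(values), len(faulty)
    v = dict(values)
    ids = sorted(v)
    good = [p for p in ids if p not in faulty]
    for k in range(1, f + 2):            # f + 1 phases; the king of phase k is pk
        # Round 1: everybody broadcasts, then adopts the majority value.
        majority, support = {}, {}
        for i in good:
            received = [lie(j, i, k, 1) if j in faulty else v[j] for j in ids]
            majority[i], support[i] = Counter(received).most_common(1)[0]
        for i in good:
            v[i] = majority[i]
        # Round 2: the king broadcasts; weakly supported values are replaced.
        honest_king_value = v[k]
        for i in good:
            king_value = lie(k, i, k, 2) if k in faulty else honest_king_value
            if support[i] < n // 2 + f + 1:
                v[i] = king_value
    return {i: v[i] for i in good}


def split_lie(sender, receiver, phase, rnd):
    return receiver % 2    # tell odd and even receivers different things


final = phase_king({1: 0, 2: 0, 3: 1, 4: 1, 5: 0, 6: 1}, {1}, split_lie)
print(final)
```

In this run king 1 is faulty and splits the processors in phase 1, but king 2 is non-faulty, so every processor ends phase 2 with the same value.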
[Figure: example with six processors; king 1 is faulty, king 2 is not.]
Phase 1, Round 1
Everybody broadcasts
[Figure: the processors receive either 2,1,1,0,0,0 or 2,1,1,1,0,0.]
Choose the majority
Each majority vote was 3 < n/2 + f + 1 = 5
On round 2, everybody will choose the king's value
Phase 1, Round 2
The king broadcasts
[Figure: the faulty king 1 sends the values 0, 1 and 2 to different processors.]
Phase 2, Round 1
Everybody broadcasts
[Figure: the processors again receive either 2,1,1,0,0,0 or 2,1,1,1,0,0.]
Choose the majority
Each majority vote was 3 < n/2 + f + 1 = 5
On round 2, everybody will choose the king's value
Phase 2, Round 2
[Figure: king 2 broadcasts 0 and every non-faulty processor adopts it.]
Final decision
[Figure: all non-faulty processors decide 0.]
Case 1: suppose node i has chosen its preferred value a
with strong majority (at least n/2 + f + 1 votes)
At the end of round 1, every other non-faulty node must have preferred
value a (including the king)
Explanation:
At least n/2 + 1 non-faulty nodes must have broadcast a at the
start of round 1
At end of round 2:
If a node keeps its own value: then decides a
Case 2: No node has chosen its preferred value with strong majority
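[Figure: three Byzantine generals, two scenarios. In the first, the commander p1 sends 1:v to p2 and p3; p2 relays 2:1:v and the faulty p3 relays 3:1:u. In the second, a faulty commander sends 1:w to p2 and 1:x to p3, which relay 2:1:w and 3:1:x.]
Faulty processes are shown coloured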
BG Algorithm for N = 3f + 1
2 rounds
1. commander sends value to lieutenants
2. lieutenants send value to peers
[Figure: four Byzantine generals, lieutenant p3 faulty. The commander p1 sends 1:v to p2, p3 and p4; p2 relays 2:1:v and p4 relays 4:1:v, while p3 relays 3:1:u to p2 and 3:1:w to p4.]
P2 decides Majority(v,v,u) = v
P4 decides Majority(v,v,w) = v
[Figure: four Byzantine generals, commander p1 faulty. p1 sends 1:u to p2, 1:w to p3 and 1:v to p4; the lieutenants relay 2:1:u, 3:1:w and 4:1:v.]
P2 decides Majority(u,v,w) = ⊥
P3 decides Majority(u,v,w) = ⊥
P4 decides Majority(u,v,w) = ⊥
Faulty processes are shown coloured
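The two four-general scenarios can be replayed with a short sketch. The message tables are transcribed from the figures; `majority` returns None to stand for ⊥ when no value is held by more than half of the inputs, and the function names are hypothetical.

```python
from collections import Counter


def majority(values):
    """Return the value held by more than half of `values`, else None (⊥)."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None


def lieutenant_decisions(from_commander, relayed):
    """from_commander: {lieutenant: value received in round 1}.
    relayed: {(sender, receiver): value} sent between lieutenants in round 2.
    Returns each lieutenant's majority over all the values it holds."""
    return {
        p: majority([own] + [val for (s, r), val in relayed.items() if r == p])
        for p, own in from_commander.items()
    }


# Scenario 1: lieutenant p3 is faulty and relays u to p2 and w to p4.
faulty_lieutenant = lieutenant_decisions(
    {2: "v", 3: "v", 4: "v"},
    {(2, 3): "v", (2, 4): "v", (3, 2): "u", (3, 4): "w", (4, 2): "v", (4, 3): "v"},
)

# Scenario 2: the commander is faulty and sends u, w and v to p2, p3 and p4.
faulty_commander = lieutenant_decisions(
    {2: "u", 3: "w", 4: "v"},
    {(2, 3): "u", (2, 4): "u", (3, 2): "w", (3, 4): "w", (4, 2): "v", (4, 3): "v"},
)
print(faulty_lieutenant, faulty_commander)
```

In the first scenario the correct lieutenants p2 and p4 both decide v. In the second all three lieutenants obtain ⊥, so they still agree with one another.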
[Figure: processors p0, p1 and p2 running procedures A, B and C, and a ring of six processors p0 to p5 with inputs 0 and 1.]
Getting a Contradiction
[Figure: a faulty processor acts like p3 to p4 and like p0 to p5 in the ring, so p1 and p2 must decide 0.]
Getting a Contradiction
[Figure: a faulty processor acts like p2 to p1 and like p5 to p0 in the ring, so p0 and p1 must decide 1.]
The Contradiction
[Figure: a faulty processor acts like p1 to p0 and like p4 to p5 in the ring.]
What do p0 and p2 decide?
Contradiction!
Views
[Figure: the views of p0, p1 and p2 in each scenario compared with the corresponding views in the ring.]