
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

❒ point-to-point
  ❍ one sender, one receiver
❒ reliable, in-order byte stream
  ❍ no “message boundaries”
❒ pipelined
  ❍ send window size determined by TCP congestion and flow control
❒ send & receive buffers
❒ full duplex data
  ❍ bi-directional data flow in same connection
  ❍ MSS: maximum segment size (less than MTU of directly connected network)
❒ connection-oriented
  ❍ handshaking initializes sender, receiver state before data exchange
❒ flow and congestion control
3b: Transport Layer 1


2/4/05 (SSL)

TCP segment structure

[Figure: TCP segment layout, 32 bits wide, one row per line]
  source port # | dest. port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | rcvr window size
  checksum | ptr urgent data
  Options (variable length)
  application data (variable length)

❒ sequence number, acknowledgement number: counting bytes of data (not segments!)
❒ URG: urgent data (generally not used)
❒ ACK: ACK # valid
❒ PSH: push data now (generally not used)
❒ RST, SYN, FIN: connection estab (setup, teardown commands)
❒ rcvr window size: # bytes rcvr willing to accept
❒ checksum: Internet checksum (as in UDP)
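The field layout above can be illustrated with a short parsing sketch. This is a minimal example that assumes the standard fixed 20-byte header and ignores options; the function name `parse_tcp_header` and the hand-built example segment are illustrative, not part of the slides.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte part of a TCP header (options are not parsed)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # head len is counted in 32-bit words
    flags = offset_flags & 0x3F             # low six bits: U A P R S F
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack, "header_len": header_len,
        "URG": bool(flags & 0x20), "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08), "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "window": window, "checksum": checksum, "urgent_ptr": urg_ptr,
        "payload": segment[header_len:],
    }

# Hand-built example: SYN+ACK segment, header length 5 words, no payload.
example = struct.pack("!HHIIHHHH", 80, 12345, 79, 43, (5 << 12) | 0x12, 4096, 0, 0)
fields = parse_tcp_header(example)
```

The checksum is left at zero in the example; verifying it would also require the IP pseudo-header, which is outside this sketch.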

TCP seq. #’s and ACKs

Seq. #’s:
  ❍ sequence number of first byte in segment’s data
ACKs:
  ❍ seq # of next byte expected from other side
  ❍ cumulative ACK
Q: how receiver handles out-of-order segments?
  ❍ A: TCP spec doesn’t say, up to implementor

[Figure: simple telnet scenario, Host A and Host B over time]
  1. User types ‘C’. A -> B: Seq=42, ACK=79, data = ‘C’
  2. Host B ACKs receipt of ‘C’, echoes back ‘C’. B -> A: Seq=79, ACK=43, data = ‘C’
  3. Host A ACKs receipt of echoed ‘C’. A -> B: Seq=43, ACK=80
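The numbers in the telnet scenario follow from one rule: the ACK field is the sequence number of the next byte expected. A minimal sketch, where the helper name `next_ack` is hypothetical:

```python
def next_ack(seq: int, data: bytes) -> int:
    """Cumulative ACK: sequence number of the next byte expected."""
    return seq + len(data)

# Host A sends 'C' with Seq=42; Host B echoes it with Seq=79.
a_seq, b_seq = 42, 79
ack_from_b = next_ack(a_seq, b"C")   # B acknowledges A's byte
ack_from_a = next_ack(b_seq, b"C")   # A acknowledges B's echoed byte
```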


TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
❒ longer than RTT
  ❍ but RTT varies
❒ too short: premature timeout
  ❍ unnecessary retransmissions
❒ too long: slow reaction to segment loss

Q: how to estimate RTT?
❒ SampleRTT: measured time from segment transmission until ACK receipt
  ❍ ignore retransmissions
❒ SampleRTT will vary, want estimated RTT “smoother”
  ❍ average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout
EstimatedRTT = (1- α)*EstimatedRTT + α*SampleRTT

❒ Exponential weighted moving average


❒ influence of past sample decreases exponentially fast
❒ typical value: α = 0.125
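The moving average above can be sketched in a few lines. The function name and the sample values (in milliseconds) are made up for illustration.

```python
def update_estimated_rtt(estimated_rtt: float, sample_rtt: float,
                         alpha: float = 0.125) -> float:
    """One EWMA step: EstimatedRTT = (1 - alpha)*EstimatedRTT + alpha*SampleRTT."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt

estimate = 100.0
for sample in [100.0, 180.0, 100.0]:   # made-up SampleRTT values
    estimate = update_estimated_rtt(estimate, sample)
```

The single 180 ms spike moves the estimate by only 10 ms, and its weight shrinks by a factor of (1 - α) with every later sample.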


Example RTT estimation:

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and EstimatedRTT]

TCP Round Trip Time and Timeout
Setting the timeout
❒ EstimatedRTT plus “safety margin”
❍ large variation in EstimatedRTT -> larger safety margin
❒ first estimate how much SampleRTT deviates from
EstimatedRTT:

DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|

(typically, β = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + 4*DevRTT
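A worked sketch of the three formulas together. It assumes EstimatedRTT is updated before DevRTT, which the slide does not specify; the function name and the numbers are illustrative.

```python
def update_timeout(estimated_rtt: float, dev_rtt: float, sample_rtt: float,
                   alpha: float = 0.125, beta: float = 0.25):
    """Return (EstimatedRTT, DevRTT, TimeoutInterval) after one new SampleRTT."""
    # Assumption: EstimatedRTT is refreshed first, then used in the deviation.
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    return estimated_rtt, dev_rtt, estimated_rtt + 4 * dev_rtt

# Made-up values in milliseconds: EstimatedRTT=100, DevRTT=10, SampleRTT=140.
est, dev, timeout = update_timeout(100.0, 10.0, 140.0)
```

With these inputs the estimate rises to 105 ms, the deviation to 16.25 ms, and the timeout to 170 ms, so one noisy sample widens the safety margin more than it moves the average.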


TCP reliable data transfer

❒ TCP creates rdt service on top of IP’s unreliable service
❒ Pipelined segments
❒ Cumulative acks
❒ TCP uses single retransmission timer
❒ Retransmissions are triggered by:
  ❍ timeout events
  ❍ duplicate acks
❒ Initially consider simplified TCP sender:
  ❍ ignore duplicate acks
  ❍ ignore flow control, congestion control

Sliding Window Protocol
At the sender, a will be referred to as SendBase, and s as NextSeqNum.

[Figure: Sender P1 (source) and Receiver P2 (sink)]
  Sender: 0 .. a-1 acknowledged; a .. s-1 unacknowledged (send window); s is next to send
  Receiver: 0 .. r-1 received and delivered; r is next expected; receive window is r .. r + RW - 1

RW: receive window size
SW: send window size (s - a ≤ SW)

TCP sender events:

data rcvd from app:
❒ Create segment with seq #
❒ seq # is byte-stream number of first data byte in segment
❒ start timer if not already running (think of timer as for oldest unacked segment)
❒ expiration interval: TimeOutInterval

timeout:
❒ retransmit segment that caused timeout
❒ restart timer

Ack rcvd:
❒ If acknowledges previously unacked segments
  ❍ update what is known to be acked
  ❍ start timer if there are outstanding segments

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }

} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack’ed byte
Example:
• SendBase = 72; y = 73, so the rcvr wants 73 and up
• y > SendBase means new data ack’ed
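The three sender events can be exercised with a toy model. This is a sketch under simplifying assumptions: the timer is just a flag, there is no real network, and the class and attribute names are invented for illustration.

```python
class SimplifiedTcpSender:
    """Toy model of the three sender events; no real timer or network."""

    def __init__(self, initial_seq_num: int):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}           # seq # -> data of not-yet-acknowledged segments
        self.timer_running = False
        self.sent = []              # (seq #, data) handed to IP, in order

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((seq, data))
        self.next_seq_num += len(data)

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        seq = min(self.unacked)     # smallest not-yet-acknowledged seq #
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: drop every segment that ends at or before y.
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.timer_running = bool(self.unacked)

sender = SimplifiedTcpSender(92)
sender.on_app_data(b"x" * 8)    # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)   # Seq=100, 20 bytes
sender.on_ack(120)              # one cumulative ACK covers both segments
```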



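TCP: retransmission scenarios

[Figure: lost ACK scenario, Host A and Host B over time]
  A -> B: Seq=92, 8 bytes data
  B -> A: ACK=100 (lost, marked X)
  Seq=92 timeout at A; A -> B: Seq=92, 8 bytes data (retransmitted)
  B -> A: ACK=100; SendBase = 100

[Figure: premature timeout scenario, Host A and Host B over time]
  A -> B: Seq=92, 8 bytes data
  A -> B: Seq=100, 20 bytes data
  B -> A: ACK=100, then ACK=120
  Seq=92 timeout at A before the ACKs arrive; A -> B: Seq=92, 8 bytes data (retransmitted), reset timer
  ACK=100 arrives: SendBase = 100; ACK=120 arrives: SendBase = 120, stop timer
  B -> A: ACK=120 for the retransmitted segment; SendBase = 120
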
TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario, Host A and Host B over time]
  A -> B: Seq=92, 8 bytes data
  A -> B: Seq=100, 20 bytes data
  B -> A: ACK=100 (lost, marked X)
  B -> A: ACK=120 arrives before the timeout; SendBase = 120, stop timer


TCP ACK generation [RFC 1122, RFC 2581]

❒ Event at Receiver: Arrival of in-order segment with next expected seq #. All data up to this seq # already ACKed.
  ❍ TCP Receiver action: Delayed ACK. Wait up to 500ms for next segment. If no next segment within 500ms, send ACK.
❒ Event at Receiver: Arrival of in-order segment with next expected seq #. An earlier segment has ACK pending.
  ❍ TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments.
❒ Event at Receiver: Arrival of out-of-order segment with seq. # higher than next expected. Gap detected.
  ❍ TCP Receiver action: Immediately send duplicate ACK, indicating seq. # of next expected byte.
❒ Event at Receiver: Arrival of segment that partially or completely fills gap.
  ❍ TCP Receiver action: Immediately send ACK, if segment starts at lower end of gap.
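The four receiver rules can be read as one decision function. A minimal sketch: the function name, its parameters, and the returned strings are all hypothetical, and it only classifies the arriving segment rather than sending anything.

```python
def receiver_action(seg_seq: int, next_expected: int,
                    ack_pending: bool, gap_start=None) -> str:
    """Map an arriving segment to one of the receiver actions listed above."""
    if gap_start is not None and seg_seq == gap_start:
        return "immediate ACK"                 # segment starts at lower end of gap
    if seg_seq > next_expected:
        return "immediate duplicate ACK"       # out of order: gap detected
    if seg_seq == next_expected:
        if ack_pending:
            return "immediate cumulative ACK"  # ACKs both in-order segments
        return "delayed ACK (up to 500 ms)"
    return "not covered by the table"          # e.g. an old, already-ACKed segment
```

For example, `receiver_action(120, 100, False)` reports a duplicate ACK because byte 100 is still missing.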

Fast Retransmit
❒ Time-out period often relatively long:
  ❍ long delay before resending lost packet
❒ Detect lost segments via duplicate ACKs.
  ❍ Sender often sends many segments back-to-back
  ❍ If segment is lost, there will likely be many duplicate ACKs.
❒ If sender receives 3 duplicate ACKs for the same data, it assumes that segment after ACKed data was lost:
  ❍ fast retransmit: resend segment before timer expires


Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there remains a not-yet-acknowledged segment)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {   /* fast retransmit */
      resend segment with sequence number y
      start timer for y
    }
  }
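A small sketch of the duplicate-ACK counting. It simplifies the pseudocode by keeping a single counter that resets on every new ACK, instead of one count per value of y; the class name and the `retransmitted` list are illustrative.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and records which seq # would be resent."""

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []     # seq #s resent by fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y      # new data acknowledged
            self.dup_acks = 0
        else:
            self.dup_acks += 1      # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y)

s = FastRetransmitSender(92)
for ack in [100, 100, 100, 100]:    # one new ACK, then three duplicates
    s.on_ack(ack)
```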

TCP Flow Control

flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast

receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
  ❍ RcvWindow field in TCP segment

sender: keeps amount of transmitted, unACKed data less than most recently received RcvWindow value

[Figure: buffer at receive side of a TCP connection]
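The sender-side rule can be sketched as a single calculation. The function name `usable_window` and the byte counts are hypothetical; the arguments mirror the LastByteSent, LastByteAcked and RcvWindow quantities.

```python
def usable_window(last_byte_sent: int, last_byte_acked: int, rcv_window: int) -> int:
    """Bytes the sender may still transmit without overrunning the receiver."""
    in_flight = last_byte_sent - last_byte_acked   # transmitted, unACKed data
    return max(0, rcv_window - in_flight)
```

With 500 bytes in flight and RcvWindow = 2000 the sender may send 1500 more bytes; once the in-flight data reaches RcvWindow it must wait for ACKs.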


TCP Connection Management

❒ initialize TCP variables
  ❍ seq. #s
  ❍ buffers, flow control info (e.g. RcvWindow)

Three way handshake
Step 1: client end system sends TCP SYN control segment to server (initial seq number chosen at random)
Step 2: server end system receives SYN, replies with SYNACK control segment
  ❍ allocates buffers
  ❍ specifies server-to-receiver initial seq. # (chosen at random)
Step 3: client end system replies with ack # (may be piggybacked in data segment)

[Figure: Active participant (client) and Passive participant (server)]
  client -> server: SYN, SequenceNum = x
  server -> client: SYN + ACK, SequenceNum = y, Acknowledgement = x+1
  client -> server: ACK, Acknowledgement = y+1

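The sequence and acknowledgement numbers in the handshake can be traced with a short sketch. The dictionaries stand in for segments and the function name is invented; sequence numbers are treated as 32-bit values, matching the header width.

```python
import random

def three_way_handshake(rng: random.Random):
    """Return the SYN, SYN+ACK and ACK segments as plain dictionaries."""
    x = rng.randrange(2**32)    # client initial seq #, chosen at random
    y = rng.randrange(2**32)    # server initial seq #, chosen at random
    syn = {"flags": "SYN", "seq": x}
    synack = {"flags": "SYN+ACK", "seq": y, "ack": (x + 1) % 2**32}
    ack = {"flags": "ACK", "ack": (y + 1) % 2**32}
    return syn, synack, ack

syn, synack, ack = three_way_handshake(random.Random(0))
```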
TCP Connection Management (cont.)

Closing a connection:
client closes socket

Step 1: client end system sends TCP FIN control message to server
Step 2: server receives FIN, replies with ACK. Then closes connection, sends FIN.

[Figure: client and server timelines]
  client close; client -> server: FIN
  server -> client: ACK
  server close; server -> client: FIN
  client -> server: ACK; client enters timed wait, then closed


TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
  ❍ Enters “timed wait”: will respond with ACK to retransmitted FINs (due to lost ACKs)
Step 4: server receives ACK. Connection closed.

Note: protocol spec allows simultaneous FINs

[Figure: client and server timelines]
  client closing; client -> server: FIN
  server -> client: ACK
  server closing; server -> client: FIN
  client -> server: ACK; client in timed wait, then closed; server closed

TCP Connection Management (cont.)

[Figure: TCP server lifecycle]

[Figure: TCP client lifecycle]


Principles of Congestion Control

Congestion:
❒ informally: “too many sources sending too much
data too fast for network to handle”
❒ different from flow control
❒ manifestations:
❍ lost packets (buffer overflow at routers)
❍ long delays (queueing in router buffers)
❒ a top-10 problem!

Causes/costs of congestion

λ’in denotes arrival rate of original packets and retransmitted packets.


cost of congestion:
❒ When packet dropped, any upstream transmission capacity used
for that packet was wasted
❒ behavior on right side of above graph called congestion collapse


Approaches towards congestion control

End-to-end congestion control:
❒ no explicit feedback from network
❒ congestion inferred from end-system’s observed loss and/or delay
❒ approach taken by TCP

Network-assisted congestion control:
❒ routers provide feedback to end systems
  ❍ single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  ❍ explicit sending rate for sender

TCP Congestion Control

❒ end-to-end control (no network assistance)
❒ sender limits transmission:
    LastByteSent-LastByteAcked ≤ CongWin
❒ Roughly,
    send rate ≤ CongWin/RTT bytes/sec
  where CongWin is in bytes

How does sender determine CongWin?
❒ loss event = timeout or 3 duplicate acks
❒ TCP sender reduces CongWin after loss event

three mechanisms:
  ❍ slow start
  ❍ AIMD
  ❍ reduce to 1 segment after timeout event

(Note: Consider RcvWindow size to be very large such that send window size is equal to CongWin)


TCP Slow Start


❒ Probing for usable bandwidth

❒ When connection begins, CongWin = 1 MSS


❍ Example: MSS = 500 bytes & RTT = 200 msec
❍ initial rate = 20 kbps

❒ available bandwidth may be >> MSS/RTT


❍ desirable to quickly ramp up to a higher rate
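The 20 kbps figure is one MSS per RTT, converted to bits. A quick check of the arithmetic, with variable names chosen for illustration:

```python
mss_bytes = 500
rtt_seconds = 0.200

# One MSS is sent per RTT at the start, so rate = MSS / RTT, in bits per second.
initial_rate_bps = mss_bytes * 8 / rtt_seconds
```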

TCP Slow Start (more)

❒ When connection begins, increase rate exponentially until first loss event or “threshold”
  ❍ double CongWin every RTT
  ❍ done by incrementing CongWin by 1 MSS for every ACK received
❒ Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B in successive RTTs]
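Why "+1 MSS per ACK" doubles the window each RTT can be seen in a short loop. A sketch that assumes no loss and one ACK per segment; the function name and the threshold of 8 segments are made up.

```python
def slow_start_rounds(threshold_segments: int, rounds: int) -> list:
    """CongWin (in segments) at the start of each RTT, assuming no loss."""
    cong_win, history = 1, []
    for _ in range(rounds):
        history.append(cong_win)
        if cong_win >= threshold_segments:
            break
        acks = cong_win      # one ACK comes back per segment sent this round
        cong_win += acks     # +1 MSS per ACK, so the window doubles
    return history

print(slow_start_rounds(threshold_segments=8, rounds=10))   # [1, 2, 4, 8]
```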


Refinement (more)

Q: If no loss, when should the exponential increase switch to linear?
A: When CongWin gets to current value of threshold

Implementation:
❒ Variable threshold
❒ At loss event, threshold is set to 1/2 of CongWin just before loss event
❒ For initial slow start, threshold is set to a very large value (e.g., 65 Kbytes)

[Figure: congestion window size (segments, 0 to 14) versus transmission round (1 to 15) for TCP Tahoe and TCP Reno, with the threshold marked]
(Note: For simplicity, CongWin is in number of segments in the above graph.)

Rationale for Reno’s Fast Recovery

❒ After 3 dup ACKs:
  ❍ CongWin is cut in half
  ❍ window then grows linearly
❒ But after timeout event:
  ❍ CongWin is set to 1 MSS instead;
  ❍ window then grows exponentially
  ❍ to a threshold, then grows linearly

❒ 3 dup ACKs indicates network capable of delivering some segments
❒ timeout occurring before 3 dup ACKs is “more alarming”


AIMD in steady state

additive increase: increase CongWin by 1 MSS every RTT in the absence of any loss event: probing
multiplicative decrease: cut CongWin in half after loss event (3 dup acks)

[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection]

TCP sender congestion control

❒ Event: ACK receipt for previously unacked data. State: Slow Start (SS)
  ❍ TCP Sender Action: CongWin = CongWin + MSS; If (CongWin > Threshold) set state to “Congestion Avoidance”
  ❍ Comments: Resulting in a doubling of CongWin every RTT
❒ Event: ACK receipt for previously unacked data. State: Congestion Avoidance (CA)
  ❍ TCP Sender Action: CongWin = CongWin + MSS * (MSS/CongWin)
  ❍ Comments: Additive increase, resulting in increase of CongWin by 1 MSS every RTT
❒ Event: Loss event detected by triple duplicate ACK. State: SS or CA
  ❍ TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; Set state to “Congestion Avoidance”
  ❍ Comments: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
❒ Event: Timeout. State: SS or CA
  ❍ TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; Set state to “Slow Start”
  ❍ Comments: Enter slow start
❒ Event: Duplicate ACK. State: SS or CA
  ❍ TCP Sender Action: Increment duplicate ACK count for segment being acked
  ❍ Comments: CongWin and Threshold not changed
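The event table translates fairly directly into a small state machine. This is a sketch, not a full TCP implementation: window sizes are in bytes, the class and method names are invented, and a single duplicate-ACK counter stands in for the per-segment count.

```python
class CongestionControl:
    """Sender-side CongWin/Threshold updates for the events in the table."""

    def __init__(self, mss: int, threshold: int):
        self.mss, self.threshold = mss, threshold
        self.cong_win = mss
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_duplicate_ack(self) -> None:
        self.dup_acks += 1
        if self.dup_acks == 3:      # loss event detected by triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)   # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"

cc = CongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()                 # 2000, 3000, 4000, 5000 -> enters CA
after_acks = (cc.cong_win, cc.state)
for _ in range(3):
    cc.on_duplicate_ack()           # triple duplicate ACK halves the window
after_dups = (cc.cong_win, cc.threshold, cc.state)
cc.on_timeout()
after_timeout = (cc.cong_win, cc.threshold, cc.state)
```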


TCP Throughput limited by loss rate

❒ TCP average throughput (approximate) in terms of loss rate, L:
    throughput = 1.22 * MSS / (RTT * sqrt(L))
❒ Example: 1500-byte segments, 100ms RTT, to get 10 Gbps throughput, loss rate needs to be very low
    L = 2·10^-10 (corrected from 2·10^-8)
❒ New version of TCP needed for high-speed applications

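The example loss rate can be checked by solving the throughput formula for L. This sketch reads the formula as 1.22*MSS/(RTT*sqrt(L)) with MSS converted to bits; the function name is illustrative.

```python
def required_loss_rate(mss_bytes: float, rtt_seconds: float, target_bps: float) -> float:
    """Solve throughput = 1.22*MSS/(RTT*sqrt(L)) for L, with MSS in bits."""
    return (1.22 * mss_bytes * 8 / (rtt_seconds * target_bps)) ** 2

# 1500-byte segments, 100 ms RTT, 10 Gbps target.
loss = required_loss_rate(1500, 0.100, 10e9)
```

The result is roughly 2.1e-10, which agrees with the 2·10^-10 figure rather than 2·10^-8.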
TCP Fairness
Fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K (AIMD only provides convergence
to same window size, not necessarily same throughput rate)

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]


Why is TCP fair?


Two competing sessions:
❒ Additive increase gives slope of 1, as window size increases
❒ multiplicative decrease reduces window size proportionally

[Figure: Connection 2 window size versus Connection 1 window size, both axes from 0 to R, with the “equal window size” line. The trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
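The convergence argument can be tried numerically. A minimal sketch that assumes both sessions see every loss at the same time and have equal RTTs; the function name, starting windows, and capacity are made up.

```python
def aimd_two_flows(w1: float, w2: float, capacity: float, rounds: int):
    """Evolve two window sizes under synchronized AIMD and return the final pair."""
    for _ in range(rounds):
        if w1 + w2 > capacity:      # loss: both decrease window by factor of 2
            w1, w2 = w1 / 2, w2 / 2
        else:                       # congestion avoidance: additive increase
            w1, w2 = w1 + 1, w2 + 1
    return w1, w2

w1, w2 = aimd_two_flows(1.0, 40.0, capacity=50.0, rounds=500)
```

Additive increase leaves the gap between the two windows unchanged, while each halving cuts it in half, so the pair drifts toward the equal window size line.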

Fairness (more)

Fairness and UDP
❒ Multimedia apps often do not use TCP
  ❍ do not want rate throttled by congestion control
❒ Instead use UDP:
  ❍ pump audio/video at constant rate, tolerate packet loss
❒ Research area: TCP friendly congestion control

Fairness and parallel TCP connections
❒ nothing prevents app from opening parallel connections between 2 hosts.
❒ Web browsers do this
❒ Example: link of rate R supporting 9 connections
  ❍ new app asks for 1 TCP, gets rate R/10
  ❍ new app asks for 11 TCPs, gets more than R/2 !
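The parallel-connection example is simple share arithmetic, assuming every connection on the link gets an equal share of R. The helper name `app_share` is hypothetical.

```python
from fractions import Fraction

def app_share(existing_connections: int, new_connections: int) -> Fraction:
    """Fraction of R the new app gets if every connection receives an equal share."""
    return Fraction(new_connections, existing_connections + new_connections)

one_tcp = app_share(9, 1)       # 1 of 10 connections
eleven_tcps = app_share(9, 11)  # 11 of 20 connections
```

With 11 connections the app holds 11/20 of the link, which is why it gets more than R/2.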
