
Computer Networks 2IC15 Transport Layer

Dr. Tanır Özçelebi


Thanks to J.F. Kurose & K.W. Ross

01/03/2012

Tanır Özçelebi, t.ozcelebi@tue.nl, TU/e Computer Science, System Architecture and Networking

Transport Layer
Our goals:
- understand principles behind transport layer services such as:
  - process-to-process delivery
  - reliable data transfer
  - flow control
  - congestion control
- learn about transport layer protocols in the Internet:
  - UDP: connectionless transport
  - TCP: connection-oriented transport


Outline
3.1 Transport-layer services
3.2 Process Addressing: Mux / Demux
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP
3.6 Principles of congestion control
3.7 TCP congestion control


Transport services and protocols


Transport services
- Transport protocols run in end systems
  - send side: breaks app messages into segments, passes them to the network layer
  - rcv side: reassembles segments into messages, passes them to the app layer
- More than one transport protocol may be available to apps
  - Internet: TCP and UDP
Q: Why can't we just use IP packets?
[Figure: logical end-to-end transport between the application/transport/network/data link/physical stacks of two end systems]


Transport vs. network layer


- Network layer: logical communication between hosts
- Transport layer: logical communication between processes; relies on & adds to network layer services

Some services that are not available from the network layer may be provided by the transport layer!


Analogy
Two houses, each with a dozen kids, are located in Eindhoven and Amsterdam. All kids in different houses are relatives (cousins). Every week, each kid writes a letter to every kid in the other house:
12 × 12 = 144 letters/week from each house (huge cost)

In each house, one kid is responsible for mail collection and mail distribution.
Each week they give all the letters to a postal-service mail carrier who makes daily visits to the house. They also collect the incoming mail from the mailbox.

What corresponds to what?
- Application message = letter in an envelope
- Processes = cousins
- Hosts (end systems) = houses
- Transport-layer protocol = the two kids responsible for mail collection/distribution
- Network-layer protocol = postal service (including mail carriers)


E.g. Internet transport-layer protocols
- reliable, in-order process-to-process delivery (TCP)
  - congestion control
  - flow control
  - limited security
  - connection setup
- unreliable, unordered process-to-process delivery (UDP)
  - adds very little to best-effort IP (error detection)
- services not available:
  - delay guarantees
  - bandwidth guarantees
[Figure: two end systems with full application/transport/network/data link/physical stacks, connected through routers that implement only the network, data link and physical layers]



Process-to-process addressing (Multiplexing / Demultiplexing)
- Multiplexing at send host: getting data from multiple sockets, enveloping data with headers (later used for demux)
- Demultiplexing at rcv host: delivering received segments to the correct socket
[Figure: hosts 1, 2 and 3, each with an application/transport/network/link/physical stack; processes P1 to P4 attach to the transport layer through sockets]

Socket/port addressing
Port number
- needed to choose among multiple processes running on the host
- The Internet model: 16-bit integer, 0 to 65535
- Client mostly chooses ephemeral (temporary) port numbers
- Server mostly uses well-known (permanent) port numbers
[Figure: client process on an ephemeral port number talking to a server process on a well-known port number]


Port addressing (cont.)
Port ranges of the Internet Assigned Numbers Authority (IANA):
- Well-known ports: administrative privileges may be required in the OS to bind a socket to these port numbers
- Registered ports: mostly assigned to distinct applications
- Dynamic ports: not publicly and permanently assigned to applications
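The three ranges are not spelled out above. The sketch below assumes the boundaries in the IANA port registry (RFC 6335): well-known 0 to 1023, registered 1024 to 49151, dynamic 49152 to 65535. `classify_port` is a hypothetical helper for illustration, not part of any library.

```python
# Hypothetical helper: map a 16-bit port number to its IANA range.
# Boundaries assumed from the IANA registry (RFC 6335).
WELL_KNOWN_MAX = 1023    # 0-1023: well-known (system) ports
REGISTERED_MAX = 49151   # 1024-49151: registered (user) ports
PORT_MAX = 65535         # 49152-65535: dynamic (private/ephemeral) ports


def classify_port(port: int) -> str:
    """Return 'well-known', 'registered' or 'dynamic' for a valid port."""
    if not 0 <= port <= PORT_MAX:
        raise ValueError(f"port {port} does not fit in 16 bits")
    if port <= WELL_KNOWN_MAX:
        return "well-known"
    if port <= REGISTERED_MAX:
        return "registered"
    return "dynamic"
```

For example, `classify_port(80)` (HTTP) gives "well-known", while a typical client-side ephemeral port such as 50000 gives "dynamic".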


IP addressing versus port addressing
- The destination host needs the IP address & port numbers to direct a segment to the appropriate socket
- Each datagram has a source IP address and a destination IP address
- Each datagram carries 1 transport-layer segment
- Each segment has source and destination port numbers
- 2 identifiers: IP address, port number


E.g. TCP/UDP segment format (each row is 32 bits wide)
source port # | dest port #
other header fields
application data (message)


Connectionless (UDP) demux
- Client creates a UDP socket, identified by the two-tuple (dest IP address, dest port number)
- Client sends its data to the UDP stack through the socket
- When a host receives a UDP segment:
  - it checks the destination port number in the segment
  - it directs the UDP segment to the socket with that port number
- IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket
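Here is a toy model of the lookup just described. It assumes segments are plain records and sockets are in-memory queues (`Segment` and `UdpDemux` are hypothetical names). The key is only (dest IP, dest port), so segments from different sources land in the same socket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


class UdpDemux:
    def __init__(self) -> None:
        # (dest IP, dest port) -> payloads queued at that socket
        self.sockets: dict[tuple[str, int], list[bytes]] = {}

    def bind(self, ip: str, port: int) -> None:
        self.sockets[(ip, port)] = []

    def deliver(self, seg: Segment) -> bool:
        """Queue seg at the matching socket; False if no socket is bound."""
        queue = self.sockets.get((seg.dst_ip, seg.dst_port))
        if queue is None:
            return False  # no socket: a real host would drop the segment
        queue.append(seg.payload)  # source IP/port play no role in the lookup
        return True
```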


Connection-oriented (TCP) demux
- TCP socket identified by a 4-tuple:
  - source IP address
  - source port number
  - dest IP address
  - dest port number
- Receiver uses all four values to direct a segment to the appropriate socket
- Server supports many simultaneous TCP sockets at the same port #
  - each socket identified by its own 4-tuple
- E.g. web servers have different sockets for each connecting client
  - Note: fixed server port number: 80
  - Note: non-persistent HTTP will have a different socket for each request
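This is the same kind of toy model, with the key widened to the 4-tuple. Hosts "A", "B", "C" and the port numbers mirror the figure that follows and are otherwise arbitrary. `TcpDemux` is a hypothetical name; a real server creates such per-connection sockets when it accepts a connection.

```python
FourTuple = tuple[str, int, str, int]  # (src IP, src port, dst IP, dst port)


class TcpDemux:
    def __init__(self) -> None:
        self.connections: dict[FourTuple, list[bytes]] = {}

    def accept(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> FourTuple:
        """Create a per-connection socket, as a server does after the handshake."""
        key = (src_ip, src_port, dst_ip, dst_port)
        self.connections[key] = []
        return key

    def deliver(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                payload: bytes) -> bool:
        """All four values select the socket; False if no connection matches."""
        queue = self.connections.get((src_ip, src_port, dst_ip, dst_port))
        if queue is None:
            return False
        queue.append(payload)
        return True
```

Three connections can share server port 80, and a segment reaches only the socket whose 4-tuple matches it exactly.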


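Connection-oriented demux (cont)
[Figure: clients A and B connect to web server C. Client A sends segments with SP: 9157, DP: 80, S-IP: A, D-IP: C. Client B opens two connections, with SP: 9157 and SP: 5775, both with DP: 80, S-IP: B, D-IP: C. The server has a separate socket and process (P4, P5, P6) for each connection, because the three 4-tuples differ even though DP = 80 and D-IP = C in all of them.]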


Connection-oriented demux: Threaded Web Server
[Figure: the same three connections (A:9157, B:9157 and B:5775, all to C:80), but web server C runs a single process P4 that serves each connection through its own socket, e.g. in a separate thread]



Internet transport-layer protocols
- unreliable, unordered delivery (UDP)
  - adds little to best-effort IP
- reliable, in-order delivery (TCP)
  - congestion control
  - flow control
  - connection setup
- services not available:
  - delay guarantees
  - bandwidth guarantees


User Datagram Protocol
UDP: unreliable, connectionless transport protocol. Why would anybody need this?
- required for mux / demux
- small overhead (small headers, suitable for short messages)
- simple: no connection establishment or state, no flow or error control
- no congestion control: UDP can blast away as fast as desired

Applications:
- simple request-response communication
- processes with internal flow & error control
- non-critical & periodic tasks (e.g. updating routing information)
- in conjunction with higher-layer protocols for real-time data

UDP header
- source port number: from 0 to 65535
- destination port number: from 0 to 65535
- length: total length of the user datagram (header + data) in bytes; at least 8, since the header alone is 8 bytes
- checksum: detects errors over the entire datagram
  - optional in IPv4 (if not calculated, filled with 0s)
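Here is a minimal parser for the 8-byte header, using Python's standard `struct` module ("!" = network byte order, "H" = unsigned 16-bit). `parse_udp` is a hypothetical helper, and it does not verify the checksum.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # four 16-bit fields, network byte order


def parse_udp(datagram: bytes) -> dict:
    """Split a UDP datagram into its header fields and payload."""
    if len(datagram) < UDP_HEADER.size:
        raise ValueError("shorter than the 8-byte UDP header")
    src, dst, length, checksum = UDP_HEADER.unpack_from(datagram)
    if length < UDP_HEADER.size or length > len(datagram):
        raise ValueError(f"inconsistent length field: {length}")
    return {
        "src_port": src,
        "dst_port": dst,
        "length": length,          # header + data, in bytes
        "checksum": checksum,      # 0 = not computed (allowed in IPv4)
        "payload": datagram[UDP_HEADER.size:length],
    }
```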


UDP checksum
Goal: detect errors (e.g., flipped bits) in transmitted segment
Sender:
- treat segment contents as a sequence of 16-bit integers
- checksum: 1's complement of the 1's complement sum of the segment contents
- put the checksum value into the UDP checksum field
Receiver:
- compute the checksum of the received segment
- check if the computed checksum equals the checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless? More later.


Internet Checksum Example
Note: when adding numbers, a carryout from the most significant bit needs to be added to the result.
Example: add two 16-bit integers
                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
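The same computation is sketched below and reproduces the example values. It works only on a list of 16-bit words. The real UDP checksum also covers an IP pseudo-header and pads the data to an even length, which this sketch leaves out. The function names are hypothetical.

```python
def ones_complement_sum(words: list[int]) -> int:
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # wraparound
    return total


def internet_checksum(words: list[int]) -> int:
    """Checksum = 1's complement (bit inversion) of the 1's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF


def verify(words: list[int], checksum: int) -> bool:
    """Receiver check: data words plus checksum must sum to all 1s."""
    return ones_complement_sum(words + [checksum]) == 0xFFFF
```

With the two words of the example, `ones_complement_sum` gives 1011101110111100 and `internet_checksum` gives 0100010001000011. Flipping any single bit of the data makes `verify` fail.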


IP / UDP / TCP
QUESTION: Can an application using UDP be reliable?
QUESTION: What would be the difference from TCP? What's the catch?
QUESTION: Are both UDP & IP unreliable to the same degree? Why? (we will discuss it when we come to the network layer)


Reliable link vs end-to-end reliability


We may need reliable transport service even if the data link is reliable.


Principles of reliable data transfer


The characteristics of the unreliable channel determine the complexity of the reliable data transfer protocol (rdt).


Reliable data transfer: getting started


Send side:
- rdt_send(): called from above (e.g., by app.) to send data (to be delivered to receiver upper layer)
- udt_send(): called by rdt to transfer packet over unreliable channel to receiver
Receive side:
- rdt_rcv(): called when packet arrives on rcv-side of channel
- deliver_data(): called by rdt to deliver data to upper layer

Reliable data transfer: getting started


We will:
- incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
- consider only unidirectional data transfer
  - but control info will flow in both directions!
- use finite state machines (FSM) to specify sender, receiver

FSM notation: state 1 --[event causing state transition / actions taken on state transition]--> state 2
- state: when in this state, next state uniquely determined by next event

rdt1.0: reliable transfer over a reliable channel


- underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
- separate FSMs for sender & receiver
  - sender, state "Wait for call from above": rdt_send(data) / packet = make_pkt(data); udt_send(packet)
  - receiver, state "Wait for call from below": rdt_rcv(packet) / extract(packet, data); deliver_data(data)

rdt2.0: channel with bit errors


- underlying channel may flip bits in packet
  - checksum to detect bit errors
- the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  - sender retransmits pkt on receipt of NAK
- new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) rcvr->sender

rdt2.0: FSM specification
Sender:
- state "Wait for call from above":
  - rdt_send(data) / sndpkt = make_pkt(data, checksum); udt_send(sndpkt) -> "Wait for ACK or NAK"
- state "Wait for ACK or NAK":
  - rdt_rcv(rcvpkt) && isNAK(rcvpkt) / udt_send(sndpkt) -> stay
  - rdt_rcv(rcvpkt) && isACK(rcvpkt) / (no action) -> "Wait for call from above"
Receiver:
- state "Wait for call from below":
  - rdt_rcv(rcvpkt) && corrupt(rcvpkt) / udt_send(NAK) -> stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) / extract(rcvpkt,data); deliver_data(data); udt_send(ACK) -> stay
Next: How does it work?

rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs, highlighting the error-free path: the sender sends sndpkt; the receiver gets an uncorrupted packet, delivers the data and sends an ACK; the sender receives the ACK and returns to "Wait for call from above"]

rdt2.0: error scenario
[Figure: the rdt2.0 FSMs, highlighting the error path: the receiver gets a corrupted packet and sends a NAK; the sender retransmits sndpkt and stays in "Wait for ACK or NAK" until an ACK arrives]

rdt2.0 has a fatal flaw!

rdt2.0 has a fatal flaw!


What happens if ACK/NAK messages are corrupted?
- sender doesn't know what happened at receiver!
- can't just retransmit: possible duplicate

Handling duplicates:
- sender adds sequence number to each pkt
- sender retransmits current pkt if ACK/NAK garbled
- receiver discards (doesn't deliver up) duplicate pkt

Stop and wait: sender sends one packet, then waits for receiver response

rdt2.1: sender, handles garbled ACK/NAKs


- state "Wait for call 0 from above":
  - rdt_send(data) / sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt) -> "Wait for ACK or NAK 0"
- state "Wait for ACK or NAK 0":
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)) / udt_send(sndpkt) -> stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt) -> "Wait for call 1 from above"
- state "Wait for call 1 from above":
  - rdt_send(data) / sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt) -> "Wait for ACK or NAK 1"
- state "Wait for ACK or NAK 1":
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)) / udt_send(sndpkt) -> stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt) -> "Wait for call 0 from above"

rdt2.1: receiver, handles garbled ACK/NAKs


- state "Wait for 0 from below":
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt) / extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt) -> "Wait for 1 from below"
  - rdt_rcv(rcvpkt) && corrupt(rcvpkt) / sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt) -> stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt) / sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt) -> stay (duplicate)
- state "Wait for 1 from below":
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt) / extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt) -> "Wait for 0 from below"
  - rdt_rcv(rcvpkt) && corrupt(rcvpkt) / sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt) -> stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt) / sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt) -> stay (duplicate)

rdt2.1: discussion
Sender:
- seq # added to pkt
- two seq. #s (0,1) will suffice. Why?
- must check if received ACK/NAK corrupted
- twice as many states
  - state must remember whether current pkt has 0 or 1 seq. #
Receiver:
- must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
- note: receiver cannot know if its last ACK/NAK was received OK by the sender
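Below is a sketch of the receiver side just discussed. Packets are modelled as (seq, data, corrupted) arguments instead of real checksummed packets. The class and method names are hypothetical, and the single bit `expected` plays the role of the two receiver states.

```python
class Rdt21Receiver:
    def __init__(self) -> None:
        self.expected = 0               # state: "Wait for 0 from below"
        self.delivered: list[str] = []

    def rdt_rcv(self, seq: int, data: str, corrupted: bool) -> str:
        """Handle one packet; return the feedback sent: 'ACK' or 'NAK'."""
        if corrupted:
            return "NAK"                # ask the sender to retransmit
        if seq == self.expected:
            self.delivered.append(data) # deliver_data(data)
            self.expected ^= 1          # now wait for the other seq #
            return "ACK"
        # Duplicate: the sender missed our ACK. Re-ACK, but do not deliver again.
        return "ACK"
```

For example, if the ACK for packet 0 is garbled, the sender resends packet 0. The receiver ACKs it again without delivering the data a second time.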


rdt2.2: a NAK-free protocol


- same functionality as rdt2.1, using ACKs only
- instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
- duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt2.2: sender, receiver fragments


Sender FSM fragment:
- state "Wait for call 0 from above":
  - rdt_send(data) / sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt) -> "Wait for ACK 0"
- state "Wait for ACK 0":
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)) / udt_send(sndpkt) -> stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0) -> "Wait for call 1 from above" (not shown)
Receiver FSM fragment:
- transition into state "Wait for 0 from below":
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt) / extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)
- state "Wait for 0 from below":
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)) / udt_send(sndpkt) -> stay (resends the last ACK, i.e. ACK1)

rdt3.0: channels with errors and loss


New assumption: the underlying channel can also lose packets (data or ACKs)
- checksum, seq. #, ACKs and retransmissions will be of help, but are not enough
- the network connection between the sender and the receiver is assumed not to reorder messages

Approach: sender waits a "reasonable" amount of time for ACK
- retransmits if no ACK received in this time
- if pkt (or ACK) just delayed (not lost): retransmission will be a duplicate, but use of seq. #s already handles this
  - receiver must specify seq # of pkt being ACKed
- requires countdown timer
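The approach can be tried out with a small simulation under the stated assumptions (loss, but no reordering). A few assumptions go beyond the slides. Corruption is not modelled, and a timer expiry is modelled as "no matching ACK came back for this transmission". The loss pattern is a hypothetical list of booleans, so the run is reproducible. The receiver behaves like the rdt2.2 receiver (ACKs carry seq #s). All names are hypothetical.

```python
from collections import deque


class LossyChannel:
    """Delivers packets in order; drops a transmission whenever the next
    entry of the loss pattern is True."""

    def __init__(self, losses) -> None:
        self.losses = deque(losses)

    def carry(self, pkt):
        lost = self.losses.popleft() if self.losses else False
        return None if lost else pkt


class Receiver:
    """rdt2.2-style receiver: every ACK names the seq # it acknowledges."""

    def __init__(self) -> None:
        self.expected = 0
        self.delivered = []

    def rdt_rcv(self, pkt):
        seq, data = pkt
        if seq == self.expected:        # new packet: deliver it
            self.delivered.append(data)
            self.expected ^= 1
        return ("ACK", seq)             # duplicates are re-ACKed, not delivered


def rdt30_send_all(messages, channel, receiver, max_tries=10):
    """Stop-and-wait sender; returns the number of data transmissions."""
    seq, transmissions = 0, 0
    for data in messages:
        for _ in range(max_tries):
            transmissions += 1                  # udt_send(sndpkt); start_timer
            pkt = channel.carry((seq, data))
            ack = receiver.rdt_rcv(pkt) if pkt is not None else None
            ack = channel.carry(ack) if ack is not None else None
            if ack == ("ACK", seq):             # stop_timer
                break
            # no matching ACK before the timer expires: retransmit
        else:
            raise RuntimeError("no ACK after max_tries transmissions")
        seq ^= 1
    return transmissions
```

In the test run, the ACK of the first packet and the first copy of the second packet are lost. The sender therefore needs 5 transmissions for 3 messages, and the receiver still delivers each message exactly once.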


rdt3.0 sender
- state "Wait for call 0 from above":
  - rdt_send(data) / sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer -> "Wait for ACK0"
  - rdt_rcv(rcvpkt) / (no action)
- state "Wait for ACK0":
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)) / (no action)
  - timeout / udt_send(sndpkt); start_timer
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0) / stop_timer -> "Wait for call 1 from above"
- state "Wait for call 1 from above":
  - rdt_send(data) / sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer -> "Wait for ACK1"
  - rdt_rcv(rcvpkt) / (no action)
- state "Wait for ACK1":
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,0)) / (no action)
  - timeout / udt_send(sndpkt); start_timer
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1) / stop_timer -> "Wait for call 0 from above"

rdt3.0 in action
[Figures (two slides): sender/receiver timelines of rdt3.0 in operation]

rdt3.0: stop-and-wait operation


- sender: first packet bit transmitted at t = 0; last packet bit transmitted at t = L / R
- receiver: first packet bit arrives; last packet bit arrives, receiver sends ACK
- ACK arrives at the sender at t = RTT + L / R; sender sends next packet

Performance of rdt3.0
rdt3.0 works, but its performance stinks. Example: 1 Gbps link, 15 ms propagation delay (so RTT = 30 ms), 8000 bit packet:

$$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bits/s}} = 8\ \mu\text{s}$$

U_sender: utilization, the fraction of time the sender is busy sending:

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008\ \text{ms}}{30.008\ \text{ms}} \approx 0.00027$$

- 1 kByte pkt every 30 ms -> 33 kByte/sec throughput over a 1 Gbps link
- the network protocol limits the use of physical resources!

Pipelined protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
- range of sequence numbers must be increased
- buffering at sender and/or receiver

Pipelining increases utilization


- sender: first packet bit transmitted at t = 0; last bit transmitted at t = L / R
- receiver: first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK
- ACK arrives, send next packet, t = RTT + L / R

Increase utilization by a factor of 3!

$$U_{sender} = \frac{3L/R}{RTT + L/R} = \frac{0.024\ \text{ms}}{30.008\ \text{ms}} \approx 0.0008$$
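Both utilization formulas can be checked numerically with one helper. It assumes RTT = 30 ms (twice the 15 ms propagation delay) and n packets sent back-to-back per RTT. The cap at 1 is an addition: the formula only holds while the n packets fit within RTT + L/R. `utilization` is a hypothetical name.

```python
def utilization(L_bits: float, R_bps: float, rtt_s: float, n_pkts: int = 1) -> float:
    """Fraction of time the sender is busy: n*(L/R) / (RTT + L/R), capped at 1."""
    d_trans = L_bits / R_bps                      # transmission delay L/R
    return min(1.0, n_pkts * d_trans / (rtt_s + d_trans))
```

`utilization(8000, 1e9, 0.030)` reproduces the stop-and-wait value of about 0.00027. Passing `n_pkts=3` gives about 0.0008, three times as much.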

Pipelining Protocols
Go-back-N:
- Sender can have up to N unACKed packets in pipeline
- Rcvr only sends cumulative ACKs
  - Doesn't ACK packet if there's a gap
- Sender has timer for oldest unACKed packet
  - If timer expires, retransmit all unACKed packets

Selective Repeat:
- Sender can have up to N unACKed packets in pipeline
- Rcvr ACKs individual packets
- Sender maintains timer for each unACKed packet
  - When timer expires, retransmit only that unACKed packet

Go-Back-N
Sender:
- k-bit seq # in pkt header
- "window" of up to N consecutive unACKed pkts allowed
- ACK(n): ACKs all pkts up to and including seq # n ("cumulative ACK")
  - sender may receive duplicate ACKs (see receiver)
- a timer for each in-flight pkt OR a single timer for all in-flight packets
  - the following slides show an implementation with a single timer
- timeout(n): retransmit pkt n and all higher seq # pkts in window

GBN: sender FSM


Initially: base = 1, nextseqnum = 1; single state "Wait"

rdt_send(data):
    if (nextseqnum < base+N) {
        sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
        udt_send(sndpkt[nextseqnum])
        if (base == nextseqnum)
            start_timer
        nextseqnum++
    }
    else
        refuse_data(data)

timeout:
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    ...
    udt_send(sndpkt[nextseqnum-1])

rdt_rcv(rcvpkt) && corrupt(rcvpkt):
    (no action)

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
        stop_timer
    else
        start_timer
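Here is an event-driven sketch of this sender FSM. It assumes unbounded integer sequence numbers (no k-bit wraparound) and records transmissions in a list instead of calling `udt_send`. The guard that ignores ACKs below `base` is an addition that the FSM does not have. All names are hypothetical.

```python
class GbnSender:
    """Go-Back-N sender: window N, single timer, integer seq #s from 1."""

    def __init__(self, N: int) -> None:
        self.N = N
        self.base = 1
        self.nextseqnum = 1
        self.sndpkt: dict[int, str] = {}
        self.timer_running = False
        self.wire: list[tuple[int, str]] = []    # stand-in for udt_send

    def rdt_send(self, data: str) -> bool:
        """Send data if the window has room; False means refuse_data(data)."""
        if self.nextseqnum >= self.base + self.N:
            return False
        self.sndpkt[self.nextseqnum] = data
        self.wire.append((self.nextseqnum, data))
        if self.base == self.nextseqnum:
            self.timer_running = True            # start_timer
        self.nextseqnum += 1
        return True

    def timeout(self) -> None:
        """Go back N: resend every packet in [base, nextseqnum-1]."""
        self.timer_running = True                # start_timer
        for n in range(self.base, self.nextseqnum):
            self.wire.append((n, self.sndpkt[n]))

    def ack(self, acknum: int) -> None:
        """Cumulative ACK: all pkts up to and including acknum are done."""
        if acknum < self.base:
            return                               # old/duplicate ACK (added guard)
        for n in range(self.base, acknum + 1):
            self.sndpkt.pop(n, None)
        self.base = acknum + 1
        # stop_timer if nothing is in flight, otherwise restart it
        self.timer_running = self.base != self.nextseqnum
```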


GBN: receiver FSM


Initially: expectedseqnum = 1; sndpkt = make_pkt(0,ACK,chksum); single state "Wait"

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum):
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++

default:
    udt_send(sndpkt)

- ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
- out-of-order pkt:
  - discard (don't buffer) -> no receiver buffering!
  - re-ACK pkt with highest in-order seq #
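A matching receiver sketch follows, assuming the same integer seq #s. It returns the ACK number it sends, where 0 means nothing has been received in order yet. Names are hypothetical.

```python
class GbnReceiver:
    """Go-Back-N receiver: no buffering, only expectedseqnum is kept."""

    def __init__(self) -> None:
        self.expectedseqnum = 1
        self.delivered: list[str] = []

    def rdt_rcv(self, seq: int, data: str, corrupted: bool = False) -> int:
        """Process one packet and return the ACK number sent back."""
        if not corrupted and seq == self.expectedseqnum:
            self.delivered.append(data)     # deliver_data(data)
            self.expectedseqnum += 1
        # default: (re-)ACK the highest in-order seq # received so far;
        # out-of-order packets are simply discarded
        return self.expectedseqnum - 1
```

If packet 3 is lost, packet 4 is discarded and answered with a duplicate ACK 2. Packet 4 must then be resent after packet 3.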

GBN in action
Window Size = 4 packets
[Figure: sender/receiver timeline of Go-Back-N with a window of 4 packets]

Selective repeat
- receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
- sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
- sender window
  - at most N sent but unACKed pkts are allowed

Selective repeat: sender, receiver windows
[Figure: sender and receiver views of the sequence number space, each showing its window of size N]

Selective repeat
Sender:
- data from above: if next available seq # in window, send pkt
- timeout(n): resend pkt n, restart timer
- ACK(n) in [sendbase, sendbase+N-1]:
  - mark pkt n as received
  - if n is smallest unACKed pkt seq. num, advance window base to next unACKed seq #

Receiver:
- pkt n in [rcvbase, rcvbase+N-1]:
  - send ACK(n)
  - out-of-order: buffer
  - in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
- pkt n in [rcvbase-N, rcvbase-1]:
  - ACK(n)
- otherwise:
  - ignore
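The receiver rules above can be written as a sketch. It assumes unbounded integer seq #s, so window membership is a plain range check. Real SR works modulo the seq # space, and that is where the dilemma that follows comes from. Names are hypothetical.

```python
class SrReceiver:
    """Selective Repeat receiver with window size N (integer seq #s)."""

    def __init__(self, N: int, rcvbase: int = 0) -> None:
        self.N = N
        self.rcvbase = rcvbase
        self.buffer: dict[int, str] = {}
        self.delivered: list[str] = []

    def rdt_rcv(self, n: int, data: str):
        """Return the seq # ACKed, or None if the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data                  # buffer, even if in order
            while self.rcvbase in self.buffer:     # deliver the in-order run
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1                  # advance the window
            return n                               # ACK(n)
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n                               # already delivered: re-ACK
        return None                                # otherwise: ignore
```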


Selective repeat in action
[Figure: sender/receiver timeline of Selective Repeat with individual ACKs, receiver-side buffering and a single-packet retransmission]

Selective repeat dilemma


Example:
- seq #s: 0, 1, 2, 3
- window size = 3
- receiver sees no difference in the two scenarios!
- incorrectly passes duplicate data as new in (a)
[Figure: scenarios (a) and (b), which look identical from the receiver's side]

Q: what relationship between seq # size and window size?
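One way to explore the question is sketched below with a hypothetical helper. Suppose the receiver accepts a full window and every ACK is lost. Check whether any old seq # also belongs to the receiver's new window, because a retransmission of that packet would then be taken as new data, as in scenario (a). For this scenario the check confirms the usual answer: the receiver is confused exactly when the window is larger than half the seq # space, so SR needs window size ≤ (seq # space)/2.

```python
def ambiguous(seq_space: int, window: int) -> bool:
    """True if a retransmitted packet from the first window could be
    mistaken for a packet of the receiver's next window."""
    if not 1 <= window <= seq_space:
        raise ValueError("need 1 <= window <= seq_space")
    old = {i % seq_space for i in range(window)}              # first window
    new = {(window + i) % seq_space for i in range(window)}   # next window
    return bool(old & new)
```

For the example above, `ambiguous(4, 3)` is True, while a window of 2 with the same four seq #s is safe.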



Transmission Control Protocol
- point-to-point: one sender, one receiver
- full duplex: bi-directional data flow in same connection
- reliable, in-order delivery
- pipelined: TCP congestion & flow control set the window size
- connection-oriented: initialize sender, receiver state before data exchange
  - seq. #s, buffers, flow control info (e.g. RcvWindow)
- send & receive buffers
[Figure: the application writes data through the socket door into the TCP send buffer; segments travel to the TCP receive buffer, from which the other application reads data through its socket door]

TCP segment format (each row is 32 bits wide)
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | receive window
checksum | urgent data pointer
options (variable length)
application data (variable length)

- sequence number, acknowledgement number: counting by bytes of data (not segments!)
- head len: header length in 32-bit words
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- receive window: # bytes rcvr willing to accept
- checksum: Internet checksum (as in UDP)
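Below is a sketch that decodes the 20-byte fixed part of this header with Python's standard `struct` module. The flag bit positions (FIN = lowest bit up to URG) are those of RFC 793. Later RFCs use some of the "not used" bits for ECN, and this sketch ignores them. `parse_tcp` is a hypothetical helper that does not check the checksum.

```python
import struct

TCP_FIXED = struct.Struct("!HHIIHHHH")   # 20-byte fixed TCP header
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5


def parse_tcp(segment: bytes) -> dict:
    src, dst, seq, ack, off_flags, rwnd, checksum, urg = TCP_FIXED.unpack_from(segment)
    header_len = (off_flags >> 12) * 4     # head len counts 32-bit words
    if not TCP_FIXED.size <= header_len <= len(segment):
        raise ValueError(f"bad header length: {header_len}")
    flags = {name for bit, name in enumerate(FLAG_NAMES) if off_flags & (1 << bit)}
    return {
        "src_port": src, "dst_port": dst,
        "seq": seq, "ack": ack,            # byte-stream numbers
        "header_len": header_len, "flags": flags,
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg,
        "options": segment[TCP_FIXED.size:header_len],
        "data": segment[header_len:],
    }
```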

TCP Connection Management


Connection Establishment: Three-way handshake
- Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
- Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
- Step 3: client receives SYNACK, replies with ACK segment, which may contain data
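The sequence numbers exchanged in the three steps are sketched below. The sketch relies on a detail not stated above: the SYN flag consumes one sequence number, so each side acknowledges the other's initial seq # plus 1. TCP sequence numbers are 32-bit and wrap around. The ISN values in the example are arbitrary, and the function is hypothetical.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return (flags, seq, ack) for SYN, SYNACK and ACK; ack=None if unset."""
    mod = 2 ** 32                                  # 32-bit sequence space
    syn = ({"SYN"}, client_isn, None)
    synack = ({"SYN", "ACK"}, server_isn, (client_isn + 1) % mod)
    ack = ({"ACK"}, (client_isn + 1) % mod, (server_isn + 1) % mod)
    return [syn, synack, ack]
```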


TCP Connection Management (cont.)


Connection Termination
- Step 1: client end system sends TCP FIN control segment to server
- Step 2: server receives FIN, replies with ACK. Sends FIN.


TCP Connection Management (cont.)


- Step 3: client receives FIN, replies with ACK.
  - Enters "time wait": will respond with ACK to received FINs
- Step 4: server receives ACK. Connection closed.
[Figure: client/server timeline of the four termination steps, with the client's time-wait period]

TCP timers


Retransmission Timer: TCP RTT


Q: How to set TCP timeout value?
- longer than RTT
  - but RTT varies
- too short: premature timeout
  - unnecessary retransmissions
- too long: slow reaction to segment loss

Q: How to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
- SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements

Retransmission Timer: TCP RTT (cont)


$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
- exponential weighted moving average
- influence of past sample decreases exponentially fast
- typical value: α = 0.125
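Here is a minimal sketch of the EWMA update with α = 0.125. Starting the estimate from the first sample when no initial value is given is an assumption here; RFC 6298 does the same for its smoothed RTT. `estimate_rtt` is a hypothetical name.

```python
def estimate_rtt(samples, alpha=0.125, initial=None):
    """Return EstimatedRTT after each SampleRTT (exponential weighted moving average)."""
    estimates = []
    est = initial
    for s in samples:
        # first sample seeds the estimate if no initial value is given (assumption)
        est = s if est is None else (1 - alpha) * est + alpha * s
        estimates.append(est)
    return estimates
```

Starting from 160 ms with steady 80 ms samples, the estimate moves only 1/8 of the remaining gap per sample (150, 141.25, ...). After k samples, an old value still carries weight (1 - α)^k.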


Example RTT estimation: from gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and EstimatedRTT measured over time on this path]

Retransmission Timer: TCP RTT (cont)


Setting the retransmission timeout:
- EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
- First estimate how much SampleRTT deviates from EstimatedRTT (typically, β = 0.25):

$$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot\left|\text{SampleRTT} - \text{EstimatedRTT}\right|$$

- Then set the timeout interval:

$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT} \quad \text{[Jacobson 1988]}$$

$$\text{TimeoutInterval} = 2\cdot\text{EstimatedRTT} \quad \text{[RFC 793]}$$
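A sketch of the Jacobson-style computation, combining EstimatedRTT, DevRTT and TimeoutInterval. Two assumptions are taken from RFC 6298 rather than from this slide. First, the first sample sets EstimatedRTT = sample and DevRTT = sample/2. Second, DevRTT is updated with the EstimatedRTT value from before the current sample. RFC 6298's 1-second minimum and clock-granularity term are omitted. The class name is hypothetical.

```python
class RttEstimator:
    def __init__(self, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt: float) -> float:
        """Feed one SampleRTT (not from a retransmission); return the new timeout."""
        if self.estimated_rtt is None:
            self.estimated_rtt, self.dev_rtt = sample_rtt, sample_rtt / 2
        else:
            # DevRTT uses EstimatedRTT from before this sample (RFC 6298 order)
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

With samples of 100 ms and then 120 ms, the timeout goes from 300 ms to 272.5 ms. The safety margin shrinks as DevRTT settles.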

Persistence timer
To deal with zero-size windows:
- What if the receiver advertises a window size of 0 (by sending an ACK) and the later ACK that reopens the window is lost? Sender and receiver would then wait for each other indefinitely.
- Start the persistence timer; when it goes off, send a probe (1 byte of data)
- it is set to the retransmission time & doubled every time a response is not received (up to 60 sec; then a probe is sent every 60 sec)
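The backoff rule above can be sketched as an interval generator. The starting value (the current retransmission timeout) and the 60 s cap come from the text. The function name is hypothetical.

```python
from itertools import islice


def persist_intervals(rto: float, cap: float = 60.0):
    """Yield successive persistence-timer intervals in seconds: start at the
    retransmission timeout, double after each unanswered probe, cap at 60 s."""
    interval = rto
    while True:
        yield min(interval, cap)
        interval = min(interval * 2, cap)
```

For an RTO of 1.5 s, `list(islice(persist_intervals(1.5), 8))` gives 1.5, 3, 6, 12, 24, 48, 60, 60.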


Keep Alive timer


- to prevent a long idle connection between a client and a server
  - either client or server may crash
- if the keep-alive timer expires (usually set to 2 h), the sender transmits a 1-byte (plus headers) probe

Time-Wait Timer
- used during connection termination, to:
  - let the TCP client resend the final ACK (for the server's FIN) in case it is lost
  - allow duplicate FIN segments to be discarded
  - enable other connections to use this TCP socket ID (Source IP, Destn IP, Source Port, Destn Port) once it expires
- usually 2 times the maximum segment lifetime (MSL)
  - MSL: 2 mins [RFC 793]


TCP reliable data transfer


- detects corrupted segments, lost segments, out-of-order segments & duplicated segments
- retransmissions triggered by:
  - timeout events
  - duplicate ACKs

Error control in TCP


Three tools:
1. checksum
2. acknowledgment (no NAKs)
3. timeout, with 2 options:
   - separate timeout counters for each segment transmitted
   - a single timeout counter for all in-flight segments [RFC2988]: less costly, used in modern TCP implementations (we will assume the latter as well)

Initially, let's consider a simplified TCP sender:
- ignore duplicate ACKs
- ignore flow control, congestion control

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
      else
        stop timer
    }
} /* end of loop forever */

Comment: SendBase-1 is the last cumulatively ACKed byte.
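The simplified sender above can be turned into an event-driven Python sketch. Assumptions: there is no real network or clock; the single timer is modelled by the hypothetical flag `timer_running`, and "pass segment to IP" is modelled by appending `(seq, data)` to the hypothetical log `sent`. Duplicate ACKs, flow control and congestion control are ignored, as on the slide.

```python
# Sketch of the simplified TCP sender: one timer, cumulative ACKs.

class SimplifiedTcpSender:
    def __init__(self, initial_seq_num: int):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}           # seq -> data of not-yet-acknowledged segments
        self.timer_running = False  # stands in for the single retransmission timer
        self.sent = []              # log of (seq, data) passed to IP

    def on_app_data(self, data: bytes):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True        # start timer
        self.sent.append((seq, data))        # pass segment to IP
        self.next_seq_num += len(data)

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)              # smallest unacked sequence number
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True            # start timer again

    def on_ack(self, y: int):
        if y > self.send_base:
            self.send_base = y
            # cumulative ACK: drop every segment whose bytes all lie below y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)  # restart or stop timer


# Lost ACK scenario: Seq=92 (8 bytes) times out and is resent, then ACK 100 arrives.
sender = SimplifiedTcpSender(92)
sender.on_app_data(b"x" * 8)
sender.on_timeout()
sender.on_ack(100)
print(sender.sent, sender.send_base, sender.timer_running)
```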

TCP: retransmission scenarios


[Figure: Host A / Host B timelines. Left, lost ACK scenario: Seq=92 is sent, the ACK is lost, and Seq=92 is retransmitted after the timeout (SendBase = 100). Right, premature timeout: the Seq=92 timer expires before its ACK arrives, so Seq=92 is resent (SendBase = 100, then SendBase = 120).]

TCP retransmission scenarios (more)


[Figure: Host A / Host B timeline, cumulative ACK scenario: an ACK is lost, but a later cumulative ACK arrives before the timeout, so no retransmission is needed (SendBase = 120).]

TCP ACK generation


[RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
TCP receiver action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

Event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending.
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments.

Event at receiver: arrival of out-of-order segment with higher-than-expected seq #. Gap detected.
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte.

Event at receiver: arrival of segment that partially or completely fills gap.
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap.
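The four receiver rules can be expressed as one decision function. This is a sketch under assumptions: byte-stream sequence numbers, a hypothetical `buffer` holding out-of-order segments, and the 500 ms delayed-ACK timer represented only by the flag `ack_pending` (the caller would send the ACK if that timer fires). The class name `AckPolicyReceiver` is hypothetical, and the final branch for already-received data is an extra case not listed in the table.

```python
# Sketch of TCP receiver ACK generation [RFC 1122, RFC 2581].

class AckPolicyReceiver:
    def __init__(self, expected_seq: int):
        self.expected = expected_seq  # seq # of next expected byte
        self.ack_pending = False      # an in-order segment awaits a delayed ACK
        self.buffer = {}              # seq -> length of out-of-order segments

    def on_segment(self, seq: int, length: int):
        """Return the ACK number to send now, or None to delay the ACK."""
        if seq == self.expected:
            fills_gap = bool(self.buffer)       # starts at lower end of a gap
            self.expected += length
            while self.expected in self.buffer:  # absorb now in-order data
                self.expected += self.buffer.pop(self.expected)
            if fills_gap or self.ack_pending:
                self.ack_pending = False
                return self.expected            # immediate (cumulative) ACK
            self.ack_pending = True
            return None                         # delayed ACK, up to 500 ms
        if seq > self.expected:
            self.buffer[seq] = length           # gap detected
            self.ack_pending = False
            return self.expected                # duplicate ACK
        return self.expected                    # old data: re-ACK (extra case)


r = AckPolicyReceiver(0)
print([r.on_segment(0, 100), r.on_segment(100, 100),
       r.on_segment(300, 100), r.on_segment(200, 100)])
```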

Fast Retransmit
The time-out period is often relatively long: long delay before resending a lost packet.
Detect lost segments via duplicate ACKs:
the sender often sends many segments back-to-back, so
if a segment is lost, there will likely be many duplicate ACKs.
If the sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the ACKed data was lost:
fast retransmit: resend that segment before its timer expires.

[Figure: Host A / Host B timeline: resending a segment after triple duplicate ACKs, before its timeout expires.]

Fast retransmit algorithm:


event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {  /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y  /* fast retransmit */
    }
  }
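The ACK handler with fast retransmit can be sketched in Python. Assumptions: `unacked` maps sequence numbers to segment data, `resend` is a caller-supplied callback standing in for passing the segment to IP, and the timer is not modelled. The class name `FastRetransmitSender` is hypothetical.

```python
from collections import defaultdict

# Sketch of the fast-retransmit ACK handler: resend on the 3rd duplicate ACK.

class FastRetransmitSender:
    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base, unacked, resend):
        self.send_base = send_base
        self.unacked = dict(unacked)     # seq -> data
        self.resend = resend             # callback(seq, data)
        self.dup_acks = defaultdict(int)

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # keep only segments that still have bytes at or beyond y
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.dup_acks.clear()
        else:
            # a duplicate ACK for an already ACKed segment
            self.dup_acks[y] += 1
            if self.dup_acks[y] == self.DUP_ACK_THRESHOLD and y in self.unacked:
                self.resend(y, self.unacked[y])  # fast retransmit


log = []
segments = {100: b"a" * 20, 120: b"b" * 20, 140: b"c" * 20, 160: b"d" * 20}
sender = FastRetransmitSender(100, segments, lambda seq, data: log.append(seq))
for ack in [120, 120, 120, 120]:  # one new ACK, then three duplicates
    sender.on_ack(ack)
print(log)
```

Only the third duplicate triggers the resend; further duplicates for the same y do not resend again.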

TCP Flow Control


The receiving side has a receive buffer, and the app process may be slow at reading from this buffer.
Speed-matching service: matching the send rate to the receiving app's drain rate.
Flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast.

TCP Flow control: how it works


The receiver advertises its spare room by including the value of RcvWindow in segments.
The sender limits unACKed data to RcvWindow, which guarantees that the receive buffer doesn't overflow.
(Suppose the TCP receiver discards out-of-order segments.)
Spare room in buffer:
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
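The RcvWindow formula and the sender-side limit can be checked with a few integers. This sketch assumes out-of-order segments are discarded, as above; the sender-side names `last_byte_sent` and `last_byte_acked` (unACKed data = LastByteSent - LastByteAcked) and both function names are assumptions not defined on the slide.

```python
# Sketch of TCP flow-control bookkeeping (byte counters as plain integers).

def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int, advertised: int) -> int:
    """Bytes the sender may still transmit while keeping unACKed data <= RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return max(0, advertised - unacked)


window = rcv_window(4096, 3000, 1000)         # 2000 bytes buffered, not yet read
print(window, sender_may_send(5000, 4000, window))
```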

3.6 Principles of congestion control

Congestion
Congestion appears if the load on the network is greater than the capacity of the network.
load: the number of packets sent to the network
capacity: the number of packets the network can handle

Indicators:
lost packets (buffer overflow at routers)
long delays (queuing in router buffers)

Congestion control: keep the load below capacity.
This is different from flow control!

Causes/costs of congestion: scenario 1


2 senders, 2 receivers; one router with infinite buffers; no retransmissions.
[Figure: Host A and Host B share a router with unlimited shared output link buffers; λin: original data, λout: delivered throughput.]
COST: large delays

Causes/costs of congestion: scenario 2


One router, finite buffers; the sender retransmits lost packets.
[Figure: Host A and Host B share a router with finite shared output link buffers; λin: original data, λ'in: original data plus retransmitted data, λout: delivered throughput.]
COST: more work (due to retransmissions) for a given goodput; unneeded retransmissions due to delay: the link carries multiple copies of a packet.

Causes/costs of congestion: scenario 3


4 senders, multiple routers (multihop), finite buffers, timeouts / retransmits.
[Figure: Hosts A, B, C and D send over multihop paths through routers with finite shared output link buffers; λin: original data, λ'in: original data plus retransmitted data, λout: delivered throughput.]
COST: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

Network performance
Throughput versus network load
Throughput: number of bits passing through the network per second.
[Figure: throughput vs. network load; beyond capacity, packets are discarded and retransmitted.]

Network performance (cntd)


Delay versus network load
Delay = transmission delay + processing delay + propagation delay + queuing delay
[Figure: delay vs. network load.]

Approaches towards congestion control


End-to-end congestion control:
no explicit feedback from the network
congestion inferred from loss and delay observed by the end systems
approach taken by TCP

Network-assisted congestion control:
routers provide feedback to end systems, either as
a single bit indicating congestion, or
an explicit rate at which the sender should send
approach taken by ATM ABR

3.7 TCP congestion control

Congestion control in TCP


TCP assumes that the cause of a timeout is congestion.
Retransmitting the lost packets does not solve the congestion problem; it worsens the situation!
In flow control, the sender window size is determined by the receiver window, which carries no information about network congestion.
If the network cannot deliver data to the receiver due to congestion, the sender has to slow down.
Sender window size = min(receiver window size, congestion window size)

Congestion avoidance in TCP


1. Slow Start (SS) & Additive Increase (AI, Congestion Avoidance)
   Start with the congestion window (cwnd) = 1 MSS.
   Slow start: for each successfully received ACK, increase cwnd by 1 MSS until cwnd = threshold (exponential increase).
   Congestion avoidance: after that, for each successfully received ACK, increase cwnd by 1/n segments, up to the size of the receiver window, where n = current congestion window (cwnd) size in segments.
2. Multiplicative Decrease (MD)
   If a time-out occurs, the threshold is set to half of the cwnd size and cwnd is set to 1 MSS.
   If 3 duplicate ACKs are received, the threshold is set to half of the cwnd size and fast recovery is started (TCP Reno).
[Figure: cwnd over time, showing the Slow Start, Congestion Avoidance, Multiplicative Decrease and Fast Recovery phases.]
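The two growth phases are easiest to see per RTT rather than per ACK. This is an idealized sketch, assuming no losses, cwnd and threshold counted in whole segments, and slow start stopping exactly at the threshold; `cwnd_per_rtt` is a hypothetical name.

```python
from typing import List

# Idealized cwnd trace (in segments) at the start of each RTT, with no losses.
# Slow start: +1 MSS per ACK, i.e. cwnd doubles every RTT.
# Congestion avoidance: +1/n segment per ACK, i.e. +1 segment every RTT.

def cwnd_per_rtt(threshold: int, rcv_window: int, rounds: int) -> List[int]:
    cwnd = 1
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < threshold:
            cwnd = min(2 * cwnd, threshold)  # exponential increase
        else:
            cwnd += 1                        # additive increase
        cwnd = min(cwnd, rcv_window)         # never exceed the receiver window
    return trace


print(cwnd_per_rtt(threshold=8, rcv_window=20, rounds=8))
```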

TCP sender congestion control (AIMD)


State: Slow Start (SS)
Event: ACK received for previously unacked data
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to Congestion Avoidance
Comment: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK received for previously unacked data
TCP sender action: CongWin = CongWin + MSS * (MSS/CongWin)
Comment: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender action: Threshold = CongWin/2, CongWin = Threshold, set state to Congestion Avoidance
Comment: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: timeout
TCP sender action: Threshold = CongWin/2, CongWin = 1 MSS, set state to Slow Start
Comment: enter slow start

State: SS or CA
Event: duplicate ACK
TCP sender action: increment duplicate ACK count for segment being ACKed
Comment: CongWin and Threshold not changed

Copyright 2005 Pearson Addison-Wesley. All rights reserved.
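The table can be read as a small state machine. This is a minimal sketch in bytes, assuming MSS = 1000 bytes and an initial Threshold of 64 000 bytes as illustrative defaults, with the third duplicate ACK treated as the "loss event detected by triple duplicate ACK". The class `AimdSender` is hypothetical, and `send_window` applies the rule sender window = min(receiver window, congestion window).

```python
# Sketch of the TCP Reno-style AIMD sender table (CongWin and Threshold in bytes).

class AimdSender:
    def __init__(self, mss=1000, threshold=64_000):
        self.mss = mss
        self.cong_win = mss      # start with 1 MSS
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK received for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                        # doubles every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win  # +1 MSS per RTT

    def on_dup_ack(self):
        self.dup_acks += 1       # CongWin and Threshold unchanged...
        if self.dup_acks == 3:   # ...until the triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)  # not below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0

    def send_window(self, rcv_window):
        return min(rcv_window, self.cong_win)


s = AimdSender(mss=1000, threshold=4000)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cong_win)
```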

To Do
Read Chapter 4 of the textbook.
Exercises (due next Wednesday):
Ch3 R1, R3, R6, R8, R11, R12, R14 (in R14.c, there is a typo: it should be (1 sec))
Ch3 P7, P19, P20, P22, P23

Check the TCP Visualizer tool:
http://www.win.tue.nl/~tozceleb/2IC16/TcpViz.zip