Introduction to Computer Networks

Phillip Musumeci
April 14, 2002

http://mirriwinni.cs.jcu.edu.au/phillip

JCU

School of InfTech

Contents

Part I

1 Introduction
1.1 Background
1.2 Network Aims
1.3 Network Use
1.4 Network Hardware
1.5 Alternative classification criterion
1.6 LANs
1.7 MANs
1.8 WANs
1.9 Internetworks
1.10 Network organisation
1.11 Example: Message Transfer
1.12 Network Design Issues
1.13 Interfaces and Services
1.14 Types of service
1.15 Service Primitives

2 Reference Models: OSI
2.1 Introduction
2.2 Physical Layer
2.3 Data Link Layer
2.4 Network Layer
2.5 Transport Layer
2.6 Session Layer
2.7 Presentation Layer
2.8 Application Layer
2.9 Data Transmission in the OSI Model

3 TCP/IP Reference Model
3.1 Introduction
3.2 The Internet Layer
3.3 The Transport Layer
3.4 The Application Layer
3.5 Host-to-Network Layer
3.6 OSI versus TCP Reference Models
3.7 Example Networks
3.7.1 Internet Services
3.8 Data Communications Services

4 Physical, Data Link, and Network Layers
4.1 Physical Layer Issues
4.2 Data Link Layer Issues
4.2.1 Framing
4.2.2 Error Control Overview

© Phillip Musumeci 2002

4.2.3 Flow Control Overview
4.3 Network Layer in Internet
4.3.1 IP Header
4.4 IP Addresses
4.5 Subnets
4.6 IP Router Operation
4.7 Internet Control Protocols
4.7.1 Internet Control Message Protocol (ICMP)
4.7.2 Address Resolution Protocol (ARP)
4.7.3 Reverse Address Resolution Protocol
4.7.4 Interior Gateway Routing Protocol
4.7.5 Exterior Gateway Routing Protocol
4.8 Internet Multicasting
4.9 Classless InterDomain Routing
4.10 IPv6

5 Transport and Session Layers
5.1 Transport Protocols
5.1.1 Introduction
5.1.2 Types of Service
5.1.3 Qualities of Service
5.1.4 Transport Service Primitives
5.1.5 Berkeley Sockets
5.1.6 Addressing
5.1.7 Multiplexing
5.1.8 Example Transport Protocol: TCP
5.2 TCP/IP demonstration client
5.2.1 Introduction
5.2.2 Privilege and Complexity
5.2.3 Standard versus non-standard clients
5.2.4 Connectionless v connection-oriented SVRs
5.2.5 Program Interface to Protocols
5.2.6 Interface Functionality
5.2.7 System Calls
5.2.8 BSD Tutorial

Part II

6 TCP/IP Protocols
6.1 Introduction
6.2 Review of TCP/IP Layering
6.3 User Datagram Protocol
6.4 UDP Multiplexing
6.5 UDP Port Number Allocation
6.6 Reliable Stream Transport Service (TCP)
6.7 Providing Reliability
6.8 What does TCP provide?

6.9 TCP Connections
6.10 Segments and Streams
6.11 Variable Window Size and Flow Control
6.12 TCP Segment Format
6.12.1 Out of Band Data
6.12.2 Maximum Segment Size
6.12.3 TCP Checksum Computation
6.13 Acknowledgements and Retransmission
6.14 TCP Timeouts and Retransmission
6.15 TCP Links with High Variance in Delay
6.16 Response to Congestion
6.17 Open and Close of TCP Connections
6.18 Reset of TCP Connections
6.19 TCP Protocol FSM
6.20 Forced Data Delivery
6.21 Reserved TCP Port Numbers
6.22 TCP Summary
6.23 Further Information

7 Introduction to Socket Programming
7.1 Background
7.2 Creating a socket
7.3 Closing a socket
7.4 Binding
7.5 Server: Listen and Accept
7.6 Client: Connect
7.7 Sending and Receiving Data
7.8 Flexible use of read() and write()
7.9 Servers for Multiple Services
7.10 Network Byte Order
7.11 Some Other Related Functions
7.12 BSD internet super-server inetd
7.13 Additional References


8 IP Router Operation
8.1 Datagram Delivery
8.2 Route Table Completeness
8.3 Route Optimisation Algorithms
8.4 Interior Gateway Routing Protocol
8.4.1 Routing Information Protocol (RIP)
8.4.2 Open Shortest Path First
8.5 Exterior Gateway Routing Protocol

9 Internet Control Protocols
9.1 Internet Control Message Protocol (ICMP)
9.2 Address Resolution Protocol (ARP)
9.3 Reverse Address Resolution Protocol
9.4 Domain Name System

10 Application Layer
10.1 Introduction
10.2 Email
10.3 Network News
10.4 Other Applications

A PPP: Point-to-Point Protocol

Notes:
1. These lecture notes use diagrams from Computer Networks by Andrew Tanenbaum. They are the result of my teaching at RMIT and JCU.
2. Texts:
Computer Networks, 3rd edition, Andrew Tanenbaum, Prentice-Hall, 1996. ISBN 0-13-394248-1. See also URL http://www.cs.vu.nl/ast.
Internetworking With TCP/IP Vol. 1, D.E. Comer, 2nd edition, Prentice-Hall, 1991. ISBN 0-13-468505-9. (reference)
Computer Networks and Internets, D.E. Comer, Prentice-Hall, 1997. ISBN 0-13-599010-6. (reference)
Advanced Programming in the UNIX Environment, W. Richard Stevens, Addison-Wesley, 1992. ISBN 0-201-56317-7. (coding)

Introduction

We consider the following topics:

Purpose of networks;
Network structure and organisation;
Reference models and service types;
Internet protocols IP, TCP.

Text:
Andrew Tanenbaum, Computer Networks, 3rd edition, Prentice-Hall, 1996.
ISBN 0-13-394248-1. See URL http://www.cs.vu.nl/ast

1.1 Background
Recent technological developments: 18th century for mechanical machines; 19th century for steam power; 20th century for electric power and information technology.
Differences between collecting, transporting, storing, and processing information are rapidly disappearing.
The switch from analogue to digital communications systems means that (digital) computer data, documents, speech, images, image sequences, etc. will eventually be indistinguishable.
The merging of computers and communications has affected the organisation of computer systems: a single large computer has evolved into interconnected smaller [1] computers, usually called a computer network.
The way of doing business is changing, in terms of collaboration and in terms of commerce itself.
The distribution of resources in a computer network is still evolving, driven by maintenance costs per desktop and the availability of higher bandwidth at falling cost.

[1] At least in physical terms.

1.2 Network Aims


Resource sharing: printing, disk storage, mail, etc.
Robustness: a network allows users to switch between servers to obtain higher reliability [2].
Economy: distributed systems can have centralised management and (expensive) shared software resources, cleanly connected to UI systems in client/server systems.

Fig. 1-1. The client-server model. [Figure: a client process on the client machine sends a request across the network to a server process on the server machine, which sends back a reply.]
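
The request/reply exchange of the client-server model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a local socket pair stands in for the network, the server runs as a thread rather than on a separate machine, and the names serve_once and ask are hypothetical.

```python
import socket
import threading


def serve_once(conn):
    """Server process: wait for one request, send back one reply."""
    request = conn.recv(1024)
    conn.sendall(b"reply to " + request)
    conn.close()


def ask(request):
    """Client process: send a request and block until the reply arrives."""
    # A connected socket pair stands in for the network (an assumption).
    client_end, server_end = socket.socketpair()
    worker = threading.Thread(target=serve_once, args=(server_end,))
    worker.start()
    client_end.sendall(request)
    reply = client_end.recv(1024)
    worker.join()
    client_end.close()
    return reply
```

Calling ask(b"print job 1") returns the server's reply to that one request; a real server would loop and accept many clients.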

1.3 Network Use


Communication:
remote information access, e.g. world wide web, banking and airline systems;
person-to-person communication, e.g. iphone (Internet telephony), email;
interactive entertainment (emerging).

[2] Assuming the network is more reliable than the resource source!
1.4 Network Hardware


Two network types:

Broadcast networks
contain a single communications channel shared by all machines;
messages are broken into small packets and sent by one machine to all other machines (which reassemble the message);
an address field specifies the recipient(s).
Communication between pairs of machines is common, but it is possible to have 1 bit in the address field indicate a group address for transmission to multiple recipients (important for digital HDTV and related pay-on-demand media distribution).

Point-to-point networks
consist of many connections between individual pairs of machines;
a link between any two machines will often have to pass through intermediate machines, so routing of packets is an important issue.
General rule: geographically localised networks tend to use broadcast structures, while geographically spread networks tend to use point-to-point structures.
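
Address filtering on a broadcast channel might look like the sketch below. The 1-byte address, the use of its top bit as the group flag, and the names GROUP_BIT, accepts and broadcast are all assumptions made for illustration; real LAN standards define their own address formats.

```python
from typing import Dict, List, Set

GROUP_BIT = 0x80  # assumed convention: top bit of a 1-byte address marks a group


def accepts(station_addr: int, groups: Set[int], dest: int) -> bool:
    """Decide whether a station keeps a packet it has seen on the channel."""
    if dest & GROUP_BIT:
        return dest in groups      # group address: keep only if subscribed
    return dest == station_addr    # individual address: keep only if it is ours


def broadcast(stations: Dict[int, Set[int]], dest: int) -> List[int]:
    """Every station sees the packet; return the addresses that keep it."""
    return sorted(addr for addr, groups in stations.items()
                  if accepts(addr, groups, dest))


# Stations 1 and 3 subscribe to the hypothetical group address 0x81.
STATIONS = {1: {0x81}, 2: set(), 3: {0x81}}
```

With this table, broadcast(STATIONS, 2) is kept only by station 2, while broadcast(STATIONS, 0x81) is kept by stations 1 and 3.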

1.5 Alternative classification criterion


The scale of a network can also be used to classify a network. Consider AST Figure 1.2:

Interprocessor   Processors located
distance         in same              Example
0.1 m            Circuit board        Data flow machine
1 m              System               Multicomputer
10 m             Room                 Local area network
100 m            Building             Local area network
1 km             Campus               Local area network
10 km            City                 Metropolitan area network
100 km           Country              Wide area network
1,000 km         Continent            Wide area network
10,000 km        Planet               The internet

Fig. 1-2. Classification of interconnected processors by scale.


Data flow machines are highly parallel with many functional units, e.g. Thinking Machines CM5 and Fujitsu VP2200 systems [3], Texas Instruments TMS320 C40/C6x DSP devices.
Multicomputers use short, fast busses to pass messages, e.g. some DSP chipsets.
Next are our networks operating over longer distances, divided into local, metropolitan, and wide-area networks.
Such networks may be interconnected, e.g. the Internet.

1.6 LANs
Distinguished from other networks by size, transmission technology, and topology.
Range in size from a single building to a few kilometres; the restricted size means the worst-case transmission time is bounded (used in design).

[3] T.I. make high speed cross bar switches for variable topology machines and HP make high speed optical links for PCB use.

Common transmission technology is a single cable to which all machines are attached, with speeds of 10 Mbps to 100 Mbps. Note: 1 Mbps = 1 megabit/sec = 10^6 bps = 1,000,000 bps (line rates use decimal prefixes, not 2^20 = 1,048,576).
Common topologies include bus and ring, e.g.

Fig. 1-3. Two broadcast networks. (a) Bus. (b) Ring. [Figure: computers attached to a shared cable, arranged as (a) a bus and (b) a ring.]

BUS LAN: arbitration is required to handle two or more machines transmitting simultaneously (a collision). Solutions involve passing tokens to avoid collisions, or mechanisms to handle a collision.
RING LAN: each bit propagates around the ring (but not more than once!).
Channel allocation: static or dynamic.
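
To see why a restricted LAN size bounds the worst-case delay, a rough calculation helps. The sketch below assumes a signal speed of about 2 x 10^8 m/s in cable and takes 10 Mbps to mean 10 x 10^6 bits per second; the 2.5 km cable length and 1000-bit frame are hypothetical figures chosen only for the arithmetic.

```python
PROPAGATION_SPEED = 2.0e8  # m/s, assumed signal speed in cable (about 2/3 of c)


def propagation_delay(length_m):
    """Time for one bit to travel the length of the cable, in seconds."""
    return length_m / PROPAGATION_SPEED


def transmission_time(frame_bits, rate_bps):
    """Time to clock a whole frame onto the cable, in seconds."""
    return frame_bits / rate_bps


# Hypothetical LAN: 2.5 km end to end, 1000-bit frames at 10 Mbps.
one_way = propagation_delay(2500)          # 12.5 microseconds
frame_time = transmission_time(1000, 10e6)  # 100 microseconds
```

Because the cable length has an upper limit, so does one_way, and a designer can rely on that bound when choosing timeouts or minimum frame sizes.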

1.7 MANs
Like a bigger LAN.
Can support both data and voice (i.e. it is possible to meet the delivery-time requirements of speech).
A standard called Distributed Queue Dual Bus (DQDB, IEEE 802.6), developed in Australia, has been agreed upon.

Fig. 1-4. Architecture of the DQDB metropolitan area network. [Figure: computers 1 to N attached to two buses, A and B, whose directions of flow are opposite, with a head end.]

1.8 WANs
Spans a large geographical area.
Connects machines (hosts, end systems) running user programs.
Hosts are connected by a communication subnet, or subnet for short, used in the context of the top-level view in Figure 1-5.

Fig. 1-5. Relation between hosts and the subnet. [Figure: hosts on LANs attach to routers; the routers and the lines between them form the subnet.]

A subnet consists of transmission lines and switching elements (switching computers, or routers).
Routers figure out which transmission link to use, and perform store-and-forward operations.
Subnet ambiguity: the term subnet also has a meaning in terms of addressing.
For point-to-point subnets, an important design consideration is topology:

Fig. 1-6. Some possible topologies for a point-to-point subnet. (a) Star. (b) Ring. (c) Tree. (d) Complete. (e) Intersecting rings. (f) Irregular.
Exercise: Postulate how (or whether) routing issues might be handled here.
For WANs with small packet sizes (such as digital telephones), the packets may be called cells.
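
As one possible starting point for the routing exercise above, the sketch below uses breadth-first search to find, for each destination, the first hop on a fewest-hops path. It is not the method used by any particular network; the function name next_hops and the four-router ring RING are hypothetical.

```python
from collections import deque


def next_hops(links, source):
    """Map each reachable router to the first hop on a fewest-hops path."""
    first_hop = {}
    visited = {source}
    queue = deque()
    # Direct neighbours are reached through themselves.
    for neighbour in sorted(links[source]):
        first_hop[neighbour] = neighbour
        visited.add(neighbour)
        queue.append(neighbour)
    # Routers further away inherit the first hop of the router that found them.
    while queue:
        node = queue.popleft()
        for neighbour in sorted(links[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                first_hop[neighbour] = first_hop[node]
                queue.append(neighbour)
    return first_hop


# Hypothetical ring of four routers: A - B - C - D - A.
RING = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
```

For the ring, next_hops(RING, "A") sends traffic for B via B and for D via D; C is two hops away in either direction, and the tie is broken in favour of B by the sorted visiting order. In a star or a complete topology the table is trivial, which is part of what the exercise asks you to notice.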

1.9 Internetworks
Gateways are used to interconnect networks.
Any necessary communications conversions are performed, e.g. the networks might use different rules (protocols) for operation, or data representation might be different.
An example of a large internetwork is the Internet.

1.10 Network organisation


Network hardware and software is highly structured.
Most networks are organised as a series of layers or levels; one reason for these layers is to help humans control complexity in the design and implementation of the networks.
The purpose of each layer is to offer services to higher layers, shielding those layers from the details of how the services are implemented.
Layer n on one machine carries on a conversation with layer n on the other machine, e.g.

Fig. 1-9. Layers, protocols, and interfaces. [Figure: Host 1 and Host 2 each contain layers 1 to 5. Layer n on Host 1 is joined to layer n on Host 2 by the layer n protocol, adjacent layers on each host are separated by the layer n/n+1 interface, and layer 1 on both hosts sits on the physical medium.]

Note: The dashed lines indicate a conversation between peers (local layer n talks to remote layer n). The data bits involved actually travel down the layers, across the physical medium, and then back up to the corresponding level.
Between each pair of adjacent layers is an interface which defines the primitive operations and services the lower layer offers to the upper layer.
Design concerns the number and purpose of the layers, and a (well-understood) interface between them.
A set of layers and protocols is called a Network Architecture; this is enough information for someone to build hardware and write software to implement each layer.
A list of protocols used by a certain system, one protocol per layer, may be called a protocol stack.

1.11 Example: Message Transfer


Communication is to occur from a process running in the top layer shown in AST Figure 1-11.

Fig. 1-11. Example information flow supporting virtual communication in layer 5. [Figure: on the source machine, message M from layer 5 gains header H4 at layer 4; layer 3 splits the result into two packets, H3 H4 M1 and H3 M2; layer 2 adds header H2 and trailer T2 to each, giving H2 H3 H4 M1 T2 and H2 H3 M2 T2. The destination machine removes them in the reverse order. Each layer talks to its peer using the layer 2, 3, 4, or 5 protocol.]

Message M is created and passed to layer 4 for transmission.
Layer 4 prepends a header to identify the message (sequence no., size, time, etc.) and passes it to layer 3.
While messages usually have no size limit, the layer 3 protocol will impose a limit, so incoming messages are broken up into packets, which have headers prepended and are passed to layer 2.
Layer 2 prepends a header, appends a trailer, and passes each packet to layer 1 for physical transmission.

As information is passed down, headers and trailers are added. As packets are passed up, headers and trailers are removed (and used).
These ideas are not limited to just networking: even hardware designers may implement header and trailer handling functions in hardware for high speed communications on multi-CPU DSP systems (but still with some software component).
Peer processes think of the communication (and coordination) as being horizontal.
Access will be via functions such as SendMessage and ReceiveMessage at the top level, and similar packet-oriented functions at lower levels.
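
The steps above can be sketched as a toy stack in which each layer is a function. Everything specific here is an assumption for illustration: the textual headers "H4|", "H3|", "H2|", the trailer "|T2", the 8-byte layer 3 limit MAX_PAYLOAD, and the function names.

```python
MAX_PAYLOAD = 8  # hypothetical layer 3 size limit, in bytes


def layer4_send(message):
    """Layer 4: prepend a header identifying the message."""
    return b"H4|" + message


def layer3_send(segment):
    """Layer 3: break the segment into packets and prepend a header to each."""
    chunks = [segment[i:i + MAX_PAYLOAD]
              for i in range(0, len(segment), MAX_PAYLOAD)]
    return [b"H3|" + chunk for chunk in chunks]


def layer2_send(packet):
    """Layer 2: prepend a header and append a trailer."""
    return b"H2|" + packet + b"|T2"


def send(message):
    """Pass a message down the stack; the result goes to layer 1."""
    return [layer2_send(p) for p in layer3_send(layer4_send(message))]


def receive(frames):
    """Pass frames up the stack, removing what each peer layer added."""
    packets = [f[len(b"H2|"):-len(b"|T2")] for f in frames]  # layer 2
    segment = b"".join(p[len(b"H3|"):] for p in packets)      # layer 3
    return segment[len(b"H4|"):]                              # layer 4
```

For the message b"hello world", send produces two frames, b"H2|H3|H4|hello|T2" and b"H2|H3| world|T2", which have the same shape as H2 H3 H4 M1 T2 and H2 H3 M2 T2 in the figure; receive recovers the original message.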

1.12 Network Design Issues


Addressing: each host must be identified so messages can pass between pairs of hosts; hosts have multiple users, so there must also be addressing within the context of each host.
Data transfer rules:
simplex (unidirectional);
half-duplex (either direction, but only one at a time);
full-duplex (both directions simultaneously).
Error handling: detection & correction (per packet); agreement on methods used.
Packet handling: packetising large messages; resequencing of out-of-order packets; avoiding mostly empty packets.
Flow control (protect slow data destinations from fast sources).
Multiplexing: for multiple connections between two peer layers, a lower layer may choose to multiplex these connections (may reduce costs or delays).
Routes: given multiple paths between source and destination, a route must be chosen.
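
Resequencing of out-of-order packets, one of the packet handling issues listed above, can be illustrated with a small buffer keyed by sequence number. This is a sketch only: it assumes sequence numbers start at 0, that nothing is lost or duplicated, and the name resequence is hypothetical.

```python
def resequence(arrivals):
    """Return payloads in sequence order, buffering any that arrive early."""
    pending = {}    # packets that arrived ahead of their turn
    expected = 0    # next sequence number to hand to the layer above
    delivered = []
    for seq, payload in arrivals:
        pending[seq] = payload
        # Release every packet that is now in order.
        while expected in pending:
            delivered.append(pending.pop(expected))
            expected += 1
    return delivered
```

Given arrivals in the order 1, 0, 3, 2, the payloads are delivered in the order 0, 1, 2, 3; packet 1 waits in the buffer until packet 0 arrives.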

1.13 Interfaces and Services


Terminology:

Entities are the active elements in a layer, software or hardware.


Peer entities are entities in the same layer on different machines.
Layer n is a service provider to layer n+1, while layer n+1 is a service user of layer n.
Service Access Points are the access points in layer n where layer n+1 can access the services provided [4].

Fig. 1-12. Relation between layers at an interface. [Figure: layer N+1 passes an IDU (ICI plus SDU) through a SAP at the interface to layer N; layer N entities exchange N-PDUs (header plus SDU) in their layer N protocol.]

SAP = Service Access Point
IDU = Interface Data Unit
SDU = Service Data Unit
PDU = Protocol Data Unit
ICI = Interface Control Information

Between layers, an Interface Data Unit (IDU) is used to exchange information. An IDU comprises a Service Data Unit (SDU) holding the data plus some control information (the Interface Control Information, ICI).
An SDU is transferred by fragmenting it, if necessary, into smaller parts, each sent with a header as a Protocol Data Unit (PDU), e.g. a packet.

1.14 Types of service


A layer can offer two types of service to the layers above. Connection-oriented implies a link is established and used as follows:
Connection setup, e.g. telephone dialing;
Data is exchanged, e.g. talking; and
Connection tear down, e.g. hanging up.

[4] Some services may be more efficient if they tap into the protocol at a middle layer, so an SAP may be of use to a user service.

Connectionless implies:
each element of data contains a full address;
each element is sent independently of others, meaning that messages are not guaranteed to arrive in order (in contrast to connection-oriented services).
Note: connection-oriented services may be built upon lower layers that are connectionless, and vice versa.

Type                  Service                  Example
Connection-oriented   Reliable message stream  Sequence of pages
Connection-oriented   Reliable byte stream     Remote login
Connection-oriented   Unreliable connection    Digitized voice
Connectionless        Unreliable datagram      Electronic junk mail
Connectionless        Acknowledged datagram    Registered mail
Connectionless        Request-reply            Database query

Fig. 1-13. Six different types of service.


Quality of service relates to aspects such as speed, delays, reliability, data loss:
data loss is avoided with acknowledgements, but these cost processing and delays;
a reliable connection-oriented service can be implemented as a message sequence or a byte stream;
an unreliable (i.e. not acknowledged) connectionless service is often called a datagram service;
for higher reliability, the datagram service can be acknowledged; the acknowledged datagram service still avoids connection establishment overheads;
in a request-reply service, datagrams are exchanged (fast, able to handle packet loss etc., common in client-server computer systems).
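
The acknowledged datagram idea can be illustrated with a small simulation in which the sender retransmits each datagram until it is acknowledged, with no connection set up beforehand. This is a sketch under stated assumptions: losses are scripted rather than random, the receiver and its acknowledgements are not modelled separately, and the names send_acknowledged and lost_attempts are hypothetical.

```python
def send_acknowledged(datagrams, lost_attempts, max_tries=5):
    """Send each datagram until acknowledged.

    lost_attempts is a set of (index, attempt) pairs whose transmission
    (or acknowledgement) is lost, so the sender times out and retries.
    """
    log = []
    for index, data in enumerate(datagrams):
        for attempt in range(1, max_tries + 1):
            if (index, attempt) in lost_attempts:
                log.append((data, attempt, "timeout"))
                continue  # no acknowledgement arrived: retransmit
            log.append((data, attempt, "acked"))
            break
        else:
            raise RuntimeError("gave up on %r" % (data,))
    return log
```

With datagrams ["q1", "q2"] and the first attempt at "q2" lost, the log shows "q1" acknowledged at once and "q2" acknowledged on its second attempt. The cost mentioned above is visible as the extra transmission and the wait for the timeout.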

1.15 Service Primitives


A service is specified by a set of primitive operations;
Primitives tell the service to perform an action or report on an action.

Primitive    Meaning
Request      An entity wants the service to do some work
Indication   An entity is to be informed about an event
Response     An entity wants to respond to an event
Confirm      The response to an earlier request has come back

Fig. 1-14. Four classes of service primitives.

Services between entity a and entity b can be:
confirmed, involving request sent by a, indication received by b, response sent by b, confirm received by a; and
unconfirmed, involving request sent by a and indication received by b.

Example: a simple connection-oriented service could be built on 8 primitives:

1. CONNECT.request
2. CONNECT.indication
3. CONNECT.response
4. CONNECT.confirm
5. DATA.request
6. DATA.indication
7. DISCONNECT.request
8. DISCONNECT.indication
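
One way these eight primitives might be used in a short session is sketched below: a confirmed CONNECT, one unconfirmed DATA transfer in each direction, and an unconfirmed DISCONNECT. The event-list representation and the function names are assumptions made for illustration.

```python
def confirmed_service(name, events):
    """Record the four primitives of a confirmed service from a to b."""
    events.append(("a", name + ".request"))
    events.append(("b", name + ".indication"))
    events.append(("b", name + ".response"))
    events.append(("a", name + ".confirm"))


def unconfirmed_service(name, sender, receiver, events):
    """Record the two primitives of an unconfirmed service."""
    events.append((sender, name + ".request"))
    events.append((receiver, name + ".indication"))


def session():
    """Connect, exchange data once each way, then disconnect."""
    events = []
    confirmed_service("CONNECT", events)
    unconfirmed_service("DATA", "a", "b", events)
    unconfirmed_service("DATA", "b", "a", events)
    unconfirmed_service("DISCONNECT", "a", "b", events)
    return events
```

The session generates ten events in time order while using only the eight distinct primitives, because DATA.request and DATA.indication each occur twice.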

AST Figure 1-15 shows typical use over time of these primitives (ignore Millie):

Fig. 1-15. How a computer would invite its Aunt Millie to tea. The numbers near the tail end of each arrow refer to the eight service primitives discussed in this section. [Figure: time runs from left to right; numbered arrows pass between layer N+1 and layer N on Computer 1 and on Computer 2.]

Review

A service is a set of primitives that a layer provides to the layer above. A protocol
is a set of rules governing the format and meaning of frames, packets, and messages
exchanged by peer entities. An entity uses protocols to implement a service.

Reference Models: OSI

2.1 Introduction
The ISO Open Systems Interconnection Reference Model has 7 layers chosen according to the principles:

1. Layers are only created when a different level of abstraction is needed;
2. Each layer performs a well defined function;
3. Layer functions are chosen with international standards in mind;
4. Layer boundaries are chosen to minimise information flow across boundaries;
5. The number of layers is chosen to handle the various distinct functions, one per layer, but without too many layers.

There is an OSI model specification document, and there are also separate ISO standards for the individual layers.


Layer   Name           Name of unit exchanged
7       Application    APDU
6       Presentation   PPDU
5       Session        SPDU
4       Transport      TPDU
3       Network        Packet
2       Data link      Frame
1       Physical       Bit

Fig. 1-16. The OSI reference model. [Figure: Host A and Host B each contain all seven layers, while the two routers inside the communication subnet boundary contain only the network, data link, and physical layers. The application, presentation, session, and transport protocols run between the hosts; the lower three layers use network, data link, and physical layer host-router protocols, and the routers use an internal subnet protocol.]

2.2 Physical Layer


Concerned with transmission of an unstructured bit stream over a physical link;
Involves parameters such as signal voltage and bit time durations;
Handles transmission which is unidirectional or bidirectional;
Deals with mechanical, electrical, optical, and procedural aspects of establishing the link, moving data bits, and disestablishing the link.

2.3 Data Link Layer


Main task is to take a raw transmission facility and transform it into a line that appears free of undetected transmission errors;
This is done by breaking input data into data frames, sending them sequentially, and processing acknowledgement frames sent back by the receiver;
Since the physical layer appears to be a bit conduit, it is up to the data link layer to create and recognise frame boundaries; this is done by attaching special bit patterns to the beginning and end of each frame (and handling the case where this bit pattern needs to be represented within the frame);
Handles data errors, e.g. frame retransmission when frames are lost or corrupted; also handles duplicate frames;
Handles flow control;
Medium access in broadcast networks is handled by the medium access sublayer.
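
Marking frame boundaries, and coping with the boundary pattern appearing inside the data, can be sketched with byte stuffing (a byte-level cousin of the bit patterns mentioned above, used here because it is easier to show). The delimiter value FLAG, the escape value ESC, and the function names frame and unframe are assumptions for illustration, not taken from a particular standard.

```python
FLAG = 0x7E  # assumed frame delimiter byte
ESC = 0x7D   # assumed escape byte


def frame(payload):
    """Wrap payload in FLAG bytes, escaping any FLAG or ESC inside it."""
    body = bytearray()
    for byte in payload:
        if byte in (FLAG, ESC):
            body.append(ESC)  # mark the next byte as data, not a delimiter
        body.append(byte)
    return bytes([FLAG]) + bytes(body) + bytes([FLAG])


def unframe(framed):
    """Recover the payload from one complete frame."""
    if len(framed) < 2 or framed[0] != FLAG or framed[-1] != FLAG:
        raise ValueError("missing frame delimiter")
    payload = bytearray()
    escaped = False
    for byte in framed[1:-1]:
        if escaped:
            payload.append(byte)  # escaped byte is taken literally
            escaped = False
        elif byte == ESC:
            escaped = True
        elif byte == FLAG:
            raise ValueError("unescaped flag inside frame")
        else:
            payload.append(byte)
    if escaped:
        raise ValueError("dangling escape byte")
    return bytes(payload)
```

A payload containing the delimiter, such as the bytes 01 7E 02, is sent as 7E 01 7D 7E 02 7E, and unframe returns the original three bytes.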

2.4 Network Layer


Provides upper layers with independence from the data transmission and switching technologies used;
Concerned with controlling the operation of the subnet, including routing, which can be static (wired in) or dynamic (varied to improve performance, e.g. to avoid congestion);
Handles accounting;
Converts between different addressing schemes and packet sizes when packets cross between different networks;
Broadcast networks have little or no routing problem.

2.5 Transport Layer


Provides reliable transparent transport of data between end points;
Basic function is to accept data from the session layer, split it into smaller units if need be, pass these to the network layer, and ensure all pieces arrive correctly;
Must be efficient, and must isolate upper layers from changes in hardware technology;
May create multiple network connections in order to achieve high throughput, or may multiplex several transport connections onto the same network connection to reduce cost;
The type of service is determined at the transport layer (error-free point-to-point channel, or isolated messages with no guarantee of delivery, or multicast);
Is a true end-to-end layer, from source to destination (contrast with lower layers of Figure 1-16);
Handles establishing and deleting connections, and the naming of the end point users;
Handles flow control.

2.6 Session Layer


Provides the control structure for communication (sessions) between applications,
and establishes, manages, and terminates these sessions;
Can provide dialogue control (e.g. manage one-way traffic), manage tokens (tokens may be exchanged in protocols where only one end at a time may attempt
critical operations), and provide synchronisation (inserting checkpoints in operations so
that restarts are possible, e.g. a file transfer restart).

2.7 Presentation Layer


Can perform generally useful transformations and functions on data in a way that
supports different users' needs and avoids each user programming their own solution;
Concerned with syntax & semantics of the information transmitted;
Common services include encryption, text compression, reformatting;
Data conversion from host-specific to network-oriented form and back to (a different) host-specific
form (different computers have different numeric representations, while different databases might have
different data representations).


2.8 Application Layer


Provides a variety of other protocols (in the OSI environment);
Might provide a network virtual terminal service, network management service, transaction server, file transfer protocol (handling different naming conventions and different text representations), mail, etc.

2.9 Data Transmission in the OSI Model


[Figure: the sending process passes Data down through the application, presentation, session, transport, network, data link, and physical layers. Each layer adds its own header (AH, PH, SH, TH, NH, DH respectively; the data link layer also adds a trailer DT) and the physical layer transmits the resulting bits. The layers of the receiving process strip the headers in reverse order. Peer layers are related by the application, presentation, session, transport, and network protocols; the actual data transmission path runs down the sending stack and up the receiving stack.]

Fig. 1-17. An example of how the OSI model is used. Some of the
headers may be null. (Source: H.C. Folts. Used with permission.)
Actual data transmission is vertical (apart from the lower physical link);
Each layer is programmed as if it is transferring data with a horizontal peer.
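
The vertical path of Fig. 1-17 can be sketched in a few lines of Python. This is only an illustration: the header tags AH ... DH and the trailer DT come from the figure, but representing them as bracketed strings (and the function names send and receive) are assumptions made for the example, not part of any real protocol.

```python
# Header tags as named in Fig. 1-17, in the order the sending stack adds them
# (application layer first, data link layer last). Contents are placeholders.
HEADERS = ["AH", "PH", "SH", "TH", "NH", "DH"]
TRAILER = "DT"  # data link trailer


def send(data: str) -> str:
    """Pass data down the sending stack: each layer prepends its header."""
    unit = data
    for tag in HEADERS:
        unit = f"[{tag}]" + unit
    # The data link layer also appends a trailer before the bits are sent.
    return unit + f"[{TRAILER}]"


def receive(frame: str) -> str:
    """Pass a frame up the receiving stack: each layer strips its own header."""
    trailer = f"[{TRAILER}]"
    if not frame.endswith(trailer):
        raise ValueError("missing data link trailer")
    unit = frame[: -len(trailer)]
    for tag in reversed(HEADERS):
        header = f"[{tag}]"
        if not unit.startswith(header):
            raise ValueError(f"expected header {tag}")
        unit = unit[len(header):]
    return unit


if __name__ == "__main__":
    frame = send("hello")
    print(frame)           # what travels over the physical link
    print(receive(frame))  # what the receiving process sees
```

Each layer only ever touches its own header, which is why it can be written as if it were talking directly to its horizontal peer.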


TCP/IP Reference Model

3.1 Introduction
The Internet was originally developed using leased telephone lines and later satellite and radio links;
It had to handle connection of multiple networks in a seamless way;
Defined in 1974, design predates OSI.

OSI              TCP/IP
Application      Application
Presentation     (not present in the model)
Session          (not present in the model)
Transport        Transport
Network          Internet
Data link        Host-to-network
Physical         Host-to-network

Fig. 1-18. The TCP/IP reference model.

3.2 The Internet Layer


It allows hosts to inject packets into any network and have them travel independently to the destination;
Packets may take different routes;
Packets may arrive out of order (in which case higher layers must reorder them).
The internet layer defines an official packet format and protocol called IP (Internet
Protocol);


The purpose of this layer is to deliver IP packets, hence the major issues are routing and
congestion.

3.3 The Transport Layer


It allows peer entities on the source and destination hosts to carry on a conversation (similar to the OSI transport layer);
There are two end-to-end protocols defined:
TCP (Transmission Control Protocol) is a reliable connection-oriented protocol: it
transfers a byte stream from one machine to another across the internet
without error, breaks messages into fragments and reassembles them, and handles flow
control.
UDP (User Datagram Protocol) is an unreliable, connectionless protocol: it avoids
TCP's overheads, is used in client-server request-reply applications (where errors etc. are usually handled directly), and is also suitable where speed is more
important than error avoidance, e.g. speech, video.
Relationship of IP, TCP, UDP:

Layer (OSI names)      Protocols / Networks
Application            TELNET, FTP, SMTP, DNS
Transport              TCP, UDP
Network                IP
Physical + data link   ARPANET, SATNET, Packet radio, LAN

Fig. 1-19. Protocols and networks in the TCP/IP model initially.
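
The connectionless request-reply style described for UDP can be tried on a single machine. The following is a minimal sketch using Python's standard socket module over the loopback interface; the function name udp_round_trip, the 2 second timeout, and the "reply in upper case" behaviour are assumptions chosen for the example.

```python
import socket


def udp_round_trip(message: bytes, timeout: float = 2.0) -> bytes:
    """Send one datagram to a local 'server' socket and return its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        # The server binds to the loopback address; port 0 lets the OS choose.
        server.bind(("127.0.0.1", 0))
        server.settimeout(timeout)
        client.settimeout(timeout)

        # No connection set-up: the client simply sends a datagram.
        client.sendto(message, server.getsockname())

        # The server handles the request and replies to the sender's address.
        request, client_addr = server.recvfrom(2048)
        server.sendto(request.upper(), client_addr)

        # UDP gives no delivery guarantee, so a real client would retry
        # (or give up) if this receive timed out.
        reply, _ = client.recvfrom(2048)
        return reply


if __name__ == "__main__":
    print(udp_round_trip(b"ping"))
```

A TCP version would need listen, accept, and connect calls before any data moved, which is the set-up overhead that UDP avoids.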

3.4 The Application Layer


The TCP/IP model does not have a session or presentation layer: the need was
not perceived, and they are now considered of little use;

Application layer includes higher level protocols such as:


virtual terminal (TELNET);
file transfer (FTP);
mail transfer (SMTP);
domain name service (DNS);
network news transfer (NNTP);
hypertext transfer (HTTP).

3.5 Host-to-Network Layer


Not described in the TCP/IP reference model;
Usually not described in texts;
Read the sources! I.e. look at 4.4BSD Lite/Lite2 source distributions or subsequent
OS sources such as the *BSD family. Useful network sites include:
http://www.freebsd.org
http://www.au.freebsd.org
http://www4.au.freebsd.org

3.6 OSI versus TCP Reference Models


Layers up through and including the transport layer provide an end-to-end, network-independent
transport service;
OSI model makes the distinctions between services and interfaces and protocols
explicit;
OSI reference model was devised before protocols were invented, while TCP/IP
was given a model to describe the existing protocols (the protocols fit their model
very well indeed!);
Separation of interfaces and implementation ties in well with modern OO design
techniques;
Different number of layers;


Network layer: OSI provides connectionless and connection-oriented communications while TCP/IP has only connectionless communications;
Transport layer: OSI has connection-oriented communications while TCP/IP has
connectionless and connection-oriented communications.

Reading: AST 1.4.4 Critique of the OSI Models and Protocols and AST 1.4.5 Critique of the TCP/IP Reference Model.

3.7 Example Networks


Internet: in Australia there are a number of backbone networks across the country, with more expected, e.g. Optus, Telstra.
On a local scale, the pathways between JCU/SIT hosts and remote hosts can be
described with the use of the traceroute UNIX command.
Novell NetWare is a very popular PC networking system based on Xerox Network System (XNS); it predates OSI and appears similar to TCP/IP. It uses a proprietary
protocol stack shown in AST Figure 1-22:
Layer         Protocols
Application   SAP, File server, ...
Transport     NCP, SPX
Network       IPX
Data link     Ethernet, Token ring, ARCnet
Physical      Ethernet, Token ring, ARCnet

Fig. 1-22. The Novell NetWare reference model.


Physical and data link layers can be chosen from various industry standards including Ethernet, IBM token ring, etc.
Has an unreliable connectionless internetwork protocol called IPX, like IP but with
10 byte addresses instead of 4 bytes.

Has a connection-oriented protocol called NCP (network core protocol) providing user data transport and other services (a second protocol SPX provides only
transport).
Servers regularly advertise services (SAP).
PC network protocols are starting to be based on TCP/IP.

Field                 Bytes
Checksum              2
Packet length         2
Transport control     1
Packet type           1
Destination address   12
Source address        12
Data                  (variable)

Fig. 1-23. A Novell NetWare IPX packet.

3.7.1 Internet Services

Email: basic service allowing messages to be composed, sent, and received. Usually, a mail client handles email composition and reading while an operating system service handles email transfer via the Simple Mail Transfer Protocol (e.g. BSD
Unix sendmail handles SMTP).
News: message transfer system allowing individuals to communicate to groups.
An application program handles news composition & reading while a network
service handles news propagation via the Network News Transfer Protocol (NNTP).
File Transfer (FTP): a user client program communicates with a remote application
to provide file transfer.
Remote Procedure Call (RPC): a local program communicates a request to a remote
service provider asking for a remote procedure (program) to run and return the
results.
Remote Login: a user can run a remote shell (CLI session or other task) via tools
such as telnet, rlogin, and rsh.


3.8 Data Communications Services


There exist quite a few other network standards but they are outside the scope of this
subject (see AST for further information). Some additional services of interest are now
mentioned.
In the 1980s/1990s, Bellcore introduced Switched Multimegabit Data Service (SMDS)
which was aimed at linking remote networks. Basic SMDS operates at 45Mbps,
providing a simple connectionless packet delivery service.
The CCITT developed the X.25 standard in the 1970s. It provides an interface between public packet-switched networks and customers. Connections can be switched virtual
circuits (setup, use, teardown) or permanent virtual circuits. Packets are delivered in order.
Capacity has been sold in 2Mbps increments.
Frame Relay provides a bare-bones connection-oriented bit transport, with the user
responsible for errors and flow control. Typical speed = 1.5 Mbps.
Broadband Integrated Services Digital Network (B-ISDN) can offer television (various image sizes), telephony and high quality audio, other multimedia services,
LAN interconnections, etc. The underlying technology is Asynchronous Transfer
Mode (ATM) using a 53 byte packet called a cell (5 byte header, 48 byte payload).
Field       Bytes
Header      5
User data   48

Fig. 1-29. An ATM cell.


ATM cell switching is flexible:
Handles constant rate traffic (audio, video) and variable rate traffic (data). Note that compressed video can be bursty.
High speed: Gbps possible.
Multicasting possible (telephone companies can become broadcasters).
Is connection-oriented, with current speeds of 155Mbps ( 3 T 1 links) and
622Mbps. Video coding developers have been pushing for priority packets to be
allowed.


Physical, Data Link, and Network Layers

4.1 Physical Layer Issues


Use communications ideas to move data bits: deal with modulation schemes,
transmitters & receivers.
Systems may use time division multiplexing, frequency division multiplexing,
wavelength division multiplexing.
Media can be wire link, fibre link, or radio (wireless).
Must deal with signal attenuation and interference. Optical systems also suffer
phase distortion and signal leakage.
For further information: see AST.

4.2 Data Link Layer Issues


Deals with algorithms that achieve reliable efficient communication between two
adjacent machines.
Machines are linked by a communications channel e.g. coax wire.
Non-ideal channel characteristics: circuit errors; finite data rates; non-zero propagation times.
DLL design issues: group bits into frames; handle transmission errors; regulate
flow.
DLL provides a well-defined interface to the network layer.

4.2.1 Framing

Framing refers to the technique of identifying the start and end of each frame.
One technique is to use a special flag (bit pattern).
Using bit stuffing for framing allows data frames to contain an arbitrary number
of bits and frees the system from any concept of character size. Each frame begins and ends with the special bit pattern 01111110. A sender DLL encountering 5
consecutive 1 bits in the data inserts a 0 in the bit stream, and the receiver DLL removes the
inserted 0.

(a) 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0
(b) 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0 0 1 0
    (the 0 that follows each run of five 1s is a stuffed bit)
(c) 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0

Fig. 3-5. Bit stuffing. (a) The original data. (b) The data as they
appear on the line. (c) The data as they are stored in the receiver's
memory after destuffing.
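
The stuffing rule can be checked against Fig. 3-5 with a short sketch. Bits are held as strings of '0' and '1' purely for readability (an assumption of this example; a real DLL works on the bit stream in hardware or on packed bytes), and the function names stuff and unstuff are illustrative.

```python
FLAG = "01111110"  # frame delimiter from the text


def stuff(bits: str) -> str:
    """Sender side: insert a 0 after every run of five consecutive 1 bits."""
    out = []
    run = 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")  # stuffed bit
            run = 0
    return "".join(out)


def unstuff(bits: str) -> str:
    """Receiver side: drop the bit that follows every run of five 1 bits.

    Assumes the input is the body of a correctly stuffed frame (flags removed).
    """
    out = []
    run = 0
    skip_next = False
    for b in bits:
        if skip_next:
            skip_next = False  # this is the stuffed 0: discard it
            run = 0
            continue
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            skip_next = True
    return "".join(out)


if __name__ == "__main__":
    original = "0110" + "1" * 16 + "0010"   # row (a) of Fig. 3-5
    on_the_line = stuff(original)           # row (b)
    print(on_the_line)
    print(unstuff(on_the_line) == original)  # row (c) equals row (a)
```

Because stuffed data never contains six 1 bits in a row, the flag pattern can only appear at a genuine frame boundary.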

4.2.2 Error Control Overview

To ensure delivery of a frame, we need some feedback from the receiver to the
sender to indicate success or failure; this will handle errors within a frame.
What if a frame is completely lost (perhaps due to a noise burst)? Start a timer
after each frame is sent and resend if no acknowledgement is received within some
time limit.
What if frames arrive twice (ack. was lost) or out of order? Give each frame an ID
number.
The DLL's duties include management of timers & frame sequence ID numbers.
Error detection: the general idea is to have the TX end append extra bits to the message in such a way that the RX end can detect illegal bit combinations. Mathematically, it is possible to define a measure of distance between the valid message+check
bit combinations (the number of bit positions in which they differ); we then know how many bit errors can be detected.
Error correction: suppose valid combinations of message+check bits differ in at
least 3 bits and we receive a message+check sequence that differs from an allowed
pattern by only 1 bit. If we think only 1 bit is in error, we can choose the nearest allowed
pattern and fix the error.

A Hamming Code is a method of determining the smallest number of check bits
to achieve a desired detection and correction capability.
In a Cyclic Redundancy Check, the basic idea is to:
regard the message m as a long binary number;
divide m by a long prime number g (in practice the division uses modulo-2 arithmetic and g is a standard generator pattern);
use the remainder after division as a check;
the sender and receiver both do this calculation; for an error to go undetected, the pattern of bit errors must itself be a multiple of g so that the remainder is still
OK.
The choice of g is special!
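
The CRC steps above can be sketched as modulo-2 long division on bit strings. The function names and the example message and generator are chosen for illustration only; real links use standardised generators and compute the remainder in hardware.

```python
def _mod2_remainder(bits: list, generator: list) -> list:
    """Divide bits by generator using XOR (modulo-2) arithmetic.

    Returns the last len(generator) - 1 bits, i.e. the remainder.
    """
    bits = bits[:]  # work on a copy
    r = len(generator) - 1
    for i in range(len(bits) - r):
        if bits[i]:
            for j, g in enumerate(generator):
                bits[i + j] ^= g
    return bits[-r:]


def crc_remainder(message: str, generator: str) -> str:
    """Sender: append r zero bits to the message and return the remainder."""
    r = len(generator) - 1
    padded = [int(b) for b in message] + [0] * r
    rem = _mod2_remainder(padded, [int(b) for b in generator])
    return "".join(str(b) for b in rem)


def crc_check(frame: str, generator: str) -> bool:
    """Receiver: a frame (message + check bits) is accepted if it divides evenly."""
    rem = _mod2_remainder([int(b) for b in frame], [int(b) for b in generator])
    return not any(rem)


if __name__ == "__main__":
    message, generator = "1101011011", "10011"
    check = crc_remainder(message, generator)
    frame = message + check
    print(check, crc_check(frame, generator))
    damaged = frame[:9] + ("0" if frame[9] == "1" else "1") + frame[10:]
    print(crc_check(damaged, generator))  # a single flipped bit is detected
```

The sender transmits the message followed by the check bits; the receiver repeats the division over the whole frame and accepts it only when the remainder is zero.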

4.2.3 Flow Control Overview

The sender and receiver of frames may operate at different maximum rates due to
CPU power available, CPU loadings, etc.
Higher speed senders must be prevented from swamping lower speed receivers
in order to prevent frame losses.
Again, a feedback mechanism is employed from receiver to sender.
Flow control can use explicit requests to send n frames, or can be handled by the receiver slowing
its acknowledgements.
For further information: see AST.

4.3 Network Layer in Internet


The internet can be viewed as a group of subnets joined together.
The glue is the network layer protocol called IP (Internet Protocol) which was designed with internetworking in mind.
Above the network layer is the transport layer, which takes data streams and
breaks them up into datagrams (of size up to 64K bytes but usually about 1500 bytes),
which are handled by the network layer.
These datagrams may be fragmented into smaller units.

At the destination, the pieces are reassembled into the original datagrams and
passed to the transport layer.

4.3.1 IP Header

Each row below is one 32-bit word:
Version | IHL | Type of service | Total length
Identification | DF | MF | Fragment offset
Time to live | Protocol | Header checksum
Source address
Destination address
Options (0 or more words)

Fig. 5-45. The IP (Internet Protocol) header.

An IP datagram contains a header part and a data part, with fields stored most
significant bit first (big-endian).
Version identifies which version of the protocol is being used; this allows protocol changes to be supported.
IHL specifies the header length (in 32-bit words).
Type of service allows the host to tell the subnet what type of service it wants; it can choose
from combinations of reliability versus speed.
Current use:
bits 7,6,5 = 3-bit priority;
bits 4,3,2 = DTR (what is most important out of delay, throughput, reliability);
bits 1,0 = unused.
At present, routers tend to ignore this field!
Total length = datagram length (header and data).


Identification field allows a destination host that is reassembling datagrams to determine which datagram an arriving fragment belongs to.
DF indicates a don't fragment request: routers should not fragment this datagram (useful if the destination cannot reassemble it, e.g. when a PC is booting and
needs to receive its OS as a single datagram). All systems must be able to accept
fragments of 576 bytes or less.
MF indicates there are more fragments to come.
Fragment Offset says where this fragment belongs in the current datagram. Offsets are counted in units of 8 bytes, so the 13 bit
field gives a maximum datagram size of 2^13 x 8 = 64K bytes.
Time to live limits packet lifetimes and so prevents packets wandering around forever.
It is usually decremented on each hop; the packet is discarded when it reaches 0 and the source is warned.
Protocol field identifies the transport protocol: TCP, UDP, others.
Header checksum covers the header only; it must be recomputed on each hop because time to live is
updated (a CPU task).
Addresses specify source and destination.
Options:

Option                  Description
Security                Specifies how secret the datagram is
Strict source routing   Gives the complete path to be followed
Loose source routing    Gives a list of routers not to be missed
Record route            Makes each router append its IP address
Timestamp               Makes each router append its address and timestamp

Fig. 5-46. IP options.
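
The fixed 20 byte part of the header in Fig. 5-45 can be packed and unpacked with Python's struct module. This is a sketch under stated assumptions: the function names, the example addresses, and the default field values are invented for illustration, options are not handled, and the checksum routine is the usual 16-bit one's complement sum used for the IP header.

```python
import socket
import struct

HEADER_FORMAT = "!BBHHHBBH4s4s"  # "!" = network (big-endian) byte order


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_header(src: str, dst: str, payload_len: int, ttl: int = 64,
                 proto: int = 6, ident: int = 1, dont_fragment: bool = True) -> bytes:
    """Build a 20 byte IPv4 header (no options) with a valid checksum."""
    ver_ihl = (4 << 4) | 5                       # version 4, IHL = 5 words
    flags_frag = 0x4000 if dont_fragment else 0  # DF bit, fragment offset 0
    fields = [ver_ihl, 0, 20 + payload_len, ident, flags_frag, ttl, proto, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    fields[7] = internet_checksum(struct.pack(HEADER_FORMAT, *fields))
    return struct.pack(HEADER_FORMAT, *fields)


def parse_header(raw: bytes) -> dict:
    """Split the first 20 bytes of a datagram into the fields of Fig. 5-45."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack(HEADER_FORMAT, raw[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl_words": ver_ihl & 0x0F,
        "type_of_service": tos,
        "total_length": total_len,
        "identification": ident,
        "df": bool(flags_frag & 0x4000),
        "mf": bool(flags_frag & 0x2000),
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,
        "time_to_live": ttl,
        "protocol": proto,
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


if __name__ == "__main__":
    header = build_header("10.0.0.1", "10.0.0.2", payload_len=20)
    print(parse_header(header))
    # A header with a correct checksum sums to zero when re-checked.
    print(internet_checksum(header) == 0)
```

A router that decrements time to live would change one header byte and then have to recompute the checksum, which is the per-hop CPU task mentioned above.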

IP addresses are 32 bits long and are divided into classes:

Class   Leading bits   Remainder of the 32 bits    Range of host addresses
A       0              Network | Host              1.0.0.0 to 127.255.255.255
B       10             Network | Host              128.0.0.0 to 191.255.255.255
C       110            Network | Host              192.0.0.0 to 223.255.255.255
D       1110           Multicast address           224.0.0.0 to 239.255.255.255
E       11110          Reserved for future use     240.0.0.0 to 247.255.255.255

Fig. 5-47. IP address formats.

4.4 IP Addresses
Are 32 bits long with the leading bits indicating the class of address.
Class A addresses: bit 31 = 0, a 7 bit network part, and a 24 bit host part, yielding
an address range
1.0.0.0 to 127.255.255.255.
Class B addresses: bits 31-30 = 10, a 14 bit network part, and a 16 bit host part,
yielding an address range
128.0.0.0 to 191.255.255.255.
Class C addresses: bits 31-29 = 110, a 21 bit network part, and an 8 bit host part,
yielding an address range
192.0.0.0 to 223.255.255.255.
Class D addresses: bits 31-28 = 1110 and a 28 bit multicast part, yielding an address
range
224.0.0.0 to 239.255.255.255.
Class E addresses are reserved for future use: bits 31-27 = 11110
(240.0.0.0 to 247.255.255.255).
Address space is used more efficiently if class A networks migrate to class B or C
(where possible).
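
Since the class is determined by the leading bits alone, it can be read off the first byte of a dotted address. A small sketch (the function name address_class is an assumption for this example):

```python
def address_class(dotted: str) -> str:
    """Return the class (A to E) of a dotted-decimal IPv4 address."""
    first = int(dotted.split(".")[0])
    if not 0 <= first <= 255:
        raise ValueError("first byte out of range")
    if first >> 7 == 0b0:        # bit 31 = 0
        return "A"
    if first >> 6 == 0b10:       # bits 31-30 = 10
        return "B"
    if first >> 5 == 0b110:      # bits 31-29 = 110
        return "C"
    if first >> 4 == 0b1110:     # bits 31-28 = 1110
        return "D"
    if first >> 3 == 0b11110:    # bits 31-27 = 11110
        return "E"
    return "unclassified"        # leading bits 11111


if __name__ == "__main__":
    for addr in ("10.1.2.3", "130.50.15.6", "192.168.0.1", "224.0.0.5", "240.0.0.1"):
        print(addr, address_class(addr))
```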


Machines that are to be connected to the internet have to obtain a registered IP
address. An Internet Service Provider and/or ftp://rs.internic.net has
details on registration of official IP addresses.

Address pattern (32 bits)                  Meaning
All 0s                                     This host
0s in the network field, then Host         A host on this network
All 1s                                     Broadcast on the local network
Network, then 1s in the host field         Broadcast on a distant network
127, then (Anything)                       Loopback

Fig. 5-48. Special IP addresses.


There are also a number of special IP addresses:
using 32 ones is a broadcast address to all hosts on the local network; note
that this means a host can talk to its neighbours before knowing its own address;
using all ones in only the host field is a broadcast address to all hosts on the
specified network;
using 32 zeros means this host;
using all zeros in the leading address bits up to and including the network
field is an address for a host on the local network; note that this means a
host can talk to its neighbours without knowing what network they are on;
a 7 bit network address of 127 (with any 24 bit host value) allows software
to talk to/with the network interface without any packets going onto the
wire; this allows what is called loopback testing to occur with no address
information at all;
according to RFC 1597, you can use the following IP networks for private nets
which will never be connected to the Internet:
10.0.0.0 to 10.255.255.255
172.16.0.0 to 172.31.255.255
192.168.0.0 to 192.168.255.255
This means you could have one registered internet system and then private
networks (using parts of the above IP address ranges) attached to it. A BSD
UNIX kernel built with firewall and ipdivert support could also handle
Network Address Translation.
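
Whether an address falls in one of the private ranges or the loopback network listed above can be tested with Python's standard ipaddress module. In this sketch the three private ranges are written in prefix notation (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 cover exactly the ranges given); the function names are assumptions for the example.

```python
import ipaddress

# The private ranges from the text, expressed as network/prefix-length.
PRIVATE_NETS = [ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def is_private(dotted: str) -> bool:
    """True if the address lies in one of the private ranges listed above."""
    addr = ipaddress.ip_address(dotted)
    return any(addr in net for net in PRIVATE_NETS)


def is_loopback(dotted: str) -> bool:
    """True if the address has network part 127 (any host value)."""
    return ipaddress.ip_address(dotted) in LOOPBACK_NET


if __name__ == "__main__":
    for addr in ("10.20.30.40", "172.31.255.255", "172.32.0.1", "127.0.0.1"):
        print(addr, is_private(addr), is_loopback(addr))
```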

4.5 Subnets
Networking requirements can change, especially in growing organisations.
A range of IP addresses can be broken up into subnets e.g. Fig. 5-49 shows a
class B address in which the original 16 bit host part has been reallocated into a 6
bit subnet part and a 10 bit host part.
This split looks the same from the outside world, so no registrations change.
However, internally, the network has been divided up into smaller subnets (fewer
collisions, greater total distance can be covered, etc.).
The routers internal to the organisation are simply given new details.

32-bit address:  10 | Network (14 bits) | Subnet (6 bits) | Host (10 bits)
Subnet mask:     11111111 11111111 11111100 00000000

Fig. 5-49. One of the ways to subnet a class B network.

4.6 IP Router Operation


Each router maintains lists of network addresses [network,0] and local host addresses [0,host].
The first [network,0] list tells how to get to remote networks.
The second [0,host] list says which hosts are known on the local network.
When an IP packet arrives, it is looked up in the routing table.
If the network part identifies a remote network, the packet is forwarded out the
appropriate interface to the next router.

If the network part identifies the local network, it is sent directly to the host (network=0 packets are ignored).
If the network is not found in the tables, the packet is forwarded to a default router
with more extensive tables.
When subnets are added, the router now maintains tables of [network,0],
[this-network,subnet,0] and
[this-network,this-subnet,host].
ANDing the IP address with the subnet mask (Fig. 5-49) gives the particular
subnet. ANDing the IP address with the netmask identifies the network.
The result is a 3-level hierarchy.
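
The ANDing step can be worked through for the split of Fig. 5-49 (class B network, 6 bit subnet, 10 bit host). The example address 130.50.15.6 and the helper names are assumptions made for illustration.

```python
import ipaddress


def to_int(dotted: str) -> int:
    return int(ipaddress.IPv4Address(dotted))


def to_dotted(value: int) -> str:
    return str(ipaddress.IPv4Address(value))


NETMASK = to_int("255.255.0.0")        # class B: 16 bits of class + network
SUBNET_MASK = to_int("255.255.252.0")  # Fig. 5-49: 22 ones, 10 zeros


def split_address(dotted: str) -> dict:
    """Split an address into network, subnet, and host parts by ANDing with masks."""
    addr = to_int(dotted)
    return {
        "network": to_dotted(addr & NETMASK),
        "subnet": to_dotted(addr & SUBNET_MASK),
        # The 6 subnet bits sit just above the 10 host bits.
        "subnet_number": (addr & SUBNET_MASK & ~NETMASK) >> 10,
        "host_number": addr & ~SUBNET_MASK & 0xFFFFFFFF,
    }


if __name__ == "__main__":
    print(split_address("130.50.15.6"))
```

For 130.50.15.6 the third byte is 00001111; the mask keeps its top six bits (000011), so the packet belongs to subnet 3 of network 130.50.0.0, whose subnet address is 130.50.12.0.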

4.7 Internet Control Protocols


4.7.1 Internet Control Message Protocol (ICMP)

Message type              Description
Destination unreachable   Packet could not be delivered
Time exceeded             Time to live field hit 0
Parameter problem         Invalid header field
Source quench             Choke packet
Redirect                  Teach a router about geography
Echo request              Ask a machine if it is alive
Echo reply                Yes, I am alive
Timestamp request         Same as Echo request, but with timestamp
Timestamp reply           Same as Echo reply, but with timestamp

Fig. 5-50. The principal ICMP message types.

Destination unreachable: as it says, or a packet that cannot be fragmented (DF set) is prevented from being
delivered.
Time exceeded: the packet is looping, or there are congestion or timeout problems.

Parameter problem: an invalid IP packet was found.
Source quench: was used for flow control.
Redirect: allows network knowledge to propagate.
Echo: allows destinations to be checked for reachability, and timestamping allows
performance measurement.

4.7.2 Address Resolution Protocol (ARP)

The interface board only knows about 48 bit LAN addresses (each board is manufactured with a unique 48 bit address).
Each interface has an IP address.
ARP is a mechanism that allows a host to find out what 48 bit LAN address belongs to an IP address. A system outputs a broadcast packet to every machine on
the network in question, asking "who owns this IP address?". The owner replies
with their LAN address.
ARP reduces the need for configuration files.
By having hosts cache the results, ARP requests are reduced. However, cache
entries are discarded after a few minutes so that systems that have their LAN
cards replaced due to failure get operating quickly.
It is also possible for hosts to broadcast their mapping when they boot up. No response is expected. However, a machine that already has the same IP address should respond
in order to prevent the second machine coming on-line and creating chaos!
It is also possible for routers to react to ARP requests for IP information belonging to remote networks. In proxy ARP, routers cooperate by forwarding the ARP
request to the appropriate network for a response to be generated (and returned).
Note: given just an internet name e.g. marlin.jcu.edu.au, another
service called the domain name service can be used to obtain the IP address
corresponding to the name.

4.7.3 Reverse Address Resolution Protocol

RARP does the reverse of ARP.


A diskless machine about to boot up will know its 48 bit LAN address but will not
know its IP address.
The machine sends out an RARP broadcast packet (with all 32 address bits = 1) saying what
its LAN address is, and an RARP server responds with the IP address.
This allows the diskless machine to share boot files with other machines while
retaining its unique identity.
Broadcast packets with all address bits = 1 are not propagated by routers (to avoid
unwanted traffic), so RARP servers must exist on any subnet needing them.
On UNIX, RARP can be handled by daemons such as rarpd and bootp which
are started at boot time.
Note: In some circumstances, it is desired to allocate IP numbers dynamically to
hosts. One solution is to have a UNIX system run a Dynamic Host Configuration
Protocol Server (the dhcpd daemon) which also supports the bootp protocol.

4.7.4 Interior Gateway Routing Protocol

As the internet connects many different organisations, these organisations have
been free to develop their own internal routing methods.
Early algorithms suffered as networks grew so the Internet Engineering Task Force
developed the OSPF (Open Shortest Path First) standard in 1990.
It is published, hence open.
It can handle metrics such as physical distance, time delay, and others.
It is dynamic, hence can adapt to changes in topology automatically.
Supports routing based on service, i.e. the type of service field is now inspected so
that it is possible to handle real-time traffic (multimedia), etc.
It can do load balancing, hence routers connected by multiple pathways can have
their traffic spread across the pathways to maximise performance (previously,
routers used the best single link and ignored the others). Load spreading is important when routers are connected by multiple PPP links.

OSPF works by having adjacent routers exchange information with acknowledgement and timestamping; hence, routers have up-to-date knowledge of costs etc.
In normal operation, a router floods link state update messages to its neighbours.
To minimise overall coordination traffic, one router is elected to be the designated
router and it is considered to be adjacent to all other routers.
As the routers all belong to a single organisation, they can trust one another!

Message type           Description
Hello                  Used to discover who the neighbors are
Link state update      Provides the sender's costs to its neighbors
Link state ack         Acknowledges link state update
Database description   Announces which updates the sender has
Link state request     Requests information from the partner

Fig. 5-54. The five types of OSPF messages.
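
Once link state updates have given a router the costs of all links, it can compute the cheapest path to every destination. The sketch below uses Dijkstra's shortest path algorithm for that computation; the four-router topology, its costs, and the function name are invented for illustration and are not taken from the notes.

```python
import heapq


def shortest_path_costs(links: dict, source: str) -> dict:
    """Dijkstra's algorithm: lowest total cost from source to each router.

    links maps each router to a dict of {neighbour: link cost}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for neighbour, link_cost in links.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return dist


# Hypothetical link state database: costs as learned from link state updates.
LINKS = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "C": 1, "D": 4},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}

if __name__ == "__main__":
    print(shortest_path_costs(LINKS, "A"))
```

If a link cost changes (for example a link fails), rerunning the computation on the updated database yields the new routes, which is the dynamic behaviour described above.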

4.7.5 Exterior Gateway Routing Protocol

Border Gateway Protocol (BGP) acts as a routing protocol between organisations
according to policies chosen by the owners.
Policies can be based on politics, commercial considerations, costs, services to customers (and rejection of traffic of non-customers), etc.
BGP is very general.
Pairs of BGP routers communicate by establishing a TCP connection.

4.8 Internet Multicasting


IP supports multicasting using class D addresses.

The group ID has 28 bits, allowing more than 250,000,000 groups.
Packets addressed to multicast addresses get best effort delivery with no guarantees.
Address can be permanent e.g.
224.0.0.1 = all systems on a LAN
224.0.0.2 = all routers on a LAN
224.0.0.5 = all OSPF routers on a LAN
224.0.0.6 = all designated OSPF routers on a LAN
Temporary addresses are available to processes running on the computers. A process can ask to join a group and leave a group.
A host must therefore handle traffic to group address(es) as well as its own IP
address(es), and keep track of which groups it has processes belonging to.
Multicasting is supported by special multicast routing.
Internet Group Management Protocol (IGMP) allows routers to track which
groups are active on their subnet.

4.9 Classless Inter-Domain Routing


The granularity of class based addressing limits the efficiency of address use:
class B says you get 2^16 addresses while class C says you get 2^8 addresses, but what
if you want 2,001?
In CIDR, the address ranges 204.0.0.0 to 223.255.255.255 are allocated for future use,
and
194.0.0.0 to 195.255.255.255: Europe,
198.0.0.0 to 199.255.255.255: North America,
200.0.0.0 to 201.255.255.255: Central and South America,
202.0.0.0 to 203.255.255.255: Asia/Pacific.
The network/host bit split in the IP address is variable allowing almost complete
flexibility and high efficiency.
Studies showed that the previous class based addressing was using around 50%
of available IP addresses.


Router work is easy with respect to identifying the region on Earth, but then a
large database of information must be (quickly) accessed to determine final packet
routes. This requires a router to have higher computational power and more internal storage (the usual ...).
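
The "what if you want 2,001 addresses" question can be answered with a little arithmetic: with a variable network/host split, the block only needs enough host bits to cover the request. A sketch using Python's ipaddress module follows; the base address 194.24.0.0 and the function name prefix_for are arbitrary choices for the example.

```python
import ipaddress


def prefix_for(addresses_needed: int) -> int:
    """Longest prefix (smallest block) that still holds the requested addresses."""
    host_bits = max(addresses_needed - 1, 0).bit_length()
    return 32 - host_bits


if __name__ == "__main__":
    needed = 2001
    prefix = prefix_for(needed)
    block = ipaddress.ip_network(f"194.24.0.0/{prefix}")
    print(block, block.num_addresses)
    # Fraction of the allocation actually used, CIDR block versus a class B.
    print(needed / block.num_addresses, needed / 2 ** 16)
```

A request for 2,001 addresses needs 11 host bits, so a block of 2^11 = 2,048 addresses is enough, compared with the 65,536 addresses tied up by a class B network.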

4.10 IPv6
Development of IPv6 started in 1990.
Aims: support of billions of hosts; simplify routing tables; simplify protocols; better authentication and privacy; be more responsive to type of service; allow scope
of multicasting to be specified; support mobile IP addresses; allow future protocol
evolution; and have old and new protocols coexist.
Multimedia support was important.
IPv6 is compatible with TCP, UDP, ICMP, IGMP, OSPF, BGP, and DNS.
It is not directly compatible with IPv4 since it has a different header with fewer fields
to simplify processing (and bigger addresses too).
Fewer fields simplify router work, although routers will have to handle both versions for maybe a decade (probably not a problem given VLSI advances).
The version field is 6 for IPv6.
Priority values 0 to 7 are for traffic that can be slowed down given congestion,
while 8 to 15 are for real-time traffic.
Addresses are 16 bytes long, with the first group of reserved addresses allocated
to IPv4.
Address use will not be efficient, but will allow fast routing.

Each row below is one 32-bit word unless noted:
Version | Priority | Flow label
Payload length | Next header | Hop limit
Source address (16 bytes)
Destination address (16 bytes)

Fig. 5-56. The IPv6 fixed header (required).

Prefix (binary)   Usage                          Fraction
0000 0000         Reserved (including IPv4)      1/256
0000 0001         Unassigned                     1/256
0000 001          OSI NSAP addresses             1/128
0000 010          Novell NetWare IPX addresses   1/128
0000 011          Unassigned                     1/128
0000 1            Unassigned                     1/32
0001              Unassigned                     1/16
001               Unassigned                     1/8
010               Provider-based addresses       1/8
011               Unassigned                     1/8
100               Geographic-based addresses     1/8
101               Unassigned                     1/8
110               Unassigned                     1/8
1110              Unassigned                     1/16
1111 0            Unassigned                     1/32
1111 10           Unassigned                     1/64
1111 110          Unassigned                     1/128
1111 1110 0       Unassigned                     1/512
1111 1110 10      Link local use addresses       1/1024
1111 1110 11      Site local use addresses       1/1024
1111 1111         Multicast                      1/256

Fig. 5-57. IPv6 addresses
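
A few rows of Fig. 5-57 can be checked by looking at the leading bits of an address. The sketch below uses Python's ipaddress module only to parse the text form; the prefix tests follow the table as given in these notes (later allocation plans may differ), and the function name and sample addresses are assumptions for the example.

```python
import ipaddress

# (binary prefix, usage) pairs taken from a few rows of Fig. 5-57.
PREFIXES = [
    ("11111111", "Multicast"),
    ("1111111010", "Link local use"),
    ("1111111011", "Site local use"),
    ("00000000", "Reserved (including IPv4)"),
]


def describe(text: str) -> tuple:
    """Return (address length in bytes, usage according to Fig. 5-57)."""
    addr = ipaddress.IPv6Address(text)
    bits = format(int(addr), "0128b")  # the 128 address bits as a string
    for prefix, usage in PREFIXES:
        if bits.startswith(prefix):
            return len(addr.packed), usage
    return len(addr.packed), "other"


if __name__ == "__main__":
    for text in ("ff02::1", "fe80::1", "fec0::1", "::c000:201", "4000::1"):
        print(text, describe(text))
```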



Transport and Session Layers

5.1 Transport Protocols


5.1.1 Introduction

Transport protocols (TPs) are the most important part of a communications system
(AST says the heart of the protocol hierarchy).
Their task is to provide reliable, cost-effective data transport from the source machine
to the destination machine, independent of the physical network(s) in use.
A TP shields the user from the details and characteristics of the layers below it.
The TP is complex.
Transport protocols provide the basic end-to-end service of transferring data between users. This is achieved by communication with a remote peer, and using
the services of the network layer.
The hardware and/or software within the transport layer that does the work is
called the transport entity.
[Figure: Host 1 and Host 2 each contain an application (or session) layer, a transport entity and a network layer. The transport address sits at the application/transport interface and the network address at the transport/network interface. The two transport entities exchange TPDUs using the transport protocol.]

Fig. 6-1. The network, transport, and application layers.


The existence of the transport layer makes it possible for the transport service to
be more reliable than the underlying network service.

The transport layer offers independence from the actual network layer available.
This leads to a view of the lower 4 layers as being the transport service provider and
the upper layers as the transport service user.
The transport layer can also be viewed as a way of enhancing quality of service
(QoS).

5.1.2 Types of Service


A transport service may be connection-oriented or connectionless.
A connection-oriented service provides the establishment, maintenance and termination of
a logical connection between transport users. This allows the use of flow control, error
control and sequenced delivery.
A connectionless service can be more appropriate, for example:
Inward data collection;
Outward data dissemination;
Message passing;
Remote procedure calls; and
Real-time applications.

5.1.3 Qualities of Service


Not easily defined by users.
Users may be able to request a particular QoS (the transport protocol can use this
to determine how best to use the network layer).
Example QoS are:
Connection establishment delay: elapsed time between requesting a transport connection and the user receiving confirmation (includes remote time delays);
Connection establishment failure probability: chance of no connection being established within a maximum establishment delay time;
Throughput: number of user bytes transferred per unit time over a test interval (bidirectional);
Transit delay: delay between a message being sent by the transport user at the
source and it being received by the transport user at the destination machine
(bidirectional);
Residual error ratio: proportion of lost or garbled messages that were not fixed;
Priority levels: allow higher priority links to be serviced first in the event of
congestion.

A transport protocol is limited by the nature of the underlying network, and not
all QoS may be possible.
There is also a trade-off between reliability, delay, throughput and service cost.
Sample application QoS requirements:
File transfer: low errors and high throughput;
Remote procedure calls: low delay;
Email transfer: priority levels.
Either a transport protocol provides different QoS via option negotiation, or there
are different TPs for different classes of traffic.

5.1.4 Transport Service Primitives

Provides user access to the transport service.


While the network service tends to model the (possibly unreliable) network that it
is implemented on, the transport service provides the user with access to a somewhat idealised service where acknowledgements, lost packets, congestion, etc. are
not directly visible.
The transport service is therefore easier to use, and we describe access primitives.










Primitive     TPDU sent            Meaning
LISTEN        (none)               Block until some process tries to connect
CONNECT       CONNECTION REQ.      Actively attempt to establish a connection
SEND          DATA                 Send information
RECEIVE       (none)               Block until a DATA TPDU arrives
DISCONNECT    DISCONNECTION REQ.   This side wants to release the connection

Fig. 6-3. The primitives for a simple transport service.

LISTEN: a server executes a LISTEN primitive and waits (is blocked) until a connection is established.
CONNECT: a client executes a CONNECT primitive when it wants to talk to the
server. The transport entity carries out this primitive by making the caller wait
(blocking) and sending a packet containing a transport layer message to the server's
transport entity.
The CONNECTION REQUEST arrives at the server and the transport entity checks to
see if there is a server blocked on a LISTEN. If there is, it is unblocked and the
transport entity sends a CONNECTION ACCEPTED message back to the client.
Data can now be exchanged between the client and the server.
A TPDU (transport protocol data unit) is a message sent from transport entity to
transport entity.

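[ Frame header | Packet header | TPDU header | TPDU payload ]
The packet payload is the TPDU header plus the TPDU payload; the frame payload is the packet header plus the packet payload.

Fig. 6-4. Nesting of TPDUs, packets, and frames.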
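
The nesting in Fig. 6-4 can be mimicked in a few lines of Python. This is a sketch only: the bracketed header strings are placeholders invented for the example, not real header formats.

```python
def encapsulate(header: bytes, payload: bytes) -> bytes:
    """Each layer simply prepends its own header to whatever it is handed."""
    return header + payload

# Placeholder headers (hypothetical, for illustration only).
tpdu = encapsulate(b"[TPDU hdr]", b"hello")        # transport layer
packet = encapsulate(b"[Packet hdr]", tpdu)        # network layer
frame = encapsulate(b"[Frame hdr]", packet)        # data link layer
```

Each layer treats everything it receives from the layer above as opaque payload, which is why the transport user never needs to see the outer headers.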


Note: transport service users do not need to know about the packet nesting and
associated error detection/correction mechanisms present; they simply see a
reliable link.
DISCONNECT can be asymmetric, where either transport user can issue a disconnect primitive, or symmetric, where each direction is closed separately.
Termination may be abrupt (with loss of data), or graceful. Some TPs allow full-duplex and half-duplex connections.

5.1.5 Berkeley Sockets


 



Primitive   Meaning
SOCKET      Create a new communication end point
BIND        Attach a local address to a socket
LISTEN      Announce willingness to accept connections; give queue size
ACCEPT      Block the caller until a connection attempt arrives
CONNECT     Actively attempt to establish a connection
SEND        Send some data over the connection
RECEIVE     Receive some data from the connection
CLOSE       Release the connection

Fig. 6-6. The socket primitives for TCP.

Berkeley UNIX (BSD) developed the sockets interface for users to program at the
transport service level.
A successful socket call returns an ordinary file descriptor for use in following
calls (i.e. network access is file system mapped).
A server would execute the first 4 calls in the order shown, starting with a SOCKET
create call. The BIND call assigns an address for the socket.
A client also creates a socket and then does a CONNECT call in an attempt to access
a remote server (of known address).
Once a link is established, the client and server exchange data.

When finished, both must execute a CLOSE to release the symmetric connection.
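
The call sequence described above can be tried with Python's standard socket module, whose methods follow the Berkeley primitives closely. The sketch below is only an illustration, under the assumptions that the loopback address 127.0.0.1 is usable and that a single short message is exchanged; the function name serve_once and the upper-casing "service" are invented for the example.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Server side: accept one connection, answer one message, then close."""
    conn, _addr = listener.accept()              # ACCEPT (blocks)
    with conn:                                   # CLOSE when the block exits
        data = conn.recv(1024)                   # RECEIVE
        conn.sendall(data.upper())               # SEND

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # SOCKET
listener.bind(("127.0.0.1", 0))                  # BIND (port 0: OS picks a free port)
listener.listen(1)                               # LISTEN (queue size 1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,))
server.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)     # SOCKET
client.connect(("127.0.0.1", port))              # CONNECT
client.sendall(b"hello")                         # SEND
reply = client.recv(1024)                        # RECEIVE
client.close()                                   # CLOSE (client side)

server.join()
listener.close()
```

The server performs SOCKET, BIND, LISTEN and ACCEPT in that order, while the client needs only SOCKET and CONNECT, matching the description above.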

5.1.6 Addressing

To transfer data, we need the address information of the destination:


User identification;
Transport protocol identification;
Network address of destination station.
The source usually calls a certain TP with a (Station, Port) pair, e.g. the Internet
uses (IP address, local port) [7].
We use the architecture-neutral term Transport Service Access Point (TSAP). There
is also an underlying Network Service Access Point (NSAP).

[Figure: Host 1 and Host 2 drawn as stacks of application, transport, network, data link and physical layers. An application process on host 1 is attached to TSAP 6 and a server on host 2 to TSAP 122. The transport connection starts at the TSAP and the network connection starts at the NSAP.]

Fig. 6-8. TSAPs, NSAPs, and connections.


[7] ATM uses (AAL, SAP); AAL = ATM adaptation layer, SAP = Service Access Point.


How does the source know the address of the destination?


The user knows the address ahead of time;
The destination has a well-known address;
A name server is used (it must be listening on a well-known address);
The target is created at request time. The request goes to a well-known server,
which creates the target and passes it the connection.
In BSD UNIX, the inetd daemon is started at boot-time and it listens for incoming
requests (see Fig. 6-9). Also see the man page for inetd.

[Figure: panels (a) and (b) show the user on host 1, the process server and the time-of-day server on host 2, and the TSAPs they are attached to.]

Fig. 6-9. How a user process in host 1 establishes a connection with a time-of-day server in host 2.

If a name server is used, then new services will require a means of registering their
availability to the name server.

5.1.7 Multiplexing

If there are multiple users on one station, the TP must differentiate data to/from
each user and each connection.

Thus, the TP must multiplex/demultiplex data to/from the network layer.


Aside: an example of the use of multiplexing also appears in networks with high
delays such as satellite links: a user with high throughput needs could multiplex
data across multiple open network connections.

5.1.8 Example Transport Protocol: TCP

TCP stands for Transmission Control Protocol.


It is used on top of the DoD's Internet network layer (IP), which is unreliable, nonsequencing and
connectionless.
TCP provides a full-duplex reliable connection between two transport users.
It also provides out-of-band (urgent) data transfer between users.
A user can push any data waiting to be transmitted at the transport layer.
TCP uses a sliding window method to provide both flow and error control.
Connections are set up with a 3-way handshake that exchanges window sizes, sequence numbers, and
other connection parameters.
Other QoS such as delay, precedence etc. are handled at the network layer.
TCP fragments and reassembles data to fit the MTU (maximum transfer unit) of the underlying network.
TCP reorders arriving data according to the sequence numbers.
The TCP header is at least 20 bytes long (exactly 20 bytes when no options are present):

Each row is 32 bits wide:

Source port                                   | Destination port
Sequence number
Acknowledgement number
TCP header length | URG ACK PSH RST SYN FIN   | Window size
Checksum                                      | Urgent pointer
Options (0 or more 32-bit words)
Data (optional)

Fig. 6-24. The TCP header.

As TCP uses IP, which also has a 20 byte header when no options are used, the minimum header overhead per packet
is 40 bytes, not including data link layer headers.

5.2 TCP/IP demonstration client


5.2.1 Introduction

From Internetworking With TCP/IP Vol. 3 (BSD socket version) by Douglas
Comer & David Stevens, Prentice Hall.
TCP/IP provides peer-to-peer communication.
A common approach is to use the client-server paradigm:
Because TCP/IP does not provide any mechanisms that automatically create
running programs when a message arrives, a program must be waiting to
accept communication before any requests arrive.
Here, an application program that initiates peer-to-peer communication is
called a client.

Similarly, a server is any program that waits for an incoming communication
request from a client, and then performs any necessary computation in order
to send a reply.

5.2.2 Privilege and Complexity

Servers may provide controlled access to information that belongs to the operating
system or users, hence protection of data is essential.
Authentication: verifying the identity of the client.
Authorisation: determining whether a particular client is permitted access to a
particular service.
Data security: do not want data unintentionally revealed or compromised.
Privacy.
Protection: guarantee network applications cannot abuse system resources.

5.2.3 Standard versus nonstandard clients

Standard application services consist of those services that are defined by TCP/IP
and are assigned well-known, universally recognised protocol port identifiers.
All other services may be considered to be locally-defined application services or
nonstandard services.
See definitions in file /etc/services on BSD UNIX hosts (sample also available
at
http://mirriwinni.cs.edu.au/phillip/intro2cn/services ).

5.2.4 Connectionless v connection-oriented servers

If the client and server communicate using UDP (User Datagram Protocol), the
interaction is connectionless.
If the client and server communicate using TCP (Transmission Control Protocol), the
interaction is connection-oriented and error detection and correction is handled.
5.2.5 Program Interface to Protocols

Loosely specified protocol software interface: the details of how applications software should interface with TCP/IP protocol software are not specified, but the required functionality is suggested.
Only a few interfaces exist: one is the socket interface (or sockets) defined for the
Berkeley UNIX Operating System (BSD), and another is the TLI (Transport Layer
Interface) defined by AT&T.
In the PC world, the winsock library originally provided access to Berkeley style
sockets.

5.2.6 Interface Functionality

Allocate local resources for communication.


Specify local and remote communications endpoints.
Initiate a connection (client side).
Wait for an incoming connection (server side).
Send or receive data.
Determine when data arrives.
Generate urgent data.
Handle incoming urgent data.
Terminate a connection gracefully.
Handle connection termination from the remote side.
Abort communication.
Handle error conditions or a connection abort.
Release local resources when communication finished.

5.2.7 System Calls

A system call can be thought of as a function call to a subroutine supplied
as part of the operating system.
The system subroutine was written as part of the OS so it should be able to protect
itself and the network from incorrect calls.
Design approaches: create new system calls, or extend existing system calls to
handle networking.
The BSD sockets interface extends the file system handling system calls to also
handle networking in an open-read-write-close paradigm.
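
The file-style paradigm can be seen directly on a Unix-like system, where a socket is reached through an ordinary file descriptor. The following Python sketch is an illustration under that assumption (it is not expected to work on non-Unix platforms): it creates a connected pair of sockets and then uses the generic file descriptor calls os.write and os.read on them instead of socket-specific calls.

```python
import os
import socket

# A connected pair of sockets ("open" step done for us by socketpair).
left, right = socket.socketpair()

# Generic file-descriptor I/O works on the sockets' descriptors.
os.write(left.fileno(), b"ping")          # write
data = os.read(right.fileno(), 4)         # read

left.close()                              # close
right.close()
```

The same read and write calls that operate on files are operating here on a communication end point, which is the design choice the paragraph above describes.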
Sample source code, based on examples given in the text of Douglas Comer &
David Stevens, is available at URL

http://mirriwinni.cs.edu.au/phillip/intro2cn/tcpip-demo

5.2.8 BSD Tutorial

An Advanced 4.4BSD Interprocess Communication Tutorial is available on-line at URL
http://mirriwinni.cs.edu.au/phillip/intro2cn/BSD


TCP/IP Protocols

6.1 Introduction
We consider the following topics:

Review of TCP/IP Layering;


Overview of User Datagram Protocol (UDP);
Reliable Stream Transport Service (TCP).

Additional References:
Internetworking With TCP/IP Vol. 1, D.E. Comer, 2nd edition, Prentice-Hall,
1991.
ISBN 0-13-468505-9.
Computer Networks and Internets, D.E. Comer, Prentice-Hall, 1997. ISBN 0-13-599010-6.

6.2 Review of TCP/IP Layering


TCP/IP protocols are organised into 5 conceptual layers:
Layer 1: Physical. Basic network hardware.
Layer 2: Network Interface. Organises data into frames and transmits them.
Layer 3: Internet. Specify packet format. Handles packet forwarding through one
or more routers to final destination.
Layer 4: Transport. Handles reliable transfer.
Layer 5: Applications.


6.3 User Datagram Protocol


TCP/IP is capable of transferring IP datagrams amongst host computers.
At the IP layer, no further distinction other than IP address specifies the user or
application to receive the datagram.
How can multiple destinations at a host be specified?
The mechanism that was developed came from the BSD world.
BSD UNIX is a multiprocessing operating system where executing programs, referred to as processes or tasks, can be part of the OS or can be user level tasks.
A task was chosen as the ultimate destination on the host computer. However,
this was not the total answer:
Tasks are dynamic, i.e. they are continuously created and destroyed, so the sender
is unlikely to know the ID of the recipient task;
We would prefer to be able to change the recipient task without informing the
sender (maybe the recipient task needs restarting);
We would prefer to specify a destination task according to a service rather than its
ID number.
The solution developed uses a set of abstract destination points called protocol
ports [8].
A destination can therefore be specified with a (host name, port number) pair.
Observe that we have said nothing about a user. Any further specification of recipient will require an additional authentication service. This approach keeps things
simple.
The solution makes use of the underlying IP for basic (unreliable) packet delivery.
Operating systems provide synchronous access to ports.
If a task attempts to read data from a port before any has arrived, that task is blocked (put to sleep)
and when data becomes available, the OS restarts it.
Ports are buffered by the OS (finite size buffer).
[8] These are the service access points mentioned earlier in semester 1.


When communicating with a remote port, each message carries a destination port
number on the foreign machine and a source port number on the source machine
to which replies are to be addressed.
Thus, any task that receives a message can also reply.
The User Datagram Protocol (UDP) provides unreliable connectionless delivery
service using IP to transfer messages between hosts. It adds the ability to distinguish among multiple destinations within a single host.
An application that uses UDP must handle the problem of reliability, including
message loss, duplication, delay, out-of-order delivery, etc.
These problems are often underestimated by developers who prototype on highly
reliable, low delay private LANs.
Conceptual layering:
Application
User Datagram (UDP)
Internet (IP)
Network Interface
Format of UDP Messages:
Bits 0-15             | Bits 16-31
UDP Source Port       | UDP Destination Port
UDP Message Length    | UDP Checksum
Data ...
A 64-bit header is followed by data.
Four 16-bit fields specify source and destination port numbers, the message length
(bytes in header+data), and checksum.
The source port is optional: it is used only if replies are needed.
The checksum is optional; however, as IP does not compute a checksum on its
data payload, this checksum should be used if data integrity is to be checked.
A UDP datagram is encapsulated inside an IP datagram for transmission, i.e. UDP
prepends a header to user data and passes this new packet to the IP layer, which
prepends an IP header and passes this new(er) packet to the network interface layer.
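
As an illustration of the 64-bit header just described, the following Python sketch packs and unpacks the four 16-bit fields with the standard struct module. The function names are invented for this example, and the checksum is simply left at 0 here rather than computed.

```python
import struct

# Four unsigned 16-bit fields in network byte order:
# source port, destination port, message length, checksum.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp(src_port: int, dst_port: int, data: bytes, checksum: int = 0) -> bytes:
    """Prepend a UDP header; the length field counts header plus data."""
    length = UDP_HEADER.size + len(data)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + data

def parse_udp(message: bytes):
    """Split a UDP message back into its header fields and data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(message)
    return src_port, dst_port, length, checksum, message[UDP_HEADER.size:length]

msg = build_udp(5000, 53, b"query")
```

Parsing `msg` recovers the ports, a length of 13 (8 header bytes plus 5 data bytes) and the original data.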
As usual, the layered structure provides a (fairly) clear separation of duties:

IP is responsible for transferring data between a pair of hosts;


UDP is responsible for differentiating among multiple sources and destinations
within one host.
There is a minor violation of the layering principle: UDP must know the IP
addresses in order to fully specify a source and destination, which means that the
UDP layer must interact with the IP layer to a very limited extent.

6.4 UDP Multiplexing


A common feature of layered protocol structures is that a layer can multiplex
its services to multiple users at the next layer up.
In UDP, an obvious way to multiplex is via port numbers.
In practice, an application program must negotiate with the OS to obtain a protocol
port and an associated port number before it can send a UDP datagram.
The Socket Abstraction developed in BSD provides the programmer with functions
needed to send and receive data.
When processing input, UDP accepts incoming datagrams from the IP layer and
demultiplexes according to UDP destination port.
If local host software has negotiated with the OS to accept incoming UDP datagrams on a given port, the OS will have set up queues to hold this data.
Demultiplexed UDP packets are placed in their corresponding queues.
If no local task has requested that it be passed UDP packets on a certain port,
demultiplexing cannot occur and the UDP layer returns a message to say that a
port is unreachable.
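
The input processing described above can be modelled with one queue per open port. The Python sketch below is a toy model only: the class name, the queue limit and the returned status strings are assumptions made for illustration, and dropping a datagram when the queue is full is one plausible policy rather than something stated in these notes.

```python
from collections import deque

class UdpDemux:
    """Toy model of UDP input processing: one bounded queue per open port."""

    def __init__(self, queue_limit: int = 4):
        self.queue_limit = queue_limit
        self.queues = {}                 # destination port -> queued payloads

    def open_port(self, port: int) -> None:
        """A task asks the OS to accept datagrams on this port."""
        self.queues[port] = deque()

    def deliver(self, dst_port: int, payload: bytes) -> str:
        """Demultiplex one incoming datagram by its destination port."""
        queue = self.queues.get(dst_port)
        if queue is None:
            return "port unreachable"    # no task has requested this port
        if len(queue) >= self.queue_limit:
            return "dropped (queue full)"
        queue.append(payload)
        return "queued"

demux = UdpDemux(queue_limit=2)
demux.open_port(53)
results = [demux.deliver(53, b"a"), demux.deliver(53, b"b"),
           demux.deliver(53, b"c"), demux.deliver(9999, b"d")]
```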
How are port numbers assigned?

6.5 UDP Port Number Allocation


Port numbers could be permanently assigned to a particular service by a central
authority, or they could be organised dynamically.


The BSD people chose a hybrid solution.


The 16-bit numbers have some constant assignments for services for the very small
values 0-1023; these are known as the well-known port assignments.
The Registered Ports are those from 1024 through 49151.
The Dynamic and/or Private Ports are those from 49152 through 65535.
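
The three ranges can be expressed as a small helper. This Python sketch is illustrative only; the function name and the returned labels are chosen for the example.

```python
def port_class(port: int) -> str:
    """Classify a 16-bit port number into the three ranges given above."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit values (0-65535)")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"
```

For example, port 53 falls in the well-known range, port 6000 in the registered range, and port 50000 in the dynamic/private range.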
Some common UDP services are:
tcpmux        1/udp                         #TCP Port Service Multiplexer
echo          7/udp
discard       9/udp          sink null
systat        11/udp         users          #Active Users
daytime       13/udp
ftp-data      20/udp                        #File Transfer [Default Data]
ftp           21/udp                        #File Transfer [Control]
ssh           22/udp                        #Secure Shell Login
telnet        23/udp
#             24/udp         any private mail system
smtp          25/udp         mail           #Simple Mail Transfer
time          37/udp         timserver
nameserver    42/udp         name           #Host Name Server
nicname       43/udp         whois
xns-time      52/udp                        #XNS Time Protocol
domain        53/udp                        #Domain Name Server
whois++       63/udp
sql*net       66/udp                        #Oracle SQL*NET
bootps        67/udp         dhcps          #Bootstrap Protocol Server
bootpc        68/udp         dhcpc          #Bootstrap Protocol Client
tftp          69/udp                        #Trivial File Transfer
finger        79/udp
http          80/udp         www www-http   #World Wide Web HTTP
kerberos-sec  88/udp         kerberos       # krb5 # Kerberos (v5)
hostname      101/udp        hostnames      #NIC Host Name Server
pop2          109/udp        postoffice     #Post Office Protocol - Version 2
pop3          110/udp                       #Post Office Protocol - Version 3
sunrpc        111/udp        rpcbind        #SUN Remote Procedure Call
sqlserv       118/udp                       #SQL Services
nntp          119/udp        usenet         #Network News Transfer Protocol
#x11          6000-6063/udp  X Window System

You can list port numbers on BSD (and compatible) operating systems with a command such as fgrep udp /etc/services. See also http://www.isi.edu/in-notes/iana/assignments/port-numbers


Notice UDP port 1 is titled TCP Port Service Multiplexer. Connections on this fixed
port number are used to organise dynamic port numbers.
Notice the UDP port range 6000-6063. This entry is commented out but it reminds
users that the X-windows system is allocated TCP ports in this range. As we shall
see, TCP provides similar port multiplexing and the convention is that UDP and
TCP port numbers are allocated the same.

6.6 Reliable Stream Transport Service (TCP)


A reliable stream transport service is provided so that applications need not handle error detection and correction. Problems addressed: packet loss; out-of-order
delivery; delay; duplication; optimal packet size.
The Transmission Control Protocol (TCP) defines this Internet service.
The protocol designers have attempted to find a general purpose solution to the
problems of providing reliable stream delivery. This gives a cleaner separation of
applications software from underlying networking software, and assists in debugging.
Properties of a reliable delivery service:
Stream orientation
An application program can send a stream of bits, with exactly these data bits
being received by the recipient.
Virtual Circuit Connection
The sender can place a call to a possible recipient, with the protocol handling the
call setup. After data is transferred, the protocol also handles the disestablishment
of the link or call teardown.
Buffered Transfer
The sender can inject data into the link in any size that it finds convenient, and the
receiver can extract data from the link in its preferred size.
To make transfer more efficient, the protocol collects enough data from a stream
to fill a reasonably large datagram before transmitting it.
For applications where data must be delivered even though it does not fill a buffer,
TCP provides a push facility.


Unstructured Stream
Application programs must agree on the structure of the data transferred.
Full Duplex Connection
Concurrent data transfer can occur in both directions.

6.7 Providing Reliability


Most reliable protocols use a single fundamental technique known as positive acknowledgement with retransmission.
The technique requires that a recipient sends an acknowledgement back to the
sender as it receives data.
The sender keeps a record of each packet sent, and waits for an acknowledgement
while running a wait timer.
A successful transaction, in its simplest form, has the sender transmit a packet and the recipient return an acknowledgement before the next packet is sent.
Packet loss is detected by a timeout, and results in a retransmission.
Packet duplication [9] can be handled by including a sequence number in each packet
transferred.
If the sender and recipient communicate in turn, the link is effectively half duplex
and much bandwidth is lost.
The idea of sliding windows is that the sender can transmit a number of packets
before waiting for acknowledgement.
This can be thought of as a system where a number of packets ready for transmission are available, and any window of 8 may be accessed.
Only after a packet is acknowledged can the window be moved further along to
uncover the next packets for transmission.
Initial window:                                    [1 2 3 4 5 6 7 8] 9 10
After packet 1 is acknowledged, the window slides:  1 [2 3 4 5 6 7 8 9] 10


[9] Varying delays might mean a packet could be resent and two copies be received.


Network utilisation is much improved as the protocol can essentially keep the
network saturated with packets.
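
The sliding window idea from this section can be sketched as a small simulation. The Python code below is a simplified model under stated assumptions: no packets or acknowledgements are lost, acknowledgements arrive one at a time in order, and the function and variable names are invented for the example.

```python
def sliding_window_send(num_packets: int, window_size: int):
    """Return the order of send/ack events for a loss-free transfer."""
    base = 1            # oldest packet not yet acknowledged (left edge of window)
    next_to_send = 1    # first packet not yet transmitted
    log = []
    while base <= num_packets:
        # Transmit every packet the window currently uncovers.
        while next_to_send < base + window_size and next_to_send <= num_packets:
            log.append(("send", next_to_send))
            next_to_send += 1
        # Assumption: the acknowledgement for the oldest packet arrives next,
        # so the window slides along by one packet.
        log.append(("ack", base))
        base += 1
    return log

events = sliding_window_send(num_packets=10, window_size=8)
```

With 10 packets and a window of 8, the sender transmits packets 1 to 8 before any acknowledgement is needed, and each acknowledgement then uncovers one further packet.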

6.8 What does TCP provide?


The protocol specifies the format of the data and acknowledgements that two computers exchange to achieve reliable transfer, and the procedures to ensure that data
arrives correctly.
The protocol does not prescribe how applications programs use TCP, and
does not prescribe how TCP uses the underlying services.
This allows TCP to be implemented on a variety of underlying communications
systems.
Like UDP, TCP resides above the IP layer.
TCP allows multiple applications to communicate simultaneously.
Like UDP, TCP uses protocol port numbers to identify the ultimate destination
within a machine.
Conceptual layering:

Application
Reliable Stream (TCP) | User Datagram (UDP)
Internet (IP)
Network Interface

However, TCP is built on a connection abstraction in which objects are identified as
virtual circuit connections, and not just end points.
Since TCP identifies a connection by a pair of endpoints, a given TCP port number
can be shared by multiple connections on the same machine.
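
One way to picture this is a table keyed by the pair of endpoints rather than by the local port alone. The Python sketch below is only an illustration: the addresses come from the 192.0.2.0/24 documentation range, and the table layout and names are assumptions, not a description of any particular TCP implementation.

```python
# Each connection is identified by (local endpoint, remote endpoint),
# where an endpoint is an (IP address, port) pair.
connections = {}

def register(local, remote, owner):
    """Record which task owns the connection between two endpoints."""
    connections[(local, remote)] = owner

def demultiplex(local, remote):
    """Find the owner of an arriving segment from its pair of endpoints."""
    return connections.get((local, remote))

server = ("192.0.2.1", 80)                       # one local TCP port...
register(server, ("192.0.2.10", 1025), "task A")
register(server, ("192.0.2.11", 1025), "task B") # ...shared by two connections
```

Both connections use local port 80, yet segments are still delivered unambiguously because the remote endpoint differs.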

6.9 TCP Connections


Both end points must agree to cooperate.
There are Passive and Active Opens.

An application at one end performs a passive open by informing its OS that it will
accept an incoming connection. The OS assigns a TCP port number (and allocates
necessary resources).
An application at the other end can then perform an active open request to establish
the connection.

6.10 Segments and Streams


TCP views its data stream as a sequence of octets or bytes which it divides into
segments for transmission. A segment will usually travel in a single IP datagram.
TCP uses a sliding window mechanism to provide efficient transmission and flow
control.
A sliding window mechanism makes it possible to send multiple segments before an acknowledgement is received; this keeps the network busy, so throughput is
increased.
The use of acknowledgements still allows a receiver to restrict data flow as required by its needs (buffers, data use).
The TCP sliding window mechanism operates at the octet/byte level and not the
segment or packet level. The following diagram shows bytes as Bn:

[Figure: bytes B1 to B10 in a row, with the current window covering B2 to B9 and three pointers marking positions in the byte stream.]

The pointer at the left separates bytes on its left, which have been sent and acknowledged, from bytes not yet acknowledged. The pointer at the right marks the right edge of the sliding window, indicating
the highest byte in the window that could be sent. The middle pointer separates
bytes sent from those not yet sent.
The protocol sends bytes in the window as soon as possible, so the middle pointer
moves rapidly to the right.
Recall that acknowledgements travel back from the receiver to the transmitter, in
a reverse link. In effect, each end must manage two sets of sliding windows:
one set for its outgoing data (where it determines what to send or resend next),
and one set for its incoming data (where at a minimum it determines what to
acknowledge).

6.11 Variable Window Size and Flow Control


TCP supports a time varying window size.
Each acknowledgement contains a window advertisement that specifies how many
additional octets of data the receiver is prepared to receive.
We now have a feedback mechanism which assists the sender in not exceeding the
capacity of the receiver.
In response to an increase in window size, a sender now sees more bytes in its
window and proceeds to transmit them. In response to a decrease in window size,
a sender may need to temporarily stop sending if its window shrinks back past its
middle pointer.
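
The effect of a window advertisement on the sender can be written as a one-line calculation. In the Python sketch below the three arguments correspond to the left pointer, the middle pointer and the advertised window size; the function name and the byte numbering are assumptions made for illustration.

```python
def usable_window(first_unacked: int, next_to_send: int, advertised: int) -> int:
    """Number of further bytes the sender may transmit right now.

    first_unacked : left pointer (oldest byte sent but not yet acknowledged)
    next_to_send  : middle pointer (first byte not yet sent)
    advertised    : window size from the latest acknowledgement
    """
    right_edge = first_unacked + advertised
    # If the window shrinks back past the middle pointer, nothing may be sent.
    return max(0, right_edge - next_to_send)
```

For instance, with the left pointer at byte 2, the middle pointer at byte 6 and an advertised window of 8, four more bytes may be sent; if the advertisement then drops to 3, the result is 0 and the sender must pause.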
This adaptive windowing mechanism helps the internet handle hosts and gateways of various speeds and capacities.
The flow control ensures that the end point hosts do not have buffer overflows etc.
The intermediate gateways must also avoid data loss; when a gateway is overloaded and loses data, this is called
congestion. A TCP implementation must be tuned so that it can detect and recover from congestion: an inappropriate retransmission scheme can make losses
worse.

6.12 TCP Segment Format


The basic unit of data exchanged in TCP is the segment.
Segments are exchanged to establish connections, transfer data, carry acknowledgements, advertise window sizes, and close connections.
A segment contains a TCP header followed by data:

SOURCE PORT                     | DESTINATION PORT
SEQUENCE NUMBER
ACKNOWLEDGEMENT NUMBER
HLEN | RESERVED | CODE BITS     | WINDOW
CHECKSUM                        | URGENT POINTER
OPTIONS (if any)                | PADDING
DATA

Fields SOURCE PORT and DESTINATION PORT contain TCP port numbers which
identify the application programs at the end points.
The SEQUENCE NUMBER identifies the position in the sender's byte stream of the
data in this segment.
The ACKNOWLEDGEMENT NUMBER identifies the number of the octet that the source of this segment
expects to receive next (i.e. earlier bytes are OK).
The HLEN integer specifies the segment header length in 32-bit words; it is needed because
the OPTIONS field (and hence the header) has variable length.
The 6-bit CODE BITS specify the purpose and contents of the segment. From left
to right, these bits are:
1. URG - urgent pointer field is valid
2. ACK - acknowledgement field is valid
3. PSH - this segment requests a push
4. RST - reset the connection
5. SYN - synchronise sequence numbers
6. FIN - sender has reached end of its byte stream
The WINDOW field advertises how much data the sender of the packet is willing to
receive.
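
To make the field layout concrete, the following Python sketch packs and parses the fixed 20-byte part of a segment header with the standard struct module. It is a simplified illustration: options are not handled, the checksum and urgent pointer are left at zero, and the helper names are invented for the example.

```python
import struct

# Fixed part of the header: ports, sequence number, acknowledgement number,
# HLEN/RESERVED/CODE BITS (16 bits), window, checksum, urgent pointer.
TCP_FIXED = struct.Struct("!HHIIHHHH")
CODE_BITS = ("URG", "ACK", "PSH", "RST", "SYN", "FIN")   # left to right

def build_header(src_port, dst_port, seq, ack, flags, window):
    """Pack a header with no options; flags is a set of CODE BITS names."""
    bits = 0
    for name in CODE_BITS:
        bits = (bits << 1) | (1 if name in flags else 0)
    hlen_words = TCP_FIXED.size // 4          # 5 words of 32 bits
    return TCP_FIXED.pack(src_port, dst_port, seq, ack,
                          (hlen_words << 12) | bits, window, 0, 0)

def parse_header(raw):
    """Unpack the fixed header fields into a dictionary."""
    (src_port, dst_port, seq, ack,
     hlen_bits, window, _checksum, _urgent) = TCP_FIXED.unpack_from(raw)
    flags = {name for i, name in enumerate(CODE_BITS)
             if hlen_bits & (1 << (5 - i))}
    return {"src_port": src_port, "dst_port": dst_port, "seq": seq,
            "ack": ack, "header_bytes": (hlen_bits >> 12) * 4,
            "flags": flags, "window": window}

raw = build_header(1025, 80, 1000, 2000, {"SYN", "ACK"}, 4096)
fields = parse_header(raw)
```

Note how HLEN is stored in 32-bit words (5) but reported here in bytes (20), and how the six CODE BITS occupy the low-order bits of the same 16-bit field.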

6.12.1 Out of Band Data

Out of band data is data that the sender wishes to have handled as soon as possible
i.e. out of order.
Example use: typing an interrupt command in a telnet window linked to a remote
Unix host.
The URG bit is used to indicate that urgent data is present, with its location in the
window given by the URGENT POINTER.

6.12.2 Maximum Segment Size

Both ends need to agree on the maximum segment size they will transfer (impacts
on resource allocation at the end points).
The OPTIONS header field is used in negotiations, where each end specifies the
maximum segment size (MSS) that it is willing to receive.
For better efficiency, an MSS should be chosen so that the resulting IP datagram
matches the MTU of the underlying network.
If an MSS is chosen too small: network utilisation suffers (a segment size of 41
bytes of which 40 bytes are header is very inefficient).
If an MSS is chosen too large: the IP datagram that carries the segment (and which is itself encapsulated in a network frame) may need to be fragmented in order to fit into the frames of some network along the path. This results in extra fragmentation/reassembly overheads, and a single lost fragment means the whole segment must be resent
(TCP retransmits whole segments and cannot recover an individual IP fragment).
An optimum segment size S occurs when the IP datagrams carrying the segments
are as large as possible without requiring fragmentation anywhere along the path
from the source to the destination.
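
The trade-off can be quantified with a little arithmetic. The Python sketch below assumes the minimum 40 bytes of TCP plus IP header and, purely as an example, an MTU of 1500 bytes as found on Ethernet; the function names are invented for illustration.

```python
HEADER_OVERHEAD = 40     # 20-byte TCP header + 20-byte IP header, no options

def mss_for_mtu(mtu: int) -> int:
    """Largest segment data size whose IP datagram still fits in one MTU."""
    return mtu - HEADER_OVERHEAD

def efficiency(data_bytes: int) -> float:
    """Fraction of each datagram that is user data rather than header."""
    return data_bytes / (data_bytes + HEADER_OVERHEAD)

ethernet_mss = mss_for_mtu(1500)       # assumed example MTU
tiny = efficiency(1)                   # the 41-byte datagram mentioned above
full = efficiency(ethernet_mss)
```

A 1-byte segment is roughly 2.4% efficient, while a segment that exactly fills a 1500-byte MTU is a little over 97% efficient, which is why the optimum S is the largest size that avoids fragmentation.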
Can an optimal S be found?
There is no probing mechanism built into TCP.
Also, the network routes can be time varying, in response to topology changes
(nodes may fail or come on-line) or congestion avoidance (time outs cause alternative routes to be used).
There is currently no standard way to find S.

6.12.3 TCP Checksum Computation

A 16-bit arithmetic checksum of a segment allows the receiver to verify that the
TCP header and data have been received without error.
To allow the TCP protocol to verify the correct source and destination host, the
following information is prepended to the segment for the purposes of checksum
calculations:

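Pseudo-header (used only in the checksum calculation):
  SOURCE IP ADDRESS                                   (32 bits)
  DESTINATION IP ADDRESS                              (32 bits)
  ZERO (8 bits) | PROTOCOL (8 bits) | TCP LENGTH      (16 bits)
TCP segment:
  SOURCE PORT (16 bits) | DESTINATION PORT            (16 bits)
  SEQUENCE NUMBER                                     (32 bits)
  ACKNOWLEDGEMENT NUMBER                              (32 bits)
  remainder of segment follows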
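
The calculation over the pseudo-header and segment can be sketched as follows. This is an illustrative sketch assuming IPv4 addresses and the usual 16-bit ones' complement Internet checksum; the function names sum16() and tcp_checksum() are hypothetical, and the checksum field inside the segment is assumed to be zero while a new checksum is computed.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte string to a running sum as big-endian 16-bit words.
 * An odd trailing byte is padded on the right with a zero byte. */
uint32_t sum16(uint32_t sum, const uint8_t *data, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    return sum;
}

/* Checksum over the 12-byte pseudo-header followed by the TCP segment.
 * src and dst hold IPv4 addresses as 4 bytes in network order. */
uint16_t tcp_checksum(const uint8_t src[4], const uint8_t dst[4],
                      const uint8_t *segment, size_t seg_len)
{
    uint8_t pseudo[12];
    uint32_t sum;
    int i;

    for (i = 0; i < 4; i++) {
        pseudo[i] = src[i];
        pseudo[4 + i] = dst[i];
    }
    pseudo[8] = 0;                           /* ZERO */
    pseudo[9] = 6;                           /* PROTOCOL: 6 is TCP */
    pseudo[10] = (uint8_t)(seg_len >> 8);    /* TCP LENGTH, high byte */
    pseudo[11] = (uint8_t)(seg_len & 0xff);  /* TCP LENGTH, low byte */

    sum = sum16(0, pseudo, sizeof pseudo);
    sum = sum16(sum, segment, seg_len);
    while (sum >> 16)                        /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A receiver can run the same function over a segment that still contains the transmitted checksum (bytes 16 and 17 of the TCP header); a result of 0 indicates that no error was detected.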

6.13 Acknowledgements and Retransmission


TCP acknowledgements specify the sequence number of the next octet that the
receiver expects to receive.
Counting from 0, this is in effect the number of contiguous correct bytes received.
This is sometimes called a cumulative scheme.
Advantages: easy to compute and unambiguous; a lost acknowledgement does not necessarily
force retransmission, since a later acknowledgement covers the same data.
Disadvantages: the sender does not receive information about all successful transmissions. If the first of n segments was lost, the sender will eventually retransmit
the lost segment, but unless the receiver then acknowledges all n segments
quickly enough, the sender will also retransmit the remaining segments unnecessarily.
An attempt to avoid this extra retransmission by waiting for an acknowledgement
in order to decide how much to retransmit is similar to discarding the moving
window mechanism and its advantages. Timing is clearly critical.

6.14 TCP Timeouts and Retransmission


TCP is intended for use in an internet environment so it must accommodate time
varying delays in segment delivery.
A segment may traverse a single low-delay link (e.g. local network) or it may traverse a cascade of interconnected links. It is impossible to know the characteristics
of the link beforehand.
This adverse environment is handled by using timers on each segment, coupled
to adaptive timeout mechanisms.


TCP monitors the performance of each connection and deduces reasonable values
for timeouts. As the characteristics of the link change, TCP updates its timeout
values.
By measuring the time between segment transmission and segment acknowledgement, TCP obtains a sample round trip time or round trip sample. A new sample value
allows the estimate of round trip time (RTT) to be updated.
An early method to update RTT was to use an exponential forgetting factor $\alpha$, which
gives a response curve like a first order RC circuit discharge:
$$RTT_{i+1} = \alpha \, RTT_i + (1 - \alpha) \, RTT_{sample}$$
i.e. the $\alpha$ applied to the $i$th RTT discharges it (scales it down) to give the historical
contribution to the $(i+1)$th RTT for $\alpha < 1$; and
$(1 - \alpha)$ scales the contribution of the sample measurement to $RTT_{i+1}$.
An early method to determine the timeout used a scaled value of RTT i.e.
$Timeout = \beta \, RTT$. Originally, $\beta$ was set to 2 (too close to 1 and retransmission might occur
too easily).
From a control theory point of view, these techniques are very simple but they do
allow a system to operate reasonably well in the face of random noise and other
unmodelled disturbances.
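
The early estimator can be sketched in a few lines of C. This is only an illustration of the smoothing and scaling steps described above; the names rtt_estimator, rtt_update() and rtt_timeout() are hypothetical, the smoothing factor 0.875 is an assumed value, and the timeout factor 2 is the original setting quoted in the text.

```c
/* Early RTT estimator: exponential smoothing plus a scaled timeout. */
#define RTT_ALPHA 0.875   /* assumed weight given to the old estimate */
#define RTT_BETA  2.0     /* timeout factor quoted in the text */

struct rtt_estimator {
    double rtt;           /* current round trip estimate */
};

/* Fold one round trip sample into the estimate. */
void rtt_update(struct rtt_estimator *e, double sample)
{
    e->rtt = RTT_ALPHA * e->rtt + (1.0 - RTT_ALPHA) * sample;
}

/* Retransmission timeout derived from the current estimate. */
double rtt_timeout(const struct rtt_estimator *e)
{
    return RTT_BETA * e->rtt;
}
```

Starting from an estimate of 100 ms, a single 200 ms sample moves the estimate to 112.5 ms and the timeout to 225 ms, which shows how slowly the estimate follows a sudden change.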
RTT sample measurements can be ambiguous once a segment has been retransmitted: when an
acknowledgement arrives, does it acknowledge the original transmission (which took a very long
time) or the timeout triggered retransmission?
If the acknowledgement is for the original transmission, but the sender
assumes it is for the retransmission (with smaller transit time), then the updated RTT
will be smaller (even though RTT is actually large).
If the acknowledgement is assumed to be for the original transmission but it is in fact for the
retransmission, the RTT update is again incorrect (this time too large).
How can we avoid incorrect updates to RTT?
Karn's algorithm allows the previous RTT updates to occur only on timing data
from unambiguous acknowledgements (see footnote 10).

Footnote 10: Phil Karn is an amateur radio enthusiast who developed this algorithm to allow TCP operation across
a high loss packet radio link.


To handle sharp increases in actual round trip time, Karn's algorithm also has a
timer backoff strategy whereby, if a retransmission occurs:
$$\text{new timeout} = \gamma \cdot \text{old timeout}$$
where $\gamma$ is typically 2.
The system's updates are now chosen according to the internet's behaviour.
When the internet is well behaved, the terms $\alpha$ and $\beta$ control updates of RTT and
Timeout.
When the internet misbehaves, Karn's algorithm uses $\gamma$ to control timeout updates. This decouples the estimate of timeout from round trip travel time. Only
when an acknowledgement arrives does the update of timeout become linked to
RTT estimates.
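
The two regimes can be sketched as a pair of event handlers. This is an illustrative sketch rather than code from any TCP stack: the names karn_timer, karn_on_timeout() and karn_on_ack() are hypothetical, the smoothing factor 0.875 is assumed, and the caller is assumed to know whether the acknowledged segment was ever retransmitted.

```c
#define KARN_ALPHA 0.875  /* assumed smoothing factor for the RTT estimate */
#define KARN_BETA  2.0    /* timeout factor while the internet behaves */
#define KARN_GAMMA 2.0    /* backoff factor applied on each retransmission */

struct karn_timer {
    double rtt;           /* smoothed round trip estimate */
    double timeout;       /* current retransmission timeout */
};

/* The retransmission timer expired: back off the timeout and leave
 * the round trip estimate alone. */
void karn_on_timeout(struct karn_timer *t)
{
    t->timeout *= KARN_GAMMA;
}

/* An acknowledgement arrived. If the segment was retransmitted, the
 * sample is ambiguous and is ignored, so the backed-off timeout stays. */
void karn_on_ack(struct karn_timer *t, double sample, int was_retransmitted)
{
    if (was_retransmitted)
        return;
    t->rtt = KARN_ALPHA * t->rtt + (1.0 - KARN_ALPHA) * sample;
    t->timeout = KARN_BETA * t->rtt;
}
```

Only the unambiguous branch of karn_on_ack() ties the timeout back to the round trip estimate, which is the decoupling described above.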

6.15 TCP Links with High Variance in Delay


What happens if round trip times possess a high variance?
Research has shown that the previous algorithms do not adapt well in the face of
wide variations in delay.
Queueing theory suggests that the variation in RTT, $\sigma$, varies in proportion to the
inverse of remaining network capacity i.e.
$$\sigma \propto \frac{1}{1 - L}; \quad 0 \le L \le 1$$
where $L$ is network load.

Studies suggest that, for our network's model, the above relationship is an equality
and $\Delta RTT = \pm 2\sigma$.
For $L = 0.5$, $\Delta RTT = \pm 2(1/0.5) = \pm 4$ i.e. RTT can vary by a factor of 8.
For $L = 0.75$, $\Delta RTT = \pm 2(1/0.25) = \pm 8$ i.e. RTT varies by 16.
A highly loaded network will have widely varying RTT.
For $\beta = 2$, it can be shown that RTT can only adapt to loads of 30%.
In 1989, TCP implementations were required to estimate both the average and
variance of RTT, and to use the estimated variance in place of $\beta$, as described by
the following recursions:
$$RTT_{i+1} = \alpha \, RTT_i + (1 - \alpha) \, RTT_{sample}$$
$$DEV_{i+1} = \alpha \, DEV_i + (1 - \alpha) \, |RTT_i - RTT_{sample}|$$

where $\alpha$ is a smoothing factor that controls how much weight is given to an old
$i$th value when producing a new $(i+1)$th value, and the $(1 - \alpha)$ factor controls the
weight of new information;
DEV is an estimate of standard deviation which works well (it is actually the
deviation smoothed).
Implementation efficiency: use $\alpha$ = a fraction composed of 1/2, 1/4, 1/8 etc. so
integer arithmetic can be used.
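
The two recursions can be sketched with integer arithmetic as follows. This is a minimal sketch under stated assumptions: times are whole milliseconds, the smoothing factor is 7/8 for both recursions (so the new information has weight 1/8), and the timeout is taken as the mean plus four deviations, which is the form used by later TCP implementations rather than something stated in these notes. The names rtt_state, rtt_dev_update() and rtt_dev_timeout() are hypothetical.

```c
#include <stdlib.h>

struct rtt_state {
    long rtt;   /* smoothed round trip time, milliseconds */
    long dev;   /* smoothed deviation, milliseconds */
};

/* Apply both recursions with a smoothing factor of 7/8:
 *   x + (new - x) / 8  is the same as  (7/8) x + (1/8) new.
 * The deviation uses the old RTT, as in the recursion for DEV. */
void rtt_dev_update(struct rtt_state *s, long sample)
{
    long err = sample - s->rtt;

    s->dev += (labs(err) - s->dev) / 8;
    s->rtt += err / 8;
}

/* Assumed timeout rule: mean plus four smoothed deviations. */
long rtt_dev_timeout(const struct rtt_state *s)
{
    return s->rtt + 4 * s->dev;
}
```

With an estimate of 800 ms and deviation of 80 ms, a 1000 ms sample gives a new estimate of 825 ms and deviation of 95 ms, so the timeout grows mainly because the deviation grew.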
This discussion has focussed on how to determine RTT and timeout values in a
real network. Question: what should be done when congestion occurs?

6.16 Response to Congestion


Congestion is a form of severe delay caused by an overload of datagrams at one
or more switching points.
When congestion occurs, delays increase and the gateways begin to enqueue datagrams until they can be sent. If buffer capacity is exceeded, packets are dropped.
Endpoints do not know the details of where or how congestion occurs; they see
only increased delay.
Retransmission in response to increased delay leads to more network traffic, which
aggravates congestion and leads to congestion collapse.
The network gateways can monitor the buffer sizes and signal hosts that congestion has occurred (see footnote 11).
To avoid congestion collapse, TCP must reduce transmission rates and it does this
by adaptively adjusting its effective window size.
We have stated that the transmit window size depends on the advertised window
size sent by the other end in acknowledgements.
Footnote 11: Internet Control Message Protocol (ICMP) allows gateways and hosts to exchange error and control
messages.


In practice, TCP maintains a second limit called the congestion window limit.
TCP operates with an Allowed window = min{ advertised window, congestion window }. The congestion window is shrunk during times of congestion.
Multiplicative Decrease Congestion Avoidance: Upon loss of a segment, reduce the
congestion window by half (minimum 1 segment). For segments remaining in the allowed
window, back off the retransmission timer exponentially.
If congestion is assumed, TCP reduces traffic volume exponentially. If loss continues, the result is TCP attempting to send a single datagram (with exponentially
increasing timeout).
The idea is to provide significant and fast load reductions to the gateways.
Slow-Start Recovery: Whenever starting traffic on a new connection or increasing
traffic after a period of congestion, start the congestion window at the size of a single segment and increase the window by 1 segment after each acknowledgement
arrives.
Once the window reaches the threshold (half of the window size in use when congestion
was detected), a congestion avoidance phase
is entered and subsequent increases in window size occur only if all segments in
the window are acknowledged.
AST (Tanenbaum) Figure 6-32 shows a system that started out with a congestion window size =
64K when a timeout occurred. The threshold is set to half this value (32K) while
the congestion window size shrinks to 1K at transmission number 0.
The window then grows to 2K, 4K, 8K, 16K, 32K until at transmission 5 congestion
avoidance mode starts.
Window size then grows linearly up to 40K when a timeout occurs after transmission 13. The threshold is then set to 40K/2=20K, the congestion window returns to 1K, and transmission
resumes with slow start. If no more problems occur, the window size can grow as large as the
advertised window size.
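
The window behaviour in this example can be reproduced with a small state machine. This is a sketch under stated assumptions, not code from a TCP stack: the window is counted in 1K segments, it is updated once per fully acknowledged window (one "transmission" in the figure), and after a timeout the window restarts from one segment with the threshold set to half the window that was in use. The names cwnd_state, cwnd_on_window_acked() and cwnd_on_timeout() are hypothetical.

```c
struct cwnd_state {
    unsigned cwnd;        /* congestion window, in segments */
    unsigned threshold;   /* slow-start threshold, in segments */
    unsigned advertised;  /* receiver's advertised window, in segments */
};

/* A whole window was acknowledged: double the window while below the
 * threshold (slow start), otherwise grow by one segment (congestion
 * avoidance). The window never exceeds the advertised window. */
void cwnd_on_window_acked(struct cwnd_state *s)
{
    if (s->cwnd < s->threshold) {
        s->cwnd *= 2;
        if (s->cwnd > s->threshold)
            s->cwnd = s->threshold;
    } else {
        s->cwnd += 1;
    }
    if (s->cwnd > s->advertised)
        s->cwnd = s->advertised;
}

/* A timeout occurred: halve the threshold relative to the window in
 * use (minimum 1) and restart slow start from a single segment. */
void cwnd_on_timeout(struct cwnd_state *s)
{
    s->threshold = s->cwnd / 2;
    if (s->threshold < 1)
        s->threshold = 1;
    s->cwnd = 1;
}
```

Starting from a 64 segment window, a timeout followed by 13 acknowledged windows gives 1, 2, 4, 8, 16, 32, 33, ..., 40, which matches the sequence described for the figure.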

[Figure: congestion window (kilobytes, 0 to 44) plotted against transmission number (0 to 24), with the two threshold levels and the timeout marked.]

Fig. 6-32. An example of the Internet congestion algorithm.

6.17 Open and Close of TCP Connections


TCP uses a three-way handshake to establish TCP connections and a modified three-way handshake to close them.
TCP Establish:
The first message is sent with the SYN code bit set, and some sequence number x.
This message is received and a reply (second message) is sent with the SYN code
bit set, some sequence number y, and containing acknowledgement x+1.
Finally, the end point setting up the link replies (third message) with a segment
containing acknowledgement y+1.

TCP Open

Events at site 1                        Events at site 2
Send SYN seq=x                --->      Receive SYN segment
Receive SYN+ACK seg           <---      Send SYN seq=y, ACK x+1
Send ACK y+1                  --->      Receive ACK segment


TCP Close:
Note that TCP connections are full duplex and we can think of them as two independent
unidirectional streams. Once a stream is closed, data can no longer travel in
that direction (but control segments can still travel in the opposite direction, as can
data until that stream is also closed).
One further point is that the end which is told to close sends back two messages:
one to acknowledge the request, and one, sent once the application has been informed,
to confirm that there is no more data present.

TCP Close

Events at site 1                        Events at site 2
Send FIN seq=x                --->      Receive FIN segment
Receive ACK seg               <---      Send ACK x+1
                                        (app. informed)
Receive FIN+ACK seg           <---      Send FIN seq=y, ACK x+1
                                        (app. informed)
Send ACK y+1                  --->      Receive ACK segment

6.18 Reset of TCP Connections


TCP must be able to handle exception events where the link must be shut down.
Example: an application program may have entered an abnormal state and is being aborted.
To reset a connection, one side sends a segment with the RST bit set in the code
word.
The other side responds by aborting.
Data transfer ceases immediately, and all resources are deallocated.

6.19 TCP Protocol FSM


The operation of TCP can best be explained by a finite state machine model; see
Comer Figure 12.13 or Tanenbaum Figure 6-28.
An application starts in the closed state, where it can issue a passive or active open
and progress to a new state.
Normal operation from these states leads to the established state.

When the link is shut down, the FSM waits (longer than twice the segment lifetime) to avoid interference between links.

[Figure: state diagram with states CLOSED (start), LISTEN, SYN RCVD, SYN SENT, ESTABLISHED (data transfer state), FIN WAIT 1, FIN WAIT 2, CLOSING, TIMED WAIT, CLOSE WAIT and LAST ACK. Transitions are labelled event/action, e.g. CONNECT/SYN, SYN/SYN + ACK, SYN + ACK/ACK (step 3 of the three-way handshake), CLOSE/FIN (active close), FIN/ACK (passive close), and a timeout from TIMED WAIT back to CLOSED.]

Fig. 6-28. TCP connection management finite state machine. The heavy solid line is the
normal path for a client. The heavy dashed line is the normal path for a server. The
light lines are unusual events.

6.20 Forced Data Delivery


The push operation allows an application to force delivery of data in the stream,
even if the utilisation of the underlying datagram will be inefficient.
This allows interactive applications to achieve a better response.


6.21 Reserved TCP Port Numbers


TCP provides static and dynamic port binding, using a set of well-known port assignments
for commonly invoked programs.
As mentioned for UDP, port numbers of UDP and TCP are usually the same even
though they are independently assigned.
Some services may be connected to via either UDP or TCP e.g. domain name
server (DNS).
Some common TCP services are:

tcpmux          1/tcp                       #TCP Port Service Multiplexer
echo            7/tcp
discard         9/tcp       sink null
systat          11/tcp      users           #Active Users
daytime         13/tcp
chargen         19/tcp      ttytst source   #Character Generator
ftp-data        20/tcp                      #File Transfer [Default Data]
ftp             21/tcp                      #File Transfer [Control]
ssh             22/tcp                      #Secure Shell Login
telnet          23/tcp
smtp            25/tcp      mail            #Simple Mail Transfer
nameserver      42/tcp      name            #Host Name Server
nicname         43/tcp      whois
domain          53/tcp                      #Domain Name Server
bootps          67/tcp      dhcps           #Bootstrap Protocol Server
bootpc          68/tcp      dhcpc           #Bootstrap Protocol Client
tftp            69/tcp                      #Trivial File Transfer
gopher          70/tcp
finger          79/tcp
http            80/tcp      www www-http    #World Wide Web HTTP
kerberos-sec    88/tcp      kerberos        # krb5 # Kerberos (v5)
hostname        101/tcp     hostnames       #NIC Host Name Server
pop2            109/tcp     postoffice      #Post Office Protocol - Version 2
pop3            110/tcp                     #Post Office Protocol - Version 3
sunrpc          111/tcp     rpcbind         #SUN Remote Procedure Call
audionews       114/tcp                     #Audio News Multicast
nntp            119/tcp     usenet          #Network News Transfer Protocol
ntp             123/tcp                     #Network Time Protocol
imap            143/tcp     imap2 imap4     #Interim Mail Access Protocol v2
snmptrap        162/tcp     snmp-trap
xdmcp           177/tcp                     #X Display Manager Control Protocol
bgp             179/tcp                     #Border Gateway Protocol
ris             180/tcp                     #Intergraph
appleqtc        458/tcp                     #apple quick time
kpasswd5        464/tcp                     # Kerberos (v5)


klogin          543/tcp                     # Kerberos (v4/v5)
kshell          544/tcp     krcmd           # Kerberos (v4/v5)
dhcpv6-client   546/tcp                     #DHCPv6 Client
dhcpv6-server   547/tcp                     #DHCPv6 Server
imap4-ssl       585/tcp                     #IMAP4+SSL (use of 585 is not recommended,
nfsd            2049/tcp    nfs             # NFS server daemon
hylafax         4559/tcp                    #HylaFAX client-server protocol

A note on some TCP services:


ftp (file transfer protocol) operates with two connections: ftp for control and
ftp-data for data.
ssh (secure shell) and telnet provide remote login to UNIX systems; ssh is preferred as it supports encryption and can also provide additional pipes for remote X-windows communication, file transfer, etc.
smtp (simple mail transfer protocol) allows UNIX systems to exchange email,
while email delivery to a user's desktop system from the UNIX server might employ pop (post office protocol v2 or v3) or imap.
bootp (bootstrap protocol) and tftp (trivial file transfer protocol) support the boot
up of diskless systems by providing a means to download an OS.
dhcp (dynamic host configuration protocol) and bootp provide network hosts
with IP configuration data.
nfs (network file system) is a file networking system originating from Sun Microsystems.
telnet provides connection to remote systems, and it also allows some access to
remote servers on specific port numbers via a command format of telnet host
service e.g.
telnet cay.cs.jcu.edu.au daytime,
telnet cay.cs.jcu.edu.au smtp.
Further port assignments can be obtained from
http://www.isi.edu/in-notes/iana/assignments/port-numbers.

Well Known Ports: 0 through 1023.


Registered Ports: 1024 through 49151.
Dynamic and/or Private Ports: 49152 through 65535

6.22 TCP Summary


Provides reliable stream delivery.
Provides a full duplex connection.
Sliding windows allow efficient use of network.
TCP makes few assumptions about the underlying delivery system so it can operate on a variety of such systems.
Provides flow control with the receiver stating how much data it can receive.
Supports out of band messages and push.
Features an adaptive retransmission mechanism with slow-start, multiplicative
decrease, additive increase, and also a congestion avoidance mode.

6.23 Further Information


The internet RFCs (request for comment) are documents detailing discussions
leading to standards adopted by the internet. RFCs are on-line at http://www.faqs.org/.
Some important RFCs:
J. Postel, User Datagram Protocol, RFC 768, USC/Information Sciences Institute,
August 1980.
J. Postel, Transmission Control Protocol - DARPA Internet Program Protocol Specification, STD 7, RFC 793, USC/Information Sciences Institute, September 1981.
TCP RFCs: updates in 1122, window management 813, fault isolation and recovery 816, maximum segment sizes 879, congestion 896.


Introduction to Socket Programming

7.1 Background
We know that servers can perform passive opens and wait for a client to perform
an active open thus creating the TCP link between two particular applications on
two hosts.
In UNIX, file IO employs an open-read-write-close paradigm i.e. a file is opened (the
user is provided with an integer file descriptor ID), some data reads and/or writes
occur, and then the file is closed.
The socket abstraction allows a programmer to access TCP/UDP in a way that is
similar to file IO (whenever it makes sense).
In fact, 4.4BSD uses sockets for local interprocess communication (UNIX domain)
and interhost/interprocess communication (INET domain).
The API design provides access similar to file IO although opening a socket may
require more information than opening a file (instead of a file name argument, we
need the transport protocol name, a remote machine address, a client/server flag,
etc.).

7.2 Creating a socket


result = socket( af, type, protocol );

The system call socket() creates a socket and returns the socket ID number.
Argument af specifies the protocol family, which is AF_INET for internet protocol
(others may include protocols from Xerox, Apple, CCITT, ISO).
Argument type specifies the type of communication desired e.g. SOCK_STREAM
is reliable stream service, SOCK_DGRAM is connectionless datagram delivery, and
there is also SOCK_RAW for privileged programs to access low-level protocols or
network interfaces.
Argument protocol allows for multiple versions of a particular af/type combination. Example: the TCP/IP protocol suite includes the protocol TCP.

Note: no local or remote address information has yet been supplied.


Example: To create a stream socket in the Internet domain, the following call might
be used: s = socket(AF_INET, SOCK_STREAM, 0);. This call would result
in a stream socket being created with the TCP protocol providing the underlying
communication support.
Example: To create a datagram socket for local use, the call might be:
s = socket(AF_UNIX, SOCK_DGRAM, 0);.

7.3 Closing a socket


status = close( socket );

The system call close() deletes a descriptor and returns 0 if successful.


Closing a socket immediately terminates data transfer.

7.4 Binding
Communicating processes are bound by an association. In the Internet domain,
an association is composed of local and foreign addresses, and local and foreign
ports.
The bind() function call allows a process to specify half of an association i.e. the
local address and local port.
The general form of bind() is bind( socket, localaddr, addrlen ) where
socket is a descriptor previously created (but not bound), localaddr is a structure specifying the local address to be assigned to the socket, and addrlen is an
integer specifying the length of the address structure.
The generic format of the address structure is:

/* Generic socket address */
struct sockaddr {
    u_char  sa_len;
    u_char  sa_family;
    char    sa_data[14];
};

where sa_len specifies the length of the address structure, sa_family specifies
the family to which the address belongs (e.g. AF_INET), and sa_data contains
the address.
The internet version of this address structure is:

/* Socket address, internet style */
struct sockaddr_in {
    u_char          sin_len;
    u_char          sin_family;
    u_short         sin_port;
    struct in_addr  sin_addr;
    char            sin_zero[8];
};

where sin_port is a port number (2 bytes), sin_addr (of type struct in_addr) is an IP address (4 bytes),
and sin_zero pads out the sa_data to be 14 bytes.
For machines with multiple IP addresses (i.e. multi-homed), a symbolic address
value INADDR_ANY may be used if the binding is to be allowed on any of the
machine's IP addresses.

7.5 Server: Listen and Accept


After calling socket() and bind(), a connectionless transport protocol server
is ready to accept messages.
A connection oriented server must call listen() to place the socket in passive
mode, and then call accept() to accept a connection request.
The function call

listen( socket, queuesize );

asks the operating system to build a separate request queue for the previously
bound socket.

The function call

newsock = accept( socket, caddress, caddresslen );

asks the operating system to return with a socket associated with the next request
in the queue.
The accept() call blocks (waits) until a request arrives.
Variable newsock is in fact a descriptor of a new socket that was created by accept() and bound in the same way as socket (in effect, duplicated). The server
task now uses newsock for its work while socket remains available for accepting new requests.
The following extract is from BSD rlogind:

...
f = socket(AF_INET, SOCK_STREAM, 0);
...
if (bind(f, (struct sockaddr *) &sin, sizeof (sin)) < 0) {
    ...
}
...
listen(f, 5);
for (;;) {
    int g, len = sizeof (from);

    /* wait for incoming on socket f */
    /* take incoming on socket g */
    g = accept(f, (struct sockaddr *) &from, &len);
    if (g < 0) {
        if (errno != EINTR)
            syslog(LOG_ERR, "rlogind: accept: %m");
        continue;
    }
    if (fork() == 0) {
        close(f);
        doit(g, &from);     /* service routine uses g */
    }
    close(g);               /* listener keeps using f */
}


7.6 Client: Connect


Clients use function call connect() to establish connection to a specific server:

connect( socket, saddress, saddresslen );

In effect, connect() is the function call that a client uses to connect to a server
that has called accept().
Note: connect() may also be used for connectionless protocols as it records the
server's address in the socket, thereby allowing the client to send many messages
to the same server without having to specify the destination address with each
message.
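
The server and client calls described so far can be tied together in a small sketch that runs both ends on the loopback interface. It is an illustration only: the helper names loopback_address(), listen_on_loopback() and connect_to_loopback() are hypothetical, port 0 is used so the system chooses a free port, and the structure is zeroed with memset() so the code does not depend on whether the platform has the BSD sin_len member.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill an internet-style address for 127.0.0.1 and the given port
 * (host byte order; 0 asks the system to choose a port). */
void loopback_address(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Server side: socket(), bind(), listen(). Returns the listening
 * descriptor and stores the chosen port in *port, or returns -1. */
int listen_on_loopback(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    loopback_address(&addr, 0);
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(s, 5) < 0 ||
        getsockname(s, (struct sockaddr *)&addr, &len) < 0) {
        close(s);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return s;
}

/* Client side: socket() then connect(). Returns the connected
 * descriptor, or -1 on failure. */
int connect_to_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    loopback_address(&addr, port);
    if (connect(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

A server would follow listen_on_loopback() with accept() on the returned descriptor; the client and the accepted socket can then exchange data with write() and read().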

7.7 Sending and Receiving Data


The functions send() and write() can send data via a connected socket. Example: write( socket, buffer, length ) where buffer = address of data
to send and length = number of bytes to send via socket.
Function sendto() takes additional arguments for the destination address (IP,
port), as it supplies with each call the association that send() assumes connect() has already set up.
Function sendmsg() is similar to sendto() but uses a structure to hold its arguments.
Function writev() is similar to write() but it can gather the data to send from
a buffer list. This avoids copying the transmit data into a contiguous buffer.
The corresponding receive functions are recv(), read(), recvfrom(), recvmsg(),
and readv().

7.8 Flexible use of read() and write()


The UNIX read() and write() functions use descriptors to identify data sources
and sinks.


As these are (by design) compatible with the descriptors used for sockets, a single
application program can process local data (files) or remote data (data accessed
via sockets).

7.9 Servers for Multiple Services


The UNIX function select() allows a task to wait for connections on multiple
sockets.
select() applies to IO in general, on UNIX.
The call form is:

nready = select( ndesc, indesc, outdesc, excdesc, timeout );

where ndesc = number of descriptors to examine (descriptors 0 to ndesc-1); pointers indesc, outdesc and
excdesc point to bit masks which are set to identify descriptors to test for input
ready, output ready, and exceptional conditions respectively; and timeout is the
maximum time to wait.
Note that the user sets the bit masks before each call, and the select() function
returns information on active sockets by setting/clearing the same bit masks.
The only exception source for a socket is out-of-band data.
For BSD, the following macros assist in bit mask handling:
FD_SET(fd, &fdset),
FD_CLR(fd, &fdset),
FD_ISSET(fd, &fdset),
FD_ZERO(&fdset).
Example of multiplexing reads:

#include <sys/time.h>
#include <sys/types.h>
...
fd_set read_template;
struct timeval wait;
...
for (;;) {
    wait.tv_sec = 1;        /* one second */
    wait.tv_usec = 0;
    FD_ZERO(&read_template);
    FD_SET(s1, &read_template);
    FD_SET(s2, &read_template);
    nb = select(FD_SETSIZE,
                &read_template,
                (fd_set *) 0, (fd_set *) 0,
                &wait);
    if (nb <= 0) {
        /* An error occurred during the select,
           or the select timed out. */
    }
    if (FD_ISSET(s1, &read_template)) {
        /* Socket #1 is ready to be read from */
    }
    if (FD_ISSET(s2, &read_template)) {
        /* Socket #2 is ready to be read from */
    }
}

Note: read_template is cleared and re-initialised at the beginning of every main
loop for(;;) traversal.
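
The same pattern can be packaged as a small helper that can be compiled and tried on any pair of descriptors (sockets or pipes). This is a sketch only; the function name wait_readable() and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for either descriptor to become
 * readable. Returns the ready descriptor (fd1 if both are ready),
 * -1 if the call timed out, or -2 if select() reported an error. */
int wait_readable(int fd1, int fd2, long timeout_ms)
{
    fd_set readfds;
    struct timeval wait;
    int maxfd = fd1 > fd2 ? fd1 : fd2;
    int nb;

    wait.tv_sec = timeout_ms / 1000;
    wait.tv_usec = (timeout_ms % 1000) * 1000;

    /* The mask is rebuilt before every call because select() overwrites it. */
    FD_ZERO(&readfds);
    FD_SET(fd1, &readfds);
    FD_SET(fd2, &readfds);

    /* Only descriptors 0..maxfd need to be examined. */
    nb = select(maxfd + 1, &readfds, NULL, NULL, &wait);
    if (nb < 0)
        return -2;
    if (nb == 0)
        return -1;
    return FD_ISSET(fd1, &readfds) ? fd1 : fd2;
}
```

Passing the highest descriptor plus one, rather than FD_SETSIZE, limits the work select() has to do; both forms are valid.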

7.10 Network Byte Order


The network has its own definition of byte order (most significant byte first) for multibyte quantities: short
(2 bytes) and long (4 bytes).
On each host, functions (or macros) are defined to convert between host byte order
and network byte order.
Network to host conversions:
localshort = ntohs( netshort );
locallong = ntohl( netlong );

Host to network conversions:
netshort = htons( localshort );
netlong = htonl( locallong );
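
A short sketch shows the effect of the conversions on the bytes that actually travel on the wire. The helper names put_port() and get_port() are hypothetical; they simply store and fetch a 2-byte port field in network byte order.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a port number into a 2-byte wire field in network byte order. */
void put_port(uint8_t field[2], uint16_t port)
{
    uint16_t net = htons(port);

    memcpy(field, &net, sizeof net);
}

/* Read a port number back from a 2-byte wire field. */
uint16_t get_port(const uint8_t field[2])
{
    uint16_t net;

    memcpy(&net, field, sizeof net);
    return ntohs(net);
}
```

Whatever the host's own byte order, put_port(field, 80) leaves the bytes 0x00 0x50 in the field, and get_port() recovers 80 from them.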


7.11 Some Other Related Functions


After a server successfully returns from accept(), function getpeername()
can be used to provide the address (IP address and port) of the remote system.
gethostname() provides the local host name.
gethostbyname() returns the IP address given the hostname.
gethostbyaddr() returns the hostname given the IP address.
Functions getsockopt()/setsockopt() allow socket options to be read
and set.

7.12 BSD internet super-server inetd


inetd is a 4.4BSD daemon that listens for requests for many daemons instead of
having each task (daemon) listening for its own requests.
This reduces the number of idle daemons and simplifies implementation.
inetd handles two types of services: standard and TCPMUX.
A standard service has a well-known port assigned to it, as listed in BSD system
file /etc/services (see also man services), and defined by IANA.
A TCPMUX service is non-standard, has no well-known port assigned, and is
invoked by inetd when a client connects to the tcpmux well-known port.
On BSD, inetd starts at boot time and determines from file /etc/inetd.conf
the servers for which it is to listen (it creates a socket for each service and then
calls select()).
When inetd accepts a connection, it does a fork(), duplicates (dup()) the new
socket to file descriptors {0,1} (stdin and stdout), closes other open file descriptors,
and execs the appropriate server.
The server code is then a program that runs with stdin and stdout already set
up. In fact, the server can be written using stdio IO (with appropriate flushing).
In 4.4BSD, the TCPMUX service is built into inetd by listening to TCP port 1.


The TCPMUX service allows a user to add locally developed protocols without
needing an official TCP port assignment. The TCPMUX protocol is described in
RFC-1078:

A TCP client connects to a foreign host on TCP port 1. It sends the service name followed by
a carriage-return line-feed (CRLF). The service name is never case sensitive. The server replies
with a single character indicating positive (+) or negative (-) acknowledgment, immediately followed by an optional message of explanation, terminated with a CRLF. If the reply was
positive, the selected protocol begins; otherwise the connection is closed.

7.13 Additional References


An Advanced 4.4BSD Interprocess Communication Tutorial, Samuel J. Leffler,
Robert S. Fabry, William N. Joy, Phil Lapsley (Computer Systems Research
Group (CSRG), Department of Electrical Engineering and Computer Science, University of California Berkeley); Steve Miller, Chris Torek (Heterogeneous Systems Laboratory, Department of Computer Science, University of Maryland).
On *BSD, see file /usr/share/doc/psd/21.ipc/paper.ascii.gz.
An Introductory 4.4BSD Interprocess Communication Tutorial, Stuart Sechrest,
CSRG, Computer Science Division, Department of Electrical Engineering and Computer Science, University of California Berkeley.
On *BSD, see file /usr/share/doc/psd/20.ipctut/paper.ascii.gz.


IP Router Operation
We now consider higher level operation of the Internet.

8.1 Datagram Delivery


A node on a given physical network can send a physical frame directly to another
node on the same network. A 48-bit hardware address is used in the destination
part of the frame's header.
To transfer an IP datagram, the sender encapsulates the datagram in a physical
frame with a 48-bit hardware address obtained by mapping (see footnote 12) of the destination
32-bit IP address. Network hardware then delivers the frame.
IP routing consists of deciding where to send a datagram based on its destination
32-bit IP (v4) address.
In direct delivery, the destination IP address is identified as being on the same subnet i.e. there is a direct physical connection. Delivery is achieved by encapsulating
the datagram into a frame with the destination host's 48-bit address for hardware
transmission.
In indirect delivery, the destination IP address is determined as being on a remote
system for which direct delivery is not possible so external information is required
to find a 48-bit hardware address for encapsulation.
Consider a large Internet with many networks interconnected by gateways.
A host that cannot achieve direct delivery sends a datagram to a directly connected
gateway where software extracts the encapsulated datagram and IP routing routines select the next destination.
The datagram is passed between the gateways in the Internet until it arrives at a
gateway that can perform direct delivery.
The total path provided to the IP datagram is determined by the gateways which
form a cooperative interconnected structure in order to maintain route information.
Footnote 12: Address Resolution Protocol (ARP) allows a host to find a hardware address from an IP address.


An IP routing algorithm employs an IP routing table on each node to hold information about possible destinations and how to reach them. A typical routing table
contains pairs (N,G) where N = IP address of destination network and G is the IP
address of the gateway along the path to network N.
Based on the network portion of its own address, a node can easily identify datagrams for which indirect delivery is necessary. The table provides an IP address
for the gateway node so the datagram, with its IP addresses unchanged, is encapsulated in a new frame with a 48-bit hardware address chosen for the next gateway.
If the destination network is not found in the routing table, the packet is forwarded
to a default router with more extensive tables. If no default route is defined, a
routing error has occurred.
In some instances, particular routes may be defined for some nodes.
We can summarise the IP routing algorithm Route_IP_Datagram(datagram, routing_table) as:
Extract the destination IP address, I_D, from the datagram;
Compute the IP address of the destination network, I_N;
if I_N matches any directly connected network address, send the datagram to the destination over that network (this involves resolving I_D to a physical address, encapsulating the datagram, and sending the frame);
else if I_D appears as a host-specific route, use that;
else if I_N appears in the routing table, use the table route;
else if a default route has been specified, use that;
else declare a routing error.
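The decision order above can be sketched in Python. This is only an illustration, not part of the original notes: the function name, the argument layout and the use of CIDR strings such as 192.168.1.0/24 are assumptions, and testing whether the destination lies inside a network stands in for the "compute I_N and compare" step.

```python
import ipaddress

def route_ip_datagram(dest, direct_nets, host_routes, net_routes, default_gw=None):
    """Return ("direct", network) or ("gateway", gateway_ip) for a destination.

    direct_nets: list of directly connected networks (CIDR strings)
    host_routes: dict mapping a host IP string to a gateway IP string
    net_routes:  dict mapping a network (CIDR string) to a gateway IP string
    All three structures are hypothetical stand-ins for a real routing table.
    """
    i_d = ipaddress.ip_address(dest)

    # 1. Direct delivery: the destination is on a directly connected network.
    for net in direct_nets:
        if i_d in ipaddress.ip_network(net):
            return ("direct", net)

    # 2. Host-specific route.
    if str(i_d) in host_routes:
        return ("gateway", host_routes[str(i_d)])

    # 3. Route for the destination network.
    for net, gateway in net_routes.items():
        if i_d in ipaddress.ip_network(net):
            return ("gateway", gateway)

    # 4. Default route, if one has been specified.
    if default_gw is not None:
        return ("gateway", default_gw)

    # 5. Otherwise a routing error has occurred.
    raise LookupError(f"routing error: no route to {dest}")
```

In every "gateway" case the datagram keeps its IP addresses unchanged; only the frame it is encapsulated in is addressed to the chosen gateway.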
How is the routing table determined?
For simple networks, a static routing table is feasible.
In large networks, a dynamic routing table is required in order that the network may
adapt to changes e.g. gateway failures can be worked around, etc.


8.2 Route Table Completeness


Is full routing information necessary for an individual host on the network?
Hosts can route datagrams successfully even if they only have partial routing information because they can rely on gateways. Recall the default route choice that
was mentioned at the end of the routing algorithm. If a host has a gateway as
its default route and it does not know how to handle a particular datagram, it offloads the datagram (and the problem) to the gateway which has more complete
information.
The routing tables in a given gateway contain partial information about possible
destinations. An advantage of having partial information present is that it allows
the administrators at the remote site to make decisions about their local routing.
The local gateway passes the datagram through the internet until it arrives at the
remote gateway which uses its routing tables to perform delivery.
Recall that the term internet was originally coined to refer to a set of autonomous
networks that were interconnected to form a single internet.
We can divide the routing problem into two tasks in a similar way:
Routing is performed on a local scale, perhaps involving a number of small
networks connected to a gateway (this is the interior gateway task);
Routing is performed on a global scale where we are interested in moving datagrams across one (or more) backbones that interconnect these gateways (this is
the exterior gateway task).
First, we briefly discuss how routes can be optimised.

8.3 Route Optimisation Algorithms


In Dijkstra's algorithm, each edge in a graph represents a link between nodes and
is assigned a non-negative cost or weight. The shortest path between the two nodes of
interest is the path with the minimum sum of weights. This optimisation can be done off-line.
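As an illustration, a minimal Python sketch of Dijkstra's algorithm follows. The graph representation (a dictionary of neighbour-to-weight dictionaries) and the node names are assumptions made for the example.

```python
import heapq

def dijkstra(graph, source):
    """Return the minimum total weight from source to every reachable node.

    graph maps each node to a dict {neighbour: non-negative weight}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter path was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist

# Hypothetical four-node network with symmetric link costs.
example_graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
example_costs = dijkstra(example_graph, "A")
```

Here the cheapest route from A to D goes A, B, C, D with total weight 4, rather than over the direct but expensive B to D link.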
In vector distance routing, a router periodically sends routing information across
the network to neighbours. Each message contains {destination, distance } values.


A recipient compares this information to its own tables:
it adds entries for which no routes were previously known;
it updates its distance value for any destination for which the message sender
is the next hop, and
it updates its next hop and distance entries for destinations for which its distance was higher when using a different next hop.
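The three update rules can be sketched as follows. The table layout (destination mapped to a next hop and distance pair) and the fixed cost of one added for the link to the sender are assumptions for illustration only.

```python
def apply_update(table, sender, advert, link_cost=1):
    """Merge one vector-distance message into a routing table.

    table:  dict destination -> (next_hop, distance), modified in place
    sender: the neighbour that sent the message
    advert: dict destination -> distance as seen by the sender
    """
    for dest, sender_dist in advert.items():
        new_dist = sender_dist + link_cost
        current = table.get(dest)
        if current is None:
            # Rule 1: no route was previously known.
            table[dest] = (sender, new_dist)
        elif current[0] == sender:
            # Rule 2: the sender is already the next hop, so accept its
            # new distance even if it has become worse.
            table[dest] = (sender, new_dist)
        elif new_dist < current[1]:
            # Rule 3: the route via the sender beats the current next hop.
            table[dest] = (sender, new_dist)
    return table
```

Rule 2 is what lets bad news propagate at all: a route that gets longer at the next hop must get longer here too.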
As networks grow, this traffic becomes significant.
An alternative is the Shortest Path First (SPF) algorithm which requires each gateway to have complete topology information. The gateways periodically probe
each other so they know the status (up or down) of their neighbouring gateways; this information is periodically broadcast so that each gateway can update
its optimal route information via Dijkstra's algorithm.
More complex techniques are required to handle economic as well as technical
costs. Also, network traffic now includes growing real-time data.

8.4 Interior Gateway Routing Protocol


As the Internet connects many different organisations, these organisations have
been free to develop their own internal routing methods. We consider two protocols.

8.4.1 Routing Information Protocol (RIP)

Routing Information Protocol implements vector-distance routing for local networks.


RIP is a widely used protocol due to its distribution in the form of a BSD daemon
routed.
It partitions participants into active and passive (silent) machines.
A gateway running active RIP broadcasts information in the form of { IP network address, integer distance to destination } from the gateway's current routing
database every 30 seconds.

RIP uses a hop count metric. Direct delivery of a datagram corresponds to one hop.
A hop count does not take into account link speed e.g. 3 hops across fast networks
would most likely be faster than 2 hops via PPP links. RIP implementations may
advertise artificially high hop counts for slow links in order to optimise routing.
Both active and passive RIP participants listen to all broadcast messages and update their tables when better routes are found. Each route in the table has associated with it a timer so that it is automatically dropped should a gateway providing
the route fail. Routes become invalid after 180 seconds.
Note that this means that good news (fast routes) travels quickly while bad news
(failed routes) travels slowly. This slow convergence problem is addressed by techniques such as triggered updates which force a gateway to immediately broadcast
bad news. An avalanche of updates can also cause problems.
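The per-route timer described above might be sketched like this. Only the 180 second limit comes from the text; the table layout and function names are hypothetical.

```python
ROUTE_TIMEOUT = 180  # seconds after which a silent route becomes invalid

def refresh(table, dest, next_hop, hops, now):
    """Record a route heard in a RIP broadcast and restart its timer."""
    table[dest] = {"next_hop": next_hop, "hops": hops, "last_heard": now}

def expire(table, now):
    """Drop routes whose timer has run out; return the dropped destinations."""
    dead = [dest for dest, route in table.items()
            if now - route["last_heard"] >= ROUTE_TIMEOUT]
    for dest in dead:
        del table[dest]
    return dead
```

With broadcasts every 30 seconds, a route survives several lost messages before it is dropped, which is one reason bad news travels slowly.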

8.4.2 Open Shortest Path First

The early algorithms suffered as networks grew so the Internet Engineering Task
Force developed the OSPF (Open Shortest Path First) standard in 1990. Its specification is published, hence "open".
It can handle metrics such as physical distance, time delay, and others.
It is dynamic and hence can adapt to changes in topology automatically.
It supports routing based on type of service, i.e. the type of service field is inspected so that it
is possible to handle real-time traffic (multimedia), etc.
It can do load balancing hence routers connected by multiple pathways can have
their traffic spread across the pathways to maximise performance (previously,
routers used the best single link and ignored the others). Example: a router can
balance traffic on multiple PPP pathways forming a link to maximise performance.
OSPF works by having adjacent routers exchange information with acknowledgement and timestamping; hence, routers have up-to-date knowledge of costs etc.
In normal operation, a router floods link state update messages to its neighbours.
To minimise overall coordination traffic, one router is elected to be the designated
router and it is considered to be adjacent to all other routers.
As the routers all belong to a single organisation, they can trust one another!
 




Message type          Description
Hello                 Used to discover who the neighbors are
Link state update     Provides the sender's costs to its neighbors
Link state ack        Acknowledges link state update
Database description  Announces which updates the sender has
Link state request    Requests information from the partner

Fig. 5-54. The five types of OSPF messages.

8.5 Exterior Gateway Routing Protocol


Border Gateway Protocol (BGP) acts as a routing protocol between organisations
according to policies chosen by the owners.
Policies can be based on politics, commercial considerations, costs, services to customers (and rejection of traffic of noncustomers), etc.
BGP is very general.
Pairs of BGP routers communicate by establishing a TCP connection.


9 Internet Control Protocols

9.1 Internet Control Message Protocol (ICMP)


The Internet Control Message Protocol allows gateways to send error or control
messages to other gateways or hosts; ICMP provides communication between the
Internet Protocol software on one machine and the Internet Protocol software on
another.
ICMP can only report error conditions to the original source machine; it cannot
correct the error. The IP software on the source machine must inform the individual application program responsible, and action must then be taken to correct the problem.
An ICMP message travels across the Internet in the data portion of an IP datagram.
An IP datagram carrying an ICMP message has the value 1 in the PROTOCOL field.
Each ICMP message has its own format with a common initial 32 bits:
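 0        7 8       15 16                31
+----------+----------+--------------------+
|   TYPE   |   CODE   |      CHECKSUM      |
+----------+----------+--------------------+
|         Further ICMP Information         |
+------------------------------------------+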
IP datagrams carrying an ICMP message are not allowed to trigger error report
messages (as this could cause error messages about error messages about ...).
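The common 32-bit header can be packed and unpacked with Python's struct module. The sketch below is an illustration only: the function names are made up, the checksum is the standard 16-bit one's complement Internet checksum (not described in these notes), and type 8 with code 0 is the usual Echo Request.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's complement sum of the data, complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

def build_icmp(msg_type: int, code: int, rest: bytes = b"") -> bytes:
    """Build TYPE, CODE, CHECKSUM followed by the type-specific bytes."""
    unchecked = struct.pack("!BBH", msg_type, code, 0) + rest
    checksum = internet_checksum(unchecked)
    return struct.pack("!BBH", msg_type, code, checksum) + rest

def parse_icmp(packet: bytes):
    """Split a message into (type, code, checksum, further information)."""
    msg_type, code, checksum = struct.unpack("!BBH", packet[:4])
    return msg_type, code, checksum, packet[4:]

# Hypothetical Echo Request: identifier 1, sequence number 1, 4 data bytes.
echo_request = build_icmp(8, 0, struct.pack("!HH", 1, 1) + b"ping")
```

A receiver can validate a message by running the same checksum over the whole packet: a correct message gives zero.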
Principal ICMP message types:


 



Message type             Description
Destination unreachable  Packet could not be delivered
Time exceeded            Time to live field hit 0
Parameter problem        Invalid header field
Source quench            Choke packet
Redirect                 Teach a router about geography
Echo request             Ask a machine if it is alive
Echo reply               Yes, I am alive
Timestamp request        Same as Echo request, but with timestamp
Timestamp reply          Same as Echo reply, but with timestamp

Fig. 5-50. The principal ICMP message types.

When a gateway cannot deliver an IP datagram, a destination unreachable message is sent back to the source.
The CODE field integer further describes the problem. Some values are: 0=network unreachable; 1=host unreachable; 2=protocol unreachable; 3=port unreachable; 4=fragmentation needed and DF (don't fragment) set.
Time exceeded: the packet could be looping, or there could be congestion or timeout problems.
Whenever a gateway processes a datagram, it decrements the time-to-live counter
(or hop count) and discards the datagram when the count reaches zero. This ensures that datagrams cannot loop indefinitely.
Whenever a gateway discards a datagram because its hop count reaches zero, or
a timeout occurs while waiting for fragments of a datagram, it sends a time exceeded message.
Parameter problem: an invalid IP packet was found.
Source quench: used for flow control. Usually, a congested gateway sends one
source quench message for every datagram that it discards due to buffer overflows. Once the datagram source stops receiving quench messages, it gradually
increases transmission rate.
Redirect: allows network knowledge to propagate. In particular, when a gateway detects that a datagram from a host is using a non-optimal route, it forwards the
datagram and also sends a redirect message to the host. Thus, a host can boot up
knowing only a default gateway and then optimise its routing.

Echo Request and Echo Reply allow destinations to be checked for reachability and
timestamping allows performance measurement. On UNIX hosts, the ping command uses these messages to display network performance while the traceroute command can identify the route in use.
A Timestamp Request leads to a Timestamp Reply to allow datagram transit times to
be computed.

9.2 Address Resolution Protocol (ARP)


As mentioned earlier, each interface has a unique 48-bit hardware LAN address
and a 32-bit IP address.
ARP is a mechanism that allows a host to find out what 48-bit LAN address belongs to an IP address. A system outputs a broadcast packet to every machine on
the network in question, asking "who owns this IP address?". The owner replies
with its LAN address.
ARP reduces the need for configuration files.
By having hosts cache the results, ARP requests are reduced. However, cache
entries are discarded after a few minutes so that systems that have their LAN
cards replaced due to failure get operating quickly.
It is also possible for hosts to broadcast their mapping when they boot up. No response is expected. However, a machine with the same IP address should respond
in order to prevent the second machine coming on-line and creating chaos!
It is also possible for routers to react to ARP requests for IP information belonging to remote networks. In proxy ARP, routers cooperate by forwarding the ARP
request to the appropriate network for a response to be generated (and returned).
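The caching behaviour described in this section can be sketched as a small Python class. The class name, the 300 second lifetime (standing in for "a few minutes") and the injectable clock are assumptions for illustration; a real implementation lives inside the operating system.

```python
import time

class ArpCache:
    """Map IP addresses to LAN addresses, forgetting entries after a lifetime."""

    def __init__(self, lifetime=300.0, clock=time.monotonic):
        self.lifetime = lifetime      # assumed value for "a few minutes"
        self.clock = clock            # replaceable so the sketch can be tested
        self._entries = {}            # ip -> (lan_address, time stored)

    def store(self, ip, lan_address):
        """Remember the answer to an ARP request (or a boot-time broadcast)."""
        self._entries[ip] = (lan_address, self.clock())

    def lookup(self, ip):
        """Return the cached LAN address, or None if a new ARP request is needed."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        lan_address, stored = entry
        if self.clock() - stored >= self.lifetime:
            del self._entries[ip]     # stale: the LAN card may have been replaced
            return None
        return lan_address

# Demonstration with a fake clock so that no real waiting is needed.
fake_now = [0.0]
demo_cache = ArpCache(lifetime=300.0, clock=lambda: fake_now[0])
demo_cache.store("192.0.2.7", "00:11:22:33:44:55")
fresh_answer = demo_cache.lookup("192.0.2.7")
fake_now[0] = 301.0
stale_answer = demo_cache.lookup("192.0.2.7")
```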

9.3 Reverse Address Resolution Protocol


RARP does the reverse of ARP.


A diskless machine about to boot up will know its 48-bit LAN address but will not
know its IP address.
An RARP client sends out a broadcast packet (with all 32 address bits = 1) saying what
its LAN address is, and an RARP server responds with the IP address.
This allows the diskless machine to share boot files with other machines while
retaining its unique identity.
Broadcast packets with all address bits = 1 are not propagated by routers to avoid
unwanted traffic, and RARP servers must exist on any subnet needing them.
On UNIX, RARP requests can be handled by the rarpd daemon, which starts at boot time.

9.4 Domain Name System


The Domain Name System implements a machine name hierarchy for TCP/IP
Internets.
Example: mirriwinni.cs.jcu.edu.au contains a hostname mirriwinni and
a domain name cs.jcu.edu.au.
This name is a part of the cs.jcu.edu.au domain, which is part of the jcu.edu.au
domain, which is part of the edu.au domain, which is part of the au domain.
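The nesting of domains can be shown with a few lines of Python. The function names are made up for this sketch.

```python
def split_host(name):
    """Split a full name into (hostname, domain name)."""
    hostname, _, domain = name.rstrip(".").partition(".")
    return hostname, domain

def enclosing_domains(name):
    """List the domains a name belongs to, from most to least specific."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(1, len(labels))]

example_parts = split_host("mirriwinni.cs.jcu.edu.au")
example_chain = enclosing_domains("mirriwinni.cs.jcu.edu.au")
```

Here example_chain runs from cs.jcu.edu.au up to au, matching the chain of domains listed above.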
Domain name servers provide a mapping between meaningful host names and
actual IP numbers.
An advantage of a hierarchical naming system is that name servers can be distributed, and also organised in a hierarchical manner e.g. a small number of
servers might handle .au names, and be in contact with a slightly larger number of servers handling edu.au queries, etc.
The authority in the hierarchical naming system is also structured. At the top,
some servers are in charge of .au. Beneath these are other servers in charge of
edu.au. Down a few more layers, we might see a CS machine such as cay be a
DNS server for the cs.jcu.edu.au domain.
Nameservers may map a given name to more than one item in the domain system.
The client specifies the type of object desired when resolving a name, and the
server returns objects of that type.


Examples on Unix:
nslookup cay.cs.jcu.edu.au
returns the IP address of cay;
nslookup -q=mx cay.cs.jcu.edu.au
returns the Mail eXchange records for email addressed to cay (this MX mapping
allows us to direct email to a main server and backup servers);
nslookup -q=mx cs.jcu.edu.au
returns the Mail eXchange records for host independent CSE email.
Some domain name system record types:
type A consists of a hostname and its IP address;
type CNAME gives the canonical hostname for an alias;
type MX gives a 16 bit preference and name of host that acts as a mail exchanger
for the domain;
type NS is the name of the authoritative server for the domain;
type SOA is the start of authority record, which describes which parts of the naming
hierarchy a server implements.
The cost of lookup for non-local names can be high so nameservers will maintain
a cache of recently used names. When queried, a reply from a remote server will
be marked authoritative while an answer from a locally cached (previous) query will
be marked non-authoritative.
Each response from a remote server will include a time to live value set by the
authority at the remote site. This means that server lookups for hosts whose
IP address does not change can be minimised, while improved correctness can
be obtained for entries that are expected to change by assigning them short TTL
values.
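A minimal sketch of such a cache follows. The resolver callable (standing in for a query to the remote authority), the class name and the fake clock are all assumptions for illustration, and the address used is from a documentation range.

```python
import time

class NameCache:
    """Cache answers until their time to live runs out."""

    def __init__(self, resolver, clock=time.monotonic):
        self.resolver = resolver   # hypothetical: name -> (address, ttl_seconds)
        self.clock = clock
        self._cache = {}           # name -> (address, expiry time)

    def lookup(self, name):
        """Return (address, authoritative) for a name."""
        hit = self._cache.get(name)
        if hit is not None and self.clock() < hit[1]:
            return hit[0], False             # cached: non-authoritative
        address, ttl = self.resolver(name)   # ask the remote authority
        self._cache[name] = (address, self.clock() + ttl)
        return address, True                 # fresh: authoritative

# Demonstration with a fake resolver and a fake clock.
remote_queries = []

def fake_resolver(name):
    remote_queries.append(name)
    return "192.0.2.10", 60  # address and a TTL of 60 seconds

fake_time = [0.0]
names = NameCache(fake_resolver, clock=lambda: fake_time[0])
first = names.lookup("cay.cs.jcu.edu.au")
second = names.lookup("cay.cs.jcu.edu.au")
fake_time[0] = 61.0
third = names.lookup("cay.cs.jcu.edu.au")
```

A short TTL trades more remote queries for fresher answers, which is the choice left to the authority for each entry.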
Before an organisation is granted authority for an official domain, it must agree to
operate a domain name server that meets Internet standards.
For robustness, a site must also find a separate non-dependent site to act as a
backup server. A backup server is best physically separate, running on a different
power supply.
Administration information for the .au domain may be found at
http://www.auda.org.au.

On Unix, see also man nslookup.


On BSD operating systems, see also man dig which describes how to obtain information about DNS.

10 Application Layer

10.1 Introduction
Recall that UNIX Internet daemons such as inetd can simplify the setup of service provision.
In particular, a server task can be written with its client communication mapped
to stdin/stdout IO.
In practice, lightly loaded services are started by inetd as needed while more
heavily used server tasks may be run permanently as daemons in their own right.
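A service written in this style needs no socket code at all. The sketch below is an illustration only: the service (echoing each line in upper case) is made up, and in-memory streams stand in for the connection that inetd would attach to stdin and stdout.

```python
import io

def serve(stdin, stdout):
    """Answer each request line with the same line in upper case."""
    for line in stdin:
        stdout.write(line.upper())
        stdout.flush()  # the client is waiting on the other end

# Local demonstration: StringIO objects play the part of the client.
demo_out = io.StringIO()
serve(io.StringIO("hello\nworld\n"), demo_out)
demo_reply = demo_out.getvalue()
```

Started from inetd, the same function would simply be called as serve(sys.stdin, sys.stdout).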

10.2 Email
Simple Mail Transfer Protocol (SMTP) is used to transfer email.
On UNIX, the mail delivery daemon is sendmail (or one of a number of new
alternatives such as vmail) which listens on TCP port 25 for connections from
remote machines. On Unix, try telnet <host> 25 to talk to the daemon by hand.
SMTP is a simple ASCII (text) transmission protocol described by RFC 821. Recently, extended SMTP (ESMTP) has been defined in RFC 1425 to handle issues
such as larger message length, different timeouts, and prevention of infinite mailstorms (email loops).
Typical RFC 822 header fields (AST):
 

Header        Meaning
To:           Email address(es) of primary recipient(s)
Cc:           Email address(es) of secondary recipient(s)
Bcc:          Email address(es) for blind carbon copies
From:         Person or people who created the message
Sender:       Email address of the actual sender
Received:     Line added by each transfer agent along the route
Return-Path:  Can be used to identify a path back to the sender

Fig. 7-42. RFC 822 header fields related to message transport.



RFC 822 message format (AST):








Header        Meaning
Date:         The date and time the message was sent
Reply-To:     Email address to which replies should be sent
Message-Id:   Unique number for referencing this message later
In-Reply-To:  Message-Id of the message to which this is a reply
References:   Other relevant Message-Ids
Keywords:     User chosen keywords
Subject:      Short summary of the message for the one-line display

Fig. 7-43. Some fields used in the RFC 822 message header.
RFCs 1341 and 1521 have added language extensions and MIME (Multipurpose
Internet Mail Extensions). In effect, RFC 822 header types have been extended to
include MIME-Version:, Content-Type:, Content-Transfer-Encoding:,
etc.
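Python's standard email package shows these headers in practice. The addresses and text below are made up for the example; the point is that the library adds the MIME headers itself when a body is set.

```python
from email import message_from_string
from email.message import EmailMessage

# Build a message with a few RFC 822 headers and a plain text body.
msg = EmailMessage()
msg["From"] = "alice@example.org"
msg["To"] = "bob@example.org"
msg["Subject"] = "Routing notes"
msg.set_content("See section 8.\n")  # also adds the MIME headers

# The text form is what would be handed to SMTP for transfer.
wire_text = msg.as_string()

# Parsing the text recovers the headers by name.
parsed = message_from_string(wire_text)
header_names = parsed.keys()
```

Printing wire_text shows the header lines, a blank line, and then the body, which is the RFC 822 layout.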
Other recent developments have included PGP (Pretty Good Privacy) which supports text compression, secrecy, and digital signatures.

10.3 Network News


News reading, like email, is one of the original "killer apps" that led to the early
spread of the internet.
RFC 977 describes the Network News Transfer Protocol (NNTP) which is used to
propagate news articles from one machine to another.
Two methods of requesting news transfers are supported in NNTP: in news pull,
a host contacts one of its newsfeeds and asks for new news; in news push, the
newsfeed calls the client and announces that it has new news.
TCP port 119 is reserved for NNTP.


10.4 Other Applications


File Transfer Protocol (FTP) defines a client/server method of transferring files.
Hyper Text Transfer Protocol (HTTP) defines a client/server method of transferring data based on URLs. The client browser is structured so that different types
of data can be interpreted (displayed, played, etc.) as appropriate.
On Unix, you can use telnet to retrieve HTTP data via port 80 e.g.
    telnet mirriwinni.cs.jcu.edu.au 80
    GET /phillip/index.html HTTP/1.0
    Accept: text/plain,text/html
Simple Network Management Protocol (SNMP) defines a basic client/server
method of interrogating and setting network configuration attributes. This is outside the scope of this course.
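The HTTP request typed into telnet earlier in this section can also be sent from a program. The following Python sketch is an illustration only: the function names are made up, and fetch needs a reachable web server, so only build_request is exercised here.

```python
import socket

def build_request(path, accept="text/plain,text/html"):
    """Return the bytes of a simple HTTP/1.0 GET request."""
    lines = [f"GET {path} HTTP/1.0", f"Accept: {accept}", "", ""]
    return "\r\n".join(lines).encode("ascii")  # blank line ends the request

def fetch(host, path, port=80, timeout=10.0):
    """Send the request over TCP and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an HTTP/1.0 server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks)

example_request = build_request("/phillip/index.html")
```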


PPP: Point-to-Point Protocol


This is not examinable!
The Internet Engineering Task Force developed RFC 1661/1662/1663 leading to
PPP.
PPP handles error detection, multiple protocols, and connect-time IP address negotiation.
PPP provides:
an unambiguous framing method which also handles error detection;
a Link Control Protocol (LCP) dealing with bringing lines up, testing them,
negotiating options, and taking them down;
a Network Control Protocol (NCP) for negotiating network layer options in a way that is
independent of the network layer protocol to be used.
Example of PPP use:
user calls the modem attached to a router (or terminal server or equivalent)
of the Internet service provider (ISP);
after a physical connection is established, modern V.34/V.90 modems undergo a channel equalisation training stage (so that their digital filters can
undo channel distortion);
the user sends the router a series of LCP packets in the payload field of one
or more PPP frames;
these packets and their responses select PPP parameters to be used;
a series of NCP packets is then sent to configure the network layer (for a
user wanting to use TCP/IP, an IP address is needed);
the NCP for IP will have the router provide an IP address to the user for use
during the connection (ISPs will own a number of addresses which are
shared amongst their clients dialing in);
when finished, the NCP is used to tear down the network layer connection
and free up the IP address;
finally, the LCP shuts down the data link layer connection and a disconnect
occurs.
The PPP frame is similar to an HDLC frame:
Character (byte) oriented so all frames are an integral number of bytes.
Delimiter of 01111110.

Address field is always set to 11111111.


Control field has the default value 00000011 for an unnumbered frame (i.e. sequence numbers not used by default). RFC 1663 describes the use of numbered
frames in a noisy environment.
Because the address and control fields can be used with a constant value, the LCP
provides a mechanism for them to be deleted by negotiation.
The next field, the protocol field, identifies one of LCP, NCP, IP, IPX, AppleTalk, and
others.
The data field holds data up to some negotiated maximum length (if no LCP negotiation occurs, 1500 bytes is used).
The checksum field is usually 2 bytes but can be negotiated to 4 bytes.
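The field layout can be sketched in Python as follows. This is a simplified illustration, not a working PPP implementation: the checksum is a zero placeholder rather than a computed value, escaping of delimiter bytes inside the data is omitted, and the function names are made up. The protocol numbers shown are the standard assigned values for IP, LCP and the NCP for IP.

```python
FLAG = 0x7E     # delimiter 01111110
ADDRESS = 0xFF  # address field 11111111
CONTROL = 0x03  # control field 00000011 (unnumbered frame)

PROTOCOLS = {0x0021: "IP", 0xC021: "LCP", 0x8021: "NCP for IP"}

def build_frame(protocol, data, checksum=b"\x00\x00"):
    """Assemble delimiter, address, control, protocol, data, checksum, delimiter."""
    body = bytes([ADDRESS, CONTROL]) + protocol.to_bytes(2, "big") + data + checksum
    return bytes([FLAG]) + body + bytes([FLAG])

def parse_frame(frame, checksum_len=2):
    """Return (protocol number, data, checksum) from an uncompressed frame."""
    if frame[0] != FLAG or frame[-1] != FLAG:
        raise ValueError("frame is not delimited by 01111110")
    body = frame[1:-1]
    if body[0] != ADDRESS or body[1] != CONTROL:
        raise ValueError("unexpected address or control field")
    protocol = int.from_bytes(body[2:4], "big")
    data = body[4:len(body) - checksum_len]
    checksum = body[len(body) - checksum_len:]
    return protocol, data, checksum

example_frame = build_frame(0x0021, b"an IP datagram would go here")
```

When the address and control fields are negotiated away, the two constant bytes after the delimiter simply disappear; this sketch handles only the default layout.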

[Figure: phase diagram with the states Dead, Establish, Authenticate, Network, Open and Terminate, and transitions labelled Carrier detected, Both sides agree on options, Authentication successful, NCP configuration, Done, Carrier dropped and Failed.]
Fig. 3-28. A simplified phase diagram for bringing a line up and down.
The handling of lines:
DEAD (no physical layer present) which changes to ESTABLISHED (once
carrier detection occurs);
LCP option negotiation occurs (may include authentication);
NCP protocol then invoked;
Data transfer occurs.
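The phases of Fig. 3-28 can be written as a small state machine. The state and event names come from the figure; which event links which pair of states is an assumption based on the usual reading of that diagram, since the arrows are not reproduced in these notes.

```python
# (current phase, event) -> next phase
TRANSITIONS = {
    ("Dead", "carrier detected"): "Establish",
    ("Establish", "both sides agree on options"): "Authenticate",
    ("Establish", "failed"): "Dead",
    ("Authenticate", "authentication successful"): "Network",
    ("Authenticate", "failed"): "Terminate",
    ("Network", "NCP configuration"): "Open",
    ("Open", "done"): "Terminate",
    ("Terminate", "carrier dropped"): "Dead",
}

def next_phase(phase, event):
    """Return the phase reached from the given phase on the given event."""
    try:
        return TRANSITIONS[(phase, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in phase {phase!r}") from None

def run(events, phase="Dead"):
    """Apply a sequence of events, starting from the Dead phase by default."""
    for event in events:
        phase = next_phase(phase, event)
    return phase
```

A complete call runs through all six phases and ends where it began, in Dead.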

Note: the LCP protocol only defines how the negotiation is conducted, not what is
negotiated. See Fig. 3-29.
 




Name               Direction  Description
Configure-request  I -> R     List of proposed options and values
Configure-ack      I <- R     All options are accepted
Configure-nak      I <- R     Some options are not accepted
Configure-reject   I <- R     Some options are not negotiable
Terminate-request  I -> R     Request to shut the line down
Terminate-ack      I <- R     OK, line shut down
Code-reject        I <- R     Unknown request received
Protocol-reject    I <- R     Unknown protocol requested
Echo-request       I -> R     Please send this frame back
Echo-reply         I <- R     Here is the frame back
Discard-request    I -> R     Just discard this frame (for testing)

(I = initiator, R = responder)

Fig. 3-29. The LCP packet types.
