
Internet Protocols

The Internet protocols are the world's most popular open-system (nonproprietary) protocol suite because they can be used to communicate across any set of interconnected networks and are equally well suited for LAN and WAN communications. The Internet protocols consist of a suite of communication protocols, of which the two best known are the Transmission Control Protocol (TCP) and the Internet Protocol (IP).

Internet protocols span the complete range of OSI model layers

The Four Layers of TCP/IP protocol suite

Application: Telnet, FTP, E-mail
Transport: TCP, UDP
Network: IP, ICMP, IGMP
Link: Device driver and interface card

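[Figure: Two hosts on a LAN running FTP]
Application layer (user processes, handles application details): FTP client and FTP server, communicating through the FTP protocol
Transport layer (kernel, handles communication details): TCP on each host, communicating through the TCP protocol
Network layer (kernel): IP on each host, communicating through the IP protocol
Link layer (kernel): Ethernet driver on each host, communicating through the Ethernet protocol over the Ethernet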

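[Figure: Two networks connected through a router]
The FTP client and FTP server communicate end to end through the FTP protocol, and their TCP modules through the TCP protocol.
The IP protocol runs hop by hop: between one host and the router, and between the router and the other host.
One host is attached to the Ethernet through an Ethernet driver, the other to the token ring through a token ring driver. The router has both an Ethernet driver and a token ring driver.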

Various protocols at different layers in the TCP/IP protocol suite

[Figure: protocols by layer]
Application: user processes
Transport: TCP, UDP
Network: ICMP, IP, IGMP
Link: ARP, hardware interface, RARP
Media

Internet Addresses
Every interface on an internet must have a unique Internet Address (called IP address). These addresses are 32-bit numbers.

The 32-bit addresses are normally written as four decimal numbers, one for each byte of the address. This is called dotted decimal notation. A multihomed host will have multiple IP addresses, one per interface.

Class A: 0     | NetId (7 bits)  | HostId (24 bits)
Class B: 10    | NetId (14 bits) | HostId (16 bits)
Class C: 110   | NetId (21 bits) | HostId (8 bits)
Class D: 1110  | Multicast group ID (28 bits)
Class E: 11110 | (reserved for future use) (27 bits)
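The leading bits of the first byte are enough to tell the classes apart. The following is a minimal sketch of that test; the function name `address_class` is made up for illustration, and any address whose first four bits are 1111 is reported as class E.

```python
import ipaddress


def address_class(dotted: str) -> str:
    """Return the classful address class (A to E) of a dotted-decimal IPv4 address."""
    first_octet = int(ipaddress.IPv4Address(dotted)) >> 24
    if (first_octet & 0x80) == 0x00:   # leading bit 0
        return "A"
    if (first_octet & 0xC0) == 0x80:   # leading bits 10
        return "B"
    if (first_octet & 0xE0) == 0xC0:   # leading bits 110
        return "C"
    if (first_octet & 0xF0) == 0xE0:   # leading bits 1110
        return "D"
    return "E"                         # leading bits 1111 (reserved)


if __name__ == "__main__":
    for addr in ("10.0.0.1", "172.16.0.1", "192.168.1.1", "224.0.1.1"):
        print(addr, "is class", address_class(addr))
```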

The Domain Name System


Although the network interfaces on a host, and therefore the host itself, are known by IP addresses, humans work best using the name of a host. In the TCP/IP world the Domain Name System (DNS) is a distributed database that provides the mapping between IP addresses and hostnames.

Encapsulation of data as it goes down the protocol stack

[Figure: encapsulation]
Application: Appl header + user data
TCP: TCP header + application data = TCP segment
IP: IP header + TCP header + application data = IP datagram
Ethernet driver: Ethernet header + IP header + TCP header + application data + Ethernet trailer = Ethernet frame
Sizes: Ethernet header 14 bytes, IP header 20 bytes, TCP header 20 bytes. An Ethernet frame carries 46 to 1500 bytes between its header and trailer.

Demultiplexing of a received Ethernet frame

[Figure: demultiplexing]
The incoming frame arrives at the Ethernet driver.
Ethernet driver to ARP, RARP, or IP: demultiplexing based on frame type in Ethernet header
IP to ICMP, IGMP, TCP, or UDP: demultiplexing based on protocol value in IP header
TCP or UDP to an application: demultiplexing based on port number in TCP or UDP header

Client-Server Model
Most networking applications are written assuming one side is the client and the other the server. The purpose of the application is for the server to provide some defined service for clients. We can categorize servers into two classes: iterative or concurrent.

Port Numbers
TCP and UDP identify applications using 16-bit port numbers. Servers are normally known by their well-known port number. For example, every TCP/IP implementation that provides an FTP server uses TCP port 21, and every Telnet server is on TCP port 23. In any implementation of TCP/IP, well-known port numbers are between 1 and 1023. A client usually does not care what port number it uses on its end. Client port numbers are called ephemeral ports (i.e., short lived). Most TCP/IP implementations allocate ephemeral port numbers between 1024 and 5000.

Application Programming Interfaces


Two popular application programming interfaces (APIs) for applications using the TCP/IP protocols are called sockets and TLI (Transport Layer Interface). The former is sometimes called "Berkeley sockets". The latter, originally developed by AT&T, is sometimes called XTI (X/Open Transport Interface).

Link Layer
Send and receive IP datagrams for the IP module
ARP requests and replies
RARP requests and replies
Different link layers: Ethernet, token ring, FDDI, serial lines (SLIP and PPP), loopback driver
Two standards: Ethernet and IEEE 802
MTU and path MTU

Ethernet and IEEE 802.3


Family of local area networks (LANs) that includes 3 main categories:

Ethernet and IEEE 802.3: LAN specifications that operate at 10 Mbps over coaxial cable.
100-Mbps Ethernet: A single LAN specification, also known as Fast Ethernet, that operates at 100 Mbps over twisted-pair cable.
1000-Mbps Ethernet: A single LAN specification, also known as Gigabit Ethernet, that operates at 1000 Mbps (1 Gbps) over fiber and twisted-pair cables.

Ethernet/IEEE 802.3 physical characteristics

Characteristic              Ethernet             IEEE 802.3 10Base5   IEEE 802.3 10Base2   IEEE 802.3 10BaseT
Data rate (Mbps)            10                   10                   10                   10
Signaling method            Baseband             Baseband             Baseband             Baseband
Maximum segment length (m)  500                  500                  185                  100
Media                       50-ohm coax (thick)  50-ohm coax (thick)  50-ohm coax (thin)   Unshielded twisted pair
Topology                    Bus                  Bus                  Bus                  Star

Characteristics of 100BaseT Media Types

Characteristic              100BaseTX                            100BaseFX                         100BaseT4
Cable                       Category 5 UTP, or Type 1 and 2 STP  62.5/125 micron multi-mode fiber  Category 3, 4, or 5 UTP
Number of pairs or strands  2 pairs                              2 strands                         4 pairs
Connector                   ISO 8877 (RJ-45) connector           -                                 ISO 8877 (RJ-45) connector
Maximum segment length      100 meters                           400 meters                        100 meters
Maximum network diameter    200 meters                           400 meters                        200 meters

Various frame fields exist for both Ethernet and IEEE 802.3

Ethernet Encapsulation (RFC 894)

Frame: destination address (6 bytes) | source address (6 bytes) | type (2 bytes) | data (46-1500 bytes) | CRC (4 bytes)
type 0800: IP datagram (46-1500 bytes)
type 0806: ARP request/reply (28 bytes) + PAD (18 bytes)
type 8035: RARP request/reply (28 bytes) + PAD (18 bytes)

IEEE 802.2/802.3 Encapsulation (RFC 1042)

802.3 MAC: destination address | source address | length
802.2 LLC: DSAP (AA) | SSAP (AA) | cntl (03)
802.2 SNAP: org code (00) | type
Followed by: data (38-1492 bytes) | CRC
type 0800: IP datagram (38-1492 bytes)
type 0806: ARP request/reply (28 bytes) + PAD (10 bytes)
type 8035: RARP request/reply (28 bytes) + PAD (10 bytes)

Gigabit Ethernet

Gigabit Ethernet is an extension of the IEEE 802.3 Ethernet standard. Gigabit Ethernet offers 1000 Mbps of raw-data bandwidth while maintaining compatibility with Ethernet and Fast Ethernet network devices. Gigabit Ethernet provides for new, full-duplex operating modes for switch-to-switch and switch-to-end-station connections. It also permits half-duplex operating modes for shared connections by using repeaters and CSMA/CD. Furthermore, Gigabit Ethernet uses the same frame format, frame size, and management objects used in existing IEEE 802.3 networks. In general, Gigabit Ethernet is expected to initially operate over fiber-optic cabling but will be implemented over Category 5 unshielded twisted-pair (UTP) and coaxial cabling as well.

Migrating to Gigabit Ethernet


Upgrading switch-to-switch links
Upgrading switch-to-server links
Upgrading a Fast Ethernet backbone
Upgrading a shared FDDI backbone
Upgrading high-performance desktops

SLIP: Serial Line IP


SLIP is a simple form of encapsulation for IP datagrams on serial lines (RFC 1055). The framing rules are:
1. The IP datagram is preceded and terminated by the special character called END (0xc0).
2. If a byte of the IP datagram equals the END character, the 2-byte sequence 0xdb, 0xdc is transmitted instead.
3. If a byte of the IP datagram equals the SLIP ESC character (0xdb), the 2-byte sequence 0xdb, 0xdd is transmitted instead.

[Figure: SLIP encapsulation. An IP datagram containing the bytes c0 and db is transmitted as END (c0), ..., ESC (db dc), ..., ESC (db dd), ..., END (c0).]
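The three SLIP framing rules translate almost directly into code. The sketch below is illustrative only: the names `slip_encode` and `slip_decode` are hypothetical, and the decoder assumes a single well-formed frame.

```python
END, ESC = 0xC0, 0xDB          # special characters defined by SLIP
ESC_END, ESC_ESC = 0xDC, 0xDD  # second byte of each 2-byte escape sequence


def slip_encode(datagram: bytes) -> bytes:
    """Frame an IP datagram with END characters, escaping END and ESC bytes."""
    out = bytearray([END])                 # rule 1: leading END
    for byte in datagram:
        if byte == END:                    # rule 2: c0 becomes db dc
            out += bytes([ESC, ESC_END])
        elif byte == ESC:                  # rule 3: db becomes db dd
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(byte)
    out.append(END)                        # rule 1: trailing END
    return bytes(out)


def slip_decode(frame: bytes) -> bytes:
    """Recover the datagram from one well-formed SLIP frame."""
    out = bytearray()
    stream = iter(frame)
    for byte in stream:
        if byte == END:
            continue                       # frame delimiters carry no data
        if byte == ESC:
            following = next(stream)       # assumes an escape is never truncated
            out.append(END if following == ESC_END else ESC)
        else:
            out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    print(slip_encode(bytes([0x01, 0xC0, 0xDB, 0x02])).hex(" "))
```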

PPP: Point to Point Protocol


A way to encapsulate IP datagrams on a serial link, either an asynchronous link or a bit-oriented synchronous link
A link control protocol (LCP) to establish, configure, and test the data link
A family of network control protocols (NCPs)
PPP has advantages over SLIP
Specified in RFC 1548 and RFC 1332

Format of PPP frames

Frame: flag (7E) | addr (FF) | control (03) | protocol (2 bytes) | information (up to 1500 bytes) | CRC | flag (7E)
protocol 0021: IP datagram
protocol C021: link control data
protocol 8021: network control data

Loopback Interface

Most implementations support a loopback interface that allows a client and server on the same host to communicate with each other using TCP/IP. The class A network ID 127 is reserved for the loopback interface. By convention, most systems assign the IP address of 127.0.0.1 to this interface and assign it the name localhost. An IP datagram sent to the loopback interface must not appear on any network.

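Processing of IP datagrams by loopback interface

[Figure: loopback processing]
Datagrams handed to the loopback driver by the IP output function are placed on the IP input queue, where the IP input function picks them up.
Is the destination IP address equal to the broadcast address or a multicast address? If yes, a copy is placed on the IP input queue through the loopback driver and the datagram is also passed to the Ethernet driver.
If no, is the destination IP address equal to the interface IP address? If yes, the datagram is placed on the IP input queue. If no, ARP is used to get the destination Ethernet address and the frame is sent.
Frames received by the Ethernet driver are demultiplexed to IP or ARP based on the Ethernet frame type.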

MTU

There is a limit on the size of the frame for both Ethernet and 802.3 encapsulation. This limits the number of bytes of data to 1500 and 1492, respectively. This characteristic of the link layer is called the MTU, its maximum transmission unit.

Network                               MTU (bytes)
Hyperchannel                          65535
16 Mbits/sec token ring (IBM)         17914
4 Mbits/sec token ring (IEEE 802.5)   4464
FDDI                                  4352
Ethernet                              1500
IEEE 802.3/802.2                      1492
X.25                                  576
Point-to-Point (low delay)            296

Path MTU
When two hosts on the same network are communicating with each other, it is the MTU of the network that is important. But when two hosts are communicating across multiple networks, each link can have a different MTU. The important numbers are not the MTUs of the two networks to which the two hosts connect, but rather the smallest MTU of any data link that packets traverse between the two hosts. This is called the path MTU.
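As a small worked example, the path MTU is simply the minimum over the MTUs of the links on the path. The sketch below uses a few values from the MTU table; the dictionary `TYPICAL_MTU`, the function `path_mtu`, and the sample path are illustrative assumptions.

```python
# MTU values in bytes, taken from the table of typical MTUs.
TYPICAL_MTU = {
    "FDDI": 4352,
    "Ethernet": 1500,
    "IEEE 802.3/802.2": 1492,
    "X.25": 576,
    "Point-to-Point (low delay)": 296,
}


def path_mtu(links):
    """Return the smallest MTU among the named data links on a path."""
    if not links:
        raise ValueError("a path needs at least one link")
    return min(TYPICAL_MTU[name] for name in links)


if __name__ == "__main__":
    # Hypothetical path: Ethernet at one end, FDDI at the other, a slow serial link between.
    print(path_mtu(["Ethernet", "Point-to-Point (low delay)", "FDDI"]))
```

The two end networks in this path have MTUs of 1500 and 4352 bytes, yet the result is determined by the 296-byte link in the middle.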

IP: Internet Protocol


All TCP, UDP, and ICMP data is transmitted as IP datagrams.
Provides an unreliable, connectionless datagram delivery service.
Hosts and routers have a routing table used for all routing decisions.
Three types of routes: host-specific, network-specific, and default routes.

Operation
The internet protocol implements two basic functions: addressing and fragmentation. The internet modules use the addresses carried in the internet header to transmit internet datagrams toward their destinations. The selection of a path for transmission is called routing. The internet modules use fields in the internet header to fragment and reassemble internet datagrams when necessary for transmission through "small packet" networks.

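[Figure: IP header, 32 bits per row, bits numbered 0 to 31]
Version (4 bits) | IHL (4 bits) | Type of Service (8 bits) | Total Length (16 bits)
Identification (16 bits) | Flags (3 bits) | Fragment Offset (13 bits)
TTL (8 bits) | Protocol (8 bits) | Header Checksum (16 bits)
Source Address (32 bits)
Destination Address (32 bits)
Options (variable) | Padding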

Type of Service (PreDTRCx)
Precedence (000-111)
D (1 = minimize delay)
T (1 = maximize throughput)
R (1 = maximize reliability)
C (1 = minimize cost)
x (reserved and set to 0)
The Type of Service field is used to indicate the quality of service desired.

Recommended values for type-of-service field

Application  Minimize delay  Maximize throughput  Maximize reliability  Minimize cost  Value
Telnet       1               0                    0                     0              0x10
FTP          0               1                    0                     0              0x08
SMTP         0               1                    0                     0              0x08
SNMP         0               0                    1                     0              0x04
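The hex values follow from the bit positions of the PreDTRCx layout: three precedence bits, then D, T, R, C, and a reserved bit. A minimal sketch of how the byte is assembled (the function name `tos_byte` is an assumption made for illustration):

```python
def tos_byte(precedence=0, delay=False, throughput=False,
             reliability=False, cost=False):
    """Assemble the 8-bit Type of Service field from its PreDTRCx parts."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence is a 3-bit value")
    return ((precedence << 5)           # bits 7-5: precedence
            | (int(delay) << 4)         # D: minimize delay
            | (int(throughput) << 3)    # T: maximize throughput
            | (int(reliability) << 2)   # R: maximize reliability
            | (int(cost) << 1))         # C: minimize cost; bit 0 stays 0


if __name__ == "__main__":
    print(hex(tos_byte(delay=True)))        # recommended Telnet setting
    print(hex(tos_byte(throughput=True)))   # recommended FTP and SMTP setting
    print(hex(tos_byte(reliability=True)))  # recommended SNMP setting
```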

Fragmentation
Fragmentation of an internet datagram is necessary when it originates in a local net that allows a large packet size and must traverse a local net that limits packets to a smaller size to reach its destination. The internet fragmentation and reassembly procedure needs to be able to break a datagram into an almost arbitrary number of pieces that can be later reassembled.

Identification
The identification field is used to distinguish the fragments of one datagram from those of another.

Flags (xDM)
x (reserved and set to 0)
D (1 = Don't Fragment)
M (1 = More Fragments)

Fragment Offset
The fragment offset field tells the receiver the position of a fragment in the original datagram, measured in units of 8 octets.

An Example Fragmentation Procedure


If the total length is less than or equal to the maximum transmission unit, then submit this datagram to the next step in datagram processing; otherwise cut the datagram into two fragments, the first fragment being the maximum size, and the second fragment being the rest of the datagram. The first fragment is submitted to the next step in datagram processing, while the second fragment is submitted to this procedure in case it is still too large.
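The procedure can be sketched as a loop that keeps cutting a maximum-size fragment off the front until the remainder fits. This is only an illustration under stated assumptions: a fixed 20-byte header with no options, and a hypothetical function `fragment` that returns (offset in 8-octet units, data length, more-fragments flag) tuples instead of building real datagrams.

```python
def fragment(total_length, mtu, header_len=20):
    """Split a datagram of total_length bytes into fragments that fit the MTU.

    Returns a list of (fragment_offset, data_bytes, more_fragments) tuples,
    with fragment_offset expressed in units of 8 octets.
    """
    data_len = total_length - header_len
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    sent = 0
    while (data_len - sent) + header_len > mtu:       # remainder still too large
        fragments.append((sent // 8, max_data, True))
        sent += max_data
    fragments.append((sent // 8, data_len - sent, False))  # last (or only) piece
    return fragments


if __name__ == "__main__":
    for piece in fragment(4020, 1500):
        print(piece)
```

A datagram that already fits produces a single entry with offset 0 and the more-fragments flag clear, which corresponds to the "whole datagram" case.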

An Example Reassembly Procedure


For each datagram the buffer identifier is computed as the concatenation of the source, destination, protocol, and identification fields. If this is a whole datagram (that is both the fragment offset and the more fragments fields are zero), then any reassembly resources associated with this buffer identifier are released and the datagram is forwarded to the next step in datagram processing.

Options: variable
Options may or may not appear in datagrams, but they must be implemented by all IP modules (hosts and gateways). In some environments the security option may be required in all datagrams. The option field is variable in length. There may be zero or more options. There are two cases for the format of an option:
Case 1: A single octet of option-type.
Case 2: An option-type octet, an option-length octet, and the actual option-data octets.

Checksum
The internet header checksum is recomputed whenever the internet header changes, for example when the time to live is reduced, when internet options are added or changed, or when the datagram is fragmented. This checksum at the internet level is intended to protect the internet header fields from transmission errors.
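The header checksum is the 16-bit one's complement of the one's complement sum of all 16-bit words in the header, computed with the checksum field set to zero. A minimal sketch follows; the function name `internet_checksum` and the sample header bytes are illustrative and not taken from the slides.

```python
import struct


def internet_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of the 16-bit words."""
    if len(header) % 2:
        header += b"\x00"                        # pad to a whole 16-bit word
    words = struct.unpack("!%dH" % (len(header) // 2), header)
    total = sum(words)
    while total >> 16:                           # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    # Sample 20-byte header with the checksum field (bytes 10-11) set to zero.
    header = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
    checksum = internet_checksum(bytes(header))
    header[10:12] = struct.pack("!H", checksum)
    print(hex(checksum), internet_checksum(bytes(header)) == 0)
```

A receiver can run the same function over the header as received, checksum included: an undamaged header yields 0.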

Errors
The internet protocol does not provide a reliable communication facility. There are no acknowledgments either end-to-end or hop-by-hop. There is no error control for data, only a header checksum. There are no retransmissions. There is no flow control.

Internet protocol errors may be reported via the ICMP messages.

C:\sahu\ibm\nwp>ipconfig /all

Windows 2000 IP Configuration
        Host Name : SAHU
        Node Type : Mixed
        IP Routing Enabled : No
        WINS Proxy Enabled : No

Ethernet adapter Local Area Connection:
        Connection-specific DNS Suffix . :
        Description : D-Link DFE-530TX PCI adapter
        Physical Address : 00-80-C8-4D-00-55
        DHCP Enabled : No
        IP Address : 169.254.0.15
        Subnet Mask : 255.255.0.0
        Default Gateway : 169.254.0.2
        DNS Servers : 169.254.0.2

C:\sahu\ibm\nwp>

C:\sahu\ibm\nwp>netstat -r

Route Table
=================================================
Interface List
0x1 MS TCP Loopback interface
0x1000003 ...00 80 c8 4d 00 55 VIA PCI 10/100Mb Fast Ethernet Adapter
=================================================
Active Routes:
Network Destination  Netmask          Gateway       Interface     Metric
0.0.0.0              0.0.0.0          169.254.0.2   169.254.0.15  1
127.0.0.0            255.0.0.0        127.0.0.1     127.0.0.1     1
169.254.0.0          255.255.0.0      169.254.0.15  169.254.0.15  1
169.254.0.15         255.255.255.255  127.0.0.1     127.0.0.1     1
169.254.255.255      255.255.255.255  169.254.0.15  169.254.0.15  1
224.0.0.0            224.0.0.0        169.254.0.15  169.254.0.15  1
255.255.255.255      255.255.255.255  169.254.0.15  169.254.0.15  1
Default Gateway: 169.254.0.2

ARP: Address Resolution Protocol (RFC 826)

ARP is a basic part of every TCP/IP implementation and normally operates without involving the application or the system administrator.
Provides a mapping between the 32-bit IP address and the 48-bit MAC address.
An ARP cache is maintained to store recent mappings. The normal expiration time is 20 minutes.
The arp command is used to examine and manipulate the cache.

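Operation of ARP when user types ftp hostname

[Figure: ARP operation, steps (1) to (9)]
(1) The resolver converts hostname to an IP address.
(2) FTP asks TCP to establish a connection with that IP address.
(3) TCP sends an IP datagram to the IP address.
(4)-(6) IP hands the datagram toward the Ethernet driver, and ARP sends an ARP request as an Ethernet broadcast.
(7) The ARP module on the destination host receives the request through its Ethernet driver and replies.
(8)-(9) The ARP reply is received and the IP datagram is sent on to the destination host's IP and TCP.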

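ARP Packet Format

Ethernet header (14 bytes): Ethernet dest addr (6) | Ethernet source addr (6) | frame type (2)
ARP request/reply (28 bytes): hard type (2) | prot type (2) | hard size (1) | prot size (1) | op (2) | sender Ethernet addr (6) | sender IP addr (4) | target Ethernet addr (6) | target IP addr (4)

[Figure: ARP maps a 32-bit Internet address to a 48-bit Ethernet address; RARP maps a 48-bit Ethernet address to a 32-bit Internet address.]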
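To make the 28-byte layout concrete, the sketch below packs an ARP request with `struct`. It assumes the usual values for an Ethernet/IP network (hard type 1, prot type 0x0800, op 1 for a request); the function name `build_arp_request` is hypothetical, and the sample addresses are borrowed from the ipconfig output elsewhere in these notes.

```python
import socket
import struct


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Pack the 28-byte ARP request that follows the 14-byte Ethernet header."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hard type: Ethernet (assumed)
        0x0800,                       # prot type: IP
        6,                            # hard size: bytes in an Ethernet address
        4,                            # prot size: bytes in an IP address
        1,                            # op: ARP request
        sender_mac,
        socket.inet_aton(sender_ip),
        b"\x00" * 6,                  # target Ethernet address is not yet known
        socket.inet_aton(target_ip),
    )


if __name__ == "__main__":
    mac = bytes.fromhex("0080c84d0055")
    packet = build_arp_request(mac, "169.254.0.15", "169.254.0.2")
    print(len(packet), packet.hex())
```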

ARP Cache
An ARP cache is maintained on each host. This cache maintains the recent mappings from Internet addresses to hardware addresses. The normal expiration time of an entry in the cache is 20 minutes from the time the entry was created.
We can examine the ARP cache with the arp command. The -a option displays all entries in the cache:
% arp -a

Proxy ARP
Proxy ARP lets a router answer ARP requests on one of its networks for a host on another of its networks. This fools the sender of the ARP request into thinking that the router is the destination host, when in fact the destination host is "on the other side" of the router. The router is acting as a proxy agent for the destination host, relaying packets to it from other hosts.

RARP: Reverse Address Resolution Protocol (RFC 903)

RARP is used to obtain an IP address when bootstrapping.
The packet format is the same as for ARP.
A RARP request is broadcast, providing the sender's MAC address and asking for the sender's IP address. The reply is normally unicast.
RARP is optional in a TCP/IP implementation.

RARP Packet Format


The format of an RARP packet is almost identical to an ARP packet. The only differences are that the frame type is 0x8035 for an RARP request or reply, and the op field has a value of 3 for an RARP request and 4 for an RARP reply.

ICMP: Internet Control Message Protocol (RFC 792)

Considered part of the IP layer and required in every TCP/IP implementation.
Communicates error messages and other conditions that need attention.
Messages are acted on by IP or by a higher layer (TCP or UDP).
Query messages include the address mask request and reply and the timestamp request and reply.

ICMP messages encapsulated within an IP datagram

IP datagram = IP header (20 bytes) + ICMP message
ICMP message: type (bits 0-7) | code (bits 8-15) | checksum (bits 16-31), followed by contents that depend on type and code

ICMP Message Types


type  code  Description
0     0     echo reply (ping reply)
3           destination unreachable:
      0       network unreachable
      1       host unreachable
4     0     source quench (elementary flow control)
5           redirect:
      0       redirect for network
      1       redirect for host
8     0     echo request (ping request)
13    0     timestamp request
14    0     timestamp reply
17    0     address mask request
18    0     address mask reply

ICMP echo Request and Reply


Ping program is used to test whether another host is reachable. The program sends an ICMP echo request message to a host, expecting an ICMP echo reply to be returned. Ping also measures the round-trip time to the host, giving us some indication of how "far away" that host is.

Format of ICMPv4 and ICMPv6 echo request and reply messages

type (bits 0-7) | code (bits 8-15) | checksum (bits 16-31)
identifier | sequence number
optional data

ICMP Address Mask Request and Reply


The ICMP address mask request is intended for a diskless system to obtain its subnet mask at bootstrap time. The requesting system broadcasts its ICMP request.

ICMP address mask request and reply messages

type (17 or 18) (bits 0-7) | code (0) (bits 8-15) | checksum (bits 16-31)
identifier | sequence number
32-bit subnet mask

ICMP Timestamp Request and Reply


The ICMP timestamp request allows a system to query another for the current time. The recommended value to be returned is the number of milliseconds since midnight, Coordinated Universal Time (UTC).

UDP: User Datagram Protocol (RFC 768)

Simple, datagram-oriented transport protocol
No reliability: does not guarantee delivery
If a UDP datagram exceeds the MTU, the IP datagram is fragmented
ICMP unreachable error: path MTU discovery with UDP
ICMP source quench error
UDP server: client IP address and port number, input queue, restricting IP addresses

UDP encapsulation

IP datagram = IP header (20 bytes) + UDP datagram
UDP datagram = UDP header (8 bytes) + UDP data

UDP header
16-bit source port (bits 0-15) | 16-bit destination port (bits 16-31)
16-bit UDP length | 16-bit UDP checksum
Data (if any)
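The 8-byte header is four 16-bit fields in network byte order, which the sketch below packs and unpacks. The function name `build_udp_header` and the port numbers are illustrative; the checksum is left at 0, which in IPv4 means the sender did not compute one.

```python
import struct


def build_udp_header(src_port: int, dst_port: int, data: bytes,
                     checksum: int = 0) -> bytes:
    """Pack source port, destination port, UDP length, and checksum."""
    udp_length = 8 + len(data)        # the length covers header plus data
    return struct.pack("!HHHH", src_port, dst_port, udp_length, checksum)


if __name__ == "__main__":
    payload = b"hello"
    header = build_udp_header(1024, 123, payload)
    print(struct.unpack("!HHHH", header))
```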

Protocol Application
The major uses of this protocol are the Internet Name Server, the Trivial File Transfer Protocol, and SNMP.

Protocol Number
This is protocol 17 when used in the Internet Protocol.

UDP Server Design


Servers typically interact with the operating system, and most servers need a way to handle multiple clients at the same time. Design topics:
Client IP Address and Port Number
Destination IP Address
UDP Input Queue
Restricting Local IP Address
Restricting Foreign IP Address
Multiple Recipients per Port

Broadcasting, Multicasting
There are three kinds of IP addresses: unicast, broadcast, and multicast. Broadcasting is sending a packet to all hosts on a network (usually a locally attached network) and multicasting is sending a packet to a set of hosts on a network.
Broadcasting and multicasting only apply to UDP, where it makes sense for an application to send a single message to multiple recipients. TCP is a connection-oriented protocol that implies a connection between two hosts (specified by IP addresses) and one process on each host (specified by port numbers).

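[Figure: Filtering that takes place up the protocol stack when a frame is received]
At each stage the frame or datagram is either delivered to the next layer up or discarded:
Interface card: discard, or deliver to the device driver
Device driver: discard, or deliver to IP
IP: discard, or deliver to UDP
UDP: discard, or deliver to the application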

Broadcasting
The four different forms of IP broadcast addresses:
Limited Broadcast
Net-directed Broadcast
Subnet-directed Broadcast
All-subnets-directed Broadcast

Multicasting
IP multicasting provides two services for an application.
1. Delivery to multiple destinations. There are many applications that deliver information to multiple recipients: interactive conferencing and dissemination of mail or news to multiple recipients.
2. Solicitation of servers by clients.

Multicast Group Addresses


The format of a class D IP address:
Class D: 1110 | Multicast group ID (28 bits)

A multicast group address is the combination of the high-order 4 bits of 1110 and the multicast group ID. These are normally written as dotted-decimal numbers and are in the range 224.0.0.0 through 239.255.255.255. Some multicast group addresses are assigned as well-known addresses by the IANA. The multicast address 224.0.1.1 is for NTP, the Network Time Protocol, 224.0.0.9 is for RIP-2.

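Mapping of a class D IP address into Ethernet multicast address

The IANA owns an Ethernet address block, which in hexadecimal is 00:00:5e. This is the high-order 24 bits of the Ethernet address, meaning that this block includes addresses in the range 00:00:5e:00:00:00 through 00:00:5e:ff:ff:ff. The IANA allocates half of this block for multicast addresses. An Ethernet address is a multicast address when the low-order bit of its first byte is set (01:00:00:00:00:00), so the Ethernet addresses corresponding to IP multicasting are in the range 01:00:5e:00:00:00 through 01:00:5e:7f:ff:ff: the low-order 23 bits of the multicast group ID are copied into the low-order 23 bits of the Ethernet address. The Ethernet broadcast address is ff:ff:ff:ff:ff:ff.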

Since the upper 5 bits of the multicast group ID are ignored in this mapping, it is not unique. Thirty-two different multicast group IDs map to each Ethernet address. For example, the multicast addresses 224.128.64.32 (hex e0.80.40.20) and 224.0.64.32 (hex e0.00.40.20) both map into the Ethernet address 01:00:5e:00:40:20.
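The mapping keeps only the low-order 23 bits of the class D address and places them under the fixed 01:00:5e prefix. A short sketch (the function name `multicast_mac` is an assumption for illustration) shows why the two example addresses collide:

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map a class D IP address to its Ethernet multicast address."""
    address = ipaddress.IPv4Address(group)
    if not address.is_multicast:
        raise ValueError(f"{group} is not a class D address")
    low_23_bits = int(address) & 0x7FFFFF      # drops 1110 and the next 5 bits
    mac = (0x01005E << 24) | low_23_bits       # fixed prefix 01:00:5e
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


if __name__ == "__main__":
    for group in ("224.128.64.32", "224.0.64.32", "224.0.1.1"):
        print(group, "->", multicast_mac(group))
```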

How does it work?

The sending process specifies a destination IP address that is a multicast address, the device driver converts this to the corresponding Ethernet address, and sends it. The receiving processes must notify their IP layers that they want to receive datagrams destined for a given multicast address, and the device driver must somehow enable reception of these multicast frames. This is called "joining a multicast group." When a multicast datagram is received by a host, it must deliver a copy to all the processes that belong to that multicast group.

Multicast Socket options


The API support for traditional multicasting requires new socket options.

Option              Datatype        Description
IP_ADD_MEMBERSHIP   struct ip_mreq  Join a multicast group
IP_DROP_MEMBERSHIP  struct ip_mreq  Leave a multicast group
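In the sockets API these options are spelled IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP, and `struct ip_mreq` is the 4-byte group address followed by the 4-byte local interface address. A hedged Python sketch follows; the helper names are made up, and using 0.0.0.0 to let the system choose the interface is an assumption about typical behavior.

```python
import socket


def make_ip_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the bytes of struct ip_mreq: group address, then interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_group(sock: socket.socket, group: str, interface: str = "0.0.0.0") -> None:
    """Ask the IP layer to start receiving datagrams sent to the group."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_ip_mreq(group, interface))


def leave_group(sock: socket.socket, group: str, interface: str = "0.0.0.0") -> None:
    """Undo a previous join on the same socket and interface."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP,
                    make_ip_mreq(group, interface))


if __name__ == "__main__":
    print(make_ip_mreq("224.0.1.1").hex())
```

A receiver would typically create a UDP socket, bind it to the port of interest, and then call `join_group` before reading.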

[Figure: Multicast example of a UDP datagram]
Sending application: sendto with dest IP = 224.0.1.1, dest port = 123
Frame on the wire: Enet hdr (dest Enet = 01:00:5e:00:01:01, frame type = 0800) | IPv4 hdr (dest IP = 224.0.1.1, protocol = UDP) | UDP hdr (dest port = 123) | UDP data
Receiving application: bound to port 123 and joined to 224.0.1.1, so its datalink receives frames sent to 01:00:5e:00:01:01
Datalink: imperfect filtering based on destination Ethernet address
IPv4: perfect software filtering based on destination IP address
UDP: delivery based on destination port 123
Ethernet addresses of the interfaces shown: 00:0a:95:79:bc:b4 and 00:04:ac:17:bf:38

IGMP: Internet Group Management Protocol


The Internet Group Management Protocol (IGMP) is used by hosts and routers that support multicasting. It lets all the systems on a physical network know which hosts currently belong to which multicast groups. This information is required by the multicast routers, so they know which multicast datagrams to forward onto which interfaces. IGMP is defined in RFC 1112.

Encapsulation of an IGMP message within an IP datagram


Like ICMP, IGMP is considered part of the IP layer. Also like ICMP, IGMP messages are transmitted in IP datagrams. Unlike other protocols, IGMP has a fixed-size message, with no optional data. IGMP messages are specified in the IP datagram with a protocol value of 2.

Format of fields in IGMP message

The IGMP version is 1.
An IGMP type of 1 is a query sent by a multicast router, and 2 is a response sent by a host.
The group address is a class D IP address. In a query the group address is set to 0, and in a report it contains the group address being reported.

Joining a Multicast Group


A process must have a way of joining a multicast group on a given interface. A process can also leave a multicast group that it previously joined. These are required parts of any API on a host that supports multicasting. A process can join the same group on multiple interfaces. Membership in a multicast group on a given interface is dynamic: it changes over time as processes join and leave the group.

IGMP Reports and Queries


IGMP messages are used by multicast routers to keep track of group membership on each of the router's physically attached networks.
1. A host sends an IGMP report when the first process joins a group.
2. A host does not send a report when processes leave a group.
3. A multicast router sends an IGMP query at regular intervals to see if any hosts still have processes belonging to any groups.
4. A host responds to an IGMP query by sending one IGMP report for each group that still contains at least one process.

[Figure: IGMP reports and queries]

TCP: Transmission Control Protocol


TCP provides a connection-oriented, reliable, byte stream service
Two endpoints communicate with each other on a TCP connection
TCP packetizes the user data into segments
Sets a timeout any time it sends data
Acknowledges data received by the other end

TCP services
Reorders out-of-order data
Discards duplicate data
Provides end-to-end flow control
Calculates and verifies a mandatory end-to-end checksum
Popular applications: Telnet, Rlogin, FTP, and SMTP

TCP Header
The TCP data is encapsulated in an IP datagram.

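[Figure: encapsulation of TCP data in an IP datagram]
IP datagram = IP header (20 bytes) + TCP segment
TCP segment = TCP header (20 bytes) + TCP data

[Figure: TCP header, 32 bits per row, bits numbered 0 to 31]
Source Port (16 bits) | Destination Port (16 bits)
Sequence Number (32 bits)
Acknowledgement Number (32 bits)
Header Length (4 bits) | Reserved (6 bits) | URG | ACK | PSH | RST | SYN | FIN | Window Size (16 bits)
Checksum (16 bits) | Urgent Pointer (16 bits)
Options | Padding
Data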

The sequence number identifies the byte in the stream of data from the sending TCP to the receiving TCP that the first byte of data in this segment represents. When a new connection is being established, the SYN flag is turned on. The sequence number field contains the initial sequence number (ISN) chosen by this host for this connection. The sequence number of the first byte of data sent by this host will be the ISN plus one because the SYN flag consumes a sequence number. The acknowledgment number contains the next sequence number that the sender of the acknowledgment expects to receive.

Flags
URG: The urgent pointer is valid.
ACK: The acknowledgement number is valid.
PSH: The receiver should pass this data to the application as soon as possible.
RST: Reset the connection.
SYN: Synchronize sequence numbers to initiate a connection.
FIN: The sender is finished sending data.

Connection Establishment and Termination


TCP is a connection-oriented protocol. Before either end can send data to the other, a connection must be established.

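[Figure: TCP three-way handshake]
1. client to server: SYN J (the client performs the active open and enters SYN_SENT; the server, in LISTEN after its passive open, enters SYN_RCVD)
2. server to client: SYN K, ACK J+1 (the client enters ESTABLISHED)
3. client to server: ACK K+1 (the server enters ESTABLISHED)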

Packets exchanged when a TCP connection is closed

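[Figure: connection termination]
1. client to server: FIN M (the client performs the active close and enters FIN_WAIT_1; the server enters CLOSE_WAIT, the passive close)
2. server to client: ACK M+1 (the client enters FIN_WAIT_2)
3. server to client: FIN N (the server enters LAST_ACK; the client enters TIME_WAIT)
4. client to server: ACK N+1 (the server enters CLOSED)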

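Packet exchange for TCP connection

[Figure: complete exchange between client and server]
1. client to server: SYN J (client: active open, SYN_SENT; server: passive open, LISTEN, then SYN_RCVD)
2. server to client: SYN K, ACK J+1 (client: ESTABLISHED)
3. client to server: ACK K+1 (server: ESTABLISHED)
4. client to server: data (request)
5. server to client: data (reply)
6. client to server: ACK of reply
7. client to server: FIN M (client: active close, FIN_WAIT_1; server: passive close, CLOSE_WAIT)
8. server to client: ACK M+1 (client: FIN_WAIT_2)
9. server to client: FIN N (server: LAST_ACK; client: TIME_WAIT)
10. client to server: ACK N+1 (server: CLOSED)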

Timeout of Connection Establishment


There are several instances when the connection cannot be established. In one example the server host is down.

How frequently does the client's TCP send a SYN to try to establish the connection? The second segment is sent 5.8 seconds after the first, and the third is sent 24 seconds after the second.
BSD implementations of TCP run a timer that goes off every 500 ms.

Maximum Segment Size


The maximum segment size (MSS) is the largest "chunk" of data that TCP will send to the other end. When a connection is established, each end has the option of announcing the MSS it expects to receive. (An MSS option can only appear in a SYN segment.) If one end does not receive an MSS option from the other end, a default of 536 bytes is assumed. (This default allows for a 20-byte IP header and a 20-byte TCP header to fit into a 576-byte IP datagram.)
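The arithmetic behind the default is worth spelling out: the MSS is the datagram size minus the two 20-byte headers. The sketch below assumes headers without options; the helper names `mss_for_mtu` and `sending_mss` are illustrative.

```python
DEFAULT_MSS = 536  # assumed when the other end announces no MSS option


def mss_for_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP data size that fits in one IP datagram of mtu bytes."""
    return mtu - ip_header - tcp_header


def sending_mss(announced_by_peer=None) -> int:
    """MSS to use when sending: the peer's announcement, or the default."""
    return DEFAULT_MSS if announced_by_peer is None else announced_by_peer


if __name__ == "__main__":
    print(mss_for_mtu(576))    # the 576-byte datagram behind the default
    print(mss_for_mtu(1500))   # an Ethernet MTU
    print(sending_mss())       # no MSS option received
```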

TCP State Transition Diagram


The rules regarding the initiation and termination of a TCP connection can be summarized in a state transition diagram.

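[Figure: TCP state transition diagram. The starting point is CLOSED; ESTABLISHED is the data transfer state.]

From         To           Event                              Action
CLOSED       LISTEN       appl: passive open                 send: <nothing>
CLOSED       SYN_SENT     appl: active open                  send: SYN
LISTEN       SYN_RCVD     recv: SYN                          send: SYN, ACK
LISTEN       SYN_SENT     appl: send data                    send: SYN
SYN_RCVD     LISTEN       recv: RST
SYN_SENT     SYN_RCVD     recv: SYN (simultaneous open)      send: SYN, ACK
SYN_SENT     ESTABLISHED  recv: SYN, ACK                     send: ACK
SYN_SENT     CLOSED       appl: close or timeout
SYN_RCVD     ESTABLISHED  recv: ACK                          send: <nothing>
SYN_RCVD     FIN_WAIT_1   appl: close                        send: FIN
ESTABLISHED  CLOSE_WAIT   recv: FIN (passive close)          send: ACK
ESTABLISHED  FIN_WAIT_1   appl: close (active close)         send: FIN
CLOSE_WAIT   LAST_ACK     appl: close                        send: FIN
LAST_ACK     CLOSED       recv: ACK                          send: <nothing>
FIN_WAIT_1   FIN_WAIT_2   recv: ACK                          send: <nothing>
FIN_WAIT_1   CLOSING      recv: FIN (simultaneous close)     send: ACK
FIN_WAIT_1   TIME_WAIT    recv: FIN, ACK                     send: ACK
FIN_WAIT_2   TIME_WAIT    recv: FIN                          send: ACK
CLOSING      TIME_WAIT    recv: ACK                          send: <nothing>
TIME_WAIT    CLOSED       2MSL timeout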

2MSL Wait State


The TIME_WAIT state is also called the 2MSL wait state. Every implementation must choose a value for the maximum segment lifetime (MSL). It is the maximum amount of time any segment can exist in the network before being discarded. RFC 793 specifies the MSL as 2 minutes. Common implementation values, however, are 30 seconds, 1 minute, or 2 minutes.

TCP Options
The TCP header can contain options. The only options defined in the original TCP specification are the end of option list, no operation, and the maximum segment size option.
End of option list: kind = 0 (1 byte)
No operation: kind = 1 (1 byte)
Maximum segment size: kind = 2 (1 byte) | len = 4 (1 byte) | MSS (2 bytes)
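These three option formats are simple enough to encode and scan by hand. The following sketch is illustrative only: the function names are hypothetical, only the three original options are handled, and the parser assumes the option bytes are not truncated.

```python
import struct

KIND_END, KIND_NOP, KIND_MSS = 0, 1, 2


def encode_mss_option(mss: int) -> bytes:
    """kind = 2, len = 4, followed by the 2-byte MSS in network byte order."""
    return struct.pack("!BBH", KIND_MSS, 4, mss)


def parse_options(options: bytes) -> dict:
    """Scan TCP option bytes and return the MSS if one is present."""
    found = {}
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == KIND_END:          # end of option list: stop scanning
            break
        if kind == KIND_NOP:          # no operation: a single padding byte
            i += 1
            continue
        length = options[i + 1]       # every other option carries a length
        if length < 2:
            raise ValueError("malformed option length")
        if kind == KIND_MSS and length == 4:
            found["mss"] = struct.unpack("!H", options[i + 2:i + 4])[0]
        i += length
    return found


if __name__ == "__main__":
    raw = bytes([KIND_NOP, KIND_NOP]) + encode_mss_option(1460) + bytes([KIND_END])
    print(raw.hex(" "), parse_options(raw))
```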

TCP Server Design


Most TCP servers are concurrent. When a new connection request arrives at a server, the server accepts the connection and invokes a new process to handle the new client. Depending on the operating system, various techniques are used to invoke the new server. Under Unix the common technique is to create a new process using the fork function. Lightweight processes (threads) can also be used. Design topics:
TCP Server Port Numbers
Restricting Local IP Address
Restricting Foreign IP Address
Incoming Connection Request Queue

TCP Interactive data Flow


Interactive data segments are smaller than the MSS
Rlogin: a single byte of data at a time
Telnet: one line at a time
Delayed acknowledgments reduce the number of segments
The Nagle algorithm reduces the number of small segments on slower WANs; there is a facility to disable it

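[Figure: the client sends each keystroke to the server as a data byte; the server returns an ACK of the data byte and an echo of the data byte; the client displays the echo and returns an ACK of the echoed data byte.]

Remote echo of interactive keystroke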

TCP Bulk Data Flow


In bulk data transfer, flow control governs the send and receive buffers but does not by itself control congestion.
TCP uses a sliding window protocol for flow control, so that a fast sender does not overrun a slow receiver.
With sliding windows, the receiver advertises its window size.
Slow start adds a congestion window on the sender.
Bulk data throughput depends on the bandwidth-delay product.

Segment   Direction          Contents
1         client -> server   1:1025, ack 1, win 4096
2         client -> server   1025:2049, ack 1, win 4096
3         client -> server   2049:3073, ack 1, win 4096
4         server -> client   ack 2049, win 4096
5         server -> client   ack 3073, win 4096

Transfer of 3072 bytes from client to server

Sliding Windows

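[Figure: the byte stream is divided into four regions: sent and acknowledged; sent, not ACKed; can send ASAP; and can't send until window moves. The offered window advertised by the receiver covers the second and third regions; the usable window is the third.]

Visualization of TCP sliding window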

1. The window closes as the left edge advances to the right.
2. The window opens when the right edge moves to the right, allowing more data to be sent.
3. The window shrinks when the right edge moves to the left.

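[Figure: the left edge moving right closes the window; the right edge moving left shrinks it; the right edge moving right opens it.]

Movement of window edges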
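The sender's bookkeeping for the sliding window can be sketched in a few lines. The class SendWindow and its field names are hypothetical, introduced only for illustration; the sequence numbers reuse the 3072-byte transfer with a 4096-byte window.

```python
from dataclasses import dataclass

@dataclass
class SendWindow:
    left: int      # first byte sent but not yet acknowledged (left edge)
    next_seq: int  # next byte the sender will transmit
    offered: int   # window advertised by the receiver

    @property
    def right(self):
        # Right edge: first byte the receiver is not yet willing to accept.
        return self.left + self.offered

    @property
    def usable(self):
        # Bytes that can be sent immediately.
        return max(0, self.right - self.next_seq)

    def send(self, nbytes):
        """Send as much as the usable window allows; return bytes sent."""
        sent = min(nbytes, self.usable)
        self.next_seq += sent
        return sent

    def on_ack(self, ack, win):
        """Apply an ACK and report how the right edge moved."""
        old_right = self.right
        if ack > self.left:
            self.left = ack  # left edge advances: the window closes
        self.offered = win
        if self.right > old_right:
            return "opens"
        if self.right < old_right:
            return "shrinks"
        return "unchanged"

w = SendWindow(left=1, next_seq=1, offered=4096)
print(w.send(3072), w.usable)         # three 1024-byte segments are in flight
print(w.on_ack(2049, 4096), w.usable)
```

In this sketch an ACK both closes the window (left edge moves right) and opens it (right edge moves right) when the advertised window stays at 4096.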

Bulk Data Throughput


This section looks at how the window size, the windowed flow control, and slow start interact to determine the throughput of a TCP connection carrying bulk data. We can calculate the capacity of the pipe as

capacity (bits) = bandwidth (bits/sec) x round-trip time (sec)

This is normally called the bandwidth-delay product. This value can vary widely, depending on the network speed and the RTT between the two ends. For example, a T1 telephone line (1,544,000 bits/sec) across the United States (about a 60-ms RTT) gives a bandwidth-delay product of 92,640 bits, or 11,580 bytes.

Either the bandwidth or the delay can affect the capacity of the pipe between the sender and receiver.
Doubling the RTT doubles the capacity of the pipe. Doubling the bandwidth also doubles the capacity of the pipe.
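The T1 figure can be checked with a few lines of arithmetic. This is a small sketch; the function name and the choice to pass the RTT in milliseconds are assumptions made to keep the arithmetic exact.

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_ms):
    """Capacity of the pipe in bytes: bandwidth x round-trip time / 8."""
    bits = bandwidth_bps * rtt_ms / 1000
    return bits / 8

t1 = bandwidth_delay_product_bytes(1_544_000, 60)
print(t1)                                             # T1 line, 60-ms RTT
print(bandwidth_delay_product_bytes(1_544_000, 120))  # doubled RTT
print(bandwidth_delay_product_bytes(3_088_000, 60))   # doubled bandwidth
```

Both doubled cases give twice the T1 capacity, matching the statement that either factor scales the pipe linearly.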

Congestion
Congestion can occur when data arrives on a big pipe (a fast LAN) and gets sent out a smaller pipe (a slower WAN). Congestion can also occur when multiple input streams arrive at a router whose output capacity is less than the sum of the inputs.

It can lead to the router discarding packets.

TCP Timeout and Retransmit


TCP manages four different timers for reliable operation:
A retransmission timer, used when expecting an ACK from the other end.
A persist timer, which keeps window size information flowing when the other side advertises a zero-size window.
A keepalive timer, for detecting a crash or reboot of the other side.
A 2MSL timer, for the TIME_WAIT state.

Persist Timer
Using the window size, the receiver performs flow control by specifying the amount of data it is willing to accept from the sender. What happens when the window size goes to 0? This effectively stops the sender from transmitting data until the window becomes nonzero. If the acknowledgment carrying the window update is lost, we could end up with both sides waiting for the other: the receiver waiting to receive data (since it provided the sender with a nonzero window) and the sender waiting to receive the window update allowing it to send. To prevent this form of deadlock, the sender uses a persist timer that causes it to query the receiver periodically to find out whether the window has been increased.
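How often the sender queries the receiver is implementation-specific, and the text above does not give values. The sketch below assumes an exponential backoff that starts at 1.5 seconds, doubles on each probe, and is clamped between 5 and 60 seconds, as commonly described for BSD-derived stacks; the function name and all numeric defaults are assumptions, not part of this material.

```python
def persist_intervals(initial=1.5, low=5.0, high=60.0, probes=8):
    """Return the waits (in seconds) before successive window probes.

    All defaults are assumed values for illustration only.
    """
    intervals = []
    timeout = initial
    for _ in range(probes):
        # Clamp the doubled timeout into the [low, high] range.
        intervals.append(min(max(timeout, low), high))
        timeout *= 2
    return intervals

print(persist_intervals())
```

The probes continue for as long as the window stays closed, so the sender eventually learns of a reopened window even if the original update was lost.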

Silly Window Syndrome


Window-based flow control schemes, such as the one used by TCP, can fall victim to a condition known as the silly window syndrome (SWS). When it occurs, small amounts of data are exchanged across the connection instead of full-sized segments.

TCP Performance
It is now common for off-the-shelf hardware (workstations and faster personal computers) to deliver 800,000 bytes or more per second. It is a worthwhile exercise to calculate the theoretical maximum throughput we could see with TCP on a 10 Mbits/sec Ethernet. The next slide shows the total number of bytes exchanged for a full-sized data segment and an ACK.

Field sizes for Ethernet theoretical maximum throughput calculation.


Field                             Data #bytes   ACK #bytes
Ethernet preamble                           8            8
Ethernet destination address                6            6
Ethernet source address                     6            6
Ethernet type field                         2            2
IP header                                  20           20
TCP header                                 20           20
user data                                1460            0
pad (to Ethernet minimum)                   0            6
Ethernet CRC                                4            4
interpacket gap (9.6 microsec)             12           12
total                                    1538           84

We first assume the sender transmits two back-to-back full-sized data segments, and then the receiver sends an ACK for these two segments. The maximum throughput (user data) is then

throughput = 2 x 1460 bytes / (2 x 1538 + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte = 1,155,063 bytes/sec

If the TCP window is opened to its maximum size (65535, not using the window scale option), this allows a window of 44 1460-byte segments. If the receiver sends an ACK every 22nd segment, the calculation becomes

throughput = 22 x 1460 bytes / (22 x 1538 + 84 bytes) x 10,000,000 bits/sec / 8 bits/byte = 1,183,667 bytes/sec
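Both results can be reproduced directly from the frame sizes used in the formulas (1538 bytes on the wire per full-sized data segment, 84 per ACK, 1460 bytes of user data). The function name max_throughput is an assumption for illustration.

```python
DATA_FRAME_BYTES = 1538        # full-sized data segment, including overhead
ACK_FRAME_BYTES = 84           # ACK segment, including pad and overhead
USER_DATA_BYTES = 1460         # payload carried by each data segment
LINK_BITS_PER_SEC = 10_000_000

def max_throughput(segments_per_ack):
    """Theoretical user-data throughput in bytes/sec on 10 Mbits/sec Ethernet."""
    user_bytes = segments_per_ack * USER_DATA_BYTES
    wire_bytes = segments_per_ack * DATA_FRAME_BYTES + ACK_FRAME_BYTES
    return user_bytes / wire_bytes * LINK_BITS_PER_SEC / 8

print(int(max_throughput(2)))    # one ACK per two segments
print(int(max_throughput(22)))   # one ACK per 22 segments
print(65535 // USER_DATA_BYTES)  # full segments that fit in the maximum window
```

Acknowledging less often raises the figure only slightly, because the ACK is a small share of the bytes on the wire.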

On faster networks, such as FDDI (100 Mbits/sec), three commercial vendors have reportedly demonstrated TCP over FDDI at between 80 and 98 Mbits/sec.

The following practical limits apply for any real-world scenario.
1. You can't run any faster than the speed of the slowest link.
2. You can't go any faster than the memory bandwidth of the slowest machine. This assumes your implementation makes a single pass over the data.
3. You can't go any faster than the window size offered by the receiver, divided by the round-trip time.

The bottom line in all these numbers is that the real upper limit on how fast TCP can run is determined by the size of the TCP window and the speed of light.
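The window and round-trip limit can be put into numbers. The sketch below combines it with the link-speed limit; the function names are hypothetical, and the 60-ms RTT and T1 and FDDI link speeds are taken from earlier examples. Memory bandwidth is left out for simplicity.

```python
def window_limit(window_bytes, rtt_ms):
    """Upper bound in bytes/sec set by one window per round trip."""
    return window_bytes * 1000 / rtt_ms

def achievable(link_bits_per_sec, window_bytes, rtt_ms):
    """The smaller of the link-speed limit and the window/RTT limit."""
    link_limit = link_bits_per_sec / 8
    return min(link_limit, window_limit(window_bytes, rtt_ms))

print(window_limit(65535, 60))             # maximum unscaled window, 60-ms RTT
print(achievable(1_544_000, 65535, 60))    # T1: the link is the bottleneck
print(achievable(100_000_000, 65535, 60))  # FDDI: the window is the bottleneck
```

On the faster link the window, not the wire, caps throughput, which is the point of the bottom line above.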

Unix: TCP/IP Implementation Details


Berkeley networking code: the basis for the implementations in HP-UX, Sun Solaris, AIX and NT.
Two APIs: Sockets and TLI (Transport Layer Interface).
System calls and library functions: socket, connect, listen, accept, send, recv, etc.
4.4BSD supports TCP/IP, XNS, OSI protocols, and Unix domain protocols.
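The sockets calls listed above map almost one-to-one onto Python's socket module, which wraps the same Berkeley API. The following sketch runs both ends of a connection over the loopback interface in a single thread; the function name loopback_exchange and the uppercase reply are assumptions for illustration.

```python
import socket

def loopback_exchange(payload):
    """Walk through socket, bind, listen, connect, accept, send, recv."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    client.settimeout(5)
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a port
        listener.listen(5)               # queue up to 5 pending connections
        port = listener.getsockname()[1]

        # The kernel completes the handshake and queues the connection,
        # so connect() returns before accept() is called.
        client.connect(("127.0.0.1", port))
        conn, _peer = listener.accept()
        conn.settimeout(5)
        try:
            client.sendall(payload)
            request = conn.recv(1024)
            conn.sendall(request.upper())
            return client.recv(1024)
        finally:
            conn.close()
    finally:
        client.close()
        listener.close()

print(loopback_exchange(b"ping"))
```

A real server would normally run accept in a loop and hand each connection to a separate process or thread.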

OSI layer                      Net/3 component
7: Application                 Process
6: Presentation, 5: Session    System calls (socket, bind, connect, etc.) and socket layer
4: Transport, 3: Network       Protocol layer (TCP/IP, XNS, OSI, UNIX)
2: Data Link                   Interface layer (Ethernet, SLIP, loopback, etc.)
1: Physical                    Media

General organization of networking code in Net/3

NT: TCP/IP Implementation Details


The stack is a high-performance, portable, 32-bit implementation of the standard TCP/IP protocols.
There are slight differences in implementation, configuration and services between the NT and 9x platforms.
The TCP/IP suite makes Windows NT an Internet-ready platform.
It provides support for standard features, performance enhancements, and available services.

Architectural Model
The suite comprises core protocols, services and interfaces:
Transport Driver Interface (TDI)
Network Device Interface (NDIS)
Interfaces for user-mode applications: Windows Sockets and NetBIOS

[Figure: NT TCP/IP architectural model. User mode: Windows Sockets and NetBIOS support. Kernel mode, top to bottom: TDI interface; NetBT, TCP, UDP; ICMP, IP, IGMP, ARP; NDIS interface; network card driver(s); network media.]
