
Linux TCP/IP Stack

TCP/IP vs. OSI model

Process: OSI 7 (Application), 6 (Presentation), 5 (Session)
Socket Layer: between the process and the protocol layer
Protocol Layer (TCP/IP): OSI 4 (Transport), 3 (Network)
Interface Layer (Ethernet, etc.): OSI 2 (Data Link)
Physical media: OSI 1 (Physical Layer)

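TCP/IP Stack Overview

Send path (Process down to Physical Media):
1: sosend(...) in the Socket Layer
2: tcp_output(...) in the Protocol Layer (TCP Layer)
3: ip_output(...) in the Protocol Layer (IP Layer)
4: ethernet_output(...) in the Interface Layer (Ethernet Device Driver), which places the frame on the Output Queue

Receive path (Physical Media up to Process):
2: ethernet_input(...) in the Interface Layer (Ethernet Device Driver), which takes the frame from the Input Queue
3: ip_input(...) in the Protocol Layer (IP Layer)
4: tcp_input(...) in the Protocol Layer (TCP Layer)
5: recvfrom(...) called by the Process through the Socket Layer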

Process Layer to TCP Layer

Process
  send(int socket, const char *buf, int length, int flags)

Kernel
  sendto(int socket, const char *data_buffer, int length, int flags, struct sockaddr *destination, int destination_length)
  sendit(struct proc *p, int socket, struct msghdr *mp, int flags, int *return_size)   (uipc_syscalls.c)
  sosend(struct socket *s, struct mbuf *addr, struct uio *uio, struct mbuf *top, struct mbuf *control, int flags)   (uipc_socket.c)
  tcp_usrreq(struct socket *s, int request, struct mbuf *m, struct mbuf *nam, struct mbuf *control)   (tcp_usrreq.c)

TCP Layer
  tcp_output(struct tcpcb *tp)   (tcp_output.c)

Socket Layer
sendto(int socket, const char *data_buffer, int length, int flags, struct sockaddr *destination, int destination_length)

MBUF chain holding 150 bytes of data from data_buffer. Each mbuf is 128 bytes.

First mbuf (28 bytes of header, 100 bytes of data):
  m_next = pointer to the second mbuf
  m_nextpkt = NULL
  m_len = 100
  m_data = pointer to the 100 bytes of data
  m_type = MT_DATA
  m_flags = M_PKTHDR
  m_pkthdr.len = 150
  m_pkthdr.recvif = NULL

Second mbuf (20 bytes of header, 50 bytes of data, 58 bytes of unused space):
  m_next = NULL
  m_nextpkt = NULL
  m_len = 50
  m_data = pointer to the 50 bytes of data
  m_type = MT_DATA
  m_flags = 0
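
The arithmetic in the figure (100 + 50 = 150 bytes of data, 58 bytes left over in the second mbuf) can be checked with a small model. This is a minimal sketch: `struct sk_mbuf` and the `SK_` constants are hypothetical stand-ins that reuse the field names from the figure, and the 8-byte packet header size is inferred from the 28-byte and 20-byte headers shown above. It is not the kernel's real `struct mbuf` layout.

```c
#include <stddef.h>

/* Sizes taken from the figure: 128-byte mbuf, 20-byte header,
 * plus 8 more header bytes when the mbuf carries a packet header. */
enum { SK_MSIZE = 128, SK_MHDR = 20, SK_PKTHDR = 8 };
#define SK_M_PKTHDR 0x0002 /* illustrative flag value */

/* Simplified stand-in for an mbuf; not the kernel structure. */
struct sk_mbuf {
    struct sk_mbuf *m_next;    /* next mbuf in this packet's chain */
    struct sk_mbuf *m_nextpkt; /* next packet in a queue */
    int m_len;                 /* bytes of data held in this mbuf */
    int m_flags;               /* SK_M_PKTHDR on the first mbuf */
};

/* Sum m_len over the chain; for a packet this should equal m_pkthdr.len. */
static int sk_chain_len(const struct sk_mbuf *m)
{
    int total = 0;
    for (; m != NULL; m = m->m_next)
        total += m->m_len;
    return total;
}

/* Data capacity of one mbuf: 100 bytes with a packet header, 108 without. */
static int sk_capacity(const struct sk_mbuf *m)
{
    int cap = SK_MSIZE - SK_MHDR;
    if (m->m_flags & SK_M_PKTHDR)
        cap -= SK_PKTHDR;
    return cap;
}
```

Building the two-mbuf chain from the figure and calling `sk_chain_len` on the first mbuf gives the packet length; `sk_capacity` minus `m_len` gives the unused space in each mbuf.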

Socket Layer - sosend passes data and control information to the protocol layer
sosend(struct socket *s, struct mbuf *addr, struct uio *uio, struct mbuf *data_buffer, struct mbuf *control, int flags)

1. Initialize a new memory buffer and variables to hold flags.
2. Is there enough space in the send buffer? sbspace(&s->so_snd)
   no: the data cannot be copied until space becomes available.
   yes: copy data_buffer into the mbuf.
3. int error = tcp_usrreq(s, flags, mbuf, addr, control)
4. If error is 0 and there are more buffers to send, go back to step 2.
5. Free the memory buffers received.
6. Return the value of error to sendto().

TCP Layer - tcp_usrreq(struct socket *s, int request, struct mbuf *data_buffer, struct mbuf *nam, struct mbuf *control)

1. Declare the Internet protocol control block inp and the TCP control block tp, which hold the per-connection information TCP needs.
2. Convert the socket to its Internet protocol control block: inp = sotoinpcb(s)
3. Convert the Internet protocol control block to a TCP control block: tp = intotcpcb(inp)
4. Switch on request.
   PRU_SEND: int error = tcp_output(tp); tcp_output returns error to tcp_usrreq().

TCP Layer (tcp_output.c) - tcp_output(struct tcpcb *tp)

Called by tcp_usrreq for one of the following reasons:
  To send the initial SYN
  To send a finished_sending message
  To send data
  To send a window update after data has been received

tcp_output() functionality:
1. Determine whether TCP can send a segment, depending on:
   flags in the data sent by the socket layer (to send an ACK, etc.)
   the size of the window advertised by the receiver's end
   the amount of data ready to send
   whether unacknowledged data already exists for the connection
2. Calculate the amount of data to be sent, depending on:
   the size of the receiver's window
   the number of bytes in the send buffer
3. Check for window shrink.
4. Send a segment:
   Allocate a buffer for the TCP and IP header from the header template.
   Copy the TCP and IP header template into the buffer to be sent.
   Fill in the fields of the TCP header.
   Decrement the number of buffers to be sent, so that the end can be detected.
   Set the sequence number and acknowledgement fields.
   Set three fields in the IP header: IP length, TTL and TOS.
   Pass the datagram to IP.

struct socket *so = tp->t_inpcb->inp_socket

Initialize a TCP header, tcp_header.

idle is true if the maximum sequence number sent equals the oldest unacknowledged sequence number, that is, if no ACK is expected from the other end.
int idle = (tp->snd_max == tp->snd_una)
  idle false: go on to check the ACK flag.
  idle true: no acknowledgement is expected, so set the congestion window to one segment:
    tp->snd_cwnd = tp->t_maxseg;

off is the offset in bytes, from the beginning of the send buffer, of the first data byte to send. The first off bytes have already been sent and are awaiting acknowledgement.
int off = tp->snd_nxt - tp->snd_una

Determine the length of data to transmit. win is the minimum of the receiver's advertised window and the congestion window. len is the minimum of the number of bytes in the send buffer and win, less the off bytes already sent.
win = min(tp->snd_wnd, tp->snd_cwnd)
len = min(so->so_snd.sb_cc, win) - off
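
The off and len computation can be tried outside the kernel. The following is a minimal sketch: `struct sk_tcpcb`, `sk_min` and `sk_sendable` are hypothetical names, and the struct keeps only the four control-block fields this step uses.

```c
#include <stdint.h>

/* Hypothetical subset of the TCP control block fields named in the text. */
struct sk_tcpcb {
    uint32_t snd_una;  /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;  /* next sequence number to send */
    uint32_t snd_wnd;  /* window advertised by the receiver */
    uint32_t snd_cwnd; /* congestion window */
};

static long sk_min(long a, long b)
{
    return a < b ? a : b;
}

/* Bytes that may be sent now; sb_cc is the byte count in the send buffer.
 * A negative result means the usable window is smaller than the data
 * already in flight. */
static long sk_sendable(const struct sk_tcpcb *tp, long sb_cc)
{
    /* Unsigned subtraction stays correct when sequence numbers wrap. */
    long off = (long)(uint32_t)(tp->snd_nxt - tp->snd_una);
    long win = sk_min((long)tp->snd_wnd, (long)tp->snd_cwnd);

    return sk_min(sb_cc, win) - off;
}
```

With snd_una = 1000, snd_nxt = 1500, a 4096-byte receiver window, a 2048-byte congestion window and 3000 bytes buffered, off is 500, win is 2048 and len is 1548.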

Determine the flags to send, such as TH_ACK, TH_FIN, TH_RST and TH_SYN, from the connection state:
flags = tcp_outflags[tp->t_state]
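
The lookup is a plain array indexed by connection state. The sketch below is modelled on the table as found in 4.4BSD-derived stacks, reproduced from memory, so the state order and the per-state entries should be treated as assumptions and checked against the real source. The `sk_` names are hypothetical; the TH_ values are the standard TCP header flag bits.

```c
/* TCP header flag bits. */
#define TH_FIN 0x01
#define TH_SYN 0x02
#define TH_RST 0x04
#define TH_ACK 0x10

/* Connection states, in the order assumed by the table below. */
enum sk_state {
    SK_CLOSED, SK_LISTEN, SK_SYN_SENT, SK_SYN_RECEIVED, SK_ESTABLISHED,
    SK_CLOSE_WAIT, SK_FIN_WAIT_1, SK_CLOSING, SK_LAST_ACK,
    SK_FIN_WAIT_2, SK_TIME_WAIT, SK_NSTATES
};

/* Flags to place in an outgoing segment for each state (assumed values). */
static const unsigned char sk_outflags[SK_NSTATES] = {
    TH_RST | TH_ACK, /* CLOSED */
    0,               /* LISTEN */
    TH_SYN,          /* SYN_SENT */
    TH_SYN | TH_ACK, /* SYN_RECEIVED */
    TH_ACK,          /* ESTABLISHED */
    TH_ACK,          /* CLOSE_WAIT */
    TH_FIN | TH_ACK, /* FIN_WAIT_1 */
    TH_FIN | TH_ACK, /* CLOSING */
    TH_FIN | TH_ACK, /* LAST_ACK */
    TH_ACK,          /* FIN_WAIT_2 */
    TH_ACK           /* TIME_WAIT */
};

/* Equivalent of flags = tcp_outflags[tp->t_state]. */
static unsigned char sk_flags_for(enum sk_state s)
{
    return sk_outflags[s];
}
```

A table keeps the state machine out of the send path: tcp_output only tests bits in flags afterwards.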

Check, in order, whether a segment must be sent:
  tp->t_flags & TF_ACKNOW
    true: send an acknowledgement.
    false: continue.
  flags & (TH_SYN | TH_RST)
    true: send a SYN (sequence number) or a reset.
    false: continue.
  flags & TH_FIN
    true: finished sending.
    false: continue.

Check flags to determine the type of message: window probe, retransmission, or normal data transmission.

Allocate an mbuf for the TCP and IP header, and for the data if possible.
MGETHDR(m, M_DONTWAIT, MT_HEADER)
M_DONTWAIT means that if no memory is available for the mbuf, the routine gives up and returns an error instead of waiting.

Length of data < 44 bytes? (100 - 40 - 16: the 100-byte data area of a packet header mbuf, less 40 bytes for the TCP and IP headers, less 16 bytes reserved for the link-layer header)
  yes: copy the data from the socket send buffer into the new packet header mbuf.
  no: create a new mbuf chain, copy the surplus data into it and link it to the first mbuf.

ip_output(m, tp->t_inpcb->inp_options, &tp->t_inpcb->inp_route, so->so_options & SO_DONTROUTE, 0)

ip_output.c
ip_output(struct mbuf *m, struct mbuf *opt, struct route *ro, int flags, struct ip_moptions *imo)
1. Header initialization 2. Route Selection 3. Source address selection and Fragmentation

1. Header initialization

The value of flags decides what is to be done with the data:
  IP_FORWARDING: forward packet
  IP_ROUTETOIF: route directly to interface
  IP_ALLOWBROADCAST: allow broadcasting of packet
  IP_RAWOUTPUT: packet contains a pre-constructed header

Packet damaged? Check whether any errors occurred while the higher layers were adding headers.
  yes: ERROR
  no: continue.

if (flags & (IP_FORWARDING | IP_RAWOUTPUT))
  yes: save the header length in hlen for the fragmentation algorithm. If the packet is being forwarded to another host, i.e. the machine is acting as a router, ip_output should not modify the IP header.
  no: construct and initialize the IP header: set ip_v = 4, clear ip_off, assign a unique identifier to ip_id. Most other fields (length, offset, TTL, protocol, TOS, etc.) have already been set by the higher layer protocols.
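
Because flags is a bit mask, the forwarding test has to use a bitwise AND rather than an equality comparison. A minimal sketch of the header initialization step, assuming illustrative flag values and a cut-down header: `struct sk_iphdr`, `sk_next_id` and `sk_ip_init` are hypothetical, and the 20-byte header without options is an assumption.

```c
#include <stdint.h>

/* Illustrative flag bits; the real values live in the kernel headers. */
#define SK_IP_FORWARDING 0x1
#define SK_IP_RAWOUTPUT  0x2

/* Only the header fields this step touches. */
struct sk_iphdr {
    uint8_t  v;   /* IP version */
    uint8_t  hl;  /* header length in 32-bit words */
    uint16_t id;  /* datagram identifier */
    uint16_t off; /* fragment offset and flags */
};

static uint16_t sk_next_id = 1; /* hypothetical identifier counter */

/* Initialize the header unless it is forwarded or pre-built, and
 * return the header length in bytes (hlen) for fragmentation. */
static int sk_ip_init(struct sk_iphdr *ip, int flags)
{
    if ((flags & (SK_IP_FORWARDING | SK_IP_RAWOUTPUT)) == 0) {
        ip->v = 4;
        ip->hl = 5;            /* assumes a 20-byte header, no options */
        ip->off = 0;
        ip->id = sk_next_id++; /* unique per datagram */
    }
    return ip->hl * 4;
}
```

A forwarded packet keeps its identifier and offset untouched; only hlen is read from it.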

2. Route Selection

Verify the cached route for the destination address. A cached route may be passed to ip_output as an argument; UDP and TCP maintain a route cache associated with each socket. If no route is provided, ip_output uses a temporary route structure called iproute.

if (cached_route == destination)
  yes: the cached route leads to the correct destination. Find the interface on which the packet has to be placed; ifp points to that interface's ifnet structure.
  no: locate a route by calling rtalloc(dst_ip), then find the interface on which the packet has to be placed; ifp points to that interface's ifnet structure.

If rtalloc fails to find a route, an EHOSTUNREACH (host unreachable) error is returned. If ip_forward called ip_output, the error is converted to an ICMP error. If the next hop is not the packet's final destination, dst is changed to point to the next-hop router.
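
The cache-then-lookup decision can be sketched as follows. Everything here is hypothetical: `struct sk_route` stands in for the cached route, and `sk_lookup` is a toy replacement for rtalloc that knows a single destination. Only the control flow mirrors the steps above.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical cached route: destination plus outgoing interface. */
struct sk_route {
    int      valid;   /* nonzero if the entry can be reused */
    uint32_t dst;     /* destination the entry was resolved for */
    int      ifindex; /* outgoing interface */
};

/* Toy stand-in for rtalloc(): only 10.0.0.1 is reachable, via interface 2. */
static int sk_lookup(uint32_t dst, int *ifindex)
{
    if (dst != 0x0a000001u)
        return -1;
    *ifindex = 2;
    return 0;
}

/* Reuse the cached route if it matches dst, otherwise look one up. */
static int sk_select_route(struct sk_route *ro, uint32_t dst)
{
    if (ro->valid && ro->dst == dst)
        return 0;            /* cache hit */
    ro->valid = 0;
    ro->dst = dst;
    if (sk_lookup(dst, &ro->ifindex) != 0)
        return EHOSTUNREACH; /* no route to the destination */
    ro->valid = 1;
    return 0;
}
```

Keeping the route with the socket means a busy connection pays for the lookup once rather than per packet.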

3. Source address selection and Fragmentation

The final section of ip_output ensures that the IP header has a valid source IP address. This could not be done earlier because the route had not yet been selected.

Is a valid source address specified?
  no: use the IP address of the outgoing interface as the source address.
  yes: continue.

Does the packet have to be fragmented?
  yes: packets that exceed the MTU must be fragmented before they can be sent.
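
How many fragments a packet needs follows from the MTU and the header length saved in hlen. A minimal sketch, assuming every fragment carries a header of the same length; `sk_frag_payload` and `sk_frag_count` are hypothetical helper names. Fragment offsets are expressed in 8-byte units, so the payload of every fragment except the last must be a multiple of 8.

```c
/* Payload bytes per fragment: whatever fits after the header, rounded
 * down to a multiple of 8 because offsets are counted in 8-byte units. */
static int sk_frag_payload(int mtu, int hlen)
{
    return (mtu - hlen) & ~7;
}

/* Number of fragments for a packet of total_len bytes (header included).
 * Returns 1 when the packet fits, -1 when the MTU is too small. */
static int sk_frag_count(int total_len, int mtu, int hlen)
{
    int payload = total_len - hlen;
    int per;

    if (total_len <= mtu)
        return 1;
    per = sk_frag_payload(mtu, hlen);
    if (per <= 0)
        return -1;
    return (payload + per - 1) / per; /* round up */
}
```

For a 4020-byte packet with a 20-byte header on a 1500-byte MTU, each fragment carries up to 1480 bytes of payload and three fragments are needed.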

  no: the packet is sent as it is.

In either case (fragmented or not) the checksum is computed (in_cksum). If no errors are found, the data is sent to the if_output function of the selected output interface.
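
The checksum is the 16-bit one's complement sum used across the Internet protocols. The sketch below is a plain byte-oriented version over a flat buffer; the kernel's in_cksum works on an mbuf chain and is more heavily optimized, and `sk_cksum` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* One's complement checksum over len bytes, treating the data as
 * big-endian 16-bit words. The checksum field must be zero on input. */
static uint16_t sk_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)                      /* pad an odd trailing byte */
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)                 /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Running the same function over a header whose checksum field is already filled in returns 0, which is how a receiver verifies it.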

Interface Layer (if_ethersubr.c)

ether_output(struct ifnet *ifp, struct mbuf *mbuf, struct sockaddr *destination, struct rtentry *rt_entry)

Function: takes the data portion of an Ethernet frame, encapsulates it with a 14-byte header and places it on the interface send queue.

Phases: 1. Verification 2. Protocol-Specific Processing 3. Frame Construction 4. Interface Queuing

Arguments:
  ifp points to the outgoing interface's ifnet structure
  mbuf is the data to be sent
  destination is the destination address
  rt_entry points to the routing entry

Initialize the Ethernet header: struct eth_header *eh

1. Verification

Ethernet port up and running? ifp->if_flags & (IFF_UP | IFF_RUNNING)
  no: senderr(ENETDOWN)
  yes: continue.
Route valid? rt_entry = rtalloc1(destination, 1)
  0: senderr(EHOSTUNREACH)
Next hop a gateway? rt_entry = rt_entry->rt_gwroute
Destination responding to ARP requests? rt_entry->rt_flags & RTF_REJECT
  If it is not responding, do not send more packets, to avoid flooding.
  no: continue.
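
The verification phase amounts to a short chain of early returns. This is a minimal sketch with hypothetical `SK_` flag values and a flattened argument list instead of the ifnet and rtentry structures. The error returned for a rejected route is an assumption, since the slides do not name it.

```c
#include <errno.h>

/* Illustrative flag bits; the real ones come from the kernel headers. */
#define SK_IFF_UP      0x01
#define SK_IFF_RUNNING 0x40
#define SK_RTF_REJECT  0x08

/* Returns 0 if the frame may proceed, otherwise an errno value. */
static int sk_ether_verify(int if_flags, int have_route, int rt_flags)
{
    const int need = SK_IFF_UP | SK_IFF_RUNNING;

    /* Both bits must be set, so compare against the full mask. */
    if ((if_flags & need) != need)
        return ENETDOWN;
    if (!have_route)
        return EHOSTUNREACH;
    if (rt_flags & SK_RTF_REJECT)
        return EHOSTUNREACH; /* assumed error code for a rejected route */
    return 0;
}
```

Testing `if_flags & need` for nonzero alone would accept an interface that is up but not running, which is why the comparison is against the whole mask.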

2. Protocol-Specific Processing

Functionality: finds the Ethernet address corresponding to the IP address of the destination.

Switch on destination->sa_family.
  AF_INET: send an ARP broadcast to find the Ethernet address corresponding to the destination IP address.

Use m_copy() to keep the packet until an acknowledgement is received.

3. Frame Construction

Make sure there is room for the 14-byte Ethernet header:
M_PREPEND(m, sizeof(ethernet_header), M_DONTWAIT)

Form the Ethernet header from the Ethernet frame type, the destination Ethernet MAC address (e.g. that of the default gateway for a host) and the unicast Ethernet address associated with the output interface.
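
The 14 bytes are a 6-byte destination address, a 6-byte source address and a 2-byte frame type in network byte order. A minimal sketch that builds the frame in a flat buffer rather than prepending to an mbuf; `sk_ether_encap` and the `SK_` constants are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SK_ETHER_ADDR_LEN = 6, SK_ETHER_HDR_LEN = 14 };

/* Write header plus payload into frame; returns the frame length,
 * or 0 if the buffer is too small. */
static size_t sk_ether_encap(uint8_t *frame, size_t cap,
                             const uint8_t dst[SK_ETHER_ADDR_LEN],
                             const uint8_t src[SK_ETHER_ADDR_LEN],
                             uint16_t type,
                             const uint8_t *payload, size_t len)
{
    if (cap < (size_t)SK_ETHER_HDR_LEN + len)
        return 0;
    memcpy(frame, dst, SK_ETHER_ADDR_LEN);
    memcpy(frame + SK_ETHER_ADDR_LEN, src, SK_ETHER_ADDR_LEN);
    frame[12] = (uint8_t)(type >> 8);   /* frame type, high byte first */
    frame[13] = (uint8_t)(type & 0xff);
    memcpy(frame + SK_ETHER_HDR_LEN, payload, len);
    return (size_t)SK_ETHER_HDR_LEN + len;
}
```

For an IP datagram the type is 0x0800; the source address is the unicast address of the output interface.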

4. Interface Queuing

Is the output queue full?
  yes: discard the frame, free the memory buffer and senderr(ENOBUFS).
  no: place the frame on the interface's send queue, if_snd, and call lestart(ifp).
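
The queuing step behaves like a bounded FIFO that drops on overflow. The sketch below uses hypothetical types (`struct sk_frame`, `struct sk_ifqueue`) in place of the mbuf and the interface send queue. In this sketch freeing a dropped frame is left to the caller.

```c
#include <errno.h>
#include <stddef.h>

struct sk_frame {
    struct sk_frame *next;
};

/* Bounded FIFO standing in for the interface send queue. */
struct sk_ifqueue {
    struct sk_frame *head, *tail;
    int len;    /* frames currently queued */
    int maxlen; /* capacity */
    int drops;  /* frames discarded because the queue was full */
};

/* Append f; returns ENOBUFS and counts a drop when the queue is full. */
static int sk_enqueue(struct sk_ifqueue *q, struct sk_frame *f)
{
    if (q->len >= q->maxlen) {
        q->drops++;
        return ENOBUFS;
    }
    f->next = NULL;
    if (q->tail != NULL)
        q->tail->next = f;
    else
        q->head = f;
    q->tail = f;
    q->len++;
    return 0;
}

/* Remove and return the oldest frame, or NULL if the queue is empty. */
static struct sk_frame *sk_dequeue(struct sk_ifqueue *q)
{
    struct sk_frame *f = q->head;

    if (f != NULL) {
        q->head = f->next;
        if (q->head == NULL)
            q->tail = NULL;
        q->len--;
        f->next = NULL;
    }
    return f;
}
```

The driver's start routine is the consumer: it calls the dequeue side until the queue is empty or the hardware is busy.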

Interface Layer (if_le.c) - lestart(struct ifnet *ifp)

Function: dequeues frames from the interface output queue and arranges for them to be transmitted by the Ethernet card.

struct le_softc *le = &le_softc[ifp->if_unit]

le->sc_if.if_flags & IFF_RUNNING
  0: return an error.
  1: copy the frame in the mbuf to the hardware buffer.

Set IFF_OACTIVE to indicate that the device is busy transmitting.