Академический Документы
Профессиональный Документы
Культура Документы
Yaakov (J) Stein December 2010 Chief Scientist RAD Data Communications
Course Outline
1. Fundamentals of IP
2. The IP header
Fundamentals of IP
IP networks
IP networks are made up of hosts middleboxes (e.g., firewalls, NATs, NAPTs, Application Layer GWs) routers (obsolete terminology gateways)
It will be useful to differentiate between core routers (connect to other routers) edge routers (connect to hosts) To understand how a router is different from other network elements we need to know the basic principles the IP protocol architecture
Y(J)S L3f
IP
Ethernet, PPP UTP, optic, wireless
Routers
A router is a combination of hardware, software, and memory that is responsible for forwarding packets towards their destinations Routers generally work at ISO layer 3 (network layer) but can also function at layer 2.5 (for MPLS) and may inspect higher layers, but only for optimization (QoS management, load balancing, etc.) Note that Ethernet switches technically filter rather than forward
router
switch
In order to correctly fulfill their function (i.e., to know where to forward) routers usually run routing protocols to exchange information between themselves Ethernet switches do not need such protocols as they learn how to filter So the router performs 2 distinct algorithms : forwarding algorithm (forwarding component) routing algorithm (control component)
Y(J)S L3f
Router interfaces
Routers connect to hosts and to other routers via interfaces (from 1 to many thousands of interfaces per router) 0.0.0.0/0 Routers are responsible for forwarding packets arriving at an ingress interface to an egress interface
.2 .1
192.0.2.0/25
192.0.2.128/25
Interfaces have layer 3 and above properties and also contain layer a and 2 properties (ports)
.129
Most interfaces have unique IP addresses (but there are also un-numbered interfaces)
Interfaces are grouped into subnetworks All interfaces on a subnetwork share the same prefix
.130
.3
.131
Y(J)S L3f
Router forwarding
Based on the 6 principles we can understand how routers forward
1. 2. 3. 4. 5. 6.
The router looks at the packet header It deduces to which subnet the packet belongs If the router can directly interface that subnet it must use the appropriate L2 to send the packet to the host Otherwise it must retrieve the next hop (router) that sends the packet towards the subnet The next router does the same If routing has converged there will be no loops or black holes but there may be during transients
The information needed by the router to properly forward packets is stored in the Forwarding Information Base (FIB) The FIB associates address prefixes with Next Hops (NHs) (and, to save an additional lookup, usually with L2 addresses as well)
More on FIBs
Simple (and primitive) routers have a routing table modern large routers have several different databases The FIB is designed to be fast to search RIBs are designed to be fast to update There may be many RIBs, one for each routing protocol running and static routes may be entered into any of them
There are sometime other databases as well for example link state routing protocols require a LSDB from which the RIB is built
The FIB is built from RIBs
entry 0 prefix 192.168.16.0/24 192.168.196.0/20 interface 1 2 NH 10.10.1.1 10.16.54.2 L2 PPP Eth L2 parameters
FIB
2
3
192.168.0.0/17
0.0.0.0/0
2
3
10.16.1.16
10.1.1.0
Eth
Eth
Y(J)S L3f
10
IP Routing types
IS-IS
send <neighbor-addr,cost> to all routers determine entire flat network topology (SPF - Dijkstras algorithm) fast convergence*, guaranteed loopless => interior routing protocol (IGP)
*convergence time is the time taken until all routers work consistently
before convergence is complete packets may be misforwarded, and there may be loops
Y(J)S L3f
11
The IP Header
IPv4 header
VER
ToS IHL DSCP ECN Identification flags Protocol TTL
Total Length Fragment Offset Header Checksum Source Address (SA) Destination Address (DA) Options padding
VER (4b) = 0100 IHL (4b) = Internet Header Length in 32 bit words (if no options then IHL=5) ToS (1B) = DSCP(6b) + ECN(2b) Total length (2B) = length of header + payload in bytes (64K) Identification (2B) = ID for fragment Flags (3b) = reserved bit + DF (dont fragment) + MF (more fragments) Fragment Offset (13b) = byte number of first byte in fragment (from zero) Header Checksum (16b) = 1s complement sum of header words (0) TTL (1B) = Time To Live counter decremented by each forwarder Protocol (1B) = protocol code (maintained by IANA, 1=ICMP 4=IPv4 6=TCP 17=UDP) Addresses (4B each) Y(J)S L3f 13
ToS
RFC 3168 gives the history of the ToS byte which was redefined several times
Originally it contained :
Precedence (3b) similar to Ethernet P-bits D (1b) requests low delay T (1b) requests high throughput R (1b) requests high reliability reserved (2b)
DSCP (6b) (useful values in draft-baker-diffserv-basic-classes) ECN 00: Non ECN-Capable Transport - Non-ECT 10: ECN Capable Transport - ECT(0) 01: ECN Capable Transport - ECT(1) 11: Congestion Encountered CE
Y(J)S L3f
14
TTL
TTL is used to terminate routing loops limit the lifetime of TCP segments enable traceroute
TTL was originally intended to be the time-in-flight in seconds but since a router must decrement by at least 1 (even if the time << 1 sec) each router is required to reduce the TTL field by at least one so it is effectively a hop-count
(today no-one decrements by more than 1 because of time)
TTL expiration causes routers to discard packets but TTL=1never triggers a discard for a destination host a router receiving a packet destined for it
Y(J)S L3f
15
net22
host8
The set of all addresses with a shared prefix is called a subnet IP can even support arbitrary levels of hierarchy by advertising aggregated addresses AS13
AS14
AS12
AS13
Y(J)S L3f
16
Y(J)S L3f
17
Broadcast must be supported by L2, and are used for purposes such as
ARP, DHCP, routing advertisements to all routers
Broadcast packets are never forwarded by L3 forwarding devices All-ones must never appear as SA and as DA must be accepted by all routers and hosts Directed broadcast may appear as SA as DA it must be accepted by hosts and by default must be accepted by all routers and by default are forwarded (only prefixes count)
but there must be an option not to forward, which must default to forward
might want to turn off for security reasons!
Y(J)S L3f
18
Y(J)S L3f
19
Y(J)S L3f
20
Multicast addresses
RFC 1112 defines IP addresses with MS byte E0- EF (i.e. addresses 224.0.0.0-239.255.255.255) to be multicast addresses A multicast address stands for a group of hosts
membership is dynamic (hosts may join/leave a group at any time) there is no restriction on the number of hosts in a group a host may belong to many groups
Multicast IP forwarding is performed by a multicast router which may be a regular router too Addresses between 224.0.0.0 and 224.0.0.255 (inclusive) are reserved for special purposes (e.g., routing protocols) Multicast routers should not forward packets with such a destination address We wont discuss multicast further
Y(J)S L3f
21
/32 are fully qualified IP addresses 0.0.0.0/0 matches every IP address it is the default route route taken when there is no matching entry in FIB the gateway of last resort
Y(J)S L3f
22
Prefix tables
slash
A.B.C.D/32 A.B.C.D/31 A.B.C.D/30
Mask
255.255.255.255 255.255.255.254 255.255.255.252
slash
A.B.0.0/16 A.B. 0.0/15 A.B. 0.0/14 A.B. 0.0/13 A.B. 0.0/12 A.B. 0.0/11 A.B. 0.0/10 A.B.0.0/9 A.0.0.0/8 A.0.0.0/7
mask
255.255.0.0 255.254.0.0 255.252.0.0 255.248.0.0 255.240.0.0 255.224.0.0 255.192.0.0 255.128.0.0 255.0.0.0 254. 0.0.0
A.B.C.D/29
A.B.C.D/28 A.B.C.D/27 A.B.C.D/26 A.B.C.D/25 A.B.C.0/24 A.B.C.0/23 A.B.C.0/22 A.B.C.0/21 A.B.C.0/20 A.B.C.0/19 A.B.C.0/18
Note : for /25 D=0 or 128 for /26 D= 0, 64, 128, or 192 etc.
255.255.255.248
255.255.255.240 255.255.255.224 255.255.255.192 255.255.255.128 255.255.255.0 255.255.254.0 255.255.252.0 255.255.248.0 255.255.240.0 255.255.224.0 255.255.192.0 255.255.128.0
A.0.0.0/6
A.0.0.0/5 A.0.0.0/4 A.0.0.0/3 A.0.0.0/2 A.0.0.0/1 0.0.0.0/0
252. 0.0.0
248. 0.0.0 240. 0.0.0 224. 0.0.0 192. 0.0.0 128. 0.0.0 0.0.0.0
Y(J)S L3f
A.B.0.0/17
23
127.0.0.0/8
169.254.0.0/16 172.16.0.0/12 192.0.2.0/24 192.88.99.0/24 192.168.0.0/16 198.18.0.0/15 224.0.0.0/4 240.0.0.0/4
127.0.0.0 127.255.255.255
169.254.0.0 - 169.254.255.255 172.16.0.0 - 172.31.255.255 192.0.2.0 - 192.0.2.255 192.88.99.0 - 192.88.99.255 192.168.0.0 - 192.168.255.255 198.18.0.0 - 198.19.255.255 224.0.0.0 239.255.255.255 240.0.0.0 255.255.255.255
loopback addresses
zeroconf private addresses Documentation IPv6-IPv4 relay private addresses device benchmark multicast reserved
Y(J)S L3f
24
Fragmentation
All IP hosts/routers must be able to handle 576 byte packets IP packets can be up to 64K bytes in size but L2 may limit the size of packets that can be transported
IPv4 routers must be able to fragment an incoming packet into a number of forwarded packets (and should minimize the number of fragments)
All hosts must support reassembly (warning security hole!) and routers must be able to reassemble packets for itself (e.g., routing)
(IPv6 does not use fragmentation, instead the sender host determines MTU)
MTU may be learned from routing protocols When originating a packet a router should support RFC 1191 path MTU discovery We wont discuss fragmentation further
Y(J)S L3f
25
IP Options
Some options require the router to inserts its address into the header (if there is free space)
Unrecognized IP options must be passed unchanged Routers MUST support
Source Route options (but there may be an option to discard such packets) Record Route option Timestamp option
Packets with IP options are often forwarded via slow path Some large routers are rumored to discard any packet with options We wont delve too deeply into option processing
Y(J)S L3f
26
Routers
A router is an IP network element with interfaces on at least 2 subnets Routers were originally gateways
(the term is still seen in default GW, Border Gateway Protocol, )
although now that word is now usually reserved for network elements that interfaces networks of different technologies Initially there was no separation between
this was changed by introduction of the concept of a FEC Routers perform many functions:
L2 functions IP forwarding IP routing (control plane) system support (management plane, error logging, etc.) the fast path (simple forwarding) the slow path (control protocol packets and special cases)
Y(J)S L3f
28
Routers L2 functions
L2 functionality includes : encapsulating and decapsulating IP packets into L2 CRC checking/generation sending and receiving IP packets (up to MTU) translating IP destination address into L2 address (including ARPs)
We wont discuss these here
Y(J)S L3f
29
Routers IP forwarding
The forwarding plane receives and forwards IP packets parsing (pulling values from appropriate fields simple IPv4 DA, complex finding URL or MIB variable) dropping invalid packets lookup/search classifying FECs (add metadata, based on DA, DA+ToS, MPLS, ) modification and replication of packets recognizing error conditions and generating ICMP messages fragments packets when necessary choosing a next-hop destination for each packet based on information in its Forwarding Information Base (FIB) forwarding packet traffic management and queuing compression, encryption, etc.
We wont discuss all of these
Y(J)S L3f
30
Y(J)S L3f
31
Error messages
IP networks are best effort, so there are no guarantees that packets sent will actually be delivered
ICMP (in addition to the well-known ping and traceroute) provides a basic error reporting mechanism :
Destination Unreachable Messages Source Quench Messages Time Exceeded Messages Redirect Messages Parameter Problem Messages
There are no such reports in cases where we will say silently discard and when the problematic packet is any of the following :
an ICMP error message a packet that fails the header validation tests (unless specified there) a packet destined to an IP broadcast or IP multicast address a packet sent as a L2 broadcast or multicast a packet with a Martian SA a packet is a non-initial fragment
Y(J)S L3f
32
34
Layer-2 delivery
IP packets are always delivered over a lower layer (L2), such as
The L2 is not specified by IP standards, but must provide the following services : delivery only of packets that passed basic error detection i.e., discard of faulty packets delivery only of packets identified as IPv4 e.g., by EtherType or UPI delineation of packet including elimination of padding supplying length of IP packet (we will call this the L2-length)
Y(J)S L3f
35
The router MAY try to determine why the check failed IP header was truncated by lower layer IP header was corrupted not IPv4 (e.g., IPv6) sender purposely generated illegal IP header
Y(J)S L3f
36
Whats next ?
Now that we are reasonably sure that the packet header is OK we can look at it
Note that we do NOT check the TTL field yet because packets for the router itself are not discarded because of TTL expiry Note that reassembly is not performed (and is only performed by a router for packets destined for itself) Most of the IP options are processed now (but we wont go into the details) except those requiring the routers IP address that can only be processed after the forwarding decision The next step is to observe the address fields
Y(J)S L3f
37
Martian Addresses
Illegal IP addresses are called Martian addresses because they appear as if from outside this world This includes addresses previously described, for example :
A router should silently discard packets with Martian addresses There may be a switch to disable this check but it must default to perform checks! When a packet is discarded because of these rules, the details should be logged
Y(J)S L3f
38
Y(J)S L3f
39
ACLs
Access Control Lists offer an additional (basic) security mechanism as well as a method of controlling/limiting traffic
Routers should implement configurable ACLs When enabled, the router observes source and destination addresses and optionally other fields (e.g., protocol field, L4 ports) before forwarding packets Forwarding may be either according to include lists or exclude lists Include list description of packets to be forwarded Exclude list - description of packets to be blocked When a packet is blocked based on ACLs the details should be logged an ICMP unreachable message should be sent (configuration option) Certain vendors have considerably expanded the ACL functionality
Y(J)S L3f
40
Where does it go ?
The next step is to look at the DA There are three possibilities : 1. the packet is destined for the router (local delivery) and should be queued for local delivery (reassembled if needed) and processed according to regular IP host rules (RFC 1122) 2. the packet is not destined for the router and should be queued for forwarding 3. the packet should be queued for forwarding and a copy must be queued for local delivery Cases where the packet is destined for the router must be handled first
Y(J)S L3f
41
Is it for me ?
The packet is destined only for the router (case 1) if its DA is any of the following :
one of the routers addresses (exact match) the limited broadcast address (FF.FF.FF.FF) a multicast address that is never forwarded (e.g., 224.0.0.1 or 224.0.0.2) AND at least 1 logical interface associated with the physical interface on which the packet arrived is a member of the destination multicast group
When a router receives such a packet it must perform all the functionality of a regular host (including L4 functions and higher)
Y(J)S L3f
42
Is it also for me ?
The packet is destined for the router and to be forwarded (case 3) if its DA is any of the following : a directed broadcast address that addresses at least one of the routers logical interfaces but does not address any of the logical interfaces associated with the physical interface on which the packet arrived a multicast address which may be forwarded and at least one of the logical interfaces associated with the physical interface on which the packet arrived is a member of the destination multicast group Note :
A packet is delivered locally if the packets DA is a directed broadcast address that addresses at least one of the logical interfaces associated with the physical interface on it arrived It is also forwarded unless the link on which the packet arrived uses an encapsulation that does not encapsulate broadcasts differently than unicasts (e.g., by using different Link Layer destination addresses).
Y(J)S L3f
43
Caveat
Routers generally do not forward packets received as L2 broadcasts
A packet is delivered locally if the packets DA is a directed broadcast address that addresses at least one of the logical interfaces associated with the physical interface on it arrived It is also forwarded unless the link on which the packet arrived uses an encapsulation that does not encapsulate broadcasts differently than unicasts (e.g., by using different Link Layer destination addresses) The idea is to deal with a directed broadcast to another network prefix on the same physical link. If the sender sends the broadcast to the router as a L2 unicast this is OK, since the router sees a unicast destined for a different network prefix than the sender sent it on. So the router can safely send it as a Link Layer broadcast out the same physical link. But if the router cant tell whether the packet was received as a L2 unicast, we must ensure that it plays it safe.
Y(J)S L3f
44
TTL
OK, so the packet needs to be forwarded
the next step is to decrement the TTL if the router is so slow that the time is longer than 1 second the router may decrement by more than 1
if TTL 0 then the packet is discarded if the destination is not a multicast address the router must return an ICMP Time Exceeded message a router must not discard an IP unicast or broadcast packet with TTL>0 even if it is sure that another router along the path to the destination will decrement the TTL to zero however, a router may do so for IP multicasts (for efficiency reasons)
Y(J)S L3f
45
Forwarding
OK, so the packet still needs to be forwarded
Now we have to determine --- to whom ? This is decided based on the destination address This decision isnt so simple :
The destination address is not necessarily already in the DA field Once the true destination address is known the router must still determine if the destination host is directly connected to it and if so on which interface ? OR if it needs to pass it through another router, and if so, what is the next routers address (Next-Hop address)
Y(J)S L3f
46
Source routing
Finding the true destination address is made more complex because of source routing Source routing allows (partial or complete) path specification rather than relying on the route determined by the routing protocols Source routing adds determinism and may enable achieving performance goals Two different IP header options are available : SSRR Strict Source and Record Route LSRR Loose Source and Record Route
Note: LSRR packets are often blocked since they enable spoofing attacks
and the next router in the sequence in the DA The router must use the IP DA not the address of the ultimate destination (last address in the option) when determining how to handle a packet
Y(J)S L3f
47
Y(J)S L3f
48
Directly accessible ?
OK, now we have the destination address, what do we do with it? The next step is to determine if the destination host is directly accessible
The first step in the general algorithm is applicable only if the router has interfaces without IP addresses We will omit this case The router now looks at each of its interfaces each of which has an IP address starting with a prefix For each such prefix compare to the corresponding set of bits in the packets DA if they match the packet can be transmitted through the interface Note: there can never be >1 match in a properly configured router If no match is found, then we need to find the Next-Hop router
49
50
51
When unreachable, the ICMP error codes are different with ToS
Y(J)S L3f
52
53
Finishing up
OK, now we finally know which FIB entry to use
1. retrieve from the FIB the (logical) interface through which we need to send the packet 2. if the packet has IP options that require the routers IP address (e.g., Record Route, Timestamp) we can now process them using the routers address corresponding to the interface 3. fix the IP checksum (in general we do not have to recalculate we can fix)
(NB TCP checksum is never changed, when using source routing the TCP checksum pseudoheader has the ultimate DA)
4. retrieve the L2 parameters for the interface (e.g., Ethernet MACs and VLANs, GRE parameters, ) and build the L2 frame 5. finally, queue the packet for transmission
Y(J)S L3f
54
Search
We need to perform the following on our data structures : Insert Delete Modify Search and to check the following metrics : Time complexity for each of the above Size (spatial) efficiency Scalability
Y(J)S L3f
56
LookUp Table
The simplest (and fastest) data structure is the LookUp Table (LUT) AKA indexed array, Location Addressable Memory (LAM) The incoming address is used as an index to access the (NH) information We can put in more info, e.g., L2 type and address, to save further lookups Example :
address 0 1 2 interface 1 2 3 NH 192.0.2.0 192.0.2.16 192.0.2.128 L2 type Ethernet PPP SDH L2 NH address 00-17-42-F7-14-14 VC ID
Limitations only for exact match, not LPM limitation only for small number of possible addresses, e.g. VLANs We can use LUTs after other data structures that return a key as an index
Y(J)S L3f
57
Hash tables
Hash tables enable handling a large number of potential addresses
A hashing function is a function from a large variable (large number of bits or large number of bytes) to a small variable (small number of bits or bytes) which is white (small changes in the input create widely different outputs) Hashing long addresses returns a short index
The problem is that the hashing function is not 1:1 so there will always be the probability of hash clashes (collisions) Solutions : perfect hashing only when addresses are known ahead of time index in table returns list of all addresses stored multiple hashing linear probing, quadratic probing Hashing is very good for exact match (e.g. MAC addresses) but is not suitable for LPM
Y(J)S L3f
58
Hash implementation
From an efficiency point of view hash tables are between LUTs and search tables (to be discussed next) To control collisions, we need a relatively large table (birthday paradox !) 23 people Example : 57 people 99% 99% probability of collision when 3000 entries are put into a hash table of size 1 million Using multiple hashing average computational load is O(1 + keys/table-size) A primitive hash function is the modulo hash H(key) = addr mod table-size table-size should be prime must not be a power of 2 or close
A better hash function is (for appropriate integer m and fraction f) H(key) = Trunc [ m Frac( f addr) ]
The quality of the hashing function depends on f For Fibonacci hashing : f = 1 / is the golden ratio (5 2) 1.618
Y(J)S L3f
59
Search table
The most spatially efficient data structure is the search table It is similar to a LUT but the addresses being looked up are not indexes rather, we need to sequentially search the table for the address We can reduce the search time by ordering the table
2 3
192.168.0.0/17 0.0.0.0/0
2 3
10.16.1.16 10.1.1.0
Y(J)S L3f
60
Y(J)S L3f
61
Linked lists
Search tables are hard to maintain if we need to insert or delete an element we need extensive moving Linked lists are designed to simplify mechanics of such updates still need exhaustive search to find where to insert Linked lists can be singly or doubly linked
start start end
Y(J)S L3f
62
Y(J)S L3f
63
Tree structures
A graph is a collection of nodes and edges connecting them In a directed graph the edges have direction and so or every edge there is a father node and a child node Nodes without children are called leaves A forest is a directed graph without loops (only one path between 2 nodes) A tree is a forest with a single root -- it thus defines a partial order
(it is conventional to draw the tree upsidedown)
A binary tree has no more than 2 output edges per node forest tree
binary tree
Trees can be implemented using arrays, pointers (like linked lists), heaps, etc.
Y(J)S L3f
64
Search trees
Search trees may store data
start at the root find all children of root and check for desired data if not found, find all childrens children recurse
Depth first
start at the root find first child and check for desired data if not found, find first child of first child recurse until data found or leaf if leaf backtrack to father and try the next child
Y(J)S L3f
65
Binary trees
Binary Search Trees simplify storage and manipulation We call the children of a node L and R
Perfect binary trees have exactly 2 children for each internal node Thus there are exactly 2H leaves, where H is the height of the tree Altogether there are (2H+1 1) nodes
To store keys in a binary search tree place keys in all nodes for every intermediate node N key(L) < key(N) L key(R) > key(N)
To search for a key in a BST start at root (set current node to root) check if key is stored in current node if not : if key < key(N) then set current to L else set current to R
In the worst case this takes only H comparisons (for perfect trees O(log N))
Y(J)S L3f
66
3 1 3 5 7
7
Y(J)S L3f
67
Tries
From retrieve, but usually pronounced try A trie is an ordered tree with subvalues on edges values at leaves Tries are related to prefix search representations Tries were originally developed for exact match Tries can be binary or not Tries are good for all lexicographical searches O(log n) but particularly efficient for LPM Trie variants are the fastest known lookups - O(log log n) LC-trie used in modern Linux router implementations
Y(J)S L3f
68
Example trie
Assume the following FIB : 10.0.128.0/16 10.0.0.0/8 192.168.0.0/16 192.168.128.0/24 0.0.0.0/0 0 Note:
in the plain trie, all IP prefixes have the same length 0
10 0 0 128 0 0
0
/16
0 /0 /16
/8
/24
Y(J)S L3f
69
0.0/8
128.0/16
0.0/16
128.0/24
Now maximum length is 2 We could also save memory by exploiting common suffixes
Y(J)S L3f
70
71
Example LC-trie
binary trie LC-trie
Y(J)S L3f
72
73
M1[1]
M1[2]
M1[3]
M2[1]
M2[2]
M2[3]
M3[1]
M3[2]
M3[3]
match lines
search lines
search word
Q[2]
Q[3]
Y(J)S L3f
74
Y(J)S L3f
75
MPLS
77
when a shim header is needed, its format should be: Label there are 220 different labels (+ 220 multicast labels) Traffic Class (ex-EXP) was CoS in Cisco Tag Switching could influence packet queuing Stack bit S=1 indicates bottom of label stack TTL decrementing hop count used to eliminate infinite routing loops generally copied from/to IP TTL field
Special (reserved) labels 0 IPv4 explicit null 1 router alert 2 IPv6 explicit null 3 implicit null 13 MPLS-TP GAL
78
label stack operation to be performed any other info needed to forward (e.g. L2 format, how label is encoded)
if next hop is the LSR itself then operation must be pop for multicast there may be multiple next hops, and packet is replicated
ILM contains:
a NHLFE for each incoming label possibly multiple NHLFEs for a label, but only one used per packet
Y(J)S L3f
79
IP router
pure IP link
LER
MPLS link
LSR
Y(J)S L3f
80
port/label in 2/17
ILM
NHLFE
port/label in 2/17
operation swap
Y(J)S L3f
81
LER Architecture
control plane IP routing table
free label table
IP routers
egress LER
labeling procedure
Y(J)S L3f
82
NAT
What is a NAT ?
Network Address Translation is the mapping of addresses as known in a network or to an end-user into addresses applicable to another network or end-user NAT functionality is located at a higher layer than IP (L4 and L7) but is often co-located with router functionality
NAT functionality needs to surround the router
When we speak of addresses here we mean socket addresses = L3 IP address + L4 port number For IP applications NAT is used for many reasons, including :
mapping private addresses into public ones conserving public addresses load balancing blocking malicious traffic tunneling IPv6 over IPv4
Y(J)S L3f
84
85
All NATs we discuss may process both (but only IPv4) socket addresses as well as fields in protocols that do not have ports (e.g., ICMP)
Y(J)S L3f
86
NAT varieties
S=I E D=X Until recently, NATs were classified as follows (RFC 3489) : full cone (1:1) out A:IP EA:EP independent of X in any X IA:IP address restricted cone out IA:IP EA:EP independent of X in XA:XP IA:IP only if IA:IP previously sent to XA:anyport port restricted cone out IA:IP EA:EP independent of X in XA:XP IA:IP only if IA:IP previously sent to XA:XP symmetric out IA:IP EA:EP with a different mapping for each X in XA:XP IA:IP only if IA:anyport already sent to XA:anyport
Today, these definitions are considered insufficient ! (e.g., RFC 4787)
Y(J)S L3f
S=I D=X
S=E D=X
87
State is retained in the NAT table for sessions (usually < 4K)
When the first packet of a session is received (from a host behind the NAT) sockets are compared with configured rules, if OK IA:IP is assigned EA:EP and the mapping inserted in the NAT table the present packet is corrected (including checksums, etc.) and sent packets (in both directions) belonging to the session are accepted and corrected when the session terminates (based on TCP FIN or time-out) the NAT entry is deleted
Y(J)S L3f
88
ICMP error messages DNS, routing protocols, etc. FTP SIP (over TCP or UDP) H.323, Skinny, RTSP Instant Messaging, peer-to-peer
Not only are these in unknown places in the payload They can be in different formats (including text) Such cases require Application Level Gateways (ALGs)
Y(J)S L3f
89
CLIENT
SERVER 21 20
need to swap
90
P
P,Q,R 1024 and random
CLIENT
SERVER
21
need to swap
server-side NAT problem
91
Example: SIP
ALICE SIP- PROXY SIP- PROXY BOB
RTP traffic
????
Y(J)S L3f
92
Other cases
ICMP error messages include part of the that triggered the message the IP header the first few bytes of the packet payload The internal IP and L4 headers need to be corrected
Peer-to-peer and Instant Messaging are even harder than VoIP the peer contacts from the network without any previous control We wont discuss the next level of NAT here hole punching algorithms STUN - Session Traversal Utilities for NAT RFC 5389 TURN - Traversal Using Relay NAT RFC 5766 ICE - Interactive Connectivity Establishment RFC 5245 These allow clients behind the NAT to discover the NAT (and learn some parameters and possibly characterize it) and rely on a STUN server on the network side of the NAT
Y(J)S L3f
93
NAT functions
Mapping outgoing packets SA IA:IP EA:EP (possibly dependent on XA) Blocking incoming packets if SA XA not known Mapping incoming packets DA EA:EP IA:IP (possibly dependent on XA) Create new mapping when receive outgoing packet from new session Delete mapping according to session termination or aging ALG
Y(J)S L3f
94
NAT processing
outgoing packets
perform L3 forwarding lookup socket(s) in NAT table if found block or perform NAT (according to table) else check rules if pass create new session else block
incoming packets
lookup socket(s) in NAT table if found block or perform NAT perform L3 forwarding else block packet
Y(J)S L3f
95