Discover millions of ebooks, audiobooks, and so much more with a free trial

Only $11.99/month after trial. Cancel anytime.

Foundations of Python Network Programming
Foundations of Python Network Programming
Foundations of Python Network Programming
Ebook800 pages9 hours

Foundations of Python Network Programming

Rating: 3.5 out of 5 stars

3.5/5

()

Read preview

About this ebook

Foundations of Python Network Programming, Third Edition, covers all of the classic topics found in the second edition of this book, including network protocols, network data and errors, email, server architecture, and HTTP and web applications, plus updates for Python 3.

Some of the new topics in this edition include:

• Extensive coverage of the updated SSL support in Python 3

• How to write your own asynchronous I/O loop.

• An overview of the "asyncio" framework that comes with Python 3.4.

• How the Flask web framework connects URLs to your Python code.

• How cross-site scripting and cross-site request forgery can be used to attack your web site, and how to protect against them.

• How a full-stack web framework like Django can automate the round trip from your database to the screen and back.

If you're a Python programmer who needs a deep understanding of how to use Python for network-related tasks and applications, this is the book for you. From web application developers, to systems integrators, to system administrators—this book has everything that you need to know.

LanguageEnglish
PublisherApress
Release dateOct 20, 2014
ISBN9781430258551
Foundations of Python Network Programming

Related to Foundations of Python Network Programming

Related ebooks

Programming For You

View More

Related articles

Reviews for Foundations of Python Network Programming

Rating: 3.5714285 out of 5 stars
3.5/5

7 ratings0 reviews

What did you think?

Tap to rate

Review must be at least 10 words

    Book preview

    Foundations of Python Network Programming - Brandon Rhodes

    © Brandon Rhodes and John Goerzen 2014

    Brandon Rhodes and John GoerzenFoundations of Python Network Programming10.1007/978-1-4302-5855-1_2

    2. UDP

    Brandon Rhodes¹  and John Goerzen¹ 

    (1)

    GA, United States

    The previous chapter described modern network hardware as supporting the transmission of short messages called packets, which are usually no larger than a few thousand bytes. How can these tiny individual messages be combined to form the conversations that take place between a web browser and server or between an e-mail client and your ISP’s mail server?

    The IP protocol is responsible only for attempting to deliver each packet to the correct machine. Two additional features are usually necessary if separate applications are to maintain conversations, and it is the job of the protocols built atop IP to provide these features.

    The many packets traveling between two hosts need to be labeled so that the web packets can be distinguished from e-mail packets and so that both can be separated from any other network conversations in which the machine is engaged. This is called .

    All of the damage that can occur to a stream of packets traveling separately from one host to another needs to be repaired. Missing packets need to be retransmitted until they arrive. Packets that arrive out of order need to be reassembled into the correct order. Finally, duplicate packets need to be discarded so that no information in the data stream gets repeated. This is known as providing a .

    This book dedicates a chapter to each of the two major protocols used atop IP.

    The first, the User Datagram Protocol (UDP), is documented in this chapter. It solves only the first of the two problems outlined previously. It provides port numbers, as described in the next section, so that the packets destined for different services on a single machine can be properly demultiplexed. Nevertheless, network programs using UDP must still fend for themselves when it comes to packet loss, duplication, and ordering.

    The second, the Transmission Control Protocol (TCP), solves both problems. It both incorporates port numbers using the same rules as UDP and offers ordered and reliable data streams that hide from applications the fact that the continuous stream of data has in fact been chopped into packets and then reassembled at the other end. You will learn about using TCP in Chapter 3.

    Note that a few rare and specialized applications, such as multimedia being shared among all hosts on a LAN, opt for neither protocol and choose instead to create an entirely new IP-based protocol that sits alongside TCP and UDP as a new way of having conversations across an IP network. This not only is unusual but, being a low-level operation, is unlikely to be written in Python, so you will not explore protocol engineering in this book. The closest approach made to raw packet construction atop IP in this book is the Building and Examining Packets section near the end of Chapter 1, which builds raw ICMP packets and receives an ICMP reply.

    I should admit up front that you are unlikely to use UDP in any of your own applications. If you think UDP is a great fit for your application, you might want to look into message queues (see Chapter 8). Nonetheless, the exposure that UDP gives you to raw packet multiplexing is an important step to take before you can be ready to learn about TCP in Chapter 3.

    Port Numbers

    The problem of distinguishing among many signals that are sharing the same channel is a general one, in both computer networking and electromagnetic signal theory. A solution that allows several conversations to share a medium or mechanism is known as a scheme. It was famously discovered that radio signals can be separated from one another by using distinct frequencies. In the digital realm of packets, the designers of UDP chose to distinguish different conversations using the rough-and-ready technique of labeling each and every UDP packet with a pair of unsigned 16-bit port numbers in the range of 0 to 65,536. The identifies the particular process or program that sent the packet from the source machine, while the specifies the application at the destination IP address to which the communication should be delivered.

    At the IP network layer, all that is visible are packets winging their way toward a particular host.

    Source IP ® Destination IP

    But the network stacks of the two communicating machines—which must, after all, corral and wrangle so many separate applications that might be talking—see the conversation as much more specifically being between an IP address and port number pair on each machine.

    Source (IP : port number) ® Destination (IP : port number)

    The incoming packets belonging to a particular conversation will always have the same four values for these coordinates, and the replies going the other way will simply have the two IP numbers and two port numbers swapped in their source and destination fields.

    To make this idea concrete, imagine you set up a DNS server (Chapter 4) on one of your machines with the IP address 192.168.1.9. To allow other computers to find the service, the server will ask the operating system for permission to receive packets arriving at the UDP port with the standard DNS port number: port 53. Assuming that a process is not already running that has claimed that port number, the DNS server will be granted that port.

    Next, imagine that a client machine with the IP address 192.168.1.30 wants to issue a query to the server. It will craft a request in memory and then ask the operating system to send that block of data as a UDP packet. Since there will need to be some way to identify the client when the packet returns and since the client has not explicitly requested a port number, the operating system assigns it a random one—say, port 44137.

    The packet will therefore wing its way toward port 53 with addresses that look like this:

    Source (192.168.1.30:44137) ® Destination (192.168.1.9:53)

    Once it has formulated a response, the DNS server will ask the operating system to send a UDP packet in response that has these two addresses flipped around the other way so that the reply returns directly to the sender.

    Source (192.168.1.9:53) ® Destination (192.168.1.30:44137)

    Thus, the UDP scheme is really quite simple; only an IP address and port are necessary to direct a packet to its destination.

    But how can a client program learn the port number to which it should connect? There are three general approaches.

    Convention: The Internet Assigned Numbers Authority (IANA) has designated many port numbers as the official, well-known ports for specific services. That is why DNS was expected at UDP port 53 in the foregoing example.

    Automatic configuration: Often the IP addresses of critical services such as DNS are learned when a computer first connects to a network, using a protocol such as DHCP. By combining these IP addresses with well-known port numbers, programs can reach these essential services.

    Manual configuration: For all of the situations that are not covered by the previous two cases, manual intervention by an administrator or user will have to deliver an IP address or the corresponding hostname of a service. Manual configuration in this sense is happening, for example, every time you type a web server name into your web browser.

    When making decisions about defining port numbers, such as 53 for DNS, IANA thinks of them as falling into three ranges—and this applies to both UDP and TCP port numbers.

    Well-known ports (0–1023) are for the most important and widely used services. On many Unix-like operating systems, normal user programs cannot listen on these ports. In the old days, this prevented troublesome undergraduates on multiuser university machines from running programs that masqueraded as important system services. Today the same caution applies when hosting companies hand out command-line Linux accounts.

    Registered ports (1024–49151) are not usually treated as special by operating systems—any user can write a program that grabs port 5432 and pretends to be a PostgreSQL database, for example—but they can be registered by IANA for specific services, and IANA recommends you avoid using them for anything but their assigned service.

    The remaining port numbers (49152–65535) are free for any use. They, as you will see, are the pool on which modern operating systems draw in order to generate arbitrary port numbers when a client does not care what port it is assigned for its outgoing connection.

    When you craft programs that accept port numbers from user input such as the command line or configuration files, it is friendly to allow not just numeric port numbers but human-readable names for well-known ports. These names are standard, and they are available through the function inside Python’s standard socket module. If you want to ask the port for the Domain Name Service, you can find out this way:

    >>> import socket

    >>> socket.getservbyname('domain')

    53

    As you will see in Chapter 4, port names can also be decoded by the more complicated function, which is also provided by the socket module.

    The database of well-known service names and port numbers is usually kept in the file /etc/services on Linux and Mac OS X machines, which you can peruse at your leisure. The first few pages of the file, in particular, are littered with ancient protocols that still have reserved numbers despite not having had an actual packet addressed to them anywhere in the world for many years. An up-to-date (and typically much more extensive) copy is also maintained online by IANA at www.iana.org/assignments/port-numbers .

    Sockets

    Rather than trying to invent its own API for network programming, Python made an interesting decision. At bottom, Python’s Standard Library simply provides an object-based interface to all of the normal, gritty, low-level operating system calls that are normally used to accomplish networking tasks on POSIX-compliant operating systems. The calls even have the same names as the underlying operations they wrap. Python’s willingness to expose the traditional system calls that everyone already understood before it came on the scene is one of the reasons that Python came as such a breath of fresh air to those of us toiling in lower-level languages in the early 1990s. Finally, a higher-level language had arrived that let us make low-level operating system calls when we needed them, without insisting that we use an awkward, underpowered but ostensibly prettier language-specific API instead. It was much easier to remember a single set of calls that worked in both C and Python.

    The underlying system calls for networking, on both Windows and POSIX systems (like Linux and Mac OS X), center around the idea of a communications endpoint called a socket. The operating system uses integers to identify sockets, but Python instead returns a more convenient socket.socket object to your Python code. It remembers the integer internally (you can call its method to peek at it) and uses it automatically every time you call one of its methods to request that a system call be run on the socket.

    Note

    On POSIX systems, the fileno() integer that identifies a socket is also a file descriptor drawn from the pool of integers representing open files. You might run across code that, assuming a POSIX environment, fetches this integer and then uses it to perform non-networking calls like os.read() and os.write() on the file descriptor to do filelike things with what is actually a network communications endpoint. However, because the code in this book is designed to work on Windows as well, you will perform only true socket operations on your sockets.

    What do sockets look like in operation? Take a look at Listing 2-1, which shows a simple UDP server and client. You can see already that it makes only one Python Standard Library call, to the function socket.socket(), and that all of the other calls are to the methods of the socket object it returns.

    Listing 2-1. UDP Server and Client on the Loopback Interface

    #!/usr/bin/env python3

    # Foundations of Python Network Programming, Third Edition

    # https://github.com/brandon-rhodes/fopnp/blob/m/py3/chapter02/udp_local.py

    # UDP client and server on localhost

    import argparse, socket

    from datetime import datetime

    MAX_BYTES = 65535

    def server(port):

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    sock.bind(('127.0.0.1', port))

    print('Listening at {}'.format(sock.getsockname()))

    while True:

    data, address = sock.recvfrom(MAX_BYTES)

    text = data.decode('ascii')

    print('The client at {} says {!r}'.format(address, text))

    text = 'Your data was {} bytes long'.format(len(data))

    data = text.encode('ascii')

    sock.sendto(data, address)

    def client(port):

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    text = 'The time is {}'.format(datetime.now())

    data = text.encode('ascii')

    sock.sendto(data, ('127.0.0.1', port))

    print('The OS assigned me the address {}'.format(sock.getsockname()))

    data, address = sock.recvfrom(MAX_BYTES)  # Danger!

    text = data.decode('ascii')

    print('The server {} replied {!r}'.format(address, text))

    if __name__ == '__main__':

    choices = {'client': client, 'server': server}

    parser = argparse.ArgumentParser(description='Send and receive UDP locally')

    parser.add_argument('role', choices=choices, help='which role to play')

    parser.add_argument('-p', metavar='PORT', type=int, default=1060,

    help='UDP port (default 1060)')

    args = parser.parse_args()

    function = choices[args.role]

    function(args.p)

    You should be able to run this script right on your own computer, even if you are not currently in the range of a network, because both the server and the client use only the localhost IP address, which should be available whether you are connected to a real network or not. Try starting the server first.

    $ python udp_local.py server

    Listening at ('127.0.0.1', 1060)

    After printing this line of output, the server waits for an incoming message.

    In the source code, you can see that it took three steps for the server to get up and running.

    It first created a plain socket with the socket() call. This new socket is not yet bound to an IP address or port number, is not yet connected to anything, and will raise an exception if you attempt to use it to communicate. However, the socket is, at least, marked as being of a particular type: its family is AF_INET, the Internet family of protocols, and it is of the SOCK_DGRAM datagram type, which means it will use UDP on an IP network. Note that the term datagram (and not packet) is the official term for an application-level block of transmitted data because the operating system networking stack does not guarantee that a single packet on the wire will actually represent a single datagram. (See the following section, where I do insist on a one-to-one correspondence between datagrams and packets so that you can measure the maximum transmission unit [MTU].)

    Next, this simple server uses the command to request a UDP network address, which you can see is a simple Python tuple combining a str IP address (a hostname, you will see later, is also acceptable) and an int UDP port number. This step could fail with an exception if another program is already using that UDP port and the server script cannot obtain it. Try running another copy of the server—you will see that it complains as follows:

    $ python udp_local.py server

    Traceback (most recent call last):

    ...

    OSError: [Errno 98] Address already in use

    Of course, there is a small chance that you received this exception the first time you ran the server because UDP port 1060 is already in use on your machine. It happens that I found myself in a bit of a bind when choosing the port number for this first example. It had to be above 1023, of course, or you could not have run the script without being a system administrator—and, while I do like my little example scripts, I really do not want to encourage anyone to run them as the system administrator! I could have let the operating system choose the port number (as I did for the client, as you will see in a moment), had the server print it out, and then made you type it into the client as one of its command-line arguments. However, then I would not have gotten to show you the syntax for asking for a particular port number yourself. Finally, I considered using a port from the high-numbered ephemeral range previously described, but those are precisely the ports that might randomly already be in use by some other application on your machine, such as your web browser or SSH client.

    So, my only option seemed to be a port from the reserved-but-not-well-known range above 1023. I glanced over the list and made the gamble that you, gentle reader, are not running SAP BusinessObjects Polestar on the laptop or desktop or server where you are running my Python scripts. If you are, then try giving the server a –p option to select a different port number.

    Note that the Python program can always use a socket’s method to retrieve a tuple that contains the current IP address and port to which the socket is bound.

    Once the socket has been bound successfully, the server is ready to start receiving requests! It enters a loop and repeatedly runs recvfrom(), telling the routine that it will happily receive messages up to a maximum length of 65,535 bytes—a value that happens to be the greatest length that a UDP datagram can possibly have, so that you will always be shown the full content of each datagram. Until you send a message with a client, your recvfrom() call will wait forever.

    Once a datagram arrives, recvfrom() will return the address of the client that has sent you a datagram as well as the datagram’s contents as bytes. Using Python’s ability to translate bytes directly to strings, you print the message to the console and then return a reply datagram to the client.

    So, let’s start up our client and examine the result. The client code is also shown in Listing 2-1.

    (I hope, by the way, that it is not confusing that this example—like some of the others in the book—combines the server and client code into a single listing, selected by command-line arguments. I often prefer this style since it keeps server and client logic close to each other on the page, and it makes it easier to see which snippets of server code go with which snippets of client code.)

    While the server is still running, open another command window on your system, and try running the client twice in a row like this:

    $ python udp_local.py client

    The OS assigned me the address ('0.0.0.0', 46056)

    The server ('127.0.0.1', 1060) replied 'Your data was 46 bytes long'

    $ python udp_local.py client

    The OS assigned me the address ('0.0.0.0', 39288)

    The server ('127.0.0.1', 1060) replied 'Your data was 46 bytes long'

    Over in the server’s command window, you should see it reporting each connection that it serves.

    The client at ('127.0.0.1', 46056) says 'The time is 2014-06-05 10:34:53.448338'

    The client at ('127.0.0.1', 39288) says 'The time is 2014-06-05 10:34:54.065836'

    Although the client code is slightly simpler than that of the server—there are only three lines of networking code—it introduces two new concepts.

    The client call to sendto() provides both a message and a destination address. This simple call is all that is necessary to send a datagram winging its way toward the server! But, of course, you need an IP address and port number, on the client end, if you are going to be communicating. So, the operating system assigns one automatically, as you can see from the output of the call to getsockname(). As promised, the client port numbers are each from the IANA range for ephemeral port numbers. (At least they are here, on my laptop, under Linux; under a different operating system, you might get a different result.)

    When you are done with the server, you can kill it by pressing Ctrl+C in the terminal where it is running.

    Promiscuous Clients and Unwelcome Replies

    The client program in Listing 2-1 is actually dangerous! If you review its source code, you will see that although recvfrom() returns the address of the incoming datagram, the code never checks the source address of the datagram it receives to verify that it is actually a reply from the server.

    You can see this problem by delaying the server’s reply and seeing whether someone else can send a response that this naïve client will trust. On a less capable operating system such as Windows, you will probably have to add a long time.sleep() call in between the server’s receive and send to simulate a server that takes a long time to answer. On Mac OS X and Linux, however, you can much more simply suspend the server with Ctrl+Z once it has set up its socket to simulate a server that takes a long time to reply.

    So, start up a fresh server but then suspend it using Ctrl+Z.

    $ python udp_local.py server

    Listening at ('127.0.0.1', 1060)

    ^Z

    [1]  + 9370 suspended  python udp_local.py server

    $

    If you now run the client, it will send its datagram and then hang, waiting to receive a reply.

    $ python udp_local.py client

    The OS assigned me the address ('0.0.0.0', 39692)

    Assume that you are now an attacker who wants to forge a response from the server by jumping in and sending your datagram before the server has a chance to send its own reply. Since the client has told the operating system that it is willing to receive any datagram whatsoever and is doing no sanity checks against the result, it should trust that your fake reply in fact originated at the server. You can send such a packet using a quick session at the Python prompt.

    $ python3

    Python 3.4.0 (default, Apr 11 2014, 13:05:18)

    [GCC 4.8.2] on linux

    Type help, copyright, credits or license for more information.

    >>> import socket

    >>> sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    >>> sock.sendto('FAKE'.encode('ascii'), ('127.0.0.1', 39692))

    4

    The client will immediately exit and happily interpret this third-party reply as being the response for which it was waiting.

    The server ('127.0.0.1', 37821) replied 'FAKE'

    You can kill the server now by typing fg to unfreeze it and let it keep running (it will now see the client packet that has been queued and waiting for it and will send its reply to the now-closed client socket). Press Ctrl+C as usual to kill it.

    Note that the client is vulnerable to anyone who can address a UDP packet to it. This is not an instance where a man-in-the-middle attacker has control of the network and can forge packets from false addresses, a situation that can be protected against only by using encryption (see Chapter 6). Rather, an unprivileged sender operating completely within the rules and sending a packet with a legitimate return address nevertheless has its data accepted.

    A listening network client that will accept or record every single packet that it sees, without regard for whether the packet is correctly addressed, is known technically as a promiscuous client. Sometimes we write these deliberately, as when we are doing network monitoring and want to see all of the packets arriving at an interface. In this case, however, promiscuity is a problem.

    Only good, well-written encryption should really convince your code that it has talked to the right server. Short of that, there are two quick checks you can do. First, design or use protocols that include a unique identifier or request ID in the request that gets repeated in the reply. If the reply contains the ID you are looking for, then—so long as the range of IDs is large enough that someone could not simply be quickly flooding you with thousands or millions of packets containing every possible ID—someone who saw your request must at least have composed it. Second, either check the address of the reply packet against the address that you sent it to (remember that tuples in Python can simply be == compared) or use connect() to forbid other addresses from sending you packets. See the following sections Connecting UDP Sockets and Request IDs for more details.

    Unreliability, Backoff, Blocking, and Timeouts

    Because the client and server in the previous sections were both running on the same machine and talking through its loopback interface—which is not a physical network card that could experience a signaling glitch—there was no real way that packets could get lost, and so you did not actually see any of the inconvenience of UDP in Listing 2-1. How does code become more complicated when packets can really be lost?

    Take a look at Listing 2-2. Instead of always answering client requests, this server randomly chooses to answer only half of the requests coming in from clients, which will let you see how to build reliability into your client code without waiting what might be hours for a real dropped packet to occur on your network!

    Listing 2-2. UDP Server and Client on Different Machines

    #!/usr/bin/env python3

    # Foundations of Python Network Programming, Third

    Enjoying the preview?
    Page 1 of 1