
Hardware Implementation of Video Streaming

By Jorgen Peddersen

School of Information Technology and Electrical Engineering,
The University of Queensland

Submitted for the Degree of Bachelor of Engineering (Honours) in the Computer Systems Engineering Stream
October 2001

62 Macalister Street
Carina Heights, Q 4152
Tel. (07) 3398 8424

October 19, 2001

The Head
School of Information Technology and Electrical Engineering
The University of Queensland
St Lucia, Q 4072

Dear Professor Kaplan,

In accordance with the requirements of the degree of Bachelor of Engineering (Honours) in the Computer Systems Engineering stream, I present the following thesis entitled "Hardware Implementation of Video Streaming". This work was performed under the supervision of Dr. Peter Sutton. I declare that the work submitted in this thesis is my own, except as acknowledged in the text and endnotes, and has not been previously submitted for a degree at the University of Queensland or any other institution.

Yours sincerely,

Jorgen Peddersen


Abstract
This thesis describes a pure hardware implementation of simple real-time video streaming using an FPGA (Field Programmable Gate Array). Video streaming is presently performed using mainly software-based techniques on dedicated computers, as designing pure hardware solutions can be slower and harder to debug. The advantage of hardware designs is cost: one chip could be mass-produced to perform simple video streaming tasks and used in areas such as security video cameras and other live video feeds.

The implementation discussed herein uses the XSV-300 FPGA board designed by XESS Corporation to implement a real-time video streaming system. The board provides a simple video decoding chip, a network interface chip and a Xilinx XCV-300 FPGA. The FPGA is configured with code designed in VHDL that controls the chips involved to implement a robust video streaming design. The resulting implementation allows streaming of any RCA or S-Video data source, such as TV, DVD or game consoles, into UDP packets that can be transmitted over the network to a destination host.

The final result is a complete streaming design that does not require a PC. This design has been fully tested and performs well. At its present stage, the image quality and the network bandwidth required for the design are not a match for software-based techniques, although with some future work, the design could match these more expensive solutions in quality and speed.


Acknowledgments
This thesis was the product of many hours of work. Long implementations that don't end up working can be frustrating as well as interesting to debug. The final design could not have been completed without the help of many people; the author therefore wishes to thank:

Dr. Peter Sutton, for his guidance and patience during many 5-minute meetings.
Mum and Dad, for being so supportive and understanding.
Ashley Partis, for proofreading and for co-writing the original VHDL IP stack.
James Brennan, for writing the RAM code and helping with formatting.
Dave Vanden Bout, for being a technical support genius who can actually solve problems.
Alex Song, for some brilliant inspiration.
Simon Leung, for proofreading the thesis.
And last, but not least, Sri Parameswaran, who inspired me to choose Computer Systems Engineering.

Contents
Abstract ............................................................ iv
Acknowledgments ..................................................... v
Contents ............................................................ vi
List of Tables ...................................................... viii
List of Figures ..................................................... ix

CHAPTER 1 INTRODUCTION .............................................. 1
1.1 Introduction to Video Streaming ................................. 1
1.2 The Problem ..................................................... 2
1.3 FPGA Solution ................................................... 2
1.4 Overview ........................................................ 3

CHAPTER 2 REVIEW OF PREVIOUS WORK .............................................. 5


2.1 Previous Work with Board ........................................ 5
2.1.1 Video Decoder ................................................. 5
2.1.2 Network Stack ................................................. 5
2.2 Video Streaming Formats ......................................... 6
2.3 Other Work ...................................................... 7
2.3.1 Xilinx/MidStream Server ....................................... 7
2.3.2 Axis Web Cameras and Servers .................................. 8
2.3.3 JPEG on FPGA .................................................. 9
2.3.4 Ethernet Intellectual Properties .............................. 9
2.4 Summary ......................................................... 9

CHAPTER 3 PROBLEM DEFINITION........................................................ 11


3.1 General Problem ................................................. 11
3.2 Video Quality ................................................... 11
3.3 Network Issues .................................................. 12
3.3.1 UDP or TCP? ................................................... 12
3.3.2 Packet Format ................................................. 13
3.4 PC Program ...................................................... 14
3.5 Summary ......................................................... 14

CHAPTER 4 HARDWARE ENVIRONMENT ............................................... 15


4.1 Description of Board ............................................ 15
4.1.1 FPGA .......................................................... 15
4.1.2 CPLD .......................................................... 16
4.1.3 Video Decoder Chip ............................................ 16
4.1.4 Ethernet Port ................................................. 17
4.1.5 SRAM .......................................................... 17
4.1.6 Flash RAM ..................................................... 17
4.2 VHDL / Foundation ............................................... 18
4.2.1 VHDL .......................................................... 18
4.2.2 The Foundation Series ......................................... 18
4.3 Summary ......................................................... 19

CHAPTER 5 VHDL IMPLEMENTATION ................................................. 20


5.1 Video Decoding .................................................. 20
5.1.1 Initialisation ................................................ 20
5.1.2 RAM Format .................................................... 21
5.2 Networking ...................................................... 21
5.2.1 Removal of IP Re-assembly ..................................... 21
5.2.2 Fixing Ethernet ............................................... 21
5.2.3 ICMP .......................................................... 22
5.2.4 RAM Arbitration ............................................... 22
5.2.5 PC SRAM Viewer ................................................ 23
5.3 Image Format .................................................... 23
5.4 Video to Network Interface ...................................... 23
5.4.1 Video-In to UDP Packet Converter .............................. 24
5.4.2 UDP Connection Handler ........................................ 25
5.5 Complete FPGA Design ............................................ 26
5.6 CPLD Alteration ................................................. 27
5.7 Summary ......................................................... 28

CHAPTER 6 PC IMPLEMENTATION........................................................ 29
6.1 Programming I.D.E. .............................................. 29
6.2 OpenPTC ......................................................... 29
6.3 Winsock Sockets ................................................. 30
6.3.1 Microsoft Foundation Classes .................................. 31
6.3.2 Blocking Sockets .............................................. 31
6.3.3 Non-blocking Sockets .......................................... 31
6.4 Protocol Definition ............................................. 31
6.5 Graphical User Interface ........................................ 32
6.6 Summary ......................................................... 33

CHAPTER 7 DESIGN EVALUATION ......................................................... 34


7.1 Streaming Results ............................................... 34
7.1.1 Image Quality ................................................. 34
7.1.2 Network Issues ................................................ 34
7.2 Comparisons ..................................................... 35
7.3 Process Evaluation .............................................. 36
7.4 Summary ......................................................... 36

CHAPTER 8 FUTURE DEVELOPMENTS ................................................... 38


8.1 Image Format Changes ............................................ 38
8.2 100Mb/s Upgrade ................................................. 38
8.2.1 16-bit RAM Functionality ...................................... 39
8.2.2 CRC Alteration ................................................ 39
8.3 Fragment the UDP Packet ......................................... 40
8.4 Image Compression ............................................... 40
8.5 Audio Streaming ................................................. 41
8.6 Summary ......................................................... 41

CHAPTER 9 CONCLUSION ................................................ 42
References .......................................................... 43
APPENDIX A IMPLEMENTATION DATA ...................................... A-1
APPENDIX B PARTIAL VHDL SOURCE CODE ................................. B-1
APPENDIX C PARTIAL PC SOURCE CODE ................................... C-1

List of Tables
Table 1: TCP header ................................................. 13
Table 2: UDP header ................................................. 13
Table 3: Desired expectations and limitations ....................... 14
Table 4: RAM memory map ............................................. 17
Table 5: Some alternate image formats ............................... 38


List of Figures
Figure 1: MidStream's streaming server .............................. 7
Figure 2: Axis 2400 Video Server .................................... 8
Figure 3: Axis 2100 Web Camera ...................................... 8
Figure 4: XSV-300 board and block diagram ........................... 15
Figure 5: Example of image quality .................................. 24
Figure 6: Block diagram of final design ............................. 27
Figure 7: OpenPTC demonstrations .................................... 30
Figure 8: GUI for PC program ........................................ 32


Chapter 1

Introduction

1.1 Introduction to Video Streaming

Video streaming has become one of the most popular uses for the Internet in recent times. Many Internet sites are dedicated to providing this service and it is used for many different applications. The TV program Big Brother demonstrates how popular video streaming has become, allowing Internet users to watch people in an enclosed house 24 hours a day, 7 days a week. The demand for this type of media is growing and new technologies must be developed to match this demand.

Initially, video media was played by downloading a movie file and displaying it on the user's computer after the download had completed. This method is very slow, as the user had to wait for long periods while the video downloaded before they could play it.

The video streaming phenomenon began when real-time streaming was introduced. Using a free commercial product such as RealPlayer, streaming could now be accomplished in real-time. New compression technology allowed the video to be viewed as it was downloaded to its destination. With this method there are only delays while waiting for the first few frames to be transmitted; further frames are displayed as they arrive, as if the video were playing in real-time. Unfortunately, the speed of downloading comes at the cost of some image quality.

Real-time video streaming technology was taken further to introduce live video streaming. Live streaming is the real-time streaming of live images from an input source. It allows a video source such as a webcam or TV to be displayed as a real-time stream on the destination computer. Many uses for live video streaming have been developed, including applications like video conferencing.

Real-time and live streaming require massive amounts of resources to operate at high resolution, but anyone can do it at home with the right hardware and software. Even so, there is much more that can be done in the field. As technology advances, so will the quality and availability of streaming media; for example, TV channels may in the future be accessed through the Internet, and this method could be used for video phones and other similar applications.


1.2 The Problem

Currently, almost all streaming applications are performed by computers with complicated, expensive hardware requirements. There are many computers around the world dedicated purely to video and audio streaming. Streaming of live video also requires very fast machines to compress and transmit the massive amounts of data involved.

The problem addressed in this thesis was to design a hardware system capable of streaming live video data through a standard Ethernet network. This is a relatively new direction for video streaming, and has only been partially explored before. The final design is not expected to fully match current software techniques, but to demonstrate that streaming can be accomplished in hardware.

There are many advantages to implementing a purely hardware system for video streaming. Cost and size can be reduced significantly, while improving the quality and speed of transmission of the video stream itself. Unfortunately, hardware methods take longer to design and are much more difficult to debug.

The ideal product is a system that can be connected to a network port and a video source (such as a TV, VCR, DVD or game console), allowing the source's output to be viewed from anywhere on the network. The device should not require any computer at the video source, making it a purely hardware solution. Possible applications include letting someone view a video tape externally, watching TV on your computer, or remote viewing of a security camera, among others.

1.3 FPGA Solution

One solution to the problems of debugging and testing of hardware is the FPGA (Field Programmable Gate Array). FPGAs are hardware chips that are reconfigurable, allowing many different functions to be performed with one device. They are useful for development purposes, as partial products can be tested in hardware to aid simulation as the complexity increases.


The FPGA is a good choice for a live video streaming design, as many designs can be implemented on one piece of hardware, eliminating further hardware costs. FPGAs have been around for some time, but they have only recently reached an adequate size (measured in equivalent gates) to build anything complicated. As their size increases, new uses are being found for them, from neural networks to advanced digital signal processing.

This thesis discusses one application of FPGAs to produce a live video streaming solution in pure hardware. For this purpose, the XSV-300 FPGA development board from XESS Corporation [1] was provided. This board includes on-board hardware that makes video streaming possible without requiring any external hardware to be interfaced. This type of solution to the live streaming problem has not been attempted before, so it is a very useful application for the FPGA and board.

1.4 Overview

The remainder of this thesis explores some earlier work performed in the field of video streaming and FPGAs, and discusses a complete, working solution to the general problem presented in this chapter. The design and implementation of each of the areas required to achieve a working video streaming design are described separately, and comparisons to existing solutions are made.

Chapter 2 discusses previous work in the field of video streaming and network applications, in both software and hardware. The state of current hardware implementations of video streaming is also assessed in this chapter.

Chapter 3 defines a general solution to the problem discussed in section 1.2 using the method described in section 1.3. The problem is split into its major tasks, and each task's desired specifications for the final design are discussed.

Based on the specifications determined above, Chapters 4, 5 and 6 describe the implementation of each of the major tasks in detail. Chapter 4 explains the programming environment used, including the board and tools. Chapter 5 discusses the implementation of the live streaming design in VHDL. Chapter 6 gives an overview of the decisions made for the PC program that displays the resulting stream.

Chapter 7 evaluates the final design in areas such as performance, quality of streaming and stability. The design described in Chapters 4, 5 and 6 is evaluated and compared to other systems in use.

Chapter 8 discusses future work that could be performed to make the implementation more complete. Possible improvements are defined, along with the method needed to achieve them.

Finally, Chapter 9 presents a conclusion to the thesis, summing up the major points and discussing the results.

Chapter 2

Review of Previous Work

This chapter describes some of the work done in the fields of video streaming and network architectures on FPGAs. Advantages and disadvantages are discussed, as well as their relevance to the thesis.

2.1 Previous Work with Board

Some VHDL implementations for components of the XSV-300 board mentioned in section 1.3 were designed by the author and two other students before the commencement of the thesis. These designs included a program to create digital images from the standard video cables used for TV, DVD etc., and a network stack design that includes the Ethernet, ARP (Address Resolution Protocol), IP (Internet Protocol), ICMP (Internet Control Message Protocol) and partial UDP (User Datagram Protocol) layers of the TCP (Transmission Control Protocol)/IP protocol suite. These designs are documented on the supervisor's web page [2]. A brief description of these designs follows.

2.1.1 Video Decoder

The video decoder project utilises the video decoder chip on the XSV-300 board to convert images from RCA or S-Video format into a digital format. These formats are how TV signals are typically transmitted over short cables. The project stores images into on-board SRAM (Static Random Access Memory), from which they are read by a design that displays them on a VGA monitor. With some minor editing, the project can be used to convert the images into a format valid for network transmission.

2.1.2 Network Stack

The network stack design contains a partial implementation of the TCP/IP protocol suite on the board. A network stack consists of multiple protocols that exist in theoretical layers. Each layer provides services to the layers above, and utilises the layers below for transmission purposes. The protocols implemented in this design include Ethernet, ARP, IP, partial ICMP (request/reply support only) and a UDP receive application. This allows the board to be pinged from any computer on the network. It is also possible to add further transport layers (such as UDP transmit and TCP) to the design if required.

This project also included a PC SRAM viewer for troubleshooting. At any time, the PC can take a snapshot of the entire contents of RAM, downloading it into a file on the PC. This file can be viewed in a hex editor for troubleshooting purposes. This feature is an aid for designing new protocols to add to the stack, as the data can be checked to make sure that it is stored correctly.

2.2 Video Streaming Formats

Video streaming can occur in many different formats. Many of these formats employ some sort of compression algorithm to lower the amount of data transmitted by the stream while not greatly affecting its quality. Most commercial streaming software (e.g. RealPlayer, QuickTime, Media Player etc.) uses MPEG (Moving Picture Experts Group) or Motion-JPEG (JPEG: Joint Photographic Experts Group) as the streaming format.

MPEG is a complicated format that is very hard to encode in real-time, because it uses future frames as part of the encoding scheme. MPEG can be used for real-time playback, with the information decoded for future frames being stored for later use.

Motion-JPEG is a different type of streaming. It involves transmitting complete still images, encoded separately and sent to be displayed one by one. In the Motion-JPEG scheme, images are encoded using the JPEG algorithm. Many applications use this type of scheme as it is easier to encode and decode each image. Also, if an image is lost, it won't affect a large number of frames in the stream, whereas methods like MPEG may.

Many other streaming types are also possible that may be simpler or have better compression, with a lower quality image. The simplest form is to send an image without any compression taking place, which is termed raw format.


2.3 Other Work

This section describes other work being undertaken in the fields of video streaming and network interfaces on FPGAs.

2.3.1 Xilinx/MidStream Server

On October 1st, 2001, Xilinx announced that MidStream Technologies had used Xilinx Virtex-II Platform FPGAs to develop "the world's first true dedicated streaming server that redefines performance in scalability, reliability and manageability while dramatically lowering the cost of building and maintaining IP-based media delivery infrastructures". The server is the first multi-gigabit streaming server capable of serving all popular formats (Windows Media, RealNetworks, QuickTime and MPEG-2) at multiple bit rates simultaneously [3].

Figure 1: MidStream's streaming server (figure copied from MidStream's home page [4])

This server, pictured in Figure 1, was released three weeks before the completion of the thesis. It demonstrates how hardware-based methods can match and greatly surpass software-based designs. The MidStream server allows a high number of connections for data transfer, and is much faster than its software counterparts. This device streams purely from hardware, and does not require a computer to operate. Although the current device does not support live video streaming from RCA or S-Video inputs, it is likely that a future product will. Unfortunately, no white papers had been produced at the time of writing of this thesis, but product information can be found at [4].

2.3.2 Axis Web Cameras and Servers

Another approach to hardware-like systems is the use of embedded systems to provide web server capabilities. Embedded systems involve using a small microprocessor inside a product, running software to control the hardware. This method is employed in a group of products manufactured by Axis Communications [5]. Axis provides two types of product for video streaming: video servers and web cameras. The servers can convert live video data into information on the network with high quality images and refresh rates of up to 30 frames/second. One of these servers, the 2400, is pictured in Figure 2. It can support up to four video inputs and can be connected via modem or Ethernet network.

Figure 2: Axis 2400 Video Server

The web cameras are video cameras with a web server on board. These systems are usually not as powerful as the servers, typically outputting 10 frames/second. The 2100 camera pictured in Figure 3 is an example of these. Some of the newer cameras, such as the 2120, can match the refresh rates of the servers.

Figure 3: Axis 2100 Web Camera

The only problem with these designs is the cost. Placing an entire web server inside the device requires very complicated code and hardware. The servers are the most expensive, with the Axis 2400 costing $A4,305 (all costs were provided by Webcam Solutions [6]). The Axis 2100 camera is the cheapest product at $A1,362, but the high quality Axis 2120 costs $A3,212.

2.3.3 JPEG on FPGA

Several Motion-JPEG compression designs have been produced for FPGAs, such as the Motion-JPEG CODEC (COder/DECoder) from 4i2i Communications [7]. This performs high-level Motion-JPEG encoding and decoding. Unfortunately, these designs fill an XCV-600 FPGA, so there is no chance of fitting one into an XCV-300, especially taking into account the size of the rest of the design that would need to be included.

2.3.4 Ethernet Intellectual Properties

As an alternative to the Ethernet network interface mentioned in section 2.1.2, it is possible to use one of the available 10/100 Mbit Intellectual Property cores. These cores can operate at 10 Mbps and/or 100 Mbps, so would provide faster functionality than the other interface. One of these is the Paxonet CS-1100 Fast 10/100 Ethernet Media Access Controller. This design would take only a fraction of the FPGA's available space, so it is not very large. Unfortunately this core is not free, and a fee is required to use it in any design. An evaluation version of this core was requested, but was never provided.

A similar core implementing 10/100 Mbit Ethernet can be found through OpenCores [8]. This is an almost complete design written in Verilog. The code does not seem as compact as the Paxonet design, but it is free to use.

2.4 Summary

The concept of streaming video in hardware is a new one that has just begun to be explored. The MidStream server claims to be the first hardware streaming server on the market, and it was released at about the same time as this thesis was completed. This shows that hardware streaming is the way of the future.

The components involved in video streaming have been implemented in hardware, but at the present time these designs only fit into the larger FPGAs. As FPGAs evolve, their ability to perform digital signal processing will improve to a point where it is feasible to perform high compression on live video for streaming; for now, simplifications must be made, which will unfortunately impact the quality.


Chapter 3

Problem Definition

This chapter defines the problem that was chosen for this thesis. The required task is explained and separated into its major components. The problems inherent in each component are then described and solutions are proposed. This chapter does not demonstrate how the problem was solved, but rather how it could be solved. The implementation of each of the major components is discussed in later chapters.

3.1 General Problem

The solution described in this thesis is not a complete streaming solution that would be commercially viable. Instead, it attempts to demonstrate that live video streaming is possible at a reasonable quality using hardware. Producing real-time streams from a live source is still slow in software, due to the large computational power required to compress the data. Most video streaming technology can produce real-time decoding, but real-time encoding is a much more difficult task, and one that has not previously been performed completely in hardware. Therefore, the task is to create a complete live streaming board that may not have the same quality as a computer-based stream, but has the potential to show that, with future work, it could match these software-based implementations.

The development board's features aid the design to a great extent. It contains both a video decoding chip and an Ethernet physical layer transceiver chip, both of which can be accessed by a Xilinx XCV-300 FPGA. The board is further discussed in section 4.1; the environment required for programming the board is discussed in section 4.2.

Other problems in designing the solution involve how to store images, how to transmit images over the network, and how to view the stream at another computer. These issues are discussed in the remainder of this chapter.

3.2 Video Quality

Most image formats for streaming are based on MPEG or JPEG compression algorithms. Unfortunately, these algorithms take a large amount of FPGA space, and are difficult to run in real-time. RAM issues may also limit the types of compression available. As the priority is to achieve fast live streaming, it may not be possible to read all the image data for compression from RAM within one refresh.

Another factor in the quality of the video is the frame rate. An expectation of the design is that the refresh rate will be fast enough that individual images are not seen and the picture appears to move seamlessly. Software-based video streaming with a fast dedicated server can often offer this, and it would be good to match it. The human eye usually stops seeing individual frames at 24 Hz (here, hertz means frames per second), but it can just barely detect separate frames at 12 Hz.

3.3 Network Issues

As the solution must stream over an Ethernet network, a protocol for the data to be sent must be specified. The most common family used for the Internet and most LANs (Local Area Networks) is the TCP/IP protocol suite. A stack of these protocols must be implemented to achieve a complete networking design. This stack is composed of various layers, each consisting of one or several protocols. The topmost layer required for transmission is the transport layer, for which there are two choices: TCP or UDP. The other layers are fixed for most networks; in this case, IP with ARP, over Ethernet. Apart from the choice of protocol, the data format must also be defined.

3.3.1 UDP or TCP?

TCP, or Transmission Control Protocol, defined in RFC 793 [9], is a common network protocol used when it is important that all data arrives at the destination without errors. A three-way handshake is used to establish and close the connection to avoid errors in communication. Acknowledgements are used to make sure that data arrived at the destination, and retransmissions occur if this does not happen. This is called a reliable connection.

The header for TCP can be seen in Table 1. The sequence number, acknowledgement number and control bit fields are used to acknowledge packets transmitted via the connection. Other fields in the header are used for flow control. The checksum field is required and will detect errors that occur due to incorrect transmissions.

Table 1: TCP header

Source Port                              | Destination Port
Sequence Number
Acknowledgement Number
Offset | Reserved | Control bits         | Window
Checksum                                 | Urgent Pointer
Options                                  | Padding

UDP, or the User Datagram Protocol, defined in RFC 768 [10], is usually used for applications where it doesn't matter if some or all of the data is lost. There are no acknowledgements, so if a packet is lost, it is never re-transmitted. This protocol is commonly used in most streaming formats, as one lost packet should not excessively affect the quality of the entire stream. UDP is also easier to implement and is fast, without the retransmissions and timeouts of TCP. The header for UDP is shown in Table 2. The checksum is optional, but is calculated as a one's-complement sum over the UDP header, the data and sections of the IP header.

Table 2: UDP header

Source Port | Destination Port
Length      | Checksum
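This one's-complement arithmetic is simple to realise in hardware. The following VHDL fragment is a minimal sketch (a hypothetical helper package, not code from the thesis design) showing the end-around carry that distinguishes a one's-complement sum from ordinary addition:

-- Sketch of a one's-complement adder as used by the UDP/IP checksums.
-- Hypothetical helper package, not part of the thesis design.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package checksum_pkg is
    function ones_complement_add(a, b : unsigned(15 downto 0))
        return unsigned;
end package checksum_pkg;

package body checksum_pkg is
    function ones_complement_add(a, b : unsigned(15 downto 0))
        return unsigned is
        variable sum : unsigned(16 downto 0);
    begin
        sum := resize(a, 17) + resize(b, 17);
        if sum(16) = '1' then                        -- end-around carry
            sum := ('0' & sum(15 downto 0)) + 1;
        end if;
        return sum(15 downto 0);
    end function;
end package body checksum_pkg;

The checksum placed in the header is the bitwise complement of the final sum, so summing a received packet together with its checksum yields FFFFh when the data is intact.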

Another technique often employed by real-time streaming systems is to use TCP to set up and control a connection between the source and destination, while UDP is used to transmit packets between the two endpoints. This method gives the benefit of no retransmissions in UDP while also allowing the source to monitor the status of the connection; if the connection is lost, the source can stop the stream to avoid wasting network bandwidth.

3.3.2 Packet Format

The network packet format imposes several limitations. The maximum UDP or TCP packet size is 65,535 bytes including the header. Packets output on an Ethernet network have a further restriction, whereby the maximum size is only 1500 bytes. IP fragmentation can be used to cut the larger UDP packets down into small fragments, which are re-assembled at the destination, so fragmentation will also need to be used.

Another limitation is the network data rate. Data can be sent on an 802.3 network at one or both of two speeds: 10 Mb/s or 100 Mb/s. Both these rates limit how much data can be transmitted over the network, and image quality may need to be sacrificed if the speed used is too slow. Packet size and sending rate will need to be controlled to handle this.
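To put these figures in perspective, a rough estimate (an illustrative calculation, not a measurement from the thesis): if each frame were carried in a single maximal UDP packet of 65,535 bytes at 12.5 frames per second, the raw payload rate would be

65,535 bytes/frame x 8 bits/byte x 12.5 frames/s = approx. 6.6 Mb/s

which fits on a 10 Mb/s Ethernet link, though with limited headroom once Ethernet, IP and UDP header overhead and inter-frame gaps are counted.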

3.4 PC Program

The PC program needs to receive the incoming data and convert it into images that can be displayed on a standard PC. For the purpose of this thesis, this endpoint is a Windows-based PC, although programs for Unix, DOS etc. could also be written if needed. This program is by no means the focus of the thesis and should simply allow the video to be seen. It must include options to choose the IP address of the board, and allow streaming to start and stop. In addition, it must include an image box or window where the video stream will be displayed; ideally, this should be resizable without requiring much extra code. Access to network sockets will also be needed to receive data from the network card on the computer.

3.5 Summary

Collating the information in this chapter provides several limitations and expectations that the final design should embody. These criteria are summarised in Table 3.

Table 3: Desired expectations and limitations

Criterion                    | Expectation/Limitation
Implementation size          | Must fit on XCV-300 FPGA
Image quality                | At least recognisable
Refresh rate                 | At least 12 Hz (seamless)
PC code image display rate   | At least 12 Hz
Network data rate            | < 100 Mb/s, or < 10 Mb/s for 10 Mbit-only networks
Packet size                  | < 65,536 bytes
No. of concurrent users      | 1


Chapter 4

Hardware Environment

This chapter discusses the hardware environment required to solve the problem in the manner specified in section 3.1. Factors affecting the hardware environment were the FPGA board used and the method to program it.

4.1 Description of Board

As stated in section 3.1, the board and FPGA choice were already defined at the start of the thesis. The design makes use of the XSV-300 board from XESS Corporation, chosen for its additional on-board features: it contains a video decoder chip and an Ethernet PHY chip that are both used in the implementation. Figure 4 shows a picture of the board and a layout diagram.

Figure 4: XSV-300 board and block diagram (images copied from XESS [11])

The choice of the board means that the FPGA is the only device requiring a large amount of programming. SRAM included on the board is used for the temporary storage of data being transmitted or received by the network stack. A brief description of each of the board components used follows.

4.1.1 FPGA

The XCV-300 FPGA [12] included on the board is a standard 300k-gate Virtex FPGA from Xilinx. The FPGA is configured through the parallel port via an XC95108 CPLD (Complex Programmable Logic Device), which can also be seen in Figure 4. Multiple implementations can be programmed into the FPGA to utilise its various features, including block RAM/ROM, logic and three-state circuits on-chip.

4.1.2 CPLD

CPLDs are reprogrammable logic devices like FPGAs, with a few differences. Firstly, CPLDs are usually much smaller than FPGAs, so they cannot handle very complex designs. The other main distinction is that they are typically non-volatile, meaning that once programmed, they remember their configuration even without power. FPGAs lose their configuration every time they are turned off, and must be reprogrammed when this occurs.

The XC95108 CPLD [13] is not only used to program the FPGA via the parallel port, but also controls much of the hardware on the board, such as the LEDs. The CPLD also has exclusive access to several of the network interface chip inputs and therefore must be programmed to control these inputs correctly.

4.1.3 Video Decoder Chip

The SAA7113H [14] chip is used to convert PAL, NTSC or SECAM data (PAL is the television standard used in Australia and Europe; NTSC is used in the US and Japan, and SECAM in France) from an RCA or S-Video source into digital data. The SAA7113H provides its own clock for data transmission, but before decoding can take place, it must be programmed using an I2C bus interface. The chip is capable of 9-bit precision, but is usually used with 8-bit precision. It outputs data in standard ITU 656 YUV 4:2:2 format [15].

Through extensive testing of this chip, and e-mail correspondence with the board's technical support team, it was discovered that there was a problem with the crystal used by the decoder chip to lock on to colour data. Some boards had been manufactured and shipped with the wrong crystal installed, and the incorrect crystal needed to be replaced for the SAA7113H to decode colour correctly. Once this component was received from XESS, the decoding worked perfectly.


4.1.4 Ethernet Port

The Ethernet port is controlled by the LXT970A Dual-Speed Fast Ethernet Transceiver [16]. This chip can communicate at either 10 Mbps or 100 Mbps using the MII (Media Independent Interface). The inputs that control the functionality of the chip (data rate, duplex mode etc.) are connected to the CPLD, while the FPGA controls the data lines for receiving and transmitting. Therefore, both the CPLD and FPGA have to be programmed to use this device fully. The chip supports CSMA/CD (Carrier Sense Multiple Access with Collision Detection) operation as well as plain full-duplex operation. By connecting to an RJ45 socket on the board, full network functionality can be provided.

4.1.5 SRAM

The SRAM [17] contained on the XSV board is composed of four 4-Mbit SRAM chips organised as two 512K x 16-bit banks. This RAM is used in the design for storing packets used by each layer of the network stack. Due to the size of each bank, only one bank needs to be used for the storage of data packets, so there is a free bank that other applications on the board could use if required. A memory map showing how the RAM is allocated in the final design is given in Table 4.

Table 4: RAM memory map

Memory Range (hex) | Memory Usage
00000 - 007FF      | ARP send buffer: holds ARP packets to be sent (1500 bytes)
00800 - 00FFF      | IP send buffer: holds IP frames to be sent (1500 bytes)
01000 - 0FFFF      | Free
10000 - 107FF      | IP receive buffer (1500 bytes)
10800 - 2FFFF      | Free
30000 - 3FFFF      | UDP transmit buffer: holds outgoing images (64 Kbytes)
40000 - 4FFFF      | ICMP reply buffer: holds outgoing echo replies (64 Kbytes)
50000 - 7FFFF      | Free

4.1.6 Flash RAM

The board includes a 16MB Flash RAM [18] chip which can be used as non-volatile storage for bitstreams. The CPLD can be programmed to load the FPGA in two ways: either wait for a configuration to arrive from the parallel port and download that, or download the configuration stored in Flash RAM whenever the board is turned on. It is easy to program the Flash RAM, and this method is required if the board must be used away from a computer. As the solution's main purpose is to eliminate the need for a computer, this capability is required.

4.2 VHDL / Foundation

To program the internal configuration of the FPGA, a bitstream must be produced. A bitstream determines exactly how each flip-flop and interconnect must be configured to achieve the required implementation. To produce it, software tools are needed that compile code into the bitstream format. These tools come in the form of VHDL and the Xilinx Foundation Series. VHDL, or VHSIC (Very High Speed Integrated Circuit) Hardware Description Language, is the language the design is written in, while the Foundation Series compiles the code into a bitstream. These two tools are described further below.

4.2.1 VHDL

VHDL allows the programmer to describe the type of hardware to be created inside the FPGA. Common items are state machines, counters, shift registers, three-state buffers etc. The language is based on the Ada syntax and allows very complicated hardware designs to be created with relative ease. All the code for the FPGA and CPLD is written in VHDL.

4.2.2 The Foundation Series

The Foundation Series contains many tools to create and analyse designs for Xilinx FPGAs. The most important features that it provides are synthesis and implementation. These two tasks allow the creation of a bitstream that can then be programmed into an FPGA or CPLD.

Synthesis involves reading the VHDL code and determining what sort of hardware, logic and memory is involved, then outputting that in a form that is ready to be placed in the hardware. Flip-flops and registers are identified, and logic is simplified and defined. The output can be thought of as a schematic for the internals of the chip.


Implementation takes the output from the synthesis stage and maps it to the specific hardware being used. This stage routes the design inside the FPGA, configuring each of the logic blocks to create the required design. The output of implementation is the bitstream file, which contains the configuration of every logic block within the FPGA. Most logic blocks contain some logic and two flip-flops, so the design is mapped to achieve minimum distance between logic blocks and to eliminate clock skew. This stage takes a very long time to complete due to the massive amount of computation that routing algorithms require.

Foundation also includes additional tools for analysing implementations after they have been created. These include a tool to analyse the timing of the critical paths in the design, tools to illustrate the relative placement of various parts of the design, and tools to program devices through a JTAG (Joint Test Action Group) connector, among others.

4.3 Summary

This chapter has explained the hardware that is used and the software used to design and program that hardware. The XSV-300 board is well suited to this project, and the design makes use of the on-board video decoder, network chip and SRAM. The Foundation Series allows simple programming of the board and some simulation capability. With this environment, it should be straightforward to design and program the live video streaming design into the FPGA on the board.


Chapter 5

VHDL Implementation

This chapter details the choices made for the implementation of the design in VHDL. The VHDL implementation is the main focus of the thesis, as this is where the video streaming in hardware is performed. The implementation is designed to talk to the PC program described in Chapter 6, but another design could easily be created to make a second XSV board read the incoming packets and convert them to VGA data, for pure hardware-to-hardware streaming.

5.1 Video Decoding

The video decoding circuit has not changed much from the video-in design mentioned in section 2.1.1. Three main changes were made to the code from that project to allow it to be used in the streaming project. The first was the removal of the code which displayed the images on a VGA monitor, as it was no longer needed. The other two changes alter the functionality of the previous design much more extensively, to prepare it for the new implementation.

5.1.1 Initialisation

The SAA7113H video decoder chip contains many configuration registers that are initialised using the Philips I2C interface [19]. Previously, programming had been performed by using the parallel port to emulate the I2C interface via software on a PC. As this is not an option for a pure hardware implementation, the I2C interface had to be created as part of the design.

Performing the initialisation as part of the design also meant that an initialisation table of 72 8-bit entries had to be added. Using logic to implement this would require many flip-flops and would waste a lot of space. Fortunately, the Virtex series of FPGAs includes block RAM which can emulate a standard RAM or ROM module. By using the Core Generator from the Foundation tools, it was possible to create a ROM which stores the initialisation data.

The rest of the I2C interface requires an I2C controller to drive the two serial lines used in the communication. The state machine required is started when the chip is reset, programs the SAA7113H, and then stops, as programming only needs to occur once. The initial values stored in the ROM can be altered if NTSC or SECAM playback is preferred to PAL.
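The table-driven approach can be sketched as follows. The entity, port interface and the two ROM entries below are illustrative assumptions, not the thesis code; in the actual design the 72-entry ROM was generated with the Core Generator so that it maps onto Virtex block RAM:

-- Sketch of a one-shot initialisation sequencer: a ROM of
-- (sub-address, value) byte pairs is walked once after reset and each
-- pair is handed to an assumed I2C master via a start/busy handshake.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity init_table is
    port (
        clk, rst  : in  std_logic;
        i2c_busy  : in  std_logic;                     -- from I2C master
        i2c_start : out std_logic;                     -- pulse: write one pair
        i2c_addr  : out std_logic_vector(7 downto 0);  -- register sub-address
        i2c_data  : out std_logic_vector(7 downto 0);  -- value to write
        done      : out std_logic
    );
end entity init_table;

architecture rtl of init_table is
    type rom_t is array (natural range <>) of std_logic_vector(15 downto 0);
    -- High byte = sub-address, low byte = value (placeholder entries only)
    constant ROM : rom_t := (x"0108", x"0230");
    signal idx : natural range 0 to ROM'length - 1 := 0;
    type st_t is (ISSUE, WAIT_BUSY, NEXT_ENTRY, FIN);
    signal st : st_t := ISSUE;
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                idx <= 0; st <= ISSUE; i2c_start <= '0'; done <= '0';
            else
                i2c_start <= '0';
                case st is
                    when ISSUE =>
                        if i2c_busy = '0' then
                            i2c_addr  <= ROM(idx)(15 downto 8);
                            i2c_data  <= ROM(idx)(7 downto 0);
                            i2c_start <= '1';
                            st <= WAIT_BUSY;
                        end if;
                    when WAIT_BUSY =>                  -- wait for write to begin
                        if i2c_busy = '1' then st <= NEXT_ENTRY; end if;
                    when NEXT_ENTRY =>
                        if idx = ROM'length - 1 then st <= FIN;
                        else idx <= idx + 1; st <= ISSUE; end if;
                    when FIN =>                        -- table walked once; stop
                        done <= '1';
                end case;
            end if;
        end if;
    end process;
end architecture rtl;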

5.1.2 RAM Format

The video input decoding state machine accessed RAM in a way that was not well optimised, using more clock cycles than necessary. This would not be suitable for integration with the network stack design, so the protocol was simplified. Instead of using delay signals, the RAM write signals are now clocked so that, unlike the original version, no extra clock cycles are required to complete a write.
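The idea can be sketched as follows, with illustrative signal names (the board's actual RAM interface differs): address, data and write-enable are registered together on one clock edge, so each write completes in a single cycle with no wait states:

-- Sketch of a single-cycle registered SRAM write; illustrative only.
library ieee;
use ieee.std_logic_1164.all;

entity sram_writer is
    port (
        clk      : in  std_logic;
        wr_req   : in  std_logic;
        addr_in  : in  std_logic_vector(18 downto 0);
        data_in  : in  std_logic_vector(15 downto 0);
        ram_addr : out std_logic_vector(18 downto 0);
        ram_data : out std_logic_vector(15 downto 0);
        ram_we_n : out std_logic                      -- active-low write enable
    );
end entity sram_writer;

architecture rtl of sram_writer is
begin
    process (clk)
    begin
        if rising_edge(clk) then
            ram_addr <= addr_in;
            ram_data <= data_in;
            ram_we_n <= not wr_req;  -- asserted for exactly one cycle per request
        end if;
    end process;
end architecture rtl;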

5.2 Networking

Many upgrades have been made to the network stack discussed in section 2.1.2. Each of these upgrades simplified the hardware, achieved better routing capabilities and/or cleaned up the code. Most were required for the new design; others were simply made to make the design more compact.

5.2.1 Removal of IP Re-assembly

The first of these changes was the removal of IP re-assembly. IP fragmentation and re-assembly are required if packets of more than 1500 bytes of data are sent by the IP layer (see the Internet Protocol RFC 791 [20]). As the board does not need to receive large amounts of data, this part of the IP stack could be removed. Fragmentation on the sending side needs to remain, as the board may need to send large packets of data. Removing re-assembly also meant altering the ICMP ping program so that it no longer chooses which of the IP buffers to read from (there is now only one).

5.2.2 Fixing Ethernet

Another minor cosmetic change was to redesign the Ethernet sending layer. The design had a case statement that did not synthesise well, causing several warnings. Some of the logic in the case statement was removed and implemented as combinational logic, fixing the problem. The case statement was also rearranged to produce a design requiring less logic.


5.2.3 ICMP

The main implemented feature of the original IP stack was the partial ICMP layer. Although it is not required to achieve video streaming, it is useful to leave it in the design for testing. The ICMP layer allows the board to be pinged, a mechanism to test whether the board is working on the network. ICMP is not officially a transport layer protocol, but it can be treated as one, and it was previously the only layer that controlled the IP sending layer. Some minor cosmetic changes were required in the signals connecting these two layers so that the design could accept a new layer that also sends data via IP.
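As an illustration of why leaving ping support in costs little logic: an echo reply can be formed by copying the request's payload, changing the ICMP Type field from 8 (echo request) to 0 (echo reply) and patching the checksum incrementally instead of recomputing it over the whole packet. The sketch below is illustrative only, not the thesis implementation, and reuses the hypothetical ones_complement_add function from the earlier checksum sketch:

-- Sketch of the ICMP echo-reply fix-up: clearing the Type/Code word
-- from 0800h to 0000h removes 0800h from the one's-complement sum,
-- so the stored checksum rises by 0800h (with end-around carry).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.checksum_pkg.all;  -- hypothetical helper from the earlier sketch

entity icmp_echo_fixup is
    port (
        req_checksum  : in  unsigned(15 downto 0);       -- checksum of request
        rep_type_code : out std_logic_vector(15 downto 0);
        rep_checksum  : out unsigned(15 downto 0)
    );
end entity icmp_echo_fixup;

architecture rtl of icmp_echo_fixup is
begin
    rep_type_code <= x"0000";  -- Type 0 (echo reply), Code 0
    rep_checksum  <= ones_complement_add(req_checksum, x"0800");
end architecture rtl;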

5.2.4 RAM Arbitration

Many of the layers in the IP stack read and write RAM to perform their functions. Each layer runs in parallel with the others, so many different state machines may need to access RAM at once. Unfortunately, only one access to RAM can occur at a time, and these operations require multiple clock cycles. This means some arbitration is required, so that only one device may drive the address and data signals for RAM at a time. Currently the stack design uses a simple multiplexing state machine to determine which set of control signals is allowed to access RAM.

Multiplexers require a lot of logic, though, so some other designs were tested. One of these was designed to make each layer normally output high impedance and tie all the outputs for each signal on the bus together, with some arbitration logic determining which device is allowed to control the bus. Implementing this method was quite difficult and involved changing almost every design in the system to handle the new RAM format. When it was implemented, however, it turned out to have worse timing constraints than the multiplexer design, so this format was not used.

Another arbitration mechanism that was attempted was similar to the one just mentioned, except each layer would normally output logic zeroes until it was told it could output its control values, with all the outputs connected via a logical OR. This is similar to the three-state method, but does not require three-state elements. This design performed better than the three-state design, but it was still slower than the original, so the original multiplexer design was kept.
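The multiplexing scheme that was retained can be sketched as follows. This is a hypothetical two-requester version with invented signal names; the actual design arbitrates among several stack layers and also multiplexes the data bus:

-- Sketch of a multiplexing RAM arbiter: a registered grant state
-- machine plus ordinary multiplexers, with no three-state elements.
library ieee;
use ieee.std_logic_1164.all;

entity ram_arbiter is
    port (
        clk, rst         : in  std_logic;
        req_a, req_b     : in  std_logic;
        addr_a, addr_b   : in  std_logic_vector(18 downto 0);
        we_a, we_b       : in  std_logic;
        grant_a, grant_b : out std_logic;
        ram_addr         : out std_logic_vector(18 downto 0);
        ram_we           : out std_logic
    );
end entity ram_arbiter;

architecture rtl of ram_arbiter is
    type owner_t is (NONE, A, B);
    signal owner : owner_t := NONE;
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                owner <= NONE;
            else
                case owner is
                    when NONE =>                       -- fixed priority: A first
                        if    req_a = '1' then owner <= A;
                        elsif req_b = '1' then owner <= B;
                        end if;
                    when A => if req_a = '0' then owner <= NONE; end if;
                    when B => if req_b = '0' then owner <= NONE; end if;
                end case;
            end if;
        end if;
    end process;

    -- Multiplex the granted requester's signals onto the RAM bus
    grant_a  <= '1' when owner = A else '0';
    grant_b  <= '1' when owner = B else '0';
    ram_addr <= addr_a when owner = A else addr_b;
    ram_we   <= we_a when owner = A else
                we_b when owner = B else '0';
end architecture rtl;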


5.2.5 PC SRAM Viewer

As mentioned in section 2.1.2, the network stack project includes an SRAM viewer that can be accessed by the PC through the parallel port. This design aided the debugging of alterations made to the design, but is not required in the final version. This part of the code was therefore removed before the final implementation, to simplify the logic slightly.

5.3 Image Format

Due to the way the data is stored in RAM, it is very difficult to apply any sort of compression. To achieve a good frame rate, there is less than half a frame period between packets in which to perform full image compression. Formats like Motion-JPEG are impossible to implement in this design because of the way two-dimensional blocks are used to encode the data: the data received moves left-to-right from the top to the bottom of the image and is stored sequentially in RAM, so there is not enough time to load all the bytes needed from RAM for encoding before the next refresh.

The only feasible algorithm would be one that compresses each horizontal line separately. This way, several of the previous bytes can be remembered until the write to RAM occurs for those bytes. The algorithm should also have a fixed compression ratio, as this makes it easier for the PC program to decode each packet it receives. Unfortunately, no good, simple-to-implement algorithm fitting these criteria was found, so the problem was solved by sampling the image, i.e. taking every 2nd or 4th pixel of the image in the horizontal and vertical directions. This is extremely lossy, but the image is still adequate to demonstrate that video streaming is occurring properly.
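The sampling can be sketched with two small counters that keep every Nth pixel horizontally and every Nth line vertically. The entity below is illustrative only; the sync and strobe signal names are assumptions, not the decoder's actual interface:

-- Sketch of pixel decimation: DECIM = 2 keeps every 2nd pixel and
-- every 2nd line, halving the resolution in each direction.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity decimator is
    generic (DECIM : positive := 2);
    port (
        clk        : in  std_logic;                    -- pixel clock domain
        frame_sync : in  std_logic;                    -- start of frame
        line_sync  : in  std_logic;                    -- start of line
        pix_valid  : in  std_logic;
        pix_in     : in  std_logic_vector(7 downto 0);
        pix_out    : out std_logic_vector(7 downto 0);
        pix_we     : out std_logic                     -- write this pixel to RAM
    );
end entity decimator;

architecture rtl of decimator is
    signal xcnt, ycnt : natural range 0 to DECIM - 1 := 0;
begin
    process (clk)
    begin
        if rising_edge(clk) then
            pix_we <= '0';
            if frame_sync = '1' then
                xcnt <= 0; ycnt <= 0;
            elsif line_sync = '1' then
                xcnt <= 0;
                if ycnt = DECIM - 1 then ycnt <= 0; else ycnt <= ycnt + 1; end if;
            elsif pix_valid = '1' then
                if xcnt = DECIM - 1 then xcnt <= 0; else xcnt <= xcnt + 1; end if;
                if xcnt = 0 and ycnt = 0 then          -- keep this pixel
                    pix_out <= pix_in;
                    pix_we  <= '1';
                end if;
            end if;
        end if;
    end process;
end architecture rtl;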

5.4 Video to Network Interface

The video-to-network interfacing program had to be written as the main part of the code. It comprises two state machines that interact with the video decoding interface and the network interface to implement the streaming of images from the input. The UDP transport protocol was chosen for its ease of use and because packet losses are not detrimental to the system. Each of the design issues that arose during implementation is now discussed.


5.4.1 Video-In to UDP Packet Converter

The video decoded from the input is sent an image at a time over the network. The main state machine controls the decoder and writes the required packet into RAM so that it can be read and fragmented by the IP sending layer. The maximum data rate allows the following packet types to be sent:

180x144 resolution at 25 Hz refresh.
360x144 resolution at 12.5 Hz refresh.
360x288 resolution, interleaved, at 12.5 Hz refresh.
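Each of these formats comes out to the same raw rate with 8-bit colour; for example, 360 x 144 pixels x 12.5 Hz x 8 bits = 5,184,000 b/s, or about 5.2 Mb/s, which sits comfortably under the 10 Mb/s limit of the network interface.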

By testing all these image formats, it was decided that 360x144 at 12.5 Hz gave a good-looking image that updates quite frequently. A screenshot at this resolution can be seen in Figure 5. This quality is often better than that of many software-based systems, although as there is no compression the image does look rather blocky. A compression algorithm would greatly increase the quality of the transmitted images, but the restrictions mentioned in section 5.3 did not allow one within the time and scope of this thesis.

Figure 5: Example of image quality

The UDP header shown in Table 3 must be included with each packet that is transmitted. The source port is hard-coded into the design as port 20380 (an arbitrarily chosen number: the author's birthday is 20 March 1980).


The checksum is not required and would just add extra complexity to the design, so it is set to zero (which indicates unused). The data length is easy to calculate, and the destination port is provided by the connection handler (see section 5.4.2). The conversion from the decoder chip to a packet in RAM is performed by another state machine. Whenever streaming is enabled, this machine tells the decoder state machine to decode one frame, a 720x576 interleaved image. The machine chooses which pixels it wants to write to RAM and writes them as the decoding continues. Writes to RAM have to be timed carefully, as the decoding machine runs off a clock generated by the SAA7113 chip at about 27 MHz while the main system clock is 50 MHz, so some interface logic is needed to cross between the two clock domains. When the packet is complete in memory, the machine tells the IP layer to transmit the data. The extra logic added to the main stack program to allow ICMP communication is set up to let all UDP packets through in preference to ICMP. This means that some ICMP packets may not be sent in time, but when the board is idle it will still be possible to ping the board from any location on the network to test its network status. When the board is streaming data, ICMP replies will be sent during the pauses between images, but replies to all requests will not necessarily be made.
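A common way to build such interface logic is a two-flip-flop synchroniser that re-times a level signal from the 27 MHz domain into the 50 MHz domain. The following is a generic sketch with hypothetical names, not the exact logic used in the design:

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-flip-flop synchroniser: a level signal generated in the
-- 27 MHz decoder domain is double-registered in the 50 MHz system domain
-- to avoid metastability before the packet-writer state machine uses it.
entity sync2ff is
    port (
        clk50    : in  std_logic;  -- main 50 MHz system clock
        async_in : in  std_logic;  -- level signal from the 27 MHz LLC domain
        sync_out : out std_logic   -- safe to sample in the 50 MHz domain
    );
end sync2ff;

architecture rtl of sync2ff is
    signal meta, stable : std_logic := '0';
begin
    process(clk50)
    begin
        if rising_edge(clk50) then
            meta   <= async_in;  -- first stage may go metastable
            stable <= meta;      -- second stage gives a clean level
        end if;
    end process;
    sync_out <= stable;
end rtl;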

5.4.2 UDP Connection Handler

Also included as part of the network interface is the connection handler. This controls whether streaming should occur and handles the destination IP and port. This state machine reads all incoming UDP packets, listening on port 20380 for incoming data. If a packet's first byte is 5Bh (also arbitrarily assigned), the board starts streaming packets to the source IP and port of that packet. If it receives a packet whose first byte is not 5Bh, it stops streaming after sending the current packet. To determine whether an incoming UDP packet is actually a control packet, the UDP header is checked. The checksum field is ignored, as this is valid for UDP and checking it would again add unwanted complexity. If the destination port is 20380, the source IP and port in the header are used to determine where data should be sent.



The source IP is provided by the IP layer so that the destination IP address can be determined. This method is very useful for controlling the stream. The only problem occurs if the source never sends a stop packet, or the stop packet is lost; the board would then keep sending packets endlessly, denying service to the destination host. A timeout is therefore implemented: if another packet with 5Bh as the first byte does not arrive within 32 packets being output by the image streamer, streaming stops. The destination for the streaming data must therefore send these keep-alive packets periodically so the board does not time out and stop sending. An extra problem was noticed once the design was complete that would have caused a serious security hole. If an attacker on the network sent a start packet with its source address set to the broadcast IP, the board would stream 32 large packets onto the network to whichever source port was specified in the UDP header. In essence, the attacker could force the board to deny access to a port of their choice on every machine connected to the network. Some filtering was added to close this obvious security flaw.
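The timeout itself is just a small saturating counter. The sketch below, with hypothetical names, mirrors the behaviour of the timerCount logic in Appendix B: reload on a keep-alive, clear on a stop, and count down once per transmitted image packet.

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

-- Hypothetical keep-alive timeout sketch: the 5-bit counter reloads to its
-- maximum on every valid 5Bh control packet, decrements once per image
-- packet sent, and streaming is enabled only while it is non-zero, giving
-- a timeout of roughly 32 transmitted packets.
entity keepalive_timer is
    port (
        clk, rstn     : in  std_logic;
        keep_alive    : in  std_logic;  -- pulsed when a 5Bh packet arrives
        stop_request  : in  std_logic;  -- pulsed when a non-5Bh packet arrives
        datagram_sent : in  std_logic;  -- pulsed when an image packet goes out
        streaming     : out std_logic
    );
end keepalive_timer;

architecture rtl of keepalive_timer is
    signal count : std_logic_vector(4 downto 0);
begin
    process(clk, rstn)
    begin
        if rstn = '0' then
            count <= (others => '0');
        elsif rising_edge(clk) then
            if keep_alive = '1' then
                count <= (others => '1');        -- reload on keep-alive
            elsif stop_request = '1' then
                count <= (others => '0');        -- explicit stop
            elsif datagram_sent = '1' and count /= 0 then
                count <= count - 1;              -- one tick per image packet
            end if;
        end if;
    end process;
    streaming <= '0' when count = 0 else '1';
end rtl;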

5.5 Complete FPGA Design

By performing all the updates and implementing the new state machines described above, the final design can be created. Figure 6 shows a block diagram of the main state machines present in the design and the communication paths between them.


Figure 6: Block diagram of final design

5.6 CPLD Alteration

To remove the need for a computer altogether, the design needs to be programmed into permanent storage on the board, as the FPGA loses its configuration whenever the power is turned off. The Flash RAM mentioned in section 4.1.6 accomplishes this task. To get this method working, some alterations were required to the CPLD. The CPLD controls the programming of the FPGA and also handles some configuration options for the Ethernet PHY chip, including whether the PHY should operate at 10 Mb/s or 100 Mb/s, so it is very important to the design. The configuration produced for the original design handled downloading through the parallel port, but not from Flash RAM. Therefore, a new configuration had to be written to make the CPLD download from Flash and control the network chip correctly.


This configuration was modified from the source for the normal Flash programmer available on the XESS website examples page [21].

5.7 Summary

The VHDL design is the heart and soul of this implementation, performing all the tasks required for the hardware to produce the video stream. The state machines involved are often similar in design, yet each is quite complicated, implementing its own part of the system and interfacing with the other state machines to achieve the overall goal. Much of the complete design was written before this thesis began, but many updates were made to each of the previous designs to prepare them for the new project. The new state machines are also the most complicated hardware in the design, as they do most of the work in the system. The final design achieves full video streaming output with a very simple input protocol. Although no compression other than sampling was used in encoding the transmitted images, the design meets all the expectations imposed in the problem description. A version of the source file that includes the video decoder, UDP packet converter and connection handler state machines is given in Appendix B.


Chapter 6 PC Implementation

This chapter describes the implementation issues that arose while writing the PC side of the streaming software. The program must provide a simple, easy-to-use interface that reads the packets arriving from the board and displays the images on the screen. The first main decision was the choice of IDE (Integrated Development Environment).

6.1 Programming IDE

As the design was to work on a Windows 9X system, there were only two real choices: Visual Basic and Visual C++. Both IDEs provide object orientation, but there are a few large differences between them. Visual Basic is a much simpler IDE to use, but the resulting programs are slower and offer less control. Visual C++ provides many more functional modules and also produces smaller code. If the program were written in VB, it would be harder to handle sockets directly for the network connection and to display images at a fast enough frame rate. Visual C++ was therefore chosen, as it is very fast and much more powerful than VB. Although VC++ is harder to learn and program, it can perform many more of the tasks required by the program.

6.2 OpenPTC

OpenPTC gives programmers a surface of pixels to draw to and high-speed routines to copy these pixels to the display. It also provides other useful features such as basic keyboard input and a high-resolution timer [22]. Libraries exist for Windows, DOS, Unix and many other programming environments. The OpenPTC for Windows libraries are available on the web under the terms of the GNU Library General Public License; essentially, applications may link to the OpenPTC dynamic link library free of charge so long as any improvements made to the library itself are submitted back to the OpenPTC community [23].



OpenPTC was chosen to display the incoming images for its ease of use and because it allows the program to display images without handling window sizing and similar details. The OpenPTC implementation uses two classes to display the image: the console and the surface, which together provide a very powerful imaging tool. To use the libraries, the program first creates a console and a surface associated with it. The console is the window in which the images are displayed and can be any size. The surface has a programmer-defined size in pixels in the horizontal and vertical directions. Whenever an image needs to be displayed, the surface pixels are updated and then mapped to the console. Changing the console size still displays the correct number of surface pixels, with the scaling handled entirely by the libraries. This shows how easy the OpenPTC libraries are to use, and the library also comes with many examples that make it easy to create other implementations. Some samples of OpenPTC programs are shown in Figure 7.

Figure 7: OpenPTC demonstrations

6.3 Winsock Sockets

Windows sockets allow the Windows programmer to add network functionality to the design, and are required to receive UDP or TCP packets in a Visual C++ application. There are several different ways to use them, ranging from MFC (Microsoft Foundation Classes) down to very low-level socket reads and writes. Sockets can also be set to be either synchronous or asynchronous, which determines whether they block or raise an event when they are triggered.



6.3.1 Microsoft Foundation Classes


Microsoft provides two socket classes for network communications: CSocket and CAsyncSocket. These classes implement the common network functions for a blocking socket and an asynchronous socket respectively. There was a problem in attempting to use these classes: the correct DLL needed to get them working did not seem to be present. The implementation was therefore not made with these types of sockets.

6.3.2 Blocking Sockets

The original implementation used blocking sockets, as they are much easier to create and use without much preparation; they are similar to the sockets in Unix and other systems. By using SOCK_DGRAM as the socket type (the type for UDP), it is possible to create a socket and then use the sendto() and recvfrom() functions to transfer data. The problem with blocking sockets is that there is no timeout on receive, so if the system enters a blocking receive and nothing arrives, the whole program stalls. This caused the program to lock up occasionally when there was a transmission error, which is an obvious design flaw. For this reason, a non-blocking asynchronous socket was required for the receive. A blocking send could still be used, as the system should always be ready to send when required.

6.3.3 Non-blocking Sockets

The final implementation uses non-blocking asynchronous sockets to control the sending and receiving of UDP data. Asynchronous sockets allow the program to employ Windows event handling. The WSAAsyncSelect() function is used to tell the socket to trigger an event on the current window whenever data arrives; this event is set up to call a specific function, using code similar to that in Programming Microsoft Visual C++ [24]. Using a non-blocking receive allows the system to perform other tasks instead of polling the socket while waiting for a packet to arrive. The socket code receives data whenever it arrives, and the function that is called operates on that data.

6.4 Protocol Definition

The protocol that the PC must use to keep the streaming going had to match the one used by the board (see section 5.4.2). This means a keep-alive packet with the data byte 5Bh must be sent to port 20380 of the board once to start transmission, and at least once for every 32 packets received.


This was implemented by sending the keep-alive on every received packet; although it is not required this often, it definitely keeps the stream going. All the program then has to do is listen on the port from which it sent the original start message to receive and display all image packets from the board. This protocol was chosen for its simplicity and because it keeps the control traffic on the network low. A timer could also have been used to decide when keep-alive messages should be sent, but this would require more code, and the implemented method works just as well.

6.5 Graphical User Interface

The GUI for the program was designed to be as simple as possible, as the hardware component is much more important than this software design. All the GUI needs is the ability to select an IP address, plus buttons to start and stop the stream. The OpenPTC console acts as a separate window and is created at start-up, so the start and stop buttons affect that window. Thanks to the power of OpenPTC, the console can be resized and the image is still displayed at whatever window size is required. The GUI for the design is shown in Figure 8.

Figure 8: GUI for PC program


6.6 Summary

The PC program discussed in this chapter was produced to demonstrate the output of the video streaming. It was written for a Windows environment, but similar programs could be created for Unix, DOS and other environments. The program is limited, as it was designed simply to display the output of the hardware board, although implementation issues forced it to become more complicated than initially planned. Using a free library package to control the imaging was extremely helpful in keeping the code small; the program would have taken far longer to create had OpenPTC not been available, and the Unix, DOS and other versions of OpenPTC could serve other versions of the PC program. Appendix C shows the main source file for this code.


Chapter 7 Design Evaluation

This chapter discusses the final implementation's results and how well they compare with other streaming methods. The topics discussed are video quality and network usage. Implementation data for the final design can be viewed in Appendix A.

7.1 Streaming Results

The streaming results that the board produces are reasonably good, but need some work. The limitations of a 10 Mb/s network interface only allow raw data to be sent after a very lossy sampling scheme.

7.1.1 Image Quality

The streamed image is recognisable, but quite blocky, due to the very lossy sampling ratio used. Attempts were made to increase the quality of the streamed image, although the bandwidth limitations prevented further improvement. As chapter 8 will discuss, there are many upgrades that would allow a much cleaner, less lossy image to be displayed. Although the image quality is not perfect, it is sufficient to prove that hardware methods for live video are possible and could surpass software methods in the next few years. The refresh rate of 12.5 Hz is nearly undetectable, much better than some software methods that achieve only a few frames per second. By contrast, a dedicated computer with a high-bandwidth network card, a good network connection and a very good video card is required to get truly high-quality results at a high refresh rate.

7.1.2 Network Issues

There is one rare bug remaining in the final design. If an IP fragment is lost, a buffer on the PC is wasted while the system waits for the packet to time out before clearing the buffer. After streaming for a long time, these partial packets fill up all the buffers on the system. The system then starts to lag, and the PC may not send enough keep-alive packets to maintain the connection with the board. In this situation, streaming stops and the start button must be pressed again.


Attempts were made to work around this error, but no solution that works perfectly was devised. Lowering the packet size reduces the effect, but sacrifices image quality. Some alternative solutions that would alleviate this problem are discussed in chapter 8. Apart from this bug, communication over a reliable network is fine. Unfortunately, due to the large UDP packet size, the system does not work well at all on an unreliable network, as every fragment of a packet must arrive before the viewing program can display it.

7.2 Comparisons

Although this design may not be better than other commercially available products, it almost matches up in most respects. Against software methods, the board is adequate in some areas but severely lacking in others. To achieve good live streaming using a computer, the requirements are a Pentium III processor, a good video card and a good network card. This would end up costing more than the board, but the video quality would be much better in software: a software system could match the board's refresh rate with less network bandwidth and clearer, more compressed images. The MidStream server introduced in section 2.3.1 works on Gigabit networks and comes with hard drives to store the data. It can accept multiple connections at once and provides much better quality than the thesis implementation. It does not, however, do live streaming of RCA or S-Video data; the streamed data must already exist on the drives in the hardware. This design also uses multiple Virtex-II FPGAs in its construction, which can provide much better digital signal processing algorithms for encryption and similar tasks. This server shows what the thesis set out to demonstrate in the first place: a solution that could match software in a pure hardware design. The Axis webcams and video servers described in section 2.3.2 can also outperform the thesis implementation in image quality, although they are very expensive. The main addition that allows this is compression, which decreases the size of the images. The 2100 webcam can only stream at rates up to 10 frames per second though, which is not as good as the thesis board, which can reach 25 frames per second at low resolution.


The webcam can also only stream the images its own camera sees; it cannot convert the output of a TV or a game console as the thesis implementation does. By adding a proper compression algorithm to the implementation, it may be possible to raise the board's specifications to match those of these other servers and cameras. These comparisons may suggest that this implementation is not yet at the level of other methods but, as the MidStream server shows, hardware versions of streaming will gradually become cheaper and better than the computer-based software methods. The thesis implementation has provided a new method of implementing live video streaming that could lead to a range of low-cost products performing this task in the future.

7.3 Process Evaluation

The performance of the final design has been discussed, but what of the process that led to it? This section evaluates the work methods used and the experience gained during the thesis. The methods used were mostly sound, producing a final design that meets all the specifications and could exceed them with some extra work. The one incorrect design decision was giving up on compression too early: the final image appears distorted and unclear because the sampling used is extremely lossy, and more time dedicated to compression would have produced a much better image. Apart from this, the decisions made all seem correct and helped to produce the final design. The VHDL and PC implementations give a complete solution to the specified problem. Much experience was gained through completing the thesis design, and many design methods were learnt, such as the instantiation of block RAMs and ROMs described in section 5.1.1.

7.4 Summary

Considering these evaluations, it is fair to class the final design as a success, though not the best possible result. The design has met all the goals defined in the problem definition and has proven that hardware streaming is possible and should improve to a point where hardware-based designs outperform software-based designs.


The experience gained through applying correct methods during the design will also aid future work. Several further improvements to the final design were desired, but limited time prevented their implementation; these improvements are presented in the next chapter.


Chapter 8 Future Developments

The final product was completed within the limitations of this thesis, but there are many upgrades that would turn it into a viable product. Time constraints stopped these upgrades from being made, but each has a straightforward implementation path. This chapter discusses each of the improvements that could be made and details how much each would improve the quality of the final design.

8.1 Image Format Changes

The current method only uses 8-bit colour data, so there are only 256 possible colours. This was chosen to keep packet sizes down. Higher colour depths can be achieved by decreasing the resolution or frame rate, so a trade-off must be made. The maximum data rate that can currently be sent must be less than 10 Mb/s, as this is all the network code can handle. Ignoring network overhead, the data rate can be found with the following formula:

Data rate = horizontal resolution x vertical resolution x refresh rate x colour depth (in bits)

The chosen method uses 360x144 at 12.5 Hz with 8-bit colour depth, which comes out to about 5.184 Mb/s. It is also possible to interleave the vertical lines, sending only either the even or odd half per packet. These values are easy to alter by factors of two, so the image formats shown in Table 5 could be used instead if desired. All of these formats result in a 5.184 Mb/s data rate.

Table 5: Some alternate image formats

Horizontal Resolution   Vertical Resolution   Refresh Rate   Colour Depth   Interleaved?
360                     144                   12.5 Hz        8-bit          No
180                     144                   25 Hz          8-bit          No
180                     144                   12.5 Hz        16-bit         No
360                     288                   12.5 Hz        8-bit          Yes
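As a check, every row of Table 5 gives the same figure: 180 x 144 x 25 x 8 = 5,184,000 b/s, 180 x 144 x 12.5 x 16 = 5,184,000 b/s, and the interleaved 360x288 format halves its 10.368 Mb/s raw rate back to 5.184 Mb/s because only every second line is sent per refresh.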

8.2 100Mb/s Upgrade

Problems with the low image quality are mainly due to the network data rate being only 10 Mb/s; the original network stack code only supported this speed, not the 100 Mb/s common on most networks today.


The Ethernet PHY chip is capable of running at 100 Mb/s, and by altering the stack it is possible to achieve this data rate. With this upgrade, more data could be sent per image, or a higher colour depth used, greatly improving the resulting streaming quality. The main difficulty in converting to a 100 Mb/s design is that a nibble of data is required every two clock cycles, whereas previously there were twenty. Much of the design therefore needs to be altered for 100 Mb/s operation; the following sections explain how these alterations should be performed.
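To see where the factor of ten bites: at 10 Mb/s the PHY's 4-bit interface consumes a nibble every 4 bits / 10 Mb/s = 400 ns, which is 20 cycles of the 50 MHz system clock; at 100 Mb/s a nibble is needed every 40 ns, leaving only 2 cycles.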

8.2.1 16-bit RAM Functionality

In order to sustain the 100 Mb/s control signal rate, more RAM must be accessed at a time. The RAM is not fast enough to read or write only one byte per transfer, as the network chip requires data faster at the new rate. To fix this problem, the entire stack needs to be redesigned to read and write 16-bit data rather than 8-bit data. The RAM code provided already accepts 16-bit reads and writes, and most of the other fixes to the code are relatively minor.

8.2.2 CRC Alteration

CRC (Cyclic Redundancy Check) is used in the Ethernet layer to check the integrity of packets transmitted over the network. A CRC generator is required in the design for both incoming and outgoing packets. The current generator accepts one byte at a time and takes eight clock cycles before it can accept another, i.e. one cycle per bit. At 100 Mb/s it must handle a byte every four cycles, so it would need to be redesigned. The best way to do this is a look-up table in block ROM that stores the CRC generator vectors [25]. Indexing the table with a nibble requires a 16 x 32-bit table, allowing four bits to be encoded simultaneously while minimising the amount of block ROM required for the alteration.
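A sketch of the nibble-at-a-time approach is given below. The entity and signal names are hypothetical, and the 16-entry table holds the commonly published constants for the reflected Ethernet polynomial 0xEDB88320; the initialise-to-ones and final-complement conventions of the Ethernet FCS are only hinted at here.

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

-- Hypothetical nibble-at-a-time CRC-32 sketch: one table look-up advances
-- the CRC by four bits per clock cycle instead of one.
entity crc32_nibble is
    port (
        clk, rstn : in  std_logic;
        enable    : in  std_logic;                     -- one nibble per enabled cycle
        nibble    : in  std_logic_vector(3 downto 0);  -- next 4 data bits
        crc       : out std_logic_vector(31 downto 0)
    );
end crc32_nibble;

architecture rtl of crc32_nibble is
    type table_t is array (0 to 15) of std_logic_vector(31 downto 0);
    constant CRC_TABLE : table_t := (
        x"00000000", x"1DB71064", x"3B6E20C8", x"26D930AC",
        x"76DC4190", x"6B6B51F4", x"4DB26158", x"5005713C",
        x"EDB88320", x"F00F9344", x"D6D6A3E8", x"CB61B38C",
        x"9B64C2B0", x"86D3D2D4", x"A00AE278", x"BDBDF21C");
    signal reg : std_logic_vector(31 downto 0);
begin
    process(clk, rstn)
    begin
        if rstn = '0' then
            reg <= (others => '1');  -- Ethernet CRC starts at all-ones
        elsif rising_edge(clk) then
            if enable = '1' then
                -- shift right by four and fold in the table entry selected by
                -- the low CRC nibble XORed with the incoming data nibble
                reg <= ("0000" & reg(31 downto 4)) xor
                       CRC_TABLE(conv_integer(reg(3 downto 0) xor nibble));
            end if;
        end if;
    end process;
    crc <= reg;
end rtl;

In the real design the table would be placed in block ROM as suggested above, rather than in LUT fabric as this sketch implies.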



8.3 Fragment the UDP Packet

Streaming quality on unreliable networks would be improved by segmenting the image into multiple UDP packets rather than relying on IP fragmentation. This alleviates the problem described in section 7.1.2: if a packet is lost, no reassembly buffer is left waiting for a timeout as in the implemented version; only part of the image is lost, and that part simply does not update until the next refresh cycle. Segmentation is accomplished by reducing the size of the UDP packets so that each fits within the maximum packet size. The boundary can be placed at the end of a line so that, for example, four lines are transmitted per packet; a line width of 360 then gives a packet size of 360 x 4 + 8 = 1448 bytes. One more byte is needed to indicate which set of lines the packet contains so that the PC side knows where to place them on the OpenPTC surface. This method would eliminate the problem of filling up the receive buffers, although implementing it is difficult. Doing it in the UDP layer is possible but tricky, due to the amount of data that must be moved around to create each UDP packet. It is much easier to alter the IP fragmenter to prepend the UDP header to each outgoing fragment; this is fine as long as the board only ever sends UDP packets, but unfortunately it must also recognise ICMP packets and not add the header to those.
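As a sanity check on the sizes involved: the 1448-byte UDP packet (1440 bytes of pixels plus the 8-byte UDP header) plus the 20-byte IP header comes to 1468 bytes, which even with the extra line-index byte stays inside the 1500-byte Ethernet payload limit, so each packet travels as a single unfragmented frame.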

8.4 Image Compression

Another useful addition would be a compression algorithm for the transmitted images. Due to the way the image is stored in RAM, there are only a few options for compression. To maintain live streaming, two-dimensional compression is impossible, but the last few values can be remembered, so a purely horizontal, per-line compression algorithm could be implemented while retaining live streaming. This would lower the amount of data sent per image, making it possible to increase the image size without flooding the network with data.


8.5 Audio Streaming

The work mentioned in section 2.1 included an application that uses the audio input port on the board to record audio data and play it back. Part of this design could be used to add audio data to the outgoing packets. This would be easy to add on the board side of the communication, but a way to play the audio must be found at the destination. Many software streams include audio, and it would be a useful addition to the project.

8.6 Summary

The improvements mentioned in this chapter would all make the design faster or better than it currently is. With a better compression algorithm and a faster network interface, the design could match its current software counterparts. Methods for making all the improvements have been given; they would have been implemented had time constraints not prevented it.


Chapter 9 Conclusion

This thesis has presented an overview of the design and development of a hardware implementation of live video streaming. The device performs streaming without requiring a computer to decode and transmit the images. A device capable of proving that video streaming in hardware is possible, and could be extended to overtake software in the market, has been specified, designed and implemented. An evaluation of the final implementation has also been performed to demonstrate that it achieved its expectations; given the level of the final design, the project can be considered a success. Although the final implementation is complete, there are several areas of the design that could be improved upon. These have been listed, with possible methods suggested for achieving each improvement. This design proves that hardware implementations of live video streaming are possible and will likely improve to a level matching software techniques in future years.


References
[1] XESS Corp., XESS Corporation home page, http://www.xess.com (current 15 Oct, 2001).
[2] Brennan J., Partis A. and Peddersen J., VHDL XSV Board Interface Projects, http://www.itee.uq.edu.au/~peters/xsvboard (current 15 Oct, 2001).
[3] Xilinx, Xilinx Enables Breakthrough Video Streaming Technology in New Server from MidStream, http://www.xilinx.com/prs_rls/0190MidStream.html (current 15 Oct, 2001).
[4] MidStream, MidStream Technologies, http://www.MidStream.com (current 15 Oct, 2001).
[5] Axis, Axis Communications, http://www.axis.com (current 15 Oct, 2001).
[6] Webcam Solutions, Webcam Solutions Price List, http://www.webcamsolutions.com.au/Pricing.htm (current 15 Oct, 2001).
[7] Riley L., Motion JPEG Video/Still Image CODEC, http://www.4i2i.com/JPEG_Core.htm (current 15 Oct, 2001).
[8] Mohor I., Mahmud G. and Novan H., Ethernet MAC 10/100 Mbps, http://www.OpenCores.org/cores/ethmac/ (current 15 Oct, 2001).
[9] RFC 793, Transmission Control Protocol, University of Southern California, 1981.
[10] RFC 768, User Datagram Protocol, University of Southern California, 1980.
[11] XESS Corp., XSV-300 Virtex Prototyping Board, http://www.xess.com/prod014_3.php3 (current 15 Oct, 2001).
[12] Xilinx, Xilinx Home : Products : Devices : Virtex Series, http://www.xilinx.com/xlnx/xil_prodcat_landingpage.jsp?title=Virtex_Series (current 15 Oct, 2001).
[13] Xilinx, XC95108 In-System Programmable CPLD, http://www.xilinx.com/partinfo/95108.pdf (current 15 Oct, 2001).
[14] Philips, Philips Semiconductors; SAA7113H; 9-bit video input processor, http://www-us7.semiconductors.philips.com/pip/SAA7113H (current 15 Oct, 2001).
[15] Digital Video Coding, Digital Video Coding: Digital Video, http://umi.eee.rgu.ac.uk/umi/digvid/digvid.html (current 15 Oct, 2001).
[16] Dragon, Product Lines, http://www.dragonhk.com/products/intel/features/LXT970A.htm (current 15 Oct, 2001).
[17] Alliance Semiconductor, 5V/3.3V 512K x 8 CMOS SRAM, http://www.gaw.ru/doc/Alliance/as7c34096.pdf (current 15 Oct, 2001).
[18] Intel, 5 Volt FlashFile Memory; 28F004S5, 28F008S5, 28F016S5 (x8), http://developer.intel.com/design/flcomp/datashts/290597.htm (current 15 Oct, 2001).
[19] Philips, Philips Semiconductors; I2C, http://www.semiconductors.philips.com/buses/i2c/ (current 15 Oct, 2001).
[20] RFC 791, Internet Protocol, University of Southern California, 1981.
[21] XESS Corp., Example Designs, Tutorials, Application Notes, http://www.xess.com/ho03000.html#Examples (current 15 Oct, 2001).
[22] OpenPTC, OpenPTC, http://www.gaffer.org/ptc (current 15 Oct, 2001).
[23] Fiedler G., OpenPTC for Windows, http://www.gaffer.org/ptc/distributions/Windows/index.html (current 15 Oct, 2001).
[24] Kruglinski D., Programming Microsoft Visual C++, Microsoft Press, Redmond, Wash., 1998.
[25] Modicon, LRC/CRC Generation, http://www.modicon.com/techpubs/crc7.html (current 15 Oct, 2001).



Appendix A Implementation Data


Included here is some of the implementation data for the design. These reports are segments of the reports generated by Foundation during implementation.

Design Summary
--------------
Number of errors: 0
Number of warnings: 33
Number of Slices: 1,761 out of 3,072 (57%)
Number of Slices containing unrelated logic: 0 out of 1,761 (0%)
Number of Slice Flip Flops: 1,538 out of 6,144 (25%)
Total Number 4 input LUTs: 2,323 out of 6,144 (37%)
  Number used as LUTs: 2,298
  Number used as a route-thru: 25
Number of bonded IOBs: 81 out of 166 (48%)
Number of Block RAMs: 1 out of 16 (6%)
Number of GCLKs: 4 out of 4 (100%)
Number of GCLKIOBs: 4 out of 4 (100%)
Total equivalent gate count for design: 45,896
Additional JTAG gate count for IOBs: 4,080

Timing summary:
---------------
Timing errors: 547
Score: 3115439

Constraints cover 2633556 paths, 0 nets, and 10578 connections (96.5% coverage)

Design statistics:
Minimum period: 34.683ns (Maximum frequency: 28.833MHz)
Maximum path delay from/to any node: 14.813ns
Minimum input arrival time before clock: 17.300ns
Minimum output required time after clock: 20.523ns


Appendix B Partial VHDL Source Code


Included here is the main file of the VHDL source code that performs all the video decoding and network interface code. This version performs 180x144 @ 25 Hz video.

SAA7113.vhd:
-------------------------------------------------------------------------------- saa7113.vhd --- Author(s): Jorgen Peddersen -- Created: Jan 2001 -- Last Modified: Sep 2001 -------------------------------------------------------------------------------library ieee; use ieee.std_logic_1164.all; use ieee.std_logic_unsigned.all; use work.global_constants.all; entity saa7113 is port( clk: in std_logic; -- 50 MHz clock rstn: in std_logic; stop: in std_logic; scl: inout std_logic; -- bidirectional I2C clock signal sda: inout std_logic; -- bidirectional I2C data signal llck: in std_logic; -- SAA7113 video clock (27 MHz) vpo: in std_logic_vector(7 downto 0); -- data from SAA7113 rts: in std_logic_vector(1 downto 0); -- real-time video status complete: in std_logic; datagramSent: in std_logic; bar: out std_logic_vector(6 downto 2); -- status displays rdRAM: out std_logic; rdAddr: out std_logic_vector(18 downto 0); rdData: in std_logic_vector(7 downto 0); rdComplete: in std_logic; newDatagram: in std_logic; protocolIn: in std_logic_vector(7 downto 0); sourceIP: in std_logic_vector(31 downto 0); wrRAM : out std_logic; wrData: out std_logic_vector(7 downto 0); wrAddr: out std_logic_vector(18 downto 0); sendDatagram: out std_logic; datagramSize: out std_logic_vector(15 downto 0); destinationIP: out std_logic_vector(31 downto 0) ); end saa7113; architecture saa7113_arch of saa7113 is component i2c is port ( clk: in STD_LOGIC; rstn: in STD_LOGIC; go: in STD_LOGIC; done: out STD_LOGIC; sda: out STD_LOGIC; scl: out STD_LOGIC ); end component; -- states for the SAA7113 interface circuit type VIDEO_STATE_TYPE is ( stIdle, stWaitForEscape, stCheckEscape1,


stCheckEscape2, stCheckForNewPage, stCheckForFirstLine, stChromaBlue, stLumaBlue, stChromaRed, stLumaRed, stCheckForEndLine, stCheckForNewLine, stError, stNew, stError2 ); type CONTROL_STATE_TYPE is ( stResetC, stWrHeaderC, stIdleC, stGoC, stWriteFrameC, stWriteImageC, stSendFrameC, stSyncC ); signal presStateC: CONTROL_STATE_TYPE; signal nextStateC: CONTROL_STATE_TYPE; type READ_STATE_TYPE is ( stIdleR, stReadR, stSignalR ); signal presStateR: READ_STATE_TYPE; signal nextStateR: READ_STATE_TYPE;

-- video in state signals signal presState: VIDEO_STATE_TYPE; signal nextState: VIDEO_STATE_TYPE; signal returnState: VIDEO_STATE_TYPE; signal nextReturnState: VIDEO_STATE_TYPE; signal bclk : std_logic; signal gnd_net : std_logic; -- buffered version of the clock signal -- ground signal for connecting to unused ports

signal capture : STD_LOGIC; -- assert to capture a frame signal error : STD_LOGIC; -- is high while there is an error in decode signal sda_out, sda_in, scl_out, scl_in : std_logic; -- internal I2C interface signals signal grab_addr, disp_addr: std_logic_vector(18 downto 0); -- RAM addresses signal grab_cntr_hori: std_logic_vector(9 downto 0); -- horizontal write counter signal grab_cntr_vert: std_logic_vector(8 downto 0); -- vertical write counter signal clr_grab_cntr: STD_LOGIC; -- clear both grab counters signal inc_grab_hori: STD_LOGIC; -- increment horizontal counter for each 2 pixels signal inc_grab_vert: STD_LOGIC; -- increment vert. counter and clear hori. counter signal field : STD_LOGIC; signal nextField : STD_LOGIC; -- remember which field we are operating on. -- signal to remember the field

signal write : STD_LOGIC; signal grab : std_logic; signal nextGrab : STD_LOGIC;

-- non 50MHz signal to tell RAM to write -- Assert to write current data when ready -- Delays write signal by one clock cycle

signal vpoLatch: std_logic_vector(7 downto 0);-- synchronise data on the vpo bus to LLCK -- signals to store and remember luminance and chrominance values signal luminanceB: std_logic_vector(7 downto 0); signal luminanceR: std_logic_vector(7 downto 0); signal chrominanceB: STD_LOGIC_VECTOR(7 downto 0); signal chrominanceR: STD_LOGIC_VECTOR(7 downto 0); signal nextLuminanceB: std_logic_vector(7 downto 0); signal nextLuminanceR: std_logic_vector(7 downto 0);


signal nextChrominanceB: STD_LOGIC_VECTOR(7 downto 0); signal nextChrominanceR: STD_LOGIC_VECTOR(7 downto 0); -- colour conversion signals from YUV to RGB signal red: STD_LOGIC_VECTOR(17 downto 0); signal green: STD_LOGIC_VECTOR(17 downto 0); signal blue: STD_LOGIC_VECTOR(17 downto 0); signal colour : STD_LOGIC_VECTOR(14 downto 0); signal colour8 : STD_LOGIC_VECTOR(7 downto 0); signal divclk: STD_LOGIC; signal divclkcnt: STD_LOGIC_VECTOR(4 downto 0); signal vidindone: STD_LOGIC;

signal colourLatch: STD_LOGIC_VECTOR(7 downto 0); signal nextColourLatch: STD_LOGIC_VECTOR(7 downto 0);

signal datagramReady : STD_LOGIC;

signal sdaint:STD_LOGIC; signal sclint:STD_LOGIC;

signal signal signal signal signal signal signal signal

captureLatch:STD_LOGIC; sw: STD_LOGIC; swLatch:STD_LOGIC; busy : STD_LOGIC; busyLatch : STD_LOGIC; grabLatch : STD_LOGIC; grabLatch2 : STD_LOGIC; i2cgo : STD_LOGIC;

signal wrCnt : STD_LOGIC_VECTOR(15 downto 0); signal clrWrCnt: STD_LOGIC; signal incWrCnt: STD_LOGIC; constant FRAME_SIZE : STD_LOGIC_VECTOR(15 downto 0) := x"6548"; signal rdCnt : STD_LOGIC_VECTOR(3 downto 0); signal clrRdCnt : STD_LOGIC; signal incRdCnt : STD_LOGIC;

signal LatchluminanceB : STD_LOGIC_VECTOR(7 downto 0); signal Latchgrab_cntr_hori : STD_LOGIC_VECTOR(9 downto 0); signal Latchgrab_cntr_vert : STD_LOGIC_VECTOR(8 downto 0);

signal signal signal signal signal signal

destinationPortLatch : STD_LOGIC_VECTOR(15 downto 0); sourcePortLatch : STD_LOGIC_VECTOR(15 downto 0); latchSourcePort: STD_LOGIC; latchSourceIP: STD_LOGIC; latchDestinationData: STD_LOGIC; sourceIPlatch : STD_LOGIC_VECTOR(31 downto 0);

signal keepAlive: STD_LOGIC; signal stopTimer: STD_LOGIC; signal timerCount: STD_LOGIC_VECTOR(4 downto 0); signal sendDatagramInt: STD_LOGIC; begin process(clk, rstn) begin if rstn = '0' then divclkcnt <= (others => '0'); elsif clk'event and clk = '1' then divclkcnt <= divclkcnt + 1; end if; end process; divclk <= divclkcnt(4);


vidi2c: i2c port map( clk => divclk, rstn => rstn, go => i2cgo, done => vidindone, sda => sda, scl => sclint ); scl <= '0' when sclint = '0' else 'Z'; sendDatagram <= sendDatagramInt; gnd_net <= '0'; bar(4) <= vidindone; -- ground signal for unused inputs in components -- shows when capture is occurring or not

-- state machine which decodes the data from the decoder chip process(llck,rstn) begin if rstn = '0' then -- reset signals asynchronously presState <= stIdle; grab <= '0'; returnState <= stIdle; field <= '0'; elsif llck'event and llck='1' then -- processes are clocked by llck captureLatch <= capture; vpoLatch <= vpo; presState <= nextState; grab <= nextGrab; -- synchronize asynchronous data -- go to next state -- delay so colour can be calculated

returnState <= nextReturnState; field <= nextField; chrominanceR <= nextChrominanceR; chrominanceB <= nextChrominanceB; luminanceR <= nextLuminanceR; luminanceB <= nextLuminanceB; -- operate on the grab counters for vert. and hori. movement if clr_grab_cntr = '1' then grab_cntr_hori <= (others => '0'); grab_cntr_vert <= (others => '0'); else if inc_grab_hori = '1' then grab_cntr_hori <= grab_cntr_hori + 1; end if; if inc_grab_vert = '1' then grab_cntr_vert <= grab_cntr_vert + 1; grab_cntr_hori <= (others => '0'); -- clear horizontal counter with each new line end if; end if; end if; end process; process (presState, vpoLatch, returnState, field, luminanceB, luminanceR, chrominanceB, chrominanceR, captureLatch) begin -- default signal values clr_grab_cntr <= '0'; inc_grab_hori <= '0'; inc_grab_vert <= '0'; nextGrab <= '0'; nextReturnState <= returnState; nextField <= field; nextLuminanceB <= luminanceB; nextLuminanceR <= luminanceR; nextChrominanceB <= chrominanceB; nextChrominanceR <= chrominanceR; error <= '0'; datagramReady <= '1';


--

busy <= '1'; bar(5) <= '0'; bar(6) <= '0'; case presState is when stIdle => busy <= '0'; -- wait until capture is asserted then write a frame if captureLatch = '1' then nextState <= stWaitForEscape; nextReturnState <= stCheckForNewPage; else nextState <= stIdle; display_frame <= '1'; end if; -- The following three states form a sort of subroutine. when stWaitForEscape => bar(5) <= '1'; bar(6) <= '1'; -- Look for the first character in the sequence if vpoLatch = X"FF" then nextState <= stCheckEscape1; else nextState <= stWaitForEscape; end if; when stCheckEscape1 => -- Second character in the escape sequence is 0. if vpoLatch = X"00" then nextState <= stCheckEscape2; else nextState <= stError; end if; when stCheckEscape2 => -- Third charcter in the escape sequence is 0. if vpoLatch = X"00" then nextState <= returnState; else nextState <= stError; end if; when stCheckForNewPage => -- Wait for an SAV or EAV in field 0 while in blanking if vpoLatch(6 downto 5) = "01" then -- If it is then wait until the first line nextState <= stWaitForEscape; nextReturnState <= stCheckForFirstLine; clr_grab_cntr <= '1'; -- initialise counter else -- Look for another SAV/EAV nextState <= stWaitForEscape; nextReturnState <= stCheckForNewPage; end if; when stCheckForFirstLine => -- Wait for an SAV in field 0 while in the active region if vpoLatch(6 downto 4) = "000" then -- start recording data nextState <= stChromaBlue; nextField <= '0'; -- initialise field else -- Look for another SAV/EAV nextState <= stWaitForEscape; nextReturnState <= stCheckForFirstLine; end if; when stChromaBlue => -- This may be the start of another pair of pixels -- If the byte is FF then it is the start of the EAV. if vpoLatch = X"FF" then nextState <= stCheckEscape1; nextReturnState <= stCheckForEndLine; elsif vpoLatch = X"00" then nextState <= stError; else


-- latch data into register and continue nextState <= stLumaBlue; nextChrominanceB <= vpoLatch; end if; when stLumaBlue => -- As long as valid data is present continue latching data if vpoLatch /= X"FF" and vpoLatch /= X"00" then nextState <= stChromaRed; nextLuminanceB <= vpoLatch; else nextState <= stError; end if; -- As long as valid data is present continue latching data when stChromaRed => if vpoLatch /= X"FF" and vpoLatch /= X"00" then nextState <= stLumaRed; nextChrominanceR <= vpoLatch; else nextState <= stError; end if; when stLumaRed => if vpoLatch /= X"FF" and vpoLatch /= X"00" then nextState <= stChromaBlue; nextLuminanceR <= vpoLatch; nextGrab <= '1'; -- Set up a write inc_grab_hori <= '1'; -- Increment hori counter else nextState <= stError; end if; when stCheckForEndLine => -- possible conditions here are the end of field 0, end of field 1, -- or an EAV code indicating a new line in the active region. if vpoLatch(6 downto 4) = "111" then -- end of field 1 nextState <= stNew; datagramReady <= '1'; elsif vpoLatch(6 downto 4) = "011" then-- end of field 0 clr_grab_cntr <= '1'; -- reset counter for field 1 nextState <= stWaitForEscape; nextReturnState <= stCheckForNewLine; elsif vpoLatch(5 downto 4) = "01" then -- end of line inc_grab_vert <= '1'; -- go to next line nextState <= stWaitForEscape; nextReturnState <= stCheckForNewLine; else -- EAV expected but SAV received nextState <= stError; end if; when stCheckForNewLine => -- Wait until an SAV in the active video range arrives if vpoLatch(5 downto 4) = "00" then nextState <= stChromaBlue; -- capture next line nextField <= vpoLatch(6); else -- Wait for another code nextState <= stWaitForEscape; nextReturnState <= stCheckForNewLine; end if; when stError => -- Wait until another capture is requested bar(5) <= '1'; if captureLatch = '1' then nextState <= stWaitForEscape; nextReturnState <= stCheckForNewPage; else nextState <= stError; error <= '1'; -- indicate error


end if; when stNew => if captureLatch = '1' then nextState <= stNew; else nextState <= stIdle; end if; when stError2 => bar(6) <= '1'; nextState <= stError2; when others => nextState <= stError2; end case; end process; process(clk, rstn) begin if rstn = '0' then presStateC <= stResetC; elsif clk'event and clk = '1' then colourLatch <= nextColourLatch; presStateC <= nextStateC; busyLatch <= busy; grabLatch <= grab; grabLatch2 <= grabLatch; -swLatch <= sw; LatchluminanceB <= luminanceB; Latchgrab_cntr_hori <= grab_cntr_hori; Latchgrab_cntr_vert <= grab_cntr_vert; if clrWrCnt = '1' then wrCnt <= (others => '0'); elsif incWrCnt = '1' then wrCnt <= wrCnt + 1; end if; end if; end process; rdAddr <= "001000000000000" & rdCnt; wrAddr <= "101" & wrCnt; datagramSize <= FRAME_SIZE; process(presStateC, wrCnt, complete, swLatch, busyLatch, grabLatch, grabLatch2, Latchgrab_cntr_hori, Latchgrab_cntr_vert, LatchluminanceB, vidindone, colourLatch, destinationPortLatch, datagramSent, colour8) begin nextColourLatch <= colourLatch; clrWrCnt <= '0'; incWrCnt <= '0'; wrData <= (others => '0'); wrRAM <= '0'; sendDatagramInt <= '0'; i2cgo <= '1'; capture <= '0'; bar(3) <= '0'; bar(2) <= '0'; sendDatagramInt <= '0'; case presStateC is when stResetC => clrWrCnt <= '1'; nextStateC <= stWrHeaderC; i2cgo <= '0'; when stWrHeaderC => i2cgo <= '0'; -- Write a byte to RAM if wrCnt(3 downto 0) = x"8" then -- header has been fully written so go to data stage nextStateC <= stIdleC; elsif complete = '0' then case wrCnt(2 downto 0) is when "000" => wrData <= VIDEO_PORT(15 downto 8);


when "001" => wrData <= VIDEO_PORT(7 downto 0); when "010" => wrData <= destinationPortLatch(15 downto 8); when "011" => wrData <= destinationPortLatch(7 downto 0); when "100" => wrData <= FRAME_SIZE(15 downto 8); when "101" => wrData <= FRAME_SIZE(7 downto 0); when "110" => wrData <= x"00"; when "111" => wrData <= x"00"; when others => wrData <= (others => '0'); end case; -- Wait for RAM to acknowledge the write nextStateC <= stWrHeaderC; wrRAM <= '1'; else -- When it does increment the counter nextStateC <= stWrHeaderC; incWrCnt <= '1'; end if; when stIdleC => bar(3) <= '1'; bar(2) <= '1'; if swLatch = '1' and vidindone = '1' then nextStateC <= stGoC; else nextStateC <= stIdleC; end if; when stGoC => bar(3) <= '1'; capture <= '1'; if busyLatch = '1' then nextStateC <= stWriteFrameC; else nextStateC <= stGoC; end if; when stWriteFrameC => bar(2) <= '1'; if wrCnt = FRAME_SIZE then nextStateC <= stSendFrameC; elsif grabLatch = '0' and grabLatch2 = '1' and Latchgrab_cntr_hori(0) = '0' and Latchgrab_cntr_vert(0) = '0' then nextStateC <= stWriteImageC; nextColourLatch <= colour8; else nextStateC <= stWriteFrameC; end if; when stWriteImageC => if complete = '0' then -- Wait for RAM to acknowledge the write nextStateC <= stWriteImageC; wrData <= colourLatch; wrRAM <= '1'; else nextStateC <= stWriteFrameC; incWrCnt <= '1'; end if; when stSendFrameC => sendDatagramInt <= '1'; nextStateC <= stSyncC; when stSyncC => if busyLatch = '1' then nextStateC <= stSyncC; else nextStateC <= stResetC; end if; end case; end process;


process(clk,rstn) begin if rstn = '0' then presStateR <= stIdleR; -sw <= '0'; timerCount <= (others => '0'); elsif clk'event and clk = '1' then presStateR <= nextStateR; if clrRdCnt = '1' then rdCnt <= (others => '0'); elsif incRdCnt = '1' then rdCnt <= rdCnt + 1; end if; if latchSourceIP = '1' then sourceIPLatch <= sourceIP; end if; if latchSourcePort = '1' then sourcePortLatch <= sourcePortLatch(7 downto 0) & rdData; end if; if latchDestinationData = '1' then destinationIP <= sourceIPLatch; destinationPortLatch <= sourcePortLatch; end if; if keepAlive = '1' then timerCount <= (others => '1'); elsif stopTimer = '1' or stop = '1' then timerCount <= (others => '0'); elsif sendDatagramInt = '1' and timerCount /= 0 then timerCount <= timerCount - 1; end if; end if; end process; swLatch <= '0' when timerCount = 0 else '1'; process (presStateR, newDatagram, protocolIn, rdCnt, rdComplete, rdData) begin clrRdCnt <= '0'; incRdCnt <= '0'; latchSourceIP <= '0'; rdRAM <= '0'; latchSourcePort <= '0'; latchDestinationData <= '0'; keepAlive <= '0'; stopTimer <= '0'; case presStateR is when stIdleR => if newDatagram = '1' and protocolIn = 17 then nextStateR <= stReadR; clrRdCnt <= '1'; latchSourceIP <= '1'; else nextStateR <= stIdleR; end if; when stReadR => if rdCnt = 8 then nextStateR <= stSignalR; elsif rdComplete = '0' and rdCnt(3 downto 2) = "00" then nextStateR <= stReadR; rdRAM <= '1'; else incRdCnt <= '1'; nextStateR <= stReadR; if rdCnt(2 downto 0) = "010" and rdData /= VIDEO_PORT(15 downto 8) then nextStateR <= stIdleR; end if; if rdCnt(2 downto 0) = "011" and rdData /= VIDEO_PORT(7 downto 0) then nextStateR <= stIdleR; end if; if rdCnt(2 downto 1) = "00" then latchSourcePort <= '1'; end if; end if;


when stSignalR => if rdComplete = '0' then nextStateR <= stSignalR; rdRam <= '1'; elsif rdData = x"5b" then nextStateR <= stIdleR; keepAlive <= '1'; latchDestinationData <= '1'; else nextStateR <= stIdleR; stopTimer <= '1'; end if; when others => nextStateR <= stIdleR; end case; end process;

red <= ("00" & luminanceB & x"00") + (("01" & x"24") * chrominanceR) - ("00" & x"7D00"); blue <= ("00" & luminanceB & x"00") + (("10" & x"07") * chrominanceB) - ("00" &x"EE80"); green <= ("00" & luminanceB & x"00") + ("00" & x"9200") - (("00" & x"65") * chrominanceB) - (("00" & x"95") * chrominanceR); -- eliminate overflow caused by the calculations above -- Comment out the colour set that isn't needed -- colour is 15-bit 5:5:5 RGB format colour. colour 8 is 8-bit 3:3:2 RGB format. with red(17 downto 16) select -colour(14 downto 10) <= red(15 downto 11) when "00", -(others => '0') when "11", -(others => '1') when "01", -(others => '0') when others; colour8(7 downto 5) <= red(15 downto 13) when "00", (others => '0') when "11", (others => '1') when "01", (others => '0') when others; with green(17 downto 16) select -colour(9 downto 5) <= green(15 downto 11) when "00", -(others => '0') when "11", -(others => '1') when "01", -(others => '0') when others; colour8(4 downto 2) <= green(15 downto 13) when "00", (others => '0') when "11", (others => '1') when "01", (others => '0') when others; with blue(17 downto 16) select -colour(4 downto 0) <= blue(15 downto 11) when "00", -(others => '0') when "11", -(others => '1') when "01", -(others => '0') when others; colour8(1 downto 0) <= blue(15 downto 14) when "00", (others => '0') when "11", (others => '1') when "01", (others => '0') when others;

end saa7113_arch;


Appendix C Partial PC Source Code


Included here is the main file of the PC source code that performs all the network communication, controls the GUI and updates the images. This version performs 180x144 @ 25 Hz video.

ThesisAsyncDlg.cpp:
// ThesisAsyncDlg.cpp : implementation file // #include "stdafx.h" #include "ThesisAsync.h" #include "ThesisAsyncDlg.h" #include "ptc.h" #ifdef _DEBUG #define new DEBUG_NEW #undef THIS_FILE static char THIS_FILE[] = __FILE__; #endif #define width 180 #define length 144 Console console; Format format(32,0x00FF0000,0x0000FF00,0x000000FF); Surface surface(width,length,format); WSAData wsaData; SOCKET sd; sockaddr_in sinBoard; unsigned char acReadBuffer[30000]; int status = 0; static int gnWSNotifyMsg = RegisterWindowMessage(__FILE__ ":wsnotify"); ///////////////////////////////////////////////////////////////////////////// // CAboutDlg dialog used for App About class CAboutDlg : public CDialog { public: CAboutDlg(); // Dialog Data //{{AFX_DATA(CAboutDlg) enum { IDD = IDD_ABOUTBOX }; //}}AFX_DATA // ClassWizard generated virtual function overrides //{{AFX_VIRTUAL(CAboutDlg) protected: virtual void DoDataExchange(CDataExchange* pDX); // DDX/DDV support //}}AFX_VIRTUAL // Implementation protected: //{{AFX_MSG(CAboutDlg) //}}AFX_MSG DECLARE_MESSAGE_MAP() }; CAboutDlg::CAboutDlg() : CDialog(CAboutDlg::IDD)



{ //{{AFX_DATA_INIT(CAboutDlg) //}}AFX_DATA_INIT } void CAboutDlg::DoDataExchange(CDataExchange* pDX) { CDialog::DoDataExchange(pDX); //{{AFX_DATA_MAP(CAboutDlg) //}}AFX_DATA_MAP } BEGIN_MESSAGE_MAP(CAboutDlg, CDialog) //{{AFX_MSG_MAP(CAboutDlg) // No message handlers //}}AFX_MSG_MAP END_MESSAGE_MAP() ///////////////////////////////////////////////////////////////////////////// // CThesisAsyncDlg dialog CThesisAsyncDlg::CThesisAsyncDlg(CWnd* pParent /*=NULL*/) : CDialog(CThesisAsyncDlg::IDD, pParent) { //{{AFX_DATA_INIT(CThesisAsyncDlg) // NOTE: the ClassWizard will add member initialization here //}}AFX_DATA_INIT // Note that LoadIcon does not require a subsequent DestroyIcon in Win32 m_hIcon = AfxGetApp()->LoadIcon(IDR_MAINFRAME); } void CThesisAsyncDlg::DoDataExchange(CDataExchange* pDX) { CDialog::DoDataExchange(pDX); //{{AFX_DATA_MAP(CThesisAsyncDlg) DDX_Control(pDX, IDC_IPADD, m_IpAddress); //}}AFX_DATA_MAP } BEGIN_MESSAGE_MAP(CThesisAsyncDlg, CDialog) //{{AFX_MSG_MAP(CThesisAsyncDlg) ON_WM_SYSCOMMAND() ON_WM_PAINT() ON_WM_QUERYDRAGICON() ON_BN_CLICKED(IDC_STARTBUTTON, OnStartbutton) ON_BN_CLICKED(IDC_STOPBUTTON, OnStopbutton) //}}AFX_MSG_MAP ON_REGISTERED_MESSAGE(gnWSNotifyMsg, OnWinsockNotify) END_MESSAGE_MAP() ///////////////////////////////////////////////////////////////////////////// // CThesisAsyncDlg message handlers BOOL CThesisAsyncDlg::OnInitDialog() { CDialog::OnInitDialog(); // Add "About..." menu item to system menu. // IDM_ABOUTBOX must be in the system command range. ASSERT((IDM_ABOUTBOX & 0xFFF0) == IDM_ABOUTBOX); ASSERT(IDM_ABOUTBOX < 0xF000); CMenu* pSysMenu = GetSystemMenu(FALSE); if (pSysMenu != NULL) { CString strAboutMenu; strAboutMenu.LoadString(IDS_ABOUTBOX); if (!strAboutMenu.IsEmpty()) { pSysMenu->AppendMenu(MF_SEPARATOR); pSysMenu->AppendMenu(MF_STRING, IDM_ABOUTBOX, strAboutMenu); } }


// Set the icon for this dialog. The framework does this automatically // when the application's main window is not a dialog SetIcon(m_hIcon, TRUE); // Set big icon SetIcon(m_hIcon, FALSE); // Set small icon // TODO: Add extra initialization here console.option("windowed output"); console.open("FPGA Streaming",format); if ((WSAStartup(MAKEWORD(1, 1), &wsaData)) != 0) { MessageBox("WSAStartup error"); } sd = socket(AF_INET, SOCK_DGRAM, 0); if (sd != INVALID_SOCKET) { sockaddr_in sinInterface; sinInterface.sin_family = AF_INET; sinInterface.sin_addr.s_addr = htonl(INADDR_ANY); sinInterface.sin_port = htons(0); bind(sd, (sockaddr*)&sinInterface, sizeof(sockaddr_in)) ; } WSAAsyncSelect(sd, m_hWnd, gnWSNotifyMsg, FD_READ); sinBoard.sin_family = AF_INET; sinBoard.sin_port = htons(20380); m_IpAddress.SetAddress(130,102,75,192); unsigned int rcv_size = 30000; setsockopt(sd, SOL_SOCKET , SO_RCVBUF, (char *)&rcv_size, sizeof(rcv_size));

return TRUE; }

// return TRUE

unless you set the focus to a control

void CThesisAsyncDlg::OnSysCommand(UINT nID, LPARAM lParam) { if ((nID & 0xFFF0) == IDM_ABOUTBOX) { CAboutDlg dlgAbout; dlgAbout.DoModal(); } else { CDialog::OnSysCommand(nID, lParam); } } // If you add a minimize button to your dialog, you will need the code below // to draw the icon. For MFC applications using the document/view model, // this is automatically done for you by the framework. void CThesisAsyncDlg::OnPaint() { if (IsIconic()) { CPaintDC dc(this); // device context for painting SendMessage(WM_ICONERASEBKGND, (WPARAM) dc.GetSafeHdc(), 0); // Center icon in client rectangle int cxIcon = GetSystemMetrics(SM_CXICON); int cyIcon = GetSystemMetrics(SM_CYICON); CRect rect; GetClientRect(&rect); int x = (rect.Width() - cxIcon + 1) / 2; int y = (rect.Height() - cyIcon + 1) / 2; // Draw the icon dc.DrawIcon(x, y, m_hIcon); } else { CDialog::OnPaint(); } } // The system calls this to obtain the cursor to display while the user drags // the minimized window.


HCURSOR CThesisAsyncDlg::OnQueryDragIcon() { return (HCURSOR) m_hIcon; } void CThesisAsyncDlg::OnStartbutton() { unsigned long address; if (m_IpAddress.GetAddress(address) != 4) MessageBox("Invalid IP Address"); else { sinBoard.sin_addr.s_addr = htonl(address); status = 1; char msg = 0x5b; connect(sd, (struct sockaddr *) &sinBoard, sizeof(sockaddr_in)); send(sd, &msg, 1, 0); } } void CThesisAsyncDlg::OnStopbutton() { status = 0; char msg = 0x00; unsigned int rcv_size; char string[30]; int rcv_size_len = sizeof(rcv_size); send(sd, &msg, 1, 0); } LRESULT CThesisAsyncDlg::OnWinsockNotify(WPARAM, LPARAM lParam) { int nError = WSAGETASYNCERROR(lParam); if (nError != 0) { switch (nError) { case WSAECONNRESET: MessageBox("Connection was aborted."); closesocket(sd); break; case WSAECONNREFUSED: MessageBox("Connection was refused."); closesocket(sd); break; default: MessageBox("Async failure notification"); } return 0; } int i; int nReadBytes; int32 *pixels; switch (WSAGETSELECTEVENT(lParam)) { case FD_READ: nReadBytes = recv(sd, (char *)acReadBuffer, 29000, 0); if (nReadBytes < 0) { MessageBox("Receive error"); } else if (nReadBytes == 0) { MessageBox("Received no data"); } if (status != 1) return 0; msg = 0x5b; send(sd, &msg, 1, 0); // lock surface pixels pixels = (int32*) surface.lock(); for (i = 0; i const int const int const int < nReadBytes; i++) { x = i % width; y = i / width; colour = (unsigned long)acReadBuffer[i];


pixels[x+y*width] = ((colour & 0xE0) << 16) + ((colour&0x1C) << 11) + ((colour & 0x03) << 6); } // unlock surface surface.unlock(); // copy to console surface.copy(console); // update console console.update(); break; default: MessageBox("WSEV: Unknown event recieved: "); } return 0; }

