
A Combined Hardware / Software ETHERNET Protocol Implementation for Embedded DSP Based Environments


by
1. Pavel Karmazin (*)
2. Aggelos Liveris (**)
3. Yannis Papadimitriou (**)
4. Nikos Moshopoulos (**)
5. George Stassinopoulos (**)

(*) Infineon Technologies AG, P.O. Box 800949, 81609 Munich, Germany
(**) National Technical University of Athens (NTUA)
Heroon Polytechnioy 9, Zografoy Campus, Athens, GREECE
Tel. ++ 30 1 772 2425

P. KARMAZIN - A. LIVERIS Y. PAPADIMITRIOU N. MOSHOPOULOS G. STASSINOPOULOS

Abstract
This paper proposes a system architecture and a set of procedures for a combined H/W / S/W
implementation of the ETHERNET protocol in a DSP based environment. The architecture comprises a
protocol-dedicated H/W block, a programmable DSP core and a uP. The ETHERNET MAC layer protocol
functions are partitioned between the DSP S/W and the dedicated H/W block. This approach minimizes the
dedicated H/W functions, which maximizes flexibility and saves valuable die area in a possible IC
implementation, while allowing functional modifications per application by changing only the DSP S/W
within a given system architecture. The paper starts with a description of the system architecture: the
generic functional blocks, the protocol functions implemented in each of them and the related
information flows. Next, the implementation of the ETHERNET layer 2 protocol functions is discussed and
the criteria for function partitioning are presented. The paper concludes with a discussion of the
implementation parameters and performance.

A H/W S/W ETHERNET PROTOCOL IMPLEMENTATION

1 System structure

The system functional blocks are depicted in Figure 1:

[Figure 1 block diagram: DSP and Micro Controller, Data Memory Bank, Code RAM, Ethernet-dedicated H/W block (DHB) and Transceiver, with layer labels Upper Layers, MAC Layer and Layer 1; DHB/transceiver signals TXD, TCLK, TEN, COL, CD, RXD, RCLK]
Figure 1: Functional blocks of the architecture


The system comprises a host micro controller, a DSP block, a dedicated H/W block (DHB) and a
commercial IEEE 802.3 transceiver IC.
The host micro controller implements the functions of the LLC layer (IEEE 802.2), which sits above the
IEEE 802.3 MAC. More specifically, it has access to a dedicated Data Memory Bank (DMB), which is used to
store frames in the receive and transmit directions. Frames to be transmitted are written into the DMB by
the micro controller. From there, they are transferred to the DSP at the appropriate moment for
transmission. Received frames are transferred from the DSP to the DMB. From there, the micro controller
reads them at an appropriate moment for further processing.
The host micro controller has direct access to a large part of the DSP and DHB resources and registers,
which are mapped into the memory space of the micro controller. Furthermore, interrupt lines exist in
both directions between the micro controller and the DSP. Data stored in the DMB can be transferred to
dedicated DSP registers (mailboxes) with direct micro controller instructions. The DMB can also be
accessed by a DMA Controller, which has access to the DSP mailbox resources. In this way, data can be
transferred between the DSP and the data memory without micro controller intervention.
The external transceiver IC implements all the functions of the PLS/PMA layers of the IEEE 802.3
protocol. More specifically, it implements Manchester encoding / decoding of frames, signal filtering and
clock recovery in the receive direction, signal conditioning and line driving in the transmit direction,
medium state monitoring and collision detection. As shown in Figure 1, the DHB interfaces with the
transceiver via the following pins:

The Transmit Data (TXD), which is an output from the DHB.
The Transmit Clock (TCLK), which is an input to the DHB.
The Transmitter Enable (TEN), which is an output from the DHB.
The Collision Detect (COL), which is an input to the DHB.
The Carrier Detect (CD), which is an input to the DHB.
The Receive Data (RXD), which is an input to the DHB.
The Receive Clock (RCLK), which is an input to the DHB.

2 Implementation of the ETHERNET MAC protocol functions

In the context described above, the DSP together with the DHB implements the functions of the MAC layer
of the IEEE 802.3 protocol. Some of the MAC functions are implemented in DSP S/W and the rest in the DHB.

2.1 Implementation of MAC functions in the DHB

The DHB contains two functional blocks: the receiver and the transmitter.
2.1.1 Receiver H/W block
The Receiver H/W block contains one 64-byte receive memory bank, which is mapped into the DSP memory
space. It also contains a Deserializer unit. When a frame is received from the medium, the transceiver
activates the CD signal and drives the bits of the received frame onto the RXD pin one by one,
synchronously with the RCLK pin. The Deserializer unit accepts these bits, assembles them into bytes and
stores them in successive byte positions of the receive memory bank. The Deserializer manages the 64-byte
memory bank as two 32-byte switchable FIFOs: it writes 32 successive bytes into one FIFO, then 32
successive bytes into the other, and so on. This allows the DSP to read one half of the receive memory
while the Deserializer writes the other half. When the Deserializer has filled one of the two FIFOs, it
issues an interrupt to the DSP so that the DSP transfers the 32 received frame bytes to the external data
memory (DMB). The CRC is calculated on the fly over the bits of the received frame. At the end of
reception, the calculated CRC value is compared with the CRC value carried in the frame, and a
DSP-accessible pass/fail flag is updated for each new frame.
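The double-buffering of the receive bank can be modelled in a few lines of C. This is a software sketch under stated assumptions, not the DHB logic itself: the type `rx_bank`, the function `deser_push_byte`, the callback standing in for the FIFO full interrupt and the `dmb_sink` buffer standing in for the DMB are all hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HALF_BYTES 32u   /* each switchable FIFO holds 32 bytes */

/* Hypothetical callback standing in for the FIFO full interrupt to the DSP. */
typedef void (*fifo_full_irq_fn)(const uint8_t *half, void *ctx);

/* Hypothetical model of the 64-byte receive bank used as two 32-byte halves. */
typedef struct {
    uint8_t bank[2 * HALF_BYTES];
    unsigned active;            /* half currently written by the Deserializer (0 or 1) */
    unsigned fill;              /* bytes already written into the active half */
    fifo_full_irq_fn on_full;   /* invoked when a half becomes full */
    void *ctx;
} rx_bank;

/* Store one assembled byte; when a half is full, switch halves and raise the "interrupt". */
static void deser_push_byte(rx_bank *rb, uint8_t byte) {
    rb->bank[rb->active * HALF_BYTES + rb->fill++] = byte;
    if (rb->fill == HALF_BYTES) {
        const uint8_t *full = &rb->bank[rb->active * HALF_BYTES];
        rb->active ^= 1u;       /* keep writing into the other half */
        rb->fill = 0;
        if (rb->on_full)
            rb->on_full(full, rb->ctx);
    }
}

/* Example DSP-side handler: copy the full half into a buffer standing in for the DMB. */
typedef struct {
    uint8_t dmb[256];
    size_t len;
    unsigned irq_count;
} dmb_sink;

static void dsp_copy_half(const uint8_t *half, void *ctx) {
    dmb_sink *s = ctx;
    if (s->len + HALF_BYTES <= sizeof s->dmb) {
        memcpy(s->dmb + s->len, half, HALF_BYTES);
        s->len += HALF_BYTES;
    }
    s->irq_count++;
}
```

In this model the handler runs synchronously; in the real system the DSP only has to finish with one half before the Deserializer fills the other, i.e. within the 25.6 usec that 32 bytes take at 10 Mbit/s.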
The Receiver H/W block reports protocol and system events to the DSP by activating interrupt signals
towards the Interrupt Control module, which in turn activates an interrupt signal towards the DSP.
All interfacing between the DSP and the Receiver H/W block is carried out via the DSP internal data and
address buses. To interface the module to these buses, an Address Decoder and a Multiplexer are included
in the block. The Address Decoder decodes the address bus to generate the FIFO read signals and the
control and status register signals. The Multiplexer is controlled by the Control module and switches
the data flow between the two FIFOs.
Interfacing with the external Ethernet transceiver is carried out via the following external pins:
RCLK Serial clock (10 MHz)
RXD Serial data
CD Carrier Detect
The hardware modules of the Receiver H/W block and the basic interconnection signals are shown in
Figure 2:

[Figure 2 block diagram, DSP side to Ethernet side: Address Decoder, Control, Status_reg, Frame_length, FIFO1, FIFO2, MUX, DeMUX, Deserializer, CRC, CRC Comparator, Interrupt Controller; signals Address_Bus, Data_Bus, DSP_clk, RCLK, RXD, CD, FIFO1_Full, FIFO2_Full, CRC_en, CRC_flag, Interrupt]
Figure 2: Receiver block diagram


2.1.2 Transmitter H/W block
The Transmitter H/W block contains one 64-byte transmit memory bank, which is mapped into the DSP memory
space. It also contains a Serializer unit, which reads successive bytes from the transmit memory bank,
serializes them and outputs the bits one by one on the TXD pin to the external transceiver, synchronously
with the TCLK pin, while keeping the TEN pin active for the whole procedure. The Serializer handles the
64-byte memory bank as two 32-byte switchable FIFOs: it reads 32 successive bytes from one FIFO, then 32
successive bytes from the other, and so on. This allows the DSP to write one half of the transmit memory
while the Serializer reads the other half. When the Serializer has read one half and its contents have
been transmitted to the external transceiver, it issues a FIFO empty interrupt to the DSP so that the DSP
loads the next 32 bytes of the frame into the empty FIFO and transmission of the frame continues. The
frame CRC field is calculated on the fly over the transmitted bits, appended to the end of each
transmitted frame and transmitted to the medium. In case of a collision, the transmit FIFOs are
overridden and the jam sequence is transmitted on the medium immediately.
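Both CRC blocks compute the IEEE 802.3 CRC-32 one bit per clock. The sketch below reproduces that bit-serial behaviour in C using the common reflected (LSB-first) software convention, with the polynomial 0x04C11DB7 in its reflected form 0xEDB88320. The function names are hypothetical, and the byte order of the appended FCS follows that software convention rather than anything stated in the paper. The eight register updates per byte also hint at why a software CRC is so expensive on the DSP.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_REFLECTED 0xEDB88320u   /* IEEE 802.3 polynomial, LSB-first form */

/* One clock of a bit-serial CRC register, as a shift register would do per TCLK/RCLK edge. */
static uint32_t crc32_step(uint32_t crc, unsigned bit) {
    return ((crc ^ bit) & 1u) ? (crc >> 1) ^ CRC32_POLY_REFLECTED : crc >> 1;
}

/* CRC over bytes serialized LSB first: register preset to all ones, result complemented. */
static uint32_t crc32_bits(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        for (unsigned b = 0; b < 8; b++)
            crc = crc32_step(crc, (data[i] >> b) & 1u);
    return ~crc;
}

/* Transmit side: append the FCS after `len` bytes (buffer needs 4 spare bytes). */
static size_t fcs_append(uint8_t *frame, size_t len) {
    uint32_t fcs = crc32_bits(frame, len);
    for (unsigned k = 0; k < 4; k++)
        frame[len + k] = (uint8_t)(fcs >> (8 * k));   /* least significant byte first */
    return len + 4;
}

/* Receive side: recompute over the body and compare with the carried FCS (pass/fail flag). */
static int fcs_check(const uint8_t *frame, size_t len_with_fcs) {
    if (len_with_fcs < 4)
        return 0;
    size_t body = len_with_fcs - 4;
    uint32_t carried = 0;
    for (unsigned k = 0; k < 4; k++)
        carried |= (uint32_t)frame[body + k] << (8 * k);
    return crc32_bits(frame, body) == carried;
}
```

As a sanity check, `crc32_bits` over the ASCII string "123456789" gives the standard CRC-32 check value 0xCBF43926.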
The Transmitter H/W block reports protocol and system events to the DSP by activating interrupt signals
towards the Interrupt Control module, which in turn activates an interrupt signal towards the DSP.
All interfacing between the DSP and the Transmitter H/W block is carried out via the DSP internal data
and address buses. To interface the module to these buses, an Address Decoder and a Multiplexer are
included in the block. The Address Decoder decodes the address bus to generate the FIFO write signals and
the control and status register signals. The Multiplexer is controlled by the Control module and switches
the data flow between the two FIFOs.
Interfacing with the external Ethernet transceiver is carried out via the following external pins:
TCLK Serial data clock (10 MHz)
TXD Serial data
TEN Serial TX data enable
COL Collision signal

The hardware modules of the Transmitter H/W block and the basic interconnection signals are shown in
Figure 3:
[Figure 3 block diagram, DSP side to Ethernet side: Address Decoder, Control, Status_reg, FIFO1, FIFO2, MUX, Serializer, CRC, JAM, Interrupt Controller; signals Address_Bus, Data_Bus, DSP_clk, TCLK, TXD, TEN, COL, End_of_trans, JAM_oe, CRC_oe, CRC_en, FIFO1_Empty, FIFO2_Empty, Interrupt]
Figure 3: Transmitter block diagram


2.2 Implementation of MAC functions in DSP S/W
A number of basic functions of the Ethernet MAC layer are implemented in DSP S/W.
2.2.1 DSP software structure
The DSP software is constructed as an interrupt-driven state machine with the eight states listed in the
following table:
State Number   State name
0              Startup_Wait_For_Carrier_Off
1              Ready_To_Receive_Not_Transmit
2              Ready_To_Receive_And_Transmit
3              Receiving
4              Pending_6usec_After_Reception
5              Pending_3.6usec_After_Reception
6              Pending_9.6usec_After_Reception
7              Transmitting

Table 1: States of the interrupt driven state machine
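One possible C encoding of Table 1 is shown below. The identifiers are hypothetical adaptations of the state names (a '.' is not allowed in a C identifier), and the two helper predicates are an assumption drawn from the state names and from the reception and collision descriptions later in the text, not code from the paper.

```c
#include <stdbool.h>

/* States of Table 1; the numeric values follow the State Number column. */
typedef enum {
    STARTUP_WAIT_FOR_CARRIER_OFF    = 0,
    READY_TO_RECEIVE_NOT_TRANSMIT   = 1,
    READY_TO_RECEIVE_AND_TRANSMIT   = 2,
    RECEIVING                       = 3,
    PENDING_6USEC_AFTER_RECEPTION   = 4,
    PENDING_3_6USEC_AFTER_RECEPTION = 5,
    PENDING_9_6USEC_AFTER_RECEPTION = 6,
    TRANSMITTING                    = 7,
    NUM_MAC_STATES                  = 8
} mac_state;

/* A Carrier 0->1 transition starts a reception only in the two Ready states. */
static bool may_start_reception(mac_state s) {
    return s == READY_TO_RECEIVE_NOT_TRANSMIT || s == READY_TO_RECEIVE_AND_TRANSMIT;
}

/* Only the "And_Transmit" Ready state lets a pending frame start (not, e.g., during backoff). */
static bool may_start_transmission(mac_state s) {
    return s == READY_TO_RECEIVE_AND_TRANSMIT;
}
```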


All external events that cause state transitions are reported to the S/W state machine via interrupt
signals. The S/W is therefore built as a number of independent ISRs, one ISR for each type of interrupt
that may be issued towards the DSP (and thus for each type of external event that may occur). ISRs
exchange necessary information through semaphores or dedicated memory positions. Basic state machine
information, such as the current state, is stored in common variables accessible to all ISRs.
The implementation of the MAC functionality is achieved by constructing each ISR as a jump tree that runs
specific C functions which implement specific functions of the MAC protocol. The C functions that will be
executed are determined by the signal which has been received and the current state.
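The dispatch described above, where the C function to run depends on both the interrupt type and the current state, can be sketched as a two-dimensional jump table. This is a swapped-in technique that may differ from the authors' "jump tree", and it uses a reduced, hypothetical subset of states, events and actions; NULL entries model interrupts that are ignored in a given state.

```c
#include <stddef.h>

/* Reduced, hypothetical subset of the states and interrupt types. */
typedef enum { S_READY_RX_TX, S_RECEIVING, S_PENDING_6US, S_COUNT } state_t;
typedef enum { EV_CARRIER_ON, EV_CARRIER_OFF, EV_RFIFO_FULL, EV_COUNT } event_t;

static volatile state_t g_state = S_READY_RX_TX;  /* common variable seen by all ISRs */
static unsigned g_pages_read;                     /* illustrative shared counter */

typedef void (*action_fn)(void);

/* Placeholder actions; comments name the step they stand for. */
static void start_reception(void) { g_pages_read = 0; g_state = S_RECEIVING; }  /* Command_StartRCV */
static void read_rx_page(void)    { g_pages_read++; }                            /* drain one Rx FIFO */
static void end_reception(void)   { g_pages_read++; g_state = S_PENDING_6US; }   /* last words, 6.0 usec timer */

/* Jump table indexed by [current state][interrupt type]; NULL means "ignored". */
static const action_fn jump_table[S_COUNT][EV_COUNT] = {
    [S_READY_RX_TX] = { [EV_CARRIER_ON] = start_reception },
    [S_RECEIVING]   = { [EV_RFIFO_FULL] = read_rx_page, [EV_CARRIER_OFF] = end_reception },
};

/* Body shared by every ISR: look up and run the C function for (state, event). */
static void dispatch(event_t ev) {
    action_fn fn = jump_table[g_state][ev];
    if (fn != NULL)
        fn();
}
```

A table keeps the state/event coverage visible in one place; a nested `switch` per ISR would be an equally valid way to build the same decision tree.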
2.2.2 Interfacing with other system components
The interfaces between the DSP S/W and the other system components together with the global status
variables are shown in Figure 4:
[Figure 4 diagram: the DSP S/W and its interfaces. Interface to Host (interrupt registers to/from the uP: EORF(number,bool), EOTF, LCol, ExCol, Collision, Setup_DMAC(oper), TransmitRequest, Startup/Reset); Timers (General, Backoff); Interface to H/W (DSP flags StartRCV, StopRCV, StartTRM, ResetRx_Machine, ResetTx_Machine; H/W interrupts CarrierSense, NoCarrierSense, RFIFOfull, TFIFOempty, Collision, EOCP; H/W flags EOTF, RxCRC, Carrier); DMAC Interface (Transfer_From_Buffer (num), Transfer_To_Buffer (num) and the corresponding Completed interrupts); S/W variables such as State, NewRCVFrame, Last_RCVPage_Read, Backoff_Clear, Pending_Transmission]
Figure 4: S/W interfacing with system resources


The S/W state machine exchanges information with four independent agents. These agents are:
1. The Dedicated Hardware Block
The DSP receives interrupts reporting protocol events from the DHB and may access status information
and issue commands to the DHB by reading / writing dedicated memory mapped registers.
2. The external uP
The DSP and the external uP may exchange words via dedicated registers (Mailboxes). Writing a word to a
Mailbox causes an interrupt to the transfer target. To report specific events from the DSP to the uP, a
proprietary coding of the words exchanged via the Mailboxes has been defined.
3. The DMA Controller
The DSP writes to a dedicated register the number of words to be transferred from the external Tx Buffer
to the DMA Tx Mailbox, and to another dedicated register the number of words to be transferred from the
DMA Rx Mailbox to the external Rx Buffer. As soon as this write operation ends, a DREQT / DREQR signal is
activated. The DMA Controller signals that the requested transfer has completed by issuing an interrupt
to the DSP.
4. The Timers
The DSP may load and start two independent, programmable down counters. At the end of the count, a
dedicated interrupt for each one of the counters is issued towards the DSP. These interrupts are used for the
counting of time intervals.
2.2.3 Functionality and procedures
In the following paragraphs, the functionality of the DSP S/W is presented in detail. As described in the
Ethernet protocol specifications, transmission and reception are two separate functions: a station
cannot transmit and receive at the same time. For this reason, both hardware and software are designed
so that transmission and reception are independent functions, and the Reception functionality is
described separately from the Transmission functionality.
2.2.3.1 Reception
The code implementing the Receive Direction will be presented through six different scenarios, which
demonstrate all the possible states and situations that may occur.
Normal Frame Reception
The reception of a new frame begins when the Carrier changes from 0 to 1 while the state machine is in a
Ready_To_Receive state. On this Carrier transition, the DSP commands the H/W to start receiving
(Command_StartRCV), initializes the global variables of the Receive Direction and changes the state to
Receiving.
When the first Rx FIFO full interrupt arrives, the DSP reads the words that correspond to the Destination
Address field of the incoming frame and checks whether that address matches the current station's
address. If it does, the DSP reads the word that corresponds to the Length (Data) field of the Rx frame,
saves the length in a variable and checks whether the length is acceptable (the data field must not
exceed 1500 bytes). If both checks pass, the DSP transfers the words (except the 7 preamble bytes and the
SFD byte) from the Rx FIFO to the DMA Rx Mailbox, stores the number of words remaining until the end of
the current Rx frame in the RxWordsRemaining variable and sends a DREQR interrupt to the DMA Controller,
so that the words are transferred from the DMA Rx Mailbox to the external Rx Buffer.
When the requested transfer is completed, the DMA Controller informs the DSP
(Transfer_To_Buffer_Completed interrupt), but the DSP ignores this interrupt.
Some time later, the H/W sends another Rx FIFO full interrupt. This interrupt arrives later than the
Transfer_To_Buffer_Completed interrupt for the first part of the Rx frame, because the DMA transfer is
much faster than the rate at which the Rx FIFOs are filled. The DSP reads the 16 words of the full Rx
FIFO, transfers them to the DMA Rx Mailbox, updates the RxWordsRemaining variable and sends a DREQR
interrupt to the DMA Controller. When the requested transfer is completed, the DMA Controller again
informs the DSP (Transfer_To_Buffer_Completed interrupt), and the DSP again ignores it.
This cycle of Rx FIFO full, DREQR and Transfer_To_Buffer_Completed interrupts continues until the Carrier
drops (changes from 1 to 0), which is interpreted as the end of the Rx frame. The DSP then commands the
H/W to stop receiving (Command_StopRCV), transfers the remaining words of the current Rx frame
(according to the RxWordsRemaining variable) to the DMA Rx Mailbox, starts the 6.0 usec timer and changes
the state to Pending_6usec_After_Reception.
When the last Transfer_To_Buffer_Completed interrupt arrives, the whole Rx frame is stored in the external
Rx Buffer. The DSP then reads the RxCRC_Flag to check whether the CRC of the Rx frame is correct or wrong.
Finally, it resets the H/W Rx Machine (Command_ResetRx_Machine) and informs the uP that an Rx frame with a
correct (or wrong) CRC, whose total length in bytes is also reported, is stored in the external buffer.
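The checks performed on the first full Rx FIFO can be sketched as below. Assumptions: byte-level access to the 32-byte page (the DSP actually reads 16-bit words), an exact unicast address comparison (broadcast and multicast addresses are not discussed in the text), and a hypothetical RxWordsRemaining formula that counts only header and data bytes, since the handling of pad and FCS bytes is not specified. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PREAMBLE_SFD_LEN 8u     /* 7 preamble bytes + 1 SFD byte */
#define MAC_ADDR_LEN     6u
#define MAX_DATA_LEN  1500u
#define FIFO_BYTES      32u

typedef enum { RX_ACCEPT, RX_DROP_ADDRESS, RX_DROP_LENGTH } rx_verdict;

/* Checks done on the first full Rx FIFO page (preamble and SFD included). */
static rx_verdict check_first_page(const uint8_t page[FIFO_BYTES],
                                   const uint8_t station[MAC_ADDR_LEN],
                                   unsigned *length_field) {
    const uint8_t *da = page + PREAMBLE_SFD_LEN;        /* Destination Address */
    const uint8_t *len = da + 2 * MAC_ADDR_LEN;          /* after DA and SA */
    if (memcmp(da, station, MAC_ADDR_LEN) != 0)
        return RX_DROP_ADDRESS;                          /* -> Command_StopRCV */
    *length_field = ((unsigned)len[0] << 8) | len[1];    /* most significant byte first */
    if (*length_field > MAX_DATA_LEN)
        return RX_DROP_LENGTH;                           /* -> Command_StopRCV */
    return RX_ACCEPT;
}

/* Hypothetical RxWordsRemaining after the first page: header + data bytes only, 16-bit words. */
static unsigned words_remaining_after_first_page(unsigned length_field) {
    unsigned frame_bytes = 2 * MAC_ADDR_LEN + 2 + length_field;  /* DA + SA + Length + data */
    unsigned forwarded = FIFO_BYTES - PREAMBLE_SFD_LEN;          /* 24 bytes = 12 words */
    return frame_bytes > forwarded ? (frame_bytes - forwarded + 1) / 2 : 0;
}
```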

Frame exactly fills the last Rx FIFO
In this case, the last 16 words could be read twice: once by the Rx FIFO full interrupt handler and once
by the Carrier transition from 1 to 0 interrupt handler. To avoid this, the Last_RCVPage_Read variable
prevents the Rx FIFO full interrupt handler from reading the last 16 words of the Rx frame when they have
already been read by the Carrier transition from 1 to 0 interrupt handler, and vice versa. If the Rx FIFO
full interrupt arrives first, the Carrier transition from 1 to 0 interrupt is completely ignored. All the
other procedures are executed exactly as described in the paragraph above.
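The role of Last_RCVPage_Read can be sketched with two handler bodies that share one flag, so that whichever interrupt comes first takes the last page and the other one returns early. Apart from the flag itself, the names are hypothetical: `words_expected` plays the role of RxWordsRemaining and `forward_words` stands in for the copy to the DMA Rx Mailbox plus DREQR.

```c
#include <stdbool.h>

#define FIFO_WORDS 16u   /* one 32-byte Rx FIFO page = 16 words */

static bool last_rcv_page_read;   /* Last_RCVPage_Read */
static unsigned words_expected;   /* plays the role of RxWordsRemaining */
static unsigned pages_forwarded;  /* pages moved to the DMA Rx Mailbox */

/* Stands in for the Mailbox copy followed by a DREQR request. */
static void forward_words(unsigned n) {
    words_expected -= n;
    pages_forwarded++;
}

/* Rx FIFO full ISR body (Receiving state). */
static void on_rx_fifo_full(void) {
    if (last_rcv_page_read)
        return;                        /* last page already taken by the carrier handler */
    forward_words(words_expected < FIFO_WORDS ? words_expected : FIFO_WORDS);
    if (words_expected == 0)
        last_rcv_page_read = true;     /* this full FIFO was the last page */
}

/* Carrier 1->0 ISR body; assumes at most one page is still outstanding. */
static void on_carrier_drop(void) {
    if (last_rcv_page_read)
        return;                        /* FIFO full came first: ignore this interrupt */
    if (words_expected > 0)
        forward_words(words_expected);
    last_rcv_page_read = true;
}
```

Whatever the arrival order of the two interrupts, the last page is forwarded exactly once.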
Frame Length Field > 1500
As mentioned before, the data field of an Ethernet frame cannot be longer than 1500 bytes. If the number
stored in the Length (Data) field of an Rx frame is greater than 1500, an error has occurred. The DSP
commands the H/W to stop receiving (Command_StopRCV) as soon as this error is detected. When the Carrier
drops, the Carrier transition from 1 to 0 interrupt handler finds that no bytes have been sent to the
external Rx buffer, and the whole Rx frame is discarded.
No Address Matching
If the Destination Address field of the incoming frame does not match the station's address, the frame
must be discarded. The DSP commands the H/W to stop receiving (Command_StopRCV). When the Carrier drops,
the Carrier transition from 1 to 0 interrupt handler finds that no bytes have been sent to the external
Rx buffer, and the whole Rx frame is discarded.
Actual Rx Frame Length < Frame Length Field
This is an abnormal situation, so the uP must be told to ignore the Rx frame. As the reception of the
frame continues, the Carrier drops although more bytes are expected according to the Length (Data) field
of the frame. The dedicated H/W ignores this situation, so the RxCRC_Flag remains at its reset value,
which indicates Wrong. The S/W, on the other hand, detects this abnormal situation only if more than 16
words are still expected when the Carrier drops. In this case, the DSP commands the H/W to stop receiving
(Command_StopRCV), resets the H/W Rx Machine (Command_ResetRx_Machine), informs the uP (EORF interrupt)
that all the bytes transferred to the external Rx buffer during the reception of the current Rx frame
must be discarded, starts the 6.0 usec timer and changes the state to Pending_6usec_After_Reception. If
the RxWordsRemaining value is less than 16, the DSP transfers all the words it expects to the DMA Rx
Mailbox; when the last Transfer_To_Buffer_Completed interrupt arrives, the RxCRC_Flag is in the Wrong
state, so an EORF interrupt informs the uP that the current Rx frame must be discarded.
Actual Rx Frame Length > Frame Length Field
In this case, Rx bytes keep arriving even though, according to the Length (Data) field, the Rx frame
should have ended. This does not necessarily mean that the frame is wrong; these dribbling bytes simply
have to be discarded. The Rx FIFO full and the Carrier transition from 1 to 0 interrupt handlers are
therefore designed so that any dribbling bytes are discarded and not sent to the Rx Mailbox. Dribbling
bits (0 to 7 bits) are discarded by the Receiver H/W block.
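The decision taken when the Carrier drops in the last two scenarios can be condensed into one function. This is a sketch under assumptions: the names are hypothetical, word counts are 16-bit words, and the case of exactly 16 remaining words (not covered explicitly in the text) is treated here like the "less than 16" case.

```c
#include <stdbool.h>

#define FIFO_WORDS 16u   /* one 32-byte Rx FIFO = 16 words */

typedef struct {
    bool discard_frame;        /* abort now and tell the uP to discard (EORF) */
    unsigned words_to_forward; /* words sent to the DMA Rx Mailbox now */
    unsigned dribble_words;    /* words received beyond the Length field, dropped */
} carrier_drop_action;

/* words_received: words present in the partially filled Rx FIFO when the Carrier drops.
   words_remaining: RxWordsRemaining, i.e. words still expected from the Length field. */
static carrier_drop_action carrier_drop_decision(unsigned words_received,
                                                 unsigned words_remaining) {
    carrier_drop_action a = { false, 0u, 0u };
    if (words_remaining > FIFO_WORDS) {
        a.discard_frame = true;   /* frame much shorter than its Length field */
        return a;
    }
    /* Forward exactly what the Length field announces. If fewer words arrived, the
       RxCRC_Flag stays Wrong and the uP discards the frame after the last transfer. */
    a.words_to_forward = words_remaining;
    if (words_received > words_remaining)
        a.dribble_words = words_received - words_remaining;
    return a;
}
```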
2.2.3.2 Transmission
The system functionality in the Transmit Direction is different depending on whether a collision occurs or
not.
Normal Frame Transmission
The Transmit procedure starts when a Transmit_Request interrupt arrives from the uP. We assume that, when
this happens, a whole frame (except the preamble and the SFD bytes) is stored in the external Tx Buffer.
If the interrupt arrives while receiving, the request is deferred, so that the DMA is not requested for
both directions at the same time. In any other state, the DSP initializes the global variables that
correspond to transmission and sends a DREQT message to have the first bytes of the Tx frame transferred
from the external Tx Buffer to the DMA Tx Mailbox. Shortly afterwards, a Transfer_From_Buffer_Completed
interrupt arrives, acknowledging that the requested transfer has completed. The DSP then sends the
preamble and the SFD bytes to the Tx FIFO, reads the Length field of the Tx frame and stores it in a
variable (TRM_Length), transfers the words from the DMA Tx Mailbox to the Tx FIFO and requests the next
transfer from the external Tx Buffer.
When the next Transfer_From_Buffer_Completed interrupt arrives, the DSP moves the data from the DMA Tx
Mailbox to the other Tx FIFO, stores the number of remaining data + pad words in the
TRM_DataPadWords_Remaining variable and sends a DREQT interrupt to the DMA Controller. If the state
machine is in a state that allows transmission, transmission starts by sending a StartTRM command to the
H/W. In any case, the next Transfer_From_Buffer_Completed interrupt is ignored.
When a Tx FIFO empty interrupt arrives, the DSP moves the words from the DMA Tx Mailbox to the emptied Tx
FIFO, updates the TRM_DataPadWords_Remaining variable and requests the next transfer of data from the
external Tx Buffer to the Tx Mailbox by sending a DREQT interrupt. The next Transfer_From_Buffer_Completed
interrupt is again ignored.
This cycle of Tx FIFO empty, DREQT and Transfer_From_Buffer_Completed interrupts continues until the
Carrier changes from 1 to 0, which means that the transmission of the frame has finished. At this moment,
the DSP checks the EOTF_Flag. If the EOTF_Flag shows that the transmission of the frame ended normally,
the DSP sends an EOTF interrupt to the uP to inform it that the current Tx frame has been transmitted
normally. The DSP also resets the H/W Tx Machine (Command_ResetTx_Machine), starts the 9.6 usec timer and
changes the state to Pending_9_6usec_After_Transmission. If the EOTF_Flag shows that the transmission did
not end properly, a Collision interrupt is expected to arrive soon. What the DSP does when a collision
occurs is described below.
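The preamble/SFD bytes and the initial size of the data + pad part of a frame can be sketched as follows. The sketch assumes 16-bit DSP words, byte values chosen for LSB-first serialization, and that short data fields are padded up to the IEEE 802.3 minimum of 46 bytes (64-byte minimum frame minus 14 header bytes and 4 FCS bytes); the function names are hypothetical and the paper does not say exactly how TRM_DataPadWords_Remaining is initialized.

```c
#include <stdint.h>

#define PREAMBLE_BYTES 7u
#define MIN_DATA_PAD  46u   /* 64-byte minimum frame - 14 header bytes - 4 FCS bytes */

/* Preamble (10101010 x 7) and SFD (10101011) as byte values for LSB-first serialization. */
static void fill_preamble_sfd(uint8_t out[PREAMBLE_BYTES + 1]) {
    for (unsigned i = 0; i < PREAMBLE_BYTES; i++)
        out[i] = 0x55;
    out[PREAMBLE_BYTES] = 0xD5;
}

/* Data + pad words for a given Length field, assuming 16-bit words. */
static unsigned data_pad_words(unsigned length_field) {
    unsigned bytes = length_field < MIN_DATA_PAD ? MIN_DATA_PAD : length_field;
    return (bytes + 1u) / 2u;   /* an odd final byte still occupies one word */
}
```

For example, a 10-byte data field still needs 23 words (46 bytes), while a 1500-byte data field needs 750 words.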
Collision
When a collision occurs, the H/W sends a Collision interrupt to the DSP. The Collision interrupt handler
checks whether 64 bytes have already been transmitted. If so, this is a Late Collision, which is an
abnormal situation, and the uP is informed via a LateCollision interrupt. If fewer than 64 bytes have been
transmitted, this is a normal collision and the DSP informs the uP with a Collision interrupt. In both
cases, the DSP checks the Collision_Counter variable, which holds the number of collisions that have
occurred for the current Tx frame. If its value is smaller than 16, the DSP blocks the start of
transmission of the current frame via the Backoff_Clear variable, calculates the backoff time according
to the truncated binary exponential backoff algorithm, starts the Backoff_Timer and changes the state to
Ready_To_Receive_Not_Transmit. If more than 15 collisions occur for the current Tx frame, this is the
abnormal situation of excessive collisions, and the DSP informs the uP via an ExcessiveCollisions
interrupt.
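The collision handling can be sketched with the standard IEEE 802.3 truncated binary exponential backoff: after the n-th collision of a frame the station waits r slot times (512 bit times), with r drawn uniformly from 0 to 2^min(n,10) - 1, and gives up after 16 attempts. The function names and the way random bits are supplied are hypothetical; the 64-byte late-collision threshold is the one used in the text.

```c
#include <stdint.h>

#define ATTEMPT_LIMIT        16u   /* ExcessiveCollisions after 16 collisions */
#define BACKOFF_LIMIT        10u   /* exponent is truncated at 10 */
#define SLOT_TIME_BITS      512u   /* 51.2 usec at 10 Mbit/s */
#define LATE_COLLISION_BYTES 64u   /* threshold used in the text */

typedef enum { COLLISION_NORMAL, COLLISION_LATE } collision_kind;

/* Selects which interrupt goes to the uP: Collision or LateCollision. */
static collision_kind classify_collision(unsigned bytes_sent) {
    return bytes_sent >= LATE_COLLISION_BYTES ? COLLISION_LATE : COLLISION_NORMAL;
}

/* Backoff delay in bit times (0.1 usec each at 10 Mbit/s) after `collisions` collisions
   of the current frame, or -1 for ExcessiveCollisions. `random_bits` should be uniform. */
static long backoff_bit_times(unsigned collisions, uint32_t random_bits) {
    if (collisions >= ATTEMPT_LIMIT)
        return -1;
    unsigned k = collisions < BACKOFF_LIMIT ? collisions : BACKOFF_LIMIT;
    uint32_t r = random_bits & ((UINT32_C(1) << k) - 1u);   /* 0 .. 2^k - 1 */
    return (long)r * (long)SLOT_TIME_BITS;
}
```

On a host, `backoff_bit_times(collision_counter, (uint32_t)rand())` would do, since `rand()` provides at least 15 random bits and at most 10 are needed; on the DSP the result would be scaled to Backoff timer ticks.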

2.3 Partitioning criteria

Any function of the Ethernet IEEE 802.3 protocol can be implemented either in hardware, using an FPGA
chip, or in software, by programming the DSP core. Hardware implementation is fast, but it is not
reconfigurable and consumes area on the die. Software implementation is flexible and easily
reconfigurable, but consumes on-chip RAM and ROM capacity. Generally speaking, simple functions that are
repeated many times are better suited to hardware, to avoid depleting the DSP processing capacity, while
complicated functions that are not repeated often are suitable for DSP software. Consequently, the
implementation can be optimized by careful partitioning between hardware and software. The following
figure gives a schematic description of how this optimization can be achieved: the hardware and software
resources required for implementation are plotted in the same graph as a function of the partitioning
percentage. Their sum gives an overall cost function, which must be minimized to obtain the optimum
implementation.

[Figure 5 plot: Hardware Resources, Software Resources and Total Resources versus partitioning, from 100% Hardware to 100% Software, with the optimum design marked at the minimum of Total Resources]
Figure 5: Schematic description of Software/Hardware partitioning.
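In symbols, with p the fraction of the MAC functions placed in hardware (notation introduced here for illustration, not taken from the paper), the schematic of Figure 5 amounts to minimizing the total resource curve:

```latex
R_{\mathrm{tot}}(p) = R_{\mathrm{HW}}(p) + R_{\mathrm{SW}}(p),
\qquad
p^{*} = \arg\min_{0 \le p \le 1} R_{\mathrm{tot}}(p)
```

Here R_HW grows and R_SW shrinks as p increases, and the optimum design p* lies where their sum is smallest.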


To find the optimum partitioning of functions between the hardware module and the DSP software, we first
constructed a minimum hardware / maximum software MAC implementation with all possible functions in DSP
software. We then calculated the DSP cycles and code lines required for this implementation. Based on
these results, we determined the optimum partitioning by implementing in hardware the functions that
cannot be efficiently implemented in software.
In the following paragraphs, the functions implemented in hardware and in software are presented. For
each function, the number of execution cycles required and the corresponding number of MIPS, if
implemented in DSP software, are given. These functions (mainly data transfer and data processing
functions) are executed a fixed number of times during the transmission or reception of frames, a number
dictated by the data transfer rate on the medium. A rough estimate of the overall DSP processing power
required during a given time interval can therefore be derived by multiplying the DSP cycles required for
a series of specific functions by the number of times these functions must be executed in that interval.
The time interval considered in all cases is 25.6 usec, the time required to transfer 32 bytes (256
bits) of data over the 10 Mbit/s medium.
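The conversion from cycles per 32-byte interval to the MIPS figures used below can be written out as a small C helper. It is a sketch: the function names are hypothetical, and "MIPS" here means millions of DSP cycles per second, as in the text.

```c
#define BIT_RATE_MBIT_S 10.0   /* 10 Mbit/s Ethernet: one bit every 0.1 usec */

/* Time needed to carry `bytes` over the medium, in usec (32 bytes -> 25.6 usec). */
static double medium_time_usec(unsigned bytes) {
    return bytes * 8.0 / BIT_RATE_MBIT_S;
}

/* DSP load when `cycles` are spent for every `bytes` transferred.
   Cycles per usec equals millions of cycles per second. */
static double dsp_load_mips(unsigned cycles, unsigned bytes) {
    return cycles / medium_time_usec(bytes);
}
```

For instance, `dsp_load_mips(4000, 32)` evaluates to about 156.25 and `dsp_load_mips(270, 32)` to about 10.55, which the text reports as 156 and 10.5 MIPS.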
2.3.1 H/W functions
In the receive direction, it has been found that the CRC calculation function is very difficult to
accomplish in S/W. Even when the calculation is fully optimized, it takes up a large part of the DSP
processing power. More precisely, approximately 4000 DSP cycles (i.e. 156 MIPS) are needed to transfer
32 bytes of a frame from the Rx FIFO to the mailbox, including CRC calculation, while no more than 270
cycles (i.e. 10.5 MIPS) are required for the same function if the CRC is calculated in H/W. For this
reason, the CRC calculation function is implemented in H/W.
For the same reasons, the CRC calculation function in the transmit direction is also implemented in H/W.
It has been found that approximately 4200 DSP cycles (i.e. 164 MIPS) are needed to transfer 32 bytes of a
frame from the mailbox to the Tx FIFO, including CRC calculation, while no more than 200 cycles (i.e. 7.8
MIPS) are required for the same function if the CRC is calculated in H/W. In addition, the H/W module
transmits the JAM sequence when a collision occurs, because the protocol requires the JAM pattern to be
transmitted as soon as the collision is detected.
2.3.2 S/W functions
In the receive direction, the DSP S/W implements the rest of the MAC functions of the Ethernet protocol.
First of all, it has been found that only about 35 cycles (i.e. 1.3 MIPS) are required for the Address
Matching module, which identifies whether the incoming frame is addressed to the specific station. In
addition, the DSP S/W detects and discards short and runt packets.
In the transmit direction, the DSP S/W controls the flow of data from the uP to the Tx FIFO. It requests
data bytes from the uP, writes them to the Tx FIFO and starts the H/W Transmitter.

3 Implementation parameters

The implementation is realised by designing two functionally independent blocks, namely the transmitter
and the receiver, and combining them in an FPGA chip. In the LCA (Logic Cell Array) FPGA architecture,
the area consumed by such a design is measured in Configurable Logic Blocks (CLBs). A CLB contains some
memory cells, some flip-flops and some combinational logic, depending on the FPGA vendor. The architecture
described in the previous paragraphs has been implemented in a XILINX XC4044XLC-3 FPGA, consuming
approximately 1300 CLBs.

3.1 H/W parameters

The number of CLBs needed to implement the basic internal elements of the transmitter and the receiver
blocks is shown in the following table:
Element Name           Receiver (packed CLBs)   Transmitter (packed CLBs)
Address Decoder        92                       92
Control                22                       16
FIFO                   308 (2 x 154)            308 (2 x 154)
CRC                    14                       14
CRC Comparator         19                       -
Deserializer           195                      -
Serializer             -                        127
Bus Multiplexer        -                        16
JAM                    -                        22
SUM                    650                      595
Interrupt Controller   82 (common to both blocks)
TOTAL                  1327 (including Interrupt Controller)

Table 2: CLB consumption of internal elements


The Interrupt Controller is common to both the receiver and the transmitter, so its CLBs are added once to
the final result.
The implementation leads to significantly better results when resources are shared between the two
functional blocks. With a 15% area overhead, some of the most area-consuming elements were redesigned so
that they can operate in both directions. With this optimization, only two FIFOs are used, together with
a small combinational control circuit, which dramatically reduces the area consumption. The CRC and the
Address Decoder are likewise common to both directions; they were redesigned by adding some CLBs and
interconnection nets to the chip.
The post-optimization area consumption is shown in the following table:
Element Name           Receiver (packed CLBs)   Transmitter (packed CLBs)
Address Decoder        106 (shared)
Control                22                       16
FIFO                   354 (shared)
CRC                    15 (shared)
CRC Comparator         19                       -
Deserializer           195                      -
Serializer             -                        127
Bus Multiplexer        -                        16
JAM                    -                        22
Interrupt Controller   82 (shared)
TOTAL                  974

Table 3: Post-optimization CLB consumption


Sharing resources throughout the design reduces the area by 26.6% (from 1327 to 974 CLBs). The additional
combinational logic does affect the timing parameters of the system, but not severely.
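Both percentages can be checked against Tables 2 and 3. The reduction follows from the two totals, and the roughly 15% overhead matches the growth of the shared FIFO and Address Decoder relative to a single copy; reading the shared entries this way is an interpretation of the tables rather than a statement from the text:

```latex
\frac{1327 - 974}{1327} = \frac{353}{1327} \approx 0.266 = 26.6\,\%,
\qquad
\frac{354}{308} \approx 1.149,
\qquad
\frac{106}{92} \approx 1.152
```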

3.2 DSP S/W parameters

As mentioned before, a considerable portion of the MAC functionality is implemented by the DSP S/W. The
program code written to support the Ethernet functionality in the architecture described above occupies
1.6 KB of the DSP internal memory. In addition, it has been found that approximately 10 MIPS are required
in the transmit direction and approximately 12 MIPS in the receive direction. The difference in DSP load
between the two directions is due to the extra processing performed by the DSP during reception (address
matching, frame-length error recognition).

3.3 Parameter comparison with a full H/W implementation

The architecture described in the previous paragraphs provides full flexibility in the partitioning of
the Ethernet functionality. A complete software implementation on the DSP would require more memory for
the program code, and therefore more die area, given that memory occupies the largest share of the chip.
The hardware functions can be realized in a low-cost, common FPGA part without affecting the performance
of the system. A complete hardware implementation of the Ethernet functionality, on the other hand,
requires a larger and faster FPGA part, which increases the cost.