
CAMs for COMMs

Content Addressable Memories for Data Communication Applications

Introduction

The primary purpose of a data communication switch is to route packets to their appropriate destinations. This involves searching through routing tables. A routing table is a table of addresses and their associated information: the addressing data required to route packets to their destination ports. The entry in such a table that corresponds to a given address provides the switch with the associated information it needs to decide how to route the packet.

Ideally, the routing-table search should be completed within the time it takes to read the packet off the link or, if cut-through switching is used (where the head of the packet is routed out before the tail arrives), within the time it takes to read the address fields of the header off the incoming link. As bandwidth and switching speeds increase, the time available for the lookup procedure shrinks to the point where a software or Random Access Memory (RAM)-based approach is no longer fast enough.
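To make the time budget concrete, the sketch below computes how long a lookup may take if it must finish while a packet (or, for cut-through switching, only its address fields) is read off the link. The 64-byte packet, the 20 bytes of header, and the 10 Gb/s link rate are illustrative assumptions, and framing overhead such as preambles and inter-frame gaps is ignored.

```python
def lookup_budget_ns(num_bytes: int, link_rate_gbps: float) -> float:
    """Time, in nanoseconds, to receive num_bytes at link_rate_gbps.

    A link running at R Gb/s delivers R bits per nanosecond, so the
    budget is simply bits / R. Framing overhead is ignored (assumption).
    """
    bits = num_bytes * 8
    return bits / link_rate_gbps


# Illustrative numbers only: a minimum-size 64-byte packet on a 10 Gb/s link.
print(lookup_budget_ns(64, 10.0))  # 51.2 (whole packet)
# Cut-through switching: only the address fields matter (assume 20 bytes).
print(lookup_budget_ns(20, 10.0))  # 16.0
```

Doubling the link rate halves both budgets, which is the trend that pushes lookups out of software and RAM-based searches.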

The inherent parallelism of Content Addressable Memory (CAM) makes it an obvious candidate, since it offers low latency over a wide variety of address structures. The two most common search-intensive tasks that use CAMs are packet forwarding and packet classification in Internet routers. Other applications include processor cache memory, Translation Look-aside Buffers (TLB), lossless data compression, database accelerators, and neural networks.

What is a CAM?

Most memory devices store and retrieve data by addressing specific memory locations. Finding a specific data pattern in a standard RAM often becomes the bottleneck for systems that rely on fast memory access, because it requires several access cycles. The time required to find specific data could be reduced considerably if the stored data could be identified for access by its content rather than by its address. Memory that is accessed in this way is called CAM.


Figure 1: RAM Vs. CAM

CAM provides significant performance advantages over other memory search algorithms, such as binary and tree-based searches or look-aside tag buffers, by comparing the desired content against all pre-stored entries simultaneously, often reducing search time by an order of magnitude. Thus, a CAM is a hardware associative search engine, much faster than algorithmic approaches for search-intensive applications.
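The difference in interface between RAM and CAM (Figure 1) can be sketched in a few lines. The class names are hypothetical, and the software loop only models what the CAM returns; real CAM hardware compares every entry in parallel within one cycle, which a sequential program cannot reproduce.

```python
from typing import List


class SimpleRAM:
    """RAM model: present an address, get the data stored there."""

    def __init__(self, words: List[int]):
        self.words = list(words)

    def read(self, address: int) -> int:
        return self.words[address]


class SimpleCAM:
    """CAM model: present data, get every address whose content matches.

    Hardware compares all entries simultaneously; this loop shows the
    interface and the result, not the speed.
    """

    def __init__(self, words: List[int]):
        self.words = list(words)

    def search(self, key: int) -> List[int]:
        return [addr for addr, word in enumerate(self.words) if word == key]


contents = [0x1A, 0x2B, 0x3C, 0x2B]
ram = SimpleRAM(contents)
cam = SimpleCAM(contents)
print(ram.read(2))       # 60 (0x3C)
print(cam.search(0x2B))  # [1, 3]
```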

CAMs are composed of conventional semiconductor memory, usually Static RAM (SRAM), with added comparison circuitry that enables a search operation to complete in a single clock cycle.

CAM is ideally suited for several applications, including Ethernet address lookup, data compression, pattern recognition, cache tags, high-bandwidth address filtering, and fast packet-by-packet lookup of routing, user privilege, security, or encryption information in high-performance data switches, firewalls, bridges, and routers.

CAM Basics

Basic Building Block — The RAM

Since CAMs are derived from RAM technology, the explanation of CAM technology below starts from the RAM.

Note: Although possible, implementing a CAM based on Dynamic RAM (DRAM) is not popular, mainly because the refresh cycles that DRAMs require reduce the device's throughput. Therefore, the following explanation refers to SRAM.

RAM is an integrated circuit that stores data temporarily in a matrix fashion. Data is stored in RAM at a particular location, called an address: the combination of its column and row addresses (their intersection).


Figure 2: SRAM Architecture

In RAM, the user presents the address on the address lines (bus) and the memory outputs the data stored in that address.

The number of address lines dictates the depth of the memory, but the width of the memory can theoretically be extended as far as desired.
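As a small worked example of the depth/width relationship: each address line doubles the number of addressable words, while the width simply multiplies the bits per word. The 10 address lines and 32-bit words below are arbitrary illustrative values.

```python
def memory_depth(address_lines: int) -> int:
    """Number of addressable words: each extra address line doubles it."""
    return 2 ** address_lines


def memory_capacity_bits(address_lines: int, word_width: int) -> int:
    """Total bits = depth (set by the address lines) x width (bits per word)."""
    return memory_depth(address_lines) * word_width


# Illustrative sizes (assumptions): 10 address lines, 32-bit words.
print(memory_depth(10))              # 1024
print(memory_capacity_bits(10, 32))  # 32768
```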

A single-bit SRAM cell is presented in Figure 3.


Figure 3: Single-Bit SRAM Cell

Transistors T1 and T2 form a flip-flop circuit, which is the basic storage device. Transistors T3 and T4 form constant-current sources that serve as the loads of the flip-flop's transistors.

To access the cell for read or write operations, two circuits are added, as shown in Figure 4.


Figure 4: Single-Bit SRAM Cell with Addressing

Once a valid address is applied, transistors T5 and T6 transfer the cell's data to the sense amplifiers (for a read) or from them (for a write).

The sense amplifiers translate the signal from dual-ended (differential) to single-ended, or vice versa. These amplifiers are common to all columns of the memory array.

The circuit presented in Figure 4 above forms the basic single-cell SRAM circuit.

Memory Organization

Combining several single-bit circuits (as presented in Figure 4) forms a memory word, and combining several memory words forms the memory array. The organization of the memory component is, in fact, the combination of the circuits in Figure 2 and Figure 4, as presented in Figure 5.


Figure 5: Memory Organization

Using SRAM Cell to Build a CAM Cell

To use the above described SRAM cell as CAM cell, three transistors need to be added, as presented in Figure 6 below.

These additional transistors compare the output of the SRAM cell (the stored data bit) with a comparand (search key) provided via the data bus through the Write sense amplifiers, thus turning the bit lines into search lines.

All the cells' output transistors (such as T9) are wired-NOR together, so that when all bits of the search key equal the content of the memory word, the Match signal line is pulled down to logic "0", generating a True (inverted logic) signal at the output of the CAM.


Figure 6: Single-Bit CAM Cell
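The logical behaviour of one CAM word can be sketched as follows: each cell compares its stored bit with the corresponding search-line bit (an XOR, which is 1 on a mismatch), and the wired-NOR of all cell outputs reports a match only when no cell reports a mismatch. This models the logic only; the electrical polarity of the match line (the inverted logic described for Figure 6) and any precharge timing are not modelled, and the function names are hypothetical.

```python
from typing import Sequence


def cell_mismatch(stored_bit: int, search_bit: int) -> int:
    """Per-cell comparator: 1 when the stored bit differs from the search bit."""
    return stored_bit ^ search_bit


def word_match(stored: Sequence[int], key: Sequence[int]) -> bool:
    """Wired-NOR of the per-cell mismatch outputs of one memory word.

    The word matches only if no cell reports a mismatch.
    """
    if len(stored) != len(key):
        raise ValueError("search key width must equal word width")
    return not any(cell_mismatch(s, k) for s, k in zip(stored, key))


print(word_match([1, 0, 1, 1], [1, 0, 1, 1]))  # True
print(word_match([1, 0, 1, 1], [1, 1, 1, 1]))  # False
```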

With CAM, the user presents the data and receives a match signal (True/False, corresponding to match/mismatch), sometimes with additional information, for example the address where the match was found or some associated data, as presented in Figure 7 below.

The CAM searches through the memory within a single clock cycle and returns the results.

The CAM can be pre-loaded with its database at device startup and re-written during device operation.

The CAM can accelerate any application requiring fast searches of databases, lists, or patterns, such as in image or voice recognition, or computer and communication designs.

For this reason, CAM is used in applications where search time is critical and must be very fast. For example, the search key could be the Internet Protocol (IP) address of a network user, and the associated information could be a user’s access privileges and location on the network.

If the search key presented to the CAM is stored in the CAM’s table, the CAM indicates a match and returns the associated information, which consists of the user’s privileges.
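The CAM-plus-associated-RAM flow of Figure 7 can be sketched as a two-step lookup: the CAM search produces a match address, and that address reads the associated RAM. The IP addresses and privilege strings are hypothetical, and picking the lowest matching address stands in for a priority encoder, which is an assumption of this sketch.

```python
from typing import List, Optional


class CamWithAssociatedRam:
    """CAM holding search keys, plus a RAM holding data for each CAM address."""

    def __init__(self, keys: List[str], associated: List[str]):
        if len(keys) != len(associated):
            raise ValueError("one associated-data word per CAM entry")
        self.cam = list(keys)
        self.ram = list(associated)

    def lookup(self, key: str) -> Optional[str]:
        # Step 1: CAM search (all entries compared in parallel in hardware).
        matches = [addr for addr, k in enumerate(self.cam) if k == key]
        if not matches:
            return None  # mismatch: no associated data
        # Step 2: a priority encoder picks one address (lowest, by assumption).
        address = min(matches)
        # Step 3: that address reads the associated RAM.
        return self.ram[address]


table = CamWithAssociatedRam(
    keys=["192.0.2.10", "192.0.2.20"],
    associated=["admin, port 3", "guest, port 7"],
)
print(table.lookup("192.0.2.20"))    # guest, port 7
print(table.lookup("198.51.100.1"))  # None
```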

A CAM can thus operate as a data-parallel or Single Instruction / Multiple Data (SIMD) processor.

In a typical application, the CAM generates only one (or a small number of) matches, so nearly every entry mismatches. The match lines of non-matching entries must change their logic level during the search, and these transitions cause relatively high power consumption on the match lines.

Furthermore, because all the memory cells are connected to the search lines and active at all times, the search lines drive high capacitive loads, which are a large source of power consumption.


Figure 7: (a) CAM with Associated Memory and its (b) Algorithm of Operation

For these reasons, CAMs are considered a major source of power consumption and, therefore, of heat dissipation.

Most of the recent studies and innovations in the field of CAMs concentrate on reducing this effect to allow integration of CAMs into Packet Processor (PP) Application Specific Integrated Circuits (ASIC).

Types of CAMs

There are two basic types of CAM:

1. Binary CAM — supporting storage and search of dual-type binary bits: Zero and One (0,1).

2. Ternary CAM — supporting storage and search of triple-type bits: Zero, One, and Don't Care (0,1,X), thus adding flexibility to the search patterns.

Binary CAM

The above described circuit is the most basic form of CAM. It is also known as Binary CAM (BCAM) – a CAM which compares dual-type (0, 1) logic bits of the search key to the content of the memory.

Each bit is compared as either 0 or 1 (True or False), and the Match signal is generated only when every bit matches exactly.

Ternary CAM

In some cases (for example, when searching for patterns), only a partial match is required: only a few of the bits need to match the search key precisely, and the rest are considered Don't Care. For example, the stored word 10XX0 matches any of the four search words 10000, 10010, 10100, or 10110.

In these cases, the bits that need not be compared should be masked (treated as matching in any case), so that they generate a match signal regardless of the actual data stored.

The added search flexibility comes at an additional cost over BCAM, since the internal memory cell must encode three possible states instead of two.

This additional state is typically implemented by adding a mask bit to every memory cell.

This is also known as Ternary CAM (TCAM) – a CAM which compares triple-type (0,1 and X) logic bits of the search key to the content of the memory.

Figure 8 below presents the logic operation of the TCAM.


Figure 8: Ternary CAM Search and Mask Logic
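A minimal sketch of the mask-bit approach: each stored word is kept as a value plus a per-bit mask, and a bit position takes part in the comparison only where its mask bit says "care". Encoding the mask as "care" rather than "don't care", and the function names, are assumptions of this sketch; the example reproduces the 10XX0 case given above.

```python
from typing import Tuple


def encode_ternary(pattern: str) -> Tuple[int, int]:
    """Turn a pattern such as '10XX0' into (value, care_mask).

    care_mask has a 1 where the bit must match and a 0 where it is Don't Care;
    the value bit under a Don't Care position is stored as 0.
    """
    value = care = 0
    for ch in pattern:
        value <<= 1
        care <<= 1
        if ch in "01":
            care |= 1
            value |= int(ch)
        elif ch not in "Xx":
            raise ValueError(f"invalid ternary symbol: {ch!r}")
    return value, care


def ternary_match(pattern: str, key: int) -> bool:
    """Match if every cared-about bit of key equals the stored value bit."""
    value, care = encode_ternary(pattern)
    return ((key ^ value) & care) == 0


hits = [f"{k:05b}" for k in range(32) if ternary_match("10XX0", k)]
print(hits)  # ['10000', '10010', '10100', '10110']
```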

Other Types of CAM

In recent years, as data communication equipment designers have grown to appreciate the advantages of CAM, many innovative new CAM types have been suggested.

These include:

Power-saving implementations: the relatively high power consumption of the CAM circuit (see the discussion of power consumption above) has imposed restrictions on array size, response speed, and the ability to integrate CAM cores into ASICs.

New technologies, circuit designs and architectures allow higher array densities, higher speeds of operation and easier integration of CAM cores into PPs' ASIC designs.

The most popular power-saving scheme is the Bank-Selection scheme. In this scheme, only a subset of the CAM is active in any given cycle, and the power-hungry search lines are shared between the banks.

Other technologies — attempts are made to improve the CAM circuits with respect to:

Minimizing the cell's footprint to allow higher array densities, by using a DRAM cell structure (a DRAM cell requires only four transistors, compared with the 6/8-transistor SRAM cell described above) or a single-transistor memory cell.

Other types of search logic: the search mechanism described above is NOR-based logic; NAND- and XOR-based logic have also been suggested.

Other Architectures — special architectures were developed for special purposes. Among these, the following need to be mentioned:

Additional memory array(s) for associated data (mentioned above).

CAMs for special applications, such as the Prefix CAM (PCAM), a Ternary CAM optimized for longest-prefix matching tasks (IPv4 and others), and the Label Encoded CAM (LECAM), a parallel packet classification CAM that combines special algorithmic techniques with a modified CAM architecture.

CAMs with cache memory (searching for recently used key searches) or with pipelined hierarchical search scheme to speed up the CAM's search operation.

And others.
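The Bank-Selection scheme described above can be sketched in software: entries are split into banks, and only the bank indicated by the search key is compared, while the other banks stay idle. Using the top bits of the key to select the bank, the bank count, and the class name BankedCAM are all assumptions for illustration.

```python
from typing import Dict, List


class BankedCAM:
    """Bank-selection sketch: a few key bits choose the one bank to search.

    Assumption: the top `select_bits` bits of the key pick the bank, so
    only that bank's entries are compared.
    """

    def __init__(self, key_width: int, select_bits: int):
        self.shift = key_width - select_bits
        self.banks: Dict[int, List[int]] = {b: [] for b in range(2 ** select_bits)}

    def bank_of(self, key: int) -> int:
        return key >> self.shift

    def write(self, key: int) -> None:
        self.banks[self.bank_of(key)].append(key)

    def search(self, key: int) -> bool:
        active = self.banks[self.bank_of(key)]  # only this bank is activated
        return key in active


banked = BankedCAM(key_width=8, select_bits=2)  # 4 banks
for k in (0x12, 0x47, 0x9C, 0xF0):
    banked.write(k)
print(banked.search(0x9C), banked.search(0x9D))  # True False
```

In this example each search touches about a quarter of the stored entries, which is where the power saving comes from.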

CAM Application

CAMs are well suited to performing search operations and can accelerate applications ranging from Local Area Networks (LANs), database management, file-storage management, pattern recognition, and artificial intelligence to fully associative and processor-specific cache memories, disk cache memories, and high-end data communication devices such as data switches and routers.

Typical data communication applications are: Virtual Path Identifier / Virtual Channel Identifier (VPI/VCI) translation in Asynchronous Transfer Mode (ATM) switches at data rates up to OC-12 (622 Mbps), packet forwarding, IP filtering and packet classification for Quality of Service (QoS) applications, and Media Access Control (MAC) address lookup in Ethernet bridges.

In each one of these applications, users do not necessarily know the addresses of words that have particular pieces of information stored within a specific portion of the word length.

MAC Address Lookup for Network Switches

The end points (outgoing ports) of every data communication switch are the MACs.

To direct a packet to a specific network user, the user's MAC address is included as a field in the packet's header.

In a typical switch circuit, a BCAM with an associated RAM compares the destination address presented to the search mechanism with the table of addresses (the MAC address table) stored within it.

If the comparison yields a match, the CAM activates the RAM, which translates the matched destination address into the address of the corresponding hardware MAC (outgoing port).


Figure 9: MAC Address Lookup
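The MAC lookup of Figure 9 reduces to an exact-match (binary CAM) search whose match address selects a word in the associated RAM. The MAC addresses and port numbers below are hypothetical, and what a switch does with an unknown destination (typically flooding) is outside this sketch.

```python
from typing import Optional

# Hypothetical MAC address table: the BCAM holds the MAC addresses, and the
# associated RAM holds the outgoing port for each CAM location.
mac_cam = ["00:1a:2b:3c:4d:5e", "00:1a:2b:3c:4d:5f", "02:00:00:00:00:01"]
port_ram = [1, 4, 2]


def forward_port(dest_mac: str) -> Optional[int]:
    """Exact-match (binary CAM) lookup of a destination MAC address."""
    key = dest_mac.lower()
    for address, stored in enumerate(mac_cam):  # parallel in hardware
        if stored == key:
            return port_ram[address]  # CAM match address selects the RAM word
    return None  # unknown destination (handling not modelled here)


print(forward_port("00:1A:2B:3C:4D:5F"))  # 4
print(forward_port("00:1a:2b:3c:4d:60"))  # None
```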

Address Lookup for Routers

TCAMs are used in network routers, where each address has two parts:

The network address, which can vary in size depending on the subnet configuration, and,

The host address, which occupies the remaining bits.

Each subnet has a network mask that specifies which bits of the address are the network address and which bits are the host address.

Routing is done by comparing against a routing table, maintained by the router, which contains:

Each known destination network address,

The associated network mask, and,

The information needed to route packets to that destination.

Without a TCAM circuit, the router needs to:

Compare the destination address of the packet to be routed with each entry in the routing table,

Perform a logical AND with the network mask, and,

Compare it with the network address.

If these are equal, the corresponding routing information is used to forward the packet.
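The three steps above (walk the table, AND with the mask, compare with the network address) can be sketched as a sequential software loop. The routes are hypothetical, and keeping the more specific route first so that the first hit is the best one is an assumption, since the steps above do not say how overlapping entries are resolved.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical routing table entries: (network address, network mask, next hop)
Route = Tuple[int, int, str]


def to_int(dotted: str) -> int:
    return int(ipaddress.IPv4Address(dotted))


routes: List[Route] = [
    (to_int("10.1.0.0"), to_int("255.255.0.0"), "port A"),
    (to_int("10.0.0.0"), to_int("255.0.0.0"), "port B"),
]


def route_without_tcam(dest: str) -> Optional[str]:
    """Sequential lookup: AND with each mask, then compare with the network."""
    d = to_int(dest)
    for network, mask, next_hop in routes:  # one entry per step, not parallel
        if (d & mask) == network:
            return next_hop
    return None


print(route_without_tcam("10.1.2.3"))     # port A
print(route_without_tcam("10.9.9.9"))     # port B
print(route_without_tcam("192.168.0.1"))  # None
```

Every lookup may touch every entry, which is exactly the cost a TCAM removes by doing the masking and comparison for all entries at once.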

Using a TCAM for the routing table makes the lookup process very efficient. The addresses are stored with Don't Care bits in the host part of the address, so looking up the destination address in the TCAM immediately retrieves the correct routing entry; both the masking and the comparison are done by the TCAM hardware.

The above described Address Lookup function inspects the destination address of the packet and selects an output port associated with that address. The list of destination addresses of the router, and their corresponding output ports, is called the Routing Table. An example of a simplified Routing Table is displayed in Table 1 below.

| Line Number | Address (Binary) | Output Port |
|---|---|---|
| 1 | 101XX | A |
| 2 | 0110X | B |
| 3 | 011XX | C |
| 4 | 10011 | D |

Table 1: Simplified Routing Table

All four entries in the above table are 5-bit words. Because of the X (Don't Care) bits, the first three entries in Table 1 each represent a range of input addresses. For example, the entry on Line 1 indicates that all addresses in the range 10100₂ to 10111₂ are forwarded to port A. For each incoming packet, the router looks up the packet's destination address in the Address Lookup Table to find its output port. For example, if the router receives a packet with the destination address 01101₂, the Address Lookup yields matches on both Line 2 and Line 3 of Table 1. Line 2 is selected because it specifies more bits of the search key (four, versus three for Line 3), that is, it is the longer matching prefix. This indicates that port B is the most direct route to the destination.

This lookup style is called longest-prefix matching and is required to implement the most recent Internet Protocol (IP) networking standard. The routing parameters determining the complexity of the implementation are:

Entry size

Table size

Search rate

Table update rate
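The longest-prefix selection described for Table 1 can be sketched as a TCAM search. A common way to obtain longest-prefix behaviour from a TCAM is to store entries ordered by decreasing prefix length and let the priority encoder return the first (lowest-location) match; that ordering is an assumption here, not something the table itself specifies.

```python
from typing import List, Optional, Tuple

# Table 1 as TCAM entries: (ternary pattern, output port)
TABLE_1: List[Tuple[str, str]] = [
    ("101XX", "A"),
    ("0110X", "B"),
    ("011XX", "C"),
    ("10011", "D"),
]


def prefix_length(pattern: str) -> int:
    """Number of specified (non-X) bits; here the X bits are all trailing."""
    return sum(ch != "X" for ch in pattern)


def matches(pattern: str, key: str) -> bool:
    return all(p == "X" or p == k for p, k in zip(pattern, key))


# Assumption: entries are stored by descending prefix length, so the first
# match reported by the priority encoder is the longest prefix.
TCAM = sorted(TABLE_1, key=lambda e: prefix_length(e[0]), reverse=True)


def lookup(key: str) -> Optional[str]:
    for pattern, port in TCAM:  # all rows are compared at once in hardware
        if matches(pattern, key):
            return port
    return None


print(lookup("01101"))  # B  (Line 2 beats Line 3)
print(lookup("10110"))  # A
print(lookup("00000"))  # None
```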

IPv4 addresses are 32 bits long, while IPv6 addresses are 128 bits long. Supplementary information, such as the source address and QoS information, can expand IPv6 routing table entries to 288 to 576 bits.

Terabit-class routers need to perform hundreds of millions of searches per second in addition to thousands of routing table updates per second. Almost all algorithmic approaches are too slow to keep up with such high-speed routing requirements; only hardware-based CAMs, with their high search throughput, can meet them.

IP Filtering

An IP filter is a security feature that restricts unauthorized access to LAN resources or restricts traffic on a WAN link (IP traffic that goes through the router). IP filters can be used to restrict the types of Internet traffic that are permitted to access a LAN, and LAN workstations can be restricted to specific Internet-based applications (such as e-mail).

TCAMs can be used as a filter that blocks all access except for packets given explicit permission by the rules of the IP filter. In this application, the TCAM compares the packet being routed to the port against the IP filter rules stored in the CAM. When a match is found, the packet is either permitted or denied, as presented in Figure 10 below.


Figure 10: TCAM as an IP Filter
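An IP filter of this kind can be sketched as an ordered list of TCAM rules whose fields are either exact values or Don't Care, with the first (highest-priority) match deciding and a default of deny when nothing matches. The three header fields, the rules themselves, and the first-match priority are hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical filter rules, highest priority first.
# Fields: (dst_ip, protocol, dst_port); None means "Don't Care".
Rule = Tuple[Optional[str], Optional[str], Optional[int], str]

RULES: List[Rule] = [
    ("192.0.2.25", "tcp", 25, "permit"),  # mail server, SMTP only
    (None, "tcp", 23, "deny"),            # telnet to any host
    (None, "tcp", 443, "permit"),         # HTTPS to any host
]


def filter_packet(dst_ip: str, protocol: str, dst_port: int) -> str:
    key = (dst_ip, protocol, dst_port)
    for *fields, action in RULES:  # compared in parallel by the TCAM
        if all(f is None or f == k for f, k in zip(fields, key)):
            return action  # first (highest-priority) match decides
    return "deny"  # no explicit permission: the packet is blocked


print(filter_packet("192.0.2.25", "tcp", 25))    # permit
print(filter_packet("203.0.113.5", "tcp", 23))   # deny
print(filter_packet("203.0.113.5", "udp", 53))   # deny (no rule matched)
```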

ATM Switch

CAMs can be used as a translation table in ATM switching network components.

Because ATM networks are connection-oriented, virtual circuits need to be set up across them prior to any data transfer.

Two types of ATM virtual circuits exist:

Virtual Path, identified by a Virtual Path Identifier (VPI), and,

Virtual Channel, identified by a Virtual Channel Identifier (VCI).

VPI/VCI values are local: each segment of the total connection has its own VPI/VCI combination.

Whenever an ATM cell travels through a switch, its VPI/VCI value must be changed into the value used for the next segment of connection. This process is called VPI/VCI translation.

Because speed is a significant factor in an ATM network, the speed at which this translation is done is critical to the network's overall performance.

CAM can be used for the address translation and contributes significantly to the processing rate. During translation, the CAM takes the incoming VPI/VCI values in ATM cell headers and generates addresses that access data in the associated RAM. The CAM/RAM combination (see Figure 7) enables the realization of multi-megabit translation tables with full parallel search capability.

VPI/VCI fields from the ATM cell header are compared to a list of current connections stored in the CAM array. As a result of the comparison, CAM generates an address that is used to access an associated RAM where VPI/VCI mapping data and other connection information is stored.

The ATM controller modifies the cell header using the VPI/VCI data from the associated RAM, and the cell is sent to the switch, as presented in Figure 11.


Figure 11: CAM in an ATM switch
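The VPI/VCI translation of Figure 11 can be sketched as a CAM search on the incoming (VPI, VCI) pair whose match address reads the new header values from the associated RAM. The connection entries are hypothetical, storing an output port alongside the new VPI/VCI is an assumption about the "other connection information", and discarding cells of unknown connections is likewise assumed.

```python
from typing import List, Optional, Tuple

Header = Tuple[int, int]  # (VPI, VCI)

# Hypothetical connection table, filled in when each connection is set up.
cam_connections: List[Header] = [(0, 32), (1, 100), (5, 42)]  # incoming VPI/VCI
ram_mapping: List[Tuple[int, int, int]] = [  # (output port, new VPI, new VCI)
    (2, 7, 55),
    (1, 3, 200),
    (4, 0, 33),
]


def translate(vpi: int, vci: int) -> Optional[Tuple[int, Header]]:
    """Return (output port, new (VPI, VCI)), or None for an unknown connection."""
    for address, entry in enumerate(cam_connections):  # CAM search
        if entry == (vpi, vci):
            port, new_vpi, new_vci = ram_mapping[address]  # associated RAM read
            return port, (new_vpi, new_vci)
    return None  # no connection set up: the cell would be discarded (assumption)


print(translate(1, 100))  # (1, (3, 200))
print(translate(9, 9))    # None
```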

Translation Look-aside Buffer

A Translation Look-aside Buffer (TLB) is a cache in a CPU that holds part of the page table, which translates virtual addresses into physical addresses.

This buffer has a fixed number of entries and is used to improve the speed of virtual address translation.

The buffer is typically implemented with a CAM in which the search key is the virtual address and the search result is a real (physical) address. If the CAM search yields a match, the translation is known and the match data is used. If no match is found, the translation proceeds via the page table, which requires several more cycles to complete.

The TLB can reside either between the CPU and the cache or between the cache and primary storage memory, depending on whether the cache uses virtual or physical addressing.
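The hit/miss behaviour described above can be sketched with a small TLB model: the CAM maps a virtual page number to a physical frame number, and a miss falls back to the page table. The 4 KiB page size, the page-table contents, the capacity, and the first-in-first-out replacement policy are all assumptions for illustration.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Hypothetical page table: virtual page number -> physical frame number
PAGE_TABLE: Dict[int, int] = {0: 7, 1: 3, 2: 9}


class TLB:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: Dict[int, int] = {}  # the CAM: VPN (search key) -> PFN
        self.misses = 0

    def translate(self, virtual_addr: int) -> int:
        vpn, offset = divmod(virtual_addr, PAGE_SIZE)
        if vpn in self.entries:  # CAM hit: translation already known
            pfn = self.entries[vpn]
        else:  # miss: consult the page table (the slow path)
            self.misses += 1
            pfn = PAGE_TABLE[vpn]
            if len(self.entries) >= self.capacity:
                # Evict the oldest entry (FIFO replacement is an assumption).
                self.entries.pop(next(iter(self.entries)))
            self.entries[vpn] = pfn
        return pfn * PAGE_SIZE + offset


tlb = TLB(capacity=2)
print(tlb.translate(4100))              # 12292 (page 1 -> frame 3, offset 4)
print(tlb.translate(4100), tlb.misses)  # 12292 1 (second access hits)
```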

In case the cache is virtually addressed, requests are sent directly from the CPU to the cache, which then accesses the TLB as necessary. If the cache is physically addressed, the CPU does a TLB lookup on every memory operation, and the resulting physical address is sent to the cache.

Although not intended by design, if system security has been breached, a restoration sub-system can use the translation look-aside buffer to alter the view of memory in order to hide a subversive program or backdoor on a computer.

Data Compression

Data compression eliminates the inherent redundancy in a given data file, thus generating an equivalent but smaller file. CAM is well suited for data compression, since a significant portion of a compression algorithm's time is spent searching for pre-defined data patterns. Replacing the software search with a hardware-based search engine can significantly increase the throughput of a compression function.

In a data compression application, a CAM lookup is performed following the presentation of each word of the original data, as can be seen in Figure 12. If the presented word's bit pattern is found, the appropriate code is output. If the word is not found in the CAM, another word is shifted in.

The CAM will generate the results in a single transaction regardless of table size or search list length.

This virtue makes CAM an ideal candidate for data compression.


Figure 12: Data Compression
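To show where the CAM fits in a dictionary-based compressor, the sketch below uses LZW. LZW is not named in the text; it is swapped in here as a well-known example in which every step asks whether the current pattern is already stored, which is exactly a content search. The Python dictionary stands in for the CAM, and input characters are assumed to lie in the 0 to 255 range.

```python
from typing import Dict, List


def lzw_compress(data: str) -> List[int]:
    """LZW compression; the dictionary lookup is the step a CAM would perform."""
    dictionary: Dict[str, int] = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    output: List[int] = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:  # CAM search: pattern already stored
            current = candidate      # keep extending the pattern
        else:
            output.append(dictionary[current])  # emit code of longest known pattern
            dictionary[candidate] = next_code   # CAM write: store the new pattern
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output


print(lzw_compress("ABABABA"))  # [65, 66, 256, 258]
```

Because the CAM answers each "is this pattern stored?" query in a single search regardless of dictionary size, the per-symbol cost stays constant as the dictionary grows.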