
HIGH SPEED LOSSLESS DATA COMPRESSION

PRESENTED BY:
www.final-yearprojects.co.cc
INTRODUCTION
 Data compression is a technique for reducing data redundancy to preserve the bandwidth of a communication channel and to increase the capacity of data storage, which in turn improves overall network performance.

 Lossless data compression is necessary to handle the enormous amount of digital data stored and retrieved.
 It also finds applications in high-speed communication networks, preserving the bandwidth of both wired and wireless channels.

 A lossless data compression system guarantees that the data at the decoder output is exactly identical to the data at the encoder input.
LOSSLESS DATA
COMPRESSION METHODS
1. Huffman Coding.
2. Run Length Encoding.
3. Arithmetic Coding.
4. Lempel-Ziv algorithms (e.g. LZ77 and LZW).
5. X-MatchPRO algorithm.
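As a quick illustration of the simplest method in this list, run-length encoding replaces each run of a repeated symbol with a (symbol, count) pair. This is a minimal sketch, not taken from the paper; the function name is ours:

```python
def run_length_encode(data: str):
    """Collapse each run of a repeated symbol into a (symbol, count) pair."""
    runs = []
    for symbol in data:
        if runs and runs[-1][0] == symbol:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([symbol, 1])  # start a new run
    return [(s, c) for s, c in runs]

print(run_length_encode("aaabccccd"))  # -> [('a', 3), ('b', 1), ('c', 4), ('d', 1)]
```

RLE pays off only when the data contains long runs; the more general dictionary methods below exploit repeated phrases instead.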
DISADVANTAGE OF
STATISTICAL APPROACHES
1. Huffman coding and other statistical approaches are not good options for modern-day communications because network traffic is neither predictable nor consistent.
2. Software implementations of these algorithms are cost-effective only for connections of low to moderately high link speed.
3. Software solutions are not well suited for real-time applications such as network routers, where compression and decompression at high throughput are required on the fly.
LZ77 ENCODING TECHNIQUE
 The LZ77 algorithm, also called LZ1, was proposed by Ziv and Lempel in 1977.

 It is a sequential algorithm that compresses variable-length strings of bytes into a fixed-length compressed format.

 The two important steps in the algorithm are string parsing and coding.

 The characters or symbols are elements of an alphabet A, in our case the set of extended ASCII characters (256 distinct symbols), each symbol one byte long.
 The repeating phrases of incoming data are
replaced with fixed length code words.

 The compression ratio is CR = (length of original data - length of code) / (length of original data).
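The CR formula above can be sketched in code; this is a minimal illustration and the function name is ours:

```python
def compression_ratio(original_len: int, code_len: int) -> float:
    """CR = (length of original data - length of code) / length of original data."""
    return (original_len - code_len) / original_len

# e.g. 25 source symbols encoded into 20 code symbols
print(compression_ratio(25, 20))  # -> 0.2
```

A CR of 0 means no reduction at all, while values closer to 1 indicate stronger compression.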
Compression method
 The algorithm employs a principle called the sliding window: the data to be compressed is fed into a buffer of length 2n symbols.
 Initially, the first (left) half of the buffer, referred to as the search buffer, is filled with zeros.
 The second half of the buffer, of length Ls, holds the data that is yet to be encoded and is referred to as the new or look-ahead buffer.
FOR EXAMPLE:
Consider a string over an alphabet of 3 symbols (0, 1, 2).
LZ77 (LZ1) compression is applied to the data.
S = 0001010210210212021021200... (input string)
Ls = 9 (length of the new buffer)
2n = 18 (window size)
Fig 1: LZ1 compression
Search buffer   Look-ahead buffer   Codeword
000000000       000101021           C1 = 22101
000000001       010210210           C2 = 21102
000010102       102102120           C3 = 20212
210210212       021021200           C4 = 02220
 To apply LZ1 compression, the longest match with the string starting at the first position of the look-ahead buffer is found in the search buffer.
 As shown in Fig. 1, the longest match is the string "000" of length 3.
 The match can start at any position (0-8) of the search buffer, and it may extend into the look-ahead buffer.
 For convenience, position 8 (8 in base 10 = 22 in base 3) is chosen as the pointer where the match starts; the length of the match is 3 (3 in base 10 = 10 in base 3), and the first symbol after the match is 1.
 A codeword is formed by concatenating pointer, length, and last symbol.
 So the first codeword is 22 10 1.
 The codeword is usually represented in the same alphabet as the source data.
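The pointer/length/last-symbol concatenation in base 3 can be sketched as follows; the helper names are ours:

```python
def to_base3(n: int, width: int = 2) -> str:
    """Represent n in base 3, zero-padded to `width` digits."""
    digits = ""
    for _ in range(width):
        digits = str(n % 3) + digits
        n //= 3
    return digits

def codeword(pointer: int, length: int, last_symbol: str) -> str:
    """Concatenate pointer | length | last symbol, all over the source alphabet."""
    return to_base3(pointer) + to_base3(length) + last_symbol

# the first codeword of Fig. 1: pointer 8 -> "22", length 3 -> "10", last symbol "1"
print(codeword(8, 3, "1"))  # -> "22101"
```

Two base-3 digits suffice here because both pointer (0-8) and length (0-8) fit in the range 0-8 = 00-22 in base 3.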

 After forming the codeword, the buffer is shifted left by length+1 positions and the look-ahead buffer is refilled from the right with the same number of incoming symbols.
 The algorithm looks at the data through a window of fixed size. Anything outside this window can neither be referenced nor encoded.

 As more data is being encoded, the window slides along, removing the oldest encoded data from the window and adding new encoded data to it.
Algorithm
while (lookAheadBuffer not empty)
{
    get a reference (position, length) to the longest match;
    if (length > 0)
    {
        output (position, length, next symbol);
        shift the window length+1 positions along;
    }
    else
    {
        output (0, 0, first symbol in the look-ahead buffer);
        shift the window 1 symbol along;
    }
}
LZ1 DESIGN METHODOLOGY

 LZ1 data compression is a sequential algorithm that compresses variable-length strings of bytes into a fixed-length compressed format.
 The major step in data compression is the reduction of repeated strings of incoming data into compact codewords.
 This involves comparing the symbols with each other to find similar symbols and, of course, finding the longest match.
 Obviously, software solutions are limited by processor speed in achieving high throughput; therefore hardware solutions become inevitable.
 To achieve high speeds, we propose translating the serial comparisons into a parallel architecture, thus achieving higher throughput.
 The required degree of parallelism can be achieved by providing hardware for future comparisons as well, which implies looking ahead, or unfolding the hardware, to speed up the comparison operation.
 The look ahead buffer holds the data to be
compressed while the already encoded
data is present in the search buffer.
 The sizes of the two buffers are critical to obtaining a better compression ratio.
 Looking at the compression ratio for different buffer sizes, it is evident that the greater the sizes of the search and look-ahead buffers, the better the compression performance.
 On the other hand, increasing the size of these buffers not only increases the area requirements of the hardware design but also increases its critical path delay, resulting in a decrease in throughput.
 Thus, it is imperative that our design is easily scalable.
Hardware Implementation

 For hardware implementation, a total buffer of size 2n is considered, where the first half contains the n symbols (x0, x1, ..., xn-1) that have most recently been coded.
 The second half of the buffer holds the next n symbols (y0, y1, ..., yn-1) that are yet to be coded.
 For understanding, consider n = 4. The symbols belong to the set of 256 extended ASCII characters and each symbol is one byte (8 bits) long.
 Parallel comparisons of the source symbols are shown in Table 1.
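The parallel comparisons of Table 1 can be pictured as an n-by-n matrix of byte matches, computed here sequentially for illustration only (the symbol values below are hypothetical, not the paper's Table 1):

```python
n = 4
x = ["a", "b", "a", "c"]  # recently coded symbols x0..x3 (hypothetical values)
y = ["a", "c", "a", "b"]  # symbols yet to be coded y0..y3 (hypothetical values)

# In hardware, every cell is one comparator and all n*n cells work in parallel;
# cell [i][j] is True when x[i] == y[j].
matches = [[xi == yj for yj in y] for xi in x]

for row in matches:
    print(["1" if m else "0" for m in row])
```

Runs of True cells along a diagonal of this matrix correspond to matching substrings, which is what the BMC (byte match cell) logic extracts to find the longest match in a single pass.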
Scalable Architecture for LZ1
Algorithm
Working
 The BMC select logic is the most
critical part.
 The column select logic starts from the leftmost BMC and, based on its L, jumps to the Lth column to the right; from that column's L it selects the next column, skipping the columns not required by the algorithm.
 The minimum throughput achieved by the architecture equals the amount of unfolding, i.e. the number of BMCs, in bytes per cycle, whereas the best case depends on the length L of the last BMC.
 Based on the last BMC's L, the barrel shifter at the top shifts the y buffer into the x buffer, while the second barrel shifter brings the same amount of data in from the FIFO.
 Though the logic works in a single cycle, any amount of pipelining permissible by the user's constraints can be added to increase the speed of execution.
CONCLUSIONS

 The paper presented a data compression architecture that provides a throughput of more than 1 Gbit/s.
 Applying the pipelining technique for reducing the
critical path can further optimize the architecture.
 The future work would also include the
parameterization of different synthesizable blocks
of the architecture so that an Integrated Design
Environment (IDE) can be designed and
developed.
 The IDE serving as a tool would provide the
flexibility of design space exploration to generate
several variants of high throughput architecture
while optimizing a set of design parameters
subject to a set of design constraints.
REFERENCES

 [1] Shih-Arn Hwang and Cheng-Wen Wu, "Unified VLSI Systolic Array for LZ Data Compression", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 9, no. 4, August 2001.
 [2] D. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proc. IRE, vol. 40, pp. 1098-1101, September 1952.
 [3] S. Golomb, "Run-Length Encodings", IEEE Transactions on Information Theory, vol. IT-12, pp. 399-401, July 1966.
 [4] G. G. Langdon Jr., "An Introduction to Arithmetic Coding", IBM Journal of Research and Development, pp. 135-149, March 1984.
 [5] J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression", IEEE Transactions on Information Theory, vol. IT-23, no. 3, May 1977.
 [6] T. Welch, "A Technique for High-Performance Data Compression", IEEE Computer, vol. 17, pp. 8-19, 1984.
 [7] J. L. Nunez and Simon Jones, "Gbit/s Lossless Data Compression Hardware", IEEE Transactions on VLSI Systems, vol. 11, no. 3, June 2003.
 [8] Chun-Te Chen and Liang-Gee Chen, "High Speed VLSI Design for LZ Based Data Compression", IEEE International Symposium on Circuits and Systems, June 9-12, 1997, Hong Kong.
 [9] S. Jones, "100 Mbit/s Adaptive Data Compressor Design Using Selectively Shiftable Content-Addressable Memory", IEE Proceedings G, vol. 139, no. 8, August 1992.
 [10] C. Y. Lee and R. Y. Yang, "High-Throughput Data Compressor Design Using Content Addressable Memory", IEE Proceedings G, vol. 142, February 1995.
 [11] D. Mark Royals, Tasso Markas, Nick Kanopoulos, John H. Reif, and James A. Storer, "On the Design and Implementation of a Lossless Data Compression Chip", IEEE Journal of Solid-State Circuits, vol. 28, no. 9, 1993.
 [12] N. Ranganathan and S. Henriques, "High-Speed VLSI Designs for Lempel-Ziv-Based Data Compression", IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 40, February 1993.
THANK YOU
