
Data Compression Project

Mini Project Report Submitted to
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
By
Samir Sheri
Satvik N
In partial fulfilment of the requirements for the award of the degree

BACHELOR OF ENGINEERING
IN COMPUTER SCIENCE AND ENGINEERING

R V College of Engineering
(Autonomous Institute, Affiliated to VTU)
BANGALORE - 560059

May 2012

DECLARATION
We, Samir Sheri and Satvik N, bearing USN numbers 1RV09CS093 and 1RV09CS095 respectively, hereby declare that the dissertation entitled Data Compression Project, completed and written by us, has not previously formed the basis for the award of any degree, diploma or certificate of any other University.

Bangalore

Samir Sheri (USN: 1RV09CS093)
Satvik N (USN: 1RV09CS095)

R V COLLEGE OF ENGINEERING
(Autonomous Institute, Affiliated to VTU)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
This is to certify that the dissertation entitled Data Compression Project, which is being submitted herewith for the award of B.E., is the result of the work completed by Samir Sheri and Satvik N under my supervision and guidance.

Signature of Guide (Name of the Guide)

Signature of Guide (Name of the Guide)
Signature of Head of Department (Dr. N K Srinath)
Signature of Principal (Dr. B.S Sathyanarayana)

Name of Examiner 1:        Signature of Examiner 1:
Name of Examiner 2:        Signature of Examiner 2:

ACKNOWLEDGEMENT
The euphoria and satisfaction of the completion of the project would be incomplete without thanking the persons responsible for this venture. We acknowledge RVCE (Autonomous under VTU) for providing an opportunity to create a mini-project in the 5th semester. We express our gratitude towards Prof. B.S. Satyanarayana, Principal, R.V.C.E., for the constant encouragement and facilities extended in the completion of this project. We would like to thank Prof. N.K. Srinath, HOD, CSE Dept., for providing excellent lab facilities for the completion of the project. We would personally like to thank our project guides, Chaitra B.H. and Suma B., and also the lab in-charge, for providing timely assistance and guidance. We are indebted to the co-operation given by the lab administrators and lab assistants, who have played a major role in bringing out the mini-project in the present form.

Bangalore

Samir Sheri, 6th semester, CSE, USN: 1RV09CS093
Satvik N, 6th semester, CSE, USN: 1RV09CS095

ABSTRACT
The project Data Compression Techniques is aimed at developing programs that transform a string of characters in some representation (such as ASCII) into a new string (of bits, for example) which contains the same information but whose length is as small as possible. Compression is useful because it helps reduce the consumption of resources such as storage space or transmission capacity. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (e.g., when using lossy data compression), and the computational resources required to compress and uncompress the data. Many data processing applications require storage of large volumes of data, and the number of such applications is constantly increasing as the use of computers extends to new disciplines. Compressing data to be stored or transmitted reduces storage and/or communication costs. When the amount of data to be transmitted is reduced, the effect is that of increasing the capacity of the communication channel. Similarly, compressing a file to half of its original size is equivalent to doubling the capacity of the storage medium. It may then become feasible to store the data at a higher, thus faster, level of the storage hierarchy and reduce the load on the input/output channels of the computer system.


Contents
ACKNOWLEDGEMENT
ABSTRACT
CONTENTS
1 INTRODUCTION
   1.1 SCOPE
2 REQUIREMENT SPECIFICATION
3 Compression
   3.1 A Naive Approach
   3.2 The Basic Idea
   3.3 Building the Huffman Tree
   3.4 An Example
       3.4.1 An Example: go go gophers
       3.4.2 Example Encoding Table
       3.4.3 Encoded String
4 Decompression
   4.1 Storing the Huffman Tree
   4.2 Creating the Huffman Table
   4.3 Storing Sizes
5 CONCLUSION AND FUTURE WORKS
BIBLIOGRAPHY
APPENDICES

Chapter 1 INTRODUCTION
The project Data Compression Techniques is aimed at developing programs that transform a string of characters in some representation (such as ASCII) into a new string (of bits, for example) which contains the same information but whose length is as small as possible. Compression is useful because it helps reduce the consumption of resources such as storage space or transmission capacity. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (e.g., when using lossy data compression), and the computational resources required to compress and uncompress the data.

1.1

SCOPE

Data compression techniques find applications in almost all fields. To list a few:

Audio
Audio data compression reduces the transmission bandwidth and storage requirements of audio data. Audio compression algorithms are implemented in software as audio codecs. Lossy audio compression algorithms, which provide higher compression at the cost of fidelity, are used in numerous audio applications. Almost all of these algorithms rely on psychoacoustics to eliminate less audible or meaningful sounds, thereby reducing the space required to store or transmit them.


Video
Video compression uses modern coding techniques to reduce redundancy in video data. Most video compression algorithms and codecs combine spatial image compression and temporal motion compensation. Video compression is a practical implementation of source coding in information theory. In practice, most video codecs also use audio compression techniques in parallel to compress the separate, but combined, data streams.

Grammar-Based Codes
Grammar-based codes can compress highly repetitive text extremely well; for instance, a biological data collection of the same or related species, a huge versioned document collection, internet archives, etc. The basic task of grammar-based codes is constructing a context-free grammar that derives a single string. Sequitur and Re-Pair are practical grammar compression algorithms for which public implementations are available.

Dept. of CSE, R V C E, Bangalore.

Feb 2012 - May 2013

Chapter 2 REQUIREMENT SPECIFICATION


Software Requirement Specification (SRS) is an important part of the software development process. We describe the overall description of the Data Compression Project, its specific requirements, the software and hardware requirements, and the functionality of the system.

Software Requirements
Front End: Qt GUI Application
Back End: C++
Operating System: Linux

Hardware Requirements
Processor: Intel Pentium 4 or higher
RAM: 512 MB or more
Hard disk: 5 GB or less

Chapter 3 Compression
We'll look at how the string "go go gophers" is encoded in ASCII, how we might save bits using a simpler coding scheme, and how Huffman coding is used to compress the data, resulting in still more savings.

3.1

A Naive Approach

With an ASCII encoding (8 bits per character) the 13-character string "go go gophers" requires 104 bits. The table below on the left shows how the coding works.


The string "go go gophers" would be written (coded numerically) as 103 111 32 103 111 32 103 111 112 104 101 114 115. Although not easily readable by humans, this would be written as the following stream of bits (the spaces would not be written, just the 0s and 1s):

01100111 01101111 00100000 01100111 01101111 00100000 01100111 01101111 01110000 01101000 01100101 01110010 01110011

Since there are only eight different characters in "go go gophers", it's possible to use only 3 bits to encode the different characters. We might, for example, use the encoding in the table on the right above, though other 3-bit encodings are possible. Now the string "go go gophers" would be encoded as 0 1 7 0 1 7 0 1 2 3 4 5 6 or, as bits:

000 001 111 000 001 111 000 001 010 011 100 101 110

By using three bits per character, the string "go go gophers" uses a total of 39 bits instead of 104 bits. More bits can be saved if we use fewer than three bits to encode characters like g, o, and space that occur frequently, and more than three bits to encode characters like e, p, h, r, and s that occur less frequently in "go go gophers".

3.2

The Basic Idea

This is the basic idea behind Huffman coding: to use fewer bits for more frequently occurring characters. We'll see how this is done using a tree that stores characters at the leaves, and whose root-to-leaf paths provide the bit sequences used to encode the characters. We'll use Huffman's algorithm to construct a tree that is used for data compression. We'll assume that each character has an associated weight equal to the number of times the character occurs in a file, for example. In the "go go gophers" example, the characters g and o have weight 3, the space has weight 2, and the other characters have weight 1. When compressing a file we'll need to calculate these weights; we'll ignore this step for now and assume that all character weights have been calculated.



3.3

Building the Huffman Tree

Huffman's algorithm assumes that we're building a single tree from a group (or forest) of trees. Initially, all the trees have a single node containing a character and the character's weight. Trees are combined by picking two trees and making a new tree from the two. This decreases the number of trees by one at each step, since two trees are combined into one tree. The algorithm is as follows:

1. Begin with a forest of trees. All trees have one node, with the weight of the tree equal to the weight of the character in the node. Characters that occur most frequently have the highest weights; characters that occur least frequently have the smallest weights.
2. Repeat this step until there is only one tree: choose two trees with the smallest weights, call these trees T1 and T2. Create a new tree whose root has a weight equal to the sum of the weights T1 + T2, and whose left subtree is T1 and whose right subtree is T2.
3. The single tree left after the previous step is an optimal encoding tree.

3.4
3.4.1

An Example
An Example: go go gophers

We'll use the string "go go gophers" as an example. Initially we have the forest shown below. The nodes are shown with a weight/count that represents the number of times the node's character occurs.


3.4.2

Example Encoding Table

The character encoding induced by the last tree is shown below, where, again, 0 is used for left edges and 1 for right edges.

3.4.3

Encoded String

The string "go go gophers" would be encoded as shown (with spaces used for easier reading; the spaces wouldn't appear in the real encoding). 00 01 100 00 01 100 00 01 1110 1101 101 1111 1100 In total, 37 bits are used to encode "go go gophers". There are several trees that yield an optimal 37-bit encoding of "go go gophers". The tree that actually results from a programmed implementation of Huffman's algorithm will be the same each time the program is run for the same weights (assuming no randomness is used in creating the tree).


Chapter 4 Decompression
Generally speaking, the process of decompression is simply a matter of translating the stream of prefix codes to individual byte values, usually by traversing the Huffman tree node by node as each bit is read from the input stream (reaching a leaf node necessarily terminates the search for that particular byte value). Before this can take place, however, the Huffman tree must somehow be reconstructed.

4.1

Storing the Human Tree

In the simplest case, where character frequencies are fairly predictable, the tree can be preconstructed (and even statistically adjusted on each compression cycle) and thus reused every time, at the expense of at least some measure of compression efficiency. Otherwise, the information needed to reconstruct the tree must be sent a priori. A naive approach might be to prepend the frequency count of each character to the compression stream. Unfortunately, the overhead in such a case could amount to several kilobytes, so this method has little practical use.

Another method is to simply prepend the Huffman tree, bit by bit, to the output stream. For example, assuming that the value 0 represents a parent node and 1 a leaf node, whenever the latter is encountered the tree-building routine simply reads the next 8 bits to determine the character value of that particular leaf. The process continues recursively until the last leaf node is reached; at that point, the Huffman tree will have been faithfully reconstructed. The overhead using such a method ranges from roughly 2 to 320 bytes (assuming an 8-bit alphabet).

Many other techniques are possible as well. In any case, since the compressed data can include unused trailing bits, the decompressor must be able to determine when to stop producing output. This can be accomplished either by transmitting the length of the decompressed data along with the compression model or by defining a special code symbol to signify the end of input (the latter method can adversely affect code-length optimality, however).

4.2

Creating the Human Table

To create a table or map of coded bit values for each character you'll need to traverse the Huffman tree (e.g., inorder, preorder, etc.), making an entry in the table each time you reach a leaf. For example, if you reach a leaf that stores the character C by following the path left-left-right-right-left, then the entry in the C-th location of the map should be set to 00110. You'll need to make a decision about how to store the bit patterns in the map. At least two methods are possible for implementing what could be a class/struct BitPattern:

Use a string. This makes it easy to add a character (using +) to a string during tree traversal and makes it possible to use string as BitPattern. Your program may be slow, because appending characters to a string (in creating the bit pattern) and accessing characters in a string (in writing 0s or 1s when compressing) is slower than the next approach.



Alternatively, you can store an integer for the bitwise coding of a character. You need to store the length of the code too, to differentiate between 01001 and 00101. However, using an int restricts root-to-leaf paths to be at most 32 edges long, since an int holds 32 bits. In a pathological file, a Huffman tree could have a root-to-leaf path of over 100 edges. Because of this problem, you should use strings to store paths rather than ints. A slow correct program is better than a fast incorrect program.

4.3

Storing Sizes

The operating system will buffer output, i.e., output to disk actually occurs when some internal buffer is full. In particular, it is not possible to write just one single bit to a file; all output is actually done in chunks, e.g., it might be done in eight-bit chunks. In any case, when you write 3 bits, then 2 bits, then 10 bits, all the bits are eventually written, but you cannot be sure precisely when they're written during the execution of your program. Also, because of buffering, if all output is done in eight-bit chunks and your program writes exactly 61 bits explicitly, then 3 extra bits will be written so that the number of bits written is a multiple of eight. Because of the potential for the existence of these extra bits, when reading one bit at a time you cannot simply read bits until there are no more left, since your program might then read the extra bits written due to buffering. This means that when reading a compressed file, you CANNOT use code like this:

int bits;
while (input.readbits(1, bits))
{
    // process bits
}

To avoid this problem, you can write the size of a data structure before writing the data structure to the file.


Chapter 5 CONCLUSION AND FUTURE WORKS


Summary

Limitations
1. The Huffman code is optimal only if the exact probability distribution of the source symbols is known.
2. Each symbol is encoded with an integer number of bits.
3. Huffman coding is not efficient at adapting to changing source statistics.
4. The length of the code of the least probable symbol could be too large to store in a single word or basic storage unit of a computing system.

Further enhancements
The Huffman coding that we have considered is simple binary Huffman coding, but many variations of Huffman coding exist:

1. n-ary Huffman coding: The n-ary Huffman algorithm uses the {0, 1, ..., n-1} alphabet to encode messages and builds an n-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (n equals 2) codes, except that the n least probable symbols are taken together, instead of just the 2 least probable. Note that for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding. In this case, additional 0-probability placeholders must be added. If the number of source words is congruent to 1 modulo n-1, then the set of source words will form a proper Huffman tree.

2. Adaptive Huffman coding: A variation called adaptive Huffman coding calculates the probabilities dynamically, based on recent actual frequencies in the source string. This is somewhat related to the LZ family of algorithms.

3. Huffman template algorithm: Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but the algorithm given above does not require this; it requires only a way to order weights and to add them. The Huffman template algorithm enables one to use any kind of weights (costs, frequencies, etc.).

4. Length-limited Huffman coding: Length-limited Huffman coding is a variant where the goal is still to achieve a minimum weighted path length, but there is an additional restriction that the length of each codeword must be less than a given constant. The package-merge algorithm solves this problem with a simple greedy approach very similar to that used by Huffman's algorithm. Its time complexity is O(nL), where L is the maximum length of a codeword. No algorithm is known to solve this problem in linear or linearithmic time, unlike the presorted and unsorted conventional Huffman problems, respectively.


Bibliography


Appendices

Appendix A: Source Code
Listing 5.1: The definition of the class Charnode; each node of the Huffman tree is an object of this class.

#ifndef Charnode_h
#define Charnode_h

#define DEBUG 1
#if DEBUG
#define LOG(s) cout << s << endl;
#else
#define LOG(s) //
#endif

using namespace std;

template <class TYPE>
class Charnode
{
    TYPE ch;
    int count;
    Charnode *left;
    Charnode *right;

public:
    Charnode(TYPE ch, int count = 0);
    Charnode(const Charnode *New);
    int GetCount();
    int Value();
    void SetLeft(Charnode *left);
    void SetRight(Charnode *right);
    Charnode *GetLeft(void);
    Charnode *GetRight(void);
    TYPE GetChar(void);
    void show();
    bool operator<(Charnode &obj2);
    void setChar(TYPE ch);
};

template <class TYPE>
Charnode<TYPE>::Charnode(TYPE ch, int count)
{
    LOG("new Charnode " << count << " requested");
    this->ch = ch;
    this->count = count;
    this->left = this->right = NULL;
}

template <class TYPE>
Charnode<TYPE>::Charnode(const Charnode *New)
{
    LOG("new Charnode " << New->count << " requested");
    this->ch = New->ch;
    this->count = New->count;
    this->left = New->left;
    this->right = New->right;
}

template <class TYPE>
int Charnode<TYPE>::GetCount()
{
    return count;
}

template <class TYPE>
int Charnode<TYPE>::Value()
{
    return count;
}

template <class TYPE>
void Charnode<TYPE>::SetLeft(Charnode *left)
{
    this->left = left;
}

template <class TYPE>
void Charnode<TYPE>::SetRight(Charnode *right)
{
    this->right = right;
}

template <class TYPE>
Charnode<TYPE> *Charnode<TYPE>::GetLeft(void)
{
    return left;
}

template <class TYPE>
Charnode<TYPE> *Charnode<TYPE>::GetRight(void)
{
    return right;
}

template <class TYPE>
TYPE Charnode<TYPE>::GetChar(void)
{
    return ch;
}

template <class TYPE>
void Charnode<TYPE>::show()
{
    cout << ch << '\t' << count << endl;
}

template <class TYPE>
bool Charnode<TYPE>::operator<(Charnode &obj2)
{
    return (count < obj2.GetCount());
}

template <class TYPE>
void Charnode<TYPE>::setChar(TYPE ch)
{
    this->ch = ch;
}

#endif

Listing 5.2: The definition of the class Huffman; this class helps in building the Huffman tree for an input file.

#include <iostream>
#include "Charnode.h"
#include "globals.h"
#include "bitops.h"
#include <vector>
#include <map>
#include <fstream>

#ifndef HuffmanCode_h
#define HuffmanCode_h

using namespace std;

template <class TYPE>
class Huffman
{
private:
    vector<Charnode<TYPE> *> charactermap;
    Charnode<TYPE> *huffmanTreeRoot;
    map<TYPE, string> table;
    map<TYPE, int> freqtab;

private:
    void processfile(const char *filename, map<TYPE, int> &charmap);

    vector<Charnode<TYPE> *> convertToVector(map<TYPE, int> &chamap);

    bool compare(Charnode<TYPE> *i, Charnode<TYPE> *j);

    void MinHeapify(vector<Charnode<TYPE> *> &charactermap, int i, const int n);

    void BuildMinHeap(vector<Charnode<TYPE> *> &charactermap);

    void buildHuffmanTree();

    void delNode(Charnode<TYPE> *);

public:
    Huffman();

    Huffman(const char *filename);

    ~Huffman();

    void createHuffmanTable(Charnode<TYPE> *tree, int code, int height);

    void displayCharactermap();

    void displayHuffmanTable();

    Charnode<TYPE> *getRoot();

    map<TYPE, string> getHuffmanTable();

    map<TYPE, int> getFrequencyMap();

    int getCharVecSize();
};

template <class TYPE>
int Huffman<TYPE>::getCharVecSize()
{
    return charactermap.size();
}

template <class TYPE>
void Huffman<TYPE>::processfile(const char *filename, map<TYPE, int> &charmap)
{
    ibstream infile(filename);

    int inbits;
    while (infile.readbits(BITS_PER_WORD, inbits) != false)
    {
        // cout << (TYPE) inbits;
        charmap[(TYPE) inbits]++;
    }
    LOG("\n\n\nEND\n")
}

template <class TYPE>
vector<Charnode<TYPE> *> Huffman<TYPE>::convertToVector(map<TYPE, int> &chamap)
{
    vector<Charnode<TYPE> *> charactermap;

    for (typename map<TYPE, int>::iterator ii = chamap.begin(); ii != chamap.end(); ++ii)
    {
        // cout << (*ii).first << " : " << (*ii).second << endl;
        Charnode<TYPE> *ch = new Charnode<TYPE>((*ii).first, (*ii).second);
        charactermap.push_back(ch);
#if DEBUG
        // ch->show();
        if (ch->GetLeft() == NULL && ch->GetRight() == NULL)
            LOG("Leaf Node initialized properly");
#endif
    }

    return charactermap;
}

template <class TYPE>
bool Huffman<TYPE>::compare(Charnode<TYPE> *i, Charnode<TYPE> *j)
{
    return (*i < *j);
}

template <class TYPE>
void Huffman<TYPE>::MinHeapify(vector<Charnode<TYPE> *> &charactermap, int i, const int n)
{
    int left = 2 * i + 1;
    int right = left + 1;
    int smallest = -1;

    if (left < n && charactermap[left]->Value() < charactermap[i]->Value())
        smallest = left;
    else
        smallest = i;

    if (right < n && charactermap[right]->Value() < charactermap[smallest]->Value())
        smallest = right;

    if (smallest != i)
    {
        Charnode<TYPE> *temp = charactermap[i];
        charactermap[i] = charactermap[smallest];
        charactermap[smallest] = temp;

        MinHeapify(charactermap, smallest, n);
    }
}

template <class TYPE>
void Huffman<TYPE>::BuildMinHeap(vector<Charnode<TYPE> *> &charactermap)
{
    int n = charactermap.size();
    for (int i = n / 2; i >= 0; i--)
        MinHeapify(charactermap, i, n);
}

template <class TYPE>
void Huffman<TYPE>::buildHuffmanTree()
{
    LOG(__func__);

    vector<Charnode<TYPE> *> charactermap = this->charactermap;

    /* HUFFMAN(C) -- refer CLRS (non-unicode characters). */
    int n = charactermap.size();
    LOG("Size of the char map = " << n);
    for (int i = 1; i < n; i++)
    {
        LOG(i << "th iteration")
        BuildMinHeap(charactermap);

        Charnode<TYPE> *left = new Charnode<TYPE>(charactermap[0]);
        LOG(left->GetCount());
        charactermap.erase(charactermap.begin() + 0);
        BuildMinHeap(charactermap);

        Charnode<TYPE> *right = new Charnode<TYPE>(charactermap[0]);
        charactermap.erase(charactermap.begin() + 0);
        LOG(right->GetCount());

        Charnode<TYPE> *z = new Charnode<TYPE>('\0', left->Value() + right->Value());
        z->SetLeft(left);
        z->SetRight(right);

        LOG(z->GetCount())
        LOG(z->GetLeft()->GetCount());
        LOG(z->GetRight()->GetCount());

        charactermap.push_back(z);
    }

    huffmanTreeRoot = charactermap[0];
}

// Initialize
template <class TYPE>
Huffman<TYPE>::Huffman() {}

template <class TYPE>
Huffman<TYPE>::Huffman(const char *filename)
{
    map<TYPE, int> charmap;
    processfile(filename, charmap);
    charactermap = convertToVector(charmap);
    freqtab = charmap;

    buildHuffmanTree();
    createHuffmanTable(huffmanTreeRoot, 0, 0);
}

template <class TYPE>
void Huffman<TYPE>::delNode(Charnode<TYPE> *node)
{
    if (node == NULL)
        return;
    delNode(node->GetLeft());
    delNode(node->GetRight());

    delete node;
}

template <class TYPE>
Huffman<TYPE>::~Huffman()
{
    delNode(huffmanTreeRoot);
    huffmanTreeRoot = NULL;
}

template <class TYPE>
void Huffman<TYPE>::createHuffmanTable(Charnode<TYPE> *tree, int code, int height)
{
    LOG(__func__);
    // This condition never occurs!
    if (tree == NULL)
        return;

    if (tree->GetLeft() == NULL && tree->GetRight() == NULL) // Leaf Node
    {
        // cout << "Character " << tree->GetChar() << "\tCount = " << tree->GetCount();
        // cout << "Code : ";
        string codeString = "";
        for (int j = height - 1; j >= 0; j--)
        {
            if (code & (1 << j))
            {
                // cout << 1;
                codeString += '1';
            }
            else
            {
                // cout << 0;
                codeString += '0';
            }
        }
        // cout << endl;

        table[tree->GetChar()] = codeString;

        return;
    }
    code = code << 1;
    createHuffmanTable(tree->GetLeft(), code, height + 1);
    createHuffmanTable(tree->GetRight(), code | 1, height + 1);
}

template <class TYPE>
void Huffman<TYPE>::displayCharactermap()
{
    LOG(__func__);

    int n = charactermap.size();
    LOG("Size = " << n)
    for (int i = 0; i < n; i++)
        charactermap[i]->show();
    cout << endl;
}

template <class TYPE>
Charnode<TYPE> *Huffman<TYPE>::getRoot()
{
    return huffmanTreeRoot;
}

template <class TYPE>
void Huffman<TYPE>::displayHuffmanTable()
{
    LOG("HUFFMAN TABLE");
    for (typename map<TYPE, string>::iterator ii = table.begin(); ii != table.end(); ++ii)
    {
        cout << endl << (*ii).first << '\t' << (*ii).second;
    }
    cout << endl;
}

template <class TYPE>
map<TYPE, string> Huffman<TYPE>::getHuffmanTable()
{
    return table;
}

template <class TYPE>
map<TYPE, int> Huffman<TYPE>::getFrequencyMap()
{
    return freqtab;
}

#endif

Listing 5.3: The definition of the class CompressionWriting; this class helps in writing the bits to the compressed file.

#ifndef COMP_H
#define COMP_H

#include <i o s t r e a m > #include <v e c t o r > #include <map> #include <s t r i n g > #include <f s t r e a m >

#include g l o b a l s . h #include b i t o p s . h #include Charnode . h

using namespace s t d ;

template<c l a s s TYPE > c l a s s CompressionWriting {


Dept. of CSE, R V C E, Bangalore. Feb 2012 - May 2013

29

Appendix A: Source Code

Data Compression Techniques

map<TYPE, s t r i n g > huffmanTable ; Charnode<TYPE huffmanTreeRoot ; > s t r i n g outputFilename ; s t r i n g inputFilename ;

map<TYPE, int> freqMap ;

private : int c o n v e r t S t r i n g T o B i t P a t t e r n ( s t r i n g s t r ) ; int totalNumOfBits ( void ) ;

public : CompressionWriting ( ) { }

CompressionWriting ( Charnode<TYPE r o o t , map<TYPE, s t r i n g > t a b l e , m >

void writeCompressedDataToFile ( ) ;

void d i s p l a y O u t p u t F i l e ( ) ;

void w r i t e H u f f m a n T r e e B i t P a t t e r n ( Charnode<TYPE t r e e , obstream &o u > };

template<c l a s s TYPE >

CompressionWriting<TYPE> : : CompressionWriting ( Charnode<TYPE r o o t , map<TYPE > { huffmanTreeRoot = r o o t ; huffmanTable = t a b l e ;



    outputFilename = oname;
    inputFilename = iname;

    freqMap = freMap;
}

template<class TYPE>
void CompressionWriting<TYPE>::writeCompressedDataToFile()
{
    LOG("\nWriting Pattern:\n");

    ibstream infile(inputFilename.c_str());
    obstream outfile(outputFilename.c_str());

    outfile.writebits(BITS_PER_INT, freqMap.size());
    writeHuffmanTreeBitPattern(huffmanTreeRoot, outfile);

    // Writing the total number of bits of compressed data

    outfile.writebits(BITS_PER_INT, totalNumOfBits());

// Writing Compressed Data

    int inbits;
    infile.rewind();
    while (infile.readbits(BITS_PER_WORD, inbits))
    {

        // cout << (TYPE) inbits << " = " << huffmanTable[(TYPE) inbits];

        int bitPattern = convertStringToBitPattern(huffmanTable[(TYPE) inbits]);

        // cout << " -> " << bitPattern << endl;

        outfile.writebits(huffmanTable[(TYPE) inbits].length(), bitPattern);
    }

    outfile.flushbits();
    infile.close();
    outfile.close();
}

template <class TYPE>
int CompressionWriting<TYPE>::totalNumOfBits()
{
    int count = 0;
    int n = freqMap.size();

    for (typename map<TYPE, int>::iterator ii = freqMap.begin(); ii != freqMap.end(); ++ii)
    {

        // Length of each character's code * number of times the char appears

        count += huffmanTable[(*ii).first].length() * (*ii).second;
    }

    LOG("Count = " << count << endl);
    return count;
}

template<class TYPE>
int CompressionWriting<TYPE>::convertStringToBitPattern(string str)



{
    int bitPattern = 0;
    int n = str.length();
    for (int i = 0; i < n; i++)
        bitPattern += (1 << (n - i - 1)) * (str[i] - '0');

    return bitPattern;
}

template<class TYPE>
void CompressionWriting<TYPE>::displayOutputFile()
{
    ibstream infile(outputFilename.c_str());
    ofstream outfile("xxx");

    cout << "\nDisplaying Output File:" << endl;
    int inbits;
    while (infile.readbits(1, inbits) != false)
    {
        cout << inbits;
        outfile << inbits;
    }

    outfile.close();
}

template<class TYPE>

void CompressionWriting<TYPE>::writeHuffmanTreeBitPattern(Charnode<TYPE> *node, obstream &outfile)
{
    if (node == NULL)



        return;

    if (node->GetLeft() == NULL && node->GetRight() == NULL)
    {
        outfile.writebits(1, 1);
        outfile.writebits(BITS_PER_WORD, node->GetChar());
    }

    else
    {
        outfile.writebits(1, 0);
        writeHuffmanTreeBitPattern(node->GetLeft(), outfile);
        writeHuffmanTreeBitPattern(node->GetRight(), outfile);
    }
}

#endif
Listing 5.4: The main program of the Huffman compression algorithm.

#include <fstream>
#include <cstdio>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <map>
#include <vector>
#include <cstdlib>

#include "Charnode.h"

#include "HuffmanCode.h"
#include "globals.h"
#include "bitops.h"
#include "CompressionWriting.h"

using namespace std;

int main(int argc, char *argv[])
{
    LOG(__func__);

    if (argc != 3)
    {
        cout << "Usage: " << argv[0] << " <Inputfile> <Outputfile>\n";
        exit(0);
    }

    Huffman<char> huff(argv[1]);
    // huff.displayCharactermap();

    cout << endl << endl;
    // huff.displayHuffmanTable();

    map<char, string> huffmanTable = huff.getHuffmanTable();

    CompressionWriting<char> writingObj(huff.getRoot(), huffmanTable, huff.getFrequencyMap(), argv[1], argv[2]);
    writingObj.writeCompressedDataToFile();
    cout << "Done!" << endl;
    // writingObj.displayOutputFile();

    // test();

    // cin.get();
}
Listing 5.5: The definition of the class Decompressor; this class helps in decompressing the compressed file using the Huffman algorithm.

#ifndef DECOMP_H
#define DECOMP_H

#include <iostream>
#include <vector>
#include <map>
#include <string>

#include "globals.h"
#include "bitops.h"
#include "Charnode.h"

using namespace std;

template <class TYPE>
class Decompressor
{
    Charnode<TYPE> *huffmanTreeRoot;
    string outputFilename;
    string compressedFilename;
    int numChars;

private:
    inline int readCount(ibstream &ibs);


    void constructTree(Charnode<TYPE> *&node, int n, ibstream &ibs);

    void preorder(Charnode<TYPE> *node);

public:
    Decompressor() { }
    Decompressor(string cname, string oname);

    ~Decompressor();

    void decompress();

    void delNode(Charnode<TYPE> *);
};

template <class TYPE>
Decompressor<TYPE>::Decompressor(string cname, string oname)
{
    outputFilename = oname;
    compressedFilename = cname;
}

template <class TYPE>
void Decompressor<TYPE>::delNode(Charnode<TYPE> *node)
{
    if (node == NULL)
        return;

    if (node->GetLeft() != NULL)



        delNode(node->GetLeft());
    if (node->GetRight() != NULL)
        delNode(node->GetRight());

    delete node;
}

template <class TYPE>
Decompressor<TYPE>::~Decompressor()
{
    LOG(__func__);

    // delNode(huffmanTreeRoot);
    huffmanTreeRoot = NULL;
}

template <class TYPE>
int Decompressor<TYPE>::readCount(ibstream &ibs)
{
    int count = 0;
    ibs.readbits(BITS_PER_INT, count);
    return count;
}

template <class TYPE>
void Decompressor<TYPE>::preorder(Charnode<TYPE> *node)
{
    if (node == NULL)
    {

        return;
    }

    cout << endl << node->GetChar();
    preorder(node->GetLeft());
    preorder(node->GetRight());
}

template <class TYPE>
void Decompressor<TYPE>::constructTree(Charnode<TYPE> *&node, int n, ibstream &ibs)
{
    if (n == 0)
        return;

    if (node != NULL && node->GetLeft() != NULL && node->GetRight() != NULL)
        return;

    int bitread;
    ibs.readbits(1, bitread);
    if (bitread == 1)
    {
        ibs.readbits(BITS_PER_WORD, bitread);
        node = new Charnode<TYPE>((char) bitread);
        n--;
    }
    else
    {
        node = new Charnode<TYPE>('\0');

        Charnode<TYPE> *leftnode = node->GetLeft();
        Charnode<TYPE> *rightnode = node->GetRight();
        constructTree(leftnode, n, ibs);
        constructTree(rightnode, n, ibs);

        node->SetLeft(leftnode);
        node->SetRight(rightnode);
    }
}

template <class TYPE>
void Decompressor<TYPE>::decompress()
{
    // Read and build the tree

    /*
       1) Read the first BITS_PER_INT bits, which give the count of leaves in the tree.
       2) (Reading the tree contents) 0 indicates an internal node; when a 1
          is encountered it is a leaf, and the next 8 bits represent that character.
       3) Thus read all the chars into a list and construct the Huffman tree.
       4) Use the tree and decompress the file.
    */
    // Step 1

    vector<Charnode<TYPE> *> allchars;

    ibstream compressedFile(compressedFilename.c_str());
    obstream outputFile(outputFilename.c_str());


    int n = readCount(compressedFile);
    LOG("Huffman Tree Size read = " << n);

    // Step 2

    huffmanTreeRoot = NULL; // new Charnode<TYPE>('\0');
    constructTree(huffmanTreeRoot, n, compressedFile);
    // preorder(huffmanTreeRoot);

    // Step 4
    int i = readCount(compressedFile);
    Charnode<TYPE> *traverser = huffmanTreeRoot;
    while (i)
    {
        int bitread;
        compressedFile.readbits(1, bitread);

        // cout << "Read bit = " << bitread;

        traverser = (bitread) ? traverser->GetRight() : traverser->GetLeft();

        // cout << " -> " << traverser->GetChar() << endl;

        if (traverser->GetLeft() == NULL && traverser->GetRight() == NULL)
        {

            outputFile.writebits(BITS_PER_WORD, traverser->GetChar());
            // cout << "Leaf = " << traverser->GetChar() << endl;

            traverser = huffmanTreeRoot;
        }
        i--;
    }
    outputFile.close();
    compressedFile.close();
}

#endif
Listing 5.6: The main program of the Huffman decompression algorithm.

#include <fstream>
#include <cstdio>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <map>
#include <vector>
#include <cstdlib>

#include "Charnode.h"
// #include "HuffmanCode.h"
#include "globals.h"
#include "bitops.h"
#include "Decompressor.h"

using namespace std;

int main(int argc, char *argv[])
{
    LOG(__func__);

    if (argc != 3)
    {

        cout << "Usage: " << argv[0] << " <Inputfile> <Outputfile>\n";
        exit(0);
    }
    Decompressor<char> compressedfile(argv[1], argv[2]);
    compressedfile.decompress();

    // cin.get();
    // cin.get();
}



Appendix B: Screen Shots

Figure 5.1: The Data Compression Server window.

Figure 5.2: Creation of a new file from the server window.


Figure 5.3: Compressing a file (google) at the server.

Figure 5.4: Compressing a file (samir.txt) at the server.


Figure 5.5: The Data Compression Client window.

Figure 5.6: The Client after receiving a file from the server.


Figure 5.7: The Client after receiving a file from the server.
