Still image
24 bits/pixel, 512 x 512 pixel/image results in 512 x 512 x 24 = 6,291,456 bits ≈ 6.3 Mbit/image
Audio
CD quality, sampling rate 44.1 kHz, 16 bits per sample results in 44.1 x 16 ≈ 706 kbit/s per channel; stereo: ≈ 1.411 Mbit/s
Video
Full-size frame 1024 x 768 pixel/frame, 24 bits/pixel, 30 frames/s results in 1024 x 768 x 24 x 30 = 566 Mbit/s.
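As a quick sanity check, the raw-rate arithmetic above can be reproduced in a few lines of Python (decimal prefixes assumed, i.e. 1 Mbit = 10^6 bits):

```python
# Raw (uncompressed) data rates for the media examples above.
# Assumes decimal prefixes: 1 kbit = 1000 bits, 1 Mbit = 10**6 bits.

image_bits = 512 * 512 * 24           # still image: bits per image
channel_bps = 44_100 * 16             # CD audio: bits/s per channel
stereo_bps = 2 * channel_bps          # stereo audio
video_bps = 1024 * 768 * 24 * 30      # full-size video: bits/s

print(f"Image: {image_bits / 1e6:.1f} Mbit/image")      # ~6.3 Mbit
print(f"Audio: {channel_bps / 1e3:.1f} kbit/s/channel, "
      f"{stereo_bps / 1e6:.3f} Mbit/s stereo")
print(f"Video: {video_bps / 1e6:.0f} Mbit/s")           # ~566 Mbit/s
```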
Compression
Data compression is the representation of an information source (e.g. a data file, a speech signal, an image, or a video signal) as accurately as possible using the fewest number of bits. Compressed data can only be understood if the decoding method is known by the receiver.
Why data compression?
Data storage and transmission cost money, and the cost grows with the amount of data. It can be reduced by processing the data so that it occupies less memory and takes less transmission time.
Disadvantage of data compression: compressed data must be decompressed to be viewed, so extra processing is required. Compression is possible because information usually contains redundancies, i.e. information that is often repeated; examples include recurring letters, numbers or pixels. Compression programs remove this redundancy.
Lossy Compression
There is a difference between the original object and the reconstructed object. Physiological and psychological properties of the ear and eye are taken into account. The aim is to obtain the best possible fidelity for a given bit-rate, or to minimize the bit-rate needed to achieve a given fidelity measure. Video and audio compression techniques are best suited to this form of compression. Lossy techniques usually achieve higher compression rates than lossless ones, but the latter are more accurate.
Lossless techniques are classified into static, adaptive (or dynamic), and hybrid.
In a static method the mapping from the set of messages to the set of code-words is fixed before transmission begins, so that a given message is represented by the same codeword every time it appears in the message being encoded.
Static coding requires two passes: one pass to compute probabilities (or frequencies) and determine the mapping, and a second pass to encode.
In an adaptive method the mapping from the set of messages to the set of code-words changes over time.
The mapping, and hence the mean codeword length, may vary from one transfer to another. Note that the transmitter and the receiver must maintain the same set of codewords at all times. All of the adaptive methods are one-pass methods; only one scan of the message is required.
An algorithm may also be a hybrid, neither completely static nor completely dynamic.
These methods are fairly straightforward to understand and implement. Their simplicity is their downfall in terms of attaining the best compression ratios. However, the methods have their applications, as mentioned below:
Simple Repetition Suppression
Run-length Encoding
Run-length Encoding
Principle
Replace all repetitions of the same symbol in the text (runs) by a repetition counter and the symbol.
This encoding method is frequently applied to images (or to pixels in a scan line); it is a small compression component used in JPEG compression. In this instance, sequences of image elements are mapped to pairs (ci, li), where ci represents an image intensity or colour and li the length of the ith run of pixels (not dissimilar to the zero-length suppression above).
Example 2:
Original sequence: 111122233333311112222
It can be encoded as: (1,4),(2,3),(3,6),(1,4),(2,4)
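A minimal sketch of this run-length scheme in Python (function names are illustrative):

```python
def rle_encode(s):
    """Collapse each run of a repeated symbol into a (symbol, count) pair."""
    pairs = []
    for ch in s:
        if pairs and pairs[-1][0] == ch:
            pairs[-1][1] += 1            # extend the current run
        else:
            pairs.append([ch, 1])        # start a new run
    return [tuple(p) for p in pairs]

def rle_decode(pairs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(ch * n for ch, n in pairs)

print(rle_encode("111122233333311112222"))
# -> [('1', 4), ('2', 3), ('3', 6), ('1', 4), ('2', 4)]
```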
This is a simple form of statistical encoding. Here we substitute a frequently repeating pattern with a code. The code is shorter than the pattern, giving us compression. A simple pattern substitution scheme could employ a predefined code (for example, replace all occurrences of 'The' with the code '&').
A predefined symbol table may be used i.e. assign code i to token i. However, it is more usual to dynamically assign codes to tokens. The entropy encoding schemes basically attempt to decide the optimum assignment of codes to achieve the best compression.
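A toy sketch of static pattern substitution with a predefined code table (the table below is hypothetical, not a standard codebook):

```python
# Hypothetical predefined code table; real schemes usually assign
# codes dynamically based on token frequencies.
CODEBOOK = {"The": "&"}

def substitute(text, codebook):
    """Replace every occurrence of each pattern with its shorter code."""
    for pattern, code in codebook.items():
        text = text.replace(pattern, code)
    return text

print(substitute("The cat sat on The mat", CODEBOOK))  # -> & cat sat on & mat
```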
Lossless compression frequently involves some form of entropy encoding and is based on information-theoretic techniques. Shannon is the father of information theory, and we briefly summarize it before looking at specific entropy encoding methods.
The entropy of an information source S with symbols S1, ..., Sn is

H(S) = - SUM_i pi log2(pi) = SUM_i pi log2(1/pi)

where pi is the probability that symbol Si in S will occur. The term log2(1/pi) indicates the amount of information contained in Si, i.e., the number of bits needed to code Si.
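The entropy formula can be computed directly; a small sketch:

```python
import math

def entropy(probs):
    """H(S) = -sum(p_i * log2(p_i)): average number of bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform source over 4 symbols needs exactly 2 bits per symbol:
print(entropy([0.25, 0.25, 0.25, 0.25]))  # -> 2.0
```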
This is a basic information-theoretic algorithm. A simple example will be used to illustrate the algorithm:

Symbol   A    B    C    D    E
--------------------------------
Count    15   7    6    6    5
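Assuming the algorithm illustrated here is Shannon-Fano coding (recursively splitting the frequency-sorted symbol list into two halves of nearly equal total count), a sketch:

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, count) pairs sorted by descending count."""
    if len(symbols) <= 1:
        return {symbols[0][0]: ""} if symbols else {}
    total = sum(c for _, c in symbols)
    # Choose the split that makes the two halves' totals as equal as possible.
    best_diff, split, running = None, 1, 0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(total - 2 * running)   # |bottom total - top total|
        if best_diff is None or diff < best_diff:
            best_diff, split = diff, i
    codes = {}
    for sym, code in shannon_fano(symbols[:split]).items():
        codes[sym] = "0" + code           # top half gets prefix 0
    for sym, code in shannon_fano(symbols[split:]).items():
        codes[sym] = "1" + code           # bottom half gets prefix 1
    return codes

codes = shannon_fano([("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)])
print(codes)  # -> {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```

With these codes the 39 symbols of the example need 89 bits in total, versus 117 bits with a fixed 3-bit code.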
Huffman Coding
Lossless Compression Algorithms
Huffman coding is based on the frequency of occurrence of a data item (e.g. a pixel in images). The principle is to use fewer bits to encode the data that occurs more frequently. Codes are stored in a code book, which may be constructed for each image or for a set of images. In all cases the code book plus the encoded data must be transmitted to enable decoding.
Find the frequency of each character in the file to be compressed.
For each distinct character, create a one-node binary tree containing the character, with its frequency as its priority.
While there is more than one tree in the priority queue:
    De-queue two trees t1 and t2.
    Create a tree t with t1 as its left sub-tree and t2 as its right sub-tree. (1*)
    Priority(t) = priority(t1) + priority(t2).
    Insert t in its proper location in the priority queue. (2*)
Assign 0 and 1 weights to the edges of the resulting tree, such that the left and right edges of each node do not have the same weight. (3*)
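The steps above can be sketched in Python, using heapq as the priority queue (the counter is a tie-breaker so the heap never has to compare trees directly):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build Huffman codewords from a {symbol: frequency} mapping."""
    tie = count()                      # tie-breaker for equal frequencies
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:               # while more than one tree in the queue
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        # New tree: t1 as left sub-tree, t2 as right; priority is the sum.
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):    # internal node: (left, right)
            walk(node[0], code + "0")  # left edge weighted 0
            walk(node[1], code + "1")  # right edge weighted 1
        else:                          # leaf: a symbol
            codes[node] = code or "0"  # "or" handles a 1-symbol alphabet
    walk(heap[0][2], "")
    return codes
```

On the example frequencies that follow (a:45, e:65, l:13, n:45, o:18, s:22, t:53), the resulting codeword lengths sum to 21 bits and the encoded message to 696 bits, matching the figures used in the compression-ratio calculation.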
EXAMPLE
Character   a    e    l    n    o    s    t
--------------------------------------------
Frequency   45   65   13   45   18   22   53
If the message is sent uncompressed with an 8-bit ASCII representation for the characters, we have 261 x 8 = 2088 bits. Assume that the number of character-codeword pairs and the pairs themselves are included at the beginning of the binary file containing the compressed message, in the following format:

Number of bits for the transmitted file = bits(7) + bits(characters) + bits(codewords) + bits(compressed message)
= 3 + (7 x 8) + 21 + 696 = 776

Compression ratio = bits for ASCII representation / number of bits transmitted = 2088 / 776 = 2.69

Thus, the size of the transmitted file is 100 / 2.69 ≈ 37% of the original ASCII file (i.e., a saving of about 63% has been achieved).
Construction of tree
Sender and receiver begin with an initial tree consisting of a root node and a left child holding a null character with weight 0. The first character is sent uncompressed and is added to the tree as the right branch from the root; the new node is labeled with the character, its weight is 1, and the tree branch is labeled 1 also. A list shows the tree entries in order. Whenever a new character appears in the message, it is sent as follows: send the uncompressed representation of the new character, then place the new character into the tree and update the list representation. Example: "Banana".
[Figure: adaptive Huffman tree after encoding "Banana", with the null node *(0) and leaves A(3), N(2), B(1) on 0/1-labeled branches.]
Adaptive Huffman coding creates a separate codeword for each character. If the transmitted letter already exists in the current tree, only its code is transmitted; otherwise the uncompressed letter is transmitted and then added to the tree. The tree must be updated at every step.
LZW (Lempel-Ziv-Welch)
The compressor algorithm builds a string translation table from the text being compressed
The string translation table maps fixed-length codes to strings. The table is initialized with all single-character strings (256 entries in the case of 8-bit characters). As the compressor serially examines the characters of the text, it stores every unique two-character string in the table as a code/character concatenation, with the code mapping to the corresponding first character. Whenever a previously-encountered string is read from the input, the longest such previously-encountered string is determined, and then the code for this string concatenated with the extension character (the next character in the input) is stored in the table.
Decoding LZW data is the reverse of encoding. The decompressor reads a code from the encoded data stream and adds the code to the data dictionary if it is not already there. The code is then translated into the string it represents and is written to the uncompressed output stream.
The encoding algorithm can be summarized as follows:

w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}
output the code for w;    /* flush the final match at EOF */
Example: compressing the string "^WED^WE^WEE^WEB^WET":

w      k      output   index   symbol
----------------------------------------
NIL    ^
^      W      ^        256     ^W
W      E      W        257     WE
E      D      E        258     ED
D      ^      D        259     D^
^      W
^W     E      256      260     ^WE
E      ^      E        261     E^
^      W
^W     E
^WE    E      260      262     ^WEE
E      ^
E^     W      261      263     E^W
W      E
WE     B      257      264     WEB
B      ^      B        265     B^
^      W
^W     E
^WE    T      260      266     ^WET
T      EOF    T
A 19-symbol input has been reduced to an output of 7 symbols plus 5 codes. Usually, compression doesn't start until a large number of bytes (e.g., >100) have been read in.
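The encoding pseudocode above can be turned into runnable Python; this sketch reproduces the "^WED..." trace (codes below 256 are literal characters):

```python
def lzw_encode(text):
    """LZW: grow w while w+k is in the dictionary, else emit code(w) and add w+k."""
    dictionary = {chr(i): i for i in range(256)}   # all single-character strings
    next_code = 256
    w, output = "", []
    for k in text:
        wk = w + k
        if wk in dictionary:
            w = wk                                 # keep extending the match
        else:
            output.append(dictionary[w])           # emit code for longest match
            dictionary[wk] = next_code             # add the new string
            next_code += 1
            w = k
    if w:
        output.append(dictionary[w])               # flush the final match
    return output

print(lzw_encode("^WED^WE^WEE^WEB^WET"))
# -> [94, 87, 69, 68, 256, 69, 260, 261, 257, 66, 260, 84]
```

The output corresponds to ^ W E D <256> E <260> <261> <257> B <260> T, matching the trace above.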
The LZW decompression algorithm is as follows:

read a character k;
output k;
w = k;
while ( read a character k )    /* k could be a character or a code */
{
    entry = dictionary entry for k;
    output entry;
    add w + entry[0] to dictionary;
    w = entry;
}

Example (continued): input string is "^WED<256>E<260><261><257>B<260>T"

w       k       output   index   symbol
-------------------------------------------
        ^       ^
^       W       W        256     ^W
W       E       E        257     WE
E       D       D        258     ED
D       <256>   ^W       259     D^
<256>   E       E        260     ^WE
E       <260>   ^WE      261     E^
<260>   <261>   E^       262     ^WEE
<261>   <257>   WE       263     E^W
<257>   B       B        264     WEB
B       <260>   ^WE      265     B^
<260>   T       T        266     ^WET
Advantages
LZW compression provides a better compression ratio, in most applications, than any well-known method available up to that time. It usually runs very fast, as the bit parsing is easy and the table lookup is automatic.
Disadvantages
Substantial memory requirements