
CS 775 ResM: Research Methodology
Rajeev Kumar
SC&SS, JNU
cse.iitkgp.ac.in/~rkumar
Intranet: 172.16.6.7/~rkumarcs/resM/

ResM: Contents
Formulation of Research Problem; Descriptive Statistics; Probability & Probability Distribution; Random Variables; Sampling Distribution; Hypothesis Testing; Error Analysis & Accuracy; Regression Analysis; Multivariate Analysis.

Today: Lectures 9-10
Formulation of Research Problems
Understanding of Algorithm Design, Implementation & DS
Info. Theory & Coding: Non-Statistical
(Includes 3rd-party resources)
Simplicity works the best.

Normal Distribution
[Figure: normal curve with 95% and 99% confidence bands]

Video Codecs: Huffman Encoding
An 8x8 block of quantized DCT coefficients:

12 34  0 54  0  0  0  0
87  0  0 12  0  0  0  0
16  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

After zig-zag scanning, the sequence of DCT coefficients to be transmitted looks like:
12 34 87 16 0 0 54 0 0 0 0 0 0 12 0 0 3 0 0 0 .....
The DC coefficient (12) is sent via a separate Huffman table.
After run-level parsing, the remaining coefficients and associated runs of zeros are:
34 | 87 | 16 | 0 0 54 | 0 0 0 0 0 0 12 | 0 0 3 | 0 0 0 .....

Huffman / Entropy Codes

H = - Σ_{i=1}^{n} p_i log(p_i)

- Variable-length codes
- Optimal codes
- Uniquely decodable
- Prefix-free codes (binary-tree property)
- Non-deterministic
- Used for those sources which are transformed to yield a pre-defined probability density function

Statistics is wonderful if used appropriately; in the absence of such statistical properties in the source: Universal Codecs.
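The entropy bound above is easy to check numerically. A minimal sketch (not from the slides) that computes H for a given symbol distribution, using log base 2 so that H is in bits per symbol:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p_i * log2(p_i)), skipping zero-probability symbols."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform 4-symbol source needs 2 bits/symbol; a skewed source needs fewer.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(entropy([0.5, 0.25, 0.25]))         # 1.5
```

Huffman coding approaches this bound only when the symbol probabilities are known and well-behaved, which is exactly the premise questioned in the rest of the lecture.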

Formulation of Research Problem:
In the context of:
- Descriptive Statistics
- Probability & Probability Distribution
- Info. Theory & Coding
Transform the source s.t. statistical properties could be used.
Next, in the absence of such statistical properties in the source: Universal Codecs.

Formulation of Research Problem:
Given: A data source
Process: Repeated patterns: a dictionary approach
Output: Codes (lossless)

Design an algorithm that encodes a string instead of a character.
Again a greedy algo. (Be greedy, don't be too greedy.)
Formulation of Research Problem: Dictionary?

Given: A data source
Process: Repeated patterns: a dictionary approach
Output: Codes (lossless)

Sample sources:
AAAAAAAAAAAAAAAA
ABABABABABABABABABAAB
ABCABCABCABCABCABCABCABCABC

Let us make a dictionary ???
Devise a universal algo.
Again, an MIT student is one of the contributors.

Issues in Dictionary? Issues & Challenges ???

Sample sources:
AAAAAAAAAAAAAAAA
ABABABABABABABABABAAB
ABCABCABCABCABCABCABCABCABC

Can I have a universal dictionary?
- Same for encoder & decoder
- Built on-the-fly
- Longest match
- Symmetric codec
- Least carry-forward to decoder
- Uniquely decodable

Do the code properties still hold?
- Variable-length codes?
- Non-deterministic?
- Optimal codes?
- Uniquely decoded?
- Prefix-free codes?
- ...
Non-Statistical Codecs
Dictionary-Based Codec
Lempel-Ziv-Welch (1977, 78, 84)
(Ziv: An MIT graduate)

Dictionary-Based Compression
- The compression algorithms we studied so far use a statistical model to encode single symbols. Compression: encode symbols into bit strings that use fewer bits.
- Dictionary-based algorithms do not encode single symbols as variable-length bit strings; they encode variable-length strings of symbols as single tokens.
- The tokens form an index into a phrase dictionary.
- If the tokens are smaller than the phrases they replace, compression occurs.
- Dictionary-based compression is easier to understand because it uses a strategy that programmers are familiar with: using indexes into databases to retrieve information from large amounts of storage (telephone numbers, postal codes).
Introduction to LZW
- As mentioned earlier, static coding schemes require some knowledge about the data before encoding takes place.
- Universal coding schemes, like LZW, do not require advance knowledge and can build such knowledge on-the-fly.
- LZW is the foremost technique for general-purpose data compression due to its simplicity and versatility.
- It is the basis of many PC utilities that claim to double the capacity of your hard drive.
- LZW compression uses a code table, with 4096 as a common choice for the number of table entries.

Introduction to LZW . . .
- Codes 0-255 in the code table are always assigned to represent single bytes from the input file.
- When encoding begins, the code table contains only the first 256 entries, with the remainder of the table being blanks.
- Compression is achieved by using codes 256 through 4095 to represent sequences of bytes.
- As the encoding continues, LZW identifies repeated sequences in the data and adds them to the code table.
- Decoding is achieved by taking each code from the compressed file and translating it through the code table to find what character or characters it represents.
LZW Encoding Algorithm

1  Initialize table with single-character strings
2  P = first input character
3  WHILE not end of input stream
4      C = next input character
5      IF P + C is in the string table
6          P = P + C
7      ELSE
8          output the code for P
9          add P + C to the string table
10         P = C
11 END WHILE
12 output code for P

Example 1: Compression using LZW
Use the LZW algorithm to compress the string BABAABAAA.
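The twelve pseudocode steps above map directly onto a few lines of Python. A minimal sketch for single-character (ASCII) input, with the step numbers from the slide as comments:

```python
def lzw_compress(data):
    """LZW encoder following the slide's pseudocode; returns a list of integer codes."""
    table = {chr(i): i for i in range(256)}   # 1: single-character strings
    next_code = 256
    p = data[0]                               # 2: P = first input character
    out = []
    for c in data[1:]:                        # 3-4: C = next input character
        if p + c in table:                    # 5: P + C already known
            p = p + c                         # 6: extend the current match
        else:
            out.append(table[p])              # 8: output the code for P
            table[p + c] = next_code          # 9: add P + C to the table
            next_code += 1
            p = c                             # 10: restart match from C
    out.append(table[p])                      # 12: output code for P
    return out

print(lzw_compress("BABAABAAA"))  # [66, 65, 256, 257, 65, 260]
```

The printed codes match the encoder output traced step by step below.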

Example 1: LZW Compression, Steps 1-6

Input: BABAABAAA (initial table: single characters, so B = 66, A = 65)

Step | Encoder output (code: string) | New table entry | P after step
  1  | 66: B                         | 256: BA         | A
  2  | 65: A                         | 257: AB         | B
  3  | 256: BA                       | 258: BAA        | A
  4  | 257: AB                       | 259: ABA        | A
  5  | 65: A                         | 260: AA         | A (C = A)
  6  | 260: AA                       | -               | AA (end of input)

Full encoder output: <66><65><256><257><65><260>.
LZW Decompression
- The LZW decompressor creates the same string table during decompression.
- It starts with the first 256 table entries initialized to single characters.
- The string table is updated for each character in the input stream, except the first one.
- Decoding is achieved by reading codes and translating them through the code table being built.

LZW Decompression Algorithm

1  Initialize table with single-character strings
2  OLD = first input code
3  output translation of OLD
4  WHILE not end of input stream
5      NEW = next input code
6      IF NEW is not in the string table    // kwk syndrome
7          S = translation of OLD
8          S = S + C
9      ELSE
10         S = translation of NEW
11     output S
12     C = first character of S
13     add OLD + C to the string table
14     OLD = NEW
15 END WHILE
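A matching decoder sketch, including the special case on line 6 where NEW is a code the decoder has not built yet (the "kwk syndrome"):

```python
def lzw_decompress(codes):
    """LZW decoder following the slide's pseudocode; rebuilds the encoder's table."""
    table = {i: chr(i) for i in range(256)}   # 1: single-character strings
    next_code = 256
    old = codes[0]                            # 2: OLD = first input code
    out = [table[old]]                        # 3: output translation of OLD
    c = table[old][0]
    for new in codes[1:]:                     # 4-5: NEW = next input code
        if new not in table:                  # 6: "kwk syndrome" case
            s = table[old] + c                # 7-8: S = translation of OLD + C
        else:
            s = table[new]                    # 10: S = translation of NEW
        out.append(s)                         # 11: output S
        c = s[0]                              # 12: C = first character of S
        table[next_code] = table[old] + c     # 13: add OLD + C to the table
        next_code += 1
        old = new                             # 14: OLD = NEW
    return "".join(out)

print(lzw_decompress([66, 65, 256, 257, 65, 260]))  # BABAABAAA
```

Note that the decoder never receives the table: it reconstructs each entry one step behind the encoder, which is why the not-yet-in-table case on line 6 can occur at all.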

Example 2: LZW Decompression

Use LZW to decompress the output sequence of Example 1: <66><65><256><257><65><260>.

First code: OLD = 66; output its translation, B.

Step | NEW | S (output) | C | New table entry
  1  | 65  | A          | A | 256: BA
  2  | 256 | BA         | B | 257: AB
  3  | 257 | AB         | A | 258: BAA
  4  | 65  | A          | A | 259: ABA
  5  | 260 | AA         | A | 260: AA   (260 not yet in the table: the "kwk syndrome" case)

Decoded output: BABAABAAA, and the decoder has rebuilt exactly the encoder's string table.
LZW: Some Notes
- This algorithm compresses repetitive sequences of data well.
- Since the codewords are 12 bits, any single encoded character will expand the data size rather than reduce it.
- In this example, 72 bits of input are represented with 72 bits of output (9 characters x 8 bits in, 6 codes x 12 bits out). After a reasonable string table is built, compression improves dramatically.
- Advantages of LZW over Huffman:
  - LZW requires no prior information about the input data stream.
  - LZW can compress the input stream in one single pass.
  - Another advantage of LZW is its simplicity, allowing fast execution.

LZW: Limitations
What happens when the dictionary gets too large (i.e., when all 4096 locations have been used)? Here are some options usually implemented:
- Simply forget about adding any more entries and use the table as is.
- Throw the dictionary away when it reaches a certain size.
- Throw the dictionary away when it is no longer effective at compression.
- Clear entries 256-4095 and start building the dictionary again.
- Some clever schemes rebuild a string table from the last N input characters.
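The "clear and start again" option can be sketched on top of the encoder from Example 1. The `max_size` parameter and the reset policy below are illustrative choices, not part of any standard:

```python
def lzw_compress_with_reset(data, max_size=4096):
    """LZW encoder that throws the dictionary away once it fills up."""
    def fresh_table():
        return {chr(i): i for i in range(256)}

    table = fresh_table()
    p = data[0]
    out = []
    for c in data[1:]:
        if p + c in table:
            p = p + c
        else:
            out.append(table[p])
            if len(table) < max_size:
                table[p + c] = len(table)   # next free code
            else:
                table = fresh_table()        # dictionary full: start building again
            p = c
    out.append(table[p])
    return out

# With a roomy table this matches plain LZW on the running example:
print(lzw_compress_with_reset("BABAABAAA"))  # [66, 65, 256, 257, 65, 260]
```

For decoding to stay in sync, the decoder must apply the same reset rule at the same point; since it tracks the table size itself, no extra signalling is needed for this particular policy.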

Conclusions: Info Theory

- Variable-length codes
- Optimal codes
- Uniquely decodable
- Prefix-free codes (binary-tree property)
- Non-deterministic
- Used for those sources which are transformed to yield a pre-defined probability density function
- Statistics is wonderful if used appropriately
- If no such property exists, look for alternate ways, as in this case the LZW codec