КР В2
: . . 6-77-1
..
: ..
2013.
Contents:
Introduction
1.
1.1
1.1.1. Shannon-Fano coding
1.1.2. Huffman coding
1.2 Arithmetic coding
2.
2.1. Prediction by Partial Matching (PPM)
2.2. LZ77 and LZ78
2.3 Burrows-Wheeler transform (BWT)
Conclusion
Appendix
References
.
(lossless)
(lossy). ,
,
.
,
(
).
,
. , ,
,
.
.
,
.
- .
(, . .), .
,
.
,
.
. , -
,
,
- ,
,
.
1.
( )
.
,
.
( ).
:
.
,
.
, ,
.
.
,
.
.
,
, , - , ,
.
, ,
. ,
.
.
.
,
.
. ,
, .
, -
, , ,
.
, .
,
, .
,
.
, -
(
).
,
.
,
.
,
.
1.1.
1.1.1. Shannon-Fano coding
,
(. Robert
Fano). ,
.
: ,
. ,
.
.
(. Shannon-Fano coding)
.
(, ).
,
,
() ,
,
.
(
, 1948 ) , , (
).
:
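1. Order the source alphabet by non-increasing symbol probability. This gives the sequence (ai1,...,aiN), in which for every j = 1, 2, ..., N - 1 the probability of aij is not less than the probability of aij+1.
2. Set the counter t to 1: t := 1.
3. Split the ordered sequence into M groups of consecutive symbols so that the share of the total probability that falls into each group is as close as possible to 1/M. Denote the groups G1, G2, ..., GM:
G1 = (ai1, ai2, ..., ai k1),
G2 = (ai k1+1, ai k1+2, ..., ai k2), ...
4. Form the t-th symbol of the code words: every symbol of the group GS receives the code symbol bS, s = 1, 2, ..., M.
5. Increase the counter by 1: t := t + 1.
6. Examine the groups. If a group GS consists of a single symbol aj, the code word for aj is complete. If a group contains 2 or more symbols, steps 3 and 4 are applied to that group to form the t-th symbol of its code words, and then step 5 is repeated. The procedure ends when every group consists of a single symbol.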
:
.
.
( ).
.
, .
.
. ,
;
. ,
. 1
0, .
, , .
n (n + 1)
. ,
.
,
.
, , ,
.
,
, .
:
A (frequency 50)
B (frequency 39)
C (frequency 18)
D (frequency 49)
E (frequency 35)
F (frequency 24)
. 1. -
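The binary case of the splitting procedure can be sketched in C as follows. This is a minimal illustration, not the author's implementation: the names `sf_entry`, `shannon_fano`, `sf_code` and `sf_total_bits` are hypothetical, the alphabet is limited to 64 symbols, and ties between equally good split points are resolved in favour of the earlier one (an assumption, since the text does not say how ties are broken).

```c
#include <assert.h>
#include <string.h>

#define SF_MAX_SYMBOLS 64

/* One alphabet symbol, its frequency and the code built for it. */
typedef struct {
    char symbol;
    unsigned freq;
    char code[SF_MAX_SYMBOLS];   /* code word as a string of '0'/'1' */
} sf_entry;

/* Split e[lo..hi] (sorted by descending frequency) into two parts with
 * totals as close as possible; the first part gets '0', the second '1'. */
static void sf_split(sf_entry *e, int lo, int hi, int depth)
{
    if (lo >= hi) return;                 /* one symbol: code is complete */
    unsigned total = 0;
    for (int i = lo; i <= hi; i++) total += e[i].freq;

    unsigned acc = 0, best_diff = total;
    int cut = lo;
    for (int i = lo; i < hi; i++) {
        acc += e[i].freq;
        unsigned rest = total - acc;
        unsigned diff = acc > rest ? acc - rest : rest - acc;
        if (diff < best_diff) { best_diff = diff; cut = i; }
    }
    for (int i = lo; i <= hi; i++) {
        e[i].code[depth] = (i <= cut) ? '0' : '1';
        e[i].code[depth + 1] = '\0';
    }
    sf_split(e, lo, cut, depth + 1);
    sf_split(e, cut + 1, hi, depth + 1);
}

/* Sort by descending frequency (stable insertion sort), then build codes. */
void shannon_fano(sf_entry *e, int n)
{
    assert(n >= 1 && n <= SF_MAX_SYMBOLS);
    for (int i = 1; i < n; i++) {
        sf_entry key = e[i];
        int j = i - 1;
        while (j >= 0 && e[j].freq < key.freq) { e[j + 1] = e[j]; j--; }
        e[j + 1] = key;
    }
    for (int i = 0; i < n; i++) e[i].code[0] = '\0';
    sf_split(e, 0, n - 1, 0);
}

/* Look up the code of a symbol; returns NULL if it is not in the table. */
const char *sf_code(const sf_entry *e, int n, char symbol)
{
    for (int i = 0; i < n; i++)
        if (e[i].symbol == symbol) return e[i].code;
    return NULL;
}

/* Total length of the encoded message in bits: sum of freq * code length. */
unsigned long sf_total_bits(const sf_entry *e, int n)
{
    unsigned long bits = 0;
    for (int i = 0; i < n; i++)
        bits += (unsigned long)e[i].freq * strlen(e[i].code);
    return bits;
}
```

For the frequencies listed above (A 50, B 39, C 18, D 49, E 35, F 24) this sketch gives A and D two-bit codes and the other four symbols three-bit codes.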
,
.
, , ,
.
,
.
1.1.2. Huffman coding
. 1952
.
.
,
m2 .
:
.
- .
:
. . 1952 . :
,
, .
.
(..
), .
.
(-).
.
, ,
.
.
, .
,
.
, , 1,
0.
, , ,
. .
, :
Frequency: 15, 7, 6, 6, 5
,
,
, n0
. .
, ,
,
, (
).
,
.
Code: 0, 100, 101, 110, 111
,
. ,
, .
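With this code, the message of 39 symbols is encoded in 87 bits (2.2308 bits per symbol on average). With a uniform (fixed-length) code the message would take 117 bits (exactly 3 bits per symbol). Note that the entropy of a source that independently emits symbols with these frequencies is ~2.1858 bits per symbol, so the redundancy of the Huffman code built for such a source, understood as the difference between the average number of bits per symbol and the entropy, is less than 0.05 bits per symbol.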
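The figure of 87 bits can be checked with a short C sketch that builds the Huffman tree by repeatedly merging the two lightest nodes and then reads off the code lengths. The function name `huffman_lengths` and the limit `HUF_MAX` are hypothetical, and the quadratic search for the two minima is chosen for brevity rather than speed.

```c
#include <assert.h>

#define HUF_MAX 64

/* Compute Huffman code lengths for n symbol frequencies (1 <= n <= HUF_MAX).
 * Returns the total encoded length in bits: sum of freq[i] * len[i]. */
unsigned long huffman_lengths(const unsigned *freq, int n, int *len)
{
    assert(n >= 1 && n <= HUF_MAX);
    unsigned long weight[2 * HUF_MAX];
    int parent[2 * HUF_MAX];
    int alive[2 * HUF_MAX];
    int nodes = n;

    for (int i = 0; i < n; i++) {
        weight[i] = freq[i];
        parent[i] = -1;
        alive[i] = 1;
    }
    /* Each pass merges the two lightest live nodes into a new parent. */
    for (int remaining = n; remaining > 1; remaining--) {
        int a = -1, b = -1;          /* a: lightest, b: second lightest */
        for (int i = 0; i < nodes; i++) {
            if (!alive[i]) continue;
            if (a < 0 || weight[i] < weight[a]) { b = a; a = i; }
            else if (b < 0 || weight[i] < weight[b]) { b = i; }
        }
        weight[nodes] = weight[a] + weight[b];
        parent[nodes] = -1;
        alive[nodes] = 1;
        parent[a] = parent[b] = nodes;
        alive[a] = alive[b] = 0;
        nodes++;
    }
    /* The code length of a leaf is its depth in the tree. */
    unsigned long total = 0;
    for (int i = 0; i < n; i++) {
        len[i] = 0;
        for (int p = parent[i]; p >= 0; p = parent[p]) len[i]++;
        total += (unsigned long)freq[i] * (unsigned long)len[i];
    }
    return total;
}
```

Called with the frequencies 15, 7, 6, 6, 5, the sketch assigns a 1-bit code to the most frequent symbol and 3-bit codes to the other four, so the total is 15 + 3 * 24 = 87 bits.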
. ,
, . ,
,
, .
,
:
( -),
. -,
,
2. -, ,
1 (, ),
.
. 2.
1.2. Arithmetic coding
. , ,
, .
[0, 1) [a,
b), -
.
.
[0, 1), ,
. ,
.
, [a, b), [c, d). [0, 1) [a, b). [c, d) [a, b) [a + (b - a) c, a + (b - a) d). . , [a, b),
, , [a, b )
[0, 1) [c, d). [c, d)
- .
. 3.
. 3.
.
.
,
. ()
.
,
.
. , ,
,
. ,
, .
, . , aaab.
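The probability of the symbol a is 3/4, and the probability of b is 1/4. Accordingly, a is assigned the interval [0, 3/4) and b the interval [3/4, 1).
The string aa then corresponds to [0 + (3/4 - 0) * 0, 0 + (3/4 - 0) * 3/4) = [0, 9/16), the string aaa to [0 + (9/16 - 0) * 0, 0 + (9/16 - 0) * 3/4) = [0, 27/64), and, finally, aaab to [0 + (27/64 - 0) * 3/4, 0 + (27/64 - 0) * 1) = [81/256, 27/64).
This interval contains the number 96/256 = 3/8, which in binary is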
0.011. , b
( , , ).
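The interval arithmetic of the aaab example can be reproduced exactly with rational numbers. The sketch below is only an illustration: the type `frac` and the functions `narrow` and `interval_for` are hypothetical names, the alphabet is fixed to a -> [0, 3/4) and b -> [3/4, 1), and no overflow protection is attempted, so it is suitable for short strings only.

```c
#include <assert.h>

/* A non-negative rational number num/den, kept in lowest terms. */
typedef struct { unsigned long num, den; } frac;

static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b) { unsigned long t = a % b; a = b; b = t; }
    return a;
}

static frac frac_make(unsigned long num, unsigned long den)
{
    unsigned long g = gcd_ul(num, den);
    frac f = { num / g, den / g };
    return f;
}

static frac frac_add(frac x, frac y)
{
    return frac_make(x.num * y.den + y.num * x.den, x.den * y.den);
}

static frac frac_sub(frac x, frac y)      /* assumes x >= y */
{
    return frac_make(x.num * y.den - y.num * x.den, x.den * y.den);
}

static frac frac_mul(frac x, frac y)
{
    return frac_make(x.num * y.num, x.den * y.den);
}

/* Replace [a, b) = [*lo, *hi) by [a + (b - a) c, a + (b - a) d). */
void narrow(frac *lo, frac *hi, frac c, frac d)
{
    frac width = frac_sub(*hi, *lo);
    frac new_lo = frac_add(*lo, frac_mul(width, c));
    frac new_hi = frac_add(*lo, frac_mul(width, d));
    *lo = new_lo;
    *hi = new_hi;
}

/* Interval of a string over {a, b} with a -> [0, 3/4), b -> [3/4, 1). */
void interval_for(const char *s, frac *lo, frac *hi)
{
    const frac zero = {0, 1}, three_quarters = {3, 4}, one = {1, 1};
    *lo = zero;
    *hi = one;
    for (; *s; s++) {
        assert(*s == 'a' || *s == 'b');
        if (*s == 'a') narrow(lo, hi, zero, three_quarters);
        else           narrow(lo, hi, three_quarters, one);
    }
}
```

For the string aaab the function returns the interval [81/256, 27/64), and 3/8 lies inside it.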
,
,
, .
, ,
81/256. , ,
81/256 0.01010001. ,
. , , ,
,
,
, .
, . . ,
,
.
[a, b ) p (
). ,
m- , m
- .
[0, 1) , m N
( N- ),
() [a, b). , , p,
.
.
[1/4, 1), b - [0, 1/ 4). , 11
[0, 1).
,
;
,
.
. ,
.
.
. [0, 1)
m- ,
.
. (
)
.
, , ,
. ,
.
IBM
.
.
, . , ,
-
.
.
,
.
.
, ,
, 1 ,
IBM.
. ,
.
,
.
, 010101
0101 s
.
:
.
1.
.
2. ,
.
3. ,
. ,
.
4. (3) .
5. .
.
left = 0
right = 1
while !eof
    read(symb)
    // segment[symb] is the subinterval of [0; 1) assigned to the symbol symb
    newRight = left + (right - left) * segment[symb].right
    newLeft = left + (right - left) * segment[symb].left
    left = newLeft
    right = newRight
ans = (left + right) / 2
:
.
1.
, ,
, ,
. , ,
.
2. .
3. . (1-2) , ( ).
do
    for i = 1 to n
        if code >= segment[i].left && code < segment[i].right
            write(segment[i].character)
            code = (code - segment[i].left) / (segment[i].right - segment[i].left)
            break
while (segment[i].character != eof)
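The encoding and decoding loops can be combined into a small runnable C sketch. It keeps the field names of the pseudocode (`left`, `right`, `character`) but makes two simplifying assumptions that are not part of the original: floating-point `double` values stand in for arbitrary-precision fractions, so only short messages are handled reliably, and the decoder is given the message length instead of looking for an end-of-file symbol. The names `segment_t`, `arith_encode` and `arith_decode` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Subinterval of [0, 1) assigned to one symbol. */
typedef struct { char character; double left, right; } segment_t;

static const segment_t *find_symbol(const segment_t *seg, size_t n, char c)
{
    for (size_t i = 0; i < n; i++)
        if (seg[i].character == c) return &seg[i];
    return NULL;
}

/* Encode len symbols of msg as one number inside the final interval. */
double arith_encode(const segment_t *seg, size_t n,
                    const char *msg, size_t len)
{
    double left = 0.0, right = 1.0;
    for (size_t k = 0; k < len; k++) {
        const segment_t *s = find_symbol(seg, n, msg[k]);
        assert(s != NULL);               /* symbol must be in the table */
        double new_right = left + (right - left) * s->right;
        double new_left  = left + (right - left) * s->left;
        left = new_left;
        right = new_right;
    }
    return (left + right) / 2;
}

/* Decode len symbols from code; out must hold len + 1 bytes. */
void arith_decode(const segment_t *seg, size_t n,
                  double code, size_t len, char *out)
{
    for (size_t k = 0; k < len; k++) {
        int found = 0;
        for (size_t i = 0; i < n && !found; i++) {
            if (code >= seg[i].left && code < seg[i].right) {
                out[k] = seg[i].character;
                /* Rescale code back to [0, 1) for the next symbol. */
                code = (code - seg[i].left) / (seg[i].right - seg[i].left);
                found = 1;
            }
        }
        assert(found);                   /* code must lie in [0, 1) */
    }
    out[len] = '\0';
}
```

With the table a -> [0, 0.75), b -> [0.75, 1) the message aaab is encoded as the midpoint of [81/256, 27/64), and decoding that number with length 4 restores aaab.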
, .
, .
, .
:
:
:
::
, ( ),
.
2.
.
, .
.
: ,
.
.
.
.
, ,
, - ,
, .
,
- .
.
2.1. Prediction by Partial Matching (PPM)
,
,
. , ,
. ,
,
.
.
,
,
, - .
, ,
.
, , ,
.
.
,
.
.
,
: PPM,
DMC, CTW ,
.
80- PPM,
. DMC,
PPM, .
PPM
.
. CTW
.
CTW, ,
,
.
-
.
PPM.
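PPM (Prediction by Partial Matching) is an adaptive statistical lossless data compression algorithm based on context modeling and prediction. A PPM model uses the context, that is, the set of symbols that precede the current one in the uncompressed stream, to predict the value of the symbol from the accumulated statistics. The PPM model itself only predicts the symbol; the actual compression is performed by an entropy coder such as Huffman coding or arithmetic coding.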
. -
,
.
- 'esc'. - ,
.
.
, ,
.
S PPM- M,
M.
S , ,
S. ,
S.
, S
. -1 ,
. ,
. ,
.
m
,
(M...m+1),
, S .
,
.
(exclusions).
, ,
. n PPM,
PPM(n).
PPM
, .
. PPM
, ,
. , , PPM-D,
, , ,
. ( , PPM-D
).
PPM
1980- . 1990- ,
PPM .
PPM
.
:
PPM ,
.
PPM[3]:
boa, PPMz (Ian Sutton)
2.2. LZ77 and LZ78
.
.
. ,
,
.
?
. , ,
.
. ,
.
,
,
.
.
,
,
. ,
, - ,
. . .
,
, .
,
,
.
, ,
().
,
.
,
,
. , , :
,
?
.
,
.
. ,
abac, ,
: a, ab, c, bac. ,
: abac = ab + a + c abac = a + bac.
, (greedy parsing),
. ,
(optimal parsing),
. -
.
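The difference between greedy and optimal parsing on the abac example can be shown with a short C sketch. It rests on an assumption that the text does not state: every dictionary phrase is taken to cost the same, so the best parse is the one with the fewest phrases. The function names `greedy_parse` and `optimal_parse` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the dictionary phrase w occurs in s at position pos. */
static int phrase_at(const char *s, size_t n, size_t pos, const char *w)
{
    size_t len = strlen(w);
    return len > 0 && pos + len <= n && memcmp(s + pos, w, len) == 0;
}

/* Greedy parsing: at each position take the longest matching phrase.
 * Returns the number of phrases used, or -1 if no phrase matches. */
int greedy_parse(const char *s, const char *const *dict, int dict_size)
{
    size_t n = strlen(s), pos = 0;
    int phrases = 0;
    while (pos < n) {
        size_t best = 0;
        for (int i = 0; i < dict_size; i++)
            if (phrase_at(s, n, pos, dict[i]) && strlen(dict[i]) > best)
                best = strlen(dict[i]);
        if (best == 0) return -1;
        pos += best;
        phrases++;
    }
    return phrases;
}

/* Optimal parsing by dynamic programming: cost[i] is the minimum number
 * of phrases that cover s[i..n). Returns -1 if no parse exists. */
int optimal_parse(const char *s, const char *const *dict, int dict_size)
{
    size_t n = strlen(s);
    assert(n < 256);
    int cost[256];
    cost[n] = 0;
    for (size_t i = n; i-- > 0; ) {
        cost[i] = -1;
        for (int k = 0; k < dict_size; k++) {
            if (!phrase_at(s, n, i, dict[k])) continue;
            int rest = cost[i + strlen(dict[k])];
            if (rest >= 0 && (cost[i] < 0 || rest + 1 < cost[i]))
                cost[i] = rest + 1;
        }
    }
    return cost[0];
}
```

With the dictionary a, ab, c, bac and the string abac, the greedy parser produces three phrases (ab + a + c), while the dynamic-programming parser finds the two-phrase parse a + bac.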
, , ,
.
.
,
, . ,
.
(lazy matching).
.
, : a, ab, bac.
abac
: aba = ab +
a (
, c).
,
.
, ,
,
.
, , ,
()
.
,
.
.
,
,
. (
).
,
, , ,
.
, LZ77,
.
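LZ77 and LZ78 are lossless compression algorithms published by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are the best-known members of the LZ* family, which also includes LZW, LZSS, LZMA and other algorithms. Both are dictionary methods, in contrast to other redundancy-reduction methods such as RLE and arithmetic coding. LZ77 is a sliding-window algorithm, which is equivalent to an implicit use of the dictionary approach that LZ78 uses explicitly.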
LZ77
.
.
, ,
. ,
LZ77 ,
,
. LZ77
:
(match length)
(offset) (distance)
, :
, .
, -
, ,
.
: 1 7
, . 7
, 1
?
: 7 () 1
. ,
,
-.
LZ77 . ,
. ,
< + >
. .
:
;
;
.
+1.
kabababababz
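A minimal C sketch of LZ77 on the string kabababababz follows. It assumes the classic token format (offset, match length, next literal), an unbounded search window and greedy selection of the longest match; these details, and the names `lz77_token`, `lz77_encode` and `lz77_decode`, are illustrative assumptions rather than the format used in the original text.

```c
#include <assert.h>
#include <stddef.h>

/* One token: copy `length` bytes starting `offset` bytes back,
 * then output the literal byte `next`. */
typedef struct { size_t offset, length; char next; } lz77_token;

/* Greedy encoder with an unbounded window. `out` must have room for
 * n tokens. Returns the number of tokens written. */
size_t lz77_encode(const char *in, size_t n, lz77_token *out)
{
    size_t pos = 0, count = 0;
    while (pos < n) {
        size_t best_len = 0, best_off = 0;
        for (size_t start = 0; start < pos; start++) {
            size_t len = 0;
            /* The match may run past pos (it may overlap the text being
             * encoded); one byte is always kept back for the literal. */
            while (pos + len + 1 < n && in[start + len] == in[pos + len])
                len++;
            if (len > best_len) { best_len = len; best_off = pos - start; }
        }
        out[count].offset = best_off;
        out[count].length = best_len;
        out[count].next = in[pos + best_len];
        count++;
        pos += best_len + 1;
    }
    return count;
}

/* Decoder: copying byte by byte makes overlapping matches work, because
 * each copied byte is already in place when it is needed as a source. */
size_t lz77_decode(const lz77_token *tok, size_t count, char *out)
{
    size_t pos = 0;
    for (size_t t = 0; t < count; t++) {
        assert(tok[t].offset <= pos);
        for (size_t k = 0; k < tok[t].length; k++, pos++)
            out[pos] = out[pos - tok[t].offset];
        out[pos++] = tok[t].next;
    }
    return pos;
}
```

For kabababababz the encoder emits four tokens: three literals (k, a, b) and then a single token with offset 2, length 8 and literal z. The match length exceeds the offset, which is the overlapping case: the decoder reproduces the repeated ab pairs from bytes it has just written.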
LZ78
LZ77, , LZ78
, (LZ78
, ).
,
.
, ,
,
, , .
. ,
.
2.3 Burrows-Wheeler transform (BWT)
,
.
,
.
,
. , ,
,
.
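The Burrows-Wheeler transform (BWT, historically also called block-sorting compression, although it is not a compression method in itself) is an algorithm used in data compression techniques to transform the input data. BWT is used in the bzip2 archiver.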
.
BWT , BWT
.
,
. , BWT RLE
, ,
LZ.
, ( )
, .
(. . move to front, MTF)
.
BWT MTF/RLE ,
bzip2, LZH
.
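The move-to-front step mentioned above can be sketched in C. The sketch assumes the usual definition of MTF: the coder keeps a list of all 256 byte values, replaces each input byte by its current position in the list, and then moves that byte to the front. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Move the element at position idx of the table to the front. */
static void move_to_front(unsigned char *table, int idx)
{
    assert(idx >= 0 && idx < 256);
    unsigned char c = table[idx];
    for (int j = idx; j > 0; j--) table[j] = table[j - 1];
    table[0] = c;
}

/* Replace every byte by its current position in the table. */
void mtf_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    unsigned char table[256];
    for (int i = 0; i < 256; i++) table[i] = (unsigned char)i;
    for (size_t k = 0; k < n; k++) {
        int idx = 0;
        while (table[idx] != in[k]) idx++;
        out[k] = (unsigned char)idx;
        move_to_front(table, idx);
    }
}

/* Inverse transform: positions back to bytes, same table updates. */
void mtf_decode(const unsigned char *in, size_t n, unsigned char *out)
{
    unsigned char table[256];
    for (int i = 0; i < 256; i++) table[i] = (unsigned char)i;
    for (size_t k = 0; k < n; k++) {
        int idx = in[k];
        out[k] = table[idx];
        move_to_front(table, idx);
    }
}
```

A byte that repeats its predecessor is always encoded as 0, and recently seen bytes get small numbers, which is why runs of equal characters produced by BWT turn into data that RLE or an entropy coder handles well.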
,
BWT, . ,
, ,
.
, bucket
sort+qsort
ABABABAB bucket sort 2 A B,
, qsort
.
(radix sort), ,
.
BWT
,
( ) ,
.
LZH (gzip )
, .
BWT ( )
, PPM.
( )
:
.VANYA..VANYA.TANYA.MANYAVANYA
BWT :
ANYA.V VANYA
ANYA.T
ANYA.M
, ANYA,
.
V, T .
MTF ,
T M
:
BWT,
. .
, ,
,
. ,
, ,
, .
, :
SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES
* , ,
:
TEXYDST.E.XIIXIXXSMPPSS.B...S.EEUSFXDIOIIIIT
, . ,
.BANANA. BNN.AA.A
( ):
Input: .BANANA.

All rotations    Sorted rows    Last column
.BANANA.         ANANA..B       B
..BANANA         ANA..BAN       N
A..BANAN         A..BANAN       N
NA..BANA         BANANA..       .
ANA..BAN         NANA..BA       A
NANA..BA         NA..BANA       A
ANANA..B         .BANANA.       .
BANANA..         ..BANANA       A

Output: BNN.AA.A
,
BWT . ,
(EOL) .
Input: BNN.AA.A

Add 1      Sort 1     Add 2      Sort 2
B          A          BA         AN
N          A          NA         AN
N          A          NA         A.
.          B          .B         BA
A          N          AN         NA
A          N          AN         NA
.          .          ..         .B
A          .          A.         ..

Add 3      Sort 3     Add 4      Sort 4
BAN        ANA        BANA       ANAN
NAN        ANA        NANA       ANA.
NA.        A..        NA..       A..B
.BA        BAN        .BAN       BANA
ANA        NAN        ANAN       NANA
ANA        NA.        ANA.       NA..
..B        .BA        ..BA       .BAN
A..        ..B        A..B       ..BA

Add 5      Sort 5     Add 6      Sort 6
BANAN      ANANA      BANANA     ANANA.
NANA.      ANA..      NANA..     ANA..B
NA..B      A..BA      NA..BA     A..BAN
.BANA      BANAN      .BANAN     BANANA
ANANA      NANA.      ANANA.     NANA..
ANA..      NA..B      ANA..B     NA..BA
..BAN      .BANA      ..BANA     .BANAN
A..BA      ..BAN      A..BAN     ..BANA

Add 7      Sort 7     Add 8      Sort 8
BANANA.    ANANA..    BANANA..   ANANA..B
NANA..B    ANA..BA    NANA..BA   ANA..BAN
NA..BAN    A..BANA    NA..BANA   A..BANAN
.BANANA    BANANA.    .BANANA.   BANANA..
ANANA..    NANA..B    ANANA..B   NANA..BA
ANA..BA    NA..BAN    ANA..BAN   NA..BANA
..BANAN    .BANANA    ..BANANA   .BANANA.
A..BANA    ..BANAN    A..BANAN   ..BANANA

Result: .BANANA.
. BWT
,
.
BWT
. ,
, .
, ,
. , ,
.
, .
, 'EOL',
. BWT
.
, BWT .
:
.
.
,
,
.
, ,
.
.
, , ,
,
,
.
,
.
,
.
. , ,
, .
. ,
, ,
.
,
-
. , ,
. , , -
,
.
, -
.
,
.
-
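#include <stdlib.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* NOTE: the typedef, the two globals, rotlexcmp() and the head of
 * bwt_encode() are absent from the listing. They are reconstructed here
 * from the way these names are used in the rest of the code. */
typedef unsigned char byte;

static byte *rotlexcmp_buf = NULL;
static int rottexcmp_bufsize = 0;

/* qsort comparator: lexicographic order of the two rotations of the
 * buffer that start at the given indices. */
static int rotlexcmp(const void *l, const void *r)
{
    int li = *(const int *)l, ri = *(const int *)r;
    int left = rottexcmp_bufsize;        /* characters still to compare */
    if (li == ri)
        return 0;
    while (rotlexcmp_buf[li] == rotlexcmp_buf[ri]) {
        if (++li == rottexcmp_bufsize) li = 0;
        if (++ri == rottexcmp_bufsize) ri = 0;
        if (--left == 0)
            return 0;                    /* the rotations are identical */
    }
    return rotlexcmp_buf[li] > rotlexcmp_buf[ri] ? 1 : -1;
}

void bwt_encode(byte *buf_in, byte *buf_out, int size, int *primary_index)
{
    int indices[size];
    int i;
    for (i = 0; i < size; i++)
        indices[i] = i;
    rotlexcmp_buf = buf_in;
    rottexcmp_bufsize = size;
    qsort(indices, size, sizeof(int), rotlexcmp);
    /* The output is the last column of the sorted rotation matrix. */
    for (i = 0; i < size; i++)
        buf_out[i] = buf_in[(indices[i] + size - 1) % size];
    /* Remember the row whose last character is the first input byte
     * (1 % size keeps this valid for a one-byte input). */
    for (i = 0; i < size; i++) {
        if (indices[i] == 1 % size) {
            *primary_index = i;
            return;
        }
    }
    assert(0);
}

void bwt_decode(byte *buf_in, byte *buf_out, int size, int primary_index)
{
    byte F[size];                        /* first column: sorted input */
    int buckets[256];
    int i, j, k;
    int indices[size];
    for (i = 0; i < 256; i++)
        buckets[i] = 0;
    for (i = 0; i < size; i++)
        buckets[buf_in[i]]++;
    for (i = 0, k = 0; i < 256; i++)
        for (j = 0; j < buckets[i]; j++)
            F[k++] = i;
    assert(k == size);
    for (i = 0, j = 0; i < 256; i++) {
        while (j < size && i > F[j])     /* bounds check before reading F[j] */
            j++;
        buckets[i] = j; /* it will get fake values if there is no i in F,
                           but that won't bring us any problems */
    }
    for (i = 0; i < size; i++)
        indices[buckets[buf_in[i]]++] = i;
    for (i = 0, j = primary_index; i < size; i++) {
        buf_out[i] = buf_in[j];
        j = indices[j];
    }
}

int main(void)
{
    byte buf1[] = "Wikipedia";
    int size = (int)strlen((const char *)buf1);
    byte buf2[size];
    byte buf3[size];
    int primary_index;
    bwt_encode(buf1, buf2, size, &primary_index);
    bwt_decode(buf2, buf3, size, primary_index);
    assert(!memcmp(buf1, buf3, size));
    printf("Result is the same as input, that is: <%.*s>\n", size, buf3);
    return 0;
}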
References:
1).. . , 2001.
2). . , . . . . .: , 1973.
3) A. Moffat, Implementing the PPM data compression scheme, IEEE Transactions on
Communications, Vol. 38 (11), pp. 1917-1921, November 1990.
4) , LZ77 "
" // ., 11.04.2007
5) M. Burrows and D. Wheeler. A block sorting lossless data compression algorithm.
Technical Report 124, Digital Equipment Corporation, 1994.
6) . , . , . , .
: = Introduction to Algorithms. 2- . .:
, 2006. 1296 . ISBN 0-07-013151-1
7). . , . .: , 2004.
368 . 3000 . ISBN 5-94836-027-X
8) . . 9. : //
: = Introduction to The Design and
Analysis of Algorithms. .: , 2006. . 392-398. ISBN 0-201-74395-7
9) http://algolist.manual.ru/compress/standard/huffman.php
10) http://algolist.manual.ru/compress/standard/shannon_fano.php
11) http://compression.ru/download/articles/huff/tiger_shannon-fano.html
12) http://habrahabr.ru/post/130531/
13) , , , .
. , 2011 . 1296 . ISBN
978-5-8459-0857-5, 5-8459-0857-4, 0-07-013151-1
14) http://www.compression.ru/arctest/descript/ppm-faq.htm
15) ., ., ., .. . : , 2003.
16) M. Crochemore, T.Lecroq. Text data compression algorithms. In: Atallah M.J. Ed.,
Algorithms and theory of computation handbook. Ch. 12. CRC Press, 1999.
17) ... . : ,
2001.
18) CCITT group 4. International telecommunication union, 1988.
19) M. Maniscalco, S. Puglisi, Faster lightweight suffix array construction, Proceedings of the
17th Australasian Workshop on Combinatorial Algorithms (AWOCA'06), 2006. pp.16-29.
20). K.M. Likhomanov, A.M. Shur. Two combinatorial criteria for BWT images. Computer
Science Theory and Applications. Proceedings of the 6th Symposium on Computer Science in
Russia. 2011. pp.385-396. [Lecture Notes in Computer Science Vol. 6651].