
An Enhanced Scheme for Lossless Compression of Short Text for Resource Constrained Devices


Md. Rafiqul Islam and S. A. Ahsan Rajon


Department of Computer Science, American International University-Bangladesh (AIUB), Dhaka, Bangladesh.
Computer Science and Engineering Discipline, Khulna University, Khulna-9208, Bangladesh.
dmri1978@yahoo.com, ahsan.rajon@gmail.com


Abstract
Text compression is an elementary concern of data engineering and management. The rapid adoption of battery-powered, small-memory smart devices, especially mobile phones and wireless sensors for communication and monitoring, has turned short text compression into a more important and prevailing research arena than large-scale text compression. In this paper, we present an effective approach to short English text compression for smart devices. The paper also reviews recent research on text data compression. The main target of this research is to provide a light-weight compression scheme that is computationally simple and requires little storage for compressing the source text. The main contribution of this paper is the integration of statistical text ranking, or statistical component categorization, with static coding to obtain the compression. We also present a mathematical model of the proposed scheme in terms of compression parameters. The obtained results indicate better performance in terms of resource consumption, including better compression ratio, lower compression and decompression time, reduced memory requirements and lower complexity. The paper also incorporates an extensive analysis of power consumption for the presented scheme. Overall, the computational simplicity of the scheme makes the compression effective and efficient.
Keywords: Data Compression, Text Ranking, Type-to-Token Ratio (TTR), Smart Device, Data Management.
I. INTRODUCTION
Data compression is an age-old research issue in the arena of data engineering and management. With the advancement of computer memory and storage, the necessity of compressing huge volumes of data to save storage has been reduced. On the contrary, with the rapid growth of embedded devices, compressing small text has gained momentum as a way to cope with the low memory of such devices. Compressing small data with limited computational overhead, low processing power and little operating memory is particularly complex and challenging, because a great portion of existing data compression algorithms makes use of a knowledge-base whose creation and manipulation is computationally expensive.
Smart devices have gained tremendous popularity due to their low cost, availability and portability. With the aspects of cost, facility and reliability, a new trend of introducing small-sized devices with substantial computing and communication power has established its place in recent research. It is now a great concern to embed as many necessary applications as possible within these smart devices, yet it is an extreme problem to re-design applications into low-complexity, low-memory versions suitable for them. Consequently, most applications that generally require large memory and high processing speed are still unavailable on smart devices. The same holds for communication. Wireless communication is an extremely bandwidth-limited communication system, and hence it is of utmost importance to keep the traffic on wireless links to a minimum. To reduce the data traffic of short message communication through cellular phones, compressing the short messages is an attractive policy. For data transmission between two mobile devices through Bluetooth or infrared, it is likewise effective to have compacted data. That is why our aim is to make short texts shorter.

II. SHORT TEXT COMPRESSION: BASIC CONCEPTS
Compression of short text for energy constrained devices is fundamentally different from compression of huge text. Firstly, for energy constrained devices it is essential to minimize the number of operations required to compress the text, in order to preserve the normal lifetime of the device. Secondly, small text compression for energy constrained devices is normally viewed as a multi-objective optimization problem over compression ratio, compression-decompression time (i.e. performance) and energy consumption (i.e. complexity), whereas the main target of a generic compression scheme is achieving a better compression ratio at reduced compression-decompression time. Thirdly, generic compression schemes often build dictionaries on the fly, which most research discourages for short text compression, since constructing the dictionary on the fly imposes additional operational cost on the device and should be avoided. Fourthly, the memory overhead for operation as well as for the dictionary should be reduced as much as possible when compressing text on energy constrained devices, whereas for generic text compression schemes this constraint is a secondary aspect. Fifthly, recent generic text compression schemes consider (and have successfully implemented) online dictionaries, with the provision of sending the dictionary along with the compressed data to enhance the compression. Often the dictionary itself is compressed, and for large amounts of text this is beneficial; but for very small text it will undoubtedly result in an expanded file. Finally, the characteristics and sentence patterns, i.e. the morphological structure, of small text are not the same as those of huge text. As a result, compression schemes tuned for huge text often fail to compress small text successfully.
III. RELATED RESEARCH: A SURVEY
In [17], Batista and Alexandre considered the generic
properties of textual data and hence propose applying
number of transformations that make the redundancy
more visible to the compressor to improve
compression performance. They propose the creation of
online dictionaries presenting different alternatives for
word orderings - by frequency, prefix, suffix and
frequency H size-of-word - and a new implementation of
the Capital Conversion technique. In their
implementation of Capital Conversion, they use two
flags to deal with words having capital letters. Flags that
encode words with first capital letter and totally
uppercase words are used in the same way. When the
word starts with a few uppercase letters and ends with
lowercase letters, the uppercase flag is used for the first
part and a flag that encodes a word with first capital
letter is inserted before the last capital letter [17]. They
also argue that, since providing dictionary in advance
restricts the data compression scheme into language
dependent and also makes unsuitable for files with
specific vocabulary, it is preferable to employ the
schemes creating dictionaries considering the input text
and append it to the pre-processed file to be compressed.
For small files, they consider only words that occur at
least 25 times over the text to form the dictionaries
which practically makes this scheme unusable for the
smaller files that is the subject of our paper.
Recent literature on short text compression, "Compression of Short Text on Embedded Systems" by Rein et al. [1], proposes a low-complexity version of PPM (Prediction by Partial Matching). They use statistical models with hash tables of 16384, 32768, 65536 and 131072 elements, requiring two bytes per element, which results in allocations of 32, 64, 128 and 256 Kbytes of RAM respectively. If this memory requirement could be substantially decreased, we could achieve more efficient compression and make the scheme usable even on very low-end cellular phones. Other related approaches by Rein et al., "Low Complexity Compression of Short Messages" [2] and "Low Complex and Power Efficient Text Compressor for Cellular and Sensor Networks" [3], are variations of [1].
The research presented in [21] exploits a modified genetic programming (GP) approach for the data compression problem. "Compression of Small Text Files Using Syllables", proposed by Lansky et al. [4, 5], concerns compressing small text files using syllables. To implement the concept they created a database of frequent syllables, where the condition for adding a syllable to the database is a frequency greater than 1:65000. In this scheme, the primary knowledge-base initially holds more than 4000 entries. For low memory devices it is obviously difficult to afford this amount of storage, as well as a well-suited searching mechanism; this leads our proposed scheme to keep the knowledge-base span as short as possible and hence to reduce the scope of loosely chosen syllables or n-grams. Moreover, in forming the syllables, the space character receives no special treatment. But since in any text document it is a common assumption that at least 20% of the characters are spaces, it may be a good idea to give specific consideration to syllables involving spaces. In [4, 5], all the training syllable entries are stored without any categorization. This often results in coding redundancy, which can be handled by integrating a text ranking or component categorization scheme with syllable selection.
In a recent text compression scheme, Wichaiwong et al. [19] presented a compression algorithm based on capitalization. The mechanism has three steps: first, remove white space; second, compress the data into UpperCamelCase capitalization style; and lastly, decompress the compressed data. Though the compressed data is easy to read and understand, like the naming convention in several programming languages, the compression performance is poor and, more importantly, the scheme does not provide exact reproduction of the source data; that is, it is not a completely lossless compression scheme.
In [1] the compression starts for text sequences larger than 75 bytes, and in [10] the starting point is 50 bytes. If the lower threshold could be pushed below ten characters, the compression would truly support very small text files and could set a new milestone in very small text file compression, ensuring that short text gets shortest. Our prime aim is to design such an effective and efficient very short text compression scheme.

IV. PROPOSED SCHEME FOR COMPRESSION OF SHORT TEXT FOR ENERGY CONSTRAINED DEVICES
The main idea of this paper is compression of short text rather than compression of huge amounts of text. The paper mainly proposes a novel scheme for constructing the dictionary to be used for compression of small text. It improves upon various existing ideas on text compression, including capital conversion, word replacement, prefix replacement, and frequency- and prefix-based ordering of the dictionary. The proposed approach has greatly benefited from the earlier works of Batista and Alexandre [17] and Lansky and Zemlicka [7]. Since the scheme presented in [17] targets large text files, we propose a couple of modifications to adapt it to small text compression. In fact, the core modification lies in the basic part of their scheme, the construction of the dictionary, which is what enables us to compress very short text. For constructing the dictionary, i.e. choosing its entries, [17] takes only the frequency of a word into account, whereas we propose a combination of three parameters for choosing the dictionary entries, which is a very new aspect. Moreover, [17] considers words as the basic unit, whereas in the proposed scheme we consider syllables as the unit component of text compression. In [7], Lansky and Zemlicka showed that for compression of short text, syllables are much more advantageous than words as the primary dictionary element. In fact, in our proposed scheme we use a slightly different concept of syllable, which opens up a great opportunity to compress text: we treat blank spaces as part of a pseudo-vowel, which ultimately includes blank spaces within syllables rather than isolating them as distinct symbols. Additionally, unlike [17], our scheme does not recommend the use of an online dictionary since, for very small files, attaching a dictionary (whether compressed or not) would increase the size of the file and defeat the main aim of compression. We propose a static dictionary to ensure low memory requirements as well as computational simplicity.
For our research it is neither primarily important nor of supreme concern to decompose words into syllables accurately; rather, our aim is to obtain units that occur quite frequently. We adopt the definition of syllable used in [7]: a syllable is a sequence of sounds which contains exactly one maximal subsequence of vowels. This definition implies that the number of syllables in a word equals the number of maximal sequences of vowels in that word [7].
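As an illustration, the following minimal Python sketch counts syllables under this definition. The ASCII vowel set is an assumption on our part; extending it with the pseudo-vowel treatment of spaces described above would be a one-character change.

```python
import re

VOWELS = "aeiou"  # assumed vowel set; [7] defines syllables over vowel sounds

def count_syllables(word: str) -> int:
    # Each maximal run of consecutive vowels is one syllable under [7].
    return len(re.findall(f"[{VOWELS}]+", word.lower()))

print(count_syllables("compression"))  # 3: the vowel runs "o", "e", "io"
```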
The Modified Capital Conversion for text-units is motivated by the concept presented in [17]. It uses two flags, one encoding text-units whose first letter is capital and one encoding totally uppercase text-units. When a text-unit starts with a few uppercase letters and ends with lowercase letters, the uppercase flag is used for the first part but, instead of using a third flag (as proposed in the original research on Capital Conversion) to separate it from the lowercase part, the same flag that encodes a first-capital text-unit is inserted before the last capital letter [17].
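A minimal sketch of this flag scheme follows; the flag byte values and the function name are our own illustrative choices, not part of [17].

```python
import re

FLAG_FIRST = "\x01"  # assumed marker: the next letter is uppercase
FLAG_UPPER = "\x02"  # assumed marker: uppercase run until a FLAG_FIRST

def encode_capitals(unit: str) -> str:
    if len(unit) > 1 and unit.isupper():
        return FLAG_UPPER + unit.lower()            # totally uppercase unit
    if unit[:1].isupper() and unit[1:].islower():
        return FLAG_FIRST + unit.lower()            # first-capital unit
    m = re.fullmatch(r"([A-Z]{2,})([a-z]+)", unit)  # e.g. "GSMphone"
    if m:
        head, tail = m.groups()
        # Uppercase flag for the leading part; the first-capital flag is
        # inserted before the last capital letter, as in [17].
        return (FLAG_UPPER + head[:-1].lower()
                + FLAG_FIRST + head[-1].lower() + tail)
    return unit

print(repr(encode_capitals("GSMphone")))  # '\x02gs\x01mphone'
```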
The choice of the group sizes was motivated by the work of Alexander Ratushniak and by [17]. In [17], three groups are used, of 80, 32 and 16 byte values respectively, where the second symbol of a code-word may come from either the second or the third group. This gives a total of 80 + (80*32) + (80*16) + (80*32*16) = 44880 available code-words [17]. As in [17], this span of the dictionary is considered an acceptable number of text-units.

A. Construction of the Dictionary
The main contribution of our proposed scheme resides in the novel construction of the dictionary. Choosing the entries of the dictionary is particularly important for achieving better compression, because a well-organized dictionary reduces the code search overhead and hence speeds up the compression. The number of elements accommodated in the dictionary also matters, especially for battery powered devices, since a larger dictionary consumes more operating memory. We employ three parameters for choosing any text-unit to be included in the dictionary. These are as follows:
Availability is a parameter indicating the ratio of the total occurrences of a text-unit to the total number of text-units into which the concerned document may be decomposed. If a document D consists of N text-units in total (0 < N), and the concerned text-unit t_i (t_i ∈ D) occurs n_i times (0 < n_i < N), then availability is defined as

A_i = n_i / N
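In code, the availability of every distinct text-unit can be computed in one pass. A minimal sketch, assuming the unit list comes from the syllable decomposition described above:

```python
from collections import Counter

def availability(units: list[str]) -> dict[str, float]:
    # A_i = n_i / N for each distinct text-unit t_i of the document.
    counts, N = Counter(units), len(units)
    return {t: n_i / N for t, n_i in counts.items()}
```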
Recurrence factor is the second criterion for choosing an entry to be included in the dictionary. It is a novel concept and is, in fact, the key to accelerating the compression gain. The recurrence factor measures the average gap between occurrences of a text-unit. We propose this parameter to rank text-units when several of them share the same frequency: whenever multiple entries have the same availability index, they are ranked by increasing value of recurrence. The greater the value, the greater the distance between two consecutive occurrences, and consequently the lower the chance of the unit being repeated in the short text of interest.
If for a document D the concerned text-unit t_i (t_i ∈ D) occurs n_i times (0 < n_i < N, where N is the total number of text-units comprising the document) and the starting positions of its occurrences are p_1, p_2, p_3, ..., p_n, the recurrence factor is calculated as

R_i = ((p_2 - p_1) + (p_3 - p_2) + ... + (p_n - p_{n-1})) / n_i
    = (1 / n_i) * Σ_{j=2}^{n} (p_j - p_{j-1})
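Since the sum telescopes, R_i reduces to (p_n - p_1) / n_i. A sketch; the convention of returning 0 for a lone occurrence is our assumption:

```python
def recurrence(positions: list[int]) -> float:
    # positions: starting positions p_1 .. p_n of one text-unit, in order.
    n_i = len(positions)
    if n_i < 2:
        return 0.0  # assumed convention: a single occurrence has no gaps
    # ((p_2-p_1) + ... + (p_n-p_{n-1})) / n_i telescopes to:
    return (positions[-1] - positions[0]) / n_i
```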

If the length of the dictionary would be exceeded, the text-units are sorted according to increasing value of the recurrence factor, since the greater the value of R_i, the greater the interval between two consecutive occurrences of the text-unit and consequently, for compression of short text, the lower the probability of its recurrence. Similarly, the lower the value of the recurrence factor, the smaller the interval between two consecutive occurrences of the text-unit and therefore, for compression of short text, the greater the probability.
The third parameter we consider originates from the concept of Type-to-Token Ratio (TTR). Though TTR has been considered a means of forming linguistic and Natural Language Processing corpora, including Information Retrieval, taking it into account can also benefit data compression. TTR is a measurement of how many times previously encountered words repeat themselves before a new word makes its appearance in the text [2]. TTR is calculated by dividing the total number of word tokens by the total number of distinct words. The basic relation between TTR and word distribution is [20]: if TTR is larger, the frequency of matching words is greater and consequently the number of distinct words is smaller; if TTR is smaller, the frequency of matching words is smaller and consequently the number of distinct words is larger. If there are a total of w words in a document and the number of distinct words is d (d ≤ w), then TTR is expressed as

TTR = w / d

If for a document D the concerned text-unit t_i (t_i ∈ D) occurs n_i times (0 < n_i < N, where N is the total number of text-units comprising the document) and the number of distinct words is d (d ≤ w), then the third parameter, the uniqueness factor, is expressed as

u_i = n_i / d

This parameter contributes a lot to compression of short text, since TTR provides an insight into the training document and reports its suitability (or unsuitability) for small text compression concerns. As a result, taking TTR-like parameters into consideration should influence the construction of the dictionary, and weighting the text-units under consideration will positively influence the total compression approach. Taking this into consideration, for each text-unit we determine its type-to-token factor by multiplying its type-to-token ratio with the inverse availability factor, where the inverse availability factor is defined as the divergence of the availability factor from the maximum availability factor.
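A sketch of the TTR-derived quantities; reading "divergence" as the difference from the maximum availability is our interpretation of the definition above:

```python
from collections import Counter

def ttr(units: list[str]) -> float:
    return len(units) / len(set(units))  # TTR = w / d, as defined above

def uniqueness(units: list[str]) -> dict[str, float]:
    d = len(set(units))
    return {t: n_i / d for t, n_i in Counter(units).items()}  # u_i = n_i / d

def type_to_token_factor(units: list[str]) -> dict[str, float]:
    counts, N = Counter(units), len(units)
    A = {t: n / N for t, n in counts.items()}  # availability A_i
    a_max = max(A.values())
    # TTR weighted by the inverse availability factor (a_max - A_i).
    return {t: ttr(units) * (a_max - A[t]) for t in A}
```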
The overall steps for construction of the dictionary are fairly simple. At first, for any chosen document (preferably a document with lower TTR), case conversion is applied: if the total document consists only of uppercase letters, it is converted to sentence case; otherwise the document is considered as it is. The next step involves dividing the document into text-units as described in the above subsection, considering syllables and pseudo-syllables. Then the three novel parameters (as described above) are calculated. The text-units are then organized according to the availability factor. For any document with a limited number of texts, it is quite usual to have a number of text-units with the same availability factor; in those cases, the recurrence factor is taken into consideration. If that also becomes the same, we consider the uniqueness factor for choosing the dictionary entries. For any remaining ties, we organize the text entries in the order they first appeared. After selecting the text entries, we simply place them into the dictionary; as the code-words are predefined, the dictionary is thereby prepared. A sketch of this ordering is given below.
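Reusing the availability, recurrence and uniqueness helpers sketched earlier, the selection might look as follows. The sort directions (higher availability first, lower recurrence first, higher uniqueness first) are our reading of the scheme, not something the text states explicitly:

```python
def build_dictionary(units: list[str], capacity: int) -> list[str]:
    first, occ = {}, {}
    for i, t in enumerate(units):
        first.setdefault(t, i)           # order of first appearance
        occ.setdefault(t, []).append(i)  # occurrence positions per unit
    A = availability(units)
    u = uniqueness(units)
    R = {t: recurrence(p) for t, p in occ.items()}
    ranked = sorted(A, key=lambda t: (-A[t], R[t], -u[t], first[t]))
    return ranked[:capacity]             # entries map to predefined code-words
```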
B. Compression of Input Text
For compression of the input text we adopt the scheme presented in [17]. The input text is first passed to the Modified Capital Conversion step, which applies the two flags to the text; a detailed description of capital conversion has been provided at the start of this section. After capital conversion, the text is divided into text-units according to the text decomposition procedure defined above. Then it is hierarchically compared with the dictionary created by the scheme presented in subsection A. For any match, the text is substituted with the code-word; if no match is found, the text-unit is moved to the lowest level, i.e. level 1 of the dictionary, which encodes single characters. Then the text is transmitted to the receiver.
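A sketch of the matching loop; we interpret "hierarchically compared" as greedy longest-match-first lookup, with level 1 (single characters) as the fallback, which is an assumption on our part:

```python
def compress(text: str, dictionary: dict[str, bytes]) -> bytes:
    out, i = bytearray(), 0
    max_len = max(map(len, dictionary))
    while i < len(text):
        for width in range(min(max_len, len(text) - i), 0, -1):
            code = dictionary.get(text[i:i + width])
            if code is not None:         # match at some level of the dictionary
                out += code
                i += width
                break
        else:                            # level 1: encode the single character
            out += text[i].encode("latin-1", "replace")
            i += 1
    return bytes(out)
```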
V. PERFORMANCE ANALYSIS
The performance of the proposed scheme along with several existing schemes has been analyzed. In the following graph, we have presented the result. The performance analysis was performed on randomly chosen texts (with a limit of 100 to 10000 characters).

[Figure 1 plots the compression ratio (%), ranging roughly from 41 to 49, on the test files paper4, paper5, paper6, book1 and book2, for the scheme of [1], the syllable-based scheme, and the proposed scheme.]

Fig. 1: Comparison of the proposed scheme and relevant existing schemes.

VI. ANALYSIS OF POWER CONSUMPTION
In cellular mobile communication (GSM), each mobile station (MS) communicates with its nearest base transceiver station (BTS). To transmit a signal from the MS to the BTS, the MS must transmit with sufficient power; the received signal strength follows a power-law function of the distance between transmitter and receiver. If the distance between transmitter and receiver is within some threshold value (the cross-over distance), the Friis free-space model is used; otherwise the two-ray ground propagation model must be applied.
The threshold or cross-over point is defined as

d_crossover = (4π h_t h_r √L) / λ

where L is the system loss factor not related to propagation, h_r is the height of the receiving antenna above the ground, h_t is the height of the transmitting antenna above the ground, and λ is the wavelength of the carrier frequency.
The Friis free-space equation is as follows:

P_r(d) = (P_t G_t G_r λ²) / ((4π d)² L)

The two-ray model is as follows:

P_r(d) = (P_t G_t G_r h_t² h_r²) / d⁴

In our calculation we use an omnidirectional antenna, so the gain parameters are G_r = 1 and G_t = 1. Normally the height of the transmitter in a GSM network is taken to be 1.5 m, while the BTS height from the ground is 32, 42 or 62 m; here we consider a 32 m antenna. In GSM-900 the uplink frequency range is 890-915 MHz; we use 900 MHz for our calculation.
Using 900 MHz, we can calculate the cross-over distance as follows. From λ = c / f, the value of λ is 0.333 m, and thus

d_crossover = (4π × 1.5 × 32) / 0.333 = 1808.8 m
Radio Energy Calculation:
The energy to transmit an l-bit message over a distance d can be modeled as

E_Tx(l, d) = E_Tx-elec(l) + E_Tx-amp(l, d)

where E_Tx(l, d) is the energy required to transmit l bits over a distance d, E_Tx-elec(l) is the energy required to process l bits, and E_Tx-amp(l, d) is the energy required by the antenna to transmit l bits over a distance d.
To receive the message, the required energy can be expressed as

E_Rx(l) = E_Rx-elec(l) = l E_elec

E_elec depends on factors such as the digital coding, modulation and filtering of the signal before it is sent to the transmitting antenna.
Again, the compression factor (or compression ratio) is the metric defined as the ratio of the number of bits present after applying the compression to the number before compression. Thus

E_Tx-total(l, p, d) = k E_Tx(l, d), where k = p / l

Now,

E_Tx(l, d) = l E_elec + l ε_friss-amp d²,    for d < d_crossover

or

E_Tx(l, d) = l E_elec + l ε_two-ray-amp d⁴,    for d ≥ d_crossover
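This piecewise model translates directly into code; a minimal sketch, with parameter names of our own choosing:

```python
def e_tx(l: int, d: float, e_elec: float,
         eps_fs: float, eps_tr: float, d_crossover: float) -> float:
    # E_Tx(l, d) in joules for l bits over d metres.
    if d < d_crossover:
        return l * e_elec + l * eps_fs * d ** 2  # Friis free-space regime
    return l * e_elec + l * eps_tr * d ** 4      # two-ray ground regime

def e_tx_total(l: int, p: int, d: float, **kw) -> float:
    return (p / l) * e_tx(l, d, **kw)            # k = p / l
```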

Now consider the experimental performance of our proposed scheme, where the average compression ratio over a total of 17 tests is 0.445. So the value of l = 100 and p = 40, and the value of k used is 0.445.
Consider the energy dissipated per bit in the transceiver electronics to be

E_elec = 50 nJ/bit
ε_friss-amp = 5.272 × 10^-11 J/bit/m²
ε_two-ray-amp = 1.616 × 10^-17 J/bit/m⁴

with d_crossover = 1808.8 m as computed above.
At first we consider d less than d_crossover; let d = 1500 m. The total energy will be

E_Tx(100, 1500) = 100 E_elec + 100 ε_friss-amp × 1500² = 0.011867 J

Now consider d greater than d_crossover; let d = 2000 m. The total energy will be

E_Tx(100, 2000) = 100 E_elec + 100 ε_two-ray-amp × 2000⁴ = 0.025861 J
With k = 0.445, the total energy required to transmit the compressed data when d = 1500 m is

E_Tx-total(100, 40, 1500) = k E_Tx(100, 1500) = 0.445 × 0.011867 = 5.280815 × 10^-3 J

and the total energy required to transmit the compressed data when d = 2000 m is

E_Tx-total(100, 40, 2000) = k E_Tx(100, 2000) = 0.445 × 0.025861 = 0.011508145 J

These figures are reproduced numerically in the sketch below.
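Plugging the constants above into the e_tx sketch reproduces the reported figures; here k = 0.445 is applied directly, as in the text:

```python
kw = dict(e_elec=50e-9, eps_fs=5.272e-11,
          eps_tr=1.616e-17, d_crossover=1808.8)

e1 = e_tx(100, 1500, **kw)   # 0.011867 J  (d < d_crossover, Friis regime)
e2 = e_tx(100, 2000, **kw)   # 0.025861 J  (d > d_crossover, two-ray regime)
print(0.445 * e1)            # ~5.280815e-3 J for the compressed message
print(0.445 * e2)            # ~0.011508145 J for the compressed message
```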
VII. CONCLUSION
In this paper, we have presented an effective approach to compression of small text for energy constrained devices. The overall strategy of computational simplicity also ensures reduced time complexity for the proposed compression and decompression process. The main aspect of our proposed scheme resides in the criteria for choosing knowledge-base entries, which initiates a new arena of text compression methodology. A consistent and relevant mathematical analysis of the overall performance also establishes a strong technical basis for the proposed scheme. Moreover, a prime achievement is the starting threshold of text compression, which we have reduced to less than five characters. With a limited knowledge-base size, the achieved compression is no doubt efficient and effective. As the knowledge-base is not expected to grow through continued use, we keep the low-memory system free from the risk of its knowledge-base expanding beyond the optimal memory size, and thus the applicability of the proposed system even on very low memory devices is ensured.
REFERENCES

[1] Stephan Rein, Clemens Gühmann, Frank H. P. Fitzek, "Compression of Short Text on Embedded Systems", Journal of Computers, Vol. 1, No. 6, September 2006.
[2] S. Rein, C. Gühmann, and F. Fitzek, "Low Complexity Compression of Short Messages", Proceedings of the IEEE Data Compression Conference (DCC'06), March 2006, pp. 123-132.
[3] S. Rein, F. Fitzek, M. P. G. Perucci, T. Schneider and C. Gühmann, "Low Complex and Power Efficient Text Compressor for Cellular and Sensor Networks", 15th IST Mobile and Wireless Communication Summit, June 2006.
[4] J. Lansky and M. Zemlicka, "Compression of Small Text Files Using Syllables", Proceedings of the Data Compression Conference (DCC'06), Los Alamitos, CA, USA, 2006.
[5] J. Lansky and M. Zemlicka, "Text Compression: Syllables", Annual International Workshop on DAtabases, TExts, Specifications and Objects (DATESO), Volume 129 of CEUR Workshop Proceedings, pp. 32-45, 2005, CEUR-WS.
[6] Md. Rafiqul Islam, Sajib Kumar Saha, Mrinal Kanti Baowaly, "A Modification of Greedy Sequential Grammar Transform based Universal Lossless Data Compression", Proceedings of the 9th International Conference on Computer and Information Technology (ICCIT 2006), 28-30 December 2006, Dhaka, Bangladesh.
[7] Przemysław Skibiński, "Two-Level Dictionary Based Compression", Proceedings of the IEEE Data Compression Conference (DCC'05), p. 481.
[8] F. Awan and A. Mukherjee, "LIPT: A Lossless Text Transform to Improve Compression", Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE Computer Society, Las Vegas, Nevada, 2001.
[9] H. Kruse and A. Mukherjee, "Preprocessing Text to Improve Compression Ratios", Proceedings of the Data Compression Conference, IEEE Computer Society, Snowbird, Utah, 1998, p. 556.
[10] S. A. Ahsan Rajon, "A Study on Text Corpora for Evaluating Data Compression Schemes: Summary of Findings and Recommendations", Research Report, Computer Science and Engineering Discipline, Khulna University, Khulna, Bangladesh, December 2008.
[11] Md. Rafiqul Islam, S. A. Ahsan Rajon and Anonda Podder, "Short Text Compression for Smart Devices", Proceedings of the 11th International Conference on Computer and Information Technology (ICCIT 2008), 25-27 December 2008, Khulna, Bangladesh, pp. 453-558.
[12] S. Rein and C. Gühmann, "Arithmetic Coding - A Short Tutorial", Wavelet Application Group, Technical Report, April 2005.
[13] David Hertz, "Secure Text Communication for the Tiger XS", Master of Science Thesis, Department of Electrical Engineering, Linköping University, Linköping, Sweden.
[14] S. A. Ahsan Rajon and Anonda Podder, "Lossless Compression of Short English Text for Low-Powered Devices", Undergraduate Thesis, CSE Discipline, Khulna University, Khulna, Bangladesh, March 2008.
[15] Md. Rafiqul Islam, S. A. Ahsan Rajon, Anonda Podder, "Lossless Compression of Short English Text for Low-Powered Devices", Proceedings of the International Conference on Data Engineering and Management (ICDEM 2008), Tiruchirappalli, Tamil Nadu, India, February 9, 2008.
[16] Ross Arnold and Tim Bell, "A Corpus for the Evaluation of Lossless Compression Algorithms", Data Compression Conference, pp. 201-210, IEEE Computer Society Press, 1997.
[17] Luis Batista and Luis A. Alexandre, "Text Pre-processing for Lossless Compression", Data Compression Conference, 2008.
[18] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister, "System Architecture Directions for Networked Sensors", ASPLOS-IX: Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems, New York, NY, USA, ACM Press, 2000, pp. 93-104.
[19] Tanakorn Wichaiwong, Kitti Koonsanit, and Chuleerat Jaruskulchai, "A Simple Approach to Optimized Text Compression's Performance", 4th International Conference on Next Generation Web Services Practices, 2008, IEEE Computer Society, pp. 66-70.
[20] Md. Rafiqul Islam and S. A. Ahsan Rajon, "Design and Analysis of an Effective Corpus for Evaluation of Bengali Text Compression Schemes", Journal of Computers, Academy Publishers, Vol. 5, No. 1, January 2010, pp. 59-68.
[21] M. Zaki and M. Sayed, "The Use of Genetic Programming for Adaptive Text Compression", Int. J. Information and Coding Theory, Vol. 1, No. 1, 2009, pp. 88-108.
[22] M. P. Bakulina, "Application of the Zipf Law to Text Compression", Journal of Applied and Industrial Mathematics, Vol. 2, No. 4, 2008, pp. 477-483, Pleiades Publishing.
[23] Khair Md. Yeasir Arafat Majumder, Md. Zahurul Islam, and M. Khan, "Analysis of and Observations from a Bangla News Corpus", Proceedings of the 9th International Conference on Computer and Information Technology (ICCIT 2006), pp. 520-525, 2006.
