

Content-aware compression of structured data


What we are and what we do
Xtreme Compression, Inc. is a software design consultancy with special expertise in
structured data compression. We apply our proprietary algorithms and source code to
compress and encrypt structured data whenever performance beyond the reach of
conventional methods is required.

Why we're different, smarter, and better


We're different because our technology is content-aware, semantics-cognizant, and targets
structured data. We're smarter because we recognize and exploit the structure of
redundancy. We're better because we deliver more compression and faster decompression
simultaneously.

Applications
We target in-memory DBMSs, analytics, business intelligence, data mining, data
warehousing, and other read-intensive applications with discrete structured data, as well
as wireless devices, mobile apps, and infrastructures constrained by bandwidth, latency,
memory, or power.

Software Implementation
Typically, data structures are compressed and encrypted individually and then used in that
form. Decompression is generally encapsulated and invisible above the lowest software
levels, which establishes a distinct boundary above which the host system remains unchanged.
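
To make that integration pattern concrete, here is a minimal sketch of an encapsulated compressed column. It is our illustration, not Xtreme Compression's code: zlib stands in for the proprietary codecs, and the class name, block size, and data are assumptions.

```python
# Illustration only: an accessor layer that hides compression from everything above it.
# zlib is a stand-in codec; a proprietary codec would slot into the same two calls.
import zlib

class CompressedColumn:
    """A string column stored as compressed blocks but read like a plain list."""
    def __init__(self, strings, block=64):
        self.block = block
        self.length = len(strings)
        self.blocks = [zlib.compress("\x00".join(strings[i:i + block]).encode())
                       for i in range(0, len(strings), block)]

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        # Decompress only the block holding row i; callers never see compression.
        blob = zlib.decompress(self.blocks[i // self.block]).decode()
        return blob.split("\x00")[i % self.block]

rows = [f"customer-{n}" for n in range(1000)]
col = CompressedColumn(rows)
assert col[777] == "customer-777" and len(col) == 1000
```

Code above the column object keeps indexing plain strings, so the host system's upper layers are unchanged, which is the boundary described above.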

Your competitive advantage


Your advantage is compression performance so far beyond the reach of conventional
methods that otherwise-infeasible products, services, and operations can now become
reality.

Benefits
Xtreme Compression's proprietary technology reduces storage costs and speeds data
transfer and program execution. Here is how:
- By concentrating more information into every physical word read from memory and processed by the CPU, reducing the influence of the von Neumann bottleneck
- In SAN/NAS-type applications, by reducing spindle count to save power
- By reducing transmission time over links, networks, and other expensive-to-scale infrastructure components
- By storing more information in internal caches to improve data locality and cache hit rates
- By increasing effective disk cache size to retrieve more real information from hard disk per read operation
- By allowing storage on faster media (e.g., RAM vs. hard disk, hard disk vs. CD or DVD, local hard disk vs. network)
- By permitting more access data structures and paths for a given amount of storage

Our Proprietary Data Compression Methods


Attribute vector coding
Attribute vector coding is a content-aware vector transform method for compressing
multidimensional database tables. It works by recognizing semantic features in data. It
breaks new ground by capturing and exploiting the innumerable relationships, functional
dependencies, and statistical correlations in data without having to solve the intractable
problem of explicitly identifying and defining them as individual conditional mutual
information terms.
Attribute vector coding recognizes the structure of redundancy, and produces complex
structured symbols that are fast to decompress. It achieves unequaled compression by
systematically modeling data at high levels of abstraction, across dimensions and data
types. That makes it far less subject than conventional methods to compressibility limits
imposed by information theory.
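
The algorithm itself is proprietary, so the sketch below is only our illustration of the underlying point: modeling columns (dimensions) jointly exposes redundancy, such as a functional dependency between two columns, that a per-column coder cannot see. The toy table, the entropy measure, and every name in it are assumptions, not Xtreme Compression's data or method.

```python
# Illustration only: why cross-column (cross-dimension) modeling can beat
# compressing each column independently when the columns are correlated.
import math
from collections import Counter

def entropy_bits(symbols):
    """0-order entropy, in bits per symbol, of a sequence of hashable values."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Toy fact table in which 'country' is functionally dependent on 'city'.
rows = [("Oslo", "NO"), ("Bergen", "NO"), ("Lyon", "FR"),
        ("Oslo", "NO"), ("Lyon", "FR"), ("Paris", "FR")] * 100

cities    = [city for city, _ in rows]
countries = [country for _, country in rows]

per_column = entropy_bits(cities) + entropy_bits(countries)  # columns modeled separately
joint      = entropy_bits(rows)                              # columns modeled together

print(f"independent columns: {per_column:.3f} bits/row")
print(f"joint model:         {joint:.3f} bits/row")
# Once 'city' is known, 'country' costs nothing in the joint model: the kind of
# relationship a one-dimensional coder never captures.
```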

Repopulation
Repopulation is a structural method for compressing monotonic integer sequences in hash
tables and similar data structures. It populates table locations that would otherwise be
unused with subsequences that would otherwise occupy memory.
Unlike almost every other lossless compression method, repopulation is not a replacement
scheme. Instead, repopulation is transpositional and mechanistic; it works like a chess-playing automaton. It draws on no information-theoretic concepts. Repopulation
simultaneously achieves the access speed of a low load factor and the table compactness of
a high one, thus avoiding that historical compromise.
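
Repopulation's internals are proprietary, so the sketch below is a deliberately naive stand-in that only demonstrates the stated idea: a hash table kept at a low load factor for fast probing, whose vacant slots also carry a monotonic sequence that would otherwise occupy its own array. The class, its slot layout, and the key/sequence flag are our assumptions.

```python
# Illustration only: reuse the empty slots of an open-addressing hash table to
# hold a monotonic integer sequence, so lookups keep low-load-factor speed while
# the otherwise "wasted" space stores real data.
class RepopulatedTable:
    def __init__(self, capacity, keys, sequence):
        self.capacity = capacity
        self.slots = [None] * capacity      # holds either a key or a sequence value
        self.is_key = [False] * capacity    # True where a real key lives
        for k in keys:                      # insert keys with linear probing
            i = hash(k) % capacity
            while self.is_key[i]:
                i = (i + 1) % capacity
            self.slots[i], self.is_key[i] = k, True
        # Repopulate: stream the monotonic sequence into the vacant slots, in order.
        free = (i for i in range(capacity) if not self.is_key[i])
        self.seq_slots = []
        for value in sequence:
            i = next(free)
            self.slots[i] = value
            self.seq_slots.append(i)

    def contains(self, k):
        """Membership test at the probing speed of the low load factor."""
        i = hash(k) % self.capacity
        while self.is_key[i]:
            if self.slots[i] == k:
                return True
            i = (i + 1) % self.capacity
        return False

    def sequence(self):
        """Read the monotonic sequence back out of the repopulated slots."""
        return [self.slots[i] for i in self.seq_slots]

# 100 keys in a 256-slot table leave ~156 free slots for the sequence.
t = RepopulatedTable(256, keys=range(1000, 1100), sequence=[3, 7, 19, 42, 88, 140])
assert t.contains(1042) and not t.contains(7)
assert t.sequence() == [3, 7, 19, 42, 88, 140]
```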

Superpopulation
Superpopulation is a variable-to-variable-length algorithm that compresses index tables,
lists, arrays, zerotrees, and similar data. It systematically accommodates wide local
variations in data statistics. Superpopulation may be used by itself or in conjunction with
repopulation.
Superpopulation recognizes that distributions of values in access data structures are often
far from random, having areas of high and low correlation. It works by classifying each such
area as one of two distinct target types, and applying a target type-specific encoding
method to each.
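
As an illustration of that shape, and not of the proprietary classifier or codes, the sketch below splits an index array into local areas, classifies each as one of two target types, and applies a type-specific encoding. The block size, the two types, and the delta rule are all our assumptions.

```python
# Illustration only: a toy variable-to-variable-length scheme that classifies each
# local area of an index array and encodes it with a type-specific method.
def encode(values, block=8):
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        deltas = [b - a for a, b in zip(chunk, chunk[1:])]
        if deltas and all(0 <= d < 16 for d in deltas):
            out.append(("A", chunk[0], deltas))   # type A: locally monotonic, small gaps -> delta-code
        else:
            out.append(("B", chunk))              # type B: no exploitable local structure -> store literally
    return out

def decode(coded):
    values = []
    for tag, *payload in coded:
        if tag == "A":
            first, deltas = payload
            values.append(first)
            for d in deltas:
                values.append(values[-1] + d)
        else:
            values.extend(payload[0])
    return values

data = [10, 11, 13, 14, 18, 19, 20, 25,          # smooth, highly correlated area
        900, 3, 512, 77, 4, 4096, 12, 9]         # unstructured area
assert decode(encode(data)) == data
```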

Wordencoding
Wordencoding is a 0-order (context-independent) variable-to-variable-length algorithm
for compressing text strings in database table record fields. It achieves compression close
to the 0-order source entropy without sacrificing speed. It does that by efficiently
maximizing effective combined data locality over compressed record fields, lexicons
holding strings, and access data structures. Wordencoding deals explicitly with the
structure and statistics of the data by recognizing that redundancy in text strings exists at
multiple levels of granularity.
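
The sketch below is not wordencoding itself, whose code construction is proprietary; it only illustrates the stated ingredients: a lexicon of distinct field strings ranked by frequency, and a variable-length code per string, so the most common field values cost a single byte and the hot end of the lexicon stays compact. The varint coding and the sample data are our assumptions.

```python
# Illustration only: 0-order, lexicon-based coding of strings in a record field.
from collections import Counter

def build_lexicon(fields):
    """Distinct strings ranked by frequency; a string's rank is its code."""
    lexicon = [s for s, _ in Counter(fields).most_common()]
    return lexicon, {s: rank for rank, s in enumerate(lexicon)}

def encode_field(s, rank_of):
    """Encode one field as a varint of its lexicon rank (1 byte for ranks < 128)."""
    n = rank_of[s]
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def decode_field(buf, lexicon):
    """Decode one varint rank, then look the string up in the lexicon."""
    n, shift = 0, 0
    for i, byte in enumerate(buf):
        n |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            return lexicon[n], i + 1      # (string, bytes consumed)

fields = ["SHIP", "SHIP", "AIR", "RAIL", "SHIP", "AIR", "TRUCK", "SHIP"]
lexicon, rank_of = build_lexicon(fields)
compressed = [encode_field(s, rank_of) for s in fields]
assert all(decode_field(c, lexicon)[0] == s for c, s in zip(compressed, fields))
```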

A 21st Century Approach


Beyond the state of the art
Xtreme Compression's proprietary compression technology delivers performance beyond
the reach of conventional methods. Here is why:
- The exchange principle states that during design, moving function and algorithmic complexity from decompression to compression, and from encoding to modeling, can simultaneously increase compression efficacy and decrease decompression time. It aligns the otherwise-competing performance goals of compression ratio and decompression speed.

- The sequence-symbol continuum principle states that for every set of discrete messages, a continuum exists within which the messages can be separated into sequences of symbols. At one extreme, each symbol is treated as atomic, the symbols have no substructure, and the sequence is maximally complex dimensionally and statistically. At the other, a single symbol having maximally complex substructure represents the entire message, so there is no sequence.

- Through representational equivalence, finite sets of transparently decomposable symbols can have multiple generative models that produce identical data but differ in entropy. Among all representationally equivalent models for the same data, the original may not be the one with the lowest entropy. (A small worked example follows this list.)

- Xtreme Compression's multilevel data modeling exploits our separable redundancy principle, which states that data having sufficiently complex dimensional and statistical structure can be represented at multiple levels of abstraction that differ in redundancy. Doing that can separate structured and distributed redundancies.
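
Here is the small worked example of representational equivalence promised above. It is our illustration with made-up data: two generative models that reproduce exactly the same column, one as literal values and one as a start value plus first differences, yet with very different 0-order entropy.

```python
# Two representationally equivalent models of the same data with different entropy.
import math
from collections import Counter

def entropy_bits(symbols):
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Model 1: the column as literal values (a key growing in steps of 3 or 4).
values = [100, 103, 107, 110, 114, 117, 121, 124, 128, 131, 135, 138]

# Model 2: the same column as a start value plus first differences.
deltas = [b - a for a, b in zip(values, values[1:])]
rebuilt = [values[0]]
for d in deltas:
    rebuilt.append(rebuilt[-1] + d)
assert rebuilt == values                   # both models generate identical data

print(f"literal values: {entropy_bits(values):.2f} bits/symbol")  # 12 distinct symbols
print(f"delta model:    {entropy_bits(deltas):.2f} bits/symbol")  # only symbols 3 and 4
```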

Three Truths of Compressing Big Data

What is Xtreme? 8.5 to 1!


Breakthrough compresses benchmark TPC-H data 8.5 to 1!
That's over 5 times what Oracle* reported for 11g. That's over 3 times what IBM**
measured for DB2.

8.5 to 1! That's Xtreme Compression.


- It's radical! Xtreme Compression is a radical technology delivering phenomenal compression.
- It's smarter! Our technology exploits the structure and complexity of today's multidimensional data.
- It's different! It's content-aware. It's cognizant of semantics. It decorrelates across dimensions.
- It's better! Our technology delivers results beyond the reach of conventional methods.

Xtreme Compression is less subject than conventional technology to information theory's compressibility limits. That's why it outperforms the decades-old, one-dimensional, 'industry standard' conventional methods still used to compress structured data.

For more information, please visit http://www.xtremecompression.com
