Tech English

1 - Intro
A significant increase in hand-held devices leads to the necessity of reporting up-to-date location
information of users, so LBS (location based services) technologies are widely developing today.
Location data is inherently multi-dimensional, usually includes: a user id, a latitude, a longitude, a time
stamp. There are some multi-dimensional using techniques K-d trees, Quad trees and R-trees.
As for MD-HBase, it was created by adding multi-dimensional data processing capabilities to HBase, a
range partitioned Key-value store. Linearization uses linearization to implement a scalable multidimensional index structure layered over a range-partitioned Key-value store. This design can be used to
implement a K-d tree and a quad tree, which are standard multi-dimensional index structures.
The investigation of the authors presents also 3 alternative implementations of the storage layer in the
Key-value store, evaluates the tradeoffs associated with each implementation, contains analysis of MDHBases scalability and efficiency.
2 - Background
a) Two simple examples are provided to illustrate the use of location data for providing
location aware services. The first ones are location based advertisements and coupon distribution,
the second ones are location based social applications. Location information has two important
characteristics it is inherently skewed, both spatially and temporally; as for time dimension in
potentially unbounded and monotonically increasing
b) Speaking about linearization, its a method to transform multi-dimensional data points to a
simple dimension aspect of the index layer in MD-HBase and it allows leveraging a singledimensional database to provide efficient multi-dimensional query processing. Its advantage is a
simple design and the main disadvantage is an inefficient query processing.
c) Two most popular multi-dimensional indexing structures are the Quad tree and the K-d tree.
They split the multi-dimensional space recursively into subspaces as a search tree. The main
advantages of such structures are: capturing data distribution statistics such partition results in
more splits in hot regions; the other thing is that because of maintaining the boundaries of
subspaces in the original spaces efficient pruning of the space, reducing the number of false positive
scans during queue processing and robust skewing becomes possible.
3 Methods
As for the methods, in MD-HBase standard index structures like K-d trees and Quad trees are adapted
to be layered on top of a Key-value store underlying data storage layer stores the items sorted by their
key and range-partitions the key space. The keys correspond to the Z-value of the dimensions being
indexed. The trie-based approach is used for space-splitting. For naming subspaces after splitting, longest
common prefix naming is used as a scheme. To perform the scheme, we need to implement: Index Layer,
Subspace Lookup and Point Queries,Insertion, Range Query, Nearest Neighbor query, Space Split.
The data storage layer of MD-HBase is a range partitioned key-value store, so HBase (the open source
implementation of Bigtable) is used as the storage layer. However, a number of designs exist to implement
the data storage layer, so MD-HBase uses different approaches, such as table share model, table per bucket
model, hybrid model and region per bucket model.
Several optimizations are possible to the further improvement of the storage level. They are: space
splitting pattern learning and random projection.
The authors also evaluated their prototype implementation of MD-HBase and checked their product
for insert throughput, range query and kNN query, did some another related work.
4 - Result
MD-HBase is one or the representations of the next generation of scalable location data.
Implementation layers standard index structures like K-d trees and Quad trees, as well as using a design
based on linearization. MD-HBase is a standard open-source key-value store, with minimal changes to
underlying system. Speaking about scalability, efficiency of the design proposed by the authors, this all is
demonstrated through thorough experimental evaluation. As for the future, design of MD-HBase can be
extended by adding more complex analysis operators and exploring other alternative design for the index
and data storage layers.
Which of the points represented below is NOT included in location data?

a) Time stamp
b) Longitude
c) Attitude
One of the most popular multi-dimensional index structures is:
a) Binary tree
b) Quad tree
c) B+ tree
The ___________ algorithm is based on the best-first algorithm where the subspaces are scanned in order
of distance from the queried point.
a) Nearest Neighbor Query
b) kNN Query
c) Range Query
A sorted sequence of subspace names is:
a) Storage layer
b) Index layer
c) Key group
For queries with high selectivity, this kind of trees gives better performance:
a) R-trees
b) Quad trees
c) K-d trees
1
c
2
b
3
a
4
b
5
c

Tech English

Загружено:

Сведения о документе

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Tech English

Загружено:

Авторское право:

Доступные форматы

1 - Intro

Which of the points represented below is NOT included in location data?

Вам также может понравиться