
Introduction to NoSQL Data Management for Big Data
by David Loshin | September 4, 2013 3:24 pm

Editor's note: This article is part of a series examining issues related to evaluating and implementing big data analytics in business.

As with any emerging yet complex application development framework, successful implementation relies on an ecosystem of different components that can be combined to address the development of the appropriate solution. For big data, that ecosystem revolves around some key architectural artifacts, including scalable storage, parallel computing, and data management paradigms that are united through an application development platform. And while it is important to review the ways these different components are blended together, from the perspective of application development, the interdependence between analytic algorithms and the underlying data management framework warrants a more in-depth review of big data storage paradigms.

The reason is straightforward: when applications already depend on a traditional relational database (RDBMS) model and/or a data warehousing approach to data management, it may be sufficient to port the RDBMS tools as a way of scaling performance on a big data appliance. However, many algorithms that expect to take advantage of a high-performance, elastic, distributed data environment are not suited to consume data in traditional RDBMS systems. That means developers must consider different methods for data management.

These analytic algorithms can employ one of an array of alternative means for data management that are typically bundled under the term NoSQL databases. The term NoSQL conveys two different concepts. The first suggests a data management framework that is not a SQL-compliant one. The second, more generally acknowledged (that is, more frequently presented) meaning is that the term stands for "Not only SQL," suggesting environments that combine traditional SQL (or SQL-like query languages) with alternative means of querying and access.

Schema-less Models: Increasing Flexibility for Data Manipulation
NoSQL data systems provide a more relaxed approach to data modeling, often referred to as schema-less modeling, in which the semantics of the data are embedded within a flexible connection topology and a corresponding storage model. This provides greater flexibility for managing large data sets while simultaneously reducing the dependence on the more formal database structure imposed by relational database systems.

The flexible model enables automatic distribution of data and elasticity with respect to the use of computing, storage, and network bandwidth in ways that don't force specific binding of data to be persistently stored in particular physical locations. NoSQL databases also provide for integrated data caching that helps reduce data access latency and speed performance.

The loosening of the relational structure is intended to allow different models to be adapted to specific types of analyses. For example, some are implemented as key-value stores, which nicely align to big data programming models like MapReduce. Although the relaxed approach to modeling and management paves the way for performance improvements for analytical applications, it does not enforce adherence to strictly defined structures, and the models themselves do not necessarily impose any validity rules. This potentially introduces risks associated with ungoverned data management activities such as inadvertent inconsistent data replication, reinterpretation of semantics, and currency and timeliness issues. This article discusses four different NoSQL approaches:

Key-value stores
Document stores
Tabular stores
Object stores

Key-Value Stores

A key-value store is a schema-less NoSQL model in which data objects are associated with distinct character strings called keys, similar to the data structure known as a hash table. Many of the NoSQL architectures rely on variations on the key-value theme, in that unique keys are employed both to identify entities and to locate attribute information about those entities. This pervasive use of unique keys lends a degree of credibility to this basic approach to a schema-less model.

As an example, consider the data subset represented in Table 1, where the key is the name of the automobile make, while the value is a list of names of models associated with that automobile make.

Table 1: Example data represented in key-value …
Key        Value
BMW        {1-Series, 3-Series, …, 7-Series, X3, …}
Buick      {Enclave, LaCrosse, …}
Cadillac   {CTS, DTS, …, Escalade EXT, SRX, …}
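Since the article compares a key-value store to a hash table, the legible rows of Table 1 translate directly into a Python dict. The following is a minimal sketch, assuming an in-memory dict stands in for the store; the helper names `models_for` and `make_of` are hypothetical, and each model list holds only the names readable in the table above, so the lists are partial. It contrasts a lookup by key (one hash probe) with a lookup by value, which has no key to hash and therefore has to scan every entry.

```python
from typing import Dict, List, Optional

# Table 1 as a hash table: key = automobile make, value = list of model names.
# Partial lists: only model names legible in Table 1 are included.
TABLE_1: Dict[str, List[str]] = {
    "BMW": ["1-Series", "3-Series", "7-Series", "X3"],
    "Buick": ["Enclave", "LaCrosse"],
    "Cadillac": ["CTS", "DTS", "Escalade EXT", "SRX"],
}


def models_for(make: str) -> List[str]:
    """Look up by key: hash the make and return its value (empty list if absent)."""
    return TABLE_1.get(make, [])


def make_of(model: str) -> Optional[str]:
    """Look up by value: there is no key to hash, so every entry is scanned."""
    for make, models in TABLE_1.items():
        if model in models:
            return make
    return None


if __name__ == "__main__":
    print(models_for("Buick"))  # ['Enclave', 'LaCrosse']
    print(make_of("SRX"))       # Cadillac
    print(make_of("Model T"))   # None
```

Storing the whole list under one key keeps a lookup by make to a single probe. In a store that only supports whole-value get and put, however, adding one model becomes a read-modify-write of the entire list, which is one of the representation choices a developer has to make.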
The key-value store does not impose any constraints about data typing or data structure. It is the responsibility of the consuming business applications to interpret the semantics of the data organization.

The core operations performed on a key-value store include:

Get(key), which returns the value associated with the provided key.
Put(key, value), which associates the value with the key.
Multi-get(key1, key2, ..., keyN), which returns the list of values associated with the list of keys.
Delete(key), which removes the entry for the key from the data store.

When using a key-value store, ensuring that the values can be accessed means that the key must be unique. To associate multiple values with a single key (such as the list of car models in the example in Table 1), the developer must consider the representations of those values and how they are to be linked to the key.

Key-value stores are essentially long, thin tables (in that there are not many columns associated with each row), and they can be indexed by key value to speed data queries. The table's rows can be sorted by the key value to simplify finding the key during a query. A query essentially comprises two steps: the first is to calculate the unique key, and the second is to use that key as an index into the table. Because the key must be calculated to access any information about the entity, it is unrealistic to expect to execute general SQL-style queries such as "What are the most popular models of cars based on sales?" These kinds of questions would typically be answered using code, as opposed to a query engine.

While key-value pairs are very useful both for storing the results of analytical algorithms (such as the number of times specific phrases occur within massive numbers of documents) and for producing those results for reports, the model does pose some potential drawbacks. One weakness is that the model will not inherently provide any kind of traditional database capabilities (such as atomicity of transactions, or consistency when multiple transactions are executed simultaneously). Those capabilities must be provided by the application itself.

Another potential weakness: as the volume of data increases, maintaining unique values as keys may become more difficult; addressing this issue requires the introduction of some complexity in generating character strings that will remain unique among an extremely large set of keys. For example, a global company may attempt to manage data associated with millions of customers, many of whom share the same or similar names.
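The key-value behavior described in this section can be sketched in a few dozen lines. This is a minimal in-memory illustration, not any product's API. It assumes a Python dict as the backing hash table, and the following are all hypothetical: the class `KeyValueStore`, the helpers `top_models` and `customer_key`, the `sales:` key prefix, the placeholder model names and unit counts, and the customer attributes. The sketch implements the four core operations and answers a "most popular models" question in application code by scanning every entry, because there is no query engine. It also shows how keying customers by name alone lets one record silently overwrite another, while a key composed from additional attributes keeps them distinct.

```python
from typing import Any, Dict, List, Optional, Tuple


class KeyValueStore:
    """Minimal in-memory key-value store backed by a dict (a hash table).

    Illustrative sketch only: no persistence, distribution, or transactions.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        """Get(key): return the value stored under key, or None if absent."""
        return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        """Put(key, value): associate value with key, replacing any old value."""
        self._data[key] = value

    def multi_get(self, *keys: str) -> List[Optional[Any]]:
        """Multi-get(key1, ..., keyN): values in the same order as the keys."""
        return [self._data.get(k) for k in keys]

    def delete(self, key: str) -> None:
        """Delete(key): remove the entry for key if it exists."""
        self._data.pop(key, None)

    def scan(self) -> List[Tuple[str, Any]]:
        """Return every entry; with no query engine, callers filter in code."""
        return list(self._data.items())


def top_models(store: KeyValueStore, n: int) -> List[str]:
    """Rank models by unit count in application code (a full scan).

    Assumes hypothetical keys of the form "sales:<model>" holding integer counts.
    """
    counts = [(key.split(":", 1)[1], value)
              for key, value in store.scan() if key.startswith("sales:")]
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return [model for model, _ in counts[:n]]


def customer_key(name: str, birth_date: str, postal_code: str) -> str:
    """Compose a key from extra attributes so same-named customers stay distinct."""
    return "|".join((name, birth_date, postal_code))


if __name__ == "__main__":
    store = KeyValueStore()

    # Invented unit counts for placeholder models (not real sales data).
    store.put("sales:model-a", 120)
    store.put("sales:model-b", 340)
    store.put("sales:model-c", 95)
    print(top_models(store, 2))  # ['model-b', 'model-a']

    # Keying by name alone: the second put silently replaces the first customer.
    store.put("J. Smith", {"city": "Leeds"})
    store.put("J. Smith", {"city": "Austin"})
    print(store.get("J. Smith"))  # {'city': 'Austin'}

    # A composed key keeps the two customers apart.
    k1 = customer_key("J. Smith", "1971-03-02", "LS1")
    k2 = customer_key("J. Smith", "1985-11-20", "78701")
    store.put(k1, {"city": "Leeds"})
    store.put(k2, {"city": "Austin"})
    print(store.multi_get(k1, k2))  # [{'city': 'Leeds'}, {'city': 'Austin'}]

    store.delete(k1)
    print(store.get(k1))  # None
```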
Duplication in the set of names will mean that the name itself will be insufficient to distinguish between entities. The upshot is that additional data attributes will need to be added to the composed character string used to generate a unique key.

Document Stores

A document store is similar to a key-value store in that stored objects are associated with (and therefore accessed via) character-string keys. The difference is that the values being stored, which are referred to as documents, provide some structure and encoding of the managed data. There are different common, standard encodings, including XML (Extensible Markup Language), JSON (JavaScript Object Notation), and BSON (a binary encoding of JSON objects). Aside from these standard approaches to packaging data, other means of linearizing the data values associated with a data record or object for the purposes of storage may be employed.

Figure 1 shows examples of data values collected as a document representing the names of specific retail stores. Note that while the three examples all represent locations, the representative models are different. The document representation embeds the structure of the model, allowing the meanings of the document values to be inferred by the application.

Figure 1: Example of document store.

One key distinction between a key-value store and a document store is that the latter embeds attribute metadata associated with stored content, which essentially provides a way to query the data based on the contents. For example, using the example in Figure 1, one could search for all documents in which MallLocation is "Wheaton Mall"; that search would deliver a result set containing all documents associated with any retail store in that particular shopping mall.

Tabular Stores

Tabular, or table-based, stores are largely descended from Google's original BigTable design to manage structured data. Hadoop's HBase model is an example of a NoSQL data management system that evolved from BigTable. (For background on BigTable design, see this paper via Google's research website.)

The tabular model allows sparse data to be stored in a three-dimensional table that is indexed by a row key (used in a fashion similar to the key-value and document stores), a column key that indicates the specific attribute for which a data value is stored, and a timestamp that may refer to the time at which the row's column value was stored.
As an example, various attributes of a Web page can be associated with the Web page's URL: the HTML content of the page, URLs of other Web pages that link to this Web page, and the author of the content. Columns in a BigTable model are grouped together as families, and the timestamps enable management of multiple versions of an object. The timestamp can be used to maintain history: each time the content changes, new column affiliations can be created with the timestamp of when the content was downloaded.
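The tabular model can be pictured as a sparse, versioned map from (row key, column key, timestamp) to a value. The sketch below is a toy in-memory illustration of that idea, applied to the Web page example, and is not the HBase or Bigtable client API. The class `TabularStore`, the `family:qualifier` column-key convention, the integer timestamps, and the URL and cell values are assumptions chosen for illustration. Rows hold only the columns actually written, which is what makes the table sparse, and each cell keeps every timestamped version so older content can still be read.

```python
import time
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# A cell's history: (timestamp, value) pairs, newest first.
Versions = List[Tuple[int, str]]


class TabularStore:
    """Toy sparse, versioned table: (row key, column key, timestamp) -> value."""

    def __init__(self) -> None:
        # row key -> column key ("family:qualifier") -> versions
        self._rows: Dict[str, Dict[str, Versions]] = defaultdict(dict)

    def put(self, row: str, column: str, value: str, ts: Optional[int] = None) -> None:
        """Write a new version of one cell; earlier versions are kept."""
        if ":" not in column:
            raise ValueError("column key must look like 'family:qualifier'")
        stamp = time.time_ns() if ts is None else ts
        versions = self._rows[row].setdefault(column, [])
        versions.append((stamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first

    def get(self, row: str, column: str, as_of: Optional[int] = None) -> Optional[str]:
        """Newest value, or the newest value written at or before as_of."""
        for stamp, value in self._rows.get(row, {}).get(column, []):
            if as_of is None or stamp <= as_of:
                return value
        return None

    def family(self, row: str, family: str) -> Dict[str, str]:
        """Newest value of every column in one column family of a row."""
        prefix = family + ":"
        return {
            column: versions[0][1]
            for column, versions in self._rows.get(row, {}).items()
            if column.startswith(prefix)
        }


if __name__ == "__main__":
    table = TabularStore()
    url = "https://www.example.com/index.html"  # hypothetical row key
    table.put(url, "contents:html", "<html>first crawl</html>", ts=100)
    table.put(url, "contents:html", "<html>second crawl</html>", ts=200)
    table.put(url, "anchor:https://www.example.org/", "Example link", ts=150)
    table.put(url, "author:name", "A. Writer", ts=100)

    print(table.get(url, "contents:html"))             # <html>second crawl</html>
    print(table.get(url, "contents:html", as_of=150))  # <html>first crawl</html>
    print(table.family(url, "anchor"))  # {'anchor:https://www.example.org/': 'Example link'}
```

A production tabular store also keeps rows ordered by row key and splits them across servers; this sketch keeps everything in one process so that the three index dimensions stay visible.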
Object Data Stores

Object data stores are essentially a hybrid approach to data storage and management; in some ways, object data stores and object databases seem to bridge the worlds of schema-less data management and the traditional relational models. On the one hand, approaches to object databases can be similar to document stores, except that while document stores explicitly serialize the object so the data values are stored as strings, object databases maintain the object structures. That is because they are bound to object-oriented programming languages such as C++, Objective-C, Java, and Smalltalk.

On the other hand, as opposed to some of the other NoSQL models, object database management systems are more likely to provide traditional ACID compliance (that is, Atomicity, Consistency, Isolation, and Durability), characteristics that are bound to database reliability. Yet this is one of the few similarities to a traditional relational database, and it is important to remember that object databases are not relational databases and are not queried using SQL.

Considerations for Implementing NoSQL

The decision to use a NoSQL data store instead of a relational model must be aligned with business users' expectations. The key question: How will the performance of a NoSQL data store compare to their experiences using relational models? As should be apparent, many NoSQL data management environments are engineered for two key criteria:

Fast accessibility, whether that means inserting data into the model or pulling it out via some query or access method; and
Scalability for volume, so as to support the accumulation and management of massive amounts of data.

Both of these criteria are addressed through distribution and parallelization, and the NoSQL styles described above are amenable to extensibility, scalability, and distribution. Moreover, these characteristic features dovetail with programming models like MapReduce that effectively manage the creation and running of multiple parallel execution threads. The key is leveraging data distribution. Fortunately, distributing a tabular data store or a key-value store allows many queries and data accesses to be performed simultaneously, especially when the hashing of the keys maps to different data storage nodes.
NoSQL methods are designed for high-performance computing for reporting and analysis, and smart data allocation strategies will enable linear performance scalability in relation to data volume.

There are many new companies that have embraced the different NoSQL models and are bringing their customized versions to market. If you are interested in NoSQL, there is little risk in trying out the different approaches, and it may make sense to develop a simple pilot project model that can be deployed in different ways to explore the similarities and differences in terms of ease of use, space performance, and execution speed.

Yet while the performance behaviors of NoSQL data management systems are appealing, they will not completely replace a relational database management system. Choosing to use NoSQL is not necessarily an easy decision. One must weigh the business requirements as well as the skills needed to transition from a traditional approach to a NoSQL approach before committing to the technology.
