
EDITOR'S NOTE
VARIED NOSQL OPTIONS NEED CAREFUL WEIGHING, SORTING
FIT BY FIT, NOSQL DATABASES VIE TO DISPLACE RDBMSes
NOSQL JUST ONE PART OF IT MIX ON BIG DATA PROJECTS
NoSQL Software Adds New Database Choices, Challenges
They're often a better fit for big data than mainstream relational technology is, but the diversity of
NoSQL databases can be baffling. To avoid going in the wrong direction, you need to crack the code.
EDITOR'S NOTE
Recognition Arrives for NoSQL. Adoption to Follow?
NoSQL databases, upstart technologies that offer more design flexibility than SQL-based relational software does, have started being accepted into the mainstream IT fraternity. For example, Gartner included five NoSQL vendors when it plotted the top providers of operational database management systems in a Magic Quadrant report issued by the consulting company in late 2013. One of those vendors also made it into the ranks of leading data warehouse database developers in a similar report published in March 2014.
But NoSQL technology still hasn't found a place in many user organizations. In a survey of IT and business professionals conducted by The Data Warehousing Institute (TDWI) in November 2013, only 11% of the 538 respondents said their organizations were using NoSQL databases in their primary data warehouse architectures. Another 24% said they planned to do so within three years, but that left 65% with no adoption plans for NoSQL. Even in a survey of people with experience managing big data environments, done by TDWI earlier in 2013, just 32% of the 189 respondents said they had deployed NoSQL systems, the lowest adoption rate among six types of technology platforms.
In its report on operational databases, Gartner said that more of its clients were starting to use NoSQL products for specific purposes, such as running Web applications requiring high scalability. To help you decide if you have a use for NoSQL technologies, the three stories in this guide examine what they're suited for. First, we assess the four primary categories of NoSQL databases. Next we explore NoSQL's fit-for-purpose nature. We close by looking at the mix of data management platforms typically needed to support big data applications.
Craig Stedman
Executive Editor, SearchDataManagement
GETTING STARTED
Varied NoSQL Options Need Careful Weighing, Sorting
NoSQL databases are designed to address processing issues created by expanding data volumes and diversity, particularly in big data applications. But there's no lack of either volume or diversity in the NoSQL ranks, leaving IT and data managers with lots of alternatives to sort through when evaluating technology options.
"There are so many NoSQL databases today; I think we're challenged by two or three on a daily basis," quipped Michael Simone, global head of CitiData platform engineering at Citigroup Inc., during a presentation at the 2014 MongoDB World conference in New York. In reality, Citi currently has limited itself to using the MongoDB database as a NoSQL alternative to relational software in a small number of applications, Simone said. But his joke pointed to the need for organizations considering NoSQL technologies to focus on finding the one that can best solve their application problems.
That starts with understanding the different types of NoSQL databases, which are broken down into four primary categories: document databases, key-value stores, wide column stores and graph databases. They all share some common traits, most notably support for more flexible and dynamic database designs than are feasible in SQL-based relational databases. But each NoSQL category is suited to particular uses, according to Gartner analyst Nick Heudecker. In figuring out which way to go, he said, you should ask yourself what kind of data you're working with and how your applications are going to use that data.
For example, document databases are often used in content management systems and to collect and process data from high-volume Web and mobile applications for uses such as application monitoring. Befitting their name, these databases store data elements in document-like structures, which can be simple, sometimes to the point of being schema-less. MongoDB, CouchDB, Couchbase Server and MarkLogic are prominent examples of document databases.
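To make the schema-less idea concrete, here is a minimal Python sketch of a document-style collection. It is an in-memory illustration only; the `DocumentCollection` class and its field names are hypothetical and do not reproduce the API of MongoDB or any other product named above.

```python
class DocumentCollection:
    """Toy in-memory collection: each document is a dict, and documents
    in the same collection need not share the same fields."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc):
        """Store a copy of the document and return its generated id."""
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc)
        return doc_id

    def find(self, **criteria):
        """Return documents whose fields equal every given criterion."""
        return [
            doc for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


store = DocumentCollection()
store.insert({"type": "article", "title": "NoSQL basics", "tags": ["db"]})
# A later record carries different fields; no schema change is needed.
store.insert({"type": "event", "app": "mobile", "latency_ms": 42})

events = store.find(type="event")
```

The point of the sketch is that the second insert succeeds even though it shares only one field with the first, which is the kind of flexibility a fixed relational table definition would not allow without a schema change.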
Simone said Citi's use of MongoDB originated with application developers who were looking for a way to deal with data replication problems in an online financial application with a variety of data structures. The application was initially deployed on a relational database, but processing the data with that platform was slow and prone to errors. "It became clear that we couldn't keep up with all the data formats coming from the data scientists," he said.
A MORE DYNAMIC APPROACH
MongoDB's support for dynamic schemas turned out to be a good fit for the rapidly evolving application, according to Simone. "We found that we could model everything that came at us," he said. The modeling work also could be done much faster than with the relational approach: The developers built a preproduction model on MongoDB in just four months.
Key-value databases, such as Aerospike, Redis and Riak, are the simplest form of NoSQL software; they pair unique keys with their associated values in data elements, with a goal of enabling ultrafast application performance against relatively simple data sets. "Key-value stores are incredibly lightweight," said Joe Caserta, president of consulting and technical services provider Caserta Concepts. "We can do lookups in seconds."
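The key-value model described above can be sketched in a few lines of Python. This is an assumption-laden toy, not the interface of Aerospike, Redis or Riak: the `KeyValueStore` class and the `session:` key format are invented for illustration.

```python
class KeyValueStore:
    """Toy key-value store: each unique key maps to one opaque value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Insert or overwrite the value stored under key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Look up a value directly by its key."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed."""
        return self._data.pop(key, None) is not None


sessions = KeyValueStore()
sessions.put("session:1001", {"user": "alice", "cart_items": 3})
sessions.put("session:1002", {"user": "bob", "cart_items": 0})

found = sessions.get("session:1001")
missing = sessions.get("session:9999")
```

The store never inspects the value; every operation goes through the key. That narrow interface is what keeps this category simple, at the cost of richer queries.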
Flywheel Software Inc. uses Riak, developed by Basho Technologies, to run a mobile app that lets users hail taxis by tapping on their smartphones. Cuyler Jones, former chief architect at Flywheel, said the database can scale to meet the company's needs. Just as important is its high-availability nature and support for consistent data access times, added Jones, who now works at another startup.
Wide column stores keep data in tables that can have very large numbers of columns, offering the opportunity for high levels of performance and scalability in processing large data sets. Favored uses include Internet search and other large-scale Web applications as well as petabyte-level analytics apps; Accumulo, Cassandra and HBase are among the databases in the wide column category.
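As a rough illustration of the wide column layout, the Python sketch below stores each row as a sparse map of column names to values, so different rows can carry different, and potentially very many, columns. The `WideColumnTable` class, the row keys and the column names are all hypothetical; real systems such as HBase or Cassandra add column families, timestamps and distribution across nodes, none of which is modeled here.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy wide column table: row key -> {column name -> value}."""

    def __init__(self):
        # Rows are sparse: a row only stores the columns it actually has.
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Set a single cell, creating the row if needed."""
        self._rows[row_key][column] = value

    def get_row(self, row_key):
        """Return a copy of all columns stored for one row."""
        return dict(self._rows.get(row_key, {}))

    def scan_column(self, column):
        """Return {row key: value} for every row that has the column."""
        return {
            row_key: cols[column]
            for row_key, cols in sorted(self._rows.items())
            if column in cols
        }


table = WideColumnTable()
table.put("sample:001", "marker:a", "AG")
table.put("sample:001", "marker:b", "CC")
table.put("sample:002", "marker:a", "AA")

marker_a = table.scan_column("marker:a")
row_two = table.get_row("sample:002")
```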
The column-based approach was a good match for a DNA matching application launched in 2012 by Ancestry.com, according to Jeremy Pollack, a development manager at the online provider of family history data. The Provo, Utah, company uses HBase in combination with Hadoop to run DNA calculations that help customers trace their ethnic backgrounds and geographic origins and look for unknown relatives.
WONKS WANTED FOR DATABASE TUNING
Getting the desired performance from the database required considerable tuning and tweaking, said Pollack, who described HBase programming as a "wonky" process. "There are a million buttons you can dial or tune," he said. "You have to be willing to get your hands dirty." But the NoSQL technology enables Ancestry to rapidly compare 700,000 data points in new and stored DNA samples to look for matching characteristics.
Graph databases, including InfiniteGraph and Neo4j, store related data elements in graph-like structures that exploit their associative qualities to power applications such as recommendation engines and social networks. For example, graph technology can be used to map the relationships between different people as well as their interests, said Alex Trofymenko, head of technology at HealthUnlocked, a London-based company that operates a website supporting user forums on different medical topics.
Trofymenko and his team use Neo4j, from Neo Technology, to do such mappings. "We can get a lot of information in a graph database," he said. "Say a user is very interested in diabetes, or exercise: you see it." That's important for a site that seeks to take millions of free-text searches, relate them to relevant health terms and build a data platform that helps users find information about possible treatment and assistance.
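The people-and-interests mapping described above can be sketched as a small undirected graph in Python. This is a toy under stated assumptions, not Neo4j's data model or query language: the `InterestGraph` class and the `user:` and `topic:` node names are made up for the example.

```python
from collections import defaultdict


class InterestGraph:
    """Toy undirected graph linking users to the topics they follow."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add_edge(self, a, b):
        """Connect two nodes in both directions."""
        self._edges[a].add(b)
        self._edges[b].add(a)

    def neighbors(self, node):
        """Return a copy of the nodes directly connected to node."""
        return set(self._edges.get(node, set()))

    def related(self, node):
        """Return nodes exactly two hops away, e.g. users who share
        at least one topic with the given user."""
        two_hops = set()
        for middle in self.neighbors(node):
            two_hops |= self.neighbors(middle)
        two_hops.discard(node)
        return two_hops - self.neighbors(node)


graph = InterestGraph()
graph.add_edge("user:ann", "topic:diabetes")
graph.add_edge("user:bob", "topic:diabetes")
graph.add_edge("user:bob", "topic:exercise")
graph.add_edge("user:cat", "topic:exercise")

ann_related = graph.related("user:ann")
bob_related = graph.related("user:bob")
```

Walking edges in this way is the basic move behind a recommendation: users who sit two hops apart through a shared topic are candidates to be shown each other's discussions.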
With the various technology options that the emergence of NoSQL software has added, the database selection process is very different than it was just a few years ago, when, in Caserta's words, you asked, "Should I go with Microsoft, Oracle or IBM?"
The wider array of choices can be a good thing for user organizations, as long as they manage the process carefully and avoid going down the wrong database path.
Jack Vaughan
SOFTWARE SELECTION
Fit by Fit, NoSQL Databases Vie to Displace RDBMSes
Cassandra, HBase, MongoDB: they're just a few of the many NoSQL databases looking to solve problems encountered by the relational database management systems that have long ruled the IT roost. But the very variety that makes the NoSQL sector so vibrant can make comparing different products challenging for would-be users.
First, it's reasonable to ask why NoSQL technologies matter at all. The short answer is that large-scale distributed processing is taking hold in more applications, thus exposing some of the creaky flooring on which the RDBMS sits. In Web and enterprise applications alike, a new reality has been emerging: The relational database may not always be the best fit.
For example, relational software can be too expensive to scale out in widely distributed applications. It doesn't easily adapt to new styles of data, such as the unstructured information that's common in big data applications. And it struggles with the massive data volumes coming from in-the-field sensors and Web server activity logs.
As IT managers and software developers have found more reasons to move work off of incumbent relational databases, what has emerged is a "fit for purpose" mentality of the kind that was prevalent before the RDBMS became the all-purpose flour in the database server pantry. And the number of NoSQL database options developed to fit various purposes has grown greatly.
SOCIAL CLIMBER HAD BIG BACKER
Like some other NoSQL technologies, the Apache Cassandra database came about because of a big Web 2.0 fish, in this case Facebook. It created Cassandra to enable users of the social network to search their inboxes. When the database was launched in 2008, it supported replication across geographically distributed data centers to quickly service the searches of as many as 100 million users.
Cassandra is a distributed key-value database that uses a wide column-store scheme and a peer-to-peer, shared-nothing architecture. Its design incorporates some of the characteristics of Google BigTable and Amazon Dynamo, two early and influential NoSQL databases. Along the way, Cassandra has gained support for MapReduce, as well as a SQL-like query language, triggers and lightweight transactions, all features commonly built into relational databases.
Facebook eventually replaced its Cassandra-based search system with a Hadoop implementation that includes HBase, another NoSQL database. But after the company released the software for open source development, a community arose to carry it forward, and Cassandra became a top-level project at the Apache Software Foundation in 2010.
Cassandra represented a good fit for the needs of Internet Identity, said Jason Atlas, vice president of technology and engineering at the security services company in Tacoma, Washington. Known as IID, the company had a rapidly growing database of IP addresses running on a MySQL-based cluster. But for cost and other reasons, the relational MySQL path didn't seem tenable going forward, according to Atlas.
IID was harvesting and collecting 600,000 unique IPv4 addresses and host names per week. Related metadata collections were also growing. "We started to see that we couldn't store more than 30 days of information at one time," Atlas said. "The problems largely revolved around scale." He added that the IPv4 data lent itself to a key-value approach, which ultimately led IID to DataStax Enterprise, a commercial version of Cassandra.
ONWARD AND UPWARD
Cassandra was developed to run on commodity clusters, and its focus on scalability has borne fruit at IID: Atlas said the database is coming as close to linear scaling as any technology he has seen. But he cautioned others who are looking to embrace NoSQL databases that it's unwise to force-fit technologies into IT environments. "It's always best to map the problem onto the solution," he said.
Some NoSQL vendors are becoming household names in database circles. For example, DataStax and a quartet of other NoSQL database makers (Basho Technologies, Couchbase, MarkLogic and MongoDB) were listed among the leading vendors of operational database management systems in a Gartner Magic Quadrant report published in October 2013. But there are dozens of NoSQL offerings in several distinct product categories, and different databases in the same category were built to support different uses. It's all a bit of a maze to navigate.
On Twitter, Gartner analyst Merv Adrian pointed to a Linux Journal reader poll comparing NoSQL databases. Adrian deadpanned: "In related news, do you prefer apples, cocktails or broccoli?" I tweeted that I understood his point. His response: "It's useless, and meaningless, to compare NoSQL products that are so wildly different in structure and intent."
Atlas made a similar point. "MongoDB and Cassandra are both called NoSQL databases but have nothing to do with one another," he said. "Their use cases are very different."
Caveat emptor: if you aren't careful in sorting out which NoSQL technology best fits the particular application you need to run, your organization may end up fit to be tied over its choice of software.
Jack Vaughan
IMPLEMENTATION
NoSQL Just One Part of IT Mix on Big Data Projects
When people think about big data technologies, Hadoop and NoSQL databases are usually the first things that come to mind. But in many cases, big data environments are supported by a mix of data management platforms, and Hadoop clusters and NoSQL systems aren't the predominant ones being tapped by organizations.
For example, a survey conducted in 2013 by Enterprise Management Associates (EMA) and 9sight Consulting found that NoSQL and Hadoop ranked sixth and eighth, respectively, on a list of eight technology platforms being used as a part of big data projects. Traditional technologies, such as analytical databases, operational data stores and enterprise data warehouses, were deployed more broadly than the putative big data duo, according to the survey of 259 IT and business professionals.
EMA analyst John Myers said a similar survey the year before validated the buzzwords about big data: what it was, what it wasn't. By comparison, he added, the 2013 survey found an increasing number of organizations that were moving forward on projects and bringing big data tools and applications into their operational workflows and processes.
Hadoop and NoSQL software are clearly part of the picture, Myers said, but they aren't synonymous with big data. Only 16% of the survey respondents said they were using Hadoop; for NoSQL, it was 22%. To power their big data programs, many of the companies represented in the survey are creating what EMA calls a "hybrid data ecosystem," with a blend of old and new technologies. "So it's not one platform to rule them all, so to speak," he said, "but rather, how do you coordinate between a series of data management platforms to meet these challenges?"
RAPID RESPONSE REQUIRED
One of the big challenges, he noted, is meeting the need for speed in big data analytics applications. According to Myers, the prominent applications turned up by the EMA-9sight survey included risk management and asset optimization, things that are core in an operational business. Such uses don't necessarily involve continuous real-time analytics, Myers said. But when data scientists, business analysts and other end users run analytical queries, they need to be able to hit the button and get that speed-of-response back.
SumAll Inc., a marketing analytics services startup, faced just that issue with its clients. The New York company collects and analyzes large amounts of data about website traffic and social media advertising campaigns for small businesses; it uses a cloud-based implementation of MongoDB's namesake NoSQL database to capture all the data, but the technology wasn't a good analytics platform, said Korey Lee, SumAll's chief information officer. MapReduce-based queries were taking hours, if not days, to run on MongoDB.
The IT team first tried to export the data to a MySQL database for analysis, but Lee said that process also started taking too much time as the company collected more data. So in late 2013, SumAll turned to data warehouse software from vendor BitYota that supports SQL queries against non-SQL data. Lee said the software, also cloud-based, adds a mapping layer on top of MongoDB that enables SumAll to query its full store of data using familiar SQL tools.
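The general idea of a mapping layer, exposing document-style records to SQL, can be illustrated with a small Python sketch built on the standard library's `sqlite3` module. It is not BitYota's implementation, and the sample documents, the `events` table and the `load_documents` helper are assumptions invented for this example; a real product would query the data in place rather than copy it.

```python
import sqlite3

# Hypothetical documents with differing fields, as a document store might hold.
documents = [
    {"site": "shop-a", "channel": "social", "visits": 120},
    {"site": "shop-a", "channel": "search", "visits": 80, "campaign": "spring"},
    {"site": "shop-b", "channel": "social", "visits": 45},
]


def load_documents(conn, docs, columns):
    """Flatten each document into a fixed column list; fields a document
    lacks are stored as NULL."""
    column_defs = ", ".join(f'"{name}"' for name in columns)
    conn.execute(f"CREATE TABLE events ({column_defs})")
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(doc.get(name) for name in columns) for doc in docs]
    conn.executemany(f"INSERT INTO events VALUES ({placeholders})", rows)


conn = sqlite3.connect(":memory:")
load_documents(conn, documents, ["site", "channel", "visits", "campaign"])

# Once mapped to columns, the data can be queried with ordinary SQL.
totals = conn.execute(
    "SELECT site, SUM(visits) FROM events GROUP BY site ORDER BY site"
).fetchall()
```

The design trade-off shown here is that the mapping step has to decide on a column list up front, which is exactly the schema decision the document store itself did not require.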
GO YOUR OWN WAY
Other organizations might need to take different approaches, though. In the big data era, enterprise architectures are no longer nice, neat and replicable from company to company, said William McKnight, president of McKnight Consulting Group. In fact, he uses the concept of a "no-reference architecture" in discussing the technologies that could be incorporated into a big data ecosystem. "Every company is different," McKnight said. "Gone are the days when a vendor or a consultant could walk into a shop with a laminated sheet of paper and say, 'This is what everybody needs to do.'"
And McKnight agreed with Myers that in most cases, a big data environment requires a variety of technology platforms. "Everybody has a dirty sheet of paper right now with all sorts of lines crisscrossing about data integration," he said. "The idea is just to keep moving it forward, though; keep moving it forward into a modern architecture that stores all data and serves it up to the user community."
The traditional data warehouse still has a role to play in supporting basic reporting needs, McKnight said. But technologies such as columnar databases and in-memory processing systems might also be called for, and the same goes for Hadoop and NoSQL software. According to McKnight, the latter are starting to find their way into most large companies, for prototyping and proofs of concept, if not necessarily for production uses at this point. They have to work with other types of technologies, though. Despite all the hoopla, Hadoop and NoSQL are not going to do away with the data warehouse or relational databases in general.
Craig Stedman
ABOUT THE AUTHORS
JACK VAUGHAN is news and site editor of SearchDataManagement. He covers topics such as big data management, data warehousing, databases and data integration. Vaughan previously was an editor for TechTarget's SearchSOA, SearchVB, TheServerSide and SearchDomino websites. Email him at jvaughan@techtarget.com and follow him on Twitter: @JackVaughanatTT.
CRAIG STEDMAN is an executive editor in TechTarget's Business Applications and Architecture Media Group. Stedman oversees editorial processes and writes for SearchBusinessAnalytics and SearchDataManagement as well as the SearchOracle and SearchSQLServer websites. Email him at cstedman@techtarget.com and follow him on Twitter: @craigstedman.
NoSQL Software Adds New Database Choices, Challenges
is a SearchDataManagement.com e-publication.
Scot Petersen | Editorial Director
Jason Sparapani | Managing Editor, E-Publications
Joe Hebert | Associate Managing Editor, E-Publications
Craig Stedman | Executive Editor
Linda Koury | Director of Online Design
Neva Maniscalco | Graphic Designer
Doug Olender | Publisher | dolender@techtarget.com
Annie Matthews | Director of Sales
amatthews@techtarget.com
TechTarget
275 Grove Street, Newton, MA 02466
www.techtarget.com
© 2014 TechTarget Inc. No part of this publication may be transmitted or reproduced in any form or by any means without written permission from the publisher. TechTarget reprints are available through The YGS Group.
About TechTarget: TechTarget publishes media for information technology professionals. More than 100 focused websites enable quick access to a deep store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access to independent expert commentary and advice. At IT Knowledge Exchange, our social community, you can get advice and share solutions with peers and experts.
COVER ART: THINKSTOCK
STAY CONNECTED!
Follow @SearchDataManagement on Twitter
