3. User Devices: One of the largest sources of high-velocity data is the use of smartphones. Everything you do on your smartphone is logged, providing
valuable data.
4. Social Media: Whether it is Twitter tweets, Facebook posts, Foursquare check-ins or any number of other social data streams, these create
massive amounts of real-time data that degrades in value quickly.
5. Online Gaming: Another source of real-time data, based on user interactions not just with the game but also with other users. This group includes
Massively Multiplayer Online Games (MMOGs) like World of Warcraft as well as 1:1 games, many played on mobile phones, like Words with Friends.
6. SaaS Applications: SaaS applications typically start with a limited set of functionality. As they mature, the functionality grows and user relationships
and interactions also grow, creating a massive flow of real-time data. LinkedIn is a perfect example of this trend. This high-velocity stream of events
led LinkedIn to create Kafka, a distributed messaging system that handles the routing and delivery of high-velocity event data.
There are many more sources of high-velocity data, including vertical sources, like the flood of GIS data found in oil and gas companies. As technologies
come online to extract value from this high-velocity data, they are transforming many industries.
Traditional Database Management Systems (DBMS) simply cannot handle the high-velocity data coming from modern applications. This is a data ingestion
problem; think of a human sipping from a firehose and you'll get the idea. Hadoop provides batch processing of high-volume data, but when dealing with
high-velocity data you need real-time processing. This has led to a few innovations.
Add a SQL Interface to Hadoop
The demand for persisting and querying high-velocity data in real time has led a number of companies to add limited SQL interfaces to Hadoop. Examples
of this approach include Apache Tez (Hortonworks), Impala (Cloudera), Hadapt (Hadapt) and Apache HBase. Hadoop and HDFS weren't designed for
database requirements (in fact, their storage is based on large files, not small blocks), but corporate demand for a solution to the high-velocity data
ingestion problem is certainly strong. Hadoop is really optimized for data volume, not data velocity.
NoSQL
NoSQL is one solution to the high-velocity data ingest problem. The challenge NoSQL faces is the same challenge faced by Hadoop, namely that
corporations have standardized upon and built expertise and tools around SQL, and those skills and tools don't carry over to NoSQL databases.
In-Memory DBMS
In-memory databases eliminate the slowest piece of the traditional database, the disk, enabling databases to ingest data at a much higher rate than
traditional databases. The two big contenders in the in-memory database world are HANA (SAP) and TimesTen (Oracle). However, in-memory databases
are ill-suited to high-velocity data because their data size is limited to memory; they simply cannot handle the volume of data created by a high-velocity
data source.
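To make the memory limit concrete, here is a back-of-the-envelope sketch in Python. The ingest rate, row size and memory size are hypothetical figures chosen only for illustration; they do not describe any particular product, and real systems add overhead for indexes and metadata.

```python
def seconds_until_full(rows_per_second: int, bytes_per_row: int, memory_bytes: int) -> float:
    """Return how long a sustained stream takes to fill the given amount of memory."""
    return memory_bytes / (rows_per_second * bytes_per_row)


# Hypothetical figures, chosen only for illustration.
ROWS_PER_SECOND = 1_000_000   # sustained ingest rate
BYTES_PER_ROW = 100           # average size of one stored row
MEMORY_BYTES = 1024 ** 4      # 1 TiB of RAM available for data

hours = seconds_until_full(ROWS_PER_SECOND, BYTES_PER_ROW, MEMORY_BYTES) / 3600
print(f"Memory is full after about {hours:.1f} hours")
```

Under these assumptions even a terabyte of memory holds only a few hours of the stream, which is why longer retention requires spilling to disk.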
Extending MySQL to Handle High-Velocity Data: ScaleDB
Traditional databases, like MySQL, do not deliver sufficiently high data ingest rates to persist high-velocity data. ScaleDB changes all of that. ScaleDB
extends MySQL without changing a single line of MySQL code, so the entire ecosystem (tools, applications, etc.) works with ScaleDB. ScaleDB's new
Streaming Table technology enables a small cluster of MySQL databases to ingest millions of rows of data per second. This data is then available for
real-time manipulation using the rich tools that are already part of the MySQL ecosystem, such as Tableau Software [http://www.tableausoftware.com/],
QlikView [http://www.qlikview.com/] and LogiAnalytics [http://www.logianalytics.com/].
In addition to running leading analytics tools, persisting the data in a database gives you the ability to query the data in an ad hoc fashion. If we use the
example of a flow of colored balls, a stream processor can count green balls, or it can transform all data about red balls into orange balls. However, if you
want to ask questions of the data, across a time series, you need database functionality. For example, using a database you can ask how many red balls
were preceded by green balls, or how many orange balls we processed in the last hour, or any number of questions of any detail you need, all in an
interactive fashion.
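The colored-ball questions can be sketched as ad hoc SQL queries. The example below uses Python's built-in sqlite3 module purely as a stand-in database; the table name `balls`, its columns and the sample rows are assumptions made for illustration, and the self-join assumes sequence numbers are contiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balls (seq INTEGER PRIMARY KEY, color TEXT NOT NULL, ts INTEGER NOT NULL)"
)
# Hypothetical sample stream: (arrival order, color, arrival time in seconds).
events = [
    (1, "green", 0),
    (2, "red", 600),
    (3, "orange", 1200),
    (4, "red", 4000),
    (5, "green", 5000),
    (6, "red", 6000),
    (7, "orange", 7000),
]
conn.executemany("INSERT INTO balls VALUES (?, ?, ?)", events)


def red_preceded_by_green(conn: sqlite3.Connection) -> int:
    """Count red balls whose immediate predecessor in arrival order was green."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM balls AS cur
        JOIN balls AS prev ON prev.seq = cur.seq - 1
        WHERE cur.color = 'red' AND prev.color = 'green'
        """
    ).fetchone()
    return row[0]


def orange_in_last_hour(conn: sqlite3.Connection, now: int) -> int:
    """Count orange balls that arrived in the 3600 seconds up to and including `now`."""
    row = conn.execute(
        "SELECT COUNT(*) FROM balls WHERE color = 'orange' AND ts > ? AND ts <= ?",
        (now - 3600, now),
    ).fetchone()
    return row[0]


print(red_preceded_by_green(conn))      # 2 (balls 2 and 6)
print(orange_in_last_hour(conn, 7200))  # 1 (ball 7)
```

Neither question had to be known before the data arrived, which is the difference from a stream processor that only maintains counters defined in advance.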
Conclusion
High-velocity data accumulates over time to create big data. Think of high-velocity data as the firehose, pumping out water that forms a pond that
represents big data. Hadoop has gained popularity for providing batch-oriented processing of big data. But batch processing is deficient in that it does
not provide real-time processing or ad hoc queries.
Several classes of applications are generating high-velocity data, where Hadoop-style batch processing is insufficient. For example, a Massively Multiplayer
Online Game (MMOG) might require a high-velocity data solution that serves multiple use cases: (1) maintaining player state during and
between sessions; (2) generating real-time analytics as a mechanism for modifying game play or informing operations; (3) supporting ad hoc queries
from customer support; (4) providing real-time action-based billing; and more. In this case a brief moving window of time, as provided by stream
processing engines, is insufficient; the application requires high-velocity streaming persistence with an ad hoc, ideally SQL-based, interface.
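The limitation of a moving window can be sketched in a few lines of Python. The `MovingWindow` class, the 60-second span and the sample events are all hypothetical; a plain list stands in for persistent storage, and no real stream processing engine is being modeled.

```python
from collections import deque


class MovingWindow:
    """Keep only the events that arrived within the last `span` seconds."""

    def __init__(self, span: int) -> None:
        self.span = span
        self.events = deque()

    def add(self, ts: int, event: str) -> None:
        self.events.append((ts, event))
        # Evict everything that has aged out of the window.
        while self.events and self.events[0][0] <= ts - self.span:
            self.events.popleft()

    def count(self, name: str) -> int:
        return sum(1 for _, event in self.events if event == name)


window = MovingWindow(span=60)
history = []  # stand-in for a persistent store that keeps every event

# Hypothetical player events: (timestamp in seconds, event name).
stream = [(0, "login"), (30, "purchase"), (90, "purchase"), (200, "logout")]
for ts, event in stream:
    window.add(ts, event)
    history.append((ts, event))

window_purchases = window.count("purchase")
stored_purchases = sum(1 for _, event in history if event == "purchase")
print(window_purchases, stored_purchases)  # 0 2
```

By the time a support agent asks how many purchases the player made, the window has already discarded both; only the persisted history can answer.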
Hadoop opened up whole new possibilities for extracting value from big data, or high-volume data. This led more and more companies to start collecting
massive data, because they could extract value from it. The new wave of high-velocity data tools enables companies to extract real-time value from
high-velocity data, instead of waiting for it to pile up and then running a batch process on it. Look for more companies to recognize this opportunity to drink
upstream from their competition, using high-velocity data to become more agile, responsive and ultimately more competitive.