Big data is a broad term for data sets so large or complex that traditional data processing applications are

inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer,
visualization, and information privacy. The term often refers simply to the use of predictive analytics or
certain other advanced methods to extract value from data, and seldom to a particular size of data set.
Accuracy in big data may lead to more confident decision making, and better decisions can mean greater
operational efficiency, cost reductions and reduced risk. Analysis of data sets can find new correlations
to "spot business trends, prevent diseases, combat crime and so on." Scientists, practitioners of media
and advertising and governments alike regularly meet difficulties with large data sets in areas including
Internet search, finance and business informatics. Scientists encounter limitations in e-Science work,
including meteorology, genomics, connectomics, complex physics simulations, and biological and
environmental research. Data sets grow in size in part because they are increasingly being gathered by
cheap and numerous information-sensing mobile devices, aerial sensors (remote sensing), software logs,
cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. The
world's technological per-capita capacity to store information has roughly doubled every 40 months
since the 1980s; as of 2012, every day 2.5 exabytes (2.5×10^18 bytes) of data were created. The challenge for
large enterprises is determining who should own big data initiatives that straddle the entire organization.
Work with genuinely big data is the exception; most analysis is of "PC size" data, on a desktop PC or
notebook that can handle the available data set. Relational database management systems and desktop
statistics and visualization packages often have difficulty handling big data. The work instead requires
"massively parallel software running on tens, hundreds, or even thousands of servers". What is
considered "big data" varies depending on the capabilities of the users and their tools, and expanding
capabilities make Big Data a moving target. Thus, what is considered to be "Big" in one year will become
ordinary in later years. "For some organizations, facing hundreds of gigabytes of data for the first time
may trigger a need to reconsider data management options. For others, it may take tens or hundreds of
terabytes before data size becomes a significant consideration."
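
To put those numbers in perspective, here is a minimal back-of-the-envelope sketch in Python; the 16 GB
of RAM and 1 TB disk figures are illustrative assumptions, not values from the text:

```python
# Back-of-the-envelope scale check for the figures quoted above.
# The 16 GB RAM and 1 TB disk values are illustrative assumptions, not from the source.

EXABYTE = 10**18
DAILY_DATA_BYTES = 2.5 * EXABYTE       # ~2.5 EB created per day (2012 estimate)

desktop_ram_bytes = 16 * 10**9         # assumed 16 GB of RAM on a desktop PC
disk_bytes = 10**12                    # assumed 1 TB disk

print(f"Data created per second: {DAILY_DATA_BYTES / 86_400 / 10**12:.1f} TB/s")
print(f"1 TB disks filled per day: {DAILY_DATA_BYTES / disk_bytes:,.0f}")
print(f"Desktop RAM as a share of one day's data: {desktop_ram_bytes / DAILY_DATA_BYTES:.1e}")
```

Even generous desktop hardware holds only a vanishing fraction of a single day's worldwide output,
which is why the "massively parallel" systems described above become necessary.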

Big data usually includes data sets with sizes beyond the ability of commonly used software tools to
capture, curate, manage, and process data within a tolerable elapsed time. Big data "size" is a constantly
moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data. Big data is a set
of techniques and technologies that require new forms of integration to uncover large hidden values
from large datasets that are diverse, complex, and of a massive scale. In a 2001 research report and
related lectures, META Group (now Gartner) analyst Doug Laney defined data growth challenges and
opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of
data in and out), and variety (range of data types and sources). Gartner, and now much of the industry,
continue to use this "3Vs" model for describing big data. In 2012, Gartner updated its definition as
follows: "Big data is high volume, high velocity, and/or high variety information assets that require new
forms of processing to enable enhanced decision making, insight discovery and process optimization."
Additionally, some organizations have added a fourth V, "Veracity", to describe it. While Gartner’s
definition (the 3Vs) is still widely used, the growing maturity of the concept fosters a sharper distinction
between big data and Business Intelligence, regarding data and their use: Business Intelligence uses
descriptive statistics on data with high information density to measure things and detect trends, whereas
big data uses inductive statistics and concepts from nonlinear system identification to infer laws
(regressions, nonlinear relationships and causal effects) from large sets of data with low information
density, in order to reveal relationships and dependencies and to predict outcomes and behaviors. A more recent,
consensual definition states that "Big Data represents the Information assets characterized by such a
High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its
transformation into Value".
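
As an illustration of that distinction (the numbers are invented and the code is only a sketch, not part of
any cited definition), the snippet below contrasts descriptive statistics of the kind Business Intelligence
relies on with a simple inductive step: fitting a least-squares trend to infer a relationship and predict an
unseen value, using only the Python standard library:

```python
import statistics

# Hypothetical daily sales figures (high information density: each value is directly meaningful).
sales = [120, 135, 150, 160, 172, 185, 199]

# Business-Intelligence style: descriptive statistics that *measure* what happened.
print("mean:", statistics.mean(sales))
print("stdev:", round(statistics.stdev(sales), 2))

# Big-data style (in miniature): inductive statistics that *infer* a law from the data.
# Fit y = a + b*x by ordinary least squares, then predict the next, unseen day.
days = list(range(len(sales)))
mean_x, mean_y = statistics.mean(days), statistics.mean(sales)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, sales)) / \
    sum((x - mean_x) ** 2 for x in days)
a = mean_y - b * mean_x
print("fitted trend: sales = %.1f + %.1f * day" % (a, b))
print("predicted day 7:", round(a + b * 7, 1))
```

The descriptive numbers summarize the data as given; the fitted trend is an inferred law used to predict a
value that was never observed, which is the essence of the inductive approach described above.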

“Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has
been created in the last two years alone. This data comes from everywhere: sensors used to gather
climate information, posts to social media sites, digital pictures and videos, purchase transaction records,
and cell phone GPS signals to name a few” [1]. Such a colossal amount of data, produced continuously, is
what can be termed Big Data. Big Data decodes previously untouched data to derive new insight that gets
integrated into business operations. However, as the amount of data increases exponentially, current
techniques are becoming obsolete. Dealing with Big Data requires comp. Big Data can be simply defined
by explaining the 3Vs – volume, velocity and variety – which are the driving dimensions of Big Data
quantification. Gartner analyst Doug Laney [3] introduced the famous 3Vs concept in his 2001 META Group
publication, ‘3D Data Management: Controlling Data Volume, Velocity and Variety’.

Figure-1: Schematic representation of the 3Vs of Big Data [4]

Volume: The increase in data
volume in enterprise-type systems is caused by the amount of transactions and other traditional data
types, as well as by new data types. Too much data becomes a storage problem, but it also has a great
impact on the complexity of data analysis. This essentially concerns the large quantities of data that are
generated continuously. Initially, storing such data was problematic because of high storage costs;
with decreasing storage costs, this problem has been kept somewhat at bay for now.
However, this is only a temporary solution and better
technology needs to be developed. Smartphones, e-commerce and social networking websites are
examples where massive amounts of data are being generated. This data can be easily distinguished as
structured, unstructured or semi-structured.
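
As a small, hypothetical illustration of that three-way distinction, the same kind of information can arrive
as a structured row, a semi-structured JSON document, or unstructured free text:

```python
import csv, io, json

# Structured: fixed schema, every field in a known position (e.g. a relational table / CSV row).
structured = "order_id,customer,amount\n1001,Alice,25.50\n"
row = next(csv.DictReader(io.StringIO(structured)))
print(row["customer"], row["amount"])

# Semi-structured: self-describing but flexible schema (e.g. JSON from a web or mobile app).
semi_structured = '{"order_id": 1001, "customer": "Alice", "items": [{"sku": "A17", "qty": 2}]}'
doc = json.loads(semi_structured)
print(doc["items"][0]["sku"])

# Unstructured: no schema at all; meaning must be extracted (text, images, audio, video).
unstructured = "Alice phoned to say her order arrived late but the refund was processed quickly."
print("complaint" if "late" in unstructured else "no complaint")
```
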
Velocity: Refers to both the speed with which data is produced and the speed with which it must be
processed to meet demand. This involves data flows, the creation of structured records, and availability
for access and delivery. The speed of data generation, processing and analysis is continuously increasing
due to real-time generation processes, requests resulting from combining data flows with business
processes, and decision-making processes. The velocity of data processing must be high, while the
processing capacity depends on the type of processing applied to the data flows. In what now seems like
prehistoric times, data was processed in batches; however, this technique is only feasible when the
incoming data rate is slower than the batch processing rate and the delay is not much of a hindrance. At
present, the speed at which such colossal amounts of data are being generated is unbelievably high.
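
That batch-versus-velocity trade-off can be made concrete with a toy sketch (the event stream is simulated
in memory; no real ingestion framework is involved): a batch job waits until a window of records has
accumulated, while a streaming-style consumer updates its result as each record arrives:

```python
from collections import deque

# Simulated incoming measurements; in reality these would arrive continuously over the network.
incoming = [7, 3, 9, 4, 12, 5, 8, 6, 10, 2]

# Batch processing: accumulate a fixed-size window, then process it all at once.
BATCH_SIZE = 5
buffer = deque()
for value in incoming:
    buffer.append(value)
    if len(buffer) == BATCH_SIZE:
        print("batch average:", sum(buffer) / len(buffer))
        buffer.clear()

# Streaming processing: maintain a running aggregate, updated the moment each value arrives.
count, running_sum = 0, 0
for value in incoming:
    count += 1
    running_sum += value
    print(f"after {count} events, running average = {running_sum / count:.2f}")
```

If records arrive faster than a batch can be processed, the buffer in the first loop grows without bound,
which is exactly the feasibility limit described above.
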
Variety: Converting large volumes of transactional information into decisions has always been a challenge
for IT leaders, although in the past the types of generated or processed data were less diverse, simpler
and usually structured. Currently, more information coming from new channels and emerging
technologies - mainly from social media, the Internet of Things, mobile sources and online advertising - is
available for analysis and generates semi-structured or unstructured data. This includes tabular data
(databases), hierarchical data, documents, XML, emails, blogs, instant messaging, click streams, log files,
metering data, images, audio, video, information about share prices (stock tickers), financial transactions, etc.

Implementing Big Data is a
mammoth task given the large volume, velocity and variety. “Big Data” is a term encompassing the use of
techniques to capture, process, analyze and visualize potentially large datasets in a reasonable timeframe
not accessible to standard IT technologies. By extension, the platform, tools and software used for this
purpose are collectively called “Big Data technologies” [7]. Currently, the most commonly implemented
technology is Hadoop. Hadoop is the culmination of several other technologies such as the Hadoop
Distributed File System (HDFS), Pig, Hive and HBase.
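
To give a feel for the "massively parallel" model that Hadoop implements, the following is a deliberately
tiny, single-process sketch of MapReduce-style word counting in plain Python; real Hadoop distributes the
map, shuffle and reduce phases across HDFS blocks and many machines, which this sketch does not attempt:

```python
from collections import defaultdict

# Toy input "splits", standing in for file blocks stored in HDFS.
blocks = [
    "big data requires new forms of processing",
    "hadoop processes big data with mapreduce",
]

# Map phase: each block independently emits (key, value) pairs - trivially parallelizable.
def map_block(text):
    return [(word, 1) for word in text.split()]

# Shuffle phase: group all values by key, as the framework would do between map and reduce.
grouped = defaultdict(list)
for block in blocks:
    for word, count in map_block(block):
        grouped[word].append(count)

# Reduce phase: combine the grouped values for each key into the final result.
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)
```

Pig and Hive, mentioned above, are higher-level layers that generate this kind of map/shuffle/reduce work
rather than requiring it to be written by hand.
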
However, even Hadoop and other existing techniques will be highly incapable of dealing with the
complexities of Big Data in the near future. The following are a few cases where standard processing
approaches will fail due to Big Data.
