
What Is Big Data, Really?

The concept of Big Data is not as big as its name suggests, but it is a very popular term nowadays. According to Wikipedia's
definition, Big Data is the term for a collection of data sets so large and so complex that it becomes
difficult to process them using on-hand database management tools or traditional data processing
applications. In this article we will refine the definition of Big Data more clearly.
So let us first understand why there is a need to study this domain.
IBM estimates that as much as 90% of the data in the world today has been created in the last two years
alone, and data is still being produced at a huge rate. All of this data needs to be processed. Here the problem
begins, and thus we define the area of study as Big Data.
Let us understand this with the example of a telecom phone company, and to keep
things simple, consider your mobile phone only. Whenever it is turned on, it connects to cell
towers to get reception. As you move around, it connects to different towers, at different signal
strengths depending on how far away from them you are. All of that connection data is collected by the
phone company and logged. The company can use it to find dead spots in its coverage and to work out
which towers are the busiest and need increased capacity. The company can also offer schemes to
customers depending on their needs, simply by estimating what they actually use. For instance,
if you are not an Internet user, you will still get many attractive offers. This is what we call Business
Analytics. So in order to process this data we need some tools; a small sketch of this kind of analysis is given below.
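To make the idea concrete, here is a minimal sketch in Java of the analysis described above: counting logged connections per tower to find the busiest ones. The log file name, its field layout, and the class name are assumptions made for illustration, not the phone company's actual system.

import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

// Hypothetical example: find the busiest cell towers from a connection log.
// Assumed log format (one record per line): "timestamp,phoneId,towerId,signalDbm"
public class BusiestTowers {
    public static void main(String[] args) throws Exception {
        Path log = Paths.get(args.length > 0 ? args[0] : "connections.log");

        // Count how many connection records each tower appears in.
        Map<String, Long> countsByTower;
        try (Stream<String> lines = Files.lines(log)) {
            countsByTower = lines
                .map(line -> line.split(","))
                .filter(fields -> fields.length == 4)
                .collect(Collectors.groupingBy(
                    fields -> fields[2],          // towerId
                    Collectors.counting()));
        }

        // Print the five busiest towers: candidates for increased capacity.
        countsByTower.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(5)
            .forEach(e -> System.out.println(
                e.getKey() + " -> " + e.getValue() + " connections"));
    }
}

Of course, this single-machine version only works while the log fits on one disk; the whole point of Big Data tooling, discussed next, is what to do when it no longer does.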
What is the problem behind Big Data? OR What are the issues that make Big Data a
study area? OR Is it only the size of the data that is the problem of concern?
The answer to the above questions lies in the following section, which describes the 3Vs of Big Data.
1. Volume: The size of the data is big.
2. Variety: Data comes from various sources and in many different formats, i.e. a big
variety of data is being generated.
3. Velocity: The rate at which the data is being generated is very high.

Big Data became a pressing problem as Cloud Computing reached its peak and a big amount of varied data
was being produced at a high rate.
Then, in 2003, a technology came into existence when Google published papers on its data
processing infrastructure, which solved these same problems. Efforts were made to build a
distributed file system and a distributed processing engine that could scale to thousands of
nodes in a network. This development gave us an open source software platform for the storage and
large-scale processing of data sets on clusters of commodity hardware, which we call Apache
Hadoop.
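To give a feel for how Hadoop's distributed processing engine is actually used, here is the classic MapReduce word count program in Java, a standard introductory example built on Hadoop's public MapReduce API. It is included here as an illustrative sketch; the input and output paths are supplied by the user on the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // The mapper runs on many nodes in parallel, each over its own
    // split of the input, and emits (word, 1) for every word it sees.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // The reducer receives all counts for one word, from every mapper,
    // and sums them into the final total for that word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory on HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The key point is that the same small program runs unchanged whether the cluster has one node or thousands: the distributed file system splits the input across machines, and the framework schedules mappers and reducers near the data.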
