Definition
Characteristics
Challenges
analytical techniques and
Benefits of big data through our presentation.
Moving ahead:
Next, Arpan will explain the need for, the sources of and the appeal of
big data:
Several points explain why we need big data and
its solutions:
Firstly, it is a fact that 90% of the data present today has been
generated in the last two years, which means the growth is
exponential and really very high.
Moreover, 80% of the data is unstructured or exists in widely
varying structures, which are difficult to analyse.
And among the structured data, we face a limitation with
respect to handling large quantities.
The next point is that it becomes difficult to integrate information
distributed across multiple systems, that is, different sources.
Next, organisations and businesspeople face difficulty in
analysis.
And regarding backups, databases hold a lot of useful
data, but its life span is really short because of storage
problems.
Then we also face the issue of potentially valuable data being
discarded, or of not getting enough value out of it.
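To put the first point in numbers, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that data volume grows at a constant exponential rate. The 90% and two-year figures are the ones quoted above; everything else is derived from them.

```python
import math

# Figures quoted in the talk.
share_generated_recently = 0.90  # share of today's data created in the window
window_years = 2.0               # length of that window, in years

# If 90% is new, the old 10% has grown into 100%: a tenfold increase.
growth_over_window = 1.0 / (1.0 - share_generated_recently)

# Assumption: growth is a constant exponential, so the yearly factor is
# the window growth raised to the power 1 / window_years.
annual_growth_factor = growth_over_window ** (1.0 / window_years)

# Time needed for the total volume to double under the same assumption.
doubling_time_years = window_years * math.log(2) / math.log(growth_over_window)

print(f"growth over {window_years:.0f} years: {growth_over_window:.1f}x")
print(f"implied yearly growth: {annual_growth_factor:.2f}x")
print(f"implied doubling time: {doubling_time_years:.2f} years")
```

Under that assumption, the quoted figure means roughly a tenfold increase every two years, or a little over threefold per year.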
Types of data:
This is one of the most important aspects of our discussion today.
It differentiates data according to its type, and
the later analysis depends on the categorization done here:
first structured, then unstructured and finally hybrid or
multi-structured.
Structured data can be processed and stored by traditional
methods, like:
(state the points from the image)
MySQL
Mainframe
Oracle
Db2
Sybase
Access, Excel, txt etc.
Teradata
Netezza, other MPP
SAP, JDE, JDA etc.
Unstructured data does not have a formal data model
but might have some semblance of structure, e.g. XML files,
and comes from sources like:
Social media
Digital
Video
Audio
Geospatial data
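The difference between the first two kinds can be sketched with Python's standard library. The sample records and field names below are made up for illustration. The CSV rows all share one fixed set of columns, while the XML items (the example of loosely structured data mentioned above) each carry a different set of fields.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same columns (hypothetical sample data).
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
structured_fields = [sorted(row.keys()) for row in rows]

# Loosely structured: XML items need not share the same fields.
xml_text = """
<posts>
  <post><user>asha</user><text>Hello</text></post>
  <post><user>ravi</user><video>clip.mp4</video><place>Delhi</place></post>
</posts>
""".strip()
root = ET.fromstring(xml_text)
xml_fields = [sorted(child.tag for child in post) for post in root]

print("CSV fields per record:", structured_fields)
print("XML fields per record:", xml_fields)
```

A traditional database table can hold the CSV rows as they are. The XML items would first need a decision about which fields to keep and how to handle the missing ones.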
The third kind, hybrid, has mixed properties. It comes
from emerging market data:
E-commerce
Weather
Currency conversions
Demographics etc.
Types include POS, POL, IR etc.
Last is
Interpretation
Human Collaboration
With big data, analysts have not only more data to work with,
but also the processing power to handle large numbers of
records with many attributes. Traditional machine learning uses
statistical analysis based on a sample of a total data set. You
now have the ability to work with very large numbers of records and
very large numbers of attributes per record, and that increases
predictability.
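Here is a small Python sketch of why working on every record, rather than a sample, can matter. The data set is synthetic and the numbers (one million records, one rare record in every 50,000, a sample of 1,000) are arbitrary choices for illustration, not figures from the talk.

```python
import random

TOTAL_RECORDS = 1_000_000
RARE_EVERY = 50_000     # one record in 50,000 carries the rare flag
SAMPLE_SIZE = 1_000

# Synthetic data set: True marks a record with the rare behaviour.
records = [i % RARE_EVERY == 0 for i in range(TOTAL_RECORDS)]

# Full scan: counts every rare record exactly.
rare_in_full_data = sum(records)

# Traditional approach: estimate the total from a random sample.
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(records, SAMPLE_SIZE)
rare_in_sample = sum(sample)
estimated_total = rare_in_sample * TOTAL_RECORDS // SAMPLE_SIZE

print("rare records found by full scan:", rare_in_full_data)
print("rare records seen in the sample:", rare_in_sample)
print("total estimated from the sample:", estimated_total)
```

With so few rare records, a sample of this size will usually contain none of them, so the estimate comes out as zero, or far too high if one happens to be drawn. The full scan has no such problem.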
The combination of big data and compute power also lets
analysts explore new behavioral data throughout the day, such
as websites visited or location. This is sparse data, because to
find something of interest you must wade through a lot of data
that doesn't matter. Now you can find which variables are best
analytically by throwing huge computing resources at the
problem. It really is a game changer.
Enabling real-time analysis and predictive modeling out of the
same Hadoop core is where the interest is for us. The
problem has been speed, with Hadoop taking up to 20 times
longer to get questions answered than more established
technologies. Hence the interest in Apache Spark, a large-scale data processing
engine, and its associated SQL query tool, Spark SQL. Spark has
fast interactive querying as well as graph services and
streaming capabilities. It keeps the data within Hadoop while
giving enough performance to close the gap for us.
5. SQL on Hadoop: Faster, better
If you're a smart coder and mathematician, you can drop data
in and do an analysis on anything in Hadoop. That's the
promise and the problem. But we need someone to put it
into a format and language structure that we are familiar with.
That's where SQL for Hadoop products come in, although any
familiar language could work. Tools that support SQL-like
querying let business users who already understand SQL apply
similar techniques to that data. SQL on Hadoop opens the door
to Hadoop in the enterprise without having to write scripts using Java,
JavaScript and Python, something Hadoop users have
traditionally needed to do.
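The contrast between scripting and SQL-like querying can be sketched in Python. This is only an illustration: the built-in sqlite3 module stands in for a SQL-on-Hadoop engine (it is not Hive or Spark SQL), and the `visits` table and its rows are invented. The same question is answered once with a hand-written script and once with a SQL query.

```python
import sqlite3
from collections import defaultdict

# Invented sample data: (visitor, site, minutes spent).
visits = [
    ("asha", "news", 12),
    ("ravi", "shop", 30),
    ("asha", "shop", 8),
    ("ravi", "news", 5),
]

# Script style: the analyst writes the aggregation logic by hand.
minutes_by_site_script = defaultdict(int)
for _visitor, site, minutes in visits:
    minutes_by_site_script[site] += minutes

# SQL style: the same question as a declarative query.
# sqlite3 is a stand-in here, not a Hadoop component.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visitor TEXT, site TEXT, minutes INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", visits)
query = "SELECT site, SUM(minutes) FROM visits GROUP BY site ORDER BY site"
minutes_by_site_sql = dict(conn.execute(query).fetchall())
conn.close()

print(dict(minutes_by_site_script))
print(minutes_by_site_sql)
```

Both approaches give the same totals. The difference is that the query states what is wanted and leaves the how to the engine, which is what makes it accessible to business users who already know SQL.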
These tools are nothing new. Apache Hive has offered a
structured, SQL-like query language for Hadoop for
7. Deep learning
Deep learning, a set of machine-learning techniques based on
neural networking, is still evolving but shows great potential for
solving business problems. Deep learning . . . enables
computers to recognize items of interest in large quantities of
unstructured and binary data, and to deduce relationships
without needing specific models or programming instructions.
It could be used to recognize many different kinds of data, such
as the shapes, colors and objects in a video or even the
presence of a cat within images. This notion of cognitive
engagement, advanced analytics and the things it implies . . .
are an important future trend.
8. In-memory analytics