Analytics Hadoop
What Is Hadoop?
Hadoop is a free, open-source, Java-based programming
framework that supports the processing of large data sets
in a distributed computing environment. It is an Apache
project sponsored by the Apache Software Foundation.
What is HDFS?
The Hadoop Distributed File System (HDFS) is
designed to store very large data sets reliably, and to
stream those data sets at high bandwidth to user
applications.
HDFS is like the bucket of the Hadoop system: You
dump in your data and it sits there all nice and cozy
until you want to do something with it, whether
that's running an analysis on it within Hadoop or
capturing and exporting a set of data to another tool
and performing the analysis there.
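The "dump in, pull out" workflow above can be sketched with the standard `hdfs dfs -put` and `hdfs dfs -get` shell commands. This is a minimal sketch, not the course's own material: the helper names `put_command`, `get_command` and `run` are hypothetical, and so are any file paths passed to them.

```python
import shutil
import subprocess


def put_command(local_path, hdfs_path):
    """Build the command that copies a local file into HDFS."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]


def get_command(hdfs_path, local_path):
    """Build the command that exports a file from HDFS to local disk."""
    return ["hdfs", "dfs", "-get", hdfs_path, local_path]


def run(command):
    """Run the command if the hdfs client is installed; otherwise only report it."""
    if shutil.which("hdfs") is None:
        print("hdfs client not found; would run:", " ".join(command))
        return None
    return subprocess.run(command, check=True)
```

For example, `run(put_command("sales.csv", "/data/raw/sales.csv"))` would load a (hypothetical) local file, and `run(get_command("/data/raw/sales.csv", "sales_export.csv"))` would export it again for analysis in another tool.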
SpringPeople Software Private Limited, All Rights Reserved.
Architecture Of HDFS
Types of variables
Identifying the business Y
Basic Statistics
Merging and Appending Data: Primary Key Concept
Missing values
Outliers
Data Visualization
Data visualization is the presentation of data in a pictorial or
graphical format. For centuries, people have depended on
visual representations such as charts and maps to understand
information more easily and quickly.
Visualizations help people see things that were not obvious to
them before. Even when data volumes are very large, patterns
can be spotted quickly and easily.
Visualizations convey information in a universal manner and
make it simple to share ideas with others.
Normal Distribution
A normal distribution is an arrangement of a data set in which
most values cluster in the middle of the range and the rest
taper off symmetrically toward either extreme.
Normal distribution curves are sometimes drawn over a
histogram of the data. The graphs are commonly used
in mathematics, statistics and corporate data analytics.
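The clustering described above can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` to compute how much of a normal distribution lies within 1, 2 and 3 standard deviations of the mean. The distribution `heights` (mean 170, standard deviation 10) and the helper `share_within` are illustrative assumptions, not part of the course material.

```python
from statistics import NormalDist


def share_within(dist, k):
    """Probability mass lying within k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)


# Hypothetical example: values with mean 170 and standard deviation 10.
heights = NormalDist(mu=170.0, sigma=10.0)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(heights, k):.4f}")
```

Roughly 68% of the mass falls within one standard deviation, about 95% within two, and about 99.7% within three, which is the symmetric taper toward the extremes.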
Hypothesis Testing
Hypothesis testing refers to the process of choosing
between competing hypotheses about a probability
distribution, based on observed data from the
distribution.
Two common types of test:
t-test
ANOVA (analysis of variance)
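As a rough illustration of the two tests, the sketch below computes a pooled two-sample t statistic and a one-way ANOVA F statistic by hand, using only the standard library. The function names and sample data are hypothetical, and the sketch stops at the test statistics: turning them into p-values requires the t and F distributions, which a statistics package would normally supply.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic, assuming equal variances in both groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements from two groups.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]

print("t =", pooled_t(group_a, group_b))
print("F =", anova_f(group_a, group_b))
```

With exactly two groups the F statistic equals the square of the pooled t statistic, so the two tests agree. ANOVA becomes the natural choice when three or more groups are compared.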
More Details
Suggested Audience
Data analysts / Data scientists who want to know how to use
their expertise on Big Data
Database Managers with a knowledge of Hadoop / Java who
want to know what to do next in their career and how to
manage and draw insights from their data
Consultants who want to know what Big Data analytics is.
Syllabus
Our Partners