
Introduction to Big Data Analytics on Hadoop

SpringPeople Software Private Limited, All Rights Reserved.

What is Big Data?


Big data is a popular term for the exponential growth
and availability of data, both structured and
unstructured. Big data may become as important
to business and society as the Internet has.


What Is Hadoop?
Hadoop is a free, open-source, Java-based programming
framework that supports the processing of large data
sets in a distributed computing environment. It is an
Apache project, developed under the Apache
Software Foundation.


What is HDFS?
The Hadoop Distributed File System (HDFS) is
designed to store very large data sets reliably, and to
stream those data sets at high bandwidth to user
applications.
HDFS is like the bucket of the Hadoop system: You
dump in your data and it sits there all nice and cozy
until you want to do something with it, whether
that's running an analysis on it within Hadoop or
capturing and exporting a set of data to another tool
and performing the analysis there.
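
One way HDFS stores data reliably is by splitting each file into fixed-size blocks and keeping several copies of every block on different machines. The sketch below estimates that footprint for a single file. The 128 MB block size and the replication factor of 3 are common defaults, not values taken from this deck, and the function name is made up for illustration.

```python
import math

BLOCK_SIZE_MB = 128   # assumption: a common default HDFS block size
REPLICATION = 3       # assumption: a common default replication factor


def hdfs_footprint(file_size_mb: float) -> dict:
    """Estimate how a file would be split into blocks and replicated."""
    # The last block only occupies as much space as the data it holds,
    # so raw storage is the file size times the replication factor.
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return {
        "blocks": blocks,
        "block_replicas": blocks * REPLICATION,
        "raw_storage_mb": file_size_mb * REPLICATION,
    }


footprint = hdfs_footprint(1000)
print(footprint)
```

Under these assumptions a 1000 MB file occupies 8 blocks, and losing one machine still leaves two copies of every block.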

Architecture Of HDFS


About MapReduce


MapReduce is a software framework that lets developers write
programs that process massive amounts of data, structured or
unstructured, in parallel across a cluster of machines.
The framework is divided into two parts:
Map, a function that runs on many nodes of the cluster at once,
each turning its share of the input into intermediate key-value pairs.
Reduce, another function that collates the intermediate values for
each key and resolves them into a single result per key.
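
The two phases can be imitated on a single machine with the classic word-count example. This is only a sketch in plain Python: it does not use Hadoop, and the function names (map_phase, shuffle, reduce_phase, word_count) are chosen here for illustration. On a real cluster the framework runs the map calls on different nodes and performs the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    intermediate = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


counts = word_count(["big data on Hadoop", "Hadoop stores big data"])
print(counts)
```

Each line could be mapped independently of the others, which is what makes the map phase easy to spread across a cluster.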


Pig Latin Statement


A Pig Latin statement is a command that produces
a Relation. A relation is simply a data bag with a name.
That name is called the relation's alias. The simplest Pig
Latin statement is LOAD, which reads a relation from a
file in the file system. Other Pig Latin statements
process one or more input relations, and produce a
new relation as a result.
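
The idea of named relations flowing from one statement to the next can be mimicked in Python. This is an analogy only, not Pig Latin syntax: the helpers load and filter_by and the sample records are hypothetical stand-ins for a LOAD statement and a statement that filters an existing relation.

```python
# Hypothetical in-memory stand-in for a file of (name, age) records.
RAW_LINES = ["alice,34", "bob,19", "carol,52"]


def load(lines):
    """Rough analogue of LOAD: turn raw lines into a bag of tuples."""
    return [(name, int(age)) for name, age in (line.split(",") for line in lines)]


def filter_by(relation, predicate):
    """Take one input relation and produce a new relation from it."""
    return [row for row in relation if predicate(row)]


people = load(RAW_LINES)                               # alias: people
adults = filter_by(people, lambda row: row[1] >= 21)   # alias: adults
print(adults)
```

As in Pig, the input relation is left untouched and the result gets its own alias.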


Data Preparation & Management

Types of variables
Identifying the business Y (the outcome variable to be analyzed)
Basic statistics
Merging and appending data
Primary key concept
Missing values
Outliers
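
Three of the topics above (merging on a primary key, missing values, and outliers) can be illustrated with a small standard-library sketch. The tables, the column names, the choice of median imputation, and the 1.5 x IQR rule are all assumptions made for this example; they are common options, not methods prescribed by the course.

```python
from statistics import median, quantiles

# Hypothetical tables that share the primary key "id".
regions = {1: "North", 2: "South", 3: "East", 4: "West",
           5: "North", 6: "South", 7: "East", 8: "West"}
spend = {1: 100.0, 2: None, 3: 95.0, 4: 110.0,
         5: 105.0, 6: 98.0, 7: 102.0, 8: 900.0}

# Merging: join the two tables on the primary key.
merged = [{"id": k, "region": regions[k], "spend": spend.get(k)} for k in regions]

# Missing values: fill gaps with the median of the observed values.
observed = [row["spend"] for row in merged if row["spend"] is not None]
fill = median(observed)
for row in merged:
    if row["spend"] is None:
        row["spend"] = fill

# Outliers: flag values more than 1.5 * IQR beyond the quartiles.
q1, _, q3 = quantiles([row["spend"] for row in merged], n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [row["id"] for row in merged if not low <= row["spend"] <= high]
print(outliers)
```

The median is used for the fill value because, unlike the mean, it is not pulled upward by the single extreme record.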


Data Visualization
Data visualization is the presentation of data in a pictorial or
graphical format. For centuries, people have depended on
visual representations such as charts and maps to understand
information more easily and quickly.
Visualizations help people see things that were not obvious to
them before. Even when data volumes are very large, patterns
can be spotted quickly and easily.
Visualizations convey information in a universal manner and
make it simple to share ideas with others.


Normal Distribution
A normal distribution is an arrangement of a data set in which
most values cluster in the middle of the range and the rest
taper off symmetrically toward either extreme.
Normal distribution curves are often drawn over a histogram
of the data. Such graphs are commonly used in mathematics,
statistics and corporate data analytics.
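
How strongly values cluster around the middle can be quantified with Python's standard library. The sketch below uses the standard normal distribution (mean 0, standard deviation 1); the helper name share_within is chosen for this example.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Probability that a value lies within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)


within_one = share_within(1)   # roughly 0.68
within_two = share_within(2)   # roughly 0.95
print(round(within_one, 4), round(within_two, 4))
```

Roughly two thirds of the values fall within one standard deviation of the mean, and the symmetric tails beyond two standard deviations hold only about 5 percent.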


Hypothesis Testing
Hypothesis testing refers to the process of choosing
between competing hypotheses about a probability
distribution, based on observed data from the
distribution.
Two commonly used tests are the t-test and ANOVA
(analysis of variance).
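
A one-sample t-test can be worked through by hand as a minimal illustration. The sample values and the hypothesized mean of 5.0 are invented for this sketch, and the critical value 2.262 is the standard two-sided 5 percent table value for 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements; null hypothesis: the true mean is 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.7, 5.5, 5.9, 5.3]
MU_0 = 5.0
T_CRITICAL = 2.262   # two-sided, alpha = 0.05, df = len(sample) - 1 = 9


def one_sample_t(values, mu_0):
    """t statistic: distance of the sample mean from mu_0 in standard errors."""
    standard_error = stdev(values) / sqrt(len(values))
    return (mean(values) - mu_0) / standard_error


t_stat = one_sample_t(sample, MU_0)
reject_null = abs(t_stat) > T_CRITICAL
print(round(t_stat, 2), reject_null)
```

Here the statistic is about 4.33, well beyond the critical value, so the observed data favor the alternative hypothesis that the mean differs from 5.0.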


Deductive Vs Inductive Reasoning


Deductive reasoning happens when a researcher works from
general information to the more specific. This is sometimes
called the top-down approach because the researcher starts at
the top with a very broad spectrum of information and works
down to a specific conclusion.
Inductive reasoning works the opposite way, moving from specific
observations to broader generalizations and theories. This is
sometimes called the bottom-up approach. The researcher begins
with specific observations and measures, then detects patterns
and regularities, formulates some tentative hypotheses to
explore, and finally develops some general conclusions or
theories.


Become a Big Data Expert in Just 2 Days

The BigData Analytics on Hadoop course is designed to teach
you what you need to know about big data analytics on Hadoop.



Suggested Audience
Data analysts / Data scientists who want to know how to use
their expertise on Big Data
Database Managers with a knowledge of Hadoop / Java who
want to know what to do next in their career and how to
manage and draw insights from their data
Consultants who want to know what Big Data analytics is.


For further info/assistance contact


training@springpeople.com
+91 80 656 79700
www.springpeople.com


