
HADOOP 2.0 and Analytics
Training Contents

Course Contents

Mobile: +91 7719882295/ 9730463630


Email: sales@anikatechnologies.com
Website: www.anikatechnologies.com


Hadoop Ecosystem (Tools): Day 1
Map Reduce Design Patterns: Day 2
Hadoop-2: Day 3
Analytics



Hadoop Ecosystem (Tools)


HBase Operations
Co-Processor
Scan Operations
Column Value & Key Pair
Column Families
Index & Query
Counters
CRUD Operations
Result Scanner
Batch and Caching
MapReduce and HBase
Filters
Creating Tables: Shell and Programming
Importing into HBase

Deep Dive in Hive
Understanding Hive: Architecture, Physical Model, Data Model, Data Types
HiveQL: DDL, DML, Other Operations
Playing with huge data and querying extensively
User-Defined Functions, Optimizing Queries, Tips and Tricks for Performance Tuning
Tables in Hive, Partitioning, Indexes, Bucketing, Subqueries, Joining Tables, Data Loading and Appending Data to Existing Tables

Deep Dive in Pig
Advanced Pig Latin, Evaluation and Filter Functions, Pig and Ecosystem
Grunt, Script Mode, Data Model, Real-Time Use Cases

Sqoop

HBase DB Design
Handling Index
Designing Keys
Transaction
Integration for Search
Schema Design

Flume

Map Reduce Design Patterns
Join Patterns
Metapatterns
Summarization Patterns
The Effects of YARN
Data Organization Patterns
Filtering Patterns
Input and Output Patterns
Final Thoughts

Hadoop-2
Apache Tez

Apache Tez: A New Chapter in Hadoop Data Processing


Data Processing API in Apache Tez
Writing a Tez Input/Processor/Output
Runtime API in Apache Tez
Apache Tez: Dynamic Graph Reconfiguration

Apache YARN

Agility
Global ResourceManager
Per-node slave NodeManager
Scalability
Support for workloads other than MapReduce
Compatibility with MapReduce
Per-application Container running on a NodeManager
Improved cluster utilization
Per-application ApplicationMaster

HDFS-2
High Availability for HDFS
HDFS append support
HDFS Federation
HDFS Snapshots

Analytics
Clustering

Clustering basics
Measuring the similarity of items
Exploring distance measures

Clustering algorithms in Mahout
K-means clustering
Beyond k-means: an overview of clustering techniques
Fuzzy k-means clustering
Model-based clustering
Topic modeling using latent Dirichlet allocation (LDA)

Evaluating and improving clustering quality
Inspecting clustering output
Analyzing clustering output
Improving clustering quality

Taking clustering to production
Quick-start tutorial for running clustering on Hadoop
Tuning clustering performance
Batch and online clustering
Representing data
Visualizing vectors
Representing text documents as vectors
Generating vectors from documents
Improving quality of vectors using normalization

Classification

Introduction to classification
Work flow in a typical classification project
The fundamentals of classification systems
How classification works
Mahout for classification
Classification example

Training a classifier
Preprocessing raw data into classifiable data
Converting classifiable data into vectors
Mahout classifier
Classifying the 20 newsgroups data set with SGD
Choosing an algorithm to train the classifier
Classifying the 20 newsgroups data with naive Bayes

Evaluating and tuning a classifier
Classifier evaluation in Mahout
The classifier evaluation API
When classifiers go bad

Deploying a classifier
Process for deployment in huge systems
Determining scale and speed requirements
Building a training pipeline for large systems
Thrift-based classification server

Recommendations

Introducing recommenders
Defining recommendation
Evaluating a recommender
Evaluating precision and recall
Evaluating the GroupLens data set

Representing recommender data
Representing preference data
In-memory DataModels
Coping without preference values

Making recommendations
Understanding user-based recommendation
Exploring the user-based recommender
Exploring similarity metrics
Item-based recommendation
Slope-one recommender
New and experimental recommenders
Comparison to other recommenders

Real-world applications of clustering
Finding similar users on Twitter
Suggesting tags for artists on Last.fm
Analyzing the Stack Overflow data set

Distributing recommendation computations
Analyzing the Wikipedia data set
Designing a distributed item-based algorithm
Implementing a distributed algorithm with MapReduce
Pseudo-distributing a recommender

Taking recommenders to production
Analyzing example data from a dating site



Finding an effective recommender
Recommending to anonymous users
Injecting domain-specific information

