
Introduction to Data Storage and Processing

Installing the Hadoop Distributed File System (HDFS)

Defining key design assumptions and architecture

Configuring and setting up the file system

Issuing commands from the console

Reading and writing files
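
The console commands above (hdfs dfs -put, -cat, and so on) have direct equivalents in the Java FileSystem API, which is how applications read and write HDFS files. A minimal sketch, assuming a NameNode at the placeholder address hdfs://namenode:8020 and a writable /user/demo directory:

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsReadWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; in practice this comes from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode:8020");
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/demo/hello.txt");

            // Write: create() returns a stream backed by the HDFS write pipeline.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            }

            // Read: open() streams the blocks back from the DataNodes.
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }

            fs.close();
        }
    }

The same round trip from the console would be hdfs dfs -put hello.txt /user/demo/ followed by hdfs dfs -cat /user/demo/hello.txt.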


Setting the stage for MapReduce

Reviewing the MapReduce approach

Introducing the computing daemons

Dissecting a MapReduce job
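
A job dissects into three parts: a Mapper, a Reducer, and a driver that describes the job and hands it to the cluster daemons for scheduling. The canonical word-count example, shown here as a sketch (input and output paths are taken from the command line):

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in the input split.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts for each word after the shuffle.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        // Driver: describes the job and submits it to the cluster.
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Submitted with hadoop jar wordcount.jar WordCount <input> <output>, the framework takes care of splitting the input, shuffling map output to reducers, and retrying failed tasks.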

Defining Hadoop Cluster Requirements


Planning the architecture

Selecting appropriate hardware

Designing a scalable cluster

Building the cluster

Installing Hadoop daemons


Optimizing the network architecture

Configuring a Cluster
Preparing HDFS

Setting basic configuration parameters

Configuring block allocation, redundancy and replication
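
Block size and replication are normally set cluster-wide in hdfs-site.xml (dfs.blocksize, dfs.replication), but both can be inspected and overridden per file from the client API. A small sketch, assuming an existing file at the placeholder path /user/demo/hello.txt:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSettings {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Client-side overrides of what would normally live in hdfs-site.xml.
            conf.setLong("dfs.blocksize", 256L * 1024 * 1024);  // 256 MB blocks for new files
            conf.setInt("dfs.replication", 2);                  // default replication for new files

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/hello.txt");       // placeholder path

            // Inspect the current layout of an existing file.
            FileStatus status = fs.getFileStatus(file);
            System.out.println("block size:  " + status.getBlockSize());
            System.out.println("replication: " + status.getReplication());

            // Raise the replication factor of this one file to 3.
            fs.setReplication(file, (short) 3);
        }
    }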

Deploying MapReduce

Installing and setting up the MapReduce environment

Delivering redundant load balancing via Rack Awareness

Maximizing HDFS Robustness


Creating a fault-tolerant file system

Isolating single points of failure

Maintaining High Availability

Triggering manual failover

Automating failover with ZooKeeper

Leveraging NameNode Federation

Extending HDFS resources

Managing the namespace volumes


Introducing YARN

Critiquing the YARN architecture

Identifying the new daemons

Managing Resources and Cluster Health


Allocating resources

Setting quotas to constrain HDFS utilization

Prioritizing access to MapReduce using schedulers
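
Scheduler choice is a YARN-side setting (yarn.resourcemanager.scheduler.class in yarn-site.xml), while HDFS utilization is constrained with quotas, usually from the command line with hdfs dfsadmin -setQuota and -setSpaceQuota. The same quota operation is exposed through the admin API; a sketch, assuming superuser access and a placeholder /user/projectX directory:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class SetQuotas {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode URI; normally taken from core-site.xml.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            DistributedFileSystem dfs = (DistributedFileSystem) fs;

            Path project = new Path("/user/projectX");   // placeholder directory

            // Limit the directory to 1,000,000 names (files + directories)
            // and 10 TB of raw storage (counted after replication).
            dfs.setQuota(project, 1_000_000L, 10L * 1024 * 1024 * 1024 * 1024);
        }
    }

Clearing the limits works the same way from the CLI: hdfs dfsadmin -clrQuota and -clrSpaceQuota.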


Maintaining HDFS

Starting and stopping Hadoop daemons

Monitoring HDFS status

Adding and removing data nodes
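
Routine status checks usually go through hdfs dfsadmin -report and the NameNode web UI, but the same figures are available programmatically, which helps when scripting health checks around commissioning or decommissioning nodes. A sketch, assuming the cluster configuration files are on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class ClusterStatus {
        public static void main(String[] args) throws Exception {
            // Assumes core-site.xml / hdfs-site.xml on the classpath point at the cluster.
            FileSystem fs = FileSystem.get(new Configuration());

            // Aggregate capacity figures, the same totals shown by hdfs dfsadmin -report.
            FsStatus status = fs.getStatus();
            System.out.printf("capacity: %d, used: %d, remaining: %d%n",
                    status.getCapacity(), status.getUsed(), status.getRemaining());

            // Per-DataNode detail, useful when adding or removing nodes.
            if (fs instanceof DistributedFileSystem) {
                for (DatanodeInfo dn : ((DistributedFileSystem) fs).getDataNodeStats()) {
                    System.out.println(dn.getHostName()
                            + "  used=" + dn.getDfsUsed() + " / " + dn.getCapacity());
                }
            }
        }
    }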

Administering MapReduce

Managing MapReduce jobs

Tracking progress with monitoring tools

Commissioning and decommissioning compute nodes

Maintaining a Cluster
Employing the standard built-in tools

Managing and debugging processes using JVM metrics (see the sketch after this list)

Performing Hadoop status checks

Tuning with supplementary tools

Assessing performance with Ganglia

Benchmarking to ensure continued performance
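
Hadoop daemons publish their JVM and service metrics over JMX, and each daemon's embedded web server re-exports them as JSON at /jmx, which is what Ganglia and Nagios-style checks typically scrape. A minimal sketch, assuming the NameNode web UI at the placeholder address namenode:9870 (50070 on older releases):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class JmxSnapshot {
        public static void main(String[] args) throws Exception {
            // Placeholder NameNode web address; qry= restricts output to one MBean.
            URL url = new URL(
                "http://namenode:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);  // JSON with CapacityUsed, blocks, and similar counters
                }
            }
        }
    }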

Extending Hadoop
Simplifying information access

Enabling SQL-like querying with Hive (see the sketch below)

Installing Pig to create MapReduce jobs
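
Hive answers SQL-like queries through HiveServer2, which speaks standard JDBC. A sketch, assuming HiveServer2 at the placeholder address hiveserver:10000 and an existing table named logs:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            // The Hive JDBC driver must be on the classpath.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://hiveserver:10000/default", "hive", "");
                 Statement stmt = conn.createStatement();
                 // 'logs' is a placeholder table; the query compiles down to cluster jobs.
                 ResultSet rs = stmt.executeQuery(
                         "SELECT level, COUNT(*) FROM logs GROUP BY level")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }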

Integrating additional elements of the ecosystem

Imposing a tabular view on HDFS with HBase (see the sketch below)

Configuring Oozie to schedule workflows
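
HBase imposes its tabular, column-family view through a client API rather than MapReduce jobs. A sketch of a single write and read, assuming a table named demo with a column family cf already exists and that hbase-site.xml is on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseRoundTrip {
        public static void main(String[] args) throws Exception {
            // hbase-site.xml (ZooKeeper quorum, etc.) is assumed to be on the classpath.
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("demo"))) {  // placeholder table

                // Write one cell: row key "row1", column cf:count.
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes("42"));
                table.put(put);

                // Read it back.
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("count"));
                System.out.println(Bytes.toString(value));
            }
        }
    }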

Implementing Data Ingress and Egress


Facilitating generic input/output

Moving bulk data into and out of Hadoop

Transmitting HDFS data over HTTP with WebHDFS
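
WebHDFS exposes the file system namespace over plain HTTP (REST calls under /webhdfs/v1/), so data can move in and out without the native RPC ports being reachable; the Java client can also mount it directly through the webhdfs:// scheme. A sketch, assuming the NameNode web port at the placeholder namenode:9870 (50070 on Hadoop 2) and a placeholder /user/demo directory:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class WebHdfsRead {
        public static void main(String[] args) throws Exception {
            // webhdfs:// goes over HTTP to the NameNode web port instead of the RPC port.
            FileSystem fs = FileSystem.get(
                    URI.create("webhdfs://namenode:9870"), new Configuration());

            // List a directory, then stream one file to stdout.
            for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {  // placeholder path
                System.out.println(status.getPath());
            }
            try (FSDataInputStream in = fs.open(new Path("/user/demo/hello.txt"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }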


Acquiring application-specific data

Collecting multi-sourced log files with Flume

Importing and exporting relational information with Sqoop

Planning for Backup, Recovery and Security

Coping with inevitable hardware failures

Securing your Hadoop cluster

DURATION: 45 working days

FACULTY: MS. SINDU

Hadoop Admin

How the Hadoop Distributed File System and MapReduce work
What hardware configurations are optimal for Hadoop clusters
How to configure Hadoop's options for best cluster performance
How to configure NameNode High Availability
How to configure NameNode Federation
How to configure the FairScheduler to provide service-level agreements for multiple users of a cluster
How to install and implement Kerberos-based security for your cluster
What system administration issues exist with other Hadoop projects such as Hive, Pig, and HBase

Introduction to Big Data

Characteristics of Big Data


Why parallel computing is important
Discuss various products developed by vendors

Introducing Hadoop

Components of Hadoop
Starting Hadoop
Identify various processes
Hands on

Working with HDFS

Basic file commands (see the sketch after this list)


Web Based User Interface
Reading & Writing to files
Run a word count program
View jobs in the Web UI
Hands on
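
As a reference for the hands-on work, the basic file commands (hdfs dfs -mkdir, -put, -ls, -mv, -rm) map one-to-one onto the FileSystem API. A small sketch, assuming the cluster configuration is on the classpath and using placeholder paths:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BasicFileCommands {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path dir = new Path("/user/demo/input");          // placeholder directory

            fs.mkdirs(dir);                                   // hdfs dfs -mkdir -p /user/demo/input
            fs.copyFromLocalFile(new Path("data.txt"), dir);  // hdfs dfs -put data.txt ... (placeholder local file)

            for (FileStatus status : fs.listStatus(dir)) {    // hdfs dfs -ls /user/demo/input
                System.out.println(status.getPath() + "\t" + status.getLen());
            }

            fs.rename(new Path(dir, "data.txt"),              // hdfs dfs -mv ...
                      new Path(dir, "data-renamed.txt"));
            fs.delete(dir, true);                             // hdfs dfs -rm -r /user/demo/input
        }
    }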

Installation & Configuration of Hadoop

Types of installation (RPMs & Tar files)


Set up SSH for the Hadoop cluster
Tree structure
XML, masters and slaves files
Checking system health
Discuss block size and replication factor

Benchmarking the cluster


Hands on

Advanced administration activities

Adding and decommissioning nodes


Purpose of the secondary NameNode
Recovery from a failed NameNode
Managing quotas
Enabling trash
Hands on

Monitoring the Hadoop Cluster

Hadoop infrastructure monitoring


Hadoop specific monitoring
Install and configure Nagios / Ganglia
Capture metrics
Hands on

Other Components of the Hadoop ecosystem

Discuss Hive, Sqoop, Pig, HBase, Flume


Use cases of each
Use Hadoop streaming to write code in Perl / Python
Hands on
