
Introduction to Data Storage and Processing

Installing the Hadoop Distributed File System (HDFS)

Defining key design assumptions and architecture

Configuring and setting up the file system

Issuing commands from the console

Reading and writing files
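
The console commands above (hdfs dfs -put, -cat, and so on) have direct equivalents in the Java FileSystem API, which is how applications read and write HDFS files. A minimal sketch, assuming a NameNode at the placeholder address hdfs://namenode:8020 and a writable /user/demo directory:

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsReadWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; in practice this comes from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode:8020");
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/demo/hello.txt");

            // Write: create() returns a stream backed by the HDFS write pipeline.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            }

            // Read: open() streams the blocks back from the DataNodes.
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }

            fs.close();
        }
    }

The same round trip from the console would be hdfs dfs -put hello.txt /user/demo/ followed by hdfs dfs -cat /user/demo/hello.txt.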


Setting the stage for MapReduce

Reviewing the MapReduce approach

Introducing the computing daemons

Dissecting a MapReduce job
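
A job dissects into three parts: a Mapper, a Reducer, and a driver that describes the job and hands it to the cluster daemons for scheduling. The canonical word-count example, shown here as a sketch (input and output paths are taken from the command line):

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in the input split.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts for each word after the shuffle.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        // Driver: describes the job and submits it to the cluster.
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Submitted with hadoop jar wordcount.jar WordCount <input> <output>, the framework takes care of splitting the input, shuffling map output to reducers, and retrying failed tasks.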

Defining Hadoop Cluster Requirements


Planning the architecture

Selecting appropriate hardware

Designing a scalable cluster

Building the cluster

Installing Hadoop daemons


Optimizing the network architecture

Configuring a Cluster
Preparing HDFS

Setting basic configuration parameters

Configuring block allocation, redundancy and replication
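
Block size and replication are normally set cluster-wide in hdfs-site.xml (dfs.blocksize, dfs.replication), but both can be inspected and overridden per file from the client API. A small sketch, assuming an existing file at the placeholder path /user/demo/hello.txt:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSettings {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Client-side overrides of what would normally live in hdfs-site.xml.
            conf.setLong("dfs.blocksize", 256L * 1024 * 1024);  // 256 MB blocks for new files
            conf.setInt("dfs.replication", 2);                  // default replication for new files

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/hello.txt");       // placeholder path

            // Inspect the current layout of an existing file.
            FileStatus status = fs.getFileStatus(file);
            System.out.println("block size:  " + status.getBlockSize());
            System.out.println("replication: " + status.getReplication());

            // Raise the replication factor of this one file to 3.
            fs.setReplication(file, (short) 3);
        }
    }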

Deploying MapReduce

Installing and setting up the MapReduce environment

Delivering redundant load balancing via Rack Awareness

Maximizing HDFS Robustness


Creating a fault-tolerant file system

Isolating single points of failure

Maintaining High Availability

Triggering manual failover

Automating failover with ZooKeeper

Leveraging NameNode Federation

Extending HDFS resources

Managing the namespace volumes


Introducing YARN

Critiquing the YARN architecture

Identifying the new daemons

Managing Resources and Cluster Health


Allocating resources

Setting quotas to constrain HDFS utilization

Prioritizing access to MapReduce using schedulers
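
Scheduler choice is a YARN-side setting (yarn.resourcemanager.scheduler.class in yarn-site.xml), while HDFS utilization is constrained with quotas, usually from the command line with hdfs dfsadmin -setQuota and -setSpaceQuota. The same quota operation is exposed through the admin API; a sketch, assuming superuser access and a placeholder /user/projectX directory:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class SetQuotas {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode URI; normally taken from core-site.xml.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            DistributedFileSystem dfs = (DistributedFileSystem) fs;

            Path project = new Path("/user/projectX");   // placeholder directory

            // Limit the directory to 1,000,000 names (files + directories)
            // and 10 TB of raw storage (counted after replication).
            dfs.setQuota(project, 1_000_000L, 10L * 1024 * 1024 * 1024 * 1024);
        }
    }

Clearing the limits works the same way from the CLI: hdfs dfsadmin -clrQuota and -clrSpaceQuota.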


Maintaining HDFS

Starting and stopping Hadoop daemons

Monitoring HDFS status

Adding and removing data nodes
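
Routine status checks usually go through hdfs dfsadmin -report and the NameNode web UI, but the same figures are available programmatically, which helps when scripting health checks around commissioning or decommissioning nodes. A sketch, assuming the cluster configuration files are on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class ClusterStatus {
        public static void main(String[] args) throws Exception {
            // Assumes core-site.xml / hdfs-site.xml on the classpath point at the cluster.
            FileSystem fs = FileSystem.get(new Configuration());

            // Aggregate capacity figures, the same totals shown by hdfs dfsadmin -report.
            FsStatus status = fs.getStatus();
            System.out.printf("capacity: %d, used: %d, remaining: %d%n",
                    status.getCapacity(), status.getUsed(), status.getRemaining());

            // Per-DataNode detail, useful when adding or removing nodes.
            if (fs instanceof DistributedFileSystem) {
                for (DatanodeInfo dn : ((DistributedFileSystem) fs).getDataNodeStats()) {
                    System.out.println(dn.getHostName()
                            + "  used=" + dn.getDfsUsed() + " / " + dn.getCapacity());
                }
            }
        }
    }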

Administering MapReduce

Managing MapReduce jobs

Tracking progress with monitoring tools

Commissioning and decommissioning compute nodes

Maintaining a Cluster
Employing the standard built-in tools

Managing and debugging processes using JVM metrics (see the sketch after this list)

Performing Hadoop status checks

Tuning with supplementary tools

Assessing performance with Ganglia

Benchmarking to ensure continued performance
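
Hadoop daemons publish their JVM and service metrics over JMX, and each daemon's embedded web server re-exports them as JSON at /jmx, which is what Ganglia and Nagios-style checks typically scrape. A minimal sketch, assuming the NameNode web UI at the placeholder address namenode:9870 (50070 on older releases):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class JmxSnapshot {
        public static void main(String[] args) throws Exception {
            // Placeholder NameNode web address; qry= restricts output to one MBean.
            URL url = new URL(
                "http://namenode:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);  // JSON with CapacityUsed, blocks, and similar counters
                }
            }
        }
    }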

Extending Hadoop
Simplifying information access

Enabling SQL-like querying with Hive (see the sketch below)

Installing Pig to create MapReduce jobs
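
Hive answers SQL-like queries through HiveServer2, which speaks standard JDBC. A sketch, assuming HiveServer2 at the placeholder address hiveserver:10000 and an existing table named logs:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            // The Hive JDBC driver must be on the classpath.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://hiveserver:10000/default", "hive", "");
                 Statement stmt = conn.createStatement();
                 // 'logs' is a placeholder table; the query compiles down to cluster jobs.
                 ResultSet rs = stmt.executeQuery(
                         "SELECT level, COUNT(*) FROM logs GROUP BY level")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }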

Integrating additional elements of the ecosystem

Imposing a tabular view on HDFS with HBase (see the sketch below)

Configuring Oozie to schedule workflows
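
HBase imposes its tabular, column-family view through a client API rather than MapReduce jobs. A sketch of a single write and read, assuming a table named demo with a column family cf already exists and that hbase-site.xml is on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseRoundTrip {
        public static void main(String[] args) throws Exception {
            // hbase-site.xml (ZooKeeper quorum, etc.) is assumed to be on the classpath.
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("demo"))) {  // placeholder table

                // Write one cell: row key "row1", column cf:count.
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes("42"));
                table.put(put);

                // Read it back.
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("count"));
                System.out.println(Bytes.toString(value));
            }
        }
    }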

Implementing Data Ingress and Egress


Facilitating generic input/output

Moving bulk data into and out of Hadoop

Transmitting HDFS data over HTTP with WebHDFS
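
WebHDFS exposes the file system namespace over plain HTTP (REST calls under /webhdfs/v1/), so data can move in and out without the native RPC ports being reachable; the Java client can also mount it directly through the webhdfs:// scheme. A sketch, assuming the NameNode web port at the placeholder namenode:9870 (50070 on Hadoop 2) and a placeholder /user/demo directory:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class WebHdfsRead {
        public static void main(String[] args) throws Exception {
            // webhdfs:// goes over HTTP to the NameNode web port instead of the RPC port.
            FileSystem fs = FileSystem.get(
                    URI.create("webhdfs://namenode:9870"), new Configuration());

            // List a directory, then stream one file to stdout.
            for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {  // placeholder path
                System.out.println(status.getPath());
            }
            try (FSDataInputStream in = fs.open(new Path("/user/demo/hello.txt"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }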


Acquiring application-specific data

Collecting multi-sourced log files with Flume

Importing and exporting relational information with Sqoop

Planning for Backup, Recovery and Security

Coping with inevitable hardware failures

Securing your Hadoop cluster

DURATION: 45 working days

FACULTY: MS. SINDU

Hadoop Admin

How the Hadoop Distributed File System and MapReduce work
What hardware configurations are optimal for Hadoop clusters
How to configure Hadoop's options for best cluster performance
How to configure NameNode High Availability
How to configure NameNode Federation
How to configure the FairScheduler to provide service-level agreements for multiple users of a cluster
How to install and implement Kerberos-based security for your cluster
What system administration issues exist with other Hadoop projects such as Hive, Pig, and HBase

Introduction to Big Data

Characteristics of Big Data


Why parallel computing is important
Discuss various products developed by vendors

Introducing Hadoop

Components of Hadoop
Starting Hadoop
Identify various processes
Hands on

Working with HDFS

Basic file commands (see the sketch after this list)


Web Based User Interface
Reading & Writing to files
Run a word count program
View jobs in the Web UI
Hands on
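
As a reference for the hands-on work, the basic file commands (hdfs dfs -mkdir, -put, -ls, -mv, -rm) map one-to-one onto the FileSystem API. A small sketch, assuming the cluster configuration is on the classpath and using placeholder paths:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BasicFileCommands {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path dir = new Path("/user/demo/input");          // placeholder directory

            fs.mkdirs(dir);                                   // hdfs dfs -mkdir -p /user/demo/input
            fs.copyFromLocalFile(new Path("data.txt"), dir);  // hdfs dfs -put data.txt ... (placeholder local file)

            for (FileStatus status : fs.listStatus(dir)) {    // hdfs dfs -ls /user/demo/input
                System.out.println(status.getPath() + "\t" + status.getLen());
            }

            fs.rename(new Path(dir, "data.txt"),              // hdfs dfs -mv ...
                      new Path(dir, "data-renamed.txt"));
            fs.delete(dir, true);                             // hdfs dfs -rm -r /user/demo/input
        }
    }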

Installation & Configuration of Hadoop

Types of installation (RPMs & Tar files)


Set up SSH for the Hadoop cluster
Tree structure
XML, masters and slaves files
Checking system health
Discuss block size and replication factor

Benchmarking the cluster


Hands on

Advanced administration activities

Adding and decommissioning nodes


Purpose of the secondary NameNode
Recovery from a failed NameNode
Managing quotas
Enabling trash
Hands on

Monitoring the Hadoop Cluster

Hadoop infrastructure monitoring


Hadoop specific monitoring
Install and configure Nagios / Ganglia
Capture metrics
Hands on

Other Components of the Hadoop ecosystem

Discuss Hive, Sqoop, Pig, HBase, Flume


Use cases of each
Use Hadoop streaming to write code in Perl / Python
Hands on
