
Big Data/Hadoop Analyst + Co-op (Online)

Spring 2018
Contents

1. Introduction
2. Objective
3. Schedule
4. Cost
5. Career Roadmap
6. Course Outline
7. Contact Us

Quality is never an accident; it is always the result of good planning,
intelligent direction and skillful execution – John Ruskin
Copyright Notice
Copyright © 2018, busyQA Inc

All rights reserved


No part of this document may be reproduced in any form, including photocopying or transmission electronically to any computer,
without prior written consent of busyQA. The information contained in this document is proprietary to busyQA Inc and may not be
used or disclosed except as expressly authorized in writing by busyQA.

Trademarks
Company name, logo and other products mentioned in this document are registered trademarks of busyQA.

1. Introduction

This training course is a professional development program designed to give students the theoretical
background and the practical knowledge and skills required to succeed in the Big Data field as a
Hadoop/Big Data test analyst or Big Data engineer. The course covers the main components of the
Hadoop ecosystem, including MapReduce, HDFS, Pig, Hive, Sqoop, Flume, and Spark, along with NoSQL databases.

2. Objective
The primary objectives of this course are to:

• Help students master the concepts of the Hadoop framework.
• Show students how the Hadoop ecosystem fits into the Big Data processing lifecycle.
• Help students gain in-depth knowledge of Spark, HDFS, YARN, and MapReduce.
• Teach students how to use Pig, Hive, and Impala to process and analyze large datasets.
• Train students to use Sqoop and Flume for data ingestion.
• Help students master real-time data processing, functional programming, and parallel processing in Spark.
• Help students master Big Data/Hadoop testing activities.
• Run a resume and career workshop to help students pursue career opportunities as Big Data analysts.

3. Schedule

The course is a 75-hour (10-week) program with 1½ hours of online training daily, Monday to Friday.
Students are advised to spend additional personal time studying the class materials and gaining
hands-on experience.

Module 1       Data warehousing, ETL & Big Data concepts
Module 2       Linux & Shell scripting
Module 3       Java/Python essentials for Hadoop
Module 4       Hadoop HDFS, MapReduce & YARN
Module 5       Querying data using Hive/Impala
Module 6       Ingesting data using Apache Sqoop
Module 7       Streaming data using Apache Flume
Module 8       Processing and transforming data using Apache Pig
Module 9       NoSQL Database (MongoDB/Cassandra)
Module 10      Apache Spark
Module 11      Scala
Module 12      Hadoop/Big Data testing
Co-op (bonus)  1-3 months of hands-on work experience
Resume Workshop, Mock Interview

4. Cost
Register to secure your spot in the program. The program costs $1,800 plus tax ($234).

• Payment plans are available
• 5% discount on a one-time payment
• $50 discount on referrals
• Training cost is tax deductible

5. Career Roadmap

6. Course Outline

Module 1: Data warehousing, ETL & Big Data concepts

• Need for Data Warehousing
• What is Data Warehousing?
• Advantages of a Data Warehouse
• Properties of a Data Warehouse
• Data Warehouse Architecture
• Concepts of OLTP and OLAP
• What is ETL?
• What is Big Data?
• 5 V's of Big Data
• Types of Data
• What is Hadoop?
• History of Hadoop
• Architecture of Hadoop
• Hadoop ecosystem

Module 2: Linux & Shell scripting

• Linux File & Directory commands
• Filter commands
• File Compare commands
• File Access permissions
• Miscellaneous commands
• Vi Editor commands
• Shell Scripting – Fundamentals
• Parameters, Variables and Arguments
• Conditional statements and Loops
• Regular Expressions and Text Manipulations
• Writing Functions and Advanced Scripts
• Real World Shell Scripting Examples

Module 3: Java/Python Essentials for Hadoop


Java
• Java introduction
• JDK, JRE and JVM
• Installing Java on Linux
• Java Data types and Operators
• Conditional statements and loops
• Arrays and Strings
• Java OOP concepts
• Classes & Objects
• Method Overloading and Overriding
• Inheritance, Encapsulation and Polymorphism
• Abstract class and Interface
• Java collections
• List, Set and Map
Python
• Python Introduction
• Who uses Python?
• Python features
• Operators and Data Types in Python
• Conditional statements and loops
• Sequence Types
• Lists, Tuples, Strings, Sets and Dictionaries
• Functions in Python
• Classes and Objects
• File Handling in Python
• Data Life Cycle
• What is Pandas?
• Pandas Operations
• Python for Hadoop

Module 4: Hadoop HDFS, MapReduce & YARN

• Hadoop components – HDFS, MapReduce & YARN
• Introduction to HDFS & MapReduce
• HDFS Architecture
• HDFS Commands
• MapReduce Architecture
• MapReduce Examples in Java & Python
• Validating MapReduce jobs

Module 5: Querying Data using Hive

• Hive Overview
• Hive Characteristics and Features
• Types of Hive Tables and their Differences
• How Hive differs from an RDBMS
• Hive Components & Clients
• Creating and dropping a Hive database
• Hive Data Types
• Hive Managed Tables
• Hive External Tables
• Altering a Hive Table
• Collections - Array, Map & Struct
• Processing XML & JSON files in Hive
• Hive Partitions & Buckets
• Indexes and Views
• Hive Queries: Order By, Group By, Distribute By and Cluster By clauses
• Hive Aggregation Functions
• Hive Joins
• Hive UDFs and UDAFs
• Working with Hue
• Creating and Querying Hive tables in Hue

Module 6: Ingesting Data using Apache Sqoop

• Sqoop Overview
• Sqoop Components and Architecture
• Importing data from RDBMS tables to HDFS
• Exporting data from HDFS to RDBMS Tables
• Sqoop Commands
• Working with Sqoop Jobs
• Mini testing project on Hive and Sqoop

Module 7: Data streaming using Apache Flume & Kafka

• Overview of Flume
• Flume Architecture and components
• Flume data flow
• Flume configuration file
• Fetching Twitter log data
• Sequence Generator Source
• NetCat Source
• Overview of Kafka Message System
• Kafka ACLs
• Kafka Topics
• Overview of Apache Storm

Module 8: Processing and transforming Data using Apache Pig

• Overview of Pig
• Pig Shell Types
• Load and Store operators
• Diagnostic Operators
• Grouping and Joining
• Combining and Splitting
• Filtering
• Sorting
• Pig Latin Built-in functions
• Pig UDFs
• Understanding Pig test cases and testing Pig jobs
• Mini Project on Pig and Sqoop/Flume

Module 9: NoSQL Databases

• What is NoSQL?
• Challenges of RDBMS
• Benefits of adopting a NoSQL database over an RDBMS
• Concepts and characteristics of NoSQL databases
• Popular NoSQL Databases (HBase and MongoDB)
• Working with NoSQL Databases
• Schema definition: Tables, Columns, Data types, Sequences, Partitions, Procedures, Functions
• DDL/DCL/DML Scripts
• Overview of Cassandra

Module 10: Apache Spark

Apache Spark
• Spark Introduction and Components
• Spark Architecture
• RDDs in Spark
Apache Spark SQL
• Spark SQL Overview
• Spark SQL Libraries
• Features of Spark SQL
• Querying using Spark SQL

Module 11: Scala for Apache Spark

• Scala Introduction
• Scala Installation
• Basic Types and Operators in Scala
• Scala Arrays and Strings
• Conditional statements and loops
• Classes and Objects
• Scala Functions and Closures
• Scala Traits
• Scala File Input and Output

Module 12: Hadoop/Big Data testing

• Introduction to Big Data testing and use cases
• Roles and responsibilities of a Big Data tester
• Key Challenges in Testing Big Data
• Big Data Testing Techniques
• Test Plan and Test Cases
• Big Data Testing tools
• Identifying the Testing Gates and Entry Points
• Functional and Regression Testing
• Big Data Testing Stages and Testing Tasks

7. Contact Us

Visit our website: www.busyqa.com for more information

Training URL: https://www.busyqa.com/bigdata

Email: info.busyqa@gmail.com

Phone: 1 (905) 499 3705 Cell: 416 902 8026

Twitter: twitter.com/busyqa

Facebook: www.facebook.com/busyqa
