Spring 2018
Contents
1. Introduction
2. Objective
3. Schedule
4. Cost
5. Career Roadmap
6. Course Outline
7. Contact Us
"Quality is never an accident; it is always the result of good planning, intelligent direction and skillful execution" - John Ruskin
Copyright Notice
Copyright © 2018, busyQA Inc
Trademarks
Company name, logo and other products mentioned in this document are registered trademarks of busyQA.
1. Introduction
This training course is a professional development program designed to give students the theoretical
background and the practical knowledge and skills required to succeed in the Big Data field as a Hadoop
Big Data Test Analyst or Big Data Engineer. The course covers the main components of the Hadoop ecosystem:
MapReduce, HDFS, Pig, Hive, Sqoop, Flume and Spark, along with NoSQL databases.
2. Objective
The primary objectives of this course are to:
3. Schedule
The course is a 75-hour (10-week) program delivered as online training, 1½ hours daily, Monday to Friday.
Students are advised to spend additional personal time studying the class materials and gaining
hands-on experience.
4. Cost
Register to secure a spot in the program. The program cost is $1,800 plus tax ($234).
5. Career Roadmap
6. Course Outline
Concepts of OLTP and OLAP
What is ETL?
What is Big Data?
5 V's of Big Data
Types of Data
What is Hadoop?
History of Hadoop
Architecture of Hadoop
Hadoop ecosystem
Conditional statements and loops
Sequence Types
Lists, Tuples, Strings, Sets and Dictionaries
Functions in Python
Classes and Objects
File Handling in Python
Data Life Cycle
What is Pandas?
Pandas Operations
Python for Hadoop
Hive Overview
Hive Characteristics and Features
Types of Hive Tables and Their Differences
How Hive Differs from an RDBMS
Hive Components & Clients
Creating and dropping Hive database
Hive Data Types
Hive Managed Tables
Hive External Tables
Altering Hive Table
Collections - Array, Map & Struct
Processing XML & JSON files in Hive
Hive Partitions & Buckets
Indexes and Views
Hive Queries: Order By, Group By, Distribute By and Cluster By clauses
Hive Aggregation Functions
Hive Joins
Hive UDFs and UDAFs
Working with Hue
Creating and Querying Hive tables in Hue
Sqoop Overview
Sqoop Components and Architecture
Importing data from RDBMS tables to HDFS
Exporting data from HDFS to RDBMS Tables
Sqoop Commands
Working with Sqoop Jobs
Mini testing project on Hive and Sqoop
Overview of Flume
Flume Architecture and components
Flume data flow
Flume configuration file
Fetching Twitter log data
Sequence Generator Source
NetCat Source
Overview of the Kafka Messaging System
Kafka ACLs
Kafka Topics
Overview of Apache Storm
Overview of Pig
Pig Shell Types
Load and Store operators
Diagnostic Operators
Grouping and Joining
Combining and Splitting
Filtering
Sorting
Pig Latin Built-in functions
Pig UDFs
Understanding Pig test cases and testing Pig jobs
Mini Project on Pig and Sqoop/Flume
What is NoSQL?
Challenges of RDBMS
Benefits of adopting a NoSQL database over an RDBMS
Concepts and characteristics of NoSQL databases
Popular NoSQL Databases (HBase and MongoDB)
Working with NoSQL Databases
Schema definition: Tables, Columns, Data types, Sequences, Partitions, Procedures, Functions
DDL/DCL/DML Scripts
Overview of Cassandra
Module 10: Apache Spark
Apache Spark
Spark Introduction and Components
Spark Architecture
RDDs in Spark
Apache Spark SQL
Spark SQL Overview
Spark SQL Libraries
Features of Spark SQL
Querying using Spark SQL
Scala Introduction
Scala Installation
Basic Types and Operators in Scala
Scala Arrays and Strings
Conditional statements and loops
Classes and Objects
Scala Functions and Closures
Scala Traits
Scala File Input and Output
7. Contact Us
Email: info.busyqa@gmail.com
Twitter: twitter.com/busyqa
Facebook: www.facebook.com/busyqa