
Hadoop Fundamentals I

Version 2: Updated July 2013


Hadoop Fundamentals I teaches you the basics of Apache Hadoop and the concept of Big Data. The materials and software used in this course are all FREE! This is the second version of this course. Review the What's New? section for a list of changes made since version 1 of this course.

Welcome!
About this course Page
About your instructors URL
What's New? Page
Taking this course, a guided tour (7:01) URL
Taking this course, a guided tour - Transcript URL

Technical assistance
Course forum

Reading material and references


Hadoop: The Definitive Guide (May 2012) URL
Hadoop Essentials - A Quantitative Approach (Oct 2012) URL
Hadoop in Action (Dec 2010) URL

Lesson 1

Lesson 1: Introduction to Hadoop


Learning objectives
Understand what Hadoop is
Understand what Big Data is
Learn about other open source software related to Hadoop
Understand how Big Data solutions can work on the Cloud

Instructions
Review all the videos provided
Complete the lab

Videos
What is Hadoop? - Part 1 (3:49) URL
What is Hadoop? - Part 2 (4:31) URL
What is Hadoop? - Transcript URL

Hands-on lab - Creating your own Hadoop cluster


We will use IBM InfoSphere BigInsights (BigInsights) software to work with Hadoop. BigInsights is available in different editions; this course uses the Quick Start Edition, which is free and has no limits on usage time or data size.

Step 1: Choose any of these options to work with BigInsights

Option 1: Download and install BigInsights
Download BigInsights Quick Start Edition (free to use) URL

Option 2: Use BigInsights on the Amazon Cloud
Review the "Hadoop and Amazon Cloud" course (BD005EN) for details URL

Option 3: Use BigInsights on the IBM SmartCloud Enterprise
Review the "Hadoop and the IBM SmartCloud Enterprise" course (BD006EN) for details URL

Option 4: Download and use the supplied VMWare image
Download the 64-bit VMWare image URL
Download and install free VMWare Player to play VMWare image URL
Use the supplied VMWare image - User ID / password URL

Step 2: Set up lab input files
Download and copy the lab input files to the right locations Page

Lab Solution
Lab solution (6:41) URL

Lesson 2

Lesson 2: Hadoop architecture


Learning objectives
Understand the main Hadoop components
Learn how HDFS works
List data access patterns for which HDFS is designed
Describe how data is stored in an HDFS cluster
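
As a rough illustration of the last objective, the sketch below splits a file into fixed-size blocks and assigns each block to several nodes. The 64 MB block size and the replication factor of 3 are assumptions (the classic Hadoop defaults), and the function name `place_blocks`, the node names, and the round-robin placement are hypothetical simplifications; real HDFS uses a rack-aware placement policy that this sketch does not model.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size in bytes (classic Hadoop default)
REPLICATION = 3                # assumed replication factor (Hadoop default)


def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a list of (block_index, block_length, replica_nodes) for one file.

    Placement is plain round-robin, which is NOT the real HDFS policy; it only
    illustrates that every block is stored on several distinct nodes.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    # Integer ceiling division: number of blocks needed to hold the file.
    num_blocks = (file_size + block_size - 1) // block_size
    layout = []
    for i in range(num_blocks):
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - i * block_size)
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        layout.append((i, length, replicas))
    return layout


if __name__ == "__main__":
    datanodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
    for index, length, replicas in place_blocks(200 * 1024 * 1024, datanodes):
        print(index, length, replicas)
```

Under these assumptions a 200 MB file occupies four blocks (three full blocks and one 8 MB remainder), each held by three different nodes.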

Instructions
Review all the videos provided
Complete the lab

Videos
Hadoop architecture and HDFS (8:01) URL
Hadoop architecture and HDFS - Transcript URL
Topology awareness and writing to HDFS (2:37) URL
Topology awareness and writing to HDFS - Transcript URL
HDFS Command Line (4:28) URL
HDFS Command Line - Transcript URL

Hands-on lab
Exploring HDFS - Lab instructions URL
Lab solution (5:45) URL

Lesson 3

Lesson 3: Introduction to MapReduce


Learning objectives
Understand the concepts of map and reduce operations
Describe how Hadoop executes a MapReduce job
List MapReduce fault tolerance and scheduling features
List MapReduce fundamental data types
Describe a MapReduce data flow
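
To make the map and reduce objectives above concrete, here is a minimal single-process sketch of a word count. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and on a real cluster the map and reduce steps would run as parallel tasks on different nodes, with the framework performing the shuffle and sort between them.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group all values by key and return groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values) for key, values in shuffle(mapped)]


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

The data flow is the point here: map turns each record into key/value pairs, the shuffle brings equal keys together, and reduce sees one key with all of its values.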

Instructions
Review all the videos provided
Complete the lab

Videos
Map and Reduce operations - Introduction (4:21) URL
Map and Reduce operations - Introduction - Transcript URL
Submitting a MapReduce job (1:23) URL
Submitting a MapReduce job - Transcript URL
Distributed mergesort engine (1:11) URL
Distributed mergesort engine - Transcript URL
Fundamental data types (2:09) URL
Fundamental data types - Transcript URL
Fault tolerance (1:04) URL
Fault tolerance - Transcript URL
Scheduling and task execution (1:51) URL
Scheduling and task execution - Transcript URL

Hands-on lab
Using MapReduce - Lab instructions URL

Lesson 4

Lesson 4: Querying data


Learning objectives
Understand how to work with Pig, Hive and Jaql

Instructions
Review all the videos provided
Complete the lab

Videos
An overview of Pig, Hive and Jaql (3:23) URL
An overview of Pig, Hive and Jaql - Transcript URL
Working with Pig (7:43) URL
Working with Pig - Transcript URL
Working with Hive (9:34) URL
Working with Hive - Transcript URL
Working with Jaql (4:28) URL
Working with Jaql - Transcript URL

Hands-on lab
Working with Jaql, Pig, and Hive - Lab instructions URL
Working with Jaql, Pig and Hive - Lab solution Part 1 (5:01) URL
Working with Jaql, Pig and Hive - Lab solution Part 2 (4:50) URL
Working with Jaql, Pig and Hive - Lab solution Part 3 (5:07) URL
Working with Jaql, Pig and Hive - Lab solution Part 4 (4:35) URL

Lesson 5

Lesson 5: Hadoop administration


Learning objectives
Understand how to add and remove nodes in a Hadoop cluster
Learn how to monitor the health status of your cluster
Learn how to configure Hadoop

Instructions
Review all the videos provided
Complete the lab

Videos
Adding and removing nodes to the cluster (7:46) URL
Verifying cluster health & stopping/starting components (2:41) URL
Configuring Hadoop - Part 1 (7:44) URL
Configuring Hadoop - Part 2 (2:52) URL
Setting up rack topology (1:52) URL

Hands-on lab
Hadoop Administration - Lab instructions URL
Hadoop Administration - Lab solution Part 1 (5:29) URL
Hadoop Administration - Lab solution Part 2 (4:59) URL
Hadoop Administration - Lab solution Part 3 (4:25) URL
Hadoop Administration - Lab solution Part 4 (3:55) URL

Lesson 6

Lesson 6: Moving data into Hadoop


Learning objectives
Understand how to move data into Hadoop using Flume

Instructions
Review all the videos provided
Complete the lab

Videos
Introduction to Flume (4:42) URL
Introduction to Flume - Transcript URL
Flume modes of operation and configuration (3:39) URL
Flume modes of operation and configuration - Transcript URL

Hands-on lab
Data Movement - Lab instructions URL

Test

Test your knowledge


Test objectives and instructions Page
Take the test! Quiz
Evaluation Form: Please provide feedback Assignment
Print your certificate!
Not available until the activity "Evaluation Form: Please provide feedback" is marked complete.
Not available until you achieve a required score in "Take the test!".

SQL Access for Hadoop


SQL Access for Hadoop teaches you how to use SQL to access big data stored in HDFS or HBase. The course presents the different alternatives for SQL access, such as Hive, Impala, and Big SQL, and explains the similarities and differences between these three technologies.
The course includes hands-on exercises and access to a Hadoop cluster with Hive, HBase, HDFS and Big SQL, so you can try these technologies firsthand. At the end of the course you will understand the different alternatives for accessing Big Data with SQL and will have gained hands-on experience with them.

Welcome!
About this course Page
About your instructors URL
Taking this course, a guided tour (7:01) URL
Taking this course, a guided tour - Transcript URL

Technical assistance
Course forum

Reading material and references


Hadoop in Action URL

Lesson 1

Lesson 1: Introduction to Hive, Big SQL and Impala


Learning objectives
Understand Hive, Big SQL and Impala concepts, terminology and architecture
Understand similarities and differences between these technologies

Instructions
Review all the videos provided
Complete the lab

Videos
Lesson Outline (0:57) URL
Lesson Outline - Transcript URL
SQL for Big Data: Overview (5:43) URL
SQL for Big Data - Transcript URL
Introduction to Hive (8:31) URL
Introduction to Hive - Transcript URL
Introduction to Impala (7:08) URL
Introduction to Impala - Transcript URL
Introduction to Big SQL (9:38) URL
Introduction to Big SQL - Transcript URL

Hands-on lab - Accessing a Hadoop Cluster on the Cloud


Follow the steps in this section to gain access to a Hadoop Cluster on the Cloud.
Accessing the Cloud Based Environment for Exercises (6:30) URL
Accessing the Cloud Based Environment for Exercises - Transcript URL
Using putty with the IM Demo Cloud (5:17) URL
Using putty with the IM Demo Cloud - Transcript URL

Lesson 2

Lesson 2: Working with SQL using Hive


Learning objectives
Learn how to create tables and run HiveQL queries from the command line

Instructions
Review all the videos provided

Videos
Lesson outline (0:45) URL
Lesson Outline - Transcript URL
Exploring and Configuring the Hive environment (5:35) URL
Exploring and Configuring the Hive Environment - Transcript URL
Hive Tables (7:45) URL
Hive Tables - Transcript URL
Querying data with Hive (6:28) URL
Querying data with Hive - Transcript URL

Hands-on lab
Lab instructions - Working with Hive URL

Lesson 3

Lesson 3: Working with SQL using Big SQL


Lab objectives
Learn how to configure your Big SQL environment
Learn how to create tables and run Big SQL queries
Understand how to work with the JSQSH command line interface
Understand how to work with a JDBC or ODBC client

Instructions
Watch the videos in this lesson
Review the lab instructions

Videos
Exploring the Big SQL environment (6:05) URL
Exploring the Big SQL Environment - Transcript URL
Starting, stopping and monitoring the Big SQL server process (4:14) URL
Starting, stopping and monitoring the Big SQL server process - Transcript URL
Configuring the Big SQL server (4:57) URL
Configuring the Big SQL server - Transcript URL
Getting started with JSQSH and connecting to a data source (10:56) URL
Getting started with JSQSH and connecting to a data source - Transcript URL
Creating and dropping schemas and tables (6:14) URL
Creating and dropping schemas and tables - Transcript URL
Loading tables and running queries (15:00) URL
Loading tables and running queries - Transcript URL
Working with Complex Data Types (7:19) URL
Working with Complex Data Types - Transcript URL
Connecting and running queries using JDBC and Eclipse (11:08) URL
Connecting and running queries using JDBC and Eclipse - Transcript URL

Hands-on lab
Lab instructions - Working with Big SQL URL

Lesson 4

Lesson 4: Accessing HBase with Hive and Big SQL


Learning objectives
Understand how to access HBase with Hive
Understand how to access HBase with Big SQL
Learn how to deal with HBase encoding and storage

Instructions
Review all the videos provided
Complete the lab

Videos
HBase Support: Overview (8:22) URL
HBase Support: Overview - Transcript URL
Working with Big SQL and HBase (15:01) URL
Working with Big SQL and HBase - Transcript URL

Hands-on lab
Accessing HBase with SQL URL

Lesson 5

Lesson 5: System Tables and Troubleshooting


Learning objectives

Understand how to work with Catalog and System Tables with Big SQL
Learn how to troubleshoot a problem in Big SQL

Instructions
Review all the videos provided
Complete the labs

Videos
Troubleshooting in Big SQL (5:25) URL
Troubleshooting in Big SQL - Transcript URL
Inspecting Catalog and System Tables in Big SQL (3:11) URL
Inspecting Catalog and System Tables in Big SQL - Transcript URL

Test

Test your knowledge


Test objectives and instructions Page
Take the test! Quiz
Print your certificate!
Not available until you achieve a required score in "Take the test!".

Stream Computing I * Preview *


Stream Computing I teaches you the basics of Stream Computing using IBM InfoSphere Streams. This is the first in a series of two courses. The course and the materials are all FREE. Trial software of InfoSphere Streams will be used for the labs.

Welcome!
About this course Page
Taking this course, a guided tour (7:01) URL
Taking this course, a guided tour - Transcript URL

Technical assistance
Course forum (Input your feedback)

Download the course materials


Download the VMWare Image (with a 90 day trial of Streams 3.1) for exercises URL

Reading material and references


IBM InfoSphere Streams: Assembling Continuous Insight in the Information Revolution URL

Lesson 1

Lesson 1: Introduction to Stream Computing


Learning objectives
Understand what Stream Computing is all about

Instructions
Review all the videos provided
Complete the lab

Videos
What is Stream Computing? (5:23) URL
What is Stream Computing? - Transcript URL
The evolution of analytics (4:30) URL
The evolution of analytics - Transcript URL
Event processing vs stream computing (3:01) URL
Event processing vs. stream processing - Transcript URL
Use cases for stream computing (3:09) URL
Use cases for stream computing - Transcript URL
Introduction to IBM InfoSphere Streams (7:24) URL
Introduction to IBM InfoSphere Streams - Transcript URL

Hands-on lab - Downloading and installing InfoSphere Streams


We will use IBM's InfoSphere Streams Trial software to work with Stream Computing. This trial software can be used for 90 days and has all the features of the fee-based version.
Download InfoSphere Streams (trial version) URL
Install InfoSphere Streams - Instructions URL

Lesson 2

Lesson 2: Streams concepts and terms


Learning objectives
Understand Streams concepts such as instances, hosts, operators, PEs, and jobs.

Instructions
Review all the videos provided
Complete the lab

Videos
Streams instances and hosts (3:46) URL
Streams instances and hosts - Transcript URL
Operators and Processing Elements (5:27) URL
Operators and Processing Elements - Transcript URL
Components of Streams (4:36) URL
Components of Streams - Transcript URL
Streams Studio IDE (3:53) URL

Lesson 3

Lesson 3: Streams applications


Learning objectives
Working with SPL
Get started with Streams applications

Instructions
Review all the videos provided
Complete the lab

Videos
What is the Streams Processing Language (SPL)? (5:26) URL
What is the Streams Processing Language (SPL) - Transcript URL

Lesson 4

Lesson 4: Composing an Application in Streams


Learning objectives
Understand how to work with Streams operators such as Functor, Aggregate, InetSource, and more!

Instructions
Review all the videos provided
Complete the lab

Videos
Setting up the environment and the inetSource operator (7:24) URL
Using the custom operator (9:33) URL
Using the filter operator (6:34) URL
Using the sort operator and tumbling windows (10:43) URL
Extracting values using Aggregate (7:42) URL
Working with the Join operator (14:17) URL
Selecting out columns using Functor operator (9:44) URL
Building an entire application with Drag and Drop in Streams 3.0 (36:17) URL

Lesson 5

Lesson 5: Deploying Streams Applications


Learning objectives
Understand how to deploy a Streams application

Instructions
Review all the videos provided
Complete the lab

Videos
Runtime architecture and introduction to topologies (5:36) URL
Runtime architecture and introduction to topologies - Transcript URL
Working with instances (2:00) URL
Working with instances - Transcript URL
Using StreamTool (4:52) URL
Using StreamTool - Transcript URL

Spreadsheet-like Analytics
Spreadsheet-like Analytics teaches you how to explore big data and takes you on a journey of discovery without having to write a single line of code. Using BigSheets, a tool developed by IBM Research, you can perform analytics on big data with an interface similar to a regular spreadsheet. BigSheets masks the complexities of processing big data and lets analysts and managers concentrate on getting the analytics they want without having to know how to code.

Welcome!
About this course Page
Taking this course, a guided tour (7:01) URL
Taking this course, a guided tour - Transcript URL

Technical assistance
Course forum

Lesson 1

Lesson 1: Getting started with BigSheets


Learning objectives
Understand what BigSheets is
Learn who the target users of BigSheets are

Instructions
Review all the videos provided

Videos
Introduction to BigSheets (3:49) URL
What can you do with BigSheets? (1:11) URL
Working with BigSheets (3:31) URL
A tour of BigSheets - Part 1 (2:59) URL
A tour of BigSheets - Part 2 (3:01) URL

Lesson 2

Lesson 2: Discovering what BigSheets can do


Learning objectives
Using a simple scenario, understand BigSheets features and capabilities

Instructions
Review all the videos provided

Videos
Gathering input data from an application (4:04) URL
Manipulating data in BigSheets (3:26) URL
Overview of other BigSheets scenarios (2:31) URL

Lesson 3

Lesson 3: Deep Dive into BigSheets


Learning objectives
Exploring data by adding sheets
Understanding workflow and workbook diagrams
Monitoring BigSheets in the Dashboard

Instructions
Review all the videos provided
Complete the lab

Videos
Exploring Data by Adding Sheets - Part 1 (6:32) URL
Exploring Data by Adding Sheets - Part 1 - Transcript URL
Exploring Data by Adding Sheets - Part 2 (7:40) URL
Exploring Data by Adding Sheets - Part 2 - Transcript URL
Exploring Data by Adding Sheets - Part 3 (8:02) URL
Exploring Data by Adding Sheets - Part 3 - Transcript URL
Exploring Data by Adding Sheets - Part 4 (7:58) URL
Exploring Data by Adding Sheets - Part 4 - Transcript URL
Exploring Data by Adding Sheets - Part 5 (6:46) URL
Exploring Data by Adding Sheets - Part 5 - Transcript URL
Understanding Workflow and Workbook Diagrams (5:04) URL
Understanding Workflow and Workbook Diagrams - Transcript URL
Monitoring BigSheets in Dashboard (4:26) URL
Monitoring BigSheets in Dashboard - Transcript URL

Lesson 4

Lesson 4: A complete case study using BigSheets


Learning objectives
Understand how to work with BigSheets using a complete case study

Instructions
Review all the videos provided

Videos

BigSheets and the case study overview (2:12) URL
Case Study - Part 1 (3:49) URL
Case Study - Part 2 (2:42) URL
Case Study - Part 3 (2:42) URL
Case Study - Part 4 (2:42) URL
Case Study - Part 5 (2:42) URL
Case Study - Part 6 (1:13) URL

Java Fundamentals *Preview*


Brought to you by SciSpike (www.scispike.com)
Java Fundamentals teaches you the basics of the Java Programming Language. The skills you gain can also help you with Big Data technologies since MapReduce jobs in Hadoop can be written in Java.

Course Feedback (help us complete developing this course!)


Course forum (input your feedback)

Lesson 1

Lesson 1: Java overview


Learning objectives
Learn about the history of Java
Understand what JVM, JRE, JDK, and Java APIs are
Learn about Java Editions

Instructions
Complete all the presentations

Presentations
Java Overview SCORM package

Lesson 5

Lesson 5: Packages and Access Control


Learning objectives
Understand what packages are
Learn about package naming conventions
Learn about access level modifiers (private, protected, public)
Understand the import statement

Instructions
Complete all the presentations

Presentations
Packages and Access Control SCORM package

Lesson 7

Lesson 7: Arrays
Learning objectives
Learn what arrays are
Understand the syntax for arrays in Java
Learn how to work with arrays
Compare arrays to collections

Instructions
Complete all the presentations

Presentations
Arrays SCORM package

Lesson 10

Lesson 10: JavaBeans


Learning objectives
Learn what JavaBeans are
Implementing the Serializable interface
Learn about JavaBeans properties
Understand what introspection is

Instructions
Complete all the presentations

Presentations
JavaBeans SCORM package

Lesson 12

Lesson 12: Additional Features


Learning objectives
Learn about the enhanced for loop (foreach)
Understand what autoboxing is
Learn about varargs
Learn about static imports
Understand how to work with annotations

Instructions
Complete all the presentations

Presentations

Additional Features SCORM package

Hadoop Reporting and Analysis


Brought to you by Jaspersoft (www.jaspersoft.com)
Hadoop Reporting and Analysis teaches you how to build your own Hadoop/Big Data reports over relevant Hadoop technologies such as HBase and Hive. It provides guidelines for choosing among three reporting techniques: Direct Batch Reports, Live Exploration, and Indirect Batch Analysis. Hands-on labs are included using the free version of Jaspersoft and BigInsights (IBM's Hadoop distribution). All materials and software used are FREE!

Welcome!
About this course Page
Taking this course, a guided tour (7:01) URL
Taking this course, a guided tour - Transcript URL

Technical assistance
Course forum
Instructions to Download Jaspersoft Software File
Attachments Folder

Lesson 1

Lesson 1: Introduction to Reporting and Analysis on Hadoop


Learning objectives
- Understanding Why Reporting and Analysis on Hadoop is important
- Approaches to Big Data reporting and analysis
- Big Data Access Technologies for Reporting and Analysis
- Business Intelligence and Hadoop Architecture

Instructions
- Review all the videos provided

Videos
Introduction to Reporting and Analytics on Hadoop (14:11) URL
Introduction to Reporting and Analytics on Hadoop - Transcript URL

Lesson 2

Lesson 2: Direct Batch Reporting on Hadoop


Learning objectives
- Understanding Direct Batch Reporting
- Importance of Direct Batch Reporting on Hadoop
- Guideline to choose Direct Batch Reporting approach
- Creating a Direct Batch Report on Hadoop

Instructions
- Review all the videos provided
- Complete the lab

Videos
Direct Batch Reporting (4:51) URL
Direct Batch Reporting Demo (10:27) URL

Hands-on lab
Creating Direct batch reports for big data - Instructions URL
Creating a big data direct batch report - Solution (11:36) URL

Lesson 3

Lesson 3: Live Exploration of Big Data


Learning objectives
- Understanding Live Exploration of Big Data
- Guidelines to choose Live Exploration approach to Big Data analysis
- Perform Live Exploration of Big Data on Hadoop

Instructions
- Review all the videos provided
- Complete the lab

Videos
Live Exploration Reporting (5:22) URL
Live Exploration Tutorial (10:43) URL

Hands-on lab
Practice Live Exploration URL
Practice Live Exploration - Solution (12:56) URL

Lesson 4

Lesson 4: Indirect Batch Analysis on Hadoop


Learning objectives

- Understanding Indirect Batch Analysis on Hadoop
- Guidelines to choose Indirect Batch Analysis approach
- Perform Indirect Batch analysis on Big Data

Instructions
- Review all the videos provided
- Complete the lab

Videos
Indirect Batch Analysis of Big Data (5:50) URL
Indirect Batch Analysis of Big Data - Demo (4:47) URL

Hands-on lab
Indirect Batch Analysis - Lab Instructions URL
Indirect Batch Analysis - Lab Solution (6:11) URL

Test

Test your knowledge


Test objectives and instructions Page
Take the test! Quiz
Print your certificate!
Not available until you achieve a required score in "Take the test!".

Evaluation Form

Evaluation form
Evaluation Form: Please provide feedback
