
CPS216: Advanced Database Systems (Data-intensive Computing Systems)
How MapReduce Works (in Hadoop)

Shivnath Babu

Lifecycle of a MapReduce Job


Map function

Reduce function

Run this program as a MapReduce job
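
For concreteness, a minimal sketch of such a program in Java: the classic WordCount, with a map function, a reduce function, and a driver that submits them as a MapReduce job. The example is assumed, not taken from the slide's (missing) figure.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map function: emit (word, 1) for every word in the input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce function: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: run this program as a MapReduce job.
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```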

Lifecycle of a MapReduce Job


[Timeline figure: the input is divided into splits; map tasks process them in waves (Map Wave 1, Map Wave 2), followed by reduce tasks in waves (Reduce Wave 1, Reduce Wave 2).]

Components in a Hadoop MR Workflow

Next few slides are from: http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig

Job Submission

Initialization

Scheduling

Execution

Map Task

Sort Buffer

Reduce Tasks

Quick Overview of Other Topics (Will Revisit Them Later in the Course)
Dealing with failures
Hadoop Distributed FileSystem (HDFS)
Optimizing a MapReduce job

Dealing with Failures and Slow Tasks


What to do when a task fails?
Try again (retries are possible because tasks are idempotent)
Try again somewhere else
Report the failure
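
A minimal sketch of how the retry limits are exposed in the classic org.apache.hadoop.mapred API (Hadoop 1.x parameter names; the values shown are the shipped defaults):

```java
// Sketch: bounding task retries (Hadoop 1.x API; values shown are the defaults).
// Retrying is safe because task attempts are idempotent: only one successful
// attempt's output is ever committed.
import org.apache.hadoop.mapred.JobConf;

public class RetryLimits {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    conf.setMaxMapAttempts(4);     // mapred.map.max.attempts
    conf.setMaxReduceAttempts(4);  // mapred.reduce.max.attempts
    // Once a task exhausts all of its attempts, the job is reported as failed.
  }
}
```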

What about slow tasks (stragglers)?

Run another copy of the same task in parallel (speculative execution), and take the result from whichever attempt finishes first.
What are the pros and cons of this approach?

Fault tolerance is of high priority in the MapReduce framework
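
Speculative execution can be toggled per job; a minimal sketch of the switches (Hadoop 1.x API; both settings default to true):

```java
// Sketch: enabling/disabling speculative execution for stragglers
// (Hadoop 1.x API; both settings default to true).
import org.apache.hadoop.mapred.JobConf;

public class Speculation {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    conf.setMapSpeculativeExecution(true);     // mapred.map.tasks.speculative.execution
    conf.setReduceSpeculativeExecution(true);  // mapred.reduce.tasks.speculative.execution
    // Pro: masks slow nodes. Con: duplicate attempts consume extra cluster resources.
  }
}
```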

HDFS Architecture

Lifecycle of a MapReduce Job


[Timeline figure, repeated from earlier: input splits, map waves (Map Wave 1, Map Wave 2), then reduce waves (Reduce Wave 1, Reduce Wave 2).]

How are the number of splits, number of map and reduce tasks, memory allocation to tasks, etc., determined?
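
For the number of map tasks, the answer follows from split-size arithmetic: one map task per input split. A sketch mirroring the new-API FileInputFormat computation (Hadoop 1.x parameter names; the 50 GB file size is illustrative):

```java
// Sketch: how input splits are sized (mirrors FileInputFormat's
// computeSplitSize in the new API; real logic also handles multi-block
// and compressed files). All numbers below are illustrative.
public class SplitMath {
  public static void main(String[] args) {
    long blockSize = 64L << 20;       // dfs.block.size (64 MB default in this era)
    long minSize   = 1;               // mapred.min.split.size
    long maxSize   = Long.MAX_VALUE;  // mapred.max.split.size
    long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));

    long fileSize  = 50L << 30;       // e.g., a 50 GB TeraSort input
    long numSplits = (fileSize + splitSize - 1) / splitSize;  // ceiling division
    System.out.println(numSplits + " splits -> " + numSplits + " map tasks");
    // #map tasks = #splits; #reduce tasks comes from mapred.reduce.tasks;
    // per-task memory comes from mapred.child.java.opts (e.g., -Xmx200m).
  }
}
```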

Job Configuration Parameters


190+ parameters in Hadoop
Set manually, or defaults are used
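
A minimal sketch of setting parameters explicitly in a driver program (the same values can also be placed in mapred-site.xml or passed with -D on the command line; names and defaults are Hadoop 1.x):

```java
// Sketch: overriding job configuration parameters (Hadoop 1.x names;
// the comments give the shipped defaults that apply when nothing is set).
import org.apache.hadoop.mapred.JobConf;

public class JobConfExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    conf.setNumReduceTasks(32);                      // mapred.reduce.tasks (default: 1)
    conf.setInt("io.sort.factor", 100);              // default: 10
    conf.set("mapred.child.java.opts", "-Xmx512m");  // default: -Xmx200m
  }
}
```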

Hadoop Job Configuration Parameters

Image source: http://www.jaso.co.kr/265

Tuning Hadoop Job Conf. Parameters


Do their settings impact performance?
What are ways to set these parameters?
Defaults -- are they good enough?
Best practices -- the best setting can depend on data, job, and cluster properties
Automatic setting

Experimental Setting
Hadoop cluster on 1 master + 16 workers
Each node:
2GHz AMD processor, 1.8GB RAM, 30GB local disk (relatively ill-provisioned!)
Xen VM running Debian Linux
Max 4 concurrent maps & 2 reduces per node
Maximum map wave size = 16 x 4 = 64
Maximum reduce wave size = 16 x 2 = 32

Not all users can run large Hadoop clusters:


Can Hadoop be made competitive in the 10-25 node, multi-GB to TB data size range?

Parameters Varied in Experiments

Hadoop 50GB TeraSort

Varying number of reduce tasks, number of concurrent sorted streams for merging, and fraction of map-side sort buffer devoted to metadata storage
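
For reference, a sketch mapping these three knobs to their Hadoop 1.x configuration names; the values are illustrative points from the search space (only io.sort.factor = 500 appears on the slides), not recommendations:

```java
// Sketch: the three parameters varied in the 50 GB TeraSort experiments.
import org.apache.hadoop.mapred.JobConf;

public class TeraSortTuning {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    conf.setNumReduceTasks(64);            // e.g., two reduce waves of 32 on this cluster
    conf.setInt("io.sort.factor", 500);    // concurrent sorted streams merged at once
    conf.setFloat("io.sort.record.percent", 0.15f);  // fraction of the map-side sort
        // buffer (io.sort.mb) reserved for per-record metadata (default: 0.05)
  }
}
```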

Hadoop 50GB TeraSort

Varying number of reduce tasks for different values of the fraction of map-side sort buffer devoted to metadata storage (with io.sort.factor = 500)

Hadoop 50GB TeraSort

Varying number of reduce tasks for different values of io.sort.factor (io.sort.record.percent = 0.05, default)

Hadoop 75GB TeraSort

1D projection for io.sort.factor=500

Automatic Optimization? (Not yet in Hadoop)

[Timeline figure: Map Wave 1, Map Wave 2, and Map Wave 3 overlap with the shuffle; Reduce Wave 1 and Reduce Wave 2 follow.]

What if the number of reduces is increased to 9?

[Timeline figure: the same three map waves, now followed by Reduce Wave 1, Reduce Wave 2, and Reduce Wave 3.]
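
The wave counts in these figures follow from simple ceiling arithmetic; a sketch with a hypothetical slot count chosen to match the figure:

```java
// Sketch: with s reduce slots available per wave, r reduce tasks run in
// ceil(r / s) waves. The slot count here (4) is hypothetical, picked so the
// numbers match the figure: 8 reduces -> 2 waves, 9 reduces -> 3 waves.
public class WaveMath {
  public static void main(String[] args) {
    int reduceSlots = 4;
    for (int reduces : new int[] {8, 9}) {
      int waves = (reduces + reduceSlots - 1) / reduceSlots;  // ceiling division
      System.out.println(reduces + " reduces -> " + waves + " waves");
    }
  }
}
```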
