Добро пожаловать в Scribd!

Unit Ii Iintroduction To Map Reduce

Загружено:

0% нашли этот документ полезным (0 голосов)

16 просмотров4 страницы

Traditional enterprise systems have centralized servers that create bottlenecks when processing large, scalable data. Google developed MapReduce to address this issue by dividing tasks into small parts assigned to multiple computers. The MapReduce algorithm contains two tasks: Map converts input data into key-value pairs, while Reduce combines the outputs of Map into a smaller set of data. For example, when analyzing tweets, MapReduce can tokenize, filter, count words, and aggregate the counts to manage large volumes of social media data across many servers.

Исходное описание:

xcvxc

Оригинальное название

UNIT II

Авторское право

Доступные форматы

DOCX, PDF, TXT или читайте онлайн в Scribd

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Пожаловаться на этот документ

Авторское право:

Доступные форматы

Скачайте в формате DOCX, PDF, TXT или читайте онлайн в Scribd

Отметить как неприемлемый контент

0% нашли этот документ полезным (0 голосов)

16 просмотров4 страницы

Unit Ii Iintroduction To Map Reduce

Загружено:

kathirdcn

Авторское право:

Доступные форматы

Скачайте в формате DOCX, PDF, TXT или читайте онлайн в Scribd

Отметить как неприемлемый контент

Перейти к странице

Вы находитесь на странице: 1из 4

Поиск в документе

UNIT II

IINTRODUCTION TO MAP REDUCE

Traditional Enterprise Systems normally have a centralized server to store and process data. The following
illustration depicts a schematic view of a traditional enterprise system. Traditional model is certainly not suitable to process
huge volumes of scalable data and cannot be accommodated by standard database servers. Moreover, the centralized
system creates too much of a bottleneck while processing multiple files simultaneously.

Google solved this bottleneck issue using an algorithm called MapReduce. MapReduce divides a task into
small parts and assigns them to many computers. Later, the results are collected at one place and integrated to
form the result dataset.

MAPREDUCE WORKING
The MapReduce algorithm contains two important tasks, namely Map and Reduce.
 The Map task takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key-value pairs).
 The Reduce task takes the output from the Map as an input and combines
those data tuples (key-value pairs) into a smaller set of tuples.
The reduce task is always performed after the map job.Let us now take a
close look at each of the phases and try to understand their significance.

 Input Phase − Here we have a Record Reader that translates each record in an
input file and sends the parsed data to the mapper in the form of key-value pairs.
 Map − Map is a user-defined function, which takes a series of key-value pairs and
processes each one of them to generate zero or more key-value pairs.

 Intermediate Keys − The key-value pairs generated by the mapper are known as
intermediate keys.
 Combiner − A combiner is a type of local Reducer that groups similar data from
the map phase into identifiable sets. It takes the intermediate keys from the
mapper as input and applies a user-defined code to aggregate the values in a
small scope of one mapper. It is not a part of the main MapReduce algorithm; it
is optional.
 Shuffle and Sort − The Reducer task starts with the Shuffle and Sort step. It
downloads the grouped key-value pairs onto the local machine, where the
Reducer is running. The individual key-value pairs are sorted by key into a larger
data list. The data list groups the equivalent keys together so that their values
can be iterated easily in the Reducer task.
 Reducer − The Reducer takes the grouped key-value paired data as input and runs
a Reducer function on each one of them. Here, the data can be aggregated,
filtered, and combined in a number of ways, and it requires a wide
range of processing. Once the execution is over, it gives zero or more key-

value pairs to the final step.

 Output Phase − In the output phase, we have an output formatter that

translates the final key-value pairs from the Reducer function and
writes them onto a file using a record writer.
Let us try to understand the two tasks Map & Reduce with the help of a small diagram
MAPREDUCE-EXAMPLE
Let us take a real-world example to comprehend the power of MapReduce. Twitter receives
around 500 million tweets per day, which is nearly 3000 tweets per second. The following illustration
shows how Tweeter manages its tweets with the help of MapReduce.
As shown in the illustration, the MapReduce algorithm performs the following actions−
 Tokenize − Tokenizes the tweets into maps of tokens and writes them as
key- value pairs.
 Filter − Filters unwanted words from the maps of tokens and writes
the filtered maps as key-value pairs.

 Count − Generates a token counter per word.

 Aggregate Counters − Prepares an aggregate of similar counter values
into small manageable units.

Вам также может понравиться

Learn SAP Basis in 24 Hours
От Everand
Learn SAP Basis in 24 Hours
Alex Nordeen
Рейтинг: 4.5 из 5 звезд
4.5/5 (2)
Exploring Hadoop Ecosystem (Volume 2): Stream Processing
От Everand
Exploring Hadoop Ecosystem (Volume 2): Stream Processing
Wei Liu
Оценок пока нет
Optimization in Engineering Models and Algorithms
Документ422 страницы
Optimization in Engineering Models and Algorithms
Alexander Fierro
100% (3)
Map Reduce Tutorial-1
Документ7 страниц
Map Reduce Tutorial-1
jefferyleclerc
Оценок пока нет
(BIG DATA) (MapReduce - Quick Guide, Tutorialspoint - Com)
Документ36 страниц
(BIG DATA) (MapReduce - Quick Guide, Tutorialspoint - Com)
Mony Simonetti
Оценок пока нет
Why MapReduce
Документ8 страниц
Why MapReduce
punekarmanish_347129
Оценок пока нет
Map Reduce
Документ18 страниц
Map Reduce
Soni Nsit
Оценок пока нет
Fundamentals of MapReduce With Example
Документ2 страницы
Fundamentals of MapReduce With Example
Sanidhya Singh Rajawat
Оценок пока нет
What Is MapReduce
Документ6 страниц
What Is MapReduce
Sundaram yadav
Оценок пока нет
BDA-MapReduce (1) 5rfgy656yhgvcft6
Документ60 страниц
BDA-MapReduce (1) 5rfgy656yhgvcft6
Ayush Jha
Оценок пока нет
Unit-2 Bda Kalyan - Pagenumber
Документ15 страниц
Unit-2 Bda Kalyan - Pagenumber
rohit sai tech viewers
Оценок пока нет
Anatomy of A MapReduce Job
Документ5 страниц
Anatomy of A MapReduce Job
kumar
Оценок пока нет
Chapter 4 - Understanding Map Reduce Fundamentals
Документ45 страниц
Chapter 4 - Understanding Map Reduce Fundamentals
WEGENE ARGOW
Оценок пока нет
UNIT 3bda
Документ16 страниц
UNIT 3bda
Nagur Basha
Оценок пока нет
Hadoop Karunesh
Документ14 страниц
Hadoop Karunesh
Mukul Mishra
Оценок пока нет
Understand: The First Phase of Mapreduce Paradigm, What Is A Map/Mapper, What Is The Input To The
Документ5 страниц
Understand: The First Phase of Mapreduce Paradigm, What Is A Map/Mapper, What Is The Input To The
Anjali Adwani
Оценок пока нет
BDA Unit 3 Notes
Документ11 страниц
BDA Unit 3 Notes
duraisamy
Оценок пока нет
Paper Summary - MapReduce - Simplified Data Processing On Large Clusters (2004) - MeloSpace
Документ7 страниц
Paper Summary - MapReduce - Simplified Data Processing On Large Clusters (2004) - MeloSpace
Edward
Оценок пока нет
Hadoop (Mapreduce)
Документ43 страницы
Hadoop (Mapreduce)
Nisrine Mofakir
Оценок пока нет
BDA Unit 3 1
Документ37 страниц
BDA Unit 3 1
Jerald Ruban
Оценок пока нет
Lecture 10 MapReduce Hadoop
Документ37 страниц
Lecture 10 MapReduce Hadoop
DAVID HUMBERTO GUTIERREZ MARTINEZ
Оценок пока нет
Unit-2 (MapReduce-II)
Документ11 страниц
Unit-2 (MapReduce-II)
tripathineeharika
Оценок пока нет
Data Science Presentation
Документ20 страниц
Data Science Presentation
yadvendra dhakad
Оценок пока нет
Unit-2 (MapReduce-I)
Документ28 страниц
Unit-2 (MapReduce-I)
tripathineeharika
Оценок пока нет
Chapter4 - MapReduce
Документ29 страниц
Chapter4 - MapReduce
genace
Оценок пока нет
Practical 1: Data Mining and Business Intelligence Practical-1
Документ10 страниц
Practical 1: Data Mining and Business Intelligence Practical-1
Dhrupad Patel
Оценок пока нет
Unit 3 - Big Data Technologies
Документ42 страницы
Unit 3 - Big Data Technologies
prakash N
Оценок пока нет
Map Reduce
Документ10 страниц
Map Reduce
Vikas Sinha
Оценок пока нет
DSBDA Manual Assignment 11
Документ6 страниц
DSBDA Manual Assignment 11
kartiknikumbh11
Оценок пока нет
Map Reduce
Документ36 страниц
Map Reduce
Papai Rana
Оценок пока нет
What Is MapReduce in Hadoop
Документ5 страниц
What Is MapReduce in Hadoop
Rakesh Shaw
Оценок пока нет
Bda Expt5 - 60002190056
Документ5 страниц
Bda Expt5 - 60002190056
kr
Оценок пока нет
Unit 3
Документ27 страниц
Unit 3
Radhamani V
Оценок пока нет
MapReduce: Simplified Data Processing On Large Clusters
Документ13 страниц
MapReduce: Simplified Data Processing On Large Clusters
zzztimbo
100% (1)
Dean 08 Map Reduce
Документ7 страниц
Dean 08 Map Reduce
Lindsey Hoffman
Оценок пока нет
Unit 3 Bba
Документ11 страниц
Unit 3 Bba
rajendrameena172003
Оценок пока нет
Research Paper - Map Reduce - CSC3323
Документ16 страниц
Research Paper - Map Reduce - CSC3323
AliElabridi
Оценок пока нет
New Microsoft Office Word Document
Документ10 страниц
New Microsoft Office Word Document
pooja pooh
Оценок пока нет
BDA Lab 5
Документ6 страниц
BDA Lab 5
Mohit Gangwani
Оценок пока нет
MapReduce Tutorial
Документ32 страницы
MapReduce Tutorial
ShivanshuSingh
Оценок пока нет
Hadoop Interview Questions Faq
Документ14 страниц
Hadoop Interview Questions Faq
mihirhota
Оценок пока нет
Big Data Management
Документ5 страниц
Big Data Management
Keshav Chaudhary
Оценок пока нет
Map Reduce
Документ11 страниц
Map Reduce
Indra Kishor Chaudhary Avaiduwai
Оценок пока нет
Term Paper Java
Документ14 страниц
Term Paper Java
Muskan Bharti
Оценок пока нет
Unit 3
Документ14 страниц
Unit 3
Ankit Kumar Jha
Оценок пока нет
Map Red
Документ6 страниц
Map Red
20261A6757 VIJAYAGIRI ANIL KUMAR
Оценок пока нет
Mapreduce: Simpli - Ed Data Processing On Large Clusters
Документ4 страницы
Mapreduce: Simpli - Ed Data Processing On Large Clusters
Ibrahim Hamza
Оценок пока нет
Unit 3 MapReduce Part 1
Документ12 страниц
Unit 3 MapReduce Part 1
Ruparel Education Pvt. Ltd.
Оценок пока нет
BDA Experiment 3
Документ7 страниц
BDA Experiment 3
AYAAN Satkut
Оценок пока нет
Unit - III
Документ37 страниц
Unit - III
praneelp2000
Оценок пока нет
MapReduce Flow
Документ8 страниц
MapReduce Flow
Chilaka Kumar
Оценок пока нет
Understanding MapReduce
Документ15 страниц
Understanding MapReduce
gopikrishna
Оценок пока нет
PPT1 Module2 Hadoop Distribution
Документ23 страницы
PPT1 Module2 Hadoop Distribution
Hiran Suresh
Оценок пока нет
Big Data Analytics (2171607) : Chapter - 1 Mapreduce
Документ32 страницы
Big Data Analytics (2171607) : Chapter - 1 Mapreduce
Dhana Lakshmi Boomathi
Оценок пока нет
Hadoop Interview Questions Author: Pappupass Learning Resource
Документ16 страниц
Hadoop Interview Questions Author: Pappupass Learning Resource
Dheeraj Reddy
Оценок пока нет
Unit 4 BDA
Документ31 страница
Unit 4 BDA
Amritha
Оценок пока нет
BDA - II Sem - II Mid
Документ4 страницы
BDA - II Sem - II Mid
Polikanti Goutham
100% (1)
18mcs35e U4
Документ7 страниц
18mcs35e U4
jefferyleclerc
Оценок пока нет
Conceptual Programming: Conceptual Programming: Learn Programming the old way!
От Everand
Conceptual Programming: Conceptual Programming: Learn Programming the old way!
Avishek Sharma
Оценок пока нет
SAP interface programming with RFC and VBA: Edit SAP data with MS Access
От Everand
SAP interface programming with RFC and VBA: Edit SAP data with MS Access
Karl Josef Hensel
Оценок пока нет
Python Advanced Programming: The Guide to Learn Python Programming. Reference with Exercises and Samples About Dynamical Programming, Multithreading, Multiprocessing, Debugging, Testing and More
От Everand
Python Advanced Programming: The Guide to Learn Python Programming. Reference with Exercises and Samples About Dynamical Programming, Multithreading, Multiprocessing, Debugging, Testing and More
Marcus Richards
Оценок пока нет
Internship-2020 Report - (18di36-Srivarsan.s)
Документ17 страниц
Internship-2020 Report - (18di36-Srivarsan.s)
kathirdcn
Оценок пока нет
A Topical Start-Up Guide Series On Emerging Topics On Educational Media and Technology
Документ12 страниц
A Topical Start-Up Guide Series On Emerging Topics On Educational Media and Technology
kathirdcn
Оценок пока нет
Web Lab Syllabus
Документ7 страниц
Web Lab Syllabus
kathirdcn
Оценок пока нет
Index: SL No Experiment Name
Документ23 страницы
Index: SL No Experiment Name
kathirdcn
Оценок пока нет
Lab Manual: ISC 440 (Web Programming 2)
Документ44 страницы
Lab Manual: ISC 440 (Web Programming 2)
kathirdcn
Оценок пока нет
Diploma Board Examination - December 2020
Документ2 страницы
Diploma Board Examination - December 2020
kathirdcn
Оценок пока нет
A Classification and Characterization of Security Threats in Cloud Computing
Документ28 страниц
A Classification and Characterization of Security Threats in Cloud Computing
kathirdcn
Оценок пока нет
Open Up Eclipse and Go To Menu Section
Документ5 страниц
Open Up Eclipse and Go To Menu Section
kathirdcn
Оценок пока нет
Security Challenges in Cloud Computing: January 2010
Документ16 страниц
Security Challenges in Cloud Computing: January 2010
kathirdcn
Оценок пока нет
OIT552-Cloud Computing
Документ10 страниц
OIT552-Cloud Computing
kathirdcn
50% (2)
Proud To Be Indian: Happy Independence DAY
Документ2 страницы
Proud To Be Indian: Happy Independence DAY
kathirdcn
Оценок пока нет
Disclosure of Security Policies, Compliance and Practices: The Cloud Service Provider Should Demonstrate
Документ15 страниц
Disclosure of Security Policies, Compliance and Practices: The Cloud Service Provider Should Demonstrate
kathirdcn
Оценок пока нет
RTOS Basics Semaphore
Документ3 страницы
RTOS Basics Semaphore
kathirdcn
Оценок пока нет
5.3 Task States and Scheduling
Документ3 страницы
5.3 Task States and Scheduling
kathirdcn
Оценок пока нет
Embedded Design Life Cycle
Документ4 страницы
Embedded Design Life Cycle
kathirdcn
Оценок пока нет
Embedded System Classification
Документ4 страницы
Embedded System Classification
kathirdcn
Оценок пока нет
Dump State
Документ9 страниц
Dump State
Giselle Silva Rocha
Оценок пока нет
Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj
Документ6 страниц
Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj
Md. Shameem Sharif
Оценок пока нет
Claroty Continuous Threat Detection Datasheet
Документ11 страниц
Claroty Continuous Threat Detection Datasheet
Tuan MA
Оценок пока нет
Workers Salary
Документ7 страниц
Workers Salary
Tika Warisman
Оценок пока нет
IT ICS Security Architecture Assessment - Questionnaire
Документ24 страницы
IT ICS Security Architecture Assessment - Questionnaire
Huzaifa Sayeed
Оценок пока нет
Noc18 mg42 Assignment4
Документ4 страницы
Noc18 mg42 Assignment4
Karan Patel
Оценок пока нет
Book PDF
Документ359 страниц
Book PDF
ASDDSA
Оценок пока нет
DPP 255 User Manual
Документ24 страницы
DPP 255 User Manual
Peter Harte
Оценок пока нет
Lecture 1: Introduction To Artificial Intelligence: Dr. Roman V Belavkin BIS3226
Документ3 страницы
Lecture 1: Introduction To Artificial Intelligence: Dr. Roman V Belavkin BIS3226
Ron Rio
Оценок пока нет
Chapter01 Introduction To Computers - Quiz
Документ48 страниц
Chapter01 Introduction To Computers - Quiz
Treblig Mosquera Omsare Lpt
Оценок пока нет
GUI Notes
Документ27 страниц
GUI Notes
Satvik Gupta
Оценок пока нет
ISMA Tool Datasheet V1.2.3 ENG
Документ2 страницы
ISMA Tool Datasheet V1.2.3 ENG
Sinjiku Yuuki
Оценок пока нет
Isscc 2019 1
Документ3 страницы
Isscc 2019 1
Zyad Iskandar
Оценок пока нет
Presents: Cops Iit (Bhu)
Документ10 страниц
Presents: Cops Iit (Bhu)
Soumyadwip Chanda
Оценок пока нет
LF ZMB26 PL V0
Документ6 страниц
LF ZMB26 PL V0
Sushil Kore
Оценок пока нет
Aplikasi Pengenalan Perangkat Keras Komputer Berbasis: Android Menggunakan Augmented Reality (Ar)
Документ8 страниц
Aplikasi Pengenalan Perangkat Keras Komputer Berbasis: Android Menggunakan Augmented Reality (Ar)
Wahyu Lmnn
Оценок пока нет
ANSYS 12 Operation
Документ102 страницы
ANSYS 12 Operation
Pablo J. González Mora
Оценок пока нет
CA Question Paper
Документ9 страниц
CA Question Paper
Vaibhav Bajpai
Оценок пока нет
Vadla Brahma Chary: Work Experience
Документ2 страницы
Vadla Brahma Chary: Work Experience
NEB MEP
Оценок пока нет
Smart Ac For Green Energy 1
Документ16 страниц
Smart Ac For Green Energy 1
Vinita Katwaroo
Оценок пока нет
Class Xi Computer Science Paper For Half Yearly Exam 2010
Документ8 страниц
Class Xi Computer Science Paper For Half Yearly Exam 2010
sm_manda
60% (5)
Regular Expressions in Perl::-KLK Mohan, 200841011, M.Tech, VLSI
Документ18 страниц
Regular Expressions in Perl::-KLK Mohan, 200841011, M.Tech, VLSI
Sudheer Prasad
Оценок пока нет
SE MID-1 Imp Questions+Ans
Документ7 страниц
SE MID-1 Imp Questions+Ans
BANDREDDY SARATH CHANDRA
Оценок пока нет
CS Chap7 Multicores Multiprocessors Clusters
Документ65 страниц
CS Chap7 Multicores Multiprocessors Clusters
Đỗ Trị
Оценок пока нет
Performance Analysis of The IEEE 802.11 Distributed Coordination Function
Документ13 страниц
Performance Analysis of The IEEE 802.11 Distributed Coordination Function
Dibbendu Roy
Оценок пока нет
ERP Selection Checklist - ERP Focus
Документ3 страницы
ERP Selection Checklist - ERP Focus
Dennis Nebeker
Оценок пока нет
Devops CIA 2 QB Final
Документ15 страниц
Devops CIA 2 QB Final
LRG Govt. Arts College for Women Tiruppur
Оценок пока нет
Understanding and Troubleshooting DHCP in Catalyst Switch or Enterprise Networks - Cisco
Документ56 страниц
Understanding and Troubleshooting DHCP in Catalyst Switch or Enterprise Networks - Cisco
Suman Kumar
Оценок пока нет
Cyber Security Awareness - Exam
Документ3 страницы
Cyber Security Awareness - Exam
Rahul Patel
Оценок пока нет