
Hadoop for Dummies

MapReduce to the Rescue

Introduction
Knowing why MapReduce is essential
Understanding how MapReduce works
Looking at the industries that use MapReduce
Considering real-world applications

Solution to Huge Unstructured Data

MapReduce
A software framework that breaks big problems into small, manageable tasks and then distributes them to multiple servers. These servers are called nodes, and they work together in parallel to arrive at a result.
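The idea of splitting a big problem into small tasks and combining the partial results can be sketched in a few lines of Python. This is a minimal illustration, not Hadoop itself: threads stand in for nodes, and the names `split_into_tasks` and `run_in_parallel` are hypothetical helpers invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(items, task_size):
    """Break one big list into small, manageable chunks."""
    return [items[i:i + task_size] for i in range(0, len(items), task_size)]


def run_in_parallel(items, task_size, workers=4):
    """Hand each chunk to a worker, then combine the partial results."""
    tasks = split_into_tasks(items, task_size)
    # Assumption: threads play the role of nodes; a real cluster would
    # send each task to a separate server.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, tasks))
    return sum(partial_sums)


total = run_in_parallel(list(range(1, 101)), task_size=10)
print(total)
```

Each worker only ever sees its own chunk, which is what lets the work spread across many machines.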

Data requirement for MapReduce


If your job is to coax insight from a very large disk-based information set, measured in terabytes to petabytes, then MapReduce will likely meet your needs. The data may be structured or unstructured, and is commonly made up of text, binary, or multi-line records. MapReduce can work with raw data that's stored in disk files, in relational databases, or both. The most common MapReduce usage pattern employs a distributed file system known as Hadoop Distributed File System (HDFS).

Data is stored on each node's local disk, and processing is done locally on the node that holds the data.

MapReduce Architecture
Map

Key

Value

The key identifies what kind of information we're looking at.

When compared with a relational database, a key usually equates to a column.

The value portion of the key/value pair is an actual instance of data associated with a key.

Examples of Key and Value

Key                  Value         Key/value pair
First name           Danielle      First name/Danielle
Transaction amount   19.96         Transaction amount/19.96
Search term          Snare drums   Search term/Snare drums

Reduce
After the Map phase is over, all the intermediate values for a given output key are combined together into a list. The reduce() function then combines the intermediate values into one or more final values for the same key.
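A small Python sketch may make the grouping and reducing concrete. It assumes the Map phase has already emitted (key, value) pairs; the customer names, amounts, and the helper names `shuffle` and `reduce_sum` are made up for illustration and are not part of any Hadoop API.

```python
from collections import defaultdict


def shuffle(pairs):
    """Combine all intermediate values for each key into a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_sum(key, values):
    """Reduce the list of values for one key to a single final value."""
    return key, sum(values)


# Hypothetical intermediate output of the Map phase:
# (first name, transaction amount in whole dollars).
intermediate = [("Danielle", 20), ("Alex", 5), ("Danielle", 10)]

grouped = shuffle(intermediate)
totals = dict(reduce_sum(key, values) for key, values in grouped.items())
print(grouped)
print(totals)
```

Summing is only one possible reduce() function; a maximum, an average, or a count would follow the same shape.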

Configuring MapReduce

Components will fail at a high rate

Data will be contained in a relatively small number of big files

Data files are write-once

Lots of streaming reads

Higher sustained throughput across large amounts of data

MapReduce in Action (Example)


In this scenario, you're in charge of the ecommerce website for a very large retailer. You stock over 200,000 individual products, and your website receives hundreds of thousands of visitors each day. In aggregate, your customers place nearly 50,000 orders each day.

Example (Prepare sorted list of search terms)


Step 1: The data should ideally be broken into numerous files of about 1 GB each.
Step 2: Each file will be distributed to a different node.
Step 3: On each node, the Map step will produce a list consisting of each word in the file along with how many times it appears. For example, one node might come up with these intermediate results from its own set of data:
Skate: 4992120
Ski: 303021
Skis: 291101

Example (Prepare sorted list of search terms)


Step 4: The Reduce step will then consolidate all of the results from the Map step, producing a list of all search terms and the total number of times they appeared across all of the files. For example, the combined counts for these search terms might look like this:
Skate: 1872695210
Ski: 902785455
Skis: 3486501184
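The four steps can be sketched in Python as follows. This is a toy version under stated assumptions: two tiny in-memory lists stand in for the 1 GB files, each line is treated as one search term, and `map_search_terms` and `reduce_counts` are hypothetical names chosen for this example rather than Hadoop functions.

```python
from collections import Counter


def map_search_terms(lines):
    """Map step for one file: count how often each search term appears."""
    return Counter(line.strip().lower() for line in lines if line.strip())


def reduce_counts(partial_counts):
    """Reduce step: add up the per-file counts for every search term."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


# Assumption: each inner list is the content of one file on one node.
files = [
    ["skate", "ski", "skate"],
    ["skis", "skate", "ski"],
]

partials = [map_search_terms(lines) for lines in files]
totals = reduce_counts(partials)

# Sort by count (highest first), then alphabetically to break ties.
ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
for term, count in ranked:
    print(f"{term}: {count}")
```

On a real cluster the map calls would run on separate nodes, and only the small per-file counts would travel over the network to the Reduce step.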

Benefit from MapReduce if you have


Lots of data
Multiple servers at your disposal
MapReduce-based software such as Hadoop

MapReduce Users
Financial services
Telco
Retail
Government
Defense
Homeland security
Health and life services
Utilities
Social networks/Internet
Internet service providers

Other MapReduce Applications


Risk modeling
Recommendation engines
Point of sale transaction analysis
Threat analysis
Search quality
ETL logic for data warehouses
Customer churn analysis
Ad targeting
Network traffic analysis
Trade surveillance
Data sandboxes

Real-World MapReduce Examples

Financial Services
Fraud detection
Asset management
Data source and data store consolidation

Real-World MapReduce Examples

Retail
Web log analytics
Improving customer experience and improving relevance of offers
Supply chain optimization

Real-World MapReduce Examples

Auto Manufacturing
Vehicle model and option validation
Vehicle mass analysis
Emission reporting
Customer satisfaction
