Deep Learning

A Quick Review of Hadoop & MapReduce


Dr. Xuewen Chen's Group

I.I. Itauma
Wayne State University Department of Computer Science

November 22, 2013

Itauma

Introduction to Hadoop & MapReduce

Data

Sources of data: telecommunications, the Internet, phone data, online stores, medicine (X-rays), and research (similarity in tumours). All of this data needs to be stored and processed.

What is Big Data?

Anything that cannot be stored in a traditional database. Any data too big to be processed on a single machine.

Challenges in Big Data

Data is created quickly. It comes from different sources in various formats. The data is not worthless; it has a lot of value.

3Vs in Big Data

Volume: the size of the data. Variety: the different sources and formats of the data. Velocity: the speed at which data is generated and made available for processing.

Volume: storage cost depends on the size of the storage (a SAN, AWS). We need cheaper ways to store data reliably, and to read and process it efficiently. Streaming and processing the data can be slow. Hadoop helps to scale and store data.

Variety: data may be structured, unstructured, or semi-structured. With Hadoop, data can be stored in its raw format, so no information is thrown away.

What Data interests you?

Science. E-commerce. Medical. Social. Financial. Sports. Utilities ...

Cloudera - Doug
Hadoop takes its name from a toy elephant belonging to Doug Cutting's son, which he called Hadoop. Hadoop stores data in HDFS and processes it with MapReduce. HDFS offers an efficient way of storing data. Around these two components sits the wider Hadoop ecosystem. CDH is Cloudera's distribution of Hadoop, with easy installation: https://docs.google.com/document/d/1v0zGBZ6EHapSmsr3x3sGGpDW-54m82kDpPKC2M6uiY/edit Hadoop was originally part of the open source project called Nutch.

MapReduce

MapReduce processes chunks of data in parallel. It is used in recommendation systems, fraud detection, and item classification.
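To make the idea of processing chunks in parallel concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps, using word counting as the task. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not part of Hadoop; on a real cluster the map calls would run on different machines and the framework would do the grouping.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently of the others, which is what
    # allows a cluster to run the map step in parallel.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big value", "data is created fast"]))
```

Because no map call depends on another chunk, adding machines mainly means handing out more chunks; only the shuffle step needs to move data between them.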

Running Jobs on the Cluster

Hadoop Streaming lets us write our code in any language that can read standard input and write standard output, e.g. Python or Octave.
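As a rough illustration of what a streaming job in Python might look like, the sketch below puts a word-count mapper and reducer in one file. Streaming passes records as text lines, with key and value separated by a tab, and delivers the reducer's input sorted by key. The file name `wordcount.py` and the `map` / `reduce` command-line argument are assumptions made for this example only.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Hypothetical convention: run as `wordcount.py map` or `wordcount.py reduce`.
    if len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
        step = mapper if sys.argv[1] == "map" else reducer
        for output_line in step(sys.stdin):
            print(output_line)
```

The same script can be checked locally before it goes to the cluster by piping a text file through the map step, `sort`, and the reduce step. On the cluster, the streaming jar is given the two commands through its `-mapper` and `-reducer` options, together with `-input` and `-output` paths; the location of the jar depends on the installation.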

Appendix

Thank you for your attention

Thanks!
