I.I. Itauma
Wayne State University Department of Computer Science
Itauma
Deep Learning
Data
Telecommunication. Internet. Phone data. Online stores. Medicine - X-rays. Research - similarity in tumours. We need to store and process this data.
Anything that cannot be stored in a traditional database. Any data too big to be processed on a single machine.
Data is created fast. Data comes from different sources in various formats. Data is not worthless; it has a lot of value.
Volume - Size of the data.
Variety - Different sources and formats of the data.
Velocity - Speed at which data is generated and made available for processing.
Volume: Cost is based on the size of storage (SAN, AWS). We need cheaper ways to store data reliably, and to read and process it efficiently. Streaming data and processing it can be slow. Hadoop helps to scale and store data.
Variety: Structured, unstructured, or semi-structured data. With Hadoop, data can be stored in its raw format, so no information is thrown away. [S]
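The idea of keeping data in its raw format can be illustrated with a small sketch. This is plain Python, not a Hadoop API: the names `store_raw` and `read_with_schema` and the JSON record layout are assumptions made for illustration only. Records are stored untouched, and the fields of interest are chosen later, at read time.

```python
import json


def store_raw(store, record_text):
    """Append the record exactly as received; nothing is parsed or dropped."""
    store.append(record_text)


def read_with_schema(store, fields):
    """Parse raw records at read time and keep only the requested fields.

    Records that are not valid JSON objects are skipped here, but they
    remain in the raw store and can be handled by a different reader later.
    """
    for raw in store:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield {name: record.get(name) for name in fields}


if __name__ == "__main__":
    raw_store = []  # hypothetical stand-in for files kept in raw form
    store_raw(raw_store, '{"user": "a", "amount": 3, "note": "first order"}')
    store_raw(raw_store, "free-text log line, not JSON")
    print(list(read_with_schema(raw_store, ["user", "amount"])))
```

Because the raw store is never modified, a later analysis can ask for the `note` field, or parse the free-text line, without re-collecting the data.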
Cloudera - Doug
Hadoop was named after a toy elephant belonging to Doug Cutting's son, who called it Hadoop. Hadoop stores data in HDFS and processes it with MapReduce. It offers an efficient way of storing data via HDFS. Hadoop Ecosystem. [S] CDH: a distribution of Hadoop with easy installation https://docs.google.com/document/d/1v0zGBZ6EHapSmsr3x3sGGpDW-54m82kDpPKC2M6uiY/edit Hadoop was originally part of the open source project called Nutch. S1
MapReduce
Processing chunks of data in parallel. S2 Used in recommendation systems, fraud detection, item classification.
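A minimal sketch of the map, shuffle, and reduce steps, using word counting as the example. This runs on one machine and is not Hadoop's implementation: a thread pool stands in for the cluster nodes, and the function names and the `chunk_size` and `workers` parameters are assumptions chosen for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map phase: emit a (word, 1) pair for every word in one chunk of lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle phase: group all emitted values by key across chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, chunk_size=2, workers=2):
    # Split the input into chunks that can be mapped independently.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Threads stand in for the worker nodes of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    groups = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in groups.items())


if __name__ == "__main__":
    sample = ["big data is big", "data is valuable", "hadoop stores data"]
    print(word_count(sample))
```

Each chunk is mapped without knowledge of the others, which is what allows the work to be spread over many machines; only the shuffle step needs to see the output of every mapper.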
Hadoop Streaming enables us to write our code in any language, e.g. Python or Octave.
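As a sketch of what such code can look like in Python: Hadoop Streaming passes records to the mapper and reducer as lines on standard input and reads results from standard output, with tab-separated keys and values by default. The script name `wc_stream.py` and the `map` / `reduce` command-line argument are assumptions for this example, not part of Hadoop.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts per word.

    Assumes the input is sorted by key, as it is between the map and
    reduce steps, so equal words arrive on consecutive lines.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: 'python wc_stream.py map' or 'python wc_stream.py reduce'
    step = mapper if sys.argv[1] == "map" else reducer
    for output_line in step(sys.stdin):
        print(output_line)
```

The same script can be tried locally without a cluster by letting the shell play the role of the framework: `cat input.txt | python wc_stream.py map | sort | python wc_stream.py reduce`.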
Appendix
Thanks!