FINAL INTEGRATIVE PROJECT ON TECHNOLOGIES AND METHODOLOGIES

Students:

Deiber Alirio Ramírez Gallego
C.C. 1058843769

Heri Andrés López
C.C. 1071143704

Carlos Andrés Rodríguez
C.C. 1.056.799.135

Juan Gabriel León
C.C. 1073157562

Juan Pablo Mayorga
C.C.

Director:

Ibo Luis Cerra

Group:

301125_20
Advanced Databases (Base de Datos Avanzada) 301125A_363

Universidad Nacional Abierta y a Distancia - UNAD
Escuela de Ciencias Básicas, Ingenierías y Tecnologías
Bogotá D.C.
December 2017
Table of Contents

Introduction
Final Project: Phase 4 - Final Integrative Project on Technologies and Methodologies
Dropbox Access Link
Activity 1. Select a Role
Activity 2. Review of the Web Conference
Activity 3. Multimedia Material
  3.1. Distribution of the Multimedia Material
  3.2. Review the Materials
  3.3. Prepare the Concept Maps
    1. What launched the Big Data era
    2. Applications: What makes big data valuable
    3. Example: Saving lives with Big Data
    4. Example: Using Big Data to Help Patients
    5. Sentiment Analysis Success Story: Meltwater helping
    6. Getting Started: Where Does Big Data Come From?
    7. Machine-Generated Data: It's Everywhere and There's
    8. Machine-Generated Data
    9. Big Data Generated By People: The Unstructured Challenge
    10. Big Data Generated By People
    11. Organization-Generated Data: Structured but often siloed
    12. Organization-Generated Data: Benefits Come From Combining With Other Data Types
    13. The Key: Integrating Diverse Data
Activity 4. Discussions on Big Data Topics
  Let's discuss: Who are you providing data to?
Activity 6. Solutions to the Questionnaires on Big Data Topics
  Questionnaire
    Why Big Data and Where Did it Come From?
    Why Big Data and Where Did it Come From?
    V for the V's of Big Data
    Data Science 101
    Foundations for Big Data
    Intro to Hadoop
Activity 7. Group Video on Modeling and Managing a Big Data Project
Activity 8. Evidence of Participation in the Phase Forum Topics
  Deiber Alirio Ramírez Gallego
  Heri Andrés López
  Carlos Andrés Rodríguez
  Juan Gabriel León
  Juan Pablo Mayorga
Bibliography

Introduction

Final Project: Phase 4 - Final Integrative Project on Technologies and Methodologies

Dropbox Access Link

https://www.dropbox.com/sh/4prnkrxoer8aspo/AACeApKB_TIzTW5QVXytyLW2a?dl=0

Activity 1. Select a Role

Each member of the Collaborative Work Group must select a role to perform during the Phase,
in accordance with the responsibilities set out in this Guide. The report must include a
table with the selected roles.

Roles and responsibilities

Moderator: Carlos Andrés Rodríguez

Collaborator:

Evaluator: Deiber Alirio Ramírez

Creative:

Researcher: Heri Andrés López


Activity 2. Review of the Web Conference

Each group member attends, or reviews the recording of, Web Conferences 6 and 7: Virtual
Workshop 3 - Modeling Big Data Systems, in order to make progress on the guide.

Activity 3. Multimedia Material

Review of the multimedia material (multimedia and/or readings) selected for Phase 3.

3.1. Distribution of the Multimedia Material

The Collaborative Work Group distributes the multimedia material (multimedia and/or
readings) among its members.
Why Big Data?

Topic                                                               Assigned to
1. What launched the Big Data era?                                  Deiber A. Ramírez
2. Applications: What makes big data valuable                       Deiber A. Ramírez
3. Example: Saving lives with Big Data                              Deiber A. Ramírez
4. Example: Using Big Data to Help Patients                         Heri Andrés López
5. A Sentiment Analysis Success Story: Meltwater helping Danone

Big Data: Where Does It Come From?

Topic                                                               Assigned to
6. Getting Started: Where Does Big Data Come From?
7. Machine-Generated Data: It's Everywhere and There's a Lot!
8. Machine-Generated Data: Advantages
9. Big Data Generated By People: The Unstructured Challenge         Heri Andrés López
10. Big Data Generated By People: How Is It Being Used?             Heri Andrés López
11. Organization-Generated Data: Structured but often siloed        Carlos Andrés Rodríguez
12. Organization-Generated Data: Benefits Come From Combining
    With Other Data Types                                           Carlos Andrés Rodríguez
13. The Key: Integrating Diverse Data                               Carlos Andrés Rodríguez

3.2. Review the Materials

Each member of the Collaborative Work Group reviews the materials, readings, and/or
multimedia assigned to them, which are listed in the document Casos y Material de Estudio
(Cases and Study Material).

3.3. Prepare the Concept Maps

After reviewing the assigned materials, readings, and/or multimedia, each member of the
Collaborative Work Group prepares the required concept maps, which should help the group
understand and assimilate the concepts presented in each item.

1. What launched the Big Data era:

https://cmapscloud.ihmc.us:443/rid=1RY9Y4NBJ-1NWXTKW-3NDKBG

2. Applications: What makes big data valuable:

https://cmapscloud.ihmc.us:443/rid=1RYB16DTG-2C16CJB-3NPQYB

3. Example: Saving lives with Big Data:

https://cmapscloud.ihmc.us:443/rid=1RYB4DDSC-1ZCF6Q3-3P72KB

4. Example: Using Big Data to Help Patients

https://cmapscloud.ihmc.us:443/rid=1RYGQ1HQM-1L323XQ-4KFYKT

5. A Sentiment Analysis Success Story: Meltwater helping Danone

https://cmapscloud.ihmc.us:443/rid=1RYKRYB6C-1NW2J3F-50PWKB

6. Getting Started: Where Does Big Data Come From?

7. Machine-Generated Data: It's Everywhere and There's a Lot!

https://cmapscloud.ihmc.us:443/rid=1RYKB59YD-Z3L59W-4Y1PZ8

8. Machine-Generated Data: Advantages

https://cmapscloud.ihmc.us:443/rid=1RYKJK58B-19XCPW1-4Z9LV7

9. Big Data Generated By People: The Unstructured Challenge

https://cmapscloud.ihmc.us:443/rid=1RYGSY9CB-1F6VGH4-4KXBY9

10. Big Data Generated By People: How Is It Being Used?

https://cmapscloud.ihmc.us/viewer/cmap/1RYK46PZG-K96BJM-4W9B1T

11. Organization-Generated Data: Structured but often siloed

12. Organization-Generated Data: Benefits Come From Combining With Other Data Types

13. The Key: Integrating Diverse Data


Activity 4. Discussions on Big Data Topics

The Collaborative Work Group debates and responds to each of the discussions proposed in
the Forum. For each discussion, the group must present a unified position that reflects
the view of the group as a whole.

Let's discuss: Who are you providing data to?

It's commonly discussed in the news how social media sites like Twitter and Facebook gather
data on their users. But take a minute to think in detail about the various ways you interact
with machines and applications on a given day. What's one surprising or uncomfortable
thing you may be providing data on? Is there a non-social media (or shopping) application
you realize you do give information to (perhaps that you hadn't thought of before)?

Who shares my data?

Technology evolves relentlessly and, as a result, offers ever more possibilities and
conveniences. But we must be realistic and recognize that, alongside its countless
advantages, technological progress also carries a certain risk, and that this risk grows
depending on how we use the technology.
The personal data we generate is stored in an abstract paradigm that we commonly call the
internet or the cloud. Trying to determine the capacity of this intangible space is futile.
It is a worldwide network of networks whose size grows with the information that is
generated; for this reason, although it has already reached enormous dimensions, the
internet continues to expand constantly.
Every action carried out on the network is recorded. Each time we switch on a computer,
use a smartphone, or use any electronic device connected to the web, we share data.
Information gives great power to whoever holds it, so we should try to be aware of the
reach of this technological phenomenon and ask ourselves where it leaves the privacy of
each individual who uses the internet. We continually reveal our interests, needs, and
tastes without giving the practice any importance. As a consequence, the control each user
has over their own personal data has been weakened.
Data has become a lure for many internet actors, companies among them, since consumer
interest has become a determining factor in the development of new digital services,
whether applications, devices, or systems. Some organizations seek new competitive
advantages from the information that users provide. This is the case, for example, of
marketing managers, whose current challenge is to get the most out of the personal data
they select and store. These professionals carry out exhaustive analysis in order to learn
which interests predominate among internet users and to improve their marketing strategy.

The ambition to control data and dominate information through the internet, social
networks, and mobile applications has unleashed a market with certain legal loopholes. At
times data is trafficked irregularly, because some organizations have chosen to exceed the
limits imposed by law: misappropriating personal and intimate data, that is, taking it
without the user's consent, or transferring it to third parties. This practice is an
increasingly frequent problem; thousands of cybercriminals roam the network.
Experts seek to warn us about this trend and maintain that the risk is aggravated by
improper use of new technologies. As citizens we trust the ecosystem that the internet
represents, we are unaware of its reach, and we tend to believe that others will deal with
our own irresponsibility. That is not the case. We must protect the privacy of each and
every user, be sure which personal data we are willing to hand over voluntarily to
companies, and exercise firm, essential control over what we want to share on the internet.
We must be prudent; information is key, and so is responsible behavior.

Activity 6. Solutions to the Questionnaires on Big Data Topics

1. After reviewing the questionnaires, each group member prepares the answers based on
the review of the materials, readings, and/or multimedia assigned to them. Each member
of the Collaborative Work Group must post their answers in the Questionnaires topic of
the Forum as individual evidence of their work in Phase 3.

2. Each student shares their solution to the questionnaires in the Cases topic of the
Phase 3 Forum.

3. Starting from the solutions each member reached, the Collaborative Work Group
compares them and prepares a unified group answer. Each answer must be supported by
consensus.

Questionnaire

Why Big Data and Where Did it Come From?

1. Which of the following is an example of big data utilized in action today?
- Wi-Fi Networks
- Social Media
- The Internet
- Individual, Unconnected Hospital Databases

2. What reasoning was given for the following: why is the "data storage to price ratio"
relevant to big data?
- Larger storage means easier accessibility to big data for every user because it allows users
  to download in bulk.
- Access of larger storage becomes easier for everyone, which means client-facing services
  require very large data storage.
- It isn't, it was just an arbitrary example on big data usage.
- Companies can't afford to own, maintain, and spend the energy to support large data
  storage unless the cost is sufficiently low.

3. What is the best description of personalized marketing enabled by big data?
- Being able to use the data from each customer for marketing needs.
- Being able to obtain and use customer information for specific groups and utilize them for
  marketing needs.
- Marketing to each customer on an individual level and suiting to their needs.

4. Of the following, which are some examples of personalized marketing related to big data?
- Facebook revealing posts that cater towards similar interests.
- News outlets gathering information from the internet in order to report them to the
  public.
- A survey that asks your age and markets to you a specific brand.

5. What is the workflow for working with big data?
- Extrapolation -> Understanding -> Reproducing
- Big Data -> Better Models -> Higher Precision
- Theory -> Models -> Precise Advice

6. Which is the most compelling reason why mobile advertising is related to big data?
- Mobile advertising benefits from data integration with location which requires big data.
- Since almost everyone owns a cell/mobile phone, the mobile advertising market is large
  and thus requires big data to contain all the information.
- Mobile advertising allows massive cellular/mobile texting to a wide audience, thus
  providing large amounts of data.
- Mobile advertising in and of itself is always associated with big data.

7. What are the three types of diverse data sources?
- Machine Data, Map Data, and Social Media
- Sensor Data, Organizational Data, and Social Media
- Information Networks, Map Data, and People
- Machine Data, Organizational Data, and People

8. What is an example of machine data?
- Social Media
- Weather station sensor output.
- Sorted data from Amazon regarding customer info.

9. What is an example of organizational data?
- Disease data from Center for Disease Control.
- Social Media
- Satellite Data

10. Of the three data sources, which is the hardest to implement and streamline into a model?
- Organizational Data
- Machine Data
- People

11. Which of the following summarizes the process of using data streams?
- Integration -> Personalization -> Precision
- Big Data -> Better Models -> Higher Precision
- Theory -> Models -> Precise Advice
- Extrapolation -> Understanding -> Reproducing

12. Where does the real value of big data often come from?
- Combining streams of data and analyzing them for new insights.
- Size of the data.
- Having data-enabled decisions and actions from the insights of new data.
- Using the three major data sources: Machines, People, and Organizations.

13. What does it mean for a device to be "smart"?
- Must have a way to interact with the user.
- Connect with other devices and have knowledge of the environment.
- Having a specific processing speed in order to keep up with the demands of data
  processing.

14. What does the term "in situ" mean in the context of big data?
- In the situation
- The sensors used in airplanes to measure altitude.
- Accelerometers.
- Bringing the computation to the location of the data.

15. Which of the following are reasons mentioned for why data generated by people are hard to
process?
- The velocity of the data is very high.
- Very unstructured data.
- Skilled people to analyze the data are hard to come by.
- They cannot be modeled and stored.

16. What is the purpose of retrieval and storage; pre-processing; and analysis in order to
convert multiple data sources into valuable data?
- To enable ETL methods.
- Since the multi-layered process is built into the Neo4j database connection.
- Designed to work like the ETL process.
- To allow scalable analytical solutions to big data.

17. Which of the following are benefits for organization generated data?
- Customer Satisfaction
- Better Profit Margins
- Improved Safety
- High Velocity
- Higher Sales

18. What are data silos and why are they bad?
- Highly unstructured data. Bad because it does not provide meaningful results for
  organizations.
- A giant centralized database to house all the data produced within an organization. Bad
  because it is hard to maintain as highly structured data.
- Data produced from an organization that is spread out. Bad because it creates
  unsynchronized and invisible data.
- A giant centralized database to house all the data production within an organization. Bad
  because it hinders opportunity for data generation.

19. Which of the following is a benefit of data integration?
- Increase data collaboration.
- Adds value to big data.
- Increase data availability.
- Unify your data system.
- Reduce data complexity.
- Monitoring of data.
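
To make the idea of data integration in questions 18 and 19 more concrete, the following is a minimal sketch of combining records held in two separate silos into one unified view. Everything in it is an assumption made for illustration: the `sales` and `support_tickets` lists, the `customer_id` key, and the `integrate` helper are hypothetical and do not come from the course material.

```python
# Hypothetical records from two separate silos (all names and values are made up).
sales = [
    {"customer_id": 1, "total_spent": 120.0},
    {"customer_id": 2, "total_spent": 35.5},
]
support_tickets = [
    {"customer_id": 1, "open_tickets": 0},
    {"customer_id": 3, "open_tickets": 2},
]


def integrate(*sources):
    """Merge records from several sources into one view keyed by customer_id."""
    unified = {}
    for source in sources:
        for record in source:
            key = record["customer_id"]
            # Start from what is already known about this customer, then add new fields.
            unified.setdefault(key, {"customer_id": key}).update(record)
    return unified


customers = integrate(sales, support_tickets)
for customer_id in sorted(customers):
    print(customers[customer_id])
```

Customers that appear in only one silo keep only the fields that silo knew about, which is one simple way to see what each isolated source was missing.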

Why Big Data and Where Did it Come From?

1. Which of the following is an example of big data utilized in action today?
- Wi-Fi Networks
- Social Media
- The Internet
  Feedback: While the Internet may be enabling the easier collection and sharing of big data,
  in and of itself, it is not an example of big data utilized in action today.
- Individual, Unconnected Hospital Databases

V for the V's of Big Data

1. Amazon has been collecting review data for a particular product. They have realized that
almost 90% of the reviews were mostly a 5/5 rating. However, of the 90%, they realized that
50% of them were customers who did not have proof of purchase or customers who did not post
serious reviews about the product. Of the following, which is true about the review data
collected in this situation?
- High Veracity
- Low Valence
- High Valence
- High Volume
- Low Veracity
- Low Volume

2. As mentioned in the slides, what are the challenges to data with high valence?
- Complex Data Exploration Algorithms
- Difficult to Integrate
- Reliability of Data

3. Which of the following are the 6 V's in big data?
- Veracity
- Value
- Volume
- Valence
- Vision
- Velocity
- Variety

4. What is the veracity of big data?
- The speed at which data is produced.
- The size of the data.
- The connectedness of data.
- The abnormality or uncertainties of data.

5. What are the challenges of data with high variety?
- The quality of data is low.
- Hard in utilizing group event detection.
- Hard to perform emergent behavior analysis.
- Hard to integrate.

6. Which of the following is the best way to describe why it is crucial to process data in
real-time?
- Batch processing is an older method that is not as accurate as real-time processing.
- More expensive to batch process.
- Prevents missed opportunities.
- More accurate.

7. What are the challenges with big data that has high volume?
- Storage and Accessibility
- Speed Increase in Processing
- Cost, Scalability, and Performance
- Effectiveness and Cost
Data Science 101

1. Which of the following are parts of the 5 P's of data science and part of an additional P
introduced in the slides?
- Purpose
- Process
- People
- Platforms
- Perception
- Programmability
- Product

2. Which of the following are part of the four main categories to acquire, access, and retrieve
data?
- Remote Data
- Web Services
- Text Files
- Traditional Databases
- NoSQL Storage

3. What are the steps required for data analysis?
- Select Technique, Build Model, Evaluate
- Classification, Regression, Analysis
- Regression, Evaluate, Classification
- Investigate, Build Model, Evaluate

4. Of the following, what is a technique mentioned in the videos for building a model?
- Analysis
- Validation
- Evaluation
- Investigation

5. What is the first step in finding a right problem to tackle in data science?
- Ask the Right Questions
- Define Goals
- Assess the Situation
- Define the Problem

6. What is the first step to big data strategy?
- Collect Data
- Build In-House Expertise
- Business Objectives
- Organizational Buy-In

7. According to Ilkay, why is exploring data crucial to better modeling?
Data exploration... <complete the sentence>
- enables histograms and others graphs as data visualization.
- enables a description of data which allows visualization.
- enables understanding of general trends, correlations, and outliers.
- leads to data understanding which allows an informed analysis of the data.

8. Why is data science mainly about teamwork?
- Data science requires a variety of expertise in different fields.
- Exhibition of curiosity is required.
- Engineering solutions are preferred.
- Analytic solutions are required.

9. What are the ways to address data quality issues?
- Remove data with missing values.
- Data Wrangling
- Generate best estimates for invalid values.
- Remove outliers.
- Merge duplicate records.
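
Several of the techniques named in question 9 can be shown on a small data set. The following is a minimal sketch, not a recommended pipeline: the `readings` list, the valid range of 0 to 100, and the choice of the median as the "best estimate" are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical sensor readings; None marks a missing value.
readings = [
    {"id": 1, "temp": 21.0},
    {"id": 2, "temp": None},   # missing value
    {"id": 3, "temp": -1.0},   # invalid: this sketch assumes temperatures must be >= 0
    {"id": 4, "temp": 22.5},
    {"id": 4, "temp": 22.5},   # duplicate of the previous record
    {"id": 5, "temp": 400.0},  # outlier
    {"id": 6, "temp": 20.5},
]


def clean(records, low=0.0, high=100.0):
    # 1. Remove records with missing values.
    present = [r for r in records if r["temp"] is not None]
    # 2. Merge duplicate records (same id), keeping the first occurrence.
    seen, unique = set(), []
    for r in present:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(dict(r))
    # 3. Remove outliers above the assumed plausible maximum.
    bounded = [r for r in unique if r["temp"] <= high]
    # 4. Replace invalid (below-minimum) values with the median of the valid ones.
    estimate = median(r["temp"] for r in bounded if r["temp"] >= low)
    for r in bounded:
        if r["temp"] < low:
            r["temp"] = estimate
    return bounded


cleaned = clean(readings)
for row in cleaned:
    print(row)
```

The order of the steps matters in this sketch: the outlier is dropped before the estimate is computed so that it does not distort the median.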

10. What is done to the data in the preparation stage?
- Understand Nature of Data and Preliminary Analysis.
- Build Models
- Identify Data Sets and Query Data
- Select Analytical Techniques
- Retrieve Data

Foundations for Big Data

1. Which of the following is the best description of why it is important to learn about the
foundations in big data?
- Foundations allow understanding of practical concepts in Hadoop.
- Understanding of practical concepts in Hadoop allows solid foundation.
- Foundations stand the test of time.
- Since foundations can be retained for a long time, Hadoop should be learned.

2. What is the benefit of a commodity cluster?
- Much faster than a traditional super computer.
- Prevents individual component failures.
- Cost Effective
- Prevents network connection failure.

3. What is a way to enable fault tolerance?
- System Wide Restart
- Better LAN Connection
- Redundant Data Storage
- Distributed Computing
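
Redundant data storage, one of the options in question 3, can be illustrated with a small simulation. This is a sketch under stated assumptions: the node names, the round-robin placement, and the replication factor of 3 are chosen for illustration and are not taken from the course slides. The placement only yields distinct copies when the number of replicas does not exceed the number of nodes.

```python
REPLICATION_FACTOR = 3  # assumed value for this sketch


def place_blocks(blocks, nodes, replicas=REPLICATION_FACTOR):
    """Assign each block to `replicas` distinct nodes, round-robin."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replicas)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one copy on a healthy node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


nodes = ["node-a", "node-b", "node-c", "node-d"]
blocks = ["blk-1", "blk-2", "blk-3"]
failed = {"node-a", "node-b"}

with_replicas = readable_blocks(place_blocks(blocks, nodes), failed)
without_replicas = readable_blocks(place_blocks(blocks, nodes, replicas=1), failed)
print("readable with 3 copies:", with_replicas)
print("readable with 1 copy:  ", without_replicas)
```

With three copies per block, every block survives the loss of two nodes in this example; with a single copy, the blocks stored on the failed nodes become unreadable.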

4. What is a benefit specific to a distributed file system?
- High Concurrency
- Large Storage
- Data Scalability
- High Fault Tolerance

5. Which of the following are general requirements for a programming language in order to
support big data models?
- Handle Fault Tolerance
- Optimization of Specific Data Types
- Enable Adding of More Racks
- Utilize Map Reduction Methods
- Support Big Data Operations

Intro to Hadoop

1. What does IaaS provide?
- Hardware Only
- Computing Environment
- Software On-Demand

2. What does PaaS provide?
- Software On-Demand
- Computing Environment
- Hardware Only

3. What does SaaS provide?
- Hardware Only
- Software On-Demand
- Computing Environment

4. What are the two key components of HDFS and what are they used for?
- NameNode for metadata and DataNode for block storage.
- FASTA for genome sequence and Rasters for geospatial data.
- NameNode for block storage and Data Node for metadata.

5. What is the job of the NameNode?
- Coordinates operations and assigns tasks to Data Nodes
- Listens from DataNode for block creation, deletion, and replication.
- For gene sequencing calculations.
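
The division of labor in HDFS between a NameNode that keeps metadata and DataNodes that keep block contents can be mimicked with a toy model. The sketch below is not the Hadoop API: the classes, `write_file`, `read_file`, and the 4-byte block size are hypothetical simplifications, and replication is left out entirely.

```python
class DataNode:
    """Stores the actual bytes of blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes


class NameNode:
    """Stores only metadata: which blocks make up a file and where they live."""

    def __init__(self):
        self.file_blocks = {}     # file name -> ordered list of block ids
        self.block_location = {}  # block id -> DataNode holding that block


def write_file(namenode, datanodes, filename, data, block_size=4):
    """Split data into blocks, spread them over DataNodes, record metadata."""
    block_ids = []
    for index, start in enumerate(range(0, len(data), block_size)):
        block_id = f"{filename}#{index}"
        node = datanodes[index % len(datanodes)]
        node.blocks[block_id] = data[start:start + block_size]
        namenode.block_location[block_id] = node
        block_ids.append(block_id)
    namenode.file_blocks[filename] = block_ids


def read_file(namenode, filename):
    """Ask the NameNode where each block is, then fetch the bytes from DataNodes."""
    return b"".join(
        namenode.block_location[block_id].blocks[block_id]
        for block_id in namenode.file_blocks[filename]
    )


namenode = NameNode()
datanodes = [DataNode("dn-1"), DataNode("dn-2")]
write_file(namenode, datanodes, "notes.txt", b"hello big data")
print(namenode.file_blocks["notes.txt"])
print(read_file(namenode, "notes.txt"))
```

In this model the NameNode never holds file contents; it only answers the question of which DataNode to ask for each block.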

6. What are the three steps to Map Reduce?
- Shuffle and Sort -> Map -> Reduce
- Map -> Shuffle and Sort -> Reduce
- Map -> Reduce -> Shuffle and Sort
- Shuffle and Sort -> Reduce -> Map
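
The MapReduce stages run in the order map, then shuffle and sort, then reduce. A word count, the usual introductory example, can be sketched in plain Python to show what each stage does. This is a single-process illustration, not Hadoop code; the function names and the sample lines are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort: group all values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped}


lines = ["big data is big", "data is everywhere"]
counts = reduce_phase(shuffle_and_sort(map_phase(lines)))
print(counts)
```

The reduce stage can only sum the counts for a word once the shuffle has brought all pairs for that word together, which is why the shuffle sits between the other two stages.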

7. What is a benefit of using pre-built Hadoop images?
- Quick prototyping, deploying, and guaranteed bug free.
- Quick prototyping, deploying, and validating of projects.
- Guaranteed hardware support.
- Less software choices to choose from.

8. What are some examples of open-source tools built for Hadoop and what do they do?
- Pig, for real-time and in-memory processing of big data.
- Zookeeper, analyze social graphs.
- Giraph, for SQL-like queries.
- Zookeeper, management system for animal named related components.

9. What is the difference between low level interfaces and high level interfaces?
- Low level deals with interactivity while high level deals with storage and scheduling.
- Low level deals with storage and scheduling while high level deals with interactivity.

10. Which of the following are problems to look out for when you want to integrate your project
with Hadoop?
- Task Level Parallelism
- Random Data Access
- Advanced Algorithms
- Infrastructure Replacement
- Data Level Parallelism

11. As covered in the slides, which of the following are the major goals of Hadoop?
- Latency Sensitive Tasks
- Enable Scalability
- Handle Fault Tolerance
- Provide Value for Data
- Facilitate a Shared Environment
- Optimized for a Variety of Data Types

12. What is the purpose of YARN?
- Allows various applications to run on the same Hadoop cluster.
- Enables large scale data across clusters.
- Implementation of Map Reduce.

13. What are the two main components for a data computation framework that were described
in the slides?
- Applications Master and Container
- Node Manager and Applications Master
- Resource Manager and Container
- Node Manager and Container
- Resource Manager and Node Manager

Activity 7. Group Video on Modeling and Managing a Big Data Project

- Dropbox

Activity 8. Evidence of Participation in the Phase Forum Topics

1. The student prepares a report on their activity in the topics of the Forum Unit 3: Phase 3 -
Virtual Workshop 3, Modeling and Managing a Big Data Project, indicating in the following
table the topic, the number of contributions, the dates of the contributions, whether they fell
within the permitted date range, the relevance and timeliness of the contributions, and whether
the source was referenced when a contribution was not the student's own work.

- Deiber Alirio Ramírez Gallego

Topic           No. of contributions  Contribution dates  Permitted date range     Relevance and timeliness  Referenced? (Yes/No)
Concept Maps    3                     10-29 Nov 2017      10-11-2017 / 29-11-2017  OK                        Yes
Discussions     1                     10-29 Nov 2017      10-11-2017 / 29-11-2017  OK                        Yes
Cases           0                     10-29 Nov 2017      10-11-2017 / 29-11-2017  OK                        Yes
Questionnaires  5                     10-29 Nov 2017      10-11-2017 / 29-11-2017  OK                        Yes
Video           1                     10-29 Nov 2017      10-11-2017 / 29-11-2017  OK                        Yes

- Heri Andrés López

Topic           No. of contributions  Contribution dates  Permitted date range     Relevance and timeliness  Referenced? (Yes/No)
Concept Maps    3                     01 Nov 2017         10-11-2017 / 29-11-2017  OK                        Yes
Discussions     1                     26 Nov 2017         10-11-2017 / 29-11-2017  OK                        Yes
Cases           0                     N/A                 10-11-2017 / 29-11-2017  OK                        N/A
Questionnaires  1                     27 Nov 2017         10-11-2017 / 29-11-2017  OK                        Yes
Video           1                     28 Nov 2017         10-11-2017 / 29-11-2017  OK                        Yes

- Carlos Andrés Rodríguez

Topic           No. of contributions  Contribution dates  Permitted date range     Relevance and timeliness  Referenced? (Yes/No)
Concept Maps
Discussions
Cases
Questionnaires
Video

- Juan Gabriel León

Topic           No. of contributions  Contribution dates  Permitted date range     Relevance and timeliness  Referenced? (Yes/No)
Concept Maps
Discussions
Cases
Questionnaires
Video

- Juan Pablo Mayorga

Topic           No. of contributions  Contribution dates  Permitted date range     Relevance and timeliness  Referenced? (Yes/No)
Concept Maps
Discussions
Cases
Questionnaires
Video

Bibliography

- Unad-BDAvanzadas/U3_Caso_Material_Estudio. Retrieved from:
  https://github.com/Unad-BDAvanzadas/U3_Caso_Material_Estudio
- Reading: McKinsey report:
  http://www.mckinsey.com/business-functions/business-technology/our-insights/big-data-the-next-frontier-for-innovation
- Reading: The WIFIRE Project: https://www.youtube.com/watch?v=0ohwGggaXZM
- Reading: A Small Definition of Big Data:
  http://spaceanalytics.blogspot.com.co/2016/05/caracteristicas-de-big-data.html
- Marián Lezaun, "Protección al consumidor", November 21, 2017. Retrieved from:
  https://escriturapublica.es/quien-comparte-mis-datos/
