Students:
Director:
Group:
301125_20
Advanced Databases (Base De Datos Avanzada) 301125A_363
Introduction
Final Project: Phase 4 - Final Project Integrating Technologies and Methodologies
Dropbox Access Link
Activity 1. Select a Role
Activity 2. Review of the WebConference
Activity 3. Multimedia Material
3.1. Distribution of the Multimedia Material
3.2. Review the Materials
3.3. Prepare the Concept Maps
1. What launched the Big Data era:
2. Applications What makes big data valuable:
3. Example Saving lives with Big Data:
4. Example Using Big Data to Help Patients
5. Sentiment Analysis Success Story Meltwater helping
6. Getting Started: Where Does Big Data Come From?
7. Machine-Generated Data It's Everywhere and There's
8. Machine-Generated Data
9. Big Data Generated By People The Unstructured Challenge
10. Big Data Generated By People
11. Organization-Generated Data: Structured but often siloed
12. Organization-Generated Data: Benefits Come From Combining With Other Data Types
13. The Key: Integrating Diverse Data
Activity 4. Discussions on Big Data Topics
Let's discuss: Who are you providing data to?
Activity 6. Answers to Questionnaires on Big Data Topics
Questionnaire
Why Big Data and Where Did it Come From?
Why Big Data and Where Did it Come From?
V for the V's of Big Data
Data Science 101
Foundations for Big Data
Intro to Hadoop
Activity 7. Group Video on Modeling and Managing a Big Data Project
Activity 8. Evidence of Participation in the Phase's Forum Topics
Deiber Alirio Ramírez Gallego
Heri Andrés López
Carlos Andres Rodríguez
Juan Gabriel León
Juan Pablo Mayorga
Bibliography
Introduction
Final Project: Phase 4 - Final Project Integrating Technologies and Methodologies
https://www.dropbox.com/sh/4prnkrxoer8aspo/AACeApKB_TIzTW5QVXytyLW2a?dl=0
Each member of the Collaborative Work Group must select a role to perform during the Phase, in line
with the responsibilities set out in this Guide. The report must include a table with the selected
roles.
Roles and responsibilities
Collaborator:
Creative:
Review of the Multimedia Material (multimedia and/or readings) selected for Phase 3.
Each member of the Collaborative Work Group reviews the materials, readings and/or multimedia
assigned to them, which are listed in the document Casos y Material de Estudio.
After reviewing the assigned materials, readings and/or multimedia, each member of the Collaborative
Work Group prepares the required concept maps, which should help in understanding and assimilating
the concepts set out in each of the materials, readings and/or multimedia.
https://cmapscloud.ihmc.us:443/rid=1RY9Y4NBJ-1NWXTKW-3NDKBG
2. Applications What makes big data valuable:
https://cmapscloud.ihmc.us:443/rid=1RYB16DTG-2C16CJB-3NPQYB
https://cmapscloud.ihmc.us:443/rid=1RYB4DDSC-1ZCF6Q3-3P72KB
4. Example Using Big Data to Help Patients
https://cmapscloud.ihmc.us:443/rid=1RYGQ1HQM-1L323XQ-4KFYKT
https://cmapscloud.ihmc.us:443/rid=1RYKRYB6C-1NW2J3F-50PWKB
6. Getting Started: Where Does Big Data Come From?
https://cmapscloud.ihmc.us:443/rid=1RYKB59YD-Z3L59W-4Y1PZ8
8. Machine-Generated Data
https://cmapscloud.ihmc.us:443/rid=1RYKJK58B-19XCPW1-4Z9LV7
https://cmapscloud.ihmc.us:443/rid=1RYGSY9CB-1F6VGH4-4KXBY9
10. Big Data Generated By People
https://cmapscloud.ihmc.us/viewer/cmap/1RYK46PZG-K96BJM-4W9B1T
It's commonly discussed in the news how social media sites like Twitter and Facebook gather
data on their users. But take a minute to think in detail about the various ways you interact
with machines and applications on a given day. What's one surprising or uncomfortable
thing you may be providing data on? Is there a non-social media (or shopping) application
you realize you do give information to (perhaps one that you hadn't thought of before)?
1. After reviewing the questionnaires, each member of the Group prepares the answers,
based on the review of the materials, readings and/or multimedia assigned to them.
Each member of the Collaborative Work Group must post their answers to the
questionnaires in the Questionnaires topic of the Forum as individual evidence of
their work in Phase 3.
2. What reasoning was given for the following: why is the “data storage to price ratio”
relevant to big data?
Larger storage means easier accessibility to big data for every user because it allows users
to download in bulk.
Access of larger storage becomes easier for everyone, which means client-facing services
require very large data storage.
It isn't, it was just an arbitrary example on big data usage.
Companies can't afford to own, maintain, and spend the energy to support large data
storage unless the cost is sufficiently low.
4. Of the following, which are some examples of personalized marketing related to big data?
Facebook revealing posts that cater towards similar interests.
News outlets gathering information from the internet in order to report them to the
public.
A survey that asks your age and markets to you a specific brand.
6. Which is the most compelling reason why mobile advertising is related to big data?
Mobile advertising benefits from data integration with location which requires big data.
Since almost everyone owns a cell/mobile phone, the mobile advertising market is large
and thus requires big data to contain all the information.
Mobile advertising allows massive cellular/mobile texting to a wide audience, thus
providing large amounts of data.
Mobile advertising in and of itself is always associated with big data.
10. Of the three data sources, which is the hardest to implement and streamline into a model?
Organizational Data
Machine Data
People
11. Which of the following summarizes the process of using data streams?
Integration -> Personalization -> Precision
Big Data -> Better Models -> Higher Precision
Theory -> Models -> Precise Advice
Extrapolation -> Understanding -> Reproducing
12. Where does the real value of big data often come from?
Combining streams of data and analyzing them for new insights.
Size of the data.
Having data-enabled decisions and actions from the insights of new data.
Using the three major data sources: Machines, People, and Organizations.
14. What does the term "in situ" mean in the context of big data?
In the situation
The sensors used in airplanes to measure altitude.
Accelerometers.
Bringing the computation to the location of the data.
15. Which of the following are reasons mentioned for why data generated by people are hard to
process?
The velocity of the data is very high.
Very unstructured data.
Skilled people to analyze the data are hard to come by.
They cannot be modeled and stored.
16. What is the purpose of retrieval and storage; pre-processing; and analysis in order to
convert multiple data sources into valuable data?
To enable ETL methods.
Since the multi-layered process is built into the Neo4j database connection.
Designed to work like the ETL process.
To allow scalable analytical solutions to big data.
17. Which of the following are benefits for organization generated data?
Customer Satisfaction
Better Profit Margins
Improved Safety
High Velocity
Higher Sales
18. What are data silos and why are they bad?
Highly unstructured data. Bad because it does not provide meaningful results for
organizations.
A giant centralized database to house all the data produced within an organization. Bad
because it is hard to maintain as highly structured data.
Data produced from an organization that is spread out. Bad because it creates
unsynchronized and invisible data.
A giant centralized database to house all the data production within an organization. Bad
because it hinders opportunity for data generation.
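As an illustration of the integration theme running through these questions, here is a minimal sketch of combining records that live in separate departmental stores. The two stores (`sales_silo`, `support_silo`), the customer IDs, and the helper names are all hypothetical and do not come from the course material.

```python
# Hypothetical records held separately by two departments (illustrative only).
sales_silo = {
    "c1": {"name": "Ana", "orders": 3},
    "c2": {"name": "Luis", "orders": 1},
}
support_silo = {
    "c1": {"open_tickets": 2},
    "c3": {"open_tickets": 1},
}


def integrate(*silos):
    """Merge per-customer records from several silos into one combined view."""
    combined = {}
    for silo in silos:
        for customer_id, record in silo.items():
            # setdefault creates a fresh dict, so the silos are not modified.
            combined.setdefault(customer_id, {}).update(record)
    return combined


def missing_from(silo, combined):
    """Return customer IDs present in the combined view but absent from this silo."""
    return sorted(set(combined) - set(silo))


combined = integrate(sales_silo, support_silo)
print(combined["c1"])                      # sales and support data side by side
print(missing_from(sales_silo, combined))  # customers invisible to sales
```

In this toy setup each department sees only part of the picture until the records are merged; the merge key (a shared customer ID) is itself an assumption that real silos often do not satisfy.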
1. Amazon has been collecting review data for a particular product. They have realized that
almost 90% of the reviews were mostly a 5/5 rating. However, of the 90%, they realized that
50% of them were customers who did not have proof of purchase or customers who did not post
serious reviews about the product. Of the following, which is true about the review data
collected in this situation?
High Veracity
Low Valence
High Valence
High Volume
Low Veracity
Low Volume
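The arithmetic behind this scenario can be worked through in a few lines. This is a sketch that takes the figures at face value (90% five-star reviews, half of them without proof of purchase or not serious); the function name `doubtful_share` is made up for illustration.

```python
def doubtful_share(five_star_share, unreliable_within_five_star):
    """Fraction of ALL reviews that are 5-star but unverified or not serious."""
    return five_star_share * unreliable_within_five_star


# Figures from the question, treated as exact for the sake of the calculation.
share = doubtful_share(0.90, 0.50)
print(f"{share:.0%} of all reviews are of doubtful reliability")
```

Under these assumptions, close to half of the whole review set cannot be trusted, which is a statement about the quality of the data rather than its size.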
2. As mentioned in the slides, what are the challenges to data with high valence?
Complex Data Exploration Algorithms
Difficult to Integrate
Reliability of Data
6. Which of the following is the best way to describe why it is crucial to process data in
real-time?
Batch processing is an older method that is not as accurate as real-time processing.
More expensive to batch process.
Prevents missed opportunities.
More accurate.
7. What are the challenges with big data that has high volume?
Storage and Accessibility
Speed Increase in Processing
Cost, Scalability, and Performance
Effectiveness and Cost
1. Which of the following are parts of the 5 P's of data science and part of an additional P
introduced in the slides?
Purpose
Process
People
Platforms
Perception
Programmability
Product
2. Which of the following are part of the four main categories to acquire, access, and retrieve
data?
Remote Data
Web Services
Text Files
Traditional Databases
NoSQL Storage
4. Of the following, what is a technique mentioned in the videos for building a model?
Analysis
Validation
Evaluation
Investigation
5. What is the first step in finding a right problem to tackle in data science?
Ask the Right Questions
Define Goals
Assess the Situation
Define the Problem
1. Which of the following is the best description of why it is important to learn about the
foundations in big data?
Foundations allow understanding of practical concepts in Hadoop.
Understanding of practical concepts in Hadoop allows solid foundation.
Foundations stand the test of time.
Since foundations can be retained for a long time, Hadoop should be learned.
5. Which of the following are general requirements for a programming language in order to
support big data models?
Handle Fault Tolerance
Optimization of Specific Data Types
Enable Adding of More Racks
Utilize Map Reduction Methods
Support Big Data Operations
Intro to Hadoop
4. What are the two key components of HDFS and what are they used for?
NameNode for metadata and DataNode for block storage.
FASTA for genome sequence and Rasters for geospatial data.
NameNode for block storage and Data Node for metadata.
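The usual division of labor in HDFS, where the NameNode keeps metadata and the DataNodes hold the blocks, can be sketched with a toy in-memory model. This is not the Hadoop API: the classes, the `read_file` helper, the 4-byte block size, and the round-robin placement are simplifications invented for illustration.

```python
class DataNode:
    """Stores raw blocks; knows nothing about files."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def put(self, block_id, data):
        self.blocks[block_id] = data

    def get(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Stores metadata only: which blocks make up a file and where they live."""

    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size  # toy value; real HDFS blocks are far larger
        self.replication = replication
        self.files = {}  # filename -> list of (block_id, [datanode names])

    def write(self, filename, data):
        entries = []
        for start in range(0, len(data), self.block_size):
            index = start // self.block_size
            block_id = f"{filename}#{index}"
            # Round-robin placement, a simplification of a real placement policy.
            targets = [
                self.datanodes[(index + r) % len(self.datanodes)]
                for r in range(self.replication)
            ]
            for node in targets:
                node.put(block_id, data[start:start + self.block_size])
            entries.append((block_id, [node.name for node in targets]))
        self.files[filename] = entries


def read_file(namenode, filename):
    """Client-side read: ask for block locations, then fetch from DataNodes."""
    by_name = {node.name: node for node in namenode.datanodes}
    return b"".join(
        by_name[locations[0]].get(block_id)  # first listed replica of each block
        for block_id, locations in namenode.files[filename]
    )


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(nodes)
namenode.write("log.txt", b"hello hdfs")
print(namenode.files["log.txt"])       # metadata: block IDs and their locations
print(read_file(namenode, "log.txt"))  # data reassembled from the DataNodes
```

Note that the file contents never appear in `namenode.files`; only block identifiers and locations do, which is the point of the split.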
8. What are some examples of open-source tools built for Hadoop and what do they do?
Pig, for real-time and in-memory processing of big data.
Zookeeper, analyze social graphs.
Giraph, for SQL-like queries.
Zookeeper, management system for animal named related components.
9. What is the difference between low level interfaces and high level interfaces?
Low level deals with interactivity while high level deals with storage and scheduling.
Low level deals with storage and scheduling while high level deals with interactivity.
10. Which of the following are problems to look out for when you want to integrate your project
with Hadoop?
Task Level Parallelism
Random Data Access
Advanced Algorithms
Infrastructure Replacement
Data Level Parallelism
11. As covered in the slides, which of the following are the major goals of Hadoop?
Latency Sensitive Tasks
Enable Scalability
Handle Fault Tolerance
Provide Value for Data
Facilitate a Shared Environment
Optimized for a Variety of Data Types
13. What are the two main components for a data computation framework that were described
in the slides?
Applications Master and Container
Node Manager and Applications Master
Resource Manager and Container
Node Manager and Container
Resource Manager and Node Manager
1. The student prepares a report on their participation in the topics raised in the forum
Unidad 3: Fase 3 - Taller Virtual 3, Modelar y administrar un proyecto Big Data (Unit 3: Phase 3 -
Virtual Workshop 3, Modeling and Managing a Big Data Project), indicating in the following table the
topic, the number of contributions, the dates of the contributions, whether they fell within the
permitted ranges, the relevance and timeliness of the contributions, and whether the source was
cited for contributions that were not the student's own.
Cases
Questionnaires
Video
Cases
Questionnaires
Video
Cases
Questionnaires
Video
Bibliography
Unad-BDAvanzadas/U3_Caso_Material_Estudio. Retrieved from:
https://github.com/Unad-BDAvanzadas/U3_Caso_Material_Estudio
Reading: McKinsey report.
http://www.mckinsey.com/business-functions/business-technology/our-insights/big-data-the-next-frontier-for-innovation
Reading: The WIFIRE Project. https://www.youtube.com/watch?v=0ohwGggaXZM
Reading: A Small Definition of Big Data.
http://spaceanalytics.blogspot.com.co/2016/05/caracteristicas-de-big-data.html
Lezaun, Marián (November 21, 2017). Protección al consumidor. Retrieved from:
https://escriturapublica.es/quien-comparte-mis-datos/