
Q1: What is Big Data? Why should anyone care?

Q2: Describe the 4V model of Big Data.

Q3: What are the major technological challenges in managing Big Data?

Q4: What are the technologies available to manage Big Data?

Q5: What kind of analyses can be done on Big Data?

Q6: Why did people pay attention to Hadoop/MapReduce when it was introduced?

Liberty Stores Case Exercise:

Q1: What are the major sources of Big Data? Describe a source of each type.

Q2: What are the three major types of Big Data applications? Describe two applications of each type.

Q3: Would it be ethical to arrest someone based on a Big Data model's prediction that the person
is likely to commit a crime?

Q4: An auto insurance company learned about the movements of a person based on the GPS
installed in the vehicle. Would it be ethical to use that as a surveillance tool?

Q5: Research and describe a Big Data application that has a proven return on investment for an
organization.

Q1: Describe the Big Data processing architecture.

Q2: What are Google’s contributions to Big Data processing?

Q3: What are some of the hottest technologies visible in Big Data processing?

Q1: How does Hadoop differ from a traditional file system?

Q2: What are the design goals for HDFS?

Q3: How does HDFS ensure security and integrity of data?

Q4: How does a master node differ from the worker node?

Q1: What is MapReduce? What are its benefits?

Q2: What is the key-value pair format? How is it different from other data structures? What are its
benefits and limitations?

Q3: What is a Job Tracker program? How does it differ from the Task Tracker program?

Q4: What are Hive and Pig? How are they different?

Q1: What is a NoSQL database? What are the different types of it?

Q2: How does a NoSQL database leverage the power of MapReduce?

Q3: What are the different kinds of NoSQL databases? What are the advantages of each?

Q4: What are the similarities and differences between Hive and Pig?

Q1: Describe the Apache Spark ecosystem.

Q2: Compare Spark and Hadoop in terms of their ability to do stream computing.

Q3: What is an RDD? How does it make Spark faster?

Q4: Describe three major capabilities in Spark for data analytics.

Q1: What is a data ingest system? Why is it an important topic?

Q2: What are the two ways of delivering data from many sources to many targets?

Q3: What is Kafka? What are its advantages? Describe 3 use cases of Kafka.

Q4: What is a topic? How does it help with data ingest management?

Q1: Describe the cloud computing model.

Q2: What are the advantages of cloud computing over in-house computing?

Q3: Describe the technical architecture for cloud computing.

Q4: Name a few major providers of cloud computing services.

What is data mining? What are supervised and unsupervised learning techniques?

Describe the key steps in the data mining process. Why is it important to follow these processes?

What is a confusion matrix?

Why is data preparation so important and time consuming?

What are some of the most popular data mining techniques?

How is mining Big Data different from traditional data mining?
