
Name : Melesse Bisema Dessie

ID: MTR/226/12
Submitted to: Dr. Vasu
1. A) List the main characteristics of big data architecture with a neat schematic
diagram.
The main characteristics of big data (the 5 Vs of big data) are as follows:

 1st V - Volume: data volume is increasing exponentially (a 44x increase from 2009 to 2020, from 0.8 zettabytes to 35 ZB)
 2nd V - Velocity: data is being generated every minute
 3rd V - Variety: different kinds of data are generated from various sources
 4th V - Veracity: uncertainties and inconsistencies in big data
 5th V - Value: the mechanism for drawing correct meaning out of the data

In summary: [diagram of the 5 Vs of big data]

B) How would you show your understanding of the tools, trends and technology in big
data?

Trending Big Data Tools and Technologies

1. Hadoop

Hadoop (sometimes expanded as the High-Availability Distributed Object-Oriented Platform) is a software framework that processes structured and unstructured data. With Hadoop, data scaling is possible without the threat of hardware failures. It offers huge storage for a variety of data and can handle a virtually unlimited number of concurrent tasks.

2. MongoDB

MongoDB is an agile, prominent NoSQL, open-source document database that is cross-platform compatible. It is popular because of its storage model and its role in the MEAN software stack. It stores document data in a binary form of JSON known as BSON. MongoDB is mostly used for its high scalability, availability, and performance.

Others include Hive, Spark, HBase, Cassandra, Kafka, etc.
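To make the document model described above concrete, here is a minimal sketch of a MongoDB-style document as a nested, schemaless record. The order fields are invented for illustration; a real driver such as pymongo would encode the document to BSON (a binary superset of JSON) before sending it to a server, so a plain JSON round-trip stands in for that step here.

```python
import json

# A MongoDB-style document: schemaless, nested, keyed by field names.
# (Field names and values are illustrative, not from any real system.)
order = {
    "customer": "abebe",
    "items": [
        {"sku": "A12", "qty": 2},
        {"sku": "B07", "qty": 1},
    ],
    "total": 349.50,
}

# JSON round-trip stands in for the BSON encode/decode a driver performs.
encoded = json.dumps(order)
decoded = json.loads(encoded)
print(decoded["items"][0]["sku"])  # A12
```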

2. A) What are the best practices in Big Data analytics? Explain the techniques used in
Big Data Analytics.
Best practices in big data analytics
Big data analytics analyzes collected data and finds patterns in it. The velocity, veracity, variety, and volume of the data an organization holds must be put to work to gain actionable insights. Organizations leveraging big data analytics must first thoroughly understand the best practices for big data in order to use the most relevant data for analysis.

There are 7 widely used Big Data analysis techniques:

I. Association rule learning


II. Classification tree analysis
III. Genetic algorithms
IV. Machine learning
V. Regression analysis
VI. Sentiment analysis
VII. Social network analysis
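As a small illustration of the first technique above, association rule learning, the sketch below computes the support and confidence of a toy rule "bread implies butter" over a made-up transaction list (the item names and transactions are invented for illustration only).

```python
# Minimal association-rule sketch over a toy transaction list.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}))       # 0.5
print(confidence({"bread"}, {"butter"}))  # 0.666...
```

Real association-rule miners (e.g., the Apriori algorithm) search for all rules whose support and confidence exceed chosen thresholds, rather than scoring one rule at a time.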

B) How can you identify the companies that are using big data analytics in Ethiopia?
I can identify them simply by observing that they collect data from social media, websites, and feedback forms, and analyze it to determine customer needs and preferences and improve their products.

3. A) What is the difference between Hadoop and Traditional RDBMS?


Differences between Hadoop and RDBMS
Unlike a Relational Database Management System (RDBMS), Hadoop cannot be called a database; it is more of a distributed file system that can store and process huge volumes of data sets across a cluster of computers. Hadoop has two major components: HDFS (Hadoop Distributed File System) and MapReduce. The former is the storage layer of Hadoop, which stores huge amounts of data. MapReduce is primarily a programming model that can effectively process large data sets by splitting them into blocks of data; these blocks are distributed across the nodes of the machines in the cluster.
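The MapReduce model described above can be sketched in a few lines of in-memory Python: map emits (key, value) pairs, the framework groups them by key (the shuffle), and reduce aggregates each group. This is only a single-process illustration; real Hadoop runs these phases in parallel across a cluster.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit (word, 1) for every word in an input block.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the grouped values for one key.
    return (key, sum(values))

blocks = ["big data needs big storage", "data storage"]
pairs = [p for block in blocks for p in map_phase(block)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["big"], counts["data"])  # 2 2
```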

However, an RDBMS follows a structured database approach, in which data is stored in tables in the form of rows and columns. An RDBMS uses SQL (Structured Query Language) to update and access the data present in different tables. Unlike Hadoop, a traditional RDBMS is not suited to storing a larger amount of data, or simply big data.

In summary, Hadoop is a distributed storage and processing framework suited to huge, varied data sets, while an RDBMS is a structured, SQL-based system for moderate volumes of tabular data.
B) Highlight the features of Hadoop and explain the functionalities of a Hadoop cluster.
Key features of Hadoop include open-source licensing, fault tolerance through data replication, horizontal scalability on commodity hardware, and data locality (moving computation to where the data is stored). A Hadoop cluster is a set of machines organized in a master/worker architecture: the master nodes (NameNode, ResourceManager) manage storage metadata and job scheduling, while the worker nodes (DataNodes, NodeManagers) store data blocks and run the processing tasks.

C) What is the best hardware configuration to run Hadoop? What platform (OS) and Java version are required to run Hadoop?
Minimum hardware requirements:
 1 GB of hard disk space
 4 GB of RAM
Operating system:
 Linux (e.g., Oracle Linux x64 or a compatible distribution)
Java Development Kit:
 JDK 1.7 or higher
4. A) Explain the significance of the Hadoop Distributed File System (HDFS) and its applications.
HDFS is a distributed file system that handles large data sets running on commodity hardware.
It is used to scale a single Apache Hadoop cluster to hundreds (and even thousands) of nodes.
HDFS is one of the major components of Apache Hadoop, the others being MapReduce and
YARN. HDFS should not be confused with or replaced by Apache HBase, which is a column-oriented, non-relational database management system that sits on top of HDFS and can better support real-time data needs with its in-memory processing engine.
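A back-of-the-envelope sketch of how HDFS splits a file into blocks: the default block size on modern Hadoop is 128 MB and the default replication factor is 3 (both are configurable, and the numbers here are only illustrative arithmetic, not output from a real cluster).

```python
import math

BLOCK_SIZE_MB = 128   # HDFS default block size (configurable)
REPLICATION = 3       # HDFS default replication factor (configurable)

def hdfs_footprint(file_size_mb):
    """Return (number of blocks, total block replicas) for a file."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return blocks, blocks * REPLICATION

blocks, replicas = hdfs_footprint(1000)  # a 1000 MB file
print(blocks, replicas)  # 8 24
```

This is why HDFS scales horizontally: the 8 blocks (and their replicas) can live on different DataNodes and be processed in parallel.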

Application of Hadoop

B) Explain the difference between the NameNode, Backup Node, and Checkpoint Node.


NameNode:
The NameNode is at the heart of the HDFS file system and manages the metadata; that is, the contents of the files are not stored on the NameNode, but rather it holds the directory tree of all the files present in the HDFS file system of a Hadoop cluster.

Checkpoint Node:
The Checkpoint Node keeps track of the latest checkpoint in a directory that has the same structure as the NameNode's directory. It creates checkpoints of the namespace at regular intervals by downloading the edits and fsimage files from the NameNode and merging them locally.

Backup Node:
The Backup Node also provides checkpointing functionality like the Checkpoint Node, but it additionally maintains an up-to-date, in-memory copy of the file system namespace that is always in sync with the active NameNode.
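The checkpoint merge described above can be modeled in miniature: the fsimage is a snapshot of the namespace, the edit log records changes made since that snapshot, and a checkpoint replays the log onto the snapshot. The paths and operations below are invented for illustration; the real fsimage and edit log are binary on-disk formats.

```python
# Toy model of an HDFS checkpoint merge (illustrative data only).
fsimage = {"/user": "dir", "/user/a.txt": "file"}   # namespace snapshot
edits = [                                           # edit log since snapshot
    ("create", "/user/b.txt", "file"),
    ("delete", "/user/a.txt", None),
]

def merge_checkpoint(image, edit_log):
    """Replay the edit log onto a copy of the fsimage snapshot."""
    namespace = dict(image)  # never mutate the original snapshot
    for op, path, kind in edit_log:
        if op == "create":
            namespace[path] = kind
        elif op == "delete":
            namespace.pop(path, None)
    return namespace

print(sorted(merge_checkpoint(fsimage, edits)))
# ['/user', '/user/b.txt']
```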

5. A) What is commodity hardware?


Commodity hardware is computer hardware that is affordable and easy to obtain. Typically it is a low-performance, IBM PC-compatible system capable of running Microsoft Windows, Linux, or MS-DOS without requiring any special devices or equipment.

B) How big data analysis helps businesses increase their revenue? Give example
and Name some companies that use Hadoop.
Big data analysis helps businesses differentiate themselves. For example, Walmart, the world's largest retailer by revenue in 2014, uses big data analytics to increase its sales through better predictive analytics, customized recommendations, and new products launched based on customer preferences and needs. Walmart observed a significant 10% to 15% increase in online sales, worth $1 billion in incremental revenue.

NB. Big data analytics provides businesses customized recommendations and suggestions.

Some companies that use big data analysis (and Hadoop) are Amazon, BDO, Capital One, General Electric, Netflix, Next Big Sound, etc.

6. Given the current situation on going COVID19 crisis, how could the Information
Communication Technologies and big data analytics contribute to solve it?
As we know, big data analytics is used to extract meaningful information from massive stored data. Therefore, by combining ICT and big data analytics, it is possible to determine who is most vulnerable to COVID-19, what the different symptoms are, and which age and sex groups are most exposed to the disease; such information can be extracted from big data.
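The kind of extraction described above can be sketched with a simple group-and-count over case records. The records below are entirely made up for illustration; real analyses would run over large, properly sourced epidemiological datasets.

```python
from collections import Counter

# Hypothetical case records (invented values, for illustration only).
cases = [
    {"age_group": "60+", "sex": "M"},
    {"age_group": "60+", "sex": "M"},
    {"age_group": "60+", "sex": "F"},
    {"age_group": "20-39", "sex": "F"},
]

# Count cases per (age group, sex) to see which group appears most often.
by_group = Counter((c["age_group"], c["sex"]) for c in cases)
most_exposed, count = by_group.most_common(1)[0]
print(most_exposed, count)  # ('60+', 'M') 2
```

At scale, the same grouping would be expressed as a MapReduce or Spark job over millions of records rather than an in-memory list.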
