Академический Документы
Профессиональный Документы
Культура Документы
ID: MTR/226/12
Submitted to: Dr. Vasu
1. A) List the main characteristics of big data architecture with a neat schematic
diagram.
Main Characteristics of Big Data( 5 Vs of Big data ) are as follow:
1st V-volume:- Data Volume( 44x increase from 2009 to 2020 From 0.8 zettabytes to
35zb & Data volume is increasing exponentially)
2nd V-velocity: Data is being generated at every minute
3rd V-Variety: different kinds of data generated from various sources
4th V - Veracity: uncertainties and inconsistencies in big data
5th V - Value: Mechanism to bring correct meaning out of the data
B) How would you show your understanding of the tools, trends and technology in big
data?
1. Hadoop
2.MongoDB
2. A) What are the best practices in Big Data analytics? Explain the techniques used in
Big Data Analytics.
Best practices in big data analytics
Big data analytics analyzes the collected data and find patterns from it. The velocity, veracity, variety,
and volume of data lying with organizations must be put to work to gain actionable insights out of the
same. Organizations leveraging big data analytics must thoroughly understand the best practices for big
data first to be able to use the most relevant data for analysis.
B) How can u identify the companies that are using Big Data Analytics in Ethiopia?
I can identify simply by observing they are collecting data from social media , website and
comments from form to analyse their need and preference to determine and improve their
products.
of computers. Hadoop has two major components: HDFS (Hadoop Distributed File System) and
MapReduce. The former one is the storage layer of Hadoop which stores huge amounts of data.
MapReduce is primarily a programming model which can effectively process the large data sets
by converting them into different blocks of data. These blocks are distributed across the nodes on
various machines in the cluster.
However, RDBMS is a structured database approach, in which data gets stored in tables in the
forms of rows and columns. RDBMS uses SQL or Structured Query Language, which can help
update and access the data present in different tables. As in the case of Hadoop, traditional
RDBMS is not competent to be used in storage of a larger amount of data or simply big data.
Summery
B) Highlight the features of Hadoop and explain the functionalities of Hadoop cluster?
Hadoop cluster
C) What is the best hardware configuration to run Hadoop? What platform(os) and
Java version is required to run Hadoop
Minimum hardware requirements are
1 GB hard disk
4 GB RAM
Operation system
Linux (oracle linux x64 or compatible processor)
Java development kit
JDK 1.7 or higher
4. A) Explain the significances Hadoop distributed file systems and its application .
HDFS is a distributed file system that handles large data sets running on commodity hardware.
It is used to scale a single Apache Hadoop cluster to hundreds (and even thousands) of nodes.
HDFS is one of the major components of Apache Hadoop, the others being MapReduce and
YARN . HDFS should not be confused with or replaced by Apache HBase, which is a column-
oriented non-relational database management system that sits on top of HDFS and can better
support real-time data needs with its in-memory processing engine.
Application of Hadoop
Checkpoint Node :-
has same structure as that of NameNode’s directory. Checkpoint node creates checkpoints for the
Namespace at regular intervals by downloading the edits and fsimage file from the NameNode
and merging it locally.
BackupNode:
Backup Node also provides check pointing functionality like that of the checkpoint node but it
also maintains its up-to-date in-memory copy of the file system namespace that is in sync with
the active NameNode.
B) How big data analysis helps businesses increase their revenue? Give example
and Name some companies that use Hadoop.
Big data analysis is helping businesses differentiate themselves – for example Walmart the
world’s largest retailer in 2014 in terms of revenue - is using big data analytics to increase its
sales through better predictive analytics, providing customized recommendations and launching
new products based on customer preferences and needs. Walmart observed a significant 10% to
15% increase in online sales for $1 billion in incremental revenue.
NB. Big data analytics provides businesses customized recommendations and suggestions.
Some companies which are using big data analysis are Amazone, BDO, Capital one, General
Electric, Netflix, Next Big Sound… etc.
6. Given the current situation on going COVID19 crisis, how could the Information
Communication Technologies and big data analytics contribute to solve it?
As we know Big data is used to extract meaningful data from massive stored data. therefore with
the combination of ICT and big data it is possible to extract who are the most vulnerable to
COVID 19 , what are different symptoms , whose age and sex group are the most exposed to this
disease , such information will be extracted from big data.