
Introduction to Hadoop

Doug Cutting, Mike Cafarella and their team took the solution provided by Google and started an open source project called HADOOP in 2005; Doug named it after his son's toy elephant. Apache Hadoop is now a registered trademark of the Apache Software Foundation.
Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different CPU nodes. In short, the Hadoop framework makes it possible to develop applications that run on clusters of computers and perform complete statistical analysis of huge amounts of data.
Architecture of Hadoop
Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
MapReduce
The MapReduce framework is the software layer implementing the MapReduce paradigm.
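
As a concrete illustration, here is a minimal sketch of the MapReduce paradigm in Java: the classic word count job, where the map phase emits (word, 1) pairs and the reduce phase sums them. The class name and the input/output paths are illustrative only, not something defined in these slides.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}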
HDFS
The Hadoop Distributed File System(HDFS)
is a sub project of the Apache Hadoop
project.This Apache software foundation
project is designed to run on commodity
hardware.
According to the Apache software
foundation the primary objectives of HDFS
is stored data reliably even in the presence
of failures including Namenode
failures,Datanode failures and Network
partition.
HDFS cluster
HDFS client
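
A minimal sketch of how an HDFS client talks to the cluster through Hadoop's Java FileSystem API. The NameNode address and the file path below are placeholders, not values taken from these slides; in practice the address usually comes from core-site.xml.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder NameNode address; normally picked up from core-site.xml.
    conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

    FileSystem fs = FileSystem.get(conf);

    // Write a file: the client streams data and HDFS replicates blocks across DataNodes.
    Path file = new Path("/user/demo/hello.txt");
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
    }

    // Read it back through the same API.
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());
    }

    fs.close();
  }
}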
The YARN infrastructure (Yet Another Resource Negotiator) is the framework responsible for providing the computational resources (e.g., CPUs, memory, etc.) needed for application execution.
The Resource Manager (one per cluster) is the master. It knows where the slaves are located (Rack Awareness) and how many resources they have. It runs several services, the most important of which is the Resource Scheduler, which decides how to assign the resources.
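
A small sketch of how a client can ask the Resource Manager about the slave nodes it tracks, using the YarnClient API. It assumes a reachable ResourceManager whose address is configured in yarn-site.xml.

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListYarnNodes {
  public static void main(String[] args) throws Exception {
    // Connects to the ResourceManager configured in yarn-site.xml.
    Configuration conf = new YarnConfiguration();
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // Ask the ResourceManager for its view of the running NodeManagers (the "slaves").
    List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
    for (NodeReport node : nodes) {
      System.out.println(node.getNodeId()
          + "  rack=" + node.getRackName()
          + "  capability=" + node.getCapability());   // memory and vcores on the node
    }

    yarnClient.stop();
  }
}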
JAQL
JAQL is a functional data processing and
query lanaguage most commonlu used for
JSON query processing on Bigdata.It
started as an open source project at
Google but the latest release was on
7/12/2010.IBM took it over as primary
data processing language for their
Hadoop software package Big Insights.
JAQL
HIVE
Hive has three main functions: data summarization, query and analysis. It supports queries expressed in a language called HiveQL, which automatically translates SQL-like queries into MapReduce jobs executed on Hadoop.
HIVE
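
A minimal sketch of running a HiveQL query from Java over JDBC. It assumes a running HiveServer2 and the Hive JDBC driver on the classpath; the host, port, table and column names are made up for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQlExample {
  public static void main(String[] args) throws Exception {
    // Load the Hive JDBC driver (older driver versions are not auto-registered).
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // HiveServer2 JDBC URL; host, port and database are placeholders.
    String url = "jdbc:hive2://hive-server:10000/default";

    try (Connection conn = DriverManager.getConnection(url, "hive", "");
         Statement stmt = conn.createStatement()) {

      // A SQL-like HiveQL query; Hive compiles it into MapReduce jobs behind the scenes.
      ResultSet rs = stmt.executeQuery(
          "SELECT department, COUNT(*) AS patients " +
          "FROM patient_records GROUP BY department");

      while (rs.next()) {
        System.out.println(rs.getString("department") + " : " + rs.getLong("patients"));
      }
    }
  }
}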
PIG
Pig is a high-level platform for creating programs that run on Apache Hadoop. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for relational database management systems.
PIG
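
A minimal sketch of the same word count idea written as Pig Latin and driven from Java through the embedded PigServer API. The input path and relation aliases are illustrative; local mode is used here only to keep the example self-contained.

import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class PigWordCount {
  public static void main(String[] args) throws Exception {
    // Local mode for illustration; ExecType.MAPREDUCE runs the same script on a cluster.
    PigServer pig = new PigServer(ExecType.LOCAL);

    // Pig Latin statements; Pig compiles them into a MapReduce plan.
    pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
    pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
    pig.registerQuery("grouped = GROUP words BY word;");
    pig.registerQuery("counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;");

    // Pull the results back into the client instead of storing them in HDFS.
    Iterator<Tuple> it = pig.openIterator("counts");
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}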
JSON
Our three key players are Hadoop, the de facto distributed batch data processing platform; JSON, a ubiquitous data format; and Kafka, which is becoming the system of choice for transporting streams of data.
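
A minimal sketch of transporting a JSON document through Kafka from Java. The broker address, topic name and JSON fields are placeholders invented for illustration.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JsonToKafka {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Broker address is a placeholder.
    props.put("bootstrap.servers", "kafka-broker:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    // A JSON document carried as a plain string value; the fields are hypothetical.
    String json = "{\"patientId\": 42, \"heartRate\": 78, \"timestamp\": \"2020-01-01T10:00:00Z\"}";

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      producer.send(new ProducerRecord<>("health-events", "42", json));
    }
  }
}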
Cloud computing
What is cloud?
Applications
Networking
Databases
Services
Cloud Service models
IaaS
Infrastructure as a service
Provision servers
Storage
Networking resources

PaaS
Platform as a service
Middleware platform
Solution stack
Both accessible over a network

SaaS
Software as a service
Software
Applications
Or services that are delivered over a network
Infrastructure as a Service (IaaS) architecture
An infrastructure provider (IP) makes an entire computing infrastructure available as a service.
It manages a large pool of computing resources and uses virtualization to assign and dynamically resize customer resources.
Customers rent processing capacity, memory, data storage, and networking resources that are provisioned over a network.
Platform as a Service (PaaS) architecture
The service provider (SP) supplies the software platform or middleware where the applications run.
The service user is responsible for the creation, updating, and maintenance of the application.
The sizing of the hardware required for the execution of the software is handled by the service provider in a transparent manner.
Software as a Service (SaaS) architecture
The service provider (SP) is responsible for the creation, updating, and maintenance of the software and application.
The service user accesses the service through Internet-based interfaces.
V-Care Project
To design an application which can help the hospital in managing patient records and also help the patient's caretaker by tracking patient health records and medication.
The caretaker is able to update information about the patient, check at any time the last update made by the doctor in the hospital, and get alerts when the patient's data is updated by the hospital, as well as an alert if the medication time has passed and the update has not taken place.
Purpose of the project
Enhance the helping hand.
Introduce hospital-caretaker transparency.
Live tracking of the patient's health and of the medication services being provided by the hospital.
Act as the patient's caretaker.
Scope of the project
Manage Patient Records: It can manage all the records of the patients. It stores the records and can evaluate some data automatically.
Health Tracker: It will keep track of the patient's health and will inform the caretaker after every health update.
Medicine Tracker: It will keep track of the medication routine of the patient.
Login User interface
Doctor sign up/login
