
CLOUD COMPUTING USING HADOOP TECHNOLOGY

DHIRAJLAL GANDHI COLLEGE OF TECHNOLOGY, SALEM
B. NARENDRA PRASATH
3rd year, CSE Department
Email: narendren.jbk@gmail.com

S. PRAVEEN KUMAR
3rd year, CSE Department
Email: praveencse333@gmail.com
Abstract:
Cloud computing offers a powerful
abstraction that provides a scalable,
virtualized infrastructure as a service where
the complexity of fine-grained resource
management is hidden from the end-user.
Running data analytics applications in the
cloud on extremely large data sets is gaining
traction as the underlying infrastructure can
meet the extreme demands of scalability.
We introduce how data are stored and
processed in the cloud using Big Data
applications built on Hadoop technology.
Keywords: Cloud computing, Big Data,
Hadoop, MapReduce, HDFS

Introduction to Cloud Computing


When you store your photos online
instead of on your home computer, or use
webmail or a social networking site, you are
using a cloud computing service. If you
are an organization, and you want to use, for
example, an online invoicing service instead
of updating the in-house one you have been
using for many years, that online invoicing
service is a cloud computing service.

Cloud computing refers to the
delivery of computing resources over the
Internet. Instead of keeping data on your
own hard drive or updating applications for
your needs, you use a service over the
Internet, at another location, to store your
information or use its applications. Doing so
may give rise to certain privacy
implications. For that reason the Office of
the Privacy Commissioner of Canada (OPC)
has prepared some responses to Frequently
Asked Questions (FAQs). We have also
developed a Fact Sheet that provides
detailed information on cloud computing
and the privacy challenges it presents.

Cloud Computing
Cloud computing is the delivery of
computing services over the Internet. Cloud
services allow individuals and businesses to
use software and hardware that are managed
by third parties at remote locations.
Examples of cloud services include online
file storage, social networking sites,
webmail, and online business applications.
The cloud computing model allows access to
information and computer resources from
anywhere that a network connection is
available. Cloud computing provides a
shared pool of resources, including data
storage space, networks, computer
processing power, and specialized corporate
and user applications. [1]

The following definition of cloud computing
has been developed by the U.S. National
Institute of Standards and Technology
(NIST):
Cloud computing is a model for enabling
convenient, on-demand network access to a
shared pool of configurable computing
resources (e.g., networks, servers, storage,
applications, and services) that can be
rapidly provisioned and released with
minimal management effort or service
provider interaction. This cloud model
promotes availability and is composed of
five essential characteristics, three service
models, and four deployment models.[2]

TYPES OF CLOUDS
Cloud providers typically centre on
one type of cloud functionality provisioning:
Infrastructure, Platform or Software /
Application, though there is potentially no
restriction on offering multiple types at the
same time, as can often be observed in PaaS
(Platform as a Service) providers which
offer specific applications too, such as
Google App Engine in combination with
Google Docs. Due to this combinatorial
capability, these types are also often referred
to as components.

Literature and publications typically differ
slightly in the terminologies applied. This is
mostly because some application areas
overlap and are therefore difficult to
distinguish. As an example, platforms
typically have to provide access to resources
indirectly, and are thus sometimes confused
with infrastructures. Additionally, more
popular terms have been introduced in less
technologically centred publications.
The following list identifies the main types
of clouds (currently in use):
Infrastructure as a Service (IaaS), also
referred to as Resource Clouds, provides
(managed and scalable) resources as
services to the user; in other words, IaaS
providers basically offer enhanced
virtualisation capabilities. Accordingly,
different resources may be provided via a
service interface:

Data & Storage Clouds deal with reliable
access to data of potentially dynamic size,
weighing resource usage against access
requirements and / or quality definitions.
Examples: Amazon S3, SQL Azure.

Compute Clouds provide access to
computational resources, i.e. CPUs. So far,
such low-level resources cannot really be
exploited on their own, so they are typically
exposed as part of a virtualised environment
(not to be confused with PaaS below), i.e.
hypervisors. Compute Cloud providers
therefore typically offer the capability to
provide computing resources (i.e. raw access
to resources, unlike PaaS, which offers full
software stacks for developing and building
applications), typically virtualised, in which
to execute cloudified services and
applications. IaaS (Infrastructure as a
Service) offers additional capabilities over a
simple compute service.
Examples: Amazon EC2, Zimory,
Elastichosts.

Platform as a Service (PaaS) provides
computational resources via a platform upon
which applications and services can be
developed and hosted. PaaS typically makes
use of dedicated APIs to control the
behaviour of a server hosting engine which
executes and replicates the execution
according to user requests (e.g. access rate).
As each provider exposes its own API
according to its key capabilities,
applications developed for one specific
cloud provider cannot be moved to another
cloud host; there are, however, attempts to
extend generic programming models with
cloud capabilities (such as MS Azure).
Examples: Force.com, Google App Engine,
Windows Azure (Platform).

Software as a Service (SaaS), also
sometimes referred to as Service or
Application Clouds, offers implementations
of specific business functions and business
processes that are provided with specific
cloud capabilities; i.e., SaaS providers offer
applications / services using a cloud
infrastructure or platform, rather than
providing cloud features themselves. Often,
some kind of standard application software
functionality is offered within the cloud.
Examples: Google Docs, Salesforce CRM,
SAP Business by Design.
Overall, Cloud Computing is not restricted
to Infrastructure / Platform / Software as a
Service systems, even though it provides
enhanced capabilities which act as (vertical)
enablers to these systems. As such, I/P/SaaS
can be considered specific usage patterns
for cloud systems which relate to models
already approached by Grid, Web Services
etc. Cloud systems are a promising way to
implement these models and extend them
further. [1][2].
DEPLOYMENT TYPES
Similar to P/I/SaaS, clouds may be
hosted and employed in different fashions,
depending on the use case and on the
business model of the provider. So far, there
has been a tendency for clouds to evolve
from private, internal solutions (private
clouds) used to manage the local
infrastructure and the volume of requests,
e.g. to ensure availability of highly
requested data. This is because the data
centres that first built cloud capabilities
used these features internally before
considering selling them publicly (public
clouds). Only now that providers have
gained confidence in publishing and
exposing cloud features do the first hybrid
solutions emerge. This movement from
private via public to combined solutions is
often considered a natural evolution of such
systems, though there is no reason for
providers not to start with hybrid solutions
once the necessary technologies have
matured sufficiently.
We can hence distinguish between the
following deployment types:
Private Clouds are typically owned by the
respective enterprise and / or leased.
Functionalities are not directly exposed to
the customer, though in some cases services
with cloud-enhanced features may be
offered; from the customer's point of view,
this is similar to (Cloud) Software as a
Service.
Example: eBay.
Public Clouds. Enterprises may use cloud
functionality from others, or may offer their
own services to users outside the company.
Providing users with the actual capability to
exploit the cloud features for their own
purposes also allows other enterprises to
outsource their services to such cloud
providers, thus reducing the cost and effort
of building up their own infrastructure. As
noted in the context of cloud types, the
scope of functionalities may differ.

Examples: Amazon, Google Apps, Windows
Azure.
Hybrid Clouds. Though public clouds
allow enterprises to outsource parts of their
infrastructure to cloud providers, they at the
same time would lose control over the
resources and the distribution / management
of code and data. In some cases, this is not
desired by the respective enterprise. Hybrid
clouds consist of a mixed employment of
private and public cloud infrastructures so as
to achieve a maximum of cost reduction
through outsourcing whilst maintaining the
desired degree of control over e.g. sensitive
data by employing local private clouds.
There are not many hybrid clouds actually in
use today, though initial initiatives such as
the one by IBM and Juniper already
introduce base technologies for their
realization.
Community Clouds. Typically cloud
systems are restricted to the local
infrastructure, i.e. providers of public clouds
offer their own infrastructure to customers.
Though the provider could actually resell the
infrastructure of another provider, clouds do
not aggregate infrastructures to build up
larger, cross-boundary structures. In
particular smaller SMEs could profit from
community clouds to which different entities
contribute with their respective (smaller)
infrastructure. Community clouds can either
aggregate public clouds or dedicated
resource infrastructures.
We may thereby distinguish between private
and public community clouds. For example,
smaller organizations may come together
only to pool their resources for building a
private community cloud. As opposed to
this, resellers such as Zimory may pool
cloud resources from different providers and
resell them.
Community Clouds as such are still just a
vision, though there are already indicators of
such a development, e.g. through Zimory
and RightScale. Community clouds show
some overlap with Grid technology (see
e.g. Reservoir).
Special Purpose Clouds. IaaS clouds
originating from data centres in particular
have a general-purpose appeal, as their
capabilities can be used equally for a wide
range of use cases and customer types. As
opposed to this, PaaS clouds tend to provide
functionalities more specialized to specific
use cases, which should not be confused
with proprietariness of the platform:
specialization implies providing additional,
use-case-specific methods, whilst
proprietariness implies that the structure of
data and interfaces are specific to the
provider.
Specialized functionalities are provided e.g.
by the Google App Engine, which offers
specific capabilities dedicated to distributed
document management. As with general
service provisioning (web based or not), it
can be expected that future systems will
provide even more specialized capabilities
to attract individual user areas, owing to
competition, customer demand and available
expertise.
Special Purpose Clouds are just extensions
of normal cloud systems to provide
additional, dedicated capabilities. The basis
of such development is already visible.[2]

Why cloud services are popular


Cloud services are popular because they can
reduce the cost and complexity of owning
and operating computers and networks.
Since cloud users do not have to invest in
information technology infrastructure,
purchase hardware, or buy software
licences, the benefits are low up-front costs,
rapid return on investment, rapid
deployment, customization, flexible use, and
solutions that can make use of new
innovations. In addition, cloud providers
that have specialized in a particular area
(such as e-mail) can bring advanced services
that a single company might not be able to
afford or develop. Some other benefits to
users include scalability, reliability, and
efficiency. Scalability means that cloud
computing offers unlimited processing and
storage capacity. The cloud is reliable in that
it enables access to applications and
documents anywhere in the world via the
Internet. Cloud computing is often
considered efficient because it allows
organizations to free up resources to focus
on innovation and product development.
Another potential benefit is that personal
information may be better protected in the
cloud. Specifically, cloud computing may
improve efforts to build privacy protection
into technology from the start, and the use
of better security mechanisms. Cloud
computing will enable more flexible IT
acquisition and improvements, which may
permit adjustments to procedures based on
the sensitivity of the data. Widespread use of
the cloud may also encourage open
standards for cloud computing that will
establish baseline data security features
common across different services and
providers. Cloud computing may also allow
for better audit trails. In addition,
information in the cloud is not as easily lost
(compared with paper documents or hard
drives, for example).

Potential privacy risks. While there are
benefits, there are privacy and security
concerns too. Data travels over the Internet
and is stored in remote locations. In
addition, cloud providers often serve
multiple customers simultaneously. All of
this may raise the scale of exposure to
possible breaches, both accidental and
deliberate. Many have raised concerns that
cloud computing may lead to "function
creep": uses of data by cloud providers that
were not anticipated when the information
was originally collected and for which
consent has typically not been obtained.
Given how inexpensive it is to keep data,
there is little incentive to remove the
information from the cloud and more
reasons to find other things to do with it.
Security issues, the need to segregate data
when dealing with providers that serve
multiple customers, and potential secondary
uses of the data: these are areas that
organizations should keep in mind when
considering a cloud provider and when
negotiating contracts or reviewing terms of
service with a cloud provider. Given that the
organization transferring this information to
the provider is ultimately accountable for its
protection, it needs to ensure that the
personal information is appropriately
handled. [3]

How Data Is Stored in the Cloud

The data are stored using Big Data
technology.
Big Data:
Big data, which admittedly means many
things to many people, is no longer confined
to the realm of technology. Today it is a
business priority, given its ability to
profoundly affect commerce in the globally
integrated economy. In addition to providing
solutions to long-standing business
challenges, big data inspires new ways to
transform processes, organizations, entire
industries and even society itself. In this
paper we describe what is meant by big data
and its types. We also describe how to build
a platform for big data, and we discuss two
currently running big data solutions: the Big
Data Strategy and Oracle Big Data
Solutions. The Big Data Strategy provides a
plan for implementing big data in the right
manner and offers opportunities for
Australian government agencies and for
future work, whereas Oracle Big Data
Solutions uses Hadoop and NoSQL database
technologies to implement Big Data.

What is Apache Hadoop?

Apache Hadoop is a large-scale, open source
software framework:
Yahoo! has been the largest contributor to date
Dedicated to scalable, distributed,
data-intensive computing
Handles thousands of nodes and petabytes of data
Supports applications under a free license.

The three main Hadoop subprojects are:
Hadoop Common: common utilities package
HDFS: Hadoop Distributed File System, with
high-throughput access to application data
MapReduce: a software framework for
distributed processing of large data sets on
computer clusters [4]

Hadoop MapReduce

MapReduce is a programming model and
software framework first developed by
Google (Google's MapReduce paper was
published in 2004). It is intended to
facilitate and simplify the processing of vast
amounts of data in parallel on large clusters
of commodity hardware in a reliable,
fault-tolerant manner:
Petabytes of data
Thousands of nodes
Computational processing occurs on both:
Unstructured data: filesystem
Structured data: database [4]
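To make the programming model concrete, here is a minimal sketch of the canonical word-count job, written against the Hadoop MapReduce Java API essentially as in the Apache tutorial: the map step emits (word, 1) pairs and the reduce step sums them. The input and output HDFS paths are supplied on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The combiner is set to the same class as the reducer so that counts are pre-aggregated locally on each node before being shuffled across the network.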

Hadoop Distributed File System (HDFS)

Inspired by the Google File System
Scalable, distributed, portable filesystem
written in Java for the Hadoop framework
Primary distributed storage used by Hadoop
applications; HDFS can be part of a Hadoop
cluster or can be a stand-alone general
purpose distributed file system
An HDFS cluster primarily consists of a
NameNode that manages file system
metadata and DataNodes that store the
actual data
Stores very large files in blocks across
machines in a large cluster
Reliability and fault tolerance ensured by
replicating data across multiple hosts
Has data awareness between nodes
Designed to be deployed on low-cost hardware.

Assumptions and Goals
Hardware Failure
Streaming Data Access
Large Data Sets
Moving Computation is Cheaper than
Moving Data.

More on Hadoop file systems

Hadoop can work directly with any
distributed file system that can be mounted
by the underlying OS. However, doing this
means a loss of locality, as Hadoop needs to
know which servers are closest to the data.
Hadoop-specific file systems like HDFS are
developed for locality, speed, fault
tolerance, integration with Hadoop, and
reliability. [5]
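As a rough sketch of how an application interacts with HDFS through its Java API [6], the snippet below writes a small file and reads it back. The NameNode URI (hdfs://namenode:9000) and the file path are placeholder assumptions; in a real deployment they come from the cluster configuration (core-site.xml).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder cluster address; normally read from core-site.xml.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

    Path file = new Path("/user/demo/hello.txt");

    // Write: the file is split into blocks that HDFS replicates
    // across DataNodes; the NameNode only tracks the metadata.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back through a streaming interface.
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());
    }
  }
}
```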

What are the Hadoop/MapReduce limitations?

You cannot control the order in which the
maps or reductions are run. For maximum
parallelism, the Maps and Reduces should
not depend on data generated within the
same MapReduce job (i.e. they should be
stateless). A database with an index will
always be faster than a MapReduce job on
unindexed data. Reduce operations do not
take place until all Maps are complete (or
have failed and been skipped). There is a
general assumption that the output of
Reduce is smaller than the input to Map: a
large data source is used to generate smaller
final values. [5]

Who's using it?

Lots of companies! Yahoo!, AOL, eBay,
Facebook, IBM, Last.fm, LinkedIn, The
New York Times, Ning, Twitter, and more.
In 2007 IBM and Google announced an
initiative to use Hadoop to support
university courses in distributed computer
programming. In 2008 this collaboration
and the Academic Cloud Computing
Initiative were funded by the NSF and
produced the Cluster Exploratory Program
(CLuE). [6][7]

Conclusion
As we discussed in this paper, cloud
computing supports us in various aspects.
Analyzing new and diverse digital data
streams can reveal new sources of economic
value, provide fresh insights into customer
behaviour and identify market trends early
on. But this influx of new data creates
challenges for IT departments. To derive
real business value from big data, you need
the right tools to capture and organize a
wide variety of data types from different
sources, and to be able to easily analyze
them within the context of all your
enterprise data. Hadoop is a large-scale,
open source software framework dedicated
to scalable, distributed, data-intensive
computing. The framework breaks large
data up into smaller, parallelizable chunks
and handles scheduling; it Maps each piece
to an intermediate value, Reduces the
intermediate values to a solution, and
supports user-specified partition and
combiner options. It is fault tolerant and
reliable, and supports thousands of nodes
and petabytes of data. If you can rewrite
your algorithms as Maps and Reduces, and
your problem can be broken up into small
pieces solvable in parallel, then Hadoop's
MapReduce is the way to go for a
distributed problem-solving approach to
large datasets.

REFERENCES:
[1] www.priv.gc.ca
[2] www.cse.buffalo.edu/~bina/CloudComputingJun28
[3] Hadoop Distributed Filesystem. http://hadoop.apache.org
[4] http://www.forbes.com/sites/louiscolumbus/2014/02/24/the-best-cloud-computing-companies-and-ceos-to-work-for-in-2014/
[5] http://www.javacodegeeks.com/2012/05/mapreduce-for-dummies.html
[6] HDFS Java API: http://hadoop.apache.org/core/docs/current/api/
[7] HDFS source code: http://hadoop.apache.org/core/version_control.html
