
SECURE DATA DEDUPLICATION WITH DYNAMIC

OWNERSHIP MANAGEMENT IN CLOUD STORAGE

ABSTRACT
In cloud storage services, deduplication technology is commonly used to reduce the
space and bandwidth requirements of the service by eliminating redundant data and storing
only a single copy of it. Deduplication is most effective when multiple users outsource the
same data to the cloud storage, but it raises issues relating to security and ownership. Proof
of ownership schemes allow any owner of the same data to prove to the cloud storage server
that he owns the data in a robust way. However, many users are likely to encrypt their data
before outsourcing them to the cloud storage to preserve privacy, but this hampers
deduplication because of the randomization property of encryption. Recently, several
deduplication schemes have been proposed to solve this problem by allowing each owner to
share the same encryption key for the same data. However, most of the schemes suffer from
security flaws, since they do not consider the dynamic changes in the ownership of
outsourced data that occur frequently in a practical cloud storage service. In this paper, we
propose a novel server-side deduplication scheme for encrypted data. It allows the cloud
server to control access to outsourced data even when the ownership changes dynamically by
exploiting randomized convergent encryption and secure ownership group key distribution.
This prevents data leakage not only to revoked users even though they previously owned that
data, but also to an honest-but-curious cloud storage server. In addition, the proposed scheme
guarantees data integrity against any tag inconsistency attack. Thus, security is enhanced in
the proposed scheme. The efficiency analysis results demonstrate that the proposed scheme is
almost as efficient as the previous schemes, while the additional computational overhead is
negligible.

CHAPTER 1
INTRODUCTION

The term cloud computing is a recent buzzword in the IT world. Behind this fancy
phrase lies a picture of the future of computing from both a technical and a social
perspective. Although the term cloud computing is recent, the idea of centralizing
computation and storage in distributed data centers maintained by third-party companies is
not new; it dates back to the 1990s, along with distributed computing approaches such as
grid computing. Cloud computing aims to provide IT as a service to cloud users on an
on-demand basis, with greater flexibility, availability, reliability, and scalability, under a
utility computing model.

The origin of cloud computing can be seen as an evolution of grid computing
technologies. The term cloud computing was first given prominence by Google's CEO Eric
Schmidt in late 2006. The birth of cloud computing is thus a very recent phenomenon,
although its roots belong to some older ideas combined with new business, technical, and
social perspectives. From an architectural point of view, a cloud is naturally built on an
existing grid-based architecture: it uses grid services and adds technologies such as
virtualization, along with new business models. In brief, a cloud is essentially a collection of
commodity computers networked together in the same or different geographical locations,
operating together to serve a number of customers with different needs and workloads on an
on-demand basis with the help of virtualization.

Cloud computing provides a means by which we can access applications as utilities
over the Internet. It allows us to create, configure, and customize applications online. The
term "cloud" refers to a network or the Internet, and cloud computing refers to manipulating,
configuring, and accessing applications online. It offers online data storage, infrastructure,
and applications. A cloud can provide services over public or private networks, i.e., WAN,
LAN, or VPN. Applications such as e-mail, web conferencing, and customer relationship
management (CRM) all run in the cloud.
Basic Concepts: Certain services and models work behind the scenes to make cloud
computing feasible and accessible to end users. The following are the working models for
cloud computing:

Deployment Models
Service Models

1.1 DEPLOYMENT MODELS

Deployment models define the type of access to the cloud. A cloud can have any of
four types of access: public, private, hybrid, and community. The public cloud allows systems
and services to be easily accessible to the general public; it may be less secure because of its
openness (e.g., e-mail). The private cloud allows systems and services to be accessible within
a single organization; it offers increased security because of its private nature. The
community cloud allows systems and services to be accessible by a group of organizations.
The hybrid cloud is a mixture of public and private clouds, in which critical activities are
performed using the private cloud while non-critical activities are performed using the
public cloud.

1.2 SERVICE MODELS

Service models are the reference models on which cloud computing is based. These can be
categorized into three basic service models, as listed below:

Infrastructure as a Service (IaaS)
Platform as a Service (PaaS)
Software as a Service (SaaS)

There are many other service models, all of which can take the form XaaS, i.e., Anything
as a Service: Network as a Service, Business as a Service, Identity as a Service, Database
as a Service, or Strategy as a Service. Infrastructure as a Service (IaaS) is the most basic
level of service. Each service model makes use of the underlying service model, i.e., each
inherits the security and management mechanisms of the model beneath it.

1.2.1 INFRASTRUCTURE AS A SERVICE (IaaS)


IaaS provides access to fundamental resources such as physical machines, virtual
machines, virtual storage, etc. In an IaaS model, a third-party provider hosts hardware,
software, servers, storage and other infrastructure components on behalf of its users. IaaS
providers also host users' applications and handle tasks including system maintenance,
backup and resiliency planning.
IaaS platforms offer highly scalable resources that can be adjusted on-demand. This makes
IaaS well-suited for workloads that are temporary, experimental or change unexpectedly.
Other characteristics of IaaS environments include the automation of administrative tasks,
dynamic scaling, desktop virtualization and policy-based services. IaaS customers pay on a
per-use basis, typically by the hour, week or month. Some providers also charge customers
based on the amount of virtual machine space they use. This pay-as-you-go model eliminates
the capital expense of deploying in-house hardware and software. However, users should
monitor their IaaS environments closely to avoid being charged for unauthorized services.
Because IaaS providers own the infrastructure, systems management and monitoring may
become more difficult for users. Also, if an IaaS provider experiences downtime, users'
workloads may be affected. For example, if a business is developing a new software product,
it might be more cost-effective to host and test the application through an IaaS provider. Once
the new software is tested and refined, it can be removed from the IaaS environment for a
more traditional in-house deployment, to save money, or to free the resources for other
projects. Leading IaaS providers include Amazon Web Services (AWS), Windows Azure,
Google Compute Engine, Rackspace Open Cloud, and IBM SmartCloud Enterprise.
1.2.2 PLATFORM AS A SERVICE (PaaS)

PaaS provides the runtime environment for applications, along with development and
deployment tools. In a PaaS model, a cloud provider delivers hardware and software tools,
usually those needed for application development, to its users as a service. A PaaS provider
hosts the hardware and software on its own infrastructure. As a result, PaaS frees users from
having to install in-house hardware and software to develop or run a new application.

PaaS does not typically replace a business' entire infrastructure. Instead, a business
relies on PaaS providers for key services, such as Java development or application hosting.
For example, deploying a typical business tool locally might require an IT team to buy and
install hardware, operating systems, middleware (such as databases, web servers, and so on),
and the actual application; define user access and security; and then add the application to
existing systems management or application performance monitoring (APM) tools. IT teams
must then maintain all of these resources over time. A PaaS provider, however, supports all
the underlying computing and software; users only need to log in and start using the
platform, usually through a web browser interface. Most PaaS platforms are geared toward software
development, and they offer developers several advantages. For example, PaaS allows
developers to frequently change or upgrade operating system features. It also helps
development teams collaborate on projects.

Users typically access PaaS through a Web browser. PaaS providers then charge for that
access on a per-use basis. Some PaaS providers charge a flat monthly fee to access the
platform and the apps hosted within it. It is important to discuss pricing, service uptime,
and support with a PaaS provider before engaging its services. Since users rely on a provider's
infrastructure and software, vendor lock-in can be an issue in PaaS environments. Other risks
associated with PaaS are provider downtime or a provider changing its development
roadmap. If a provider stops supporting a certain programming language, users may be forced
to change their programming language, or the provider itself. Both are difficult and disruptive
steps. Common PaaS vendors include Salesforce.com's Force.com, which provides an
enterprise customer relationship management (CRM) platform. PaaS platforms for software
development and management include Appear IQ, Mendix, Amazon Web Services (AWS)
Elastic Beanstalk, Google App Engine and Heroku.
1.2.3 SOFTWARE AS A SERVICE (SaaS)

The SaaS model provides software applications as a service to end users. SaaS
removes the need for organizations to install and run applications on their own computers or
in their own data centers. This eliminates the expense of hardware acquisition, provisioning
and maintenance, as well as software licensing, installation and support. Other benefits of the
SaaS model include:

Flexible payments: Rather than purchasing software to install, or additional hardware
to support it, customers subscribe to a SaaS offering. Generally, they pay for this service on a
monthly basis using a pay-as-you-go model. Transitioning costs to a recurring operating
expense allows many businesses to exercise better and more predictable budgeting. Users can
also terminate SaaS offerings at any time to stop those recurring costs.
Scalable usage: Cloud services like SaaS offer high scalability, which gives
customers the option to access more, or fewer, services or features on-demand.
Automatic updates: Rather than purchasing new software, customers can rely on a
SaaS provider to automatically perform updates and patch management. This further reduces
the burden on in-house IT staff.

Accessibility and persistence: Since SaaS applications are delivered over the
Internet, users can access them from any Internet-enabled device and location.
But SaaS also poses some potential disadvantages. Businesses must rely on outside
vendors to provide the software, keep that software up and running, track and report accurate
billing and facilitate a secure environment for the business' data. Providers that experience
service disruptions, impose unwanted changes to service offerings, experience a security
breach or any other issue can have a profound effect on the customers' ability to use those
SaaS offerings. As a result, users should understand their SaaS provider's service-level
agreement, and make sure it is enforced. SaaS is closely related to the ASP (application
service provider) and on-demand computing software delivery models. The hosted
application management model of SaaS is similar to ASP: the provider hosts the customer's
software and delivers it to approved end users over the Internet. In the software-on-demand
SaaS model, the provider gives customers network-based access to a single copy of an
application that the provider created specifically for SaaS distribution. The application's
source code is the same for all customers, and when new features or functionalities are rolled
out, they are rolled out to all customers. Depending upon the service-level agreement (SLA),
the customer's data for each model may be stored locally, in the cloud, or both locally and in
the cloud.
Organizations can integrate SaaS applications with other software
using application programming interfaces (APIs). For example, a business can write its own
software tools and use the SaaS provider's APIs to integrate those tools with the SaaS
offering. There are SaaS applications for fundamental business technologies, such as e-mail,
sales management, customer relationship management (CRM), financial management, human
resource management, billing, and collaboration. Leading SaaS providers include Salesforce,
Oracle, SAP, Intuit, and Microsoft.

The concept of cloud computing came into existence in the 1950s with the
implementation of mainframe computers, accessible via thin/static clients. Since then, cloud
computing has evolved from static clients to dynamic ones, and from software to services.
Cloud computing has numerous advantages. One can access applications as utilities over the
Internet, and manipulate and configure them online at any time, without installing a specific
piece of software to access or manipulate a cloud application. Cloud computing offers online
development and deployment tools and a programming runtime environment through the
Platform as a Service model. Cloud resources are available over the network in a manner that
provides platform-independent access to any type of client. Cloud computing offers
on-demand self-service, so resources can be used without interaction with the cloud service
provider. Cloud computing is highly cost-effective because it operates at higher efficiencies
with greater utilization, and it requires only an Internet connection. Finally, cloud computing
offers load balancing, which makes it more reliable.

Cloud users consume the services provided by cloud providers, build their
applications on the Internet, and thus deliver them to their end users. Cloud users therefore
do not have to worry about installing and maintaining the hardware and software needed,
and they pay only for as much as they use. Thus, cloud users can reduce their IT expenditure
and effort by using cloud services instead of establishing the IT infrastructure themselves.
The cloud is essentially provided by large distributed data centers. These data centers are
often organized as a grid, and the cloud is built on top of the grid services. Cloud users are
provided with virtual images of the physical machines in the data centers. This virtualization
is one of the key concepts of cloud computing, as it essentially builds an abstraction over the
physical system. Many cloud applications are gaining popularity day by day for their
availability, reliability, scalability, and utility model. These applications have made
distributed computing easy, as the critical aspects are handled by the cloud provider itself.

Cloud computing is nowadays growing in the interest of technical and business
organizations, but it can also be beneficial for solving social issues. In recent times,
e-governance has been implemented in developing countries to improve the efficiency and
effectiveness of governance. This approach can be much improved by using cloud computing
instead of traditional ICT. In India, the economy is agriculture-based and most citizens live
in rural areas. The standard of living, agricultural productivity, and related factors can be
enhanced by utilizing cloud computing in a proper way. Both of these applications of cloud
computing have technological as well as social challenges to overcome.

Cloud computing provides scalable, low-cost, and location-independent online
services ranging from simple backup services to cloud storage infrastructures. The fast
growth of data volumes stored in cloud storage has led to an increased demand for
techniques for saving disk space and network bandwidth. To reduce resource consumption,
many cloud storage services, such as Dropbox [1], Wuala [2], Mozy [3], and Google Drive
[4], employ a deduplication technique, where the cloud server stores only a single copy of
redundant data and provides links to that copy instead of storing additional actual copies,
regardless of how many clients ask to store the data. The savings are significant [5], and
reportedly, business applications can achieve disk and bandwidth savings of more than 90%
[6]. However, from a security perspective, the shared usage of users' data raises a new
challenge. As customers are concerned about their private data, they may encrypt their data
before outsourcing in order to protect data privacy from unauthorized outside adversaries, as
well as from the cloud service provider [7],[8],[9]. This is justified by current security trends
and numerous industry regulations such as PCI DSS [10]. However, conventional encryption
makes deduplication impossible for the following reason. Deduplication techniques take
advantage of data similarity to identify identical data and reduce the storage space. In
contrast, encryption algorithms randomize the encrypted files in order to make ciphertext
indistinguishable from theoretically random data. Encrypting the same data by different
users with different encryption keys results in different ciphertexts, which makes it difficult
for the cloud server to determine whether the underlying plain data are the same and
deduplicate them. Say user 1 encrypts a file M under secret key sk_A and stores the
corresponding ciphertext C_A. Another user 2 would store C_B, which is the encryption of M
under his secret key sk_B. Then, two issues arise:

How can the cloud server detect that the underlying file M is the same?
Even if it can detect this, how can it allow both parties to recover the stored
data based on their separate secret keys?

Straightforward client-side encryption that is secure against a chosen-plaintext attack
with randomly chosen encryption keys prevents deduplication [11],[12]. One naive solution is
to allow each client to encrypt the data with the public key of the cloud storage server. Then,
the server is able to deduplicate the identified data by decrypting it with its private key.
However, this solution allows the cloud storage server to obtain the outsourced plain data,
which may violate the privacy of the data if the cloud server cannot be fully trusted [13],[14].
Convergent encryption [15] resolves this problem effectively.

A convergent encryption algorithm encrypts an input file with the hash value of the
input file as the encryption key. The ciphertext is given to the server and the user retains the
encryption key. Since convergent encryption is deterministic, identical files are always
encrypted into identical ciphertexts, regardless of who encrypts them. Thus, the cloud storage
server can perform deduplication over the ciphertext, and all owners of the file can later
download the ciphertext (optionally after a proof-of-ownership (PoW) process) and decrypt
it, since they have the same encryption key for the file.
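
To make the mechanism concrete, the following is a minimal Python sketch of convergent
encryption combined with server-side deduplication. It is an illustration only, not the scheme
proposed in this paper: the SHA-256 counter-mode keystream stands in for a real block cipher
such as AES, the plaintext-derived tag T = H(M) mirrors the data index used in the schemes
discussed here, and the DedupServer class and its method names are hypothetical.

import hashlib

def convergent_key(data: bytes) -> bytes:
    # The encryption key is derived deterministically from the file itself.
    return hashlib.sha256(data).digest()

def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode (illustrative, not production crypto).
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, data: bytes) -> bytes:
    # Deterministic encryption: identical files always yield identical ciphertexts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

decrypt = encrypt  # an XOR stream cipher is its own inverse

class DedupServer:
    # The server keeps one copy per tag and tracks the owners of each copy.
    def __init__(self):
        self.store = {}   # tag -> ciphertext
        self.owners = {}  # tag -> set of user ids

    def upload(self, user: str, tag: bytes, ciphertext: bytes) -> str:
        if tag in self.store:
            # Duplicate detected: keep the existing copy, just record a new
            # owner (a PoW check would run here before granting ownership).
            self.owners[tag].add(user)
            return "deduplicated"
        self.store[tag] = ciphertext
        self.owners[tag] = {user}
        return "stored"

# Two users who outsource the same file M end up sharing one ciphertext.
M = b"quarterly report"
k = convergent_key(M)             # both users derive the same key H(M)
C = encrypt(k, M)
tag = hashlib.sha256(M).digest()  # tag T = H(M) plays the role of the data index

server = DedupServer()
print(server.upload("user1", tag, C))      # stored
print(server.upload("user2", tag, C))      # deduplicated
assert decrypt(k, server.store[tag]) == M  # every owner can still recover M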

Convergent encryption has long been studied in commercial systems and has different
encryption variants for secure deduplication [8],[16],[17],[18]; it was later formalized as
message-locked encryption in [20]. However, convergent encryption suffers from security
flaws with regard to tag consistency and ownership revocation. As an example of the tag
consistency attack, suppose User1 and User2 have the same data M, and User1 generates
ciphertext C_A from M, and then maliciously generates another ciphertext C_A' from
M'(≠M). Next, she uploads C_A' with an honestly generated tag T = H(M) for a
cryptographic hash function H, which plays the role of the data index. When User2 generates
ciphertext C_B from M and tries to upload C_B with the same tag, the cloud server finds that
the tags match. Then, it deletes C_B and keeps only C_A'. Afterwards, when User2
downloads and decrypts the stored ciphertext, the data would be M', not M, which means the
integrity of his data has been compromised. Recently, message-locked encryption (MLE)
[20] and leakage-resilient deduplication [19] schemes have been proposed to solve this
problem by introducing an additional integrity check phase for decrypted data.
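
Under the plaintext-derived tag T = H(M), this attack can be replayed against the toy server
from the previous sketch. The snippet below reuses the names defined there and is a hedged
illustration of the attack's effect, not the notation of the cited schemes.

# Tag inconsistency attack: the tag is honest, but the ciphertext it
# indexes is poisoned. Reuses convergent_key/encrypt/decrypt/DedupServer
# from the sketch above; M and M' are chosen to have equal length.
M  = b"quarterly report"
M2 = b"forged contents!"           # M' != M, chosen by the attacker

k   = convergent_key(M)
tag = hashlib.sha256(M).digest()   # honestly generated tag T = H(M)

server = DedupServer()
# User1 uploads a ciphertext of M' under the honest tag of M.
server.upload("user1", tag, encrypt(convergent_key(M2), M2))
# User2's honest upload of M is silently discarded as a "duplicate".
print(server.upload("user2", tag, encrypt(k, M)))   # deduplicated
# When User2 later downloads and decrypts, the result is not M.
recovered = decrypt(k, server.store[tag])
print(recovered == M)   # False: User2's data integrity is compromised

As stated above, the remedy adopted by [19],[20] is an additional integrity check on the
decrypted data.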

In the case of ownership revocation, suppose multiple users have ownership of a
ciphertext outsourced in the cloud storage. As time elapses, some of these users may request
the cloud server to delete or modify their data, and the server then deletes the ownership
information of those users from the ownership list of the corresponding data. The revoked
users should then be prevented from accessing the data stored in the cloud storage after the
deletion or modification request (forward secrecy).

On the other hand, when a user uploads data that already exist in the cloud storage,
the user should be deterred from accessing any version of the data that was stored before he
obtained ownership by uploading it (backward secrecy). These dynamic ownership changes
may occur very frequently in a practical cloud system, and thus they should be properly
managed in order to avoid security degradation of the cloud service. However, previous
deduplication schemes could not achieve secure access control under a dynamically changing
ownership environment, in spite of its importance to secure deduplication, because the
encryption key is derived deterministically and is rarely updated after the initial key
derivation. Therefore, for as long as revoked users keep the encryption key, they can access
the corresponding data in the cloud storage at any time, regardless of the validity of their
ownership.
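
A simplified sketch of the underlying remedy, re-encrypting the stored copy under a fresh
ownership group key whenever the ownership set changes, is given below using the toy
primitives from the earlier sketch. It illustrates only the general idea of ownership-group
rekeying; it is not the randomized convergent encryption and group key distribution protocol
proposed in this paper, and the delivery of new group keys to the remaining valid owners is
abstracted away.

import os

class OwnershipManagedServer:
    # Toy server that wraps the stored (convergent) ciphertext in an outer
    # layer under a random group key and rekeys on every ownership change.
    def __init__(self):
        self.store = {}       # tag -> ciphertext under the current group key
        self.group_keys = {}  # tag -> current group key (known to owners only)
        self.owners = {}      # tag -> set of user ids

    def _rekey(self, tag: bytes):
        # Peel off the old outer layer and re-encrypt under a fresh key,
        # so every previously issued group key becomes useless.
        inner = decrypt(self.group_keys[tag], self.store[tag])
        self.group_keys[tag] = os.urandom(32)
        self.store[tag] = encrypt(self.group_keys[tag], inner)

    def upload(self, user: str, tag: bytes, ciphertext: bytes):
        if tag not in self.store:
            self.group_keys[tag] = os.urandom(32)
            self.store[tag] = encrypt(self.group_keys[tag], ciphertext)
            self.owners[tag] = {user}
        else:
            # Deduplicate, then rekey: the newcomer cannot decrypt any
            # ciphertext snapshot captured before this upload (backward secrecy).
            self.owners[tag].add(user)
            self._rekey(tag)

    def revoke(self, user: str, tag: bytes):
        # Rekey on revocation: a revoked user who cached the old group key
        # can no longer decrypt the currently stored data (forward secrecy).
        self.owners[tag].discard(user)
        self._rekey(tag)

Because the outer key is random rather than derived from the data, it can be refreshed on
every ownership change, which is precisely what a purely deterministic convergent key
cannot offer.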
