
International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278-0882
Volume 4, Issue 5, May 2015

Hyper Elliptic Curve Encryption and Cost Minimization Approach in Moving Big Data to Cloud

Jerin Jose¹, Dr. Shine N Das²
¹(Department of CSE, College of Engineering, Munnar)
²(Department of CSE, College of Engineering, Munnar)

ABSTRACT
Cloud computing is a modern computing paradigm that can be used for big data processing. Big data refers to huge volumes of structured, semi-structured and unstructured data. MapReduce and Hadoop provide an affordable mechanism to handle and process data from multiple sources and to store big data in a distributed cloud. This paper describes a secure and cost-minimizing approach to moving and storing very large amounts of data in the cloud. Hyperelliptic curve cryptography is introduced to encrypt the huge volume of data arriving at the cloud. In addition to cryptography, a data download module is included. The paper therefore covers both cost minimization in moving big data and the security of the big data.
Keywords- Big Data, Cloud Computing, Hyper Elliptic
Curve Cryptography, Online Algorithm

1. INTRODUCTION
Cloud computing is, simply put, a service over the Internet that stores the gigantic amounts of data that our computers or a single server cannot hold, and that delivers computing services over the Internet. That is, it provides server resources such as storage, bandwidth and CPU to users. Its desirable features are the on-demand supply of server resources and minimized management effort. A cloud platform is a collection of interconnected software, Internet infrastructure and hardware. The software and hardware services of cloud computing are available to enterprises, corporations, businesses, markets and the public.
The essential characteristics of cloud computing are on-demand self-service, rapid elasticity, broad network access, resource pooling and measured service. Massive scale, geographic distribution, homogeneity, virtualization, low-cost software and resilient computing are some of the common features of cloud computing.
Big data analysts have concentrated their work mostly on the analysis and processing of big data. Before analysis, it is necessary to store the data somewhere. Since big data is extremely large in volume, the best option is to store it in the cloud, so the massive amount of data has to be moved from its sources to the cloud. The big data should be moved to the cloud in a cost-optimal manner, and it should also be kept secure. Some work has been done on moving big data to the cloud while minimizing cost, but the data must also be protected, so a security system is mandatory. We therefore implement hyperelliptic curve cryptography, which encrypts the data arriving at the cloud.

2. RELATED WORKS
A series of recent works has studied application migration to the cloud. The following are some of the related works on cloud computing and big data.
Big Data is not just Hadoop [1]. This paper summarizes Hadoop as a cost-efficient platform that has the ability to significantly lower the cost of certain workloads. Organizations may have particular pain
around reducing the overall cost of their data warehouse.
Certain groups of data may be seldom used and possible
candidates to offload to a lower-cost platform. Certain
operations such as transformations may be offloaded to a
more cost efficient platform. The primary area of value
creation is cost savings. By pushing workloads and data
sets onto a Hadoop platform, organizations are able to
preserve their queries and take advantage of Hadoop's cost-effective processing capabilities. In one customer example, a financial services firm moved the processing of applications and reports from an operational data warehouse to Hadoop HBase; they were able to preserve their existing queries and reduce the operating cost of their data management platform.
A tunable workflow scheduling algorithm based on
particle swarm optimization for cloud computing [2]
explains that cloud computing provides a pool of virtualized computing resources and adopts a pay-per-use model. Schedulers for cloud computing decide how to allocate the tasks of a workflow to those virtualized
computing resources. In this paper, a flexible particle
swarm optimization (PSO) based scheduling algorithm
to minimize both total cost and makespan is presented. Experiments are conducted by varying the computation of tasks, the number of particles, and the weight values of cost and makespan in the fitness function. The results show that the proposed algorithm achieves both low cost and low makespan. In addition, it is adjustable to different QoS constraints.
Privacy-Aware Cloud Deployment Scenario Selection [6] presented a privacy-aware decision method for cloud deployment scenarios. This method is built upon the ProPAn and PACTS methods. The first step of the presented method is the definition of the clouds used in concrete deployment scenarios and their cloud stakeholders. Then it has to be decided which domains shall be put into which of the defined clouds. Next, the defined clouds, the cloud stakeholders, and the relations between existing domains and the defined clouds are captured in domain knowledge diagrams. ProPAn's graph generation algorithms can then be applied to these domain knowledge diagrams together with a given model of the functional requirements in problem frames notation. The resulting privacy threat graphs are analyzed in the last step of the method to decide which deployment scenario best fits the privacy needs. To support the method, the authors extended the ProPAn tool with wizards that guide the user through the definition of the deployment scenarios and that automatically generate the corresponding domain knowledge diagrams. The proposed method scales well due to the modular way in which the relevant knowledge for the cloud deployment scenarios is integrated into the requirements model, and due to the provided tool support.
New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks [9] is the first to explore the problem of planning a group-based, deadline-oriented data transfer in a scenario where data can be sent both (1) over the Internet and (2) by shipping storage devices (e.g., external or hot-plug drives, or SSDs) via companies such as FedEx, UPS or USPS. The authors first formalize the problem and prove its NP-hardness. Then they propose novel algorithms and use them to build a planning system called Pandora (People and Networks Moving Data Around). Pandora uses the new concepts of time-expanded networks and delta-time-expanded networks, combining them with integer programming techniques and optimizations for both shipping and Internet edges. An experimental evaluation using real data from FedEx and from PlanetLab indicates that the Pandora planner manages to satisfy deadlines and reduce costs significantly.
Budget-constrained bulk data transfer via internet and shipping networks [10] formulated and solved the problem of finding the fastest bulk data transfer plan under a strict budget constraint. The authors first characterized the solution space and observed that the optimal solution can be found by searching through solutions to the deadline-constrained minimum-cost problem. Based on these observations, they devised a two-step binary search method that finds an optimal solution. They then developed a bounded binary search method that makes use of bounding functions providing upper and lower bounds. The authors also presented two instances of bounding functions, based on variants of their data transfer networks, and proved that they do indeed provide bounds. Finally, they evaluated the proposed algorithms by running them on realistic networks and found that the proposed techniques significantly reduce the time needed to compute solutions.
Scaling social media applications into geo-distributed clouds [8] exploits the social influences among users and proposes efficient proactive algorithms for dynamic, optimal scaling of a social media application in a geo-distributed cloud. The key contribution of the paper is an online content migration and request distribution algorithm with the following features: (1) future demand prediction that characterizes social influences among the users in a simple but effective epidemic model; (2) one-shot optimal content migration and request distribution based on efficient optimization algorithms to address the predicted demand; and (3) a Δ(t)-step look-ahead mechanism to adjust the one-shot optimization results towards the offline optimum. The paper also verifies the effectiveness of the algorithm using solid theoretical analysis, as well as large-scale experiments under dynamic, realistic settings on a home-built cloud platform.

3. METHODOLOGY
3.1 PROBLEM DEFINITION
This work focuses on providing security for big data in the cloud, arriving from data centers. Current approaches concentrate on big data analysis and on the constraints of moving big data to a cloud system. The proposed method focuses on the encryption of data in the cloud and on the downloading of data from the cloud. The encryption method proposed here is Hyper Elliptic Curve Cryptography. The downloading module includes a clustering system to mitigate bottlenecks in downloading.
3.2 SYSTEM DESIGN
We consider a cloud consisting of K geo-distributed data centers located in a set of regions 𝒦, where K = |𝒦|. A cloud user (e.g., a global astronomical telescope application) continuously produces large volumes of data at a set D
of multiple geographic locations. The user connects to the data centers from the different data generation locations via virtual private networks (VPNs), with G VPN gateways at the user side and K VPN gateways each collocated with a data center. Let the set of VPN gateways at the user side be denoted by 𝒢, with G = |𝒢|. An illustration of the system is in Fig. 1. A private (the user's) network interconnects the data generation locations and the VPN gateways at the user side. Such a model reflects typical connection approaches between users and public clouds, where dedicated, private network connections are established between a user's premises and the cloud for enhanced reliability and security and for guaranteed inter-connection bandwidth. Inter-data-center connections within a cloud are usually dedicated high-bandwidth lines. Within the user's private network, the data transmission bandwidth between a data generation location d ∈ D and a VPN gateway g ∈ 𝒢 is large as well. The bandwidth U_gi on a VPN link (g, i) from user-side gateway g to data center i is limited and constitutes the bottleneck of the system.

Fig. 1 Illustration of the system model: data generation locations, user-side VPN gateways, and geo-distributed data centers
3.2.1 PROBLEM FORMULATION
Assume the system executes in a time-slotted fashion with a fixed slot length. F_d(t) bytes of data are produced at location d in slot t, for uploading to the cloud. l_dg is the latency between data location d ∈ D and user-side gateway g ∈ 𝒢, p_gi is the delay along VPN link (g, i), and the latency between data centers i and k is defined analogously. These delays, which can be obtained by a simple command such as ping, are dictated by the respective geographic
distances. A cloud user needs to decide (i) via which
VPN connections to upload its data to the cloud, and (ii)
to which data center to aggregate data, for processing by
a MapReduce-like framework, such that the monetary
charges induced, as well as the latency for the data to

reach the aggregation point, are jointly minimized. The


total cost C to be minimized has four components:
routing cost, migration cost, bandwidth cost and
aggregate storage and computing cost.
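As a rough illustration of how these four components combine in a single time slot, the sketch below sums them using hypothetical linear per-unit prices; the prices and the linear forms are placeholders for illustration only, not the paper's actual cost model.

# Illustrative only: the four cost components named above, combined with
# hypothetical linear prices (placeholders, not the paper's pricing model).

def total_cost(gb_uploaded, gb_migrated, gb_stored,
               unit_route=0.01, unit_bw=0.02, unit_migrate=0.05, unit_storage=0.03):
    routing   = unit_route   * gb_uploaded   # routing cost over VPN links
    bandwidth = unit_bw      * gb_uploaded   # bandwidth cost on those links
    migration = unit_migrate * gb_migrated   # cost of moving already-aggregated data
    storage   = unit_storage * gb_stored     # aggregate storage and computing cost
    return routing + bandwidth + migration + storage

# Example: 500 GB of new data uploaded, 100 GB migrated, 2000 GB stored.
print(total_cost(500, 100, 2000))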
3.2.2 OFFLINE ALGORITHM
We propose a polynomial-time dynamic programming based algorithm that solves the offline optimal data migration problem, given complete knowledge of data generation in both the temporal and spatial domains. The derived offline optimal strategy yields the theoretical minimum cost and serves as a benchmark for our online algorithm.
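A minimal sketch of a dynamic program of this flavor is shown below. It assumes, hypothetically, that the per-slot non-migration cost of aggregating at each data center and the pairwise migration costs are already tabulated; the paper's cost model is richer, so this only illustrates the polynomial-time recursion over time slots and candidate aggregation data centers.

# Hypothetical DP sketch: choose an aggregation data center per time slot so
# that the sum of non-migration and migration costs is minimized, given full
# (offline) knowledge of the per-slot costs.

def offline_optimal(nonmig, migrate):
    """nonmig[t][k]: non-migration cost of aggregating at data center k in slot t.
    migrate[j][k]: cost of moving the aggregated data from j to k.
    Returns (minimum total cost, chosen data center per slot)."""
    T, K = len(nonmig), len(nonmig[0])
    best = [nonmig[0][k] for k in range(K)]      # best[k]: min cost of ending slot 0 at k
    choice = [[k for k in range(K)]]             # back-pointers per slot
    for t in range(1, T):
        new_best, back = [], []
        for k in range(K):
            j = min(range(K), key=lambda j: best[j] + migrate[j][k])
            new_best.append(best[j] + migrate[j][k] + nonmig[t][k])
            back.append(j)
        best, choice = new_best, choice + [back]
    # Recover the optimal sequence of aggregation data centers.
    k = min(range(K), key=lambda k: best[k])
    plan = [k]
    for t in range(T - 1, 0, -1):
        k = choice[t][k]
        plan.append(k)
    return min(best), plan[::-1]

# Toy example with 3 slots and 2 data centers.
nonmig = [[4, 6], [5, 1], [5, 1]]
migrate = [[0, 3], [3, 0]]
print(offline_optimal(nonmig, migrate))          # (8, [1, 1, 1]) for this toy input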
3.2.3 ONLINE ALGORITHM
A straightforward algorithm solves the above optimization in each time slot, based on y(t − 1) from the previous time slot. This can be far from optimal due to premature data migration. For example, assume data center k was selected at t − 1, and migrating data from k to j is cost-optimal at t according to the one-shot optimization (e.g., because more data are generated in region j in t); the offline optimum may instead indicate keeping all data in k at t, if the volume of data originating in k surges in t + 1. We next explore the dependencies among the selections of the aggregation data center across consecutive time slots, and design a more judicious online algorithm accordingly.
We divide the overall cost C(x(t), y(t)) incurred in t into two parts: (i) the migration cost C_t^MG(y(t), y(t−1)), related to decisions in t − 1; and (ii) the non-migration cost, which relies only on current information at t:

C_t^−MG(x(t), y(t)) = C^BW(x(t)) + C^DC(y(t)) + C^RT(x(t)). (1)
We design an online algorithm whose basic idea is to postpone data center switching, even if the one-shot optimum indicates a switch, until the cumulative non-migration cost (in C_t^−MG(x(t), y(t))) has significantly exceeded the potential data migration cost. At the beginning (t = 1), we solve the one-shot optimization and upload data via the derived optimal routes x(1) to the optimal aggregation data center indicated by y(1). Let t̂ be the time slot of the most recent data center switch. In each following time slot t, we compute the overall non-migration cost in [t̂, t − 1], i.e., Σ_{ν=t̂}^{t−1} C_ν^−MG(x(ν), y(ν)). The algorithm checks whether this cost is at least β₂ times the migration cost C_t^MG(y(t), y(t−1)). If so, it solves the one-shot optimization to derive x(t) and y(t) without considering the migration cost, i.e., by minimizing C_t^−MG(x(t), y(t)) under an additional constraint that the potential migration cost C_t^MG(y(t), y(t−1)) is no larger than β₁ times the non-migration cost C_t^−MG(x(t), y(t)) at time t (to make sure that the migration cost is not excessive). If a change of the aggregation data center is indicated (y(t) ≠ y(t−1)), the algorithm accepts the new aggregation decision and migrates data accordingly. In all other cases, the aggregation data center remains unchanged from t − 1, while optimal data routing paths are computed given this aggregation decision, for the upload of new data generated in t.
The Online Algorithm:
1: t = 1;
2: t̂ = 1; // time slot when the last change of aggregation data center happened
3: Compute the data routing decision x(1) and the aggregation decision y(1) by minimizing C(x(1), y(1));
4: Compute C_1^MG(y(1), y(0)) and C_1^−MG(x(1), y(1));
5: while t ≤ T do
6:   if C_t^MG(y(t), y(t−1)) ≤ (1/β₂) Σ_{ν=t̂}^{t−1} C_ν^−MG(x(ν), y(ν)) then
7:     Derive x(t) and y(t) by minimizing C_t^−MG(x(t), y(t)) subject to the constraint C_t^MG(y(t), y(t−1)) ≤ β₁ C_t^−MG(x(t), y(t));
8:     if y(t) ≠ y(t−1) then
9:       Use the new aggregation data center indicated by y(t);
10:      t̂ = t;
11:  if t̂ < t then // do not switch to a new aggregation data center
12:    y(t) = y(t−1); compute the data routing decision x(t) if not yet derived;
13:  t = t + 1;
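The Python sketch below mirrors the lazy-switching idea of this algorithm under simplifying assumptions: the one-shot optimization is abstracted into a callable one_shot, the costs are plain numbers, and beta1/beta2 play the roles of β₁ and β₂ above. The function and parameter names are hypothetical, and the routing handling is deliberately simplified.

# Hypothetical sketch of the lazy data-center switching rule: postpone a switch
# until the non-migration cost accumulated since the last switch clearly
# outweighs the cost of migrating.

def online_lazy_migration(T, one_shot, migration_cost, beta1=1.0, beta2=1.0):
    """one_shot(t, cap): returns (x_t, y_t, nonmig_cost) for slot t, optionally
    capping the implied migration cost at cap times the non-migration cost.
    migration_cost(y_new, y_old): cost of switching the aggregation data center."""
    x, y, nonmig = one_shot(1, cap=None)       # initial routing and aggregation decision
    accumulated = nonmig                        # non-migration cost since the last switch
    decisions = [(x, y)]
    for t in range(2, T + 1):
        x_new, y_new, nonmig_new = one_shot(t, cap=None)
        mig = migration_cost(y_new, y)
        if mig <= accumulated / beta2:
            # Re-solve with the constraint that migration cost stays below
            # beta1 times the non-migration cost, then accept the switch.
            x_new, y_new, nonmig_new = one_shot(t, cap=beta1)
            if y_new != y:
                y = y_new
                accumulated = 0.0               # reset the accumulator after a switch
        else:
            # Keep the previous aggregation data center; routing is reused here
            # for simplicity (a full version would re-solve routing with y fixed).
            pass
        x = x_new
        accumulated += nonmig_new
        decisions.append((x, y))
    return decisions

# Toy usage: per-slot non-migration costs for two candidate data centers.
costs = {1: (4, 6), 2: (5, 1), 3: (5, 1), 4: (5, 1), 5: (5, 1)}
def toy_one_shot(t, cap=None):                  # 'cap' is ignored in this toy
    a, b = costs[t]
    return ("routes-%d" % t, 0 if a <= b else 1, min(a, b))
print(online_lazy_migration(5, toy_one_shot,
                            lambda y_new, y_old: 3.0 if y_new != y_old else 0.0))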
3.2.4 HYPER ELLIPTIC CURVES
A hyperelliptic curve C of genus g defined over a field Fq of characteristic p is given by an equation of the form
y² + h(x)y = f(x)
where h(x) and f(x) are polynomials with coefficients in Fq, with deg h(x) ≤ g and deg f(x) = 2g + 1. An additional requirement is that C is not a singular curve. If h(x) = 0 and p > 2, this amounts to the requirement that f(x) is a square-free polynomial. In general, the condition is that there are no x and y in the algebraic closure of Fq that satisfy the curve equation and the two partial derivatives 2y + h(x) = 0 and h′(x)y − f′(x) = 0.
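For the special case h(x) = 0 and p > 2 mentioned above, non-singularity reduces to f(x) being square-free, which can be checked by verifying that gcd(f, f′) is a nonzero constant. The self-contained sketch below represents polynomials over F_p as coefficient lists (lowest degree first) and tests a toy genus-2 curve; the particular f(x) is an arbitrary example, not a recommended parameter.

# Sketch: check that f(x) is square-free over F_p (gcd(f, f') is a nonzero
# constant), so that y^2 = f(x) defines a non-singular hyperelliptic curve
# in the case h = 0, p > 2. Coefficient lists are lowest-degree first.

def poly_mod(a, b, p):                      # remainder of a divided by b over F_p
    a = a[:]
    while len(a) >= len(b) and any(a):
        while a and a[-1] % p == 0:         # strip leading (high-degree) zeros
            a.pop()
        if len(a) < len(b):
            break
        factor = a[-1] * pow(b[-1], -1, p) % p
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[i + shift] = (a[i + shift] - factor * c) % p
    return a or [0]

def poly_gcd(a, b, p):                      # Euclidean algorithm over F_p[x]
    while any(c % p for c in b):
        a, b = b, poly_mod(a, b, p)
    while len(a) > 1 and a[-1] % p == 0:
        a.pop()
    return a

def derivative(f, p):
    return [(i * c) % p for i, c in enumerate(f)][1:] or [0]

p = 7
f = [1, 3, 0, 0, 0, 1]                      # f(x) = x^5 + 3x + 1, deg f = 2g + 1 with g = 2
g = poly_gcd(f, derivative(f, p), p)
print("square-free over F_%d:" % p, len(g) == 1 and g[0] % p != 0)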
3.2.5 SCHEMES
Signature schemes, encryption schemes and key agreement schemes can all be based on elliptic and hyperelliptic curves.
Diffie-Hellman Key Agreement Scheme: Two
parties Sender and Receiver wish to agree on a common
secret by communicating over a public channel. An
eavesdropper Interrupter, who can listen to all

communication between Sender and Receiver, should


not be able to obtain this common secret. First, we
assume that there are the following publicly known
system parameters: the group G, and an element R ∈ G of large prime order r.
The steps that Sender performs are the following:
1. Choose a random integer a ∈ [1, r − 1].
2. Compute P = aR in the group G, and send it to Receiver.
3. Receive the element Q ∈ G from Receiver.
4. Compute S = aQ as the common secret.
The steps that Receiver performs are:
1. Choose a random integer b ∈ [1, r − 1].
2. Compute Q = bR in the group G, and send it to Sender.
3. Receive the element P ∈ G from Sender.
4. Compute S = bP as the common secret.
Note that both Sender and Receiver have computed the same value S, since S = a(bR) = (ab)R = b(aR).
It is not known how Interrupter, knowing only P, Q and R, can compute S within reasonable time. If she could solve the discrete logarithm problem in G, then she could calculate a from P and R, and then calculate S = aQ. The problem of computing S from P, Q and R is known as the Diffie-Hellman problem. The pair (a, P) is called Sender's key pair, consisting of her private key a and public key P. Likewise, Receiver's key pair is (b, Q), with private key b and public key Q. It is important to realize that the scheme described here should be used with additional authentication of the public keys. Otherwise, an attacker (Interrupter) who is able to intercept and change the information sent can agree on keys separately with Sender and Receiver. This is known as a man-in-the-middle attack.
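The sketch below illustrates the key agreement in a stand-in group: instead of the Jacobian of a hyperelliptic curve it uses a prime-order subgroup of the integers modulo a prime, so the scalar multiple aR of the text becomes modular exponentiation. The toy parameters are assumptions for illustration and offer no real security.

# Diffie-Hellman sketch in a toy stand-in group: the subgroup of order r in
# (Z/pZ)*, written multiplicatively (aR in the text becomes R**a mod p).
import secrets

p = 607                      # toy prime with p - 1 = 2 * 3 * 101
r = 101                      # prime order of the subgroup (toy-sized here)
R = pow(3, (p - 1) // r, p)  # element of order r
assert R != 1 and pow(R, r, p) == 1

# Sender
a = secrets.randbelow(r - 1) + 1    # random integer in [1, r - 1]
P = pow(R, a, p)                    # send P to Receiver

# Receiver
b = secrets.randbelow(r - 1) + 1
Q = pow(R, b, p)                    # send Q to Sender

# Both sides compute the same shared secret S = (ab)R.
S_sender = pow(Q, a, p)
S_receiver = pow(P, b, p)
assert S_sender == S_receiver
print("shared secret:", S_sender)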
The (Hyper-)Elliptic Curve Integrated Encryption Scheme: This encryption scheme uses the Diffie-Hellman scheme to derive a secret key, and combines it with tools from symmetric-key cryptography to provide better provable security. It can be proved to be secure against adaptive chosen-ciphertext attacks. We again formulate the scheme for any group G and R ∈ G with large prime order r. The symmetric tools that are used in the scheme are:
A key derivation function. This is a function KD(P)
that takes as input a key P, in our case this is an
element of G, and outputs keying data of any
required length.
A symmetric encryption scheme consisting of a
function Enck that encrypts the message M to a
ciphertext C = Enck(M) using a key k, and a function
Deck that decrypts C to the message M = Deck(C).


A Message Authentication Code MACk. One can


think of this as a keyed hash function. It is a function
that takes as input a ciphertext C and a key k. It
computes a string MACk(C) that satisfies the
following property: Given a number of pairs (Ci,
MACk(Ci)), it is computationally infeasible to
determine a pair (C, MACk (C)), with C different
from the Ci if one does not know k.
To encrypt a message M, Sender performs the following steps:
1. Obtain Receiver's public key Q.
2. Choose a secret number a ∈ [1, r − 1].
3. Compute C1 = aR.
4. Compute C2 = aQ.
5. Compute two keys k1 and k2 from KD(C2), i.e., (k1||k2) = KD(C2).
6. Encrypt the message, C = Enck1(M).
7. Compute mac = MACk2(C).
8. Send (C1, C, mac).
To decrypt, Receiver does the following:
1. Obtain the encrypted message (C1, C, mac) from
Sender.
2. Compute C2 = bC1.
3. Compute the keys k1 and k2 from KD(C2).
4. Check whether mac equals MACk2 (C). If not, reject
the message and stop.
5. Decrypt the message M = Deck1 (C).
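A compact sketch of this integrated scheme in the same toy stand-in group used above, with SHA-256 assumed for the key derivation function KD, a simple XOR keystream standing in for Enc/Dec, and HMAC-SHA-256 as the MAC. These concrete primitive choices are assumptions for illustration only; a real deployment would use the hyperelliptic-curve group and standardized symmetric primitives.

# Hypothetical ECIES-style sketch: a DH-derived key is split into an encryption
# key k1 and a MAC key k2; toy group parameters and primitives as noted above.
import hashlib, hmac, secrets

p, r = 607, 101
R = pow(3, (p - 1) // r, p)

def kd(point):                               # key derivation: (k1 || k2) = KD(C2)
    keying = hashlib.sha256(str(point).encode()).digest()
    return keying[:16], keying[16:]          # k1 for encryption, k2 for the MAC

def enc(k, message):                         # toy XOR "stream cipher" (illustration only)
    stream = hashlib.sha256(k + b"stream").digest()
    while len(stream) < len(message):
        stream += hashlib.sha256(stream).digest()
    return bytes(m ^ s for m, s in zip(message, stream))

dec = enc                                    # XOR is its own inverse

# Receiver's key pair: private b, public Q = bR.
b = secrets.randbelow(r - 1) + 1
Q = pow(R, b, p)

# Sender encrypts message M for Receiver.
M = b"big data chunk"
a = secrets.randbelow(r - 1) + 1
C1, C2 = pow(R, a, p), pow(Q, a, p)
k1, k2 = kd(C2)
C = enc(k1, M)
mac = hmac.new(k2, C, hashlib.sha256).digest()

# Receiver decrypts (C1, C, mac).
k1r, k2r = kd(pow(C1, b, p))
assert hmac.compare_digest(mac, hmac.new(k2r, C, hashlib.sha256).digest())
print(dec(k1r, C))                           # b'big data chunk'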
The Digital Signature Algorithm (DSA) is the basis of the NIST Digital Signature Standard. This algorithm can be adapted for elliptic and hyperelliptic curves. More generally, one can use it for any group G in which the DLP is difficult, provided that one has a computable map σ: G → Z with a large enough image and few inverses for each element in the image. The elliptic curve version, known as ECDSA, can be found in various standards. The hyperelliptic curve version does not seem to have appeared much in the existing literature. In the hyperelliptic curve case, one can take for σ the following map. Let D = [u(x), v(x)] be a divisor in Mumford representation, and let u(x) = Σ_{i=0}^{deg(u(x))} u_i x^i with u_i ∈ Fq. Define σ(D) to be the integer whose binary expansion is the concatenation of the bit strings representing the u_i, i ∈ [0, deg(u(x)) − 1], as explained above.
Assume that the following system parameters are publicly known:
A group G and a map σ: G → Z as above,
An element R ∈ G with large prime order r,
A hash function H that maps messages m to 160-bit integers.
To create a key pair, Alice chooses a secret integer a ∈ Z and computes P = aR. The number a is Alice's secret key, and P is her public key. If Alice wants to sign a message m, she has to do the following:
1. Choose a random integer k ∈ [1, r − 1], and compute Q = kR.
2. Compute s ≡ k⁻¹(H(m) + a·σ(Q)) mod r.
The signature is (m, Q, s).
To verify this signature, a verifier Bob has to do the following:
1. Compute v1 ≡ s⁻¹H(m) mod r and v2 ≡ s⁻¹σ(Q) mod r.
2. Compute V = v1R + v2P.
3. Accept the signature if V = Q. Otherwise, reject it.
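The sketch below instantiates this signature scheme in the same toy multiplicative stand-in group, so v1R + v2P of the additive notation becomes R^v1 · P^v2 mod p; SHA-1 is assumed for the 160-bit hash H, and the reduction n mod r plays the role of the map σ. All choices are illustrative, not a secure instantiation.

# Hypothetical DSA sketch in a toy group of prime order r inside (Z/pZ)*.
import hashlib, secrets

p, r = 607, 101
R = pow(3, (p - 1) // r, p)

def H(m):                        # 160-bit hash of the message
    return int.from_bytes(hashlib.sha1(m).digest(), "big")

def sigma(elem):                 # stand-in for the map sigma: G -> Z
    return elem % r

# Key pair: secret a, public P = aR (multiplicatively: P = R**a mod p).
a = secrets.randbelow(r - 1) + 1
P = pow(R, a, p)

def sign(m):
    while True:
        k = secrets.randbelow(r - 1) + 1
        Q = pow(R, k, p)
        s = (pow(k, -1, r) * (H(m) + a * sigma(Q))) % r
        if s != 0:               # retry in the rare case s is not invertible mod r
            return (m, Q, s)

def verify(m, Q, s):
    v1 = (pow(s, -1, r) * H(m)) % r
    v2 = (pow(s, -1, r) * sigma(Q)) % r
    V = (pow(R, v1, p) * pow(P, v2, p)) % p   # v1*R + v2*P in additive notation
    return V == Q

m, Q, s = sign(b"encrypted chunk 42")
print(verify(b"encrypted chunk 42", Q, s))    # True
print(verify(b"tampered chunk", Q, s))        # False unless the hashes collide mod r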
The hyperelliptic curve encryption explained above is applied at the cloud side before the data are stored in the cloud, providing additional protection for the stored data. The data arriving at the cloud are divided into chunks, and the chunks are passed through the hyperelliptic curve encryption system and converted into encrypted form. These encrypted files are then stored in the cloud.
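A minimal sketch of this chunk-and-encrypt step: the arriving data are split into fixed-size chunks and each chunk is passed through an encryption routine (a placeholder callable here, e.g. the integrated-scheme sketch above) before being stored. The chunk size and the function names are assumptions.

# Hypothetical sketch: split arriving data into chunks and encrypt each chunk
# before it is written to cloud storage.

CHUNK_SIZE = 64 * 1024 * 1024        # 64 MB chunks (an assumed, HDFS-like block size)

def store_encrypted(data: bytes, encrypt, put_chunk):
    """encrypt(chunk) -> ciphertext; put_chunk(index, ciphertext) stores one chunk."""
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        put_chunk(offset // CHUNK_SIZE, encrypt(chunk))

# Toy usage: "encrypt" by reversing bytes, "store" into an in-memory dict.
store = {}
store_encrypted(b"x" * 100, lambda c: c[::-1], lambda i, ct: store.__setitem__(i, ct))
print(len(store), "chunk(s) stored")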

4. RESULTS
We compare the performance of our scheme with the previous work. The previous work did not use any security measures for storing big data in the cloud; this paper employs encryption for the big data, which makes the system more secure. The computation and communication overheads when encryption is applied to the entire file (n) and to a randomly chosen subset of blocks (c) are shown in TABLE 1. The overhead is modest while the gain in protection is significant.
Table 1. Comparison of Overheads

                            n = 100,000    c = 460
Computation Overhead        13.15 sec      0.21 sec
Communication Overhead      2.11 MB        30.37 KB
The signature generation time and the extra storage space for signatures are also evaluated against previous works that use other encryption methods, and the results are shown in TABLE 2.
Table 2. Comparison of Signature Complexity

                                           [12]      [13]      This work
Signature Generation Time (ms)             149.08    142.72    20.28
Extra storage space on signatures (MB)     2         20        32.8

5. CONCLUSION
In this paper, we used an efficient security system for big data in the cloud, so the data in the cloud are kept safe. The encryption method used is the Hyper Elliptic Curve Cryptosystem, which uses the mathematical concepts of hyperelliptic curves to encrypt the data. This work was done with the help of CentOS and the Hortonworks Sandbox. The features of Java can be used to turn the theory into reality. This paper also considered the download of data from the cloud after clustering it.

REFERENCES
[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley View of Cloud Computing," EECS, University of California, Berkeley, Tech. Rep., 2009.
[2] S. Pandey, L. Wu, S. Guru, and R. Buyya, "A Particle Swarm Optimization (PSO)-based Heuristic for Scheduling Workflow Applications in Cloud Computing Environment," in Proc. of IEEE AINA, 2010.
[3] E. E. Schadt, M. D. Linderman, J. Sorenson, L. Lee, and G. P. Nolan, "Computational Solutions to Large-scale Data Management and Analysis," Nat Rev Genet, vol. 11, no. 9, pp. 647-657, 2010.
[4] R. J. Brunner, S. G. Djorgovski, T. A. Prince, and A. S. Szalay, "Massive Datasets in Astronomy," in Handbook of Massive Data Sets, J. Abello, P. M. Pardalos, and M. G. C. Resende, Eds. Norwell, MA, USA: Kluwer Academic Publishers, 2002, pp. 931-979.
[5] M. Cardosa, C. Wang, A. Nangia, A. Chandra, and J. Weissman, "Exploring MapReduce Efficiency with Highly-Distributed Data," in Proc. of ACM MapReduce, 2011.
[6] M. Hajjat, X. Sun, Y. E. Sung, D. Maltz, and S. Rao, "Cloudward Bound: Planning for Beneficial Migration of Enterprise Applications to the Cloud," in Proc. of ACM SIGCOMM, August 2010.
[7] X. Cheng and J. Liu, "Load-Balanced Migration of Social Media to Content Clouds," in Proc. of ACM NOSSDAV, June 2011.
[8] Y. Wu, C. Wu, B. Li, L. Zhang, Z. Li, and F. Lau, "Scaling Social Media Applications into Geo-Distributed Clouds," in Proc. of IEEE INFOCOM, Mar. 2012.
[9] B. Cho and I. Gupta, "New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks," in Proc. of IEEE ICDCS, 2010.
[10] B. Cho and I. Gupta, "Budget-Constrained Bulk Data Transfer via Internet and Shipping Networks," in Proc. of ACM ICAC, 2011.
[11] J. Scholten and F. Vercauteren, "An Introduction to Elliptic and Hyperelliptic Curve Cryptography and the NTRU Cryptosystem."
[12] B. Wang, B. Li, and H. Li, "Oruta: Privacy-Preserving Public Auditing for Shared Data in the Cloud," in Proc. of IEEE Cloud, June 2012, pp. 295-302.
[13] B. Wang, B. Li, and H. Li, "Knox: Privacy-Preserving Auditing for Shared Data with Large Groups in the Cloud," in Proc. of ACNS, 2012, pp. 507-525.