of an improved and
parallelized RSA algorithm
on multicore CPUs and
GPUs
Kennedy B.
Outline
Introduction
Objective
Problem statement
Methodology
Results and discussions
Conclusions and Recommendations
References
Introduction
Graphics Processing Units (GPUs) and their
development tools have advanced and are increasingly in demand in
industry.
Among several development frameworks for CPU and
GPU(s), OpenCL provides a programming environment
to write portable code that can run in parallel across
heterogeneous platforms consisting of many different
types of devices, for example CPUs, GPUs, DSPs, FPGAs
and other types of processors.
Sensitive messages are exchanged over the internet.
Messages must be kept secure over an insecure channel.
Security is achieved through encryption.
Cryptography: the art of securing data over the internet.
Symmetric-key cryptography is based on sharing secrecy;
asymmetric-key cryptography is based on personal secrecy.
Contd
The most common public-key algorithm is the RSA
cryptosystem, named for its inventors (Rivest, Shamir, and
Adleman).
Objective
Main objective:
The main objective of this project is to
speed up the RSA encryption and
decryption algorithm.
Specific objective:
The specific objective of this project is to
parallelize the RSA algorithm and
implement it on multicore CPUs and a
GPU.
Problem statement
Public-key algorithms (e.g., the RSA algorithm) rely on computationally
expensive operations:
modular multiplication and modular exponentiation of very
large integers, ranging from 128 to 2048 bits.
These calculations are not easy to implement
efficiently.
With the rapid development of hardware and software
technologies, sequential implementations are no longer safe or fast
enough.
Parallel algorithms on CPUs and GPUs play a significant role in keeping
pace with this rapid growth.
However, such parallelization is a challenging process.
Motivated by this challenge, this project proposes a hybrid system
to parallelize RSA for multicore CPUs and many-core GPUs.
Methodology
The RSA algorithm is suitable for encryption
and digital signatures.
Internet security depends significantly on the
security properties of the RSA cryptosystem.
Its security depends on the intractability of
the integer factorization problem.
Modular arithmetic is used to implement the
modular exponentiations in the RSA algorithm.
The RSA algorithm consists of three steps:
key generation, encryption, and
decryption.
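The three RSA steps can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the tiny textbook primes (61 and 53), and the exponent 17 are assumptions chosen for readability, not values from the project, and real keys would use large random primes and padding.

```python
from math import gcd


def generate_keys(p, q, e):
    """Key generation: build a public key (e, n) and a private key (d, n)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1) * (q - 1)")
    d = pow(e, -1, phi)  # modular inverse of e (requires Python 3.8+)
    return (e, n), (d, n)


def encrypt(m, public_key):
    """Encryption: c = m^e mod n, for a message block 0 <= m < n."""
    e, n = public_key
    return pow(m, e, n)


def decrypt(c, private_key):
    """Decryption: m = c^d mod n."""
    d, n = private_key
    return pow(c, d, n)


# Toy example with illustrative (insecure) parameters.
public_key, private_key = generate_keys(61, 53, 17)
ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)
```

Both encryption and decryption reduce to one modular exponentiation, which is why speeding up that operation is the focus of the rest of the slides.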
Contd
Montgomery Algorithm
Presented by Peter Montgomery in 1985, this algorithm is widely used in
public-key cryptography.
It serves as an efficient algorithm for modular multiplication and
exponentiation operations.
It is efficient for modular arithmetic on large operands (>= 1024 bits).
The Montgomery algorithm consists of two operations:
multiplication and reduction.
Montgomery multiplication is a method for computing $a \cdot b \bmod n$
for positive integers a, b, and n.
It is useful for computing $a^e \bmod n$ for a large value of n.
It avoids the division-based mod n reduction steps and, as a result, tends to
reduce the timing variation of the computation.
In general, the Montgomery multiplication algorithm computes the
Montgomery product, specified by:
$\mathrm{MonMul}(a', b') = a' \cdot b' \cdot r^{-1} \pmod{n}$
where a and b are less than the modulus n. A further integer r must be
chosen such that r is greater than n and gcd(r, n) = 1.
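The Montgomery product defined above can be sketched as follows. This is a hedged illustration, not the project's implementation: it assumes an odd modulus n and picks r as the smallest power of two above n (so gcd(r, n) = 1 holds automatically), and all function names are hypothetical.

```python
def montgomery_params(n):
    """Pick r = 2^k > n and precompute n' with n * n' = -1 (mod r)."""
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd modulus n")
    k = n.bit_length()
    r = 1 << k
    n_neg_inv = (-pow(n, -1, r)) % r  # requires Python 3.8+
    return k, r, n_neg_inv


def mon_mul(a_m, b_m, n, k, n_neg_inv):
    """Return a_m * b_m * r^-1 mod n using only shifts and masks by r."""
    mask = (1 << k) - 1
    t = a_m * b_m
    m = ((t & mask) * n_neg_inv) & mask  # makes t + m*n divisible by r
    u = (t + m * n) >> k                 # exact division by r
    return u - n if u >= n else u        # one conditional subtraction


def to_montgomery(a, n, r):
    """Map a to its Montgomery form a' = a * r mod n."""
    return (a * r) % n


def from_montgomery(a_m, n, k, n_neg_inv):
    """Map a' back to a by multiplying with 1 in the Montgomery domain."""
    return mon_mul(a_m, 1, n, k, n_neg_inv)


# Toy check with an illustrative modulus.
n = 3233
k, r, n_neg_inv = montgomery_params(n)
a_m = to_montgomery(65, n, r)
b_m = to_montgomery(1234, n, r)
product = from_montgomery(mon_mul(a_m, b_m, n, k, n_neg_inv), n, k, n_neg_inv)
```

Because r is a power of two, the reduction by r is a shift and a mask, which is what makes the method attractive for long chains of multiplications such as exponentiation.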
Contd
Montgomery Exponentiation Algorithm.
Step 1: Input a, e, n.
Step 2: Function: MonExp (a, e, n).
Step 3: Calculate $a' = a \cdot r \bmod n$.
Step 4: Calculate $x' = 1 \cdot r \bmod n$.
Step 5: for i = n - 1 down to 0 loop
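The slide breaks off at the loop in Step 5. One common way to complete MonExp is left-to-right square-and-multiply over the bits of the exponent e, sketched below. The loop body, the final conversion out of Montgomery form, and the choice of r as a power of two are assumptions based on the standard technique, not content recovered from the slide.

```python
def mon_exp(a, e, n):
    """Compute a^e mod n with Montgomery arithmetic (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd modulus n")
    k = n.bit_length()
    r = 1 << k                        # r = 2^k > n, so gcd(r, n) = 1
    mask = r - 1
    n_neg_inv = (-pow(n, -1, r)) % r  # n * n_neg_inv = -1 (mod r)

    def mon_mul(x, y):
        # Montgomery product x * y * r^-1 mod n.
        t = x * y
        m = ((t & mask) * n_neg_inv) & mask
        u = (t + m * n) >> k
        return u - n if u >= n else u

    a_m = (a * r) % n                 # Step 3: a' = a * r mod n
    x_m = r % n                       # Step 4: x' = 1 * r mod n
    # Assumed Step 5: scan the bits of e from most to least significant.
    for i in range(e.bit_length() - 1, -1, -1):
        x_m = mon_mul(x_m, x_m)       # square
        if (e >> i) & 1:
            x_m = mon_mul(x_m, a_m)   # multiply when the bit is set
    return mon_mul(x_m, 1)            # leave the Montgomery domain


# Toy example with illustrative values.
result = mon_exp(65, 17, 3233)
```

The conversions into and out of Montgomery form happen once per exponentiation, so their cost is amortized over all the squarings and multiplications in the loop.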
Contd
IMPLEMENTATION OF RSA ALGORITHM
Sequential RSA implementation on the CPU
Step 1: Generate the keys as mentioned earlier.
Public key {e,n}
public struct RSA_Public_Key
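The remaining steps of the sequential algorithm are not shown on the slide. As a rough sketch of what a sequential CPU version could look like, the Python below keeps the keys in small record types modelled on the RSA_Public_Key and RSA_Secret_Key structs and processes the message one block after another. Treating each byte as a block, omitting padding, and the key values themselves are assumptions made purely for illustration.

```python
from typing import NamedTuple


class RSAPublicKey(NamedTuple):
    e: int
    n: int


class RSASecretKey(NamedTuple):
    d: int
    n: int


def encrypt_text(text, key):
    """Encrypt each byte of the text in turn (sequential loop)."""
    return [pow(block, key.e, key.n) for block in text.encode("utf-8")]


def decrypt_text(blocks, key):
    """Decrypt each block in turn and rebuild the original text."""
    return bytes(pow(c, key.d, key.n) for c in blocks).decode("utf-8")


# Illustrative toy keys; n must exceed 255 so every byte fits in one block.
public_key = RSAPublicKey(e=17, n=3233)
secret_key = RSASecretKey(d=2753, n=3233)
cipher_blocks = encrypt_text("RSA on GPUs", public_key)
plain_text = decrypt_text(cipher_blocks, secret_key)
```

Each block is independent of the others, which is the property the parallel versions exploit.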
Contd
Parallel RSA implementation on the multicore CPU
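The multicore steps are not reproduced on this slide. One plausible decomposition, sketched here in Python with the standard concurrent.futures module, is to give each worker its own message blocks, since every block is exponentiated independently. The function name, the worker count, and the block-level split are assumptions, not the project's design.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import repeat


def parallel_modexp(blocks, exponent, modulus, workers=4,
                    executor_cls=ProcessPoolExecutor):
    """Apply block^exponent mod modulus to every block using a worker pool.

    ProcessPoolExecutor spreads the work over CPU cores. ThreadPoolExecutor
    can be passed instead for quick checks, although CPython threads will
    not speed up this CPU-bound work.
    """
    with executor_cls(max_workers=workers) as pool:
        # map() preserves input order, so ciphertext blocks line up
        # with the plaintext blocks they came from.
        return list(pool.map(pow, blocks, repeat(exponent), repeat(modulus)))
```

A call such as parallel_modexp(blocks, e, n) would encrypt and parallel_modexp(cipher, d, n) would decrypt. When using the process pool from a script, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.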
Contd
Parallel RSA implementation on the many-core GPU
Step 1: Generate the keys as mentioned earlier.
Public key {e,n}: public struct RSA_Public_Key
Private key {d,n}: public struct RSA_Secret_Key
Step 2: Input the text to be encrypted, either from a file or by typing
it.
Step 3: Set kernel launch parameters (Set grid/ block size for GPU
execution).
Launcher.SetGridSize (512);
Launcher.SetBlockSize (128);
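With a grid size of 512 and a block size of 128, the launch creates 512 x 128 = 65,536 GPU threads. The pure-Python emulation below illustrates how such threads could be mapped onto message blocks. It is only a CPU-side sketch: the grid-stride loop and all function names are assumptions, and the real kernel code is not shown in the slides.

```python
GRID_SIZE = 512    # number of thread blocks, as in SetGridSize(512)
BLOCK_SIZE = 128   # threads per block, as in SetBlockSize(128)


def rsa_kernel(block_idx, thread_idx, data, out, exponent, modulus):
    """Work done by one emulated GPU thread."""
    i = block_idx * BLOCK_SIZE + thread_idx  # global thread index
    stride = GRID_SIZE * BLOCK_SIZE          # total number of threads
    # Assumed grid-stride loop: each thread handles elements i, i + stride, ...
    while i < len(data):
        out[i] = pow(data[i], exponent, modulus)
        i += stride


def launch_kernel(data, exponent, modulus):
    """Emulate the launch by running every thread one after another."""
    out = [0] * len(data)
    for block_idx in range(GRID_SIZE):
        for thread_idx in range(BLOCK_SIZE):
            rsa_kernel(block_idx, thread_idx, data, out, exponent, modulus)
    return out
```

On a real GPU the two loops in launch_kernel disappear, because the hardware runs the threads concurrently. The index arithmetic inside the kernel is the part that carries over.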
Contd
To compare the speedup gained by
parallelizing RSA in multicore CPU and GPU
computing environments, the sequential and
parallelized algorithms have been
implemented and the elapsed time for the
encryption/decryption process has been
recorded.
The GPU implementation becomes faster than
the other two implementations as the key
size increases.
The experiments were conducted on a desktop
with an Intel Core i7 3.23 GHz CPU and an Nvidia
GeForce GT630M GPU.
Key size   Sequential CPU        Parallelized CPU      GPU
           Enc      Dec          Enc      Dec          Enc      Dec
768        0.110    5.03         0.87     5.46         1.08     2.42
1024       0.130    8.89         0.94     7.89         0.92     2.78
2048       0.49     76.294       1.28     38.66        0.91     9.27
3072       0.85     250.034      1.4      73.98        0.8      23.59
4096       1.54     411.453      1.9      140.38       0.99     41.26
6144       4.01     1727.58      3.13     369.3        1.95     93.28
8192       5.93     2664.31      3.9      724.96       1.980    201.07
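The speedup implied by the timings can be computed directly as sequential time divided by parallel time. The short Python sketch below does this for the decryption figures reported above. The dictionary layout and function name are illustrative choices.

```python
# Decryption times per key size: (sequential CPU, parallelized CPU, GPU),
# copied from the results table.
DECRYPTION_TIMES = {
    768: (5.03, 5.46, 2.42),
    1024: (8.89, 7.89, 2.78),
    2048: (76.294, 38.66, 9.27),
    3072: (250.034, 73.98, 23.59),
    4096: (411.453, 140.38, 41.26),
    6144: (1727.58, 369.3, 93.28),
    8192: (2664.31, 724.96, 201.07),
}


def decryption_speedups(key_size):
    """Return (CPU-parallel speedup, GPU speedup) over the sequential run."""
    sequential, cpu_parallel, gpu = DECRYPTION_TIMES[key_size]
    return sequential / cpu_parallel, sequential / gpu


for key_size in sorted(DECRYPTION_TIMES):
    cpu_speedup, gpu_speedup = decryption_speedups(key_size)
    print(f"{key_size:>5}  CPU-parallel x{cpu_speedup:.2f}  GPU x{gpu_speedup:.2f}")
```

For 8192-bit keys this gives a GPU decryption speedup of roughly 13 over the sequential version, whereas at 768 bits the parallelized CPU version is slightly slower than the sequential one.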
[Figure: chart comparing the SEQUENTIAL, CPUparallel, and GPUparallel implementations by KeySize (512, 1024, 2048, 3072, 4096, 6144, 8192)]
References
1) https://en.wikipedia.org/wiki/RSA_(cryptosystem)
2) mathworld.wolfram.com
3) OpenCL C/C++ Programming Guide, khronos.org
4) Handbook of Applied Cryptography. CRC Press, Inc., Boca Raton, FL, USA.
5) Diego Viot, Rodolfo Aurelio, Helano Castro and Jardel Silveria, Modular Multiplication Algorithm for PKC, Universidade Federal do Ceará, LESC
6) Josef Pieprzyk and David Pointcheval, Parallel Authentication and Public key encryption, Springer-Verlag 2003
7) Chandra, S. S. & Chandra, K. 2005. Cbigint class: an implementation of big integers in C++. J. Comput. Small Coll., 20(4)
8) Bewick, G. 1994. Fast multiplication algorithms and implementation.