
Design and implementation of an improved and parallelized RSA algorithm
on multicore CPUs and GPUs
Kennedy B.

Outline
Introduction
Objective
Problem statement
Methodology
Results and discussions
Conclusions and Recommendations
References

Introduction
Graphics Processing Units (GPUs) and their development tools have
advanced and are increasingly in demand in industry.
Among the development frameworks for CPUs and GPUs, OpenCL provides
a programming environment for writing portable code that runs in
parallel across heterogeneous platforms consisting of many different
types of devices, for example CPUs, GPUs, DSPs, FPGAs and other
types of processors.
Sensitive messages are exchanged over the internet.
Messages must stay secure over an insecure channel.
Security is achieved through encryption.
Cryptography: the art of securing data over the internet.
Symmetric-key cryptography is based on sharing secrecy;
asymmetric-key cryptography is based on personal secrecy.

Contd

Asymmetric-key cryptography uses two separate keys: one private and
one public.

Fig: General idea of asymmetric-key cryptosystem

Contd
The most common public-key algorithm is the RSA
cryptosystem, named for its inventors (Rivest, Shamir, and
Adleman).

Fig. Complexity of operations in RSA

Objective
Main objective:
The main objective of this project is to speed up the RSA
encryption and decryption algorithm.
Specific objective:
The specific objective of this project is to parallelize the RSA
algorithm and implement it on multicore CPUs and GPUs.

Problem statement
Public-key algorithms (e.g., the RSA algorithm) rely on
computationally expensive operations: modular multiplication and
modular exponentiation of very large integers, ranging from 128 to
2048 bits.
These calculations are not easy to implement efficiently.
With the rapid development of hardware and software technologies, a
sequential implementation is no longer secure or fast enough.
Parallel algorithms on CPUs and GPUs play a significant role in
keeping pace with this rapid growth.
However, such parallelization is a challenging process.
Motivated by this challenge, this project proposes a hybrid system
to parallelize RSA for multicore CPUs and many-core GPUs.

Methodology
The RSA algorithm is suitable for encryption and digital signatures.
Internet security depends significantly on the security properties
of the RSA cryptosystem.
Its security depends on the intractability of the integer
factorization problem.
Modular arithmetic is used to implement the modular exponentiations
in the RSA algorithm.
The RSA algorithm consists of three steps: key generation,
encryption and decryption.

Contd

The RSA algorithm can be summarized in the following steps:

Step 1: Randomly generate two large primes $p$ and $q$ of
approximately the same size, but not too close together; both are
kept secret.
Step 2: Calculate the modulus $n = p \cdot q$ and
$\varphi(n) = (p-1)(q-1)$, where $\varphi(n)$ is the Euler totient
function.
Step 3: Choose a random encryption exponent $e$ such that
$\gcd(\varphi(n), e) = 1$ and $1 < e < \varphi(n)$.
Step 4: Calculate the decryption exponent $d$ as the inverse of $e$
modulo $\varphi(n)$ (e.g., with the extended Euclidean algorithm),
i.e., $d \cdot e \equiv 1 \pmod{\varphi(n)}$.
Step 5: The encryption function is $E(M) = M^e \bmod n$.
Step 6: The decryption function is $D(C) = C^d \bmod n$.
Step 7: The public key is $(n, e)$ and the private key is $(n, d)$.
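
The steps above can be sketched with toy numbers. This is a minimal illustration, not the project's implementation: the function names `make_keys`, `encrypt` and `decrypt` are hypothetical, the primes are far too small for real use, and no padding scheme is applied.

```python
from math import gcd


def make_keys(p, q, e):
    """Steps 2-4 and 7: derive (n, e) and (n, d) from primes p, q and exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)              # Euler totient of n
    if not (1 < e < phi) or gcd(phi, e) != 1:
        raise ValueError("e must satisfy 1 < e < phi(n) and gcd(phi(n), e) = 1")
    d = pow(e, -1, phi)                  # modular inverse (Python 3.8+)
    return (n, e), (n, d)


def encrypt(m, public_key):
    """Step 5: E(M) = M^e mod n."""
    n, e = public_key
    return pow(m, e, n)


def decrypt(c, private_key):
    """Step 6: D(C) = C^d mod n."""
    n, d = private_key
    return pow(c, d, n)


# Toy parameters chosen for illustration only.
public_key, private_key = make_keys(61, 53, 17)
ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)
print(public_key, private_key, ciphertext, recovered)
```

With these toy values the modulus is 3233 and decrypting the ciphertext returns the original message 65.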

Contd
Montgomery Algorithm
Presented by Peter Montgomery in 1985; an algorithm used in
public-key cryptography.
Serves as an efficient algorithm for modular multiplication and
exponentiation operations.
Efficient for arithmetic with large moduli (>= 1024 bits).
The Montgomery algorithm consists of two operations: multiplication
and reduction.
Montgomery multiplication is a method for computing
$a \cdot b \bmod n$ for positive integers $a$, $b$, and $n$.
It is useful for computing $a^e \bmod n$ for a large value of $n$.
It avoids division by $n$ in the reduction steps and, as a result,
tends to reduce the execution time.
In general, the Montgomery multiplication algorithm computes the
Montgomery product
$\mathrm{MonMul}(a', b') = a' \cdot b' \cdot r^{-1} \bmod n$,
where $a$ and $b$ are less than the modulus $n$,
$a' = a \cdot r \bmod n$ and $b' = b \cdot r \bmod n$. An additional
integer $r$ must be chosen with $r > n$ and $\gcd(r, n) = 1$.
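
The Montgomery product can be sketched as follows. This is a minimal illustration under stated assumptions: $r$ is taken as a power of two (which requires an odd modulus, as RSA moduli are), and the helper names `montgomery_setup`, `redc` and `mon_mul` are hypothetical rather than taken from the project's code.

```python
def montgomery_setup(n):
    """Pick r = 2^k > n and n_prime with n * n_prime = -1 (mod r). Requires odd n."""
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, k, n_prime


def redc(t, n, r, k, n_prime):
    """Montgomery reduction: return t * r^-1 mod n for 0 <= t < r * n."""
    m = ((t & (r - 1)) * n_prime) & (r - 1)   # "mod r" is a bit mask
    u = (t + m * n) >> k                      # exact division by r is a shift
    return u - n if u >= n else u


def mon_mul(a_bar, b_bar, n, r, k, n_prime):
    """MonMul(a', b') = a' * b' * r^-1 mod n."""
    return redc(a_bar * b_bar, n, r, k, n_prime)


n = 3233                                      # toy odd modulus
r, k, n_prime = montgomery_setup(n)
a, b = 65, 100
a_bar, b_bar = (a * r) % n, (b * r) % n       # a' and b'
product_bar = mon_mul(a_bar, b_bar, n, r, k, n_prime)
product = redc(product_bar, n, r, k, n_prime) # leave the Montgomery domain
print(product, (a * b) % n)
```

Because $r$ is a power of two, the reduction uses only masks and shifts instead of a division by $n$, which is where the method gets its efficiency.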


Contd
Montgomery Exponentiation Algorithm

Step 1: Input $a$, $e$, $n$.
Step 2: Function: MonExp($a$, $e$, $n$).
Step 3: Calculate $a' = a \cdot r \bmod n$.
Step 4: Calculate $x' = 1 \cdot r \bmod n$.
Step 5: For $i = k - 1$ down to $0$, where $k$ is the number of bits
of $e$ and $e_i$ is bit $i$ of $e$, loop:
    $x' = \mathrm{MonMul}(x', x')$
    If $e_i = 1$ then $x' = \mathrm{MonMul}(x', a')$
End loop.
Step 6: $x = \mathrm{MonMul}(x', 1)$.
Step 7: Return $x$.
Step 8: Output: $a^e \bmod n$.
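
The MonExp steps above can be sketched as a left-to-right square-and-multiply loop. This is an illustrative sketch, assuming an odd modulus and $r$ chosen as a power of two; the function name `mon_exp` and the inner helper are hypothetical.

```python
def mon_exp(a, e, n):
    """Compute a^e mod n with Montgomery multiplication. Requires odd n > 1."""
    k = n.bit_length()
    r = 1 << k                                # r = 2^k > n, gcd(r, n) = 1
    n_prime = (-pow(n, -1, r)) % r            # n * n_prime = -1 (mod r)

    def mon_mul(x, y):
        # Returns x * y * r^-1 mod n for x, y < n.
        t = x * y
        m = ((t & (r - 1)) * n_prime) & (r - 1)
        u = (t + m * n) >> k
        return u - n if u >= n else u

    a_bar = (a * r) % n                       # Step 3: a' = a * r mod n
    x_bar = r % n                             # Step 4: x' = 1 * r mod n
    for i in range(e.bit_length() - 1, -1, -1):   # Step 5: scan bits of e
        x_bar = mon_mul(x_bar, x_bar)         # square
        if (e >> i) & 1:
            x_bar = mon_mul(x_bar, a_bar)     # multiply when bit i is set
    return mon_mul(x_bar, 1)                  # Step 6: leave Montgomery form


result = mon_exp(65, 17, 3233)
print(result, pow(65, 17, 3233))
```

Only the conversions in Steps 3 and 4 use an ordinary `% n`; every multiplication inside the loop stays in Montgomery form.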


Contd
IMPLEMENTATION OF THE RSA ALGORITHM
Sequential RSA implementation on the CPU
Step 1: Generate the keys as described earlier.
    Public key {e, n}: public struct RSA_Public_Key
    Private key {d, n}: public struct RSA_Secret_Key
Step 2: Read the text to be encrypted from a file or from keyboard
input.
Step 3: Pass the data through a for loop that performs the
encryption:
    for (int i = 0; i < list_source.Count; i++)
    {
        var item = new { Id = i, Data = list_source[i] };
    }
Step 4: Each item is processed with public Encrypt(long int biPlain,
RSA_Public_Key rpkKey); decryption applies the same modular
exponentiation with the private key.


Contd
Parallel RSA implementation on the multicore CPU

Step 1: Generate the keys as described above.
    Public key {e, n}: public struct RSA_Public_Key
    Private key {d, n}: public struct RSA_Secret_Key
Step 2: Read the text to be encrypted from a file or from keyboard
input.
Step 3: Create a pool of threads:
    ThreadStartsList
    ParameterizedThreadStart
Step 4: Each thread takes a portion of the data and encrypts it:
    for (int i = 0; i < list_source.Count; i++)
    {
        ParameterizedThreadStart();
        { doEncrypt((ThreadParameters)); };
    }
Step 5: Decryption is carried out in the same way.
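
The thread-pool steps above can be sketched as follows. This is only an illustration of how the data is partitioned among workers: the names `encrypt_portion` and `parallel_encrypt` are hypothetical, the key values are toy numbers, and in standard CPython the global interpreter lock limits the speedup that threads can give for pure-Python arithmetic, so the sketch demonstrates the structure rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def encrypt_portion(portion, e, n):
    """Encrypt one portion of the data; blocks are independent of each other."""
    return [pow(m, e, n) for m in portion]


def parallel_encrypt(blocks, e, n, workers=4):
    """Split blocks into contiguous portions and encrypt them in a thread pool."""
    size = max(1, -(-len(blocks) // workers))        # ceiling division
    portions = [blocks[i:i + size] for i in range(0, len(blocks), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of the portions in its results.
        results = list(pool.map(lambda p: encrypt_portion(p, e, n), portions))
    return [c for portion in results for c in portion]


blocks = list(range(2, 50))                          # toy message blocks, each < n
sequential = [pow(m, 17, 3233) for m in blocks]
parallel = parallel_encrypt(blocks, 17, 3233)
print(parallel == sequential)
```

The parallel result matches the sequential one because no block depends on any other.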


Contd
Parallel RSA implementation on the many-core GPU
Step 1: Generate the keys as described earlier.
    Public key {e, n}: public struct RSA_Public_Key
    Private key {d, n}: public struct RSA_Secret_Key
Step 2: Read the text to be encrypted from a file or from keyboard
input.
Step 3: Set the kernel launch parameters (grid and block size for
GPU execution):
    Launcher.SetGridSize(512);
    Launcher.SetBlockSize(128);
Step 4: Call the kernel method (GPU kernel):
    Reduce_GPU(A, n, m, mPrime);
Step 5: Get the thread ID and the total number of threads:
    int ThreadId = BlockDimension.X * BlockIndex.X + ThreadIndex.X;
    int TotalThreads = BlockDimension.X * GridDimension.X;
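
The index arithmetic in Step 5 can be illustrated with a CPU-side simulation. This sketch assumes that each GPU thread walks the data in strides of the total thread count (a common pattern, but the slides do not show the kernel body); the names `kernel_thread` and `launch` are hypothetical, and a small grid is used to keep the simulation quick.

```python
def kernel_thread(block_dim, block_idx, thread_idx, grid_dim, blocks, e, n, out):
    """Work done by one simulated GPU thread."""
    thread_id = block_dim * block_idx + thread_idx    # global thread index
    total_threads = block_dim * grid_dim              # threads in the whole grid
    # Assumed stride loop: thread t handles items t, t + total, t + 2*total, ...
    for i in range(thread_id, len(blocks), total_threads):
        out[i] = pow(blocks[i], e, n)


def launch(grid_dim, block_dim, blocks, e, n):
    """Run every (block, thread) pair one after another on the CPU."""
    out = [None] * len(blocks)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel_thread(block_dim, block_idx, thread_idx, grid_dim,
                          blocks, e, n, out)
    return out


blocks = list(range(2, 202))                          # 200 toy message blocks
gpu_result = launch(grid_dim=4, block_dim=8, blocks=blocks, e=17, n=3233)
print(gpu_result == [pow(m, 17, 3233) for m in blocks])
```

With 32 simulated threads and 200 items, each thread handles six or seven items, and every item is written exactly once.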


Results and Discussions


The CPU carries out the key generation.
The encryption and decryption process is handled in three cases:
1) A sequential implementation of the RSA algorithm running on the CPU.
2) A parallel RSA implementation executed on the multicore CPU.
3) A parallel RSA implementation executed on the many-core GPU.
The proposed implementation variants support variable key sizes on
demand.
The main bottleneck of the RSA encryption process is the large size
of the data.
A parallel implementation of RSA requires that there be no
dependencies between the data.
Consequently, the data can be divided into small portions, and each
thread can process one portion.
As a result, this data-parallel method increases the computing
speed of RSA.
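
The independence of the data portions can be illustrated with a short sketch. It assumes textbook RSA without padding and a toy key, so each block is a single byte (smaller than the modulus); the function names are hypothetical and the scheme is not secure as written.

```python
def to_chunks(data, block_size):
    """Split a byte string into chunks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def encrypt_chunks(chunks, e, n):
    """Encrypt every chunk on its own; no chunk depends on another."""
    return [pow(int.from_bytes(c, "big"), e, n) for c in chunks]


def decrypt_chunks(cipher, lengths, d, n):
    """Decrypt each block and restore its original byte length."""
    return b"".join(pow(c, d, n).to_bytes(size, "big")
                    for c, size in zip(cipher, lengths))


n, e, d = 3233, 17, 2753                  # toy key, illustration only
message = b"parallel RSA"
chunks = to_chunks(message, block_size=1) # 1-byte blocks so every value is < n
cipher = encrypt_chunks(chunks, e, n)
plain = decrypt_chunks(cipher, [len(c) for c in chunks], d, n)
print(plain)
```

Since each block is encrypted separately, the blocks can be handed to threads in any order and reassembled by position afterwards.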


Contd
To compare the speedup gained by parallelizing RSA in multicore CPU
and GPU computing environments, the sequential and parallelized
algorithms have been implemented and the elapsed time for the
encryption/decryption process has been recorded.
The GPU implementation becomes faster than the other two
implementations as the key size increases.
The experiments were conducted on a desktop with an Intel Core i7
3.23 GHz CPU and an Nvidia GeForce GT630M GPU.


Execution time (latency) in milliseconds for encryption/decryption
with varying key size

Key size   Sequential CPU        Parallelized CPU      GPU
(bits)     Enc       Dec         Enc       Dec         Enc       Dec
768        0.110     5.03        0.87      5.46        1.08      2.42
1024       0.130     8.89        0.94      7.89        0.92      2.78
2048       0.49      76.294      1.28      38.66       0.91      9.27
3072       0.85      250.034     1.4       73.98       0.8       23.59
4096       1.54      411.453     1.9       140.38      0.99      41.26
6144       4.01      1727.58     3.13      369.3       1.95      93.28
8192       5.93      2664.31     3.9       724.96      1.980     201.07
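
The speedup implied by the decryption timings can be worked out directly from the table. The sketch below copies the decryption columns and divides the sequential time by each parallel time; the names `decryption_ms` and `speedups` are hypothetical, and the ratios are derived here rather than reported in the slides.

```python
# Decryption latency in ms: key size -> (sequential CPU, parallel CPU, GPU)
decryption_ms = {
    768:  (5.03, 5.46, 2.42),
    1024: (8.89, 7.89, 2.78),
    2048: (76.294, 38.66, 9.27),
    3072: (250.034, 73.98, 23.59),
    4096: (411.453, 140.38, 41.26),
    6144: (1727.58, 369.3, 93.28),
    8192: (2664.31, 724.96, 201.07),
}


def speedups(sequential, cpu_parallel, gpu):
    """Speedup over the sequential run: sequential time / parallel time."""
    return sequential / cpu_parallel, sequential / gpu


for key_bits, times in decryption_ms.items():
    cpu_gain, gpu_gain = speedups(*times)
    print(f"{key_bits:>5} bits: CPU parallel x{cpu_gain:.2f}, GPU x{gpu_gain:.2f}")
```

For 8192-bit keys this gives roughly 3.7x for the multicore CPU and roughly 13.3x for the GPU, while at 768 bits the multicore CPU version is slightly slower than the sequential one. The encryption times are small in every configuration, so the gain comes mainly from decryption.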


Fig: Comparison of the sequential, CPU-parallel and GPU-parallel
implementations by key size (512 to 8192 bits)

Conclusions and Recommendations for future work

Because it is rooted in modular arithmetic on very large numbers,
RSA is considered a slow algorithm.
This project proposed variant implementations of modular
exponentiation on multicore CPUs and GPUs.
The GPU implementation gained a markedly higher speedup over the
sequential CPU implementation, while the multithreaded CPU
implementation gained only a moderate speedup over the sequential
CPU implementation.
Furthermore, additional speedup could be gained as far as
throughput is concerned.
The results reveal that the GPU is appropriate for speeding up the
RSA algorithm.
As future work, implementing this algorithm on FPGAs is
recommended.


References
1) https://en.wikipedia.org/wiki/RSA_(cryptosystem)
2) mathworld.wolfram.com
3) OpenCL C/C++ Programming Guide, khronos.org
4) Handbook of Applied Cryptography. CRC Press, Inc., Boca Raton,
   FL, USA.
5) Diego Viot, Rodolfo Aurelio, Helano Castro and Jardel Silveria,
   Modular Multiplication Algorithm for PKC, Universidade Federal do
   Ceará, LESC
6) Josef Pieprzyk and David Pointcheval, Parallel Authentication
   and Public key encryption, Springer-Verlag 2003
7) Chandra, S. S. & Chandra, K. 2005. Cbigint class: an
   implementation of big integers in C++. J. Comput. Small Coll.,
   20(4)
8) Bewick, G. 1994. Fast multiplication algorithms and
   implementation.
