
PROJECT REPORT

By

Aryan Saxena 16BCE0022

Rajdeepa Chakrabarty 16BCE0732

Vatsal Agrawal 16BCB0090

S. Shubhjeet Singh 16BCE2315

Slot: B1

October, 2018

Abstract:

Images play an important role in the modern world. Confidentiality, integrity and
authentication of images are as important as those of text. However, encrypting and
decrypting images differs significantly from handling plain text because of their large
size and the redundancy of their pixels. With the rise of powerful GPUs containing thousands
of efficient, high-performance cores, cryptographic algorithms that were once thought
secure can now be broken in a matter of seconds. Earlier, increasing complexity
meant an increase in the processing time for encryption as well as decryption. Now, with the
advent of these cutting-edge GPUs and the evolution of GPU computing, the processing
time has been reduced to a fraction of what it used to be. This paper presents a
parallel implementation of AES using NVIDIA CUDA and OpenCV to encrypt images
rapidly. We achieved an average speed-up of four times on the GPU compared to a CPU-only
implementation for both 128-bit and 256-bit keys.

Introduction:

Parallel programming is being exploited today in various areas of computer science to
improve performance. It is dominating the market, and there is an ever-growing demand
to adapt more and more algorithms to parallel computing. Moreover, the GPU is no longer
restricted to graphical computation; it is being used for general-purpose computing too.
This has given us the freedom to use the GPU in various new applications to reduce
processing time.

Security of data is a prime concern of this era. We need much higher security in data
transmission than previous generations did. AES (Advanced Encryption Standard) is one of
the most popular and extensively used algorithms in cryptography. It is secure enough to be
used by the US government for transmitting classified information. It consists of several
rounds of permutations and substitutions, the number depending upon the key size: 10 rounds
for 128-bit, 12 for 192-bit and 14 for 256-bit keys. Security increases with increasing key
size. Applying the AES algorithm to an image differs considerably from applying it to general
text because of the image's large size and the various lossy or lossless compression
techniques used on images. An image can be viewed as a two-dimensional matrix of integers
whose range depends upon the bit depth of the image. Higher-resolution images have a large
number of pixels, and because of this larger size they take a long time to process. Hence,
the encryption has to be implemented efficiently on a GPU.

An advantage of our methodology is that the encrypted file can be shared in image form.
Without knowing the key, it is computationally infeasible (at the time of writing this paper)
to recover the contents of the image, as the whole image is just a random pattern of
colours. The next section presents a brief review of the methods researchers have used
in the past for encryption/decryption of text and images. We then present our proposed
method in detail, along with the observations and results, in the subsequent sections.

Literature Review:

Each entry lists: No. and title [citation], authors, journal name and date, key concepts,
and results with future enhancement.

AES Encryption algorithms on text

1. Performance Improvement of Advanced Encryption Algorithm using Parallel Computation. [M. Nagendra and M. Chandra Sekhar, 2014]
   Authors: M. Nagendra and M. Chandra Sekhar
   Journal and date: International Journal of Software Engineering and Its Applications, Vol. 8, No. 2 (2014)
   Key concepts: 1. Advanced Encryption Algorithm cryptography algorithm implementation on a dual-core processor using the OpenMP API. 2. Reduces the execution time.
   Results and future enhancement: 1. It takes 40-45% less time for performing the encryption and decryption than the sequential implementation. 2. A parallel implementation using a GPU would have taken less time; this shall be implemented in future.

2. A design of a fast parallel-pipelined implementation of AES: Advanced Encryption Standard. [Ghada F. Elkabbany, Heba K. Aslan and Mohamed N. Rasslan, 2014]
   Authors: Ghada F. Elkabbany, Heba K. Aslan and Mohamed N. Rasslan
   Journal and date: International Journal of Computer Science & Information Technology (IJCSIT), Vol 6, No 6, December 2014
   Key concepts: 1. Used eleven stages of pipelining in order to exploit the sources of parallelism in both the initial and final round. 2. Combines pipelining of rounds and parallelization of the Mix_Column and Add_Round_Key transformations.
   Results and future enhancement: 1. Pipelining improves both encryption and decryption by approximately 95%.

3. Implementation of AES Using XTS Mode of Operation. [Muna Shrestha, 2013]
   Authors: Muna Shrestha
   Journal and date: Notes on Information Theory, Vol. 1, No. 4, December 2013
   Key concepts: [...] describes an implementation of AES using XTS mode in parallel via MPI.
   Results and future enhancement: [...] correct encryption efficiently, with a speedup of 1.6 and with 37% efficiency. 2. Future work aims at increasing the number of processing elements and the data size at a constant rate.

4. AES on GPU using CUDA [Tomoiagă Radu Daniel, Stratulat Mircea, 2014]
   Authors: Tomoiagă Radu Daniel, Stratulat Mircea
   Journal and date: International Journal of Information Sciences and Techniques (IJIST), Vol. 2, No. 6, November 2014
   Key concepts: 1. The paper describes an implementation of AES on an NVIDIA GPU using CUDA.
   Results and future enhancement: 1. Since data is stored in the local memory of the GPU, this implementation offers speedups of almost 40 times in comparison to the CPU. 2. Future work aims to implement and test AES-192 and AES-256 on the GPU, encryption and decryption, and an OpenSSL adaptation to use these algorithms.

5. [...] high throughput and fully pipelined implementation of AES algorithm on FPGA [Abolfazl Soltani, Saeed Sharifian, 2015]
   Authors: Abolfazl Soltani, Saeed Sharifian
   Journal and date: [...]essors and Microsystems 39 (2015) 480–493
   Key concepts: [...] and efficient multipliers and multiplicative inverters in GF(2^8). 2. Loop-unrolling, full pipelining and sub-pipelining techniques employed in all proposed methods. 3. Pipelining registers inserted in optimal placement.
   Results and future enhancement: [...] achieves a high throughput of 260.15 Gbps. 2. A maximum operational frequency of 508.104 MHz is also achieved.

6. Implementation and Analysis of AES Encryption on GPU. [Qinjian Li, Chengwen Zhong, 2015]
   Authors: Qinjian Li, Chengwen Zhong
   Journal and date: International Journal of Networking and Computing, 2015
   Key concepts: 1. Implementation of the Electronic Codebook (ECB) mode encoding process and the Cipher Block Chaining (CBC) mode decoding process with CUDA on a GPU. 2. T-boxes are allocated in on-chip shared memory. 3. A single thread handles one 16-byte AES block.
   Results and future enhancement: 1. A very high performance of around 60 Gbps throughput on an NVIDIA Tesla C2050 GPU, which is 50 times faster than a sequential implementation.

7. [...]ng encryption/decryption using GPU's for AES algorithm. [Sanjanaashree P, 2013]
   Authors: Sanjanaashree P
   Journal and date: International Journal of Scientific & Engineering Research, Volume 4, Issue 2, February 2013
   Key concepts: [...]n using CUDA platform.
   Results and future enhancement: [...] amount of data is large, the encryption/decryption time required is greatly reduced if it runs in a graphics processing environment. 2. Future work aims at efficient implementations of other common symmetric and asymmetric algorithms. Implementations of hashing and public key algorithms on GPU may also be done.

8. Improved Software Implementation of DES Using CUDA and OpenCL [D. Noer, A.P. Engsig-Karup, E. Zenner, 2014]
   Authors: D. Noer, A.P. Engsig-Karup, E. Zenner
   Journal and date: International Journal of Scientific & Engineering Research, 2014
   Key concepts: 1. Fast DES bitslice bruteforce software tool utilizing consumer Graphics Processing Units (GPUs).
   Results and future enhancement: 1. High performance computing. 2. Future work aims at improvements in performance on AMD GPUs, which will require an implementation of the model in OpenCL.

Image Encryption

9. Image Encryption Using Genetic Algorithm [Roza Afarin; Saeed Mozaffari, 2013]
   Authors: Roza Afarin; Saeed Mozaffari
   Journal and date: 2013 8th Iranian Conference on Machine Vision and Image Processing (MVIP)
   Key concepts: The paper proposes an improved version of the Genetic Algorithm for image encryption. In the initial steps the image is dislocated randomly and then divided into four parts. Crossover and mutation are then applied, checking whether the entropy of the final result increases. Randomness is further assessed through entropy, correlation coefficient and histogram analysis.
   Results and future enhancement: The results obtained from the analysis show that the algorithm can be applied successfully to any type of image. The result is sensitive to the key and to distortions in the image. Further, there can be an attempt to increase the cipher value to get a better result.

10. Pixel Based Image Encryption Using Magic Square [S. Sowmiya; I. Monica Tresa; A. Prabhu, 2012]
   Authors: S. Sowmiya; I. Monica Tresa; A. Prabhu Chakkaravarthy
   Journal and date: 2012 International Conference on Control Engineering and Communication Technology
   Key concepts: To avoid data redundancy, the pixel and magic square methods have been combined to give a new algorithm, also named the Pan Magic Square method. The plain text is divided into pixels and a total of 64 keys are generated. The encrypted result is very different from the text, so it can easily be transmitted over the internet.
   Results and future enhancement: More secure compared to PMS because of the 64 keys. It overlooks security while changing to RGB format and can easily be attacked. Further, an attempt to minimize the time and space complexity can be made in future.

11. A New Duffing-Lorenz Chaotic Algorithm and Its Application in Image Encryption [Jianming Liu; Huijing Lv, 2012]
   Authors: Jianming Liu; Huijing Lv
   Journal and date: 2012 International Conference on Control Engineering and Communication Technology
   Key concepts: An attempt is made to modify the Duffing and Lorenz chaotic algorithm because it is very simple and cannot be used for any encryption cryptography. The modified version is 6-dimensional and more complex.
   Results and future enhancement: From Lyapunov exponent simulation we learn that the improved version has better chaotic features. A newly designed dynamic mapping is required for the application of this version. Further, since it is 6-dimensional, it can be simplified to 4 dimensions for ease of calculation.

12. A Novel Image Encryption Algorithm Based On Fractional Fourier Transform and Magic Cube Rotation. [Xiao Feng, Xiaolin Tian, Shaowei Xia, 2012]
   Authors: Xiao Feng, Xiaolin Tian, Shaowei Xia
   Journal and date: 2012 4th International Congress on Image and Signal Processing
   Key concepts: This paper introduces a method based on fractional Fourier equations for image encryption. Magic cube rotation scrambling is also used in this paper.
   Results and future enhancement: Comparing the encrypted image and the decrypted image, the performance is very high when the Fourier transform is used. This algorithm has a large key space, which could be vulnerable to attacking agents. Further, an attempt can be made to make it safer from attackers and enhance the security of the data.

13. Attack to an Image Encryption Algorithm based on Improved Chaotic Cat Maps [Shuai Ren, Chengshi Gao, Qing Dai, XiaoFei Fei, 2012]
   Authors: Shuai Ren, Chengshi Gao, Qing Dai, XiaoFei Fei
   Journal and date: 2012 3rd International Congress on Image and Signal Processing (CISP2012)
   Key concepts: This paper addresses the security concerns of image encryption using chaotic maps. It shows how the decryption of the image can be done very easily by the proposed key-solving algorithms, and hence proves the vulnerability of the previous algorithms.
   Results and future enhancement: The leaking of information can be traced from the proposed algorithm to ensure the security of the image transmission. The security of the image is based on the encryption algorithm; any attempt to compromise it will make it vulnerable to attackers. In future, a key generation algorithm may be used to enhance the overall security of the image.

14. Design and Realization of Image Encryption System Based on SMS4 Commercial Cipher Algorithm. [Zhang Lei, Li Li, Gao Xianwei, 2012]
   Authors: Zhang Lei, Li Li, Gao Xianwei
   Journal and date: 2012 4th International Congress on Image and Signal Processing
   Key concepts: This paper presents an image encryption algorithm derived from the SMS4 commercial cipher algorithm. In the proposed algorithm, encryption, decryption and safe transmission of the image are ensured. The conclusion drawn is that grey-level images are more equally distributed.
   Results and future enhancement: The security of the image is enhanced, and grey-level pixels produce better results. SMS4 is given by the State Secrets Administration. Further, an attempt can be made to design commercial cipher products.

AES encryption on CUDA

15. Implementation and Analysis of AES Encryption on GPU. [Qinjian Li, Chengwen Zhong, 2012]
   Authors: Qinjian Li, Chengwen Zhong
   Journal and date: IEEE 14th International Conference on High Performance Computing and Communications (2012)
   Key concepts: AES; Electronic Codebook; Cipher Feedback; parallel computing
   Results and future enhancement: [...] decryption make significant performance enhancements. The bandwidth of the PCI-E bus and page-locked memory allocation costs are vital limitations; they greatly reduce the throughput of encryption and decryption. Future work may investigate other common encryption algorithms.

16. High Performance CUDA AES Implementation: A Quantitative Performance Analysis Approach. [Ahmed M. Mousa, Ahmed A. Abdelrahman, Mohamed M. Fouad and Hisham Dahshan, 2017]
   Authors: Ahmed M. Mousa, Ahmed A. Abdelrahman, Mohamed M. Fouad and Hisham Dahshan
   Journal and date: IEEE Computing Conference, 18-20 July 2017
   Key concepts: AES; ECB; CUDA; GPU; Rijndael; Throughput; Granularity; Performance optimization; Encryption
   Results and future enhancement: Workload distribution over threads and thread blocks to gain higher performance. Some of the optimizations, such as parallel granularity tweaking, did not have an effect on the older platforms. Future work: development of an auto-tuner that selects the best configuration parameters based on the GPU architecture.

17. AES-128 ECB Encryption on GPUs and Effects of Input Plaintext Patterns on Performance. [A. H. Khan, M. A. Al-Mouhamed, A. Almousa, A. Fatayar, A. R. Ibrahim, and A. J. Siddiqui, 2014]
   Authors: A. H. Khan, M. A. Al-Mouhamed, A. Almousa, A. Fatayar, A. R. Ibrahim, and A. J. Siddiqui
   Journal and date: IEEE SNPD 2014, June 30-July 2, 2014
   Key concepts: Advanced Encryption Standard (AES); Parallel Encryption; CUDA based Cipher
   Results and future enhancement: Advantage: input plaintext pattern more random and less repetitive. Speed-up of 87 times compared to a modern CPU. Execution time increased when taking more random input plaintext. In future, traditional techniques such as loop unrolling can be applied to improve performance.

18. Batch Processing of Multi-Variant AES Cipher with GPU. [Mohanaraj Patchappen, Yaszrina Mohd. Yassin, Ettikan K. Karuppiah, 2015]
   Authors: Mohanaraj Patchappen, Yaszrina Mohd. Yassin, Ettikan K. Karuppiah
   Journal and date: ISBN: 978-1-4799-6211-2, 2015 IEEE
   Key concepts: Batching, AES, GPU, CUDA, Parallelization
   Results and future enhancement: Higher throughput when processing data sizes larger than 512 MB. Greater performance than a multi-core CPU at all data sizes. In future, support for cipher processing using a 192-bit key size and other parallelizable modes of operation, like CFB decryption, can be added.

19. Implementation and Evaluation of Different Parallel Designs of AES Using CUDA [Jianwei Ma, Xiaojun Chen, Rui Xu, Jinqiao Shi, 2017]
   Authors: Jianwei Ma, Xiaojun Chen, Rui Xu, Jinqiao Shi
   Journal and date: 2017 IEEE Second International Conference on Data Science in Cyberspace
   Key concepts: AES, GPU, CBC-AES, CUDA, AES-NI
   Results and future enhancement: Shows that a GPU can accelerate AES to a large extent when the input data is large. 112 times faster than standard AES. In future, the data transfer overhead can be optimized and other encryption algorithms can be implemented on the GPU.

20. Side-Channel Power Analysis of a GPU AES Implementation. [Chao Luo, Yunsi Fei, Pei Luo, Saoni Mukherjee, David Kaeli, 2015]
   Authors: Chao Luo, Yunsi Fei, Pei Luo, Saoni Mukherjee, David Kaeli
   Journal and date: IEEE 2015 33 ICCD
   Key concepts: Graphics processing units, Instruction sets, AES Encryption, Hardware, Power demand, Registers
   Results and future enhancement: Various challenges of power analysis on a GPU are highlighted. Successful correlation power analysis. In future, the GPU's electromagnetic emission signals can be measured in a non-invasive manner and the current work assessed with EM signals. The attack method can be improved further.

21. Parallel Speculative Encryption of Multiple AES Contexts on GPUs. [Wagner M Nunan Zola, Luis Carlos Erpen De Bona, 2012]
   Authors: Wagner M Nunan Zola, Luis Carlos Erpen De Bona
   Journal and date: 2012 IEEE
   Key concepts: Heterogeneous Computing, Scalable GPU parallel algorithm, Speculative encryption, AES
   Results and future enhancement: The proposed techniques show high scalability features to work with multiple encryption contexts in the GPU. Accomplishment of low latency. In future, the authors envision other cloud computing applications using scalable encryption as well as accelerated secure communication services.

22. Acceleration of AES encryption with OpenCL [Yuheng Yuan; Zhenzhong He; Zheng Gong; Weidong Qiu, 2014]
   Authors: Yuheng Yuan; Zhenzhong He; Zheng Gong; Weidong Qiu
   Journal and date: 2014 Ninth Asia Joint Conference on Information Security
   Key concepts: Graphics processing units, Instruction sets, Encryption, Resource management, Parallel processing, Computational modeling, Throughput
   Results and future enhancement: I/O is the bottleneck of performance when using a GPU for parallel processing, and the throughput rate does not include I/O overheads. Future work can experiment with various techniques to improve I/O efficiency and finally improve the overall efficiency of parallelization, test the scheme on other algorithms, and apply parallel computing to cryptanalysis.

Image encryption on CUDA

23. [...] selective encryption method for bit maps based on GPU-acceleration [Han Qiu, Gerard Memmi, 2014]
   Authors: Han Qiu, Gerard Memmi
   Journal and date: [...] IEEE International Symposium on Multimedia
   Key concepts: [...] encryption; GPGPU; DCT
   Results and future enhancement: [...] of selective encryption in the frequency domain for uncompressed images like the bitmap format. The performance gain obtained is widely usable. In future work, research on parallelizing AES on GPU and evaluating the performance gain will be done. Furthermore, an adaptor can be designed to auto-configure hardware (for multiple GPU environments) in order to accelerate the SE method by intelligently allocating calculation tasks.

24. 1D Chaos-based Image Encryption Acceleration by using GPU [Leila Habibpour, Shamim Yousefi, Mina Zolfy Lighvan and Hadi S. Aghdasi, 2016]
   Authors: Leila Habibpour, Shamim Yousefi, Mina Zolfy Lighvan and Hadi S. Aghdasi
   Journal and date: Indian Journal of Science and Technology, Vol 9(6), February 2016
   Key concepts: Accelerating Image Encryption Process, Graphic Processing Unit (GPU), One-Dimension (1D) Chaos-based Image Encryption Algorithm, Parallel Data Processing, Parallel Task Processing (PARFOR)
   Results and future enhancement: 1. Execution time of the 1D chaos-based image encryption algorithm in parallel implementations improved for huge coloured images. 2. Execution time of the algorithm improved by 93.28% when a combination of CPU and GPU is used.

25. New processing of chaos-based fast image encryption algorithms [Waleed Khalid Abduljabbar, Syariza Abdul-Rahman, Razamin Ramli]
   Authors: Waleed Khalid Abduljabbar, Syariza Abdul-Rahman, Razamin Ramli
   Journal and date: Journal of Theoretical and Applied Information Technology, 15th June 2017
   Key concepts: 1. Evaluates encryption techniques using two-dimensional chaotic maps. 2. Compares the strengths of the encryption algorithms' security and time responses for many images of different sizes.
   Results and future enhancement: 1. The time complexity is reduced from quadratic O(n^2) to constant O(1) time. 2. The main bottleneck is the encryption key generation algorithm, since chaos-based encryption algorithms depend on the principle of a feedback system where current values depend upon the previous values. As a result they can't be made parallel.

Description of parallel platform of interest

CUDA is a parallel computing platform and application programming interface (API)
model created by Nvidia. It allows software developers and software engineers to use
a CUDA-enabled graphics processing unit (GPU) for general purpose processing,
an approach termed GPGPU (General-Purpose computing on Graphics Processing
Units). The CUDA platform is a software layer that gives direct access to the
GPU's virtual instruction set and parallel computational elements, for the execution
of compute kernels.

The CUDA platform is designed to work with programming languages such as C,
C++, and Fortran. This accessibility makes it easier for specialists in parallel
programming to use GPU resources, in contrast to prior APIs like Direct3D and
OpenGL, which required advanced skills in graphics programming. Also, CUDA
supports programming frameworks such as OpenACC and OpenCL.

Advantages

CUDA has several advantages over traditional general-purpose computation on
GPUs (GPGPU) using graphics APIs:
• Shared memory – CUDA exposes a fast shared memory region that can be
shared among threads. This can be used as a user-managed cache, enabling
higher bandwidth than is possible using texture lookups.
• Full support for integer and bitwise operations, including integer texture
lookups.

Steps to install CUDA 10 on Ubuntu 18.04

http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list'

Step 6) Add the following lines to your ~/.profile file for CUDA 10.0

# set PATH for cuda 10.0 installation
if [ -d "/usr/local/cuda-10.0/bin/" ]; then
    export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
fi

Step 7) Reboot the computer

Step-10) Sample program

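#include <stdio.h>
#include <cuda.h>

void fromCPU() {
    printf("Hello from CPU\n");   // runs on the host
}

__global__ void fromGPU() {
    printf("Hello from GPU\n");   // runs once per GPU thread
}

int main() {
    fromCPU();
    fromGPU<<<2, 3>>>();          // 2 blocks of 3 threads each
    cudaDeviceSynchronize();      // wait for the kernel to finish
    return 0;
}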

Algorithm explanation and identifying the areas of parallelism.

The Advanced Encryption Standard (AES), also known by its original name
Rijndael, is a specification for the encryption of electronic data established by the
U.S. National Institute of Standards and Technology (NIST) in 2001.

AES is a subset of the Rijndael block cipher developed
by two Belgian cryptographers, Vincent Rijmen and Joan Daemen, who submitted
a proposal to NIST during the AES selection process. Rijndael is a family of
ciphers with different key and block sizes.

For AES, NIST selected three members of the Rijndael family, each with a block

size of 128 bits, but three different key lengths: 128, 192 and 256 bits.

DES has a small key size, which makes it less secure. Triple DES was introduced to
overcome this, but it turned out to be slower. Hence, AES was later introduced
by the National Institute of Standards and Technology.

The basic difference between DES and AES is that in DES the plaintext block is
divided into two halves before the main algorithm starts, whereas in AES the entire
block is processed to obtain the ciphertext.

• AES supports larger key sizes than 3DES's 112 or 168 bits.

How does AES encryption work?

AES comprises three block ciphers: AES-128, AES-192 and AES-256. Each
cipher encrypts and decrypts data in blocks of 128 bits using cryptographic keys of
128, 192 and 256 bits, respectively. The Rijndael cipher was designed to accept
additional block sizes and key lengths, but for AES those options were not
adopted.

The key size used for an AES cipher specifies the number of transformation rounds
that convert the input, called the plaintext, into the final output, called the
ciphertext. The number of rounds is as follows:
• 10 rounds for 128-bit keys
• 12 rounds for 192-bit keys
• 14 rounds for 256-bit keys

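Each round consists of several processing steps, including one that depends on the
encryption key itself. A set of reverse rounds is applied to transform ciphertext
back into the original plaintext using the same encryption key.

1. KeyExpansion: round keys are derived from the cipher key using
   Rijndael's key schedule. AES requires a separate 128-bit round key block
   for each round plus one more.
2. Initial round key addition:
   1. AddRoundKey: each byte of the state is combined with a byte of
      the round key using bitwise XOR.
3. 9, 11 or 13 rounds:
   1. SubBytes: a non-linear substitution step where each byte is
      replaced with another according to a lookup table.
   2. ShiftRows: a transposition step where the last three rows of the
      state are shifted cyclically a certain number of steps.
   3. MixColumns: a linear mixing operation which operates on the
      columns of the state, combining the four bytes in each column.
   4. AddRoundKey
4. Final round (making 10, 12 or 14 rounds in total):
   1. SubBytes
   2. ShiftRows
   3. AddRoundKey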
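
The relationship between key size, round count and round-key material described above can be written down in a few lines. This is a minimal sketch, not the code used in this project; the function names aesRounds and expandedKeyBytes are hypothetical. It relies on the rule from the AES specification that the number of rounds is the key length in 32-bit words plus six.

```cpp
#include <cassert>

// Number of AES rounds for a key of keyBits bits (128, 192 or 256).
// Nr = Nk + 6, where Nk is the key length in 32-bit words.
int aesRounds(int keyBits) {
    assert(keyBits == 128 || keyBits == 192 || keyBits == 256);
    return keyBits / 32 + 6;
}

// Bytes of expanded key material produced by KeyExpansion:
// one 16-byte (128-bit) round key per round, plus one more.
int expandedKeyBytes(int keyBits) {
    return 16 * (aesRounds(keyBits) + 1);
}
```

For a 128-bit key this gives 10 rounds and 11 round keys, i.e. 176 bytes of expanded key that every thread can read but never modifies.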

In the SubBytes step, each byte in the state is replaced with its entry in a fixed 8-bit
lookup table (the S-box).
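
As an illustration of where the lookup table comes from, the sketch below derives each entry from its standard definition (multiplicative inverse in GF(2^8) followed by an affine transform) rather than hard-coding 256 constants. It is an assumption-laden sketch: the helper names gfMul, gfInv, sboxEntry and buildSbox are hypothetical, and a real GPU implementation would more likely store the precomputed table in constant or shared memory.

```cpp
#include <array>
#include <cstdint>

// Multiply two elements of GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1.
std::uint8_t gfMul(std::uint8_t a, std::uint8_t b) {
    std::uint8_t p = 0;
    for (int i = 0; i < 8; ++i) {
        if (b & 1) p ^= a;
        bool carry = (a & 0x80) != 0;
        a = static_cast<std::uint8_t>(a << 1);
        if (carry) a ^= 0x1b;            // reduce by the AES polynomial
        b >>= 1;
    }
    return p;
}

// Multiplicative inverse computed as a^254; 0 maps to 0 by convention.
std::uint8_t gfInv(std::uint8_t a) {
    if (a == 0) return 0;
    std::uint8_t r = 1;
    for (int i = 0; i < 254; ++i) r = gfMul(r, a);
    return r;
}

// Rotate an 8-bit value left by s bits (0 < s < 8).
std::uint8_t rotl8(std::uint8_t x, int s) {
    return static_cast<std::uint8_t>((x << s) | (x >> (8 - s)));
}

// One S-box entry: inverse, then the affine transform with constant 0x63.
std::uint8_t sboxEntry(std::uint8_t x) {
    std::uint8_t b = gfInv(x);
    return static_cast<std::uint8_t>(
        b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63);
}

// Full 256-entry table, as SubBytes would use it.
std::array<std::uint8_t, 256> buildSbox() {
    std::array<std::uint8_t, 256> t{};
    for (int i = 0; i < 256; ++i) t[i] = sboxEntry(static_cast<std::uint8_t>(i));
    return t;
}
```

SubBytes itself is then just state[i] = sbox[state[i]] for each of the sixteen state bytes.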

In the ShiftRows step, bytes in each row of the state are shifted cyclically to the

left. The number of places each byte is shifted differs for each row.
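
A minimal sketch of ShiftRows follows, assuming the usual AES layout in which the 16-byte state is stored column by column, so the byte in row r and column c sits at index r + 4*c. The type alias State and the function name shiftRows are hypothetical. Row r is rotated left by r positions, so row 0 is unchanged.

```cpp
#include <array>
#include <cstdint>

// 16-byte AES state, column-major: byte (row r, column c) is at index r + 4*c.
using State = std::array<std::uint8_t, 16>;

// Rotate row r of the state left by r positions (row 0 is left untouched).
State shiftRows(const State& in) {
    State out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r + 4 * c] = in[r + 4 * ((c + r) % 4)];
    return out;
}
```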

In the MixColumns step, each column of the state is multiplied with a fixed
polynomial c(x).
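
Multiplying a column by the fixed polynomial is equivalent to multiplying it by a fixed matrix whose rows are rotations of (2, 3, 1, 1) over GF(2^8). The sketch below shows this for a single column; the names xtime and mixColumn are hypothetical and the code is illustrative rather than the project's implementation.

```cpp
#include <array>
#include <cstdint>

// Multiply by 2 (i.e. by x) in GF(2^8), reducing by the AES polynomial.
std::uint8_t xtime(std::uint8_t x) {
    return static_cast<std::uint8_t>((x << 1) ^ ((x & 0x80) ? 0x1b : 0x00));
}

// Mix one 4-byte column: out[i] = 2*a[i] + 3*a[i+1] + a[i+2] + a[i+3],
// with indices taken modulo 4 and addition being XOR.
std::array<std::uint8_t, 4> mixColumn(const std::array<std::uint8_t, 4>& a) {
    std::array<std::uint8_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        std::uint8_t a0 = a[i];
        std::uint8_t a1 = a[(i + 1) % 4];
        std::uint8_t a2 = a[(i + 2) % 4];
        std::uint8_t a3 = a[(i + 3) % 4];
        out[i] = static_cast<std::uint8_t>(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3);
    }
    return out;
}
```

Applying mixColumn to each of the four columns of the state gives the full MixColumns step.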

In the AddRoundKey step, each byte of the state is combined with a byte of the
round subkey using the XOR operation (⊕).
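
AddRoundKey is the simplest of the four steps. A minimal sketch, with the hypothetical names State and addRoundKey, is shown here mainly to make clear that the step is its own inverse, which is why decryption can reuse it unchanged.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using State = std::array<std::uint8_t, 16>;  // 128-bit AES state

// XOR every state byte with the matching byte of the 128-bit round key.
void addRoundKey(State& state, const State& roundKey) {
    for (std::size_t i = 0; i < state.size(); ++i)
        state[i] ^= roundKey[i];
}
```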

Parallelization strategy

⚫ For encryption, the image file to be encrypted is divided into
numerous blocks of 128 bits.
⚫ All the rounds of AES have to be run on each of these 128-bit blocks
in parallel.
⚫ Our GPU has 3 multiprocessors with 128 CUDA cores each, i.e. 384 cores
in total (see Table 1). So we choose the number of CUDA blocks in the grid
as the number of multiprocessors on our GPU (i.e. 3).
⚫ The number of threads in each block is the number of cores (i.e. 384).
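
To make the mapping from threads to 128-bit blocks concrete, the host-side sketch below emulates a grid of 3 CUDA blocks with 384 threads each and counts how often every AES block is visited. It assumes a grid-stride loop, which is a common CUDA idiom but is not stated in this report; the function name coverage is hypothetical, and the loop variables mirror the CUDA built-ins blockIdx.x, threadIdx.x, gridDim.x and blockDim.x.

```cpp
#include <vector>

// Emulates on the CPU which 16-byte AES blocks each GPU thread would process.
// Returns, for every AES block, the number of threads that touched it.
std::vector<int> coverage(int gridDim, int blockDim, int aesBlocks) {
    std::vector<int> hits(aesBlocks, 0);
    const int stride = gridDim * blockDim;      // total threads in the grid
    for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
            // Global thread index, then step through the data grid-stride style.
            for (int i = blockIdx * blockDim + threadIdx; i < aesBlocks; i += stride)
                ++hits[i];                      // the thread would encrypt block i here
        }
    }
    return hits;
}
```

With this scheme each AES block is handled by exactly one thread, so no synchronisation between threads is needed while a block passes through its rounds.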

Implementation Details

We start by taking the image to be encrypted as one of the inputs. After this, we
input the private key which the users want to use to encrypt their images. The key
lengths accepted by the AES cipher are 128, 192 or 256 bits, depending upon the
strength of the encryption required. Longer keys require more rounds of processing
and provide better security, but they also require more processing time.
Hence, the choice of key length is left to the user. Users are required to enter a text
of their choice of any arbitrary length. This text can be anything: a paragraph, a
sentence, a word or even a single character. We then calculate the hash digest of
the entered text with a hashing algorithm chosen according to the key length
required. We used MD5 for 128-bit, SHA-192 for 192-bit and SHA-256 for 256-bit
keys. These keys are then expanded in accordance with the AES
algorithm automatically, without the intervention of the user.

We have used the OpenCV C++ library for pixel-wise encryption of images using the
cipher block chaining (CBC) mode of encryption. For this purpose, we read the image into
our program in RGB mode so that we have three channels, one each for red,
green and blue. Each channel has its own 8-bit value (0-255). Hence, we use an
unsigned char data type to represent each channel component of each pixel. These
are stored in the form of a two-dimensional matrix. AES is a block cipher
which operates upon 128 bits of input at a time. Hence, we need to divide our image
into blocks of sixteen pixel values each. This is because each pixel value is an 8-bit
unsigned character, and sixteen of them together sum to 128 bits. Padding can be used to
bring a block up to 128 bits if it contains fewer than sixteen values.
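
The division into 128-bit blocks can be sketched as follows. The report does not say which padding scheme was used, so this example assumes plain zero padding of the last block; the names Block and splitIntoBlocks are hypothetical, and the input is taken to be the raw unsigned char values of the image in row-major order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::array<std::uint8_t, 16>;  // one 128-bit AES input block

// Split raw image bytes into 16-byte blocks, zero-padding the last one.
// Assumption: zero padding; the original length must be kept to undo it.
std::vector<Block> splitIntoBlocks(const std::vector<std::uint8_t>& bytes) {
    std::vector<Block> blocks((bytes.size() + 15) / 16, Block{});
    for (std::size_t i = 0; i < bytes.size(); ++i)
        blocks[i / 16][i % 16] = bytes[i];
    return blocks;
}
```

For example, a 1280x720 image with three 8-bit channels holds 2,764,800 bytes, which is exactly 172,800 blocks, so no padding is needed in that case.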

NVIDIA CUDA provides an abstraction to the programmer. It offers a single GRID
divided into BLOCKS, each containing an equal number of THREADS. A greatly
simplified view of the CUDA platform architecture is shown in Fig. 1. CUDA
automatically maps each of these user-defined threads onto a GPU core.

Fig. 1. CUDA Grid, Blocks and Threads

For efficiency, we have taken the number of blocks in the grid as the number of
multiprocessors in the GPU. Our GPU has three multiprocessors, and hence
we made three blocks in our grid. We divide the number of pixel-blocks of the input
image by this number of blocks to obtain the number of threads per block. If the number of
threads in a block exceeds 1024, we allocate a new block to stay within this limit.
Each block of sixteen pixel values is then run on an independent thread for the
subsequent rounds of AES. This ensures maximum parallelization.
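
One possible reading of this rule is sketched below. It is an interpretation, not the project's code: the names LaunchConfig and chooseLaunch are hypothetical, and "allocating a new block" is taken to mean growing the grid until no block needs more than 1024 threads.

```cpp
#include <cassert>

// Grid dimensions for a kernel launch: <<<blocks, threadsPerBlock>>>.
struct LaunchConfig {
    long long blocks;
    int threadsPerBlock;
};

// Start with one CUDA block per multiprocessor; if that would need more than
// maxThreads threads per block, cap the threads and add blocks instead.
LaunchConfig chooseLaunch(long long aesBlocks, int multiprocessors = 3,
                          int maxThreads = 1024) {
    assert(aesBlocks > 0);
    long long blocks = multiprocessors;
    long long threads = (aesBlocks + blocks - 1) / blocks;   // ceiling division
    if (threads > maxThreads) {
        threads = maxThreads;
        blocks = (aesBlocks + maxThreads - 1) / maxThreads;
    }
    return {blocks, static_cast<int>(threads)};
}
```

Under these assumptions a 1280x720 RGB image (172,800 AES blocks) would be launched as 169 blocks of 1024 threads, while a tiny input of 300 AES blocks would use 3 blocks of 100 threads.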

After obtaining the ciphertext, we can optionally convert it into
hexadecimal digits and then represent each encrypted pixel value as a combination of
these hexadecimal digits, taking two at a time. These pixels can then be
displayed in the form of an encrypted image. This image will mostly contain
random structures. The advantage of converting the ciphertext into an image is
a reduction in the size of the encrypted file, provided a lossless image
compression technique has been used.
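
The optional hexadecimal round trip can be illustrated as below. This is only a sketch of the idea of pairing hexadecimal digits into 8-bit pixel values; the helper names toHex and fromHex are hypothetical, and writing the resulting bytes out as an image is left to the imaging library.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encode each ciphertext byte as two lowercase hexadecimal digits.
std::string toHex(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0f]);
    }
    return out;
}

// Take the digits two at a time and turn each pair back into one 8-bit value.
std::vector<std::uint8_t> fromHex(const std::string& hex) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        out.push_back(static_cast<std::uint8_t>(std::stoi(hex.substr(i, 2), nullptr, 16)));
    return out;
}
```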

The same technique in the reverse order has been used for decrypting the encrypted

image back to the original image.

Evaluation results

We processed various images of different resolutions ranging from an HD image

of size 0.3 MB to high-resolution images of size 11.6 MB using both 128-bit as

well as 256-bit keys. Fig. 2 shows the original image and the encrypted image

obtained. After that we compared the time taken for processing between GPU and

CPU. The configuration details of the machine the experiments were performed

upon are tabulated in Table 1. The timing results for images are tabulated in Table

2 for 128-bit key and Table 3 for 256-bit key. Table 4 and Table 5 list the timing

results for text files of same sizes as that of images for 128-bit and 256-bit key

respectively.

Note that these timings depict the total time taken to read an image, encrypt it,

generate and save the encrypted image, then again read the encrypted image,

decrypt it and finally output the original image.

Table 1. Configuration of the machine used for the experiments

GPU                       CPU
NVIDIA GeForce 940MX      Intel® Core™ i5 7th Generation
2 GB DDR3                 4 cores
3 Multiprocessors         8 GB DDR4 RAM
128 CUDA Cores/MP

Fig. 2. Notice how the encrypted image has random pixels and hence original

image cannot be guessed just by looking at it.

Table 2. Timing results for encrypting/decrypting images with a 128-bit key

File size  Image       GPU     CPU     Speed-up  Throughput (Mbps)
(MB)       resolution  (sec)   (sec)             GPU     CPU
11.6       7680x4320   29.42   119.63  4.1x      3.15    0.77
5.32       5120x2880   13.19   52.36   4x        3.22    0.81
3.17       3840x2160   7.40    29.67   4x        3.42    0.85
1.32       2560x1440   3.36    13.20   3.9x      3.14    0.80
0.71       1920x1080   2.02    7.44    3.7x      2.81    0.76
0.30       1280x720    0.91    3.34    3.6x      2.64    0.72

Table 3. Timing results for encrypting/decrypting images with a 256-bit key

File size  Image       GPU     CPU     Speed-up  Throughput (Mbps)
(MB)       resolution  (sec)   (sec)             GPU     CPU
11.6       7680x4320   29.52   147.63  5x        3.14    0.63
5.32       5120x2880   13.28   64.94   4.9x      3.20    0.65
3.17       3840x2160   7.58    36.71   4.8x      3.34    0.69
1.32       2560x1440   3.53    16.49   4.7x      2.99    0.64
0.71       1920x1080   1.90    9.14    4.8x      2.99    0.62
0.30       1280x720    0.91    4.09    4.5x      2.64    0.59

Table 4. Timing results for encrypting/decrypting text files with a 128-bit key

File size  GPU     CPU     Speed-up  Throughput (Mbps)
(MB)       (sec)   (sec)             GPU     CPU
11.6       29.52   147.63  5x        3.14    0.63
5.32       13.28   64.94   4.9x      3.20    0.65
3.17       7.58    36.71   4.8x      3.34    0.69
1.32       3.53    16.49   4.7x      2.99    0.64
0.71       1.90    9.14    4.8x      2.99    0.62
0.30       0.91    4.09    4.5x      2.64    0.59

Table 5. Timing results for encrypting/decrypting text files with a 256-bit key

File size  GPU     CPU     Speed-up  Throughput (Mbps)
(MB)       (sec)   (sec)             GPU     CPU
11.6       1.33    6.19    4.65x     69.77   14.99
5.32       0.66    2.86    4.33x     64.48   14.88
3.17       0.46    1.66    3.61x     55.13   15.27
1.32       0.21    0.71    3.38x     50.28   14.87
0.71       0.17    0.40    2.35x     33.41   14.20
0.30       0.12    0.16    1.33x     20.00   15.00

We observe that the CPU takes a long time to encrypt an image, while the GPU completes the same task in a fraction of that time. We can infer from Figure 1 that the difference between the GPU and CPU execution times is small in absolute terms for low-resolution images such as 1280x720 and 1920x1080. But as we encrypt/decrypt high-resolution images of the order of 5120x2880 and 7680x4320, the time difference becomes significant. Another interesting observation is that the time taken on the GPU seems to be independent of the key size, i.e. there is little difference between 128-bit and 256-bit key encryption. On the other hand, for the CPU-only implementation the time taken for encryption/decryption increases significantly when a 256-bit key is used in place of a 128-bit key. For the 5120x2880 image, for example, the GPU time changes only from 13.19 s to 13.28 s, while the CPU time rises from 52.36 s to 64.94 s.
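The speed-up and throughput columns in the tables above can be recomputed from the file size and the two timings. The following C++ sketch shows one way to do that arithmetic. The function names are hypothetical (they are not taken from the project code), and the conversion 1 MB = 8 Mb is an assumption, chosen because it matches the tabulated values.

```cpp
#include <cassert>

// Throughput in megabits per second.
// Assumption: 1 MB is counted as 8 Mb, which matches the tabulated values.
double throughput_mbps(double size_mb, double seconds) {
    assert(seconds > 0.0);
    return size_mb * 8.0 / seconds;
}

// Speed-up of the GPU run relative to the CPU-only run.
double speedup(double cpu_seconds, double gpu_seconds) {
    assert(gpu_seconds > 0.0);
    return cpu_seconds / gpu_seconds;
}
```

For the 7680x4320 row (11.6 MB, GPU 29.52 s, CPU 147.63 s), `throughput_mbps(11.6, 29.52)` evaluates to about 3.14, `throughput_mbps(11.6, 147.63)` to about 0.63, and `speedup(147.63, 29.52)` to about 5.0, in agreement with the table.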

Performance parameters considered

Fig. 3. GPU vs CPU: Time comparison for different resolution images for both key sizes

Fig. 4. GPU vs CPU: Time comparison for different text files for both key sizes

Conclusion

We have presented a method to accelerate AES image encryption/decryption using the GPU. We measured the timing results for images of different resolutions and compared them with a CPU-only computation. The speed-up achieved on the GPU is around four times with respect to the CPU. We also observed that the encryption/decryption time of an image appears to be independent of the key size when the encryption is done in parallel on the GPU, whereas it increases significantly with a 256-bit key when the computation is done serially on the CPU. In addition, the speed-up of the GPU over the CPU grows with the resolution of the image and with the key length, i.e. with a 256-bit key instead of a 128-bit key. Hence, the user need not compromise the security of the image by choosing a smaller key in order to encrypt it in a shorter time.
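One way to put the key-size observation in context is to compare the measured slowdown with the growth in cipher work. AES uses 10 rounds with a 128-bit key and 14 rounds with a 256-bit key, so a purely serial implementation could be expected to slow down by a factor of up to roughly 1.4. The C++ sketch below makes that comparison. The function names are hypothetical, and pairing a row of one image table with the same row of the other assumes that the two tables correspond to the 128-bit and 256-bit keys respectively.

```cpp
#include <cassert>

// Number of AES rounds for a given key length: Nr = Nk + 6,
// where Nk is the key length in 32-bit words (10, 12, or 14 rounds).
constexpr int aes_rounds(int key_bits) {
    return key_bits / 32 + 6;
}

// Ratio of round counts between two key sizes: a rough estimate of how much
// more cipher work the longer key requires per block.
constexpr double round_ratio(int long_key_bits, int short_key_bits) {
    return static_cast<double>(aes_rounds(long_key_bits)) /
           static_cast<double>(aes_rounds(short_key_bits));
}

// Measured slowdown when switching from the short key to the long key.
double observed_ratio(double seconds_long_key, double seconds_short_key) {
    assert(seconds_short_key > 0.0);
    return seconds_long_key / seconds_short_key;
}
```

With the 5120x2880 timings, `observed_ratio(64.94, 52.36)` is about 1.24 for the CPU and `observed_ratio(13.28, 13.19)` is about 1.01 for the GPU, against `round_ratio(256, 128)` of 1.4. The sketch only restates the measurements; it does not explain why the GPU time stays nearly flat.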

In future work, we aim to try the same parallel implementation on other parallel architectures and compare their performance. We also aim to reduce the space complexity of our implementation. With the increasing sharing of multimedia over social networking sites, efficient encryption techniques are also needed for the secure transfer of other media types such as audio and video. Finally, the method described in this paper works on the GPUs of personal computers; we aim to extend this technique to the GPUs of modern smartphones.

