
Accelerating Image Encryption with AES using GPU: A Quantitative Analysis


Submitted for the course: Parallel and Distributed Computing (CSE4001)



Aryan Saxena 16BCE0022
Rajdeepa Chakrabarty 16BCE0732
Vatsal Agrawal 16BCB0090
S. Shubhjeet Singh 16BCE2315

Slot: B1

Name of faculty: DR. SAIRA BANU


October, 2018
Images play an important role in the modern world, and their confidentiality, integrity and
authentication matter as much as those of text. Encrypting and decrypting images, however,
differs significantly from handling plain text because of their large size and the redundancy
of their pixels. With the rise of powerful GPUs containing thousands of efficient cores, some
cryptographic algorithms that were once thought secure can now be broken in a matter of
seconds. Earlier, increasing the complexity of an algorithm meant a longer processing time
for encryption as well as decryption. With the advent of these GPUs and the evolution of GPU
computing, that processing time has been reduced to a fraction of what it used to be. This
paper presents a parallel implementation of AES using NVIDIA CUDA and OpenCV to encrypt
images rapidly. We achieved an average speed-up of about four times on the GPU compared to
a CPU-only implementation for both 128-bit and 256-bit keys.

Keywords: AES, GPU, Image Encryption, Parallelization, CUDA.

Parallel programming is now exploited in many areas of computer science to improve
performance, and there is an ever-growing demand to adapt more algorithms to parallel
computing. Moreover, the GPU is no longer restricted to graphical computation; it is also
used for general-purpose computing. This gives us the freedom to use the GPU in new
applications to reduce processing time.

Security of data is a prime concern of this era, and data transmission needs much stronger
protection than it did in previous generations. AES (Advanced Encryption Standard) is one
of the most popular and extensively used algorithms in cryptography. It is considered
secure enough to be used by the US government for transmitting classified information. It
consists of several rounds of permutations and substitutions, the number depending on the
key size: 10 rounds for 128-bit, 12 for 192-bit and 14 for 256-bit keys. Security increases
with key size. Applying AES to an image differs considerably from applying it to general
text because of the image's large size and the various lossy or lossless compression
techniques used on images. An image can be viewed as a two-dimensional matrix of integers
whose range depends on the bit depth of the image. Higher-resolution images have a large
number of pixels and therefore take a long time to process. Hence, their encryption has to
be implemented efficiently on a GPU.

An advantage of our methodology is that the encrypted file can be shared in image form.
Without the key, it is computationally infeasible (at the time of writing this paper) to
recover the contents of the image, as the encrypted image is just a random pattern of
different colours. The next section presents a brief review of the methods researchers have
used in the past for encryption/decryption of text and images. We then present our proposed
method in detail, followed by the observations and results in the subsequent sections.
Literature Review:

Each entry below gives the title, authors, journal name and date, key concepts, and the
advantages, disadvantages and future enhancement.

AES Encryption algorithms on text

1. Title: Performance Improvement of Advanced Encryption Algorithm using Parallel Computation
   Authors: M. Nagendra and M. Chandra Sekhar
   Journal and date: International Journal of Software Engineering and Its Applications, Vol. 8, No. 2
   Key concepts:
   1. Implementation of the AES (Advanced Encryption Algorithm) cryptography algorithm on a dual-core processor using the OpenMP API.
   2. Reduces the execution time.
   Advantages, disadvantages and future enhancement:
   1. It takes 40-45% less time for encryption and decryption than the sequential implementation.
   2. A parallel implementation using a GPU would have taken less time; this is to be implemented in future.

2. Title: A Design of a Fast Parallel-Pipelined Implementation of AES: Advanced Encryption Standard
   Authors: Ghada F. Elkabbany, Heba K. Aslan and Mohamed N. Rasslan
   Journal and date: International Journal of Computer Science & Information Technology (IJCSIT), Vol 6, No 6, December 2014
   Key concepts:
   1. Used eleven stages of pipelining in order to exploit the sources of parallelism in both the initial and final round.
   2. Combines pipelining of rounds and parallelization of Mix_Column and
   Advantages, disadvantages and future enhancement:
   1. Pipelining improves both encryption and decryption by approximately 95%.

3. Title: Parallel Implementation of AES Using XTS Mode of Operation
   Authors: Muna Shrestha
   Journal and date: Lecture Notes on Information Theory, Vol. 1, No. 4, December 2013
   Key concepts:
   1. The paper describes an implementation of AES using XTS mode in parallel via MPI.
   Advantages, disadvantages and future enhancement:
   1. The algorithm performed correct encryption efficiently, with a speedup of 1.6 and 37% efficiency.
   2. Future work aims at increasing the number of processing elements and the data size at a constant rate.

4. Title: AES on GPU using CUDA
   Authors: Tomoiagă Radu Daniel, Stratulat Mircea
   Journal and date: International Journal of Information Sciences and Techniques (IJIST), Vol. 2, No. 6, November
   Year cited: 2014
   Key concepts:
   1. The paper describes an implementation of AES on an NVIDIA GPU using CUDA.
   Advantages, disadvantages and future enhancement:
   1. Since the data is stored in the local memory of the GPU, this implementation offers speedups of almost 40 times in comparison to the CPU.
   2. Future work aims to implement and test AES 192 and AES 256 on GPU, encryption and decryption, and OpenSSL adaptation to use these

5. Title: An ultra-high throughput and fully pipelined implementation of AES algorithm on FPGA
   Authors: Abolfazl Soltani, Saeed Sharifian
   Journal and date: Microprocessors and Microsystems 39 (2015) 480–493
   Key concepts:
   1. Area-delay efficient multipliers and multiplicative inverters in GF(2^8).
   2. Loop-unrolling, fully pipelining and sub-pipelining techniques employed in all proposed methods.
   3. Inserted registers of pipelining in
   Advantages, disadvantages and future enhancement:
   1. This implementation achieves a high throughput of 260.15 Gbps.
   2. A maximum operational frequency of 508.104 MHz is also achieved.

6. Title: Implementation and Analysis of AES Encryption on GPU
   Authors: Qinjian Li, Chengwen Zhong
   Journal and date: International Journal of Networking and Computing, 2015
   Key concepts:
   1. Implementation of the Electronic Codebook (ECB) mode encoding process and the Cipher Feedback (CBC) mode decoding process with CUDA on GPU.
   2. T-boxes are allocated on on-chip shared
   3. One single thread handles a 16-byte AES block.
   Advantages, disadvantages and future enhancement:
   1. A very high performance of around 60 Gbps throughput on an NVIDIA Tesla C2050 GPU, which is 50 times faster than a sequential implementation.

7. Title: Accelerating encryption/decryption using GPUs for AES algorithm
   Authors: Sanjanaashree P
   Journal and date: International Journal of Engineering Research, Volume 4, Issue 2, February-
   Year cited: 2013
   Key concepts:
   1. Implementation using the CUDA platform.
   Advantages, disadvantages and future enhancement:
   1. Achieves great speed. If the amount of data is large, the encryption/decryption time required is greatly reduced when it runs on a graphics processing environment.
   2. Future work aims at efficient implementations of other common symmetric and asymmetric algorithms. Implementations of hashing and public key algorithms on GPU may also be done.

8. Title: Improved Software Implementation of DES Using CUDA and OpenCL
   Authors: D. Noer, A.P. Engsig-Karup, E. Zenner
   Journal and date: International Journal of Scientific & Engineering Research, 2014
   Key concepts:
   1. Fast DES bitslice brute-force software tool utilizing consumer Graphics Processing Units (GPUs).
   Advantages, disadvantages and future enhancement:
   1. High performance computing.
   2. Future work aims at improvements in performance on AMD GPUs, which will require an implementation of the model in OpenCL.

Image Encryption

9. Title: Image Encryption Using Genetic Algorithm
   Authors: Roza Afarin; Saeed Mozaffari
   Journal and date: 2013 8th Iranian Conference on Machine Vision and Image Processing (MVIP)
   Key concepts: The paper proposes an improved version of the Genetic Algorithm for image encryption. In the initial steps the image is randomly and then divided into four parts. Further, crossover and mutation are applied. If entropy of the final result randomness are obtained through coefficient and
   Advantages, disadvantages and future enhancement: The results obtained from the analysis show that the algorithm can be applied successfully to any type of image. The result is sensitive to the key and to distortions in the image. Further, there can be an attempt to increase the cipher value to get a better result.

10. Title: Pixel Based Image Encryption Using Magic Square
   Authors: S. Sowmiya; I. Monica Tresa; A. Prabhu Chakkaravarthy
   Journal and date: 2012 International Conference on Control Engineering and Communication Technology
   Key concepts: To avoid data redundancy, the pixel and magic square methods have been combined to give a new algorithm that is also named the Pan Magic Square method. The plain text is divided into pixels and a total of 64 keys are The encrypted result is very different when compared to the text, so it can easily be transmitted over the internet.
   Advantages, disadvantages and future enhancement: More secure than PMS because of the 64 keys. It overlooks security while changing to RGB format and can easily be attacked. Further, an attempt to minimize the time and space complexity can be made in future.

11. Title: A New Duffing-Lorenz Chaotic Algorithm and Its Application in Image Encryption
   Authors: Jianming Liu; Huijing Lv
   Journal and date: 2012 International Conference on Control Engineering and Communication Technology
   Key concepts: An attempt is made to modify the Duffing and Lorenz chaotic algorithm because it is very simple and cannot be used for any encryption cryptography. It is 6-dimensional and more complex.
   Advantages, disadvantages and future enhancement: From Lyapunov exponent simulation we learn that the improved version has a better chaotic feature. A newly designed dynamic mapping is required for the application of this version. Further, since it is 6-dimensional, we can simplify it to 4 dimensions for ease of calculation.

12. Title: A Novel Image Encryption Algorithm Based On Fractional Fourier Transform and Magic Cube Rotation
   Authors: Xiao Feng, Xiaolin Tian, Shaowei Xia
   Journal and date: 2012 4th International Congress on Image and Signal Processing
   Key concepts: This paper introduces a method based on fractional Fourier equations for the purpose of image encryption. Further, magic cube rotation scrambling is also used in this paper.
   Advantages, disadvantages and future enhancement: In comparing the encrypted image and the decrypted image, the performance is very high when the Fourier transform is used. This algorithm has a large key space, which could be vulnerable to attacking agents. Further, an attempt can be made to make it safer from attackers and enhance the security of the data.

13. Title: Attack to an Image Encryption Algorithm based on Improved Chaotic Cat Maps
   Authors: Shuai Ren, Chengshi Gao, Qing Dai, XiaoFei Fei
   Journal and date: 2012 3rd International Congress on Image and Signal Processing (CISP2012)
   Key concepts: This paper addresses the security concerns of image encryption using chaotic maps. It shows how the decryption of the image can be done by the proposed key solved algorithms very easily. Hence it proves the vulnerability of the previous
   Advantages, disadvantages and future enhancement: The leaking of information can be traced from the proposed algorithm to ensure the security of the image transmission. The security of the image is based on the encryption algorithm; any attempt to compromise it will make it vulnerable to attackers. In future, a key generation algorithm may be used to enhance the overall security of the image.

14. Title: Design and Realization of Image Encryption System Based on SMS4 Commercial Cipher Algorithm
   Authors: Zhang Lei, Li Li, Gao Xianwei
   Journal and date: 2012 4th International Congress on Image and Signal Processing
   Key concepts: This paper presents an image encryption algorithm derived from the SMS4 commercial cipher algorithm. The proposed algorithm ensures encryption, decryption and safe transmission of the image. The drawn is that grey level images are more
   Advantages, disadvantages and future enhancement: The security of the image is enhanced, and gray-level pixels produce better results. SMS4 is issued by the State Secrets Administration. Further, an attempt can be made to design commercial cipher products.

AES encryption on CUDA

15. Title: Implementation and Analysis of AES Encryption on GPU
   Authors: Qinjian Li, Chengwen Zhong
   Journal and date: 2012 IEEE 14th International Conference on High Performance Computing and Communications
   Key concepts: CUDA; GPU; AES; Electronic Codebook; Cipher Feedback; parallel computing
   Advantages, disadvantages and future enhancement: AES encryption and decryption show significant performance enhancements. The bandwidth of the PCI-E bus and page-locked memory allocation costs are vital limitations; they greatly reduce the throughput of encryption and decryption. Future work may investigate other common encryption

16. Title: High Performance CUDA AES Implementation: A Quantitative Performance Analysis Approach
   Authors: Ahmed M. Mousa, Ahmed A. Abdelrahman, Mohamed M. Fouad and Hisham Dahshan
   Journal and date: IEEE Computing Conference, 18-20 July 2017
   Key concepts: AES; ECB; CUDA; GPU; Rijndael; Throughput; Granularity; Performance optimization; Encryption
   Advantages, disadvantages and future enhancement: Workload distribution over threads and thread blocks to gain higher performance. Some of the optimizations, such as parallel granularity tweaking, had no effect on the older platforms. Development of an auto-tuner that selects the best configuration parameters based on the GPU architecture.

17. Title: AES-128 ECB Encryption on GPUs and Effects of Input Plaintext Patterns on Performance
   Authors: A. H. Khan, M. A. Al-Mouhamed, A. Almousa, A. Fatayar, A. R. Ibrahim, and A. J. Siddiqui
   Journal and date: IEEE SNPD 2014, June 30-July 2, 2014
   Key concepts: Advanced Encryption Standard (AES); Parallel Encryption; CUDA based Cipher
   Advantages, disadvantages and future enhancement: Advantage: input plaintext pattern more random and less repetitive. Speed-up of 87 times compared to a modern CPU. Execution time increased with more random input plaintext. In future, traditional techniques such as loop unrolling can be applied to improve performance.

18. Title: Batch Processing of Multi-Variant AES Cipher with GPU
   Authors: Mohanaraj Patchappen, Yaszrina Mohd. Yassin, Ettikan K. Karuppiah
   Journal and date: 2015 IEEE, ISBN: 978-1-4799-6211-2
   Key concepts: Batching, AES, GPU, CUDA, Parallelization
   Advantages, disadvantages and future enhancement: Higher throughput when processing data sizes larger than 512 MB. Greater performance than a multi-core CPU at all data sizes. In future, support for cipher processing using a 192-bit key size and other parallelizable modes of operation, like CFB decryption, can be added.

19. Title: Implementation and Evaluation of Different Parallel Designs of AES Using CUDA
   Authors: Jianwei Ma, Xiaojun Chen, Rui Xu, Jinqiao Shi
   Journal and date: 2017 IEEE Second International Conference on Data Science in Cyberspace
   Key concepts: AES, GPU, CBC-AES, CUDA, AES-NI
   Advantages, disadvantages and future enhancement: Shows that a GPU can accelerate AES to a large extent when the input data is large. 112 times faster than standard AES. In future, the data transfer overhead can be optimized and other encryption algorithms can be implemented on GPU.

20. Title: Side-Channel Power Analysis of a GPU AES Implementation
   Authors: Chao Luo, Yunsi Fei, Pei Luo, Saoni Mukherjee, David Kaeli
   Journal and date: IEEE, 2015, 33rd ICCD
   Key concepts: Graphics processing units, Instruction sets, AES Encryption, Hardware, Power demand, Registers
   Advantages, disadvantages and future enhancement: Various challenges of power analysis on a GPU are highlighted. Successful correlation power analysis. In future, it can measure the GPU's electromagnetic emission signals in a non-invasive manner and assess the current work with EM signals. The attack method can be improved further.

21. Title: Parallel Speculative Encryption of Multiple AES Contexts on GPUs
   Authors: Wagner M. Nunan Zola, Luis Carlos Erpen De Bona
   Journal and date: 2012 IEEE
   Key concepts: Heterogeneous Computing, Scalable GPU parallel algorithm, Speculative encryption, AES
   Advantages, disadvantages and future enhancement: The proposed techniques show high scalability features to work with multiple encryption contexts in GPU. Accomplishment of low latency. In future, envision other cloud computing applications using scalable encryption, as well as accelerate secure communication services.

22. Title: Acceleration of AES encryption with OpenCL
   Authors: Yuheng Yuan; Zhenzhong He; Zheng Gong; Weidong Qiu
   Journal and date: 2014 Ninth Asia Joint Conference on Information Security
   Key concepts: Graphics processing units, Instruction sets, Encryption, Resource management, Parallel processing, Computational modeling, Throughput
   Advantages, disadvantages and future enhancement: I/O is the bottleneck of performance when using a GPU for parallel processing, and the throughput rate does not include I/O overheads. Future work can experiment with various techniques to improve I/O efficiency and finally improve the overall efficiency of parallelization, test the scheme on other algorithms, and apply parallel computing to cryptanalysis.

Image encryption on CUDA

23. Title: Fast selective encryption method for bitmaps based on GPU acceleration
   Authors: Han Qiu, Gerard Memmi
   Journal and date: 2014 IEEE International Symposium on Multimedia
   Key concepts: image selective encryption; GPGPU; DCT
   Advantages, disadvantages and future enhancement: Provides a fast implementation of selective encryption in the frequency domain for uncompressed images such as the bitmap format. The performance gain obtained is widely usable. In future work, research on parallelizing AES on GPU and evaluating the performance gain will be done. Furthermore, an adaptor can be designed to auto-configure hardware (for multiple GPU environments) in order to accelerate the SE method by intelligently allocating calculation tasks.

24. Title: 1D Chaos-based Image Encryption Acceleration by using GPU
   Authors: Leila Habibpour, Shamim Yousefi, Mina Zolfy Lighvan and Hadi S. Aghdasi
   Journal and date: Indian Journal of Science and Technology, Vol 9(6), February 2016
   Key concepts: Accelerating Image Encryption Process, Graphic Processing Unit (GPU), One-Dimension (1D) Chaos-based Image Encryption Algorithm, Parallel Data Processing, Parallel Task Processing (PARFOR)
   Advantages, disadvantages and future enhancement:
   1. The execution time of the 1D chaos-based image encryption algorithm improved in parallel implementations for huge colored images.
   2. The execution time of the algorithm improved by 93.28% when a combination of CPU and GPU is used.

25. Title: New processing of chaos-based fast image encryption algorithms
   Authors: Waleed Khalid Abduljabbar, Syariza Abdul-Rahman, Razamin Ramli
   Journal and date: Journal of Theoretical and Applied Information Technology, 15th June 2017
   Key concepts:
   1. Evaluates encryption techniques using two-dimensional chaotic maps.
   2. Compares the strengths of the encryption algorithms in security and time responses for many images of different sizes.
   Advantages, disadvantages and future enhancement:
   1. The time complexity is reduced from quadratic O(n^2) to constant O(1) time.
   2. The main bottleneck is the encryption key generation algorithm, since chaos-based encryption algorithms depend on the principle of a feedback system where current values depend upon the previous values. As a result they can't be made

Description of the parallel platform of interest

We have used CUDA for the implementation of the proposed algorithm.

CUDA is a parallel computing platform and application programming interface (API)
model created by Nvidia. It allows software developers and software engineers to use
a CUDA-enabled graphics processing unit (GPU) for general-purpose processing,
an approach termed GPGPU (General-Purpose computing on Graphics Processing
Units). The CUDA platform is a software layer that gives direct access to the
GPU's virtual instruction set and parallel computational elements, for the execution
of compute kernels.

The CUDA platform is designed to work with programming languages such as C,
C++, and Fortran. This accessibility makes it easier for specialists in parallel
programming to use GPU resources, in contrast to prior APIs like Direct3D and
OpenGL, which required advanced skills in graphics programming. Also, CUDA
supports programming frameworks such as OpenACC and OpenCL.


CUDA has several advantages over traditional general-purpose computation on
GPUs (GPGPU) using graphics APIs:

• Scattered reads – code can read from arbitrary addresses in memory

• Unified virtual memory (CUDA 4.0 and above)

• Unified memory (CUDA 6.0 and above)

• Shared memory – CUDA exposes a fast shared memory region that can be
shared among threads. This can be used as a user-managed cache, enabling
higher bandwidth than is possible using texture lookups.

• Faster downloads and readbacks to and from the GPU

• Full support for integer and bitwise operations, including integer texture
Steps to install CUDA 10 on Ubuntu 18.04

Step 1) Get Ubuntu 18.04 installed!

Step 2) Install the key:

sudo apt-key adv --fetch-keys


Step 3) Add the repository:

sudo bash -c 'echo "deb

ntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list'

Step 4) Update the apt cache:

sudo apt update

Step 5) Install the NVIDIA driver:

sudo apt install nvidia-driver-410

Step 6) Install CUDA 10.0:

sudo apt install cuda-10-0

Step 7) Add the following lines to your ~/.profile file for CUDA 10.0:
# set PATH for cuda 10.0 installation
if [ -d "/usr/local/cuda-10.0/bin/" ]; then

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}

export LD_LIBRARY_PATH=/usr/local/cuda-

Step 8) Reboot the computer

Step 9) Check the NVIDIA CUDA compiler with nvcc --version

Step 10) Check the NVIDIA driver with nvidia-smi

Step 11) Sample program



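#include <cstdio>

// Runs on the host (CPU).
void fromCPU()
{
    printf("Hello from CPU\n");
}

// Kernel: runs on the device (GPU), once per launched thread.
__global__ void fromGPU()
{
    printf("Hello from GPU\n");
}

int main()
{
    fromCPU();
    fromGPU<<<2, 3>>>();        // 2 blocks of 3 threads each
    cudaDeviceSynchronize();    // wait for the kernel so its output is flushed
    return 0;
}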

Algorithm explanation and identifying the areas of parallelism.

We implement the AES algorithm for image encryption.

What is the AES algorithm?

The Advanced Encryption Standard (AES), also known by its original name
Rijndael, is a specification for the encryption of electronic data established by the
U.S. National Institute of Standards and Technology (NIST) in 2001.

AES is a subset of the Rijndael block cipher developed by two Belgian
cryptographers, Vincent Rijmen and Joan Daemen, who submitted a proposal to
NIST during the AES selection process. Rijndael is a family of ciphers with
different key and block sizes.

For AES, NIST selected three members of the Rijndael family, each with a block
size of 128 bits, but three different key lengths: 128, 192 and 256 bits.

Why the AES algorithm?

DES has a small key size, which makes it less secure. Triple DES was introduced
to overcome this, but it turned out to be slower. Hence, AES was later introduced
by the National Institute of Standards and Technology.

A basic difference between DES and AES is that in DES the plaintext block is
divided into two halves before the main algorithm starts, whereas in AES the entire
block is processed to obtain the ciphertext.

Advantages of AES over 3DES:

• AES is more secure (it is less susceptible to cryptanalysis than 3DES).

• AES supports larger key sizes than 3DES's 112 or 168 bits.

• AES is faster in both hardware and software.

How does AES encryption work?

AES comprises three block ciphers: AES-128, AES-192 and AES-256. Each
cipher encrypts and decrypts data in blocks of 128 bits using cryptographic keys of
128, 192 and 256 bits, respectively. The Rijndael cipher was designed to accept
additional block sizes and key lengths, but for AES, those functions were not
adopted.
Steps Involved in AES Algorithm:

The key size used for an AES cipher specifies the number of transformation rounds
that convert the input, called the plaintext, into the final output, called the
ciphertext. The number of rounds is as follows:

• 10 rounds for 128-bit keys.

• 12 rounds for 192-bit keys.

• 14 rounds for 256-bit keys.

Each round consists of several processing steps, including one that depends on the
encryption key itself. A set of reverse rounds is applied to transform the ciphertext
back into the original plaintext using the same encryption key.

1. KeyExpansion: round keys are derived from the cipher key using
   Rijndael's key schedule. AES requires a separate 128-bit round key block
   for each round plus one more.

2. Initial round key addition:

   1. AddRoundKey: each byte of the state is combined with a byte of
      the round key using bitwise XOR.

3. 9, 11 or 13 rounds:

   1. SubBytes: a non-linear substitution step where each byte is
      replaced with another according to a lookup table.

   2. ShiftRows: a transposition step where the last three rows of the
      state are shifted cyclically a certain number of steps.

   3. MixColumns: a linear mixing operation which operates on the
      columns of the state, combining the four bytes in each column.

   4. AddRoundKey

4. Final round (making 10, 12 or 14 rounds in total):

   1. SubBytes

   2. ShiftRows

   3. AddRoundKey
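
The ordering of steps in the list above can be made concrete with a small C++ sketch that only enumerates the step names for one 128-bit block; it performs no encryption. The helper names `aesStepSequence` and `roundKeysNeeded` are ours, introduced for illustration, and the round count uses the relation rounds = key words + 6, which reproduces the 10, 12 and 14 rounds listed earlier.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Ordered list of AES encryption steps for one 128-bit block.
// keyBits must be 128, 192 or 256; KeyExpansion is assumed to have run already.
std::vector<std::string> aesStepSequence(int keyBits)
{
    assert(keyBits == 128 || keyBits == 192 || keyBits == 256);
    const int rounds = keyBits / 32 + 6;          // 10, 12 or 14

    std::vector<std::string> steps;
    steps.push_back("AddRoundKey");               // initial round key addition
    for (int r = 1; r < rounds; ++r) {            // 9, 11 or 13 full rounds
        steps.push_back("SubBytes");
        steps.push_back("ShiftRows");
        steps.push_back("MixColumns");
        steps.push_back("AddRoundKey");
    }
    steps.push_back("SubBytes");                  // final round: no MixColumns
    steps.push_back("ShiftRows");
    steps.push_back("AddRoundKey");
    return steps;
}

// Round keys consumed = number of AddRoundKey steps = rounds + 1.
int roundKeysNeeded(int keyBits)
{
    int n = 0;
    for (const std::string& s : aesStepSequence(keyBits))
        if (s == "AddRoundKey") ++n;
    return n;
}
```

For a 128-bit key the sequence has 40 steps and 11 AddRoundKey operations, which matches the requirement of one round key per round plus one more.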

In the SubBytes step, each byte in the state is replaced with its entry in a fixed 8-bit
lookup table (the S-box).
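
The lookup table used by SubBytes does not have to be typed in by hand: each entry is the multiplicative inverse of the input byte in GF(2^8), followed by a fixed affine transformation. The sketch below rebuilds the table that way. The function names are illustrative, and a practical implementation would normally store the precomputed table instead of recomputing it.

```cpp
#include <array>
#include <cstdint>

// Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
std::uint8_t gfMul(std::uint8_t a, std::uint8_t b)
{
    std::uint8_t p = 0;
    for (int i = 0; i < 8; ++i) {
        if (b & 1) p ^= a;
        const bool carry = (a & 0x80) != 0;
        a = static_cast<std::uint8_t>(a << 1);
        if (carry) a ^= 0x1B;
        b >>= 1;
    }
    return p;
}

// Build the 256-entry S-box: multiplicative inverse, then affine transform.
std::array<std::uint8_t, 256> buildSBox()
{
    std::array<std::uint8_t, 256> sbox{};
    for (int x = 0; x < 256; ++x) {
        std::uint8_t inv = 0;                      // 0 has no inverse and maps to 0
        for (int c = 1; c < 256; ++c) {
            if (gfMul(static_cast<std::uint8_t>(x), static_cast<std::uint8_t>(c)) == 1) {
                inv = static_cast<std::uint8_t>(c);
                break;
            }
        }
        std::uint8_t s = inv;
        for (int k = 1; k <= 4; ++k)               // XOR with inv rotated left by 1..4 bits
            s ^= static_cast<std::uint8_t>((inv << k) | (inv >> (8 - k)));
        sbox[x] = static_cast<std::uint8_t>(s ^ 0x63);
    }
    return sbox;
}

// SubBytes: replace every byte of the 16-byte state by its S-box entry.
void subBytes(std::array<std::uint8_t, 16>& state,
              const std::array<std::uint8_t, 256>& sbox)
{
    for (std::uint8_t& b : state) b = sbox[b];
}
```

For example, the table built this way maps 0x00 to 0x63 and 0x53 to 0xed.
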
In the ShiftRows step, bytes in each row of the state are shifted cyclically to the
left. The number of places each byte is shifted differs for each row.
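
A minimal sketch of ShiftRows, assuming the 16-byte state is stored column by column as in FIPS 197, so that the byte in row r and column c sits at index r + 4c (the function name is illustrative):

```cpp
#include <array>
#include <cstdint>

// ShiftRows on a column-major 16-byte state.
// Row r is rotated left by r positions, so row 0 is left unchanged.
std::array<std::uint8_t, 16> shiftRows(const std::array<std::uint8_t, 16>& in)
{
    std::array<std::uint8_t, 16> out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r + 4 * c] = in[r + 4 * ((c + r) % 4)];
    return out;
}
```

With the input bytes 0, 1, ..., 15, the output order is 0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11.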

In the MixColumns step, each column of the state is multiplied with a fixed
polynomial c(x).
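
For AES the fixed polynomial is c(x) = 3x^3 + x^2 + x + 2, and multiplying a column by it modulo x^4 + 1 reduces to the byte-wise formula in the sketch below. The function names are illustrative; `xtime` is the customary name for multiplication by 2 in GF(2^8).

```cpp
#include <array>
#include <cstdint>

// Multiply by x (i.e. by 2) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
std::uint8_t xtime(std::uint8_t a)
{
    return static_cast<std::uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

// MixColumns on a single 4-byte column.
std::array<std::uint8_t, 4> mixColumn(const std::array<std::uint8_t, 4>& a)
{
    std::array<std::uint8_t, 4> b{};
    for (int i = 0; i < 4; ++i) {
        const std::uint8_t a0 = a[i];
        const std::uint8_t a1 = a[(i + 1) % 4];
        const std::uint8_t a2 = a[(i + 2) % 4];
        const std::uint8_t a3 = a[(i + 3) % 4];
        // b_i = 2*a_i + 3*a_(i+1) + a_(i+2) + a_(i+3), with + meaning XOR
        b[i] = static_cast<std::uint8_t>(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3);
    }
    return b;
}
```

A commonly used check is the column db 13 53 45, which is transformed to 8e 4d a1 bc.
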
In the AddRoundKey step, each byte of the state is combined with a byte of the
round subkey using the XOR operation (⊕).
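
AddRoundKey is a plain byte-wise XOR, which also makes it its own inverse: applying the same round key twice restores the state. A minimal sketch (the function name is illustrative):

```cpp
#include <array>
#include <cstdint>

// AddRoundKey: XOR every state byte with the matching round-key byte.
void addRoundKey(std::array<std::uint8_t, 16>& state,
                 const std::array<std::uint8_t, 16>& roundKey)
{
    for (int i = 0; i < 16; ++i)
        state[i] ^= roundKey[i];
}
```

This self-inverse property is why decryption can reuse the same routine with the round keys taken in reverse order.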

Parallelization strategy

⚫ For encryption, the image file to be encrypted is divided into
numerous blocks of 128 bits.

⚫ All the rounds of AES have to be run on each of these 128-bit blocks.

⚫ The NVIDIA GeForce 940MX has three multiprocessors with 128 CUDA cores
each, i.e. 384 cores in total (see Table 1). So we choose the number of CUDA
blocks in the grid as the number of multiprocessors on our GPU (i.e. 3).

⚫ The number of threads in each block is the number of cores (i.e. 384).

⚫ A similar process is followed for decryption.
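
One way to map the 128-bit blocks onto a fixed grid of 3 blocks of 384 threads is a grid-stride loop, in which each thread handles every 1152nd block. The report does not state how blocks are assigned to threads, so this scheme is an assumption. It is simulated here on the host in plain C++ rather than written as a CUDA kernel, it assumes each 128-bit block can be processed independently, and `simulateLaunch` with its parameter names is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Host-side simulation of a launch with gridDim blocks of blockDim threads.
// Each simulated thread walks the AES blocks in a grid-stride pattern and
// counts every block it would encrypt.
std::vector<int> simulateLaunch(std::size_t aesBlocks, int gridDim = 3, int blockDim = 384)
{
    std::vector<int> timesProcessed(aesBlocks, 0);
    const std::size_t totalThreads = static_cast<std::size_t>(gridDim) * blockDim;
    for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
            const std::size_t tid = static_cast<std::size_t>(blockIdx) * blockDim + threadIdx;
            for (std::size_t i = tid; i < aesBlocks; i += totalThreads)
                ++timesProcessed[i];   // a kernel would encrypt block i here
        }
    }
    return timesProcessed;
}
```

Every entry of the returned vector is 1, meaning each 128-bit block is visited exactly once regardless of whether there are more or fewer blocks than threads.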

Implementation Details
We start by taking the image to be encrypted as one of the inputs. We then take
the secret key with which the users want to encrypt their images. The key
lengths accepted by the AES cipher are 128, 192 or 256 bits, depending upon the
strength of the encryption required. Longer keys require more rounds of processing,
which provides better security at the cost of a higher processing time.
Hence, the choice of key length is left to the user. Users enter a text of their
choice of any length: a paragraph, a sentence, a word or even a single character.
We then calculate the hash digest of the entered text with a hashing algorithm
chosen according to the required key length. We used MD5 for 128-bit, SHA-192
for 192-bit and SHA-256 for 256-bit keys. These keys are then expanded in
accordance with the AES algorithm automatically, without the intervention of the user.

We have used the OpenCV C++ library for pixel-wise encryption of images using the
cipher block chaining (CBC) mode of encryption. For this purpose, we read the image
into our program in RGB mode so that we have three channels, one each for red,
green and blue. Each has its own 8-bit value (0-255). Hence, we use an
unsigned char data type to represent each channel component of each pixel. These
are stored in the form of a two-dimensional matrix. AES is a block cipher
which operates upon 128 bits of input at a time. Hence, we need to divide our image
into blocks of sixteen pixels each. This is because each pixel value is an 8-bit
unsigned character and sixteen of them together sum to 128 bits. Padding can be
used to make up 128 bits if the number of pixels is less than 16 in any block.

NVIDIA CUDA provides abstraction to the programmer. It offers a single GRID
divided into BLOCKS containing an equal number of THREADS. The CUDA
platform architecture is greatly simplified and is shown in Fig. 1. Allocating each
of these user-defined threads to a GPU core is done automatically by CUDA.
Fig. 1. CUDA Grid, Blocks and Threads
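
The division of the pixel data into 128-bit blocks described above can be sketched as follows. The report says padding can be used but does not name a scheme, so zero padding is assumed here. `aesBlockCount` and `padToBlocks` are hypothetical helper names, and the pixel buffer is a plain byte vector rather than an OpenCV matrix.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of 128-bit (16-byte) AES blocks needed for byteCount bytes of pixel data.
std::size_t aesBlockCount(std::size_t byteCount)
{
    return (byteCount + 15) / 16;
}

// Copy the pixel bytes and zero-pad the tail to a multiple of 16 bytes.
// Zero padding is an assumption; the report does not name its padding scheme.
std::vector<std::uint8_t> padToBlocks(const std::vector<std::uint8_t>& pixels)
{
    std::vector<std::uint8_t> padded(pixels);
    padded.resize(aesBlockCount(pixels.size()) * 16, 0);
    return padded;
}
```

For a 1280x720 RGB image (2,764,800 bytes) this yields 172,800 blocks with no padding needed.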

For efficiency, we take the number of blocks in the grid as the number of
multiprocessors in the GPU. Our GPU has three multiprocessors and hence
we made three blocks in our grid. We divide the number of pixel-blocks of the input
image by this number of blocks to obtain the number of threads. If the number of
threads in a block exceeds 1024, we allocate a new block to stay within this limit.
Each block of sixteen pixels is then run on an independent thread for the
rounds of AES. This ensures maximum parallelization.
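
One reading of the rule above, as a host-side C++ sketch: start with one CUDA block per multiprocessor and add blocks until no block needs more than 1024 threads. The struct and function names are hypothetical, and the exact way the original program grows the grid is not stated, so the loop below is an assumption.

```cpp
#include <cstddef>

struct LaunchConfig {
    std::size_t blocks;           // CUDA blocks in the grid
    std::size_t threadsPerBlock;  // threads in each block, one per pixel-block
};

// Choose a launch configuration for pixelBlocks independent 128-bit blocks.
LaunchConfig chooseLaunch(std::size_t pixelBlocks,
                          std::size_t multiprocessors = 3,
                          std::size_t maxThreads = 1024)
{
    std::size_t blocks = multiprocessors;
    std::size_t threads = (pixelBlocks + blocks - 1) / blocks;   // ceiling division
    while (threads > maxThreads) {                               // over the limit: add a block
        ++blocks;
        threads = (pixelBlocks + blocks - 1) / blocks;
    }
    return {blocks, threads};
}
```

With 3,000 pixel-blocks this gives 3 blocks of 1,000 threads; with 4,000 it gives 4 blocks of 1,000 threads.
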
After obtaining the ciphertext, we can optionally convert it into hexadecimal
digits and then represent each encrypted pixel as a combination of these
digits, taking two at a time. These pixels can then be displayed in the form of
an encrypted image, which will mostly contain random structures. The advantage
of converting the ciphertext into an image is the reduction in the size of the
encrypted file, provided a lossless image compression technique has been used.
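
The optional hexadecimal step can be sketched as below, with each ciphertext byte written as two hexadecimal digits and read back again. The function names are illustrative, and lowercase digits are an arbitrary choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Write each byte as two hexadecimal digits.
std::string toHex(const std::vector<std::uint8_t>& bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}

// Read the digits back two at a time to recover the byte values.
std::vector<std::uint8_t> fromHex(const std::string& hex)
{
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        bytes.push_back(static_cast<std::uint8_t>(std::stoi(hex.substr(i, 2), nullptr, 16)));
    return bytes;
}
```

The bytes 00, ff and 3a become the string "00ff3a", and `fromHex` returns the original three bytes.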

The same technique in the reverse order has been used for decrypting the encrypted
image back to the original image.
Evaluation results
We processed images of different resolutions, ranging from an HD image
of size 0.3 MB to high-resolution images of size 11.6 MB, using both 128-bit and
256-bit keys. Fig. 2 shows the original image and the encrypted image
obtained. We then compared the processing time between GPU and CPU.
The configuration details of the machine on which the experiments were performed
are tabulated in Table 1. The timing results for images are tabulated in Table
2 for the 128-bit key and Table 3 for the 256-bit key. Table 4 and Table 5 list the
timing results for text files of the same sizes as the images for the 128-bit and
256-bit keys, respectively.
Note that these timings depict the total time taken to read an image, encrypt it,
generate and save the encrypted image, then read the encrypted image again,
decrypt it and finally output the original image.

Table 1. Machine hardware configuration

CPU   Intel® Core™ i5, 7th Generation, 4 cores
GPU   3 multiprocessors, 128 CUDA cores per multiprocessor

Fig. 2. Notice how the encrypted image has random pixels and hence original
image cannot be guessed just by looking at it.

Table 2. Timing results for encrypting/decrypting images with a 128-bit key

File size  Image       GPU    CPU     Speed-up  Throughput (Mbps)
(MB)       resolution  (sec)  (sec)             GPU    CPU
11.6       7680x4320   29.42  119.63  4.1x      3.15   0.77
5.32       5120x2880   13.19  52.36   4x        3.22   0.81
3.17       3840x2      7.40   29.67   4x        3.42   0.85
1.32       2560x1      3.36   13.20   3.9x      3.14   0.80
0.71       1920x1080   2.02   7.44    3.7x      2.81   0.76
0.30       1280x720    0.91   3.34    3.6x      2.64   0.72

Table 3. Timing results for encrypting/decrypting images with a 256-bit key

File size  Image       GPU    CPU     Speed-up  Throughput (Mbps)
(MB)       resolution  (sec)  (sec)             GPU    CPU
11.6       7680x4320   29.52  147.63  5x        3.14   0.63
5.32       5120x2880   13.28  64.94   4.9x      3.20   0.65
3.17       3840x2      7.58   36.71   4.8x      3.34   0.69
1.32       2560x1      3.53   16.49   4.7x      2.99   0.64
0.71       1920x1080   1.90   9.14    4.8x      2.99   0.62
0.30       1280x720    0.91   4.09    4.5x      2.64   0.59

Table 4. Timing results for encrypting/decrypting text files with a 128-bit key

File size  GPU    CPU     Speed-up  Throughput (Mbps)
(MB)       (sec)  (sec)             GPU    CPU
11.6       29.52  147.63  5x        3.14   0.63
5.32       13.2   64.94   4.9x      3.20   0.65
3.17       7.58   36.71   4.8x      3.34   0.69
1.32       3.53   16.49   4.7x      2.99   0.64
0.71       1.90   9.14    4.8x      2.99   0.62
0.30       0.91   4.09    4.5x      2.64   0.59

Table 5. Timing results for encrypting/decrypting text files with a 256-bit key

File size  GPU    CPU     Speed-up  Throughput (Mbps)
(MB)       (sec)  (sec)             GPU    CPU
11.6       1.33   6.19    4.65x     69.77  14.9
5.32       0.66   2.86    4.33x     64.48  14.8
3.17       0.46   1.66    3.61x     55.13  15.2
1.32       0.21   0.71    3.38x     50.28  14.8
0.71       0.17   0.40    2.35x     33.41  14.2
0.30       0.12   0.16    1.33x     20.00  15.0
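
The speed-up and throughput columns of Tables 2 to 5 appear to follow directly from the measured times: speed-up as CPU time divided by GPU time, and throughput as file size in megabits divided by time. This is our reading of the tables rather than a formula stated in the report, and the function names below are illustrative.

```cpp
// Speed-up of the GPU run relative to the CPU-only run.
double speedUp(double cpuSeconds, double gpuSeconds)
{
    return cpuSeconds / gpuSeconds;
}

// Throughput in megabits per second for a file of fileSizeMB megabytes.
double throughputMbps(double fileSizeMB, double seconds)
{
    return fileSizeMB * 8.0 / seconds;
}
```

For the 11.6 MB image with a 128-bit key (29.42 s on the GPU, 119.63 s on the CPU) this gives about 3.15 Mbps on the GPU and a speed-up of about 4.07, in line with the tabulated 3.15 and 4.1x.
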
We observe that the CPU takes a long time to encrypt an image, while the GPU
completes the same task in a fraction of that time. We can infer from Figure 1 that
there is only a small difference between the execution times on GPU and CPU for
low-resolution images such as 1280x720 and 1920x1080. But as we
encrypt/decrypt high-resolution images of the order of 5120x2880 and 7680x4320,
the time difference becomes significant. Another interesting observation is that
the time taken on the GPU seems to be independent of the key size used, i.e.
there is not much time difference between 128-bit and 256-bit key
encryption. On the other hand, the time taken for encryption/decryption increases
significantly when a 256-bit key is used in place of a 128-bit key for the CPU-only
computation.
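
The key-size observation can be quantified from the tabulated timings. The sketch below is a minimal Python illustration, not part of the original experiments. It takes the 7680x4320 image times from Tables 2 and 3 and compares the relative growth in GPU and CPU time when moving from a 128-bit to a 256-bit key. The helper name is hypothetical. The round counts (10 for AES-128, 14 for AES-256) are the standard AES values, included only as a reference point.

```python
# Standard AES round counts per key size (FIPS 197).
AES_ROUNDS = {128: 10, 256: 14}


def relative_increase(before: float, after: float) -> float:
    """Fractional growth from `before` to `after` (0.25 means 25% more)."""
    return (after - before) / before


# End-to-end times in seconds for the 7680x4320 image (Tables 2 and 3).
gpu_128, gpu_256 = 29.42, 29.52
cpu_128, cpu_256 = 119.63, 147.63

gpu_growth = relative_increase(gpu_128, gpu_256)
cpu_growth = relative_increase(cpu_128, cpu_256)
round_growth = relative_increase(AES_ROUNDS[128], AES_ROUNDS[256])

print(f"GPU time growth:   {gpu_growth:.1%}")    # 0.3%
print(f"CPU time growth:   {cpu_growth:.1%}")    # 23.4%
print(f"AES rounds growth: {round_growth:.1%}")  # 40.0%
```

The GPU time grows by well under one percent while the CPU time grows by about a quarter. Since the timings are end-to-end, one plausible but unverified reading is that part of each measurement (for example, file I/O) does not depend on the key size, which would also explain why the CPU growth stays below the 40% growth in round count.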
Performance parameters considered

Fig. 3. GPU vs CPU: Time comparison for different resolution images for both key sizes

Fig. 4. GPU vs CPU: Time comparison for different text files for both key sizes
We have presented a method to accelerate image encryption/decryption with AES
using the GPU. We measured the timing results for images of different
resolutions and compared them with a CPU-only computation. The GPU achieved a
speed-up of around four times over the CPU. We also observed that the
encryption/decryption time of an image seems to be independent of the key size
when the encryption is done in parallel on the GPU, whereas it increased
significantly with the key size when a serial computation was done on the CPU.
Another observation is that the speed-up of the GPU over the CPU increases with
the resolution of the images and also when longer keys are used, i.e. a 256-bit
key instead of a 128-bit key. Hence, the user need not compromise the security of
the image by choosing a shorter key just to encrypt it in less time.
In future, we aim to try the same parallel implementation on other parallel
architectures and compare their performance. We also aim to reduce the space
complexity of our implementation. With the increasing sharing of multimedia over
social networking sites, efficient encryption techniques are also needed for the
secure transfer of media types such as audio and video. Finally, the method
described in this paper works on the GPUs of personal computers. We aim to extend
this technique to the GPUs of modern smartphones.