INTRODUCTION
1.1 Introduction
In this research work, we propose the design and implementation of a real-time FPGA-based
application that demonstrates the creation of real-time process tasks in FPGA systems for
successful real-time communication between multiple FPGA systems. We have chosen the
RSA-based encryption and decryption algorithm for this implementation, as security is one
of the most important needs of data communication. The recent development of
Field-Programmable Gate Array (FPGA) architectures, with soft-core (MicroBlaze) and
hard-core (PowerPC) processors, embedded memories and IP cores, offers the potential for
high computing power. FPGAs are presently considered a major platform for high-performance
embedded applications, as they provide the opportunity for reconfiguration as well as good
clock speed and design resources.
As the complexity of embedded applications increases, the use of an operating system
brings many advantages. In present-day application scenarios most embedded systems have
real-time requirements that demand the use of a real-time operating system (RTOS), which
creates a suitable environment in which real-time applications can be designed and expanded
easily. In an RTOS the design process is simplified by splitting the application code into
separate tasks; the scheduler then executes them according to a specific schedule, meeting
the real-time deadlines. We first demonstrate the real-time execution of multiple process
tasks in a single FPGA system for the encryption and decryption of data. Next we describe
the most challenging part of our work, where we establish real-time communication between
two FPGA systems, one running the encryption engine and the other the decryption engine,
communicating with one another via an RS232 communication link. The results show that our
design is better in terms of execution speed in comparison with existing research works.
The design achieves real-time secured information sharing between the systems implemented
in multiple FPGAs by using an RTOS (Real Time Operating System). This information sharing
is based on the RSA algorithm (encryption and decryption). Network security plays a vital
role in recent trends in Very Large Scale Integration (VLSI) design.
Keywords: FPGA, logic circuits, operating systems (computers), MicroBlaze FPGA
architectures, embedded memory, multiple FPGA systems and soft core processors.
The present system is designed using a microcontroller with an RTOS, so its operating
speed is lower than that of an FPGA, and the information sent between the systems is not
secured. The proposed technique implemented here is based on the RSA algorithm (encryption
and decryption). The data is communicated between multiple FPGAs, multitasking under an
RTOS (real-time operating system), with higher execution speed than the existing system.
To demonstrate a 128-bit Advanced Encryption Standard (AES) symmetric-key encryption and
decryption algorithm, suitable hardware and software designs were developed on a Xilinx
Spartan-3 EDK (XC3S200) device, and the implementation has been tested successfully. The
system is optimized in terms of execution speed and hardware utilization. Its applications
include security, the medical field, network security and online banking security. Similar
approaches can be developed for the implementation of AES; for example, double AES can be
implemented for more security, at the cost of lower encryption speed.
In today's world most communication is done using electronic media. Data security plays
a vital role in such communication. Hence, there is a need to protect data from malicious
attacks. Cryptography is the science of secret codes, enabling the confidentiality of
communication through an insecure channel. It protects against unauthorized parties by
preventing unauthorized alteration or use. Generally speaking, it uses a cryptographic
system to transform a plaintext into a ciphertext, in most cases using a key.
The algorithm is composed of three main parts: Cipher, Inverse Cipher and Key
Expansion. Cipher converts data, commonly known as plaintext, to an unintelligible form
called ciphertext. Key Expansion generates a key schedule that is used in the Cipher and
the Inverse Cipher procedures. Cipher and Inverse Cipher are composed of a specific number
of rounds. For the AES algorithm, the number of rounds to be performed during the execution
of the algorithm depends on the key length. AES operates on a 4x4 array of bytes (referred
to as the state). The algorithm consists of four different simple operations. These
operations are:
o Sub Bytes
o Shift Rows
o Mix Columns
o Add Round Key
5. Once the cipher key is entered, the message is successfully sent and is shown in encrypted
form in the thread.
6. All messages in thread are displayed in encrypted format to both sender and receiver.
7. Long pressing the thread will pop up an action box wherein the user can delete, view
contact details or call the recipient.
8. Long pressing any message in the thread will pop up an action box wherein the user can
delete, forward or decrypt the message.
9. The cipher key is randomly generated if the user does not enter it.
10. Various settings such as notification settings, display settings, encryption settings,
tone settings and personalization settings are available for the user's convenience.
11. This application is developed on the Android platform. Like other operating systems for
mobile devices, Android OS supports connectivity, messaging, language support, media
support, Bluetooth etc. The main features of Android are its open source technology and
Java support. It also supports multitasking, multi-touch, Wi-Fi, tethering, 3G services,
and very importantly security and privacy.
CHAPTER -2
DESCRIPTION OF THE PROJECT
2.1 Introduction
2.2. Preface
The following document provides a detailed and easy to understand explanation of the
implementation of the AES (RIJNDAEL) encryption algorithm. The purpose of this paper is
to give developers with little or no knowledge of cryptography the ability to implement AES.
2.3. Terminology
There are terms that are frequently used throughout this paper that need to be
clarified.
Block: AES is a block cipher. This means that the number of bytes that it encrypts is fixed.
AES can currently encrypt blocks of 16 bytes at a time; no other block sizes are presently a
part of the AES standard. If the data being encrypted is larger than the specified block, it
is split into 16-byte blocks and AES is executed on each block. This also means that AES has
to encrypt a minimum of 16 bytes. If the plain text is smaller than 16 bytes then it must be
padded. Simply said, the block is a reference to the bytes that are processed by the
algorithm.
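The block handling described above can be sketched in Python as follows. This is only an illustration: the text does not say which padding scheme is used, so the PKCS#7-style padding here is an assumption, and the names `pad` and `split_blocks` are hypothetical.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pad(data: bytes) -> bytes:
    """Append n bytes of value n so the length becomes a multiple of 16.

    Assumption: PKCS#7-style padding; the original text only says the
    plain text "must be padded".
    """
    n = BLOCK_SIZE - (len(data) % BLOCK_SIZE)
    return data + bytes([n]) * n


def split_blocks(data: bytes) -> list:
    """Pad the data and cut it into 16-byte blocks, one AES call per block."""
    padded = pad(data)
    return [padded[i:i + BLOCK_SIZE] for i in range(0, len(padded), BLOCK_SIZE)]


blocks = split_blocks(b"hello")   # 5 bytes of plain text become one 16-byte block
```

With this scheme an input that is already a multiple of 16 bytes gains one full block of padding, so the padding can always be removed unambiguously after decryption.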
State: Defines the current condition (state) of the block, that is, the block of bytes that
are currently being worked on. The state starts off being equal to the block; however, it
changes as each round of the algorithm executes. Plainly said, this is the block in progress.
XOR: Refers to the bitwise operator Exclusive Or. XOR operates on the individual bits in a
byte in the following way:
0 XOR 0 = 0
1 XOR 0 = 1
1 XOR 1 = 0
0 XOR 1 = 1
For example the Hex digits D4 XOR FF
11010100
XOR 11111111
= 00101011 (Hex 2B)
Another interesting property of the XOR operator is that it is reversible.
So Hex 2B XOR FF = D4.
Table 2.1: Most programming languages have the XOR operator built in.
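As a minimal sketch of the XOR properties described above, the following Python snippet reproduces the D4 XOR FF example and shows that applying the same value a second time restores the original byte. Python is used here only for illustration, and the helper name `xor_bytes` is hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


masked = 0xD4 ^ 0xFF        # 0x2B, as in the worked example
restored = masked ^ 0xFF    # XOR with the same value undoes the first XOR
print(f"{masked:02X} {restored:02X}")  # prints "2B D4"
```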
HEX: Defines a notation of numbers in base 16. This simply means that the highest number
that can be represented in a single digit is 15, rather than the usual 9 in the decimal
(base 10) system.
Table 2.2: Hex to decimal table (column = first hex digit, row = second hex digit)
    0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0   0  16  32  48  64  80  96 112 128 144 160 176 192 208 224 240
1   1  17  33  49  65  81  97 113 129 145 161 177 193 209 225 241
2   2  18  34  50  66  82  98 114 130 146 162 178 194 210 226 242
3   3  19  35  51  67  83  99 115 131 147 163 179 195 211 227 243
4   4  20  36  52  68  84 100 116 132 148 164 180 196 212 228 244
5   5  21  37  53  69  85 101 117 133 149 165 181 197 213 229 245
6   6  22  38  54  70  86 102 118 134 150 166 182 198 214 230 246
7   7  23  39  55  71  87 103 119 135 151 167 183 199 215 231 247
8   8  24  40  56  72  88 104 120 136 152 168 184 200 216 232 248
9   9  25  41  57  73  89 105 121 137 153 169 185 201 217 233 249
A  10  26  42  58  74  90 106 122 138 154 170 186 202 218 234 250
B  11  27  43  59  75  91 107 123 139 155 171 187 203 219 235 251
C  12  28  44  60  76  92 108 124 140 156 172 188 204 220 236 252
D  13  29  45  61  77  93 109 125 141 157 173 189 205 221 237 253
E  14  30  46  62  78  94 110 126 142 158 174 190 206 222 238 254
F  15  31  47  63  79  95 111 127 143 159 175 191 207 223 239 255
For example, using the above table, HEX D4 = DEC 212. All of the tables and examples in
this paper are written in HEX. The reason for this is that a single digit of HEX
represents exactly 4 bits. This means that a single byte can always be represented by 2 HEX
digits. This also makes it very useful in creating lookup tables where each HEX digit can
represent a table index.
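The relationship between a byte, its two HEX digits and a table index can be sketched in Python as follows. The table built here simply mirrors the hex to decimal table, and the names `decimal_table` and `lookup` are hypothetical.

```python
value = int("D4", 16)                    # parse two hex digits as one byte
high, low = value >> 4, value & 0x0F     # each hex digit is one 4-bit nibble
hex_text = format(value, "02X")          # convert back to two hex digits
print(value, high, low, hex_text)        # prints "212 13 4 D4"

# A 16x16 table indexed by [first hex digit][second hex digit].
decimal_table = [[16 * first + second for second in range(16)]
                 for first in range(16)]


def lookup(table, byte):
    """Use the two hex digits of a byte as the two indexes of a 16x16 table."""
    return table[byte >> 4][byte & 0x0F]
```

The same indexing pattern applies to any 16x16 lookup table addressed by a byte, such as a substitution box.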
SUB BYTE
SHIFT ROW
MIX COLUMN
An iteration of the above steps is called a round. The number of rounds of the
algorithm depends on the key size.
Key Size (bytes)   Block Size (bytes)   Rounds
16                 16                   10
24                 16                   12
32                 16                   14
The only exception being that in the last round the Mix Column step is not
performed, to make the algorithm reversible during decryption.
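The round structure can be sketched in Python as a list of step names per round. This is an illustration only: the ordering of steps inside a round follows the published AES standard (FIPS-197) rather than anything stated in this section, and the function name `encryption_schedule` is hypothetical.

```python
ROUNDS_BY_KEY_SIZE = {16: 10, 24: 12, 32: 14}   # key size in bytes -> rounds


def encryption_schedule(key_size: int) -> list:
    """List the steps of each encryption round for a given key size.

    Round 0 is the initial Add Round Key; the last round omits Mix Column.
    """
    rounds = ROUNDS_BY_KEY_SIZE[key_size]
    schedule = [["Add Round Key"]]
    for r in range(1, rounds + 1):
        steps = ["Byte Sub", "Shift Row"]
        if r != rounds:                 # Mix Column is skipped in the last round
            steps.append("Mix Column")
        steps.append("Add Round Key")
        schedule.append(steps)
    return schedule


schedule_32 = encryption_schedule(32)   # rounds 0..14 for a 32 byte key
```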
2.7. Encryption
Table 2.4: AES encryption cipher using a 32 byte key (columns: Round, Function).
2.8. Decryption
Table 2.5: AES decryption cipher using a 32 byte key.
Round   Function
0       Add Round Key(State)
1       Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
2       Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
3       Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
4       Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
5       Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
6       Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
7       Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
8       Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
9       Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
10      Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
11      Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
12      Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
13      Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
14      Add Round Key(Byte Sub(Shift Row(State)))
State bytes 1 to 16 are XORed, byte by byte, with expanded key bytes 1 to 16:
State      1    2    3   ...  16
          XOR  XOR  XOR  ...  XOR
Exp Key    1    2    3   ...  16
In the next round, state bytes 1 to 16 are XORed with expanded key bytes 17 to 32:
State      1    2    3   ...  16
          XOR  XOR  XOR  ...  XOR
Exp Key   17   18   19   ...  32
And so on for each round of execution. During decryption this procedure is reversed.
Therefore the state is first XORed against the last 16 bytes of the expanded key, then the
second last 16 bytes and so on. The method for deriving the expanded key is described in
Section 2.13.
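The Add Round Key step described above can be sketched in Python as follows. The function name `add_round_key` is hypothetical, and the expanded key used in the demonstration is dummy data chosen only to show the indexing, not a real key schedule.

```python
def add_round_key(state: bytes, expanded_key: bytes, round_index: int) -> bytes:
    """XOR the 16-byte state with the 16 expanded key bytes of the given round."""
    offset = 16 * round_index
    round_key = expanded_key[offset:offset + 16]
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("state and round key must both be 16 bytes")
    return bytes(s ^ k for s, k in zip(state, round_key))


dummy_expanded_key = bytes(range(176))   # placeholder for a 16 byte key schedule
state = bytes(range(16, 32))

mixed = add_round_key(state, dummy_expanded_key, 0)      # uses key bytes 0..15
undone = add_round_key(mixed, dummy_expanded_key, 0)     # XOR again to reverse

# Decryption walks the round keys from the last 16 bytes back to the first.
decryption_order = list(range(10, -1, -1))
```

Because XOR is its own inverse, the same function serves both encryption and decryption; only the order in which the round keys are consumed changes.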
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
Each row is then moved over (shifted) 0, 1, 2 or 3 spaces to the left, depending on
the row of the state. The first row is never shifted.
Row    Shift
Row1   0
Row2   1
Row3   2
Row4   3
The following table shows how the individual bytes are first arranged in the table and
then moved over (shifted). Blocks 16 bytes long:
From           To
1 5 9 13       1 5 9 13
2 6 10 14      6 10 14 2
3 7 11 15      11 15 3 7
4 8 12 16      16 4 8 12
During decryption the same process is reversed and all rows are shifted to the right:
From           To
1 5 9 13       1 5 9 13
2 6 10 14      14 2 6 10
3 7 11 15      11 15 3 7
4 8 12 16      8 12 16 4
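The Shift Row tables above can be reproduced with a short Python sketch. It assumes the state is stored as 16 items in column order (bytes 1 to 4 form the first column), as in the tables; the function name `shift_rows` is hypothetical.

```python
def shift_rows(state, decrypt=False):
    """Rotate row r of the 4x4 state by r positions.

    Encryption rotates each row to the left, decryption to the right.
    `state` holds 16 items in column order: index = row + 4 * column.
    """
    out = list(state)
    for row in range(4):
        cells = [state[row + 4 * col] for col in range(4)]
        shift = (-row if decrypt else row) % 4
        rotated = cells[shift:] + cells[:shift]   # left rotation by `shift`
        for col in range(4):
            out[row + 4 * col] = rotated[col]
    return out


state = list(range(1, 17))                 # bytes numbered 1..16 as in the tables
shifted = shift_rows(state)
second_row = [shifted[1 + 4 * col] for col in range(4)]
print(second_row)                          # prints "[6, 10, 14, 2]"
```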
Multiplication Matrix
2 3 1 1
1 2 3 1
1 1 2 3
3 1 1 2
The first result byte is calculated by multiplying 4 values of the state column against 4
values of the first row of the matrix. The result of each multiplication is then XORed to
produce 1 Byte.
b1 = (b1 * 2) XOR (b2*3) XOR (b3*1) XOR (b4*1)
The second result byte is calculated by multiplying the same 4 values of the state
column against 4 values of the second row of the matrix. The result of each multiplication is
then XORed to produce 1 Byte.
b2 = (b1 * 1) XOR (b2*2) XOR (b3*3) XOR (b4*1)
The third result byte is calculated by multiplying the same 4 values of the state
column against 4 values of the third row of the matrix. The result of each multiplication is
then XORed to produce 1 Byte.
b3 = (b1 * 1) XOR (b2*1) XOR (b3*2) XOR (b4*3)
The fourth result byte is calculated by multiplying the same 4 values of the state
column against 4 values of the fourth row of the matrix. The result of each multiplication is
then XORed to produce 1 Byte.
b4 = (b1 * 3) XOR (b2*1) XOR (b3*1) XOR (b4*2)
This procedure is repeated again with the next column of the state, until there are no
more state columns. Putting it all together: The first column will include state bytes 1-4 and
will be multiplied against the matrix in the following manner:
b1 = (b1 * 2) XOR (b2*3) XOR (b3*1) XOR (b4*1)
b2 = (b1 * 1) XOR (b2*2) XOR (b3*3) XOR (b4*1)
b3 = (b1 * 1) XOR (b2*1) XOR (b3*2) XOR (b4*3)
b4 = (b1 * 3) XOR (b2*1) XOR (b3*1) XOR (b4*2)
(b1= specifies the first byte of the state)
The second column will be multiplied against the same matrix in the following
manner.
b5 = (b5 * 2) XOR (b6*3) XOR (b7*1) XOR (b8*1)
b6 = (b5 * 1) XOR (b6*2) XOR (b7*3) XOR (b8*1)
b7 = (b5 * 1) XOR (b6*1) XOR (b7*2) XOR (b8*3)
b8 = (b5 * 3) XOR (b6*1) XOR (b7*1) XOR (b8*2)
And so on until all columns of the state are exhausted.
Other than the change to the matrix table the function performs the same steps as
during encryption.
2.10.2 Mix Column Example During Encryption
The following examples are denoted in HEX.
Input = D4 BF 5D 30
Output = Output(0) Output(1) Output(2) Output(3)
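As a hedged sketch of the Mix Column step for the input above, the following Python code applies the encryption matrix to the column D4 BF 5D 30. It relies on a fact from the AES standard that is not spelled out in this section: the multiplications are carried out in the Galois field GF(2^8), reducing by the polynomial 0x11B. The names `gmul`, `MATRIX` and `mix_column` are hypothetical.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the AES reduction polynomial 0x11B."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B      # reduce when the shift overflows 8 bits
        b >>= 1
    return product


MATRIX = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]


def mix_column(column):
    """Multiply one 4-byte state column by the matrix, XORing the products."""
    return [
        gmul(column[0], row[0]) ^ gmul(column[1], row[1])
        ^ gmul(column[2], row[2]) ^ gmul(column[3], row[3])
        for row in MATRIX
    ]


output = mix_column([0xD4, 0xBF, 0x5D, 0x30])
print(" ".join(f"{b:02X}" for b in output))   # prints "04 66 81 E5"
```

Multiplying by 1 leaves a byte unchanged, multiplying by 2 is a left shift with conditional reduction, and multiplying by 3 is the XOR of those two results, so hardware implementations typically need only the shift-and-reduce operation.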
Key Size (bytes)   Block Size (bytes)   Expanded Key (bytes)
16                 16                   176
24                 16                   208
32                 16                   240
Since the key size is much smaller than the size of the sub keys, the key is actually
stretched out to provide enough key space for the algorithm. The key expansion routine
executes a maximum of 4 consecutive functions. These functions are:
ROT WORD
SUB WORD
RCON
EK
An iteration of the above steps is called a round. The amount of rounds of the key
expansion algorithm depends on the key size.
Table 2.9: Key expansion parameters for each key size.
Key Size   Block Size   Expansion Algorithm   Expanded Bytes   Rounds     Rounds          Expanded Key
(bytes)    (bytes)      Rounds                / Round          Key Copy   Key Expansion   (bytes)
16         16           44                    4                4          40              176
24         16           52                    4                6          46              208
32         16           60                    4                8          52              240
The first bytes of the expanded key are always equal to the key. If the key is 16 bytes
long the first 16 bytes of the expanded key will be the same as the original key. If the key
size is 32 bytes then the first 32 bytes of the expanded key will be the same as the original
key. Each round adds 4 bytes to the Expanded Key. With the exception of the first rounds,
each round also takes the previous round's 4 bytes as input, operates on them and returns 4
bytes. One more important note is that not all of the 4 functions are called in each round.
The algorithm only calls all 4 of the functions every:
4 Rounds for a 16 byte Key
6 Rounds for a 24 byte Key
8 Rounds for a 32 byte Key
In the rest of the rounds, the result of one EK function is simply XORed with the result of
another EK function. There is an exception to this rule: if the key is 32 bytes long, an
additional call to the Sub Word function is made every 8 rounds starting on the 13th round.
Rot Word (4 bytes): This does a circular shift on 4 bytes similar to the Shift Row function.
1,2,3,4 to 2,3,4,1
Sub Word (4 bytes): This step applies the S-box value substitution, as described in the
Byte Sub function, to each of the 4 bytes in the argument.
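Rcon(0) = 01000000
Rcon(1) = 02000000
Rcon(2) = 04000000
Rcon(3) = 08000000
Rcon(4) = 10000000
Rcon(5) = 20000000
Rcon(6) = 40000000
Rcon(7) = 80000000
Rcon(8) = 1B000000
Rcon(9) = 36000000
Rcon(10) = 6C000000
Rcon(11) = D8000000
Rcon(12) = AB000000
Rcon(13) = 4D000000
Rcon(14) = 9A000000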
For example for a 16 byte key Rcon is first called in the 4th round: (4/(16/4))-1=0
In this case Rcon will return : 01000000
For a 24 byte key Rcon is first called in the 6th round: (6/(24/4))-1=0
In this case Rcon will also return : 01000000
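The Rcon values listed above need not be stored as a table; they can be generated by repeated doubling in GF(2^8). The Python sketch below assumes the standard AES reduction polynomial 0x11B, and the names `rcon` and `rcon_index` are hypothetical. The index formula is the one used in the two examples above.

```python
def rcon(index: int) -> bytes:
    """Return the 4-byte round constant: 2**index in GF(2^8), then three zero bytes."""
    value = 1
    for _ in range(index):
        value <<= 1
        if value & 0x100:       # overflow past 8 bits: reduce by 0x11B
            value ^= 0x11B
    return bytes([value, 0, 0, 0])


def rcon_index(round_number: int, key_size: int) -> int:
    """Index passed to Rcon: (round / (key size / 4)) - 1."""
    return round_number // (key_size // 4) - 1


constants = [rcon(i).hex().upper() for i in range(15)]
first_16 = rcon(rcon_index(4, 16))    # 16 byte key, 4th round
first_24 = rcon(rcon_index(6, 24))    # 24 byte key, 6th round
```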
EK(Offset): The EK function returns 4 bytes of the Expanded Key after the specified
offset. For example, if the offset is 0 then EK will return bytes 0,1,2,3 of the Expanded
Key.
K(Offset): The K function returns 4 bytes of the Key after the specified offset. For
example, if the offset is 0 then K will return bytes 0,1,2,3 of the Key.
2.13. AES Key Expansion Algorithm
Since the expansion algorithm changes depending on the length of the key, it is
extremely difficult to explain in writing. This is why the explanation of the Key Expansion
Algorithm is provided in a table format. There are 3 tables, one for each AES key size (16,
24, and 32). Each table has 3 fields:
Table 2.10: Three fields of the AES key size tables
Field                Description
Round                A counter representing the current step in the key expansion
                     algorithm; think of this as a loop counter
Expanded Key Bytes   Expanded key bytes affected by the result of the function(s)
Function             The function(s) that will return the 4 bytes written to the
                     affected expanded key bytes
Round   Expanded Key Bytes   Function
0       0 1 2 3              K(0)
1       4 5 6 7              K(4)
2       8 9 10 11            K(8)
3       12 13 14 15          K(12)
4       16 17 18 19          K(16)
5       20 21 22 23          K(20)
6       24 25 26 27          K(24)
7       28 29 30 31          K(28)
8       32 33 34 35          Sub Word(Rot Word(EK((8-1)*4))) XOR Rcon((8/8)-1) XOR EK((8-8)*4)
9       36 37 38 39          EK((9-1)*4) XOR EK((9-8)*4)
10      40 41 42 43          EK((10-1)*4) XOR EK((10-8)*4)
11      44 45 46 47          EK((11-1)*4) XOR EK((11-8)*4)
12      48 49 50 51          Sub Word(EK((12-1)*4)) XOR EK((12-8)*4)
13      52 53 54 55          EK((13-1)*4) XOR EK((13-8)*4)
14      56 57 58 59          EK((14-1)*4) XOR EK((14-8)*4)
15      60 61 62 63          EK((15-1)*4) XOR EK((15-8)*4)
16      64 65 66 67          Sub Word(Rot Word(EK((16-1)*4))) XOR Rcon((16/8)-1) XOR EK((16-8)*4)
17      68 69 70 71          EK((17-1)*4) XOR EK((17-8)*4)
18      72 73 74 75          EK((18-1)*4) XOR EK((18-8)*4)
19      76 77 78 79          EK((19-1)*4) XOR EK((19-8)*4)
20      80 81 82 83          Sub Word(EK((20-1)*4)) XOR EK((20-8)*4)
21      84 85 86 87          EK((21-1)*4) XOR EK((21-8)*4)
22      88 89 90 91          EK((22-1)*4) XOR EK((22-8)*4)
23      92 93 94 95          EK((23-1)*4) XOR EK((23-8)*4)
24      96 97 98 99          Sub Word(Rot Word(EK((24-1)*4))) XOR Rcon((24/8)-1) XOR EK((24-8)*4)
25      100 101 102 103      EK((25-1)*4) XOR EK((25-8)*4)
26      104 105 106 107      EK((26-1)*4) XOR EK((26-8)*4)
27      108 109 110 111      EK((27-1)*4) XOR EK((27-8)*4)
28      112 113 114 115      Sub Word(EK((28-1)*4)) XOR EK((28-8)*4)
29      116 117 118 119      EK((29-1)*4) XOR EK((29-8)*4)
30      120 121 122 123      EK((30-1)*4) XOR EK((30-8)*4)
31      124 125 126 127      EK((31-1)*4) XOR EK((31-8)*4)
32      128 129 130 131      Sub Word(Rot Word(EK((32-1)*4))) XOR Rcon((32/8)-1) XOR EK((32-8)*4)
33      132 133 134 135      EK((33-1)*4) XOR EK((33-8)*4)
34      136 137 138 139      EK((34-1)*4) XOR EK((34-8)*4)
35      140 141 142 143      EK((35-1)*4) XOR EK((35-8)*4)
36      144 145 146 147      Sub Word(EK((36-1)*4)) XOR EK((36-8)*4)
37      148 149 150 151      EK((37-1)*4) XOR EK((37-8)*4)
38      152 153 154 155      EK((38-1)*4) XOR EK((38-8)*4)
39      156 157 158 159      EK((39-1)*4) XOR EK((39-8)*4)
40      160 161 162 163      Sub Word(Rot Word(EK((40-1)*4))) XOR Rcon((40/8)-1) XOR EK((40-8)*4)
41      164 165 166 167      EK((41-1)*4) XOR EK((41-8)*4)
42      168 169 170 171      EK((42-1)*4) XOR EK((42-8)*4)
43      172 173 174 175      EK((43-1)*4) XOR EK((43-8)*4)
Round   Expanded Key Bytes
44      176 177 178 179
45      180 181 182 183
46      184 185 186 187
47      188 189 190 191
48      192 193 194 195
49      196 197 198 199
50      200 201 202 203
51      204 205 206 207
52      208 209 210 211
53      212 213 214 215
54      216 217 218 219
55      220 221 222 223
56      224 225 226 227
57      228 229 230 231
58      232 233 234 235
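The key expansion rounds tabulated above can be sketched in Python as a single loop. This is an illustrative sketch rather than the implementation used in this project: the S-box is computed from its mathematical definition in the AES standard (multiplicative inverse in GF(2^8) followed by an affine transform) instead of being stored as a table, and all function names are hypothetical.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the AES reduction polynomial 0x11B."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return product


def build_sbox() -> list:
    """Compute the S-box: field inverse (0 maps to 0), then the affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):   # XOR with the inverse rotated left by 1..4 bits
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def rot_word(word: bytes) -> bytes:
    return word[1:] + word[:1]                 # 1,2,3,4 to 2,3,4,1


def sub_word(word: bytes) -> bytes:
    return bytes(SBOX[b] for b in word)


def rcon(index: int) -> bytes:
    value = 1
    for _ in range(index):
        value <<= 1
        if value & 0x100:
            value ^= 0x11B
    return bytes([value, 0, 0, 0])


def xor(a, b) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def expand_key(key: bytes) -> bytes:
    """Expand a 16, 24 or 32 byte key, adding 4 bytes per round."""
    nk = len(key) // 4                          # rounds that only copy the key
    total_rounds = {16: 44, 24: 52, 32: 60}[len(key)]
    ek = bytearray(key)                         # rounds 0..nk-1: K(offset)
    for n in range(nk, total_rounds):
        temp = bytes(ek[(n - 1) * 4:(n - 1) * 4 + 4])          # EK((n-1)*4)
        if n % nk == 0:
            temp = xor(sub_word(rot_word(temp)), rcon(n // nk - 1))
        elif nk == 8 and n % nk == 4:           # extra Sub Word for 32 byte keys
            temp = sub_word(temp)
        ek += xor(temp, ek[(n - nk) * 4:(n - nk) * 4 + 4])     # XOR EK((n-nk)*4)
    return bytes(ek)
```

Computing the S-box at start-up trades a short initialisation delay for not having to embed a 256-entry table; on an FPGA or a small processor a stored lookup table is the more usual choice.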
XC-Xilinx Commercial,
SPARTAN-III FPGA,
1, 50,000 Gate Count,
plastic quad package,
Speed Grade: -5.
2.15. Conclusion
The above document provides only the basic information needed to implement the AES
encryption algorithm. The mathematics and design reasons behind AES were purposely left
out. For more information on these topics, refer to the Rijndael specification.
CHAPTER -3
DESIGN ANALYSIS
3.1 Introduction
The MicroBlaze soft-core processor is synthesized using EDK 10.1 on a Spartan-3E.
The Embedded Development Kit (EDK) from Xilinx allows the designer to build a complete
processor system on Xilinx's FPGAs. The systems that can be produced using EDK range
from a simple single-processor architecture to a complex multi-processor system with
multiple hardware accelerators. The tool mainly supports two types of processors:
i) MicroBlaze (soft core)
ii) PowerPC (hard core)
Depending on the FPGA chip we are using, multiple MicroBlazes and Power-PCs can
be integrated together in a single design. EDK provides C/C++ compilers for both
MicroBlaze and Power-PC along with several tools for debugging/profiling of the
applications running on each processor. Besides, using ISE, you can perform several types of
simulations for the generated architectures which allow the estimation of both the
performance and power consumption of the architecture. This tutorial will demonstrate the
process of creating and testing a MicroBlaze system design using the Embedded
Development Kit (EDK) and Spartan 3E starter board from Xilinx.
3.2 Objectives
The project contains these sections:
System Requirements
Steps
Starting XPS
EDK 10.1i.
ISE 10.1i.
It converts the serial data obtained into parallel data as needed for
processing in the FPGA. It also takes control over the FPGA by acting as a Master over the
slave till configuration gets completed. Once it successfully configures FPGA then it releases
hold over it to make the FPGA function independently based upon the inputs provided. The
function of the CPLD is to coordinate and provide separate access to address and data bus
values attained from a common bus. Moreover, it also acts as a voltage controller to provide
the FPGA with the necessary 2.5 V from its input supply of 5 V. From the programmable
port, the hex file will be driven into the Micro-controller and from there to CPLD and then to
the target device.
Though this is a roundabout process compared with programming the chip directly from
the JTAG port, it eliminates the need for costlier cables and high-speed configuration
software by sacrificing configuration speed to some extent, which is acceptable in many
situations.
Now, with the necessary inputs and the clock, we can run the configured gate-level
extracted circuit to achieve the functionality that we have designed and downloaded which
may be either encryption or decryption. The output generated before the UART software
module goes into the transmitter state-machine and the data will be converted from parallel to
serial which is collected at the serial communication port. The data obtained now can be
communicated to the other side using serial cable RS-232 which is connected directly to the
COM port of the other PC wherein the encipher or decipher can be seen. Thus, the FPGA
based processor achieves the implementation of the desired algorithm very effectively.
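The parallel-to-serial conversion performed by the UART transmitter can be illustrated with a short Python sketch. The frame format is an assumption: the text does not state the UART configuration, so the common arrangement of one start bit, 8 data bits sent least significant bit first, no parity and one stop bit is used here. The names `frame_byte` and `unframe` are hypothetical.

```python
def frame_byte(value: int) -> list:
    """Serialize one byte as line levels: start bit (0), 8 data bits, stop bit (1).

    Assumption: 8 data bits, no parity, 1 stop bit, least significant bit first.
    """
    data_bits = [(value >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def unframe(bits: list) -> int:
    """Rebuild the byte on the receiving side, checking start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


line_bits = frame_byte(0x2B)     # one cipher byte ready for the serial line
received = unframe(line_bits)    # the other side recovers the same byte
```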
Base System Builder is the wizard that is used to automatically generate a hardware
platform according to the user specifications; the platform is defined by the MHS
(Microprocessor Hardware Specification) file.
The MHS file defines the system architecture, peripherals and embedded processors.
The Platform Generation tool creates the hardware platform using the MHS file as input.
The software platform is defined by the MSS (Microprocessor Software Specification)
file, which defines driver and library customization parameters for peripherals, processor
customization parameters, standard I/O devices, interrupt handler routines, and other
software related routines. The MSS file is an input to the Library Generator tool for
customization of drivers, libraries and interrupt handlers.
Note that the XPS main window is divided into three areas:
i.
ii.
iii.
Base System Builder allows creation of a fully functional processor system in minutes
System Assembly View allows user to quickly customize and configure design details
Extensive catalog of AXI and PLB based processors, peripherals, and utility IP
Tightly integrated with ISE Project Navigator, ISim, and Chip Scope
Debug Wizard automates hardware / software cross triggering and Chip Scope
inclusion
The Project Tab lists references to project related files. Information is grouped in the
following general categories:
1. Project Files: All project-specific files such as the Microprocessor Hardware
Specification (MHS) files, Microprocessor Software Specification (MSS) files,
User Constraints File (UCF) files, Impact Command files, Implementation Option
files, and Bitgen Option files.
2. Project Options: All project specific options, such as Device, Net,
Implementation, Hardware Description Language (HDL), and Sim Model options.
3. Reference Files: All log and output files produced by the XPS implementation
processes.
The Applications tab lists all software application option settings, header files, and
source files associated with each application project. With this tab selected, you can:
Create and add a software application project, build the project, and load it to the
block RAM.
IP catalog tab:
The IP Catalog tab lists all the EDK IP cores and any custom IP cores you
created as shown in figure 6.4. If a project is open, only the IP cores compatible with
the target Xilinx device architecture are displayed.
The catalog lists information about the IP cores, including release version,
status (active, early access or deprecated), lock (not licensed, locked, or unlocked),
processor support, and a short description. Additional details about the IP core,
including the version change history, data sheet, and Microprocessor Peripheral
Description (MPD) file, are available in the right-click menu. By default, the IP cores
are grouped hierarchically by function.
A vertical line represents a bus, and a horizontal line represents a bus interface to an
IP core.
A hollow connector represents a connection that you can make, and a filled connector
represents a connection made. To create or disable a connection, click the connector
symbol.
The first step in this tutorial is using the Xilinx Platform Studio (XPS) to create a project file.
XPS allows you to control the hardware and software development of the MicroBlaze system,
and includes the following:
An editor and a project management interface for creating and editing source code
Starting XPS
(a) To open XPS, select Start > All Programs > Development > Xilinx ISE Design
Suite 10.1 > EDK > Xilinx Platform Studio.
(b) Select Base System Builder Wizard (BSB) to open the "Create New XPS Project
Using BSB Wizard" dialogue box shown in Figure 6.1.
Fig 3.9: Create New XPS Project Using Base System Builder Wizard
(f) Click Ok to start the BSB wizard. The wizard window will appear, which will be
used to build the design as will be discussed in following sections.
Embedded processor: either the soft core MicroBlaze processor or the hard core
PowerPC (only available in Virtex-II Pro and Virtex-4 FX devices)
Buses
hardware system building tool). Conceptually, the MHS file is a textual schematic of the
embedded system. To instantiate a component in the MHS file, you must include information
specific to the component.
MPD File:
Each system peripheral has a corresponding MPD file. The MPD file is the symbol of
the embedded system peripheral to the MHS schematic of the embedded system. The MPD
file contains all of the available ports and hardware parameters for a peripheral. The MPD file
is located in the following directory:
$XILINX_EDK/hw/XilinxProcessorIPLib/pcores/<peripheral name>/data
EDK provides two methods for creating the MHS file. Base System Builder Wizard
and the Add/Edit Cores Dialog assist you in building the processor system, which is defined
in the MHS file. This illustrates the Base System Builder.
In the Base System Builder, select "I would like to create a new design" then click
Next.
In the Base System Builder - Select Board Dialog select the following, as shown in
Figure 6.8:
Board Revision: C
Click Next. You will now specify several processor options as shown in Figure 6.8:
Processor-Bus clock frequency: This is the frequency of the clock driving the
processor system.
Processor Configuration:
Debug I/F:
On-Chip H/W Debug module: When the H/W debug module is selected, a PLB MDM
module is included in the hardware system. This introduces hardware-intrusive
debugging with no software stub required. This is the recommended way of
debugging for a MicroBlaze system.
XMD with S/W Debug stub: Selecting this mode of debugging interface introduces a
software intrusive debugging. There is a 1200-byte stub that is located at 0x00000000.
This stub communicates with the debugger on the host through the JTAG interface of
the PLB MDM module.
Users can specify the size of the local instruction and data memory.
Cache setup:
Enable cache link: Caching will be used through the FSL bus
You can also specify the use of the floating point unit (FPU).
Click Next.
Select the peripheral subset (Configure IO Interfaces wizard) as shown in Figure 6.5.
It should be noted that the number of peripherals shown on each dialogue box is dynamic,
based upon your computer's resolution.
RS232_DTE deselect
RS232_DCE select
Click Next
FLASH deselect
Click Next
Click Next through the Add Internal Peripherals page as we will not add any in this
project.
Click Next
This completes the hardware specification and we will now configure the software
settings. Using the
Software Setup dialogue box as shown in Figure 6.13, specify the following software
settings:
Click Next.
Fig 3.13: Configure I/O Interfaces 2
Fig 3.14: Configure I/O Interfaces 3
Using the Configure Memory Test Application dialogue box as shown in Figure 6.8, specify
the following software settings:
Instructions ilmbcntlr
Data dlmbcntlr
Stack/Heap dlmbcntlr
Click Next.
3.10. Review
The Base System Builder Wizard has created the hardware and software specification
files that define the processor system. When we look at the project directory, shown in Figure
6.10, we see these as system.mhs and system.mss. There are also some directories created:
data - contains the UCF (user constraints file) for the target board.
etc - contains system settings for JTAG configuration on the board that is used when
downloading the bit file and the default parameters that are passed to the ISE tools.
TestApp Memory - contains a user application in C code source, for testing the
memory in the system.
Select Hardware > Generate Netlist. This will elaborate the MHS file and generate a
netlist for the complete system (this will take a while!).
Select Hardware > Generate Bitstream. This will call the ISE tools to implement the
design and generate a bit file that can be downloaded into the FPGA.
At the end of this step the XPS output screen should look like Figure 6.14. The bit file
that is generated is called system.bit, which contains all the required information to
configure the FPGA except the contents of the block RAM (application/data). The bit file
will be updated with the application code after defining the software design.
3.10.4 Defining the Software Design
Now that the hardware design is completed, the next step is defining the software
design. There are two major parts to software design, configuring the Board Support Package
(BSP) and writing the software applications. The configuration of the BSP includes the
selection of device drivers and libraries.
Fig 3.20: Block diagram after hardware and software specification and netlist generation
From the System Assembly View, copy the starting address of DDR_SDRAM.
Generate the linker script by selecting the Generate linker option from the same
menu.
tab as shown. The Base System Builder (BSB) generates a sample application which tests a
subset of the peripherals included in the design.
Select Software > Build All User Applications to run mb-gcc, which compiles the
source files.
Connect the host computer to the target board, including connecting the Xilinx USB
download cable and the serial cable.
In EDK, select Device Configuration > Update Bitstream. This will update the bit
file with the compiled application code. Repeat this step each time the application
changes.
3.14.
After downloading both Hardware and Software .bit generation .elf file will be generated by
3.15. Conclusion
The implementation requirements, including the primary inputs and primary outputs
of the design and the notation and conventions used, were discussed.
The general implementation flow of the design was presented and explained step by
step.
Finally, the synthesis process was discussed, which identifies the FPGA family in which
the design has been implemented.
CHAPTER-4
HARDWARE IMPLEMENTATION
4.1. Introduction
The purpose of the Design is to walk you through a complete hardware and software
processor system design. In this process, you will use the BSB of the XPS system to
automatically create a processor system and then add a custom OPB peripheral (adder circuit)
to that processor system which will consist of the following items:
Hardware components
Memory map
Software application
MicroBlaze
BRAM BLOCK
PLB Bus
MDM
XPS UARTLITE
2 - XPS GPIOs

Address Min    Address Max    Size         Comment
0x0000_0000    0x0000_3FFF    16K bytes    LMB Memory
0x8440_0000    0x8440_FFFF    64K bytes    Debug_module
0x8400_0000    0x8400_FFFF    64K bytes    RS232_DCE
0x8140_0000    0x8140_FFFF    64K bytes    LED
0x8142_0000    0x8142_FFFF    64K bytes    DIP_Switches_4Bit
(not listed)   (not listed)   (not listed) DDR_SDRAM
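The Size column follows from the address bounds as Max - Min + 1. The following is a minimal sketch that checks this for the listed regions and confirms that none of them overlap. The names MEMORY_MAP, region_size and overlaps are illustrative only and are not part of the files generated by XPS.

```python
# Address ranges copied from the memory map above. MEMORY_MAP is an
# illustrative name, not something generated by the Base System Builder.
MEMORY_MAP = {
    "LMB Memory":        (0x0000_0000, 0x0000_3FFF),
    "Debug_module":      (0x8440_0000, 0x8440_FFFF),
    "RS232_DCE":         (0x8400_0000, 0x8400_FFFF),
    "LED":               (0x8140_0000, 0x8140_FFFF),
    "DIP_Switches_4Bit": (0x8142_0000, 0x8142_FFFF),
}

def region_size(lo, hi):
    """Number of bytes in the inclusive address range [lo, hi]."""
    return hi - lo + 1

def overlaps(a, b):
    """True if two inclusive ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]

for name, (lo, hi) in MEMORY_MAP.items():
    print(f"{name:18s} {region_size(lo, hi) // 1024:3d}K bytes")
```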
4.4. Background
The backbone of the architecture is a single-issue, 3-stage pipeline with 32 general-purpose
registers (it does not have any address registers like the Motorola 68000 processor), an
Arithmetic Logic Unit (ALU), a shift unit, and two levels of interrupt. This basic design can
then be configured with more advanced features to match the exact needs of the target
embedded application, such as a barrel shifter, divider, multiplier, single-precision
floating-point unit (FPU), instruction and data caches, exception handling, debug logic, Fast
Simplex Link (FSL) interfaces and others.
This flexibility allows the user to balance the required performance of the target
application against the logic area cost of the soft processor. MicroBlaze also supports reset,
interrupt, user exception, and break hardware exceptions. For interrupts, MicroBlaze supports
only one external interrupt source (connected to the Interrupt input port). If multiple
interrupts are needed, an interrupt controller must be used to handle multiple interrupt
requests to MicroBlaze, as shown in Figure 4.2. An interrupt controller is available for use with
the Xilinx Embedded Development Kit (EDK) software tools. The processor will only react
to interrupts if the Interrupt Enable (IE) bit in the Machine Status Register (MSR) is set to 1.
On an interrupt, the instruction in the execution stage will complete, while the instruction in
the decode stage is replaced by a branch to the interrupt vector (address 0x10).
The interrupt return address (the PC associated with the instruction in the decode
stage at the time of the interrupt) is automatically loaded into general-purpose register R14. In
addition, the processor also disables future interrupts by clearing the IE bit in the MSR. The
IE bit is automatically set again when executing the RTID instruction.
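The interrupt entry and return sequence described above can be summarised as a small state model. This is a toy sketch, not a model of the real pipeline: the class ToyMicroBlaze, its starting PC and its method names are hypothetical, and the register is modelled as r14 because that is where MicroBlaze documentation places the interrupt return address.

```python
class ToyMicroBlaze:
    """Toy model of the interrupt behaviour described in the text (assumption-laden)."""
    INTERRUPT_VECTOR = 0x10

    def __init__(self):
        self.pc = 0x100      # arbitrary starting address, chosen for illustration
        self.r14 = 0         # holds the interrupt return address
        self.ie = False      # MSR[IE]; this model starts with interrupts disabled

    def interrupt(self, decode_stage_pc):
        """Request an interrupt; returns True only if the processor reacts to it."""
        if not self.ie:
            return False                 # IE = 0: the request is ignored
        self.r14 = decode_stage_pc       # return address = PC of the decode-stage instruction
        self.ie = False                  # further interrupts are disabled
        self.pc = self.INTERRUPT_VECTOR  # branch to the interrupt vector
        return True

    def rtid(self):
        """Return from interrupt: resume at the saved address and set IE again."""
        self.pc = self.r14
        self.ie = True

cpu = ToyMicroBlaze()
cpu.ie = True
cpu.interrupt(decode_stage_pc=0x204)
print(hex(cpu.pc), hex(cpu.r14), cpu.ie)
```

A second request made while the handler runs is ignored in this model until rtid() sets IE again, which mirrors the single-level behaviour described in the text.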
4.5. Features
The MicroBlaze soft core processor is highly configurable, allowing you to select a
specific set of features required by your design.
The fixed feature set of the processor includes:
Thirty-two 32-bit general purpose registers
32-bit instruction word with three operands and two addressing modes
32-bit address bus
Single issue pipeline
In addition to these fixed features, the MicroBlaze processor is parameterized to allow
selective enabling of additional functionality. Older (deprecated) versions of MicroBlaze
support a subset of the optional features described here. Only the latest (preferred) version of
MicroBlaze (v7.00) supports all options. Xilinx recommends that all new designs use the
latest preferred version of the MicroBlaze processor.
                cycle 1   cycle 2   cycle 3   cycle 4   cycle 5   cycle 6   cycle 7
Instruction 1   Fetch     Decode    Execute
Instruction 2             Fetch     Decode    Execute   Execute   Execute
Instruction 3                       Fetch     Decode    Stall     Stall     Execute
clock cycles. A data cache write normally has two cycles of latency (more if the posted-write
buffer in the memory controller is full).
The MicroBlaze instruction and data caches can be configured to use 4 or 8 word
cache lines. When using a longer cache line, more bytes are pre-fetched, which generally
improves performance for software with sequential access patterns.
However, for software with a more random access pattern the performance can
instead decrease for a given cache size. This is caused by a reduced cache hit rate due to
fewer available cache lines.
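The trade-off between line length and line count is simple arithmetic: for a fixed cache size, doubling the line length halves the number of lines. A minimal sketch, assuming 32-bit (4-byte) words and an 8 KB cache chosen purely for illustration:

```python
WORD_BYTES = 4  # a MicroBlaze word is 32 bits

def cache_line_count(cache_bytes, words_per_line):
    """Number of cache lines available for a given cache size and line length."""
    return cache_bytes // (words_per_line * WORD_BYTES)

CACHE_BYTES = 8 * 1024  # illustrative cache size, not taken from the design
for words in (4, 8):
    print(f"{words}-word lines: {cache_line_count(CACHE_BYTES, words)} lines")
```

With fewer lines, two frequently used addresses are more likely to compete for the same line, which is why random access patterns can see a lower hit rate with the longer line.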
Processor Local Bus (PLB) Interface Description: The MicroBlaze PLB interfaces are
simple protocol to ensure that local block RAM are accessed in a single clock cycle. LMB
signals and definitions are shown in the following table. All LMB signals are active high.
The creation of the verification platform is optional and is based on the hardware
platform. The MHS file is taken as an input by the Sim-gen tool to create simulation files for
a specific simulator. Three types of simulation models can be generated by the Sim-gen tool:
behavioral, structural and timing models.
Some other useful tools are available in EDK. Platform Studio provides the
GUI for creating the MHS and MSS files. The Create / Import IP Wizard allows
designers to create their own peripherals and import them into EDK projects. The Bitstream
Initializer tool initializes the instruction memory of processors on the FPGA. GNU Compiler
tools are used for compiling and linking application executables for each processor in the
system [8]. There are two options available for debugging an application created using EDK:
Xilinx Microprocessor Debug (XMD), for debugging the application software using a
Microprocessor Debug Module (MDM) in the embedded processor system, and the Software
Debugger, which invokes the software debugger corresponding to the compiler being used for
the processor.
Software Development Kit: the Xilinx Platform Studio Software Development Kit
(SDK) is an integrated development environment, complementary to XPS, that is used for
C/C++ embedded software application creation and verification. The software application can
be written in C or C++; once it is written, the embedded processor system for the user
application is complete, and the bit file can be debugged and downloaded into the FPGA. The
FPGA then behaves like the processor implemented on it.
Using hardware description languages, the EDK enables the integration of both hardware and
software components of an embedded system.
For the hardware side, the design entry from VHDL/Verilog is first synthesized into a
gate-level netlist, and then translated into the primitives, mapped on the specific device
resources such as look-up tables, flip-flops, and block memories. These device resources
are then placed, and their interconnections routed, to meet the timing
constraints. A downloadable .bit file is created for the whole hardware platform. The
software side follows the standard embedded software flow to compile the source codes into
an executable and linkable file (ELF) format. Meanwhile, a microprocessor software
specification (MSS) file and a microprocessor hardware specification (MHS) file are used to
define software structure and hardware connection of the system. The EDK uses these files to
control the design flow and eventually merge the system into a single downloadable file. The
whole design runs on a real-time operating system (RTOS).
needs and make last minute changes. The FPGA based Design Flow consists of different
stages as shown in Fig.28.
Design Entry
Synthesis
Simulation
Implementation
use today. The most popular ones are VHDL (Very High Speed Integrated Circuit HDL),
Verilog HDL and Abel.
files
Machine-specific and language-specific compiler: Compiles C/C++ code
Assembler: Converts code to machine language and generates the object file
Linker: Links all the object files using user-defined or default linker script
4.17. JTAG
JTAG's primary purpose is to allow a computer to take control of the state of all the I/O
pins on a board. In turn, this allows each device's connectivity to other devices on the board
to be tested. Standard JTAG commands can be used for this purpose.
FPGAs are JTAG-aware, so all the FPGA I/O pins can be controlled from the
JTAG interface. FPGAs add the ability to be configured through JTAG (using proprietary
JTAG commands).
JTAG consists of 4 signals: TDI, TDO, TMS and TCK. A fifth pin, TRST, is optional.
A single JTAG port can connect to one or multiple devices (as long as they are all JTAG-aware
parts). With multiple devices, you create what is called a "JTAG chain". TMS and
TCK are tied to all the devices directly, but TDI and TDO form a chain: TDO from one
device goes to TDI of the next one in the chain. The master controlling the chain (usually a
computer) closes the chain.
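The chaining of TDO to TDI can be pictured as one long shift register. The sketch below is a simplified model under a stated assumption that does not come from the text: every device is treated as a single 1-bit register (as when a device is in its bypass state), so a bit sent by the master reappears at its TDO input after as many TCK cycles as there are devices. The function name is hypothetical.

```python
def shift_through_chain(tdi_bits, num_devices):
    """Shift bits through a chain of devices, each modelled as a 1-bit register.

    Returns the bits the master observes on the last device's TDO, one per TCK cycle.
    """
    regs = [0] * num_devices          # regs[0] is the device fed by the master's TDI
    tdo_bits = []
    for bit in tdi_bits:
        tdo_bits.append(regs[-1])     # TDO of the last device closes the chain at the master
        regs = [bit] + regs[:-1]      # each device's TDO feeds the next device's TDI
    return tdo_bits

sent = [1, 0, 1, 1, 0, 0, 0]
print(shift_through_chain(sent, num_devices=3))
```

In this model the observed stream is the sent stream delayed by three cycles, one per device in the chain.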
4.18. RS232
As shown in Figure 4.7, the Spartan-3E Starter Kit board has two RS-232 serial ports:
a female DB9 DCE connector and a male DTE connector. The DCE-style port connects
directly to the serial port connector available on most personal computers and workstations
via a standard straight-through serial cable. Null modem, gender changers, or crossover
cables are not required.
Use the DTE-style connector to control other RS-232 peripherals, such as modems or
printers, or perform simple loop back testing with the DCE connector.
and to ensure the data integrity, Start, Parity and Stop bits are added to the serial data. An
example of the UART frame format is shown in Figure 4.9 below.
Figure 4.9. UART Frame Format: (1 Start Bit, 8 Data Bits, 1 Parity Bit, 1 Stop Bit)
This design can also be instantiated many times to get multiple UARTs in the same
device. To make the design easy to embed into a larger implementation, the bi-directional data
bus is separated into two buses, DIN and DOUT, instead of using tri-state buffers. The
transmitter and receiver both share a common internal Clk16X clock. This internal clock,
which needs to run at 16 times the desired baud rate, is obtained from the on-board clock
through the MCLK input directly.
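The frame format in Figure 4.9 and the Clk16X relationship can be sketched in a few lines. Two details are assumptions rather than statements from the text: data bits are sent least significant bit first, and the parity bit uses even parity by default. The function names are illustrative.

```python
def uart_frame(byte, even_parity=True):
    """Return one frame as a list of bits in transmission order.

    Layout: 1 start bit (0), 8 data bits (LSB first, assumed), 1 parity bit, 1 stop bit (1).
    """
    data = [(byte >> i) & 1 for i in range(8)]
    parity = sum(data) % 2            # even parity: total number of ones becomes even
    if not even_parity:
        parity ^= 1                   # odd parity is the complement
    return [0] + data + [parity] + [1]

def clk16x_hz(baud_rate):
    """Frequency required for the internal Clk16X clock at a given baud rate."""
    return 16 * baud_rate

print(uart_frame(0x41))               # frame for ASCII 'A'
print(clk16x_hz(9600))                # Clk16X frequency for 9600 baud
```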
4.19.2. Features
Inserts or extracts standard asynchronous communication bits (Start, Stop and Parity)
to or from the serial data.
Holding and shifting registers eliminate the need for precise synchronization between
the CPU and serial data.
Separate input and output data buses for use as an embedded module in a larger
design
4.20. Conclusion
This chapter discussed the hardware implementation of the project and described
each block in the block diagram.
CHAPTER -5
MATHEMATICAL ANALYSIS
5.1. Introduction
Any discussion of AES must begin with DES, the original Data Encryption Standard.
DES was selected as a Federal Information Processing Standard (FIPS) for the United States
in 1976. In 1977 the National Bureau of Standards (now the National Institute of Standards
and Technology, or NIST) adopted an IBM-designed cipher that encrypted 64-bit blocks
under 56-bit keys as the Data Encryption Standard (DES).
On October 2, 2000, NIST announced its choice for the Advanced Encryption
Standard: Rijndael (pronounced Rhine Dahl), an algorithm developed by two Belgian
cryptographers, Joan Daemen and Vincent Rijmen. The cryptosystem is quite algebraic.
Rijndael repeats rounds, with the
number of rounds determined by key size. In the 128-bit key version, Rijndael runs for 10
rounds. As specified in the call for algorithms, Rijndael operates on a 128-bit block of data. It
divides the block into sixteen 8-bit bytes and treats these as elements of GF(2^8), defined by
the polynomial x^8 + x^4 + x^3 + x + 1, which is irreducible over Z/2Z. The data are placed in a
4 x 4 array, and all operations occur on the bytes of the array. Each round consists of four
operations: one transforms the bytes, one transforms the rows, one transforms the columns,
and one adds in the key. First, each of the bytes is modified by maps easily described in the
arithmetic of GF(2^8): inversion (with zero mapped to itself) and an affine transformation;
then the rows of the array are shifted circularly, with the bytes of row i moving i - 1 locations
to the left. Next the bytes in each column are mixed by multiplication: view the column
elements as coefficients of a polynomial of degree 3, and multiply this polynomial by
3x^3 + x^2 + x + 2 modulo x^4 + 1. The last operation is an XOR of the key bits with the
elements of the array.
The polynomials used for the field arithmetic were determined by two criteria: (a)
arithmetic efficiency and (b) resistance to cryptanalytic attack. Though DES was first cracked
by a brute-force attack that searched the entire key space, linear and differential cryptanalysis
and weak keys are serious attacks on the security of the algorithm. Rijndael's multiplicative
map and affine transformation were chosen for their ability to resist these. The polynomial
3x^3 + x^2 + x + 2 was picked for its combination of fast multiplication and diffusion power.
(Diffusion is spreading changes in key or text bits into the cipher text.) NIST's evaluation
used published research from academic and industry experts and private advice from the
National Security Agency (NSA). NIST based its decision on security, efficiency, and
algorithm and implementation characteristics (including hardware and software suitability
and simplicity). Security is difficult to assess. The breaking of an algorithm is clear, but there
are no proofs of security, only proofs that an algorithm passes the tests we currently know to
perform. By contrast, results of efficiency tests, even though only using current technology,
provide more definitive information. Efficiency tests were conducted in a variety of venues,
including fast implementations in C++, Java, assembler code, FPGAs (Field Programmable
Gate Arrays) and ASICs (Application Specific Integrated Circuits).
All finalists were fine on these measures, but some were finer than others. Why did
NIST pick Rijndael? NIST judged the submission to be the best overall algorithm for the
AES. Rijndael's combination of security, performance, efficiency, implementability, and
flexibility make it an appropriate selection for the AES. Rijndael's cryptographic complexity
rests on several well-studied cryptographic transformations, and the algorithm is easy to
describe. The algorithm performs efficiently on a variety of platforms (NIST noted that it was
a good performer in hardware and software across a wide range of computing
environments), and the algorithm is relatively easy to defend against power and timing
attacks. There were some comments that the polynomials chosen for Rijndael's primitives
might lead to breaks. But GF(2^n) is a field that NSA knows well, and it is fair to assume that
Rijndael passed NSA's tests. Many of the finest minds in the field submitted candidates, and
the candidate algorithms were widely reviewed, criticized, and discussed by experts around
the world. As a result, AES is considered to be a high quality and trustworthy solution for
data encryption. AES became a government standard in 2002. In 2003, the U.S. Government
approved AES for use with classified information. Today, it is one of the most popular
algorithms used in symmetric key cryptography.
Implementation issues, such as key length support, keying restrictions, and additional
block/key/round sizes.
5.2.2. Definitions
1) Glossary of Terms and Acronyms
The following definitions are used throughout this standard:
Block: Sequence of binary bits that comprise the input, output, State, and Round
Key. The length of a sequence is the number of bits it contains. Blocks are also
interpreted as arrays of bytes.
Byte: A group of eight bits that is treated either as a single entity or as an array of 8
individual bits.
Ciphertext: Data output from the Cipher or input to the Inverse Cipher.
Key Expansion: Routine used to generate a series of Round Keys from the Cipher Key.
Plaintext: Data input to the Cipher or output from the Inverse Cipher.
Round Key: Round keys are values derived from the Cipher Key using the Key
Expansion routine; they are applied to the State in the Cipher and Inverse Cipher.
State: Intermediate Cipher result that can be pictured as a rectangular array of bytes,
having four rows and Nb columns.
2) Mathematical Preliminaries: All bytes in the AES algorithm are interpreted as finite field
elements using the notation introduced in Sec. 2.2.3.2. Finite field elements can be added and
multiplied, but these operations are different from those used for numbers. The following
subsections introduce the basic mathematical concepts needed for Sec. 2.2.5.
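3) Addition: The addition of two elements in a finite field is achieved by adding the
coefficients for the corresponding powers in the polynomials for the two elements.
For example, the following expressions are equivalent to one another:
Eq. 5.1.
4) Multiplication: In the polynomial representation, multiplication in GF(2^8) (denoted by •)
corresponds to the multiplication of polynomials modulo the irreducible polynomial
m(x) = x^8 + x^4 + x^3 + x + 1 given in Section 5.1.
It is also represented by {01}{1b} in hexadecimal notation. For example, {57} • {83} = {c1},
because of the operations as shown: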
The modular reduction by m(x) ensures that the result will be a binary polynomial of
degree less than 8, and thus can be represented by a byte. Unlike addition, there is no simple
operation at the byte level that corresponds to this multiplication. The multiplication defined
above is associative, and the element {01} is the multiplicative identity.
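The worked example {57} • {83} = {c1} can be reproduced with a short shift-and-XOR routine. This is a minimal sketch of GF(2^8) multiplication modulo m(x); the function name gf_mul is illustrative and the bit-serial method is one common way to implement it, not necessarily the one used in this design.

```python
def gf_mul(a, b):
    """Multiply two bytes as elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        carry = a & 0x80
        a = (a << 1) & 0xFF      # multiply a by x
        if carry:
            a ^= 0x1B            # reduce modulo m(x), i.e. XOR with {1b}
        b >>= 1
    return result

print(f"{gf_mul(0x57, 0x83):02x}")
```

The same routine also shows that {01} is the multiplicative identity, since gf_mul(b, 0x01) returns b for every byte b.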
For any non-zero binary polynomial b(x) of degree less than 8, the multiplicative
inverse of b(x), denoted b^-1(x), can be found as follows: the extended Euclidean algorithm is
used to compute polynomials a(x) and c(x) such that b(x)a(x) + m(x)c(x) = 1. Hence
a(x) • b(x) mod m(x) = 1, which means that b^-1(x) = a(x) mod m(x).
It follows that the set of 256 possible byte values, with XOR used as addition and the
multiplication defined as above, has the structure of the finite field GF(2^8).
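Because the field has only 256 elements, the existence of an inverse for every non-zero byte can be checked directly. The sketch below uses an exhaustive search in place of the extended Euclidean algorithm named in the text; it is meant only as a check, and the function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result

def gf_inv(b):
    """Find the multiplicative inverse of a non-zero byte by exhaustive search."""
    if b == 0:
        raise ZeroDivisionError("{00} has no multiplicative inverse")
    for candidate in range(1, 256):
        if gf_mul(b, candidate) == 1:
            return candidate

print(f"{gf_inv(0x53):02x}")
```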
5) Multiplication by x: Multiplying the binary polynomial b(x) defined above with the
polynomial x results in
Eq. 5.4
The result x • b(x) is obtained by reducing the above result modulo m(x), as defined in
equation (2.2.4.1). If b7 = 0, the result is already in reduced form. If b7 = 1, the reduction is
accomplished by subtracting (i.e., XORing) the polynomial m(x). It follows that
multiplication by x (i.e., {00000010} or {02}) can be implemented at the byte level as a left
shift and a subsequent conditional bitwise XOR with {1b}. This operation on bytes is denoted
by xtime(). Multiplication by higher powers of x can be implemented by repeated application
of xtime(). By adding intermediate results, multiplication by any constant can be
implemented.
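A minimal sketch of xtime() and of multiplication by a constant built from repeated xtime() calls, as described above. The helper name mul_by_constant is hypothetical.

```python
def xtime(b):
    """Multiply a byte by x ({02}) in GF(2^8): left shift, then conditional XOR with {1b}."""
    shifted = (b << 1) & 0xFF
    return shifted ^ 0x1B if b & 0x80 else shifted

def mul_by_constant(b, c):
    """Multiply b by the constant c by adding (XORing) the needed xtime() multiples."""
    result = 0
    power = b                    # b times x^0
    while c:
        if c & 1:
            result ^= power
        power = xtime(power)     # next higher power of x
        c >>= 1
    return result

# {57} multiplied by x, x^2, x^3, x^4
chain = [0x57]
for _ in range(4):
    chain.append(xtime(chain[-1]))
print([f"{v:02x}" for v in chain])
print(f"{mul_by_constant(0x57, 0x13):02x}")   # {57} • {13} = {57} xor {ae} xor {07}
```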
6) Polynomials with Coefficients in GF(2^8)
Four-term polynomials can be defined - with coefficients that are finite field elements - as:
Eq.5.5
which will be denoted as a word in the form [a0, a1, a2, a3]. Note that the
polynomials in this section behave somewhat differently than the polynomials used in the
definition of finite field elements, even though both types of polynomials use the same
indeterminate, x. The coefficients in this section are themselves finite field elements, i.e.,
bytes, instead of bits; also, the multiplication of four-term polynomials uses a different
reduction polynomial, defined below. The distinction should always be clear from the
context.
To illustrate the addition and multiplication operations, let
Eq. 5.6
define a second four-term polynomial. Addition is performed by adding the finite field
coefficients of like powers of x. This addition corresponds to an XOR operation between the
corresponding bytes in each of the words; in other words, the XOR of the complete words:
Eq. 5.7
Multiplication is achieved in two steps. In the first step, the polynomial product c(x) = a(x) •
b(x) is algebraically expanded, and like powers are collected to give
Eq. 5.8
where
Eq. 5.9
The result, c(x), does not represent a four-byte word. Therefore, the second step of the
multiplication is to reduce c(x) modulo a polynomial of degree 4; the result can be reduced to
a polynomial of degree less than 4. For the AES algorithm, this is accomplished with the
polynomial x^4 + 1, so that
Eq.5.10
The modular product of a(x) and b(x), denoted by a(x) ⊗ b(x), is given by the four-term
polynomial d(x), defined as follows:
Eq. 5.11
with
Eq.5.12
When a(x) is a fixed polynomial, the operation in Eq.2.2.4.11 can be written in matrix
form as:
Eq.5.13
Because x^4 + 1 is not an irreducible polynomial over GF(2^8), multiplication by a
fixed four-term polynomial is not necessarily invertible.
However, the AES algorithm specifies a fixed four-term polynomial that does have an
inverse.
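The two-step multiplication collapses into one rule: since x^i mod (x^4 + 1) = x^(i mod 4), the product term a_i • b_j contributes to coefficient (i + j) mod 4 of d(x). The sketch below applies that rule. The polynomial a(x) = {03}x^3 + {01}x^2 + {01}x + {02} matches the one quoted in Section 5.1; its inverse {0b}x^3 + {0d}x^2 + {09}x + {0e} is taken from the FIPS-197 specification and is not stated in this text. The second example uses x + 1, which divides x^4 + 1 and therefore has no inverse.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result

def poly_mul_mod(a, b):
    """Modular product of two four-term polynomials [a0, a1, a2, a3] modulo x^4 + 1."""
    d = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            d[(i + j) % 4] ^= gf_mul(a[i], b[j])   # x^(i+j) reduces to x^((i+j) mod 4)
    return d

A = [0x02, 0x01, 0x01, 0x03]        # a0..a3 of the fixed AES polynomial
A_INV = [0x0E, 0x09, 0x0D, 0x0B]    # a0..a3 of its inverse (from FIPS-197)
print(poly_mul_mod(A, A_INV))                       # the identity word [1, 0, 0, 0]
print(poly_mul_mod([1, 1, 0, 0], [1, 1, 1, 1]))     # (x + 1)(x^3 + x^2 + x + 1) reduces to zero
```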
The various operational blocks required and the state flow in our design
of the AES-128 algorithm are shown here:
Eq. 5.17
for 0 <= i < 8, where bi is the ith bit of the byte, and ci is the ith bit of a byte c with the
value {63} or {01100011}. Here and elsewhere, a prime on a variable indicates that the
variable is to be updated with the value on the right.
In matrix form, the affine transformation element of the S-box can be expressed as:
The S-box used in the SubBytes() transformation is presented in hexadecimal form in Fig. 7.
For example, if s1,1 = {53}, then the substitution value would be determined by the intersection
of the row with index 5 and the column with index 3, which gives {ed}.
Fig 5.2: Substitution Values for the byte xy (in hexadecimal format)
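Each S-box entry can be computed rather than looked up, by taking the multiplicative inverse in GF(2^8) (with {00} mapped to itself) and then applying the affine transformation. The sketch below assumes the bitwise form of the affine map given in FIPS-197, b'i = bi xor b(i+4) mod 8 xor b(i+5) mod 8 xor b(i+6) mod 8 xor b(i+7) mod 8 xor ci, with c = {63}. The inverse is obtained as b^254, which is a swapped-in shortcut rather than the extended Euclidean algorithm.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result

def gf_inv(b):
    """Inverse as b^254; this conveniently maps {00} to {00}."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, b)
    return result

def sub_byte(b):
    """Compute one S-box entry: field inversion followed by the affine transformation."""
    inv = gf_inv(b)
    c = 0x63
    out = 0
    for i in range(8):
        bit = ((inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8))
               ^ (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8)) ^ (c >> i)) & 1
        out |= bit << i
    return out

print(f"{sub_byte(0x53):02x}")
```

In hardware, the same choice appears as a trade-off between a 256-entry lookup table and combinational logic for the inversion.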
9) ShiftRows() Transformation: In the ShiftRows()transformation, the bytes in the last three
rows of the State are cyclically shifted over different numbers of bytes (offsets). The first
row, r = 0, is not shifted. Specifically, the ShiftRows() transformation proceeds as follows:
Eq. 5.18
where the shift value shift (r, Nb) depends on the row number, r, as follows
(recall that Nb= 4):
shift(1,4) =1; shift(2,4) = 2 ; shift(3,4) = 3 .
Eq. (5.19)
This has the effect of moving bytes to lower positions in the row (i.e., lower values
of c in a given row), while the lowest bytes wrap around into the top of the row (i.e.,
higher values of c in a given row). Figure 8 illustrates the ShiftRows()transformation.
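A minimal sketch of ShiftRows(), assuming the State is held as a list of four rows, each a list of Nb = 4 bytes. Row r is rotated left by r positions, so bytes move to lower column indices and the lowest ones wrap around.

```python
def shift_rows(state):
    """Cyclically shift row r of the State left by r bytes (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

state = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
for row in shift_rows(state):
    print(row)
```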
Fig 5.3: ShiftRows() cyclically shifts the last three rows in the State
10) MixColumns() Transformation: The MixColumns() transformation operates on the
State column-by-column, treating each column as a four-term polynomial as described. The
columns are considered as polynomials over GF(2^8) and multiplied modulo x^4 + 1 with a
fixed polynomial a(x), given by,
Eq.5.20
Eq.5.21
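For a single column, the multiplication by a(x) = {03}x^3 + {01}x^2 + {01}x + {02} expands into four byte equations. The sketch below writes them out directly. The test column db 13 53 45 and its result 8e 4d a1 bc are a widely used check vector and do not come from this document.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result

def mix_column(col):
    """Apply MixColumns() to one column [s0, s1, s2, s3] of the State."""
    s0, s1, s2, s3 = col
    return [
        gf_mul(0x02, s0) ^ gf_mul(0x03, s1) ^ s2 ^ s3,
        s0 ^ gf_mul(0x02, s1) ^ gf_mul(0x03, s2) ^ s3,
        s0 ^ s1 ^ gf_mul(0x02, s2) ^ gf_mul(0x03, s3),
        gf_mul(0x03, s0) ^ s1 ^ s2 ^ gf_mul(0x02, s3),
    ]

print([f"{b:02x}" for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```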
11) AddRoundKey() Transformation: In the AddRoundKey() transformation, a Round Key
is added to the State by a simple bitwise XOR operation. Each Round Key consists of
Nb words from the key schedule. Those Nb words are each added into the columns of the
State, such that:
Eq.5.22
where [wi] are the key schedule words described in Sec. 2.2.5.2, and round is a value in the
range 0<= round <= Nr. In the Cipher, the initial Round Key addition occurs when round= 0,
prior to the first application of the round function (see Fig. 5). The application of the
AddRoundKey() transformation to the Nr rounds of the Cipher occurs when 1<= round <=
Nr. The action of this transformation is illustrated in Fig. 10, where l = round * Nb. The byte
address within words of the key schedule was described in.
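A minimal sketch of AddRoundKey(), assuming the State is held as a list of Nb = 4 columns and the key schedule as a list of 4-byte words. The input block and Cipher Key below are the example values from the FIPS-197 specification, used here only as a check. For round 0 the Round Key is the Cipher Key itself.

```python
def add_round_key(state, w, rnd, nb=4):
    """XOR column c of the State with key schedule word w[rnd * nb + c]."""
    offset = rnd * nb
    return [bytes(s ^ k for s, k in zip(state[c], w[offset + c])) for c in range(nb)]

# State columns and key schedule words for round 0 (example values from FIPS-197).
state = [bytes.fromhex(h) for h in ("3243f6a8", "885a308d", "313198a2", "e0370734")]
w = [bytes.fromhex(h) for h in ("2b7e1516", "28aed2a6", "abf71588", "09cf4f3c")]

after = add_round_key(state, w, rnd=0)
print([col.hex() for col in after])
```

Applying the function a second time with the same round number restores the original State, which is why the transformation is its own inverse.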
12) Key Expansion: The AES algorithm takes the Cipher Key, K, and performs a Key
Expansion routine to generate a key schedule. The Key Expansion generates a total of Nb(Nr
+ 1) words: the algorithm requires an initial set of Nb words, and each of the Nr rounds
requires Nb words of key data. The resulting key schedule consists of a linear array of 4-byte
words, denoted [wi], with i in the range 0 <= i < Nb(Nr + 1). The expansion of the input key
into the key schedule proceeds according to the pseudo code.
SubWord() is a function that takes a four-byte input word and applies the S-box to each of
the four bytes to produce an output word. The function RotWord() takes a word
[a0,a1,a2,a3] as input, performs a cyclic permutation, and returns the word [a1,a2,a3,a0].
The round constant word array, Rcon[i], contains the values given by
[x^(i-1),{00},{00},{00}], with x^(i-1) being powers of x (x is denoted as {02}) in the field
GF(2^8), as discussed (note that i starts at 1, not 0).
From Fig. 11, it can be seen that the first Nk words of the expanded key are filled with the
Cipher Key. Every following word, w[i], is equal to the XOR of the previous word, w[i-1],
and the word Nk positions earlier, w[i-Nk]. For words in positions that are a multiple of Nk, a
transformation is applied to w[i-1] prior to the XOR, followed by an XOR with a round
constant, Rcon[i/Nk]. This transformation consists of a cyclic shift of the bytes in a word
(RotWord()), followed by the application of a table lookup to all four bytes of the word
(SubWord()). It is important to note that the Key Expansion routine for 256-bit Cipher Keys
(Nk = 8) is slightly different than for 128- and 192-bit Cipher Keys. If Nk = 8 and i-4 is a
multiple of Nk, then SubWord() is applied to w[i-1] prior to the XOR.
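The expansion rules above can be sketched compactly. This is an illustrative implementation, not the pseudo code from the figure: the S-box is generated on the fly (inverse as b^254, then the affine map with constant {63}), the round constant is tracked as a running power of {02}, and the Cipher Key is the 128-bit example key from FIPS-197, used only to check the result.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result

def sub_byte(b):
    """S-box entry: inverse in GF(2^8) (as b^254) followed by the affine map."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, b)
    rotl = lambda v, k: ((v << k) | (v >> (8 - k))) & 0xFF
    return inv ^ rotl(inv, 1) ^ rotl(inv, 2) ^ rotl(inv, 3) ^ rotl(inv, 4) ^ 0x63

SBOX = [sub_byte(b) for b in range(256)]

def sub_word(word):
    return [SBOX[b] for b in word]

def rot_word(word):
    return word[1:] + word[:1]

def key_expansion(key, nk=4, nr=10, nb=4):
    """Expand the Cipher Key into Nb * (Nr + 1) four-byte words."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01                                    # x^0; doubled after each use
    for i in range(nk, nb * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp))
            temp = [temp[0] ^ rcon] + temp[1:]     # XOR with [x^(i/Nk - 1), 00, 00, 00]
            rcon = gf_mul(rcon, 0x02)
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)                  # extra step for 256-bit keys
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return w

schedule = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(schedule), bytes(schedule[4]).hex(), bytes(schedule[43]).hex())
```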
13) Decryption (Inverse Cipher Generation): The Cipher transformations described above can
be inverted and then implemented in reverse order to produce a straightforward Inverse
Cipher for the AES algorithm. The individual transformations used in the Inverse Cipher,
InvShiftRows(), InvSubBytes(), InvMixColumns(), and AddRoundKey(), process the State
and are described in the following subsections. The Inverse Cipher is described in the pseudo
code in Fig. 12. In Fig. 12, the array w[] contains the key schedule, which was described
previously.
14.a. InvShiftRows() Transformation: InvShiftRows() is the inverse of the ShiftRows()
transformation. The bytes in the last three rows of the State are cyclically shifted over
different numbers of bytes (offsets). The first row, r = 0, is not shifted. The bottom three rows
are cyclically shifted by Nb - shift(r, Nb) bytes, where the shift value shift(r, Nb) depends on
the row number, and is given in Eq. (5.19).
Specifically, the InvShiftRows() transformation proceeds as follows:
Eq.5.23
Figure 13 illustrates the InvShiftRows() transformation.
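A minimal sketch of InvShiftRows() next to ShiftRows(), assuming the State is a list of four rows of Nb = 4 bytes. Row r is rotated left by Nb - shift(r, Nb) positions, which undoes the left rotation by r.

```python
NB = 4

def shift_rows(state):
    """Rotate row r left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def inv_shift_rows(state):
    """Rotate row r left by (Nb - r) mod Nb bytes, undoing shift_rows()."""
    out = []
    for r, row in enumerate(state):
        k = (NB - r) % NB
        out.append(row[k:] + row[:k])
    return out

state = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
print(inv_shift_rows(shift_rows(state)) == state)
```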
InvMixColumns() Transformation: InvMixColumns() is the inverse of the MixColumns()
transformation.
Eq. 5.15
As a result of this multiplication, the four bytes in a column are replaced by the following:
described in Sec. 2.2.5.1.4, is its own inverse, since it only involves an application of the
XOR operation.
17) Equivalent Inverse Cipher: the adopted method to improve speed of operation
In the straightforward Inverse Cipher presented in Sec. 2.2.5.3 and Fig. 12, the sequence of
the transformations differs from that of the Cipher, while the form of the key schedules for
encryption and decryption remains the same. However, several properties of the AES
algorithm allow for an Equivalent Inverse Cipher that has the same sequence of
transformations as the Cipher (with the transformations replaced by their inverses). This is
accomplished with a change in the key schedule.
The two properties that allow for this Equivalent Inverse Cipher are as follows:
1. The SubBytes() and ShiftRows() transformations commute; that is, a SubBytes()
transformation immediately followed by a ShiftRows() transformation is equivalent
to a ShiftRows() transformation immediately followed by a SubBytes()
transformation. The same is true for their inverses, InvSubBytes() and InvShiftRows().
2. The column mixing operations MixColumns() and InvMixColumns() are linear
with respect to the column input, which means
InvMixColumns(state XOR Round Key) = InvMixColumns(state) XOR
InvMixColumns(Round Key).
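The linearity property can be checked numerically on a single column. In the sketch below, the InvMixColumns() coefficients {0e}, {09}, {0d}, {0b} are taken from the FIPS-197 specification, not from this text, and the column values are arbitrary test data.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result

def mul_column(col, coeffs):
    """Multiply a column by the four-term polynomial [a0, a1, a2, a3] modulo x^4 + 1."""
    out = []
    for r in range(4):
        acc = 0
        for j in range(4):
            acc ^= gf_mul(coeffs[(r - j) % 4], col[j])
        out.append(acc)
    return out

MIX = [0x02, 0x01, 0x01, 0x03]       # MixColumns() polynomial, a0..a3
INV_MIX = [0x0E, 0x09, 0x0D, 0x0B]   # InvMixColumns() polynomial (from FIPS-197)

def inv_mix_column(col):
    return mul_column(col, INV_MIX)

def xor_columns(a, b):
    return [x ^ y for x, y in zip(a, b)]

state_col = [0xDB, 0x13, 0x53, 0x45]   # arbitrary test column
key_col = [0x2B, 0x7E, 0x15, 0x16]     # arbitrary round key column

lhs = inv_mix_column(xor_columns(state_col, key_col))
rhs = xor_columns(inv_mix_column(state_col), inv_mix_column(key_col))
print(lhs == rhs)
```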
These properties allow the order of InvSubBytes() and InvShiftRows()
Keying Restrictions: No weak or semi-weak keys have been identified for the AES
algorithm, and there is no restriction on key selection.
Parameterization of Key Length, Block Size, and Round Number: This standard
explicitly defines the allowed values for the key length (Nk), block size (Nb), and number
of rounds (Nr) see Fig. 4. However, future reaffirmations of this standard could include
changes or additions to the allowed values for those parameters. Therefore, implementers
may choose to design their AES implementations with future flexibility in mind.
instead. Also, in MixColumns() and InvMixColumns()
The linear mixing layer: guarantees high diffusion over multiple rounds.
The non-linear layer: parallel application of S-boxes that have optimum worst-case
non-linearity properties.
The key addition layer: a simple XOR of the round key to the intermediate state.
Key lengths of 128, 192, and 256 bits are supported. Each step in key size requires
only two additional rounds. The decipher is simply the inverse of the cipher.
By using a true low-level bit-serial approach, a minimum-cost AES co-processor
architecture can be achieved. This architecture can be used in many military,
industrial, and commercial applications that require compactness and low cost.
For a given key length, it offers much higher key strength than asymmetric
key cryptographic methods such as RSA and Elliptic Curve Cryptography.
It resists theoretical attacks such as linear and differential cryptanalysis
and weak keys, and it is also resistant to various attacks on implementations such as
timing and power attacks.
It occupies minimum space due to its inherent properties of modularity, regularity and
availability, which greatly help in exploiting instruction-level parallelism.
then the number of operations required would be a function, O(2^n). One can then hardly
imagine the exhaustive search needed to find the secret key in the 128-, 192-, or 256-bit
key spaces. For a chosen 128-bit key space, the effort required would be 2^128, which is
about 3.4 x 10^38. Even with approximately a trillion chips
operating at a frequency of 1000 GHz, it would take at least a million years to exhaustively
search a 128-bit key space, and the next higher 192-bit and
256-bit key spaces are stronger still. The analysis figure below gives a rough estimate of the
effort of finding a secret key for the AES algorithm.
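The estimate can be reproduced with a few lines of arithmetic. The sketch assumes, generously, that each chip tests one key per clock cycle and that the whole key space must be searched. Both are simplifications, and the variable names are illustrative.

```python
KEY_BITS = 128
CHIPS = 10**12                        # about a trillion chips
KEYS_PER_SECOND_PER_CHIP = 10**12     # assumption: one key per cycle at 1000 GHz
SECONDS_PER_YEAR = 365.25 * 24 * 3600

keyspace = 2 ** KEY_BITS
seconds = keyspace / (CHIPS * KEYS_PER_SECOND_PER_CHIP)
years = seconds / SECONDS_PER_YEAR

print(f"key space: {keyspace:.3e} keys")
print(f"full search: {years:.3e} years")
```

Under these assumptions the full search takes on the order of ten million years, consistent with the "at least a million years" figure in the text. Each additional key bit doubles that time.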
The storage required to hold such a huge number of encryption and
decryption operations (to construct the two tables that assist in searching for the
secret key) for the key space would be analogously large. Thus, only if we are ready
to afford these enormous costs and the unimaginably large electricity bills, meeting
the above conditions continuously for at least a million years, may we perhaps break the
secret key!
5.4.3. Limitations and the possible attacks:
The main limitation of the Advanced Encryption Standard, which is a major
development in symmetric-key algorithms, is the same as the major drawback of
conventional cryptography: distributing the secret key between the two
communicating parties without third-party intervention is the major weak link.
No matter how strong a cryptosystem is, if an intruder can steal the key while it is
being communicated through a weak channel, the whole system is rendered useless. So
AES has to take advantage of public-key algorithms, at least for the purpose of safe
key distribution through the channel.
Another major drawback is that AES is quite susceptible to the newer class of attacks
on cache behavior when it is implemented on a microprocessor or DSP-based processor. If
the attacker can access the machine on which AES runs, the secret key can be retrieved in a
fraction of a second. This type of attack can perhaps be minimized in our present
implementation approach, which uses hardware devices such as FPGAs, CPLDs and ASICs; these
act as virtual processors and take the encryption burden off the actual processor.
5.6. Applications
Vendors of both hardware and software have enthusiastically adopted AES. Because
AES uses a simple and efficient algorithm, using it as an encryption specification decreases
system complexity, lowers costs, and promotes interoperability. There are many areas where
AES is now in commercial use.
5.7. Conclusion:
This chapter dealt with the mathematical preliminaries and gave an overview of the project.
CHAPTER-6
OUTPUT VERIFICATION
6.1. Introduction
Functional verification was carried out for all the test cases, after which the design
was taken through the synthesis process in Xilinx Platform Studio.
DECRYPTION:
CHAPTER 7
CONCLUSION AND FUTURE SCOPE
7.1 Introduction
The main aim of the project is to provide security for data through encryption and
decryption. These algorithms can be used in many applications, which are as follows.
7.2 Applications
1. This standard may be used by Federal departments and agencies when an agency
determines that sensitive (unclassified) information (as defined in P.L. 100-235)
requires cryptographic protection.
2. Security purposes.
3. Medical field.
4. Network security.
5. Online bank security.
6. Secure video teleconferencing.
7. Routers and remote access servers.
8. High-speed ATM/Ethernet/Fiber-channel switches.
9. In addition, this standard may be adopted and used by non-Federal Government
organizations. Such use is encouraged when it provides the desired security for
commercial and private organizations.
7.3. Advantages
1. Through AES, an input message block of 128 bits can be encrypted, which is more than
the 64-bit block of DES.
2. AES supports secret key lengths of 128 bits, 192 bits and 256 bits, whereas DES has a
fixed 64-bit key (56 effective bits) and Triple DES is built from two or three such keys.
3. The cipher key is expanded into a larger key, which is later used for the actual
operation.
4. The expanded key shall always be derived from the cipher key and never be specified
directly.
5. AES is very hard to attack or crack when compared to DES.
6. AES is faster than Triple DES.
7.4 Conclusion
The project work aims at implementing secure data communication between any
two users, based on a realization of the advanced symmetric-key cryptographic algorithm
called the Advanced Encryption Standard (AES) on an FPGA-based processor. We started
by selecting the highly structured and immensely secure AES algorithm and made suitable
modifications to it to improve the speed and the parallelism of instruction execution.
The design was described in System C, simulated and debugged through HyperTerminal on
the Spartan 3 EDK kit, and then synthesized in Xilinx Platform Studio with speed as the
optimization goal, aiming to reduce unrelated logic and improve the maximum clock rate.
The target was the low-cost, high-speed and highly efficient SPARTAN-III-EDK FPGA chip,
configured with the low-cost graphical user interface (GUI) configuration tool from SANDS,
FPGA/CPLD Development Platform Software v 1.1. With this flow we have achieved a hardware
implementation of AES whose performance and cost-effectiveness suit the security demands
of a wide variety of users and applications.
In future, there is a definite hope of wide use of improved versions of AES
processors such as APES and ADES, in which we may witness much greater security
due to increased key length and bit length, and very high speeds even for bulk
encryption/decryption, achieved by employing sophisticated parallel execution schemes.
even compared with the AES algorithm. Even the speed of bulk encryption/decryption
can be improved because of the parallel schemes employed.
2. Improvement in security: The probability of cracking the key becomes much lower and
hence the transmitted data will be more secure. Security may be improved further by
completely eliminating not only precise timing attacks but also all other side-channel
attacks.
3. Improvements in FPGA and EDA tools: Modified algorithms would increasingly demand
implementation on FPGAs rather than in the DSP domain, owing to further possible growth
in fast processing, low power consumption and reduced size of VLSI, and to the evolution
of powerful EDA tools for implementation.
CHAPTER-8
REFERENCES
[1] S. Sau, C. Pal and A. Chakrabarti. 2011. Design and Implementation of Real Time Secured
RS232 Link for Multiple FPGA Communication. Proc. of International Conference on
Communication, Computing & Security. ISBN 978-1-4503-0464-1.
[2] C. D. Walter. August 1999. Montgomery's Multiplication Technique: How to Make It
Smaller and Faster. Cryptographic Hardware and Embedded Systems, Lecture Notes in
Computer Science, Springer. No. 1717. pp. 80-93.
[3] A. Mazzeo, L. Romano, G. P. Saggese and N. Mazzocca. 2003. FPGA-Based
Implementation of a Serial RSA Processor. Proceedings of the conference on
Design, Automation and Test in Europe - Volume 1. ISBN: 0-7695-1870-2.
[4] xilkernel_v3.00.pdf on www.xilinx.com.
[5] R. L. Rivest et al. 1978. A Method for Obtaining Digital Signatures and Public-Key
Cryptosystems. Communications of the ACM. Vol. 21. pp. 120-126.
[6] Cryptography & Network Security, by Behrouz A. Forouzan.
[7] Montgomery Algorithm for Modular Multiplication, Professor Dr. D. J. Guan, August
25, 2003.
[8] RSA & Public Key Cryptography in FPGAs, John Fry, Martin Langhammer, Altera
Corporation - Europe.
[9] A. Tenca, C. Koc. 1999. A Scalable Architecture for Montgomery Multiplication.
Cryptographic Hardware and Embedded Systems, Lecture Notes in Computer Science, No.
1717, pp. 94-108.
[10] A. Tenca, G. Todorov, C. Koc. May 2001. High-radix design of a scalable modular
multiplier. Cryptographic Hardware and Embedded Systems, Lecture Notes in Computer
Science, Springer. No. 2162. pp. 185-201.
[11] High-Speed RSA Implementation, Cetin Kaya Koc, November 1994, Version 2.0,
ftp://ftp.rsa.com/pub/pdfs/tr201.pdf.
[12] http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf.
[13] http://www.design-reuse.com/articles/13981/fpga-implementation-of-aes-encryption-and-decryption.html.
[14] B. Schneier. 1996. Applied Cryptography: Protocols, Algorithms, and Source Code in
C, John Wiley and Sons Inc. 2nd Edition. New York, U.S.A.
[15] G. B. Arfken, D. F. Griffing, D. C. Kelly and J. Priest. University Physics. San Diego, CA:
Harcourt Brace Jovanovich Publishers, 1989.
[16] http://www.techmaish.com/maximum-internet-speed-available-in-the-world/.
[17] D. E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms,
Reading, MA: Addison-Wesley, Second Edition, 1981.
CHAPTER 9
BIBLIOGRAPHY
9.1. Book References:
APPENDIX A
SYSTEM C CODE
#include <stdio.h>
#include <string.h>
#define MAXBC (256/32)
#define MAXKC (256/32)
#define MAXROUNDS 14
#define SC ((BC - 4) >> 1)
3, 5,
4, 4
};
word8 Logtable[256] = {
0, 0, 25, 1, 50, 2, 26, 198, 75, 199, 27, 104, 51, 238, 223, 3,
100, 4, 224, 14, 52, 141, 129, 239, 76, 113, 8, 200, 248, 105, 28, 193,
125, 194, 29, 181, 249, 185, 39, 106, 77, 228, 166, 114, 154, 201, 9, 120,
101, 47, 138, 5, 33, 15, 225, 36, 18, 240, 130, 69, 53, 147, 218, 142,
150, 143, 219, 189, 54, 208, 206, 148, 19, 92, 210, 241, 64, 70, 131, 56,
102, 221, 253, 48, 191, 6, 139, 98, 179, 37, 226, 152, 34, 136, 145, 16,
126, 110, 72, 195, 163, 182, 30, 66, 58, 107, 40, 84, 250, 133, 61, 186,
43, 121, 10, 21, 155, 159, 94, 202, 78, 212, 172, 229, 243, 115, 167, 87,
175, 88, 168, 80, 244, 234, 214, 116, 79, 174, 233, 213, 231, 230, 173, 232,
44, 215, 117, 122, 235, 22, 11, 245, 89, 203, 95, 176, 156, 169, 81, 160,
127, 12, 246, 111, 23, 196, 73, 236, 216, 67, 31, 45, 164, 118, 123, 183,
204, 187, 62, 90, 251, 96, 177, 134, 59, 82, 161, 108, 170, 85, 41, 157,
151, 178, 135, 144, 97, 190, 220, 252, 188, 149, 207, 205, 55, 63, 91, 209,
83, 57, 132, 60, 65, 162, 109, 71, 20, 42, 158, 93, 86, 242, 211, 171,
68, 17, 146, 217, 35, 32, 46, 137, 180, 124, 184, 38, 119, 153, 227, 165,
103, 74, 237, 222, 197, 49, 254, 24, 13, 99, 140, 128, 192, 247, 112, 7,
};
word8 Alogtable[256] = {
1, 3, 5, 15, 17, 51, 85, 255, 26, 46, 114, 150, 161, 248, 19, 53,
95, 225, 56, 72, 216, 115, 149, 164, 247, 2, 6, 10, 30, 34, 102, 170,
229, 52, 92, 228, 55, 89, 235, 38, 106, 190, 217, 112, 144, 171, 230, 49,
83, 245, 4, 12, 20, 60, 68, 204, 79, 209, 104, 184, 211, 110, 178, 205,
76, 212, 103, 169, 224, 59, 77, 215, 98, 166, 241, 8, 24, 40, 120, 136,
131, 158, 185, 208, 107, 189, 220, 127, 129, 152, 179, 206, 73, 219, 118, 154,
181, 196, 87, 249, 16, 48, 80, 240, 11, 29, 39, 105, 187, 214, 97, 163,
254, 25, 43, 125, 135, 146, 173, 236, 47, 113, 147, 174, 233, 32, 96, 160,
251, 22, 58, 78, 210, 109, 183, 194, 93, 231, 50, 86, 250, 21, 63, 65,
195, 94, 226, 61, 71, 201, 64, 192, 91, 237, 44, 116, 156, 191, 218, 117,
159, 186, 213, 100, 172, 239, 42, 126, 130, 157, 188, 223, 122, 142, 137, 128,
155, 182, 193, 88, 232, 35, 101, 175, 234, 37, 111, 177, 200, 67, 197, 84,
252, 31, 33, 99, 165, 244, 7, 9, 27, 45, 119, 153, 176, 203, 70, 202,
69, 207, 74, 222, 121, 139, 134, 145, 168, 227, 62, 66, 198, 81, 243, 14,
18, 54, 90, 238, 41, 123, 141, 140, 143, 138, 133, 148, 167, 242, 13, 23,
57, 75, 221, 124, 132, 151, 162, 253, 28, 36, 108, 180, 199, 82, 246, 1,
};
word8 S[256] = {
99, 124, 119, 123, 242, 107, 111, 197, 48, 1, 103, 43, 254, 215, 171, 118,
202, 130, 201, 125, 250, 89, 71, 240, 173, 212, 162, 175, 156, 164, 114, 192,
183, 253, 147, 38, 54, 63, 247, 204, 52, 165, 229, 241, 113, 216, 49, 21,
4, 199, 35, 195, 24, 150, 5, 154, 7, 18, 128, 226, 235, 39, 178, 117,
9, 131, 44, 26, 27, 110, 90, 160, 82, 59, 214, 179, 41, 227, 47, 132,
83, 209, 0, 237, 32, 252, 177, 91, 106, 203, 190, 57, 74, 76, 88, 207,
208, 239, 170, 251, 67, 77, 51, 133, 69, 249, 2, 127, 80, 60, 159, 168,
81, 163, 64, 143, 146, 157, 56, 245, 188, 182, 218, 33, 16, 255, 243, 210,
205, 12, 19, 236, 95, 151, 68, 23, 196, 167, 126, 61, 100, 93, 25, 115,
96, 129, 79, 220, 34, 42, 144, 136, 70, 238, 184, 20, 222, 94, 11, 219,
224, 50, 58, 10, 73, 6, 36, 92, 194, 211, 172, 98, 145, 149, 228, 121,
231, 200, 55, 109, 141, 213, 78, 169, 108, 86, 244, 234, 101, 122, 174, 8,
186, 120, 37, 46, 28, 166, 180, 198, 232, 221, 116, 31, 75, 189, 139, 138,
112, 62, 181, 102, 72, 3, 246, 14, 97, 53, 87, 185, 134, 193, 29, 158,
225, 248, 152, 17, 105, 217, 142, 148, 155, 30, 135, 233, 206, 85, 40, 223,
140, 161, 137, 13, 191, 230, 66, 104, 65, 153, 45, 15, 176, 84, 187, 22,
};
word8 Si[256] = {
82, 9, 106, 213, 48, 54, 165, 56, 191, 64, 163, 158, 129, 243, 215, 251,
124, 227, 57, 130, 155, 47, 255, 135, 52, 142, 67, 68, 196, 222, 233, 203,
84, 123, 148, 50, 166, 194, 35, 61, 238, 76, 149, 11, 66, 250, 195, 78,
8, 46, 161, 102, 40, 217, 36, 178, 118, 91, 162, 73, 109, 139, 209, 37,
114, 248, 246, 100, 134, 104, 152, 22, 212, 164, 92, 204, 93, 101, 182, 146,
108, 112, 72, 80, 253, 237, 185, 218, 94, 21, 70, 87, 167, 141, 157, 132,
144, 216, 171, 0, 140, 188, 211, 10, 247, 228, 88, 5, 184, 179, 69, 6,
208, 44, 30, 143, 202, 63, 15, 2, 193, 175, 189, 3, 1, 19, 138, 107,
58, 145, 17, 65, 79, 103, 220, 234, 151, 242, 207, 206, 240, 180, 230, 115,
150, 172, 116, 34, 231, 173, 53, 133, 226, 249, 55, 232, 28, 117, 223, 110,
71, 241, 26, 113, 29, 41, 197, 137, 111, 183, 98, 14, 170, 24, 190, 27,
252, 86, 62, 75, 198, 210, 121, 32, 154, 219, 192, 254, 120, 205, 90, 244,
31, 221, 168, 51, 136, 7, 199, 49, 177, 18, 16, 89, 39, 128, 236, 95,
96, 81, 127, 169, 25, 181, 74, 13, 45, 229, 122, 159, 147, 201, 156, 239,
160, 224, 59, 77, 174, 42, 245, 176, 200, 235, 187, 60, 131, 83, 153, 97,
23, 43, 4, 126, 186, 119, 214, 38, 225, 105, 20, 99, 85, 33, 12, 125,
};
word32 rcon[30] = {
0x01,0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, 0x6c, 0xd8, 0xab,
0x4d, 0x9a, 0x2f, 0x5e, 0xbc, 0x63, 0xc6, 0x97, 0x35, 0x6a, 0xd4, 0xb3, 0x7d,
0xfa, 0xef, 0xc5, 0x91, };
uint initial_key[]={0xd5d0d92a,0xd3a90372,0x9089018b,0x9fca4c3b,0x53198a16,0x561ce01f} ;
uint initial_data[]={0x12121212,0x22334455,0x00000000,0x00000000,
0x00000000,0x00000000,0x00000000,0x00000000} ;
uint last_data[]={0x00000000,0x00000000,0x00000000,0x00000000,
0x00000000,0x00000000,0x00000000,0x00000000} ;
int main()
{
int data_num=256 ;
int key_num=192 ;
xil_printf("\n**** Key length is : %d\n",key_num) ;
xil_printf("\n**** Data length is : %d\n",data_num) ;
xil_printf("This is Encryption") ;
main_aes(initial_key,initial_data,key_num,data_num,1,last_data) ;
xil_printf("\nThis is Decryption") ;
main_aes(initial_key,last_data,key_num,data_num,0,initial_data) ;
return 0 ;
}
//////////////////////////////////////////////////////////////////////////
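void main_aes(uint first_key[],uint datain[],int key_bits,int block_bits,int enc_dec,uint dataout[])
{
int i,j ;
uint temp_byte ;
uint temp_data[MAXBC] ; /* one 32-bit word per block column, up to 256 bits */
uint temp_key[MAXKC] ; /* sized for the largest (256-bit) key, not only 192 bits */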
word8 data[4][MAXBC]={
0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0
};
word8 initial_key[4][MAXKC]={
0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0
};
word8 keys[MAXROUNDS+1][4][MAXBC] ;
xil_printf("\nFirst_key is : \n") ;
print_result(first_key,key_bits/32) ;
xil_printf("\nDatain is : \n") ;
print_result(datain,block_bits/32) ;
for(i=0 ; i < (key_bits/32) ; i++)
temp_key[i]=first_key[i] ;
for(i=0 ; i < (key_bits/32) ; i++)
for (j=0 ; j < 4 ; j++)
{
temp_byte = temp_key[i] ;
temp_byte = temp_byte<< (j*8) ;
initial_key[j][i] = ((temp_byte& 0xff000000) >> 24 ) ;
}
for(i=0 ; i < (block_bits/32) ; i++)
temp_data[i]=datain[i] ;
for(i=0 ; i < (block_bits/32) ; i++)
for (j=0 ; j < 4 ; j++)
{
temp_byte = temp_data[i] ;
temp_byte = temp_byte<< (j*8);
data[j][i] = ((temp_byte& 0xff000000) >> 24 );
}
/* xil_printf("key\n") ;
for(i=0 ; i < 4 ; i++)
{
for ( j=0 ; j < (key_bits/32) ; j++)
xil_printf(" %x ",initial_key[i][j]) ;
xil_printf("\n");
}*/
xil_printf("Data is : \n") ;
for(i=0 ; i < 4 ; i++)
{
for ( j=0 ; j < (block_bits/32) ; j++)
xil_printf(" %x ",data[i][j]) ;
xil_printf("\n");
}
rijndaelKeySched ( initial_key , key_bits , block_bits , keys ) ;
if ( enc_dec == 1 )
rijndaelEncrypt ( data , key_bits , block_bits , keys ) ;
else
rijndaelDecrypt ( data , key_bits , block_bits , keys ) ;
xil_printf("Data after encry_decry is \n") ;
for(i=0 ; i < 4 ; i++)
{
for ( j=0 ; j < (block_bits/32) ; j++)
xil_printf(" %x ",data[i][j]) ;
xil_printf("\n");
}
for ( i=0 ; i< (block_bits/32) ; i++ )
{
temp_data[i] = 0 ;
for (j=0 ; j < 4 ; j++)
{
temp_byte = 0 ;
temp_byte = data[j][i] ;
temp_byte = temp_byte<< (24-j*8) ;
temp_data[i] = temp_data[i] | temp_byte ;
}
}
for(i=0 ; i < (block_bits/32) ; i++)
dataout[i]=temp_data[i] ;
xil_printf("\nDataout is : \n") ;
print_result(dataout,block_bits/32) ;
xil_printf("\n") ;
}
/************************************************************************/
word8 mul(word8 a, word8 b) {
if (a && b) return Alogtable[(Logtable[a] + Logtable[b])%255];
else return 0;
}
/************************************************************************/
void KeyAddition(word8 a[4][MAXBC], word8 rk[4][MAXBC], word8 BC) {
int i, j;
for(i = 0; i < 4; i++)
for(j = 0; j < BC; j++) a[i][j] ^= rk[i][j];
}
/************************************************************************/
void ShiftRow(word8 a[4][MAXBC], word8 d, word8 BC) {
word8 tmp[MAXBC];
int i, j;
for(i = 1; i < 4; i++) {
for(j = 0; j < BC; j++) tmp[j] = a[i][(j + shifts[SC][i][d]) % BC];
for(j = 0; j < BC; j++) a[i][j] = tmp[j];
}
}
/************************************************************************/
void Substitution(word8 a[4][MAXBC], word8 box[256], word8 BC) {
int i, j;
for(i = 0; i < 4; i++)
for(j = 0; j < BC; j++) a[i][j] = box[a[i][j]] ;
}
/************************************************************************/
void MixColumn(word8 a[4][MAXBC], word8 BC) {
word8 b[4][MAXBC];
int i, j;
for(j = 0; j < BC; j++)
for(i = 0; i < 4; i++)
b[i][j] = mul(2,a[i][j])
^ mul(3,a[(i + 1) % 4][j])
^ a[(i + 2) % 4][j]
^ a[(i + 3) % 4][j];
for(i = 0; i < 4; i++)
for(j = 0; j < BC; j++) a[i][j] = b[i][j];
}
/************************************************************************/
void InvMixColumn(word8 a[4][MAXBC], word8 BC) {
word8 b[4][MAXBC];
int i, j;
for(j = 0; j < BC; j++)
for(i = 0; i < 4; i++)
b[i][j] = mul(0xe,a[i][j])
^ mul(0xb,a[(i + 1) % 4][j])
^ mul(0xd,a[(i + 2) % 4][j])
^ mul(0x9,a[(i + 3) % 4][j]);
for(i = 0; i < 4; i++)
for(j = 0; j < BC; j++) a[i][j] = b[i][j];
}
/************************************************************************/
int rijndaelKeySched (word8 k[4][MAXKC], int keyBits, int blockBits,
word8 W[MAXROUNDS+1][4][MAXBC]) {
int KC, BC, ROUNDS;
int i, j, t, rconpointer = 0;
word8 tk[4][MAXKC];
switch (keyBits) {
case 128: KC = 4; break;
case 192: KC = 6; break;
case 256: KC = 8; break;
default : return (-1);
}
switch (blockBits) {
case 128: BC = 4; break;
case 192: BC = 6; break;
case 256: BC = 8; break;
default : return (-2);
}
switch (keyBits>= blockBits ? keyBits : blockBits) {
case 128: ROUNDS = 10; break;
case 192: ROUNDS = 12; break;
case 256: ROUNDS = 14; break;
default : return (-3);
}
for(j = 0; j < KC; j++)
int rijndaelEncrypt (word8 a[4][MAXBC], int keyBits, int blockBits,
word8 rk[MAXROUNDS+1][4][MAXBC])
{
int r, BC, ROUNDS;
switch (blockBits) {
case 128: BC = 4; break;
case 192: BC = 6; break;
case 256: BC = 8; break;
default : return (-2);
}
switch (keyBits>= blockBits ? keyBits : blockBits) {
case 128: ROUNDS = 10; break;
case 192: ROUNDS = 12; break;
case 256: ROUNDS = 14; break;
default : return (-3);
}
KeyAddition(a,rk[0],BC);
for(r = 1; r < ROUNDS; r++) {
Substitution(a,S,BC);
ShiftRow(a,0,BC);
MixColumn(a,BC);
KeyAddition(a,rk[r],BC);
}
Substitution(a,S,BC);
ShiftRow(a,0,BC);
KeyAddition(a,rk[ROUNDS],BC);
return 0;
}
int rijndaelDecrypt (word8 a[4][MAXBC], int keyBits, int blockBits,
word8 rk[MAXROUNDS+1][4][MAXBC])
{
int r, BC, ROUNDS;
switch (blockBits) {
}
/************************************************************************/