Advanced Encryption Standard AES

Advanced Encryption Standard
CHAPTER 1 INTRODUCTION
1.1 Introduction to cryptography
Serge Vaudenay, in his book "A classical introduction to cryptography", writes: Cryptography is the science of information and communication security. Cryptography is the science of secret codes, enabling the confidentiality of communication through an insecure channel. It protects against unauthorized parties by preventing unauthorized alteration of use. Generally speaking, it uses a cryptographic system to transform a plaintext into a ciphertext, most of the time using a key.
Note that some ciphers need no key at all. A simple example is a Caesar-type cipher that obscures text by replacing each letter with the letter thirteen places further along the alphabet. Since our alphabet has 26 characters, it is enough to encrypt the ciphertext again to retrieve the original message. Let me just mention briefly that there are also public-key ciphers, like the well-known Rivest-Shamir-Adleman scheme (commonly called RSA), which uses a public key to encrypt a message and a secret key to decrypt it.
Cryptography is a very important domain in computer science with many applications. The most famous example of cryptography is certainly the Enigma machine, the legendary cipher machine used by the German Third Reich to encrypt their messages, whose security breach contributed to the defeat of their submarine force.
Before continuing, please read carefully the legal issues involving cryptography, as in several countries even the domestic use of cryptography is prohibited:
Cryptography has long been of interest to intelligence gathering agencies and law enforcement agencies. Because of its facilitation of privacy, and the diminution of privacy attendant on its prohibition, cryptography is also of considerable interest to civil rights supporters.
Accordingly, there has been a history of controversial legal issues surrounding cryptography, especially since the advent of inexpensive computers has made possible widespread access to high quality cryptography. In some countries, even the domestic use of cryptography is, or has been, restricted. Until 1999, France significantly restricted the use of cryptography domestically. In China, a license is still required to use cryptography. Many countries
have tight restrictions on the use of cryptography. Among the more restrictive are laws in Belarus, Kazakhstan, Mongolia, Pakistan, Russia, Singapore, Tunisia, Venezuela, and Vietnam. In the United States, cryptography is legal for domestic use, but there has been much conflict over legal issues related to cryptography. One particularly important issue has been the export of cryptography and cryptographic software and hardware. Because of the importance of cryptanalysis in World War II and an expectation that cryptography would continue to be important for national security, many western governments have, at some point, strictly regulated export of cryptography. After World War II, it was illegal in the US to sell or distribute encryption technology overseas; in fact, encryption was classified as a munition, like tanks and nuclear weapons. Until the advent of the personal computer and the Internet, this was not especially problematic. Good cryptography is indistinguishable from bad cryptography for nearly all users, and in any case most of the cryptographic techniques generally available were slow and error prone whether good or bad. However, as the Internet grew and computers became more widely available, high quality encryption techniques became well known around the globe. As a result, export controls came to be seen as an impediment to commerce and to research.
The AES standard specifies the Rijndael algorithm, a symmetric block cipher that can process data blocks of 128 bits, using cipher keys with lengths of 128, 192, and 256 bits. Rijndael was designed to handle additional block sizes and key lengths; however, they are not adopted in the standard. Throughout the remainder of this document, the algorithm specified by the standard will be referred to as the AES algorithm. The algorithm may be used with the three different key lengths indicated above, and therefore these different flavors may be referred to as AES-128, AES-192, and AES-256.
The Advanced Encryption Standard, referred to below as AES, is the winner of a contest launched in 1997 by the US Government after the Data Encryption Standard was found too weak because of its small key size and the advances in processor power. Fifteen candidates were accepted in 1998 and, based on public comments, the pool was reduced to five finalists in 1999. In
October 2000, one of these five algorithms was selected as the forthcoming standard: a slightly modified version of Rijndael. Rijndael, whose name is based on the names of its two Belgian inventors, Joan Daemen and Vincent Rijmen, is a block cipher, which means that it works on fixed-length groups of bits called blocks. It takes an input block of a certain size (128 bits for AES) and produces a corresponding output block of the same size. The transformation requires a second input, which is the secret key. The permitted key size depends on the cipher used; AES uses three different key sizes: 128, 192 and 256 bits. To encrypt messages longer than the block size, a mode of operation is chosen, which I will explain at the very end of this tutorial, after the implementation of AES. While AES supports only a block size of 128 bits and key sizes of 128, 192 and 256 bits, the original Rijndael supports key and block sizes in any multiple of 32 bits, with a minimum of 128 and a maximum of 256 bits. Unlike DES, which is based on a Feistel network, AES is a substitution-permutation network: a series of mathematical operations that use substitutions (S-boxes) and permutations (P-boxes), defined carefully so that each output bit depends on every input bit.
1.2 Description of the Advanced Encryption Standard Algorithm
AES is an iterated block cipher with a fixed block size of 128 bits and a variable key length. The different transformations operate on the intermediate result, called the state. The state is a rectangular array of bytes and, since the block size is 128 bits (16 bytes), the array has dimensions 4x4. (In the Rijndael version with variable block size, the number of rows is fixed at four and the number of columns varies. The number of columns is the block size divided by 32 and is denoted Nb.)
The cipher key is similarly pictured as a rectangular array with four rows. The number of columns of the cipher key, denoted Nk, is equal to the key length divided by 32.
A state:
-----------------------------
| a0,0 | a0,1 | a0,2 | a0,3 |
| a1,0 | a1,1 | a1,2 | a1,3 |
| a2,0 | a2,1 | a2,2 | a2,3 |
| a3,0 | a3,1 | a3,2 | a3,3 |
-----------------------------
A key (shown for Nk = 4):
-----------------------------
| k0,0 | k0,1 | k0,2 | k0,3 |
| k1,0 | k1,1 | k1,2 | k1,3 |
| k2,0 | k2,1 | k2,2 | k2,3 |
| k3,0 | k3,1 | k3,2 | k3,3 |
-----------------------------
It is very important to know that the cipher input bytes are mapped onto the state bytes in the order a0,0, a1,0, a2,0, a3,0, a0,1, a1,1, a2,1, a3,1 ... and the bytes of the cipher key are mapped onto the array in the order k0,0, k1,0, k2,0, k3,0, k0,1, k1,1, k2,1, k3,1 ... In other words, both arrays are filled column by column. At the end of the cipher operation, the cipher output is extracted from the state by taking the state bytes in the same order.
The number of rounds in AES is determined by the key size:
A key of size 128 has 10 rounds.
A key of size 192 has 12 rounds.
A key of size 256 has 14 rounds.
During each round, the following operations are applied on the state:
1. SubBytes: every byte in the state is replaced by another one, using the Rijndael S-Box.
2. ShiftRows: every row in the 4x4 array is shifted a certain amount to the left.
3. MixColumns: a linear transformation on the columns of the state.
4. AddRoundKey: each byte of the state is combined with a round key, which is a different key for each round and derived from the Rijndael key schedule.
In the final round, the MixColumns operation is omitted. The algorithm looks like the following (pseudo-C):
AES (state, CipherKey)
{
    KeyExpansion (CipherKey, ExpandedKey);
    AddRoundKey (state, ExpandedKey);
    for (i = 1; i < Nr; i++)
    {
        Round (state, ExpandedKey + Nb*i);
    }
    FinalRound (state, ExpandedKey + Nb*Nr);
}
Observations:
- The cipher key is expanded into a larger key, which is later used for the actual operations.
- A round key is added to the state before the loop starts.
- FinalRound () is the same as Round (), apart from missing the MixColumns () operation.
- During each round, another part of the ExpandedKey is used for the operations.
- The ExpandedKey shall ALWAYS be derived from the Cipher Key and never be specified directly.
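The column-by-column byte ordering described in this section is easy to get wrong, so a small sketch may help. The function names `bytes_to_state` and `state_to_bytes` are illustrative only (they do not come from the standard), and the state is assumed to be stored as a list of four rows.

```python
def bytes_to_state(block):
    """Map 16 input bytes onto a 4x4 state, column by column.

    state[r][c] holds input byte number r + 4*c, i.e. the order
    a0,0, a1,0, a2,0, a3,0, a0,1, ... described in the text.
    """
    if len(block) != 16:
        raise ValueError("an AES block is exactly 16 bytes")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state):
    """Read the state back out in the same column-by-column order."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


if __name__ == "__main__":
    block = bytes(range(16))
    for row in bytes_to_state(block):
        print(row)
```

Reading the state row by row instead of column by column is a common source of implementations that run without errors but fail the published test vectors.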
CHAPTER 2 VLSI DESIGN FLOW

2.1 INTRODUCTION
The word digital has made a dramatic impact on our society. More significant is the continuing trend towards digital solutions in all areas, from electronic instrumentation, control, data manipulation, signal processing and telecommunications to consumer electronics. Development of such solutions has been possible due to good digital system design and modeling techniques.
2.2 CONVENTIONAL APPROACH TO DIGITAL DESIGN
Digital ICs of SSI and MSI types have become universally standardized and have been accepted for use. Whenever a designer has to realize a digital function, he uses a standard set of ICs along with a minimal set of additional discrete circuitry. Consider a simple example of realizing the function
$$Q_{n+1} = Q_n + (A \cdot B)$$
Here $Q_n$, A, and B are Boolean variables, with $Q_n$ being the value of Q at the nth time step. $A \cdot B$ signifies the logical AND of A and B; the + symbol signifies the logical OR of the logic variables on either side. A circuit to realize the function is shown in Figure 2.1. The circuit can be realized in terms of two ICs: an A-O-I gate and a flip-flop. It can be directly wired up, tested, and used.
Figure 2.1. A simple digital circuit
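The behavior of the example function can be checked in software before any hardware is wired up. The sketch below is only an illustration: the names `next_q` and `simulate` are hypothetical, and each tuple in the input list is assumed to hold the values of A and B at one clock step.

```python
def next_q(q, a, b):
    """Next-state function of the example: Q(n+1) = Q(n) OR (A AND B)."""
    return q | (a & b)


def simulate(inputs, q0=0):
    """Clock the flip-flop once per (A, B) pair and record Q after each step."""
    q = q0
    trace = []
    for a, b in inputs:
        q = next_q(q, a, b)
        trace.append(q)
    return trace


if __name__ == "__main__":
    # Q stays 0 until A and B are both 1, then latches at 1.
    print(simulate([(0, 0), (1, 0), (1, 1), (0, 0)]))  # prints [0, 0, 1, 1]
```

Because Q is fed back through the OR gate, the output latches at 1 once A and B have been high together, whatever the later inputs are.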
With comparatively larger circuits, the task mostly reduces to one of identifying the set of ICs necessary for the job and interconnecting them; rarely does one have to resort to a micro-level design [Wakerly]. The accepted approach to digital design here is a mix of the top-down and bottom-up approaches, as follows [Hill & Peterson]:
- Decide the requirements at the system level and translate them to circuit requirements.
- Identify the major functional blocks required, like timer, DMA unit, register file, etc. (say, as in the design of a processor).
- Whenever a function can be realized using a standard IC, use it (for example, a programmable counter, mux, demux, etc.).
- Whenever the above is not possible, form the circuit to carry out the block functions using standard SSI (for example, gates, flip-flops, etc.).
- Use additional components like transistors, diodes, resistors, capacitors, etc., wherever essential.
Once the above steps are gone through, a paper design is ready. Starting with the paper design, one has to do a circuit layout: the physical location of all the components is tentatively decided, they are interconnected, and the circuit on paper is made ready. Once the layout is carried out, a net-list is prepared. Based on this, the PCB is fabricated and populated, and all the populated cards are tested and debugged. The procedure is shown as a process flowchart in Figure 2.2.
Figure 2.2. Sequence of steps in conventional electronic circuit design
At the debugging stage one may encounter three types of problems:
- Functional mismatch: The realized and expected functions are different. One may have to go through the relevant functional block carefully and locate any error logically. Finally, the necessary correction has to be carried out in hardware.
- Timing mismatch: The problem can manifest in different forms. One possibility is that a signal goes through different propagation delays in two paths and arrives at a point with a timing mismatch. This can cause faulty operation. Another possibility is a race condition in a circuit involving asynchronous feedback. This kind of problem may call for elaborate debugging. The preferred practice is to debug at smaller module stages and to ensure that feedback through larger loops is avoided; it becomes essential to check for the existence of long asynchronous loops.
- Overload: Some signals may be overloaded to such an extent that the signal transition may be unduly delayed or even suppressed. The problem manifests as reflections and erratic behavior in some cases (the signal has to be suitably buffered here). In fact, overload on a signal can lead to timing mismatches.
The above have to be carried out after completion of the prototype PCB manufacturing; it involves cost, time, and also a redesigning process to develop a bug-free design.
2.3 VLSI DESIGN
The complexity of VLSIs being designed and used today makes the manual approach to design impractical. Design automation is the order of the day. With the rapid technological developments in the last two decades, the status of VLSI technology is characterized by the following [Wai-kai, Gopalan]:
- A steady increase in the size and hence the functionality of the ICs.
- A steady reduction in feature size and hence increase in the speed of operation as well as gate or transistor density.
- A steady improvement in the predictability of circuit behavior.
- A steady increase in the variety and size of software tools for VLSI design.
The above developments have resulted in a proliferation of approaches to VLSI design. We briefly describe the procedure of automated design flow [Rabaey, Smith MJ]. The aim is more to bring out the role of a Hardware Description Language (HDL) in the design process. An abstraction-based model is the basis of the automated design.
2.3.1 Abstraction Model
The model divides the whole design cycle into various domains (see Figure 2.3). With such an abstraction through a division process, the design is carried out in different layers. The designer at one layer can function without bothering about the layers above or below. The thick horizontal lines separating the layers in the figure signify the compartmentalization. As an example, let us consider design at the gate level. The circuit to be designed would be described in terms of truth tables and state tables. With these as available inputs, the designer has to express them as Boolean logic equations and realize them in terms of gates and flip-flops. In turn, these form the inputs to the layer immediately below.
Compartmentalization of the approach to design in the manner described here is the essence of abstraction; it is the basis for development and use of CAD tools in VLSI design at various levels.
The design methods at different levels use the respective aids such as Boolean equations, truth tables, state transition tables, etc. But the aids play only a small role in the process. To complete a design, one may have to switch from one tool to another, raising the issues of tool compatibility and learning new environments.
2.4 ASIC DESIGN FLOW
As with any other technical activity, development of an ASIC starts with an idea and takes tangible shape through the stages of development shown in Figure 2.4 and, in more detail, in Figure 2.5. The first step in the process is to expand the idea in terms of the behavior of the target circuit. Through stages of programming, this is fully developed into a design description in terms of well-defined standard constructs and conventions.
Figure 2.3 Design domain and levels of abstraction
Figure 2.4 Major activities in ASIC design
The design is tested through a simulation process to check, verify, and ensure that what is wanted is what is described. Simulation is carried out through dedicated tools. With every simulation run, the simulation results are studied to identify errors in the design description. The errors are corrected and another simulation run is carried out. Simulation and changes to the design description together form a cyclic iterative process, repeated until an error-free design is evolved. Design description is an activity independent of the target technology or manufacturer. It results in a description of the digital circuit. To translate it into a tangible circuit, one goes through the physical design process, which constitutes a set of activities closely linked to the manufacturer and the target technology.
2.4.1 Design Description
The design is carried out in stages. The process of transforming the idea into a detailed circuit description in terms of the elementary circuit components constitutes design description. The final circuit of such an IC can have up to a billion such components; it is arrived at in a step-by-step manner. The first step in evolving the design description is to describe the circuit in terms of its behavior. The description looks like a program in a high level language like C. Once the behavioral level design description is ready, it is tested extensively with the help of a simulation
tool; it checks and confirms that all the expected functions are carried out satisfactorily. If necessary, this behavioral level routine is edited, modified, and rerun, all done manually. Finally, one has a design for the expected system described at the behavioral level. The behavioral design forms the input to the synthesis tools, for circuit synthesis. The behavioral constructs not supported by the synthesis tools are replaced by data flow and gate level constructs. To summarize, the designer has to develop synthesizable code for his design.
Figure 2.5 ASIC design and development flow
The design at the behavioral level is to be elaborated in terms of known and acknowledged functional blocks. It forms the next detailed level of design description. Once again the design is to be tested through simulation and iteratively corrected for errors. The elaboration can be continued one or two steps further. It leads to a detailed design description in terms of logic gates and transistor switches.
2.4.2 Optimization
The circuit at the gate level, in terms of the gates and flip-flops, can be redundant in nature. It can be minimized with the help of minimization tools. The step is not shown separately in the figure. The minimized logical design is converted to a circuit in terms of the switch level cells from standard libraries provided by the foundries. The cell-based design generated by the tool is the last step in the logical design process; it forms the input to the first level of physical design [Micheli].
2.4.3 Simulation
The design descriptions are tested for their functionality at every level: behavioral, data flow, and gate. One has to check here whether all the functions are carried out as expected and rectify any that are not. All such activities are carried out by the simulation tool. The tool also has an editor to carry out any corrections to the source code. Simulation involves testing the design for all its functions, functional sequences, timing constraints, and specifications. Normally testing and simulation at all the levels, behavioral to switch level, are carried out by a single tool; this is identified as the scope of the simulation tool in Figure 2.5.
2.4.4 Synthesis
With the availability of design at the gate (switch) level, the logical design is complete. The corresponding circuit hardware realization is carried out by a synthesis tool. Two common approaches are as follows. In the first, the circuit is realized through an FPGA [Oldfield]. The gate level design description is the starting point for the synthesis here.
The FPGA vendors provide an interface to the synthesis tool. Through the interface the gate level design is realized as a final circuit. With many synthesis tools, one can directly use the design description at the data flow level itself to realize the final circuit through an FPGA.
The FPGA route is attractive for limited volume production or a fast development cycle. In the second approach, the circuit is realized as an ASIC. A typical ASIC vendor will have his own library of basic components like elementary gates and flip-flops. Eventually the circuit is to be realized by selecting such components and interconnecting them in conformance with the required design. This constitutes the physical design. Being an elaborate and costly process, a physical design may call for an intermediate functional verification through the FPGA route. The circuit realized through the FPGA is tested as a prototype. It provides another opportunity for testing the design closer to the final circuit.
2.4.5 Physical Design
A fully tested and error-free design at the switch level can be the starting point for a physical design [Baker & Boyce, Wolf]. It is to be realized as the final circuit using (typically) a million components in the foundry's library. The step-by-step activities in the process are described briefly as follows:
- System partitioning: The design is partitioned into convenient compartments or functional blocks. Often this would have been done at an earlier stage itself and the software design prepared in terms of such blocks. Interconnection of the blocks is part of the partition process.
- Floor planning: The positions of the partitioned blocks are planned and the blocks are arranged accordingly. The procedure is analogous to the planning and arrangement of domestic furniture in a residence. Blocks with I/O pins are kept close to the periphery; those which interact frequently or through a large number of interconnections are kept close together, and so on. Partitioning and floor planning may have to be carried out and refined iteratively to yield the best results.
- Placement: The selected components from the ASIC library are placed in position on the silicon floor. This is done with each of the blocks above.
- Routing: The components placed as described above are to be interconnected to the rest of the block. This is done with each of the blocks by suitably routing the interconnects. Once the routing is complete, the physical design can be taken as complete. The final mask for the design can be made at this stage and the ASIC manufactured in the foundry.
2.4.6 Post Layout Simulation
Once the placement and routing are completed, performance specifications like silicon area, power consumed, path delays, etc., can be computed. An equivalent circuit can be extracted at the component level and performance analysis carried out. This constitutes the final stage, called verification. One may have to go through the placement and routing activity once again to improve performance.
2.4.7 Critical Subsystems
The design may have critical subsystems. Their performance may be crucial to the overall performance; in other words, to improve the system performance substantially, one may have to design such subsystems afresh. The design here may imply redefinition of the basic feature size of the component, component design, placement of components, or routing done separately and specifically for the subsystem. A set of masks used in the foundry may have to be made afresh for the purpose.
2.4.8 Role of HDL
An HDL provides the framework for the complete logical design of the ASIC. All the logical design activities described above come under the purview of an HDL. Verilog and VHDL are the two most commonly used HDLs today. Both have constructs with which the design can be fully described at all the levels. There are additional constructs available to facilitate setting up the test bench, spelling out test vectors for it, and observing the outputs from the designed unit. IEEE has brought out Standards for the HDLs, and the software tools conform to them. Verilog as an HDL was introduced by Cadence Design Systems; they placed it into the public domain in 1990.
It was established as a formal IEEE Standard in 1995. The revised version was brought out in 2001. However, most of the simulation tools available today conform only to the 1995 version of the standard. VHDL, used by a substantial number of VLSI designers today, is the HDL used in this project for modeling the design.
CHAPTER 3 ALGORITHM SPECIFICATION

For the AES algorithm, the length of the input block, the output block and the State is 128 bits. This is represented by Nb = 4, which reflects the number of 32-bit words (number of columns) in the State. For the AES algorithm, the length of the Cipher Key, K, is 128, 192, or 256 bits. The key length is represented by Nk = 4, 6, or 8, which reflects the number of 32-bit words (number of columns) in the Cipher Key. The number of rounds to be performed during the execution of the algorithm depends on the key size. The number of rounds is represented by Nr, where Nr = 10 when Nk = 4, Nr = 12 when Nk = 6, and Nr = 14 when Nk = 8. The only Key-Block-Round combinations that conform to this standard are given in Table 3.0. For implementation issues relating to the key length, block size and number of rounds.
Table 3.0. Key-Block-Round Combinations
            Key length (Nk words)   Block size (Nb words)   Number of rounds (Nr)
AES-128              4                       4                       10
AES-192              6                       4                       12
AES-256              8                       4                       14
For both its Cipher and Inverse Cipher, the AES algorithm uses a round function that is composed of four different byte-oriented transformations:
1) Byte substitution using a substitution table (S-box)
2) Shifting rows of the State array by different offsets
3) Mixing the data within each column of the State array
4) Adding a Round Key to the State.
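The Key-Block-Round combinations of Table 3.0 can be captured in a small lookup, which also gives the key schedule length Nb (Nr + 1) used in the Key Expansion section. This is a minimal sketch; the names `AES_VARIANTS` and `aes_parameters` are illustrative and not part of the standard.

```python
# (Nk, Nb, Nr) for each permitted key length in bits, as in Table 3.0.
AES_VARIANTS = {
    128: (4, 4, 10),
    192: (6, 4, 12),
    256: (8, 4, 14),
}


def aes_parameters(key_bits):
    """Return Nk, Nb, Nr and the number of 32-bit key schedule words."""
    if key_bits not in AES_VARIANTS:
        raise ValueError("AES keys must be 128, 192 or 256 bits long")
    nk, nb, nr = AES_VARIANTS[key_bits]
    return {"Nk": nk, "Nb": nb, "Nr": nr, "schedule_words": nb * (nr + 1)}


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(bits, aes_parameters(bits))
```

Rejecting any other key length up front mirrors the statement that only these three combinations conform to the standard.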
3.1 CIPHER
At the start of the Cipher, the input is copied to the State array using the byte-ordering conventions described in Section 1.2. After an initial Round Key addition, the State array is transformed by implementing a round function 10, 12, or 14 times (depending on the key length), with the final round differing slightly from the first Nr - 1 rounds. The final State is then copied to the output. The round function is parameterized using a key schedule that consists of a one-dimensional array of four-byte words derived using the Key Expansion routine. The Cipher is described in the pseudo code in (A). The individual transformations SubBytes (), ShiftRows (), MixColumns (), and AddRoundKey () process the State and are described in the following subsections. In (A), the array w [ ] contains the key schedule. As shown in (A), all Nr rounds are identical with the exception of the final round, which does not include the MixColumns () transformation.
Cipher (byte in [4*Nb], byte out [4*Nb], word w [Nb*(Nr+1)])
begin
    byte state [4, Nb]
    state = in
    AddRoundKey (state, w [0, Nb-1])
    for round = 1 step 1 to Nr-1
        SubBytes (state)
        ShiftRows (state)
        MixColumns (state)
        AddRoundKey (state, w [round*Nb, (round+1)*Nb-1])
    end for
    SubBytes (state)
    ShiftRows (state)
    AddRoundKey (state, w [Nr*Nb, (Nr+1)*Nb-1])
    out = state
end
(A). Pseudo Code for the Cipher
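3.1.1 SubBytes () Transformation
The SubBytes () transformation is a non-linear byte substitution that operates independently on each byte of the State using a substitution table (S-box). This S-box (Table 3.1.1), which is invertible, is constructed by composing two transformations:
1. Take the multiplicative inverse in the finite field GF (2^8); the element {00} is mapped to itself.
2. Apply the following affine transformation (over GF (2)):
$$b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i$$
for 0 ≤ i < 8, where $b_i$ is the ith bit of the byte, and $c_i$ is the ith bit of a byte c with the value {63} or {01100011}. Here and elsewhere, a prime on a variable (e.g., b') indicates that the variable is to be updated with the value on the right. In matrix form, the affine transformation element of the S-box can be expressed as:
$$\begin{bmatrix} b'_0 \\ b'_1 \\ b'_2 \\ b'_3 \\ b'_4 \\ b'_5 \\ b'_6 \\ b'_7 \end{bmatrix} = \begin{bmatrix} 1&0&0&0&1&1&1&1 \\ 1&1&0&0&0&1&1&1 \\ 1&1&1&0&0&0&1&1 \\ 1&1&1&1&0&0&0&1 \\ 1&1&1&1&1&0&0&0 \\ 0&1&1&1&1&1&0&0 \\ 0&0&1&1&1&1&1&0 \\ 0&0&0&1&1&1&1&1 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \\ b_7 \end{bmatrix} \oplus \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$
Figure 3.1 illustrates the effect of the SubBytes () transformation on the State.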
Figure 3.1. SubBytes () applies the S-box to each byte of the State.
The S-box used in the SubBytes () transformation is presented in hexadecimal form in Table 3.1.1. For example, if s1,1 = {53}, then the substitution value would be determined by the intersection of the row with index 5 and the column with index 3 in Table 3.1.1. This would result in s'1,1 having a value of {ed}.
Table 3.1.1. S-box: substitution values for the byte xy (in hexadecimal format).
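The two-step construction of the S-box (multiplicative inverse followed by the affine transformation) can be reproduced in a few lines, which is a useful cross-check against Table 3.1.1. The sketch below assumes the AES reduction polynomial x^8 + x^4 + x^3 + x + 1, which this section does not state explicitly; the function names are illustrative only.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # low byte of the reduction polynomial
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse in GF(2^8); {00} is mapped to itself."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because a^255 = 1
        result = gf_mul(result, a)
    return result


def affine(b):
    """Bitwise affine transformation with the constant c = {63}."""
    c = 0x63
    out = 0
    for i in range(8):
        bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
               ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (c >> i)) & 1
        out |= bit << i
    return out


SBOX = [affine(gf_inverse(x)) for x in range(256)]

if __name__ == "__main__":
    print(format(SBOX[0x53], "02x"))  # prints ed, matching the example in the text
```

Real implementations normally store the 256-entry table rather than recomputing it; the computation is shown here only to make the construction concrete.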
3.1.2 ShiftRows () Transformation
In the ShiftRows () transformation, the bytes in the last three rows of the State are cyclically shifted over different numbers of bytes (offsets). The first row, r = 0, is not shifted. Specifically, the ShiftRows () transformation proceeds as follows:
$$s'_{r,c} = s_{r,\,(c + \mathrm{shift}(r, Nb)) \bmod Nb} \quad \text{for } 0 < r < 4 \text{ and } 0 \le c < Nb,$$
where the shift value shift(r, Nb) depends on the row number, r, as follows (recall that Nb = 4):
$$\mathrm{shift}(1, 4) = 1; \quad \mathrm{shift}(2, 4) = 2; \quad \mathrm{shift}(3, 4) = 3.$$
This has the effect of moving bytes to lower positions in the row (i.e., lower values of c in a given row), while the lowest bytes wrap around into the top of the row (i.e., higher values of c in a given row). Figure 3.2 illustrates the ShiftRows () transformation.
Figure 3.2. Shift Rows () cyclically shifts the last three rows in the state.
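The cyclic shift is compact enough to express directly. In the sketch below the State is assumed to be a list of four rows, and `shift_rows` is an illustrative name; for Nb = 4 the offset of row r is simply r.

```python
def shift_rows(state):
    """Cyclically shift row r of the State left by r positions (Nb = 4)."""
    nb = len(state[0])
    return [[row[(c + r) % nb] for c in range(nb)]
            for r, row in enumerate(state)]


if __name__ == "__main__":
    state = [[0, 1, 2, 3],
             [4, 5, 6, 7],
             [8, 9, 10, 11],
             [12, 13, 14, 15]]
    for row in shift_rows(state):
        print(row)
```

A new State is returned rather than modifying the argument, which keeps the function easy to test in isolation.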
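3.1.3 MixColumns () Transformation
The MixColumns () transformation operates on the State column-by-column, treating each column as a four-term polynomial. The columns are considered as polynomials over GF (2^8) and multiplied modulo x^4 + 1 with a fixed polynomial a(x), given by
$$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}.$$
This can be written as a matrix multiplication. Let $s'(x) = a(x) \otimes s(x)$:
$$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix} \quad \text{for } 0 \le c < Nb.$$
As a result of this multiplication, the four bytes in a column are replaced by the following:
$$s'_{0,c} = (\{02\} \bullet s_{0,c}) \oplus (\{03\} \bullet s_{1,c}) \oplus s_{2,c} \oplus s_{3,c}$$
$$s'_{1,c} = s_{0,c} \oplus (\{02\} \bullet s_{1,c}) \oplus (\{03\} \bullet s_{2,c}) \oplus s_{3,c}$$
$$s'_{2,c} = s_{0,c} \oplus s_{1,c} \oplus (\{02\} \bullet s_{2,c}) \oplus (\{03\} \bullet s_{3,c})$$
$$s'_{3,c} = (\{03\} \bullet s_{0,c}) \oplus s_{1,c} \oplus s_{2,c} \oplus (\{02\} \bullet s_{3,c})$$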
Figure 3.3 illustrates the MixColumns () transformation.
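Since the matrix contains only the constants {01}, {02} and {03}, the column multiplication reduces to one doubling operation and XORs. The following is a minimal sketch under the assumption that the field is reduced by x^8 + x^4 + x^3 + x + 1; `xtime`, `mix_column` and `mix_columns` are illustrative names, and the State is assumed to be a list of four rows.

```python
def xtime(b):
    """Multiply a byte by {02} in GF(2^8)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # clear bit 8 and reduce by the field polynomial
    return b


def mul3(b):
    """Multiply a byte by {03}, i.e. ({02} * b) XOR b."""
    return xtime(b) ^ b


def mix_column(col):
    """Apply the fixed MixColumns matrix to one column [s0, s1, s2, s3]."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


def mix_columns(state):
    """Apply mix_column to every column of a State stored as four rows."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


if __name__ == "__main__":
    mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
    print([format(b, "02x") for b in mixed])  # prints ['8e', '4d', 'a1', 'bc']
```

In hardware the same structure appears as a handful of XOR gates per output bit, which is one reason these particular constants are cheap to implement.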
Figure 3.3. MixColumns () operates on the State column-by-column.
3.1.4 AddRoundKey () Transformation
In the AddRoundKey () transformation, a Round Key is added to the State by a simple bitwise XOR operation. Each Round Key consists of Nb words from the key schedule. Those Nb words are each added into the columns of the State, such that
$$[s'_{0,c}, s'_{1,c}, s'_{2,c}, s'_{3,c}] = [s_{0,c}, s_{1,c}, s_{2,c}, s_{3,c}] \oplus [w_{round \cdot Nb + c}] \quad \text{for } 0 \le c < Nb,$$
where [wi] are the key schedule words and round is a value in the range 0 ≤ round ≤ Nr. In the Cipher, the initial Round Key addition occurs when round = 0, prior to the first application of the round function (see (A)). The application of the AddRoundKey () transformation to the Nr rounds of the Cipher occurs when 1 ≤ round ≤ Nr. The action of this transformation is illustrated in Figure 3.4, where l = round * Nb. The byte address within words of the key schedule.
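As a sketch of the XOR described above, the function below combines column c of the State with key schedule word number round * Nb + c. It assumes the State is a list of four rows and that each schedule word is a list of four bytes; the name `add_round_key` and this data layout are illustrative choices, not prescribed by the standard.

```python
def add_round_key(state, w, round_index, nb=4):
    """XOR column c of the State with key schedule word w[round_index*nb + c].

    state[r][c] is the byte in row r, column c; w[i][r] is byte r of word i.
    """
    return [[state[r][c] ^ w[round_index * nb + c][r] for c in range(nb)]
            for r in range(4)]


if __name__ == "__main__":
    state = [[0x00] * 4 for _ in range(4)]
    # Hypothetical schedule words, used only to make the indexing visible.
    w = [[i, i + 16, i + 32, i + 48] for i in range(8)]
    for row in add_round_key(state, w, 1):
        print(row)
```

Because XOR is its own inverse, applying the same Round Key twice restores the original State, which is why the Inverse Cipher can reuse AddRoundKey () unchanged.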
Figure 3.4. AddRoundKey () XORs each column of the State with a word from the key schedule.
3.2 KEY EXPANSION
The AES algorithm takes the Cipher Key, K, and performs a Key Expansion routine to generate a key schedule. The Key Expansion generates a total of Nb (Nr + 1) words: the algorithm requires an initial set of Nb words, and each of the Nr rounds requires Nb words of key data. The resulting key schedule consists of a linear array of 4-byte words, denoted [wi], with i in the range 0 ≤ i < Nb (Nr + 1). The expansion of the input key into the key schedule proceeds according to the pseudo code in (B). SubWord () is a function that takes a four-byte input word and applies the S-box to each of the four bytes to produce an output word. The function RotWord () takes a word [a0, a1, a2, a3] as input, performs a cyclic permutation, and returns the word [a1, a2, a3, a0]. The round constant word array, Rcon[i], contains the values given by [x^(i-1), {00}, {00}, {00}], with x^(i-1) being powers of x (x is denoted as {02}) in the field GF (2^8). From (B), it can be seen that the first Nk words of the expanded key are filled with the Cipher Key. Every following word, w[i], is equal to the XOR of the previous word, w[i-1], and the word Nk positions earlier, w[i-Nk]. For words in positions that are a multiple of Nk, a transformation is applied to w[i-1] prior to the XOR, followed by an XOR with a round constant, Rcon[i]. This transformation consists of a cyclic shift of the bytes in a word (RotWord ()), followed by the application of a table lookup to all four bytes of the word (SubWord ()).
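The expansion just described can be sketched as follows. This is an illustrative sketch, not a reference implementation: the S-box is regenerated in place so that the block is self-contained, assuming the AES field polynomial x^8 + x^4 + x^3 + x + 1, and the affine step is written with byte rotations, which is equivalent to the bitwise form. All function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def build_sbox():
    """S-box: multiplicative inverse, then affine transform with c = {63}."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x
                inv = gf_mul(inv, x)
        s = inv
        for k in range(1, 5):  # XOR with the byte rotated left by 1..4 bits
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def key_expansion(key):
    """Expand a 16-, 24- or 32-byte key into Nb*(Nr+1) four-byte words."""
    nk = len(key) // 4
    nr = {4: 10, 6: 12, 8: 14}[nk]
    nb = 4
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01  # x^(i-1), starting at {01}
    for i in range(nk, nb * (nr + 1)):
        temp = list(w[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]        # RotWord
            temp = [SBOX[b] for b in temp]    # SubWord
            temp[0] ^= rcon                   # XOR with Rcon[i/Nk]
            rcon = gf_mul(rcon, 0x02)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]    # extra SubWord for 256-bit keys
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return w


if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    schedule = key_expansion(key)
    print(len(schedule), bytes(schedule[4]).hex())  # prints 44 a0fafe17
```

The round constant is tracked as a running power of {02} instead of a precomputed Rcon table; both approaches give the same words.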
It is important to note that the Key Expansion routine for 256-bit Cipher Keys (Nk = 8) is slightly different than for 128- and 192-bit Cipher Keys. If Nk = 8 and i-4 is a multiple of Nk, then SubWord () is applied to w[i-1] prior to the XOR.

KeyExpansion (byte key [4*Nk], word w [Nb*(Nr+1)], Nk)
begin
   word temp
   i = 0
   while (i < Nk)
      w[i] = word (key[4*i], key[4*i+1], key[4*i+2], key[4*i+3])
      i = i+1
   end while
   i = Nk
   while (i < Nb * (Nr+1))
      temp = w[i-1]
      if (i mod Nk = 0)
         temp = SubWord (RotWord (temp)) xor Rcon[i/Nk]
      else if (Nk > 6 and i mod Nk = 4)
         temp = SubWord (temp)
      end if
      w[i] = w[i-Nk] xor temp
      i = i+1
   end while
end

Note that Nk = 4, 6, and 8 do not all have to be implemented; they are all included in the conditional statement above for conciseness.

(B). Pseudo Code for Key Expansion

3.3 INVERSE CIPHER

The Cipher transformations can be inverted and then implemented in reverse order to produce a straightforward Inverse Cipher for the AES algorithm. The individual transformations used in the Inverse Cipher, InvShiftRows (), InvSubBytes (), InvMixColumns (), and AddRoundKey (), process the State and are described in the following subsections.
The Inverse Cipher is described in the pseudo code (C). In (C), the array w [ ] contains the key schedule.

InvCipher (byte in [4*Nb], byte out [4*Nb], word w [Nb*(Nr+1)])
begin
   byte state [4, Nb]
   state = in
   AddRoundKey (state, w [Nr*Nb, (Nr+1)*Nb-1])
   for round = Nr-1 step -1 downto 1
      InvShiftRows (state)
      InvSubBytes (state)
      AddRoundKey (state, w [round*Nb, (round+1)*Nb-1])
      InvMixColumns (state)
   end for
   InvShiftRows (state)
   InvSubBytes (state)
   AddRoundKey (state, w [0, Nb-1])
   out = state
end

(C) Pseudo Code for the Inverse Cipher.

3.3.1 InvShiftRows () Transformation

InvShiftRows () is the inverse of the ShiftRows () transformation. The bytes in the last three rows of the State are cyclically shifted over different numbers of bytes (offsets). The first row, r = 0, is not shifted. The bottom three rows are cyclically shifted by Nb - shift(r, Nb) bytes, where the shift value shift(r, Nb) depends on the row number: shift(1, 4) = 1, shift(2, 4) = 2 and shift(3, 4) = 3. Specifically, the InvShiftRows () transformation proceeds as follows:

   s'[r, (c + shift(r, Nb)) mod Nb] = s[r, c]   for 0 < r < 4 and 0 ≤ c < Nb

The figure below illustrates the InvShiftRows () transformation.
Figure 3.5. InvShiftRows () cyclically shifts the last three rows in the State.

3.3.2 InvSubBytes () Transformation

InvSubBytes () is the inverse of the byte substitution transformation, in which the inverse S-box is applied to each byte of the State. The inverse S-box is obtained by applying the inverse of the affine transformation followed by taking the multiplicative inverse in GF (2^8). The inverse S-box used in the InvSubBytes () transformation is presented in Table 3.3.2:
Table 3.3.2 Inverse S-box: substitution values for the byte xy (In hexadecimal format).
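3.3.3 InvMixColumns () transformation

InvMixColumns () is the inverse of the MixColumns () transformation. InvMixColumns () operates on the State column-by-column, treating each column as a four-term polynomial. The columns are considered as polynomials over GF (2^8) and multiplied modulo x^4 + 1 with a fixed polynomial a^-1(x), given by

   a^-1(x) = {0b}x^3 + {0d}x^2 + {09}x + {0e}

This can be written as a matrix multiplication. Let s'(x) = a^-1(x) ⊗ s(x):

   [s'0,c]   [0e 0b 0d 09] [s0,c]
   [s'1,c] = [09 0e 0b 0d] [s1,c]      for 0 ≤ c < Nb
   [s'2,c]   [0d 09 0e 0b] [s2,c]
   [s'3,c]   [0b 0d 09 0e] [s3,c]

As a result of this multiplication, the four bytes in a column are replaced by the following, where • denotes multiplication in GF (2^8):

   s'0,c = ({0e} • s0,c) XOR ({0b} • s1,c) XOR ({0d} • s2,c) XOR ({09} • s3,c)
   s'1,c = ({09} • s0,c) XOR ({0e} • s1,c) XOR ({0b} • s2,c) XOR ({0d} • s3,c)
   s'2,c = ({0d} • s0,c) XOR ({09} • s1,c) XOR ({0e} • s2,c) XOR ({0b} • s3,c)
   s'3,c = ({0b} • s0,c) XOR ({0d} • s1,c) XOR ({09} • s2,c) XOR ({0e} • s3,c)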
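The column arithmetic of InvMixColumns () can be tried out with a short C sketch. It is an illustration only: the names gf_mul and inv_mix_column are made up here, field multiplication is done by shift-and-XOR with reduction by x^8 + x^4 + x^3 + x + 1, and the coefficients {0e}, {0b}, {0d}, {09} are those of the fixed polynomial a^-1(x).

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1)
            p ^= a;
        uint8_t hi = a & 0x80;      /* remember the bit about to be shifted out */
        a = (uint8_t)(a << 1);
        if (hi)
            a ^= 0x1b;              /* reduce by the field polynomial */
        b >>= 1;
    }
    return p;
}

/* Apply InvMixColumns to one column s[0..3] of the State, in place. */
static void inv_mix_column(uint8_t s[4])
{
    uint8_t a0 = s[0], a1 = s[1], a2 = s[2], a3 = s[3];

    s[0] = (uint8_t)(gf_mul(a0, 0x0e) ^ gf_mul(a1, 0x0b) ^ gf_mul(a2, 0x0d) ^ gf_mul(a3, 0x09));
    s[1] = (uint8_t)(gf_mul(a0, 0x09) ^ gf_mul(a1, 0x0e) ^ gf_mul(a2, 0x0b) ^ gf_mul(a3, 0x0d));
    s[2] = (uint8_t)(gf_mul(a0, 0x0d) ^ gf_mul(a1, 0x09) ^ gf_mul(a2, 0x0e) ^ gf_mul(a3, 0x0b));
    s[3] = (uint8_t)(gf_mul(a0, 0x0b) ^ gf_mul(a1, 0x0d) ^ gf_mul(a2, 0x09) ^ gf_mul(a3, 0x0e));
}
```

Applying inv_mix_column to each of the Nb columns of the State gives the full transformation.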
3.3.4 Inverse of the AddRoundKey () Transformation

AddRoundKey () is its own inverse, since it only involves an application of the XOR operation.

3.3.5 Equivalent Inverse Cipher

In the straightforward Inverse Cipher presented in (C), the sequence of the transformations differs from that of the Cipher, while the form of the key schedules for encryption and decryption remains the same. However, several properties of the AES algorithm allow for an Equivalent Inverse Cipher that has the same sequence of transformations as the Cipher (with the transformations replaced by their inverses). This is accomplished with a change in the key schedule. The two properties that allow for this Equivalent Inverse Cipher are as follows:

1. The SubBytes () and ShiftRows () transformations commute; that is, a SubBytes () transformation immediately followed by a ShiftRows () transformation is equivalent to a ShiftRows () transformation immediately followed by a SubBytes () transformation. The same is true for their inverses, InvSubBytes () and InvShiftRows ().
2. The column mixing operations MixColumns () and InvMixColumns () are linear with respect to the column input, which means InvMixColumns (state XOR Round Key) = InvMixColumns (state) XOR InvMixColumns (Round Key).

These properties allow the order of the InvSubBytes () and InvShiftRows () transformations to be reversed. The order of the AddRoundKey () and InvMixColumns () transformations can also be reversed, provided that the columns (words) of the decryption key schedule are modified using the InvMixColumns () transformation.

The equivalent inverse cipher is defined by reversing the order of the InvSubBytes () and InvShiftRows () transformations shown in (C), and by reversing the order of the AddRoundKey () and InvMixColumns () transformations used in the round loop after first modifying the decryption key schedule for round = 1 to Nr-1 using the InvMixColumns () transformation. The first and last Nb words of the decryption key schedule shall not be modified in this manner. Given these changes, the resulting Equivalent Inverse Cipher offers a more efficient structure than the Inverse Cipher described in (C). Pseudo code for the Equivalent Inverse Cipher appears in (D). (The word array dw [ ] contains the modified decryption key schedule. The modification to the Key Expansion routine is also provided in (D).)

EqInvCipher (byte in [4*Nb], byte out [4*Nb], word dw [Nb*(Nr+1)])
begin
   byte state [4, Nb]
   state = in
   AddRoundKey (state, dw [Nr*Nb, (Nr+1)*Nb-1])
   for round = Nr-1 step -1 downto 1
      InvSubBytes (state)
      InvShiftRows (state)
      InvMixColumns (state)
      AddRoundKey (state, dw [round*Nb, (round+1)*Nb-1])
   end for
   InvSubBytes (state)
   InvShiftRows (state)
   AddRoundKey (state, dw [0, Nb-1])
   out = state
end

For the Equivalent Inverse Cipher, the following pseudo code is added at the end of the Key Expansion routine.

for i = 0 step 1 to (Nr+1)*Nb-1
   dw[i] = w[i]
end for
for round = 1 step 1 to Nr-1
   InvMixColumns (dw [round*Nb, (round+1)*Nb-1])   // note change of type
end for

Note that, since InvMixColumns operates on a two-dimensional array of bytes while the Round Keys are held in an array of words, the call to InvMixColumns in this code sequence involves a change of type (i.e. the input to InvMixColumns () is normally the State array, which is considered to be a two-dimensional array of bytes, whereas the input here is a Round Key computed as a one-dimensional array of words).

(D). Pseudo Code for the Equivalent Inverse Cipher.

3.4 IMPLEMENTATION ISSUES

3.4.1 Key Length Requirements

An implementation of the AES algorithm shall support at least one of the three key lengths specified: 128, 192, or 256 bits (i.e., Nk = 4, 6, or 8, respectively). Implementations may optionally support two or three key lengths, which may promote the interoperability of algorithm implementations.

3.4.2 Keying Restrictions

No weak or semi-weak keys have been identified for the AES algorithm, and there is no restriction on key selection.

3.4.3 Parameterization of Key Length, Block Size, and Round Number

This standard explicitly defines the allowed values for the key length (Nk), block size (Nb), and number of rounds (Nr), see (C). However, future reaffirmations of this standard could include changes or additions to the allowed values for those parameters. Therefore, implementers may choose to design their AES implementations with future flexibility in mind.
3.4.4 Implementation Suggestions Regarding Various Platforms

Implementation variations are possible that may, in many cases, offer performance or other advantages. Given the same input key and data (plaintext or ciphertext), any implementation that produces the same output (ciphertext or plaintext) as the algorithm specified in this standard is an acceptable implementation of the AES.
CHAPTER 4 DESIGN IMPLEMENTATION

The block diagram of Advanced Encryption Standard is given in Figure 4.1.
Figure 4.1. Advanced Encryption Standard.
As shown in the block diagram, the Advanced Encryption Standard (Rijndael) consists of two parts, encryption and decryption.

Description: Block Diagram

For encryption, the two inputs are the plaintext and the key, each 128 bits in size, and the output is the 128-bit cipher text. Encryption consists mainly of 10 rounds, each made up of the substitute bytes, shift rows, mix columns and add round key operations. In addition, one add round key operation is performed before the first round. The 10th round omits the mix columns operation but performs all the other operations. For decryption, the two inputs are the cipher text (i.e. the encryption output) and the 128-bit key. The output of the decryption block is the plaintext, i.e. the original encryption input. This block performs the same sequence of operations, inverted and in reverse order.

4.1 TOP BLOCK DIAGRAM

The top block diagram for AES is shown in the diagram below.
Figure 4.2.TOP BLOCK DIAGRAM
As shown in the above diagram, the AES top block consists of the control FSM, encryption, decryption, add round key and key expansion blocks. The control FSM block controls the whole operation of the top block: it supplies the input data for the encryption, decryption, key expansion and add round key blocks. For encryption, the control unit supplies the 128-bit plain text and the 128-bit round key, and the encryption output is returned to the control unit. Decryption is handled in the same way.

4.4.1 Encryption Top Block
Figure 4.3 Encryption loop block diagram

As shown in the above diagram, the encryption loop consists of a Mux, the Round Encryption block and an n-bit PIPO (parallel-in, parallel-out) register. Depending on the selection line, the Mux outputs either Add round out0 or the PIPO output. The complete encryption block is repeated for 10 rounds. For the 1st round the selection line is kept at 0; from the next round onwards it is kept at 1. The Round Encryption block executes 10 times and receives a different key for each round from the control FSM. The PIPO nbit register receives the output from round encryption, which is passed back to the Mux as an input for 9 rounds; after that it is delivered as the encryption output.

4.4.2 Round Encryption Block
Figure 4.4 Round encryption block diagram

As shown in the above diagram, the Round Encryption block consists of S BOX RAM, SHIFT REG, MIX COLUMN and ADD ROUND KEY. The S box RAM is a predefined table. The input received by the round encryption block is first given to the S box RAM. Afterwards, that result is given to the shift register, which shifts it according to the shift operation. The shifted result is then given to the matrix multiplication, and the result of the matrix multiplication is given to add round key, which performs the XOR operation.

4.4.3 Control FSM
[Control FSM state diagram. Encryption states: IDLE, START00 to START04, ROUND0, then ROUND11, ROUND12, ROUND13, repeated for each of the 10 rounds up to ROUND101, ROUND102, ROUND103. Decryption states: ROUND_DEC00, ROUND_DEC01, then ROUND_DEC11, ROUND_DEC12, ROUND_DEC13, with the same steps repeated 10 times.]
The above diagram represents the control FSM. In the states start00 to start04, the FSM receives the 128-bit input data and the key as 32-bit data in 4 states. In the round0 state, the 128-bit plain text and the 128-bit round key are given to the encryption operation. At the same time, that round key is also given to the key generation block to generate the different keys for the different rounds. In round11, the output of add round key is assigned to encryption as an input, the round key generated by key generation is given as the key, the selection line is kept at 0 for the first time to select the plain text as input, and mix_column_en is kept at 1 to perform the matrix multiplication. This operation is repeated 10 times. The same steps are repeated for decryption in reverse order.

4.4.4 Decryption Top Block
Figure 4.6 Decryption block diagram

4.2 THE ADDROUNDKEY OPERATION:

In this operation, a Round Key is applied to the state by a simple bitwise XOR. The Round Key is derived from the Cipher Key by means of the key schedule. The Round Key length is equal to the block length (= 16 bytes).

   a0,0 a0,1 a0,2 a0,3         k0,0 k0,1 k0,2 k0,3       b0,0 b0,1 b0,2 b0,3
   a1,0 a1,1 a1,2 a1,3   XOR   k1,0 k1,1 k1,2 k1,3   =   b1,0 b1,1 b1,2 b1,3
   a2,0 a2,1 a2,2 a2,3         k2,0 k2,1 k2,2 k2,3       b2,0 b2,1 b2,2 b2,3
   a3,0 a3,1 a3,2 a3,3         k3,0 k3,1 k3,2 k3,3       b3,0 b3,1 b3,2 b3,3
Where: b(i,j) = a(i,j) XOR k(i,j)
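The byte-wise rule b(i,j) = a(i,j) XOR k(i,j) translates almost directly into C. The following is a minimal sketch, assuming the state and the round key are both held as flat arrays of 16 bytes in the same order; the function name add_round_key is chosen here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR each of the 16 state bytes with the matching round key byte.
   Assumes state and round_key use the same byte ordering. */
static void add_round_key(uint8_t state[16], const uint8_t round_key[16])
{
    for (size_t i = 0; i < 16; i++)
        state[i] ^= round_key[i];
}
```

Because XOR is its own inverse, calling add_round_key twice with the same round key restores the original state, which is why decryption can reuse this function unchanged.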
A graphical representation of this operation can be seen below
Figure 4.7: add round key

4.3 THE SHIFT ROW OPERATION:

In this operation, each row of the state is cyclically shifted to the left, depending on the row index.
The 1st row is shifted 0 positions to the left.
The 2nd row is shifted 1 position to the left.
The 3rd row is shifted 2 positions to the left.
The 4th row is shifted 3 positions to the left.

   a0,0 a0,1 a0,2 a0,3        a0,0 a0,1 a0,2 a0,3
   a1,0 a1,1 a1,2 a1,3   ->   a1,1 a1,2 a1,3 a1,0
   a2,0 a2,1 a2,2 a2,3        a2,2 a2,3 a2,0 a2,1
   a3,0 a3,1 a3,2 a3,3        a3,3 a3,0 a3,1 a3,2
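The row shifts can be sketched in C as follows. This is an illustration that assumes the state is stored as a 4x4 array indexed as state[row][column]; the function name shift_rows is chosen here and is not taken from any particular implementation.

```c
#include <stdint.h>
#include <string.h>

/* Cyclically shift row r of the state r positions to the left.
   Row 0 is left untouched. */
static void shift_rows(uint8_t state[4][4])
{
    for (int r = 1; r < 4; r++) {
        uint8_t tmp[4];
        for (int c = 0; c < 4; c++)
            tmp[c] = state[r][(c + r) % 4];   /* byte that moves into column c */
        memcpy(state[r], tmp, 4);
    }
}
```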
A graphical representation of this operation can be found below:
Figure 4.8: shift operation

Please note that the inverse of Shift Row is the same cyclic shift, but this time to the right. It will be needed later for decoding.

4.4 THE SUBBYTES OPERATION:

The Sub Bytes operation is a non-linear byte substitution, operating on each byte of the state independently. The substitution table (S-Box) is invertible and is constructed by the composition of two transformations:
1. Take the multiplicative inverse in Rijndael's finite field.
2. Apply an affine transformation, which is documented in the Rijndael documentation.
A graphical representation of this operation can be found below:
Figure 4.9 Byte Substitution

Since the S-Box is independent of any input, pre-calculated forms are used if enough memory (256 bytes for one S-Box) is available. Each byte of the state is then substituted by the value in the S-Box whose index corresponds to the value in the state:

a(i,j) = Sbox[a(i,j)]

Please note that the inverse of Sub Bytes is the same operation, using the inverse S-Box, which is also precalculated.

4.5 THE MIXCOLUMN OPERATION:

I will keep this section very short since it involves a lot of advanced mathematical calculations in Rijndael's finite field. All you have to know is that it corresponds to the matrix multiplication with:
   2 3 1 1
   1 2 3 1
   1 1 2 3
   3 1 1 2

Figure 4.10: mixColumn operation
Note also that the addition and multiplication operations are a little different from the normal ones. You can skip this part if you are not interested in the math involved.

Addition and Subtraction: Addition and subtraction are performed by the exclusive or operation. The two operations are the same; there is no difference between addition and subtraction.

Multiplication in Rijndael's Galois field is a little more complicated. The procedure is as follows: Take two eight-bit numbers, a and b, and an eight-bit product p. Set the product to zero. Make a copy of a and b, which we will simply call a and b in the rest of this algorithm. Run the following loop eight times:
1) If the low bit of b is set, exclusive or the product p by the value of a.
2) Keep track of whether the high (leftmost) bit of a is set to one.
3) Shift a one bit to the left, discarding the high bit, and making the low bit have a value of zero.
4) If a's high bit had a value of one prior to this shift, exclusive or a with the hexadecimal number 0x1b.
5) Shift b one bit to the right, discarding the low bit, and making the high (leftmost) bit have a value of zero.
The product p now has the product of a and b.
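The five-step loop can be written down in C nearly word for word. The sketch below is illustrative: gmul follows the steps as numbered in the text, and mix_column applies the 2 3 1 1 matrix to a single column using it. Both function names are chosen here for the example.

```c
#include <stdint.h>

/* Multiply a and b in Rijndael's Galois field, following the
   eight-iteration procedure described in the text. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;                        /* the product starts at zero */
    for (int i = 0; i < 8; i++) {
        if (b & 0x01)                     /* step 1: low bit of b set */
            p ^= a;
        uint8_t high_bit_set = a & 0x80;  /* step 2: remember a's high bit */
        a = (uint8_t)(a << 1);            /* step 3: shift a left by one */
        if (high_bit_set)                 /* step 4: reduce with 0x1b */
            a ^= 0x1b;
        b >>= 1;                          /* step 5: shift b right by one */
    }
    return p;
}

/* Multiply one column of the state by the MixColumns matrix, in place. */
static void mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];

    col[0] = (uint8_t)(gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3);
    col[1] = (uint8_t)(a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3);
    col[2] = (uint8_t)(a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3));
    col[3] = (uint8_t)(gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2));
}
```

Multiplications by 1 are left out because they do not change the byte, and the additions between the terms are plain XORs, as explained above.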
4.6 THE RIJNDAEL KEY SCHEDULE:

The Key Schedule is responsible for expanding a short key into a larger key, whose parts are used during the different iterations. Each key size is expanded to a different size:
A 128 bit key is expanded to a 176 byte key.
A 192 bit key is expanded to a 208 byte key.
A 256 bit key is expanded to a 240 byte key.
There is a relation between the cipher key size, the number of rounds and the expanded key size. For a 128-bit key, there is one initial AddRoundKey operation plus 10 rounds, and each of them needs a new 16 byte key; therefore we require 10+1 RoundKeys of 16 bytes, which equals 176 bytes. The same logic can be applied to the two other cipher key sizes. The general formula is:

ExpandedKeySize = (nbrRounds+1) * BlockSize

Figure 4.11: key generation block

The Key Schedule is made up of iterations of the Key schedule core, which works on 4-byte words. The core uses a certain number of operations, which are explained here:
4.6.1 Rotate:

The 4-byte word is cyclically shifted 1 byte to the left:

1d 2c 3a 4f -->> 2c 3a 4f 1d
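As a small illustration, the rotation can be written in C as below. The sketch assumes the word is held as an array of four bytes; the function name rotate simply mirrors the operation's name in this section.

```c
#include <stdint.h>

/* Cyclically shift a 4-byte word one byte to the left:
   [w0, w1, w2, w3] becomes [w1, w2, w3, w0]. */
static void rotate(uint8_t word[4])
{
    uint8_t first = word[0];
    word[0] = word[1];
    word[1] = word[2];
    word[2] = word[3];
    word[3] = first;
}
```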
4.6.2 Rcon:

This section is again extremely mathematical, and I recommend that everyone who is interested read up on it. Just note that the Rcon values can be precalculated, which results in a simple substitution (a table lookup) in a fixed Rcon table (again, Rcon can also be calculated on-the-fly if memory is a design constraint).

4.6.3 S-Box:

The Key Schedule uses the same S-Box substitution as the main algorithm body.

The key schedule core: Now that we know what the operations are, let me show you the key schedule core (in pseudo-C):

KeyScheduleCore (word)
{
    Rotate (word);
    SboxSubstitution (word);
    word[0] = word[0] XOR RCON[i];
}

In the above code, word has a size of 4 bytes and i is the iteration counter from the key schedule.

4.6.4 The key Expansion:
First, let me show you the key Expansion function as you can find it in the Rijndael documentation (there are 2 versions, one for key size 128, 192 and one for key size 256):

KeyExpansion (byte key [4*Nk], word w [Nb*(Nr+1)])
{
    for (i = 0; i < Nk; i++)
        w[i] = (key[4*i], key[4*i+1], key[4*i+2], key[4*i+3]);
    for (i = Nk; i < Nb * (Nr + 1); i++)
    {
        temp = w[i - 1];
        if (i % Nk == 0)
            temp = SubByte (RotByte (temp)) ^ Rcon[i / Nk];
        w[i] = w[i - Nk] ^ temp;
    }
}

Nk is the number of columns in the cipher key (128-bit -> 4, 192-bit -> 6, 256-bit -> 8). w is of type word, which is 4 bytes.

Let me try to explain this in a more easily understandable way:
The first n bytes of the expanded key are simply the cipher key (n = the size of the encryption key).
The rcon value i is set to 1.
Until we have enough bytes of expanded key, we do the following to generate n more bytes of expanded key (please note once again that n is used here; this varies depending on the key size):
1. We do the following to generate four bytes:
   we use a temporary 4-byte word called t
   we assign the previous 4 bytes to t
   we perform the key schedule core on t, with i as rcon value
   we increment i
   we XOR t with the 4-byte word n bytes before in the expandedKey (where n is either 16, 24 or 32 bytes)
2. We do the following x times to generate the next x*4 bytes of the expandedKey (x = 3 for n=16,32 and x = 5 for n=24):
   we assign the previous 4-byte word to t
   we XOR t with the 4-byte word n bytes before in the expandedKey (where n is either 16, 24 or 32 bytes)
3. If n = 32 (and ONLY then), we do the following to generate 4 more bytes:
   we assign the previous 4-byte word to t
   we run each of the four bytes in t through Rijndael's S-box
   we XOR t with the 4-byte word 32 bytes before in the expandedKey
4. If n = 32 (and ONLY then), we do the following three times to generate twelve more bytes:
   we assign the previous 4-byte word to t
   we XOR t with the 4-byte word 32 bytes before in the expandedKey
We now have our expandedKey.

Don't worry if you still have problems understanding the Key Schedule, you'll see that the implementation isn't very hard. What you should note is that:
steps 3 and 4 apply only to a cipher key size of 32 bytes
for n=16, we generate: 4 + 3*4 bytes = 16 bytes per iteration
for n=24, we generate: 4 + 5*4 bytes = 24 bytes per iteration
for n=32, we generate: 4 + 3*4 + 4 + 3*4 = 32 bytes per iteration

The implementation of the key schedule is pretty straightforward, but since there is a lot of code repetition, it is possible to optimize the loop slightly and use the modulo operator to check when the additional operations have to be made.
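The whole expansion, with the modulo test deciding when the extra operations apply, can be sketched in C as follows. This is a compact illustration rather than the implementation developed later in this tutorial: all function names are made up for the example, and to stay short it computes S-box values on the fly (multiplicative inverse followed by the affine transformation) instead of using a pre-calculated table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Multiplication in Rijndael's Galois field (shift-and-XOR). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1)
            p ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi)
            a ^= 0x1b;
        b >>= 1;
    }
    return p;
}

/* Multiplicative inverse by exhaustive search; 0 maps to 0. */
static uint8_t gf_inv(uint8_t a)
{
    for (unsigned b = 1; b < 256; b++)
        if (gf_mul(a, (uint8_t)b) == 1)
            return (uint8_t)b;
    return 0;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* S-box value: inverse in the field, then the affine transformation. */
static uint8_t sbox(uint8_t a)
{
    uint8_t b = gf_inv(a);
    return (uint8_t)(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63);
}

/* Expand a cipher key of key_len bytes (16, 24 or 32) into out[].
   out must provide room for 176, 208 or 240 bytes respectively.
   Returns the expanded key size in bytes. */
static size_t expand_key(const uint8_t *key, size_t key_len, uint8_t *out)
{
    size_t nk = key_len / 4;              /* key length in 4-byte words */
    size_t total = 16 * (nk + 6 + 1);     /* (rounds + 1) * block size */
    uint8_t rcon = 0x01;

    memcpy(out, key, key_len);            /* first n bytes are the key itself */
    for (size_t i = nk; i < total / 4; i++) {
        uint8_t t[4];
        memcpy(t, out + 4 * (i - 1), 4);  /* previous 4-byte word */
        if (i % nk == 0) {
            /* key schedule core: rotate, substitute, XOR with rcon */
            uint8_t first = t[0];
            t[0] = (uint8_t)(sbox(t[1]) ^ rcon);
            t[1] = sbox(t[2]);
            t[2] = sbox(t[3]);
            t[3] = sbox(first);
            rcon = gf_mul(rcon, 2);
        } else if (nk > 6 && i % nk == 4) {
            /* extra substitution, only for 32-byte keys */
            for (int j = 0; j < 4; j++)
                t[j] = sbox(t[j]);
        }
        for (int j = 0; j < 4; j++)
            out[4 * i + j] = (uint8_t)(out[4 * (i - nk) + j] ^ t[j]);
    }
    return total;
}
```

A caller would pass the cipher key and a buffer of up to 240 bytes; the exhaustive inverse search is slow, so a real implementation would normally replace sbox with a table lookup.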
4.7 THE KEY SCHEDULE

We will start the implementation of AES with the Cipher Key expansion. As you can read in the theoretical part above, we intend to enlarge our input cipher key, whose size varies between 128 and 256 bits, into a larger key from which different RoundKeys can be derived. I prefer to implement the helper functions (such as rotate, Rcon or S-Box) first, test them and then move on to the larger loops. If you are not a fan of bottom-up approaches, feel free to start a little further on in this tutorial and work your way up, but I felt that my approach was the more logical one here.

4.8 GENERAL COMMENTS

Even though some might think that integers are the best choice to work with, since their 32 bit size corresponds to one word, I strongly discourage you from using integers. It is wrong to assume that an integer, or more specifically the int type, always has 4 bytes. The required ranges for signed and unsigned int are identical to those for signed and unsigned short. On compilers for 8 and 16 bit processors (including Intel x86 processors executing in 16 bit mode, such as under MS-DOS), an int is usually 16 bits and has exactly the same representation as a short. On compilers for 32 bit and larger processors (including Intel x86 processors executing in 32 bit mode, such as Win32 or Linux) an int is usually 32 bits long and has exactly the same representation as a long. For this very reason, we will be using unsigned chars, since the number of bits in a char (which is called CHAR_BIT and defined in limits.h) is required to be at least 8. Jack Klein wrote:

Almost all modern computers today use 8 bit bytes (technically called octets), but there are still some in production and use with other sizes, such as 9 bits. Also some processors (especially Digital Signal Processors) cannot efficiently access memory in smaller pieces than the processor's word size. There is at least one DSP I have worked with where CHAR_BIT is 32. The char types, short, int and long are all 32 bits.

Since we want to keep our code as portable as possible, and since it is up to the compiler to decide whether the default type for char is signed or not, we will specify unsigned char throughout the entire code.

4.9 IMPLEMENTATION

4.9.1 S-Box

The S-Box values can either be calculated on-the-fly to save memory, or the pre-calculated values can be stored in an array. Since I assume that every machine my code runs on will have at least 2 x 256 bytes available (there are 2 S-Boxes, one for the encryption and one for the decryption), we will store the values in an array. Additionally, instead of accessing the values directly from our program, I'll wrap a little function around the array, which makes for more readable code and would allow us to add additional code later on. Of course, this is a matter of taste; feel free to access the array directly.

4.9.2 Rotate

From the theoretical part, you should know already that Rotate takes a word (a 4-byte array) and rotates it 8 bits to the left. Since 8 bits correspond to one byte and our array type is character (whose size is one byte), rotating 8 bits to the left corresponds to cyclically shifting the array values one position to the left.

4.9.3 Rcon

As with the S-Box, the Rcon values can be calculated on-the-fly, but once again I decided to store them in an array since they only require 255 bytes of space. To keep in line with the S-Box implementation, I wrote a little access function.

4.9.4 Key Schedule Core

The implementation of the Key Schedule Core from the pseudo-C is pretty easy. All the code does is apply the operations one after the other to the 4-byte word. The parameters are the 4-byte word and the iteration counter, on which Rcon depends.

4.9.5 Key Expansion

The Key Expansion is where it all comes together. As you can see in the pretty big list in the theory about the Rijndael Key Expansion, we need to apply several operations a number of times, depending on the key size. As the key size can only take a very limited number of values, I decided to implement it as an enumeration type. Not only does that limit the key size to only three possible values, it also makes the code more readable. Our key expansion function basically needs only two things:
The input cipher key
The output expanded key
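One way the enumeration could look is sketched below. The type and constant names (enum key_size, SIZE_16 and so on) and the two helper functions are assumptions made for this example; the helpers only restate the relation ExpandedKeySize = (nbrRounds+1) * BlockSize from the theory part.

```c
#include <stddef.h>

/* The three key sizes AES allows, in bytes (names are illustrative). */
enum key_size {
    SIZE_16 = 16,
    SIZE_24 = 24,
    SIZE_32 = 32
};

/* Number of rounds for each key size; returns 0 for an invalid value. */
static int number_of_rounds(enum key_size size)
{
    switch (size) {
    case SIZE_16: return 10;
    case SIZE_24: return 12;
    case SIZE_32: return 14;
    }
    return 0;
}

/* Expanded key size in bytes: (rounds + 1) round keys of 16 bytes each. */
static size_t expanded_key_size(enum key_size size)
{
    return 16 * (size_t)(number_of_rounds(size) + 1);
}
```

With this in place, the caller can size the output buffer before running the expansion, for example 176 bytes for SIZE_16.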
4.9.6 AES Encryption

To implement the AES encryption algorithm, we proceed exactly the same way as for the key expansion, that is, we first implement the basic helper functions and then move up to the main loop. The functions take as parameter a state, which is, as already explained, a rectangular 4x4 array of bytes. We won't consider the state as a 2-dimensional array, but as a 1-dimensional array of length 16.

4.9.7 Sub Bytes

There isn't much to say about this operation; it's a simple substitution with the S-Box value.

4.9.8 Shift Rows

I decided to split this function in two parts, not because it wasn't possible to do it all in one go, but simply because it was easier to read and debug. The Shift Rows function iterates over all the rows and then calls shift Row with the correct offset. Shift Row does nothing but shift a 4-byte array by the given offset.

4.9.9 AddRoundKey

This is the part that involves the roundKey we generate during each iteration. We simply XOR each byte of the key with the respective byte of the state.
4.9.10 MixColumns

MixColumns is probably the most difficult operation of the 4. It involves the Galois addition and multiplication and processes columns instead of rows (which is unfortunate since we use a linear array that represents the rows). First of all, we need a function that multiplies two numbers in the Galois field. Once again, I decided to split the function in 2 parts: the first one generates a column and then calls mixColumn, which then applies the matrix multiplication. The mixColumn is simply a Galois multiplication of the column with the 4x4 matrix provided in the theory. Since an addition corresponds to a XOR operation and we already have the multiplication function, the implementation is rather simple.

4.9.11 The Main AES body

Now that we have all the small functions, the main loop gets really easy. All we have to do is take the state, the expandedKey and the number of rounds as parameters and then call the operations one after the other. A little function called createRoundKey () is used to copy the next 16 bytes from the expandedKey into the roundKey, using the special mapping order.

4.9.12 AES Encryption

Finally, all we have to do is put it all together. Our parameters are the input plaintext, the key of size keySize and the output. First, we calculate the number of rounds based on the keySize and then the expandedKeySize based on the number of rounds. Then we have to map the 16 byte input plaintext in the correct order to the 4x4 byte state (as explained above), expand the key using our key schedule, encrypt the state using our main AES body and finally unmap the state again in the correct order to get the 16 byte output cipher text.

4.9.13 AES Decryption
If you managed to understand and implement everything up to this point, you shouldn't have any problems getting the decryption to work either. Basically, we invert the whole encryption and apply all the operations backwards. As the key schedule stays the same, the only operations we need to implement are the inverse subBytes, shiftRows and mixColumns, while addRoundKey stays the same. Apart from the inverse mixColumns operation, the other operations are trivial and I provide you the code. As you can see, they are nearly identical to their encryption counterparts, except that the rotation this time is to the right and that we use the inverse S-Box for the substitution. As for the inverse mixColumns operation, the only difference is the multiplication matrix.
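To illustrate the rotation to the right, here is a minimal sketch of the inverse row shift. It assumes, as stated earlier in this chapter, that the state is a linear array of 16 bytes holding the four rows one after the other; the function name inv_shift_rows is chosen for this example.

```c
#include <stdint.h>
#include <string.h>

/* Cyclically shift row r of the state r positions to the right.
   Assumes state[4*r + c] holds the byte in row r, column c. */
static void inv_shift_rows(uint8_t state[16])
{
    for (int r = 1; r < 4; r++) {
        uint8_t tmp[4];
        for (int c = 0; c < 4; c++)
            tmp[(c + r) % 4] = state[4 * r + c];  /* byte moves r columns right */
        memcpy(state + 4 * r, tmp, 4);
    }
}
```

Running this on a state that was just shifted to the left by the encryption-side function restores the original byte positions.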
CHAPTER 5 SIMULATION RESULTS

5.1 Encryption block

5.1.1 Simulation result for Round Encryption:

FIG 5.1 result for Round Encryption

The above diagram shows the simulation result for the Encryption block. The output, round_out, is shown in hexadecimal. The inputs are a 128-bit data block (32 hexadecimal digits) and a 128-bit key (32 hexadecimal digits), taken from the FIPS-197 standard. After simulation, the final output of the round encryption can be observed in round_out as 128 bits (32 hexadecimal digits). The output can be verified against FIPS-197.
5.2 Decryption Block

5.2.2 Simulation result for Round Decryption
FIG 5.2.2 result for Round Decryption
The above diagram shows the simulation result for the Decryption block. The output, round_out, is shown in hexadecimal. The inputs are a 128-bit data block (32 hexadecimal digits) and a 128-bit key (32 hexadecimal digits), taken from the FIPS-197 standard. After simulation, the final output of the round decryption can be observed in round_out as 128 bits (32 hexadecimal digits). The output can be verified against FIPS-197.
5.3 TOP AES BLOCK:
Figure 5.3 result for top AES block

The above diagram shows the simulation result for the top AES block. The top block has two inputs, the plaintext and the key, each 32 bits wide. Encryption requires 128 bits, so four 32-bit inputs are grouped to form each 128-bit value. After encryption, the output is given to decryption as its input, and the decryption output appears as decryption out in the above figure.
CHAPTER 6 SYNTHESIS REPORT

Final Synthesis Report of Advanced Encryption Standard (AES)

Description: Bidirectional Port Resolution

Performing bidirectional port resolution...

Synthesizing Unit <s_box1>.
    Related source file is "C:/Documents and Settings/aurora/Desktop/aes_xilinx/s_box1.vhd".
    Found 256x8-bit ROM for signal <s_box_out>.
    Summary: inferred 1 ROM(s).
Unit <s_box1> synthesized.

Synthesizing Unit <shift_reg>.
    Related source file is "C:/Documents and Settings/aurora/Desktop/aes_xilinx/shift_reg.vhd".
    Found 128-bit register for signal <reg_out>.
    Summary: inferred 128 D-type flip-flop(s).
Unit <shift_reg> synthesized.

Synthesizing Unit <add_rnd_key>.
    Related source file is "C:/Documents and Settings/aurora/Desktop/aes_xilinx/add_rnd_key.vhd".
Unit <add_rnd_key> synthesized.

Synthesizing Unit <mux>.
    Related source file is "C:/Documents and Settings/aurora/Desktop/aes_xilinx/mux.vhd".
Unit <mux> synthesized.

Synthesizing Unit <poly_multe>.
    Related source file is "C:/Documents and Settings/aurora/Desktop/aes_xilinx/poly_multe.vhd".
    Found 1-bit xor2 for signal <temp1$xor0000> created at line 22.
    Found 1-bit xor2 for signal <temp1$xor0001> created at line 22.
    Found 1-bit xor2 for signal <temp1$xor0002> created at line 22.
    Found 1-bit xor2 for signal <temp2$xor0000> created at line 23.
    Found 1-bit xor2 for signal <temp2$xor0001> created at line 23.
    Found 1-bit xor2 for signal <temp2$xor0002> created at line 23.
    Found 1-bit xor2 for signal <temp2$xor0003> created at line 23.
    Found 1-bit xor2 for signal <temp2$xor0004> created at line 23.
    Found 1-bit xor2 for signal <temp2$xor0005> created at line 23.
    Found 1-bit xor2 for signal <temp2$xor0006> created at line 23.
    Found 1-bit xor2 for signal <temp2$xor0007> created at line 23.
Unit <poly_multe> synthesized.

Synthesizing Unit <mix_column>.
    Related source file is "C:/Documents and Settings/aurora/Desktop/aes_xilinx/mix_column.vhd".
    Found 128-bit xor4 for signal <dataout_sig>.
    Summary: inferred 128 Xor(s).
Unit <mix_column> synthesized.

Synthesizing Unit <round_encryption>.
    Related source file is "C:/Documents and Settings/aurora/Desktop/aes_xilinx/round_encryption.vhd".
    Found 4-bit up counter for signal <count>.
    Summary: inferred 1 Counter(s).
Unit <round_encryption> synthesized.
=========================================================================
HDL Synthesis Report

Macro Statistics
# ROMs                 : 16
  256x8-bit ROM        : 16
# Counters             : 1
  4-bit up counter     : 1
# Registers            : 128
  1-bit register       : 128
# Xors                 : 480
  1-bit xor2           : 352
  1-bit xor4           : 128
=========================================================================
Advanced HDL Synthesis Report

Macro Statistics
# ROMs                 : 16
  256x8-bit ROM        : 16
# Counters             : 1
  4-bit up counter     : 1
# Registers            : 128
  Flip-Flops           : 128
# Xors                 : 480
  1-bit xor2           : 352
  1-bit xor4           : 128
=========================================================================
Final Report Representation:

* Final Report *

Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer (FF name) | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 4     |
count_3                            | NONE (U18/reg_out_73)  | 1     |
-----------------------------------+------------------------+-------+
INFO: Xst: 2169 - HDL ADVISOR - Some clock signals were not automatically buffered by XST with BUFG/BUFR resources. Please use the buffer_type constraint in order to insert these buffers to the clock signals to help prevent skew problems.

Asynchronous Control Signals Information:
-----------------------------------+------------------------+-------+
Control Signal                     | Buffer (FF name)       | Load  |
-----------------------------------+------------------------+-------+
rst                                | IBUF                   | 5     |
-----------------------------------+------------------------+-------+

Timing Summary:
Speed Grade: -4
Minimum period: 2.610ns (Maximum Frequency: 383.142MHz)
Minimum input arrival time before clock: No path found
Maximum output required time after clock: 2.352ns
Maximum combinational path delay: 2.917ns
=========================================================================
Process "Synthesize" completed successfully
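The maximum frequency quoted in the timing summary is simply the reciprocal of the reported minimum clock period. As a quick check of the arithmetic, using only the two figures from the report:

```latex
f_{\max} = \frac{1}{T_{\min}} = \frac{1}{2.610\ \mathrm{ns}} \approx 383.142\ \mathrm{MHz}
```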
CONCLUSIONS AND FUTURE WORK

Conclusions: In this project, both the encryption and decryption sections of AES have been implemented on an FPGA using the Very High Speed Integrated Circuit Hardware Description Language (VHDL). The corresponding outputs for each block, in both encryption and decryption, have been noted. A secure and effective Advanced Encryption Standard implementation is established. The FPGA design operates with an average encipher-decipher maximum frequency (including key expansion) of 409.333 MHz. In place of S_BOX logic, SRAM memory blocks are used to reduce the number of slices and LUTs.

The ModelSim simulator is used to simulate the design at various stages. The Xilinx Synthesis Tool (XST) is used to synthesize the design for the Spartan-3E family FPGA (XC3S500E). In conclusion, owing to its general framework, this system can be used in a wide variety of applications.

Future Work:
- Optimize the system to accommodate different key and data lengths
- Delay and power estimation
- Optimize the design in synthesis
Appendix A VHDL LANGUAGE OVERVIEW

INTRODUCTION: VHDL was developed under contract to the U.S. Department of Defense. It became an IEEE standard in 1987. Whereas Verilog is a C-like language, it is clear that VHDL has its roots in Ada. For many years there was intense competition between Verilog and VHDL for mind share and market share. Both languages have their strong points. In the end, most EDA companies came out with simulators that work with both. Early in the language wars it was noted that Verilog had a number of built-in, gate-level primitives. Over the years these had been optimized for performance by Cadence and later by other Verilog vendors. Verilog also had a single defined method of reading timing into a simulation from an external file. VHDL, on the other hand, was designed for a higher level of abstraction. Although it could model almost anything Verilog could, and without primitives, it allowed things to be modeled in a multitude of ways. This made performance optimization or acceleration impractical, and VHDL was not competing successfully with Verilog-XL as a sign-off ASIC simulator. The EDA companies backing VHDL saw that they had to do something. That something was named VITAL, the VHDL Initiative toward ASIC Libraries.
DIGITAL SYSTEM DESIGN PROCESS: Fig (a) shows a typical process for the design of a digital system.

[Figure: flow chart with the stages Analysis (specifications), RTL coding (HDL/VHDL), Simulation (waveforms), Synthesis (gate-level net list), Timing analysis, and Implementation (FPGA)]

Fig (a): A Digital System Design Process
Analysis: The first step in a high-level design is the analysis of the system to be designed. This step involves specifying the behavior expected of the design. The designer puts enough detail into the specification so that the design can be built.

RTL coding: After the specification has been completed, the designer can begin the implementation. The designer creates the RTL description, which describes the clocked behavior of the design.

Simulation and synthesis: The designer first simulates the RTL description. If there are no errors, the designer can synthesize the design: the synthesis tool converts the RTL description into a net list in the target FPGA or ASIC technology and maps the design to that technology.

Timing analysis: A timing analyzer generates a number of report types from which the designer can identify the critical paths of the design and verify that they are within the specified or required timings.

Implementation: Implementation is the process of downloading the synthesized RTL description onto the target technology, such as an FPGA or ASIC.

LANGUAGE DESCRIPTION: VHDL is an acronym for Very High Speed Integrated Circuit Hardware Description Language. It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level. The complexity of the digital system being modeled can vary from that of a simple gate to a complete digital electronic system, or anything in between. The digital system can also be described hierarchically. Timing can also be explicitly modeled in the description. The VHDL language can be regarded as an integrated amalgamation of the following languages:
1. Sequential language
2. Concurrent language
3. Net-list language
4. Timing specifications
5. Waveform generation
Therefore, the language has constructs that enable you to express the concurrent or sequential behavior of a digital system with or without timing. It also allows you to model the system as an interconnection of components. Test waveforms can also be generated to provide a comprehensive description of the system in a single model. The VHDL language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator.

The following are the major capabilities that the language provides, along with the features that differentiate it from other hardware description languages:
- The language can be used as an exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers. CAD tool users can capture the behavior of the design at a high level of abstraction.
- The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components, and each component, in turn, can be modeled as a set of interconnected sub-components.
- The language supports flexible design methodologies: top-down, bottom-up, or mixed.
- The language is not technology specific, but is capable of supporting technology-specific features. It can also support various hardware technologies. For example, new logic types and new components may be defined, and technology-specific attributes may also be specified.
- It supports both synchronous and asynchronous timing models.
- Various digital system modeling techniques, such as FSM descriptions, algorithmic descriptions, and Boolean equations, can be modeled using the language.
- The language is publicly available, human readable, machine readable, and above all it is not proprietary.
- It is an IEEE and ANSI standard; therefore models described using this language are portable.
- The language supports three basic description styles: structural, dataflow, and behavioral. A design may also be expressed in any combination of these three styles.
- It supports a wide range of abstraction levels, ranging from abstract behavioral descriptions to very precise gate-level descriptions. It does not, however, support modeling at or below the transistor level. It allows a design to be captured at a mixed level using a single coherent language.
- Arbitrarily large designs can be modeled using the language; the language imposes no limitations on the size of the design.
- The language has elements that make large-scale design modeling easier, for example components, functions, procedures, and packages.
- Test benches can be written in the same language to test other VHDL models.
- Nominal propagation delays, min-max delays, setup and hold timing constraints, and spike detection can be described very naturally using this language.
- A model can not only describe the functionality of a design but can also contain information about the design itself in terms of user-defined attributes, such as total area and speed.
- A common language can be used to describe library components from different vendors. Tools that understand VHDL models will have no difficulty in reading models from a variety of vendors, since the language is a standard.
- Models written in this language can be verified by simulation, since precise simulation semantics are defined for each language construct.
- Behavioral models that conform to a certain synthesis description style can be synthesized to gate-level descriptions.
- The capability of defining new data types provides the power to describe and simulate design techniques at a very high level of abstraction without any concern about the implementation details.
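To make the description styles and the test bench capability listed above more concrete, the sketch below describes one small circuit in two styles and checks it with a test bench written in the same language. The entity mux2_demo, its ports, and the test bench tb_mux2_demo are hypothetical names chosen for illustration; they are not the mux unit used in this project. A structural architecture would instead instantiate and wire up lower-level gate components.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-to-1 multiplexer used only to compare description styles.
entity mux2_demo is
  port (
    a, b : in  std_logic;
    sel  : in  std_logic;
    y    : out std_logic
  );
end entity mux2_demo;

-- Dataflow style: a single concurrent signal assignment.
architecture dataflow of mux2_demo is
begin
  y <= a when sel = '0' else b;
end architecture dataflow;

-- Behavioral style: sequential statements inside a process.
architecture behavioral of mux2_demo is
begin
  process (a, b, sel)
  begin
    if sel = '0' then
      y <= a;
    else
      y <= b;
    end if;
  end process;
end architecture behavioral;

library ieee;
use ieee.std_logic_1164.all;

-- Test bench: generates stimulus waveforms and checks the response.
entity tb_mux2_demo is
end entity tb_mux2_demo;

architecture sim of tb_mux2_demo is
  signal a, b, sel, y : std_logic := '0';
begin
  -- Select which architecture to test by name.
  dut : entity work.mux2_demo(behavioral)
    port map (a => a, b => b, sel => sel, y => y);

  stimulus : process
  begin
    a <= '1'; b <= '0'; sel <= '0';
    wait for 10 ns;
    assert y = '1' report "sel = 0 should select a" severity error;
    sel <= '1';
    wait for 10 ns;
    assert y = '0' report "sel = 1 should select b" severity error;
    wait;  -- suspend forever so the simulation can end
  end process stimulus;
end architecture sim;
```

Changing the architecture name in the instantiation from behavioral to dataflow runs the same checks against the other description, which is one practical use of having several architectures for one entity.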
BASIC VHDL TERMS:
Before we go any further, let's define some of the terms that we will be using throughout our discussion. These are the VHDL building blocks that are used in almost every description.

ENTITY: All designs are expressed in terms of entities. An entity is the most basic building block in a design. The uppermost level of the design is the top-level entity. If the design is hierarchical, then the top-level description will have lower-level descriptions contained in it. These lower-level descriptions will be lower-level entities contained in the top-level entity description.

ENTITY DECLARATION:
    ENTITY entity_name IS
        -- each port has a name, a mode (IN, OUT, INOUT or BUFFER) and a type
        PORT (port1 : IN  port1_type;
              port2 : OUT port2_type);
    END entity_name;

ARCHITECTURE: All entities that can be simulated have an architecture description. The architecture describes the behavior of the entity. A single entity can have multiple architectures. One architecture might be behavioral while another might be a structural description of the design. An architecture can describe the entity in any of the following ways:
- As a system of interconnected components (to represent structure)
- As a set of concurrent assignment statements (to represent dataflow)
- As a set of sequential assignment statements (to represent behavior)
- As any combination of the above three
SYNTAX:
    ARCHITECTURE architecture_name OF entity_name IS
        -- declare some signals here
    BEGIN
        -- put some concurrent statements here
    END architecture_name;

PROCESS: A process is the basic unit of execution in VHDL. All operations that are performed in the simulation of a VHDL description are broken into single or multiple processes.

PROCESS STATEMENTS: A process statement consists of a number of parts. The first part is the sensitivity list and the second part is the process declarative part. The list of signals in parentheses after the keyword PROCESS is called the sensitivity list. It enumerates exactly which signals cause the process statement to execute. The process declarative part is the area between the end of the sensitivity list and the keyword BEGIN. The statement part of the process starts at the keyword BEGIN and ends at END PROCESS.

FPGA PROGRAMMING: Implementing a logic design with an FPGA usually consists of the following steps (depicted in the figure which follows):
1. You enter a description of your logic circuit using a hardware description language (HDL) such as VHDL or Verilog. You can also draw your design using a schematic editor.
2. You use a logic synthesizer program to transform the HDL or schematic into a net list. The net list is just a description of the various logic gates in your design and how they are interconnected.
3. You use the implementation tools to map the logic gates and interconnections into the FPGA. The FPGA consists of many configurable logic blocks (CLBs), which can be further decomposed into look-up tables (LUTs) that perform logic operations. The CLBs and LUTs are interwoven with various routing resources. The mapping tool collects your net list gates into groups that fit into the LUTs, and then the place and route tool assigns the gate collections to specific CLBs while opening or closing the switches in the routing matrices to connect the gates together.
4. Once the implementation phase is complete, a program extracts the state of the switches in the routing matrices and generates a bit stream where the ones and zeroes correspond to open or closed switches. (This is a bit of a simplification, but it will serve for the purposes of this tutorial.)
5. The bit stream is downloaded into a physical FPGA chip (usually embedded in some larger system). The electronic switches in the FPGA open or close in response to the binary bits in the bit stream. Upon completion of the downloading, the FPGA will perform the operations specified by your HDL code or schematic.

That's really all there is to it. Xilinx WebPACK provides the HDL and schematic editors, logic synthesizer, fitter, and bit stream generator software. The XSTOOLs from XESS provide utilities for downloading the bit stream into an XSB-300E board containing a Xilinx XC2S300E Spartan-IIE FPGA.
Fig (b). Implementing a logic design with an FPGA
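As an illustration of the kind of HDL description entered in step 1, and of the entity, architecture, and process terms defined in this appendix, the following sketch describes a 4-bit up counter with an asynchronous reset. The name round_counter_demo, the port names, and the active-high reset polarity are assumptions made for this example; it is not the counter from the project's round_encryption unit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Entity: the external interface of the block (hypothetical names).
entity round_counter_demo is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;                     -- asynchronous reset, assumed active high
    count : out std_logic_vector(3 downto 0)
  );
end entity round_counter_demo;

-- Architecture: the internal description of the entity.
architecture rtl of round_counter_demo is
  -- Architecture declarative part: internal signals are declared here.
  signal count_reg : unsigned(3 downto 0) := (others => '0');
begin
  -- Sensitivity list (clk, rst): the process executes when either signal changes.
  counter : process (clk, rst)
    -- Process declarative part: variables would be declared here (none needed).
  begin
    -- Statement part: sequential statements between BEGIN and END PROCESS.
    if rst = '1' then
      count_reg <= (others => '0');
    elsif rising_edge(clk) then
      count_reg <= count_reg + 1;              -- wraps from 15 back to 0
    end if;
  end process counter;

  -- Concurrent signal assignment driving the output port.
  count <= std_logic_vector(count_reg);
end architecture rtl;
```

A synthesizer would typically map this description to four flip-flops and an incrementer, which is the sort of net list produced in step 2.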