
Abstract

SEA is a scalable encryption algorithm targeted at small embedded applications. It was
initially designed for software implementations in controllers, smart cards, or processors.
In this letter, we investigate its performance in recent field-programmable gate array
(FPGA) devices. For this purpose, a loop architecture of the block cipher is presented.
Beyond its low cost, a significant advantage of the proposed architecture is its full
flexibility with respect to any parameter of the scalable encryption algorithm, achieved
through generic VHDL coding. The letter also carefully describes the implementation
details that allow us to keep area requirements small. Finally, a comparative performance
discussion of SEA with the Advanced Encryption Standard Rijndael and ICEBERG (a cipher
designed for efficient FPGA implementations) is proposed. It illustrates the interest of
platform/context-oriented block cipher design and, as far as SEA is concerned, its low
area requirements and reasonable efficiency.

Scalable encryption algorithm (SEA) is a parametric block cipher for resource-constrained
systems (e.g., sensor networks, RFIDs) that was introduced in [1]. It was
initially designed as a low-cost encryption/authentication routine (i.e., with small code
size and memory) targeted at processors with a limited instruction set (i.e., AND, OR,
XOR gates, word rotation, and modular addition). Additionally, and contrary to most
recent block ciphers (e.g., the DES [2] and AES Rijndael [3], [4]), the algorithm takes the
plaintext, key, and bus sizes as parameters and, therefore, can be straightforwardly
adapted to various implementation contexts and/or security requirements. Compared to
older solutions for low-cost encryption like the tiny encryption algorithm (TEA) [5] or
Yuval's proposal [6], SEA also benefits from a stronger security analysis, derived from
recent advances in block cipher design/cryptanalysis.

In practice, SEA has been shown to be an efficient solution for embedded software
applications using microcontrollers, but its hardware performance has not yet been
investigated. Consequently, and as a first step towards hardware performance analysis,
this letter explores the features of a low-cost field-programmable gate array (FPGA)
encryption/decryption core for SEA. In addition to the performance evaluation, we show
that the algorithm's scalability can be turned into a fully generic VHDL design, so that
any text, key, and bus size can be straightforwardly reimplemented without any
modification of the hardware description language code, with standard synthesis and
implementation tools.

CONTENTS

CHAPTER 1: Introduction to VLSI
1.1 Introduction
1.2 VLSI Design Style
1.3 VLSI Design Flow
1.4 VLSI Features

CHAPTER 2: Introduction to VHDL
2.1 Introduction
2.2 Capabilities
2.3 Abstraction levels of VHDL
2.4 Basic Terminology
2.5 Modeling Techniques for VHDL
2.6 Process Statements
2.7 Conditional Statements
2.8 Active HDL Overview
2.9 Macro language
2.10 Compilation
2.11 Simulation
2.12 Xilinx

CHAPTER 3: Introduction to SEA
3.1 Specifications
3.2 Design properties
3.3 Overall Structure
3.4 Security Analysis
3.5 Performance Analysis

CHAPTER 4: An Exposition of SEA
4.1 Overview of SEA

CHAPTER 5: SEA Architecture
5.1 Key Generation
5.2 Encryption
5.3 Decryption

Appendix-I Simulation Results
Appendix-II Synthesis Reports
Appendix-III Implementation
Appendix-IV Advantages
Appendix-V Conclusion
Appendix-VI Bibliography

CH:1 INTRODUCTION TO VLSI

The first digital circuits were built from discrete electronic components such as vacuum
tubes and transistors. Later, integrated circuits (ICs) were invented, which allowed a
designer to place digital circuits on a chip. An IC containing fewer than 10 gates is
classified as SSI (Small Scale Integration). With the advent of new fabrication techniques,
designers could place more than 100 gates on an IC, a scale called MSI (Medium Scale
Integration). At this level, one can create digital sub-blocks (adders, multiplexers,
counters, registers, etc.) on an IC. The next level is LSI (Large Scale Integration); using
this scale of integration, designers succeeded in building digital subsystems
(microprocessors, I/O peripheral devices, etc.) on a chip.

At this point the design process started to become very complicated: manual
conversion from schematic level to gate level, or from gate level to layout level, was
becoming a lengthy process, and verifying the functionality of digital circuits at the
various levels became critical. This created new challenges for digital designers as well
as circuit designers, who felt the need to automate these processes. Meanwhile, rapid
advances in software technology and the development of new higher-level programming
languages took place. It became possible to develop CAD/CAE (Computer Aided
Design/Computer Aided Engineering) tools for designing electronic circuits with the
assistance of software programs. Functional and logic verification of a design
can be done more efficiently using CAD simulation tools, which made it much easier for a
designer to verify the functionality of a design at various levels.

With the advent of CMOS (Complementary Metal Oxide Semiconductor) process technology,
one can fabricate a chip containing more than a million gates. At this scale the design
process became even more critical because of the manual conversion of the design from
one level to another. The latest CAD tools solve this problem: with logic synthesis
tools, a design engineer can easily translate a higher-level design description to lower
levels. This way of designing (using CAD tools) is certainly a revolution in the
electronics industry, and it is leading to the development of sophisticated electronic
products for both consumers and businesses. Designing systems in hardware generally
gives better results than software in terms of speed, reliability, and performance.
Using a CMOS VLSI design methodology, a designer can design and fabricate ICs in much
less time than with the traditional way of designing.

1.2 TYPICAL IC DESIGN FLOW:

[Figure: typical IC design flow. Labels recoverable from the original diagram:
Specification, Behavioral Description, Behavioral Simulation, RTL Description, Functional
Simulation, Logic Synthesis, Gate-Level Netlist, Logic Simulation, Automatic P&R, Layout,
Fabrication, Constraints, Library Management.]

1.3 MICRON TECHNOLOGY

Micron technology can be classified into four categories, evolving from micron
technology and extending down to VDSM.

• Micron technology: technology with feature sizes down to 10^-6 m (1 µm).

• Submicron technology: technology below 1 µm. It generally ranges down to 0.36 µm.

• DSM (deep submicron technology): technology extending down to 0.18 µm.

• VDSM (very deep submicron technology): the presently used technology. It ranges down
to 0.09 µm.

1.4 FEATURES:
[Figure: micron technology categories (Micron, SM, DSM, VDSM)]

2.1 INTRODUCTION TO VHDL

VHDL is an acronym for VHSIC Hardware Description Language, where
VHSIC is an acronym for Very High Speed Integrated Circuits. It is a hardware description
language that can be used to model a digital system at many levels of abstraction, ranging
from the algorithmic level to the gate level.

The VHDL language can be regarded as an integrated amalgamation of the following
languages:

➢ Sequential language

➢ Concurrent language

➢ Net-list language

➢ Timing specifications

➢ Waveform generation language

The language not only defines the syntax but also defines very clear simulation
semantics for each language construct. Therefore, models written in this language can be
verified using a VHDL simulator. A subset of the language is usually sufficient to model
most applications. The complete language, however, has sufficient power to capture the
descriptions of everything from the most complex chips to a complete electronic system.

HISTORY:

The requirements for the language were first generated in 1981 under the VHSIC
program, in which VHSIC chips were designed for the Department of Defense (DoD).
Reprocurement and reuse were also a big issue. Thus, a need arose for a standardized
hardware description language for the design, documentation, and verification of digital
systems. The IEEE standardized the VHDL language in December 1987; this version of the
language is known as IEEE Std 1076-1987. The official language description appears in the
IEEE Standard VHDL Language Reference Manual, available from IEEE. The language has
also been recognized as an American National Standards Institute (ANSI) standard.

According to IEEE rules, an IEEE standard has to be reballoted every 5 years so that
it may remain a standard. Consequently, the language
was upgraded with new features, the syntax of many constructs was made more uniform,
and many ambiguities present in the 1987 version of the language were resolved. This
new version of the language is known as IEEE Std 1076-1993.

2.2 CAPABILITIES:

The following are the major capabilities that the language provides, along with the
features that differentiate it from other hardware languages.

• The language can be used as an exchange medium between chip vendors and CAD
tool users. Different chip vendors can provide VHDL descriptions of their
components to system designers.

• The language can be used as a communication medium between different CAD
and CAE tools.

• The language supports hierarchy; that is, a digital system can be modeled as a set of
interconnected components; each component, in turn, can be modeled as a set of
interconnected subcomponents.

• The language supports flexible design methodologies: top-down, bottom-up, or
mixed. It supports both synchronous and asynchronous timing models.

• Various digital modeling techniques, such as finite-state machine descriptions
and Boolean equations, can be modeled using the language.

• The language is publicly available, human-readable, and machine-readable.

• The language supports three basic description styles: structural, dataflow, and
behavioral.

• It supports a wide range of abstraction levels, ranging from abstract behavioral
descriptions to very precise gate-level descriptions.

• Arbitrarily large designs can be modeled using the language, and there are no
limitations imposed by the language on the size of the design.
2.3 HARDWARE ABSTRACTION:

VHDL is used to describe a model for a digital hardware device. This model
specifies the external view of the device and one or more internal views. The internal
view of the device specifies functionality or structure, while the external view specifies
the interface of the device through which it communicates with the other modules in
the environment.

In VHDL, each device model is treated as a distinct representation of a unique
device, called an entity. The entity is thus a hardware abstraction of the actual hardware
device. Each entity is described using one model, which contains one external view and
one or more internal views.

2.4 Basic terminology:

VHDL is a hardware description language that can be used to model a digital
system. A hardware abstraction of this digital system is called an entity. An entity X,
when used in another entity Y, becomes a component for the entity Y.
To describe an entity, VHDL provides five different types of primary constructs, called
design units. They are:

1. Entity declaration
2. Architecture body
3. Configuration declaration
4. Package declaration
5. Package body

1. An entity is modeled using an entity declaration and at least one
architecture body. The entity declaration describes the external view of the
entity, for example, the input and output signal names.
2. The architecture body contains the internal description of the entity, for
example, as a set of interconnected components that represents the
structure of the entity, or a set of concurrent or sequential statements that
represents the behavior of the entity.
3. A configuration declaration is used to create a configuration for an entity.
It specifies the binding of one architecture body from the many
architecture bodies that may be associated with the entity. It may also
specify the bindings of the components used in the selected
architecture body to other entities. An entity may have any number of
configurations.
4. A package declaration encapsulates a set of related declarations, such as type
declarations, subtype declarations, and subprogram declarations, which
can be shared across two or more design units.
5. A package body contains the definitions of subprograms declared in a
package declaration.
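
As an illustration of items 4 and 5, the following is a minimal sketch of a package declaration and its package body. The names gate_pkg, nibble, and xor_bit are hypothetical and chosen only for this example; they do not come from the SEA design.

```vhdl
-- Hypothetical package: gate_pkg, nibble and xor_bit are illustrative names.
package gate_pkg is
  subtype nibble is bit_vector(3 downto 0);   -- subtype declaration
  function xor_bit (a, b : bit) return bit;   -- subprogram declaration
end package gate_pkg;

package body gate_pkg is
  -- The body supplies the definition of the function declared above.
  function xor_bit (a, b : bit) return bit is
  begin
    return a xor b;
  end function xor_bit;
end package body gate_pkg;
```

A design unit that needs these declarations would make them visible with a clause such as "use work.gate_pkg.all;", assuming the package was compiled into the working library.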

Once an entity has been modeled, it needs to be validated by a VHDL system. A
typical VHDL system consists of an analyzer and a simulator. The analyzer reads in one
or more design units contained in a single file and compiles them into a design library
after validating the syntax and performing some static checks.

The language is case insensitive; that is, lowercase and uppercase characters are
treated alike. The language is also free format. Comments are specified in the language by
preceding the text with two consecutive dashes (--).

Entity Declaration:

The entity declaration specifies the name of the entity being modeled and lists the set
of interface ports. Ports are signals through which the entity communicates with other
models in its external environment.

EXAMPLE:

The entity declaration for the half adder circuit is

entity half_adder is
    port (A, B: in bit; SUM, CARRY: out bit);
end half_adder;

The entity called half_adder has two input ports, A and B, and two output ports, SUM and
CARRY. BIT is a predefined type of the language.

Architecture Body:

The internal details of an entity are specified by an architecture body using any of
the following modeling styles:
1. As a set of interconnected components (to represent structure)
2. As a set of concurrent assignment statements (to represent data flow)
3. As a set of sequential assignment statements (to represent behavior)
4. As any combination of the above three.

2.5 Structural style of modeling:

In this style, an entity is described as a set of interconnected components. Such a
model for the half_adder entity is described in an architecture body:

architecture ha of half_adder is
    component Xor2
        port (X, Y: in BIT; Z: out BIT);
    end component;
    component And2
        port (L, M: in BIT; N: out BIT);
    end component;
begin
    -- ports are associated by position: A -> X, B -> Y, SUM -> Z
    X1: Xor2 port map (A, B, SUM);
    A1: And2 port map (A, B, CARRY);
end ha;

The name of the architecture body is ha. The entity declaration for the half adder
specifies the interface ports for this architecture body. The architecture body is composed
of two parts: the declarative part and the statement part. Two component declarations are
present in the declarative part of the architecture body.

The declared components are instantiated in the statement part of the architecture
body using component instantiation statements. The signals in the port map of a component
instantiation and the port signals in the component declaration are associated by
position.

DATAFLOW STYLE OF MODELING:

In this modeling style, the flow of data through the entity is expressed primarily
using concurrent signal assignment statements. The dataflow model for the half adder is
described using two concurrent signal assignment statements. In a signal assignment
statement, the symbol <= implies an assignment of a value to a signal.
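
The dataflow model of the half adder mentioned above can be sketched as follows. This is a minimal example, assuming the entity is named half_adder with ports A, B, SUM, and CARRY; the architecture name ha_dataflow is a hypothetical choice.

```vhdl
entity half_adder is
  port (A, B : in bit; SUM, CARRY : out bit);
end half_adder;

architecture ha_dataflow of half_adder is
begin
  -- Two concurrent signal assignments; their textual order does not matter.
  SUM   <= A xor B;
  CARRY <= A and B;
end ha_dataflow;
```

Each assignment is re-evaluated whenever an event occurs on A or B, which is what makes the description a flow of data rather than a sequence of steps.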

BEHAVIORAL STYLE OF MODELING:

The behavioral style of modeling specifies the behavior of an entity as a set of
statements that are executed sequentially in the specified order. These sequential
statements, which are specified inside a process statement, do not explicitly specify the
structure of the entity but merely its functionality. A process statement is a concurrent
statement that can appear within an architecture body.
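
For comparison, a behavioral description of the same half adder might look like the sketch below. The architecture name ha_behavioral is hypothetical; the entity is repeated so that the example is self-contained.

```vhdl
entity half_adder is
  port (A, B : in bit; SUM, CARRY : out bit);
end half_adder;

architecture ha_behavioral of half_adder is
begin
  -- The statements inside the process execute sequentially, top to bottom,
  -- each time an event occurs on A or B.
  process (A, B)
  begin
    if A = '1' and B = '1' then
      CARRY <= '1';
    else
      CARRY <= '0';
    end if;
    SUM <= A xor B;
  end process;
end ha_behavioral;
```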

MIXED STYLE OF MODELING:

It is possible to mix the three modeling styles in a single architecture body. That
is, within an architecture body, we could use component instantiation statements,
concurrent signal assignment statements and process statements.
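
A one-bit full adder is a convenient way to show all three styles in one architecture body. The sketch below is an illustration only; the entity names xor2 and full_adder and the internal signal S1 are assumptions made for this example.

```vhdl
-- Helper gate used by the structural part of the example.
entity xor2 is
  port (X, Y : in bit; Z : out bit);
end xor2;

architecture dataflow of xor2 is
begin
  Z <= X xor Y;
end dataflow;

entity full_adder is
  port (A, B, CIN : in bit; SUM, COUT : out bit);
end full_adder;

architecture fa_mixed of full_adder is
  component xor2
    port (X, Y : in bit; Z : out bit);
  end component;
  signal S1 : bit;
begin
  -- Structural style: component instantiation.
  X1 : xor2 port map (A, B, S1);

  -- Behavioral style: process statement.
  process (A, B, CIN)
    variable T1, T2, T3 : bit;
  begin
    T1 := A and B;
    T2 := B and CIN;
    T3 := A and CIN;
    COUT <= T1 or T2 or T3;
  end process;

  -- Dataflow style: concurrent signal assignment.
  SUM <= S1 xor CIN;
end fa_mixed;
```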

MODEL ANALYSIS:

Once an entity is described in VHDL, it can be validated using an analyzer and a
simulator that are part of a VHDL system. The first step in the validation process is
analysis. The analyzer takes a file that contains one or more design units and compiles
them into an intermediate form. The generated intermediate form is stored in a specific
design library that has been designated as the working library.

There is a design library with the logical name STD predefined by the VHDL
language environment. This library contains two packages: STANDARD and TEXTIO.
The STANDARD package contains declarations for all the predefined types of the
language. The TEXTIO package contains procedures and functions that are necessary for
supporting formatted text read and write operations. There also exists an IEEE standard
package called STD_LOGIC_1164, which defines a nine-value logic type and its associated
subtypes, overloaded operator functions, and other useful utilities. This standard is
called IEEE Std 1164-1993.
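
A design typically gains access to the STD_LOGIC_1164 package through a library clause and a use clause, as in the short sketch below. The entity name and_gate and its port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_gate is
  port (a, b : in std_logic; y : out std_logic);
end and_gate;

architecture dataflow of and_gate is
begin
  -- "and" here is the overloaded operator supplied by std_logic_1164.
  y <= a and b;
end dataflow;
```

The packages in library STD need no such clauses for the STANDARD package, which is visible by default.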

SIMULATION:

For a hierarchical entity to be simulated, all of its lowest-level components must be
described at the behavioral level. A simulation can be performed on either one of the
following:

1. An entity declaration and an architecture body pair.

2. A configuration

Preceding the actual simulation are two major steps:

1. Elaboration phase: In this phase, the hierarchy of the entity is expanded
and linked, components are bound to entities in a library, and the top-level
entity is built as a network of behavioral models that is ready to be simulated.

2. Initialization phase: Driving and effective values for all explicitly declared signals
are computed, implicit signals are assigned values, processes are executed once
until they suspend, and simulation time is set to 0 ns.

Simulation commences by advancing time to that of the next event. Values that
are scheduled to be assigned to signals at this time are assigned. If the value of a signal
changes, and if that signal is present in the sensitivity list of a process, the process is
executed until it suspends. Simulation stops when an assertion violation occurs (depending
on the implementation of the VHDL system) or when the maximum time as defined by the
language is reached.

Entity Declaration:

An entity declaration describes the external interface of the entity. It specifies the
name of the entity, the names of the interface ports, their modes, and the types of the
ports. The syntax for an entity declaration is:

entity entity-name is
    [generic (list-of-generics-and-their-types);]
    [port (list-of-interface-port-names-and-their-types);]
    [entity-item-declarations]
[begin
    entity-statements]
end [entity] [entity-name];

The entity-name is the name of the entity, and the interface ports are the signals
through which the entity passes information to and from its external environment. Each
interface port can have one of the following modes:
1. in: The value of an input port can only be read within the entity model.
2. out: The value of an output port can only be updated within the entity model.
3. inout: The value of a bidirectional port can be read and updated within the entity
model.
4. buffer: The value of a buffer port can be read and updated within the entity
model. It cannot have more than one source.

Declarations that are placed in the entity are common to all the design units that
are associated with that entity declaration.
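
The generic clause is what allows one description to be reused for several sizes, which is the mechanism the abstract relies on for a fully generic design. The following is a small sketch under assumed names (shift_reg, WIDTH, clk, din, q, dout are all hypothetical and unrelated to the actual SEA core); it also shows a buffer port being read inside the entity.

```vhdl
entity shift_reg is
  generic (WIDTH : positive := 8);            -- hypothetical size parameter
  port (
    clk  : in     bit;
    din  : in     bit;
    q    : buffer bit_vector(WIDTH - 1 downto 0);
    dout : out    bit
  );
end entity shift_reg;

architecture behavioral of shift_reg is
begin
  process (clk)
  begin
    if clk'event and clk = '1' then
      -- q has mode buffer, so it can be read as well as updated here.
      q <= q(WIDTH - 2 downto 0) & din;
    end if;
  end process;

  dout <= q(WIDTH - 1);
end architecture behavioral;
```

Instantiating the entity with a different value in a generic map, for example WIDTH => 16, resizes the register without touching the description itself.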

ARCHITECTURE BODY:

An architecture body describes the internal view of an entity. It describes the
functionality or the structure of the entity.

architecture architecture-name of entity-name is
    [architecture-item-declarations]
begin
    concurrent-statements; these are:
        process-statement
        block-statement
        concurrent-signal-assignment-statement
        component-instantiation-statement
        generate-statement
end [architecture] [architecture-name];

The concurrent statements describe the internal composition of the entity. All
concurrent statements are executed in parallel. The internal composition of an entity can
be expressed in terms of structure, dataflow and sequential behavior.
Here we describe an entity by using the behavioral model. A process statement,
which is a concurrent statement, is the primary mechanism used to describe the
functionality of an entity in this modeling style.

2.6 PROCESS STATEMENT:

A process statement contains sequential statements that describe the functionality
of a portion of an entity in sequential terms. The syntax for the process statement is:

[process-label:] process [(sensitivity-list)] [is]
    [process-item-declarations]
begin
    sequential-statements; these are:
        variable-assignment-statement
        signal-assignment-statement
        wait-statement
        if-statement
        case-statement
        loop-statement
        null-statement
        exit-statement
        next-statement
        assertion-statement
        report-statement
        procedure-call-statement
        return-statement
end process [process-label];

The set of signals to which the process is sensitive is defined by the sensitivity list.
In other words, each time an event occurs on any of the signals in the sensitivity list, the
sequential statements within the process are executed in sequential order, that is, in the
order in which they appear. The process then suspends after executing the last sequential
statement and waits for another event to occur on a signal in the sensitivity list.
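
A simple storage element illustrates how the sensitivity list controls execution. This is a sketch only; the entity name dff and the label sample are hypothetical.

```vhdl
entity dff is
  port (clk, d : in bit; q : out bit);
end dff;

architecture behavioral of dff is
begin
  -- The process resumes only when an event occurs on clk; changes on d
  -- alone do not trigger it because d is not in the sensitivity list.
  sample : process (clk)
  begin
    if clk = '1' then   -- for type bit, an event on clk ending at '1' is a rising edge
      q <= d;
    end if;
  end process sample;   -- the process suspends here until the next event on clk
end behavioral;
```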

VARIABLE ASSIGNMENT STATEMENT:

Variables can be declared and used inside a process statement. A variable is
assigned a value using the variable assignment statement, which typically has the form

variable-object := expression;

The expression is evaluated when the statement is executed, and the computed
value is assigned to the variable object instantaneously, that is, at the current
simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable
can be read and updated by more than one process. These variables are called shared
variables.
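
The immediate update of a variable can be seen in the following sketch, where the value assigned in one statement is already available to the next. The entity name var_demo and its ports are hypothetical.

```vhdl
entity var_demo is
  port (a, b : in integer; sum_sq : out integer);
end var_demo;

architecture behavioral of var_demo is
begin
  process (a, b)
    variable t : integer;   -- local to this process
  begin
    t := a + b;             -- t holds the new value immediately
    sum_sq <= t * t;        -- uses the value of t computed on the line above
  end process;
end behavioral;
```

Had t been a signal, the second statement would have seen its old value, because a signal assigned inside a process is not updated until the process suspends.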

SIGNAL ASSIGNMENT STATEMENT:

Signals are assigned values using a signal assignment statement. The simplest
form of a signal assignment statement is:

signal-object <= expression [after delay-value];

A signal assignment statement can appear within a process or outside of a process.
If it occurs outside of a process, it is considered to be a concurrent signal assignment
statement.

When a signal assignment statement appears within a process, it is considered to
be a sequential signal assignment statement and is executed in sequence with respect to
the other statements that appear within the process.
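
Both forms of signal assignment, each with an optional delay, are shown in the sketch below. The entity name delay_demo, the port names, and the delay values are arbitrary choices made for illustration.

```vhdl
entity delay_demo is
  port (a, b : in bit; y_conc, y_seq : out bit);
end delay_demo;

architecture mixed of delay_demo is
begin
  -- Concurrent signal assignment: appears outside any process.
  y_conc <= a and b after 5 ns;

  -- Sequential signal assignment: appears inside a process.
  process (a, b)
  begin
    y_seq <= a or b after 10 ns;
  end process;
end mixed;
```

The after clauses affect simulation only; synthesis tools generally ignore them.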

2.7 CONDITIONAL STATEMENTS:

IF STATEMENT:

An if statement selects a sequence of statements for
execution based on the value of a condition. The condition can be any
expression that evaluates to a Boolean value. The general form of an if statement is:

if boolean-expression then
    sequential-statements
{elsif boolean-expression then
    sequential-statements}
[else
    sequential-statements]
end if;

The if statement is executed by checking each condition sequentially until the first
true condition is found; the set of sequential statements associated with this condition is
executed. An if statement is also a sequential statement.
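
A small comparator shows the if, elsif, and else branches together. This is a sketch with hypothetical names (compare, gt, eq, lt).

```vhdl
entity compare is
  port (a, b : in integer; gt, eq, lt : out bit);
end compare;

architecture behavioral of compare is
begin
  process (a, b)
  begin
    -- Defaults; the branch taken below overrides exactly one of them.
    gt <= '0';
    eq <= '0';
    lt <= '0';
    if a > b then        -- conditions are checked in order
      gt <= '1';
    elsif a = b then
      eq <= '1';
    else
      lt <= '1';
    end if;
  end process;
end behavioral;
```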

CASE STATEMENT:

The format of a case statement is:

case expression is
    when choices => sequential-statements
    when choices => sequential-statements
end case;

The case statement selects one of the branches for execution based on the value
of the expression. The expression value must be of a discrete type or a one-dimensional
array type. Choices may be expressed as single values, as a range of values, by using |
(vertical bar, which represents an "or"), or by using the others clause. The others clause
can be used as a choice to cover the "catch-all" values and, if present, must be the last
branch in the case statement.
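
A four-to-one multiplexer is a typical use of the case statement. The sketch below uses hypothetical names (mux4, sel, d, y) and relies on the others clause for the last branch.

```vhdl
entity mux4 is
  port (
    sel : in  bit_vector(1 downto 0);
    d   : in  bit_vector(3 downto 0);
    y   : out bit
  );
end mux4;

architecture behavioral of mux4 is
begin
  process (sel, d)
  begin
    case sel is
      when "00"   => y <= d(0);
      when "01"   => y <= d(1);
      when "10"   => y <= d(2);
      when others => y <= d(3);   -- catch-all, must be the last branch
    end case;
  end process;
end behavioral;
```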

LOOP STATEMENTS:

A loop statement is used to iterate through a set of sequential statements. The syntax
for a loop statement is:

[loop-label:] iteration-scheme loop
    sequential-statements
end loop [loop-label];
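
With a for iteration scheme, the loop statement can be used as in the following parity sketch. The entity name parity, the label scan, and the port names are hypothetical.

```vhdl
entity parity is
  port (data : in bit_vector(7 downto 0); odd : out bit);
end parity;

architecture behavioral of parity is
begin
  process (data)
    variable p : bit;
  begin
    p := '0';
    -- Iteration scheme "for i in data'range" visits indices 7 downto 0.
    scan : for i in data'range loop
      p := p xor data(i);
    end loop scan;
    odd <= p;   -- '1' when data contains an odd number of ones
  end process;
end behavioral;
```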

2.8 Active HDL Overview:
Active-HDL is an integrated environment designed for development of VHDL, Verilog,
EDIF and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry
tools, VHDL'93 compiler, Verilog compiler, single simulation kernel, several debugging
tools, graphical and textual simulation output viewers, and auxiliary utilities designed for
easy management of resource files, designs, and libraries.

Standards Supported

VHDL:
The VHDL simulator implemented in Active-HDL supports the IEEE Std. 1076-1993
standard.

Verilog:
The Verilog simulator implemented in Active-HDL supports the IEEE Std. 1364-1995
standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump)
are also supported in Active-HDL.

EDIF:
Active-HDL supports Electronic Design Interchange Format version 2 0 0.

VITAL:

The simulator provides built-in acceleration for VITAL packages version 3.0. The
VITAL-compliant models can be annotated with timing data from SDF files. SDF files
must comply with OVI Standard Delay Format Specification Version 2.1.
WAVES:
Active-HDL supports automatic generation of test benches compliant with the WAVES
standard. The basis for this implementation is a draft version of the standard dated May
1997 (IEEE P1029.1/D1.0 May 1997). The WAVES standard (Waveform and Vector
Exchange to Support Design and Test Verification) defines a formal notation that
supports the verification and testing of hardware designs, the communication of hardware
design and test verification data, and the maintenance, modification, and procurement of
hardware systems.

2.9 ACTIVE-HDL Macro Language:

All operations in Active-HDL can be performed using Active-HDL macro language. The
language has been designed to enable the user to work with Active-HDL without using
the graphical user interface (GUI).

1. HDL Editor:
HDL Editor is a text editor designed for HDL source files. It displays specific
syntax categories in different colors (keyword coloring). The editor is tightly
integrated with the simulator to enable debugging source code. The keyword
coloring is also available when HDL Editor is used for editing macro files, Perl
scripts, and Tcl scripts.
2. Block Diagram Editor:
Block Diagram Editor is a graphical tool designed to create block diagrams. The
editor automatically translates graphically designed diagrams into VHDL or
Verilog code.
3. State Diagram Editor:
State Diagram Editor is a graphical tool designed to edit state machine diagrams.
The editor automatically translates graphically designed diagrams into VHDL or
Verilog code.
4. Waveform Editor:
Waveform Editor displays the results of a simulation run as signal waveforms. It
allows you to graphically edit waveforms so as to create desired test vectors.
5. Design Browser:
The Design Browser window displays the contents of the current design, that is:
➢ Resource files attached to the design.
➢ The contents of the default-working library of the design.
➢ The structure of the design unit selected for simulation.
➢ VHDL, Verilog, or EDIF objects declared within a selected region of the
current design.

6. Console window:
The Console window is an interactive input-output text device providing entry for
Active-HDL macro language commands, macros, and scripts. All Active-HDL tools
output their messages to Console.

2.10 Compilation:

Compilation is the process of analyzing a source file. Analyzed design units contained
within the file are placed into the working library in a format understandable by the
simulator. In Active-HDL, a source file can be one of the following:
• VHDL file (.vhd)
• Verilog file (.v)
• EDIF net list file
• State diagram file (.asf)
• Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate
VHDL, Verilog, or EDIF file containing HDL code (or net list) generated from the
diagram.
A net list is a set of statements that specifies the elements of a circuit (for example,
transistors or gates) and their interconnection.
Active-HDL provides three compilers, respectively for VHDL, Verilog, and EDIF. When
you choose a menu command or toolbar button for compilation, Active-HDL
automatically employs the compiler appropriate for the type of the source file being
compiled.

2.11 Simulation:
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines.
➢ Event-Driven Simulation
➢ Cycle-Based Simulation
The simulator supports hybrid simulation – some portions of a design can be simulated in
the event-driven kernel while the others in the cycle-based kernel. Cycle-based
simulation is significantly faster than event-driven.

2.12 XILINX:

Integrated Software Environment (ISE) is the Xilinx design software suite. This overview
explains the general progression of a design through ISE from start to finish.

ISE enables you to start your design with any of a number of different source types,
including:
• HDL (VHDL, Verilog HDL, ABEL)
• Schematic design files
• EDIF
• NGC/NGO
• State Machines
• IP Cores
From your source files, ISE enables you to quickly verify the functionality of these
sources using the integrated simulation capabilities, including ModelSim Xilinx Edition
and the HDL Bencher test bench generator. HDL sources may be synthesized using the
Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone
or integrated into ISE. The Xilinx implementation tools continue the process into a
placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device
configuration.

Design Entry:

• ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code
and viewing reports.
• Schematic Editor - The Engineering Capture System (ECS) is a graphical user
interface (GUI) that allows you to create, view, and edit schematics and symbols
for the Design Entry step of the Xilinx® design flow.
• CORE Generator - The CORE Generator System is a design tool that delivers
parameterized cores optimized for Xilinx FPGAs ranging in complexity from
simple arithmetic operators such as adders, to system-level building blocks such
as filters, transforms, FIFOs, and memories.
• Constraints Editor - The Constraints Editor allows you to create and modify the
most commonly used timing constraints.
• PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and
edit I/O, Global logic, and Area Group constraints.
• StateCAD State Machine Editor - StateCAD allows you to specify states,
transitions, and actions in a graphical editor. The state machine will be created in
HDL.

Implementation:

• Translate - The Translate process runs NGDBuild to merge all of the input net
lists as well as design constraint information into a Xilinx database file.
• Map - The Map program maps a logical design to a Xilinx FPGA.
• Place and Route (PAR) - The PAR program accepts the mapped design, places
and routes the FPGA, and produces output for the bit stream generator.

• Floorplanner - The Floorplanner allows you to view a graphical representation of
the FPGA, and to view and modify the placed design.
• FPGA Editor - The FPGA Editor allows you to view and modify the physical
implementation, including routing.
• Timing Analyzer - The Timing Analyzer provides a way to perform static timing
analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be
performed immediately after mapping, placing or routing an FPGA design, and
after fitting and routing a CPLD design.
• Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices
and creates the JEDEC programming file.
• Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of
the inputs and outputs, macro cell details, equations, and pin assignments.

Device Download and Program File Formatting

• BitGen - The BitGen program receives the placed and routed design and produces
a bitstream for Xilinx device configuration.
• iMPACT - The iMPACT tool generates various programming file formats, and
subsequently allows you to configure your device.
• XPower - XPower enables you to interactively and automatically analyze power
consumption for Xilinx FPGA and CPLD devices.
• Integration with ChipScope Pro.

CH 3: Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between
implementation cost and resulting performance. In addition, they generally aim to be
implemented efficiently on a large variety of platforms. In this paper, we take the opposite
approach and consider a context where we have very limited processing resources and
throughput requirements. For this purpose, we propose low-cost encryption routines (i.e.
with small code size and memory) targeted for processors with a limited instruction set
(i.e. AND, OR, XOR gates, word rotation and modular addition). The proposed design is
parametric in the text, key and processor size, and allows efficient combination of
encryption/decryption and “on-the-fly” key derivation. Its security against a number of
recent cryptanalytic techniques is discussed. Target applications for such routines include
any context requiring low-cost encryption and/or authentication.

In this paper, we consequently consider a general context where we have very
limited processing resources (e.g. a small processor) and throughput requirements. This
yields design criteria such as low memory requirements, small code size and a limited
instruction set. In addition, we propose flexibility as another, less usual, design principle:
SEAn,b is parametric in the text, key and processor size. Such an approach was
motivated by the fact that many algorithms behave differently on different platforms (e.g.
8-bit or 32-bit processors). In contrast, SEAn,b makes it possible to obtain a small encryption
routine targeted to any given processor, the security of the cipher being adapted as a
function of its key size. Beyond these general guidelines, additional features were
wanted, including the efficient combination of encryption and decryption and the ability to
derive keys “on the fly”.

Those goals are particularly relevant in contexts where the same constrained
device has to perform encryption and decryption operations (e.g. authentication). Finally,
the simplicity of SEAn,b makes its implementation straightforward. Embedded
applications such as building infrastructures present a significant opportunity and
challenge for such new cryptosystems.

For example, introducing programmability into the configuration of lights
and switches, thermostats and air handlers promises to improve the cost of construction,
flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on
a scale compatible with the economics of the construction industry is going to require
secure lightweight implementations of peer-to-peer networks in resource-constrained
systems. The Internet-0 approach to end-to-end modulation for interdevice
internetworking is typically appropriate in this limit [20]. SEAn,b constitutes a suitable
solution for low-cost encryption/authentication within such networks. RFIDs and other
power- or space-limited applications are similarly targeted.

3.1 Specifications:

Parameters and Definitions:

SEAn,b operates on various text, key and word sizes. It is based on a Feistel
structure with a variable number of rounds, and is defined with respect to the
following parameters:
– n: plaintext size, key size.
– b: processor (or word) size.
– nb = n/(2b): number of words per Feistel branch.
– nr: number of block cipher rounds.

The only constraint is that n must be a multiple of 6b. For example, using
an 8-bit processor, we can derive 48, 96, 144, ... -bit block ciphers, respectively
denoted as SEA48,8, SEA96,8, SEA144,8, ... Let x be an n/2-bit vector. In the following, we will consider two representations:
– Bit representation: $x_b = x(n/2 - 1)\, x(n/2 - 2) \ldots x(2)\, x(1)\, x(0)$.
– Word representation: $x_W = x_{n_b-1}\, x_{n_b-2} \ldots x_2\, x_1\, x_0$.
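As a rough illustration of these definitions, the sketch below derives nb from n and b and converts between an n/2-bit value and its word representation. The function names are hypothetical, and taking x_0 as the least significant word is an assumption made here for the example, not something the specification above fixes.

```python
def sea_parameters(n: int, b: int) -> dict:
    """Derive the word count of SEA_{n,b} (illustrative helper, not from the source)."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    nb = n // (2 * b)  # number of b-bit words per Feistel branch
    return {"n": n, "b": b, "nb": nb}


def to_words(x: int, nb: int, b: int) -> list:
    """Split an n/2-bit integer into nb words; index i holds x_i (assumed least significant first)."""
    mask = (1 << b) - 1
    return [(x >> (b * i)) & mask for i in range(nb)]


def from_words(words: list, b: int) -> int:
    """Inverse of to_words."""
    return sum(w << (b * i) for i, w in enumerate(words))
```

For SEA48,8 this gives nb = 3, so each Feistel branch is a list of three 8-bit words.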

Basic Operations

Due to its simplicity constraints, SEAn,b is based on a limited number of elementary
operations (selected for their availability in any processing device) denoted
as follows: (1) bitwise XOR ⊕, (2) substitution box S, (3) word (left) rotation
R and inverse word rotation $R^{-1}$, (4) bit rotation r, (5) addition mod $2^b$ ⊞.

These operations are formally defined as follows:

1. Bitwise XOR:

The bitwise XOR is defined on n/2-bit vectors:

$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$

2. Substitution Box S:

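SEAn,b uses the following 3-bit substitution table:

ST := {0, 5, 6, 7, 4, 3, 1, 2},
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three
words of data using the following recursive definition:

$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i},$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1},$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le n_b/3 - 1,$
where ∧ and ∨ respectively represent the bitwise AND and OR. The definition is recursive
in the sense that each assignment uses the words already updated by the previous ones.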
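The recursive definition can be checked against the table ST with a few lines of code. The following is a minimal sketch: the function names are illustrative, and it assumes that bit 0 of the table index corresponds to word x_{3i} and bit 2 to word x_{3i+2}.

```python
ST = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_words(x):
    """Apply S to a list of words whose length is a multiple of 3.

    The three assignments are executed in order, so the second and third
    lines see the words already updated by the previous ones.
    """
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def matches_table():
    """Compare the bitsliced definition with ST on all eight 3-bit inputs."""
    for v in range(8):
        bits = [(v >> k) & 1 for k in range(3)]  # one-bit "words"
        out = sbox_words(bits)
        if out[0] | (out[1] << 1) | (out[2] << 2) != ST[v]:
            return False
    return True
```

Because only AND, OR and XOR are used, the same three lines process all b bit positions of a word triple in parallel, which is what makes the substitution box cheap on small processors.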

3. Word Rotation R:

The word rotation is defined on nb-word vectors:

$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b-1}$

4. Bit Rotation r:

The bit rotation is defined on nb-word vectors:

$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le n_b/3 - 1,$
where ≫ and ≪ represent the cyclic right and left shifts inside a word.

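5. Addition mod $2^b$ ⊞:

The mod $2^b$ addition is defined on nb-word vectors:

$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$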
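The word rotation, bit rotation and modular addition translate almost directly into code. The sketch below represents a vector as a Python list whose index i holds word x_i; all function names are illustrative choices, not part of the specification.

```python
def rotl(w, b, a=1):
    """Cyclic left shift of a b-bit word by a positions."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)


def rotr(w, b, a=1):
    """Cyclic right shift of a b-bit word by a positions."""
    return rotl(w, b, b - (a % b))


def word_rotate(x):
    """R: y_{i+1} = x_i and y_0 = x_{nb-1}."""
    return [x[-1]] + x[:-1]


def inv_word_rotate(y):
    """Inverse of R."""
    return y[1:] + y[:1]


def bit_rotate(x, b):
    """r: in each triple, word 3i rotates right by one bit and word 3i+2 rotates left by one bit."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotr(x[i], b)
        y[i + 2] = rotl(x[i + 2], b)
    return y


def add_words(x, y, b):
    """Word-wise addition mod 2^b; carries never cross word boundaries."""
    mask = (1 << b) - 1
    return [(u + v) & mask for u, v in zip(x, y)]
```

Note that the addition is performed independently on each word, so a carry out of one word is simply discarded.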

The Round and Key Round

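Based on the previous definitions, the encrypt round FE, decrypt round FD
and key round FK are pictured in Figure 3.1 and defined as the functions
$F : \mathbb{Z}_{2^{n/2}}^{2} \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^{2}$ such that:

$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$

$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$

$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$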

FIG 3.1 Encrypt/decrypt round and key round

The Complete Cipher:

The cipher iterates an odd number nr of rounds. The following pseudo-C code
encrypts a plaintext P under a key K and produces a ciphertext C. P, C and
K have a parametric bit size n. The operations within the cipher are performed
considering parametric b-bit words.

C = SEAn,b(P, K)
{
  % initialization:
  L0 & R0 = P;
  KL0 & KR0 = K;

  % key scheduling:
  for i in 1 to ⌊nr/2⌋
    [KLi, KRi] = FK(KLi−1, KRi−1, C(i));
  switch KL⌊nr/2⌋, KR⌊nr/2⌋;
  for i in ⌈nr/2⌉ to nr − 1
    [KLi, KRi] = FK(KLi−1, KRi−1, C(r − i));

  % encryption:
  for i in 1 to ⌈nr/2⌉
    [Li, Ri] = FE(Li−1, Ri−1, KRi−1);
  for i in ⌈nr/2⌉ + 1 to nr
    [Li, Ri] = FE(Li−1, Ri−1, KLi−1);

  % final:
  C = Rnr & Lnr;
  switch KLnr−1, KRnr−1;
},
where & is the concatenation operator, KR⌊nr/2⌋ is taken before the switch and
C(i) is an nb-word vector of which all the words have value 0 except the LSW,
which equals i. Decryption is exactly the same, using the decrypt round FD.
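To make the pseudo-code concrete, here is a minimal, unoptimized sketch of the whole cipher operating on lists of b-bit words. It is an illustration under stated assumptions rather than a reference implementation: index 0 of each list is taken as the least significant word (so the constant C(i) is [i, 0, ..., 0]), the "r" in C(r − i) is read as nr, and no official test vectors are checked. All function names are illustrative.

```python
def rotl(w, b):
    """Cyclic left shift of a b-bit word by one bit."""
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)

def rotr(w, b):
    """Cyclic right shift of a b-bit word by one bit."""
    return ((w >> 1) | ((w & 1) << (b - 1))) & ((1 << b) - 1)

def S(x):
    """Bitsliced substitution box (recursive definition)."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def R(x):
    return [x[-1]] + x[:-1]

def R_inv(x):
    return x[1:] + x[:1]

def r(x, b):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotr(x[i], b)
        y[i + 2] = rotl(x[i + 2], b)
    return y

def add(x, y, b):
    mask = (1 << b) - 1
    return [(u + v) & mask for u, v in zip(x, y)]

def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]

def F(L, Rt, K, b, decrypt=False):
    """Encrypt round FE, or decrypt round FD when decrypt=True."""
    t = r(S(add(Rt, K, b)), b)
    new_R = R_inv(xor(L, t)) if decrypt else xor(R(L), t)
    return Rt, new_R

def FK(KL, KR, C, b):
    """Key round."""
    return KR, xor(KL, R(r(S(add(KR, C, b)), b)))

def constant(i, nb):
    """C(i): all words zero except the least significant word (assumed index 0)."""
    return [i] + [0] * (nb - 1)

def round_keys(KL, KR, nr, b):
    """Round keys in the order they are used by rounds 1..nr."""
    if nr % 2 == 0:
        raise ValueError("nr must be odd")
    nb, half = len(KL), nr // 2
    keys = [KR]                       # KR_0 is used by round 1
    for i in range(1, half + 1):
        KL, KR = FK(KL, KR, constant(i, nb), b)
        keys.append(KR)               # KR_i, taken before the switch
    KL, KR = KR, KL                   # switch
    for i in range(half + 1, nr):
        KL, KR = FK(KL, KR, constant(nr - i, nb), b)
        keys.append(KL)
    return keys

def sea(left, right, KL, KR, nr, b, decrypt=False):
    """Run nr rounds and return the two output halves (R_nr, L_nr)."""
    for K in round_keys(KL, KR, nr, b):
        left, right = F(left, right, K, b, decrypt)
    return right, left
```

Because the round-key sequence produced this way is a palindrome, calling `sea` on the two ciphertext halves with `decrypt=True` and the same master key recovers the plaintext halves, which matches the statement that decryption is the same procedure with FD in place of FE.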

3.2 Design Properties of the Components

Substitution Box S:

The substitution box was searched exhaustively in order to meet the following security
and efficiency criteria:
– λ-parameter: 1/2.
– δ-parameter: 1/4.
– Maximum nonlinear order, namely 2.
– Recursive definition.
– Minimum number of instructions.
Remark that, if 3-operand instructions are available, the recursive definition allows
the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3
bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this
efficiency is the presence of two fixed points in the table.

Bit and Word Rotations r and R:

The cyclic rotations were defined in order to provide predictable low-cost diffusion
within the cipher, when combined with the bitslice substitution box. This is illustrated in
Figure 2 for a single substitution box scheme with parameters n = 48, b = 8, nb = 3.
Looking at the figure, it can be seen that SEAn,b divides its data in 2nb/3 blocks of 3 words.
The substitution box is applied in parallel to these blocks. Therefore,
the diffusion process (starting with one single active bit in the left branch) is divided into
two steps:

The first phase is obtained by the combination of the word rotation R (which is the only
transform to provide inter-word diffusion) with the substitution box. It requires at most
nb rounds to be completed (in our example, nb = 3, which yields 3 rounds). Once every
word has at least one active bit, the combination of r and S yields six more active bits per
block in each round. Therefore, finishing the diffusion of all the blocks requires at most
⌊b/2⌋ rounds. Combining these observations, the diffusion is complete after nb + ⌊b/2⌋
rounds.

Addition mod $2^b$ ⊞:

Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different
reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity,
(3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.

3.3 Overall Structure:

The overall structure of the cipher follows the Feistel strategy. However, a few points are
specific to SEAn,b, namely the key schedule and the position of R, $R^{-1}$ in the
encrypt/decrypt rounds. The key schedule is designed such that the master key is
encrypted during half the rounds and decrypted during the other half. This yields a
particular structure of the sequence of round keys such that the key expansion is exactly
the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot
keep the traditional Feistel structure: it would result in identical encryption
and decryption functions. This is the reason for moving the word rotation
to the left branch of the Feistel round.

3.4 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis:

From the properties of the substitution box, we can compute bounds for the best linear
and differential characteristics through the cipher. We first use the following lemma
[29]:

Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher.
Assuming that the linear parameter of f is smaller than λ and its differential parameter is
smaller than δ, then the linear and differential parameters Λ, Δ of the 3-round cipher are
respectively smaller than $\lambda^2$, $\delta^2$.

Since our nonlinear function S has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$,
it implies that 3 rounds of SEAn,b have their differential and linear parameters
respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher,
Δ and Λ are respectively required to be of the order of $2^{-n}$ and $2^{-n/2}$ to resist differential [4] and
linear cryptanalysis [28]. In order to approach these bounds, we require that:
$\delta^{2 n_r/3} = (2^{-2})^{2 n_r/3} < 2^{-n}$ and $\lambda^{2 n_r/3} = (2^{-1})^{2 n_r/3} < 2^{-n/2}$. (1)
In both cases, the required number of rounds is: nr ≥ 3n/4. We note that we used a hybrid
approach, between the provable security against linear and differential attacks that
consists in bounding the parameter of the best differential/hull, like in Lemma 1, and the
usual heuristics to estimate the best linear/differential characteristic through a cipher (as
in the previous estimation for nr). In fact, the strategy of Equation (1) is similar to the one
of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
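For readers who want the intermediate step, the round bound follows from taking base-2 logarithms on both sides of each inequality in Equation (1). The worked form below is only a restatement of that equation, with the numerical example for n = 48 added for illustration.

```latex
\begin{align*}
\left(2^{-2}\right)^{2 n_r/3} < 2^{-n}
  &\iff \frac{4 n_r}{3} > n
  \iff n_r > \frac{3n}{4}, \\
\left(2^{-1}\right)^{2 n_r/3} < 2^{-n/2}
  &\iff \frac{2 n_r}{3} > \frac{n}{2}
  \iff n_r > \frac{3n}{4}.
\end{align*}
% Example: for n = 48, both conditions give n_r > 36.
```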

Extensions of Linear and Differential Cryptanalysis:

Classical extensions of linear and differential cryptanalysis are non-linear
approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear
cryptanalysis [27], multiple linear cryptanalysis [22, 10], and boomerang [31] and rectangle
[8] attacks.

However, these extensions usually imply only a small improvement compared to the basic
attacks. As a matter of fact, non-linear approximations of outer rounds can improve
the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author
of [14]: “For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear
cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear
attack much faster than by LC.”

It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems
more promising for big substitution boxes (as mentioned in [22]). Moreover, the
improvement on classical cryptanalysis obtained in [10] for the case of DES (which
shares with SEAn,b a Feistel structure and a poor diffusion) is limited. Finally, the
complexity of differential-linear cryptanalysis and of the boomerang attack and its
variants is inherently greater than the one of the basic attacks.

As an example, the boomerang (or rectangle) attack allows us to use two short
differentials instead of a long one, but using a long differential with probability pq is in
general highly preferable to applying a boomerang attack with two short differentials of
probability p and q. Therefore, although these attacks can perform slightly better in
specific cases, the expected improvement is never outstanding. The conclusion is that
these extensions deserve to be considered in the estimation of the number of
rounds necessary to achieve security, but that a reasonable multiplicative factor should be
enough to take them into account.

A Dedicated Related-Key Attack Against a Modified Version:

For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote
by x ≪ a the left rotation by a bits of each of the nb words of x. The non-linear and
diffusion layers have the following properties:
– S(x ≪ a) = S(x) ≪ a
– r(x ≪ a) = r(x) ≪ a
– R(x ≪ a) = R(x) ≪ a

Consider a modified version of our cipher where key addition is performed using ⊕ rather
than modular addition, and where all round constants Ci are such that Ci ≪ a = Ci, e.g.
all Ci’s equal 0. As a consequence of the previous observations, the modified round $F_E^{\oplus}$
and the key round FK satisfy:
$F_E^{\oplus}(L \ll a, R \ll a, K \ll a) = F_E^{\oplus}(L, R, K) \ll a$
$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$

These properties are iterative, in the sense that they also hold for the composition of
several block cipher rounds. A distinguisher on the modified cipher follows immediately;
it requires 2 chosen encryption queries under 2 related keys K and
K ≪ a. In the actual SEAn,b, the key addition is performed word-wise mod $2^b$. As the
property (X ≪ a) ⊞ (K ≪ a) = (X ⊞ K) ≪ a is prevented by certain carry propagations, it
only holds with a probability p, which depends on a and the word size b. For a = 1, p
rapidly converges to 3/8 as b grows. It is smaller for 1 < a < b − 1. Of course, this
probability is averaged over all possible (X, K) and certain keys (e.g. “all zeroes”) yield no
carry propagation at all. However, the design properties of the key schedule prevent
SEAn,b from having such weak keys.
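Both observations can be explored numerically. The sketch below checks on a sample vector that S, r and R commute with a rotation of every word, and computes the probability p exhaustively for a single b-bit word. The helper names are illustrative, and the exhaustive loop is only practical for small word sizes such as b = 8.

```python
def rotl(w, b, a):
    """Cyclic left shift of a b-bit word by a positions."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)

def rot_words(x, b, a):
    """x << a in the text: rotate every word of the vector by a bits."""
    return [rotl(w, b, a) for w in x]

def S(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def r(x, b):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotl(x[i], b, b - 1)   # cyclic right shift by one
        y[i + 2] = rotl(x[i + 2], b, 1)
    return y

def R(x):
    return [x[-1]] + x[:-1]

def commutes(f, x, b, a):
    """True if f(x << a) == f(x) << a for this particular x."""
    return f(rot_words(x, b, a)) == rot_words(f(x), b, a)

def add_rotation_probability(b, a):
    """Fraction of word pairs (X, K) with (X<<a) + (K<<a) == (X + K) << a mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            if (rotl(x, b, a) + rotl(k, b, a)) & mask == rotl((x + k) & mask, b, a):
                hits += 1
    return hits / (1 << (2 * b))
```

For b = 8 and a = 1 the exhaustive count gives a value close to 3/8, in line with the convergence claimed above.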

Moreover, the round constants Ci are generally not such that Ci ≪ a = Ci (because they
are generated from a counter). Combined with the diffusion in the key schedule, it
implies that the similarity between the round keys derived from K and those derived from
K ≪ a rapidly vanishes. These properties prevent this structural distinguisher from
propagating through more than a few rounds of SEAn,b.

Square Attacks:

We explored square attacks [16] on SEA48,8. More precisely, we considered all possible
sets of inputs to one branch of the Feistel structure, where the input to some of the
substitution boxes is active (i.e. takes all possible input values the same number of times),
and the input to the other substitution boxes is constant. The other branch is also constant.
Therefore the number of plaintexts considered goes from $2^3$ (when the input to only one
substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our
experiments showed that square attacks do not pass through more rounds than
the diffusion pattern illustrated in the figure. It is expected that this remains the case when
different parameters n and b are considered, which implies that nb + ⌊b/2⌋ rounds are
enough to prevent square attacks. Note that although our observations also hold for
⊕-SEAn,b, the use of addition mod $2^b$ provides better resistance against square attacks.

Truncated and Impossible Differentials:

As for square attacks, the diffusion analysis illustrated in the figure provides an estimation
of the number of rounds required to prevent truncated differential attacks [25].
Impossible differentials [7] are usually built by concatenating two incompatible truncated
differentials. As a consequence, we estimate the number of rounds necessary to prevent
the construction of an impossible differential distinguisher as 2 · (nb + ⌊b/2⌋).

Interpolation Attacks:

The interpolation attack [21] is possible when the whole cipher can be written as a
relatively simple algebraic expression. It requires the substitution box to have a compact
expression, and the diffusion layer to permit the composition of these expressions. In the
case of SEAn,b, there is a priori no such expression, and the bitwise diffusion would
make the combination of algebraic expressions difficult anyway.
Slide Attacks:

The sequence of round keys of SEAn,b is the same as the one of ICEBERG. Therefore
the analysis done in [30] is still valid. Namely, the non periodicity of the sequence should
make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has
some similarities with the one of GOST, of which the vulnerability against slide attacks is
examined in [12]. None of the attacks presented in [12] seems to be applicable to our
cipher.

Related-Key Attacks:

The first related-key attack has been described in [5]. It is the related-key counterpart of
the slide attack. Such an attack is applicable when a round key Ki is computed from the
previous round key Ki−1 using a function f which is always the same: Ki = f(Ki−1).
However in the case of SEAn,b, a round constant that changes for each key round is used,
which prevents this attack. Another type of related-key attack is the differential related
key attack [23, 24]. The non-linearity of the SEAn,b key schedule should prevent it.
Moreover, note that the improvement of the differential related-key attack over classical
differential cryptanalysis usually results from the fact that choosing a given round key
difference makes it possible to “counter” the effect of the diffusion layer on the differential
characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEAn,b
against differential cryptanalysis results from its large number of rounds rather than from
its diffusion, this effect is not relevant here.

Complementation Properties:

The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that
encryption of P under key K gives ciphertext C, then: $P \xrightarrow{K} C \iff \overline{P} \xrightarrow{\overline{K}} \overline{C}$,
where the overline denotes the bitwise complement. The
non-linear key scheduling and the presence of carry propagations in the actual SEAn,b
algorithm prevent this property. We are not aware of any other similar structural feature
in the design.

Algebraic Attacks:

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For
example, certain block ciphers can be written as an overdefined system of quadratic
equations. Reference [13] argues that a method called XSL might provide a way to
effectively solve this type of equations and recover the key from a few plaintext-
ciphertext pairs. Clearly, SEAn,b has a simple algebraic structure, as it is based on a 3-bit
substitution box. Therefore, if such an attack practically applies to a cipher like Serpent
[1], it is likely applicable to one of the versions of our routines. As the complexity of
XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially
true when those values increase. However, as the criteria for these techniques to be
successful are still being discussed [9], we consider this latter point as a scope for
further research. We note that resistance against algebraic attacks would in any case exclude
the use of small substitution boxes and therefore the possibility of building very low-cost
encryption routines.

Suggested Number of Rounds:

From the previous descriptions, the minimum required number of rounds to provide
security against known attacks would be 3n/4 + 2 · (nb + ⌊b/2⌋). This roughly
corresponds to the number of rounds to resist linear/differential attacks plus twice the
number of rounds to obtain complete diffusion (to prevent both structural attacks and
outer rounds improvements of statistical attacks). A more conservative approach (applied
in most present block ciphers) would be to take a large security margin, e.g. by doubling
this number of rounds. nr has to be odd: we add one if it is even. We also assume a
minimum word size b ≥ 8 bits.
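The round-count rule above is simple enough to evaluate directly. The following sketch, with hypothetical function names, computes the diffusion bound nb + ⌊b/2⌋ and the minimum number of rounds (without the optional doubling), adding one when the result is even.

```python
def diffusion_rounds(n: int, b: int) -> int:
    """Rounds needed for complete diffusion: nb + floor(b/2)."""
    nb = n // (2 * b)
    return nb + b // 2


def minimum_rounds(n: int, b: int) -> int:
    """3n/4 + 2*(nb + floor(b/2)), rounded up to the next odd integer."""
    if n % (6 * b) != 0 or b < 8:
        raise ValueError("expects n to be a multiple of 6b and b >= 8")
    nr = (3 * n) // 4 + 2 * diffusion_rounds(n, b)
    return nr if nr % 2 == 1 else nr + 1
```

With this rule, SEA48,8 needs 36 + 2 · 7 = 50 rounds, which becomes 51 after the parity adjustment, and SEA96,8 needs 93.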

3.5 Performance Analysis:

SEAn,b is targeted at low-cost processors, with small code size
and a small instruction set. However, SEAn,b’s simple structure makes it easy to
implement on any processor. In the appendix, we propose pseudo-assembly code for an
encryption/decryption design with “on the fly” key scheduling. The implementation
objectives were, in decreasing order of importance: (1) low RAM and register usage, (2)
low code size and (3) speed. It is based on the following (very) reduced instruction set
(assuming 2-operand instructions only):
– Arithmetic and logic operators: ∨, ∧, ⊕, ⊞, ≫, ≪.
– Branch instructions: goto, subroutine call and return.
– Comparison, load RAM in register, store register in RAM.

According to the code in the appendix, the performance can be roughly estimated as
follows.
First, the combined number of RAM words and registers equals 5nb + 3. Then, the code
size and implementation time (both expressed in ops.) are evaluated by summing the
values given in the appendix. For the code size, this directly yields 31nb + 36 ops. For the
implementation time, the round and key round respectively require 12nb + 11 ops. and
10nb + 11 ops. This yields a total of (nr − 1) × (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) +
8nb + 7 ops. These values are summarized in Table 1. Remark that, due to the particular
structure of the key scheduling, we do not need to keep the master key in memory as, at
the end of an encryption/decryption, we have Knr−1 = K0. Remark also that this
implementation uses a low number of registers, namely nb + 3. However, if more registers
are available, they can be traded for RAM words, which will result in lower code size and
faster implementation.
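The cost formulas quoted above can be evaluated for any parameter set. The sketch below simply encodes those expressions; the function name and dictionary keys are illustrative, and the outputs are estimates in the same abstract "ops." unit, not measured cycle counts.

```python
def sea_cost(nb: int, nr: int) -> dict:
    """Evaluate the memory, code size and time estimates given in the text."""
    round_ops = 12 * nb + 11        # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11    # one key round
    total_ops = (nr - 1) * (round_ops + key_round_ops + 7) + round_ops + 8 * nb + 7
    return {
        "ram_words_plus_registers": 5 * nb + 3,
        "registers": nb + 3,
        "code_size_ops": 31 * nb + 36,
        "round_ops": round_ops,
        "key_round_ops": key_round_ops,
        "total_ops": total_ops,
    }
```

For example, `sea_cost(6, 93)` (the SEA96,8 case with 93 rounds) gives 33 memory words, a code size of 222 ops. and 14950 ops. for one encryption.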

For illustration purposes, we implemented SEAn,b on Atmel AVR ATtiny [3] and ARM
[2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost
encryption routine. We chose the ARM platform in order to provide rough comparisons
between SEAn,b and the AES Rijndael. While direct comparisons are made difficult by
their strong dependence on the target devices, the following general comments can be
made:
– SEAn,b designs combine encryption and decryption more efficiently than most other
encryption algorithms. In particular, key agility in decryption is usually not possible (e.g.
for the AES Rijndael).
– The combined number of RAM words and registers of SEAn,b implementations (i.e.
5nb + 3) is generally lower than for other block ciphers.
– The code size of SEAn,b is generally lower than for other block ciphers implemented
on similar platforms.

The flexibility of SEAn,b also makes it less sensitive to the choice of a processor than
fixed-size algorithms, although large buses obviously improve efficiency. The
drawback of these limited resources is the number of cycles required for the encryption
(i.e. SEAn,b trades space for time, which may be relevant given present processor
speeds). Looking at the code size × cycles product, the efficiency of SEAn,b remains
similar to that of Rijndael (encryption only), which is well known for its efficient smart
card implementations.

CH 4: An Exposition of the SEA Algorithm

The Schoof–Elkies–Atkin algorithm is an efficient way to count the number of points on
an elliptic curve defined over a large prime field. This expository paper describes the
algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to
implement the algorithm. The mathematical background for the technique is then
given.

Let p be a large (odd) prime and let $E : y^2 = x^3 + a_4 x + a_6$ be an elliptic curve,
where $a_4$ and $a_6$ are given fixed integers. In the case where p does not divide
$4a_4^3 + 27a_6^2$, E can be reduced to an elliptic curve over Fp. The number of points of E over
Fp, denoted by #E(Fp), is of cryptographic interest, since the properties of this number
determine the security of elliptic curve cryptosystems based on E against various known
attacks.

The first polynomial time algorithm for determining the number of rational points on an
elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion
points on the curve to arrive at the number of points. At first Schoof's algorithm was
considered impractical, but Elkies suggested the use of "good" primes (now known as
Elkies primes), where isogenies and modular curves can be involved to speed up the
calculation. Atkin also made a number of important contributions to the algorithm, which
then became known as the Schoof–Elkies–Atkin (SEA) algorithm.

Further improvements were later proposed by Dewaghe and
Couveignes–Dewaghe–Morain. The SEA algorithm was implemented by Morain,
Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He
later also published a paper [19] that is a lovely overview of the developments in the
subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and
contains many other theoretical insights and illuminating examples. The implementations
of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure,
Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is
described in [13].

Dewaghe's improvement is published in [7]. The improvement by
Couveignes–Dewaghe–Morain is published in [5]. Atkin never formally published his
contributions described in [1], but they are discussed extensively in [9, 19]. This paper,
which is not aimed at the experts in the area, describes in detail a reasonably fast
implementation of the SEA algorithm that is closely modeled upon Morain's. The
algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a
probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set
A of auxiliary primes below). The algorithm implemented on a typical personal computer
takes several minutes to find the number of points on a typical curve over Fp, where p has
200 bits.

It is known that
$\#E(F_p) = p + 1 - t$,
where t is an integer which satisfies the Hasse bound
$-2\sqrt{p} \le t \le 2\sqrt{p}$.

The algorithm works by calculating t modulo several small auxiliary primes ℓ. When the
product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to
recover the exact value of t, and hence that of #E(Fp). The algorithm works its way
through a fixed list of 40 candidates for auxiliary primes given below. For each candidate ℓ,
a calculation has to be carried out to generate a certain polynomial, indexed by ℓ, that is necessary for
further calculations with this ℓ. These polynomials do not depend on the curve E under
consideration and hence might be precomputed and stored if memory allows. Then for
any elliptic curve E we can quickly decide if our algorithm applies (the probability that
the algorithm applies for a specific E and ℓ is 1/2). For those curves where the algorithm
applies, we can determine t modulo ℓ. When we have finished with all our candidates for the
auxiliary primes, we can look at the elliptic curve and check whether the product of the
auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in
determining t.
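The final recombination step can be illustrated on a toy example. In the sketch below, the residues t mod ℓ are obtained from a naive O(p) point count, which is only a stand-in for the actual SEA computations and works for tiny primes only; the Chinese Remainder Theorem and the Hasse bound are then used to recover t. Function names are illustrative, and `pow(m, -1, l)` requires Python 3.8 or later.

```python
def count_points(a4: int, a6: int, p: int) -> int:
    """Naive #E(F_p) for y^2 = x^3 + a4*x + a6, including the point at infinity."""
    square_roots = [0] * p           # square_roots[v] = number of y with y^2 = v mod p
    for y in range(p):
        square_roots[y * y % p] += 1
    return 1 + sum(square_roots[(x * x * x + a4 * x + a6) % p] for x in range(p))


def crt(residues, moduli):
    """Combine residues modulo pairwise coprime moduli; returns (value, product)."""
    x, m = 0, 1
    for res, l in zip(residues, moduli):
        k = ((res - x) * pow(m, -1, l)) % l   # solve x + m*k = res (mod l)
        x += m * k
        m *= l
    return x, m


def recover_trace(residues, moduli, p: int) -> int:
    """Recover t from t mod l, provided the product of the moduli exceeds 4*sqrt(p)."""
    t, m = crt(residues, moduli)
    if m * m <= 16 * p:
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    # The Hasse bound gives |t| <= 2*sqrt(p) < m/2, so take the centered representative.
    return t - m if t > m // 2 else t
```

For instance, with p = 101 and the auxiliary primes 2, 3, 5, 7 (product 210, which exceeds 4√101 ≈ 40.2), the residues of t determine it uniquely.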

A typical application for this point counting would be to take a random prime p and a
random elliptic curve E over Fp, with the intention of finding an E with #E(Fp) = xr,
where r is a prime and x is small. Given such a curve, a point P of order r can be located
easily and the pair (E, P) could be used for a number of cryptographic algorithms, such as
Diffie–Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p
and require x ≤ 32, then the probability that #E = xr is about 2.5%, so we expect to have
to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail.
Section 3 presents the mathematical background of the algorithm. Section 4 presents
ideas by which the algorithm could be improved. Section 5 contains certain tables of data
that need to be hardwired into a program implementing this algorithm.

The Algorithm

4.1 Overview:

The set A of potential auxiliary primes is the union of the set $A_s$ of small primes and the
set $A_l$ of larger primes. For each ℓ ∈ A, we need to determine a polynomial in Z[F, J],
indexed by ℓ. For ℓ ∈ $A_s$, this is stored in the program. For ℓ ∈ $A_l$, it must be calculated by
determining a number of coefficients of a certain q-series f(q) ∈ Z[[q]] and carrying out
certain algebraic operations on it. The polynomials do not depend on the elliptic curve
under consideration and therefore may be pre-calculated and stored if there is enough
space for them (they require just under half a megabyte to store).
We start out with a given prime p and an elliptic curve
$E : y^2 = x^3 + a_4 x + a_6$.

CH 5: SEA Architecture Block Diagram

FIG 5.1 SEA architecture block diagram

5.1 KEY GENERATION

Key generation is the process of generating keys for cryptography. A
key is used to encrypt and decrypt whatever data is being
encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms
(such as DES and AES) and public-key algorithms (such as RSA).
Symmetric-key algorithms use a single shared key; keeping data secret
requires keeping this key secret. Public-key algorithms use a public key and a
private key. The public key is made available to anyone (often by means of a
digital certificate). A sender will encrypt data with the public key; only the
holder of the private key can decrypt this data.

Since public-key algorithms tend to be much slower than symmetric-key
algorithms, modern systems such as TLS and SSH use a combination of
the two: one party receives the other's public key, and encrypts a small piece
of data (either a symmetric key or some data that will be used to generate it).
The remainder of the conversation uses a (typically faster) symmetric-key
algorithm for encryption.

In computer cryptography, keys are integers. In some cases keys are
randomly generated using a random number generator (RNG) or
pseudorandom number generator (PRNG), the latter being a computer
algorithm that produces data which appears random under analysis.
PRNGs that are seeded from system entropy generally produce better
results, since this makes the initial conditions of the PRNG much more
difficult for an attacker to guess. In other situations, the key is created using a
passphrase and a key generation algorithm, usually involving a cryptographic
hash function such as SHA-1.
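Both ways of obtaining a key can be sketched with the Python standard library. This is only an illustration: the function names are hypothetical, and the passphrase example uses PBKDF2 with HMAC-SHA-256 rather than the plain SHA-1 hashing mentioned above, a substitution made here because it is readily available in `hashlib`.

```python
import hashlib
import secrets


def random_key(bits: int = 128) -> bytes:
    """Draw a key of the requested size from the operating system's entropy source."""
    if bits % 8 != 0:
        raise ValueError("key size must be a whole number of bytes")
    return secrets.token_bytes(bits // 8)


def key_from_passphrase(passphrase: str, salt: bytes, bits: int = 128,
                        iterations: int = 100_000) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA-256 (assumed parameters)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt,
                               iterations, dklen=bits // 8)
```

A 96-bit key for SEA96,8, for example, would be requested with `random_key(96)`, and the salt for the passphrase variant can itself be drawn with `secrets.token_bytes`.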

The simplest method to read encrypted data is a brute force attack: simply
attempting every possible key, up to the maximum key length. Therefore, it is
important to use a sufficiently long key. Each additional key bit doubles the
number of candidate keys, so longer keys take exponentially longer to attack,
rendering a brute force attack impractical. Currently, key lengths of 128 bits
(for symmetric key algorithms) and 1024 bits (for public-key algorithms) are
common.
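The exponential growth of the brute force effort can be made concrete with a few lines of Python. The sketch below assumes a hypothetical attacker who tests a fixed number of keys per second and finds the key, on average, after searching half of the key space; the function names and the rate are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def keyspace(bits: int) -> int:
    """Number of distinct keys of the given length."""
    return 1 << bits


def expected_years(bits: int, keys_per_second: float) -> float:
    """Average search time: half of the key space at a fixed test rate."""
    return (keyspace(bits) / 2) / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e9  # hypothetical: one billion keys tested per second
    for bits in (56, 96, 128):
        print(bits, "bits:", expected_years(bits, rate), "years")
```

Adding one bit to the key doubles the expected search time, whatever test rate is assumed.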

Cryptography:

Cryptography is the art and science of secret writing. The term is derived from the Greek
language:
• kryptos - secret
• graphein - writing

5.2 Encryption:

Encryption is the actual process of applying cryptography. Much of cryptography
is mathematical and uses patterns and algorithms to encrypt messages, text, words,
signals and other forms of communication. Cryptography has many uses, especially in the
areas of espionage, intelligence and military operations. Cryptography deals with all
aspects of secure messaging, authentication, digital signatures, electronic money, and
other applications.

Today, many security systems and companies use cryptography to transfer
information over the Internet or radio for fear of interception. Some of this encryption is
highly advanced; however, even simple encryption techniques can help uphold the privacy
of an everyday person. Until the early 1920s the term cryptography also covered the
breaking of encrypted messages. The term cryptanalysis then came into use, and that
field is now practically an art and science of its own.

The two main areas of cryptography are codes and ciphers.

A code is one of the two major methods of cryptography. This method involves the
replacement of complete words or phrases by code words or numbers.
A cipher is the other major method of cryptography. It works on the principle of
replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function: they take two
inputs, a message and a key, and transform them into a single output. There are two
ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic
key to transform the original message into an encrypted form. Decryption, as shown in
Figure 2, does the reverse; it uses a cryptographic key to transform an encrypted message
back into its original (a.k.a. plaintext) form.
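The difference between a code and a cipher, and the message-plus-key view of encryption and decryption, can be illustrated with a toy Python sketch. The shift cipher and the codebook below are hypothetical teaching examples only; they are not part of SEA and offer no real security.

```python
import string

ALPHABET = string.ascii_uppercase

# Hypothetical codebook: a code replaces whole words by code numbers.
CODEBOOK = {"ATTACK": "7301", "DAWN": "2214"}


def encode(message: str) -> str:
    """Code: replace complete words found in the codebook."""
    return " ".join(CODEBOOK.get(word, word) for word in message.split())


def shift_cipher(message: str, key: int, decrypt: bool = False) -> str:
    """Cipher: replace each letter individually, controlled by a key."""
    step = -key if decrypt else key
    out = []
    for ch in message:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + step) % 26])
        else:
            out.append(ch)  # spaces and punctuation pass through
    return "".join(out)


if __name__ == "__main__":
    print(encode("ATTACK AT DAWN"))                     # 7301 AT 2214
    ciphertext = shift_cipher("ATTACK AT DAWN", 3)
    print(ciphertext)                                   # DWWDFN DW GDZQ
    print(shift_cipher(ciphertext, 3, decrypt=True))    # ATTACK AT DAWN
```

Both directions of the cipher take the same two inputs, a message and a key, and decryption with the same key restores the plaintext.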

FIG 5.2 ENCRYPTION BLOCK

FIG 5.3 Encryption Operation

5.3 DECRYPTION :

Decryption is the process of decoding data that has been encrypted into a secret format.
Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken.
In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-
time pad cipher is unbreakable, provided the key material is truly random, never reused,
kept secret from all possible attackers, and of equal or greater length than the message.[22]
Most ciphers, apart from the one-time pad, can be broken with enough computational
effort by brute force attack, but the amount of effort needed may be exponentially
dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort
required (i.e., "work factor", in Shannon's terms) is beyond the ability of any adversary.
This means it must be shown that no efficient method (as opposed to the time-consuming
brute force method) can be found to break the cipher. Since no such proof has been
found to date, the one-time pad remains the only theoretically unbreakable
cipher.
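The one-time pad conditions listed above (a truly random key, at least as long as the message, never reused) can be sketched as follows. The function names are hypothetical, and the operating system's random source is used here as a stand-in for truly random key material.

```python
import secrets


def otp_keygen(length: int) -> bytes:
    """Fresh random pad, one byte of key per byte of message."""
    return secrets.token_bytes(length)


def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with the pad; the same call encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


if __name__ == "__main__":
    message = b"SEA on FPGA"
    pad = otp_keygen(len(message))       # must never be reused
    ciphertext = otp_xor(message, pad)
    assert otp_xor(ciphertext, pad) == message
```

The need for a pad as long as all the data ever sent is what makes this scheme impractical for most systems.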
There are a wide variety of cryptanalytic attacks, and they can be classified in any
of several ways. A common distinction turns on what an attacker knows and what
capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to
the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-
only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and
its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the
cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many
times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose
ciphertexts and learn their corresponding plaintexts.[10] Also important, often
overwhelmingly so, are mistakes (generally in the design or use of one of the protocols
involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks
against the block ciphers or stream ciphers that are more efficient than any attack that
could be mounted against a perfect cipher. For example, a simple brute force attack against DES
requires one known plaintext and 2^55 decryptions, trying approximately half of the
possible keys, to reach a point at which chances are better than even that the key sought will
have been found. But this may not be enough assurance; a linear cryptanalysis attack
against DES requires 2^43 known plaintexts and approximately 2^43 DES operations.[23] This
is a considerable improvement on brute force attacks.
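The size of this improvement can be checked with simple arithmetic. The sketch below only restates the figures quoted above for the 56-bit DES key; the variable names are illustrative.

```python
DES_KEY_BITS = 56

# Brute force: about half of the 2^56 keys, i.e. 2^55 trial decryptions.
brute_force_ops = 2 ** (DES_KEY_BITS - 1)

# Linear cryptanalysis: about 2^43 DES operations (given 2^43 known plaintexts).
linear_ops = 2 ** 43

speedup = brute_force_ops // linear_ops  # 2^12 = 4096

if __name__ == "__main__":
    print("work reduced by a factor of", speedup)
```

The price of this reduction is the very large number of known plaintexts the linear attack needs, compared with the single known plaintext used by brute force.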
Public-key algorithms are based on the computational difficulty of various
problems. The most famous of these is integer factorization (e.g., the RSA algorithm is
based on a problem related to integer factoring), but the discrete logarithm problem is
also important. Much public-key cryptanalysis concerns numerical algorithms for solving
these computational problems, or some of them, efficiently (i.e., in a practical time).

For instance, the best known algorithms for solving the elliptic curve-based
version of discrete logarithm are much more time-consuming than the best known
algorithms for factoring, at least for problems of more or less equivalent size. Thus, other
things being equal, to achieve an equivalent strength of attack resistance, factoring-based
encryption techniques must use larger keys than elliptic curve techniques. For this reason,
public-key cryptosystems based on elliptic curves have become popular since their
invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other
attacks on cryptosystems are based on actual use of the algorithms in real devices, and are
called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the
device took to encrypt a number of plaintexts or report an error in a password or PIN
character, he may be able to use a timing attack to break a cipher that is otherwise
resistant to analysis.
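The PIN example above can be illustrated with a small Python sketch. The naive comparison below is a hypothetical model of a leaky device: it stops at the first wrong character, so the amount of work (and hence the time) reveals how many leading characters were correct. The standard library function hmac.compare_digest is shown as one common countermeasure.

```python
import hmac
from typing import Tuple


def naive_equal(a: bytes, b: bytes) -> Tuple[bool, int]:
    """Early-exit compare; also reports how many bytes were examined."""
    if len(a) != len(b):
        return False, 0
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            return False, examined  # work done leaks the mismatch position
    return True, examined


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison designed not to leak the position of the first mismatch."""
    return hmac.compare_digest(a, b)


if __name__ == "__main__":
    pin = b"4921"
    print(naive_equal(pin, b"0000"))   # (False, 1)
    print(naive_equal(pin, b"4900"))   # (False, 3)
    print(constant_time_equal(pin, b"4921"))
```

An attacker who can measure the work of the naive version can recover the PIN one character at a time instead of trying every combination.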
An attacker might also study the pattern and length of messages to derive
valuable information; this is known as traffic analysis,[24] and can be quite useful to an
alert adversary. Poor administration of a cryptosystem, such as permitting too short keys,
will make any system vulnerable, regardless of other virtues. And, of course, social
engineering, and other attacks against the personnel who work with cryptosystems or the
messages they handle (e.g., bribery, extortion, blackmail, espionage, torture, ...) may be
the most productive attacks of all.

FIG: 5.4 DECRYPTION BLOCK

FIG: 5.5 Decryption Operation

SIMULATION RESULTS

Key Generation Results

Encryption Results

Decryption Results

SYNTHESIS REPORTS

KEY INPUT:

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT:
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p
xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file "c:/xilinx/bin/vasu/keyreg.ngc" ...


Reading component libraries for design expansion...

Checking timing specifications ...


Checking expanded design ...

NGDBUILD Design Results Summary:


Number of errors: 0
Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file "keyreg.ngd" ...

Writing NGDBUILD log file "keyreg.bld"...

Release 6.1i Map G.23


Xilinx Mapping Report File for Design 'keyreg'

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
Number of Slices containing only related logic: 0 out of 0 0%
Number of Slices containing unrelated logic: 0 out of 0 0%
*See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768


Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB

MAPPING REPORT:

Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'

Design Information
------------------
Command Line : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm
area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
Number of Slices containing only related logic: 0 out of 0 0%
Number of Slices containing unrelated logic: 0 out of 0 0%
*See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768


Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB

Placing & Routing Report:

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
Number of Slices containing only related logic: 0 out of 0 0%
Number of Slices containing unrelated logic: 0 out of 0 0%
*See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768


Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB

KEY REGISTER:

Release 6.1i - xst G.23


Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s

--> Parameter xsthdpdir set to ./xst


CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s

--> Reading design: keyreg.prj

TABLE OF CONTENTS
1) Synthesis Options Summary
2) HDL Compilation
3) HDL Analysis
4) HDL Synthesis
4.1) HDL Synthesis Report
5) Advanced HDL Synthesis
6) Low Level Synthesis
7) Final Report
7.1) Device utilization summary
7.2) TIMING REPORT

===============================================================
==========
* Synthesis Options Summary *
===============================================================
==========
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory

---- Target Parameters


Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options


Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options


Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options


Optimization Goal : Speed
Optimization Effort :1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator :_
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta :5

---- Other Options


lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

===============================================================
==========

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

===============================================================
==========
* HDL Compilation *
===============================================================
==========
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

===============================================================
==========
* HDL Analysis *
===============================================================
==========
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

===============================================================
==========
* HDL Synthesis *
===============================================================
==========

Synthesizing Unit <keyreg>.


Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Found 96-bit register for signal <Dreg>.
Summary:
inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

===============================================================
==========
HDL Synthesis Report

Macro Statistics
# Registers :1
96-bit register :1

===============================================================
==========

===============================================================
==========
* Advanced HDL Synthesis *
===============================================================
==========

===============================================================
==========
* Low Level Synthesis *
===============================================================
==========

Optimizing unit <keyreg> ...


Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...


Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

===============================================================
==========
* Final Report *
===============================================================
==========
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 195

Macro Statistics :
# Registers :1
# 96-bit register :1

Cell Usage :
# BELS :1
# LUT1 :1
# FlipFlops/Latches : 96
# FDCE : 96
# Clock Buffers :1
# BUFGP :1
# IO Buffers : 194
# IBUF : 98
# OBUF : 96
===============================================================
==========

Device utilization summary:


---------------------------

Selected Device : 2s15cs144-6

Number of Slices: 55 out of 192 28%


Number of Slice Flip Flops: 96 out of 384 25%
Number of 4 input LUTs: 1 out of 384 0%
Number of bonded IOBs: 194 out of 90 215% (*)
Number of GCLKs: 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
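The percentages in this summary can be reproduced with integer arithmetic. The sketch below is only a reading aid: the function name is hypothetical, and truncation to a whole percentage is an assumption that happens to match every figure reported above. It makes visible that the key register itself fits comfortably, while its 194 I/O pins exceed the 90 available on the selected package.

```python
def utilization_percent(used: int, available: int) -> int:
    """Whole-number percentage, truncated (assumed to match the report)."""
    return used * 100 // available


# (resource, used, available) copied from the keyreg utilization summary
KEYREG = [
    ("Slices", 55, 192),
    ("Slice Flip Flops", 96, 384),
    ("4 input LUTs", 1, 384),
    ("bonded IOBs", 194, 90),
    ("GCLKs", 1, 4),
]

if __name__ == "__main__":
    for name, used, available in KEYREG:
        pct = utilization_percent(used, available)
        flag = " (over capacity)" if used > available else ""
        print(f"{name}: {used} out of {available} = {pct}%{flag}")
```

Only the bonded IOBs exceed 100%, which is what the Xst:1336 warning reports.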

===============================================================
==========
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal | Clock buffer(FF name) | Load |
-----------------------------------+------------------------+-------+
Clk | BUFGP | 96 |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6

Minimum period: No path found


Minimum input arrival time before clock: 7.962ns
Maximum output required time after clock: 6.788ns
Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock 'Clk'
Offset: 7.962ns (Levels of Logic = 1)
Source: KeyEna (PAD)
Destination: Dreg_95 (FF)
Destination Clock: Clk rising

Data Path: KeyEna to Dreg_95


                              Gate     Net
Cell:in->out      fanout     Delay   Delay  Logical Name (Net Name)
----------------------------------------  ------------
IBUF:I->O             96     0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
FDCE:CE                      0.886          Dreg_0
----------------------------------------
Total                        7.962ns (1.662ns logic, 6.300ns route)
                                     (20.9% logic, 79.1% route)
-------------------------------------------------------------------------

Timing constraint: Default OFFSET OUT AFTER for Clock 'Clk'


Offset: 6.788ns (Levels of Logic = 1)
Source: Dreg_95 (FF)
Destination: KeyO<95> (PAD)
Source Clock: Clk rising

Data Path: Dreg_95 to KeyO<95>


                              Gate     Net
Cell:in->out      fanout     Delay   Delay  Logical Name (Net Name)
----------------------------------------  ------------
FDCE:C->Q              1     1.085   1.035  Dreg_95 (Dreg_95)
OBUF:I->O                    4.668          KeyO_95_OBUF (KeyO<95>)
----------------------------------------
Total                        6.788ns (5.753ns logic, 1.035ns route)
                                     (84.8% logic, 15.2% route)
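The totals in the two timing paths above follow directly from the per-cell delays: the gate delays add up to the logic part, the net delays to the routing part. The following sketch recomputes them; the function name is hypothetical and the delay values are copied from the report.

```python
def path_summary(gate_delays, net_delays):
    """Return (total, logic, route, logic %, route %) for one timing path."""
    logic = round(sum(gate_delays), 3)
    route = round(sum(net_delays), 3)
    total = round(logic + route, 3)
    return (total, logic, route,
            round(100 * logic / total, 1), round(100 * route / total, 1))


if __name__ == "__main__":
    # OFFSET IN: KeyEna pad -> IBUF -> clock enable of Dreg
    print(path_summary([0.776, 0.886], [6.300]))
    # OFFSET OUT: Dreg_95 -> OBUF -> KeyO<95> pad
    print(path_summary([1.085, 4.668], [1.035]))
```

The input path is dominated by routing because the KeyEna net fans out to all 96 flip-flops, whereas the output path is dominated by the output buffer delay.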

===============================================================
==========
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s

-->

Total memory usage is 54400 kilobytes

SBOX:

RTL SCHEMATIC

GATE LEVEL

===============================================================
==========
* Synthesis Options Summary *
===============================================================
==========
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters


Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options


Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options


Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) :4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options


Optimization Goal : Speed
Optimization Effort :1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator :_
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta :5

---- Other Options


lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

===============================================================
==========

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

===============================================================
==========
* HDL Compilation *
===============================================================
==========
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

===============================================================
==========
* HDL Analysis *
===============================================================
==========
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of
case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated

===============================================================
==========
* HDL Synthesis *
===============================================================
==========

Synthesizing Unit <sbox8x3>.


Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

===============================================================
=========
HDL Synthesis Report

Found no macro
===============================================================
==========

===============================================================
==========
* Advanced HDL Synthesis *
===============================================================
==========

===============================================================
==========
* Low Level Synthesis *
===============================================================
==========

Optimizing unit <sbox8x3> ...


Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...


Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

===============================================================
==========
* Final Report *
===============================================================
==========
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs :7

Cell Usage :
# BELS :3
# LUT4 :3
# IO Buffers :7
# IBUF :4
# OBUF :3
===============================================================
Device utilization summary:
---------------------------

Selected Device : 2s15cs144-6

Number of Slices: 2 out of 192 1%


Number of 4 input LUTs: 3 out of 384 0%
Number of bonded IOBs: 7 out of 90 7%

Total memory usage is 53376 kilobytes

TRANSLATION REPORT:

Checking timing specifications ...


Checking expanded design ...

NGDBUILD Design Results Summary:


Number of errors: 0
Number of warnings: 0

Total memory usage is 37996 kilobytes

FLOOR PLANNING

MAPPING REPORT:
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
Number of occupied Slices: 2 out of 192 1%
Number of Slices containing only related logic: 2 out of 2 100%
Number of Slices containing unrelated logic: 0 out of 2 0%
*See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
Number of bonded IOBs: 7 out of 86 8%

Total equivalent gate count for design: 18


Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB

Placing & Routing Report:

Device utilization summary:

Number of External IOBs 7 out of 86 8%


Number of LOCed External IOBs 0 out of 7 0%

Number of SLICEs 2 out of 192 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0

The AVERAGE CONNECTION DELAY for this design is: 0.871


The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707

KEY GENERATION:

RTL SCHEMATIC

GATE LEVEL

===============================================================
==========
* Synthesis Options Summary *
===============================================================
==========
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters


Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options


Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options


Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) :4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort :1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator :_
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta :5

---- Other Options


lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

TRANSLATION REPORT:

Release 6.1i - ngdbuild G.23


Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p
xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file "c:/xilinx/bin/vasu/keygenblock.ngc" ...


Reading component libraries for design expansion...

Checking timing specifications ...


Checking expanded design ...

NGDBUILD Design Results Summary:


Number of errors: 0
Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file "keygenblock.ngd" ...

Writing NGDBUILD log file "keygenblock.bld"...

MAPPING REPORT:

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
Number used as Flip Flops: 415
Number used as Latches: 4
Number of 4 input LUTs: 1,016 out of 384 264% (OVERMAPPED)
Logic Distribution:
Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
Number of Slices containing only related logic: 648 out of 665 97%
Number of Slices containing unrelated logic: 17 out of 665 2%
*See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1,066 out of 384 277% (OVERMAPPED)
Number used as logic: 1,016
Number used as a route-thru: 50
Number of bonded IOBs: 1,060 out of 86 1232% (OVERMAPPED)
IOB Flip Flops: 960
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 17,572


Additional JTAG gate count for IOBs: 50,928
Peak Memory Usage: 72 MB

ENCRYPTION:

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT:

===============================================================
==========
* Synthesis Options Summary *
===============================================================
==========
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters


Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options


Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options


Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) :4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort :1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator :_
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta :5

---- Other Options


lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

Translation Report:

NGDBUILD Design Results Summary:


Number of errors: 0
Number of warnings: 0

Total memory usage is 42092 kilobytes

DECRYPTION:

GATE LEVEL

SYNTHESIS REPORT:

===============================================================
==========
* Synthesis Options Summary *
===============================================================
==========
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters


Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options


Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options


Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort :1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator :_
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta :5

---- Other Options


lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

Translation Report:

NGDBUILD Design Results Summary:


Number of errors: 0
Number of warnings: 0

Total memory usage is 45122 kilobytes

ADVANTAGES

➢ SEA is parametric in text, key and processor size.

➢ It is a low cost encryption routine targeted for the processors with limited
instruction set.

➢ It is a small encryption routine that can be targeted to any given processor, the
security of the cipher being adapted as a function of its key size.

➢ It is also used in applications where the same constrained device has to perform
both encryption and decryption

APPLICATIONS

➢ This is a low-cost encryption routine basically designed for processors with a
limited instruction set.

➢ In wireless communication and mobile computing and networking systems.

➢ For the encryption of JPEG2000 images.

➢ In scalable video coding.

➢ In sensor networks and RFIDs.

CONCLUSION

SEAn,b is a scalable encryption algorithm targeted for small embedded
applications. The plaintext size, key size and processor (or word) size are parameters of
the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on
any RISC machine. Its typical performance (encryption + decryption) for present key
sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of an
encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One
additional advantage of the design is its extreme simplicity. Based on the pseudo code
provided in this paper, it is expected that the implementation of the cipher in assembly
can be done within a few hours. We note finally that the design criteria of SEAn,b do
not make it a conservative algorithm by nature. Further cryptanalysis efforts are
consequently required.
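How the parameters interact can be sketched in a few lines. The sketch assumes the conventions of the original SEA proposal as understood here: n is the plaintext and key size in bits, b is the word size in bits, n must be a multiple of 6b, and each Feistel branch holds n/(2b) words. The function name is hypothetical.

```python
def sea_words_per_branch(n: int, b: int) -> int:
    """Words per Feistel branch for SEAn,b (assumed convention: n / 2b)."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6*b")
    return n // (2 * b)


if __name__ == "__main__":
    # 96-bit text and key with 8-bit words, matching the 96-bit register
    print(sea_words_per_branch(96, 8))   # 6
```

For the 96-bit configuration synthesized in this report, each branch is therefore 48 bits wide, handled as six 8-bit words.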

This paper presented FPGA implementations of a scalable encryption algorithm
for various sets of parameters. The presented parametric architecture allows keeping the
flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one
round per clock cycle, computes the round and the key round in parallel and supports
both encryption and decryption at a minimal cost. Compared to other recent block
ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced
throughput. Consequently, it can be considered as an interesting alternative for
constrained environments. Scopes for further research include low power ASIC
implementations purposed for RFIDs as well as further cryptanalysis efforts and security
evaluations.
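The area/throughput trade-off of a loop architecture can be estimated with simple arithmetic. Since one round is executed per clock cycle, one block needs as many cycles as there are rounds, so the throughput is the block size times the clock frequency divided by the number of rounds. The sketch below assumes exactly this model; the function name and the numbers in the example (round count and clock frequency) are hypothetical placeholders, not measured results.

```python
def loop_throughput_bps(block_bits: int, rounds: int, clock_hz: float) -> float:
    """Bits per second when one round is executed per clock cycle."""
    return block_bits * clock_hz / rounds


if __name__ == "__main__":
    # Hypothetical values for illustration only
    block_bits, rounds, clock_hz = 96, 93, 100e6
    mbps = loop_throughput_bps(block_bits, rounds, clock_hz) / 1e6
    print(f"{mbps:.1f} Mbit/s")
```

Unrolling several rounds per cycle would raise this figure at the cost of additional area, which is the trade-off the loop architecture deliberately resolves in favour of area.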

Bibliography

Reference books:

Basic VLSI Design, 3rd Edition: Douglas A. Pucknell, Kamran Eshraghian

A VHDL Primer: J. Bhasker

Digital Design: Morris Mano

Data and Computer Communications: William Stallings

Computer Networks: Andrew S. Tanenbaum

Network Cryptology: William Stallings

Reference Websites:

IEEE Transactions

www.wikipedia.com

www.webopedia.com
