
Maîtrise Informatique

Option Micro-informatique Micro-électronique

Année 1995 - 1996

Université Paris 8

Département Micro-Informatique Micro-Electronique

2, rue de la Liberté 93526 SAINT DENIS CEDEX 02

Applied Neuro-cryptography

by

Sébastien Dourlens

To all beings who think

that a secret can be kept for a long time

Summary

This thesis was written in the context of the MIME master's degree at the University of Paris 8.

Its purpose is to investigate neural-network applications to cryptography.

A survey of existing work shows that no thesis, no conference proceedings, no book and no Internet resource (Web pages and newsgroups) uses or applies neural networks to cryptography.

We therefore thought it would be interesting to define a new field called neuro-cryptography, whose aim is the use of neural networks to encrypt a message, decrypt a message or exchange messages over a network. Cryptography contains another area, the probabilistic study of the strengths and weaknesses of an encryption algorithm: cryptanalysis. Neural networks can play a decisive role in this area, which is why we have also defined neuro-cryptanalysis.

The areas of Artificial Intelligence with neural networks, cryptography and cryptanalysis have long been studied intensively by universities around the world and, among others, by companies designing electronic circuits.

We begin by choosing the neural network model and learning procedure that are the most efficient, given their qualities for the synthesis of complex functions and for statistical analysis. This model is the network of perceptrons with back-propagation of the gradient. The hardware realization should not be neglected, since cryptography requires a great learning speed, which is a function of the number of possible keys and texts.

We have added elements allowing the creation of hardware architectures.

We then choose the field of cryptographic applications: it is primarily the study of the DES (Data Encryption Standard) and its cryptanalysis.

We then test and measure the performance of neuro-cryptography and neuro-cryptanalysis, which prove to be quite interesting from all points of view. The computation time can be improved by designing a machine architecture dedicated to the learning of cryptographic algorithms, using arsenide-based components or massively parallel machines, as has already been done for neural networks and the D.E.S., but separately.

As regards the neuro-cryptanalysis of DES, we build differential and linear neuro-cryptanalyzers that study the probabilities of obtaining the inputs of the S-tables from their outputs, allowing us to obtain characteristics for an unknown subkey.

This line of research is now open; it should continue towards coherence between the neural network that learns the cryptosystem as a whole and the neuro-cryptanalyzers of the internal structure of this cryptosystem, which learn very quickly. Another reason is the synthesis capability of the gradient back-propagation network.

Acknowledgements

I want to thank my research director, Mr. Christian Riesner, researcher in Artificial Intelligence specializing in neural networks.

Thanks to the teacher-researchers of the Micro-Informatique Micro-Electronique Department of the University of Paris 8.

Thanks also to the students, researchers and university professors who provided me with valuable and useful information for this thesis.

Table of contents

1 Introduction

1.1 Survey of existing work

1.2 Neural networks

1.3 Contemporary Cryptography

1.4 Applied neuro-cryptography

1.5 Structure of the thesis

2. Neural networks

2.1 Introduction

2.2 Basic concepts and terminology

2.3 The current situation

2.4 Are neural networks used in cryptography?

2.5 What types of neural networks are used in cryptography?

2.6 The model structure of perceptrons with back-propagation of the gradient

2.7 The gradient back-propagation algorithm

2.8 Analysis of linear multi-layer networks

2.8.1 Problem of the linear perceptron multilayer

2.8.2 Discriminant analysis of rank p

2.8.3 Incremental learning of the hidden layer

2.8.4 Relations with the principal component analysis

2.9 Hardware

2.10 Conclusion

3. Cryptography

3.1 Introduction

3.2 Definitions

3.3 Contemporary Cryptography

3.3.1 The cryptosystem and strength

3.3.2 Protocols

3.3.3. The types of attacks in cryptanalysis

3.4 Cryptographic algorithms

3.4.1 Block ciphers and stream ciphers

3.4.2 The Vigenère cipher

3.4.3 Strong ciphers

3.5 Reference: the Data Encryption Standard (DES)

3.5.1 History

3.5.2 Architecture

3.5.3 Cryptanalysis

3.5.4 The physical aspect

3.6 The cryptanalysis of DES

3.6.1 Differential cryptanalysis

3.6.2 Linear cryptanalysis

3.7 Conclusion

4. The Neuro-Cryptography

4.1 Introduction

4.2 Can cryptography and neural networks be linked?

4.3 The new definitions

4.3.1 Neuro-encryption

4.3.2 Neuro-decryption

4.3.4 Neuro-cryptanalysis

4.4 The generation of training bases

4.4.1 Examples

4.4.2 Order of presentation

4.4.3 Automatic generation of texts

4.4.4 The coefficient of learning

4.5 Self-learning

4.6 The realization of applications

4.6.1 The learning of the exclusive or (XOR)

4.6.2 The learning of cryptographic algorithms

4.6.3 Key learning

4.7 The advantages and disadvantages

4.8 Conclusion

5. The Neuro-cryptanalysis

5.1 Introduction

5.2 Definition

5.3 General principle

5.4 Applied Neuro-cryptanalysis

5.4.1 The neuro-cryptanalysis of the Vigenère cipher

5.4.2 The neuro-differential cryptanalysis of DES

5.4.3 The neuro-linear cryptanalysis of DES

5.4.4 Global neuro-cryptanalysis of UNIX crypt(3)

5.5 Analysis of the results of cryptanalysis

5.6 Hardware implementations

5.6.2 Algorithm for the Connection Machine CM-5

5.7 Performance

5.8 Conclusion

6 Glossary and math basics

6.1 Introduction

6.2 The information theory

6.3 The complexity of algorithms

6.4 The number theory

7 Conclusion

Bibliography

Neural networks

Cryptography

Mathematics

HTML pages and newsgroups on the Internet

Annexes

1 C sources

The gradient back-propagation neural network

The Vigenère cipher or simple XOR

Cryptanalysis of the Vigenère cipher

The code of the D.E.S.

Learning the XOR in shuffled order

Automatic generation of a training base for the D.E.S.

The generation of the DES difference distribution tables

The generation of the DES linear approximation tables

Neural function library

The DES differential neuro-generator

The DES linear neuro-generator

2. The neural circuits

3. The DES difference distribution tables

4. The DES linear approximation tables

5. Simplified difference distribution tables

6. The tables of the differential neuro-cryptanalyzer

7. The tables of the linear neuro-cryptanalyzer

8. The XOR learning measurement tables

9. The massively parallel machines

Chapter 1 - Introduction

1.1 Survey of existing work

The purpose of this thesis is to investigate neural-network applications to cryptography.

A survey of existing work shows that no thesis, no conference proceedings, no book and no Internet resource (Web pages and newsgroups) uses or applies neural networks to cryptography.

Indeed, David Pointcheval of the École Normale Supérieure de Paris used the perceptron problem to create an authentication protocol, but it was a purely mathematical and theoretical study.

The areas of Artificial Intelligence with neural networks, cryptography and cryptanalysis have long been studied intensively by researchers at universities around the world and, among others, by electronic circuit design firms.

We therefore thought it would be interesting to define a new field called neuro-cryptography, whose aim is the use of neural networks to encrypt a message, decrypt a message or exchange messages over a network. Cryptography contains another area, the probabilistic study of the strengths and weaknesses of an encryption algorithm: cryptanalysis. Neural networks can play a decisive role in this area, which is why we have also defined neuro-cryptanalysis.

1.2 Neural networks

We present neural networks, then define and determine which neural network model is the most appropriate for cryptography, both in terms of the learning algorithm and in hardware terms, with respect to already completed architectures and their observed performance.

The most interesting connectionist model turns out to be the network of perceptrons with back-propagation of the gradient, thanks to its various properties.

These properties were analyzed and demonstrated by different scientists:

their generalization property

their low sensitivity to noise (if an error sneaks into the base of examples)

their low sensitivity to faults (lost connections, modified weights or a bug in the program)

the information is delocalized (distributed over the whole network)

capabilities for statistical computation and heuristic search

We present the structure of the model chosen in the following figure:

[Figure: structure of the perceptron network with gradient back-propagation]

This architecture can be implemented either in software (a sequential program on a single-processor computer) or in hardware (massively parallel machines).

These machines and neural networks are two connectionist approaches that differ little. Studying neural networks is equivalent to considering interconnected parallel machines, except that the networks contain a compact weight matrix and some "intelligence". Furthermore, neural networks have already been implemented on massively parallel machines.

An analysis of linear multilayer networks shows analogies with different statistical methods of data analysis, in particular linear regression and discriminant analysis. It has been shown that back-propagation performs a discriminant analysis of a population of N individuals (N being the number of examples included in learning) described by n parameters (where n is the number of input neurons) and projected onto a hyperplane of dimension p (where p is the number of hidden units). It is therefore possible to use it on non-linearly separable problems to build a classifier or a probabilistic model, which proves the interest of such an algorithm in cryptography and especially in cryptanalysis.

We must return to the hardware aspect if we want faster learning of a large number of keys and texts.

The most studied circuits are digital VLSI; their advantages are:

ease of use

a high signal-to-noise ratio

easy cascading of circuits

high adaptability (these circuits can solve various tasks)

a reduced manufacturing cost

A VLSI implementation of a neural network requires 4 blocks:

the summation (of the inputs of a neuron) with logic adders

the multiplication (by the weights) with parallel multipliers

the non-linear transfer function, with either a full calculation circuit, a table containing approximate values of the function, or an approximation circuit (for the sigmoid with steps of 1/5 and an error of less than 13%, 4 comparators and a few logic gates suffice (ALIPPI 1990))

the storage of values (S-RAM or D-RAM memories)

We then present the three types of components existing on the market or in research laboratories:

1. dedicated digital neural components, whose network speeds reach 1 billion connections processed per second

2. special-purpose digital coprocessors (also called neuro-accelerators): special circuits that can be connected to hosts (PCs or workstations) and that work with a neuro-simulator program. The mix of hardware and software gives these benefits:

accelerated speed, flexibility and an improved user interface

3. neural networks on massively parallel machines

An implementation of the above-mentioned algorithm was developed on the Connection Machine CM-2 (created by THINKING MACHINES Corp.), a hypercube topology with 64K processors, which gave 180 million interconnections computed per second (IPS), or 40 million weights updated per second.

Here are the performances measured per machine, in interconnections computed per second (table below).

CM-2          180 million

CRAY X-MP      50 million

WARP (10)      17 million

ANZA PLUS      10 million

The use of such configurations would yield good results in the learning of cryptographic ciphers.

1.3 Contemporary Cryptography

Cryptography is a very large and popular area among mathematicians and computer scientists. Nowadays, cryptography is the study of more or less strong encryption of messages or files, and the study of protocols for exchanging them over private networks and other means of communication. Within the study of ciphers one also finds the means to recover keys or to reduce the exhaustive key search: this is cryptanalysis. We present the strength of a cryptosystem, which depends entirely on the key used, whether public (known to all, for sending messages) or private (known only to those who may read the messages), and on the cryptographic exchange protocols. We prefer to focus on the realization of neural cryptosystems and on the neuro-cryptanalysis of cryptosystems.

Here are the different types of possible attacks in cryptanalysis:

ciphertext-only: the attacker must find the plaintext from the ciphertext alone. A ciphertext-only attack is practically impossible; everything depends on the cipher.

known-plaintext: the attacker has a plaintext and the corresponding ciphertext. The ciphertext was not chosen by the attacker, but the message is compromised anyway. In some cryptosystems, a single plaintext-ciphertext pair can compromise the security of the system as well as the transmission medium.

chosen-plaintext: the attacker has the ability to obtain the ciphertext corresponding to an arbitrary plaintext of his choice.

chosen-ciphertext: the attacker can arbitrarily choose a ciphertext and obtain the corresponding plaintext. This attack may reveal weaknesses in public-key systems and may even recover the private key.

adaptive chosen-plaintext: the attacker can determine the ciphertexts of plaintexts chosen in an iterative or interactive process based on previously found results. An example is differential cryptanalysis.

We quickly describe the modes of encryption, where C_i is the encryption of the i-th message block M_i, E is the encryption function, D the inverse function for the key (or subkey) K, and V_i an intermediate encrypted value:

The ECB (Electronic Code Book) mode, where C_i = E_K(M_i) and M_i = D_K(C_i)

The CBC (Cipher Block Chaining) mode, where C_i = E_K(M_i XOR C_{i-1}) and M_i = D_K(C_i) XOR C_{i-1}

The OFB (Output FeedBack) mode, where V_i = E_K(V_{i-1}) and C_i = M_i XOR V_i

The CFB (Cipher FeedBack) mode, where C_i = M_i XOR E_K(C_{i-1}) and M_i = C_i XOR E_K(C_{i-1})

Any encryption algorithm can be implemented in these modes.
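To make these chaining equations concrete, here is a minimal C sketch of the ECB and CBC modes built around a placeholder block cipher encrypt_block (a hypothetical stand-in for a real cipher such as the D.E.S.); it only illustrates the formulas above, not an actual cryptosystem.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK 8  /* block size in bytes (64 bits, as for DES) */

/* Placeholder block cipher: a real implementation (e.g. DES) would go here.
   For illustration we simply XOR the block with the key. */
static void encrypt_block(const uint8_t key[BLOCK],
                          const uint8_t in[BLOCK], uint8_t out[BLOCK])
{
    for (int i = 0; i < BLOCK; i++)
        out[i] = in[i] ^ key[i];
}

/* ECB mode: C_i = E_K(M_i) */
void ecb_encrypt(const uint8_t key[BLOCK],
                 const uint8_t *m, uint8_t *c, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++)
        encrypt_block(key, m + i * BLOCK, c + i * BLOCK);
}

/* CBC mode: C_i = E_K(M_i XOR C_{i-1}), C_0 being an initialization vector */
void cbc_encrypt(const uint8_t key[BLOCK], const uint8_t iv[BLOCK],
                 const uint8_t *m, uint8_t *c, size_t nblocks)
{
    uint8_t prev[BLOCK], tmp[BLOCK];
    memcpy(prev, iv, BLOCK);
    for (size_t i = 0; i < nblocks; i++) {
        for (int j = 0; j < BLOCK; j++)
            tmp[j] = m[i * BLOCK + j] ^ prev[j];   /* M_i XOR C_{i-1} */
        encrypt_block(key, tmp, c + i * BLOCK);
        memcpy(prev, c + i * BLOCK, BLOCK);
    }
}
```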

As far as our work is concerned, we will focus on the ECB mode, which is better suited to the learning of neural networks with a fixed number of input and output bits and no feedback loop, although it is possible to connect one or more neural networks in this way; the learning time would simply be much longer.

There are simple algorithms, such as the Vigenère cipher (a simple XOR of contiguous blocks with the same key, of the same size as a block), and more complex algorithms, such as the R.S.A., named after its designers (RIVEST, SHAMIR and ADLEMAN), and the DES.

The former uses a public key and a private key, the latter only a private key.

These are actually Vigenère-like ciphers with a different key for each block. The R.S.A. key relies on large prime numbers, while the D.E.S. depends on S-tables that are more or less linear and more or less affine.
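As an illustration of the weakest of these constructions, here is a minimal sketch of the XOR (Vigenère-like) cipher described above; the C sources actually used in this thesis are listed in Annex 1, so this version is purely indicative.

```c
#include <stddef.h>

/* XOR "Vigenère" cipher: each byte of the text is XORed with the
   corresponding byte of a repeating key. The same function encrypts
   and decrypts, since (x ^ k) ^ k = x. */
void xor_vigenere(unsigned char *text, size_t len,
                  const unsigned char *key, size_t keylen)
{
    for (size_t i = 0; i < len; i++)
        text[i] ^= key[i % keylen];
}
```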

We have chosen to tackle the D.E.S. because it is the oldest encryption standard and one of the most studied algorithms.

The D.E.S. combines permutations and substitutions in a product cipher whose security level is much higher than that of the two base codes used (text and key). These substitutions are non-linear, which produces a cryptosystem resistant to cryptanalysis. It was also designed to withstand differential cryptanalysis, which at the time was classified by the military and unknown to researchers.

It uses 64-bit input blocks L0 and R0; the length of the key K is 56 bits (8 bytes, the last bit of each byte being used for parity). This key generates 16 different 48-bit subkeys K1 to K16. Contrary to appearances, this was long sufficient and is a little less so these days, since it takes 2^56 encryptions to find the key by exhaustive search.

The function f is called a round; the i-th round receives as inputs the right part R_i (32 bits of the text to be encrypted) and the subkey K_i (48 bits). The rounds of DES are detailed below. The 32 bits it outputs are XORed with L_i. R_i is passed unchanged to L_{i+1}, while the encrypted bits are transmitted to R_{i+1} (except for the final round).
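The following sketch shows this Feistel round structure in C. The round function f is only a placeholder (the real DES f expands R_i, XORs it with K_i, passes the result through the S-tables and permutes it; see the D.E.S. code in Annex 1), and the 16 subkeys are assumed to be already generated.

```c
#include <stdint.h>

/* Placeholder for the DES round function f(R_i, K_i): the real one uses the
   expansion E, the 8 S-tables and the permutation P. */
static uint32_t f(uint32_t r, uint64_t subkey)
{
    return r ^ (uint32_t)subkey;   /* dummy body, for illustration only */
}

/* The 16-round Feistel structure of the D.E.S.
   (initial and final permutations omitted for clarity). */
void des_rounds(uint32_t *left, uint32_t *right, const uint64_t subkey[16])
{
    uint32_t L = *left, R = *right;
    for (int i = 0; i < 16; i++) {
        uint32_t tmp = R;
        R = L ^ f(R, subkey[i]);   /* the encrypted 32 bits go to R_{i+1} */
        L = tmp;                   /* R_i is passed unchanged to L_{i+1}  */
    }
    *left = R;                     /* halves are swapped back after the   */
    *right = L;                    /* last round, as in the standard      */
}
```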


The physical aspect is very important for the speed of execution. VLSI components are widespread and effective, but there are even more interesting components based on a technology that should not be disregarded: Gallium Arsenide (GaAs), or arsenide technology. It has already been used in supercomputers.

The major differences between GaAs and VLSI are:

the fast switching of GaAs gates

interfacing with non-GaAs components, which is a major difficulty

the very low density of GaAs integrated circuits

With regard to the D.E.S., there is a circuit running at 50 MHz that performs an encryption in 20 ns, which makes it possible to compute 50 million encryptions per second.

Since late 1995, AMD sells a circuit encrypting at 250 MHz.

In August 1993, the Canadian Michael J. WIENER described how to build, for 1 million dollars, a machine that performs an exhaustive search of the DES keys and finds the right key in 3.5 hours. Each of its basic circuits has a power equivalent to 14 million SUN workstations.

Although exhaustive search thus seems faster than these types of cryptanalysis (even if the number of attempts is smaller, the search time is much longer), cryptanalysis remains very interesting for measuring the strength of cryptographic algorithms.

We then analyse the two most successful cryptanalyses against DES.

Differential cryptanalysis consists of looking at the specifics of a pair of ciphertexts obtained from a pair of plaintexts with a particular difference.

It analyses the evolution of these differences as the plaintexts propagate through the rounds of DES while being encrypted with the same key.

After randomly choosing a pair of plaintexts with a fixed difference, the difference of the resulting ciphertexts is computed. Using these differences, it is possible to associate probabilities with various bits of the subkeys. The more ciphertexts are analysed, the more clearly the most likely encryption key will emerge.

Since the strength of DES resides in its rounds, and all operations of a round are completely linear except the S-tables, Eli BIHAM and Adi SHAMIR analysed the 8 S-tables with respect to input text differences and output text differences; this information is synthesized in 8 tables called the difference distribution tables of DES (see the 8 tables in Annex 3). We implemented the algorithm that generates these tables.
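Our generation algorithm follows the definition directly. The sketch below (where the sbox array is a placeholder to be filled with one of the 8 DES S-tables, which map a 6-bit input to a 4-bit output) counts, for every input difference dx and output difference dy, the number of inputs x such that S(x) XOR S(x XOR dx) = dy.

```c
#include <stdint.h>

/* Placeholder: fill with one of the 8 DES S-tables (64 entries, values 0..15). */
static uint8_t sbox[64];

/* Difference distribution table: table[dx][dy] = number of inputs x
   such that sbox[x] XOR sbox[x XOR dx] == dy. */
void difference_table(int table[64][16])
{
    for (int dx = 0; dx < 64; dx++)
        for (int dy = 0; dy < 16; dy++)
            table[dx][dy] = 0;

    for (int dx = 0; dx < 64; dx++)
        for (int x = 0; x < 64; x++)
            table[dx][sbox[x] ^ sbox[x ^ dx]]++;
}
```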

Linear cryptanalysis consists of studying statistical linear relationships between bits of the plaintext, bits of the ciphertext and bits of the key used for encryption. These relationships make it possible to recover the values of some key bits when the plaintexts and the associated ciphertexts are known.

The linear relations of each S-table are deduced by choosing a subset of input bits and output bits and computing the parity (XOR) of these bits; the relation is linear when the parity of the subset is zero. In general, a subset will have some entries with parity 0 (linear) and others with parity 1 (affine).

MATSUI computed, for each S-table, the number of zero parities of each subset of input and output bits among the 64 x 16 = 1024 possible subsets. It is then possible to associate probabilities with various bits of the subkeys. The probabilities of a zero parity (linear relationship) are synthesized in 8 tables called the linear approximation tables of DES (see the 8 tables in Annex 4). We implemented the algorithm that generates these tables.
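The linear approximation tables can be generated in the same spirit: for each input mask a and output mask b (the 64 x 16 possible subsets of bits), the sketch below counts how many of the 64 inputs give a zero parity for the selected bits. Here again sbox is a placeholder for one of the 8 DES S-tables.

```c
#include <stdint.h>

static uint8_t sbox[64];   /* placeholder for one of the 8 DES S-tables */

/* Parity (XOR of all bits) of an integer. */
static int parity(unsigned v)
{
    int p = 0;
    while (v) { p ^= v & 1; v >>= 1; }
    return p;
}

/* Linear approximation table: table[a][b] = number of inputs x (out of 64)
   for which the parity of the selected input bits (x & a) equals the parity
   of the selected output bits (sbox[x] & b), i.e. the subset has parity 0. */
void linear_table(int table[64][16])
{
    for (int a = 0; a < 64; a++)
        for (int b = 0; b < 16; b++) {
            int count = 0;
            for (int x = 0; x < 64; x++)
                if (parity(x & a) == parity(sbox[x] & b))
                    count++;
            table[a][b] = count;
        }
}
```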

1.4 Applied neuro-cryptography

After showing the possible association between neural networks and cryptography, we define the field of neuro-cryptography. All terms used in cryptography must be preceded by the prefix "neuro" when the cryptosystem contains one or more neural networks, or one or more elements of such a network, for example the perceptron.

We then analyze some important points for the correct use of neural networks.

How the training base is generated is very important for the realization of neural applications. Learning depends on the random initialization of the network weights, as well as on the number of examples, the order of presentation of these examples and the consistency in the choice of a set of examples.

An example is composed of a value to be presented at the input of the neural network and a value to be presented at the output of this network, the output value depending on the input value. If the number of examples is too low, it is clear that the network will not seek a transfer function of the studied cryptosystem but will instead memorize the given examples, and will therefore be unable to find a result for an input value different from those given in the base of examples. In cryptography, more than half of all possible examples must be presented to be certain of the results, even though in strong cryptography the number of possible input values is very large.

We then implemented an algorithm to present the examples in a more or less shuffled order. It consists of cutting the base into k sub-bases and then presenting the elements of each of the sub-bases in turn (k can be even or odd); a sketch of this interleaving is given after the table below. The following table shows the final error rate Tss for different values of k (the number of presentations being fixed at 500, with 256 examples).

k      1     2     3     4     5     6     7     8
Tss    0.05  0.06  0.06  0.05  0.08  0.07  0.05  0.08

We note that the order of presentation of the training base has little influence on the final result.
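Our reading of this interleaving is sketched below: the base of N examples is split into k sub-bases, and one example is taken from each sub-base in turn; present_example is a hypothetical callback standing for the actual presentation routine of the learning program.

```c
#include <stddef.h>

/* Present a base of N examples split into k equal sub-bases, taking one
   example from each sub-base in turn, which shuffles the presentation order.
   `present_example` is the routine that feeds example `idx` to the network. */
void present_interleaved(size_t N, size_t k, void (*present_example)(size_t idx))
{
    size_t sub = N / k;                 /* size of one sub-base            */
    for (size_t i = 0; i < sub; i++)    /* position inside each sub-base   */
        for (size_t j = 0; j < k; j++)  /* sub-base index                  */
            present_example(j * sub + i);
}
```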

Concerning the automatic generation of contiguous texts, we present an algorithm that can generate plaintext examples for an arbitrary number of nested loops using a single loop body, executed as the body of the innermost loop would be on each iteration.
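A minimal sketch of this idea in C: instead of writing one nested loop per byte position, a single loop body is executed for every combination by incrementing the text like an odometer. The emit_text callback is a hypothetical stand-in for whatever uses the generated plaintext (for example, writing it into the training base).

```c
/* Generate every possible text of `len` bytes over an alphabet of `base`
   symbols (0 .. base-1) with a single loop body instead of `len` nested loops. */
void generate_texts(int len, int base, void (*emit_text)(const unsigned char *, int))
{
    unsigned char text[16] = {0};
    if (len > 16) return;
    for (;;) {
        emit_text(text, len);                       /* body of the innermost loop */
        int pos = len - 1;
        while (pos >= 0 && ++text[pos] == base) {   /* carry propagation          */
            text[pos] = 0;
            pos--;
        }
        if (pos < 0) break;                         /* all combinations done      */
    }
}
```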

The learning coefficient, usually denoted Epsilon and also called the learning rate, allows more or less rapid learning, with a probability of convergence of the network towards a solution that is inversely proportional to it, because of the local minima of the error curve measured between the training base and the output values computed by the neural network. Epsilon should be varied empirically between 0.1 and 2.0. If the network still refuses to converge, this is most likely due to a non-linearly separable problem, which is the case when learning the XOR. A momentum term should then be used, with a real value between 0.1 and 1.0; it aims to avoid local minima of the error function by taking into account, in the current learning step, the contribution of the previous steps. A sketch of the corresponding update rule is given below.
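A minimal sketch of the corresponding weight-update rule, assuming the deltas and the neuron outputs have already been computed by back-propagation; EPSILON is the learning rate and ALPHA the momentum term discussed above, and the layer sizes NIN and NOUT are arbitrary placeholders.

```c
#define NIN  16       /* neurons feeding the layer (placeholder size)          */
#define NOUT 8        /* neurons in the layer (placeholder size)               */
#define EPSILON 0.5   /* learning rate, tuned empirically between 0.1 and 2.0  */
#define ALPHA   0.9   /* momentum term, between 0.1 and 1.0                    */

/* Update every weight W[i][j] of one layer, keeping the previous change
   dW[i][j] as momentum: dW(t) = EPSILON * delta_j * x_i + ALPHA * dW(t-1). */
void update_weights(double W[NIN][NOUT], double dW[NIN][NOUT],
                    const double x[NIN], const double delta[NOUT])
{
    for (int i = 0; i < NIN; i++)
        for (int j = 0; j < NOUT; j++) {
            dW[i][j] = EPSILON * delta[j] * x[i] + ALPHA * dW[i][j];
            W[i][j] += dW[i][j];
        }
}
```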

Self-learning can be interesting for the neural learning of cryptographic algorithms. The neural system has two parts, the emulator and the controller, whose learning phases are carried out separately.

The task of the emulator is to simulate the complex function or the encryption algorithm. Its inputs are therefore the state of the system at a given time and the input at that time, and its output is the output of the algorithm at the following time. The input of the controller is the state of the system at time k; its output is the value to feed as input to the algorithm or complex function. The proper role of the controller is to learn the adaptive control law. For this learning, however, the error signal is not computed on the command itself but on its result, the gap between the desired state and the current state. This is closer to guided learning than to supervised learning, because no teacher teaches the system the control law. In fact, the system teaches itself by processing the information it receives in return for its actions. To make learning by back-propagation possible and to back-propagate the error on the position, the structure of the emulator must be homogeneous with that of the controller.

Another quality of this device is its capacity for on-line learning. The learning of the controller is fast. In addition, the synthesized control law is sufficiently robust to small random perturbations. It is therefore possible to build self-learning neural networks on a communication line, for encryption as well as for real-time message authentication.

We present several different applications.

For learning the XOR, i.e. computing C = A XOR B, we need a network with a 16-bit input (the 2 bytes A and B) and an 8-bit output (the byte C). The network therefore has 16 input neurons, a minimum of 16 neurons in the hidden layer(s) and 8 output neurons. The training base consists of 65536 cause-effect pairs. After various tests, the success rate of XOR learning is very close to 100%, depending on the random weight initialization and the number of presentations. The greater the number of input and hidden-layer neurons, the more the number of presentations of the base can be reduced. If the random initialization of the weights is good, a single presentation can be sufficient and of better quality.
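A minimal sketch of how this training base can be built, assuming one floating-point value per bit (0.0 or 1.0): the 65536 cause-effect pairs enumerate every pair of bytes (A, B), the effect being C = A XOR B.

```c
/* Build the XOR training base: 65536 examples, 16 input bits (bytes A and B)
   and 8 output bits (byte C = A XOR B), each bit stored as 0.0 or 1.0. */
void build_xor_base(double input[65536][16], double output[65536][8])
{
    int n = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++, n++) {
            int c = a ^ b;
            for (int bit = 0; bit < 8; bit++) {
                input[n][bit]     = (a >> bit) & 1;   /* byte A */
                input[n][8 + bit] = (b >> bit) & 1;   /* byte B */
                output[n][bit]    = (c >> bit) & 1;   /* byte C */
            }
        }
}
```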

Learning a cryptographic algorithm means determining a function or an algorithm combining input data (causes) to produce output data (effects). It is therefore a matter of determining the input and output structures of the network and of finding a base of causes and associated effects sufficient for the learning of the network to converge to a minimal, or almost minimal, amount of error.


The question that arises is how to make the neural network memorize the algorithm. The answer is to present virtually all possible encryption keys (e.g. 64 bits) and all possible plaintexts (e.g. 64 bits) as input, and to compute all the resulting ciphertexts with the encryption algorithm. Thus, the neural network will have synthesized the algorithm: when presented with an encryption key and a plaintext as input, it will give the expected ciphertext as output.

If the encryption algorithm is bijective (that is, if presenting the ciphertext as input yields the plaintext as output), then the encryption algorithm is the same as the decryption algorithm and the neural network also decrypts.

With regard to key learning, an encryption key must be linked to an encryption or decryption algorithm and to a plaintext or ciphertext.

If the key has a fixed size of N bits, the neural network should have N output bits and M input bits, equal to twice the number of bits of the plaintext and ciphertext blocks.

In fact, the neural network realizes a function that finds the key directly from a plaintext and the corresponding ciphertext.


We then present the advantages and disadvantages of the neural methods used. The learning time of neural networks remains rather long, depending on the number of bits of the key and of the plain and encrypted texts; this time can be reduced if the neural network is implemented on a parallel machine.

As regards memorizing keys and encryption algorithms, neural networks are high achievers, with over 90% success in learning weak ciphers. A strong encryption algorithm requires fast learning. Neural networks are widely used in image recognition, so it is simple to perform authentication with them. At the level of the hardware architecture, it is easy to parallelize the algorithms, for neural networks as well as for ciphers based on hardware architectures, but this solution is quite expensive. The design of neuro-ciphers can be useful in cases where a secret key and an encryption algorithm are taught to the network in order to hide information from the user, in particular at the level of a key generator that could be kept secret by a distributing body: it would be very difficult for a cryptanalyst to discover the function of the key-generation algorithm. Neuro-cryptanalysis seems to be a far more promising application of neural networks, because of their emergent properties of massively parallel statistical analysis and their property of concentrating information, i.e. approximating statistical matrices.

One of the most important applications of neuro-cryptography is neuro-cryptanalysis. Neuro-cryptanalysis consists of performing the cryptanalysis of cryptographic algorithms with the help of neural networks, i.e. building one or more neural networks to find, or help find, the key of an encryption algorithm. The important principle is presenting a ciphertext and the encryption algorithm to the neural network.

In neuro-cryptanalysis, the neural network helps find the encryption key used to produce the ciphertext.

A neural network can learn a cryptographic algorithm or can "remember" (by function approximation) a set of keys. This neural network structure is identical to that of self-learning. It is clear that neural networks can take an important place in cryptography, in the design, use and verification of protocols.

We test and present the possible forms of neuro-cryptanalysis.

To neuro-cryptanalyze a Vigenère cipher, our neural network would have to perform either a frequency analysis or an analysis of subsets of n characters of a given language, and then measure the correlation between the learned plaintext and the ciphertext for all subsets of n characters. This type of problem can be solved by a neural network but would take very long in supervised learning. It is however possible to carry it out in self-learning mode, provided the ciphertext is large enough.

We measure the performance of neural networks at the statistical level with the neuro-differential cryptanalysis and the neuro-linear cryptanalysis of DES, according to the following scheme:

[Figure: scheme of the neuro-differential and neuro-linear cryptanalysis of DES]

These performances proved to be particularly good.

We then implemented a neuro-cryptanalyzer of the Unix command crypt(3), or ufc_crypt (ultra fast crypt), an implementation of DES used to encrypt the passwords stored in the /etc/passwd file. It is a little special in the sense that the key is unknown to the user; nobody has the ability to decrypt a password. This key is specific to the Unix system in use. We thought it would be interesting to have a neural network learn a certain number of clear passwords and the corresponding encrypted passwords. The training base should be large enough that the DES learning does not become a mere memorization of the examples of this base, which would make the network unable to find the solutions for other examples close to those of the base.

We therefore built two applications. One, for UNIX (or GNU Linux), synthesizes the Unix crypt function for clear passwords of 4 characters whose values are a lowercase letter, a dot or a slash, i.e. about 615000 passwords, with 2 hours of computation per presentation. The other, for MS-DOS, learns 1024 clear passwords of 7 characters and the corresponding encrypted passwords of 11 characters (we remove the first 2 characters of salt, which are used to re-encrypt the encrypted password and yield 65536 different encrypted passwords for the same plaintext).


We added a program to visualize the statistics of the first application graphically; the second application provides its information quickly.

We can deduce the following results.

The differential and linear neuro-cryptanalyses are probabilistic calculation methods used to quickly obtain information about a part of DES. They make it possible to compute the inverse function of an S-table, for a chosen text difference in one case and for a linear relationship with a chosen subkey in the other. The learning of such neural networks is very fast.

For a given method, differential or linear, it is possible to gather 8 x 16 = 128 neural networks (one for each S-table of each round) and to operate them in parallel on the information given by the ciphertext at the output of DES, back towards the plaintext at the input. These networks may then act as supervisors of other, unsupervised neural networks that modify the key bits as the different texts pass through the DES. This would be a self-learning of the subkeys. From the subkeys, we recover the encryption key.

Statistical analysis of the results of the MS-DOS version is surprising, with 90% of the encryption function found by the neural network for the examples of the base, and about 80% of the bits found for examples close to this base but not presented to the network. This proves that, for a small training base, it is easy for a neural network to find a clear password from an encrypted password, without taking into account the salt added by the Unix system.

We present then two architectures.

The first is a dedicated parallel architecture, since a neuro-cryptanalyzer of strong ciphers needs very fast supervised learning. It is necessary to present all plaintexts, ciphertexts and keys to the neural network. The following figure gives an overview of the learning machine dedicated to an encryption algorithm.


A complete machine can be constructed on the same pattern, with a large number of units made of binary counters and of circuits implementing the encryption algorithm. This number is limited by the learning time of the single neural circuit, approximately 1 s. For the D.E.S., it is preferable to treat a fixed data subset, as we did in the previous applications.

In the second, we present our algorithms written for the distributed architecture of the CM-5, using 3 layers of processors with one processor per neuron. The first layer is used to initialize the input (plaintext) and the output (ciphertext) of the neural network, which is located on layers 2 and 3. The example learning time is likely longer than for the dedicated machine of the preceding paragraph.

The performances are as follows:

the learning time is quite long (from several days to several years), but interesting results (an error rate close to zero) are obtained after few presentations when the base of examples is large enough, which is the case for strong algorithms such as the D.E.S. or the R.S.A.; for simple operations such as the XOR, it takes between 200 and 500 presentations to reach a zero error rate. Once learning is done, the time needed for information to pass through the neural network is very short (on the order of tens of nanoseconds), which is prodigious when we know that an exhaustive search must otherwise be repeated for each text encrypted with a different key.

1.5 Structure of the thesis

Chapters 2 and 3 present neural networks and cryptography in a clear manner and define the choices that set the direction of our research.

In Chapter 4, we define neuro-cryptography and the settings needed to use it properly in the creation of applications.

Chapter 5 presents neuro-cryptanalysis, from XOR-based ciphers to more complex ciphers. The study of the neuro-cryptanalysis of DES shows the performance of the neuro-cryptographic applications. Different applications support our conclusions on the performance of neural networks.

In the supplementary Chapter 6, we give various definitions to clarify certain points on which current cryptography is based: reminders of information theory, complexity of algorithms and number theory.

You will then find the bibliography, HTML pages on the Internet and an annex with source codes and various documents.

Chapter 2 - Neural networks

2.1 Introduction

In this chapter, after some necessary definitions, we present the current means of linking neural networks to cryptography. We present the neural network model and the learning procedure best suited to cryptography. We then describe the algorithm and the benefits of such a model, in particular the analysis of linear multilayer networks used to evaluate their performance at the statistical level. Finally, we review various hardware aspects, knowing that learning must be as fast as possible.

2.2 Basic concepts and terminology

A self-organizing network is a network of simultaneously active processing elements (nodes and connections) whose time-varying local interactions determine the overall behaviour of the system. Among such networks, connectionist models use digital information and are dynamic systems that perform computations similar to those of a neuron.

A connectionist model is characterized by: a network (a set of nodes) connected by directed links or connections; an activation rule (a local procedure at each node updating its activation level based on its inputs and its current activation, each node performing this procedure in parallel); and a rule for learning or adaptation (a local procedure that describes how connections vary over time, meaning that the weight of a connection is updated according to its current value and the activation levels of the nodes it connects, each node performing this procedure in parallel).

The concept of intelligence as an emergent property of self-organization is an underlying principle of this type of network.

The first neural networks appeared in 1943 with the logical neurons of McCULLOCH; various forms of networks exist.

Feed-forward multilayer networks are the most interesting: they have an input layer, an output layer and one or more intermediate, so-called hidden, layers (figure 2.2.1).

Figure 2.2.1 - Feed-forward multilayer network (input, hidden and output layers)

There are 3 possible learning modes: supervised, unsupervised and reinforced.

Supervised learning is the most suitable for memorizing a cryptographic algorithm or remembering a set of private encryption keys, because this learning uses a teacher giving the system the desired inputs and outputs.

Supervised learning consists of presenting to the inputs and outputs of the network a base of causes and effects (unlike unsupervised learning, where the effects are not presented). The network is then asked to compute the outputs corresponding to the causes presented at its inputs, and the sum of the errors over all the neurons of the network is measured. The base of causes and effects must be presented repeatedly until the measured error is almost nil.

Neural networks behave well even when the base is not complete, because they generalize: the acquired information is delocalized over the entire surface of the network. It is important that the number of neurons and the number of hidden layers be chosen according to the number of network inputs, the number of elements of the base to be presented and the number of presentations.

Figure 2.2.2 presents the response of a neural network during the learning phase; we can see how the error decreases along the presentations of the base of causes and effects.

Figure 2.2.2 - Learning phase

The other two modes of learning are best in automatic control and correction of errors.

For more detailed information, see (BOURRET 1991).

2.3 The current situation

The neurons, or perceptrons, currently used are elements made up of a number of inputs and an output; each input is weighted by an amplification factor, and the output is activated by comparing the sum of the weighted inputs with the activation threshold.

You will find all the models of neural networks in figure 2.3.1. The detail of each of these models as well as a complete description are contained in (MAREN).

You can consult the documents of authors of previous models: (GROSSBERG 1986) (HEBB, 1975), (HOPFIELD 1982), (KOHONEN 1984), (ROSENBLATT 1959), (RUMELHART 1986), (LIPPMAN, 1987) (MCCULLOCH 1943) and (WEISHBUCH 1989).

Neural network                                        Authors and dates                  Advantages / disadvantages and learning

Perceptrons with gradient back-propagation            WERBOS, PARKER, RUMELHART. 1987    Fast learning, low memory

Bidirectional associative memory                      KOSKO. 1987                        Low storage capacity, slow search

CAUCHY machine                                        CAUCHY. 1986

Brain-state-in-a-box                                  ANDERSON. 1977                     Unknown performance

HOPFIELD auto-associative memory                      HOPFIELD. 1982                     Low memory

KOHONEN auto-associative memory                       KOHONEN. 1981                      Slow learning, unknown number of presentations

Learning vector quantization / self-organizing maps   KOHONEN. 1981                      Slow learning, unknown number of presentations

Figure 2.3.1 - Models of neural networks

Among these networks, for cryptography we should take the one that allows fast learning with little memory capacity, because the purpose of using such a network is the approximation of a transfer function or the synthesis of cryptographic algorithms.

The perceptron neural network has the advantage of being currently well known and of meeting our needs: it is easy to implement and its performance is very interesting.

2.4 Are neural networks used in cryptography?

A few applications have been studied in the context of image or file compression and of message identification (with no completed application) (PATHMA 1995). We believe that, apart from secret military projects, no neural network is used for encryption, decryption or cryptanalysis. However, some students specialized in cryptography in France and Belgium appear to be interested. But no literature or medium contains information on this subject.

2.5 What types of neural networks are used in cryptography?

As we have seen in paragraph 2.3, the model of perceptrons with back-propagation of the gradient is the most studied and has demonstrated its reliability with respect to the learning of the XOR; these networks are simple to implement and learn quickly.

The advantages of the use of neural networks are:

their generalization property

their low sensitivity to noise (if an error sneaks into the basis of examples)

their low sensitivity to faults (lost connections, modified weights or a bug in the program)

the information is delocalized (distributed over the whole network)

capabilities for statistical computation and heuristic search

This model is the best suited to synthesis and to searching for associations or recognition. In addition, all the states and outputs of the neurons of these networks can be updated simultaneously (see the code of the XOR learning in Annex 1). A critique of learning algorithms supports our choice of this model: (CAMARGO 1990). Paragraph 2.8 presents these benefits in detail.

2.6 The model structure of perceptrons with back-propagation of the gradient

Figure 2.6.1 on the next page shows the structure of the model of perceptrons with back-propagation of the gradient. It shows the input bits, the hidden layer and the output layer, as well as the deltas of the hidden layer, those of the output layer and the activations used for learning.

The choice of the number of hidden-layer neurons must obey a compromise that optimizes learning while avoiding the overfitting which would result from too large a number of hidden units. This choice is often the result of know-how and practical experience. It can be guided by statistical considerations.

Figure 2.6.1 - Structure of the perceptron model with gradient back-propagation

This architecture can be implemented either in software (a sequential program on a single-processor computer) or in hardware (massively parallel machines). Implementations were realized on the CM-1, CM-2 and MASPAR, and their performances have been measured (see paragraph 2.9).

2.7 The gradient back-propagation algorithm

Supervised learning in this case consists of measuring the error between the desired and computed outputs and then propagating this error back to the neurons of the hidden layers and of the input layer. The transfer function f is a sigmoid, whose differentiability plays an important role. Figure 2.7.1 shows (a) the layer architecture and the transfer function, (b) the calculation of the error signal for an output unit and (c) the calculation, by back-propagation, of the error signal of a hidden unit.

Figure 2.7.1 - Learning by back-propagation

The error back-propagation formula is delta_i = f'(e_i) * sum_k (delta_k * w_ki).

Here is the algorithm for N input neurons, M output neurons and N_k neurons in hidden layer number k:

1. Initialize the weights of the connections randomly

2. Present a cause (X_1, X_2, ..., X_N) and the associated effect (S_1, S_2, ..., S_M)

3. Calculate the outputs of each of the hidden layers and of the output layer neurons by the perceptron formulas

first layer: x_j = f( sum_i W_ij * X_i )

second layer: x_k = f( sum_j W_jk * x_j )

and so on, with the sigmoid transfer function f(e) = 1 / (1 + e^(-e)).

4. Recursively change the weights of the connections, from the output neurons back towards the hidden layers. W_ij is the weight between neuron i and neuron j, x_i is the output of neuron i and eta is the learning factor:

W_ij(t + 1) = W_ij(t) + eta * delta_j * x_i

delta_j = x_j * (1 - x_j) * (s_j - x_j) if the x_j are the outputs of the output neurons

delta_j = x_j * (1 - x_j) * sum_k (delta_k * W_jk) for a hidden neuron j; the sum is done over all the neurons k of the next layer connected to neuron j.
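A minimal C sketch of one learning step for a network with a single hidden layer, following steps 1 to 4 above; the sizes N, P, M and the learning factor are placeholders, and the complete C source of the back-propagation network used in this thesis is given in Annex 1.

```c
#include <math.h>

#define N 16      /* input neurons  (placeholder size) */
#define P 16      /* hidden neurons (placeholder size) */
#define M 8       /* output neurons (placeholder size) */
#define ETA 0.5   /* learning factor */

static double sigmoid(double e) { return 1.0 / (1.0 + exp(-e)); }

/* One presentation of a cause X[N] and its effect S[M]:
   forward pass, back-propagation of the error, weight update. */
void learn_step(double W1[N][P], double W2[P][M],
                const double X[N], const double S[M])
{
    double h[P], y[M], dy[M], dh[P];

    /* step 3: forward pass through the hidden and output layers */
    for (int j = 0; j < P; j++) {
        double e = 0.0;
        for (int i = 0; i < N; i++) e += W1[i][j] * X[i];
        h[j] = sigmoid(e);
    }
    for (int k = 0; k < M; k++) {
        double e = 0.0;
        for (int j = 0; j < P; j++) e += W2[j][k] * h[j];
        y[k] = sigmoid(e);
    }

    /* step 4: deltas of the output layer, then of the hidden layer */
    for (int k = 0; k < M; k++)
        dy[k] = y[k] * (1.0 - y[k]) * (S[k] - y[k]);
    for (int j = 0; j < P; j++) {
        double sum = 0.0;
        for (int k = 0; k < M; k++) sum += dy[k] * W2[j][k];
        dh[j] = h[j] * (1.0 - h[j]) * sum;
    }

    /* weight updates: W(t+1) = W(t) + ETA * delta_j * x_i */
    for (int j = 0; j < P; j++)
        for (int k = 0; k < M; k++) W2[j][k] += ETA * dy[k] * h[j];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < P; j++) W1[i][j] += ETA * dh[j] * X[i];
}
```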

In the remainder of this thesis, the neural networks discussed will be networks of perceptrons with back-propagation of the gradient.

2.8 Analysis of linear multilayer networks

The success of the gradient back-propagation algorithm led researchers to analyze the process in detail. They showed analogies with different statistical methods of data analysis, in particular linear regression and discriminant analysis. In this paragraph, we rely on the publications of P. GALLINARI and F. FOGELMAN-SOULIÉ (GALLINARI 1988), which compare the classical method of discriminant analysis and the linear multilayer perceptron (with one layer of hidden units). In the linear case, it is shown that back-propagation performs a discriminant analysis of a population of N individuals (N being the number of examples included in learning) described by n parameters (where n is the number of input neurons) and projected onto a hyperplane of dimension p (where p is the number of hidden units).

These results are then used to validate an incremental construction of the hidden layer. It is thus shown that when a set of q hidden units is added, it is not necessary to repeat all the learning: it suffices to freeze the existing connections and to train only the connections relating to the units just added. We can therefore consider an incremental construction of the layer of hidden neurons, which saves precious learning time but implies a variable structure.

The general interest of this approach is to show how the comparison of connectionist algorithms and classical methods suggests a permanent enrichment of the former, allowing them to increase their performance.

2.8.1 Problem of the linear multilayer perceptron

The perceptron addresses a supervised classification problem. The number of input features is n (the number of input neurons), the number of classes is m (the number of output neurons) and the number of examples in the training base is N.

We assume N > n > m, which is the case for a reasonable classification problem.

Let X be the n x N matrix of inputs, X = (x_1, ..., x_N), and Y the m x N matrix of imposed outputs, Y = (y_1, ..., y_N). The optimal linear classifier is the linear map f from the input space to the output space minimizing the quadratic distance between Y and fX. The problem is therefore to find the matrix M of dimension (m x n) minimizing || Y - MX ||^2. The solution to this problem is given in (BOURRET 1991, pages 189-212) by the Penrose pseudo-inverse: it is the matrix W = YX+. Although the quadratic error function is convex, the uniqueness of the solution of the minimization problem is not ensured; there may be local minima.

The interest of the study is to analyse the linear multilayer case in order to approximate the behavior of the back-propagation algorithm in the non-linear case (the case of the XOR in Chapter 4). The solution of the minimization problem is the matrix PW, where P is the projector onto the subspace of R^m generated by the p eigenvectors of C = WXY^t associated with the p largest eigenvalues.

2.8.2 Discriminant analysis of rank p

Discriminant analysis of rank p consists of finding the best subspace of dimension p of R^n such that the classes of the projections of the input vectors on this subspace are separated as well as possible. The following theorem is shown in (BOURRET 1991): given a classification problem, let M = HK be the optimal realization, for the quadratic criterion, of this classification by a linear perceptron with a layer of p hidden neurons; then K performs a discriminant analysis of rank p.

2.8.3 Incremental learning of the hidden layer

A serious shortcoming of the gradient back-propagation algorithm is that it applies only to an already structured network in which the number of hidden neurons is fixed.

The following incremental learning procedure is justified in (BOURRET 1991): the learning algorithm is first applied to a network with only a minimal number of neurons in the hidden layer. When an optimal weighting of the connections has been reached and the performance of the network is still not satisfactory, a hidden unit is added and the learning algorithm is applied only to the connections related to this neuron. The operation is repeated until a satisfactory performance is obtained.

Remember that beyond p = rank(W), it is pointless to increase the number of hidden neurons. The role of the hidden neurons is clear: each neuron detects a feature contributing to the classification. These features are non-redundant (orthogonality of the eigenvectors) and their contribution to the separation of the classes is decreasing (ranking by decreasing moduli of the eigenvalues).

2.8.4 Relations with the principal component analysis

Back-propagation with p hidden neurons projects the data onto a space of dimension p corresponding to the one that would be found by principal component analysis. Moreover, in practice, principal component analysis builds these components one by one, in decreasing order of the moduli of the eigenvalues of the covariance matrix of the input data, until the sum of these moduli divided by the trace of the matrix reaches a fixed threshold. The incremental construction of a back-propagation network answers the same concern, the corresponding threshold in this case being the error observed on the outputs.

It can therefore be concluded that the results obtained by back-propagation could also be obtained by more traditional methods of data analysis (discriminant analysis, principal component analysis), except that back-propagation operates in a massively parallel fashion. However, the non-linearities of the neural units modify the studied behavior. These changes, observable by numerical experimentation, have been reported in (GALLINARI 1988). Notably, in the non-linear case, excess neurons, instead of extracting surplus features that are negligible for the classification (orthogonality of the eigenvectors), behave like the neurons of the previous layers and contribute to the robustness and to improving the performance of the classifier.

2.9 Hardware

The hardware aspect is very important for cryptography, because the implementation of neural networks in VLSI (Very Large Scale Integration components) allows faster and better suited applications.

Learning a large number of keys and texts is then faster.

The most studied circuits are digital VLSI; their advantages are:

ease of use

a high signal-to-noise ratio

easy cascading of circuits

high adaptability (these circuits can solve various tasks)

a reduced manufacturing cost

For more details, one should read the reports written by Dr. Valeriu BEIU on the implementation and optimization of VLSI neural networks (BEIU 1995a), (BEIU 1995b).

Figure 2.9.1 below shows a comparison of different hardware technologies for the implementation of neural networks.

Figure 2.9.1 - Comparison of different hardware technologies

A VLSI implementation of a neural network requires 4 blocks (see figure 2.9.2):

the summation (of the inputs of a neuron) with logic adders

the multiplication (by the weights) with parallel multipliers

the non-linear transfer function, with either a full calculation circuit, a table containing approximate values of the function, or an approximation circuit (for the sigmoid with steps of 1/5 and an error < 13%, 4 comparators and a few logic gates suffice (ALIPPI 1990))

the storage of values (S-RAM or D-RAM memories)

Figure 2.9.2 - CMOS circuit with 1024 synapses and distributed neurons

As regards back-propagation, NIGRI built a circuit containing a lookup table of all the values of the sigmoid between -2 and 2 with 8-bit precision, which is regarded as sufficiently precise (NIGRI 1991).
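As an illustration of this table-based approach, here is a small C sketch that precomputes the sigmoid on [-2, 2] over 256 entries with 8-bit output precision, clamping to 0 or 255 outside the interval; this is only our reading of the principle, not NIGRI's actual circuit.

```c
#include <math.h>
#include <stdint.h>

#define TABLE_SIZE 256   /* 8-bit index: 256 samples of the sigmoid on [-2, 2] */

static uint8_t sigmoid_table[TABLE_SIZE];

/* Fill the table: entry i holds sigmoid(-2 + 4*i/255) scaled to 0..255. */
void init_sigmoid_table(void)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        double e = -2.0 + 4.0 * i / (TABLE_SIZE - 1);
        sigmoid_table[i] = (uint8_t)(255.0 / (1.0 + exp(-e)) + 0.5);
    }
}

/* Table lookup in place of the exact sigmoid (result also on 8 bits). */
uint8_t sigmoid_lookup(double e)
{
    if (e <= -2.0) return 0;
    if (e >=  2.0) return 255;
    int i = (int)((e + 2.0) * (TABLE_SIZE - 1) / 4.0 + 0.5);
    return sigmoid_table[i];
}
```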

Here are the three types of existing components on the market or research laboratory:

1. dedicated digital neural components, whose network speeds reach 1 billion connections processed per second

L-neuro Philips (Duranton 1988, 1989, 1990) *

X 1 and N64000 of Adaptive Solutions (Adaptive 1991, 1992; Hammerstrom 1990) *

Ni1000 Intel (Scofield, 1991; Holler 1992) *

p-RAM of King's College London (Clarkson 1989-1993) *

WSI's Hitachi (Yasunaga 1989, 1990, 1991) *

1.5-V chip (Watanabe1993) *

2. special-purpose digital coprocessors (also called neuro-accelerators): special circuits that can be connected to hosts (PCs or workstations) and that work with a neuro-simulator program. The mix of hardware and software gives these benefits:

accelerated speed, flexibility and an improved user interface.

Delta Floating Point Processor by SAIC (DARPA 1989) * connected to a PC

ANZA, Balboa Hecht - Nielsen Computers (Hecht-Nielsen 1991) * with a speed of the order of 10 Mega-connections per second

implementations on RISC, DSP or Transputer processors

3. neural networks on massively parallel machines

WARP (Arnould 1985; Kung 1985, Annaratone 1987) *

CM (MeansE 1991) *

RAP (Morgan 1990; Beck 1990) *

SANDY (Kato 1990) *

MUSIC (Gunzinger1992; Mueller 1995) *

MIND (Gamrat 1991) *

SNAP (Hecht-Nielsen 1991; Means R1991) *

GF-11 (Witbrock 1990; Jackson 1991) *

Toshiba (Hirai 1991) *

MANTRA(Lehmann 1991, 1993) *

SYNAPSE (Ramacher 1991a, 1991b, 1992, 1993;) Johnson1993a) *

HANNIBAL (Myers 1993) *

BACCHUS and PAN IV (Huch 1990; Pochmuller1991; Palm 1991) *

PE RISC (Hiraiwa 1990) *

RM-nc256 (Erdogan 1992) *

Hitachi WSI (Boyd 1990; Yasunaga 1989-1991) *

MasPar MP-1 (Grajski 1990; MasPar 1990 a - c;Nickolls 1990) *

CNS-1 (Asanovic 1993 b) *

For more information or the references of the machines above (with an asterisk), you can consult (Beiu 1995 c).

You will find in Annex 2 a list of electronics manufacturers who have implemented neural networks in silicon.

An implementation of the above-mentioned algorithm was developed on the Connection Machine CM-2 (created by THINKING MACHINES Corp.), a hypercube topology with 64K processors, which gave 180 million interconnections computed per second (IPS), or 40 million weights updated per second.

Here are the performances measured per machine, in interconnections computed per second (figure 2.9.3).

CM-2          180 million

CRAY X-MP      50 million

WARP (10)      17 million

ANZA PLUS      10 million

Figure 2.9.3 - performance of parallel machines

The use of such configurations would allow to obtain excellent results in learning of cryptographic ciphers.

You will find in chapters 4 and 5 How to use the implementation of neural networks on the Connection machine CM-2 or CM-5 in Cryptography.

2.10 Conclusion

In this chapter, we see that the neural network model most interesting model is the perceptron in back-propagation of the gradient and supervised learning is the most suitable. In addition, the use of the networks of neurons in cryptography is very low and even very little known while the study which has been made so far of neural networks allows to say that perceptrons networks are able to learn to synthesize a transfer function fairly easily. They allow to give statistics, as well as more traditional statistical methods, based on the values of entries making it very useful in Cryptography. It also emerges that neural networks are currently at the level of hardware implementing comprehensive enough and made at the industrial level. These networks can be perfectly parallel and excessively fast.

Everything indicates that neural networks should be linked to cryptography, but which cryptography is appropriate? And which cryptographic tools should be used? The answers are in the following chapters.

Chapter 3 - Cryptography

3.1 Introduction

We give in this chapter the important definitions needed to understand the rest of our work, as well as clarifications regarding the current situation of "known" cryptography; we then describe the composition of cryptographic algorithms, weak and strong. We detail the D.E.S. in particular because, after more than 20 years of existence, it remains the most used and the most studied, especially at the level of its cryptanalysis, which is very difficult.

3.2 Definitions

Cryptography is the art of hiding (encrypting) messages.

A cryptosystem is a hardware or software system performing cryptographic operations; it can contain one or more encryption algorithms.

Cryptanalysis is the art of breaking codes or cryptosystems, i.e. of finding the key in order to read all or part of the message.

Cryptology is the mathematical study of cryptography and cryptanalysis.

An original message is called plaintext or clear text.

A resulting message is called cipher text.

An encryption key is a code to encrypt a plaintext.

A decryption key is a secret code to decrypt a ciphertext.

A private key allows the encryption and decryption, it must be secret.

A public key allows only encryption, it may be broadcast; only the person with the associated private key can decrypt the message.

Exhaustive search is the testing of the set of all possible keys to find the decryption key. Feel free to consult (FAQ 1996).

3.3 Contemporary Cryptography

Cryptography is a very large area, popular among mathematicians and computer scientists. Nowadays, cryptography is the study of more or less strong encryption of messages or files and the study of protocols to exchange them over private networks and other means of communication. In the study of ciphers one also finds the means to recover keys or to reduce the exhaustive search for keys: this is cryptanalysis.

3.3.1 The cryptosystem and its strength

The strength of a cryptosystem lies in the key used and in the encryption algorithm (or cipher) if it is kept secret (which is reserved for the military).

The key size must be large (512, 1024 or 2048 bits is reasonable) so that the unicity distance is large (see the supplementary Chapter 6), and the key generator must be powerful or secret.

The ciphertext should appear random to all standard statistical tests

The cryptosystem must withstand all known attacks.

However, even if the cryptosystem meets the previous criteria, we cannot conclude that this system is infallible!

The cryptosystems are of two types: public key or private key.

A private-key cryptosystem with key K is defined by D_K(C_K(M)) = M, where C_K is the encryption function, D_K the decryption function, M a clear message and C_K(M) the encrypted message.

3.3.2 Protocols

Protocols are a series of steps followed by participants (at least two) to accomplish a task. Cryptographic protocols allow participants to exchange secret information between them.

Applications using them include data communications, authentication, management of private and public keys, splitting of messages, mixing of messages, access to databases, timestamping services, subliminal messages, digital signatures, collective signatures, bit commitment, playing heads or tails, playing blind poker, zero-knowledge proofs, electronic money and anonymous messages. The best would be a protocol with intrinsic discipline, because it would itself ensure the integrity of the transaction (without intermediary or "arbitrator"); its construction would make challenges impossible. There are none!

The study of protocols is well documented in (SCHNEIER 95). In the pages that follow we will focus on the neural implementation and the neuro-cryptanalysis of cryptosystems rather than on protocols making the exchange of information between participants more secure.

3.3.3. The types of attacks in cryptanalysis

Cryptanalysis distinguishes between the following different types of possible attacks:

ciphertext-only: the attacker must find the plaintext having only the ciphertext. A ciphertext-only attack is practically impossible; everything depends on the cipher.

known-plaintext: the attacker has the plaintext and the corresponding ciphertext. The ciphertext was not chosen by the attacker, but the message is compromised anyway. In some cryptosystems, a single plaintext - ciphertext pair can compromise the security of the system as well as the transmission medium.

chosen-plaintext: the attacker has the ability to obtain the ciphertext corresponding to an arbitrary plaintext of his choice.

chosen-ciphertext: the attacker can arbitrarily choose a ciphertext and obtain the corresponding plaintext. This attack may reveal weaknesses in public-key systems, and even allow the private key to be found.

adaptive chosen-plaintext: the attacker can determine the ciphertexts of chosen plaintexts in an iterative and interactive process based on the results previously found. An example is differential cryptanalysis.

Some of these attacks can be interesting when they are used against strong ciphers. See (FAQ 96) and (SCHNEIER 95) for details of these attacks.

3.4 Cryptographic algorithms

In general, the plaintext M is divided into blocks of bits of fixed length: M = M_1 M_2 ... M_N.

Each block M_i is encrypted, C_i = E_K(M_i), and the result is appended to the ciphertext C = C_1 C_2 ... C_N.

There are 2 main types of coding: block coding and stream coding.

In block coding, the size of a block must be large to prevent an attack: it is usual to use 64 bits, i.e. 2^64 search possibilities. The transformation function T(M) = C is the same for each block, which saves memory and makes encoding relatively fast.

In stream coding, blocks are encoded sequentially and each block is encoded by a separate transformation which depends on:

1. previous coded blocks, and/or

2. previous processing, and/or

3. the number of blocks

This information must be in memory between each coding of blocks. If the transformation varies in each block, the block size can be short (usually between 1 and 8 bits).

The same clear text or message M will therefore not necessarily give the same ciphertext C.

Block coding is a substitution coding in which the plaintext and ciphertext blocks are binary vectors of length N. For each key, the encryption function E_K(M) is a permutation of the set {0,1}^N onto itself. D_K(C) is the decryption function (the inverse permutation), such that D_K o E_K = E_K o D_K = identity.

There are 4 modes of encryption: ECB, CBC, OFB and CFB.

ECB mode (Electronic Code Book):
C_i = E_K(M_i) and M_i = D_K(C_i)

CBC mode (Cipher Block Chaining):
C_i = E_K(M_i XOR C_{i-1}) and M_i = D_K(C_i) XOR C_{i-1}

OFB mode (Output FeedBack):
V_i = E_K(V_{i-1}) and C_i = M_i XOR V_i

CFB mode (Cipher FeedBack):
C_i = M_i XOR E_K(C_{i-1}) and M_i = C_i XOR E_K(C_{i-1})

Any encryption algorithm can be implemented in these modes.
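As a hedged illustration of the ECB and CBC chaining described above, here is a minimal C sketch in which the one-byte functions E and D (a simple XOR with the key) are deliberately trivial stand-ins for any real block algorithm; only the chaining logic is of interest, the key, IV and message values being arbitrary.

#include <stdio.h>
#include <stddef.h>

/* trivial one-byte "block cipher" standing in for E_K / D_K (assumption) */
static unsigned char E(unsigned char k, unsigned char m) { return m ^ k; }
static unsigned char D(unsigned char k, unsigned char c) { return c ^ k; }

int main(void)
{
    unsigned char key = 0x5A, iv = 0x3C;
    unsigned char M[4] = {'T', 'E', 'S', 'T'};
    unsigned char Cecb[4], Ccbc[4], P[4];

    /* ECB mode: C_i = E_K(M_i) */
    for (size_t i = 0; i < 4; i++)
        Cecb[i] = E(key, M[i]);

    /* CBC mode: C_i = E_K(M_i XOR C_{i-1}), with C_0 = IV */
    unsigned char prev = iv;
    for (size_t i = 0; i < 4; i++) { Ccbc[i] = E(key, M[i] ^ prev); prev = Ccbc[i]; }

    /* CBC decryption: M_i = D_K(C_i) XOR C_{i-1} */
    prev = iv;
    for (size_t i = 0; i < 4; i++) { P[i] = D(key, Ccbc[i]) ^ prev; prev = Ccbc[i]; }

    printf("ECB byte 0: %02X, CBC round trip: %.4s\n", Cecb[0], (const char *)P);
    return 0;
}

With a real cipher, identical plaintext blocks give identical ciphertext blocks in ECB, whereas in CBC the chaining makes them differ, which illustrates the remark above on the same M not necessarily giving the same C.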

As regards our work, we will focus specifically on the ECB mode, which best fits the learning of neural networks with an input and an output of a fixed number of bits and no feedback loop, although it is possible to connect one or more neural networks in this way; the learning time would just be much longer.

3.4.2 The Vigenère cipher

The encryption algorithm based only on the XOR is called the Vigenère cipher (the code is located in the annex).

Encryption is performed between a clear text M and a key of N characters:

1. M is divided into blocks of N characters

2. For each block, the XOR operation is performed between the block and the key.
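A minimal C sketch of these two steps follows (the full program is in the annex); the message and key values are purely illustrative.

#include <stdio.h>
#include <string.h>

/* XOR of the message with the key repeated block by block (the cipher of 3.4.2) */
static void vigenere_xor(unsigned char *msg, size_t len,
                         const unsigned char *key, size_t klen)
{
    for (size_t i = 0; i < len; i++)
        msg[i] ^= key[i % klen];        /* step 2: XOR between block and key */
}

int main(void)
{
    unsigned char text[] = "ATTACK AT DAWN";   /* illustrative plaintext */
    const unsigned char key[] = "KEY";          /* illustrative key       */
    size_t len = strlen((char *)text);

    vigenere_xor(text, len, key, 3);            /* encryption             */
    vigenere_xor(text, len, key, 3);            /* decryption is the same operation */
    printf("%s\n", text);                       /* prints ATTACK AT DAWN  */
    return 0;
}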

This algorithm is trivial to break, if we accept that the characters are ASCII and the length of the key is unknown:

1. You must first discover the key length by a process called counting of coincidences (FRIEDMAN 1920): compare the encrypted text to itself shifted by a given number of bytes and count the number of identical bytes. If the two blocks of text placed face to face have been encoded with the same key, more than 6% of the bytes will be equal. If they have been encoded with different keys, then less than 0.4% of the bytes will be equal. The smallest shift showing a high coincidence is the length of the key sought.

2. Then the ciphertext must be shifted by this length and the XOR applied between the ciphertext and the shifted text. This operation removes the key and leaves the result of the XOR of the plaintext with itself shifted. The rate of the English language is between 1 and 1.5 bits/letter, 1.2 for Shannon; that of French is between 1 and 1.8 bits/letter (see Chapter 6). There is enough redundancy to choose the correct decryption.
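Here is a minimal, self-contained sketch of the coincidence counting of step 1. The plaintext and the 3-character key are assumptions chosen only to produce a peak at the multiples of the key length; on such a short sample the percentages are only indicative.

#include <stdio.h>
#include <string.h>

/* percentage of equal bytes between the text and itself shifted by 'shift' */
static double coincidences(const unsigned char *c, size_t len, size_t shift)
{
    size_t equal = 0, n = len - shift;
    for (size_t i = 0; i < n; i++)
        if (c[i] == c[i + shift]) equal++;
    return 100.0 * (double)equal / (double)n;
}

int main(void)
{
    /* illustrative plaintext and key (the real attack works on a captured ciphertext) */
    static unsigned char buf[] =
        "the quick brown fox jumps over the lazy dog and the dog sleeps "
        "the quick brown fox jumps over the lazy dog and the cat watches ";
    const unsigned char key[] = "KEY";            /* key length 3, to be rediscovered */
    size_t len = strlen((char *)buf);

    for (size_t i = 0; i < len; i++)              /* Vigenere encryption (step 2 of 3.4.2) */
        buf[i] ^= key[i % 3];

    for (size_t shift = 1; shift <= 8; shift++)   /* step 1: counting of coincidences */
        printf("shift %zu : %.1f %% equal bytes\n",
               shift, coincidences(buf, len, shift));
    /* the smallest shift with a clearly higher rate reveals the key length */
    return 0;
}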

The code in C of this program is in the annex.

This cipher is too weak to be secure!

3.4.3 The strong ciphers

There are two kinds of strong encryption algorithms: ciphers based only on the XOR operation between text and code, themselves based on very large prime numbers, and the others.

An example of the first case is the R.S.A. (RIVEST, SHAMIR and ADLEMAN), which is a public-key cipher.

Here is the algorithm:

1. Decompose data into blocks of length equal to the length of the code word

2. Make a XOR between the block (modified by a given encryption) and code (key or subkey encrypted)

3. Write the encrypted block

4. Repeat step 2 for each block

This algorithm is the same as in almost all encryption algorithms; the differences come from the generation of the keys used to encrypt or decrypt.

In the R.S.A., it is necessary to generate codes (2 public codes and 3 secret codes) to encrypt and decrypt, so the authors had to:

1. choose two large prime numbers p and q (512 bits),

2. make the product n = pq,

3. choose randomly d coprime with (p-1)(q-1), between max(p,q) + 1 and n-1,

4. calculate e = d^-1 modulo (p-1)(q-1).

This gives n and e public and p, q, d secret.

The R.S.A. is based on number theory (see Chapter 6), in particular on the difficulty of factorizing a number into its prime factors. Its strength lies in the size of these factors. For more details on the R.S.A., one should absolutely read (ALDEMAN 78).
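As a hedged illustration of steps 1 to 4 above, here is a minimal C sketch with classic toy values (p = 61, q = 53, d = 2753, e = 17); these numbers are illustrative only and far too small to be secure, real keys using 512-bit primes.

#include <stdio.h>

/* modular exponentiation by squaring, enough for these toy sizes */
static unsigned long modexp(unsigned long a, unsigned long x, unsigned long n)
{
    unsigned long r = 1;
    a %= n;
    while (x > 0) {
        if (x & 1) r = (r * a) % n;
        a = (a * a) % n;
        x >>= 1;
    }
    return r;
}

int main(void)
{
    unsigned long p = 61, q = 53;            /* step 1 (toy primes)                    */
    unsigned long n = p * q;                 /* step 2: n = 3233, public               */
    unsigned long d = 2753;                  /* step 3: secret, coprime with 3120      */
    unsigned long e = 17;                    /* step 4: e = d^-1 mod 3120, public      */

    unsigned long m = 65;                    /* a clear block, m < n                   */
    unsigned long c = modexp(m, e, n);       /* encryption:  c = m^e mod n             */
    unsigned long m2 = modexp(c, d, n);      /* decryption:  m = c^d mod n             */

    printf("m = %lu, c = %lu, decrypted = %lu\n", m, c, m2);   /* decrypted = 65 */
    return 0;
}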

PGP (Pretty Good Privacy), by Zimmermann, combines the R.S.A. and the use of very long prime numbers.

In the second case, there is the D.E.S., which we describe in the next paragraph; it works with a private key. (LUCIFER is its ancestor; REDOC II, SNEFRU, KHAFRE, IDEA, LOKI and FEAL are algorithms of the same type, weaker than the D.E.S.)

3.5 A reference: the Data Encryption Standard (D.E.S.)

3.5.1 History

The algorithm dates from 1977; it was developed by the I.B.M. Corporation for the federal bureau of standards of the United States, which made it the encryption standard for all exchanges of confidential information (banking networks, smart cards, communications, etc.).

The D.E.S. combines transpositions and substitutions in a product cipher whose security level is much higher than that of the two base codes used (text and key). These substitutions are non-linear, which produces a cryptosystem resistant to cryptanalysis. It was also designed to withstand differential cryptanalysis, which at the time was classified by the army and unknown to researchers.

3.5.2 Architecture

3.5.2.1 The figure below shows a graphical representation of the internal architecture of the D.E.S. It uses 64-bit input blocks L0 and R0; the length of the key K is 56 bits (8 bytes without the last bit of each byte, used for parity). This key generates 16 different 48-bit sub-keys K1 to K16. Contrary to appearances this was very adequate, and it is a little less so these days, because it takes 2^56 encryptions to find the key with an exhaustive search.

The function f is called a round; the i-th round receives as inputs the right part R_i (the 32 bits of the text to be encrypted) and the subkey K_i (48 bits). The rounds of the D.E.S. are detailed below. It outputs 32 bits that are added to L_i. While R_i is passed on unchanged as L_{i+1}, the encrypted bits are transmitted to R_{i+1} (except for the final round).


Figure 3.5.2.1 - des

3.5.2.2 Figures (a) and (b) are the algorithms used by the D.E.S. The functions IP (bit permutation) and IP^-1 (inverse permutation of bits) can be ignored because they are well known and therefore do not add strength to the D.E.S.

We see that all of the encryption is based on expansions, reductions and permutations of bits. Apart from the round function, these operations are linear.

 

Separation of the key (56 bits) into the 16 sub-keys (48 bits per round):

C(0), D(0) = PC1(key)
for (i = 1; i <= 16; i++) {
    C(i) = LS(i)(C(i-1))
    D(i) = LS(i)(D(i-1))
    K(i) = PC2(C(i), D(i))
}

Notation:
L, R: low and high parts of the current text block
C, D: low and high parts of the compressed key
PC1, PC2: permutation and compression of the key
LS: shift
IP: initial permutation (fig. II-2b)
IP^-1: inverse permutation (fig. II-2b)
FP: exchange (fig. II-2b)

3.5.2.2 Figure (a) - D.E.S. algorithms

Coding of a block (64 bits):

L(0), R(0) = IP(plaintext block)
for (i = 1; i <= 16; i++) {
    L(i) = R(i-1)
    R(i) = L(i-1) ^ f(R(i-1), K(i))
}
ciphertext block = FP(R(16), L(16))

Decoding of a block (64 bits):

L(16), R(16) = IP^-1(ciphertext block)
for (i = 16; i >= 1; i--) {
    R(i-1) = L(i)
    L(i-1) = R(i) ^ f(L(i), K(i))
}
plaintext block = FP(L(0), R(0))

3.5.2.2 Figure (b) - D.E.S. algorithms
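To fix ideas, here is a minimal C sketch of the Feistel structure of figure 3.5.2.2 (b). The round function f, the toy key schedule and the test values are deliberately trivial assumptions standing in for the real D.E.S. tables (given in annex 1); the sketch only illustrates that applying the same rounds in reverse order decrypts.

#include <stdio.h>
#include <stdint.h>

/* placeholder round function: NOT the real DES f (the S-tables are in annex 1) */
static uint32_t f(uint32_t r, uint32_t k) { return (r * 0x9E3779B9u) ^ k; }

/* placeholder subkeys standing in for K(1)..K(16) */
static uint32_t K[17];

static void encode(uint32_t *L, uint32_t *R)
{
    for (int i = 1; i <= 16; i++) {   /* L(i) = R(i-1), R(i) = L(i-1) ^ f(R(i-1), K(i)) */
        uint32_t t = *R;
        *R = *L ^ f(*R, K[i]);
        *L = t;
    }
}

static void decode(uint32_t *L, uint32_t *R)
{
    for (int i = 16; i >= 1; i--) {   /* same rounds taken in reverse order */
        uint32_t t = *L;
        *L = *R ^ f(*L, K[i]);
        *R = t;
    }
}

int main(void)
{
    for (int i = 1; i <= 16; i++) K[i] = 0x01234567u * (uint32_t)i;   /* toy key schedule */

    uint32_t L = 0xDEADBEEFu, R = 0xCAFEBABEu;
    encode(&L, &R);
    printf("encoded: %08X %08X\n", L, R);
    decode(&L, &R);
    printf("decoded: %08X %08X\n", L, R);    /* back to DEADBEEF CAFEBABE */
    return 0;
}

The initial and final permutations IP, IP^-1 and FP are omitted here since, as noted above, they add no strength to the cipher.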

The D.E.S. combines 2 mathematical techniques: confusion and diffusion (see Chapter 6). The round f applies to the text a substitution (8 S-boxes or S-tables) followed by a permutation (P-box or P-table), based on the text and the key.

Figure 3.5.2.3, which follows, presents the synopsis of a round (the function f).


Figure 3.5.2.3 - a round f of the D.E.S.

The content of this round is presented in another form in the figure of paragraph 3.5.3.1. Various standards have emerged to standardize the exchange of D.E.S.-encrypted information; the ANSI standard references are X3.92: D.E.S., X3.106: modes of operation, X3.105: network, X9.19: authentication, X9.24: distribution of keys; the Federal standard references are 1027 and 1028.

3.5.3 Cryptanalysis

Figure 3.5.3.1, which follows, shows the architecture of a round with its S-tables which, unlike the other operations, are more or less half-linear/half-affine. If they were completely linear, the D.E.S. would be very easy to break, but they have been selected to withstand attacks. The subkey bits and those of the expanded text block are added, then substituted through the S-tables, then permuted.


Figure 3.5.3.1 - a round of des with its S-tables

Current research to break the D.E.S. without exhaustive search has managed to weaken it, but only a little. The results are in figure 3.5.3.2 and in (SCHNEIER, 1996).

 

                        Exhaustive search   Differential cryptanalysis   Linear cryptanalysis
Chosen plaintexts       2^56                2^47                         -
Known plaintexts        2^56                2^55                         2^43
D.E.S. operations       2^56                2^37                         -

Figure 3.5.3.2 - results of the different cryptanalyses (for the 16-round D.E.S.)

There are two types of cryptanalysis, differential cryptanalysis and linear cryptanalysis, which are described in paragraph 3.6.

The complete and commented code in C to the D.E.S. is located in Appendix 1.

3.5.4 The physical aspect

The physical aspect is very important for the speed of execution. VLSI components are very widespread and effective, but there are components based on an even more interesting technology that should not be disregarded: Gallium Arsenide (GaAs), or arsenide technology. It has already been used in supercomputers.

The major differences between GaAs and VLSI are:

fast switching of GaAs gates;

exchange with non-GaAs components is a major difficulty;

very low density of GaAs integrated circuits.

The gate times of GaAs (DCFL E/D-MESFET) are less than or equal to 50 picoseconds, while it takes at least a nanosecond in silicon (NMOS).

Access to GaAs RAM memory takes approximately 500 picoseconds, versus 10 nanoseconds in silicon. This indicates that the performance of computers based on GaAs technology should be 20 times higher than that of the fastest silicon-based supercomputers. On the other hand, the level of integration of GaAs is about 50,000 transistors per integrated circuit while it is 1 million in silicon, due to the problem of heat dissipation. This problem increases the number of GaAs circuits required to design a computer, and a high-performance computer must optimize the number of integrated circuits on the motherboard.

Communication of GaAs circuits with the outside is another factor. The problem is the slowdown forced by other components. However, signal propagation is not very different between silicon and GaAs. The only solution to this exchange-rate problem is to introduce a memory with a multi-level hierarchy. However, none exists for the moment that works with GaAs technology.

Although GaAs technology cannot be fully exploited for the moment, it is certainly a very interesting technology of the future for cryptography due to its excellent performance. If the CM-2 has an arsenide equivalent, it is the property of the military.

With regard to the D.E.S., there is a circuit running at 50 MHz performing an encryption in 20 ns, which makes it possible to perform 50 million encryptions per second.

Since late 1995, AMD sells a circuit encrypting at 250 MHz.

In August 1993, the Canadian Michael J. WIENER described how to build a machine for $1 million that performs an exhaustive search of D.E.S. keys and finds the right key in 3.5 hours. Each of its basic circuits has a power equivalent to 14 million SUN stations.

See (WIENER 1993) for more details on this machine.

It therefore seems obvious that exhaustive search is faster to perform than the other types of cryptanalysis: even if their number of attempts is smaller, their search time is much longer. Cryptanalysis nevertheless remains very interesting to measure the performance of cryptographic algorithms.

You will find in annex 9 the characteristics of the MASPAR and CM-5 machines.

3.6 The Cryptanalysis of des

3.6.1 Differential cryptanalysis

It is a chosen-plaintext attack on the rounds of the D.E.S. to find the key (the presentation of the various attacks was made in paragraph 3.3.3). In 1990 and 1991, Eli BIHAM and Adi SHAMIR created differential cryptanalysis; this method consists in looking at the specifics of a pair of ciphertexts obtained for a pair of plaintexts with a particular difference.

Differential cryptanalysis analyzes the evolution of these differences as the plaintexts propagate through the rounds of the DES while being encrypted with the same key.

After randomly choosing a pair of plaintexts with a fixed difference, one calculates the difference of the resulting ciphertexts. Using these differences, it is possible to associate different probabilities with the various bits of the sub-keys. The more ciphertexts are analyzed, the more clearly the most likely encryption key will emerge.

The strength of the D.E.S. residing in its rounds, and all operations of a round being completely linear except the S-tables (or S-boxes), Eli BIHAM and Adi SHAMIR analyzed the 8 S-tables for input text differences and output text differences; this information is synthesized in 8 tables called difference distribution tables of the D.E.S. (see the 8 tables in annex 3). We give the algorithm to generate these tables in figure 3.6.1.1. P is a plaintext, P* is another plaintext, X is the encrypted text of P, X* is the encrypted text of P*, P' is the difference of P and P*, X' that of X and X*.

Initialize the Table cells to 0
For t = 1 to 8 do                      // number of the S-table
    For P = 0 to 63 do
        For P* = 0 to 63 do
            P' = P xor P*
            X = S-table_t(P)
            X* = S-table_t(P*)
            X' = X xor X*
            Table_t[P'][X'] = Table_t[P'][X'] + 1
        End for
    End for
End for

Figure 3.6.1.1 - distribution tables generation algorithm
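As an illustration, here is a minimal C sketch of this generation for a single S-table; the array sbox stands for one of the 8 D.E.S. S-tables, whose 64 real values (given in annex 3) are not reproduced here and must be filled in for meaningful results.

#include <stdio.h>

/* placeholder: fill with the 64 output values (0..15) of the S-table under study */
static int sbox[64];

static int table[64][16];   /* difference distribution table */

int main(void)
{
    int p, ps, pd, xd;

    for (p = 0; p < 64; p++)              /* first input of the pair  */
        for (ps = 0; ps < 64; ps++) {     /* second input of the pair */
            pd = p ^ ps;                  /* input difference  P'     */
            xd = sbox[p] ^ sbox[ps];      /* output difference X'     */
            table[pd][xd]++;
        }

    /* the probabilities used in 3.6.1 are table[pd][xd] / 64.0 */
    printf("table[0x34][0x02] = %d\n", table[0x34][0x02]);
    return 0;
}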

Once these tables are generated, as pictured in figure 3.6.1.2 on the next page, it is possible to have information about B' (B' = B xor B*) as a function of C' (C' = C xor C*). So for a known text difference A' (A' = A xor A*), the combination of A' and C' suggests values for the bits of A xor K_t and A* xor K_t, which gives information on a few bits of the subkey K_t.

With this information, it is possible to do without a large number of chosen plaintexts.


Figure 3.6.1.2 - an analyzed round of the D.E.S.

The likelihood of having a pair of S-table inputs with difference P', given a pair of outputs with difference X', is p = Table[P'][X'] / 64. Recall that E is the expansion of the round and P the permutation function.

You will find the program generating the difference distribution tables in annex 1.

This attack also works fine on FEAL, IDEA, LOKI, REDOC II, SNEFRU, KHAFRE and LUCIFER. For more information, you can consult (BIHAM 1991), (BIHAM 1993a) and (BIHAM 1993b).

3.6.2 Linear cryptanalysis

It is a known-plaintext attack on the rounds of the D.E.S. to find the key.

It was in 1993 that Mitsuru MATSUI created linear cryptanalysis; this method consists in studying the statistical linear relationships between bits of a plaintext, bits of the ciphertext and the key used to encrypt. These relationships allow the values of some bits of the key to be found when the associated plaintexts and ciphertexts are known.

He deduced the linear relationships of each S-table by choosing a subset of the input bits and of the output bits, calculating the parity (XOR) of these bits, and checking whether the parity of the subset is zero. In general, some subsets will give parity 0 (linear) and the others parity 1 (affine).

MATSUI calculated the number of zero parities for each subset of input and output bits of each S-table, among the 64 x 16 = 1024 possible subsets. It is thus possible to associate different probabilities with the various bits of the sub-keys. The probabilities of obtaining zero parity (a linear relationship) are synthesized in 8 tables called linear approximation tables of the D.E.S. (see the 8 tables in annex 4). We give the algorithm to generate these tables in figure 3.6.2.1. P denotes the subset of input bits, C the subset of output bits, and K runs through the possible S-table inputs.

For t = 1 to 8 do                      // number of the S-table
    For P = 0 to 63 do
        For C = 0 to 15 do
            Table_t[P][C] = -32        // remove half
            For K = 0 to 63 do
                PA = (parity(S-table_t(K) & C) + parity(K & P)) & 1
                If (PA == 0) Then Table_t[P][C]++
            End for
        End for
    End for
End for

Figure 3.6.2.1 - linear approximations tables generation algorithm
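Here again is a minimal C sketch of this generation for a single S-table; as before, the sbox array is a placeholder to be filled with the 64 real values of annex 4's S-table, and the printed cell is chosen arbitrarily.

#include <stdio.h>

/* placeholder: the 64 output values (0..15) of one S-table */
static int sbox[64];

static int lin[64][16];   /* linear approximation table (values in -32..+32) */

/* parity (XOR of the bits) of an integer */
static int parity(int v)
{
    int p = 0;
    while (v) { p ^= (v & 1); v >>= 1; }
    return p;
}

int main(void)
{
    int pmask, cmask, k;

    for (pmask = 0; pmask < 64; pmask++)
        for (cmask = 0; cmask < 16; cmask++) {
            lin[pmask][cmask] = -32;          /* remove half, as in figure 3.6.2.1 */
            for (k = 0; k < 64; k++)
                if (((parity(sbox[k] & cmask) + parity(k & pmask)) & 1) == 0)
                    lin[pmask][cmask]++;
        }

    /* a non-zero cell gives a usable linear relationship (paragraph 3.6.2) */
    printf("lin[0x10][0x04] = %d\n", lin[0x10][0x04]);
    return 0;
}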

Once these tables are generated, if a cell of the table equals 0 then the probability is 32/64 = 1/2 and this information cannot be exploited to attack the D.E.S. On the other hand, if the value of this cell is non-zero, we have a linear relationship of probability p = 1/2 - Table_t[P][C]/64 giving information on the bits of the subkey K_t from the output bits of the S-table t.

You will find the program of generation of tables of linear approximations in annex 1.

You can consult (MATSUI 1994) and (harp, 1995). In (SCHNEIER, 1996), we learn that research is being carried out combining differential cryptanalysis and linear cryptanalysis.

3.7 Conclusion

In this chapter, you have seen the terminology and a set of points on which it is interesting to consider neuro-cryptography, especially in the study of encryption algorithms and their cryptanalyses, and in the hardware and software means of implementing cryptography. The D.E.S. and its cryptanalysis, studied with a neural network architecture, should prove the effectiveness of neural networks for memorization and probabilistic search on complex encryption algorithms. You will find in the following chapters the theories and the applications implemented to prove them.

Chapter 4 - Neuro-Cryptography

4.1 Introduction

In this chapter, we define the possible association between neural networks and cryptography. We then present neuro-cryptography as well as the range of possible applications to perform encryption, decryption and cryptanalysis of a chosen algorithm. This chapter also covers the formation of a learning base and the different parameters related to the learning of ciphers, and discusses self-learning in the context of applications controlling information on a communication line.

4.2 Can we bind Cryptography and Neural networks?

The two preceding chapters show that, although this has not been done (or only by the military, confidentially), neural networks can be useful in cryptography. The learning of neural networks must still be optimized and fast. On the other hand, the use of the network once trained is extremely fast and efficient.

To achieve satisfactory applications for the learning of a strong cipher, a great execution speed is required. This implies that the neural networks used must be implemented in a parallel hardware architecture, like the cryptographic algorithms. Nevertheless, it is possible to create software applications on smaller data to get results more quickly.

Weak ciphers can be simulated on a PC. The problem arises when you want to associate a cryptographic algorithm with a neural network in a single parallel architecture without wasting time in information exchange. You can build applications on strong ciphers, but not from a general point of view, i.e. on the entire algorithm; it is better in this case to simplify the task by working on small parts of the algorithm whose complexity is reduced. In addition, one can completely ignore linear or affine functions and endeavour to weaken the other functions through the synthesis ability of neural networks.

4.3 The new definitions

We now define the field of neuro-cryptography. All terms used in cryptography are preceded by the particle "neuro" when the cryptosystem contains one or more neural networks, or one or more elements of such a network, such as the perceptron.

4.3.1 Neuro-ciphering or neuro-encryption

It is the action of encrypting with a cryptosystem whose hardware or software architecture is based on the functioning of neural networks.

4.3.2 Neuro-deciphering or neuro-decryption

It is the action of deciphering with a cryptosystem whose hardware or software architecture is based on the functioning of neural networks.

4.3.3 The neuro-generator

A neuro-generator is a generator of all or part of a public or private encryption key, with a hardware or software architecture based on the functioning of neural networks.

4.3.4 Neuro-cryptanalysis

Neuro-cryptanalysis is the cryptanalysis of a cryptosystem using a hardware or software architecture based on the functioning of neural networks, and a neuro-cryptanalyzer is the means of performing this neuro-cryptanalysis. Chapter 5 is entirely devoted to neuro-cryptanalysis and its applications, in particular on a strong cipher like the D.E.S.

4.4 The generation of bases of learning

How the learning base is generated is very important for the realization of neural applications. Learning depends on the random initialization of the network weights as well as on the number of examples, the order of presentation of these examples and the consistency in the choice of the set of examples.

4.4.1 Examples

An example is composed of a value to be presented at the input of the neural network and a value to be presented at the output of this network, the output value depending on the input value.

If the number of examples is too low, it is clear that the network will not search for a transfer function of the studied cryptosystem but will instead memorize the given examples, and therefore cannot in any way find a result for an input value different from those given in the example base.

In cryptography, more than half of all possible examples must be presented to be certain of the results, even if it is true that in strong cryptography the number of possible input values is very large.

4.4.2 Order of presentation

If all possible examples are in the learning base, i.e. if for N input neurons there are 2^N - 1 examples presented, it is not necessary to present the examples in the order of generation (in general, ascending order).

We designed an algorithm to present the examples in a more or less complete disorder. It consists in cutting the base into k sub-bases and then presenting in turn the elements of each of the sub-bases (k can be even or odd).

The following algorithm uses n for the total number of examples of the learning base and p for the index of the current presentation; it returns the index of the example to present to the neural network:

Begin
    d = Integer(p * k / n);
    return ((p - Integer(d * n / k)) * k) + d;
End

Figure 4.4.2.1 - choice of an example in one of the sub-bases k

This mathematical formula is trivially demonstrated by recurrence because it is a sequence of discrete values.
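Here is a minimal C sketch of the function of figure 4.4.2.1; the function name and the values n = 8, k = 2 of the demonstration are illustrative assumptions.

#include <stdio.h>

/* index of the p-th example to present, for a base of n examples
   split into k sub-bases (figure 4.4.2.1) */
static int disorder_index(int p, int n, int k)
{
    int d = (p * k) / n;                   /* which sub-base       */
    return (p - (d * n) / k) * k + d;      /* interleaved position */
}

int main(void)
{
    for (int p = 0; p < 8; p++)            /* n = 8 examples, k = 2 sub-bases */
        printf("%d ", disorder_index(p, 8, 2));
    printf("\n");                          /* prints: 0 2 4 6 1 3 5 7 */
    return 0;
}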

The C source code is located in Appendix 1 (learning of the XOR in disorder). Figure 4.4.2.2 shows the final error rates TSS for different values of k (the number of presentations being fixed at 500 and the number of examples at 256).

We note that the order of presentation of the basis of learning is not useful.
We note that the order of presentation of the basis of learning is not useful.
k
1
2
3
4
5
6
7
8
TSS
0.05
0.06
0.06
0.05
0.08
0.07
0.05
0.08

Figure 4.4.2.2 - error for a disordered presentation rate

4.4.3 Automatic generation of texts

To generate a regular automatic learning base, i.e. one following a given alphabet and generating all possible examples in order, we need, for N characters of input to the encryption algorithm, N nested loops with a single loop body which will be executed at each iteration of the innermost loop, as shown in figure 4.4.3.1 for an alphabet of P characters.

The loop body retrieves the values of the counters and generates a plain text (one character of the text per counter); this text is encrypted by an encryption algorithm, which gives an example (plaintext - ciphertext) to present to the neural network.

For compteur1 = 0 to P-1 do
    For compteur2 = 0 to P-1 do
        ...
            For compteurN = 0 to P-1 do
                Corps(compteur1, compteur2, ..., compteurN)
            End
        ...
    End
End

Figure 4.4.3.1 - nested loops for the generation of ordered texts

The algorithm we present in figure 4.4.3.2 generates clear examples regardless of the number of nested loops N:

/* Initialize the loop counters and the end-condition values */
For b = 0 to N-1 do i_bcl[b] = 0; End
For b = 0 to N-1 do f_bcl[b] = P-1; End

/* Execute the nested loops */
Repeat indefinitely
    b = N-1;
    If (Corps(i_bcl) == true) Then exit;
    If (i_bcl[b] < f_bcl[b]) Then i_bcl[b]++;
    Else
        Label previous:
        i_bcl[b] = 0;                     /* Reset the counter to 0 */
        If (b == 0) Then exit; Else b--;
        If (i_bcl[b] < f_bcl[b]) Then i_bcl[b]++; Else go to previous;
    End else
End repeat

Figure 4.4.3.2 - variable nested loops for the generation of ordered texts

In this case, the Corps (body) function takes as arguments the values of the loop counters and returns a Boolean value indicating whether or not to exit the loops. b is the index of the current loop. An example of C source code is located in Appendix 1 (automatic generation of a learning base for the D.E.S.).
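A compact C equivalent of figure 4.4.3.2, written as an odometer-style counter, is sketched below; the values N = 3 and P = 4 and the corps function, which here only prints the generated text instead of encrypting it, are illustrative assumptions.

#include <stdio.h>
#include <stdbool.h>

#define N 3   /* number of characters, i.e. of nested loops (assumption) */
#define P 4   /* size of the alphabet (assumption) */

/* hypothetical loop body: builds one plaintext from the counters; the real
   body would encrypt it to form a learning example */
static bool corps(const int counter[N])
{
    for (int i = 0; i < N; i++)
        putchar('a' + counter[i]);
    putchar('\n');
    return false;                /* false = keep generating */
}

int main(void)
{
    int counter[N] = {0};

    for (;;) {
        if (corps(counter))
            break;
        int b = N - 1;           /* increment like an odometer */
        while (b >= 0 && ++counter[b] == P) {
            counter[b] = 0;
            b--;
        }
        if (b < 0)
            break;               /* all P^N texts have been generated */
    }
    return 0;
}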

4.4.4 The coefficient of learning

This coefficient, generally noted Epsilon and also called the learning rate, allows a more or less rapid learning, with chances of convergence of the network towards a solution that are inversely proportional to it, because of the local minima of the error curve measured between the learning base and the output values calculated by the neural network.

Epsilon should be varied empirically between 0.1 and 2.0. If the network still does not converge, it is certainly due to a non-linearly-separable problem, which is the case of the learning of the XOR. A momentum term should then be used, whose real value is between 0.1 and 1.0 and whose aim is to avoid local minima; it allows the current learning step to take the previous steps into account.

4.5 Self-learning

Self-learning can be interesting for the neural learning of cryptographic algorithms. The neural system consists of two parts, the emulator and the controller, whose learnings are carried out separately.

The task of the emulator is to simulate the complex function or the encryption algorithm. It therefore receives as input the state at a given time and an input at this time, and its output is the output of the algorithm at the following time. Learning is done by presenting a different input at every moment (figure 4.5.1).


Figure 4.5.1 - learning a complex function or an algorithm

Once the learning of the emulator is completed, it is connected to the controller (figure 4.5.2).


Figure 4.5.2 - learning of the controller through the emulator

The input of the controller is the state of the system at time k; its output is the value to be given as input to the algorithm or complex function. The proper role of the controller is to learn the adaptive control law. But for this learning, the error signal is calculated not on the command but on its result, the gap between the desired state and the current state. This is the idea of a guided rather than supervised learning, because no teacher teaches the control law to the system. In fact, the system teaches itself by processing the information it receives in return for its actions. To make learning through back-propagation possible and to back-propagate the error on the position, the structure of the emulator must be homogeneous with that of the controller.

Another quality of this device is its capacity for self-learning. The learning of the controller is fast. In addition, the synthesized control law is sufficiently robust to small random perturbations.

It is therefore possible to build self-learning neural networks on a communication line, for encryption as well as for authentication of messages in real time.

4.6 The realization of applications

4.6.1 The learning of the exclusive or (XOR)

The XOR is a simple operation that is particularly used in cryptography. Figure 4.6.1.1 below represents its truth table with a, b and c binary, c being the sum without carry of a and b.

The purpose of this paragraph is to show that the XOR is easily learned and that all XOR-based cryptographic applications are feasible with one or more neural networks. You will find how to cryptanalyze a simple 64-bit XOR-based cipher in Chapter 3.

a   b   c
0   0   0
0   1   1
1   0   1
1   1   0

Figure 4.6.1.1 - truth table of the XOR

To achieve C = A XOR B, we need a network with a 16-bit input (i.e. the 2 bytes A and B) and an 8-bit output (the byte C). The network must therefore have 16 input neurons, at least 16 neurons in the hidden layer(s) and 8 output neurons. The learning base consists of 65536 cause - effect pairs.

You can find the C code of this network in annex 1 (the learning coefficient is referred to as EPSILON). The success rate of the learning of the XOR is very close to 100%, depending on the random weight initialization and the number of presentations.

The greater the number of input and hidden-layer neurons, the more the number of presentations of the base can be reduced. If the random initialization of the weights is good, a single presentation can be sufficient and of better quality.
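As a reduced sketch of the same technique, here is a self-contained C program training a 2-2-1 perceptron with back-propagation on the 1-bit XOR of figure 4.6.1.1 (the annex 1 version works on bytes); the network size, seed and number of presentations are assumptions, and with an unlucky initialization the network can get stuck in a local minimum, as noted in 4.4.4, in which case another seed must be tried.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define EPSILON 0.5   /* learning coefficient, as in annex 1 */

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main(void)
{
    double wh[2][2], bh[2], wo[2], bo;       /* weights and thresholds */
    double in[4][2] = {{0,0},{0,1},{1,0},{1,1}};
    double out[4]   = {0, 1, 1, 0};          /* truth table of figure 4.6.1.1 */
    int i, j, e, pres;

    srand(1);                                /* random initialization */
    for (i = 0; i < 2; i++) {
        bh[i] = (double)rand()/RAND_MAX - 0.5;
        wo[i] = (double)rand()/RAND_MAX - 0.5;
        for (j = 0; j < 2; j++) wh[i][j] = (double)rand()/RAND_MAX - 0.5;
    }
    bo = (double)rand()/RAND_MAX - 0.5;

    for (pres = 0; pres < 10000; pres++)     /* presentations of the base */
        for (e = 0; e < 4; e++) {
            double h[2], o, dout, dh[2];
            for (i = 0; i < 2; i++)          /* forward pass */
                h[i] = sigmoid(wh[i][0]*in[e][0] + wh[i][1]*in[e][1] + bh[i]);
            o = sigmoid(wo[0]*h[0] + wo[1]*h[1] + bo);

            dout = (out[e] - o) * o * (1 - o);              /* output delta  */
            for (i = 0; i < 2; i++)
                dh[i] = dout * wo[i] * h[i] * (1 - h[i]);   /* hidden deltas */

            for (i = 0; i < 2; i++) {        /* weight updates */
                wo[i] += EPSILON * dout * h[i];
                for (j = 0; j < 2; j++) wh[i][j] += EPSILON * dh[i] * in[e][j];
                bh[i] += EPSILON * dh[i];
            }
            bo += EPSILON * dout;
        }

    for (e = 0; e < 4; e++) {                /* check the learned function */
        double h0 = sigmoid(wh[0][0]*in[e][0] + wh[0][1]*in[e][1] + bh[0]);
        double h1 = sigmoid(wh[1][0]*in[e][0] + wh[1][1]*in[e][1] + bh[1]);
        printf("%g xor %g ~ %.2f\n", in[e][0], in[e][1],
               sigmoid(wo[0]*h0 + wo[1]*h1 + bo));
    }
    return 0;
}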

The table in annex 8 gives the measured error rate for each presentation.

4.6.2 The learning of cryptographic algorithms

Just as in the previous paragraph, the aim is to determine a function or an algorithm combining input data (causes) to produce output data (effects).

It is therefore a matter of determining the input and output structures of the network and of finding a base of causes and associated effects sufficient for the learning of the network to converge towards a minimal, or even almost zero, amount of error.

Any encryption algorithm is structured as in figure 4.6.2.1.


Figure 4.6.2.1 - synoptic of an encryption algorithm

The question that arises is how to make the neural network memorize the algorithm. The only answer is to present as input virtually all possible encryption keys (e.g. 64 bits) and all possible plaintexts (e.g. 64 bits), and to calculate all the resulting ciphertexts with the encryption algorithm.

Thus, the neural network will have synthesized the algorithm since, when we present it with an encryption key and a plain text as input, it will give us as output the expected ciphertext.

If the encryption algorithm is bijective (that is, if when the encrypted text is presented as input, the plaintext is obtained as output), then the encryption algorithm is the same as the decryption algorithm and the neural network also decrypts.

Initialize the network weights randomly
Repeat
    for each key do
        for each clear text do
            Encrypt the clear text with the key
            Initialize the network inputs with the clear text
            Calculate the outputs of the network
            Initialize the desired outputs of the network with the cipher text
            Calculate the deltas of the network
            Modify the weights of the network
            Measure the error of the network
        end for
    end for
until the error is almost zero

Figure 4.6.2.2 - learning algorithm

Figure 4.6.2.2 presents the learning algorithm independently of the "Encrypt" function, which computes the ciphertext from the provided clear text.
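The enumeration of the whole example base required by figure 4.6.2.2 can be sketched in C as follows; the 4-bit XOR cipher standing in for "Encrypt" is a deliberate toy assumption, the back-propagation step itself being indicated only as a comment since it is the one of annex 1.

#include <stdio.h>

/* toy cipher standing in for "Encrypt" (assumption): 4-bit key, 4-bit text */
static unsigned char encrypt(unsigned char key, unsigned char clear)
{
    return (unsigned char)((clear ^ key) & 0x0F);
}

int main(void)
{
    /* enumeration of the example base (key, clear) -> cipher,
       i.e. 2^4 * 2^4 = 256 examples here */
    for (unsigned key = 0; key < 16; key++)
        for (unsigned clear = 0; clear < 16; clear++) {
            unsigned char cipher = encrypt((unsigned char)key, (unsigned char)clear);
            /* here the pair (key, clear) would be presented as network input
               and cipher as the desired output (back-propagation step) */
            printf("%u %u -> %u\n", key, clear, cipher);
        }
    return 0;
}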

If the number of bits of the plaintext is 64 and that of the key is 56, this gives us 2^120 examples to present to the neural network, which may be huge in calculation time if the encryption function is long.

Hence the importance of the physical aspect and dedicated architectures.

Various applications can be carried out, including the cryptanalysis of the D.E.S., which you can see in Chapter 5 of this memoir.

4.6.3 Key learning

A single encryption or decryption key has no meaning on its own; it must be linked to an encryption or decryption algorithm and to a clear or encrypted text.

If the key has a fixed size of N bits, then the neural network has N output bits and M input bits, M being equal to two times the number of bits of the plaintext and ciphertext blocks.

Figure 4.6.3.1 shows the texts as input and the key as output:


Figure 4.6.3.1 - memorization of key

In fact, the neural network realizes a function that finds the key directly from a plaintext and the corresponding encrypted text.

4.7 The advantages and disadvantages

The learning time of neural networks remains quite long, depending on the number of bits of the key and of the clear and encrypted texts; this time can be optimized if the neural network is implemented on a parallel machine.

As regards memorizing keys and ciphers, neural networks are high achievers, with over 90% success in the learning of weak ciphers. For a strong encryption algorithm, rapid learning is required.

Neural networks are used extensively in image recognition, so it is simple to perform authentication with them.

At the level of the hardware architecture, it is easy to parallelize the algorithms, both at the level of the neural networks and of the ciphers based on hardware architectures. But this solution is quite expensive financially.

The design of neuro-encryption can be useful in cases where a secret key and an encryption algorithm are taught to a network in order to hide the information from the user, in particular at the level of the key generator, which could be kept secret by a distributing body. It would be difficult for a cryptanalyst to discover the function of the encryption key generation algorithm.

Neuro-cryptanalysis is an application much better adapted to neural networks, due to their emergent properties of massively parallel statistical analysis and their property of concentrating information or approximating statistical matrices. Chapter 5 on neuro-cryptanalysis should enlighten you about the possibilities of neural networks.

In addition, for a cryptographic problem of complexity class PSPACE requiring a very large memory capacity, the neural network is compact and its size is fixed.

4.8 Conclusion

We have defined in this chapter the association of two broad areas, neural networks from Artificial Intelligence and contemporary cryptography. We presented neuro-cryptography as well as the range of possible applications to perform encryption, decryption and cryptanalysis of a chosen algorithm. This chapter also covered the formation of a learning base and the different parameters related to the learning of ciphers, and discussed self-learning in the context of applications controlling information on a communication line. The learning of a strong encryption algorithm being quite long and requiring the use of parallel machines, one can use neural networks to synthesize an encryption algorithm with a given key, this algorithm and this key being kept secret, for example by a distributing body.

Chapter 5 - Neuro-cryptanalysis

5.1 Introduction

In this chapter, we present the neuro-cryptanalysis of strong ciphers, the general principle being the search for the key by a study based on neural networks, namely the learning of the keys as functions of the clear and encrypted texts. Then we describe applications. We present the differential neuro-cryptanalysis and the linear neuro-cryptanalysis of the D.E.S., allowing us to measure the statistical performance of neural networks. A dedicated hardware application is described last.

5.2 Definition

Neuro-cryptanalysis consists in performing the cryptanalysis of cryptographic algorithms with the use of neural networks, i.e. in building one or more neural networks to find, or help find, the key of an encryption algorithm.

The reader will find in Chapter 3 an introduction to applications in neuro-cryptanalysis. A neuro-cryptanalyzer then means a system performing the cryptanalysis of a cryptographic algorithm; this system is a hardware device or software program containing at least one neural network useful for the cryptanalysis in question.

5.3 General principle

The important principle is the presentation to the neural network of a ciphertext and of the encryption algorithm.

In neuro-cryptanalysis, the neural network must help find the encryption key used to produce the ciphertext; figure 5.3.1 shows a possible architecture of a neuro-cryptanalyzer.


Figure 5.3.1 - Overview of neuro-cryptanalyseur

According to Chapter 2, a neural network can learn a cryptographic algorithm or can "remember" (by function approximation) a set of keys; I therefore infer that the neuro-cryptanalyzer can be broken down into 2 neural sub-networks as follows:


Figure 5.3.2 - a neuro-cryptanalyseur learning

This neural network structure is identical to that of self-learning in paragraph 4.5.

The applications carried out in the following paragraph will allow us to check the learning described in Chapter 2.

It is clear that neural networks can take an important place in cryptography in the design, use, and verification of protocols presented in Chapter 3.

5.4 Applied Neuro-cryptanalysis

5.4.1 Neuro-Cryptanalysis of the Vigenère

This cipher, as well as its cryptanalysis, is explained in paragraph 3.4.2.

To neuro-cryptanalyze such an algorithm, our neural network should perform either a frequency analysis or an analysis of a subset of n characters of a given language, and then measure the correlation between the learned plaintext and the ciphertext for all subsets of n characters.

This type of problem can be solved by a neural network but would take very long in supervised learning. However, it is possible to carry it out in self-learning mode, provided the ciphertext is large enough.

5.4.2 The differential neuro-cryptanalysis of the DES

Differential cryptanalysis is described in paragraph 3.6.1.

To better understand the information given by the difference distribution tables of BIHAM and SHAMIR, we have generated, for each S-table, tables with the S-table output values as abscissa and the bits of the S-table inputs as ordinate. These tables are in Appendix 5; one can therefore directly see the probabilities p = Table[P'][bits of X'] / 64 of having any particular bit depending on the output value.

The question is the following: by presenting to a neural network pairs of plaintexts at the input of an S-table and pairs of ciphertexts at its output, would it give probabilities close or not to the previous tables for each of the input bits?

We have to create a neural network with 16 input bits (each of these bits corresponds to one of the 16 output values forming the abscissa of the preceding tables) and 6 output neurons giving the probability of having a 1 on one of the 6 bits of the S-table inputs.


Figure 5.4.2.1 - use of the differential neuro-cryptanalyseur

For the example base, the learning algorithm and the realization of this neural network, you can read the C code in annex 1. Figure 5.4.2.1 presents the neuro-cryptanalyzer after learning; it returns information about the probability of having a bit of P' equal to 1. One does not directly get probabilities on the bits of the subkey; it suffices to make a XOR between the bits of the input text pair and those calculated, to get information on the bits of the subkey.

The neural network, at the end of 10 presentations of the 4096 examples (pairs of texts among the 64 S-table input texts), gives the results contained in the table in annex 6. It suffices to increase the number of presentations to get more accurate probability values. Note that the obtained probabilities exactly match the values given by the classical method of differential cryptanalysis.

The advantage of the neural network is its concentration of the statistical matrices specific to the set of S-tables and its massively parallel operation, which allows the 8 neuro-cryptanalyzers of the 8 S-tables to be run simultaneously.

5.4.3 The linear neuro-cryptanalysis of the DES

Linear cryptanalysis is described in section 3.6.2.

The neural network will generate all the quadratic forms for obtaining information on the outputs on the basis of its inputs, which amounts to a generalized linear cryptanalysis of the D.E.S. Generalized linear cryptanalysis looks for information about the key from the study of the rounds of the D.E.S., and more precisely of its S-tables, which is different from the global study of the cryptosystem by our neuro-cryptanalyzer.


Figure 5.4.3.1 - use of the linear neuro-cryptanalyseur

Unlike differential neuro-cryptanalysis, we should not try to simplify the linear approximation tables, because taking the sum of the probabilities for each bit would be a loss of information. Indeed, these sums are all almost equal. Instead, we should create a neural network with 16 input bits (each of these bits corresponds to one of the 16 output values forming the abscissa of the preceding tables) and 6 output neurons giving the probability of having a 1 on one of the 6 bits of the S-table inputs. The advantage of the neural network is that it returns excellent probability values. One can check the correlation between the bits of the neuro-cryptanalyzer tables and the linear approximation tables, for the input values of each output value.

The example base, the learning algorithm and the realization of this neural network are in annex 1.

Figure 5.4.3.1 presents the neuro-cryptanalyzer after learning; it returns information about the probability of having a 1 on one of the 6 bits of the S-table inputs. One does not directly get probabilities on the bits of the subkey; it suffices to make a XOR between the plaintext bits and those calculated, to get information on the bits of the subkey.

The results are given in annex 7. Just increase the number of presentations to get more accurate probability values.

5.4.4 Global neuro-cryptanalysis of the Unix crypt(3)

The Unix command crypt(3), or ufc_crypt (ultra fast crypt), is an implementation of the D.E.S. used in the encryption of passwords stored in the /etc/passwd file. It is a little special in the sense that the key is unknown to the user; no one has the ability to decrypt a password. This key is specific to the Unix system in use. The goal is not to find the clear password: the given clear password is encrypted with the same key and compared with the password from the /etc/passwd file. If they are identical, the user is authenticated and accesses his own account.

Crack is an application seeking the passwords of users on a Unix server. Its role is to generate a set of clear passwords on the basis of a multitude of syntactic rules and/or from a dictionary. It takes several hours to several days to penetrate a system, then retrieve the password file and search for others.

We thought it would be interesting to have a neural network learn a certain number of clear passwords and the corresponding encrypted passwords. The learning base should be large enough that the D.E.S. learning does not become a memorization of the examples of this base, which would make the network unable to find the solutions for other examples close to those of the base.

We have therefore made two applications. One runs under UNIX (or GNU Linux) and synthesizes the crypt function of Unix for clear passwords of 4 characters whose values are a lowercase letter, a point ('.') or a slash ('/'), i.e. about 615,000 passwords and 2 hours of calculation per presentation. The other runs under MS-DOS; it learns 1024 clear passwords of 7 characters and the encrypted passwords of 11 characters (we remove the first 2 characters of salt, used to re-encrypt the encrypted password into 65536 different encrypted passwords for the same clear text).

We have added a program for graphical visualization of the statistics of the first application. The second provides quick information.

The source and the results are available in the annex.

5.5 Analysis of the results of cryptanalysis

The differential and linear neuro-cryptanalyses are probabilistic calculation methods to quickly get information about a part of the D.E.S. They allow the inverse function of an S-table to be performed, for a chosen text difference in one case and for a linear relationship with a selected subkey in the other. The learning of such neural networks is very fast.

It is possible, for a given method, differential or linear, to gather 8 x 16 = 128 neural networks (one for each S-table of each round) and to operate them in parallel, working back from the information given by the ciphertext at the output of the D.E.S. to the plaintext at the input. These networks may thus be supervisors of other, unsupervised-learning neural networks modifying the bits of the key as the different texts pass through the D.E.S., until the encryption key is found. This would be a self-learning of the sub-keys; from the sub-keys, one can then go back to the key.

Statistical analysis of the results of the MS-DOS version of the program is surprising, with 90% of the encryption function found by the neural network for the base, and about 80% of the bits for examples close to this base but not presented to the network. This proves that for a small learning base, it is easy for a neural network to find a clear password from an encrypted password without taking into account the salt included by the Unix system.

5.6 Hardware implementations

There are 2 possible hardware implementations. One is based on existing architectures and more precisely consists of an implementation on a massively parallel machine of the MASPAR or Connection Machine type (the characteristics of these machines are given in annex 9).

The other is based on the design of an architecture dedicated to cryptanalyzing the encryption algorithm.

5.6.1 Dedicated Machine

The idea is to present a strong cipher to a neuro-cryptanalyzer with very fast supervised learning. As we showed in paragraph 4.6.2, it is necessary to present all plaintexts, ciphertexts and keys to the neural network. Figure 5.6.1.1 shows the overview of the learning machine dedicated to an encryption algorithm.

Figure 5.6.1.1 - overview of the learning machine dedicated to an encryption algorithm

A complete machine can be constructed on the same pattern, with a large number N of units each made of binary counters (120 bits: 64 bits of text and 56 bits of key) and of circuits implementing the encryption algorithm (for the D.E.S., AMD has built an arsenide circuit at a clock frequency of 250 MHz, i.e. approximately 5.10^9 encryptions per second). The number N is limited by the learning time of the single neural circuit, of approximately 1 µs. Each encryption unit takes less than 14 ns.

For the D.E.S., the time interval between each unit is therefore necessarily 1 µs, which gives 10^6 learnings per second to learn the 2^56 possible keys. That is 10^30 s for all possible values of text and key, or about 4.10^22 years for one presentation. If the neural circuit took 14 ns, it would still take 3.10^18 years.

In the case of a single key, it would take 41 years, and for a single text about 2 months.

By comparison, the exhaustive search for a key takes 3.5 hours on a dedicated non-neural machine which would cost 5 million francs.

Nevertheless, it is possible that the neural circuits of the future will go much faster. For the D.E.S., it is preferable to treat a fixed subset of the data, as we have done in paragraph 5.4.4.

5.6.2 Algorithm for the Connection Machine CM-5

The following algorithms were written for the distributed architecture of the CM-5 using 3 layers of processors, with one processor per neuron. The first layer is used to initialize the inputs (plaintext) and outputs (ciphertext) of the neural network located on layers 2 and 3. Variables are duplicated and used in layers 2 and 3. NB_ENTREES, NB_CACHEES, NB_SORTIES and EPSILON are constants defining the number of input, hidden-layer and output neurons of the neural network, and the learning coefficient. Thus each single processor holds:

NB_ENTREES values poids_cachee for the weights of the hidden layer in one processor;
1 seuil_cachee; 1 activation_cachee; 1 delta_cachee;
NB_CACHEES values poids_sortie; 1 seuil_sortie; 1 activation_sortie; 1 delta_sortie.

Before you start, we initialize the weights of the connections with random values.

Repeat indefinitely
    generate a key & a clear text into M
    For i = 0 to NB_CACHEES-1 do send M to all layer-2 processors End
    encrypt M with the encryption algorithm into C
    For i = 0 to NB_SORTIES-1 do send C to all layer-3 processors End
End repeat

Figure 5.6.2.1 - algorithm of the first layer of processors

We define a small macro for the following algorithm: bit(i, m) { return ((m >> i) & 1); }, which returns bit i of m.

Repeat indefinitely
    Integer tempo[NB_ENTREES];
    Float output, error;
    receive M from layer 1
    output = 0.0
    For i = 0 to NB_ENTREES-1 do If (bit(i, M)) Then output += poids_cachee[i]; tempo[i] = bit(i, M) End
    activation_cachee = sigmoid(output - seuil_cachee)
    For i = 0 to NB_SORTIES-1 do send activation_cachee to layer 3 End
    error = 0.0
    For i = 0 to NB_SORTIES-1 do
        receive W from layer 3        /* poids_sortie of output neuron i towards this hidden neuron */
        receive D from layer 3        /* delta_sortie of output neuron i */
        error = error + W * D
    End
    delta_cachee = error * activation_cachee * (1 - activation_cachee)
    For i = 0 to NB_ENTREES-1 do poids_cachee[i] = poids_cachee[i] + EPSILON * delta_cachee * tempo[i] End
    seuil_cachee = seuil_cachee - EPSILON * delta_cachee
End repeat

Figure 5.6.2.2 - algorithm of the second layer of processors

Repeat indefinitely
    Float F, output, tempo[NB_CACHEES];
    output = 0
    For i = 0 to NB_CACHEES-1 do
        receive F from layer 2
        tempo[i] = F
        output = output + poids_sortie[i] * F
    End
    activation_sortie = sigmoid(output - seuil_sortie)
    receive F from layer 1            /* desired output value for the learning */
    delta_sortie = (F - activation_sortie) * activation_sortie * (1 - activation_sortie)
    For i = 0 to NB_CACHEES-1 do
        send poids_sortie[i] then delta_sortie to layer 2
        poids_sortie[i] = poids_sortie[i] + EPSILON * delta_sortie * tempo[i]
    End
    seuil_sortie = seuil_sortie - EPSILON * delta_sortie
End repeat

Figure 5.6.2.3 - algorithm of the third layer of processors

The procedures for sending (non-blocking) and receiving (blocking) a message through the 40 MB/s communication lines allow a low waiting time.

It is likely that the example learning time is longer than for the dedicated machine of the preceding paragraph.

5.7 Performance

The learning time is quite long (from several days to several years), but interesting results (an error rate close to zero) are obtained in few presentations when the example base is large enough, which is the case for strong algorithms such as the D.E.S. or the R.S.A.; for operations as simple as the XOR, between 200 and 500 presentations give an error rate of zero.

However, once the learning is done, the time for information to pass through the neural network is very short (of the order of tens of nanoseconds). This is prodigious when we know that an exhaustive search must be repeated for each text encrypted with a different key.

5.8 Conclusion

We have seen in this chapter the neuro-cryptanalysis of strong ciphers: the general principle and an approach based on neural networks that learn the keys from pairs of plaintexts and ciphertexts. We described applications. We presented differential neuro-cryptanalysis and linear neuro-cryptanalysis of the D.E.S., which allowed us to measure the statistical performance of neural networks, which is excellent. A dedicated hardware implementation was described. The overall performance is very satisfactory, even with a learning base of small size.

Chapter 6 - Glossary and Mathematics

6.1 Introduction

This chapter complements the terminology used in the previous chapters. It provides the reader with clarifications in the fields of information theory, the complexity of algorithms and number theory, all of which are widely used in cryptography.

6.2 The information theory

Quantification of information

This is the minimum number of bits needed to encode all the possible meanings of a piece of information.

The entropy H (M)

It is a measure of the amount of information contained in a message M.

In general, H(M) = Log2(n), where n is the number of possible meanings.

The uncertainty

This is the number of plaintext bits that must be recovered in order to reconstruct the entire plaintext from the ciphertext.

The rate r of the language

r = H(M) / n, where n is the length of the message in characters of the language (in bytes).

The absolute rate R

R = Log2(L), where L is the number of characters in the language. R is expressed in bits/character.

Redundancy D

D = R - r

The entropy of a cryptosystem H(K)

H(K) = Log2(number of possible keys)

The unicity distance (point of uniqueness)

u = H(K) / D

It estimates the amount of ciphertext needed for the number of different keys able to decrypt the message to drop to a single one.
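As a hedged illustration of these formulas, the following C sketch computes R, D and the unicity distance u from a language rate r, an alphabet size L and a key entropy H(K). The numerical values in the example (r around 1.3 bits/character and 26 letters for English, 56 key bits as for the D.E.S.) are only common estimates and are not results of this memoir.

#include <math.h>
#include <stdio.h>

/* unicity distance u = H(K) / D, with D = R - r and R = Log2(L) */
static double unicity_distance(double key_bits, double rate_r, double alphabet_L)
{
    double R = log2(alphabet_L);   /* absolute rate of the language   */
    double D = R - rate_r;         /* redundancy                      */
    return key_bits / D;           /* characters of ciphertext needed */
}

int main(void)
{
    /* assumed values: English (r = 1.3, L = 26) and a 56-bit key */
    printf("u = %.1f characters\n", unicity_distance(56.0, 1.3, 26.0));
    return 0;
}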

The confusion

Serves to hide the relationship between the plaintexts and the ciphertexts (example: substitution).

The diffusion

Serves to disperse the redundancy of the plaintext (example: transposition or permutation of blocks).

6.3 The complexity of algorithms

The complexity of an algorithm is characterized by 2 parameters: T, the complexity in time, and S, the complexity in space (typically memory).

Notation

O(n): complexity of linear algorithms, where n is the number of iterations

O(n^2): complexity of quadratic algorithms

O(n^3): complexity of cubic algorithms

The previous algorithms are polynomial-time algorithms, of complexity O(n^t) with t constant.

O(t^f(n)): complexity of exponential algorithms (t constant, f(n) a polynomial function of n)

O(c^f(n)): complexity of superpolynomial algorithms (c constant, f(n) more than constant but less than linear)

The classes of problems

The class least complex to most complex:

P : problems that can be solved in polynomial time.

NP : problems that can be solved in polynomial time on a non-deterministic TURING machine (a variant of the normal TURING machine that guesses solutions).

NP-complete : the most difficult problems of the class NP; every problem in NP can be reduced to them in polynomial time (the class NP includes the class P).

PSPACE : problems that can be solved in polynomial space, possibly in unbounded time.

PSPACE-complete : the most difficult problems of the class PSPACE.

EXPTIME : problems that can be solved in exponential time.

6.4 The number theory

Congruences

(a + b) mod n = ((a mod n) + (b mod n)) mod n; the same holds for (a - b) and (a * b)

(a *(b+c)) mod n = (((a*b) mod n) + ((a*c) mod n)) mod n

If (a mod n) = 1, then (a^x mod n) = 1 for any natural integer x.

The primes

It is an integer > 1 whose only factors are 1 and itself. For more details on primes and their cryptographic applications, see (KRANAKIS 1986).

The inverses modulo n

The goal is to find x such that (a * x) mod n = 1, i.e. x = a^-1 mod n.

There is not always a solution; in general, there is a unique x if a and n are coprime.

This problem is solved using the extended Euclidean algorithm, and its complexity is O(log2 n). For more details see (SCHNEIER 1995, pages 209-210) and (KNUTH 1981).
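A minimal C sketch of this computation, assuming 64-bit integers and gcd(a, n) = 1 (the function name and types are ours, not code taken from the references):

/* returns x such that (a * x) mod n = 1, assuming gcd(a, n) = 1 */
static long long inverse_mod(long long a, long long n)
{
    long long r0 = n, r1 = a % n;   /* remainders of the extended Euclidean algorithm */
    long long t0 = 0, t1 = 1;       /* coefficients of a modulo n                     */
    while (r1 != 0) {
        long long q = r0 / r1;
        long long tmp;
        tmp = r0 - q * r1; r0 = r1; r1 = tmp;
        tmp = t0 - q * t1; t0 = t1; t1 = tmp;
    }
    return (t0 % n + n) % n;        /* normalize the result into [0, n-1] */
}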

FERMAT's theorem

If m is prime and a is not a multiple of m, then a^(m-1) = 1 mod m.

Residues modulo n

These are the remainders of the division of a number by n.

The reduced residues

These are the residues modulo n that are coprime to n (the reduced set of residues).

The EULER function φ(n) (EULER's indicator)

It is the cardinality of the reduced set of residues modulo n; this function is denoted φ(n).

φ(n) is the number of positive integers smaller than n and coprime to n.

If n is prime, φ(n) = n - 1, and if n = p * q with p and q prime, then φ(n) = (p-1) * (q-1).

Let gcd(a, n) = 1 and (a * x) mod n = b; to compute x:

- by the generalization of EULER: x = (b * a^(φ(n)-1)) mod n

- by EUCLID's algorithm: x = (b * inverse(a, n)) mod n

see (SCHNEIER 1995, pages 212-213)

The Chinese remainder theorem

For any a and b such that a < p and b < q (p and q prime), there exists a unique x such that x < p * q, x = a mod p and x = b mod q.

Using EUCLID's algorithm, compute u such that u * q = 1 mod p, which gives x = (((a - b) * u) mod p) * q + b.

Details and code in C (SCHNEIER 1995, pages 213-214)
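The following C sketch applies exactly the formula above, reusing the inverse_mod function sketched earlier (variable names are ours):

/* x such that x = a mod p and x = b mod q, with p and q prime, a < p, b < q */
static long long crt(long long a, long long p, long long b, long long q)
{
    long long u = inverse_mod(q, p);                  /* u * q = 1 mod p     */
    long long x = (((a - b) % p + p) % p) * u % p;    /* ((a - b) * u) mod p */
    return x * q + b;
}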

Quadratic residues modulo p

If p is prime and a < p, then a is a quadratic residue modulo p if x^2 = a mod p for some x.

The LEGENDRE symbol

It is denoted L(a, p) or (a/p), with a a natural integer and p a prime > 2.

We then obtain: L(a, p) = 0 if a is divisible by p

L(a, p) = 1 if a is a quadratic residue modulo p

L(a, p) = -1 if a is not a quadratic residue modulo p

To compute it, we have the formula L(a, p) = a^((p-1)/2) mod p

There are also the following recursive expressions:

If a = 1, L(a, p) = 1

If a is even, L(a, p) = L(a/2, p) * (-1)^((p*p - 1)/8), otherwise L(a, p) = L(p mod a, a) * (-1)^((a-1)*(p-1)/4)
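A small C sketch of the direct formula L(a, p) = a^((p-1)/2) mod p, using square-and-multiply modular exponentiation (the residue p - 1 is reported as -1); this is our illustration, limited to moduli small enough to avoid overflow.

/* modular exponentiation: computes (base^e) mod m by square and multiply */
static long long pow_mod(long long base, long long e, long long m)
{
    long long result = 1;
    base %= m;
    while (e > 0) {
        if (e & 1)
            result = result * base % m;
        base = base * base % m;
        e >>= 1;
    }
    return result;
}

/* LEGENDRE symbol L(a, p) for p an odd prime: 0, 1 or -1 */
static int legendre(long long a, long long p)
{
    long long r = pow_mod(a, (p - 1) / 2, p);
    if (r == 0) return 0;    /* a divisible by p        */
    if (r == 1) return 1;    /* a is a quadratic residue */
    return -1;               /* r == p - 1: non-residue  */
}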

The JACOBI symbol (Jacobian)

Denoted J(a, n), it is a generalization of L(a, p). To compute it:

If n is prime, J(a, n) = 1 if a is a quadratic residue modulo n

J(a, n) = -1 if a is not a quadratic residue modulo n

If n = p1 * ... * pm (the pi being the prime factors of n), then J(a, n) = J(a, p1) * ... * J(a, pm)

If a = 0, J(0, n) = 0

The following properties hold:

J(1, k) = 1; J(a * b, k) = J(a, k) * J(b, k);

J(2, k) = 1 if (k^2 - 1)/8 is even, J(2, k) = -1 if (k^2 - 1)/8 is odd;

J(a, b) = J(a mod b, b), useful if a > b;

If gcd(a, b) = 1 and a, b are odd, then:

if (a-1) * (b-1)/4 is even then J(a, b) = J(b, a), otherwise J(a, b) = -J(b, a)
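A hedged C sketch of the iterative computation of J(a, n) using the properties above (n is assumed odd and positive; the function name is ours):

/* JACOBI symbol J(a, n) for n odd and positive: returns -1, 0 or 1 */
static int jacobi(long long a, long long n)
{
    int result = 1;
    a %= n;
    while (a != 0) {
        while ((a & 1) == 0) {             /* factor out twos: rule for J(2, n)   */
            a >>= 1;
            if (n % 8 == 3 || n % 8 == 5)
                result = -result;
        }
        { long long t = a; a = n; n = t; } /* quadratic reciprocity: swap a and n */
        if (a % 4 == 3 && n % 4 == 3)
            result = -result;
        a %= n;
    }
    return (n == 1) ? result : 0;          /* gcd(a, n) != 1 gives 0              */
}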

BLUM integers

If p and q are prime, p = 3 mod 4 and q = 3 mod 4, then n = p * q is a BLUM integer.

Every quadratic residue modulo n then has 4 square roots, exactly one of which is itself a square: it is the principal square root.

Generators

If p is prime and g < p, then g is a generator modulo p if, for every n in (1, p-1), there exists a such that g^a = n mod p (g is primitive with respect to p).

If the decomposition of p - 1 into prime factors q1, q2, ..., qn is known, then for each prime factor q compute g^((p-1)/q) mod p; if the result is 1 for one of the factors q, then g is not a generator modulo p.
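A small C sketch of this test, assuming the prime factors of p - 1 are supplied in an array (pow_mod is the modular exponentiation sketched for the LEGENDRE symbol):

/* g is a generator modulo p if g^((p-1)/q) != 1 mod p for every prime factor q of p-1 */
static int is_generator(long long g, long long p,
                        const long long *factors, int nb_factors)
{
    int i;
    for (i = 0; i < nb_factors; i++)
        if (pow_mod(g, (p - 1) / factors[i], p) == 1)
            return 0;   /* not a generator        */
    return 1;           /* generator modulo p     */
}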

GALOIS fields

Arithmetic modulo n, when n is prime, forms a finite field. The same holds when n is an integer power of a prime number. If p is prime, Z/p is a GALOIS field. Addition, subtraction, multiplication and division work, with 0 the neutral element of addition and 1 the neutral element of multiplication. For any p different from 0, there exists p' = 1/p. We have commutativity, associativity and distributivity.

Z/2^n (fields Z/q^n)

Let p(x) be an irreducible polynomial of degree n; the "generator" polynomials of a given field are the primitive polynomials. In Z/2^n, cryptography often uses p(x) = x^n + x + 1 because multiplication and exponentiation are then very efficient and the hardware implementation with shift registers is easy.
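As an illustration of this kind of reduction, here is a C sketch of multiplication in the field with 2^7 elements; n = 7 is chosen because x^7 + x + 1 is irreducible for that degree (the choice of degree and the function name are ours):

/* multiplication modulo the irreducible polynomial p(x) = x^7 + x + 1        */
static unsigned int gf_mul7(unsigned int a, unsigned int b)
{
    unsigned int r = 0;
    while (b) {
        if (b & 1)
            r ^= a;                 /* add (XOR) the current multiple of a    */
        b >>= 1;
        a <<= 1;                    /* multiply a by x                        */
        if (a & 0x80)               /* degree 7 reached: reduce by p(x)       */
            a = (a ^ 0x80) ^ 0x03;  /* x^7 = x + 1 modulo p(x)                */
    }
    return r & 0x7F;
}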

The factorization

The best algorithms for factoring numbers are as follows:

Quadratic sieve: the number of operations is e^((ln n)^1/2 * (ln(ln n))^1/2); it is the fastest, see (POMERANCE 1985), (POMERANCE 1988) and (WUNDERLICH 1983).

Number field sieve: the number of operations is e^((ln n)^1/3 * (ln(ln n))^2/3), see (LENSTRA 1993).

Methods of elliptic curves. See (MONTGOMERY 1987) and (MONTGOMERY 1990).

POLLARD's Monte Carlo algorithm. See (POLLARD, 1975), (BRENT, 1980), (KNUTH, 1981, page 370).

Algorithm of continued fractions. See (KNUTH, 1981, pages 381-382)

Trial division: division of the number by all smaller primes.
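As a simple illustration of this last method, a C sketch of factorization by trial division (only practical for small numbers):

#include <stdio.h>

/* prints the prime factorization of n by trying every divisor up to sqrt(n) */
static void trial_division(unsigned long long n)
{
    unsigned long long d;
    for (d = 2; d * d <= n; d++)
        while (n % d == 0) {
            printf("%llu ", d);
            n /= d;
        }
    if (n > 1)
        printf("%llu", n);   /* remaining prime factor */
    printf("\n");
}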

Chapter 7 - Conclusion

We presented neural networks, then defined and determined which neural network model is the most appropriate for cryptography, both on the algorithmic side (learning) and on the hardware side (architectures already built and observed performance).

The most interesting connectionist model turns out to be the network of perceptrons with gradient back-propagation, thanks to the various properties that have been analyzed and demonstrated by different scientists:

their generalization property

their low sensitivity to noise (if an error slips into the base of examples)

their low sensitivity to faults (lost connections, modified weights or a bug in the program)

the information is distributed over the whole network

their capabilities for statistical computation and heuristic search

We presented the structure of the model chosen in the following figure:


This architecture can be implemented in software as well as in hardware. Neural networks have already been implemented on massively parallel machines.

An analysis of linear multilayer networks showed us the analogies with various statistical data-analysis methods, in particular linear regression and discriminant analysis. It has been shown that back-propagation performs a discriminant analysis of a population of N individuals (N being the number of examples included in learning) described by n parameters (where n is the number of input neurons) and projected onto a hyperplane of dimension p (where p is the number of hidden units). It is therefore possible to use it on non-linearly separable problems to build a classifier or a probabilistic model, which proves the interest of such an algorithm in cryptography and especially in cryptanalysis.

On the hardware side, the benefits of the VLSI components are:

ease of use

a high signal-to-noise ratio

easy cascading of circuits

high adaptability (these circuits can solve various tasks)

a low manufacturing cost

We then presented the three types of components existing on the market or in research laboratories:

1. dedicated digital neural components, whose speeds go up to 1 billion connections processed per second.

2. special-purpose digital coprocessors (also called neuro-accelerators): special circuits that can be connected to host machines (PCs or workstations) and that work with a neuro-simulator program. The mix of hardware and software gives these benefits: increased speed, flexibility and an improved user interface.

3. networks of neurons on massively parallel machines.

An implementation of the algorithm has been developed on the Connection Machine CM-2 (built by THINKING MACHINES Corp.) with a hypercube topology of 64K processors, which gave 180 million interconnections computed per second (IPS), or 40 million weights updated per second.

Here are the performances measured for several machines, in interconnections computed per second (table below).

Machine        Interconnections per second
CM-2           180 million
CRAY X-MP       50 million
WARP (10)       17 million
ANZA PLUS       10 million

The use of such configurations would make it possible to obtain excellent results in the learning of cryptographic ciphers.

We have seen that cryptography is a very large area, popular among mathematicians and computer scientists. We saw that the strength of a cryptosystem depends entirely on the key used, whether public or private, and on the cryptographic protocols used to exchange it. We chose to focus on the neural realization of cryptosystems and on their neuro-cryptanalysis.

Our work specifically concerned the ECB mode, which is best suited to the learning of neural networks, with a fixed number of input and output bits and no feedback loop. It is also possible to chain one or more neural networks in this way.

We chose to tackle the D.E.S. because it is the oldest encryption standard and the most studied algorithm.

The hardware aspect is very important for execution speed. VLSI components are widespread and effective, but there are even more interesting components based on a technology that should not be disregarded: gallium arsenide (GaAs). It has already been used in supercomputers.

The major differences between GaAs and VLSI are:

fast switching of GaAs gates

interfacing with components other than GaAs is a major difficulty

a very low density of GaAs integrated circuits

With regard to the D.E.S., there is a circuit running at 50 MHz that performs one encryption every 20 ns, which makes it possible to perform 50 million encryptions per second. Since late 1995, AMD has been selling a D.E.S. circuit clocked at 250 MHz.

In August 1993, the Canadian Michael J. WIENER described how to build, for one million dollars, a machine performing an exhaustive search of D.E.S. keys that finds the right key in 3.5 hours. Each of its basic circuits has a power equivalent to 14 million SUN workstations.

We analyzed the two most successful cryptanalyses against the D.E.S.

Differential cryptanalysis consists of looking at the particularities of a pair of ciphertexts obtained for a pair of plaintexts having a particular difference.

The strength of the D.E.S. resides in its rounds; since all the operations of a round are completely linear except the S-tables, Eli BIHAM and Adi SHAMIR analyzed the 8 S-tables for differences between input texts and differences between output texts. This information is synthesized in 8 tables called difference distribution tables of the D.E.S. (see the 8 tables in annex 3). We implemented the algorithm that generates these tables.
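As an illustration of this construction, here is a C sketch that builds the difference distribution table of one 6-bit to 4-bit S-table; sbox() is a hypothetical lookup function standing in for one S-table of the D.E.S., which is not reproduced here.

extern int sbox(int x);   /* hypothetical lookup of one 6-bit -> 4-bit S-table */

/* table[dx][dy] counts the input pairs with input difference dx giving output difference dy */
static void difference_table(int table[64][16])
{
    int dx, x, dy;
    for (dx = 0; dx < 64; dx++)
        for (dy = 0; dy < 16; dy++)
            table[dx][dy] = 0;
    for (dx = 0; dx < 64; dx++)
        for (x = 0; x < 64; x++) {
            dy = sbox(x) ^ sbox(x ^ dx);
            table[dx][dy]++;
        }
}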

Linear cryptanalysis consists of studying the statistical linear relationships between bits of the plaintext, bits of the ciphertext and bits of the key used for encryption. These relationships make it possible to recover the values of some key bits when the plaintexts and the associated ciphertexts are known. The linear relations of each S-table are deduced by choosing a subset of input bits and a subset of output bits and computing the parity (XOR) of these bits; the relation is linear when the parity of the subset is zero. In general, some subsets give parity 0 (linear relations) and others parity 1 (affine relations). MATSUI computed the number of zero parities for each subset of input and output bits of each S-table, among the 64 x 16 = 1024 possible subsets. It is thus possible to associate probabilities with the various bits of the subkeys. The probabilities of obtaining a zero parity (linear relationship) are synthesized in 8 tables called linear approximation tables of the D.E.S. (see the 8 tables in annex 4). We implemented the algorithm that generates these tables.
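A hedged C sketch of this counting, reusing the hypothetical sbox() of the previous sketch; the annex tables may present these counts in a different form (for instance as probabilities or as offsets), which is not reproduced here.

/* parity (XOR of the bits) of v */
static int parity(int v)
{
    int p = 0;
    while (v) { p ^= v & 1; v >>= 1; }
    return p;
}

/* for each pair of masks (ma on the 6 input bits, mb on the 4 output bits),      */
/* counts the inputs x whose selected input bits and output bits have equal parity */
static void linear_table(int table[64][16])
{
    int ma, mb, x;
    for (ma = 0; ma < 64; ma++)
        for (mb = 0; mb < 16; mb++) {
            int count = 0;
            for (x = 0; x < 64; x++)
                if (parity(x & ma) == parity(sbox(x) & mb))
                    count++;
            table[ma][mb] = count;   /* 64 x 16 = 1024 subsets in total */
        }
}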

After showing the possible association between neural networks and cryptography, we defined the field of neuro-Cryptography.

We then identified some important points for the correct use of neural networks. The way the learning base is generated is very important for the realization of neural applications. Learning depends on the random initialization of the network weights as well as on the number of examples, the order of presentation of these examples and the consistency of the chosen set of examples.

We have seen that an example consists of a value to be presented at the input of the neural network and a value to be presented at its output, the output depending on the input value. If the number of examples is too low, it is clear that the network will not look for a transfer function of the studied cryptosystem but will instead simply store the given examples, and will therefore be unable to find a result for an input value different from those given in the base of examples. In cryptography, more than half of all possible examples must be presented to be certain of the results, even if it is true that in strong cryptography the number of possible input values is very large.

We then implemented an algorithm to present the examples in a more or less shuffled order. It cuts the base into k sub-bases and then presents the elements of each of the sub-bases in turn (k can be even or odd). The following table shows the final error rate TSS for different values of k (the number of presentations being fixed at 500, with 256 examples); a sketch of this presentation order is given after the table.

We note that the order of presentation of the learning base has little influence on the final error rate.
k      1     2     3     4     5     6     7     8
TSS    0.05  0.06  0.06  0.05  0.08  0.07  0.05  0.08
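A minimal C sketch of one possible reading of this presentation scheme: the base is cut into k contiguous sub-bases and the i-th element of each sub-base is presented in turn. The function present_example() is hypothetical, and the exact splitting used in this memoir may differ.

extern void present_example(int index);   /* hypothetical: presents example number index */

static void present_base(int nb_examples, int k)
{
    int size = nb_examples / k;   /* size of each sub-base (k assumed to divide nb_examples) */
    int i, j;
    for (i = 0; i < size; i++)
        for (j = 0; j < k; j++)
            present_example(j * size + i);
}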

Concerning the automatic generation of contiguous texts, we presented an algorithm that can generate plaintext examples for an arbitrary number of nested loops reduced to a single loop body, executed at each iteration of the innermost loop.
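A possible C sketch of such a generator: d nested counters are simulated with a single loop body, incrementing the innermost digit and propagating carries like an odometer (the bounds, names and the emit_plaintext() call are our assumptions).

extern void emit_plaintext(const int *digits, int d);   /* hypothetical use of one combination */

/* enumerates all combinations of d counters, each ranging over 0..base-1, with a single loop body */
static void generate_texts(int d, int base)
{
    int digits[32] = {0};   /* current value of each nested counter (d <= 32 assumed) */
    int i;
    for (;;) {
        emit_plaintext(digits, d);
        for (i = d - 1; i >= 0; i--) {   /* increment the innermost counter first */
            if (++digits[i] < base)
                break;                   /* no carry: the next combination is ready */
            digits[i] = 0;               /* carry to the enclosing counter */
        }
        if (i < 0)
            return;                      /* all combinations have been produced */
    }
}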

We analyzed the learning coefficient, which makes learning more or less rapid; the chances of the network converging to a solution are inversely proportional to it, because of the local minima of the error curve measured between the learning base and the output values computed by the neural network.

Epsilon should be varied empirically between 0.1 and 2.0. If the network still refuses to converge, this is most likely due to a non-linearly separable problem, which is the case when learning the XOR. A momentum term should then be used, with a real value between 0.1 and 1.0; its purpose is to avoid local minima of the error function by taking into account, in the current learning step, the contribution of the previous steps.
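A minimal C sketch of the weight update with a momentum term; EPSILON is the learning coefficient and ALPHA the momentum, both chosen in the ranges given above, and the variable names are ours.

#define EPSILON 0.5   /* learning coefficient, typically between 0.1 and 2.0 */
#define ALPHA   0.9   /* momentum term, typically between 0.1 and 1.0        */

/* updates one weight: the previous variation is partially reinjected,       */
/* which helps the network slide over local minima of the error surface      */
static void update_weight(double *weight, double *previous_delta,
                          double delta, double input)
{
    double variation = EPSILON * delta * input + ALPHA * (*previous_delta);
    *weight += variation;
    *previous_delta = variation;
}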

We presented self-learning, which is interesting for the neuronal learning of cryptographic algorithms. The neuronal system comprises two parts, the emulator and the controller, whose learning phases are carried out separately.

The task of the emulator is to simulate the complex function or the encryption algorithm. Its inputs are therefore the state of the system at a given time and the input applied at that time; its output is the output of the algorithm at the following time. The input of the controller is the state of the system at time k; its output is the value to be given as input to the algorithm or the complex function. The proper role of the controller is to learn the adaptive control law. For this learning, however, the error signal is not computed on the command itself but on its result, the gap between the desired state and the current state. This corresponds to guided rather than supervised learning, because no teacher supplies the control law to the system: the system learns by itself by processing the information it receives in return for its actions. To make learning by back-propagation possible and to back-propagate the error on the position, the structure of the emulator must be homogeneous with that of the controller.

Another quality of this device is its capacity for self-learning. The learning of the controller is fast. In addition, the synthesized control law is sufficiently robust to small random perturbations. It is therefore possible to build self-learning neural networks on a communication line, for encryption as well as for real-time message authentication.

We presented several different applications. For the learning of the XOR, i.e. computing C = A XOR B, we need a network with a 16-bit input (the 2 bytes A and B) and an 8-bit output (the byte C). The network must therefore have 16 input neurons, at least 16 hidden-layer neurons and 8 output neurons. The example base consists of 65536 cause-effect pairs. After various tests, the success rate for XOR learning is very close to 100%, depending on the random weight initialization and the number of presentations. The larger the number of input and hidden-layer neurons, the more the number of presentations of the base can be reduced. If the random initialization of the weights is good, a single presentation can be sufficient and of better quality.
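A small C sketch of the generation of the 65536 cause-effect pairs for the XOR learning described above; the way the examples are stored is our assumption.

/* generates the 65536 examples for learning C = A XOR B:                   */
/* input = the 16 bits of A and B, expected output = the 8 bits of C        */
static void build_xor_base(unsigned short inputs[65536],
                           unsigned char outputs[65536])
{
    unsigned int a, b;
    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++) {
            inputs[(a << 8) | b]  = (unsigned short)((a << 8) | b);   /* A and B */
            outputs[(a << 8) | b] = (unsigned char)(a ^ b);           /* C       */
        }
}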

For the learning of cryptographic algorithms, we have shown that, whether it is a function or an algorithm, it combines input data (causes) to produce output data (effects). The task is therefore to determine the input and output structures of the network and to find a base of causes and associated effects sufficient for the learning of the network to converge to a minimal amount of errors, or even none at all.

We have shown how to make