Maîtrise Informatique
Option Microinformatique Microélectronique
Année 1995-1996
Université Paris 8
Département MicroInformatique MicroElectronique
2, rue de la Liberté 93526 SAINT DENIS CEDEX 02
Applied Neurocryptography
by
Sébastien Dourlens
To all those who believe
that a secret can be kept for long
Summary
This thesis was carried out in the context of the MIME master's degree at Université Paris 8.
Its purpose is to investigate applications of neural networks to cryptography.
A search of the existing literature shows that no thesis, conference proceedings, book, or Internet source (Web pages and newsgroups) applies neural networks to cryptography.
We therefore thought it would be interesting to define a new field, called neurocryptography, whose aim is the use of neural networks to encrypt a message, decrypt a message, or exchange messages over a network. Cryptography contains another area, cryptanalysis, which studies in a probabilistic manner the strengths and weaknesses of an encryption algorithm. Neural networks can play a decisive role in this area, which is why we have also defined neurocryptanalysis.
The fields of artificial intelligence with neural networks, of cryptography, and of cryptanalysis have long been studied intensively by universities around the world and, among others, by electronic circuit design companies.
We begin by choosing the most efficient neural network model and learning algorithm, on the strength of its ability to synthesize complex functions and statistical analyses. This model is the network of perceptrons with gradient backpropagation. The hardware realization should not be neglected, since cryptography requires very fast learning, which is a function of the number of possible keys and texts.
We have added elements allowing the creation of hardware architectures.
We then choose the field of cryptographic applications: it is primarily the study of the DES (Data Encryption Standard) and its cryptanalysis.
We then test and measure the performance of neurocryptography and neurocryptanalysis, which prove to be quite interesting from every point of view. The computation time can be improved by designing a machine architecture dedicated to the learning of cryptographic algorithms, built from gallium-arsenide components or from massively parallel machines, as has already been done for neural networks and for the DES, but separately.
Concerning the neurocryptanalysis of the DES, we build differential and linear neurocryptanalyzers that study the probabilities of obtaining the inputs of the S-tables from their outputs, allowing us to obtain characteristics of an unknown subkey.
This line of research is now open; it should continue toward coherence between the neural network that learns the cryptosystem as a whole and the neurocryptanalyzers of the internal structure of this cryptosystem, whose learning is very fast. Another argument is the synthesis capability of the gradient backpropagation network.
Acknowledgements
I wish to thank my research director, Mr. Christian Riesner, researcher in artificial intelligence specializing in neural networks.
Thanks to the teacher-researchers of the Department of Micro-Computing and Micro-Electronics of Université Paris 8.
Thanks also to the students, researchers, and professors of other universities who provided me with valuable information useful for this thesis.
Table of contents
1 Introduction
1.1 Survey of existing work
1.2 Neural networks
1.3 Contemporary Cryptography
1.4 Applied neurocryptography
1.5 Outline of the thesis
2. Neural networks
2.1 Introduction
2.2 Basic concepts and terminology
2.3 The present situation
2.4 Are neural networks used in cryptography?
2.5 What types of neural networks to use in cryptography?
2.6 The model structure of perceptrons with backpropagation of the gradient
2.7 The gradient backpropagation algorithm
2.8 Analysis of linear multilayer networks
2.8.1 The multilayer linear perceptron problem
2.8.2 Discriminant analysis of rank p
2.8.3 Incremental learning of the hidden layer
2.8.4 Relations with the principal component analysis
2.9 Hardware
2.10 Conclusion
3. The Cryptography
3.1 Introduction
3.2 Definitions
3.3 Contemporary Cryptography
3.3.1 The cryptosystem and strength
3.3.2 Protocols
3.3.3 The types of attacks in cryptanalysis
3.4 Cryptographic algorithms
3.4.1 Block ciphers and stream ciphers
3.4.2 The Vigenère cipher
3.4.3 Strong ciphers
3.5 Reference: the Data Encryption Standard (DES)
3.5.1 History
3.5.2 Architecture
3.5.3 Cryptanalysis
3.5.4 The hardware aspect
3.6 The cryptanalysis of DES
3.6.1 Differential cryptanalysis
3.6.2 Linear cryptanalysis
3.7 Conclusion
4. The NeuroCryptography
4.1 Introduction
4.2 Can cryptography and neural networks be combined?
4.3 The new definitions
4.3.1 Neuroencryption
4.3.2 Neurodecryption
4.3.3 The neurogenerator
4.3.4 Neurocryptanalysis
4.4 The generation of bases of learning
4.4.1 Examples
4.4.2 Order of presentation
4.4.3 Automatic generation of texts
4.4.4 The learning coefficient
4.5 Self-learning
4.6 The realization of applications
4.6.1 The learning of the exclusive or (XOR)
4.6.2 The learning of cryptographic algorithms
4.6.3 Key learning
4.7 The advantages and disadvantages
4.8 Conclusion
5. The Neurocryptanalysis
5.1 Introduction
5.2 Definition
5.3 General principle
5.4 Applied Neurocryptanalysis
5.4.1 Neurocryptanalysis of the Vigenère cipher
5.4.2 Differential neurocryptanalysis of DES
5.4.3 Linear neurocryptanalysis of DES
5.4.4 Overall neurocryptanalysis of the UNIX crypt(3) function
5.5 Analysis of the results of cryptanalysis
5.6 Hardware implementations
5.6.1 Dedicated machine
5.6.2 Algorithm for the Connection Machine CM5
5.7 Performance
5.8 Conclusion
6 Glossary and math basics
6.1 Introduction
6.2 The information theory
6.3 The complexity of algorithms
6.4 The number theory
7 Conclusion
Bibliography
Neural networks
Cryptography
Mathematics
HTML pages and newsgroups on the Internet
Annexes
1 C sources
The gradient backpropagation neural network
The Vigenère cipher (simple XOR)
Cryptanalysis of the Vigenère cipher
The DES code
Learning the XOR in shuffled order
Automatic generation of learning bases for the DES
Generation of the DES difference distribution tables
Generation of the DES linear approximation tables
Neural function library
The DES differential neurogenerator
The DES linear neurogenerator
2. The neural circuits
3. The DES difference distribution tables
4. The DES linear approximation tables
5. Simplified difference distribution tables
6. The differential neurocryptanalyzer tables
7. The linear neurocryptanalyzer tables
8. The XOR learning measurements
9. The massively parallel machines
9. The massively parallel machines
Chapter 1 - Introduction
1.1 Survey of existing work
The purpose of this thesis is to do research on neural network applications to cryptography.
A search of the existing literature shows that no thesis, conference proceedings, book, or Internet source (Web pages and newsgroups) applies neural networks to cryptography.
Indeed, David Pointcheval of the École Normale Supérieure in Paris used the perceptron problem to create an authentication protocol, but this was a purely mathematical and theoretical study.
The fields of artificial intelligence with neural networks, of cryptography, and of cryptanalysis have long been studied intensively by researchers at universities around the world and, among others, by electronic circuit design firms.
We therefore thought it would be interesting to define a new field, called neurocryptography, whose aim is the use of neural networks to encrypt a message, decrypt a message, or exchange messages over a network. Cryptography contains another area, cryptanalysis, which studies in a probabilistic manner the strengths and weaknesses of an encryption algorithm. Neural networks can play a decisive role in this area, which is why we have also defined neurocryptanalysis.
1.2 Neural networks
We present neural networks, then define and determine which neural network model is most appropriate for cryptography, both on the algorithmic learning side and on the hardware side, in relation to architectures already built and to the performance observed.
The most interesting connectionist model turns out to be the network of perceptrons with gradient backpropagation, thanks to its various properties.
These properties have been analyzed and demonstrated by different scientists:
their generalization ability
their low sensitivity to noise (if an error slips into the base of examples)
their low sensitivity to faults (lost connections, modified weights, or a bug in the program)
the information is distributed across the network
their capabilities for statistical calculation and heuristic search
We present the structure of the chosen model in the following figure:
This architecture can be realized either in software (a sequential program on a single-processor computer) or in hardware (massively parallel machines).
These machines and neural networks are two slightly different connectionist approaches. Studying neural networks amounts to considering interconnected parallel machines, except that the networks contain a compact weight matrix and some "intelligence". Furthermore, neural networks have already been implemented on massively parallel machines.
An analysis of linear multilayer networks shows analogies with various statistical data analysis methods, in particular linear regression and discriminant analysis. It has been shown that backpropagation performs a discriminant analysis of a population of N individuals (N being the number of examples in the learning base) described by n parameters (where n is the number of input neurons) and projected onto a hyperplane of dimension p (where p is the number of hidden units). It is therefore possible to use it on a non-linearly separable problem to build a classifier or a probabilistic model, which demonstrates the interest of such an algorithm in cryptography and especially in cryptanalysis.
We must return to the hardware aspect if we want faster learning of a large number of keys and texts.
The most studied circuits are digital VLSI; their advantages are:
ease of use
a high signal-to-noise ratio
easy cascading of circuits
high adaptability (these circuits can solve a variety of tasks)
a reduced manufacturing cost
A VLSI implementation of the neural network requires 4 blocks:
the summation (of the inputs of a neuron), with logic adders
the multiplication (by the weights), with parallel multipliers
the non-linear transfer function, with either a full calculation circuit, a table containing approximate values of the function, or an approximation circuit (for the sigmoid, with steps of 1/5 and an error of less than 13%, just 4 comparators and a few logic gates suffice (ALIPPI 1990))
the memorization of values (SRAM or DRAM memories)
We then present the three types of components existing on the market or in research laboratories:
1. components dedicated to digital neural networks, whose speed goes up to one billion connections processed per second
2. special-purpose digital coprocessors (also called neuro-accelerators): special circuits that can be connected to a host (PC or workstation) and work together with a neuro-simulator program. This mix of hardware and software brings accelerated speed, flexibility, and an improved user interface
3. neural networks on massively parallel machines
An implementation of the above-mentioned algorithm was developed on the Connection Machine CM2 (built by THINKING MACHINES Corp.) with a 64K-processor hypercube topology, which gave 180 million interconnections computed per second (IPS), i.e. 40 million weights updated per second.
Here are the performances measured per machine, in interconnections computed per second (figure below):
CM2: 180 million
CRAY X-MP: 50 million
WARP (10): 17 million
ANZA PLUS: 10 million
The use of such configurations would give good results in the learning of cryptographic ciphers.
1.3 Contemporary Cryptography
Cryptography is a very large area, popular among mathematicians and computer scientists. Nowadays, cryptography is the study of more or less strong encryption of messages or files, and the study of protocols for exchanging them over private networks and other means of communication. Within the study of ciphers one finds the means to recover keys, or to reduce the exhaustive key search: this is cryptanalysis. We present the strength of a cryptosystem, which depends entirely on the key used, whether public (known to all, for sending messages) or private (known only to those who may read the messages), and on the cryptographic exchange protocols. We prefer to focus on the neural realization and the neurocryptanalysis of cryptosystems.
Here are the different types of possible attacks in cryptanalysis:
ciphertext-only: the attacker must recover the plaintext from the ciphertext alone. A ciphertext-only attack is practically impossible; everything depends on the cipher.
known-plaintext: the attacker has the plaintext and the corresponding ciphertext. The ciphertext was not chosen by the attacker, but the message is compromised anyway. In some cryptosystems, a single ciphertext-plaintext pair can compromise the security of the system as well as the transmission medium.
chosen-plaintext: the attacker can obtain the ciphertext corresponding to an arbitrary plaintext of his choice.
chosen-ciphertext: the attacker can arbitrarily choose a ciphertext and obtain the corresponding plaintext. This attack can reveal weaknesses in public-key systems, and even recover the private key.
adaptive chosen-plaintext: the attacker can obtain the ciphertexts of plaintexts chosen in an iterative or interactive process, based on the results previously found. An example is differential cryptanalysis.
We quickly describe the encryption modes, where C_i is the i-th message M_i encrypted, E is the encryption function, D the inverse function for the key (or subkey) K, and V_i an intermediate encrypted value:
The ECB (Electronic Code Book) mode, where C_i = E_K(M_i) and M_i = D_K(C_i)
The CBC (Cipher Block Chaining) mode, where C_i = E_K(M_i XOR C_{i-1}) and M_i = D_K(C_i) XOR C_{i-1}
The OFB (Output FeedBack) mode, where V_i = E_K(V_{i-1}) and C_i = M_i XOR V_i
The CFB (Cipher FeedBack) mode, where C_i = M_i XOR E_K(C_{i-1}) and M_i = C_i XOR E_K(C_{i-1})
Any encryption algorithm can be implemented in these modes.
As far as our work is concerned, we will focus specifically on the ECB mode, which is better suited to the learning of neural networks with a fixed number of input and output bits and no feedback loop, although it is possible to connect one or more neural networks in such a loop; the learning time would simply be much longer.
There are simple ciphers, such as the Vigenère cipher (a simple XOR of contiguous blocks with a single key of the same size as a block), and more complex algorithms, such as the RSA, named after its designers (RIVEST, SHAMIR, and ADLEMAN), and the DES.
One uses a public key and a private key, the other only a private key.
These are in fact Vigenère-like ciphers with a different key for each block. The RSA uses keys made of large prime numbers, while the DES depends on S-tables that are more or less linear and more or less affine.
We have chosen to tackle the DES because it is the oldest encryption standard and the most studied of these algorithms.
The DES combines transpositions and substitutions in a product cipher whose security level is much higher than that of the two base codes used (text and key). These substitutions are non-linear, which produces a cryptosystem resistant to most forms of cryptanalysis. It was also designed to withstand differential cryptanalysis, which at the time was classified by the army and unknown to researchers.
It uses 64-bit input blocks L0 and R0; the key K is 56 bits long (8 bytes, the last bit of each byte being used for parity). This key generates 16 different 48-bit subkeys K1 to K16. Contrary to appearances, this was quite sufficient at the time, and a little less so these days, since an exhaustive search requires 2^56 encryptions to find the key.
The function f is called a round; the i-th round receives as inputs the right part R_i (the 32 bits of the text being encrypted) and the subkey K_i (48 bits). The rounds of the DES are detailed below. It outputs 32 bits that are added (XOR) to L_i. R_i is passed unchanged to L_{i+1}, while the encrypted bits are transmitted to R_{i+1} (except in the final round).
The hardware aspect is very important for execution speed. VLSI components are widespread and effective, but there are even more interesting components based on a technology that should not be disregarded: gallium arsenide (GaAs). It has already been used in supercomputers.
The major differences between GaAs and silicon VLSI are:
the fast switching of GaAs gates
the interface with non-GaAs components, which is a major difficulty
the very low density of GaAs integrated circuits
With regard to the DES, there is a circuit running at 50 MHz that performs one encryption in 20 ns, which makes it possible to perform 50 million encryptions per second.
Since late 1995, AMD has sold a circuit encrypting at 250 MHz.
In August 1993, the Canadian Michael J. WIENER described how to build, for one million dollars, a machine that performs an exhaustive search of the DES keys and finds the right key in 3.5 hours. Each of its basic circuits has a power equivalent to 14 million SUN workstations.
It thus seems obvious that the exhaustive search is faster to carry out than the other types of cryptanalysis: even if their number of attempts is smaller, their search time is much longer. Cryptanalysis nevertheless remains very interesting for measuring the strength of cryptographic algorithms.
We then analyze the two most successful cryptanalyses against the DES.
Differential cryptanalysis consists of looking at the particularities of a pair of ciphertexts obtained for a pair of plaintexts with a particular difference.
It analyzes the evolution of these differences as the plaintexts propagate through the rounds of the DES while being encrypted with the same key.
After randomly choosing a pair of plaintexts with a fixed difference, one computes the difference of the resulting ciphertexts. Using these differences, it is possible to assign probabilities to the various bits of the subkeys. The more ciphertexts are analyzed, the more clearly the most likely encryption key emerges.
The strength of the DES residing in its rounds, and all operations of a round being completely linear except the S-tables, Eli BIHAM and Adi SHAMIR analyzed the 8 S-tables for differences of input texts and differences of output texts; this information is synthesized in 8 tables called the difference distribution tables of the DES (see the 8 tables in annex 3). We implemented the algorithm that generates these tables.
Linear cryptanalysis studies the statistical linear relations between bits of the plaintext, bits of the ciphertext, and bits of the key used to encrypt. These relations make it possible to obtain the values of some key bits when plaintexts and associated ciphertexts are known.
One deduces the linear relations of each S-table by choosing a subset of input bits and output bits and computing the parity (XOR) of these bits; the relation holds when the parity of the subset is zero. In general, some subsets will have parity 0 (linear) and others parity 1 (affine).
MATSUI computed the number of zero parities of each subset of input and output bits for each S-table, among the 64 x 16 = 1024 possible subsets. It is possible to assign probabilities to the various bits of the subkeys. The probabilities of a zero parity (linear relation) are synthesized in 8 tables called the linear approximation tables of the DES (see the 8 tables in annex 4). We implemented the algorithm that generates these tables.
1.4 Applied neuroCryptography
After showing the possible association between neural networks and cryptography, we define the field of neurocryptography. Any term used in cryptography is preceded by the prefix "neuro" when the cryptosystem contains one or more neural networks, or one or more elements of a network such as the perceptron.
We then analyze some important points for the correct use of neural networks.
The way the learning base is generated is very important for the realization of neural applications. Learning depends on the random initialization of the network weights, as well as on the number of examples, the order of presentation of these examples, and the consistency of the chosen set of examples.
An example is composed of a value to be presented at the input of the neural network and a value to be presented at its output, the output value depending on the input value. If the number of examples is too low, the network will clearly not search for a transfer function of the studied cryptosystem, but will instead memorize the given examples, and will therefore be unable to find a result for any input value different from those given in the base of examples. In cryptography, more than half of all possible examples must be presented to be certain of the results, even though in strong cryptography the number of possible input values is very large.
We then implemented an algorithm to present the examples in a more or less complete disorder. It consists of cutting the base into k sub-bases, then presenting in turn the elements of each of the sub-bases (k can be even or odd). The following figure shows the final error rate Tss for different values of k (the number of presentations being fixed at 500, with 256 examples).
Concerning the automatic generation of contiguous texts, we present an algorithm that can generate clear examples for any number of nested loops, with a single loop body executed on each iteration of the innermost loop.
The learning coefficient, usually noted epsilon and also called the learning rate, allows a more or less rapid learning, with chances of convergence of the network inversely proportional to it, because of the local minima of the error curve measured between the learning base and the output values computed by the neural network. Epsilon should be varied empirically between 0.1 and 2.0. If the network still refuses to converge, this is most likely due to a non-linearly separable problem, as in the learning of the XOR. One should then use a momentum term, a real value between 0.1 and 1.0, whose aim is to avoid the local minima of the error function: it takes the previous steps of learning into account in the current step.
Self-learning can be interesting for the neural learning of cryptographic algorithms. The neural system has two parts, the emulator and the controller, whose learnings are carried out separately.
The task of the emulator is to simulate the complex function or the encryption algorithm. It therefore receives as inputs the state of the system at a given time and an input at that time, and its output is the output of the algorithm at the following time. The input of the controller is the state of the system at time k; its output is the value to be given as input to the algorithm or complex function. The proper role of the controller is to learn the adaptive control law. For this learning, however, the error signal is computed not on the command but on its result, the gap between the desired state and the current state. This amounts to a guided rather than supervised learning, because no teacher teaches the control law to the system. In fact, the system teaches itself by processing the information it receives in return for its actions. To make learning by backpropagation possible and to backpropagate the error on the position, the structure of the emulator must be homogeneous with that of the controller.
Another quality of this device is its self-learning ability. The learning of the controller is fast. In addition, the synthesized control law is sufficiently robust to small random perturbations. It is therefore possible to build self-learning neural networks on a communication line, for encryption as well as for real-time message authentication.
We present several different applications.
For the learning of the XOR, i.e. to achieve C = A XOR B, we need a network with 16 input bits (the 2 bytes A and B) and 8 output bits (the byte C). The network must therefore have 16 input neurons, at least 16 neurons in the hidden layer(s), and 8 output neurons. The learning base consists of 65536 cause-effect pairs. After various tests, the success rate of the XOR learning is very close to 100%, depending on the random weight initialization and the number of presentations. The larger the number of input and hidden layer neurons, the more the number of presentations of the base can be reduced. If the random initialization of the weights is right, a single presentation can suffice, with better quality.
The learning of cryptographic algorithms consists of determining a function or an algorithm combining the input data (causes) to produce the output data (effects). It is therefore a matter of determining the input and output structures of the network, and of finding a base of causes and associated effects sufficient for the learning of the network to converge to a minimal number of errors, or even none at all.
The question that arises is how to make the neural network memorize the algorithm. The answer is to present virtually all possible encryption keys (e.g. 64 bits) and all possible plaintexts (e.g. 64 bits) as inputs, and to compute all the resulting ciphertexts with the encryption algorithm. The neural network will thus have synthesized the algorithm: when presented with an encryption key and a plaintext as input, it will output the corresponding ciphertext.
If the encryption algorithm is bijective (that is, if the ciphertext presented as input yields the plaintext as output), then the encryption algorithm is the same as the decryption algorithm, and the neural network also decrypts.
With regard to key learning, an encryption key must be linked to an encryption or decryption algorithm and to a plaintext or ciphertext.
If the key has a fixed size of N bits, the neural network must have N output bits, and a number M of input bits equal to twice the number of bits of the plaintext and ciphertext blocks.
In effect, the neural network realizes a function that finds the key directly from a plaintext and a ciphertext.
We then present the advantages and disadvantages of the neural methods used. The learning time of neural networks remains quite long, depending on the number of bits of the key and of the clear and encrypted texts; this time can be reduced if the neural network is implemented on a parallel machine.
Regarding the memorization of keys and encryption algorithms, neural networks are high achievers, with over 90% success in learning weak ciphers. A strong encryption algorithm lends itself to rapid learning. Neural networks are widely used in image recognition, so performing authentication is simple. At the level of hardware architecture, it is easy to parallelize the algorithms, for neural networks as well as for ciphers built on hardware architectures, but this solution is quite expensive. The design of neurociphers can be useful when a secret key and an encryption algorithm are taught to a network in order to hide information from the user, in particular at the level of a key generator that could be kept secret by a distributing body. It would be hard for a cryptanalyst to discover the function of the encryption key generator algorithm. Neurocryptanalysis seems to be a much more promising application of neural networks, owing to their emergent properties of massively parallel statistical analysis and their ability to concentrate information, i.e. to approximate statistical matrices.
One of the most important applications of neurocryptography is neurocryptanalysis. Neurocryptanalysis consists of performing the cryptanalysis of cryptographic algorithms with the help of neural networks, i.e. building one or more neural networks to find, or help find, the key of an encryption algorithm. The important principle is the presentation to the neural network of a ciphertext and of the encryption algorithm.
In neurocryptanalysis, the neural network helps find the encryption key used to produce the ciphertext.
Since a neural network can learn a cryptographic algorithm, it can also "memorize" (by function approximation) a set of keys. This neural network structure is identical to that of self-learning. It is clear that neural networks can take an important place in cryptography, in the design, use, and verification of protocols.
We test and present the possible forms of neurocryptanalysis.
To neurocryptanalyze a Vigenère cipher, our neural network would need either a frequency analysis or an analysis of subsets of n characters of a given language, and then to measure the correlation between the learned plaintext and the ciphertext for all subsets of n characters. This type of problem can be solved by a neural network, but would take very long in supervised learning. It is however possible to carry it out in self-learning mode, provided the ciphertext is large enough.
We measure the statistical performance of neural networks with the differential neurocryptanalysis and the linear neurocryptanalysis of the DES, according to the following scheme:
These performances proved to be particularly good.
We then implemented a neurocryptanalyzer of the Unix crypt(3) command, or ufc_crypt (ultra fast crypt), an implementation of the DES used to encrypt the passwords stored in the /etc/passwd file. It is a little special in the sense that the key is unknown to the user; no one has the ability to decrypt a password. This key is specific to the Unix system in use. We thought it would be interesting to have a neural network learn a certain number of clear passwords and the corresponding encrypted passwords. The learning base must be large enough that the DES learning does not turn into a memorization of the examples of this base, which would leave the network unable to find the solutions for other examples close to those of the base.
We have therefore built two applications. One, for UNIX (or GNU Linux), synthesizes the Unix crypt function for clear passwords of 4 characters whose values are a lowercase letter, a period, or a slash, i.e. about 615000 passwords and 2 hours of computation per presentation. The other, for MSDOS, learns 1024 clear passwords of 7 characters and the corresponding encrypted passwords of 11 characters (we remove the first 2 characters of salt, which is used to re-encrypt the password into 65536 different encrypted passwords for the same clear text).
We added a program displaying the statistics of the first application graphically. The second provides its information quickly.
We can deduce the following results.
The differential and linear neurocryptanalyses are probabilistic calculation methods for quickly obtaining information about a part of the DES. They make it possible to realize the inverse function of an S-table: for a chosen difference of texts in one case, and for a linear relation with a selected subkey in the other. The learning of such neural networks is very fast. It is possible, for a given method, differential or linear, to gather 8 x 16 = 128 neural networks (one per S-table and per round) and to operate them in parallel on the information given by the ciphertext at the output of the DES and the plaintext at its input. These networks can thus supervise other neural networks, learning unsupervised, that modify the key bits as the different texts pass through the DES. This would be a self-learning of the subkeys; from the subkeys, we recover the encryption key.
The results of the statistical analysis of the MSDOS program are surprising, with 90% of the bits of the encryption function found by the neural network for the learning base, and about 80% of the bits for an example close to this base but not presented to the network. This proves that with a small learning base, it is easy for a neural network to find a clear password from an encrypted password, without taking into account the salt included by the Unix system.
We then present two architectures.
The first is a dedicated parallel architecture, since a neurocryptanalyser of strong ciphers needs very fast supervised learning. All plaintexts, ciphertexts and keys must be presented to the neural network. The following figure shows an overview of learning dedicated to an encryption algorithm.
A complete machine can be constructed on the same pattern, with a large number of units made of binary counters and circuits implementing the encryption algorithm. This number is limited by the learning time of the single neural circuit, approximately 1 s. For the D.E.S., it is preferable to treat a fixed subset of the data, as we have done in the applications above.
In the second, we present our algorithms written for the distributed architecture of the CM-5, using 3 layers of processors with one processor per neuron. The first layer is used to initialize the input (clear text) and the output (ciphertext) of the neural network, which occupies layers 2 and 3. The learning time per example is probably longer than for the dedicated machine of the preceding paragraph.
The performances are as follows:
learning time is quite long (from several days to several years), but interesting results (an error rate close to zero) are obtained in few presentations when the base of examples is large enough, which is the case for strong algorithms such as the D.E.S. or the R.S.A.; for simple operations such as the XOR, it takes between 200 and 500 presentations to reach a zero error rate. Once learning is done, the time needed for information to pass through the neural network is very short (on the order of tens of nanoseconds). This is prodigious when we know that an exhaustive search would have to be repeated for each text encrypted with a different key.
1.5 Plan of this report
Chapters 2 and 3 present neural networks and cryptography in a clear manner and define our choices for the direction of our research.
In Chapter 4, we define neurocryptography and the settings needed to use it well in the creation of applications.
Chapter 5 presents neurocryptanalysis, from XOR-based ciphers to more complex ciphers. The study of the neurocryptanalysis of the D.E.S. shows the performance of neurocryptographic applications. Various applications support our conclusions on the performance of neural networks.
In the supplementary Chapter 6, we give various definitions to clarify certain points on which current cryptography is based: reminders of information theory, complexity of algorithms and number theory.
You will then find the bibliography, HTML pages on the Internet and an annex with source codes and various documents.
Chapter 2  Neural networks
2.1 Introduction
In this chapter, after some necessary definitions, we present the current means of linking neural
networks to cryptography. We present the neural network model used, as well as the learning mode best suited to cryptography. We then describe the algorithm and the benefits of such a model, specifically at the level of the linear analysis of multilayer networks, to evaluate their statistical performance. Finally, we review various hardware aspects, knowing that learning must be as fast as possible.
2.2 Basic concepts and terminology
A self-organizing network is a network of simultaneously active processing elements (nodes and connections) whose time-varying local interactions determine the overall behaviour of the system. Among such networks, connectionist models use digital information and are dynamic systems that perform calculations similar to those of a neuron.
A connectionist model is characterized by: a network (a set of nodes) connected by directed links or connections; an activation rule (a local procedure at each node updating its activation level based on its inputs and its current activation, each node performing this procedure in parallel); and a learning or adaptation rule (a local procedure that describes how connections vary over time, meaning that the weight of a connection is updated as a function of its current value and the activation levels of the nodes it connects, each node performing this procedure in parallel).
The concept of intelligence as an emergent property of self-organization is an underlying principle of this type of network.
The first neural networks appeared in 1943 with the logical neurons of MCCULLOCH; various forms of networks exist.
Feed-forward multilayer networks are the most interesting. They have an input layer, an output layer and one or more hidden, so-called intermediate, layers (figure 2.2.1).
Figure 2.2.1 - Feed-forward multilayer network (input, hidden and output layers)
There are 3 possible modes of learning: supervised, unsupervised and reinforced.
Supervised learning is the best suited to storing a cryptographic algorithm or to memorizing a set of private encryption keys, because this learning uses a teacher giving the system the desired inputs and outputs.
Supervised learning consists of presenting to the inputs and outputs of the network a base of causes and effects (unlike unsupervised learning, where the effects are not presented). The network is then asked to calculate the outputs corresponding to the cases presented at its inputs. We then measure the sum of the errors over each of the neurons of the network. The base of causes and effects must be presented again and again until the measured error is almost nil.
Neural networks behave well when the base is not complete, because they 'generalize': the acquired information is delocalized over the entire surface of the network. It is important that the number of neurons and the number of hidden layers be selected according to the number of inputs of the network, the number of elements of the base to present and the number of presentations.
Figure 2.2.2 presents the response of a neural network during the learning phase; we can see how the error decreases over the presentations of the base of causes and effects.
Figure 2.2.2  Learning Phase
The other two modes of learning are better suited to automatic control and error correction.
For more detailed information, see (BOURRET 1991).
2.3 The present situation
The neurons, or perceptrons, currently used are elements made up of a number of inputs and one output; each input is weighted by an amplification factor, and the output is activated by comparing the sum of the weighted inputs to the activation threshold.
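This description can be sketched directly in C; the following threshold unit is an illustration of ours, not code from the annex:

```c
/* A threshold perceptron: each input x[i] is weighted by w[i];
   the output fires when the weighted sum reaches the threshold. */
int perceptron(const double *x, const double *w, int n, double threshold) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += w[i] * x[i];
    return sum >= threshold ? 1 : 0;
}
```

With weights {1, 1} and threshold 1.5, such a unit computes the logical AND of two binary inputs; a single perceptron cannot compute the XOR, which is one reason hidden layers are needed.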
You will find all the models of neural networks in figure 2.3.1. The detail of each of these models, as well as a complete description, can be found in (MAREN).
You can consult the documents of authors of previous models: (GROSSBERG 1986) (HEBB, 1975), (HOPFIELD 1982), (KOHONEN 1984), (ROSENBLATT 1959), (RUMELHART 1986), (LIPPMAN, 1987) (MCCULLOCH 1943) and (WEISHBUCH 1989).
Neural network | Authors and dates | Advantages / disadvantages of the learning
Set of perceptrons with backpropagation of the gradient | WERBOS, PARKER, RUMELHART. 1987 | Fast learning, low memory
Bidirectional associative memory | KOSKO. 1987 | Low storage capacity, slow search
CAUCHY machine | CAUCHY. 1986 |
Brain-state-in-a-box | ANDERSON. 1977 | Unknown performance
HOPFIELD self-association memory | HOPFIELD. 1982 | Low memory
KOHONEN self-association memory | KOHONEN. 1981 | Slow learning, unknown number of presentations
Learning vector quantization | KOHONEN. 1981 | Slow learning, unknown number of presentations
Self-organizing maps | KOHONEN. 1981 | Slow learning, unknown number of presentations

Figure 2.3.1 - Models of neural networks
Among these networks, we should retain for cryptography the one that allows fast learning with little memory capacity, because the purpose of using such a network is the approximation of a transfer function, or the synthesis of cryptographic algorithms.
The network of perceptrons has the advantage of being currently well known and of meeting our needs: it is easy to implement, and its performance is very interesting.
2.4 Are neural networks used in cryptography?
A few applications have been studied in the context of the compression of images or files and the identification of messages (no application was completed) (PATHMA 1995). We believe that, apart from secret military projects, no neural network is used for encryption, decryption or cryptanalysis. Some students specialized in cryptography in France and Belgium appear to be interested, but no literature or media contains information on this subject.
2.5 What types of neural networks are used in Cryptography?
As we saw in paragraph 2.3, the model of perceptrons with backpropagation of the gradient is the most studied and has demonstrated its reliability with respect to learning the XOR; these networks are simple to implement and learn quickly.
The advantages of the use of neural networks are:
their generalization property
their low sensitivity to noise (if an error sneaks into the base of examples)
their low sensitivity to faults (lost connections, modified weights or a bug in the program)
the delocalization of information over the network
their capabilities for statistical calculation and heuristic search
This model is well suited to synthesis and to the search for associations, or recognition. In addition, all the states and outputs of the neurons of these networks can be updated simultaneously (see the code for learning the XOR in annex 1). A critique of learning algorithms, (CAMARGO 1990), supports our choice of this model. Paragraph 2.8 shows these benefits specifically.
2.6 The model structure of perceptrons with backpropagation of the gradient
Figure 2.6.1 on the next page shows the structure of the model of perceptrons with backpropagation of the gradient: the input bits, the hidden layer and the output layer, along with the deltas of the hidden layer, those of the output layer and the activations used for learning.
The choice of the number of hidden-layer neurons must obey a compromise that optimizes learning while avoiding the overfitting which would result from too large a number of hidden units. This choice is often the result of know-how and practical experience; it can be guided by statistical considerations.
Figure 2.6.1 - Structure of the model of perceptrons with backpropagation of the gradient
This architecture can be either software (a program on a sequential single-processor computer) or hardware (massively parallel machines). Implementations were realized on the CM-1, CM-2 and MASPAR, and their performance has been measured (see paragraph 2.9).
2.7 The gradient backpropagation algorithm
This model is a multilayer network with forward flow of information (see figure 2.2.1).
Supervised learning in this case consists of measuring the error between the computed and desired outputs, then propagating this error back to the neurons of the hidden layers and of the input. The transfer function f is a sigmoid, whose differentiability plays an important role. Figure 2.7.1 shows (a) the layer architecture and the transfer function, (b) the calculation of the error signal for an output unit and (c) the calculation by backpropagation of the error signal of a hidden unit.
Figure 2.7.1 - Learning by backpropagation
The backpropagation of error formula is δ_i = f'(e_i) Σ_k δ_k w_ki.
Here is the algorithm for N input neurons, M output neurons and N_k neurons in hidden layer k:
1. Initialize the weights of the connections randomly.
2. Present a case (X_1, X_2, ..., X_N) and the associated effect (S_1, S_2, ..., S_M).
3. Calculate the outputs of each of the neurons of the hidden layers and of the output layer by the perceptron formulas:
first layer: x_j = f(Σ_i w_ij X_i),
second layer: x_j = f(Σ_i w_ij x_i),
and so on, with f(u) = 1 / (1 + exp(-u)).
4. Recursively change the weights of the connections, from the output neurons back towards the hidden layers: W_ij(t+1) = W_ij(t) + η δ_j x_i, where W_ij is the weight between neuron i and neuron j, x_i is the output of neuron i and η is the learning factor.
δ_j = x_j (1 - x_j)(S_j - x_j) if the x_j are the outputs of the output neurons;
δ_j = x_j (1 - x_j) Σ_k δ_k w_jk for a hidden neuron j, where the sum is done over all the neurons of layer k+1 connected to neuron j.
5. Return to step 2 until the system converges (depending on the measured error).
In the following parts of this report, the neural networks discussed will be networks of perceptrons with backpropagation of the gradient.
2.8 Linear analysis of multilayer networks
The success of the gradient backpropagation algorithm led researchers to analyze the process in detail. They showed analogies with different statistical methods of data analysis, in particular linear regression and discriminant analysis. In this paragraph, we rely on the publications of P. GALLINARI and F. FOGELMAN-SOULIÉ (GALLINARI 1988), which carry out a comparison of the classical method of discriminant analysis and the linear multilayer perceptron (with one layer of hidden units). In the linear case, it is shown that backpropagation performs a discriminant analysis of a population of N individuals (N being the number of examples included in learning) described by n parameters (where n is the number of input neurons) and projected onto a hyperplane of dimension p (where p is the number of hidden units).
These results are then used to validate an incremental construction of the hidden layer. It is thus shown that when we add a set of q hidden units, it is not necessary to repeat all the learning; it suffices to freeze the existing connections and perform learning only on the connections relating to the units just added. We can thus consider an incremental construction of the layer of hidden neurons, which saves precious learning time but implies a variable structure.
The general interest of this approach is to show how the comparison of connectionist algorithms and classical methods suggests a permanent enrichment of the former, allowing them to increase their performance.
2.8.1 Problem of the linear multilayer perceptron
The perceptron solves a supervised classification problem. The number of input characteristics is n (the number of input units). The number of classes is m (the number of output neurons). The number of examples in the learning base is N.
We assume N > n > m, which is the case for a reasonable classification problem.
Let X be the n x N matrix of inputs, X = (x_1, ..., x_N), and Y the m x N matrix of imposed outputs, Y = (y_1, ..., y_N). The optimal linear classifier is the linear application f from the input space to the output space minimizing the quadratic distance between Y and fX. The problem is therefore to find the matrix M of dimension (m x n) minimizing ||Y - MX||^2. The solution to this problem is provided in (BOURRET 1991, pages 189-212) by the Penrose pseudo-inverse: it is the matrix W = YX^+. Although the quadratic error function is convex, the uniqueness of the solution of the minimization problem is not ensured; there may be local minima.
The interest of the study is to analyse the linear multilayer case in order to approximate the behaviour of the backpropagation algorithm in the non-linear case (the case of the XOR in Chapter 4). The solution of the minimization problem is the matrix PW, where P is the projector onto the subspace of R^m generated by the p eigenvectors of C = WXY^t related to the p largest eigenvalues.
2.8.2 Discriminant analysis of rank p
Discriminant analysis of rank p consists of finding the best subspace of dimension p of R^n such that the classes of the projections of the input vectors onto this subspace are separated as well as possible. The following theorem is shown in (BOURRET 1991): given a classification problem, let M = HK be the optimal achievement, for the quadratic criterion, of this classification by a linear perceptron with a layer of p hidden neurons; then K performs a discriminant analysis of rank p.
2.8.3 Incremental learning of the hidden layer
A serious gap in the gradient backpropagation algorithm is that it applies only to a network that is already structured and where the number of hidden neurons is fixed.
The following procedure of incremental learning is justified in (BOURRET 1991): the learning algorithm is first applied to a network with only a minimum number of neurons in the hidden layer. When an optimal weighting of the connections has been reached, if the performance of the network is not satisfactory, a hidden unit is added and the learning algorithm is applied only to the connections related to this neuron. The operation is repeated until the performance is satisfactory.
Remember that beyond p = rank(W), it is pointless to increase the number of hidden neurons. The role of the hidden neurons is clear: each neuron detects a feature contributing to the classification. These features are non-redundant (orthogonality of the eigenvectors) and their contribution to the separation of the classes is decreasing (eigenvalues taken in order of decreasing modulus).
2.8.4 Relations with the principal component analysis
Backpropagation with p hidden neurons projects the data onto a space of dimension p corresponding to the one that would be found by principal component analysis. Moreover, in the practice of principal component analysis, these components are built one by one, in order of decreasing modulus of the eigenvalues of the covariance matrix of the input data, until the sum of these moduli divided by the trace of the matrix reaches a fixed threshold. The incremental construction of a backpropagation network answers the same concern, the corresponding threshold in this case being the error observed on the outputs.
It can therefore be concluded that the results obtained by backpropagation could be obtained through more traditional methods of data analysis (discriminant analysis, principal component analysis), except that backpropagation operates massively in parallel. However, the non-linearities of the neural units modify the studied behaviour. These changes, observable by numerical experimentation, have been reported in (GALLINARI 1988). Notably, excess neurons in the non-linear case, instead of extracting surplus features negligible for the classification (orthogonality of the eigenvectors), behave like neurons of the previous layers and contribute to the robustness and improved performance of the classifier.
2.9 Hardware
The hardware aspect is very important for cryptography, because the implementation of neural networks in VLSI (components with a very large capacity for integration of transistors) allows faster and better-suited applications.
Learning over a large number of keys and texts thus becomes faster.
The most studied are digital VLSI circuits; their advantages are:
ease of use
a high signal-to-noise ratio
an easy-to-implement cascading of circuits
a high adaptability (these circuits can solve various tasks)
a reduced manufacturing price
For more details, one should read the reports written by Dr. VALERIU BEIU on the implementation and optimization of VLSI neural networks (BEIU 1995a), (BEIU 1995b).
Figure 2.9.1 below shows a comparison of different hardware for the implementation of neural networks.
Figure 2.9.1 - Comparison of different hardware
The implementation of a neural network in VLSI requires 4 blocks (see figure 2.9.2):
the summation (of the inputs of a neuron) with logical adders
the multiplication (by the weights) with parallel multipliers
the non-linear transfer function, with either a full calculation circuit, a table containing approximation values of the function, or a circuit calculating approximations (for the sigmoid, with 1/5 steps and an error < 13%, 4 comparators and a few logic gates suffice (ALIPPI 1990))
the memorization of values (SRAM or DRAM memories)
Figure 2.9.2  Circuit CMOS with 1024 synapses to distributed neurons
Regarding backpropagation, NIGRI completed a circuit containing a table of all the real values of the sigmoid between -2 and 2 with 8-bit precision, which is regarded as sufficiently precise (NIGRI 1991).
Here are the three types of components existing on the market or in research laboratories:
1. components dedicated to digital neural networks, whose speeds go up to 1 billion connections processed per second
Lneuro Philips (Duranton 1988, 1989, 1990) ^{*}
X 1 and N64000 of Adaptive Solutions (Adaptive 1991, 1992; Hammerstrom 1990) ^{*}
Ni1000 Intel (Scofield, 1991; Holler 1992) ^{*}
pRAM of King's College London (Clarkson 19891993) ^{*}
WSI's Hitachi (Yasunaga 1989, 1990, 1991) ^{*}
1.5V chip (Watanabe1993) ^{*}
2. digital coprocessors for particular purposes (also called neuro-accelerators) are special circuits that can be connected to hosts (PCs or workstations); they work with a neuro-simulator program. The mix of hardware and software gives these benefits: accelerated speed, flexibility and an improved user interface.
Delta Floating Point Processor by SAIC (DARPA 1989) ^{*} connected to a PC
ANZA, Balboa of Hecht-Nielsen Computers (Hecht-Nielsen 1991) ^{*} with a speed on the order of 10 mega-connections per second
implementations on RISC, DSP or Transputer processors
3. networks of neurons on massively parallel machines
WARP (Arnould 1985; Kung 1985, Annaratone 1987) ^{*}
CM (MeansE 1991) ^{*}
RAP (Morgan 1990; Beck 1990) ^{*}
SANDY (Kato 1990) ^{*}
MUSIC (Gunzinger1992; Mueller 1995) ^{*}
MIND (Gamrat 1991) ^{*}
SNAP (HechtNielsen 1991; Means R1991) ^{*}
GF11 (Witbrock 1990; Jackson 1991) ^{*}
Toshiba (Hirai 1991) ^{*}
MANTRA(Lehmann 1991, 1993) ^{*}
SYNAPSE (Ramacher 1991a, 1991b, 1992, 1993;) Johnson1993a) ^{*}
HANNIBAL (Myers 1993) ^{*}
BACCHUS and PAN IV (Huch 1990; Pochmuller1991; Palm 1991) ^{*}
PE RISC (Hiraiwa 1990) ^{*}
RMnc256 (Erdogan 1992) ^{*}
Hitachi WSI (Boyd 1990; Yasunaga 19891991) ^{*}
MasPar MP1 (Grajski 1990; MasPar 1990 a  c;Nickolls 1990) ^{*}
CNS1 (Asanovic 1993 b) ^{*}
For more information or the references of the machines above (with an asterisk), you can consult (Beiu 1995 c).
You will find in annex 2 a set of electronics manufacturers who directed networks of neurons in Silicon.
An implementation of the above-mentioned algorithm has been developed on the Connection Machine CM-2 (created by THINKING MACHINES Corp.) with a hypercube topology of 64k processors, which gave 180 million interconnections calculated per second, or 40 million weights updated per second.
Here are the performances measured by machine, in interconnections calculated per second (figure 2.9.3):

CM-2      | 180 million
CRAY X-MP |  50 million
WARP (10) |  17 million
ANZA PLUS |  10 million

Figure 2.9.3 - Performance of parallel machines
The use of such configurations would make it possible to obtain excellent results in the learning of cryptographic ciphers.
You will find in chapters 4 and 5 how to use the implementation of neural networks on the Connection Machine CM-2 or CM-5 in cryptography.
We detail the functioning of the MASPAR and CM-5 machines in annex 9.
2.10 Conclusion
In this chapter, we have seen that the most interesting neural network model is the perceptron with backpropagation of the gradient, and that supervised learning is the most suitable. In addition, the use of neural networks in cryptography is very limited and even very little known, while the study made so far of neural networks allows us to say that perceptron networks are able to learn to synthesize a transfer function fairly easily. They can provide statistics based on the input values, as more traditional statistical methods do, which makes them very useful in cryptography. It also emerges that neural networks are by now mature enough at the level of hardware implementation and are realized at the industrial level. These networks can be perfectly parallel and extremely fast.
Everything shows that neural networks should be bound to cryptography, but which cryptography is appropriate? And which cryptographic tools should be used? The answers are in the following chapters.
Chapter 3  Cryptography
3.1 Introduction
We give in this chapter the important definitions needed to understand the continuation of our work, as well as clarifications regarding the current situation of the "known" world of cryptography; then we describe the composition of cryptographic algorithms, weak and strong. We detail specifically the D.E.S. because, after more than 20 years of existence, it remains the most used and the most studied, especially at the level of its cryptanalysis, which is very difficult.
3.2 Definitions
Cryptography is the art of hiding (encrypt) messages.
A cryptosystem is a hardware or software system performing the cryptographic, it can contain one or more encryption algorithms.
Cryptanalysis is the art of breaking codes or the cryptosystems, i.e. to find the key to read all or part of the message.
Cryptology is the mathematical study of cryptography and cryptanalysis.
An original message is called plaintext or clear text.
A resulting message is called ciphertext.
An encryption key is a code to encrypt a plaintext.
A decryption key is a secret code to decrypt a ciphertext.
A private key allows the encryption and decryption, it must be secret.
A public key allows only encryption, it may be broadcast; only the person with the associated private key can decrypt the message.
An exhaustive search is the test of the set of all possible keys to find the decryption key. Feel free to consult (FAQ 1996).
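As a toy illustration of this definition (our own example, with a 16-bit XOR key rather than a real cryptosystem), an exhaustive search simply tries every key against a known plaintext-ciphertext pair:

```c
#include <stdint.h>

/* Toy cipher: XOR a 16-bit block with a 16-bit key.  With only
   2^16 keys the search below is instantaneous; with the 2^56
   keys of the D.E.S., the same loop becomes impractical. */
uint16_t toy_encrypt(uint16_t block, uint16_t key) {
    return block ^ key;
}

/* Try every possible key until one maps the known plaintext to
   the observed ciphertext; return -1 if none matches. */
long exhaustive_search(uint16_t plain, uint16_t cipher) {
    for (long k = 0; k <= 0xFFFF; k++)
        if (toy_encrypt(plain, (uint16_t)k) == cipher)
            return k;
    return -1;
}
```

The size of the key space is exactly what the search time measures, which is why key length is central to the strength criteria of the next paragraph.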
3.3 Contemporary Cryptography
Cryptography is a very large area, popular with mathematicians and computer scientists. Nowadays, cryptography is the study of the more or less strong encryption of messages or files, and the study of protocols for exchanging them over private networks and other means of communication. Alongside the study of ciphers, we find the means of finding keys, or of reducing the exhaustive search for keys: this is cryptanalysis.
3.3.1 The cryptosystem and strength
The strength of a cryptosystem lies in the key used and in the encryption algorithm (or cipher) if it is kept secret (which is reserved for the military).
The key size must be large (512, 1024 or 2048 bits is reasonable) so that the unicity distance is great (see the supplementary Chapter 6), and the key generator must be powerful or secret.
The ciphertext should appear random to all standard statistical tests.
The cryptosystem must withstand all known attacks.
However, even if a cryptosystem meets the previous criteria, we cannot conclude that the system is infallible!
The cryptosystems are of two types: public key or private key.
A private-key cryptosystem with key K is defined by D_K(C_K(M)) = M, where C is the encryption function, D the decryption function, M a clear message and C_K(M) the encrypted message.
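For a XOR-based private-key cipher (a deliberately weak example of our own), the relation D_K(C_K(M)) = M holds by construction, since the XOR is its own inverse:

```c
#include <stddef.h>

/* XOR the buffer with a repeating key.  Applying the function a
   second time with the same key restores the original message,
   so here C_K and D_K are the very same operation. */
void xor_crypt(unsigned char *buf, size_t len,
               const unsigned char *key, size_t klen) {
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key[i % klen];
}
```

Calling xor_crypt once encrypts; calling it again with the same key decrypts, which is exactly the private-key property D_K(C_K(M)) = M.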
3.3.2 Protocols
Protocols are a series of steps involving human beings (at least two) to accomplish a task. Cryptographic protocols allow the participants to exchange secret information among themselves.
The applications using them are data communications, authentication, management of private and public keys, splitting of messages, mixing of messages, access to databases, dating services, subliminal messages, digital signatures, collective signatures, bit commitment, playing heads or tails, playing blind poker, zero-knowledge proofs, electronic money and anonymous messages. The best would be a protocol with intrinsic discipline, because it would itself ensure the integrity of the transaction (without an intermediary or "arbitrator"); its construction would make disputes impossible. None exists!
The study of protocols is well documented in (SCHNEIER 95). In the pages that follow, we will concern ourselves with the neuronal development and neurocryptanalysis of cryptosystems rather than with the search for protocols making the exchange of information between participants more secure.
3.3.3. The types of attacks in cryptanalysis
Cryptanalysis distinguishes between the following different types of possible attacks:
ciphertext-only: the attacker must find the plaintext having only the ciphertext. A ciphertext-only attack is practically impossible; everything depends on the encryption.
known-plaintext: the attacker has a plaintext and the corresponding ciphertext. The ciphertext was not chosen by the attacker, but in any case the message is compromised. In some cryptosystems, a single ciphertext-plaintext pair can compromise the security of the system as well as that of the transmission medium.
chosen-plaintext: the attacker has the ability to find the ciphertext corresponding to an arbitrary plaintext of his choice.
chosen-ciphertext: the attacker can arbitrarily choose a ciphertext and find the corresponding clear text. This attack may reveal weaknesses in public-key systems, and even allow finding the private key.
adaptive-chosen-plaintext: the attacker can determine the ciphertexts of chosen plaintexts in an iterative and interactive process based on the results previously found. An example is differential cryptanalysis.
Some of these attacks can be interesting when they are used against strong ciphers. See (FAQ 96) and (SCHNEIER 95) for details of these attacks.
3.4 Cryptographic algorithms
3.4.1 Block coding and stream coding
In general, the plaintext M is divided into blocks of bits of fixed length: M = M_1 M_2 ... M_N.
Each block M_i is encrypted, C_i = E_K(M_i), and the results are concatenated into the ciphertext C = C_1 C_2 ... C_N.
There are 2 main types of coding: block coding and stream coding.
In block coding, the size of a block must be high to prevent an attack: it is usual to use 64 bits, i.e. 2^64 possibilities to search. The transformation function T(M) = C is the same for each block, which saves memory and makes encoding relatively quick.
In the stream encoding, blocks are encoded sequentially and each block is encoded by a separate transformation which depends on:
1. the previous coded blocks, and/or
2. the previous transformations, and/or
3. the block number
This information must be kept in memory between each block coding. If the transformation varies with each block, the block size can be short (usually between 1 and 8 bits).
The same clear text or message M will therefore not necessarily give the same ciphertext C.
Block coding is a substitution coding in which the plaintext and ciphertext blocks are binary vectors of length N. For each key, the encryption function E_K is a permutation of the set {0,1}^N onto itself. D_K is the decryption function (the inverse permutation), such that D_K o E_K = E_K o D_K = identity.
There are 4 modes of encryption: ECB, CBC, OFB and CFB.
ECB mode (Electronic Code Book)
C _{i} = E _{K} (M _{i} ) and M _{i} = D _{K} (C _{i} )
CBC mode (Cipher Block Chaining)
C_i = E_K(M_i XOR C_i-1) and M_i = D_K(C_i) XOR C_i-1
OFB mode (Output FeedBack)
V_i = E_K(V_i-1) and C_i = M_i XOR V_i
CFB mode (Cipher FeedBack)
C_i = M_i XOR E_K(C_i-1) and M_i = C_i XOR E_K(C_i-1)
Any encryption algorithm can be implemented in these modes.
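The difference between the first two modes can be shown with a toy one-byte block cipher of our own (any keyed invertible function would do): in ECB mode, equal plaintext blocks give equal ciphertext blocks, while CBC chaining hides that repetition:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy one-byte block cipher standing in for E_K. */
static uint8_t E(uint8_t m, uint8_t k) {
    return (uint8_t)((m ^ k) + 13);
}

/* ECB: C_i = E_K(M_i); each block is encrypted independently. */
void ecb_encrypt(uint8_t *c, const uint8_t *m, size_t n, uint8_t k) {
    for (size_t i = 0; i < n; i++)
        c[i] = E(m[i], k);
}

/* CBC: C_i = E_K(M_i XOR C_i-1), started from an initialization
   vector iv playing the role of C_0. */
void cbc_encrypt(uint8_t *c, const uint8_t *m, size_t n,
                 uint8_t k, uint8_t iv) {
    uint8_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        c[i] = E(m[i] ^ prev, k);
        prev = c[i];
    }
}
```

On a plaintext made of identical blocks, the ECB output repeats itself while the CBC output does not, which is why chaining modes resist statistical attacks better than ECB.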
With regard to our work, we will focus specifically on the ECB mode, which best fits the learning of neural networks with an input and an output of a fixed number of bits and no feedback loop; it is possible to connect one or more neural networks with feedback, but the learning time would be much longer.
3.4.2 The Vigenère cipher
The encryption algorithm based only on the XOR is called the Vigenère cipher (the code is located in the annex).
Encryption is performed between a clear text M and a key of N characters:
1. M is divided into blocks of N characters
2. For each block, the XOR operation is performed between the block and the key.
This algorithm is trivially broken, if we accept that the characters are ASCII and the length of the key is unknown:
1. One must first discover the key length by a process called counting of coincidences (FRIEDMAN 1920): compare the encrypted text to itself shifted by a given number of bytes and count the number of identical bytes. If the two blocks of text put face to face have been encoded with the same key, more than 6% of the bytes will be equal; if they have been encoded with different keys, less than 0.4% of the bytes will be equal. The smallest shift showing a high coincidence is the length of the desired key.
2. One must then shift the ciphertext by this length and apply the XOR between the ciphertext and the shifted text. This operation removes the key and leaves the result of the XOR of the plaintext with itself shifted. The rate of the English language is between 1 and 1.5 bits/letter (1.2 for Shannon); that of French is between 1 and 1.8 bits/letter (see Chapter 6). There is enough redundancy to choose the correct decryption.
The code in C of this program is in the annex.
This cipher is far too weak to be secure!
3.4.3 Strong ciphers
There are two kinds of strong encryption algorithms: ciphers based only on the XOR operation between the text and codes built from very large prime numbers, and the others.
An example of the first case is the R.S.A. (RIVEST, SHAMIR and ADLEMAN), which is a public-key cryptosystem.
Here is the algorithm:
1. Decompose data into blocks of length equal to the length of the code word
2. Make a XOR between the block (modified by a given encryption) and the code (key or encrypted subkey)
3. Write the encrypted block
4. Repeat step 2 for each block
This algorithm is the same as almost all encryption algorithms; the differences come from the generation of the keys used to encrypt or decrypt.
In the R.S.A., it is necessary to generate codes (2 public and 3 secret) to encrypt and decrypt, so the authors had to:
1. choose two large prime numbers p and q (512 bits),
2. compute the product n = pq,
3. choose at random a number d coprime with (p-1)(q-1), between max(p,q)+1 and n-1,
4. compute e = d^{-1} modulo (p-1)(q-1).
This gives n and e public and p, q, d secret.
The R.S.A. is based on number theory (see Chapter 5), in particular on the difficulty of factoring a number into its prime factors. Its effectiveness lies in the multiplication of these factors. For more details on the R.S.A., one should absolutely read (ADLEMAN 78).
PGP (Pretty Good Privacy), by Zimmermann, combines the R.S.A. with the use of very long prime numbers.
In the second case, there is the D.E.S., which we describe in the next paragraph; it works with a private key. (LUCIFER is its ancestor; REDOC II, SNEFRU, KHAFRE, IDEA, LOKI and FEAL are algorithms of the same type, weaker than the D.E.S.)
3.5 A reference: the Data Encryption Standard (D.E.S.)
3.5.1 History
The algorithm dates from 1977; it was developed by the I.B.M. Corporation for the federal bureau of standards of the United States, which made it the encryption standard for all exchanges of confidential information (banking networks, smart cards, communications, ...).
The D.E.S. combines permutations and substitutions in a product cipher whose level of safety is much higher than that of the two base codes used (text and key). These substitutions are non-linear, which produces a cryptosystem resistant to cryptanalysis. It was also designed to withstand differential cryptanalysis, which at the time was classified by the army and unknown to researchers.
3.5.2 Architecture
3.5.2.1 The figure below shows a graphical representation of the internal architecture of the D.E.S. It uses 64-bit input blocks L0 and R0; the length of the key K is 56 bits (8 bytes without the last bit of each byte, which is used for parity). This key generates 16 different 48-bit subkeys K1 to K16. Contrary to appearances, this was quite adequate at the time and is a little less so these days, because it takes 2^56 encryptions to find the key by exhaustive search.
The function f is called a round; the i-th round receives as inputs the right part R_i (i.e. 32 bits of the text being encrypted) and the subkey K_i (48 bits). The rounds of the D.E.S. are detailed below. The round outputs 32 bits that are added to L_i. While R_i is passed as is to L_{i+1}, the encrypted bits are transmitted to R_{i+1} (except for the final round).
Figure 3.5.2.1 - the D.E.S.
3.5.2.2 Figures (a) and (b) give the algorithms used by the D.E.S. The functions IP (bit permutation) and IP^{-1} (inverse bit permutation) can be ignored because they are well known and thus add no strength to the D.E.S.
We realize that all of the encryption is based on expansions, reductions and permutations of bits. Apart from the round, these operations are linear.
Separation of the 16 subkeys (48 bits per round) from the 56-bit key:

    C(0), D(0) = PC1(key)
    for (i = 1; i <= 16; i++) {
        C(i) = LS(i)(C(i-1))
        D(i) = LS(i)(D(i-1))
        K(i) = PC2(C(i), D(i))
    }

Notation: L, R: low and high parts of the current text block; C, D: low and high parts of the compressed key; PC1, PC2: permutation and compression of the key; LS: shift; IP: initial permutation (fig. II-2 b); IP^{-1}: inverse permutation (fig. II-2 b); FP: final exchange (fig. II-2 b).
Figure 3.5.2.2 (a) - D.E.S. algorithms
Coding of a block (64 bits):

    L(0), R(0) = IP(plaintext block)
    for (i = 1; i <= 16; i++) {
        L(i) = R(i-1)
        R(i) = L(i-1) ^ f(R(i-1), K(i))
    }
    ciphertext block = FP(R(16), L(16))

Decoding of a block (64 bits):

    L(16), R(16) = IP^{-1}(ciphertext block)
    for (i = 16; i >= 1; i--) {
        R(i-1) = L(i)
        L(i-1) = R(i) ^ f(L(i), K(i))
    }
    plaintext block = FP(L(0), R(0))

Figure 3.5.2.2 (b) - D.E.S. algorithms
The D.E.S. combines 2 mathematical techniques: confusion and diffusion (see Chapter 6). The round f applies to the text a substitution (8 S-boxes or S-tables) followed by a permutation (P-box or P-table), based on the text and the key.
3.5.2.3 The figure which follows presents the synopsis of a round (the function f).
Figure 3.5.2.3 - a round f of the D.E.S.
The content of this round is presented differently in the figure of paragraph 3.5.3.1. Various standards have emerged to standardize the exchange of information encrypted with the D.E.S.; the ANSI standard references are X3.92: D.E.S., X3.106: modes of operation, X3.105: network, X9.19: authentication, X9.24: distribution of keys; the references of the Federal standard are 1027 and 1028.
3.5.3 Cryptanalysis
3.5.3.1 The following figure shows the architecture of a round with its S-tables which, unlike the other operations, are more or less half-linear/half-affine. If they were completely so, the D.E.S. would be very easy to break, but they have been selected to withstand attacks. The bits of the subkey and those of the expanded text block are added, substituted through the S-tables, then permuted.
Figure 3.5.3.1 - a round of the D.E.S. with its S-tables
Current research to break the D.E.S. without exhaustive search has managed to weaken it, but only a little. The results are given in figure 3.5.3.2 and in (SCHNEIER 1996).
                      Exhaustive search   Differential cryptanalysis   Linear cryptanalysis
Chosen plaintexts     2^56                2^47                         -
Known plaintexts      2^56                2^55                         2^43
D.E.S. operations     2^56                2^37                         -

Figure 3.5.3.2 - results of the different cryptanalyses (for the 16-round D.E.S.)
There are two types of cryptanalysis, differential cryptanalysis and linear cryptanalysis; they are described in paragraph 3.6.
The complete and commented C code of the D.E.S. is located in Appendix 1.
3.5.4 The physical aspect
The physical aspect is very important for the speed of execution. VLSI components are very widespread and effective, but there are even more interesting components, based on a technology that should not be disregarded: gallium arsenide (GaAs). It has already been included in supercomputers.
The major differences between GaAs and VLSI are:
fast switching of the GaAs gates,
exchange with components other than GaAs is a major difficulty,
very low density of GaAs integrated circuits.
The gate times of GaAs (DCFL E/D-MESFET) are less than or equal to 50 picoseconds, while it takes at least a nanosecond in silicon (NMOS).
The access time to GaAs RAM is approximately 500 picoseconds, against 10 nanoseconds in silicon. This indicates that the performance of computers based on GaAs technology should be 20 times higher than that of the fastest silicon-based supercomputers. On the other hand, the level of integration of GaAs is about 50,000 transistors per integrated circuit, while it is 1 million in silicon, due to the problem of heat dissipation. This problem increases the number of GaAs circuits required to design a computer, and a high-performance GaAs computer must optimize the number of integrated circuits on the motherboard.
The communication of GaAs circuits with the outside is another factor. The problem is the slowdown imposed by the other components. However, the signal propagation is not very different between silicon and GaAs. The only solution to this exchange-rate problem is to introduce a memory with a multilevel hierarchy; however, none exists for the moment that works with GaAs technology.
Although GaAs technology cannot be fully exploited for the moment, it is certainly a very interesting technology of the future for cryptography, due to its excellent performance. If the CM-2 has its equivalent in arsenide, it is the property of the military.
With regard to the D.E.S., there is a circuit running at 50 MHz performing an encryption in 20 ns, which allows 50 million encryptions per second.
Since late 1995, AMD sells a circuit encrypting at 250 MHz.
In August 1993, the Canadian Michael J. WIENER described how to build, for one million dollars, a machine that performs an exhaustive search of the D.E.S. keys and finds the right key in 3.5 hours. Each of its basic circuits has a power equivalent to 14 million SUN workstations.
See (WIENER 1993) for more details on this machine.
Even if exhaustive search thus seems faster to perform than these types of cryptanalysis (their number of attempts is smaller, but their search time is much longer), cryptanalysis remains very interesting to measure the strength of cryptographic algorithms.
You will find in annex 9 the characteristics of the MASPAR and CM5 machines.
3.6 The cryptanalysis of the D.E.S.
3.6.1 Differential cryptanalysis
It is a chosen-plaintext attack on the rounds of the D.E.S. to find the key (the presentation of the various attacks was made in paragraph 3.3.3). In 1990 and 1991, Eli BIHAM and Adi SHAMIR created differential cryptanalysis; this method consists of looking at the specifics of the pair of ciphertexts obtained for a pair of plaintexts with a particular difference.
Differential cryptanalysis analyzes the evolution of these differences as the plaintexts spread through the rounds of the D.E.S. when encrypted with the same key.
After randomly choosing a pair of plaintexts with a set difference, one calculates the difference of the resulting ciphertexts. Using these differences, it is possible to associate probabilities with the various bits of the subkeys. The more ciphertexts are analyzed, the more clearly the most likely encryption key will emerge.
The strength of the D.E.S. resides in its rounds, and all the operations of a round are completely linear except the S-tables (or S-boxes); Eli BIHAM and Adi SHAMIR therefore analyzed the 8 S-tables for differences in the input texts and differences in the output texts. This information is synthesized in 8 tables called tables of distribution of the differences of the D.E.S. (see the 8 tables in annex 3). We give the algorithm to generate these tables in figure 3.6.1.1. P is a plaintext, P* is another plaintext, X is the encrypted text of P, X* is the encrypted text of P*, P' is the difference of P and P*, X' is the difference of X and X*.
Initialize the Table boxes to 0
For t = 1 to 8 do // number of the S-table
    For P = 0 to 63 do
        For P* = 0 to 63 do
            P' = P xor P*
            X = Stable_t(P)
            X* = Stable_t(P*)
            X' = X xor X*
            Table_t[P'][X'] = Table_t[P'][X'] + 1
        End for
    End for
End for
Figure 3.6.1.1  distribution tables generation algorithm
Once these tables are generated (see figure 3.6.1.2 on the next page), it is possible to have information about B' (B' = B xor B*) according to C' (C' = C xor C*). So, for a known text A' (A' = A xor A*), the combination of A' and C' suggests values for the bits a xor K_i and a' xor K_i, which gives information on a few bits of the subkey K_t.
With this information, it is then possible to review a large number of chosen plaintexts.
Figure 3.6.1.2 - a round of the D.E.S. analyzed
The likelihood of having a pair of inputs P' of the S-table on the basis of a pair of outputs X' is p = Table[P'][X'] / 64. Recall that E is the expansion of the round and P the permutation function.
You will find the program of generation of the tables of distribution of the differences in annex 1.
This attack also works fine on FEAL, IDEA, LOKI, REDOC II, SNEFRU, KHAFRE and LUCIFER. For more information, you can consult (BIHAM 1991), (BIHAM 1993a) and (BIHAM 1993b).
3.6.2 Linear cryptanalysis
It is a known-plaintext attack on the rounds of the D.E.S. to find the key.
It was in 1993 that Mitsuru MATSUI created linear cryptanalysis; this method studies the statistical linear relationships between the bits of a plaintext, the bits of the ciphertext and the key which produced the encryption. These relationships allow one to find the values of some bits of the key when the plaintexts and the associated ciphertexts are known.
He deduced the linear relationships of each S-table by choosing a subset of the input bits and of the output bits and calculating the parity (XOR) of these bits, checking whether the parity of the subset is zero. In general, some subsets will have parity 0 (linear relations) and others parity 1 (affine relations).
MATSUI calculated the number of zero parities of each subset of input and output bits for each S-table, among the 64 x 16 = 1024 possible subsets. It is thus possible to associate probabilities with the various bits of the subkeys. The probabilities of obtaining a zero parity (linear relationship) are synthesized in 8 tables called tables of linear approximations of the D.E.S. (see the 8 tables in annex 4). We give the algorithm to generate these tables in figure 3.6.2.1. P is a plaintext, C is the encrypted text of P, K is a subkey.
For t = 1 to 8 do // number of the S-table
    For P = 0 to 63 do
        For C = 0 to 15 do
            Table_t[P][C] = -32 // remove half
            For K = 0 to 63 do
                PA = (parity(Stable_t(K) & C) + parity(K & P)) & 1
                If (PA == 0) Then Table_t[P][C]++
            End for
        End for
    End for
End for
Figure 3.6.2.1  linear approximations tables generation algorithm
Once these tables are generated, if a box of the table contains 0 then the probability is 32/64 = 1/2, and this information cannot be exploited to attack the D.E.S. On the other hand, if the value of this box is non-zero, we have a linear relationship of probability p = 1/2 + Table_t[P][C]/64 on bits of the subkey K_t based on the output bits of the S-table t.
You will find the program of generation of the tables of linear approximations in annex 1.
You can consult (MATSUI 1994) and (harp 1995). In (SCHNEIER 1996), we learn that current research combines differential cryptanalysis and linear cryptanalysis.
3.7 Conclusion
In this chapter, you have seen the terminology and a set of points on which it is interesting to apply neurocryptography, especially the study of ciphers and of their cryptanalysis, and the hardware and software means of implementing cryptography. The study of the D.E.S. and of its cryptanalysis with neural network architectures should prove their effectiveness at memorization and at probabilistic search for complex encryption algorithms. You will find in the following chapters the theories and the applications implemented to prove them.
Chapter 4  NeuroCryptography
4.1 Introduction
In this chapter, we define the possible association between neural networks and cryptography. We then present neurocryptography as well as the range of possible applications for performing encryption, decryption and cryptanalysis of a chosen algorithm. You will also find in this chapter the formation of a learning base and the different parameters related to the learning of ciphers, and a discussion of self-learning as part of applications controlling the information on a communications line.
4.2 Can we link cryptography and neural networks?
The two preceding chapters show that, although this has not been done (or was done confidentially by the military), neural networks can be useful in cryptography. The learning of neural networks must still be optimized and fast. On the other hand, the use of the network, once trained, is extremely fast and efficient.
To achieve satisfactory applications for the learning of a strong cipher, a great execution speed is needed. This implies that the neural networks used must be implemented in a parallel hardware architecture, like the cryptographic algorithms themselves. Nevertheless, it is possible to create software applications on smaller data to get results more quickly.
Weak ciphers can be simulated on a PC. The problem arises when you want to associate a cryptographic algorithm with a neural network in a single parallel architecture without wasting time in information exchange. You can build applications on strong ciphers, but not on the entire algorithm; it is better in this case to simplify the task by working on small parts of the algorithm whose complexity is reduced. In addition, one can completely ignore linear or affine functions and endeavour to weaken the other functions through the synthesis capability of neural networks.
4.3 The new definitions
We now define the field of neurocryptography. All terms used in cryptography are preceded by the particle "neuro" when the cryptosystem contains one or more neural networks, or one or more elements of such a network, for example the perceptron.
4.3.1 Neuroenciphering or neuroencryption
It is the action of encrypting with a cryptosystem whose hardware or software architecture is based on the functioning of neural networks.
4.3.2 Neurodeciphering or neurodecryption
It is the action of decrypting with a cryptosystem whose hardware or software architecture is based on the functioning of neural networks.
4.3.3 The neurogenerator
A neurogenerator is a generator of all or part of a public or private encryption key with a hardware or software architecture based on the functioning of neural networks.
4.3.4 Neurocryptanalysis
Neurocryptanalysis is the cryptanalysis of a cryptosystem using a hardware or software architecture based on the functioning of neural networks; a neurocryptanalyser is the means used to perform it. Chapter 5 is completely devoted to neurocryptanalysis and its applications, in particular at the level of a strong cipher like the D.E.S.
4.4 The generation of learning bases
How the learning base is generated is very important for the realization of neural applications. Learning depends on the random initialization of the weights of the network as well as on the number of examples, the order of presentation of these examples, and the consistency of the chosen set of examples.
4.4.1 Examples
An example is composed of a value to be presented at the input of the neural network and a value to be presented at the output of this network, the output value depending on the input value.
If the number of examples is too low, it is clear that the network will not find a transfer function of the studied cryptosystem but will instead store the examples given, and will therefore be unable to find a result for an input value different from those given in the base of examples.
In cryptography, more than half of all possible examples must be presented to be certain of the results, even if it is true that in strong cryptography the number of possible input values is very large.
4.4.2 Order of presentation
If all possible examples are in the learning base, i.e. if for N input neurons there are 2^N - 1 examples presented, it is not necessary to present the examples in the order of generation (in general, ascending).
We designed an algorithm to present the examples in a more or less complete disorder. It consists of cutting the base into k sub-bases, then presenting in turn the elements of each of the sub-bases (k can be even or odd).
The following algorithm uses n for the total number of examples of the learning base and p for the index of the current element; it returns the index of the example to present to the neural network:
Begin
    d = Integer(p * k / n);
    return ((p - Integer(d * n / k)) * k) + d;
End
Figure 4.4.2.1 - choice of an example in one of the k sub-bases
This mathematical formula is easily demonstrated by recurrence because it is a sequence of discrete values.
The C source code is located in Appendix 1 (learning of the XOR in disorder). Figure 4.4.2.2 shows the final error rates Tss for different values of k (the number of presentations being fixed at 500, with 256 examples).
Figure 4.4.2.2 - error rates for a disordered presentation
4.4.3 Automatic generation of texts
To generate a regular automatic learning base, i.e. one following a given alphabet and generating all possible examples in order, one needs, for N characters at the input of the encryption algorithm, N nested loops with a single loop body which will be executed on each iteration of the innermost loop, as shown in figure 4.4.3.1 for an alphabet of P characters.
The loop body retrieves the values of the counters and generates a plaintext (one character of the text per counter); this text is encrypted by an encryption algorithm, which gives an example (plaintext - ciphertext) to present to the neural network.

For compteur1 = 0 to P-1 do
    For compteur2 = 0 to P-1 do
        ...
            For compteurN = 0 to P-1 do
                Corps(compteur1, compteur2, ..., compteurN)
            End
        ...
    End
End

Figure 4.4.3.1 - nested loops for the generation of ordered texts
The algorithm we present in figure 4.4.3.2 generates the clear examples regardless of the number N of nested loops:

/* Initialize the loop counters and the end-condition values */
For b = 0 to N-1 do i_bcl[b] = 0; End
For b = 0 to N-1 do f_bcl[b] = P-1; End
/* Execute the nested loops */
Repeat to infinity
    b = N-1;
    If (Body(i_bcl) == true) Then exit;
    If (i_bcl[b] < f_bcl[b]) Then i_bcl[b]++;
    Else
        Label _precedent:
        i_bcl[b] = 0; /* Reset the counter to 0 */
        If (b == 0) Then exit; Else b--;
        If (i_bcl[b] < f_bcl[b]) Then i_bcl[b]++; Else go to _precedent;
    End else
End repeat

Figure 4.4.3.2 - variable nested loops for the generation of ordered texts

In this case, the Body function takes as arguments the values of the loop counters and returns a Boolean value indicating whether or not to exit the loops; b is the index of the current loop. An example of C source code is located in Appendix 1 (automatic generation of the learning base of the D.E.S.).
4.4.4 The coefficient of learning
This coefficient, generally noted Epsilon and also called the learning rate, allows a more or less rapid learning, with chances of convergence of the network inversely proportional to it, because of the local minima of the error curve measured between the learning base and the output values calculated by the neural network.
Epsilon should be varied empirically between 0.1 and 2.0. If the network still does not converge, it is certainly due to a non-linearly-separable problem, which is the case for the learning of the XOR. One should then use a momentum term, whose real value is between 0.1 and 1.0 and whose aim is to avoid local minima in the descent of the error function; that is, it allows the current learning step to take the previous steps into account.
4.5 Self-learning
Self-learning can be interesting for the neuronal learning of cryptographic algorithms. The neuronal system consists of two parts, the emulator and the controller, whose learnings are carried out separately.
The task of the emulator is to simulate the complex function or the encryption algorithm. It therefore receives as input the state at a given time and the input applied at this time, and its output is the output of the algorithm at the following time. Learning is done by presenting at every moment a different input (figure 4.5.1).
Figure 4.5.1 - learning of a complex function or an algorithm
Once the learning of the emulator is completed, it is connected to the controller (figure 4.5.2).
Figure 4.5.2 - learning of the controller through the emulator
The input of the controller is the state of the system at time k; its output is the value to be given as input to the algorithm or complex function. The proper role of the controller is to learn the law of adaptive control. For this learning, however, the error signal is not calculated on the command but on its result, the gap between the desired state and the current state. This is the idea of a guided rather than supervised learning, because no teacher teaches the system the control law. In fact, the system learns by itself by processing the information it receives in return for its actions. To make learning through backpropagation possible and to backpropagate the error on the position, the structure of the emulator must be homogeneous with that of the controller.
Another quality of this device is its capacity for on-line learning. The learning of the controller is fast. In addition, the synthesized control law is sufficiently robust to small random perturbations.
It is therefore possible to build self-learning neural networks on a communication line, for the encryption as well as for the authentication of messages in real time.
4.6 The realization of applications
4.6.1 The learning of the exclusive or (XOR)
The XOR is a simple operation that is particularly used in cryptography. Figure 4.6.1.1 below represents its truth table, with a, b and c binary, c being the sum without carry of a and b.
The purpose of this paragraph is to show that the XOR is easily learnable and that all XOR-based cryptographic applications are feasible with one or more neural networks. You will find how to cryptanalyse a simple 64-bit XOR-based cipher in Chapter 3.
a  b  c
0  0  0
0  1  1
1  0  1
1  1  0
Figure 4.6.1.1 - truth table of the XOR
To achieve C = A XOR B, we need a network with a 16-bit input (i.e. 2 bytes A and B) and an 8-bit output (one byte C). The network must therefore have 16 input neurons, a minimum of 16 neurons in the hidden layer(s), and 8 output neurons. The learning base consists of 65536 cause-effect pairs.
You can find the C code of this network in annex 1 (the learning coefficient is referred to as EPSILON). The rate of success at learning the XOR is very close to 100%, depending on the random weight initialization and the number of presentations.
The greater the number of inputs and hidden-layer neurons, the more the number of presentations of the base can be reduced. If the random initialization of the weights is good, a single presentation can be sufficient and of better quality.
The table in annex 8 gives the measured error rate for each presentation.
4.6.2 The learning of cryptographic algorithms
Just as in the previous paragraph, the aim is to determine a function or an algorithm combining input data (causes) with output data (effects).
It is therefore a matter of determining the input and output structures of the network and of finding a base of causes and associated effects sufficient for the learning of the network to converge to a minimal amount of errors, or even none at all.
Any encryption algorithm is composed as in figure 4.6.2.1.
Figure 4.6.2.1 - synopsis of an encryption algorithm
The question that arises is how to make the neural network memorize the algorithm. The only answer is to present virtually all possible encryption keys (e.g. 64 bits) and all possible plaintexts (e.g. 64 bits) as input, and to calculate all the resulting ciphertexts with the encryption algorithm.
Thus, the neural network will have synthesized the algorithm since, when presented with an encryption key and a plaintext as input, it will give us as output the corresponding ciphertext.
If the encryption algorithm is its own inverse (that is, if presenting the encrypted text as input yields the plaintext as output), then the encryption algorithm is the same as the decryption algorithm and the neural network also decrypts.
Initialize the network weights randomly
Repeat
for each key do
for each clear text do
Encrypt the plaintext with key
Initialize the network with the clear text entries
Calculate the outputs of the network
Initialize the outputs of the network with the cipher text
Calculate the deltas of the network
Change the weight of the network
Measure the error of the network
end for
end for
until the error is almost nil
Figure 4.6.2.2  learning algorithm
Figure 4.6.2.2 presents the learning algorithm, regardless of the "Encrypt" function which computes the ciphertext from the provided plaintext.
If the number of bits of the plaintext is 64 and that of the key is 56, this gives us 2^120 examples to present to the neural network, which may be huge in calculation time if the encryption function is long.
Hence the importance of the physical aspect and of dedicated architectures.
Various applications can be carried out, including the cryptanalysis of the D.E.S., which you can see in Chapter 5 of this memoir.
4.6.3 Key learning
A single encryption or decryption key has no meaning by itself; it must be linked to an encryption or decryption algorithm and to a clear or encrypted text.
If the key has a fixed size of N bits, then the neural network has N output bits and M input bits, M being equal to two times the number of bits of the plaintext and ciphertext blocks.
Figure 4.6.3.1 shows the texts as input and the key as output:
Figure 4.6.3.1 - memorization of keys
In fact, the neural network realizes a function that finds the key directly from a plaintext and an encrypted text.
4.7 The advantages and disadvantages
The learning time of neural networks remains rather long, depending on the number of bits of the key and of the clear and encrypted texts; this time can be optimized if the neural network is implemented on a parallel machine.
As regards the memorization of keys and ciphers, neural networks are high achievers, with over 90% success in the learning of weak ciphers. For a strong encryption algorithm, learning must be rapid.
Neural networks are used extensively in image recognition, so it is simple to use them to perform authentication.
At the level of the hardware architecture, it is easy to parallelize the algorithms, as well as the neural networks and the ciphers based on hardware architectures; but this solution is quite expensive financially.
The design of neuroencryption can be useful in cases where a secret key and an encryption algorithm are taught to a network in order to hide this information from the user, in particular at the level of the key generator, which could be kept secret by a distributing body. It would be difficult for a cryptanalyst to discover the function of the encryption-key generator algorithm.
Neurocryptanalysis is an application much better adapted to neural networks, due to their emergent properties of massively parallel statistical analysis and their property of concentration of information, or approximation of statistical matrices. Chapter 5 on neurocryptanalysis should enlighten you about the possibilities of neural networks.
In addition, for a cryptographic problem of complexity class PSPACE requiring a very large memory capacity, the neural network is compact and its size is fixed.
4.8 Conclusion
We have defined in this chapter the association of two broad areas, contemporary Artificial Intelligence with neural networks, and cryptography. We presented neurocryptography as well as the range of possible applications for performing encryption, decryption and cryptanalysis of a chosen algorithm. You will also find in this chapter the formation of a learning base and the different parameters related to the learning of ciphers, and a discussion of self-learning as part of the control of information on a communications line. The learning of a strong encryption algorithm being quite long and requiring the use of parallel machines, one can use neural networks to synthesize an encryption algorithm with a given key, this algorithm and this key being kept secret, for example by a distributing body.
Chapter 5  Neurocryptanalysis
5.1 Introduction
In this chapter, we present the neurocryptanalysis of strong ciphers, the general principle being the search for the key by a study based on neural networks, that is, the learning of the functions linking clear texts and encrypted texts to keys. Then we describe applications. We present the differential neurocryptanalysis and the linear neurocryptanalysis of the D.E.S., which allow us to measure the statistical performance of neural networks. A dedicated hardware application is described last.
5.2 Definition
Neurocryptanalysis consists of performing the cryptanalysis of cryptographic algorithms with the help of neural networks, i.e. of building one or more neural networks to find, or help find, the key of an encryption algorithm.
The reader will find in Chapter 3 an introduction to the applications of neurocryptanalysis. A neurocryptanalyser then means a system performing the cryptanalysis of a cryptographic algorithm; this system is a hardware device or a software program containing at least one neural network useful for the cryptanalysis in question.
5.3 General principle
The essential principle is to present a ciphertext and the encryption algorithm to the neural network.
In neurocryptanalysis, the neural network must help find the encryption key used to produce the ciphertext; figure 5.3.1 shows a possible architecture of a neurocryptanalyzer.
Figure 5.3.1 - Overview of the neurocryptanalyzer
According to Chapter 2, a neural network can learn a cryptographic algorithm or can 'remember' (by function approximation) a set of keys; we therefore infer that the neurocryptanalyzer can be broken down into 2 neural sub-networks as follows:
Figure 5.3.2 - a neurocryptanalyzer in learning mode
This neural network structure is identical to that of the self-learning device of paragraph 4.5.
The applications carried out in the following paragraphs will allow us to verify the learning described in Chapter 2.
It is clear that neural networks can take an important place in cryptography, in the design, use and verification of the protocols presented in Chapter 3.
5.4 Applied Neurocryptanalysis
5.4.1 NeuroCryptanalysis of the Vigenère
This cipher, as well as its cryptanalysis, is explained in paragraph 3.4.2.
To neurocryptanalyze such an algorithm, our neural network would have to perform either a frequency analysis or an analysis of the subsets of n characters of a given language, and then measure the correlation between the learned plaintext and ciphertext for all subsets of n characters.
This type of problem is solvable by a neural network but would take very long in supervised learning. However, it is possible to carry it out in self-learning mode, provided the ciphertext is large enough.
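As an illustration, the classical frequency analysis that the network would have to approximate can be sketched as follows. This is a minimal sketch under stated assumptions: the ciphertext is lowercase with no spaces, the key length k is already known, and 'e' is taken as the most frequent letter of the language. The function name `vigenere_guess_key` is ours, not from the thesis annexes.

```c
/* For an assumed key length k, each key letter is recovered by
   frequency analysis: in the sub-text formed by every k-th letter,
   the most frequent ciphertext letter is assumed to encrypt 'e'. */
void vigenere_guess_key(const char *cipher, int k, char *key)
{
    for (int pos = 0; pos < k; pos++) {
        int freq[26] = {0};
        for (int i = pos; cipher[i] != '\0'; i += k)
            freq[cipher[i] - 'a']++;
        int best = 0;
        for (int c = 1; c < 26; c++)
            if (freq[c] > freq[best]) best = c;
        /* the key letter is the shift that maps 'e' to the most frequent letter */
        key[pos] = (char)('a' + ((best - ('e' - 'a') + 26) % 26));
    }
    key[k] = '\0';
}
```

The longer the ciphertext, the more reliable the per-position frequency counts, which matches the remark above that self-learning needs a large enough ciphertext.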
5.4.2 The Neurodifferential cryptanalysis of DES
Differential cryptanalysis is described in paragraph 3.6.1.
To better understand the information given by BIHAM and SHAMIR's difference distribution tables, we generated, for each S-box, tables with the S-box output values on the x-axis and the S-box input bits on the y-axis. These tables are in Appendix 5; one can read directly from them the probabilities p = Table[P'][bits of X'] / 64 of having any particular bit for a given output value.
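The construction of such a difference distribution table can be sketched on a reduced S-box. The 4x4 S-box below is only a hypothetical stand-in (a real DES S-box maps 6 bits to 4, so its table is read as count / 64; here it is count / 16): each row of the table counts, for an input difference dx, how often each output difference dy occurs.

```c
/* Hypothetical 4-bit S-box standing in for a real (6-to-4 bit) DES S-box. */
static const int sbox[16] = {
    14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7
};

/* table[dx][dy] counts the inputs x for which the pair (x, x^dx)
   produces outputs differing by dy, as in BIHAM and SHAMIR's
   difference distribution tables. */
void build_ddt(int table[16][16])
{
    for (int dx = 0; dx < 16; dx++)
        for (int dy = 0; dy < 16; dy++)
            table[dx][dy] = 0;
    for (int dx = 0; dx < 16; dx++)
        for (int x = 0; x < 16; x++)
            table[dx][sbox[x] ^ sbox[x ^ dx]]++;
}
```

Each row sums to the number of inputs (16 here, 64 for a real S-box), so dividing an entry by the row sum gives the probability p used above.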
If we present pairs of plaintexts as input and pairs of S-box outputs to a neural network, would it come close to the probabilities of the preceding tables for each of the input bits?
We thus create a neural network with 16 input bits (each of these bits stands for one of the 16 output values which are the columns of the preceding tables) and 6 output neurons giving the probability of having a 1 on each of the 6 bits of the S-box input.
Figure 5.4.2.1 - use of the differential neurocryptanalyzer
The example base, the learning algorithm and the realization of this neural network are given as C code in annex 1. Figure 5.4.2.1 presents the neurocryptanalyzer after learning: it returns information about the probability that each bit of P' equals 1. One does not directly obtain probabilities on the bits of the subkey; it suffices to XOR the bits of the input text pair with those calculated to get information on the bits of the subkey.
The neural network, after 10 presentations of 4096 examples (pairs of texts among the 64 S-box input texts), gives the results contained in the table in annex 6. It suffices to increase the number of presentations to get more accurate probability values. Note that the obtained probabilities exactly match the values given by the classical method of differential cryptanalysis.
The advantage of the neural network is its concentration of the set of S-box-specific statistical matrices into one structure, and its massively parallel operation, which allows the neurocryptanalyzers of the 8 S-boxes to be computed simultaneously.
5.4.3 Neurolinear Cryptanalysis of DES
Linear cryptanalysis is described in paragraph 3.6.2.
The neural network will generate all the quadratic forms for obtaining information on the outputs from its inputs, which amounts to a generalized linear cryptanalysis of the D.E.S. Generalized linear cryptanalysis looks for information about the key from the study of the rounds of the D.E.S., and more precisely of its S-boxes, which differs from the global study of the cryptosystem by our neurocryptanalyzer.
Figure 5.4.3.1 - use of the linear neurocryptanalyzer
Unlike differential neurocryptanalysis, one should not try to simplify the tables of linear approximations, because summing the probabilities for each bit would be a loss of information: these sums are all almost equal. Instead, one should create a neural network with 16 input bits (each of these bits stands for one of the 16 output values which are the columns of the preceding tables) and 6 output neurons giving the probability of having a 1 on each of the 6 bits of the S-box input. The advantage of the neural network is that it returns excellent probability values. One can check the correlation between the values given by the neural network and those of the tables of linear approximations of input values for each output value.
The example base, the learning algorithm and the realization of this neural network are in annex 1.
Figure 5.4.3.1 presents the neurocryptanalyzer after learning: it returns information about the probability of each of the 6 bits of the S1 box input. One does not directly obtain
probabilities on the bits of the subkey; it suffices to XOR the plaintext bits with those calculated to get information on the bits of the subkey.
The results are given in annex 7. It suffices to increase the number of presentations to get more accurate probability values.
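The generation of a table of linear approximations can likewise be sketched on a hypothetical 4x4 S-box (a stand-in for a real 6-to-4 bit DES S-box): each entry counts the inputs for which the parity of a chosen subset of input bits equals the parity of a chosen subset of output bits, as in MATSUI's tables. A count of half the inputs means no linear bias.

```c
/* Hypothetical 4-bit S-box standing in for a real DES S-box. */
static const int sbox[16] = {
    14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7
};

/* parity (XOR of all bits) of v */
static int parity(unsigned v)
{
    int p = 0;
    while (v) { p ^= v & 1; v >>= 1; }
    return p;
}

/* lat[a][b] = number of inputs x whose input parity under mask a
   equals the output parity under mask b; 8 (out of 16) means no bias. */
void build_lat(int lat[16][16])
{
    for (int a = 0; a < 16; a++)
        for (int b = 0; b < 16; b++) {
            lat[a][b] = 0;
            for (int x = 0; x < 16; x++)
                if (parity((unsigned)(x & a)) == parity((unsigned)(sbox[x] & b)))
                    lat[a][b]++;
        }
}
```

Entries far from half the input count correspond to the strong linear relationships that the linear neurocryptanalyzer is expected to recover.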
5.4.4 Overall NeuroCryptanalysis of the crypt (3) UNIX
The Unix command crypt(3), or ufc_crypt (ultra fast crypt), is an implementation of the D.E.S. used to encrypt the passwords stored in the /etc/passwd file. It is a little special in the sense that the key is unknown to the user: no one has the ability to decrypt a password. This key is specific to the Unix system in use. The goal is therefore not to recover the clear password; instead, a submitted clear password is encrypted with the same key and compared with the password from the /etc/passwd file. If they are identical, the user is authenticated and gains access to his own account.
Crack is an application seeking the passwords of users on a Unix server. Its role is to generate
a set of clear passwords on the basis of a multitude of syntactic rules and/or from a dictionary. It takes from several hours to several days to penetrate a system, retrieve the password file and search out the others.
We thought it would be interesting to have a neural network learn a certain number of clear passwords and the corresponding encrypted passwords. The learning base should be large enough that the learning of the D.E.S. does not become a memorization of the examples of this base, which would leave the network unable to find the solutions for other examples close to those of the base.
We have therefore written two applications. One, for UNIX (or GNU Linux), synthesizes the crypt function of Unix for clear passwords of 4 characters whose values are a lowercase letter, a period or a slash, i.e. about 615,000 passwords and 2 hours of computation per presentation. The other, for MS-DOS, learns 1024 clear passwords of 7 characters and the corresponding encrypted passwords of 11 characters (we remove the first 2 characters of salt used to re-encrypt the password, which yields 65,536 different encrypted passwords for the same clear text).
We added a visualization program: the first application provides graphical statistics, the second provides quick information.
The sources and the results are available in the annexes.
5.5 Analysis of the results of cryptanalysis
The differential and linear neurocryptanalyses are probabilistic calculation methods used to quickly get information about a part of the D.E.S. They allow the inverse function of an S-box to be computed, for a chosen difference of texts in one case and for a linear relationship with a chosen subkey in the other. The learning of such neural networks is very fast.
It is possible to gather, for a given method, differential or linear, 8 x 16 = 128 neural networks (one for each S-box of each round) and to operate them in parallel on the information given by the ciphertext at the output of the D.E.S. and the plaintext at the input. These networks may then act as supervisors of other, unsupervised-learning neural networks modifying the bits of the key as the different texts pass through the D.E.S. This would be a self-learning of the subkeys; from the subkeys, we find the encryption key.
The results of the statistical analysis for the MS-DOS version of the program are surprising: 90% of the encryption function is found by the neural network for the learning base, and about 80% of the bits for examples close to this base but not submitted to the network. This proves that, for a small learning base, it is easy for a neural network to find a clear password from an encrypted password without taking into account the salt added by the Unix system.
5.6 Hardware implementations
There are 2 possible hardware implementations. One is based on existing architectures and more precisely consists of an implementation on a massively parallel machine of the MASPAR
or Connection Machine type (the characteristics of these machines are given in annex 9).
The other is based on the design of an architecture dedicated to the cryptanalysis of the encryption algorithm.
5.6.1 Dedicated Machine
The idea is to present strong ciphers to a very fast supervised-learning neurocryptanalyzer. As we showed in paragraph 4.6.2, it is necessary to present all plaintexts, ciphertexts and keys to the neural network. Figure 5.6.1.1 shows the overview of a machine dedicated to the learning of an encryption algorithm.
A complete machine can be constructed on the same pattern with a large number N of units made of binary counters (120 bits: 64 bits of text and 56 bits of key) and circuits implementing the encryption algorithm (for the D.E.S., AMD has built an arsenide circuit at a clock frequency of 250 MHz, about 5·10^9 encryptions per second). The number N is limited by the learning time of the single neural circuit, approximately 1 µs; each unit needs less than 14 ns.
For the D.E.S., the time interval between units is therefore necessarily 1 µs, which gives 10^6 learnings per second to learn the 2^56 possible keys: about 10^30 s for all possible values of text and key, i.e. 4·10^22 years for one presentation. If the neural circuit only took 14 ns, 3·10^18 years would still be needed.
In the case of a single key, it would take 41 years, and for a single text, 2 months,
while the exhaustive search for a key takes 3.5 hours on a dedicated non-neural machine costing 5 million francs.
Nevertheless, it is possible that the neural circuits of the future will go much faster. For the D.E.S., it is preferable to treat a fixed data subset, as we did in paragraph 5.4.4.
5.6.2 Algorithm for the Connection Machine CM5
The following algorithms were written for the distributed architecture of the CM5, using 3 layers of processors with one processor per neuron. The first layer is used to initialize the inputs (plaintexts) and outputs (ciphertexts) of the neural network located on layers 2 and 3. Variables are duplicated and used in layers 2 and 3. NB_ENTREES, NB_CACHEES, NB_SORTIES and EPSILON are constants defining the number of inputs, hidden-layer neurons and outputs of the neural network, and the learning coefficient. Thus a single processor holds:
NB_ENTREES weights poids_cachee for the hidden layer, in a layer-2 processor;
1 seuil_cachee; 1 activation_cachee; 1 delta_cachee;
NB_CACHEES weights poids_sortie, in a layer-3 processor; 1 seuil_sortie; 1 activation_sortie; 1 delta_sortie.
Before you start, we initialize the weights of the connections with random values.
Repeat forever
    generate key and plaintext in M
    For i = 0 to NB_CACHEES-1 Do send M to all layer-2 processors End
    encrypt M with the encryption algorithm into C
    For i = 0 to NB_SORTIES-1 Do send C to all layer-3 processors End
End repeat
Figure 5.6.2.1 - algorithm of the first layer of processors
We define a small macro for the following algorithms: bit(i, m) { return !!(m & (1 << i)); }
Repeat forever
    Integer tempo[NB_ENTREES];
    Float output, error;
    receive M from layer 1
    output = 0.0
    For i = 0 to NB_ENTREES-1 Do tempo[i] = bit(i, M); If tempo[i] then output += poids_cachee[i] End
    activation_cachee = sigmoid(output - seuil_cachee)
    For i = 0 to NB_SORTIES-1 Do send activation_cachee to layer 3 End
    error = 0.0
    For i = 0 to NB_SORTIES-1 Do
        receive W from layer 3    /* poids_sortie for this hidden neuron */
        receive D from layer 3    /* delta_sortie */
        error = error + W * D
    End
    delta_cachee = error * activation_cachee * (1 - activation_cachee)
    For i = 0 to NB_ENTREES-1 Do poids_cachee[i] = poids_cachee[i] + EPSILON * delta_cachee * tempo[i] End
    seuil_cachee = seuil_cachee - EPSILON * delta_cachee
End repeat
Figure 5.6.2.2 - algorithm of the second layer of processors
Repeat forever
    Float F, output, tempo[NB_CACHEES];
    output = 0.0
    For i = 0 to NB_CACHEES-1 Do
        receive F from layer 2
        tempo[i] = F
        output = output + poids_sortie[i] * F
    End
    activation_sortie = sigmoid(output - seuil_sortie)
    receive F from layer 1    /* desired activation, the learning value */
    delta_sortie = (F - activation_sortie) * activation_sortie * (1 - activation_sortie)
    For i = 0 to NB_CACHEES-1 Do
        send poids_sortie[i] to layer 2
        send delta_sortie to layer 2
        poids_sortie[i] = poids_sortie[i] + EPSILON * delta_sortie * tempo[i]
    End
    seuil_sortie = seuil_sortie - EPSILON * delta_sortie
End repeat
Figure 5.6.2.3 - algorithm of the third layer of processors
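For reference, the three distributed layers above can be condensed into a sequential sketch of one learning step. This is a sketch under assumptions: one output neuron instead of NB_SORTIES, and the bounded increasing function s(x) = 0.5 + 0.5·x/(1+|x|) substituted for the sigmoid so the code stays self-contained; the names mirror the pseudocode.

```c
#define NB_ENTREES 4
#define NB_CACHEES 3
#define EPSILON    0.5

typedef struct {
    double poids_cachee[NB_CACHEES][NB_ENTREES], seuil_cachee[NB_CACHEES];
    double poids_sortie[NB_CACHEES], seuil_sortie;
} Net;

/* smooth, increasing, bounded in (0,1): stand-in for the sigmoid */
static double s(double x)
{
    double a = x < 0 ? -x : x;
    return 0.5 + 0.5 * x / (1.0 + a);
}

/* forward pass: fills the hidden activations and returns the output */
double forward(Net *n, const double in[NB_ENTREES], double hid[NB_CACHEES])
{
    double sum = -n->seuil_sortie;
    for (int i = 0; i < NB_CACHEES; i++) {
        double h = -n->seuil_cachee[i];
        for (int j = 0; j < NB_ENTREES; j++)
            h += n->poids_cachee[i][j] * in[j];
        hid[i] = s(h);
        sum += n->poids_sortie[i] * hid[i];
    }
    return s(sum);
}

/* one backpropagation step toward the target, as in the pseudocode */
void learn_step(Net *n, const double in[NB_ENTREES], double target)
{
    double hid[NB_CACHEES];
    double out = forward(n, in, hid);
    double delta_sortie = (target - out) * out * (1.0 - out);
    for (int i = 0; i < NB_CACHEES; i++) {
        /* hidden delta uses the output weight before it is updated */
        double delta_cachee = n->poids_sortie[i] * delta_sortie
                              * hid[i] * (1.0 - hid[i]);
        n->poids_sortie[i] += EPSILON * delta_sortie * hid[i];
        for (int j = 0; j < NB_ENTREES; j++)
            n->poids_cachee[i][j] += EPSILON * delta_cachee * in[j];
        n->seuil_cachee[i] -= EPSILON * delta_cachee;
    }
    n->seuil_sortie -= EPSILON * delta_sortie;
}
```

On the CM5 the inner loops become the message exchanges of figures 5.6.2.1 to 5.6.2.3; the arithmetic per neuron is the same.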
The send (non-blocking) and receive (blocking) procedures exchange messages over the 40 MB/s communication lines, which keeps waiting times low.
It is likely that the learning time per example is longer than for the dedicated machine of the preceding paragraph.
5.7 Performance
The learning time is quite long (from several days to several years), but interesting results (an error rate close to zero) are obtained in few presentations when the example base is large enough, which is the case for strong algorithms such as the D.E.S. or R.S.A.; for an algorithm as simple as the XOR operation, between 200 and 500 presentations suffice for an error rate of zero.
However, once the learning is done, the time for information to pass through the neural network is very short (on the order of tens of nanoseconds). This is prodigious when we recall that an exhaustive search must be repeated for each text encrypted with a different key.
5.8 Conclusion
We have seen in this chapter the neurocryptanalysis of strong ciphers: the general principle and a study based on neural networks, namely learning the keys on the basis of plaintexts and ciphertexts. We described applications, and presented the differential and linear neurocryptanalysis of the D.E.S., which allowed us to measure the statistical performance of neural networks, which is excellent. A dedicated hardware application was described, along with a set of very satisfactory performances on a learning base of small size.
Chapter 6  Glossary and Mathematics
6.1 Introduction
This chapter is part of this thesis mainly to complement the terminology used in the previous chapters. It aims to bring the reader clarifications in the fields of information theory, complexity of algorithms and number theory. All the above-mentioned topics are widely used in cryptography.
6.2 The information theory
Quantification of information
This is the minimum number of bits needed to encode all possible meanings of a piece of information.
The entropy H (M)
It is a measure of the amount of information contained in a message M.
In general, H (M) = Log _{2} (n) where n is the number of possible meanings.
The uncertainty
This is the number of bits of the plaintext which must be recovered from the ciphertext in order to determine the plaintext entirely.
The rate of the r language
r = H(M) / n where n is the length of the message in characters of the language (in bytes).
The absolute rate R
R = Log2(L) where L is the number of characters in the language. R is in bits/character.
Redundancy D
D = R  r
The entropy of a cryptosystem H (K)
H (K) = Log2 (number of possible keys)
The number of different keys that decrypt a message
2^(H(K) - nD) - 1, where n is the length of the message, H(K) the entropy of the cryptosystem and D the redundancy.
The Unicity distance (point of uniqueness)
u = H (K) /D
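As a worked example of these quantities (a sketch with assumed values: H(K) = 56 bits for the D.E.S., absolute rate R = log2(26) ≈ 4.70 bits/character for a 26-letter alphabet, and an assumed language rate r = 1.5 bits/character):

```c
/* Unicity distance u = H(K) / D, where D = R - r is the redundancy.
   With H(K) = 56 (D.E.S.), R = 4.70 and r = 1.5 (assumed values),
   u comes to about 17.5 characters of ciphertext. */
double unicity_distance(double HK, double R, double r)
{
    return HK / (R - r);
}
```

Beyond roughly this many ciphertext characters, only one key yields a plausible plaintext.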
The confusion
Serves to obscure the relationship between plaintexts and ciphertexts (example: substitution)
The diffusion
Serves to disperse the redundancy of the text (example: transposition or permutation of blocks)
6.3 The complexity of algorithms
The complexity of algorithms corresponds to 2 parameters: T, the complexity in time, and S, the complexity in space (typically memory).
Notation
O(n) : complexity of linear algorithms, n is the number of iterations
O(n ^{2} ) : complexity of quadratic algorithms
O(n ^{3} ) : cubic algorithms complexity
The previous algorithms are polynomial-time algorithms, O(n^t).
O(t^f(n)) : complexity of exponential algorithms (t: constant, f(n): polynomial function of n)
O(t^f(n)) : complexity of superpolynomial algorithms (t: constant, f(n) greater than a constant c but less than linear)
The classes of problems
The classes, from least complex to most complex:
P : problems that can be solved in polynomial time.
NP : problems that can be solved in polynomial time on a non-deterministic TURING machine (a variant of the normal TURING machine that guesses solutions).
NP-complete : the hardest problems of the class NP; every NP problem can be reduced to them in polynomial time.
PSPACE : problems that can be solved in polynomial space and arbitrary time.
PSPACE-complete : the hardest problems of the class PSPACE.
EXPTIME : problems that can be solved in exponential time.
6.4 The number theory
Congruences
(a + b) mod n = ((a mod n) + (b mod n)) mod n, same with (ab) and (a * x)
(a *(b+c)) mod n = (((a*b) mod n) + ((a*c) mod n)) mod n
(a^x) mod n = ((a mod n)^x) mod n for any natural number x
The primes
A prime is an integer > 1 whose only factors are 1 and itself. For more details on primes and their cryptographic applications, see (KRANAKIS 1986).
The inverses modulo n
The goal is to find x such that 1 = (a * x) mod n, i.e. a^-1 = x mod n.
There is not always a solution; in general, there is a unique x if and only if a and n are coprime.
This problem is solved using the extended Euclidean algorithm, with complexity O(log2 n). For more details see (SCHNEIER 1995, pages 209-210) and (KNUTH 1981).
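A sketch of the extended Euclidean algorithm mentioned above (the function names are ours, not from the cited references):

```c
/* Extended EUCLID: returns g = gcd(a, b) and fills x, y such that
   a*x + b*y = g. */
long ext_gcd(long a, long b, long *x, long *y)
{
    if (b == 0) { *x = 1; *y = 0; return a; }
    long x1, y1;
    long g = ext_gcd(b, a % b, &x1, &y1);
    *x = y1;
    *y = x1 - (a / b) * y1;
    return g;
}

/* Inverse of a modulo n, or -1 when gcd(a, n) != 1 (no inverse). */
long mod_inverse(long a, long n)
{
    long x, y;
    if (ext_gcd(a % n, n, &x, &y) != 1) return -1;
    return ((x % n) + n) % n;
}
```

For example, mod_inverse(7, 26) gives the x with 7x mod 26 = 1, exactly the x sought above.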
FERMAT's theorem
If m is prime and a is not a multiple of m, then a^(m-1) = 1 mod m.
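FERMAT's theorem can be checked numerically with square-and-multiply modular exponentiation (a sketch, valid for operands small enough that the intermediate products do not overflow):

```c
/* a^e mod m by square-and-multiply. */
unsigned long mod_pow(unsigned long a, unsigned long e, unsigned long m)
{
    unsigned long r = 1;
    a %= m;
    while (e) {
        if (e & 1) r = (r * a) % m;   /* multiply in the current bit of e */
        a = (a * a) % m;              /* square for the next bit */
        e >>= 1;
    }
    return r;
}
```

For instance mod_pow(2, 16, 17) returns 1, as FERMAT's theorem predicts for the prime 17.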
Residues modulo n
These are the remainders of the division of a number by n.
Restricted residues
These are the remainders of the division by n that are coprime to n.
The EULER function φ(n) (EULER's totient)
It is the cardinality of the set of restricted residues modulo n; this function is denoted φ(n).
φ(n) is the number of positive integers smaller than n and coprime to n.
If n is prime, φ(n) = n - 1, and if n = p * q where p and q are prime, then φ(n) = (p-1) * (q-1).
Given gcd(a,n) = 1 and (a * x) mod n = b, to calculate x:
by EULER's generalization: x = (b * a^(φ(n)-1)) mod n
by EUCLID's algorithm: x = (b * inverse(a, n)) mod n
see (SCHNEIER 1995, pages 212-213)
The Chinese remainder theorem
Given a and b such that a < p and b < q (p and q prime), there exists a unique x such that x < p * q and x = a mod p and x = b mod q.
By EUCLID, computing u such that u * q = 1 mod p gives us x = (((a - b) * u) mod p) * q + b.
Details and C code in (SCHNEIER 1995, pages 213-214).
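The formula above can be sketched directly (with a brute-force modular inverse to keep the sketch self-contained; the function names are ours):

```c
/* Brute-force inverse: the u with (u * a) mod p == 1, or -1 if none. */
static long inv_mod(long a, long p)
{
    for (long u = 1; u < p; u++)
        if ((u * (a % p)) % p == 1) return u;
    return -1;
}

/* Chinese remainder: the unique x < p*q with x = a mod p and
   x = b mod q, via x = (((a - b) * u) mod p) * q + b. */
long crt(long a, long p, long b, long q)
{
    long u = inv_mod(q, p);               /* u * q = 1 mod p */
    long t = (((a - b) % p) + p) % p;     /* (a - b) mod p, kept >= 0 */
    return ((t * u) % p) * q + b;
}
```

One can verify on small values: crt(2, 3, 3, 5) returns the x < 15 with x = 2 mod 3 and x = 3 mod 5.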
Quadratic residues modulo p
If p is prime and a < p, then a is a quadratic residue modulo p if x^2 = a mod p for some x.
The LEGENDRE symbol
It is noted L(a,p) or (a/p), with a a natural integer and p a prime > 2.
We then have: L(a,p) = 0 if a is divisible by p;
L(a,p) = 1 if a is a quadratic residue modulo p;
L(a,p) = -1 if a is not a quadratic residue modulo p.
To calculate it, we have the formula L(a,p) = a^((p-1)/2) mod p.
There are also the following recursive expressions:
if a = 1, L(a,p) = 1;
if a is even, L(a,p) = L(a/2, p) * (-1)^((p^2-1)/8);
otherwise L(a,p) = L(p mod a, a) * (-1)^((a-1)*(p-1)/4).
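EULER's criterion above gives a direct computation sketch (p an odd prime; the value p-1 returned by the modular power stands for -1 and is mapped back):

```c
/* L(a, p) = a^((p-1)/2) mod p (EULER's criterion), p an odd prime. */
long legendre(long a, long p)
{
    long r = 1, e = (p - 1) / 2;
    a %= p;
    if (a == 0) return 0;            /* a divisible by p */
    while (e) {
        if (e & 1) r = (r * a) % p;
        a = (a * a) % p;
        e >>= 1;
    }
    return (r == p - 1) ? -1 : r;    /* p-1 means -1: non-residue */
}
```

For p = 7 the squares modulo 7 are 1, 2 and 4, so legendre(2, 7) = 1 and legendre(3, 7) = -1.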
The JACOBI symbol (Jacobian)
Noted J(a,n), it is a generalization of L(a,p). To compute it:
if n is prime, J(a,n) = 1 if a is a quadratic residue modulo n,
J(a,n) = -1 if a is not a quadratic residue modulo n;
if n = p1 * ... * pm (the pi being the prime factors of n),
J(a,n) = J(a,p1) * ... * J(a,pm);
if a = 0, J(0,n) = 0.
It follows the following properties:
J(1,k) = 1; J(a*b, k) = J(a,k) * J(b,k);
J(2,k) = 1 if (k^2-1)/8 is even, J(2,k) = -1 if (k^2-1)/8 is odd;
J(a,b) = J(a mod b, b), useful if a > b;
if gcd(a,b) = 1 with a and b odd, then:
if (a-1)*(b-1)/4 is even then J(a,b) = J(b,a), otherwise J(a,b) = -J(b,a).
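The recursive rules above combine into the usual binary Jacobi algorithm (a sketch for odd n > 0; it returns 0 when gcd(a,n) > 1):

```c
/* JACOBI symbol J(a, n) for odd n > 0: factor out 2s with the
   J(2, k) rule, then apply the reciprocity swap. */
int jacobi(long a, long n)
{
    int j = 1;
    a %= n;
    if (a < 0) a += n;
    while (a != 0) {
        while (a % 2 == 0) {
            a /= 2;
            if (n % 8 == 3 || n % 8 == 5) j = -j;  /* (n^2-1)/8 odd */
        }
        long t = a; a = n; n = t;                  /* reciprocity swap */
        if (a % 4 == 3 && n % 4 == 3) j = -j;      /* (a-1)(n-1)/4 odd */
        a %= n;
    }
    return (n == 1) ? j : 0;
}
```

For composite n the multiplicativity rule is respected: J(2,15) = J(2,3) * J(2,5) = (-1) * (-1) = 1.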
BLUM integers
If p and q are primes with p = 3 mod 4 and q = 3 mod 4, then n = p * q is a BLUM integer.
Each quadratic residue modulo n then has 4 square roots, one of which is itself a square: it is the principal square root.
Generators
If p is prime and g < p, then g is a generator modulo p if, for every n in [1, p-1], there exists a such that g^a = n mod p (g is primitive with respect to p).
If the prime factorization of p - 1 is known, say q1, q2, ..., qn, then for each qi compute g^((p-1)/qi) mod p; if the result is 1 for some prime factor qi, then g is not a generator modulo p.
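The test described above can be sketched as follows (the prime factors of p-1 are passed in by the caller; the function names are ours):

```c
/* a^e mod m by square-and-multiply. */
static long pow_mod(long a, long e, long m)
{
    long r = 1;
    a %= m;
    while (e) {
        if (e & 1) r = (r * a) % m;
        a = (a * a) % m;
        e >>= 1;
    }
    return r;
}

/* g is a generator modulo the prime p iff g^((p-1)/q) mod p != 1
   for every prime factor q of p - 1. */
int is_generator(long g, long p, const long *factors, int nf)
{
    for (int i = 0; i < nf; i++)
        if (pow_mod(g, (p - 1) / factors[i], p) == 1)
            return 0;
    return 1;
}
```

For p = 7 (p - 1 = 2 * 3), g = 3 passes both tests while g = 2 fails, since 2^3 = 1 mod 7.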
GALOIS fields
Arithmetic modulo n, if n is prime, forms a finite field; similarly if n is an integer power of a prime. If p is prime, the GALOIS field is Z/p. Addition, subtraction, multiplication and division work, with 0 the neutral element of addition and 1 the neutral element of multiplication. For every p ≠ 0 there exists p' = 1/p. We have commutativity, associativity and distributivity.
Z/2^n (fields Z/q^n)
Let p(x) be an irreducible polynomial of degree n; the "generator" polynomials of such a field are the primitive polynomials. In Z/2^n, cryptography makes much use of p(x) = x^n + x + 1 because multiplication and exponentiation are very efficient and the hardware implementation with shift registers is easy.
The factorization
The best algorithms for factoring numbers are as follows:
Quadratic sieve: the number of operations is e^((ln n)^(1/2) * (ln ln n)^(1/2)); the fastest, see (POMERANCE 1985), (POMERANCE 1988) and (WUNDERLICH 1983).
Number field sieve: the number of operations is e^((ln n)^(1/3) * (ln ln n)^(2/3)), see (LENSTRA 1993).
Elliptic curve methods: see (MONTGOMERY 1987) and (MONTGOMERY 1990).
POLLARD's Monte Carlo algorithm: see (POLLARD 1975), (BRENT 1980), (KNUTH 1981, page 370).
Continued fraction algorithm: see (KNUTH 1981, pages 381-382).
Trial division: dividing the number by all smaller primes.
Chapter 7  Conclusion
We have presented neural networks, and defined and determined which model of neural network is the most appropriate for cryptography, both on the algorithmic learning side and on the hardware side, as regards architectures already realized and the performance observed.
The most interesting connectionist model turns out to be the network of perceptrons with backpropagation of the gradient, through the various properties that have been analyzed and demonstrated by different scientists:
their generalization property
their low sensitivity to noise (if an error sneaks into the example base)
their low sensitivity to faults (lost connections, modified weights or a bug in the program)
the distribution of information across the network
their capabilities for statistical calculation and heuristic search
We presented the structure of the model chosen in the following figure:
This architecture can be realized in software as well as in hardware. Neural networks have already been implemented on massively parallel machines.
An analysis of linear multilayer networks showed us the analogies with different statistical methods of data analysis, in particular linear regression and discriminant analysis. It has been shown that backpropagation performs a discriminant analysis of a population of N individuals (N being the number of examples included in the learning base) described by n parameters (where n is the number of input neurons) and projected onto a hyperplane of dimension p (where p is the number of hidden units). It is therefore possible to treat non-linearly separable problems and to build a classifier or a probabilistic model, which proves the interest of such an algorithm in cryptography and especially in cryptanalysis.
On the hardware side, the benefits of the VLSI components are:
ease of use
a high signal-to-noise ratio
easy cascading of circuits
high adaptability (these circuits allow various tasks to be solved)
a reduced manufacturing price
We then presented the three types of components existing on the market or in research laboratories:
1. dedicated digital neural components, whose network speeds go up to one billion connections processed per second;
2. special-purpose digital coprocessors (also called neuro-accelerators), special circuits that can be connected to hosts (PCs or workstations) and work with a neuro-simulator program; the mix of hardware and software gives them accelerated speed, flexibility and an improved user interface;
3. neural networks on massively parallel machines.
An implementation of the algorithm has been developed on the Connection Machine CM2 (created by THINKING MACHINES Corp.) with a hypercube topology of 64K processors, which gave 180 million interconnections computed per second, or 40 million weights updated per second.
Here are the performances measured per machine, in interconnections computed per second (table below):
CM2         180 million
CRAY X-MP    50 million
WARP (10)    17 million
ANZA PLUS    10 million
The use of such configurations would allow excellent results to be obtained in the learning of cryptographic ciphers.
We have seen that cryptography is a very large area, popular among mathematicians and computer scientists. We noted that the strength of a cryptosystem depends entirely on the key used, whether public or private, and on the cryptographic protocols of the exchanges. We chose to focus on the neural realization and the neurocryptanalysis of cryptosystems.
Our work specifically concerned the ECB mode, which is best suited to the learning of neural networks, with a fixed number of input and output bits and no feedback loop. It is also possible to connect one or more neural networks in this way.
We chose to tackle the D.E.S. because it is the oldest encryption standard and the most studied of algorithms.
The hardware aspect is very important for the speed of execution. The VLSI components are widespread and effective, but there are even more interesting components based on a technology that should not be disregarded: gallium arsenide (GaAs). It has already been included in supercomputers.
The major differences between GaAs and VLSI are:
fast switching of the GaAs gates
interfacing with non-GaAs components is a major difficulty
very low density of GaAs integrated circuits
With regard to the D.E.S., there is a circuit running at 50 MHz performing an encryption in 20 ns, which allows 50 million encryptions per second. Since late 1995, AMD sells a circuit encrypting the D.E.S. at 250 MHz.
In August 1993, the Canadian Michael J. WIENER described how to build, for 1 million dollars, a machine that performs an exhaustive search of the D.E.S. keys and finds the right key in 3.5 hours. Each of its basic circuits has a power equivalent to 14 million SUN workstations.
We analyzed the two most successful cryptanalyses against the D.E.S.
Differential cryptanalysis consists of looking at the particularities of a pair of ciphertexts obtained from a pair of plaintexts with a particular difference.
The strength of the D.E.S. resides in its rounds, and all operations of a round are completely linear except the S-boxes. Eli BIHAM and Adi SHAMIR therefore analyzed the 8 S-boxes for differences of input texts and differences of output texts; this information is synthesized in 8 tables called difference distribution tables of the D.E.S. (see the 8 tables in annex 3). We implemented the algorithm to generate these tables.
Linear cryptanalysis consists of studying the statistical linear relationships between bits of a plaintext, bits of the ciphertext and the key which encrypted it. These relationships give the values of some bits of the key when the plaintexts and associated ciphertexts are known. The linear relationships of each S-box are deduced by choosing a subset of input bits and output bits and computing the parity (XOR) of these bits; a relationship holds when the parity of the subset is zero. In general, some subsets will have parity 0 (linear) and others parity 1 (affine). MATSUI calculated the number of zero parities of each subset of input and output bits for each S-box, among the 64 x 16 = 1024 possible subsets. It is thus possible to associate probabilities to the various bits of the subkeys. The probabilities of obtaining a zero parity (linear relationship) are synthesized in 8 tables called tables of linear approximations of the D.E.S. (see the 8 tables in annex 4). We implemented the algorithm to generate these tables.
After showing the possible association between neural networks and cryptography, we defined the field of neurocryptography.
We then identified some important points for the correct use of neural networks. The way the learning base is generated is very important for the realization of neural applications. Learning depends on the random initialization of the weights of the network, as well as on the number of examples, the order of presentation of these examples, and the consistency of the chosen set of examples.
We have seen that an example consists of a value to be presented at the input of the neural network and a value to be presented at the output of this network, the output depending on the input value. If the number of examples is too low, the network will clearly not seek a transfer function of the studied cryptosystem but will instead memorize the given examples, and therefore cannot in any way find a result for an input value different from those given in the example base. In cryptography, more than half of all possible examples must be presented to be certain of the results, even though in strong cryptography the number of possible input values is very large.
We then realized an algorithm to present the examples in a more or less thorough disorder. It consists of cutting the base into k sub-bases and then presenting the elements of each of the sub-bases in turn (k can be even or odd). The following figure shows the final error rate TSS for different values of k (the number of presentations being fixed at 500, with 256 examples).
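One plausible reading of this cutting scheme is a round-robin over the k sub-bases; the sketch below follows that assumed interpretation (the thesis figure alone fixes the exact order, and the function name is ours).

```c
/* Cut a base of n examples into k consecutive sub-bases and emit a
   presentation order: the j-th element of sub-base 0, then of
   sub-base 1, ..., then the (j+1)-th elements, and so on.
   Assumes k divides n. */
void presentation_order(int n, int k, int *order)
{
    int sub = n / k;                  /* size of each sub-base */
    int idx = 0;
    for (int j = 0; j < sub; j++)
        for (int s = 0; s < k; s++)
            order[idx++] = s * sub + j;
}
```

With n = 256 and k = 1 this degenerates to sequential presentation; larger k interleaves distant examples, which is the "disorder" whose effect on the final TSS the figure measures.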
Regarding the automatic generation of contiguous texts, we presented an algorithm that can generate clear examples whatever the number of nested loops, with a single loop body which is executed on each iteration of the innermost loop.
We analyzed the learning coefficient, which makes learning more or less rapid, the chances of convergence of the network towards a solution being inversely proportional to it, because of the local minima of the error curve measured between the learning base and the output values calculated by the neural network.
Epsilon should be varied empirically between 0.1 and 2.0. If the network still does not converge, it is certainly due to the problem not being linearly separable, which is the case of the learning of the XOR. One should then use a momentum term, a real value between 0.1 and 1.0, whose aim is to avoid local minima by letting the current learning step take the previous steps into account.
We presented self-learning, which is interesting for the neural learning of cryptographic algorithms. The neural system has two parts, the emulator and the controller, whose learnings are carried out separately.
The task of the emulator is to simulate the complex function or the encryption algorithm. Its inputs are therefore the state of the system at a given time and the input at that time, and its output is the output of the algorithm at the following time. The input of the controller is the state of the system at time k; its output is the value to be given as input to the algorithm or complex function. The proper role of the controller is to learn the adaptive control law. But for this learning, the error signal is not calculated on the command but on its result, the gap between the desired state and the current state. This leads to the idea of a guided rather than supervised learning, because no teacher teaches the system the control law. In fact, the system teaches itself by processing the information it receives in return for its actions. To make learning through backpropagation possible and to backpropagate the error on the position, the structure of the emulator must be homogeneous with that of the controller.
Another quality of this device is its capacity for on-line learning. The learning of the controller is fast. In addition, the synthesized control law is sufficiently robust to small random perturbations. It is therefore possible to realize self-learning neural networks on a communication line, for encryption as well as for authentication of messages in real time.
We presented several different applications. For the learning of the XOR, i.e. realizing C = A XOR B, we need a network with 16 input bits (the 2 bytes A and B) and 8 output bits (the byte C). The network must therefore have 16 input neurons, a hidden layer of at least 16 neurons, and 8 output neurons. The example base consists of 65,536 cause-effect pairs. After various tests, the success rate of the XOR learning is very close to 100%, depending on the random weight initialization and the number of presentations. The larger the number of input and hidden-layer neurons, the more the number of presentations of the base can be reduced. If the random initialization of the weights is good, a single presentation can be sufficient, with better quality.
For the learning of cryptographic algorithms, we have shown that it is a matter of a function or an algorithm combining input data (causes) into output data (effects). It is therefore a question of determining the input and output structures of the network and of finding a base of causes and associated effects sufficient for the learning of the network to converge to a minimal number of errors, or even none at all.