Vassilis Kostakos
vkostakos@yahoo.com
http://www.geocities.com/vkostakos
May 7, 2001
MATH0082 Double Unit Project

Algorithm descriptions    15    α
Implementation            15    α
Comparison tests          30   2α
Report and analysis       40   2α
Total                    100   6α
Note: All the software files referred to by this report may be found on the
BUCS filesystem at: ~ma9vk\public_html\project\
Abstract
The problem of integer factorisation has been around for a very long time. This
report describes a number of algorithms and methods for performing
factorisation. In particular, the Trial Divisions and Fermat algorithms are
discussed. Furthermore, Pollard's ρ and p − 1 methods are described, and finally
Lenstra's Elliptic Curves method. The theory behind each algorithm is explained,
so that the reader can become familiar with the process. Then, sample pseudocode
is presented, along with the expected running time for each algorithm. Finally,
this report includes test data for each algorithm.
CONTENTS
1 Introduction 1
I Documentation 3
2 Project Plan 4
2.1 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Coding standards . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Requirements 7
3.1 User Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . 7
3.3 Non-functional Requirements . . . . . . . . . . . . . . . . . . . . 8
3.4 Software and Hardware Requirements . . . . . . . . . . . . . . . 9
4 Testing 10
4.1 Correctness tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2 Performance tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
II Implementation 12
5 Tools for factorisation 13
5.1 Greatest common divisor . . . . . . . . . . . . . . . . . . . . . . 13
5.2 Fast exponentiation modulo . . . . . . . . . . . . . . . . . . . . . 13
5.3 Primality testing . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
7 Fermat’s algorithm 18
7.1 Quick description of Fermat’s algorithm . . . . . . . . . . . . . . 18
7.2 Detailed description of Fermat’s algorithm . . . . . . . . . . . . . 18
7.3 Implementation of Fermat’s algorithm . . . . . . . . . . . . . . . 19
7.4 Running time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
7.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
11 Overall Comparison 37
12 Epilogue 39
III Appendices 40
A Benchmarks 41
A.1 Tests with products of two nearby primes . . . . . . . . . . . . . 41
A.2 Tests with products of three nearby primes . . . . . . . . . . . . 42
A.3 Tests with products of three arbitrary primes . . . . . . . . . . . 42
B Program output 43
B.1 Tests output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
B.2 Combined factorisation output . . . . . . . . . . . . . . . . . . . 45
B.3 Biggest factorisation . . . . . . . . . . . . . . . . . . . . . . . . . 47
Bibliography 48
LIST OF TABLES
2.1 My schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
LIST OF FIGURES
CHAPTER 1
Introduction
This report, along with the software which I wrote, constitutes my final year
project. The main objective of this report is to strike a balance between a
theoretical explanation of certain factorisation algorithms and a description of
my source code.
Background information
The problem of factorisation has been known for thousands of years. However,
only recently did it become "popular". This sudden interest in factorisation was
due to advances in cryptography, and mainly to the RSA public key cryptosystem.
The problem of factorisation may be stated as follows: “Given a composite
integer N , find a nontrivial factor f of N .”
There are a lot of factorisation algorithms out there. Some of them are heavily
used; others just serve educational purposes. The factorisation algorithms
may be distinguished in two different ways:
• Deterministic or nondeterministic
• Running time depending on the size of N or on the size of f .
Deterministic algorithms are guaranteed to find a solution if we let them run
long enough. Nondeterministic algorithms, on the contrary, may never terminate.
The most usual distinction, however, deals with the running time of the
algorithm. The running time of recent algorithms depends on the size of the
input number N , whereas the running time of older algorithms depends on the
size of the factor f which they find.
About my project
In doing my project, I tried to cover a broad range of algorithms and methods.
The running time of all the algorithms I have implemented depends on the size
of the factor f which they find. Furthermore, only the first two algorithms
which I describe are deterministic.
Documentation
CHAPTER 2
Project Plan
2.1 Resources
I started planning for this project by writing down what resources I thought I
was going to need in order to successfully complete the project.
In terms of hardware, all I needed was a computer, which I already owned.
Furthermore, I could use the computing facilities of the University as well. In
terms of software, I decided that I wanted to write the program in C. There
are lots of different environments for creating C programs; I used
LCC-WIN32 version 3.3 for Windows, which includes an ANSI C compiler.
My main concern was finding a suitable arbitrary-precision library, which
I could use with my program. In the end, I decided to use Mike’s Arbitrary
Precision Math Library (MAPM) version 3.70, written by Michael C. Ring
(ringx004@tc.umn.edu).
Furthermore, I thought that I would also need some books or papers to help me.
In addition to the resources listed in the bibliography section, I also made use
of the following programming books:
2.2 Scheduling
The next part in planning my project was to devise a schedule to serve as a
rough guide for my work. In table 2.1 you can see my schedule, or to be precise,
the final version of my schedule.
Schedule
Weeks Tasks
1 (Semester 1)
2 (Semester 1)
3 (Semester 1) Signed up for LEGO maze-solving robot
4 (Semester 1) Preliminary research on Robot movement, etc.
5 (Semester 1)
6 (Semester 1) Wrote first version of software for robot.
7 (Semester 1) NEW PROJECT: Integer factorisation
8 (Semester 1) Looking for a maths library
9 (Semester 1) Found the MAPM library, performance tests
10 (Semester 1) Implement trial divisions algorithm
11 (Semester 1) Wrote low-level functions for MAPM
12 (Semester 1) Implemented Fermat’s algorithm
(Christmas) Revise for exams
(Christmas) Revise for exams
(Christmas) Revise for exams
13 (Exams) Exams
14 (Exams) Exams
15 (Exams) Exams
1 (Semester 2) Research into Pollard’s algorithms
2 (Semester 2) Implement MODEXPO, GCD, PRIME functions
3 (Semester 2) Pollard’s ρ algorithm
4 (Semester 2) Tests on all algorithms so far implemented
5 (Semester 2) Pollard p − 1. Read about Elliptic curves
6 (Semester 2) Elliptic curves algorithm and testing
7 (Semester 2) Function interface modifications, more tests
8 (Semester 2) Developed COMBINED function. Started report
(Easter) Test result analysis, graph generation
(Easter) Report writing
(Easter) Report writing
9 (Semester 2) Report writing
10 (Semester 2) Report revision, final version preparation
11 (Semester 2) DEADLINE
• The prototypes for functions in file xxx.c are placed in the file xxx.h.
It was obvious that the software program I was creating was quite modular,
and could be built in “big chunks” at a time. Therefore, I decided to use
a common algorithm testing interface. This meant that I would place each
algorithm in a separate file, and use a common file to call the factorisation
routines. This would also make it possible to call all of my algorithms from
another function in an effort to factorise a really hard number.
By doing the above, I was planning to minimise the effort of adding a new
algorithm to my program, and make the testing of different algorithms quicker
and easier.
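The common interface described above could be realised in C with a table of function pointers, roughly as follows. All the names here are illustrative, not the project's actual identifiers, and unsigned long long stands in for the MAPM types the real program uses.

```c
#include <assert.h>
#include <stdio.h>

/* Every algorithm file exports one function with this signature:
   return a factor of n found within some limit, or 1 on failure. */
typedef unsigned long long (*factor_fn)(unsigned long long n,
                                        unsigned long long limit);

/* Stand-in algorithm: simple trial division. */
static unsigned long long trial(unsigned long long n, unsigned long long limit)
{
    unsigned long long d;
    for (d = 2; d < limit && d * d <= n; d++)
        if (n % d == 0)
            return d;
    return 1;
}

struct algorithm {
    const char *name;
    factor_fn   run;
};

/* Adding a new algorithm is one line in this table. */
static const struct algorithm algorithms[] = {
    { "trial divisions", trial },
    /* { "fermat", fermat }, { "pollard rho", rho }, ... */
};

/* A COMBINED-style driver: try each algorithm in turn. */
unsigned long long first_factor(unsigned long long n, unsigned long long limit)
{
    size_t i;
    for (i = 0; i < sizeof algorithms / sizeof algorithms[0]; i++) {
        unsigned long long f = algorithms[i].run(n, limit);
        if (f > 1) {
            printf("%s found %llu\n", algorithms[i].name, f);
            return f;
        }
    }
    return 1;   /* every algorithm gave up */
}
```

With this layout, both TESTS.C and a combined driver can iterate over the same table, which is what makes adding an algorithm cheap.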
CHAPTER 3
Requirements
This chapter describes all the requirements and specifications that I used for
implementing this project. Of course, these requirements were by no means
static. In fact, they changed quite often as I moved further into the project. A
change in the requirements would often reflect a new idea that I had come up
with, or an idea that I wanted to drop. Therefore, these are the requirements
as they stood at the end of the project.
Also, the software should output the computational time that was required
to complete the factorisation, and verify that the results it gives are correct.
This should also be done while running long tests, in which case the results
should be stored in a file on disk.
MAIN.C This is the main file of the program. Nothing special here.
MAIN.H Main header file. Contains the definition of the output destination for
parameterised compilation.
AL TRIAL.C This file contains the source code for the trial divisions
algorithm.
AL FERMT.C This file contains the source code for Fermat's method.
AL PRHO1.C This file contains the source code for Pollard’s p − 1 method.
AL PRHO.C This file contains the source code for Pollard’s ρ method.
AL ELLCRVS.C This file contains the source code for the Elliptic curve
method.
TESTS.C Here are defined some tests for measuring the speed of each
algorithm.
TESTS.H This file contains parameters for the testing routine.
MYLIB.C In this file I have included some of my “tool” functions, as well
as some low-level functions for the arbitrary-precision arithmetic library I
used.
COMBINED.C This file contains a function which utilises all the factoring
algorithms. It tries to factor a given number by applying the different
algorithms until the number has been completely factorised, or until it
gives up.
CHAPTER 4
Testing
The tests I performed for my project come in two flavours. First, I had to
test my algorithms to see whether they ran as expected, i.e. to find "bugs" in
the program. Second, I ran lots of performance tests, i.e. many factorisations,
in order to get a feel for the performance of each algorithm.
Implementation
CHAPTER 5
Tools for factorisation
Before proceeding with the actual algorithms and their description, it would
be useful to describe some “tool” algorithms which are used throughout the
factorisation algorithms.
WHILE b ≠ 0 DO
    temp := b
    b := a MOD b
    a := temp
RETURN a
Figure 5.1: Pseudocode for computing gcd(a, b) using the Euclidean algorithm
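Rendered directly in C, the pseudocode of figure 5.1 looks as follows. This is a self-contained sketch using unsigned long long; the project's actual routine operates on MAPM arbitrary-precision values.

```c
#include <assert.h>

/* Greatest common divisor by the Euclidean algorithm (figure 5.1).
   gcd(a, 0) = a, so the loop terminates with the answer in a. */
unsigned long long gcd(unsigned long long a, unsigned long long b)
{
    while (b != 0) {
        unsigned long long temp = b;
        b = a % b;      /* a MOD b */
        a = temp;
    }
    return a;
}
```

For example, gcd(12, 18) returns 6.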
x^8 = ((x^2)^2)^2
n := 1
WHILE b ≠ 0 DO
    IF b is odd THEN
        n := n × a MOD m
    b := ⌊b/2⌋
    a := a × a MOD m
RETURN n
x^256 = (((((((x^2)^2)^2)^2)^2)^2)^2)^2.
If the exponent is not a power of 2, then we use its binary representation, which
is just a sum of powers of 2 (for example, x^13 = x^8 · x^4 · x):
The pseudocode shown in figure 5.2 will quickly compute a^b mod m. The
way it works is that it finds the binary representation of b, while at the same
time computing successive squares of a. The variable n records the product of the
powers of a, and also contains the final result at the end of the computation.
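A C rendering of figure 5.2 might look like this. It is a fixed-width sketch (the real MODEXPO works on MAPM values) and assumes m is small enough that (m − 1)^2 fits in an unsigned long long.

```c
#include <assert.h>

/* Computes a^b mod m by binary exponentiation (figure 5.2):
   scan the bits of b, squaring a at each step, and multiply
   the set bits into the running product n. */
unsigned long long modexpo(unsigned long long a, unsigned long long b,
                           unsigned long long m)
{
    unsigned long long n = 1;
    a %= m;
    while (b != 0) {
        if (b % 2 == 1)         /* b is odd: this power of a is needed */
            n = (n * a) % m;
        b /= 2;                 /* b := floor(b/2) */
        a = (a * a) % m;        /* next successive square of a */
    }
    return n;
}
```

For example, modexpo(2, 10, 1000) returns 24, since 2^10 = 1024.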
2^(n−1) ≡ 1 (mod n)
then we say that n is a pseudoprime. Therefore, for any number n, we can just
compute the value 2^(n−1) (mod n) using the algorithm of figure 5.2, and then
simply check whether the returned value is 1 or not.
Despite the fact that this test is not a 100% guarantee of primality, in practice
it is very useful. This test can be made stronger by computing the same values
for the bases 2,3,5,7, and then checking to see if all of them yield the result 1.
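The strengthened test of this section can be sketched as follows; modexpo is the binary-exponentiation routine of figure 5.2, repeated here so the example is self-contained. Passing the test does not prove primality, it only makes compositeness unlikely.

```c
#include <assert.h>

static unsigned long long modexpo(unsigned long long a, unsigned long long b,
                                  unsigned long long m)
{
    unsigned long long n = 1;
    a %= m;
    while (b != 0) {
        if (b % 2 == 1)
            n = (n * a) % m;
        b /= 2;
        a = (a * a) % m;
    }
    return n;
}

/* n passes if base^(n-1) mod n == 1 for each base = 2, 3, 5, 7. */
int is_probable_prime(unsigned long long n)
{
    unsigned long long bases[4] = {2, 3, 5, 7};
    int i;

    if (n < 2)
        return 0;
    for (i = 0; i < 4; i++) {
        if (n == bases[i])
            return 1;            /* the small bases are themselves prime */
        if (modexpo(bases[i], n - 1, n) != 1)
            return 0;            /* witness found: n is composite */
    }
    return 1;                    /* probably prime */
}
```

Note that 341 = 11 × 31 passes the base-2 test alone but is caught by base 3, which is exactly why the extra bases strengthen the test.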
CHAPTER 6
Trial divisions algorithm
The most straightforward algorithm for factorising an integer uses trial
divisions. This algorithm is a good place to start, and it is quite easy to
understand.
INPUT N
test factor := 2
WHILE (N > 1 AND test factor < max) DO
    IF (N MOD test factor) == 0 THEN
        N := N / test factor
        PRINT test factor
    ELSE
        test factor := test factor + 1
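The pseudocode above translates almost line for line into C. A sketch with unsigned long long follows; max bounds the trial divisors, and the function returns the remaining (unfactored) portion of N, as the actual program's output calls it.

```c
#include <assert.h>
#include <stdio.h>

/* Trial division: each factor is divided out repeatedly, so only
   primes are ever printed. Returns the remaining portion of N. */
unsigned long long trial_divisions(unsigned long long N,
                                   unsigned long long max)
{
    unsigned long long test_factor = 2;

    while (N > 1 && test_factor < max) {
        if (N % test_factor == 0) {
            N = N / test_factor;
            printf("%llu\n", test_factor);
        } else {
            test_factor = test_factor + 1;
        }
    }
    return N;
}
```

For N = 12 this prints 2, 2, 3 and returns 1; for N = 10403 = 101 × 103 with max = 100 it finds nothing and returns 10403.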
O(f · (log N)^2),
where f is the size of the factor found. The efficiency of this algorithm depends
on your strategy for choosing the trial divisors p, as explained earlier. In figure
6.2 you can see the results of the tests of my implementation of this algorithm.
The graph shows the factor size versus the amount of time it took, from a
sample of 1427 factorisations. As expected, the amount of time the algorithm
takes increases exponentially with the number of digits of the factor found.
Practically, after 6 or 7 digits, this algorithm becomes too expensive.
6.4 Remarks
One of the features of this algorithm is that if we let it run long enough on a
prime Np , it will prove the primality of Np . In most cases this is not wanted,
and it is regarded as a waste of effort. However, this algorithm is very fast
in finding prime factors of size less than 5-6 decimal digits. Furthermore, this
algorithm may be used in breaking up composite factors which are found using
the algorithms described in the following chapters.
CHAPTER 7
Fermat’s algorithm
The first of the modern algorithms that I will describe is due to Fermat. It
is not usually implemented these days unless it is known that the number to
be factored has two factors which are relatively close to the square root of the
number. However, this algorithm contains the key idea behind two of the most
powerful algorithms for factorisation, the Quadratic Sieve and the Continued
Fractions algorithm.
way up to √N. In Fermat's algorithm, we start by looking for factors near √N,
and work our way down.
x + y = (u + v − 2)/2, x − y = (u − v)/2
At this point, I believe that some sort of pseudocode would be most
appropriate in order to fully understand my implementation. Figure 7.1 contains
the pseudocode which describes my implementation.
((1 − k)^2 / 2k) · √N.
INPUT N
sqrt := ⌈√N⌉
u := 2 * sqrt + 1
v := 1
r := sqrt * sqrt - N
WHILE r ≠ 0 DO
    IF r > 0 THEN       /* Keep increasing y */
        WHILE r > 0 DO
            r := r - v
            v := v + 2
    IF r < 0 THEN       /* Increase x */
        r := r + u
        u := u + 2
PRINT (u + v - 2) / 2
PRINT (u - v) / 2
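Figure 7.1 can be sketched in C as follows. isqrt_ceil is a hypothetical helper (a deliberately naive linear scan for ⌈√N⌉), unsigned long long stands in for MAPM values, and N is assumed odd and composite.

```c
#include <assert.h>

/* Naive ceil(sqrt(N)): smallest s with s*s >= N. Illustration only. */
static unsigned long long isqrt_ceil(unsigned long long N)
{
    unsigned long long s = 0;
    while (s * s < N)
        s++;
    return s;
}

/* Fermat's method (figure 7.1): find x, y with x^2 - y^2 = N using
   only addition and subtraction, then report a = x + y, b = x - y. */
void fermat(unsigned long long N, unsigned long long *a, unsigned long long *b)
{
    unsigned long long s = isqrt_ceil(N);
    unsigned long long u = 2 * s + 1;       /* u = 2x + 1 */
    unsigned long long v = 1;               /* v = 2y + 1 */
    long long r = (long long)(s * s - N);   /* r = x^2 - y^2 - N */

    while (r != 0) {
        while (r > 0) {        /* r too large: keep increasing y */
            r -= (long long)v;
            v += 2;
        }
        if (r < 0) {           /* r too small: increase x */
            r += (long long)u;
            u += 2;
        }
    }
    *a = (u + v - 2) / 2;      /* a = x + y */
    *b = (u - v) / 2;          /* b = x - y */
}
```

With N = 1783647329, the worked example from this chapter, this yields the factors 84449 and 21121.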
This complexity is of the order O(c·N^(1/2)). However, the value of k can be very
small, thus making this algorithm impractical. For instance, let us consider
an "ordinary" case where a ≈ N^(1/3) and b ≈ N^(2/3). In such a case, the number of
cycles necessary will be
(√N − N^(1/3))^2 / (2·N^(1/3)) = (N^(1/3))^2 (N^(1/6) − 1)^2 / (2·N^(1/3)) ≈ (1/2)·N^(2/3),
which is considerably higher than O(N^(1/2)). Therefore this algorithm is only
practical when the factors a and b are almost equal to each other.
In figure 7.2 you can see the test results of my implementation of Fermat's
algorithm, from a sample of 2075 factorisations. Again, the graph shows the
relation of the size of the factor found versus the amount of time it took. As
expected, this algorithm becomes too slow for factors with 7 or more digits.
The graph follows the same trend as the trial divisions algorithm. In practice,
however, we will prefer the trial divisions algorithm.
7.5 Remarks
This algorithm has a very nice feature: it does not involve multiplication. We
have defined the variables r, u, v in such a way that we only need to perform
addition and subtraction. This is why this algorithm is sometimes called
factorising by addition and subtraction. However, the number of additions and
subtractions that we have to perform is quite large. For example, in order to
factorise
1783647329 = 84449 × 21121
we need to increase x 10551 times, and y 31664 times.
Additionally, this algorithm suffers from the same problem as trial divisions:
it will prove primality in the worst case. If this algorithm is given a prime
number p, then the results will eventually be 1 and p. This is, by the way,
even worse than proving primality with trial divisions: the total number of
cycles required for proving primality is n − √n, which is much worse than trial
divisions.
CHAPTER 8
The Pollard ρ method
This method is also called Pollard's second factoring method, or the Monte Carlo
method because of its pseudo-random nature. It is based on a "statistical" idea
[7] and has been refined by Richard Brent [1]. The ideas involved in finding
the factors of a number N are described below.
2. Search for the period of repetition, i.e. find i and j such that x_i ≡ x_j (mod p).
3. Calculate the factor p of N .
This means that the sequence {x_i} is periodically repeated, except perhaps
for a part at the beginning, which is called the aperiodic part. This part can
be thought of as the "tail" of the Greek letter ρ. Once we get off the tail, we
keep cycling around the same sequence of values. That is why this algorithm is
known as the Pollard ρ algorithm.
Back to our problem, instead of random integers {xi }, it would be sufficient
to recursively compute a sequence of pseudo-random integers. The simplest way
to do this would be to define a linear formula such as
for a fixed a and x0 . Unfortunately, it turns out that this does not produce
sufficiently random values to give a short period of recurring values. This means
that we would have to compute a lot of values before we can identify the period
of recurrence.
The next simplest choice is to use a quadratic formula such as
x_{i+1} := x_i^2 + a (mod N)
for a fixed a and x_0. It has been empirically observed that the above expression
does produce sufficiently random values¹. Pollard found that in such a sequence
{x_i} of integers mod N, an element usually recurs after only about C·√N
steps.
In this way the period is discovered after fewer arithmetic operations than
demanded by the original algorithm of Pollard. The saving in Brent's
modification is due to the fact that the lower x_i's are not computed twice, as
they are in Floyd's algorithm.
¹ Note that this is not true if a is either 0 or −2.
² The proof of Floyd's cycle-finding algorithm is omitted.
• max: This variable sets the limit of the maximum test factor to be used.
• factors: An array of MAPM variables, in which the factors of n will be
written.
In figure 8.1 you can see the pseudocode of my algorithm. Note that the
constant a, which is used to generate the pseudorandom sequence, is hardcoded
in the function. It is quite an easy task to change its value, in order to get a
different sequence of numbers.
INPUT N , c, max
x1 := 2
x2 := x1^2 + c                      /* Our chosen function */
range := 1
product := 1
terms := 0
WHILE terms < max DO
    FOR j := 1 to range DO
        x2 := (x2^2 + c) MOD N      /* Our chosen function */
        product := product × (x1 - x2) MOD N
        terms := terms + 1
        IF (terms MOD 20 == 0) THEN
            g := gcd(product, N )
            IF g > 1 THEN
                PRINT g
                N := N / g
                product := 1
    next values(x1, x2, range)      /* Brent’s improvement */
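For illustration, here is a minimal C sketch of the same idea using Floyd's cycle finding with a gcd at every step, rather than the Brent variant with batched gcds shown in figure 8.1. The helper names are mine, and unsigned long long (safe only for N below about 2^32 here) stands in for the MAPM values the real program uses.

```c
#include <assert.h>

static unsigned long long gcd_ull(unsigned long long a, unsigned long long b)
{
    while (b != 0) {
        unsigned long long t = b;
        b = a % b;
        a = t;
    }
    return a;
}

/* Pollard rho with x -> x^2 + c (mod N): the "tortoise" x takes one
   step per iteration, the "hare" y takes two; a factor shows up as
   gcd(|x - y|, N) > 1. Returns 1 on failure (bad luck, or N prime). */
unsigned long long pollard_rho(unsigned long long N, unsigned long long c,
                               unsigned long long max)
{
    unsigned long long x = 2, y = 2, g = 1, terms = 0;

    while (g == 1 && terms < max) {
        x = (x * x + c) % N;
        y = (y * y + c) % N;
        y = (y * y + c) % N;
        g = gcd_ull(x > y ? x - y : y - x, N);
        terms++;
    }
    return (g == N) ? 1 : g;
}
```

For example, N = 8051 with c = 1 yields the factor 97 after three iterations.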
8.4 Remarks
The method that I have just described for finding prime factors of composite
integers is probabilistic. This means that we have to be prepared to be unlucky
on occasion, and not get any results. If we run the Pollard ρ algorithm and do
not find any prime divisors, that might be because there are no prime divisors
in the appropriate interval, or it might be because of bad luck.
What we need to do in such situations is to change our luck. For this
algorithm, this means changing certain constants, such as those of the recursive
function described in section 8.1.1. Then, of course, we have to know when it is
time to give up, and perhaps try another algorithm. In practice, after running
trial divisions up to 10^6 or 10^7, one would run the Pollard ρ algorithm for a
while. Keep in mind, however, that if all the prime factors are larger than
roughly 10^12, then this algorithm will not usually work.
CHAPTER 9
The Pollard p − 1 method
The next algorithm that I will consider is known as the Pollard p − 1 algorithm
[6]. It formalises several rules, which have been known for some time. The
principle here is to use information concerning the order of some element a of
the group MN of primitive residue classes mod N to deduce properties of the
factors of N.
a^Q ≡ 1 (mod p),
INPUT N , c, max
m := c
FOR i := 1 to max DO
    m := modexpo(m, i, N )
    IF (i MOD 10 == 0) THEN
        g := gcd(m - 1, N )
        IF g > 1 THEN
            PRINT g
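The pseudocode above can be sketched in C like this. modexpo and gcd are repeated so the example is self-contained, and the test number used below is my own illustration: 1223819 = 1201 × 1019, where 1201 − 1 = 1200 is a product of small primes but 1019 − 1 = 2 × 509 is not.

```c
#include <assert.h>

static unsigned long long gcd_ull(unsigned long long a, unsigned long long b)
{
    while (b != 0) {
        unsigned long long t = b;
        b = a % b;
        a = t;
    }
    return a;
}

static unsigned long long modexpo(unsigned long long a, unsigned long long b,
                                  unsigned long long m)
{
    unsigned long long n = 1;
    a %= m;
    while (b != 0) {
        if (b % 2 == 1)
            n = (n * a) % m;
        b /= 2;
        a = (a * a) % m;
    }
    return n;
}

/* Pollard p-1: after step i, m holds c^(i!) mod N. If some prime p of N
   has p-1 dividing i!, then m == 1 (mod p) and gcd(m-1, N) exposes p.
   The gcd is checked every 10 steps, as in the pseudocode above; a gcd
   equal to N itself is skipped. Returns 1 if no factor is found. */
unsigned long long pollard_p1(unsigned long long N, unsigned long long c,
                              unsigned long long max)
{
    unsigned long long m = c, i, g;

    for (i = 1; i <= max; i++) {
        m = modexpo(m, i, N);
        if (i % 10 == 0) {
            g = gcd_ull(m - 1, N);
            if (g > 1 && g < N)
                return g;
        }
    }
    return 1;
}
```

Because 1200 divides 10!, the smooth prime 1201 is exposed at the very first gcd check.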
9.5 Remarks
This algorithm has the same problems as the previous one. As described earlier,
at some point we might find the GCD to be equal to N . In such cases we will
want to try to change the base a to a different integer. Also, the algorithm
might not terminate if p − 1 has only large prime factors.
It has been statistically found that the largest prime factor of an arbitrary
integer N usually falls around N^0.63. Therefore, with a limit of 10000, Pollard
p − 1 will find prime factors that are less than two million. We should keep in
mind however that there is a fairly wide distribution of the largest prime factor
of N , and therefore factors much larger than two million may be found.
According to [2], the largest factor found by this algorithm during the
Cunningham project is a 32-digit factor
49858990580788843054012690078841
of 2^977 − 1.
I should also note that, because of Pollard p − 1, the RSA public key
cryptosystem has restrictions on the primes a and b that are chosen. Essentially,
if a − 1 or b − 1 has only small prime factors, then Pollard p − 1 will break the
encryption very quickly.
CHAPTER 10
Elliptic Curves Method
Therefore, given the first coordinate of (x, y)#i, we can compute the first
coordinate of (x, y)#2i using the above formula. We can extend this to 2i + 1
with the following formula:
As you can see, such computations involve lots of fractions. We can avoid
using rational numbers if we introduce the notion of a triplet (X, Y, Z), where
x = X/Z, y = Y/Z,
and where X,Y , and Z are integers. Another nice feature of this notation is
that the identity element ∞ now has the explicit representation (0, Y, 0), where
Y can be any integer.
If we define (Xi , Yi , Zi ) = (X, Y, Z)#i, we can adjust our previous formulas
to our new notation:
I should note that for our purposes, we do not need to calculate the second
coordinate Y of the triplets. Still, Yi can always be recovered from Xi and Zi .
Also, we can use our triplets modulo n, as long as we do all our computations
modulo n.
b ≡ y^2 − x^3 − ax (mod N ).
We convert to triplets (X, Y, Z), with our initial triplet being (x, y, 1).
If p is a prime number which divides N , and | E(a, b)/p | divides k!, then
(X, Y, Z)#k! will be the identity element in E(a, b)/p (but not in E(a, b)).
This simply means that there is at least one coordinate of (X, Y, Z)#k! which
is not divisible by N , but all the coordinates are divisible by p.
Since Zk! is divisible by p, there is a good chance that the greatest common
divisor of Zk! and N is a non-trivial divisor of N .
INPUT N , X, Y, a, max
b := (Y^2 - X^3 - aX) MOD N
g := gcd(4a^3 + 27b^2, N )
IF g > 1 THEN
    PRINT g
Z := 1
k := 2
WHILE k <= max DO
    FOR i := 1 to 10 DO
        NEXTVALUES(X, Z, k, N , a, b)
        k := k + 1
    g := gcd(Z, N )
    IF g > 1 THEN
        PRINT g
In figure 10.3 you can see the results of 6398 factorisations which I performed
using this algorithm.
As in the Pollard p − 1 algorithm, we can speed up this algorithm by
restricting k to a set of powers of primes less than max, rather than running
over all integers less than max. Also, we can expect better results if we
regularly interrupt the run and restart with a new set of parameters, rather
than persisting with our initial choice of parameters.
10.4 Remarks
The Elliptic Curve Method has the characteristic of being practical from the
point where trial division becomes impossible until well into the range where
MPQS and NFS can be implemented.
The largest factor that has been found by ECM is the 53-digit factor of
2^677 − 1, according to [2]. Note that if the RSA system were implemented with
512-bit keys and the three-factor variation, the smallest prime would be less
than 53 digits, so Elliptic Curves could be used to break the system.
CHAPTER 11
Overall Comparison
The various factorisation methods I have described are all useful in different
situations. When factoring a large number, the method to be chosen must
depend on knowledge about the factors of the number. To begin with, you
must make sure that the number is composite, so that you do not make a long
computer run which will result in nothing. It would be really frustrating to
discover after a very long run that N has the prime factorisation N = 97 · p,
which could have been obtained almost immediately by using trial division.
Further, you could use Fermat's method in case N is the product of two
almost equal factors, or Pollard's p − 1 method in the event of N having one
factor p with p − 1 being a product of only small primes. There are lots of
methods for finding middle-sized factors, all of which are good in specific
situations. But the question remains: how long do you keep looking for these
middle-sized factors before pulling out something like the Quadratic Sieve or
NFS? A well-balanced strategy, developed by Naur [5], may be summarised as
follows:
1. Make sure N is composite. Since very small divisors are quite common
and are found very quickly by trial division, it is worthwhile attempting
trial division up to 100 or 1000 even before applying a strong pseudoprime
test.
3. At this point you need to take a long shot, and with a little luck, shorten
the running time enormously – it could even be decisive for quick success
or complete failure in the case when N is very large. The strategy to be
employed is: Take the methods you have implemented on your computer
covering various situations, which will mean one or more of the following:
Pollard's p − 1 and p + 1 methods, or Fermat's, Shanks', or even Williams'
methods. The methods should be capable of being suspended and resumed
from where they stopped.
Since you cannot possibly know in advance which of these methods will
achieve a factorisation (if a factorisation will be found at all), it is a good
technique at this stage to run the program of each method in sequence
for a predetermined number of steps, say 1000 or 10000, breaking the runs
off at re-start points in order to be able to proceed, if necessary. If N does
not factorise during such a run, you have to repeat the whole process from
the re-start point of the previous run. Also, you might want to consider the
possibility of changing your choice of constants.
4. If the number N has still not been factored, you will need to rely upon
the "big algorithms". Depending on the size of the number and on the
capacity of your computer, this can be the Multiple Polynomial Quadratic
Sieve (MPQS), the Number Field Sieve (NFS), or even the Elliptic Curves
Method. Now you have to sit down and wait; fairly good estimates of the
maximal running times are available for all these methods, so that you
will know approximately how long the computer run could take.
Choosing which methods to use, and when, is still more an art than a science.
You should keep in mind that the “big” algorithms are much more cumbersome
and it is worth spending at least a few minutes trying to vary your luck first.
Theoretically and experimentally, it has been shown that you have a better
chance of finding your mid-sized factors if you run several algorithms with
several choices of parameters, rather than spending the same amount of time on
a single algorithm with a single set of parameters.
CHAPTER 12
Epilogue
When I started my work on this project, I had very little knowledge of this field
of study. Factorisation was something that I had never encountered before, at
least not in great detail. I believe that this was to my advantage, since I was
able to write my report from an introductory point of view, paying attention to
the points which were hard for me to understand.
During the course of my project, I had to make lots of choices regarding
the material which I would study. For instance, I chose not to implement one
of the “big guns” of factorisation, such as MPQS or NFS. I believe that my
choices allowed me to focus on the quality of what I did, instead of doing many
things, but without enough care. This way, I was able to firmly understand the
concepts of these elementary algorithms, and thus obtain a good background in
the subject.
Of course, my project included lots of programming. Although I was already
familiar with the programming language I used (ANSI C), I was able to further
develop my programming skills. My final program consisted of roughly 1500
lines of code, so it was not small. Furthermore, I developed a sense of
responsibility as far as organisational procedures are concerned, such as
keeping a logbook and doing tests.
I am really happy to have managed to keep a balance between the theoretical
and practical issues in doing my project. Although my report tends more
towards theory, I did quite a lot of work on the actual software.
Thus I have been able to produce a complete tutorial of factorisation, including
the theoretical description and background, the pseudocode description, and the
actual implementation. I hope that this project will be helpful to those who get
their hands on it.
Part III
Appendices
APPENDIX A
Benchmarks
In this appendix I have included some sample test results for all the algorithms
I implemented. These tests used certain numbers which I chose deliberately,
rather than arbitrary numbers.
The results showed that Pollard’s ρ algorithm was the fastest. The Elliptic
Curves Method was fairly quick for large factors. For smaller factors, Trial
Divisions was quicker. Note that it made no sense to run this test on Fermat’s
method, since this method is used with numbers which have two factors.
In this set, Pollard’s ρ algorithm was again the fastest overall. We see,
however, that for very large factors, ECM showed its capabilities by being the
fastest. Also, I should say that Pollard’s p − 1 algorithm simply gave up for
number 4, so I do not have a timing for that. We also see the Trial Divisions
timings growing quite rapidly.
APPENDIX B
Program output
Fermat's algorithm
digits   N              time      factors
12       103447054117   192.950   29666491, 3487, OK!
12       107658803491    25.540   4206901, 25591, OK!
C:\>factor 298347004781928719247912
Trying to factorise 298347004781928719247912
Trying trial divisions...
Remainding portion is 53923156875619
C:\>factor 45346346353453643534522543411
Trying to factorise 45346346353453643534522543411
Trying trial divisions...
Remainding portion is 121571974137945425025529607
Trying Pollard rho-1...
373,127819,52721,18040742769701893,1, OK!
C:\>factor 765674960895860548647659458604856094859061115
Trying to factorise 765674960895860548647659458604856094859061115
Trying trial divisions...
Remainding portion is 1168969405947878700225434287946345182990933
Trying Pollard rho-1...
Remainding portion is 1168969405947878700225434287946345182990933
Trying Pollard rho...
5,131,35574947,32859343569728401850477367905744039,1, OK!
C:\>factor 849357309574398572983749827349822289473
Trying to factorise 849357309574398572983749827349822289473
Trying trial divisions...
Remainding portion is 28502879612550708848744918532495127
Trying Pollard rho-1...
Remainding portion is 28502879612550708848744918532495127
Trying Pollard rho...
Remainding portion is 1540239701527958435278069669
Trying Elliptic curves...
3,3,7,11,43,18505483, INCOMPLETE!!!
C:\>factor 32499823472313423412312414243511
Trying to factorise 32499823472313423412312414243511
Trying trial divisions...
Remainding portion is 119925547868315215543588244441
Trying Pollard rho-1...
Remainding portion is 119925547868315215543588244441
Trying Pollard rho...
Remainding portion is 119925547868315215543588244441
Trying Elliptic curves...
271, INCOMPLETE!!!
C:\>factor 23423423523253423423423423524199392991
Trying to factorise 23423423523253423423423423524199392991
Trying trial divisions...
Remainding portion is 23423423523253423423423423524199392991
Trying Pollard rho-1...
where 1041···0001 has 130 digits, and 2304···1067 is a 113-digit probable
prime. This factorisation took 35 seconds on a Pentium Celeron at 433 MHz.
BIBLIOGRAPHY
[2] Richard P. Brent. Some parallel algorithms for integer factorisation.
Technical report, 1999.
[5] Thorkil Naur. Integer factorisation. DAIMI report, 1982.
[7] J. M. Pollard. A Monte Carlo method for factorisation. Nordisk Tidskrift
for Informationsbehandling (BIT), 15:331–334, 1975.
[8] Hans Riesel. Prime Numbers and Computer Methods for Factorization.
Birkhäuser, 1985.