
Integer Factorisation

Vassilis Kostakos

Department of Mathematical Sciences


University of Bath

vkostakos@yahoo.com
http://www.geocities.com/vkostakos

May 7, 2001
MATH0082 Double Unit Project

Comparison of Integer Factorisation Algorithms


Candidate: Kostakos, V
Supervisor: Russell Bradford
Checker:
Review date: December 2000
Final submission date: 10 May 2001
Equipment required:

Implement and compare several integer factorisation algorithms.

Algorithm descriptions    15    α
Implementation            15    α
Comparison tests          30    2α
Report and analysis       40    2α

Total                    100    6α

Note: All the software files which are referred to by this report may be
found on the BUCS filesystem at: ~ma9vk\public_html\project\
Abstract

The problem of integer factorisation has been around for a very long time. This
report describes a number of algorithms and methods for performing factorisation.
In particular, the trial divisions and Fermat algorithms are discussed.
Furthermore, Pollard's ρ and p − 1 methods are described, and finally Lenstra's
elliptic curves method. The theory behind each algorithm is explained, so that
the reader can become familiar with the process. Then, sample pseudocode
is presented, along with the expected running time for each algorithm. Finally,
this report includes test data for each algorithm.
CONTENTS

1 Introduction 1

I Documentation 3
2 Project Plan 4
2.1 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Coding standards . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3 Requirements 7
3.1 User Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . 7
3.3 Non-functional Requirements . . . . . . . . . . . . . . . . . . . . 8
3.4 Software and Hardware Requirements . . . . . . . . . . . . . . . 9

4 Testing 10
4.1 Correctness tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2 Performance tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

II Implementation 12
5 Tools for factorisation 13
5.1 Greatest common divisor . . . . . . . . . . . . . . . . . . . . . . 13
5.2 Fast exponentiation modulo . . . . . . . . . . . . . . . . . . . . . 13
5.3 Primality testing . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

6 Trial divisions algorithm 15


6.1 Description of trial divisions algorithm . . . . . . . . . . . . . . . 15
6.2 Implementation of trial divisions algorithm . . . . . . . . . . . . 15
6.3 Running time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
6.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17


7 Fermat’s algorithm 18
7.1 Quick description of Fermat’s algorithm . . . . . . . . . . . . . . 18
7.2 Detailed description of Fermat’s algorithm . . . . . . . . . . . . . 18
7.3 Implementation of Fermat’s algorithm . . . . . . . . . . . . . . . 19
7.4 Running time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
7.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

8 The Pollard ρ method 22


8.1 Description of the algorithm . . . . . . . . . . . . . . . . . . . . . 22
8.1.1 Constructing the sequence . . . . . . . . . . . . . . . . . . 22
8.1.2 Finding the period . . . . . . . . . . . . . . . . . . . . . . 23
8.1.3 Calculating the factor . . . . . . . . . . . . . . . . . . . . 24
8.2 Implementation of Pollard ρ . . . . . . . . . . . . . . . . . . . . . 24
8.3 Running time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
8.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

9 The Pollard p − 1 method 27


9.1 Description of the algorithm . . . . . . . . . . . . . . . . . . . . . 27
9.2 A slight improvement . . . . . . . . . . . . . . . . . . . . . . . . 28
9.3 Implementation of Pollard p − 1 . . . . . . . . . . . . . . . . . . . 28
9.4 Running time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
9.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

10 Elliptic Curves Method 31


10.1 Introduction to elliptic curves . . . . . . . . . . . . . . . . . . . . 31
10.1.1 Elliptic curves as a group . . . . . . . . . . . . . . . . . . 31
10.1.2 Elliptic curves modulo n . . . . . . . . . . . . . . . . . . . 32
10.1.3 Computation on elliptic curves . . . . . . . . . . . . . . . 32
10.1.4 Factorisation using elliptic curves . . . . . . . . . . . . . . 33
10.2 Implementation of elliptic curves method . . . . . . . . . . . . . 34
10.3 Running time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
10.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

11 Overall Comparison 37

12 Epilogue 39

III Appendices 40
A Benchmarks 41
A.1 Tests with products of two nearby primes . . . . . . . . . . . . . 41
A.2 Tests with products of three nearby primes . . . . . . . . . . . . 42
A.3 Tests with products of three arbitrary primes . . . . . . . . . . . 42

B Program output 43
B.1 Tests output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
B.2 Combined factorisation output . . . . . . . . . . . . . . . . . . . 45
B.3 Biggest factorisation . . . . . . . . . . . . . . . . . . . . . . . . . 47

Bibliography 48
LIST OF TABLES

2.1 My schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

A.1 Products of two nearby primes . . . . . . . . . . . . . . . . . . . 41


A.2 Products of three nearby primes . . . . . . . . . . . . . . . . . . 42
A.3 Products of three arbitrary primes . . . . . . . . . . . . . . . . . 42

LIST OF FIGURES

5.1 Pseudocode for computing gcd(a, b) using the Euclidean algorithm . . . 13
5.2 Pseudocode for fast computation of a^b mod m . . . . . . . . . . . . . 14

6.1 Pseudocode for trial divisions algorithm . . . . . . . . . . . . . . 16


6.2 Results of tests on Trial divisions algorithm . . . . . . . . . . . . 16

7.1 Pseudocode for Fermat’s algorithm . . . . . . . . . . . . . . . . . 20


7.2 Results of tests on Fermat’s algorithm . . . . . . . . . . . . . . . 21

8.1 Pseudocode for the Pollard ρ algorithm . . . . . . . . . . . . . . 25


8.2 Results of tests on the Pollard ρ algorithm . . . . . . . . . . . . 26

9.1 Pseudocode for the Pollard p − 1 algorithm . . . . . . . . . . . . . 28


9.2 Results of tests on Pollard p − 1 algorithm . . . . . . . . . . . . 29

10.1 Pseudocode for main loop of Elliptic curves method . . . . . . . 34


10.2 Pseudocode for NEXTVALUES function of Elliptic curves method . 35
10.3 Results of tests on Elliptic curves algorithm . . . . . . . . . . . 36

CHAPTER 1
Introduction

This report, along with the software which I wrote, constitutes my final year
project. The main objective of this report is to strike a balance between a
theoretical explanation of certain factorisation algorithms and a description of
my source code.

Background information
The problem of factorisation has been known for thousands of years. However,
only recently did it become “popular”. This sudden interest in factorisation was
due to the advances in cryptography, and mainly the RSA public key cryptosys-
tem.
The problem of factorisation may be stated as follows: “Given a composite
integer N , find a nontrivial factor f of N .”
There are a lot of factorisation algorithms out there. Some of them are heav-
ily used, others just serve educational purposes. The factorisation algorithms
may be distinguished in two different ways:
• Deterministic or nondeterministic
• Run time depends on size of N or f .
Deterministic algorithms are guaranteed to find a solution if we let them run
long enough. On the contrary, nondeterministic algorithms may never terminate.
The most usual distinction, however, deals with the running time of the
algorithm. The running time of recent algorithms depends on the size of the
input number N, whereas older algorithms depended on the size of the factor f
which they find.

About my project
In doing my project, I tried to cover a broad range of algorithms and methods.
The running time of all the algorithms I have implemented depends on the size


of the factor f which they find. Furthermore, only the first two algorithms
which I describe are deterministic.

About this report


This report is divided into 3 parts. The first part deals with my preparation
and scheduling for doing the project. Matters like requirements, resources, etc.
are all covered in the first part.
The second part of this report presents an account of all the algorithms
I implemented. For each algorithm, I have tried to describe the theoretical
background in order to make the reader understand what is going on. Then, I
describe my implementation of the algorithm, along with some sort of pseudocode
for illustration purposes. Finally, I present my test results in the form
of a graph. (In Appendices A and B I have included a set of tests on all of the
algorithms.)
The third part consists of the appendices, in which I have included sample
timings of the algorithms, as well as output of my program.
Part I

Documentation

CHAPTER 2
Project Plan

2.1 Resources
I started planning for this project by writing down what resources I thought I
was going to need in order to successfully complete the project.
In terms of hardware, all I needed was a computer, which I already owned.
Furthermore, I could use the computing facilities of the University as well. In
terms of software, I decided that I wanted to write the program using C. There
are lots of different environments for creating C programs. However, I used
LCC-WIN32 version 3.3 for Windows, which includes an ANSI C compiler.
My main concern was finding a suitable arbitrary-precision library, which
I could use with my program. In the end, I decided to use Mike’s Arbitrary
Precision Math Library (MAPM) version 3.70, written by Michael C. Ring
(ringx004@tc.umn.edu).
Furthermore, I thought that I would also need some kind of books or papers,
which would help me. In addition to the resources listed in the bibliography
section, I also made use of the following programming books:

• Walter A. Burkhard, “C for programmers”, 1988 Wadsworth, Inc.

• Morton H. Lewin, “Elements of C”, Piscataway, New Jersey.


• M.I. Bolsky, “The C Programmer’s Handbook”, AT&T Bell Laboratories,
Prentice Hall, Inc.
• Leslie Lamport, “LaTeX user’s guide and reference manual”, 1994 Addison-
Wesley Publishing Company.

2.2 Scheduling
The next part in planning my project was to devise a schedule, which would
roughly guide my work. In table 2.1 you can see my schedule, or to
be precise, the final version of my schedule.


Schedule
Weeks Tasks
1 (Semester 1)
2 (Semester 1)
3 (Semester 1) Signed up for LEGO maze-solving robot
4 (Semester 1) Preliminary research on Robot movement, etc.
5 (Semester 1)
6 (Semester 1) Wrote first version of software for robot.
7 (Semester 1) NEW PROJECT: Integer factorisation
8 (Semester 1) Looking for a maths library
9 (Semester 1) Found the MAPM library, performance tests
10 (Semester 1) Implement trial divisions algorithm
11 (Semester 1) Wrote low-level functions for MAPM
12 (Semester 1) Implemented Fermat’s algorithm
(Christmas) Revise for exams
(Christmas) Revise for exams
(Christmas) Revise for exams
13 (Exams) Exams
14 (Exams) Exams
15 (Exams) Exams
1 (Semester 2) Research into Pollard’s algorithms
2 (Semester 2) Implement MODEXPO, GCD, PRIME functions
3 (Semester 2) Pollard’s ρ algorithm
4 (Semester 2) Tests on all algorithms so far implemented
5 (Semester 2) Pollard p − 1. Read about Elliptic curves
6 (Semester 2) Elliptic curves algorithm and testing
7 (Semester 2) Function interface modifications, more tests
8 (Semester 2) Developed COMBINED function. Started report
(Easter) Test result analysis, graph generation
(Easter) Report writing
(Easter) Report writing
9 (Semester 2) Report writing
10 (Semester 2) Report revision, final version preparation
11 (Semester 2) DEADLINE

Table 2.1: My schedule

I tried to follow my schedule as closely as possible. Sometimes, I made changes
to it, in order to accommodate any new tasks I thought were required. The final
version of my schedule resembles my initial schedule quite closely; however, I
have made a number of changes.

2.3 Coding standards


It is always a good idea to specify some coding standards before starting a
project, even if only one person is going to do any coding.
First of all, I should say that all the source files were compiled using the
-ansi flag. I received no warning messages when compiling the final version of
my program.
Here are some guidelines which I followed:

• Function names beginning with m belong to the MAPM library. Specifically,
the functions that begin with m_apm are functions which are defined
in the library itself. Any other functions beginning with m are macros of
functions in the MAPM library, which I defined in order to shorten the
code.

• Function names beginning with M are low-level functions which interface
with the MAPM library. I wrote these functions in order to improve the
performance of the program, and to shorten the code as well.

• The prototypes for functions in file xxx.c are placed in the file xxx.h.

It was obvious that the software program I was creating was quite modular,
and could be built in “big chunks” at a time. Therefore, I decided to use
a common algorithm testing interface. This meant that I would place each
algorithm in a separate file, and use a common file to call the factorisation
routines. This would also make it possible to call all of my algorithms from
another function in an effort to factorise a really hard number.
By doing the above, I was planning to minimise the effort of adding a new
algorithm to my program, and make the testing of different algorithms quicker
and easier.
CHAPTER 3
Requirements

This chapter describes all the requirements and specifications that I used for
implementing this project. Of course, these requirements were in no case static.
In fact, they would change quite often, as I moved further into the project. A
change in the requirements would often reflect upon a new idea that I came up
with, or an idea that I wanted to drop. Therefore, these are the requirements
at the end of the project.

3.1 User Definition


The first thing that I had to specify was my target audience. It helps a lot
to know who you want to look at your work. I guess it would be too naive to
assume that my audience consisted of the two examiners that would assess my
project. On the other hand, I wouldn’t like to embark on a commercial software
project, which would target a large piece of the market.
With the above in mind, I chose my audience to be the academic community.
Such an audience is not really keen on software with bells and whistles,
but is more interested in the theoretical background. In fact, I believe that
my project could be used for educational purposes, because it demonstrates a
simple implementation of some fundamental mathematical concepts.
Of course, when I refer to my project, I refer to both the software as well as
the final report. Therefore, my choice of the academic community as an audience
should have an effect on both the software and the final report.

3.2 Functional Requirements


I believe that it was clear that my software should accept as input an integer
N , and produce as output a factorisation p1 p2 · · · pn of N . But there is more to
it than just that.
A very important requirement was that the software should be able to per-
form arbitrary-precision arithmetic. In other words, it should be able to deal
with really long numbers, and perform calculations on them.


Also, the software should output the computational time that was required
to complete the factorisation, and also verify that the results it gives are correct.
This should also be done while running long tests, and in which case the results
should be somehow stored on a disk file.

3.3 Non-functional Requirements


The most important element of the non-functional requirements deals with the
algorithms that the program will implement. Therefore, I decided to implement
the following algorithms.
• Trial division algorithm
• Fermat’s algorithm
• Pollard’s ρ method
• Pollard’s p − 1 method
• Elliptic curves method
The reason I chose not to implement one of the "big" algorithms, namely
MPQS or NFS, is that I did not have enough time. By applying a variety of
smaller algorithms, I got the flavour of the different methods and theories on
which the very advanced algorithms are based.
In terms of the user interface, I believe that a GUI was not something really
required. Therefore, I chose to implement a command line interface, with simple
input and output.
The source code of the program was divided into the following files:

MAIN.C This is the main file of the program. Nothing special here.
MAIN.H Main header file. Contains definition of output destination for pa-
rameterised compilation.
AL_TRIAL.C This file contains the source code for the trial divisions algorithm.
AL_FERMT.C This file contains the source code for Fermat's method.
AL_PRHO1.C This file contains the source code for Pollard's p − 1 method.
AL_PRHO.C This file contains the source code for Pollard's ρ method.
AL_ELLCRVS.C This file contains the source code for the elliptic curves
method.
TESTS.C Here are defined some tests for measuring the speed of each algo-
rithm.
TESTS.H This file contains parameters for the testing routine.
MYLIB.C In this file I have included some of my “tool” functions, as well
as some low-level functions for the arbitrary-precision arithmetic library I
used.

MYLIB.H This file includes function prototypes as well as macro definitions.

COMBINED.C This file contains a function which utilises all the factoring
algorithms. It tries to factor a given number by applying the different
algorithms until the number has been completely factorised, or until it
gives up.

3.4 Software and Hardware Requirements


I developed the software on an MS-Windows 98 machine, with an Intel Celeron
433MHz processor. However, the software is capable of running on any machine
which fulfills the minimum MS-Windows 95 requirements. Also, the source code
may be compiled under a different operating system (Unix, Linux, etc.) in order
to produce compatible versions of the program.
CHAPTER 4
Testing

The tests I performed for my project come in two flavours. First, I had to
test my algorithms to see if they ran as expected, i.e. try to find "bugs" in
the program. However, I also ran lots of performance tests, i.e. I performed
lots of factorisations in order to get a feel for the performance of each algorithm.

4.1 Correctness tests


Most of my testing for correctness was performed “in place” with the program.
Essentially, I had to make sure that my algorithms did indeed perform a fac-
torisation. This is quite easy to check within the main flow of the program, so
I felt that there was no need for separate testing modules.
By just adding a couple of lines of code, I was able to test the correctness of
my results every time I performed a factorisation. This way, I was constantly
checking for errors, even when I was running the performance tests.
I should note at this point that all my checking was performed (inevitably)
using the facilities of the MAPM library. I guess that if the MAPM library
contained any sort of errors, my checks, and in fact my whole program, would
be erroneous.

4.2 Performance tests


I had to perform two separate kinds of performance tests. First of all, I ran
tests on the library MAPM, to get a feel for its capabilities. These tests were
supposed to give me an approximation of how fast this library was, and how to
judge my algorithms according to the library’s capabilities.
The second, and most important, kind of performance test was to benchmark
the algorithms I implemented. These tests I usually performed after
I felt that an algorithm was fully implemented. The results of these tests are
included in the last section of each algorithm's chapter. I have tried to evaluate
these tests, to the best of my abilities, and perhaps draw some conclusions.


In Appendix A I have tried to perform a mini "benchmarking" scheme,
where all the algorithms were given the same numbers, and their performance
was timed and entered into a table. Although I did not run too many of these
tests, I felt that the results were quite within what I expected.
Finally, in Appendix B I have included some sample printouts of the perfor-
mance tests for each algorithm, as well as sample output of my final program,
which utilises all the algorithms in order to factorise an input number.
Part II

Implementation

CHAPTER 5
Tools for factorisation

Before proceeding with the actual algorithms and their description, it would
be useful to describe some “tool” algorithms which are used throughout the
factorisation algorithms.

5.1 Greatest common divisor


This algorithm is by far the most used algorithm in my program. It is used
by all the factorisation methods I have implemented. A very efficient routine
for finding the greatest common divisor of two numbers a and b would greatly
enhance the performance of the factorisation algorithms.
In figure 5.1 I have included pseudocode for finding the gcd(a, b) using the
Euclidean method.

WHILE b ≠ 0 DO
    temp := b
    b := a MOD b
    a := temp
RETURN a

Figure 5.1: Pseudocode for computing gcd(a, b) using the Euclidean algorithm

5.2 Fast exponentiation modulo


The idea behind fast exponentiation is that if the exponent is a power of 2, then
we can exponentiate by successively squaring:

x^8 = ((x^2)^2)^2


n := 1
WHILE b ≠ 0
    IF b is odd THEN
        n := n × a MOD m
    b := ⌊b/2⌋
    a := a × a MOD m

Figure 5.2: Pseudocode for fast computation of a^b mod m

x^256 = (((((((x^2)^2)^2)^2)^2)^2)^2)^2.
If the exponent is not a power of 2, then we use its binary representation, which
is just a sum of powers of 2:

x^291 = x^256 × x^32 × x^2 × x^1.

The pseudocode shown in figure 5.2 quickly computes a^b mod m. The
way it works is that it finds the binary representation of b, while at the same
time computing successive squares of a. The variable n records the product of the
selected powers of a, and contains the final result at the end of the computation.

5.3 Primality testing


According to Fermat's little theorem, if n is odd and composite and n satisfies

2^(n−1) ≡ 1 (mod n)

then we say that n is pseudoprime. Therefore, for any number n, we can just
compute the value 2^(n−1) (mod n) using the algorithm of figure 5.2, and then
simply check whether the return value is 1 or not.
Despite the fact that this test is not a 100% guarantee of primality, in practice
it is very useful. The test can be made stronger by computing the same values
for the bases 2, 3, 5, and 7, and then checking whether all of them yield the result 1.
CHAPTER 6
Trial divisions algorithm

The most straightforward algorithm for factorising an integer uses trial divisions.
This algorithm is a good place to start, and it is quite easy to understand.

6.1 Description of trial divisions algorithm


This algorithm essentially tries to factorise an integer N using "brute force".
Starting at p = 2, the algorithm tries to divide N by every number until it
succeeds. When this happens, it sets N ← N/p, and resumes its operation.
The way in which we choose our p can speed up, or slow down, the algorithm.
For instance, we could pick our p's sequentially, by adding 1 at every iteration.
Even better, we could divide N by 2 and 3, and then keep adding 2 to p in
order to generate a sequence of odd numbers. The fastest way, but with greater
memory requirements, is to generate a list of all prime numbers below a specified
limit, and then assign those values to p.

6.2 Implementation of trial divisions algorithm


In figure 6.1 you can see the pseudocode of my implementation. I have not made
any attempts to optimise this algorithm, and so I have used the “naive” way of
choosing my p’s, i.e. by adding 1 to the trial divisor at every iteration.
As far as the source code is concerned, this function accepts the following
parameters:

• n: The number to be factorised. Note that no changes are made to the


original value of this variable.
• max: This variable sets the limit of the maximum test factor to be used.

• factors: An array of MAPM variables, in which the factors of n will be


written.


INPUT N
test_factor := 2
WHILE (N > 1 AND test_factor < max)
    IF (N MOD test_factor) == 0 THEN
        N := N / test_factor
        PRINT test_factor
    ELSE
        test_factor := test_factor + 1

Figure 6.1: Pseudocode for trial divisions algorithm

Figure 6.2: Results of tests on Trial divisions algorithm

6.3 Running time


According to [2], the expected running time of this algorithm is

O(f · (log N)^2),

where f is the size of the factor found. The efficiency of this algorithm depends
on the strategy for choosing the trial divisors p, as explained earlier. In figure
6.2 you can see the results of the tests of my implementation of this algorithm.
The graph shows the factor size versus the amount of time it took, from a
sample of 1427 factorisations. As expected, the amount of time the algorithm
takes increases exponentially with the size, in digits, of the factor found.
Practically, after 6 or 7 digits, this algorithm becomes too expensive.

6.4 Remarks
One feature of this algorithm is that if we let it run long enough on a
prime N_p, it will prove the primality of N_p. In most cases this is not wanted,
and it is regarded as a waste of effort. However, this algorithm is very fast
at finding prime factors of fewer than 5-6 decimal digits. Furthermore, it
may be used to break up composite factors which are found using
the algorithms described in the following chapters.
CHAPTER 7
Fermat’s algorithm

The first of the modern algorithms that I will describe is due to Fermat. It
is not usually implemented these days unless it is known that the number to
be factored has two factors which are relatively close to the square root of the
number. However, this algorithm contains the key idea behind two of the most
powerful factorisation algorithms, the Quadratic Sieve and the Continued
Fractions algorithm.

7.1 Quick description of Fermat’s algorithm


Fermat's idea is the following. Let the number to be factored be N. Suppose
that N can be written as the difference of two squares, say

N = x^2 − y^2.

Immediately, we could write N as (x − y)(x + y), and thus we have successfully
broken N into two factors. The two factors may not be prime; in that case, we
could recursively apply this process until we deduce a prime factorisation of N.

7.2 Detailed description of Fermat’s algorithm


The first step in describing this algorithm is to prove that every odd number N
can be written as a difference of squares.
Let us suppose that N = a × b. Since we assumed N to be odd, both
a and b must be odd. Now, let us define x and y as follows:

x = (a + b)/2,    y = (a − b)/2.

Then, if we work out x^2 − y^2 for the above values, we get

x^2 − y^2 = (a^2 + 2ab + b^2)/4 − (a^2 − 2ab + b^2)/4 = ab = N.
Fermat's algorithm works in the opposite direction from trial division. When
we apply trial division, we start by looking at small factors, and we work our
way up to √N. In Fermat's algorithm, we start by looking for factors near √N,
and work our way down.

7.3 Implementation of Fermat’s algorithm


Now I will describe an implementation of Fermat's algorithm. As I mentioned
earlier, we search for integers x and y such that x^2 − y^2 = N. We can start with
x = ⌈√N⌉, and try increasing y until x^2 − y^2 is less than or equal to N. If it is
equal to N then we are done! If not, we increase x by one, and we iterate.


In order to further optimise the algorithm, let us set r = x^2 − y^2 − N.
Therefore, we have success when r = 0. All that we really need to do is keep
track of r. The value of r can change only when we increase x by one or y by
one. When we replace x^2 with (x + 1)^2, the variable r increases by 2x + 1. We
can express this increase in r by setting u = 2x + 1. Similarly, when y^2 is
replaced by (y + 1)^2, the variable r decreases by 2y + 1. This decrease in r can
be expressed as v = 2y + 1. (Note that when x and y increase by one, u and v
increase by two.)
Having defined r, u, and v, we can proceed with our implementation. It
turns out that we do not actually need the values x and y. Since we start by
setting x = ⌈√N⌉ and y = 0, it follows that u = 2⌈√N⌉ + 1 and v = 1. Also,
r = ⌈√N⌉^2 − N.
All we now have to do is define an increase in x and an increase in y.
According to the definitions of u and v, an increase of x by 1 increases r
by u, and u by 2. Similarly, an increase of y by 1 decreases r by v, and
increases v by 2.
The algorithm is now completely defined. All we have to do is keep increasing
x and y (in practice u, v, and r), until r = 0. When r is zero, we can compute
x + y and x − y as follows:

x + y = (u + v − 2)/2,    x − y = (u − v)/2.

At this point, I believe that some sort of pseudocode would be most appro-
priate in order to fully understand my implementation. Figure 7.1 contains the
pseudocode which describes my implementation.

7.4 Running time


How much work is actually needed to find the factors of N? Let us suppose that
N = a × b, with a < b. The factorisation will be achieved when x = (a + b)/2.
Since the starting value of x is √N, and b = N/a, the factorisation will take
approximately

(1/2)(a + N/a) − √N = (√N − a)^2 / (2a)

cycles.
If the two factors of N are really close, i.e. if a = k√N with 0 < k < 1,
then the number of cycles required in order to obtain the factorisation is

((1 − k)^2 / (2k)) √N.

INPUT N
sqrt := ⌈√N⌉
u := 2 * sqrt + 1
v := 1
r := sqrt * sqrt - N
WHILE r ≠ 0
    IF r > 0 THEN /* Keep increasing y */
        WHILE r > 0
            r := r - v
            v := v + 2
    IF r < 0 THEN /* Increase x */
        r := r + u
        u := u + 2
PRINT (u + v - 2) / 2
PRINT (u - v) / 2

Figure 7.1: Pseudocode for Fermat's algorithm

This complexity is of the order O(cN^(1/2)). However, the value of k can be very
small, making this algorithm impractical. For instance, let us consider
an "ordinary" case where a ≈ N^(1/3) and b ≈ N^(2/3). In such a case, the number of
cycles necessary will be

(√N − N^(1/3))^2 / (2N^(1/3)) = (N^(1/3))^2 (N^(1/6) − 1)^2 / (2N^(1/3)) ≈ (1/2) N^(2/3),

which is considerably higher than O(N^(1/2)). Therefore this algorithm is only
practical when the factors a and b are almost equal to each other.
In figure 7.2 you can see the test results of my implementation of Fermat's
algorithm, from a sample of 2075 factorisations. Again, the graph shows the
relation of the size of the factor found versus the amount of time it took. As
we expected, this algorithm becomes too slow for factors with 7 or more digits.
The graph follows the same trend as the trial divisions algorithm. In practice,
however, we will prefer the trial divisions algorithm.

7.5 Remarks
This algorithm has a very nice feature: it does not involve multiplication. We
have defined the variables r, u, v in such a way that we only need to perform
addition and subtraction. This is why this algorithm is sometimes called
factorising by addition and subtraction. However, the number of additions and
subtractions that we have to perform is quite large. For example, in order to
factorise

1783647329 = 84449 × 21121

we need to increase x 10551 times, and y 31664 times.

Figure 7.2: Results of tests on Fermat’s algorithm

Additionally, this algorithm suffers from the same problem as trial divisions:
it will prove primality in the worst case. If this algorithm is given a prime
number p, then the results will eventually be 1 and p. In fact, this is
even worse than proving primality with trial divisions. The total number of
cycles required for proving primality is N − √N, which is much worse than trial
divisions.
CHAPTER 8
The Pollard ρ method

This method is also called Pollard’s second factoring method or the Monte Carlo
method, because of its pseudo-random nature. It is based on a “statistical” idea
[7] and has been refined by Richard Brent [1]. The ideas involved for finding
the factors of a number N are described below.

8.1 Description of the algorithm


In short, the algorithm comprises the following steps:

1. Construct a sequence of integers {xi } which is periodically recurrent


(mod p), where p is a prime factor of N .

2. Search for the period of repetition, i.e. find i and j such that xi ≡ xj (mod p).
3. Calculate the factor p of N .

8.1.1 Constructing the sequence


The first step in finding a factor is to construct a sequence of periodically recur-
rent values. Let us consider a recursively defined sequence of numbers, according
to the formula
xi ≡ f (xi−1 , xi−2 , . . . , xi−k ) (mod m)
where m is any arbitrary integer, and given the initial values x1 , . . . , xk . This
means that the values xk+1 , xk+2 , . . . can be computed by using the k previously
computed values. However, all the values are computed mod m, and therefore
there are only m possible values that each xi can take. This means that there
are at most m^s distinct sequences of s values. Therefore, after at most m^s + 1
values, we will have two identical sequences of s consecutive numbers. Let these
sequences of s values be xi , xi+1 , . . . , xi+s−1 and xj , xj+1 , . . . , xj+s−1 . Since
these sequences are identical, it follows that their next elements, namely xi+s
and xj+s respectively, will be the same. In fact, every element xi+s+n and
xj+s+n will be identical thereafter.


This means that the sequence {xi } is periodically repeated, except perhaps
for a part at the beginning which is called the aperiodic part. This part can
be thought of as the “tail” of the Greek letter ρ. Once we get off the tail, we keep
cycling around the same sequence of values. That’s why this algorithm is known
as the Pollard ρ algorithm.
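The tail-and-cycle shape is easy to observe on a small example. The following sketch (illustrative Python, not part of the report’s C code) iterates x → x² + c (mod m) until a value repeats, and reports the lengths of the aperiodic part and of the period:

```python
def rho_shape(m, c=1, x0=2):
    """Iterate x -> x^2 + c (mod m) until a value repeats; return the
    length of the aperiodic part ("tail") and of the period ("loop")."""
    seen = {}                        # value -> index of first occurrence
    x, i = x0, 0
    while x not in seen:
        seen[x] = i
        x = (x * x + c) % m
        i += 1
    return seen[x], i - seen[x]      # tail length, period length
```

For example, with m = 11 and x0 = 2 the sequence is 2, 5, 4, 6, 4, 6, ..., so the tail has length 2 and the period has length 2.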
Back to our problem, instead of random integers {xi }, it would be sufficient
to recursively compute a sequence of pseudo-random integers. The simplest way
to do this would be to define a linear formula such as

xi+1 ≡ axi (mod N )

for a fixed a and x0 . Unfortunately, it turns out that this does not produce
sufficiently random values to give a short period of recurring values. This means
that we would have to compute a lot of values before we can identify the period
of recurrence.
The next simplest choice is to use a quadratic formula such as

xi+1 ≡ xi² + a (mod N )

for a fixed a and x0 . It has been empirically observed that the above expression
does produce sufficiently random values1 . Pollard found that in such a sequence
{xi } of integers mod N an element usually recurs after only about C√N
steps.

8.1.2 Finding the period


The second step of the algorithm is to search for the period within the sequence
{xi }. To determine it in the most general case would require finding where
a sequence of consecutive elements is repeated if the period is long. This is
quite a tedious task, and is ruled out by the amount of labour involved. In the
simplest case however, where xi is defined in terms of xi−1 only, the sequence
will start to repeat as soon as any single xk is the same as any of the previous
ones. Therefore, in order to find the period, we only need to compare each new
xj with the previous values.
The original version of Pollard’s method used Floyd’s cycle-finding algorithm
for finding the period.2 Suppose the sequence {xi } (mod m) has an aperiodic
part of length a and a period of length l. The period will then ultimately be
revealed by the test: Is x2i ≡ xi (mod m)?
The ρ method of Pollard has been made about 25% faster by a modification
to the cycle-finding algorithm due to Brent [1]. As we saw above, Pollard
searched for the period of the sequence xi (mod m) by considering x2i − xi
(mod m). Instead, Brent retains xi when i = 2^k − 1 and subsequently considers

    x_{2^n − 1} − x_j ,    2^(n+1) − 2^(n−1) ≤ j ≤ 2^(n+1) + 1.

In this way the period is discovered after fewer arithmetic operations than de-
manded by the original algorithm of Pollard. The saving in Brent’s modification
is due to the fact that the lower xi ’s are not computed twice as in Floyd’s algo-
rithm.
1 Note that this is not true if a is either 0 or -2.
2 The proof of Floyd’s cycle-finding algorithm is omitted.
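Brent’s idea of comparing each new term only against a saved earlier term can be sketched as follows (an illustrative Python version of the general cycle-finding technique, not the report’s implementation):

```python
def brent_cycle_length(f, x0):
    """Brent's cycle finding: keep the term whose index is the largest
    power of two seen so far, and compare subsequent terms against it.
    Earlier terms are never recomputed, unlike in Floyd's algorithm."""
    power = 1          # current power-of-two window size
    period = 1         # steps taken since the saved term
    saved = x0
    probe = f(x0)
    while saved != probe:
        if power == period:          # start a new power-of-two window
            saved = probe
            power *= 2
            period = 0
        probe = f(probe)
        period += 1
    return period                    # length of the cycle
```

For the sequence x → x² + 1 (mod 11) starting from 2, this returns a period of 2, matching the tail-and-loop picture of the ρ shape described earlier.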

8.1.3 Calculating the factor


Finally, consider the third and last step of Pollard’s ρ method. If we have
a sequence {xi } that is periodic (mod N ), how can we find p, the unknown
factor of N ?
In section 8.1.1 we saw that the formula xi+1 ≡ xi² + a (mod N ) is sufficient
to give us a desired sequence of pseudo-random numbers. Now, let us introduce
the formula
yi = xi (mod p)
where p is the unknown factor of N . This formula gives rise to a few nice proper-
ties. The sequence {yi } is periodic, and eventually we will have yi = yj (mod p).
But when this happens, then xi = xj (mod p), which means that p will divide
xi − xj . Therefore, by taking the GCD of xi − xj and N , we have a very good
chance of finding a non-trivial divisor of N .
All this is nice, except from the fact that we do not know p, which means
that we cannot compute {yi }, and therefore we do not know when yi will equal
yj . This is where the algorithms for finding the period in a periodic sequence
are used. What we do is that we use Floyd’s or Brent’s algorithm to choose
lots of xi ’s and xj ’s, and each time we compute the GCD of xi − xj and N .
Usually, the GCD will be one. But as soon as xi ≡ xj (mod p), then xi − xj
will be divisible by p, which means that the GCD will be a non-trivial divisor
of N .
A further improvement that can be made to both versions of Pollard’s ρ
algorithm is as follows. Instead of computing the GCD at every cycle of the
algorithm, we can accumulate the product of the differences of all the pairs we have
considered. After, say, 20 cycles, we can compute the GCD of this product and
N , without risking missing any factors of N . This way, the burden of computing
a GCD at each cycle is reduced to one subtraction and one multiplication.

8.2 Implementation of Pollard ρ


As with the previous algorithms, I implemented Pollard’s ρ algorithm in a single
function. The parameters that the function expects are:
• n: The number to be factorised. Note that no changes are made to the
original value of this variable.

• max: This variable sets the limit of the maximum test factor to be used.
• factors: An array of MAPM variables, in which the factors of n will be
written.
In figure 8.1 you can see the pseudocode of my algorithm. Note that the
constant a, which is used to generate the pseudorandom sequence, is hardcoded
in the function. It is quite an easy task to change its value, in order to get a
different sequence of numbers.

INPUT N , c, max
x1 := 2
x2 := x1² + c /* Our chosen function */
range := 1
product := 1
terms := 0
WHILE terms < max DO
    FOR j := 1 to range DO
        x2 := (x2² + c) MOD N /* Our chosen function */
        product := product × (x1 - x2) MOD N
        terms := terms + 1
        IF (terms MOD 20 == 0) THEN
            g := gcd(product, N )
            IF g > 1 THEN
                PRINT g
                N := N / g
            product := 1
    next_values(x1, x2, range) /* Brent’s improvement */

Figure 8.1: Pseudocode for the Pollard ρ algorithm
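A compact runnable version of the loop in figure 8.1, using Floyd’s cycle finding for simplicity together with the batched-GCD improvement of section 8.1.3, might look as follows (illustrative Python; the batch size of 20 follows the text, everything else is an assumption of this sketch):

```python
import math

def pollard_rho(n, c=1, max_terms=10**5, batch=20):
    """Pollard's rho with Floyd cycle finding and batched GCDs.
    Returns a non-trivial divisor of n, or None on bad luck."""
    x = y = 2
    diffs = []
    for term in range(1, max_terms + 1):
        x = (x * x + c) % n              # tortoise: one step
        y = (y * y + c) % n              # hare: two steps
        y = (y * y + c) % n
        diffs.append(x - y)
        if term % batch == 0:
            product = 1
            for d in diffs:              # one multiplication per term...
                product = product * d % n
            g = math.gcd(product, n)     # ...but only one GCD per batch
            if 1 < g < n:
                return g
            if g == n:                   # overshot: rescan the batch singly
                for d in diffs:
                    g = math.gcd(d, n)
                    if 1 < g < n:
                        return g
                return None              # bad luck: try a different c
            diffs = []
    return None
```

For example, pollard_rho(8051) finds the factor 97 within the first batch of 20 terms.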

8.3 Running time


Under plausible assumptions, the expected running time of Pollard’s ρ algorithm
is
O(f^(1/2) · (log N )²),
where f is the size of the factor found. Figure 8.2 shows the test results of my
implementation, from a sample of 4997 factorisations. There are a number of
conclusions and comments to be made about this funny-looking graph.
First of all, we have to remember that this algorithm is not deterministic,
but probabilistic. Therefore, the results might contradict themselves at some
points. For instance, at first glance one might think that this algorithm takes
more time to find small factors than larger ones. However, this is not entirely
true.
You should keep in mind that this graph only contains timings of successful
factorisations. So, although the times for 20-digit factors are quite small, the
“success rate” of the algorithm is quite low for such factors.
The best way to explain this graph is to observe its patterns. There is an
obvious pattern for each group of factors. The timings seem to build up slowly,
and then shoot up very high. If this pattern holds for 20-digit numbers as well,
we can see that the graph only contains the first part of the pattern, where the
timings are quite small. If we had enough space to fit the entire graph, then
by the time the pattern for 20-digit factors completed itself, its height could be
as much as the Eiffel tower’s!

Figure 8.2: Results of tests on the Pollard ρ algorithm

8.4 Remarks
The method that I have just described for finding prime factors of composite
integers is probabilistic. This means that we have to be prepared to be unlucky
on occasion, and not get any results. If we run the Pollard ρ algorithm and do
not find any prime divisors, that might be because there are no prime divisors
in the appropriate interval, or it might be because of bad luck.
What we need to do in such situations is to change our luck. For this
algorithm, this would mean to change certain constants, such as the recursive
function described in section 8.1.1. Then, of course, we have to know when it is
time to give up, and perhaps try another algorithm. In practice, after running
trial divisions up to 10^6 or 10^7 , one would run the Pollard ρ algorithm for a
while. Keep in mind, however, that if all the prime factors are roughly larger
than 10^12 then this algorithm will not usually work.
CHAPTER 9
The Pollard p − 1 method

The next algorithm that I will consider is known as the Pollard p − 1 algorithm
[6]. It formalises several rules, which have been known for some time. The
principle here is to use information concerning the order of some element a of
the group M_N of primitive residue classes mod N to deduce properties of the
factors of N .

9.1 Description of the algorithm


This algorithm is pretty much based on Fermat’s little theorem: If p is prime,
and a ≢ 0 (mod p), then

    a^(p−1) ≡ 1 (mod p).
Now, let us suppose that the number to be factored is N , and that one of its
prime factors is p. Also, assume that p − 1 divides Q. Using Fermat’s theorem,
and under the assumption that (p − 1)|Q, we arrive at

a^Q ≡ 1 (mod p),

and therefore p divides a^Q − 1. Now, we can apply GCD to N and a^Q − 1 to


get p or some other non-trivial divisor of N .
Our problem now is to find a Q such that (p − 1)|Q, keeping in mind
that we do not know p. This can be done in two ways. The easiest way is to
set Q = max!. Then a^Q (mod N ) can be computed quickly, since

    a^(max!) = (· · · (((a^1 )^2 )^3 )^4 · · · )^max ,

and because, as we saw in Section 5.2, exponentiation modulo N is very fast.


Note that a can be any number, as long as it is relatively prime to N .
Another, more efficient way to choose Q is to set Q = p1 p2 · · · pk , where pi
is a prime number less than a specified limit. In such a case we should also
append to Q some additional multiples of the small primes, so as not to miss
out any factors of N . This will cut the number of exponentiations required by
about a factor of eight.


INPUT N , c, max
m := c
FOR i := 1 to max DO
m := modexpo(m, i, N )
IF (i MOD 10 == 0) THEN
g := gcd(m-1, N )
IF g > 1 THEN
PRINT g

Figure 9.1: Pseudocode for the Pollard p − 1 algorithm
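Figure 9.1 translates almost line for line into a runnable sketch (illustrative Python; modexpo is replaced by the built-in three-argument pow, and the GCD interval of 10 follows the text):

```python
import math

def pollard_p_minus_1(n, c=2, max_k=10000):
    """Pollard's p-1: build up c^(k!) mod n one factor of k at a time,
    taking a GCD every 10 steps. Returns a divisor of n, or None."""
    m = c
    for k in range(1, max_k + 1):
        m = pow(m, k, n)                 # now m = c^(k!) mod n
        if k % 10 == 0:
            g = math.gcd(m - 1, n)
            if 1 < g < n:
                return g
            if g == n:                   # picked up all factors: change c
                return None
    return None
```

On 1469322167111 = 1212121 × 1212191 (number 4 of table A.1), this finds 1212121 after only a few dozen steps, because 1212120 = 2³ · 3² · 5 · 7 · 13 · 37 has only small prime factors.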

No matter how we choose Q, we have to keep in mind that essentially the


size of Q is what limits our search space. For instance, by choosing Q = 10000!
we are assuming that p − 1 has no prime factor exceeding 10000.

9.2 A slight improvement


In practice, we do not know how close we have to get to max before we have
picked up the first prime divisor of N . And we do not want to go so far that we
pick them all up. For that reason, we periodically check the value of GCD(a^Q −
1, N ). If it is still 1, we continue. If it is N , then we have picked up all the
divisors of N . In such a case we need to either backtrack a bit, or try using a
different a.

9.3 Implementation of Pollard p − 1


In this section I will describe how I implemented the algorithm, as well as discuss
certain issues that came up while implementing this algorithm.
The function accepts the following parameters:
num: the number to be factorised
c, max: the base and iteration limit, so that c^(max!) (mod num) is computed

factors: an array where the factors of num are stored


The algorithm is essentially a loop which runs until we have reached the
specified limit of iterations, which is max. In most literature, this limit is set to
10000, so I decided to follow this guideline. My implementation uses the simple
way of choosing Q, i.e. setting Q = 10000!, and subsequently calculating 2^(10000!)
(mod N ). This is done using the procedure modexpo, which was described in
section 5.2.
Every 10 cycles, the program calculates the gcd of the current 2^(k!) (mod N )
and N , using the algorithm described in section 5.1. If the gcd is greater than
one, then the gcd is written in the factors array. Subsequently, the program
sets N ← N/gcd. If the remaining N is composite, then the procedure is applied
recursively to the new N , otherwise the function terminates.

Figure 9.2: Results of tests on Pollard p − 1 algorithm

The pseudocode of my implementation is shown in figure 9.1. I should note


that my implementation makes no effort at backtracking or changing a in case
the gcd is equal to N . It is up to the caller of the function to choose an appropriate
a (c) and limit of iterations (max).

9.4 Running time


In the worst case, Pollard’s p − 1 algorithm takes as long as the trial divisions
algorithm. However, it usually does better, provided that we are lucky enough
to find a factor.
In figure 9.2 I have plotted the results of 14217 factorisations using this
algorithm. As previously, the graph contains timings derived only from the
successful tests, not the ones that failed. The patterns in the graph resemble
greatly the graph of Pollard’s ρ algorithm. However, there is another point to
be made about this algorithm.
Apparently, Pollard’s p − 1 algorithm is much faster than Pollard’s ρ algo-
rithm, but with less success. It turns out that the algorithm seldom gives
back results, but when it does, it is very fast. This is why I had to perform so
many tests on this algorithm: more than 70% of the tests failed.

9.5 Remarks
This algorithm has the same problems as the previous one. As described earlier,
at some point we might find the GCD to be equal to N . In such cases we will
want to try to change the base a to a different integer. Also, the algorithm
might fail to find a factor if p − 1 has only large prime factors.

It has been statistically found that the largest prime factor of an arbitrary
integer N usually falls around N^0.63 . Therefore, with a limit of 10000, Pollard
p − 1 will find prime factors that are less than two million. We should keep in
mind however that there is a fairly wide distribution of the largest prime factor
of N , and therefore factors much larger than two million may be found.
According to [2], the largest factor found by this algorithm during the Cun-
ningham project is a 32-digit factor

49858990580788843054012690078841

of 2^977 − 1.
I should also note that because of Pollard p − 1, the RSA public key crypto-
system has restrictions on the primes a and b that are chosen. Essentially, if
a − 1 or b − 1 has only small prime factors, then Pollard p − 1 will break the
encryption very quickly.
CHAPTER 10
Elliptic Curves Method

Factorisation based on elliptic curves is a relatively new method. As its name


implies, this method is based on the theory of elliptic curves. First, I will
briefly describe what elliptic curves are, and demonstrate the theory behind
them. Then, I will go on with the description of the factorisation method using
elliptic curves.

10.1 Introduction to elliptic curves


Elliptic curves are equations of the form

    y² = x³ + ax + b,

where a and b are constants, such that

    4a³ + 27b² ≠ 0.
These curves have the curious property that if a line intersects such a curve at two
points, then it will also have a third point of intersection. A tangent to the curve is
considered to have two points of intersection at the point of tangency.
If we know the two points (x1 , y1 ), (x2 , y2 ) of intersection, we can compute
the slope λ of the line, as well as the third point of intersection, in the following
way:

    λ = (3x1² + a) / (2y1)        if x1 = x2 ,
    λ = (y1 − y2 ) / (x1 − x2 )   otherwise

    x3 = λ² − x1 − x2
    y3 = λ(x3 − x1 ) + y1

10.1.1 Elliptic curves as a group


In order to perform factorisation with elliptic curves, we need to make the set
of points on an elliptic curve into a group. To do this, we must define a binary
operation ∂, the identity element, as well as the inverse.


We start by defining the binary operation as follows:


(x1 , y1 )∂(x2 , y2 ) = (x3 , −y3 )
where x3 and y3 are computed as shown earlier. Note that the new point is
not the third point of intersection, but its reflection across the x-axis. It is still,
however, on the same elliptic curve.
Now we proceed with defining the identity element of our group as follows:
(x, y)∂(x, −y) = (x, −y)∂(x, y) = ∞
With the above definition, we have managed to define both the identity element
and the inverses. The identity element ∞ can be thought of as a point far north,
such that every vertical line passes through it.
In terms of notation, E(a, b) denotes the group of rational points on the
curve y² = x³ + ax + b, where 4a³ + 27b² ≠ 0, together with the point ∞. Also,
with (xi , yi ) we denote (x1 , y1 )#i, where

    (x1 , y1 )#i = (x1 , y1 ) ∂ (x1 , y1 ) ∂ · · · ∂ (x1 , y1 )    (i times).
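The group law above is straightforward to compute over a prime field. The following sketch (illustrative Python; the operation ∂ becomes a function, and None stands for the point ∞) applies the slope formulas and the final reflection:

```python
def ec_add(P, Q, a, p):
    """Add two points on y^2 = x^3 + a*x + b over GF(p).
    None represents the identity element (the point at infinity);
    note that b itself never appears in the formulas."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                          # inverse points: the result is infinity
    if P == Q:                               # tangent case
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                                    # chord case
        lam = (y1 - y2) * pow(x1 - x2, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x3 - x1) + y1) % p          # third point of intersection
    return (x3, (-y3) % p)                   # reflect to get the group sum
```

For instance, on y² = x³ + 2x + 3 over GF(97), doubling the point (3, 6) gives (80, 10).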

10.1.2 Elliptic curves modulo n


All our reasoning from the previous sections still applies to elliptic curves modulo
n.
If x1 ≡ x2 (mod n) and y1 ≡ −y2 (mod n) then
(x1 , y1 )∂(x2 , y2 ) = ∞.
Let s be the inverse (mod n) of 2y1 if x1 ≡ x2 , and of x1 − x2 otherwise. As
before, we define:

    λ = (3x1² + a) × s    if x1 ≡ x2 (mod n),
    λ = (y1 − y2 ) × s    otherwise

    x3 = λ² − x1 − x2 (mod n)
    y3 = λ(x3 − x1 ) + y1 (mod n)
Furthermore, we will define the binary operation as
(x1 , y1 )∂(x2 , y2 ) ≡ (x3 , −y3 ) (mod n),
and we will define (xi , yi ) mod n as
(xi , yi ) ≡ (x1 , y1 )#i (mod n).
Finally, | E(a, b)/n | will denote the elliptic group modulo n whose elements are
pairs (x, y) of non-negative integers less than n satisfying y² ≡ x³ + ax + b (mod n),
together with the point ∞.

10.1.3 Computation on elliptic curves


In order to implement factorisation, we need a fast way of computing (x, y)#i.
Given the first coordinate x of (x1 , y1 ), we can compute the first coordinate of
(x2 , y2 ) as follows:

    x2 = ((x² − a)² − 8bx) / (4(x³ + ax + b)).

Therefore, given the first coordinate of (x, y)#i, we can compute the first coor-
dinate of (x, y)#2i using the above formula. We can extend this to 2i + 1 with
the following formula:

    x_{2i+1} = ((a − x_i x_{i+1})² − 4b(x_i + x_{i+1})) / (x_1 (x_i − x_{i+1})²).

As you can see, such computations involve lots of fractions. We can avoid
using rational numbers if we introduce the notion of a triplet (X, Y, Z), where

x = X/Z, y = Y /Z,

and where X,Y , and Z are integers. Another nice feature of this notation is
that the identity element ∞ now has the explicit representation (0, Y, 0), where
Y can be any integer.
If we define (Xi , Yi , Zi ) = (X, Y, Z)#i, we can adjust our previous formulas
to our new notation:

    X_{2i} = (X_i² − aZ_i²)² − 8bX_i Z_i³,
    Z_{2i} = 4Z_i (X_i³ + aX_i Z_i² + bZ_i³),
    X_{2i+1} = Z_1 ((X_i X_{i+1} − aZ_i Z_{i+1})² − 4bZ_i Z_{i+1} (X_i Z_{i+1} + X_{i+1} Z_i)),
    Z_{2i+1} = X_1 (X_{i+1} Z_i − X_i Z_{i+1})².

I should note that for our purposes, we do not need to calculate the second
coordinate Y of the triplets. Still, Yi can always be recovered from Xi and Zi .
Also, we can use our triplets modulo n, as long as we do all our computations
modulo n.

10.1.4 Factorisation using elliptic curves


The method I will be describing is essentially due to A. K. Lenstra and H. W.
Lenstra, Jr.
Let N be a composite number relatively prime to 6. (In practice, this means
that N has no small factors). We randomly choose a for our elliptic curve, and
a random point (x, y) on the curve. We can now compute b as follows:

b ≡ y 2 − x3 − ax (mod N ).

We convert to triplets (X, Y, Z), with our initial triplet being (x, y, 1).
If p is a prime number which divides N , and | E(a, b)/p | divides k!, then

(X, Y, Z)#k! = (· · · (((X, Y, Z)#1)#2) · · · )#k

will be the identity element in E(a, b)/p (but not in E(a, b)). This simply means
that there is at least one coordinate of (X, Y, Z)#k! which is not divisible by
N , but all the coordinates are divisible by p.
Since Zk! is divisible by p, there is a good chance that the greatest common
divisor of Zk! and N is a non-trivial divisor of N .

INPUT N , X, Y , a, max
b := Y² - X³ - aX MOD N
g := gcd(4a³ + 27b², N )
IF g > 1 THEN
    PRINT g
Z := 1
k := 2
WHILE k <= max DO
    FOR i := 1 to 10 DO
        NEXTVALUES(X, Z, k, N , a, b)
        k := k + 1
    g := gcd(Z, N )
    IF g > 1 THEN
        PRINT g

Figure 10.1: Pseudocode for main loop of Elliptic curves method

10.2 Implementation of elliptic curves method


My implementation of the Elliptic Curves Method consists of two “big” functions
and four “smaller” functions. The first two are shown in figures 10.1 and 10.2.
The main loop of the algorithm uses the same structure as some of our previous
algorithms. Essentially, we loop many times, and at each iteration we take the
gcd of N and Z. This function accepts the following parameters:
• n: The number to be factorised. Must be relatively prime to 6.
• X, Y: These are arbitrary integers, between 1 and n.
• a: An arbitrary integer, the first parameter of our curve.
• max: This variable sets the limit of the maximum iterations.
• factors: An array of MAPM variables, in which the factors of n will be
written.
Most of the work, however, is done in the NEXTVALUES function. This func-
tion is responsible for calculating the first and third coordinates of our triplets.
This algorithm uses the binary expansion of k in order to find the results. By
doing this, it manages to compute Xk and Zk by successively computing X2i or
X2i+1 in a minimum number of steps.
The four “small” functions that I mentioned use the formulas from section
10.1.3 to compute the values X2i , X2i+1 , Z2i , Z2i+1 . Their implementation is
quite straightforward.
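Putting figures 10.1 and 10.2 together, a self-contained sketch of one ECM attempt might look as follows (illustrative Python, not the report’s C code; the curve parameters and iteration limits are arbitrary choices of this sketch, and NEXTVALUES becomes a Montgomery-style ladder over the bits of k):

```python
import math

def ecm_factor(n, x=2, y=3, a=5, max_k=200):
    """One elliptic-curve attempt on n (n must be coprime to 6).
    Works with 'triplet' coordinates, tracking only X and Z mod n."""
    b = (y * y - x**3 - a * x) % n            # curve chosen through (x, y)
    g = math.gcd(4 * a**3 + 27 * b * b, n)
    if 1 < g < n:
        return g                              # degenerate curve: free factor

    def double(X, Z):                         # coordinates of #i -> #2i
        X2 = (X * X - a * Z * Z) ** 2 - 8 * b * X * Z**3
        Z2 = 4 * Z * (X**3 + a * X * Z * Z + b * Z**3)
        return X2 % n, Z2 % n

    def add(X1, Z1, X2, Z2, Xd, Zd):          # #i and #(i+1) -> #(2i+1)
        X3 = Zd * ((X1 * X2 - a * Z1 * Z2) ** 2
                   - 4 * b * Z1 * Z2 * (X1 * Z2 + X2 * Z1))
        Z3 = Xd * (X2 * Z1 - X1 * Z2) ** 2
        return X3 % n, Z3 % n

    def mult(k, X, Z):                        # ladder over the bits of k
        R0, R1 = (X, Z), double(X, Z)         # invariant: R1 is one step ahead
        for bit in bin(k)[3:]:                # bits below the leading one
            if bit == '0':
                R0, R1 = double(*R0), add(*R0, *R1, X, Z)
            else:
                R0, R1 = add(*R0, *R1, X, Z), double(*R1)
        return R0                             # (X, Z)#k

    X, Z = x % n, 1
    for k in range(2, max_k + 1):
        X, Z = mult(k, X, Z)                  # now (X, Z) represents #k!
        if k % 10 == 0:
            g = math.gcd(Z, n)
            if 1 < g < n:
                return g
    return None                               # try another curve
```

As with Pollard’s methods, a single attempt may fail; in practice one retries with fresh random values of x, y and a.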

10.3 Running time


According to [2], under plausible assumptions, the expected running time of
this algorithm is O(exp(√(c ln f ln ln f )) · (log N )²), where c ≈ 2 is a constant.

/* Calculates the first and third coordinates of (X, Y, Z)#k (mod N ). */
INPUT X, Z, k, n, a, b
C[] := BINARY(k)
X1 := X
Z1 := Z
X2 := X_{2i}(X, Z)
Z2 := Z_{2i}(X, Z)
FOR i := length(C[])-1 TO 1 DO
    U1 := X_{2i+1}(X1, Z1, X2, Z2)
    U2 := Z_{2i+1}(X1, Z1, X2, Z2)
    IF C[i] == 0 THEN
        temp := X_{2i}(X1, Z1)
        Z1 := Z_{2i}(X1, Z1)
        X1 := temp
        X2 := U1
        Z2 := U2
    ELSE
        temp := X_{2i}(X2, Z2)
        Z2 := Z_{2i}(X2, Z2)
        X2 := temp
        X1 := U1
        Z1 := U2
PRINT X1, Z1

Figure 10.2: Pseudocode for NEXTVALUES function of Elliptic curves method



Figure 10.3: Results of tests on Elliptic curves algorithm

In figure 10.3 you can see the results of 6398 factorisations which I performed
using this algorithm.
As in the Pollard p − 1 algorithm, we can speed up this algorithm by re-
stricting k to a set of powers of primes less than max rather than running over
all integers less than max. Also, we can expect better results if we regularly in-
terrupt the run and restart with a new set of parameters rather than persisting
with our initial choice of parameters.

10.4 Remarks
The Elliptic Curve Method has the characteristic of being practical from the
point where trial division becomes impossible until well into the range where
MPQS and NFS can be implemented.
The largest factor that has been found by ECM is a 53-digit factor of
2^677 − 1, according to [2]. Note that if the RSA system was implemented with
512-bit keys and the three-factor variation, the smallest prime would be less
than 53 digits, so Elliptic Curves could be used to break the system.
CHAPTER 11
Overall Comparison

The various factorisation methods I have described are all useful in different
situations. When factoring a large number, the method to be chosen must
depend on knowledge about the factors of the number. To begin with, you
must make sure that the number is composite, so that you do not make a long
computer run which will result in nothing. It would be really frustrating to
discover after a very long run that N has the prime factorisation N = 97 · p,
which could have been obtained almost immediately by using trial division.
Further, you could use Fermat’s method in case N is the product of two al-
most equal factors, or Pollard’s p−1 method in the event of N having one factor
p with p − 1 being a product of only small primes. There are lots of methods
for finding middle-sizes factors, all of which are good for specific situations.
But the question remains: how long do you keep looking for these middling
sized factors before pulling out something like the Quadratic Sieve or NFS? A
well-balanced strategy, developed by Naur [5] may be summarised as follows:

1. Make sure N is composite. Since very small divisors are quite common
and are found very quickly by trial division, it is worthwhile attempting
trial division up to 100 or 1000 even before applying a strong pseudoprime
test.

2. Perform trial division up to 10^5 or 10^6 . If Pollard’s ρ method is available,
then trial division need only be performed to a much lower search limit,
e.g. 10^4 , since the small divisors will fall out rapidly also with Pollard’s
methods. One reason why trial division with the small primes is useful,
despite the fact that Pollard’s ρ method is quicker, is that the small factors
tend to appear multiplied together when found with Pollard’s method, and
thus have to be separated by trial division anyhow. Apply a compositeness
test on what is left of N every time a factor has been found and removed.

3. At this point you need to take a long shot, and with a little luck, shorten
the running time enormously – it could even be decisive for quick success
or complete failure in the case when N is very large. The strategy to be
employed is: Take the methods you have implemented on your computer


covering various situations, which will mean one or more of the following:
Pollard’s p − 1 and p + 1 methods, Fermat’s, Shanks’, or even Williams’
methods. The methods should be capable of being suspended and resumed
from where they stopped.
Since you cannot possibly know in advance which of these methods will
achieve a factorisation (if a factorisation is to be found at all), it is a good
technique at this stage to run the program of each method in sequence
for a predetermined number of steps, say 1000 or 10000, breaking
the runs off at re-start points in order to be able to proceed, if necessary.
If N does not factorise during such a run you have to repeat the whole
process from the re-start point of the previous run. Also, you might want
to consider the possibility of changing your choice of constants.

4. If the number N has still not been factored, you will need to rely upon
the “big algorithms”. Depending on the size of the number and on the
capacity of your computer, this can be the Multiple Polynomial Quadratic
Sieve (MPQS), the Number Field Sieve (NFS), or even the Elliptic Curves
Method. Now you have to sit down and wait; fairly good estimates of the
maximal running times are available for all these methods, so that you
will know approximately how long the computer run could take.
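Steps 1 to 3 of this strategy can be condensed into a toy driver (illustrative Python only: a compositeness test, trial division, and Pollard’s ρ with varying constants; every name and limit here is an assumption of the sketch, not part of Naur’s prescription):

```python
import math

def is_probable_prime(n):
    """Miller-Rabin with the first seven prime bases; this choice is
    known to be exact for every n below roughly 3.4e14."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in (2, 3, 5, 7, 11, 13, 17):
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def rho(n, c):
    """One Pollard rho attempt with the constant c."""
    x = y = 2
    for _ in range(10**5):
        x = (x * x + c) % n
        y = (y * y + c) % n
        y = (y * y + c) % n
        g = math.gcd(x - y, n)
        if g == n:
            return None                     # bad luck with this c
        if g > 1:
            return g
    return None

def factorise(n):
    factors = []
    for p in range(2, 10**4):               # step 2: trial division first
        while n % p == 0:
            factors.append(p)
            n //= p
    stack = [n] if n > 1 else []
    while stack:                            # step 1: test compositeness
        m = stack.pop()
        if is_probable_prime(m):
            factors.append(m)
            continue
        for c in range(1, 21):              # step 3: vary your luck
            g = rho(m, c)
            if g:
                stack += [g, m // g]
                break
        else:
            factors.append(m)               # give up: keep the cofactor
    return sorted(factors)
```

For instance, factorise(8355211084777) recovers [1163, 12347, 581857], the decomposition of number 2 in table A.3.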

Choosing which methods to use, and when, is still more an art than a science.
You should keep in mind that the “big” algorithms are much more cumbersome
and it is worth spending at least a few minutes trying to vary your luck first.
Theoretically and experimentally, it has been shown that you have a better
chance of finding your mid-sized factors if you run several algorithms with several
choices of parameters rather than spending the same amount of time on a single
algorithm with a single set of parameters.
CHAPTER 12
Epilogue

When I started my work on this project, I had very little knowledge of this field
of study. Factorisation was something that I had never encountered before, at
least not in great detail. I believe that this was to my advantage, since I was
able to write my report from an introductory point of view, paying attention to
the points which were hard for me to understand.
During the course of my project, I had to make lots of choices regarding
the material which I would study. For instance, I chose not to implement one
of the “big guns” of factorisation, such as MPQS or NFS. I believe that my
choices allowed me to focus on the quality of what I did, instead of doing many
things, but without enough care. This way, I was able to firmly understand the
concepts of these elementary algorithms, and thus obtain a good background in
the subject.
Of course, my project included lots of programming. Although I was already
familiar with the programming language I used (ANSI C), I was able to further
develop my programming skills. My final program consisted of roughly 1500
lines of code, which means that my program was not small. Furthermore, I de-
veloped a sense of responsibility as far as organisation procedures are concerned,
such as keeping a logbook and doing tests.
I am really happy to have managed to keep the balance between the theoret-
ical and practical issues in doing my project. Although my report tends more
towards theory, nevertheless I did quite a lot of work on the actual software.
Thus I have been able to produce a complete tutorial of factorisation, including
the theoretical description and background, the pseudocode description, and the
actual implementation. I hope that this project will be helpful to those who get
their hands on it.

Part III

Appendices

APPENDIX A
Benchmarks

In this appendix I have included some sample test results of all the algorithms
I implemented. These tests used certain numbers which I chose deliberately for
particular properties, rather than arbitrary numbers.

A.1 Tests with products of two nearby primes


For the first set of tests, I used numbers which were products of two nearby
primes. In table A.1 you can see the results of this set of tests.

Algorithm         number 1   number 2   number 3   number 4
Trial Divisions   0.05       1.71       8.58       15.81
Fermat Method     0.01       0.02       0.02       0.02
Pollard’s p − 1   0.12       0.72       1.9        0.39
Pollard’s ρ       0.02       0.33       0.59       0.61
Lenstra’s ECM     0.08       0.11       0.56       0.93

number 1 = 3980021 = 1993 x 1997
number 2 = 16831170221 = 129733 x 129737
number 3 = 431589872009 = 656951 x 656959
number 4 = 1469322167111 = 1212121 x 1212191

Table A.1: Products of two nearby primes

As expected, Fermat's algorithm was by far the fastest. Another point worth
making is the quick time of Pollard's p − 1 for number 4. If we look at
the decomposition of the factors of number 4 (minus one), we find that
1212121 − 1 = 1212120 = 2 · 2 · 2 · 3 · 3 · 5 · 7 · 13 · 37. This means that p − 1
had only small factors, which is why Pollard's p − 1 algorithm found the factors
so quickly. Finally, I should note that Pollard's ρ algorithm was faster than
Trial Divisions, even though the numbers were relatively small.


A.2 Tests with products of three nearby primes


For this set of tests, I used numbers which were products of three nearby primes.
In table A.2 you can see the results of this set of tests.

Algorithm         number 1   number 2   number 3   number 4

Trial Divisions   0.07       0.14       0.26       2.72
Fermat Method     -          -          -          -
Pollard's p − 1   0.26       0.68       0.28       1.74
Pollard's ρ       0.05       0.05       0.07       0.29
Lenstra's ECM     0.12       0.67       0.82       0.70

number 1 = 7956061979       = 1993 x 1997 x 1999
number 2 = 110154695923     = 4789 x 4793 x 4799
number 3 = 1019829472003    = 10061 x 10067 x 10069
number 4 = 1005660644975291 = 100183 x 100189 x 100193

Table A.2: Products of three nearby primes

The results showed that Pollard's ρ algorithm was the fastest. The Elliptic
Curves Method was fairly quick for large factors; for smaller factors, Trial
Divisions was quicker. Note that it made no sense to run this test on Fermat's
method, since that method only applies to numbers with exactly two factors.

A.3 Tests with products of three arbitrary primes


For the last set of tests, I used numbers which were products of three arbitrary
primes. The size of the factors gradually grows as we move on to the next
number. In table A.3 you can see the results of this set of tests.

Algorithm         number 1   number 2   number 3   number 4

Trial Divisions   0.11       0.221      3.122      8.26
Fermat Method     -          -          -          -
Pollard's p − 1   0.23       0.90       5.46       -
Pollard's ρ       0.07       0.06       0.14       1.31
Lenstra's ECM     0.53       0.46       6.781      0.94

number 1 = 14960418503           = 179 x 8467 x 9871
number 2 = 8355211084777         = 1163 x 12347 x 581857
number 3 = 416531649825896503    = 12983 x 987533 x 32487877
number 4 = 153674304751986405509 = 762479 x 1276237 x 157921783

Table A.3: Products of three arbitrary primes

In this set, Pollard’s ρ algorithm was again the fastest overall. We see,
however, that for very large factors, ECM showed its capabilities by being the
fastest. Also, I should say that Pollard’s p − 1 algorithm simply gave up for
number 4, so I do not have a timing for that. We also see the Trial Divisions
timings growing quite rapidly.
APPENDIX B
Program output

In this Appendix, I have included some sample output of my program. This
output was either created by running the performance tests, or by running the
“combined” version of my program.

B.1 Tests output


Trial divisions algorithm
8 14700064 17.140 2,2,2,2,2,459377, OK!
8 14920780 0.000 2,2,5,7,197,541, OK!
8 16279397 1.420 401,40597, OK!
8 17470133 26.090 23,759571, OK!
8 18292242 1.820 2,3,59,51673, OK!
8 19120508 0.110 2,2,11,103,4219, OK!
8 20635253 0.550 1231,16763, OK!
8 21426599 0.110 43,181,2753, OK!
8 22389487 4.340 173,129419, OK!
8 24145844 1.100 2,2,193,31277, OK!
8 24772976 52.950 2,2,2,2,1548311, OK!
8 27083076 78.700 2,2,3,2256923, OK!
8 28960820 0.220 2,2,5,7,31,6673, OK!
8 29620253 79.590 13,2278481, OK!
8 30567968 0.050 2,2,2,2,2,421,2269, OK!
8 33072882 10.000 2,3,19,290113, OK!
8 34633266 0.050 2,3,47,191,643, OK!
8 37023334 651.580 2,18511667, OK!
8 38003210 0.000 2,5,7,31,83,211, OK!
8 38324435 261.010 5,7664887, OK!

Fermat’s algorithm
12 103447054117 192.950 29666491,3487, OK!
12 107658803491 25.540 4206901,25591, OK!

12 118366881563 0.990 499373,237031, OK!
12 121823707817 1.040 518251,235067, OK!
12 122404743091 766.150 118494427,1033, OK!
12 131816524371 0.050 377727,348973, OK!
12 132132493075 283.470 43680163,3025, OK!
12 137348837443 404.580 63616877,2159, OK!
12 149761336043 0.820 515471,290533, OK!
12 158116709061 3.740 973703,162387, OK!
12 161430148107 0.930 548707,294201, OK!
12 167785725663 22.080 3733799,44937, OK!
12 178428901625 706.010 109802401,1625, OK!
12 187189598215 0.880 573865,326191, OK!
12 204614067397 3.730 1019497,200701, OK!
12 241531775015 12.740 2338951,103265, OK!
12 256718621685 0.820 631545,406493, OK!
12 264985242949 66.460 10837399,24451, OK!
12 279740763407 6021.260 929371307,301, OK!
12 286185151757 19107.040 2778496619,103, OK!
12 290275123679 36.200 6102191,47569, OK!
12 299489220963 0.440 613593,488091, OK!
12 324305418991 289.230 45123893,7187, OK!
12 349234244871 4.510 1255069,278259, OK!

Pollard rho algorithm


15 119353790409531 78.050 69, ERROR!!!
15 125351514673454 4.060 2,1381,67049,676883, OK!
15 149331533171784 5.990 24,28961,214845731, OK!
15 152348612751036 4.560 19188,349,22750103, OK!
15 172004878340591 89.800 NO FACTORS FOUND
15 194957421274879 73.440 1231, ERROR!!!
15 205920181957521 75.910 34383, ERROR!!!
15 222008721116816 6.260 59888,47777,77591, OK!
15 249557297942163 108.310 123, ERROR!!!
15 297738900701700 113.040 300, ERROR!!!
15 321283014530048 5.600 1024,431,1223,595229, OK!
15 351838344957934 149.070 2, ERROR!!!
15 377118177660380 5.440 118330146740,3187, OK!
15 407893389920692 28.230 52,6367,1231993663, OK!
15 453023303965458 7.960 6,1321483,57135721, OK!
15 501651885394403 91.950 3767, ERROR!!!
15 581362260122354 104.240 2,1597, ERROR!!!
15 631626096053032 7.800 8,805813,97979633, OK!

Pollard p-1 algorithm


20 17983096255782676173 71.900 87, ERROR!!!
20 18732302463106516915 42.290 5,4716563,794320036141, OK!
20 19633168496947424017 70.140 53,83, ERROR!!!
20 20054808357175364639 0.600 89,461,227,2153286050833, OK!
20 21240599822239584893 73.880 101, ERROR!!!
20 24333145381449307839 0.440 3459,853,8247050571457, OK!
20 25281400673763563085 17.140 765,6281321,36013,146093, OK!
20 26159111191007383487 0.110 1429,397,46110544251599, OK!
20 28423862054615105617 0.110 77,3141601,117500938421, OK!
20 29474651485752579413 79.310 NO FACTORS FOUND
20 30334734795998729991 79.040 3, ERROR!!!
20 33204165949884833525 0.990 58387475,2797,203320147, OK!
20 35955144223483810055 0.050 5,7191028844696762011, OK!
20 38847376691209313199 0.050 8127,4780038967787537, OK!
20 40772255186800963589 2.910 761,334423,160207903963, OK!
20 44695231498724965775 72.220 2725,23, ERROR!!!
20 45751739079576358967 0.060 37,1236533488637198891, OK!

Elliptic curves algorithm


16 2033370515132833 1.370 67,30348813658699,43, ERROR!!!
16 2144333894601389 0.500 19, ERROR!!!
16 2351222877625469 0.330 59,39851235213991, OK!
16 2480636328028919 3.080 197, ERROR!!!
16 2588922079845487 0.280 37259, ERROR!!!
16 3532991485651031 23.510 5233,675136916807, OK!
16 3766348308874693 0.940 349,10791828965257, OK!
16 3861434629423001 0.330 19,203233401548579, OK!
16 4209004773580541 0.110 2057, ERROR!!!
16 8394595909750987 0.330 NO FACTORS FOUND
16 9220175949479131 0.880 637,1109, ERROR!!!
16 9701531376381211 0.000 107, ERROR!!!
17 10644389056710187 1.210 NO FACTORS FOUND
17 11166864797389741 0.330 11,1015169527035431, OK!
17 11489043630188369 0.160 2191,5243744240159, OK!
17 12078054826846641 0.000 NO FACTORS FOUND
17 12380759253388243 0.160 37, ERROR!!!
17 14432713913707519 6.260 1277, ERROR!!!
17 25621001377456313 0.330 NO FACTORS FOUND
17 28111503784053067 0.170 161,1577, ERROR!!!
17 30979002302099723 476.530 45247,684664227509, OK!
17 32178036153737471 0.330 1501,21437732280971, OK!

B.2 Combined factorisation output


This section includes output produced when my program used the “combined”
function for factorising. This simply means that all my algorithms were called
one after the other, each trying to factorise part of the input number. The
output of the program is quite self-explanatory.

C:\>factor 298347004781928719247912
Trying to factorise 298347004781928719247912
Trying trial divisions...
Remainding portion is 53923156875619
Trying Pollard rho-1...
Remainding portion is 53923156875619
Trying Pollard rho...
2,2,2,3,83,409,6791,411469,131050351,1, OK!

C:\>factor 45346346353453643534522543411
Trying to factorise 45346346353453643534522543411
Trying trial divisions...
Remainding portion is 121571974137945425025529607
Trying Pollard rho-1...
373,127819,52721,18040742769701893,1, OK!

C:\>factor 765674960895860548647659458604856094859061115
Trying to factorise 765674960895860548647659458604856094859061115
Trying trial divisions...
Remainding portion is 1168969405947878700225434287946345182990933
Trying Pollard rho-1...
Remainding portion is 1168969405947878700225434287946345182990933
Trying Pollard rho...
5,131,35574947,32859343569728401850477367905744039,1, OK!

C:\>factor 849357309574398572983749827349822289473
Trying to factorise 849357309574398572983749827349822289473
Trying trial divisions...
Remainding portion is 28502879612550708848744918532495127
Trying Pollard rho-1...
Remainding portion is 28502879612550708848744918532495127
Trying Pollard rho...
Remainding portion is 1540239701527958435278069669
Trying Elliptic curves...
3,3,7,11,43,18505483, INCOMPLETE!!!

C:\>factor 32499823472313423412312414243511
Trying to factorise 32499823472313423412312414243511
Trying trial divisions...
Remainding portion is 119925547868315215543588244441
Trying Pollard rho-1...
Remainding portion is 119925547868315215543588244441
Trying Pollard rho...
Remainding portion is 119925547868315215543588244441
Trying Elliptic curves...
271, INCOMPLETE!!!

C:\>factor 23423423523253423423423423524199392991
Trying to factorise 23423423523253423423423423524199392991
Trying trial divisions...
Remainding portion is 23423423523253423423423423524199392991
Trying Pollard rho-1...
Remainding portion is 1480901784361979099919290859467623
Trying Pollard rho...
Remainding portion is 1480901784361979099919290859467623
Trying Elliptic curves...
15817, INCOMPLETE!!!

B.3 Biggest factorisation


During the course of this project, I performed lots of factorisations. The largest
one I achieved was done using Pollard's ρ algorithm, and is as follows:
1041979940506209714136430511217320000000000000000000000000
0000000000000000000000000000000000000000000000000000000000
00000000000001 = 247· 1667· 49891· 2200717· 23048379600231661
8883049050652428355273922788632352489739844628952085821986
62338027412731196022283540032318521067

where 1041 · · · 0001 has 130 digits, and 2304 · · · 1067 is a 113-digit probable
prime. This factorisation took 35 seconds on a Pentium Celeron at 433 MHz.
BIBLIOGRAPHY

[1] Richard P. Brent. An improved Monte Carlo factorization algorithm. Nordisk
Tidskrift for Informationsbehandling (BIT), 20:176–184, 1980.

[2] Richard P. Brent. Some parallel algorithms for integer factorisation. Tech-
nical report, 1999.

[3] Donald E. Knuth. The Art of Computer Programming, Volume 2. Addison-
Wesley, second edition, 1981.

[4] Evangelos Kranakis. Primality and Cryptography. B.G. Teubner, 1986.

[5] Thorkil Naur. Integer factorisation. DAIMI report, 1982.

[6] J. M. Pollard. Theorems on factorisation and primality testing. Proc. Cambr.
Philos. Soc., 76:521–528, 1974.

[7] J. M. Pollard. A Monte Carlo method for factorisation. Nordisk Tidskrift for
Informationsbehandling (BIT), 15:331–334, 1975.

[8] Hans Riesel. Prime Numbers and Computer Methods for Factorization.
Birkhäuser, 1985.
