OCaml for Scientists

Flying Frog Consultancy Ltd., 2005
Dedicated to Emma
For more products and services by Flying Frog Consultancy Ltd., visit our website:
http://www.ffconsultancy.com
Contents
1 Introduction
1.1 Good programming style
1.2 A Brief History of OCaml
1.3 Benefits of OCaml
1.4 Running OCaml programs
1.4.1 Top-level
1.4.2 Byte-code compilation
1.4.3 Native-code compilation
1.5 OCaml syntax
1.5.1 Language overview
1.5.1.1 Types
1.5.1.2 Variables and functions
1.5.1.3 Tuples, records and variants
1.5.1.4 Lists and arrays
1.5.1.5 If
1.5.1.6 Program composition
1.5.1.7 More about functions
1.5.1.8 Modules
1.5.2 Pattern matching
1.5.2.1 Guarded patterns
1.5.2.2 Erroneous patterns
1.5.2.3 Good style
1.5.2.4 Nested patterns
1.5.2.5 Parallel pattern matching
1.5.3 Exceptions
1.5.4 Polymorphism
1.5.5 Currying
1.6 Functional vs Imperative programming
1.7 Recursion
1.8 Applicability
2 Program Structure
2.1 Nesting
2.2 Factoring
2.3 Modules
2.3.1 Signatures
2.3.2 Structures
2.3.3 Anonymous signatures
2.3.4 Use of the IntRange module
2.3.5 Another example
2.4 Objects
2.4.1 Classes
2.4.2 Objects
2.4.2.1 Immediate objects
2.4.2.2 Classed objects
2.4.2.3 Inheritance
2.5 OCaml browser
2.6 Compilation
2.6.1 Linking with libraries
2.7 Custom top-levels
3 Data Structures
3.1 Algorithmic Complexity
3.1.1 Primitive operations
3.1.2 Complexity
3.1.2.1 Asymptotic complexity
3.2 Arrays
3.3 Lists
3.4 Sets
3.5 Hash tables
3.6 Maps
3.7 Summary
3.8 Heterogeneous containers
3.9 Trees
3.9.1 Balanced trees
3.9.2 Unbalanced trees
4 Numerical Analysis
4.1 Number representation
4.1.1 Integers
4.1.2 Floating-point numbers
4.2 Quirks
4.3 Algebra
4.4 Interpolation
4.5 Quadratic solutions
4.6 Mean and variance
4.7 Other forms of arithmetic
4.7.1 Rational arithmetic
4.7.2 High-precision floating point
4.7.3 Adaptive precision
5 Input and Output
5.1 Printing to screen
5.2 Reading from and writing to disc
5.3 Marshalling
5.4 Lexing and Parsing
5.4.1 Lexing
5.4.2 Parsing
6 Visualization
6.1 Overview of OpenGL
6.1.1 GLUT
6.2 Basic rendering
6.2.1 Geometric primitives
6.2.2 Filling
6.2.3 Projection
6.2.4 Animation
6.3 Transformations
6.4 Efficient rendering
6.5 Rendering scientific data
7 Optimization
7.1 Profiling
7.2 Algorithmic optimization
7.3 Lower-level optimizations
7.3.1 Benchmarking data structures
7.3.2 Automated transformations
7.3.2.1 Compiler optimizations
7.3.2.2 Defunctorizing
7.3.3 Manual transformations
7.3.3.1 Tail-recursion
7.3.3.2 Deforesting
7.3.3.3 Terminating early
7.3.3.4 Specializing data structures
7.3.3.5 Avoiding polymorphic numerical functions
7.3.3.6 Unboxing data structures
8 Libraries
8.1 Command-line arguments
8.2 Timing
8.3 Big arrays
8.4 Vector-Matrix
8.5 Fourier transform
9 Simple Examples
9.1 Arithmetic
9.2 List related
9.2.1 count
9.2.2 position
9.2.3 mapi
9.2.4 chop
9.2.5 dice
9.2.6 replace
9.2.7 sub
9.2.8 extract
9.2.9 randomize
9.2.10 permute
9.2.11 Run-length encoding
9.3 String related
9.3.1 string_of_list
9.3.2 DNA sequence IO
9.3.3 Matrix IO
9.4 Array related
9.4.1 map2
9.4.2 Double folds
9.4.3 rotate
9.4.4 Matrix trace
9.5 Higher-order functions
9.5.1 Data structures of functions
9.5.2 Tuple related
9.5.3 Generalised products
9.5.4 Converting between container types
10 Complete Examples
10.1 Maximum entropy method
10.1.1 Formulation
10.1.2 Implementation
10.1.2.1 Lexer
10.1.2.2 Parser
10.1.2.3 Main program
10.1.2.4 Compilation
10.1.2.5 Results
10.1.2.6 Optimisation
10.2 Global minimization
10.2.1 The mutate function
10.2.2 Efficiency
10.2.3 Implementation
10.2.3.1 Lexer
10.2.3.2 Parser
10.2.3.5 Results
10.3 Finding nth-nearest neighbours
10.3.1 Formulation
10.3.2 Implementation
10.3.2.1 Lexer
10.3.2.2 Parser
10.3.3 Results
10.4 Eigen problems
10.4.1 Implementation
10.4.2 Results
10.5 Discrete wavelet transform
Bibliography
A Advanced Topics
A.1 Data sharing
A.2 Labelled and optional arguments
A.3 Defining binary infix operators
A.4 Installing top-level pretty printers
A.5 Monomorphism
A.6 Functors
A.7 Memoization
A.8 Polymorphic variants
A.9 Phantom types
A.10 Exponential type growth
B Troubleshooting
B.1 Dangerous if
B.2 Scoping subtleties
B.3 Evaluation order
B.4 Constructor arguments
B.5 Recycled types
B.6 Mutable array contents
B.7 Polymorphic problems
B.8 Local and non-local variable definitions
Preface
This book aims to encourage the scientific community to adopt stricter approaches to computer
programming, emphasising correctness over performance, beginning with the selection of the
Objective CAML language due to its inherent safety. Although scientists are the principal
target audience, anyone interested in learning more about modern programming techniques is
likely to benefit from reading this book.
Due to the widespread adoption of computers for everything from the logging and analysis of
experimentally observed data to the computationally-intensive simulation of physical systems,
the computer is now a vitally important tool for scientists. However, poor approaches to
programming are endemic in current scientific culture. Specifically, more worth is placed
on scientific results than on the creation of generic programs which could have been used
to generate many more such results, leading to the constant redevelopment of disposable
programs. If this can be cured, science will benefit from professional quality (correct, reusable
and future-proof) programs and data formats which will greatly accelerate the rate of scientific
discovery.
This book may be divided into three main parts. Chapters 1-5 introduce the reader to
the syntax of Objective CAML and the creation and execution of working programs based
upon the most important features found in the language. Chapters 6-8 deal with extended
functionality available via libraries and optimisation. Finally, chapters 9 and 10 present a
variety of examples. In particular, chapter 10 describes the creation of complete programs
capable of solving some of the most important types of problem found in computational
science.
Notation
The sets of integers, real numbers and complex numbers are denoted $\mathbb{Z}$, $\mathbb{R}$ and $\mathbb{C}$ respectively.
The unit imaginary number is denoted $i = \sqrt{-1}$ according to the convention of the physical
sciences (engineers use $j$). The real and imaginary parts of a complex number $z = x + iy$, $x, y \in \mathbb{R}$,
are denoted $\mathrm{Re}[z] = x$ and $\mathrm{Im}[z] = y$ respectively. The complex conjugate is denoted
$z^* = x - iy$. In complex polar notation $z$ is written $z = r e^{i\theta}$, $r, \theta \in \mathbb{R}$, where $r = |z|$ is termed
the modulus of $z$ and $\theta = \arg[z]$ is termed the argument of $z$.
Vectors are written in bold typeface (e.g. $\mathbf{r}$) and default to $\mathbf{r} \in \mathbb{R}^3$.
Directed integer rounding functions are referred to as floor and ceiling and are denoted by
and formulated as:
$$\lfloor x \rfloor = \max\{n \in \mathbb{Z};\ n \le x\} \quad \text{and} \quad \lceil x \rceil = \min\{n \in \mathbb{Z};\ n \ge x\}$$
respectively.
Ranges are written using round or square braces to indicate exclusive or inclusive range ends
respectively, e.g. for an integer range $[1 \ldots n) \equiv \{1 \ldots n - 1\}$.
Derivatives of functions with respect to the first argument can be written in shorthand notation,
e.g. $\phi''(x, y) \equiv \frac{\partial^2 \phi}{\partial x^2}(x, y)$ and $\phi'(1, y) \equiv \left.\frac{\partial \phi}{\partial x}\right|_{x=1}$.
Inner products of complex-valued functions are written using Dirac notation, i.e.
$$\langle f | g \rangle = \int_{-\infty}^{\infty} f^*(x)\, g(x)\, dx$$
Readers should be aware that the standard mathematical notation $(g, f) \equiv \langle f | g \rangle$ is often used
in other literature.
Fourier transforms are denoted and formulated according to convention for the physical sci-
ences, with the forward transform:
and the reverse transform:
Readers should be aware that alternate formulations exist in related fields.
The set operations union, intersection and difference are denoted by the symbols $\cup$, $\cap$ and
$\setminus$ respectively. The cartesian product of two sets $A$ and $B$ is written $A \times B$. For example,
$\{1, 2\} \times \{a, b, c\}$ is the set of pairs $\{(1, a), (1, b), (1, c), (2, a), (2, b), (2, c)\}$.
Function $L^p$ norms, denoted by $\|f\|_{L^p}$, are defined as:
$$\|f\|_{L^p} = \left( \int_{-\infty}^{\infty} |f(x)|^p \, dx \right)^{\frac{1}{p}}$$
and we default to the $L^2$ norm, e.g. the Plancherel equality may be written $\|\hat{f}\|_2 = \|f\|_2 \ \forall f \in L^2(\mathbb{R})$.
In particular, $L^2(\mathbb{R})$ is the Hilbert space of functions $f : \mathbb{R} \to \mathbb{C}$ with $\|f\|_2 \in \mathbb{R}$.
A function $f$ which maps values from a set $A$ onto a set $B$ is written $f : A \to B$. Typically,
this is used to indicate the argument and return types of a function, e.g. $f : \mathbb{R} \times \mathbb{R} \to \mathbb{C}$ (parsed
as $f : (\mathbb{R} \times \mathbb{R}) \to \mathbb{C}$) is a function which maps two real numbers (expressed as an element in
the cartesian product of the set of real numbers with itself) onto a complex number.
The variance $\sigma_f^2$ of a function $f$ satisfying $\|f\|_2 = 1$ is defined as:
$$\sigma_f^2 = \int_{-\infty}^{\infty} t^2 f(t) \, dt - \left( \int_{-\infty}^{\infty} t f(t) \, dt \right)^2$$
where $\sigma_f$ is known as the standard deviation.
The $\Gamma$-function $\Gamma[z] : \mathbb{C} \to \mathbb{C}$ is defined to be:
$$\Gamma[z] = \int_0^{\infty} t^{z-1} e^{-t} \, dt$$
Glossary
Glossary of terms
λ-function an anonymous function.
Abstract type a type with a visible name but hidden implementation. Abstract types are
created by declaring only the name of the type in a module signature, and not the
complete implementation of the type as given in the module structure.
Accumulator a variable used to build the result of a computation. The concept of an
accumulator underpins the fold algorithm (introduced on page 36). For example, in the
case of a function which sums the elements of a list, the accumulator is the variable
which holds the cumulative sum while the algorithm is running.
Algorithm a mathematical recipe for solving a problem. For example, Euclid's algorithm is a
well-known algorithm for finding the greatest common divisor of two numbers.
Array a flat container which provides random access to its elements in O(1) time-complexity.
See section 3.2.
Associative container A container which represents a mapping from keys to values.
Asymptotic complexity an approximation to the complexity of an algorithm, derived in the
limit of infinite input complexity and, typically, as a lower or upper bound. For example,
an algorithm with a complexity $f(n) = 3 + 3n + n^2$ has an asymptotic complexity $O(n^2)$.
See section 3.1.
Balanced tree a tree data structure in which the maximum variation in depth can be shown
to tend to a finite value in the limit of an infinite number of leaves in the tree. Often this
restriction is tightened to require that the variation in depth is no more than a single
level. See section 3.9.1 for a brief discussion.
Binary tree a tree data structure in which all non-leaf nodes contain exactly two binary
trees.
Byte code a representation of a program which is intermediate between the source code
and machine code. For example, the ocamlc compiler transforms OCaml source code
into a platform-independent byte-code. Section 1.4.2 describes how to compile OCaml
programs into byte code.
Cache an intermediate store used to accelerate the fetching of a subset of data.
Cache hit the quick process of retrieving data which is already in the cache.
Cache miss the slow process of fetching data to fill the cache when a request is made for
data not already in the cache.
Cache coherent accessing data (typically in memory) sequentially, or more sequentially than
random access, in order to minimize cache misses.
Cartesian cross product a set-theoretic form of outer product. For example, the cartesian
cross product of the set A = {a, b} with the set B = {c, d, e} is the set of pairs A × B =
{(a,c),(a,d),(a,e),(b,c),(b,d),(b,e)}.
Class expression definition of values and methods implemented in any object created from
this class.
Class type declaration of values and methods which any object adhering to this class type
must provide.
Compile-time while a program is being compiled.
Compiler a program capable of transforming other programs. For example, the compiler
ocamlopt transforms OCaml source code into executable machine code.
Complexity a quantitative indication of the growth of the computational requirements (such
as time or memory) of an algorithm with respect to its input. Algorithmic complexity
is described in section 3.1.
Cons the :: operator. When used in a pattern, h::t is said to decapitate a list, binding h to
the first element of a list (the head) and t to a list containing the remaining elements
(the tail). When used in an expression, h::t prepends the element h onto the list t.
See sections 3.3 and 9.2 for example uses of the cons operator.
Container a data structure used to store values. The values held in a container are
known as the elements of the container. Arrays, lists and sets are examples of
containers.
Curried function any function which returns a function as its result. See section 1.5.5.
Data structure a scheme for organizing related pieces of information.
Decapitate splitting a list into its first element (the head) and a list containing the remaining
elements (the tail).
Exception a programming construct which allows the flow of execution to be altered by the
raising of an exception. Execution then continues at the most recently defined exception
handler capable of dealing with the exception. See section 1.5.3.
Flat container a non-hierarchical data structure representing a collection of values (ele-
ments). For example, arrays and lists.
Fixed point an int and a (possibly implicit) scaling. Used to represent real-valued numbers
$x \in \mathbb{R}$ approximately, with a constant absolute error.
Float a type which, in OCaml, represents a double-precision IEEE floating-point number.
Floating point a number representation commonly used to approximate real-valued numbers
$x \in \mathbb{R}$. See section 4.1.2.
Folds a higher-order function which applies its function argument to an accumulator and
each element of a container. Introduced on page 36.
Function a mapping from input values to output values which may be described implicitly
as an algorithm or explicitly, e.g. by a pattern match.
Functional programming a style of programming in which a computation is performed by
composing the results of expressions without side-effects.
Functional language any programming language which allows functions to be passed as
arguments to other functions, returned as the result of functions and stored as values in
data structures.
Garbage collection the process of identifying data which are no longer accessible to a run-
ning program, destroying them and reclaiming the resources they required.
Generic programming the use of polymorphic functions and types.
Graph a data structure composed of vertices, and edges which link pairs of vertices.
Hash a constant-sized datum computed from an arbitrarily complicated value.
Hash table a data structure providing fast random access to its elements via their associated
hash values.
Head the element at the front of a list.
Higher-order function any function which accepts another function as an argument. For
example, f is a higher-order function in the definition f(g,x) = g(g(x)) because g must
be a function (as g is applied to x and then to g(x)).
Heterogeneous container a data structure capable of storing several elements of different
types. See section 3.8.
Homogeneous container a data structure (container) capable of storing several elements
of the same type. For example, an array of integers is a homogeneous container because
an array is a data structure containing elements of a single type, in this case integers.
Imperative programming a style of programming in which the result of a computation
is generated by statements which act by way of side-effects, as opposed to functional
programming.
Iteration a homonym with different meanings in different contexts and disciplines. In the
context of numerical algorithms, an "iterative algorithm" means an algorithm designed
to produce an approximate result by progressively converging on the solution. More
generally, the word iterative is often used to describe repetitive algorithms, where a
single repeat is known as an iteration.
Impure functional language a language, such as OCaml, which provides both functional
and imperative programming constructs.
Int a type which exactly represents a contiguous subset of the integers Z. See section 4.1.1.
IO input and output operations, such as printing to the screen or reading from disc.
Leaf in the context of tree data structures, a leaf node is a node containing no remaining
trees.
Lex converting a character stream into a token stream. For example, recognising the keywords
in a language before parsing them.
Linked List see list.
List a flat container providing prepend and decapitation operators in O(1) time-complexity.
In OCaml, these are performed by the :: operator, known as the cons operator. A list
is traversed by repeated decapitation. See section 3.3.
Maps either a container or a higher-order function:
A data structure implementing a container which allows key-value pairs to be
inserted and keys to be subsequently mapped onto their corresponding values. See
sections 3.5 and 3.6.
A higher-order function $\mathrm{map}\ f\ \{l_0, \ldots, l_{n-1}\} \to \{f(l_0), \ldots, f(l_{n-1})\}$ which acts
upon a container of elements to create a new container, the elements of which
are the result of applying the given function $f$ to each element $l_i$ in the given
container. Sometimes known as inner map.
Module a construct which encapsulates definitions in a structure and, optionally, allows the
externally-visible portion of the definitions to be restricted to a subset of the definitions
by way of a signature. See section 2.3.
Module signature a module interface, declaring the types, exceptions, variables and func-
tions which are to be accessible to code outside a module using the signature. See
section 2.3.1.
Module structure the body of a module, containing definitions of the constituent types,
exceptions, variables and functions which make up the module. See section 2.3.2.
Monomorphic a single, possibly not-yet-known, type. See section A.5.
Mutable can be altered.
Native code the result of compiling a program into the machine language (machine code) un-
derstood natively by the CPU. Section 1.4.3 describes how to compile OCaml programs
to native code.
Object-oriented programming the creation of objects, which encapsulate functions and
data, at run-time. In particular, the use of inheritance to specify relationships between
types of object.
Parse the act of understanding something formally. Parsing often refers to the recognition
of grammatical constructs. See section 5.4.2.
Partial specialisation the specialisation of a program or function to part of its input data.
For example, given a function to compute $x^n$ for any given floating-point number $x$ and
integer $n$, generating a function to compute $x^3$ for any floating-point number $x$ is partially
specialising the original function to $n = 3$.
Pattern matching a construct in a programming language which allows patterns to be found
and extracted from data structures.
Persistence the ability to reuse old data structures without having to worry about undoing
state changes and unwanted interactions. An advantage of functional programming.
Platform a CPU architecture (e.g. ARM, MIPS, AMD) and operating system (e.g. IRIX,
Linux, Mac OS X).
Polymorphic one of any type. In particular, polymorphic functions are generic over the
types of at least one of their arguments. Variant types can be generic over polymorphic
type-arguments. See section 1.5.4.
Primitive operation a low-level function or operation, used to formulate the time-complexity
of an algorithm. See section 3.1.1.
Record a tuple with named fields. For example, a record of type:
{ x: float; y: float} can have a value { x=1. ; y=2. }.
Regular Expression a form of pattern matching.
Regexp common abbreviation of regular expression.
Root in the context of data structures, the root is the origin of the data structure, from which
all other portions may be accessed.
Run-time while a program is being executed.
Side-effect any result of an expression apart from the value which the expression returns,
e.g. altering a mutable variable or performing IO.
Signature see Module signature.
Source code the initial, manually-entered form of a program. For example, the source code
to the FFTW library (for computing Fast Fourier Transforms) is written in OCaml.
This OCaml code can be compiled and run to generate C code which can, in turn, be
compiled into a library and linked into a final program.
Static typing completely type checking at compile-time such that no type checking is re-
quired at run-time.
Structure see Module structure.
Tail the remainder of a list without its front element.
Time-complexity complexity of the time taken to execute an algorithm, specified as the
number of times a set of primitive operations are performed.
Top-level an interactive OCaml interpreter started by running the ocaml program. See
section 1.4.1.
Tree a recursive data structure represented by nodes which may contain further trees. The
root node is the node from which all others may be reached. Leaf nodes are those which
contain no further trees. Trees are traversed by examining the child nodes of the current
node recursively.
Tuple a type representing elements in the set of the cartesian cross product of the sets of
types in the tuple. For example, the 2-tuple of floating-point numbers (x, y) of the type
float * float is typically used to represent the set $\mathbb{R} \times \mathbb{R}$.
Type the set of possible values of a variable, function argument or result, or the mapping
between argument and result types of a function.
Variant type explicitly listed sets of possible values. See section 1.5.1.3.
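Several of the terms defined above (cons, accumulator, fold, higher-order function and curried function) can be seen together in a few lines of OCaml. The following is only a minimal sketch: the names sum, sum_fold, twice, power and cube are hypothetical, chosen for illustration, and are not taken from later chapters.

```ocaml
(* Cons in a pattern decapitates a list into its head h and tail t.
   The argument acc is the accumulator holding the running sum. *)
let rec sum acc = function
  | [] -> acc
  | h :: t -> sum (acc + h) t

(* The same sum written with the standard library fold over lists. *)
let sum_fold l = List.fold_left ( + ) 0 l

(* A higher-order function: g must be a function because it is applied. *)
let twice g x = g (g x)

(* A curried function: applying power to n alone returns a function of x. *)
let rec power n x = if n = 0 then 1.0 else x *. power (n - 1) x

(* Applying power to its first argument only yields a cubing function. *)
let cube = power 3

(* Cons in an expression prepends an element onto a list. *)
let l = 1 :: [2; 3]
```

Here sum 0 l and sum_fold l compute the same total, and twice (fun x -> x + 1) applies its function argument two times over.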
Glossary of acronyms
AST Abstract-syntax tree
BNF Backus-Naur form
CAML Categorical abstract machine language
FFT Fast Fourier transform
FFTW Fastest Fourier transform in the west
GOE Gaussian orthogonal ensemble
INRIA Institut National de Recherche en Informatique et en Automatique
IO Input and output
LCF Logic of computable functions
MEM Maximum entropy method
ML Meta-language
OCaml Objective CAML
OO Object-oriented
OOP Object-oriented programming
OpenGL Open graphics library
SGI Silicon Graphics Incorporated
VM Virtual machine
XML Extensible markup language
Chapter 1
Introduction
For the first time in history, and thanks to the exponential growth rate of computing power,
an increasing number of scientists are finding that more time is spent creating, rather than
executing, working programs. Indeed, much effort is spent writing small programs to automate
otherwise tedious forms of analysis. In the future, this imbalance will doubtless be addressed
by the adoption and teaching of more efficient programming techniques at the cost of less
efficient programs. An important step in this direction is the use of higher-level programming
languages, such as OCaml, in place of more conventional languages for scientific programming
such as Fortran, C, C++ and Java.
In this chapter, we shall begin by laying down some guidelines for good programming which
are applicable in any language before briefly reviewing the history of the OCaml language and
outlining some of the features of the language which enforce some of these guidelines and other
features which allow the remaining guidelines to be met. As we shall see, these aspects of the
design of OCaml greatly improve reliability and development speed. Coupled with the fact
that a freely available, efficient compiler already exists for this language, no wonder OCaml is
already being adopted by scientists of all disciplines.
1.1 Good programming style
Regardless of the choice of language, some simple, generic guidelines can be productively
adhered to. We shall now examine the most relevant such guidelines in the context of scientific
computing:
Avoid premature optimisation Programs should be written correctly first and optimised
last.
Structure programs Complicated programs should be hierarchically decomposed into pro-
gressively smaller, constituent components.
Factor programs Complicated or common operations should be factored out into separate
functions.
Explicit interfaces Interfaces should always be made as explicit as possible.
Avoid magic numbers Numeric constants should be defined once and referred back to,
rather than explicitly "hard-coding" their value multiple times at different places in a
program.
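As a brief illustration of the "Factor programs" and "Avoid magic numbers" guidelines, the following sketch defines a constant once and factors a common operation into its own function. The names pi, sqr, circle_area and sphere_volume are hypothetical and serve only as an example.

```ocaml
(* Avoid magic numbers: the constant is defined once and referred back to. *)
let pi = 4.0 *. atan 1.0

(* Factor programs: the common operation of squaring is a separate function. *)
let sqr x = x *. x

(* Both functions reuse pi and sqr instead of repeating literal values. *)
let circle_area r = pi *. sqr r
let sphere_volume r = 4.0 /. 3.0 *. pi *. r *. sqr r
```

Should a different precision or definition of the constant be wanted, only the single definition of pi needs to change.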
We shall now examine some of the ways OCaml can help in enforcing these guidelines and
how the OCaml compiler can exploit well-designed code.
1.2 A Brief History of OCaml
The Meta-Language (ML) was originally developed at Edinburgh University in the 1970s as
a language designed to efficiently represent other languages. The language was pioneered by
Robin Milner for the Logic of Computable Functions (LCF) theorem prover. The original ML,
and its derivatives, were designed to stretch theoretical computer science to the limit, yielding
remarkably robust and concise programming languages which can also be very efficient.
The Categorical Abstract Machine Language (CAML) was the acronym originally used to de-
scribe what is now known as the Caml family of languages. Gerard Huet designed and imple-
mented Caml at Institut National de Recherche en Informatique et en Automatique (INRIA)
in France, until 1994. Since then, development has continued as part of projet Cristal, now
led by Xavier Leroy.
Objective Caml (OCaml¹) is the current flagship language of projet Cristal. The Cristal group
have produced freely available tools for this language, most notably an interpreter which runs
OCaml code in a virtual machine (VM) and two compilers: one which compiles OCaml to a
machine-independent byte-code which can then be executed by a byte-code interpreter, and
another which compiles OCaml directly to native code. At the time of writing, the native-code
compiler is capable of producing code for Alpha, Sparc, x86, MIPS, HPPA, PowerPC, ARM,
ia64 and x86-64 CPUs and the associated run-time environment has been ported to the Linux,
Windows, MacOS X, BSD, Solaris, HPUX, IRIX and Tru64 operating systems.
1.3 Benefits of OCaml
Before delving into the syntax of the language itself, we shall list the main, advantageous
features offered by the OCaml language:
Safety OCaml programs are thoroughly checked at compile-time such that they are proven
to be entirely safe to run, e.g. a compiled OCaml program cannot segfault.
Functional Functions may be nested, passed as arguments to other functions and stored in
data structures as values.
Strongly typed The types of all values are checked during compilation to ensure that they
are well defined and validly used.
¹ Pronounced oh-camel.
Statically typed Any typing errors in a program are picked up at compile-time by the
compiler, instead of at run-time as in many other languages.
Type inference The types of values are automatically inferred during compilation from the
context in which they occur. Therefore, the types of variables and functions in OCaml
code do not need to be specified explicitly, dramatically reducing source code size.
Polymorphism In cases where any of several different types may be valid, any such type
can be used. This greatly simplifies the writing of generic, reusable code.
Pattern matching Values, particularly the contents of data structures, can be matched
against arbitrarily-complicated patterns in order to determine the appropriate action.
Modules Programs can be structured by grouping their data structures and related functions
into modules.
Objects Data structures and related functions can also be grouped into objects (object-
oriented programming).
Separate compilation Source files can be compiled separately into object files which are
then linked together to form an executable. When linking, object files are automatically
type checked and optimized before the final executable is created.
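A few of these features can be sketched in a handful of lines. This is an illustrative example only, using hypothetical names (length, n_ints, n_strings, operations, results) rather than code from later sections.

```ocaml
(* Type inference and pattern matching: no annotations are given, yet the
   compiler infers length : 'a list -> int from the two patterns. *)
let rec length = function
  | [] -> 0
  | _ :: t -> 1 + length t

(* Polymorphism: the same function works on lists of any element type. *)
let n_ints = length [1; 2; 3]
let n_strings = length ["a"; "b"]

(* Functional: functions are values and can be stored in data structures. *)
let operations = [( + ); ( - ); ( * )]
let results = List.map (fun op -> op 6 3) operations

(* Static typing: an application such as length 3 would be rejected by the
   compiler, because 3 is an int and not a list. *)
```

No type is written anywhere in this code, yet every definition is fully type checked at compile-time.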
1.4 Running OCaml programs
OCaml provides three different ways to execute code. We shall now examine each of these
three approaches, explaining how code can be executed using them and noting their relative
advantages and disadvantages.
1.4.1 Top-level
The OCaml top-level interactively interprets OCaml code and is started by running the pro-
gram ocaml:
$ ocaml
Objective Caml version 3.08.0
#
OCaml code may then be entered at this # prompt, the end of which is delimited by ;;. For
example, the following calculates 1 + 3 = 4:
# 1 + 3;;
- : int = 4
#
The top-level will also print the type of the result as well as its value (when the result has a
value). For example, the following defines a variable called sqr which is a function:
# let sqr x = x *. x;;
val sqr : float -> float = <fun>
#
This response indicates that a function called sqr has been defined which accepts a float
and returns a float. In general, the response of the top-level is either of the form:
- : type = value
or consisting of one or more descriptions of the form:
val name : type = value
where - indicates that a value has been returned but was not bound to a variable name, name
is the name of a variable which has been bound, type is the type of the value and value is the
value itself. Values are described explicitly for many data structures, such as 4 in the former
case, but several other kinds of value are simply classified, such as <fun> to indicate that the
value is a function in the latter case².
Programs entered into the top-level execute almost as quickly as byte-code compiled pro-
grams (which is often quite a bit slower than native-code compiled programs). However, the
interactivity of the top-level makes testing the validity of code segments much easier.
In the remainder of this book, we shall write numerous code snippets in this style, as if they
had been entered into the top-level.
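To make the two forms of response concrete, the following sketch lists a few definitions together with the response the top-level would be expected to give for each, written as a comment. The names n and y are hypothetical, and the exact formatting of responses may differ slightly between OCaml versions.

```ocaml
(* A bound integer value is described explicitly. *)
let n = 1 + 3
(* val n : int = 4 *)

(* A function value is only classified, not printed. *)
let sqr x = x *. x
(* val sqr : float -> float = <fun> *)

(* Applying the function gives a float, printed with a trailing point. *)
let y = sqr 3.0
(* val y : float = 9. *)
```

Entering an expression on its own, without let, gives a response of the first form, beginning with a dash in place of val and a name.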
1.4.2 Byte-code compilation
When stored in a plain text file with the suffix ".ml", an OCaml program can be compiled to
a machine-independent byte-code using the ocamlc compiler. For example, for a file "test.ml"
containing the code:
let _ = print_endline "Hello world!"
This file may be compiled at the Unix shell $ prompt into a byte-code executable called "test":
$ ocamlc test.ml -o test
and then executed:
$ ./test
Hello world!
In this case, the result was to print the string "Hello world!" onto the screen. Byte-code
compilation is an adequate way to execute OCaml programs which do not perform intensive
computations. If the time taken to execute a program needs to be reduced then native-code
compilation can be used instead.
² Abstract types are denoted <abstr>, as we shall see in chapter 2.
1.4.3 Native-code compilation
The "test.ml" program could equivalently have been compiled to native code, creating a stand-
alone, native-code executable called "test", using:
$ ocamlopt test.ml -o test
The resulting executable runs in exactly the same way:
$ ./test
Hello world!
Programs compiled to native code, particularly in the context of numerically intensive pro-
grams, can be considerably faster to execute.
1.5 OCaml syntax
Before we consider the features offered by OCaml, a brief overview of the syntax of the
language is instructive, so that we can provide actual code examples later. Other books give
more systematic, thorough and formal introductions to the whole of the OCaml language [1].
1.5.1 Language overview
In this section we shall evolve the notions of values, types, variables, functions, simple contain-
ers (lists and arrays) and program flow control. These notions will then be used to introduce
more advanced features in the later sections of this chapter.
When presented with a block of code, even the most seasoned and fluent programmer may not
be able to infer the purpose of the code. Consequently, programs should contain additional
descriptions written in plain English, known as comments. In OCaml, comments are enclosed
between (* and *). They may be nested, i.e. (* (* ... *) *) is a valid comment. Comments
are treated as whitespace, i.e. a (* ... *) b is understood to mean a b rather than
ab.
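As a small illustration of these rules, the following sketch contains a nested comment and a comment placed between two tokens. The names double and four are our own, chosen only for this example, and the definitions are written without the # prompt so that they may be pasted into the top-level or compiled:

```ocaml
(* double: multiply an integer by two.
   (* A nested comment: the outer comment continues after it. *)
   This line is still inside the outer comment. *)
let double x = 2 * x;;

(* A comment between two tokens acts as whitespace. *)
let four = double (* the argument *) 2;;
```

Here four is bound to the result of applying double to 2, exactly as if the inner comment were a space.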
Just as numbers can be defined to be members of sets such as integer ($\in \mathbb{Z}$), real ($\in \mathbb{R}$),
complex ($\in \mathbb{C}$) and so on, so values in programs are also defined to be members of sets. These
sets are known as types.
1.5.1.1 Types
Fundamentally, languages provide basic types and, often, allow more sophisticated types to
be defined in terms of the basic types. OCaml provides a number of built-in types: unit, int,
float, char, string and bool. We shall examine these built-in types before discussing the
compound tuple, record and variant types.
Only one value is of type unit and this value is written () and, therefore, conveys no information.
This is used to implement functions which require no input or expressions which return
no value. For example, a new line can be printed by calling the print_newline function as
print_newline (). This function requires no input, so it accepts a single argument () of type
unit, and returns the value () of type unit.
Integers are written -2, -1, 0, 1 and 2. Floating-point numbers are written -2., -1., -0.5,
0., 0.5, 1. and 2.. For example:
# 3;;
- : int = 3
# 5.;;
- : float = 5.
Arithmetic can be performed using the conventional +, -, *, /, mod binary infix³ operators over
the integers⁴. For example, the following expression is evaluated according to the usual mathematical
convention regarding operator precedence, with multiplication taking precedence over
addition:
# 1 * 2 + 2 * 3;;
- : int = 8
The floating-point infix functions have slightly different names, suffixed by a full-stop: +., -.,
*., /. as well as ** (raise to the power). For example, the following calculates
$(1 \times 2) + (2 \times 3) = 8$:
# 1. *. 2. +. 2. *. 3.;;
- : float = 8.
The distinct names of the operators for different types arise as the most elegant solution to
allowing the unambiguous inference of types in the presence of different forms of arithmetic.
The definition of new operators is discussed later, in section A.3. In order to perform arithmetic
using mixed types, functions such as float_of_int can be used to convert between types.
Unlike many other languages, OCaml is phenomenally pedantic about types. For example, the
following fails because * denotes the multiplication of a pair of integers and cannot, therefore,
be applied to a value of type float:
# 2 * 2.;;
This expression has type float but is here used with type int
Note that the OCaml top-level underlines the erroneous portion of the code (here, the second operand).
Explicitly converting the value of type float to a value of type int using the built-in function
int_of_float results in a valid expression which the top-level will execute:
# 2 * (int_of_float 2.);;
- : int = 4
3 An infix function is a function which is called with its name and arguments in a non-standard order. For
example, the arguments i and j of the conventional addition operator + appear on either side i +j.
4 As well as the bit-wise binary infix operators lsl, lsr, asr, land, lor and lxor described in the manual [2].
In general, arithmetic is typically performed using a single number representation (e.g. either
int or float) and conversions between representations are, therefore, comparatively rare.
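The following sketch gathers the integer and floating-point operators together with the two conversion functions mentioned above. The names n, x, half_n and truncated are our own, introduced only for illustration:

```ocaml
(* Integer and floating-point arithmetic use distinct operators. *)
let n = 7 / 2;;                        (* integer division truncates: 3 *)
let x = 7. /. 2.;;                     (* floating-point division: 3.5 *)

(* Mixing the two representations requires an explicit conversion. *)
let half_n = float_of_int 7 /. 2.;;    (* 3.5 *)
let truncated = int_of_float 3.9;;     (* truncates towards zero: 3 *)
```

Note that function application binds more tightly than the infix operators, so float_of_int 7 /. 2. is read as (float_of_int 7) /. 2.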
Single characters (of type char) are written in single quotes, e.g. 'a', which may also be
written using a 3-digit decimal code, e.g. '\097'.
Strings are written in double quotes, e.g. "Hello World!". Characters in a string of length
n may be extracted using the notation s.[i] for $i \in \{0, \ldots, n-1\}$. For example, the fifth
character in this string is 'o':
# "Hello world!".[4];;
- : char = 'o'
The character at index i in a string s may be set to c using the notation s.[i] <- c.
A pair of strings may be concatenated using the ^ operator⁵:
# "Hello " ^ "world!";;
- : string = "Hello world!"
Booleans are either true or false. Booleans are created by the usual comparison functions
=, <> (not equal to), <, >, <=, >=. These functions are polymorphic, meaning they may be
applied to pairs of values of the same type for any type⁶. The usual, short-circuit-evaluated⁷
logical comparisons && and || are also present. For example, the following expression tests
that one is less than three and 2.5 is less than 2.7:
# 1 < 3 && 2.5 < 2.7;;
- : bool = true
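The string and boolean operations described above may be combined as in the following sketch. All of the names (greeting, fifth, joined, both and safe) are our own, and the list notation used with String.concat is described in section 1.5.1.4:

```ocaml
let greeting = "Hello " ^ "world!";;
let fifth = greeting.[4];;                           (* 'o' *)

(* String.concat joins a list of strings with the given separator. *)
let joined = String.concat ", " ["a"; "b"; "c"];;    (* "a, b, c" *)

(* The comparison functions are polymorphic: here over int and string. *)
let both = 1 < 3 && "abc" < "abd";;                  (* true *)

(* Short-circuit evaluation: the division by zero is never evaluated. *)
let safe = false && (1 / 0 = 0);;                    (* false *)
```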
Values may be assigned, or bound, to names. As OCaml is a functional language, these values
may be expressions mapping values to values - functions. We shall now examine the binding
of values and expressions to variable and function names.
1.5.1.2 Variables and functions
Variables and functions are both defined using the let construct and must be given names
beginning with lower-case letters⁸. For example, the following defines a variable called a to
have the value 2:
# let a = 2;;
val a : int = 2
5 A list of strings may be concatenated more efficiently than by repeated application of the ^ operator by using
the String.concat function.
6 Any attempt to evaluate a comparison function over a value which has the type of a function raises an
Invalid_argument exception at run-time.
7 Short-circuit evaluation refers to the premature escaping of a sequence of operations (in this case, boolean
comparisons). For example, the expression false && expr need not evaluate expr as the result of the whole
expression is necessarily false due to the preceding false.
8 In particular, names may include the ' character which provides an easy way to denote derivative functions,
as we shall see at the end of this chapter.
Note that the language automatically infers types. In this case, a has been inferred to be of
type int.
Definitions using let can be made locally using the syntax:
let name = expr1 in expr2
This evaluates expr1 and binds the result to the variable name before evaluating expr2. For
example, the following evaluates $a^2$ in the context $a = 3$, giving 9:
# let a = 3 in a * a;;
- : int = 9
Note that the value 3 bound to the variable a in this example was local to the expression a *
a and, therefore, the global definition of a is still 2:
# a;;
- : int = 2
More recent definitions shadow previous definitions. For example, the following supersedes
the definition $a = 2$ with $a = a \times a$ in order to calculate $2 \times 2 \times 2 \times 2 = 16$:
# let a = 2 in
  let a = a * a in
  a * a;;
- : int = 16
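The difference between local and global definitions can be seen in the following sketch, in which the names distance_squared, dx, dy, b and c are our own, chosen for illustration:

```ocaml
(* Local definitions: dx and dy exist only inside this expression. *)
let distance_squared =
  let dx = 3 - 1 in
  let dy = 6 - 2 in
  dx * dx + dy * dy;;               (* 2*2 + 4*4 = 20 *)

(* Shadowing: the inner a hides the outer a within its own scope only. *)
let a = 2;;
let b = (let a = 10 in a + 1);;     (* 11 *)
let c = a + 1;;                     (* 3: the global a is unchanged *)
```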
As OCaml is a functional language, values can be functions and variables can be bound to
them in exactly the same way as we have just seen. Specifically, function definitions append
a list of arguments between the name of the function and the = in the let construct. For
example, a function called sqr which accepts an argument called x and returns x * x may be
defined as:
# let sqr x = x * x;;
val sqr : int -> int = <fun>
In this case, the use of the integer multiply * results in OCaml correctly inferring the type of
sqr to be int -> int, i.e. the sqr function accepts a value of type int and returns a value of
type int.
The function sqr may then be applied to an integer as:
# sqr 5;;
- : int = 25
Typically, more sophisticated computations require the use of more complicated types. We
shall now examine the three simplest ways by which more complicated types may be con-
structed.
1.5.1.3 Tuples, records and variants
Tuples are the simplest form of compound types, containing a fixed number of values which
may be of different types. The type of a tuple is written analogously to conventional set-
theoretic style, using * to denote the cartesian product between the sets of possible values
for each type. For example, a tuple of three integers, conventionally denoted by the triple
$(i, j, k) \in \mathbb{Z} \times \mathbb{Z} \times \mathbb{Z}$, can be represented by values (i, j, k) of the type int * int * int.
When written, tuple values are comma-separated and enclosed in parentheses. For example,
the following tuple contains three different values of type int:
# (1, 2, 3);;
- : int * int * int = (1, 2, 3)
A tuple containing n values is described as an n-tuple, e.g. the tuple (1, 2, 3) is a 3-tuple.
Records are essentially tuples with named components, known as fields. Records and, in
particular, the names of their fields must be defined using a type construct before they can
be used. When written, record fields are written name : type where name is the name of the
field (which must start with a lower-case letter) and type is the type of values in that field,
and are semicolon-separated and enclosed in curly braces. For example, a record containing
the x and y components of a 2D vector could be defined as:
# type vec2 = { x:float; y:float };;
type vec2 = { x : float; y : float; }
A value of this type representing the zero vector can then be defined using:
# let zero = { x=0.; y=0. };;
val zero : vec2 = {x = 0.; y = 0.}
Note that the use of a record with fields x and y allowed OCaml to infer the type of zero as
vec2.
Whereas tuples are order-dependent, i.e. $(1, 2) \neq (2, 1)$, the named fields of a record may
appear in any order, i.e. $\{x = 1, y = 2\} \equiv \{y = 2, x = 1\}$. Thus, we could, equivalently, have
provided the x and y fields in reverse order:
# let zero = { y=0.; x=0. };;
val zero : vec2 = {x = 0.; y = 0.}
The fields in this record can be extracted individually using the notation record.field where
record is the name of the record and field is the name of the field within that record. For
example, the x field in the variable zero is 0:
# zero.x;;
- : float = 0.
Also, a shorthand with notation exists for the creation of a new record from an existing record
with a single field replaced. This is particularly useful when records contain many fields. For
example, the record {x=1.; y=0.} may be obtained by replacing the field x in the variable
zero with 1:
# let x_axis = { zero with x=1. };;
val x_axis : vec2 = {x = 1.; y = 0.}
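Records become more useful once functions are defined over them. As a minimal sketch, the following defines two helper functions over the vec2 type. The functions add and dot, and the values y_axis and diagonal, are hypothetical names of our own rather than part of any library:

```ocaml
type vec2 = { x : float; y : float };;

(* Hypothetical helpers: component-wise sum and dot product. *)
let add a b = { x = a.x +. b.x; y = a.y +. b.y };;
let dot a b = a.x *. b.x +. a.y *. b.y;;

let zero = { x = 0.; y = 0. };;
let x_axis = { zero with x = 1. };;
let y_axis = { zero with y = 1. };;
let diagonal = add x_axis y_axis;;      (* {x = 1.; y = 1.} *)
```

The field accesses a.x and b.y are sufficient for OCaml to infer that add and dot accept values of type vec2.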
Although OCaml is a functional language, OCaml does support imperative programming.
Fundamentally, record fields can be marked as mutable, in which case their value may be
changed. For example, the type of a mutable, two-dimensional vector called vec2 may be
defined as:
# type vec2 = { mutable x : float; mutable y : float };;
type vec2 = { mutable x : float; mutable y : float; }
A value r of this type may be defined:
# let r = { x=1.; y=2. };;
val r : vec2 = {x = 1.; y = 2.}
The x-coordinate of the vector r may be altered in-place using an imperative style:
# r.x <- 3.;;
- : unit = ()
The side-effect of this expression has mutated the value of the variable r, the x-coordinate of
which is now 3 instead of 1:
# r;;
- : vec2 = {x = 3.; y = 2.}
A record with a single, mutable field can often be useful. This data structure, called
a reference, is already provided by the type ref. For example, the following defines a variable
named a which is a reference to the integer 2:
# let a = ref 2;;
val a : int ref = {contents = 2}
The type of a is then int ref. The value referred to by a may be obtained using !a:
# !a;;
- : int = 2
The value of a may be set using :=:
# a := 3;;
- : unit = ()
# a;;
- : int ref = {contents = 3}
In the case of references to integers, two additional functions are provided, incr and decr,
which increment and decrement references to integers, respectively:
# incr a;;
- : unit = ()
# a;;
- : int ref = {contents = 4}
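The operations on references may be combined to accumulate a result in an imperative style. In the following sketch, the names total and result are our own:

```ocaml
(* A running total held in a reference. *)
let total = ref 0;;

total := !total + 5;;     (* total now refers to 5 *)
total := !total + 7;;     (* total now refers to 12 *)
incr total;;              (* total now refers to 13 *)
decr total;;              (* total now refers to 12 again *)

let result = !total;;     (* 12 *)
```

Note that result is an ordinary int: later changes to total do not affect it.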
The types of values stored in tuples and records are defined at compile-time. OCaml com-
pletely verifies the correct use of these types at compile-time. However, this is too restrictive
in many circumstances. These requirements can be slightly relaxed by allowing a type to
be defined which can acquire one of several possible types at run-time. These are known as
variant types. OCaml still verifies the correct use of variant types as far as is theoretically
possible.
Variant types are defined using the type construct with the possible constituent types referred
to by constructors (the names of which must begin with upper-case letters) separated by the
| character. For example, a variant type named button which may adopt the values On or
Off may be written:
# type button = On | Off;;
type button = On | Off
The constructors On and Off may then be used as values of type button:
# On;;
- : button = On
# Off;;
- : button = Off
In this case, the constructors On and Off convey no information in themselves (i.e. like the
type unit, On and Off do not carry data) but the choice of On or Off does convey information.
Note that both expressions were correctly inferred to have results of type button.
More usefully, constructors may take arguments, allowing them to convey information by
carrying data. The arguments are defined using of and are written in the same form as
that of a tuple. For example, a replacement button type which provides an On constructor
accepting two arguments may be written:
# type button = On of int * string | Off;;
type button = On of int * string | Off
The On constructor may then be used to create values of type button by appending the
argument in the style of a tuple:
# On (1, "mine");;
- : button = On (1, "mine")
# On (2, "hers");;
- : button = On (2, "hers")
# Off;;
- : button = Off
Types can also be defined recursively, which is very useful when defining more sophisticated
data structures, such as trees. For example, a binary tree contains either zero or two binary
trees and can be defined as:
# type binary_tree = Leaf | Node of binary_tree * binary_tree;;
type binary_tree = Leaf | Node of binary_tree * binary_tree
A value of type binary_tree may be written in terms of these constructors:
# Node (Node (Leaf, Leaf), Leaf);;
- : binary_tree = Node (Node (Leaf, Leaf), Leaf)
Of course, we could also place data in the nodes to make a more useful data structure. This
line of thinking will be pursued in chapter 3. In the mean time, let us consider two special
data structures which have notations built into the language.
1.5.1.4 Lists and arrays
Lists are written [a; b; c] and arrays are written [|a; b; c|]. As we shall see in chapter 3,
lists and arrays have different merits.
The types of lists and arrays of integers, for example, are written int list and int array,
respectively:
# [1; 2; 3];;
- : int list = [1; 2; 3]
# [|1; 2; 3|];;
- : int array = [|1; 2; 3|]
In the case of lists, the infix cons operator :: provides a simple way to prepend an element
to the front of a list. For example, prepending 1 onto the list [2; 3] gives the list [1; 2; 3]:
# 1 :: [2; 3];;
- : int list = [1; 2; 3]
In the case of arrays, the notation array.(i) may be used to extract the $(i+1)$th element. For
example, [|1; 2; 3|].(1) gives the second element 2:
# [|1; 2; 3|].(1);;
- : int = 2
Also, a short-hand notation can be used to represent lists or arrays of tuples by omitting the
parentheses. For example, [(a, b); (c, d)] may be written [a, b; c, d]:
# [1, 2; 3, 4];;
- : (int * int) list = [(1, 2); (3, 4)]
# [|1, 2; 3, 4|];;
- : (int * int) array = [|(1, 2); (3, 4)|]
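These notations may be combined freely, as in the following sketch. The names xs, arr, second and pairs are our own, and the List.length function used in the last line is discussed in section 1.5.1.8:

```ocaml
(* Prepending with the cons operator. *)
let xs = 0 :: [1; 2; 3];;                (* [0; 1; 2; 3] *)

(* Arrays are indexed from zero. *)
let arr = [|10; 20; 30|];;
let second = arr.(1);;                   (* 20 *)

(* A list of 2-tuples written without parentheses. *)
let pairs = [1, "one"; 2, "two"];;       (* (int * string) list *)
let count = List.length pairs;;          (* 2 *)
```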
The use and properties of lists, arrays and several other data structures will be discussed in
chapter 3. In the mean time, we shall examine programming constructs which allow more
interesting computations to be performed.
1.5.1.5 If
Like many other programming languages, OCaml provides an if construct which allows a
boolean "predicate" expression to determine which of two expressions is evaluated and re-
turned, as well as a special if construct which optionally evaluates an expression of type
unit:
if expr1 then expr2
if expr1 then expr2 else expr3
In both cases, expr1 must evaluate to a value of type bool. In the former case, expr2 is
expected to evaluate to the value of type unit. In the latter case, both expr2 and expr3 must
evaluate to values of the same type.
The former evaluates the boolean expression expr1 and, only if the result is true, evaluates
the expression expr2. Thus, the former is equivalent to:
if expr1 then expr2 else ()
The latter similarly evaluates expr1 but returns the result of either expr2, if expr1 evaluated
to true, or of expr3 otherwise.
For example, the following function prints "Less than three" if the given argument is less than
three:
# let f x = if x < 3 then print_endline "Less than three";;
val f : int -> unit = <fun>
# f 1;;
Less than three
- : unit = ()
# f 5;;
- : unit = ()
The following function returns the string "Less" if the argument is less than 3 and "Greater"
otherwise:
# let f x = if x < 3 then "Less" else "Greater";;
val f : int -> string = <fun>
# f 1;;
- : string = "Less"
# f 5;;
- : string = "Greater"
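Because if ... then ... else ... is itself an expression, it may appear wherever a value is expected, and tests may be chained. The following sketch illustrates this with two functions of our own devising, abs_int and clamp, which are not built-in functions:

```ocaml
(* Both branches must have the same type, here int. *)
let abs_int x = if x < 0 then -x else x;;

(* Chained tests: restrict x to the range lo to hi. *)
let clamp lo hi x =
  if x < lo then lo
  else if x > hi then hi
  else x;;
```

For instance, clamp 0 10 15 evaluates to 10, as 15 exceeds the upper bound.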
The parts of the language we have covered can already be used to write some interesting
programs. However, attention should be paid to the way in which programs are constructed
from these parts.
1.5.1.6 Program composition
As we have seen, program segments may be written in the top-level which replies by reciting
the automatically inferred types and executing expressions. However, the ;; used to force
the top-level into producing output is not necessary in programs compiled with ocamlc and
ocamlopt. For example, the two previous functions can be defined simultaneously, with only
a single ;; at the end:
# let f1 x = if x < 3 then print_endline "Less than three"
  let f2 x = if x < 3 then "Less" else "Greater";;
val f1 : int -> unit = <fun>
val f2 : int -> string = <fun>
Note that OCaml has determined that this input corresponds to two separate function defini-
tions. In fact, when written for the ocamlc or ocamlopt compilers, programs can be written
entirely without ;;, such as:
let f1 x = if x < 3 then print_endline "Less than three"
let f2 x = if x < 3 then "Less" else "Greater"
As we have seen, expressions which act by way of a side-effect (such as printing) produce the
value () of type unit. Many situations require a sequence of such expressions to be evaluated.
Expressions of type unit may be concatenated into a compound expression by using the ;
separator. For example, a function to print "A", "B" and then "C" on three separate lines could
be written:
# let f () =
    print_endline "A";
    print_endline "B";
    print_endline "C";;
val f : unit -> unit = <fun>
Note that there is no final ;, only the delimiting ;;, so the value () of type unit produced
by the final call to print_endline is returned by our f function.
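The final expression in such a sequence need not be of type unit: its value becomes the value of the whole sequence. As a sketch, the following function, whose name report_and_square is our own, prints a labelled integer and then returns its square:

```ocaml
(* Hypothetical helper: print "label = x" on a line, then return x * x. *)
let report_and_square label x =
  print_string label;
  print_string " = ";
  print_int x;
  print_newline ();
  x * x;;

let sixteen = report_and_square "x" 4;;
```

The four printing expressions each produce () and are evaluated in order, after which the result of x * x is returned.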
1.5.1.7 More about functions
Functions can also be defined anonymously, known as λ-abstraction in computer science.
For example, the following defines a function $f(x) = x \times x$ which has a type representing⁹
$f : \mathbb{Z} \to \mathbb{Z}$:
# fun x -> x * x;;
This is an anonymous equivalent to the sqr function defined earlier. The type of this expression
is also inferred to be int -> int. This anonymous function can be called as if it were the
name of a conventional function. For example, applying the function f to the value 2 gives
$2 \times 2 = 4$:
# (fun x -> x * x) 2;;
- : int = 4
9 We say "representing" because the OCaml type int is, in fact, a finite subset of $\mathbb{Z}$, as we shall see in
chapter 4.
Consequently, we could have defined sqr equivalently as:
# let sqr = fun x -> x * x;;
val sqr : int -> int = <fun>
Once defined, this version of the sqr function is indistinguishable from the original.
The let ... in construct allows definitions to be nested, including function definitions. For
example, the following function ipow3 raises a given int to the power three using a sqr
function nested within the body of the ipow3 function:
# let ipow3 x =
let sqr x = x * x in
x * sqr x;;
val ipow3 : int -> int = <fun>
Note that the function application sqr x takes precedence over the multiplication.
The let construct may also be used to define the elements of a tuple simultaneously. For
example, the following defines two variables, a and b, simultaneously:
# let (a, b) = (3,4);;
val a : int = 3
val b : int = 4
This is particularly useful when factoring code. For example, the following definition of the
ipow4 function contains an implementation of the sqr function which is identical to that in
our previous definition of the ipow3 function:
# let ipow4 x =
    let sqr x = x * x in
    (sqr x) * (sqr x);;
Just as common subexpressions in a mathematical expression can be factored, so the ipow3
and ipow4 functions can be factored by sharing a common sqr function and returning the
ipow3 and ipow4 functions simultaneously in a 2-tuple:
# let (ipow3, ipow4) =
    let sqr x = x * x in
    ((fun x -> x * (sqr x)), fun x -> (sqr x) * (sqr x));;
val ipow3 : int -> int = <fun>
val ipow4 : int -> int = <fun>
Factoring code is an important way to keep programs manageable. In particular, programs
can be factored much more aggressively in the presence of higher-order functions - something
which can be done in OCaml but not Java, C++ or Fortran. We shall discuss such factoring
of OCaml programs as a means of code structuring in chapter 2. In the mean time, we shall
examine functions which perform computations by applying themselves.
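To give a flavour of factoring with higher-order functions, the following sketch defines a function which accepts another function as its argument. The names apply_twice and add_six are hypothetical, chosen for this example only:

```ocaml
(* apply_twice takes a function f and applies it twice to x. *)
let apply_twice f x = f (f x);;

let sqr x = x * x;;

(* ipow4 factored in terms of sqr: (x * x) * (x * x). *)
let ipow4 x = apply_twice sqr x;;

(* The argument may also be an anonymous function. *)
let add_six x = apply_twice (fun y -> y + 3) x;;
```

The same apply_twice function serves both definitions, so the pattern of applying a function twice is written only once.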
As we have already seen, variable names in variable and function definitions refer to their
previously defined values. This default behaviour can be overridden using the rec keyword,
which allows a variable definition to refer to itself. This is necessary to define a recursive
function¹⁰. For example, the following implementation of the ipow function, which computes
$n^m$ for $n, m \geq 0 \in \mathbb{Z}$, calls itself recursively with smaller $m$ to build up the result until the
base-case $n^m = 1$ for $m = 0$ is reached:
# let rec ipow n m = if m = 0 then 1 else n * ipow n (m - 1);;
val ipow : int -> int -> int = <fun>
For example, $2^{16} = 65{,}536$:
# ipow 2 16;;
- : int = 65536
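The same recursive style applies to many other computations. The following sketch shows two further recursive functions of our own: factorial, and ipow_fast, an alternative to ipow which uses exponentiation by squaring (a different technique from the one above) so that the number of multiplications grows with $\log m$ rather than $m$. Both assume non-negative arguments:

```ocaml
(* n! for n >= 0, with base-case 0! = 1. *)
let rec factorial n = if n = 0 then 1 else n * factorial (n - 1);;

(* n^m for m >= 0 by repeated squaring: n^m = (n^(m/2))^2, times n if m is odd. *)
let rec ipow_fast n m =
  if m = 0 then 1
  else
    let half = ipow_fast n (m / 2) in
    if m mod 2 = 0 then half * half else n * half * half;;
```

For example, ipow_fast 2 16 makes only six recursive calls, whereas ipow 2 16 makes seventeen.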
All of the programming constructs we have just introduced may be structured into modules.
1.5.1.8 Modules
In OCaml, modules are the most commonly used means to encapsulate related definitions. For
example, many function definitions pertaining to lists are encapsulated in the List module.
Visible definitions in modules may be referred to by the notation module.name where module
is the name of the module and name is the name of the type or variable definition. For
example, the List module contains a function length which returns the number of elements
in the given list:
# List.length ["one"; "two"; "three"];;
- : int = 3
The Pervasives module contains many common definitions, such as sqrt, and is automati-
cally opened before a program starts so these definitions are available immediately.
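As a brief sketch of the module.name notation, the following uses a few functions from the standard List and String modules, together with sqrt, which requires no module prefix. The variable names are our own:

```ocaml
let words = ["one"; "two"; "three"];;

let count = List.length words;;                  (* 3 *)
let lengths = List.map String.length words;;     (* [3; 3; 5] *)
let backwards = List.rev words;;                 (* ["three"; "two"; "one"] *)

(* sqrt is available without a module prefix. *)
let root = sqrt 16.;;                            (* 4. *)
```

Here List.map applies the function String.length to each element of the list in turn.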
The OCaml module system and program structuring in general are examined in chapter 2.
We shall now examine some of the more advanced features of OCaml in more detail.
1.5.2 Pattern matching
As a program is executed, it is quite often necessary to choose the future course of action based
upon the value of a previously computed result. As we have already seen, a two-way choice can
be implemented using the if construct. However, the ability to choose from several different
possible actions is often desirable. Although such cases can be reduced to a series of if tests,
languages typically provide a more general construct to compare a result with several different
possibilities more succinctly, more clearly and sometimes more efficiently than manually nested
ifs. In Fortran, this is the SELECT CASE construct. In C and C++, it is the switch case
construct.
10 A recursive function is a function which calls itself, possibly via other functions.
Unlike conventional languages, OCaml allows the value of a previous result to be compared
against various patterns - pattern matching. As we shall see, this approach is considerably
more powerful than the conventional approaches.
The most common pattern matching construct in OCaml is the match ... with ... expression:
match expr with
  pattern1 -> expr1
| pattern2 -> expr2
| pattern3 -> expr3
This evaluates expr and compares the resulting value firstly with pattern1 then with pat-
tern2 and so on, until a pattern is found which matches the value of expr, in which case
the corresponding expression (e.g. expr2) is evaluated and returned. A pattern is an expres-
sion composed of constants and variable names. When a pattern matches an argument, the
variables are bound to values of the corresponding expressions.
Patterns may contain arbitrary data structures (tuples, records, variant types, lists and arrays)
and, in particular, the cons operator :: may be used in a pattern to decapitate a list. Also,
the pattern _ matches any value without assigning a name to it. This is useful for clarifying
that part of a pattern is not referred to in the corresponding expression.
For example, the following function compares its argument with several possible patterns of
type int, returning the expression of type string corresponding to the pattern which matches:
# let f i = match i with
    0 -> "Zero"
  | 3 -> "Three"
  | _ -> "Neither zero nor three";;
Applying this function to some expressions of type int demonstrates the functionality of the
match construct:
# f 0;;
- : string = "Zero"
# f 1;;
- : string = "Neither zero nor three"
# f (1 + 2);;
- : string = "Three"
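Patterns over data structures bind variables to the parts of the value being matched. The following sketch shows patterns over a list, a tuple and the recursive binary_tree type from section 1.5.1.3. The function names sum, first_of_three and leaves are our own:

```ocaml
(* Cons pattern: h is bound to the head of the list and t to the tail. *)
let rec sum l = match l with
    [] -> 0
  | h :: t -> h + sum t;;

(* Tuple pattern: the anonymous pattern _ ignores two components. *)
let first_of_three t = match t with
    (a, _, _) -> a;;

(* Constructor patterns over a recursive variant type. *)
type binary_tree = Leaf | Node of binary_tree * binary_tree;;

let rec leaves t = match t with
    Leaf -> 1
  | Node (l, r) -> leaves l + leaves r;;
```

For example, leaves (Node (Node (Leaf, Leaf), Leaf)) counts the three Leaf constructors in that tree.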
As pattern matching is such a fundamental concept in OCaml programming, we shall provide
several more examples using pattern matching in this section.
A function is_empty_list which examines a given list and returns true if the list is empty
and false if the list contains any elements, may be written without pattern matching by
simply testing equality with the empty list:
# let is_empty_list l =
    l = [];;
val is_empty_list : 'a list -> bool = <fun>
Using pattern matching, this example may be written using the match ... with ... construct
as:
# let is_empty_list l = match l with
    [] -> true
  | _ -> false;;
Note the use of the anonymous _ pattern to match any value, in this case accounting for all
other possibilities.
The is_empty_list function can also be written using the function ... construct, used to
create one-argument λ-functions which are pattern matched over their argument:
# let is_empty_list = function
    [] -> true
  | _ -> false;;
In general, functions which pattern match over their last argument may be rewritten more
succinctly using function. Let us now consider some additional sophistication supported by
OCaml's pattern matching.
1.5.2.1 Guarded patterns
Patterns can also have arbitrary tests associated with them, written using the when construct.
Such patterns are referred to as guarded patterns and are only allowed to match when the
associated boolean expression evaluates to true. For example, the following function evaluates
to true only for lists which contain three integers, i, j and k, satisfying the equality $i - j - k = 0$:
# let f = function
    [i; j; k] when i - j - k = 0 -> true
  | _ -> false;;
val f : int list -> bool = <fun>
# f [2; 3];;
- : bool = false
# f [5; 2; 3];;
- : bool = true
# f [1; 2; 3];;
- : bool = false
Subsequent patterns sharing the same variable bindings and corresponding expression may be
written in the short-hand notation:
match ... with
  pattern1 | pattern2 | ... -> ...
| ...
For example, the following function returns true if the given integer is in the set $\{-1, 0, 1\}$
and false otherwise:
# let is_sign = function
    -1 | 0 | 1 -> true
  | _ -> false;;
val is_sign : int -> bool = <fun>
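Guards and the short-hand | notation may be mixed within a single pattern match. As a sketch, the following function, whose name classify and whose categories are our own invention, uses a constant pattern, a set of alternatives, a guarded pattern and a final general pattern:

```ocaml
let classify = function
    0 -> "zero"
  | 1 | 2 | 3 -> "small"
  | n when n < 0 -> "negative"
  | _ -> "large";;
```

The patterns are tried in order, so the final _ is reached only by integers greater than three.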
The sophistication provided by pattern matching may be misused. Fortunately, the OCaml
compilers go to great lengths to enforce correct use, even brashly criticising the programmer's
style when appropriate.
1.5.2.2 Erroneous patterns
Sequences of patterns which match to the same corresponding expression are required to share
the same set of variable bindings. For example, although the following function makes sense to
a human, the OCaml compilers object to the patterns (a, 0.) and (0., b) binding different
sets of variables ({a} and {b}, respectively):
# let product a b = match (a, b) with
    (a, 0.) | (0., b) -> 0.
  | (a, b) -> a *. b;;
Variable a must occur on both sides of this | pattern
In this case, this function could be corrected by using the anonymous _ pattern as neither a
nor b is used in the first case:
# let product a b = match (a, b) with
    (_, 0.) | (0., _) -> 0.
  | (a, b) -> a *. b;;
val product : float -> float -> float = <fun>
This actually conveys useful information about the code. Specifically, that the values matched
by _ are not used in the corresponding expression.
OCaml uses type information to determine the possible values of the expression being matched
over. If the set of pattern matches fails to cover all of the possible values of the input then,
at compile-time, the compiler emits:
Warning: this pattern-matching is not exhaustive
followed by examples of values which could not be matched. If a program containing such pattern
matches is executed and no matching pattern is found at run-time then the Match_failure
exception is raised. Exceptions will be discussed in section 1.5.3.
For example, in the context of the variant type:
# type int_option = None | Some of int;;
type int_option = None | Some of int
The OCaml compiler will warn of a function matching only Some ... values and neglecting
the None value:
# let extract = function Some i -> i;;
Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
None
val extract : int_option -> int = <fun>
This extract function then works as expected on Some ... values:
# extract (Some 3);;
- : int = 3
but causes a Match_failure exception to be raised at run-time if a None value is given, as
none of the patterns in the pattern match of the extract function match this value:
# extract None;;
Exception: Match_failure ("", 5, -40).
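One way to avoid both the warning and the run-time exception is to handle the None case explicitly. The following is a minimal sketch, assuming that a caller-supplied default value is an acceptable result for None. The function name extract_or is hypothetical:

```ocaml
type int_option = None | Some of int;;

(* Hypothetical total variant of extract: every constructor is matched. *)
let extract_or default = function
    Some i -> i
  | None -> default;;
```

As both constructors of int_option appear in the pattern match, the compiler can verify that the match is exhaustive, and extract_or 0 None simply evaluates to 0.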
As some approaches to pattern matching lead to more robust programs, some notions of good
and bad programming styles arise in the context of pattern matching.
1.5.2.3 Good style
The compiler cannot prove¹¹ that any given pattern match covers all eventualities in the
general case. Thus, some style guidelines may be productively adhered to when writing pattern
matches, to aid the compiler in its proofs:
• Guarded patterns should be used only when necessary. In particular, in any given
  pattern matching, at least one pattern should be unguarded.
• Unless all eventualities are clearly covered (such as [] and h::t which, between them,
  match any list) the last pattern should be general.
As proof generation cannot be automated in general, the OCaml compilers do not try to prove
that a sequence of guarded patterns will match all possible inputs. Instead, the programmer
is expected to adhere to a good programming style, making the breadth of the final match
explicit by removing the guard. For example, the OCaml compilers do not prove that the
following pattern match covers all possibilities:
# let sign = function
    i when i < 0. -> -1
  | 0. -> 0
  | i when i > 0. -> 1;;
Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
1.
(However, some guarded clause may match this value.)
val sign : float -> int = <fun>
11 Indeed, it can be proven that the act of proving cannot be automated in the general case.
In this case, the function should have been written without the guard on the last pattern:
# let sign = function
    i when i < 0. -> -1
  | 0. -> 0
  | _ -> 1;;
val sign : float -> int = <fun>
Also, the OCaml compilers will try to determine any patterns which can never be matched.
If such a pattern is found, the compiler will emit a warning. For example, in this case the
first match accounts for all possible input values and, therefore, the second match will never
be used:
# let product a b = match (a, b) with
(a, b) -> a *. b
  |
Warning: this match case is unused.
val product : float -> float -> float = <fun>
When matching over the constructors of a type, all eventualities should be caught explicitly,
i.e. the final pattern should not be made completely general. For example, in the context of
a type which can represent different number representations:
# type number = Integer of int | Real of float;;
type number = Integer of int | Real of float
A function to test for equality with zero could be written in the following, poor style:
# let bad_is_zero = function
    Integer 0 -> true
  | Real 0. -> true
  | _ -> false;;
val bad_is_zero : number -> bool = <fun>
When applied to various values of type number, this function correctly acts as a predicate to test
for equality with zero:
# bad_is_zero (Integer (-1));;
- : bool = false
# bad_is_zero (Integer 0);;
- : bool = true
# bad_is_zero (Real 0.);;
- : bool = true
# bad_is_zero (Real 2.6);;
- : bool = false
Although the bad_is_zero function works in this case, a better style would be to extract the
numerical values from the constructors and test their equality with zero, avoiding the final
catch-all case in the pattern match:
# let good_is_zero = function
    Integer i -> i = 0
  | Real x -> x = 0.;;
val good_is_zero : number -> bool = <fun>
Not only is the latter style more concise but, more importantly, this style is more robust. For
example, if whilst developing our program, we were to supplement the definition of our number
type with a new representation, say of the complex numbers $z = x + iy \in \mathbb{C}$:
# type number = Integer of int | Real of float | Complex of float * float;;
type number = Integer of int | Real of float | Complex of float * float
the bad_is_zero function, which is written in the poor style, would compile without warning
to give incorrect functionality:
# let bad_is_zero = function
    Integer 0 -> true
  | Real 0. -> true
  | _ -> false;;
val bad_is_zero : number -> bool = <fun>
Specifically, this function treats all values which are not zero-integers and zero-reals as being
non-zero. Thus, zero-complex $z = 0 + 0i$ is incorrectly deemed to be non-zero:
# bad_is_zero (Complex (0., 0.));;
- : bool = false
In contrast, the good_is_zero function, which was written using the good style, would allow
the compiler to spot that part of the number type was no longer being accounted for in the
pattern match:
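# let good_is_zero = function
    Integer i -> i = 0
  | Real x -> x = 0.;;
Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
Complex (_, _)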
The programmer could then supplement this function with a case for complex numbers:
# let good_is_zero = function
    Integer i -> i = 0
  | Real x -> x = 0.
  | Complex (x, y) -> x = 0. && y = 0.;;
The resulting function would then provide the correct functionality:
# good_is_zero (Complex (0., 0.));;
- : bool = true
Clearly, the ability to have such safety checks performed at compile-time can be very valuable.
This is another important aspect of safety provided by the OCaml language, which results in
considerably more robust programs.
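When several constructors share the same result, this check can be retained by listing those constructors explicitly in an or-pattern instead of falling back on a catch-all. The following is a minimal sketch; the predicate is_integer is a hypothetical name introduced only for this illustration:

```ocaml
type number = Integer of int | Real of float | Complex of float * float

(* Hypothetical predicate: does the value use the integer representation?
   Every constructor is named, so adding a new constructor to the type
   would make the compiler report this match as non-exhaustive. *)
let is_integer = function
  | Integer _ -> true
  | Real _ | Complex _ -> false
```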
Due to the ubiquity of pattern matching in OCaml programs, the number and structure
of pattern matches can be non-trivial. In particular, patterns may be nested and may be
performed in parallel.
1.5.2.4 Nested patterns
In some cases, nested pattern matches may be desirable. Inner pattern matches may be
bundled either into parentheses (...) or, equivalently, into a begin ... end construct. When
split across multiple lines, the begin ... end construct is the conventional choice. For example,
the following function tests equality between two values of type number [12]:
# let number_equal a b = match a with
    Integer i ->
      begin
        match b with
          Integer j when i = j -> true
        | Complex (x, 0.) | Real x when x = float_of_int i -> true
        | Integer _ | Real _ | Complex _ -> false
      end
  | Real x | Complex (x, 0.) ->
      begin
        match b with
          Integer i when x = float_of_int i -> true
        | Complex (y, 0.) | Real y when x = y -> true
        | Integer _ | Real _ | Complex _ -> false
      end
  | Complex (x1, y1) ->
      begin
        match b with
          Complex (x2, y2) when x1 = x2 && y1 = y2 -> true
        | Integer _ | Real _ | Complex _ -> false
      end;;
val number_equal : number -> number -> bool = <fun>
In many cases, nested patterns may be written more succinctly and, in fact, more efficiently,
when presented as a single pattern match which matches different values simultaneously.
1.5.2.5 Parallel pattern matching
In many cases, nested pattern matches may be combined into a single pattern match. This
functionality is often obtained by combining variables into a tuple which is then matched over.
This is known as parallel pattern matching. For example, the previous function could have
been written:
[12] Note that the built-in, polymorphic equality = could be used to compare values of type number but this
would perform a structural comparison rather than a numerical comparison, e.g. the expression Real 1. =
Complex (1., 0.) evaluates to false.
# let number_equal a b = match (a, b) with
    (Integer i, Integer j) -> i = j
  | (Integer i, (Real x | Complex (x, 0.)))
  | ((Real x | Complex (x, 0.)), Integer i) -> x = float_of_int i
  | ((Real x1 | Complex (x1, 0.)), (Complex (x2, 0.) | Real x2)) -> x1 = x2
  | ((Integer _ | Real _), Complex _)
  | (Complex _, (Integer _ | Real _)) -> false
  | (Complex (x1, y1), Complex (x2, y2)) -> x1 = x2 && y1 = y2;;
val number_equal : number -> number -> bool = <fun>
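Parallel pattern matching is equally useful when walking two data structures at once. As a minimal sketch, the hypothetical function add_lists below (not part of any library) adds two vectors stored as lists by matching over the pair of lists, and uses the built-in invalid_arg function to reject lists of different lengths:

```ocaml
(* Hypothetical helper: element-wise sum of two float lists.
   The tuple (a, b) is matched so that both lists are dissected together. *)
let rec add_lists a b = match (a, b) with
  | ([], []) -> []
  | (x :: xs, y :: ys) -> (x +. y) :: add_lists xs ys
  | ([], _ :: _) | (_ :: _, []) -> invalid_arg "add_lists"
```

The final case names both mismatched shapes explicitly instead of using a catch-all pattern, in keeping with the style recommended above.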
As a core feature of the OCaml language, pattern matching will be used extensively in the
remainder of this book, particularly when dissecting data structures in chapter 3. One remaining
form of pattern matching in OCaml programs appears in the handling of exceptions.
1.5.3 Exceptions
In many programming languages, program execution can be interrupted by the raising [13] of an
exception. This is a useful facility, typically used to handle problems such as failing to open a
file or an unexpected flow of execution (e.g. due to a program being given invalid input) but
exceptions are also useful as an efficient means to escape a computation, as we shall see in
section 7.3.3.3.
Like a variant constructor in OCaml, the name of an exception must begin with a capital
letter and an exception may or may not carry an associated value. Before an exception can be
used, it must be declared. An exception which does not carry associated data may be declared
as:
exception Name
An exception which carries associated data of type type may be declared:
exception Name of type
Exceptions are raised using the raise construct. For example, the following raises a built-in
exception called Failure which carries a string:
# raise (Failure "My problem");;
Exception: Failure "My problem".
Exceptions may also be caught using the syntax:
try
  expr
with
  pattern1 -> expr1
| pattern2 -> expr2
| pattern3 -> expr3
[13] Sometimes known as throwing an exception, e.g. in the context of the C++ language.
where expr is evaluated and its result returned if no exception was raised. If an exception was
raised then the exception is matched against the patterns and the value of the corresponding
expression (if any) is returned instead.
Note that, unlike other pattern matching constructs, patterns matching over exceptions need
not account for all eventualities - any uncaught exceptions simply continue to propagate.
For example, an exception called ZeroLength, which does not carry associated data, may be
declared as:
# exception ZeroLength;;
exception ZeroLength
A function to normalise a 2D vector $\mathbf{r} = (x, y)$ to create a unit 2D vector $\hat{\mathbf{r}} = \mathbf{r}/|\mathbf{r}|$, catching
the erroneous case of a zero-length vector, may then be written:
# let norm (x, y) =
    let l = sqrt (x *. x +. y *. y) in
    if l = 0. then raise ZeroLength else
    let il = 1. /. l in
    (il *. x, il *. y);;
val norm : float * float -> float * float = <fun>
Applying the norm function to a non-zero-length vector produces the correct result to within
numerical error (a subject discussed in chapter 4):
# norm (3., 4.);;
- : float * float = (0.600000000000000089, 0.8)
Applying the norm function to the zero vector raises the ZeroLength exception:
# norm (0., 0.);;
Exception: ZeroLength.
A "safe" version of the norm function might catch this exception and return some reasonable
result in the case of a zero-length vector:
# let safe_norm r = try norm r with ZeroLength -> (0., 0.);;
val safe_norm: float * float -> float * float = <fun>
Applying the safe_norm function to a non-zero-length vector causes the result of the expression
norm r to be returned:
# safe_norm (3., 4.);;
- : float * float = (0.600000000000000089, 0.8)
However, applying the safe_norm function to the zero vector causes the norm function to raise
the ZeroLength exception which is then caught within the safe_norm function which then
returns the zero vector:
# safe_norm (0.,0.);;
- : float * float = (0., 0.)
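The ZeroLength exception carries no data. As a sketch of the other declaration form, the hypothetical exception Negative below carries the offending value, which the handler then binds in its pattern. The names Negative, checked_sqrt and describe are assumptions made for this illustration only:

```ocaml
(* Hypothetical exception carrying the value which caused the problem. *)
exception Negative of float

(* Raise Negative rather than silently returning nan for negative input. *)
let checked_sqrt x =
  if x < 0. then raise (Negative x) else sqrt x

(* The handler pattern "Negative y" extracts the carried value. *)
let describe x =
  try Printf.sprintf "%g" (checked_sqrt x)
  with Negative y -> Printf.sprintf "negative input %g" y
```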
The use of exceptions to handle unusual occurrences, such as in the safe_norm function,
is one important application of exceptions. This functionality is exploited by many of the
functions provided by the core OCaml library, such as those for handling files (discussed in
chapter 5). The safe_norm function is a simple example using exceptions which could have
been written using an if expression. However, exceptions are much more useful in more
complicated circumstances, where many if expressions would be required in order to achieve
the same effect.
Another important application is the use of exceptions to escape computations. The usefulness
of this way of exploiting exceptions cannot be fully understood without first understanding
data structures and algorithms and, therefore, this topic will be discussed in much more detail
in chapter 3 and again, in the context of performance, in chapter 7.
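As a small foretaste, and only as a sketch, the following hypothetical function first_negative uses an exception to leave an array traversal as soon as an answer is known. The exception Found and the function name are assumptions introduced for this example:

```ocaml
(* Hypothetical exception used purely to escape from the traversal. *)
exception Found of int

(* Return the index of the first negative element, if any. Raising Found
   abandons the remaining iterations of Array.iteri. *)
let first_negative a =
  try
    Array.iteri (fun i x -> if x < 0. then raise (Found i)) a;
    None
  with Found i -> Some i
```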
The Pervasives module defines two exceptions, Invalid_argument and Failure, as well as
two functions which simplify the raising of these exceptions. Specifically, the invalid_arg
and failwith functions raise the Invalid_argument and Failure exceptions, respectively,
using the given string.
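A minimal sketch of how these functions might be used follows. The functions mean and safe_mean are hypothetical and are not provided by any library:

```ocaml
(* Hypothetical function: arithmetic mean of a non-empty float array.
   invalid_arg raises Invalid_argument carrying the given string. *)
let mean a =
  let n = Array.length a in
  if n = 0 then invalid_arg "mean: empty array" else
  Array.fold_left ( +. ) 0. a /. float_of_int n

(* The carried string is ignored here using the wildcard pattern. *)
let safe_mean a = try Some (mean a) with Invalid_argument _ -> None

(* failwith raises Failure carrying the given string. *)
let message = try failwith "My problem" with Failure s -> s
```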
Support for exceptions is not uncommon in modern languages. However, the automatic gen-
eralisation of functions over all types of data for which they are valid is rather unusual and is
discussed next.
1.5.4 Polymorphism
As we have seen, OCaml will infer types in a program. But what if a specific type cannot be
inferred? In this case, OCaml will create a polymorphic function which can act on any suitable
type. For example, the following defines a higher-order function f which accepts function g
and a value x, and applies g to the result of applying g to x:
# let f g x = g (g x);;
val f : ('a -> 'a) -> 'a -> 'a = <fun>
Note that OCaml uses the notation 'a (conventionally written α) when writing the type of
the function f. This OCaml function may then be used for any type 'a. For example, the
following uses the polymorphic function to calculate $2^4$, first using the type int and then
using the type float:
# f (fun x -> x * x) 2;;
- : int = 16
# f (fun x -> x *. x) 2.;;
- : float = 16.
Types may be constrained by specifying types in a definition using the syntax (expr : type).
For example, specifying the type of the argument x to be a floating-point value in the definition
of the function f results in OCaml inferring all of the previously polymorphic types to be float:
# let f g (x : float) = g (g x);;
val f : (float -> float) -> float -> float = <fun>
Although omitting the brackets results in the same types being inferred in this case:
# let f g x : float = g (g x);;
val f : (float -> float) -> float -> float = <fun>
the syntax of this latter form actually constrains the return type of the function f to be
float, rather than constraining the type of the last argument, as in the former example.
Variant types may contain polymorphic types, in which case the name of the variant type
must be preceded by its polymorphic type arguments. For example, the polymorphic option:
# type 'a option = None | Some of 'a;;
type 'a option = None | Some of 'a
can have values such as Some 1 or Some 2 (for which the type is written int option) and the
value None (for which the type defaults to 'a option). For example, this 3-tuple is allowed to
have elements of different types:
# (Some 1, Some 2, None);;
- : int option * int option * 'a option = (Some 1, Some 2, None)
In contrast, the elements of a list must all be of the same type and, therefore, a None presented
as an alternative to a Some 1 will be inferred to be of type int option:
# [Some 1; Some 2; None] ; ;
- : int option list = [Some 1; Some 2; None]
The polymorphic option type 'a option is actually already in the Pervasives module.
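Functions over polymorphic variant types are themselves inferred to be polymorphic. As an illustrative sketch, the hypothetical function with_default (a name assumed for this example) extracts the value from an option of any type, substituting a default for None:

```ocaml
(* Hypothetical helper with inferred type 'a -> 'a option -> 'a. *)
let with_default d = function
  | None -> d
  | Some x -> x
```

The same definition may then be applied to an int option and to a string option alike.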
Many polymorphic functions are provided by the language, most notably the comparison
operators =, <>, <, >, <=, >= and the total ordering function compare, which provides
conventional ordering over the int and float types as well as lexicographic ordering over lists,
arrays and strings:
\[
\texttt{compare}\ a\ b =
\begin{cases}
-1 & a < b \\
0 & a = b \\
1 & a > b
\end{cases}
\]
# compare 1 2;;
- : int = -1
# compare "slug" "plug";;
- : int = 1
The min and max functions use polymorphic comparison to find the smaller and larger of two
given arguments, respectively:
# max 1 2;;
- : int = 2
# min "slug" "plug";;
- : string = "plug"
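Because compare is polymorphic, it can be handed directly to functions which expect an ordering. The following sketch uses the List.sort function from the standard library; the names sorted and descending are chosen only for this example:

```ocaml
(* Sort integers into ascending order using the polymorphic ordering. *)
let sorted = List.sort compare [3; 1; 2]

(* Swapping the arguments of compare reverses the ordering. *)
let descending = List.sort (fun a b -> compare b a) ["plug"; "slug"; "bug"]
```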
Before completing this introduction to OCaml, we have one remaining exotic topic to cover.
1.5.5 Currying
A curried function is a function which returns a function as its result. Curried functions are
best introduced as a more powerful alternative to the conventional (non-curried) functions
provided by imperative programming languages.
Effectively, imperative languages only allow functions which accept a single value (often a
tuple) as an argument. For example, a raise-to-the-power function for integers would have to
accept a single tuple as an argument which contained the two values used by the function:
# let rec ipow (x, n) = if n = 0 then 1. else x *. ipow (x, n - 1);;
val ipow : float * int -> float = <fun>
But, as we have seen, OCaml also allows:
# let rec ipow x n = if n = 0 then 1. else x *. ipow x (n - 1);;
val ipow : float -> int -> float = <fun>
This latter approach is actually a powerful generalization of the former, only available in
functional programming languages.
The difference between these two styles is subtle but important. In the latter case, the type
can be understood to be:
val ipow : float -> (int -> float)
i.e. this ipow function accepts a floating-point number and returns a function which raises
that number to the given power. A function which returns a function is referred to as a
curried function. As the curried style is more general than the non-functional style, functions
are written in curried form by default in functional languages.
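The practical consequence is that a curried function may be applied to only some of its arguments. In this sketch, the names pow2 and powers are assumptions introduced for illustration:

```ocaml
let rec ipow x n = if n = 0 then 1. else x *. ipow x (n - 1)

(* Supplying only the first argument yields a function of type int -> float. *)
let pow2 = ipow 2.

(* The partially applied function can be passed to List.map directly. *)
let powers = List.map pow2 [0; 1; 2; 3]
```

The tuple-accepting form of ipow offers no such partial application: both values must be supplied together.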
Now that we have examined the syntax of the OCaml language, we shall explain why the
exotic programming styles offered by OCaml are highly relevant in the context of scientific
computing.
1.6 Functional vs Imperative programming
The vast majority of current programmers write in imperative languages and use an imperative
style. This refers to the use of statements or expressions which are designed to act by way of
a side-effect.
For example, the following declares a mutable variable called x, executes a statement which
has the effect of modifying the value of x (squaring it) and then examines the resulting value
of x:
# let x = ref 2;;
val x : int ref = {contents = 2}
# x := !x * !x;;
- : unit = ()
# !x;;
- : int = 4
The only action of the statement "x := !x * !x" is to modify the value of an existing variable.
This is its side-effect. In this case, the statement has no other effect and, consequently, returns
the value () of type unit.
The functional equivalent to this imperative style is to define a new value (in this case, of the
same name so that the old value is superseded):
# let x = 2;;
val x : int = 2
# let x = x * x;;
val x : int = 4
# x;;
- : int = 4
Purely functional programming has several advantages over imperative programming:
• easier to determine variable values, as they cannot be altered
• easier proofs of correctness
• typically more concise in terms of the quantity of source code required to perform a
  given task
• the ability to reuse old data structures (known as persistence) without having to worry
  about undoing state changes and unwanted interactions
• trivial multi-threading of programs due to the lack of data structure interdependencies
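Persistence can be illustrated with a minimal sketch contrasting a list with an array. The names original, extended and a are chosen only for this example:

```ocaml
(* Functional: building a new list leaves the old list untouched. *)
let original = [2; 3; 4]
let extended = 1 :: original

(* Imperative: updating an array in place overwrites the old contents. *)
let a = [|2; 3; 4|]
let () = a.(0) <- 1
```

After these definitions, original still denotes the list [2; 3; 4], whereas the previous first element of the array a can no longer be recovered.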
OCaml supports both functional and imperative programming and, hence, is known as an
impure functional programming language. In particular, the OCaml core library provides
implementations of several imperative data structures (strings, arrays and hash tables) as well
as functional data structures (lists, sets and maps). We shall examine these data structures
in detail in chapter 3.
In addition to mutable data structures, the OCaml language provides looping constructs for
imperative programming. The while loop executes its body repeatedly while the condition
is true, returning the value of type unit upon completion. For example, this while loop
repeatedly decrements the mutable variable x, until it reaches zero:
# let x = ref 5;;
val x : int ref = {contents = 5}
# while !x > 0 do
    decr x;
  done;;
- : unit = ()
# !x;;
- : int = 0
The for loop introduces a new loop variable explicitly, giving the initial and final values of
the loop variable. For example, this for loop runs a loop variable called i from 1 to 5,
incrementing the mutable value x five times in total:
# for i = 1 to 5 do
    incr x;
  done;;
- : unit = ()
# !x;;
- : int = 5
Thus, while and for loops in OCaml are analogous to those found in most imperative lan-
guages.
In practice, the ability to choose between imperative and functional styles when programming
in OCaml is very productive. Many programming tasks are naturally suited to either an
imperative or a functional style. For example, portions of a program which deal with user
input, such as mouse movements and key-presses, are likely to benefit from an imperative
style where the program maintains a state and user input may result in a change of state. In
contrast, functions dealing with the manipulation of complex data structures, such as trees
and graphs, are likely to benefit from being written in a functional style, using recursive
functions and immutable data, as this greatly simplifies the task of writing such functions
correctly. In both cases, functions can refer to themselves - recursive functions. However,
recursive functions are pivotal in functional programming, where they are used to implement
functionality equivalent to the while and for looping constructs we have just examined.
1.7 Recursion
When a programmer is introduced to the concept of functional programming for the first time,
the way to implement simple programming constructs such as loops does not appear obvious.
If the loop variable cannot be changed then how can the loop proceed?
In essence, the answer to this question lies in the ability to convert looping constructs, such as
mathematical sums and products, into recursive constructs, such as recurrence relations. For
example, the factorial function is typically considered to be a product with the special case
$0! = 1$:
\[
n! =
\begin{cases}
1 & n = 0 \\
\prod_{i=1}^{n} i = 1 \times 2 \times \cdots \times (n - 1) \times n & n > 0
\end{cases}
\]
However, this may also be expressed as a recurrence relation:
\[
n! =
\begin{cases}
1 & n = 0 \\
n \times (n - 1)! & n > 0
\end{cases}
\]
Both the product and recurrence-relation forms of the factorial function may be expressed
in OCaml. The product form is most obviously implemented in an imperative style, using
mutable variables which are iteratively updated to accumulate the value of the product:
# let factorial n =
    let ans = ref 1 and n = ref n in
    while (!n > 1) do (ans := !ans * !n; decr n) done;
    !ans;;
val factorial : int -> int = <fun>
# factorial 5;;
- : int = 120
In contrast, the recurrence relation can be implemented more simply, as a recursive function:
# let rec factorial n = if n < 1 then 1 else n * factorial (n - 1);;
val factorial : int -> int = <fun>
# factorial 5;;
- : int = 120
In the case of the factorial function, the functional style is considerably more concise and,
more importantly, is much easier to reason about, i.e. the functional version is more easily seen
to be correct. For sophisticated and intrinsically complicated computations, these advantages
result in functional programs often being both simpler and more reliable than their imperative
equivalents.
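The correspondence between the two styles can be made explicit. In the following sketch, the mutable variables of the imperative version become the arguments of a nested recursive function, here given the hypothetical name loop, and each iteration of the while loop becomes a recursive call with updated arguments:

```ocaml
(* The arguments ans and n play the roles of the two mutable variables.
   Each recursive call corresponds to one iteration of the while loop. *)
let factorial n =
  let rec loop ans n =
    if n > 1 then loop (ans * n) (n - 1) else ans in
  loop 1 n
```

No value is ever modified: each call simply receives new values for ans and n.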
However, functional programming is not always preferable to imperative programming. Many
problems naturally lend themselves to either imperative or functional styles. Clearly the
factorial function is most easily implemented when considered as a recurrence relation. Other
computations are most naturally represented as sums and products. For example, the dot
product $\mathbf{a} \cdot \mathbf{b}$ of a pair of $d$-dimensional vectors $\mathbf{a}$ and $\mathbf{b}$ is most naturally represented as a
sum:
\[
\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{d} a_i \times b_i
\]
This sum can be computed by a rather obfuscated recursive function:
# let dot a b =
    let len = Array.length a in
    if len <> Array.length b then invalid_arg "dot" else
    let rec aux i accu =
      if i < len then aux (i + 1) (accu +. a.(i) *. b.(i)) else accu in
    aux 0 0.;;
val dot : float array -> float array -> float = <fun>
or by a clearer iterative function:
# let dot a b =
    let len = Array.length a in
    if len <> Array.length b then invalid_arg "dot" else
    let r = ref 0. in
    for i = 0 to len - 1 do
      r := !r +. a.(i) *. b.(i)
    done;
    !r;;
val dot : float array -> float array -> float = <fun>
For example, $(1, 2, 3) \cdot (2, 3, 4) = 20$:
# dot [|1.; 2.; 3.|] [|2.; 3.; 4.|];;
- : float = 20.
In this case, the imperative form of the vector dot product is easier to understand than the
recursive form. Regardless of the choice of functional or imperative style, structured design
and implementation is an important way to manage complicated problems.
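For comparison, a third formulation is sketched below under the assumption that the vectors are stored as lists rather than arrays. It uses the List.fold_left2 function from the standard library, which accumulates a result over two lists simultaneously and raises Invalid_argument if their lengths differ. The name dot_list is hypothetical:

```ocaml
(* Dot product of two float lists, accumulating the sum in accu. *)
let dot_list a b =
  List.fold_left2 (fun accu x y -> accu +. x *. y) 0. a b
```

Here the looping has been factored into a library function, leaving only the arithmetic to be written.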
Finally, this introductory chapter would not be complete without providing a taste of the
value of OCaml in the context of scientific computing.
1.8 Applicability
Conventional languages vehemently separate functions from data. In contrast, OCaml allows
the seamless treatment of functions as data. Specifically, OCaml allows functions to be stored
as values in data structures, passed as arguments to other functions and returned as the results
of expressions, including the return-values of functions. As we shall now demonstrate, this
ability can be of direct relevance to scientific applications.
Many numerical algorithms are most obviously expressed as a function which accepts and acts
upon another function. For example, consider a function called d which calculates a numerical
approximation to the derivative of a given, one-argument function. The function d accepts a
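function $f : \mathbb{R} \to \mathbb{R}$ and a value $x$ and returns a function to compute an approximation to the
derivative given by $d : (\mathbb{R} \to \mathbb{R}) \to (\mathbb{R} \to \mathbb{R})$:
\[
d[f](x) = \frac{f(x + \delta) - f(x - \delta)}{2\delta}
\]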
This is easily written in OCaml as the curried function d [14]:
# let d f x =
    let eps = sqrt epsilon_float in
    ((f (x +. eps)) -. (f (x -. eps))) /. (2. *. eps);;
val d : (float -> float) -> float -> float = <fun>
For example, consider the function $f(x) = x^3 - x - 1$:
# let f x = x *. x *. x -. x -. 1.;;
val f : float -> float = <fun>
The higher-order function d can be used to approximate $\left.\frac{df}{dx}\right|_{x=2} = 11$:
# d f 2.;;
- : float = 10.9999999701976776
More importantly, as d is a curried function, we can use d to create derivative functions. For
example, the derivative $f'(x) = \frac{df}{dx}$:
# let f' = d f;;
val f' : float -> float = <fun>
The function f' can now be used to calculate a numerical approximation to the derivative of
f for any x. For example, f'(2) = 11:
# f' 2.;;
- : float = 10.9999999701976776
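Higher-order functions such as d compose naturally with other numerical algorithms. As a sketch only, the hypothetical function newton below applies a fixed number of Newton-Raphson steps to find a root of f, using d f in place of an analytical derivative. The function name, the starting point and the number of steps are assumptions made for this illustration:

```ocaml
let d f x =
  let eps = sqrt epsilon_float in
  (f (x +. eps) -. f (x -. eps)) /. (2. *. eps)

(* Hypothetical root finder: n steps of x <- x - f(x) / f'(x),
   with f' approximated numerically by d f. *)
let rec newton f x n =
  if n = 0 then x else newton f (x -. f x /. d f x) (n - 1)

let f x = x *. x *. x -. x -. 1.

(* Start from x = 2 and take 20 steps. *)
let root = newton f 2. 20
```

For this f, the iteration settles close to the real root near 1.3247.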
As this demonstrates, functional programming languages such as OCaml offer many considerable
improvements over conventional languages used for scientific computing. Before continuing,
readers should be warned that, once learned, the techniques presented in this book soon
become indispensable and, therefore, there is no going back after this chapter.
[14] The value epsilon_float, defined in the Pervasives module, is the smallest floating-point number which,
when added to 1, does not give 1. The square root of this value can be shown to give optimal properties when
used in this way.
Chapter 2
Program Structure
In this chapter, we introduce some programming paradigms designed to improve program
structure. As the topics addressed by this chapter are vast, we shall provide only overviews
and references to literature containing more thorough descriptions and evaluations of the
relative merits of the various approaches.
Structured programming is all about managing complexity. Modern computational approaches
in all areas of science involve great intrinsic complexity. Consequently, the efficient structuring
of programs is vitally important if this complexity is to be managed in order to produce robust,
working programs.
Historically, many different approaches have been used to facilitate the structuring of pro-
grams. The simplest approach involves splitting the source code of the program between
several different files, known as compilation units. A marginally more sophisticated approach
involves the creation of namespaces, allowing variable and function names to be hierarchical,
and structures, allowing types to be combined and variables to be grouped. More recently, an
approach known as object-oriented (OO) programming has become widespread. As we shall
see, the OCaml language supports all of these approaches as well as others. Consequently,
OCaml programmers are encouraged to learn the relative advantages and disadvantages of
each approach in order that they may make educated decisions regarding the design of new
programs.
Structured programming is not only important in the context of large, complicated programs.
In the case of simple programs, understanding the concepts behind structured programming
can be instrumental in making efficient use of existing libraries.
2.1 Nesting
The concept of nested OCaml definitions is best introduced to scientists by drawing an analogy
with the nesting of definitions in science:
1. When asked to define an animal, scientists from all disciplines would reply with very
similar definitions - the definition of "animal" is global.
2. When asked to define a circuit, a physicist is likely to define an electronic circuit but an
anaesthesiologist is likely to define an anaesthetic circuit - there are multiple, different
definitions of "circuit" which are local to different scientific disciplines.
3. When asked to define the space-time metric, cosmologists and astrophysicists are likely to
give similar definitions whereas scientists from other disciplines are likely to be unable to
answer - the definition of "space-time metric" is local to the study of general relativity.
4. When asked to define a species, a school-level scientist is likely to reply with a simple,
broad and probably unworkable definition whereas scientists in different specialised fields
are likely to reply with different, specialised definitions - the definition of "species" is
refined separately in separate scientific disciplines.
Analogously, function and variable definitions can be structured hierarchically within an
OCaml program, allowing some definitions to be globally visible, others to be defined sep-
arately in distinct portions of the hierarchy, others to be visible only within a single branch
of the hierarchy and others to be refined, specialising them within the hierarchy.
Compared to writing a program as a flat list of function and variable definitions, structuring a
program into a hierarchy of definitions allows the number of dependencies within the program
to be managed as the size and complexity of the programs grows. This is achieved by nesting
definitions. Thus, nesting is the simplest approach to the structuring of programs [1].
For example, the ipow3 function defined in the previous chapter contains a nested definition
of a function sqr:
# let ipow3 x =
    let sqr x = x * x in
x * sqr x;;
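A nested definition is visible only within the definition which encloses it. In the following sketch, the hypothetical function norm2 (a name assumed for this example) defines its own local sqr for floating-point values, which neither clashes with nor is visible to any other definition of sqr:

```ocaml
(* Hypothetical example: length of the 2D vector (x, y).
   The helper sqr exists only inside the body of norm2. *)
let norm2 x y =
  let sqr z = z *. z in
  sqrt (sqr x +. sqr y)
```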
Nesting can be more productively exploited when combined with the factoring of subexpres-
sions, functions and higher-order functions.
2.2 Factoring
The concept of program factoring is best introduced to a scientist in relation to the con-
ventional factoring of mathematical expressions. When creating a complicated mathematical
derivation, the ability to factor subexpressions, typically by introducing a substitution, is a
productive way to manage the incidental complexity of the problem.
For example, the following function definition contains several duplicates of the subexpression
$x - 1$:
\[
f(x) = \bigl(x - 1 - (x - 1)(x - 1)\bigr)^{x - 1}
\]
[1] Remarkably, many other languages, including C and C++, do not allow nesting of function and variable
definitions.
By factoring out a subexpression $a(x)$, this expression can be simplified:
\[
a(x) = x - 1, \qquad f(a) = (a - a^2)^a
\]
The factoring of subexpressions, such as $x - 1$, is the simplest form of factoring available to a
programmer. The OCaml function equivalent to the original, unfactored expression is:
# let f x = (x -. 1. -. (x -. 1.) *. (x -. 1.)) ** (x -. 1.);;
# f 5.;;
- : float = 20736.
The OCaml function equivalent to the factored form is:
# let f x = let a = x -. 1. in (a -. a *. a) ** a;;
# f 5.;;
- : float = 20736.
By simplifying expressions, factoring is a means to manage the complexity of a program.
However, the previous example only factors a subexpression. In functional languages, such as
OCaml, the ability to factor out higher-order functions is much more powerful than subex-
pression factoring.
Whenever several definitions share functionality but implement this functionality indepen-
dently, it is likely that a higher-order function may be factored out. In the context of func-
tional programming, this form of factoring is most often seen with algorithms which act over
data structures. As we shall see in chapter 3, the higher-order map and fold functions are used
so commonly with data structures that implementations typically provide these functions. In-
deed, these functions are provided by the OCaml core library for the implementations of lists,
arrays, maps, sets and hash tables. In the meantime, let us consider a kind of fold function
which does not act over an explicit data structure but, instead, acts over an implicit list of
consecutive integers.
The following functions compute the sum and the product of a semi-inclusive range of integers
[l, u):
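# let rec sum_range l u =
    if l < u then (u - 1) + sum_range l (u - 1) else 0;;
val sum_range : int -> int -> int = <fun>
# let rec product_range l u =
    if l < u then (u - 1) * product_range l (u - 1) else 1;;
val product_range : int -> int -> int = <fun>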
For example, the product_range function may be used to compute 5! as the product of the
integers [1,6):
# product_range 1 6;;
- : int = 120
[Diagram: fold_range f accu 1 9]
Figure 2.1: The fold_range function can be used to accumulate the result of applying a
function f to a contiguous sequence of integers, in this case the sequence [1, 9).
The sum_range and product_range functions clearly share some functionality. Specifically,
they both apply a function (integer add + and multiply *, respectively) to u - 1 before
recursively applying themselves to the smaller range [l, u - 1) until the range contains no
integers l = u. This shared functionality can be factored out as a higher-order function
fold_range:
# let rec fold_range f accu l u =
    if l < u then fold_range f (f accu (u - 1)) l (u - 1) else accu;;
val fold_range : ('a -> int -> 'a) -> 'a -> int -> int -> 'a = <fun>
The fold_range function accepts a function f, an accumulator accu and a range specified
by two integers l and u. Application of the fold_range function to mimic the sum_range or
product_range functions begins with a base case in accu (0 or 1, respectively). If $l \ge u$ then
accu is returned as the result. Otherwise, the fold_range function recurses with the smaller
range $[l, u - 1)$ and a new accumulator given by applying the function f to the old accu and
to u - 1, which will give accu + (u - 1) in the case of the sum_range function or accu * (u - 1)
in the case of product_range. This process, known as a right fold because f is applied to the
rightmost integer u - 1 first, is illustrated in figure 2.1.
The sum_range and product_range functions may then be expressed more simply in terms of
the fold_range function by supplying the integer addition or multiplication operators² and
base case:
# let sum_range l u = fold_range ( + ) 0 l u;;
val sum_range : int -> int -> int = <fun>
# let product_range l u = fold_range ( * ) 1 l u;;
val product_range : int -> int -> int = <fun>
² The non-infix form of the + operator is written ( + ), i.e. ( + ) a b is equivalent to a + b. Note the
spaces, which prevent ( * ) from being interpreted as the start of a comment.
These forms of the sum_range and product_range functions can be simplified further
by performing what is known in computer science as η-reduction, cancelling the
final arguments l and u from both sides of a curried function definition:
# let sum_range = fold_range ( + ) 0;;
val sum_range : int -> int -> int = <fun>
# let product_range = fold_range ( * ) 1;;
val product_range : int -> int -> int = <fun>
These functions work in exactly the same way as the originals:
# product_range 1 6;;
- : int = 120
but their common functionality has been factored out into the fold_range function.
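The equivalence of the explicit and η-reduced forms can be checked with a minimal, self-contained sketch. The definition of fold_range is restated here, following the description given in this section, so that the block stands alone. The names sum_range_explicit, sum_range_eta and product_range_eta are hypothetical and serve only to compare the two forms:

```ocaml
(* fold_range as described in the text: f is applied to u - 1 first,
   then the function recurses on the smaller range [l, u - 1). *)
let rec fold_range f accu l u =
  if l < u then fold_range f (f accu (u - 1)) l (u - 1) else accu

(* Explicit form: both range arguments are named. *)
let sum_range_explicit l u = fold_range ( + ) 0 l u

(* Eta-reduced forms: the trailing arguments l and u are cancelled. *)
let sum_range_eta = fold_range ( + ) 0
let product_range_eta = fold_range ( * ) 1

let () =
  (* Both forms agree on [1, 6): 1 + 2 + 3 + 4 + 5 = 15. *)
  assert (sum_range_explicit 1 6 = 15);
  assert (sum_range_eta 1 6 = 15);
  (* 5! = 120, as computed in the text. *)
  assert (product_range_eta 1 6 = 120);
  (* An empty range returns the base case. *)
  assert (sum_range_eta 3 3 = 0)
```

Both forms have the type int -> int -> int, so η-reduction changes only how the definition is written and not how it is used.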
In addition to simplifying the definitions of the sum_range and product_range functions, the
fold_range function may also be used in new function definitions. For example, the following
higher-order function, given a length n and a function f, creates the list containing the n
elements (f(0), f(1), ..., f(n - 1)):
# let list_init n f = fold_range (fun l i -> f i :: l) [] 0 n;;
val list_init : int -> (int -> 'a) -> 'a list = <fun>
This list_init function uses the fold_range function with a λ-function, an accumulator
containing the empty list [] and a range [0, n). The λ-function prepends each f i onto the
accumulator l to construct the result.
This is actually the list equivalent of the Array.init function for arrays. For example, these
two applications of these functions both create the sequence x_i = i for i = 0 ... 9:
# Array.init 10 (fun i -> i);;
- : int array = [|0; 1; 2; 3; 4; 5; 6; 7; 8; 9|]
# list_init 10 (fun i -> i);;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]
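As a further illustration of reuse, the following sketch restates fold_range and list_init as described in this section and adds one more function built upon the same fold. The function max_over_range is hypothetical (it does not appear elsewhere in this book) and is intended only to show that the accumulator need not be a number or a list:

```ocaml
let rec fold_range f accu l u =
  if l < u then fold_range f (f accu (u - 1)) l (u - 1) else accu

let list_init n f = fold_range (fun l i -> f i :: l) [] 0 n

(* Hypothetical helper: the largest value of f over [l, u),
   or None if the range is empty. *)
let max_over_range f l u =
  fold_range
    (fun best i ->
       match best with
       | None -> Some (f i)
       | Some b -> Some (max b (f i)))
    None l u

let () =
  assert (list_init 5 (fun i -> i * i) = [0; 1; 4; 9; 16]);
  (* list_init agrees with Array.init on the same arguments. *)
  assert (list_init 5 (fun i -> i * i)
          = Array.to_list (Array.init 5 (fun i -> i * i)));
  assert (list_init 0 (fun i -> i) = []);
  (* i * (10 - i) peaks at i = 5 with the value 25. *)
  assert (max_over_range (fun i -> i * (10 - i)) 0 10 = Some 25);
  assert (max_over_range (fun i -> i) 4 4 = None)
```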
As we have seen, the nesting and factoring of functions and variables can be used to simplify
programs and, therefore, to help manage intrinsic complexity. In addition to these approaches
to program structuring, the OCaml language also provides two constructs designed to encap-
sulate program segments. We shall now examine the methodology and relative benefits of the
approaches offered by modules and objects.
2.3 Modules
We have already encountered several modules. In particular, the Pervasives module
encapsulates many function and type definitions initialised before a program is executed, such
as operators on the built-in int, float, bool and string types. Also, the Array module
encapsulates function and type definitions related to arrays, such as functions to create arrays
(e.g. Array.init) and to count the number of elements in an array (Array.length).
In general, modules are used to encapsulate related definitions, i.e. function, type, data,
module and object definitions. In addition to simply allowing related definitions to be grouped
together, the OCaml module system allows these groups to form hierarchies and allows
interfaces between these groups to be defined, the correct use of which is then enforced by the
compiler at compile-time. This can be used to add "safety nets" when programming, improving
program reliability. We shall begin by examining some of the modules available in the core
OCaml library before describing how a program can be structured using modules, stating the
syntactic constructs required to create modules and, finally, developing a new module.
The OCaml system comes with many modules, implementing a broad spectrum of function-
alities, from file handling to implementations of various data structures. As we shall see later
in this chapter, these modules are provided in both source and compiled form. The source
code to these modules is a faithful source of wisdom, having been written by expert OCaml
programmers. Moreover, the bundled modules are easily supplemented by user written mod-
ules which, particularly if well written, can be shared between several programs and even
distributed or sold.
Some conventions adhered to by the built-in modules can be productively adhered to when
creating new modules. If a module's design is based around a single type, this type is
conventionally named t, e.g. the Complex module in the core library provides a data type called
Complex.t for storing complex-valued numbers and several functions for acting upon them
(neg, conj, sub, add, mul, inv, div, sqrt, norm, norm2, arg, polar, exp, log and pow).
Also, the most fundamental function to construct a value of this type is conventionally named
make, e.g. the Array.make and String.make functions create arrays and strings, respectively,
containing repetitions of a given element.
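These conventions can be seen in a short sketch which uses only the library functions named above. Note that Complex.t is a record with the fields re and im, so a value may be written directly; the names z and w are introduced here purely for illustration:

```ocaml
(* Complex.t is a record with the float fields re and im. *)
let z : Complex.t = { Complex.re = 3.; im = 4. }

(* (3 + 4i) + (3 - 4i) = 6 *)
let w = Complex.add z (Complex.conj z)

let () =
  (* norm2 is the squared modulus: 3^2 + 4^2 = 25. *)
  assert (Complex.norm2 z = 25.);
  assert (w.Complex.re = 6. && w.Complex.im = 0.);
  (* make builds a container holding repetitions of one element. *)
  assert (Array.make 3 0 = [|0; 0; 0|]);
  assert (String.make 3 'a' = "aaa")
```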
In the interests of clarity and correctness, modules provide a well-defined interface known as a
signature. The signature of a module declares the contents of the module which are accessible
to code which uses the module. For example, the type t in the Complex module is declared
in the signature³. The code implementing a module is defined in the module structure.
The concepts of module signatures and structures are best introduced by example. Consider
a module called IntRange which encapsulates code for handling contiguous ranges of integers
and includes the functionality of the fold_range function derived in the previous section.
This module will include:
• an abstract type t to represent a contiguous range of integers.
• a function make to construct a range from a given pair of integers.
• a function mem to test a given integer for membership in a given range.
• functions fold_left and fold_right to implement folds in different directions over
  ranges.
• a function map_into_list to map a range into a list, applying a given function to each
  element.
• a function to_list to convert a range into a list of consecutive integers.
We shall begin by defining the interface to the module rigorously as a module signature.
³ Specifically, in the core library file "complex.mli".
2.3.1 Signatures
Module signatures declare the interfaces to modules. Modules which adhere to a given signa-
ture must define all of the constructs declared in the signature but are free to define additional
constructs. However, only those constructs declared in the signature will be accessible, or vis-
ible, from code outside the module (any additional constructs are hidden).
Signatures may contain several different kinds of declaration:
• type declarations in the form type ...
• exception declarations in the form exception ...
• variable and function type declarations in the form val ...
• open statements, which open the namespaces of other signatures.
• include statements, which replicate the contents of other signatures.
• other signature declarations, to nest signatures.
Signatures are declared using the syntax:
module type NAME = sig ... end
where the name of the signature (NAME) is conventionally written entirely in capital letters.
For example, the interface to our IntRange module may be defined rigorously as the module
signature denoted (according to convention) INTRANGE:
module type INTRANGE =
  sig
    type t
    val make : int -> int -> t
    val mem : int -> t -> bool
    val fold_left : ('a -> int -> 'a) -> 'a -> t -> 'a
    val fold_right : (int -> 'a -> 'a) -> t -> 'a -> 'a
    val map_into_list : (int -> 'a) -> t -> 'a list
    val to_list : t -> int list
  end
The declarations made in this signature may then be implemented in a module structure which
adheres to this signature.
2.3.2 Structures
Module structures may contain several different kinds of definition which, combined, imple-
ment the internals of the module:
• type definitions in the form type ...
• exception definitions in the form exception ...
• variable and function definitions in the form let ...
• open statements, which open the namespaces of other module structures.
• include statements, which replicate the contents of other module structures.
• other module signature and structure definitions, to nest modules.
Module structures are defined using the syntax:
module Name = struct ... end
where the name of the structure (Name) is required to begin with a capital letter.
Also, a module structure Name may be defined as adhering to an existing signature NAME
using the syntax:
module Name : NAME = struct ... end
For example, the IntRange module may be defined as a structure adhering to the INTRANGE
signature:
module IntRange : INTRANGE =
  struct
    type t = { l : int; u : int }
    let make l u =
      if l <= u then { l = l; u = u } else invalid_arg "IntRange.make"
    let mem i r = r.l <= i && i < r.u
    let fold_left f accu r =
      let rec aux accu l u =
        if l < u then aux (f accu l) (l + 1) u else accu in
      aux accu r.l r.u
    let fold_right f r accu =
      let rec aux l u accu =
        if l < u then aux l (u - 1) (f (u - 1) accu) else accu in
      aux r.l r.u accu
    let map_into_list f r = fold_right (fun i l -> f i :: l) r []
    let to_list r = map_into_list (fun i -> i) r
  end
The fold_right function is the same as the fold_range function, applying f to the sequence
of integers in decreasing order. The fold_left function applies f to l first (shrinking the range
to [l + 1, u)) and, therefore, applies f to the sequence of integers in increasing order. As we
shall see in chapter 3, many data structures provide fold_left and fold_right functions.
We shall examine the use of the IntRange module in section 2.3.4.
In many cases, a signature is created for a specific module, in which case the signature
and corresponding structure may be productively defined simultaneously, using an
anonymous signature.
2.3.3 Anonymous signatures
An anonymous signature and corresponding structure may be defined simultaneously as:
module Name : sig ... end = struct ... end
The IntRange module may be implemented as an anonymous signature and compliant struc-
ture as follows:
module IntRange :
  sig
    type t
    val make : int -> int -> t
    val mem : int -> t -> bool
    val fold_left : ('a -> int -> 'a) -> 'a -> t -> 'a
    val fold_right : (int -> 'a -> 'a) -> t -> 'a -> 'a
    val map_into_list : (int -> 'a) -> t -> 'a list
    val to_list : t -> int list
  end =
  struct
    type t = { l : int; u : int }
    let make l u =
      if l <= u then { l = l; u = u } else invalid_arg "IntRange.make"
    let mem i r = r.l <= i && i < r.u
    let fold_left f accu r =
      let rec aux accu l u =
        if l < u then aux (f accu l) (l + 1) u else accu in
      aux accu r.l r.u
    let fold_right f r accu =
      let rec aux l u accu =
        if l < u then aux l (u - 1) (f (u - 1) accu) else accu in
      aux r.l r.u accu
    let map_into_list f r = fold_right (fun i l -> f i :: l) r []
    let to_list r = map_into_list (fun i -> i) r
  end
The ability to declare an anonymous signature as the interface to a module structure is useful
when the signature has been designed specifically for the given module, rather than to enforce
a consistent interface for several modules.
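The opposite case, in which a named signature enforces a consistent interface for several modules, may be sketched as follows. The signature SHOWABLE and the modules Decimal and Tally are hypothetical and are not part of the IntRange example. They are intended only to show one signature constraining two structures and hiding a helper function:

```ocaml
(* Hypothetical signature shared by several modules. *)
module type SHOWABLE =
  sig
    type t
    val make : int -> t
    val to_string : t -> string
  end

(* Two structures adhering to the same named signature. *)
module Decimal : SHOWABLE =
  struct
    type t = int
    let make i = i
    let to_string = string_of_int
  end

module Tally : SHOWABLE =
  struct
    type t = int
    let make i = abs i
    (* helper is not declared in SHOWABLE, so it is hidden. *)
    let helper n = String.make n '|'
    let to_string = helper
  end

let () =
  assert (Decimal.to_string (Decimal.make 3) = "3");
  assert (Tally.to_string (Tally.make 3) = "|||")
```

Code outside these modules can refer neither to Tally.helper nor to the fact that both types are represented by int, because the signature declares the type t abstractly.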
2.3.4 Use of the IntRange module
The IntRange module may be used to create and perform operations upon values of an abstract
type representing a range of consecutive integers. Abiding by convention, the type IntRange.t
is used to represent a range of consecutive integers. Internally, this type is represented as a
record containing the lower and upper bounds of the range. Externally, this type is visible but
abstract because the type name is declared in the signature as type t, i.e. without giving the
internals of the type. The module defines a make function to construct a value of type t from
a given pair of integers, testing that they form a valid range. For example, having defined the
module in a top-level, a value of type IntRange.t may be constructed:
# let r = IntRange.make 0 10;;
val r : IntRange.t = <abstr>
Note that the value is described as <abstr> to denote the contents of an abstract type.
The input to the make function is validated to ensure that a meaningful range is specified,
i.e. that l ≤ u. Attempting to create an invalid range will result in an exception being raised
at run-time:
# IntRange.make 10 0;;
Exception: Invalid_argument "IntRange.make".
The mem function tests an integer for membership in an integer range. For example, 5 ∈ [0,10)
and 15 ∉ [0,10):
# IntRange.mem 5 r;;
- : bool = true
# IntRange.mem 15 r;;
- : bool = false
The module then defines useful higher-order functions fold_left and fold_right, which
apply a given function over a range of integers in increasing and decreasing order, respectively.
For example, these functions can be used to create a list of integers by using the cons operator
to prepend each integer onto a list, starting with the empty list []:
# IntRange.fold_left (fun l i -> i :: l) [] r;;
- : int list = [9; 8; 7; 6; 5; 4; 3; 2; 1; 0]
# IntRange.fold_right (fun i l -> i :: l) r [];;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]
Note that the fold_left function prepended 0 first whereas the fold_right function prepended
9 first.
This functionality is exploited by the map_into_list function which applies a given function
to each integer before prepending it onto a list, starting with the empty list. For example,
this can be used to create a list of squares x_i = i² by supplying a suitable λ-function:
# IntRange.map_into_list (fun i -> i * i) r;;
- : int list = [0; 1; 4; 9; 16; 25; 36; 49; 64; 81]
Finally, the map_into_list function is used to create a simple to_list function which supplies
an identity function in order to create a list of consecutive integers:
# IntRange.to_list r;;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]
The functionality provided by this IntRange module can be used in several ways. For example,
the list_init function, which computes the list (f(0), f(1), ..., f(n - 1)) given the length n
and function f, may be written in terms of IntRange.map_into_list:
# let list_init n f = IntRange.map_into_list f (IntRange.make 0 n);;
val list_init : int -> (int -> 'a) -> 'a list = <fun>
A list of lists may be constructed by applying the list_init function to create a row for each
column:
# let matrix_init n m f =
    let init_row f = list_init n f in
    let init_col m = init_row (fun n -> f n m) in
    list_init m init_col;;
val matrix_init : int -> int -> (int -> int -> 'a) -> 'a list list = <fun>
Matrices may then be created by passing the appropriate initialising function. For example,
a function to create n x n identity matrices:
# let matrix_identity n =
matrix_init n n (fun i j -> if i = j then 1 else 0);;
val matrix_identity : int -> int list list = <fun>
# matrix_identity 3;;
- : int list list = [[1; 0; 0]; [0; 1; 0]; [0; 0; 1]]
The IntRange module which we have just developed may be thought of as an implicit data
structure, as the integers in a range are not stored explicitly (e.g. in a list) but, rather, are
implied by a pair of integers specifying the bounds of the range. As we shall see in chapter
3, much of the functionality provided by our IntRange module is available in explicit data
structures.
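The correspondence between the implicit range and an explicit list can be sketched as follows. For brevity, this block uses a cut-down IntRange providing only make, fold_left and to_list, and its to_list is written with List.rev rather than in terms of map_into_list, which is a simplification made for this sketch:

```ocaml
(* A reduced IntRange, sufficient for the comparison below. *)
module IntRange :
  sig
    type t
    val make : int -> int -> t
    val fold_left : ('a -> int -> 'a) -> 'a -> t -> 'a
    val to_list : t -> int list
  end =
  struct
    type t = { l : int; u : int }
    let make l u =
      if l <= u then { l; u } else invalid_arg "IntRange.make"
    let fold_left f accu r =
      let rec aux accu l u =
        if l < u then aux (f accu l) (l + 1) u else accu in
      aux accu r.l r.u
    (* Simplified for this sketch: accumulate in reverse, then reverse. *)
    let to_list r = List.rev (fold_left (fun acc i -> i :: acc) [] r)
  end

let () =
  let r = IntRange.make 0 10 in
  let explicit = IntRange.to_list r in
  assert (explicit = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]);
  (* Folding over the implicit range and over the explicit list agree. *)
  assert (IntRange.fold_left ( + ) 0 r = List.fold_left ( + ) 0 explicit);
  assert (IntRange.fold_left ( + ) 0 r = 45);
  (* An invalid range is rejected by make. *)
  assert (try ignore (IntRange.make 10 0); false
          with Invalid_argument _ -> true)
```

The implicit range stores only two integers regardless of its length, whereas the explicit list stores every element.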
2.3.5 Another example
Before concluding our introduction to modules, consider a module encapsulating type, variable
and function definitions relating to ranges [l ... u) ⊂ ℝ. This module will
include:
• A type t representing a range [l ... u) of real-valued numbers which is represented
  internally by two floating-point values representing l and u but abstracted by the signature
  so that the contents of the internal representation can only be altered using functions
  defined in this module.
• A function make which creates a range [l ... u) from two floating-point values representing
  l and u.
• A function to_pair which converts a range [l ... u) into a 2-tuple of floating-point values.
• A function subrange which tests a pair of ranges to determine if the latter range is a
  subset of the former range.
• A function union which calculates the set union of a pair of ranges, returning a list of
  ranges.
• A function inter which calculates the set intersection of a pair of ranges, returning a
  list of ranges.
This module may be defined as:
module FloatRange :
  sig
    type t
    val make : float -> float -> t
    val to_pair : t -> float * float
    val subrange : t -> t -> bool
    val union : t -> t -> t list
    val inter : t -> t -> t list
  end =
  struct
    type t = float * float
    let make l u = if u < l then invalid_arg "FloatRange.make" else (l, u)
    let to_pair r = r
    let subrange (l1, u1) (l2, u2) = l1 <= l2 && u1 >= u2
    (* Return the two ranges ordered by increasing lower bound. *)
    let order (l1, u1) (l2, u2) =
      if l2 < l1 then ((l2, u2), (l1, u1)) else ((l1, u1), (l2, u2))
    let union r1 r2 = let ((l1, u1), (l2, u2)) = order r1 r2 in
      if u1 < l2 then [l1, u1; l2, u2] else [min l1 l2, max u1 u2]
    let inter r1 r2 = let ((l1, u1), (l2, u2)) = order r1 r2 in
      if u1 < l2 then [] else [max l1 l2, min u1 u2]
  end
This FloatRange module shares several design similarities with the IntRange module. We shall
consider the similarities and differences between these modules before providing examples of
the use of the FloatRange module.
Like the IntRange module, the FloatRange module also defines a type t which, in this case,
represents a range of numbers on the real-line. Again, the type t is abstract. The FloatRange
module also defines a function make to construct a value of the type t.
Unlike the IntRange module, the internal representation of a FloatRange.t is chosen to be
a 2-tuple of floating-point numbers rather than a record. From the point of view of code
size and clarity, this is a comparatively inconsequential design decision⁴. More importantly,
the FloatRange module makes use of a function, order, which appears in the structure of
the module but not in the signature. Consequently, this function is only visible to definitions
which appear in the module, after the definition of the order function. This has been done
because the functionality of the order function has been factored out from both the union
and the inter functions.
In this case, the effect of factoring out the order function whilst hiding it from code outside
the FloatRange module could have been achieved using nesting by defining the union and
inter functions simultaneously:
let (union, inter) =
  (* Return the two ranges ordered by increasing lower bound. *)
  let order (l1, u1) (l2, u2) =
    if l2 < l1 then ((l2, u2), (l1, u1)) else ((l1, u1), (l2, u2)) in
  let union r1 r2 = let ((l1, u1), (l2, u2)) = order r1 r2 in
    if u1 < l2 then [l1, u1; l2, u2] else [min l1 l2, max u1 u2] in
  let inter r1 r2 = let ((l1, u1), (l2, u2)) = order r1 r2 in
    if u1 < l2 then [] else [max l1 l2, min u1 u2] in
  (union, inter)
The FloatRange module may be used to create and perform operations upon values of an
abstract type representing a range of real-valued numbers. For example, a pair of ranges may
be created:
# let a = FloatRange.make 1. 3. and b = FloatRange.make 2. 5.;;
val a : FloatRange.t = <abstr>
val b : FloatRange.t = <abstr>
The contents of these values can be extracted using the to_pair function:
# FloatRange.to_pair a;;
- : float * float = (1., 3.)
# FloatRange.to_pair b;;
- : float * float = (2., 5.)
The union and intersection of these ranges may then be calculated using the union and inter
functions provided by the FloatRange module. In order to see the result we must extract the
ranges as pairs of floating-point numbers by mapping the to_pair function over the resulting
lists. For example, [1,3) ∪ [2,5) = [1,5):
# List.map FloatRange.to_pair (FloatRange.union a b);;
- : (float * float) list = [(1., 5.)]
⁴ However, as we shall see in chapter 7, records containing fields which are all of the type float are
represented more efficiently by the ocamlopt compiler.
and [1,3) ∩ [2,5) = [2,3):
# List.map FloatRange.to_pair (FloatRange.inter a b);;
- : (float * float) list = [(2., 3.)]
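The examples above use overlapping ranges only. The behaviour for disjoint and nested ranges can be checked with the following self-contained sketch, which restates a reduced FloatRange (without subrange). It assumes that the hidden order function returns the two ranges sorted by increasing lower bound; the helper pairs is hypothetical and merely abbreviates the mapping of to_pair over a list:

```ocaml
module FloatRange :
  sig
    type t
    val make : float -> float -> t
    val to_pair : t -> float * float
    val union : t -> t -> t list
    val inter : t -> t -> t list
  end =
  struct
    type t = float * float
    let make l u = if u < l then invalid_arg "FloatRange.make" else (l, u)
    let to_pair r = r
    (* Assumption: order sorts the two ranges by lower bound. *)
    let order (l1, u1) (l2, u2) =
      if l2 < l1 then ((l2, u2), (l1, u1)) else ((l1, u1), (l2, u2))
    let union r1 r2 =
      let ((l1, u1), (l2, u2)) = order r1 r2 in
      if u1 < l2 then [(l1, u1); (l2, u2)] else [(min l1 l2, max u1 u2)]
    let inter r1 r2 =
      let ((l1, u1), (l2, u2)) = order r1 r2 in
      if u1 < l2 then [] else [(max l1 l2, min u1 u2)]
  end

(* Hypothetical helper: expose a list of ranges as pairs. *)
let pairs = List.map FloatRange.to_pair

let () =
  (* Disjoint ranges: the union keeps both, the intersection is empty. *)
  let a = FloatRange.make 1. 2. and b = FloatRange.make 3. 4. in
  assert (pairs (FloatRange.union a b) = [(1., 2.); (3., 4.)]);
  assert (pairs (FloatRange.union b a) = [(1., 2.); (3., 4.)]);
  assert (pairs (FloatRange.inter a b) = []);
  (* Nested ranges: the union is the outer range, the intersection the inner. *)
  let c = FloatRange.make 1. 10. and d = FloatRange.make 2. 3. in
  assert (pairs (FloatRange.union c d) = [(1., 10.)]);
  assert (pairs (FloatRange.inter c d) = [(2., 3.)])
```

Returning a list of ranges allows union and inter to express results containing zero, one or two ranges without resorting to a special "empty" value.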
As we shall see in chapter 3, the OCaml core library provides many useful data structures
which implement the functionalities presented here. Moreover, these implementations are also
encapsulated into separate modules. However, the OCaml language provides an additional
means of encapsulating definitions.
2.4 Objects
Object-oriented (OO) programming is a much touted approach for the structuring of programs.
However, the hype surrounding this notion is primarily driven by the fact that recent,
high-profile languages (particularly Java and C++) provide some support for objects but do
not provide support for modules or other constructs to aid with the structuring of programs.
Despite this social aspect to OO approaches, the subject has a rigorous mathematical
background [3, 4]. The OCaml language draws upon this foundation to provide a very carefully
constructed and expressive object system which is particularly well suited to the writing of
extensible programs. However, because the module system provides a safer alternative for
encapsulation, OO programming is not as prevalent in OCaml as it is in other languages.
Fundamental concepts in OO programming are the ability to define types of object (class
types), define implementations adhering to these types (class expressions), define
relationships between class types, instantiate class expressions to produce objects at run-time and
interrogate objects via their methods (which are often functions).
As objects encapsulate program and data, they are somewhat similar to data structures com-
posed of tuples or records containing these values. However, static typing requires a tuple or
record value to be from a well-defined set of possible values (its type). In contrast, the type
of an object is the interface to which it adheres. A single object which satisfies many different
interfaces may then be used in many different contexts, whereas a tuple or record cannot.
2.4.1 Classes
Like module signatures, class types may contain several different kinds of declaration:
• value declarations in the form val ...
• method declarations in the form method ...
• type constraints, using the constraint keyword.
• inheritance, using the inherit keyword.
Class types are declared using the syntax:
class type name = object ... end
where the name of the class type (name) must begin with a lower-case letter.
Like module structures, class expressions may contain several different kinds of definition:
• value definitions in the form val ...
• method definitions in the form method ...
• type constraints, using the constraint keyword.
• initializers, using the initializer keyword.
Class expressions are declared using the syntax:
class name = object ... end
where the name of the class expression (name) must begin with a lower-case letter.
2.4.2 Objects
Objects are instantiated (created) at run-time, either as immediate objects or as classed ob-
jects.
2.4.2.1 Immediate objects
Objects can be instantiated independently of classes, in which case they are known as
immediate objects⁵. The
following function accepts two floating-point values x and y, representing real and imaginary
parts respectively, and creates an immediate object representing a complex-valued number
z = x + iy:
# let z x y =
    object
      val x : float = x
      val y : float = y
      method re = x
      method im = y
    end;;
val z : float -> float -> < im : float; re : float > = <fun>
Note that the type of the object returned by the function z is defined only by the
methods im and re which it implements.
⁵ This feature is new in OCaml version 3.08.
The function z can be used to instantiate an object called a which represents the complex
number a = 2 + 3i:
# let a = z 2. 3.;;
val a : < im : float; re : float> = <obj>
The real and imaginary parts of the complex-valued number represented by a may be extracted
using the re and im members of a, referred to as a#re and a#im, respectively:
# a#re;;
- : float = 2.
# a#im;;
- : float = 3.
In addition to immediate objects, objects of the same type may be expressed by instantiating
from a single class expression.
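Because the type of an object is the interface to which it adheres, a function may accept any object which provides the methods that it uses. The following minimal sketch illustrates this. The function real_part and the object reading are hypothetical and are introduced only for this illustration:

```ocaml
(* Any object with a method re : float is accepted. The inferred type is
   < re : float; .. > -> float, where ".." stands for any further methods. *)
let real_part o : float = o#re

let z x y =
  object
    method re : float = x
    method im : float = y
  end

(* A hypothetical, unrelated object which also happens to provide re. *)
let reading =
  object
    method re = 7.5
    method units = "volts"
  end

let () =
  assert (real_part (z 2. 3.) = 2.);
  assert (real_part reading = 7.5)
```

A tuple or record could not be used in this way, as the function would have to commit to a single tuple or record type.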
2.4.2.2 Classed objects
The type of the immediate object a could have been expressed by the class type:
# class type number =
    object
      method re : float
      method im : float
    end;;
class type number = object method im : float method re : float end
Although not yet implemented, the class type number may be used as if it were a normal type.
For example, the following declares a function which maps a number onto a float:
# let abs_number (z : number) =
let sqr x = x *. x in
sqrt (sqr z#re +. sqr z#im);;
val abs_number : number -> float = <fun>
Before being able to use this function we must define a class expression which adheres to the
class type number. For example, the following class expression complex implements complex
numbers in a way which adheres to the interface required by number:
# class complex x y =
    object
      val x = x
      val y = y
      method re : float = x
      method im : float = y
    end;;
class complex :
  float ->
  float ->
  object val x : float val y : float method im : float method re : float end
Objects may be instantiated from class expressions using the new keyword. For example, an
object of type complex may be instantiated using:
# let b = new complex 2. 3.;;
val b : complex = <obj>
Note that the type of the object b is denoted by the name of the class, complex. The resulting
object has the same properties as the original a object. Specifically, the members can be
accessed equivalently:
# b#re;;
- : float = 2.
# b#im;;
- : float = 3.
Also, the abs_number function can be applied to b as complex adheres to the interface
prescribed by number:
# abs_number b;;
- : float = 3.60555127546398912
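Since the class type number describes only the methods re and im, the abs_number function accepts any object with exactly this interface, whether classed or immediate. The following self-contained sketch restates the definitions above and applies abs_number to both kinds of object. The values 3 and 4 are chosen here only so that the result is exactly representable:

```ocaml
class type number =
  object
    method re : float
    method im : float
  end

let abs_number (z : number) =
  let sqr x = x *. x in
  sqrt (sqr z#re +. sqr z#im)

class complex x y =
  object
    val x = x
    val y = y
    method re : float = x
    method im : float = y
  end

let () =
  (* A classed object instantiated with new. *)
  let b = new complex 3. 4. in
  assert (abs_number b = 5.);
  (* An immediate object with the same two methods. *)
  let a = object method re = 3. method im = 4. end in
  assert (abs_number a = 5.)
```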
Classed objects have an important advantage over immediate objects - relationships may be
defined between classes.
2.4.2.3 Inheritance
A class representing real-valued numbers may be derived from our complex class using the
inherit keyword:
# class real x =
    object
      inherit complex x 0.
    end;;
class real :
  float ->
  object val x : float val y : float method im : float method re : float end
Objects of this class can then be instantiated from a single floating-point value:
# let c = new real 5.;;
val c : real = <obj>
Resulting objects always return an imaginary part of zero:
# c#re;;
- : float = 5.
# c#im;;
- : float = 0.
Figure 2.2: The ocamlbrowser program allows the contents of modules to be examined
graphically. The ability to examine the types of library functions is particularly useful.
The type of the Array.fold_left function is illustrated here.
The ability to derive new types of object from existing types makes OO programming ideally
suited to the creation of extensible programs.
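As a sketch of such an extension, a derived class may add methods to those which it inherits. In the following variation upon the real class, the method to_float is hypothetical and does not appear in the example above. Because the derived class then has more methods than complex, an explicit coercion (:>) is used when a complex is expected:

```ocaml
class complex (x : float) (y : float) =
  object
    method re = x
    method im = y
  end

(* A variation on the real class which adds a hypothetical method. *)
class real x =
  object (self)
    inherit complex x 0.
    method to_float : float = self#re
  end

let abs_number (z : complex) =
  sqrt (z#re *. z#re +. z#im *. z#im)

let () =
  let c = new real 5. in
  assert (c#im = 0.);
  assert (c#to_float = 5.);
  (* real has an extra method, so it is coerced to complex explicitly. *)
  assert (abs_number (c :> complex) = 5.)
```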
Many more sophisticated uses of the OCaml object system, including multiple inheritance,
parameterized classes, polymorphic methods, coercions, cloning, mutually recursive classes,
binary methods and friends, are described in the OCaml reference manual [2]. As the OCaml
object system is much more powerful than the OO approaches used in other languages, such
as C++ and Java, the ways in which OCaml objects may be exploited are a current research
topic in scientific programming.
2.5 OCaml browser
The standard OCaml distribution contains an ocamlbrowser program which can be used to
examine available modules using a graphical interface, in particular the core library. Whilst
developing an OCaml program, the ability to review the contents of libraries can be very
useful. In particular, the ability to find the type of a function at the click of a button, or to
examine the documentation in the corresponding interface file, can greatly speed development.
By default, ocamlbrowser shows only the contents of the core library modules and any OCaml
sources found in the directory in which ocamlbrowser was started. Selecting a module name
allows the contents of a particular module to be examined. Selecting a type (t) or value (v) in a
module presents the type, or the type of the value at the bottom of the ocamlbrowser window
(illustrated in figure 2.2). The Impl and Intf buttons in the main window open an additional
window for browsing source code, displaying the contents of the implementation (".ml") or
interface (".mli") files respectively, targeting the selected content (illustrated in figure 2.3).
In many cases, the ability to use the ocamlbrowser program to peruse the contents of other,
non-core libraries can be desirable. Other modules can be added to the default selection by
selecting the Modules ▷ Path editor... choice from the menu and adding new search paths.
Figure 2.3: The ocamlbrowser program contains a simple editor which can be used to
examine the contents of OCaml source code. In particular, this simplifies the task of
examining documentation in the interface files of libraries, e.g. the description of the
Array.fold_left function shown here.
The final step in creating workable programs based around modules is the compilation of the
parts of a program into complete executables.
2.6 Compilation
In almost all cases, the code for a program will be split between separate files. This is typically
because a program will make use of libraries to provide part of its functionality. For example,
a program dealing with arbitrary precision arithmetic is likely to make use of the Nat, Num or
Big_int modules or an interface to the GNU MP library.
The separate files used to store parts of an OCaml program are treated as modules. Pairs of
files with the same name except for the suffixes ".mli" and ".ml" are treated as the signatures
and structures of a single module, respectively, the name of which is given by the filename
with its first letter capitalized. Single files with the suffix ".ml" are treated similarly but as
module structures without signatures (i.e. all definitions in the module will be visible).
The process of creating an executable from source code consists of two separate stages. The
first stage, known as compilation, converts the human-readable source code into an
intermediate form known as object code⁶. The second stage, known as linking, combines object codes
to create an executable.
As we have already seen, OCaml programs can be compiled either into byte-code or into
native-code executables. The OCaml compilers generate names for the object files with the same
name as the source files but with suffixes based upon the mode of compilation. Specifically,
source code in files with the suffixes ".mli" and ".ml" is compiled either into byte-code object
files with suffixes ".cmi" and ".cmo", respectively, or into native-code object files with suffixes
".cmi" and ".cmx".
⁶ This is nothing to do with object orientation.
Figure 2.4: Dependencies between the First, Second and Main modules in the example
program.
When linking, the order in which object files are specified can be important. The OCaml
compilers consider the list of object files in the order in which they are specified. Consequently,
object files must be specified after any other object files which they refer to.
For example, consider a program composed of three separate compilation units named "first",
"second" and "main" split between five files: "first.mli", "first.ml", "second.mli", "second.ml"
and "main.ml". The "first.ml" file contains:
let sentence = "This string is in the first compilation unit."
let sentence2 = "This string is also in the first compilation unit."
The "first.mli" file contains:
val sentence : string
The "second.ml" file contains:
let sentence = "This string is in the second compilation unit."
The "second.mli" file again contains:
val sentence : string
Finally, the "main.ml" file contains:
let () =
  print_endline First.sentence;
  print_endline Second.sentence
The source code contained in these five files may be compiled into a byte-code executable by
first compiling the interface (".mli") files, then compiling the implementation (".ml") files and,
finally, linking the resulting object files (".cmo") to form an executable. The dependencies
between the three modules are illustrated in figure 2.4. The dependencies between the five
initial and five generated files are illustrated in figure 2.5.
The interface files "first.mli" and "second.mli", which represent module signatures, may be
compiled to object form for a byte-code executable using:
$ ocamlc first.mli
$ ocamlc second.mli
Figure 2.5: Dependencies between the example files used to make the "first.cmo",
"second.cmo" and "main.cmo" files before they are linked to create the final executable
program.
This generates the files "first.cmi" and "second.cmi". The implementation files "first.ml",
"second.ml" and "main.ml", representing module structures, may be compiled to object form for
a byte-code executable by supplying the -c flag to suppress the generation of an executable:
$ ocamlc -c first.ml
$ ocamlc -c second.ml
$ ocamlc -c main.ml
This generates the files "first.cmo", "second.cmo" and "main.cmo". The "first.cmi" and
"second.cmi" interface files are used during this step to enforce the proper use of the
corresponding modules. Finally, a byte-code executable may then be created by linking the
resulting ".cmo" files, in this case to form an executable named "test":
$ ocamlc first.cmo second.cmo main.cmo -o test
Executing test prints the two strings defined in the First and Second modules, as expected:
$ ./test
This string is in the first compilation unit.
This string is in the second compilation unit.
The source code can be compiled into native-code, rather than byte-code, by using the
ocamlopt compiler to compile the source code and then link the ".cmx" files instead:
$ ocamlopt first.mli
$ ocamlopt second.mli
$ ocamlopt -c first.ml
$ ocamlopt -c second.ml
$ ocamlopt -c main.ml
$ ocamlopt first.cmx second.cmx main.cmx -o test
The resulting executable runs in exactly the same way.
Suffix      File type
name.ml     Structure of the Name module
name.mli    Signature of the Name module
name.cmi    Compiled signature
name.cmo    Byte-code compiled structure
name.cmx    Native-code compiled structure
name.cma    Byte-code archive
name.cmxa   Native-code archive
name        Final executable
Table 2.1: Types of file handled by the OCaml compilers.
When linking, the relative order of the implementations of the First and Second modules is
not important, provided they are specified before the Main module which depends upon them.
For example, when linking, we could have specified the First and Second modules in reverse
order with no ill-effect:
$ ocamlopt second.cmx first.cmx main.cmx -o test
However, as the Main module depends upon both the First and Second modules, trying to
link without these two modules preceding the Main module will fail:
$ ocamlopt first.cmx main.cmx second.cmx -o test
No implementations provided for the following modules:
Second referenced from main.cmx
$
Note also that the interface file "first.mli" hides the sentence2 variable defined
in the "first.ml" file from external code (i.e. from the code in "main.ml"). Consequently,
attempting to access this variable from the implementation of the Main module:
let () =
  print_endline First.sentence2
would have caused a compile-time error from the compiler:
$ ocamlopt first.mli
$ ocamlopt second.mli
$ ocamlopt -c first.ml
$ ocamlopt -c second.ml
$ ocamlopt -c main.ml
File "main.ml", line 3, characters 16-31:
Unbound value First.sentence2
$
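The hiding performed by an interface file can also be reproduced within a single source file, which may be convenient when experimenting. The following is a minimal sketch, assuming the same contents as "first.ml" and "first.mli"; the nested module First here merely stands in for the compilation unit of the same name.

```ocaml
(* A module sealed with an explicit signature, mirroring the pair
   first.mli (signature) and first.ml (structure). *)
module First : sig
  val sentence : string
end = struct
  let sentence = "This string is in the first compilation unit."
  (* Not listed in the signature, so invisible outside the module. *)
  let sentence2 = "This string is also in the first compilation unit."
end

let () = print_endline First.sentence

(* Referring to First.sentence2 at this point would be rejected by the
   compiler with an unbound value error, just as in main.ml. *)
```

Only the names listed in the signature are visible to the code which follows the module definition.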
Figure 2.6: Dependencies between the compilation units of the core library.
Several different types of file are used in the process of creating an executable from OCaml
source code. Table 2.1 lists the types of file handled by the OCaml compilers.
In the case of complicated programs or libraries, containing many modules and dependencies,
tools to visualise the dependencies can be useful. The ocamldep program can be used to create
a graph-theoretic representation of dependencies between compilation units. For example, the
following generates a graph representing the dependencies between all of the ".ml" files in the
current directory:
$ ocamldep *.ml >dep.dep
The resulting data can be converted into the format used by the freely-available, general-
purpose graph plotting program dot using the freely-available ocamldot program:
$ ocamldot <dep.dep >dep.dot
The dot program may then be used to generate a diagram in PostScript format:
$ dot -Tps dep.dot >dep.ps
For example, the dependency graph between some of the compilation units which make up
the core OCaml library (in the "stdlib" directory of the distribution) is illustrated in figure 2.6.
The OCaml compilers check that compilation commands, as well as the programs themselves,
are correct.
2.6.1 Linking with libraries
The ability to perform the compilation and linking stages separately is more important when
compiling programs which make use of libraries supplied in the form of object code. For
example, consider the program, assumed to be in the file "fact.ml":
open Num
let rec factorial n =
  if n=0 then (num_of_int 1) else (num_of_int n) */ factorial (n-1);;
print_endline ("100! = " ^ (string_of_num (factorial 100)));;
This program uses the Num module (which implements arbitrary precision integer arithmetic).
The namespace of the Num module is opened in order to use the infix operators +/, -/, */ and
// (equivalent to +, -, * and / for machine-precision integers, respectively). A function to
compute the factorial of a machine-precision integer to give an arbitrary precision integer is
then written in terms of */. Finally, the value of 100! is printed.
As this program depends upon the Num module, attempting to compile this program without
first specifying the object files which it depends upon will fail:
$ ocamlc fact.ml -o fact
Error while linking fact.cmo: Reference to undefined global 'Num'
The Num module is not part of the OCaml core library but is provided as a separate library
"nums.cma" (byte-code) and "nums.cmxa" (native-code) which we must link in. Thus, we can
compile our program by specifying "nums.cma" before "fact.ml":
$ ocamlc nums.cma fact.ml -o fact
$ ./fact
100! = 933262154439441526816992388562667004907159682643816214685929638
9521759999322991560894146397615651828625369792082722375825118521091686
4000000000000000000000000
In fact, the OCaml byte-code can be loaded dynamically, as programs are running or while
the top-level is in use. In the case of the top-level, the functionality required to use the Num
module may be obtained using the #load directive:
# #load "nums.cma";;
The source code in the "fact.ml" file may then be executed using the #use directive:
# #use "fact.ml";;
val factorial : int -> Num.num = <fun>
100! = 933262154439441526816992388562667004907159682643816214685929638
9521759999322991560894146397615651828625369792082722375825118521091686
4000000000000000000000000
- : unit = ()
Although we can now compile arbitrary programs into executables and dynamically load byte-
code into a running top-level, we do not yet know how to build a custom-made top-level which
includes extended functionality, such as the Num module.
2.7 Custom top-levels
The interactivity of the OCaml top-level can be very useful when developing programs. How-
ever, the default top-level is bare in terms of the modules it provides. For example, attempting
to execute the previous program in the default top-level fails because the Num module, and all
that it depends upon, is unavailable:
$ ocaml main.ml
Reference to undefined global 'Num'
In order to execute this program in a top-level, we can first create a customised top-level
which includes the necessary functionality. Top-levels are compiled in a way similar to byte-
code-compiled programs but using the ocamlmktop compiler instead of ocamlc. Thus, we can
build an appropriate top-level, which we choose to call "num.top", using:
$ ocamlmktop nums.cma -o num.top
Executing the resulting "num.top" file gives us a top-level which includes the functionality of
the Num module:
$ ./num.top
Objective Caml version 3.08.0
#
The program contained in the "main.ml" file may then be executed in the running num.top
top-level using the #use directive:
# #use "main.ml";;
100! = 933262154439441526816992388562667004907159682643816214685929638
9521759999322991560894146397615651828625369792082722375825118521091686
4000000000000000000000000
- : unit = ()
We shall discuss libraries relevant to scientific computing in more detail later, in chapter 8.
In particular, we shall use the -cclib option for the compilers to include the functionality of
libraries written in other languages. Meanwhile, let us examine some of the sophisticated data
structures and algorithms provided with OCaml, all of which are encapsulated in modules.
Chapter 3
Data Structures
Scientific applications are among the most computationally intensive programs in existence. This
places enormous emphasis on the efficiency of such programs. However, much time can be
wasted by optimising fundamentally inefficient algorithms and concentrating on low-level
optimisations when much more productive higher-level optimisations remain to be exploited.
Too commonly, a given problem is shoe-horned into using arrays because more sophisticated
data structures are prohibitively complicated to implement in many common languages [1].
Examples of this problem, endemic in scientific computing, are rife. For example, finite element
materials simulations, numerical differential equation solvers, numerical integration, implicit
surface tessellation and simulations of particle or fluid dynamics are often based around
uniformly subdivided arrays when they should be based around adaptively subdivided trees.
Occasionally, the poor performance of these inappropriately-optimised programs even drives
the use of alternative (often approximate) techniques. Examples of this include the use of
padding to round vector sizes up to integer-powers of two when computing numerical Fourier
transforms (Fourier series). In order to combat this folklore-based approach to optimisation,
we shall introduce a more formal approach to quantifying the efficiency of computations. This
approach is well known in computer science as complexity theory.
The most important choices determining the efficiency of a program are the selection of
algorithms and of data structures. Before delving into the broad spectrum of data structures
accessible from OCaml, we shall study the notion of algorithmic complexity. This concept
quantifies algorithm efficiency and, therefore, is essential for the objective selection of algo-
rithms and data structures based upon their performance. Studying algorithmic complexity
is the first step towards drastically improving program performance.
3.1 Algorithmic Complexity
In order to compare the efficiencies of algorithms meaningfully, the time requirements of an
algorithm must first be quantified. Although it is theoretically possible to predict the exact
time taken to perform an arbitrarily complicated problem given details of the computer and
of the input data, such an approach quickly becomes intractable.
[1] Primarily Fortran.
Consequently, exactness is often relinquished in favour of an approximate but still quantitative
measure of the time taken for an algorithm to execute. This approximation, the conventional
notion of algorithmic complexity, is derived as an upper- or lower-bound or average-case [2] of
the amount of computation required, measured in units of some suitably chosen primitive
operations. Furthermore, asymptotic algorithmic efficiency is derived by considering these
forms in the limit of infinite algorithmic complexity.
We shall begin by describing the notion of the primitive operations of an algorithm before
deriving a mathematical description for the asymptotic complexity of an algorithm. Finally,
we shall demonstrate the usefulness of algorithmic complexity in the optimization of a simple
function.
3.1.1 Primitive operations
In order to derive an algorithmic complexity, it is necessary to begin by identifying some
suitable primitive operations. The complexity of an algorithm is then measured as the total
number of these primitive operations it performs. In order to obtain a complexity which
reflects the time required to execute an algorithm, the primitive operations should ideally
terminate after a constant amount of time. However, this restriction cannot be satisfied in
practice (due to effectively-random interference from cache effects etc.), so primitive operations
are typically chosen which terminate in a finite amount of time for any input, as close to a
constant amount of time as possible.
For example, a first version of a function to raise a floating-point value x to a positive, integer
power n may be implemented naively as:
# let rec ipow_1 x n = if n = 0 then 1. else x *. ipow_1 x (n - 1);;
val ipow_1 : float -> int -> float = <fun>
The ipow_1 function executes an algorithm described by this recurrence relation:
$$x^n = \begin{cases} 1 & n = 0 \\ x \times x^{n-1} & \text{otherwise} \end{cases}$$
Consequently, this algorithm performs the floating-point multiply operation exactly n times
in order to obtain its result, i.e. $x^0 = 1$, $x^1 = x \times 1$, $x^2 = x \times x \times 1$ and so on. Thus, the
built-in floating-point multiplication function:
val ( *. ) : float -> float -> float
is a logical choice of primitive operation. Moreover, this function multiplies finite-precision
numbers and the algorithms used to perform this operation in practice (which are almost
always implemented as dedicated hardware) always perform a finite number of more primitive
operations at the bit level, regardless of their input. Thus, this choice of primitive operation
will execute in a finite time regardless of its input.
We shall now examine an approximate but practically very useful measure of algorithmic
complexity before exploiting this notion in the optimisation of the ipow_1 function.
[2] Average-case complexity is particularly useful when statistics are available on the likelihood of different
inputs.
3.1.2 Complexity
The complexity of an algorithm is the number of primitive operations it performs. For example,
the complexity of the ipow_1 function is T(n) = n.
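This complexity can be checked empirically by counting the primitive operations as they are performed. The following is a minimal sketch rather than part of the original program: the names multiplies, counted_mul, ipow_1_counted and complexity_1 are introduced here purely for illustration.

```ocaml
(* Global counter of floating-point multiplications (illustrative only). *)
let multiplies = ref 0

(* Multiply two floats and record that one primitive operation occurred. *)
let counted_mul a b = incr multiplies; a *. b

(* The same algorithm as ipow_1, with the primitive operation instrumented. *)
let rec ipow_1_counted x n =
  if n = 0 then 1. else counted_mul x (ipow_1_counted x (n - 1))

(* T(n): the number of multiplications performed for the power n. *)
let complexity_1 n =
  multiplies := 0;
  ignore (ipow_1_counted 2. n);
  !multiplies

let () =
  List.iter
    (fun n -> Printf.printf "T(%d) = %d\n" n (complexity_1 n))
    [0; 1; 2; 10; 100]
```

The count depends only upon n and not upon x, which is why the complexity is written as a function of n alone.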
As the complexity can be a strongly dependent function of the input, the mathematical deriva-
tion of the complexity quickly becomes intractable for reasonably complicated algorithms.
In practice, this is addressed in two different ways. The first approach is to derive the tight-
est possible bounds of the complexity. If such bounds cannot be obtained then the second
approach is to derive bounds in the asymptotic limit of the complexity.
3.1.2.1 Asymptotic complexity
An easier-to-derive and still useful indicator of the performance of a function is its asymptotic
algorithmic complexity. This gives the asymptotic performance of the function in the limit of
infinite execution time.
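Three notations exist for the asymptotic algorithmic complexity of a function f(x):
$$\Omega(g(x)) \;\Rightarrow\; C_1 \le \lim_{x \to \infty} \frac{f(x)}{g(x)}$$
$$O(g(x)) \;\Rightarrow\; \lim_{x \to \infty} \frac{f(x)}{g(x)} \le C_2$$
$$\Theta(g(x)) \;\Rightarrow\; C_1 \le \lim_{x \to \infty} \frac{f(x)}{g(x)} \le C_2$$
for some constants $C_1, C_2 \in \mathbb{R}$.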
The $\Theta$ form of asymptotic complexity is more restrictive and, therefore, conveys more
information. In particular, "f(x) is $\Theta(g(x))$" implies both "f(x) is $\Omega(g(x))$" and "f(x) is $O(g(x))$".
The $O$ notation is more commonly encountered as it represents the practically more important
notion of the upper-bound of the complexity.
The formulation of the asymptotic complexity of a function leads to some simple but powerful
manipulations:
f(x) is $O(a\,g(x))$, $a > 0$ $\Rightarrow$ f(x) is $O(g(x))$, i.e. constant prefactors can be removed.
f(x) is $O(x^a + x^b)$, $a > b > 0$ $\Rightarrow$ f(x) is $O(x^a)$, i.e. the polynomial term with the largest
exponent dominates all other polynomial terms.
f(x) is $O(x^a + b^x)$, $a > 0$, $b > 1$ $\Rightarrow$ f(x) is $O(b^x)$, i.e. exponential terms dominate any
polynomial terms.
f(x) is $O(a^x + b^x)$, $a > b > 0$ $\Rightarrow$ f(x) is $O(a^x)$, i.e. the exponential term $a^x$ with the
largest base a dominates all other exponential terms.
These rules can be used to simplify an asymptotic complexity.
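As a worked illustration of these rules, consider a complexity invented purely for this purpose (it does not correspond to any function in the text), $T(n) = 3n^2 + 5n + 7 \cdot 2^n$. Applying the rules one at a time gives:

```latex
\begin{align*}
T(n) &= 3n^2 + 5n + 7 \cdot 2^n \\
T(n) &\text{ is } O(n^2 + n + 2^n) && \text{constant prefactors removed} \\
T(n) &\text{ is } O(n^2 + 2^n)     && \text{largest polynomial exponent dominates} \\
T(n) &\text{ is } O(2^n)           && \text{exponential term dominates polynomial terms}
\end{align*}
```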
n   Computation                  T(n)
0   1                            0
1   x                            0
2   x × x                        1
3   x × x^2                      2
4   (x^2)^2                      2
5   x × (x^2)^2                  3
6   (x × x^2)^2                  3
7   x × (x × x^2)^2              4
8   ((x^2)^2)^2                  3
Table 3.1: Complexity of the ipow_2 function measured as the number of multiply
operations performed.
As the complexity of the ipow_1 function is T(n) = n, the asymptotic complexities are clearly
$O(n)$, $\Omega(n)$ and, therefore, $\Theta(n)$.
The algorithm behind the ipow_1 function can be greatly improved upon by reducing the
computation by a constant proportion at a time. In this case, this can be achieved by trying
to halve n repeatedly, rather than decrementing it. The following recurrence relation describes
such an approach:
$$x^n = \begin{cases} 1 & n = 0 \\ x & n = 1 \\ x \times \left(x^{\lfloor n/2 \rfloor}\right)^2 & n > 1 \text{ and } n \text{ odd} \\ \left(x^{n/2}\right)^2 & n > 1 \text{ and } n \text{ even} \end{cases}$$
This may be implemented as:
# let rec ipow_2 x n =
    if n = 0 then 1. else if n = 1 then x else
    let x2 = ipow_2 x (n/2) in let x2 = x2 *. x2 in
    if n mod 2 = 1 then x *. x2 else x2;;
val ipow_2 : float -> int -> float = <fun>
This variant is clearly more efficient as it avoids the re-computation of previous results, e.g. $x^4$
is factored into $(x^2)^2$ to use two floating-point multiplications instead of four. Quantifying
exactly how much more efficient is suddenly a bit of a challenge!
We can begin by expanding the computation manually for some small n (see Table 3.1) as
well as computing and plotting the exact number of floating-point multiplications performed
for both algorithms as a function of n (shown in figure 3.1).
Lower and upper bounds of the complexity can be derived by considering the minimum and
maximum number of multiplies performed in the body of the ipow_2 function, and the
minimum and maximum depths of recursion.
The minimum number of multiplies performed in the body of a single call to the ipow_2
function is 0 for $n \le 1$, and 1 for $n > 1$. The function recursively halves n, giving a depth of
recursion of 1 for $n \le 1$, and at least $\lfloor \log_2 n \rfloor$ for $n > 1$. Thus, a lower bound of the complexity
is 0 for $n \le 1$, and $\log_2(n) - 1$ for $n > 1$.
Figure 3.1: Complexities of the ipow_1 and ipow_2 functions in terms of the number T(n)
of multiplies performed, plotted against n.
Figure 3.2: Complexities of the ipow_2 function in terms of the number of multiplies
performed, showing: exact complexity T(n) (green dots) and lower- and upper-bounds
algorithmic complexities $\log_2(n) - 1 \le T(n) \le 2(1 + \log_2 n)$ for $n > 1$ (black lines).
The maximum number of multiplies performed in the body of a single call to the ipow_2
function is 2. The depth of recursion is 1 for $n \le 1$ and does not exceed $\lceil \log_2 n \rceil$ for $n > 1$.
Thus, an upper bound of the complexity is 0 for $n \le 1$, and $2(1 + \log_2 n)$ for $n > 1$.
From these lower and upper bounds, the asymptotic complexities of the ipow_2 function are
clearly $\Omega(\ln n)$, $O(\ln n)$ and, therefore, $\Theta(\ln n)$. The logarithmic complexity of ipow_2
(illustrated in figure 3.2) originates from the divide-and-conquer strategy, reducing the computation
required by a constant factor (halving n) at each stage rather than by a constant absolute
amount (decrementing n).
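The exact complexity and the derived bounds can be checked by instrumenting the multiplication, as a rough sanity test rather than a proof. In this sketch the names multiplies, counted_mul, ipow_2_counted, complexity_2, log_2 and bounds_hold are all introduced for illustration only.

```ocaml
(* Counter of floating-point multiplications (illustrative only). *)
let multiplies = ref 0
let counted_mul a b = incr multiplies; a *. b

(* The same algorithm as ipow_2, with each multiply counted. *)
let rec ipow_2_counted x n =
  if n = 0 then 1. else if n = 1 then x else
  let x2 = ipow_2_counted x (n / 2) in
  let x2 = counted_mul x2 x2 in
  if n mod 2 = 1 then counted_mul x x2 else x2

(* T(n): the number of multiplications performed for the power n. *)
let complexity_2 n =
  multiplies := 0;
  ignore (ipow_2_counted 1. n);
  !multiplies

(* Base-2 logarithm of a float. *)
let log_2 x = log x /. log 2.

(* Check that log2(n) - 1 <= T(n) <= 2 (1 + log2 n) for 1 < n <= limit. *)
let bounds_hold limit =
  let ok = ref true in
  for n = 2 to limit do
    let t = float_of_int (complexity_2 n) in
    let l = log_2 (float_of_int n) in
    if t < l -. 1. || t > 2. *. (1. +. l) then ok := false
  done;
  !ok

let () =
  let ts = List.map complexity_2 [0; 1; 2; 3; 4; 5; 6; 7; 8] in
  Printf.printf "T(0..8) = %s\n"
    (String.concat " " (List.map string_of_int ts));
  Printf.printf "bounds hold for 1 < n <= 1000: %b\n" (bounds_hold 1000)
```

The first nine counts can be compared directly against the T(n) column of Table 3.1.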
The actual performance of these two versions of the ipow function can be measured (see
Figure 3.3). As expected from the algorithmic complexity, we find that the ipow_2 function
is considerably faster for large n.
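One simple way to make such a measurement is to time many repeated calls using the Sys.time function from the core library. The following is a sketch only: the helper name time_ipow and the repeat count are arbitrary choices made here, and the absolute times obtained will vary between machines and compilers.

```ocaml
let rec ipow_1 x n = if n = 0 then 1. else x *. ipow_1 x (n - 1)

let rec ipow_2 x n =
  if n = 0 then 1. else if n = 1 then x else
  let x2 = ipow_2 x (n / 2) in
  let x2 = x2 *. x2 in
  if n mod 2 = 1 then x *. x2 else x2

(* Average CPU time, in seconds, of one call to [f x n], estimated
   by timing [repeats] consecutive calls. *)
let time_ipow f x n repeats =
  let start = Sys.time () in
  for _i = 1 to repeats do ignore (f x n) done;
  (Sys.time () -. start) /. float_of_int repeats

let () =
  List.iter
    (fun n ->
      Printf.printf "n = %3d: ipow_1 %.3g s, ipow_2 %.3g s\n" n
        (time_ipow ipow_1 1.0000001 n 10_000)
        (time_ipow ipow_2 1.0000001 n 10_000))
    [10; 50; 100]
```

Averaging over many calls is needed because a single call completes far more quickly than the resolution of the timer.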
Asymptotic algorithmic complexity, as we have just described, should be considered first when
trying to choose an efficient algorithm or data structure. On the basis of this, we shall now
examine some of the wide variety of data structures accessible from OCaml in the context of
the algorithmic complexities of operations over them.
Figure 3.3: Measured performance of the ipow_1 and ipow_2 functions which have
asymptotic algorithmic complexities of $\Theta(n)$ and $\Theta(\ln n)$, respectively (time t plotted against n).
Figure 3.4: Arrays are the simplest data structure, allowing fast, random access (reading
or writing) to the $i$th element $\forall i \in \{0 \ldots n-1\}$ where n is the number of elements in the
array. Elements cannot be added or removed without copying the whole array.
3.2 Arrays
Of all the data structures, arrays will be the most familiar to the scientific programmer. Arrays
are containers of fixed size which allow the $i$th element to be extracted in $O(1)$ time (illustrated
in figure 3.4). This makes them ideally suited in situations which require a container with
fast random access. As the elements of arrays are typically stored contiguously in memory,
they are often the most efficient container for iterating over the elements in order. This is the
principal alluring feature which leads to their (over!) use in numerically intensive programs.
As we have already seen, the OCaml language provides a notation for describing arrays:
# let a = [|1; 2|]
  let b = [|3; 4; 5|]
  let c = [|6; 7; 9|];;
val a : int array = [|1; 2|]
val b : int array = [|3; 4; 5|]
val c : int array = [|6; 7; 9|]
In OCaml, arrays are mutable, meaning that the elements in an array can be altered in-place.
The element at index i of an array b may be read using the short-hand syntax b.(i):
# b.(1);;
- : int = 4
Note that array indices run from 0 to n - 1 for an array containing n elements.
Array elements may be set using the syntax used for mutable record fields, namely:
# c.(2) <- 8;;
- : unit = ()
The contents of the array c have now been altered:
# c;;
- : int array = [|6; 7; 8|]
Any attempt to access an array element which is outside the bounds of the array results in an
exception being raised at run-time:
# c.(3) <- 8;;
Exception: Invalid_argument "index out of bounds".
The mutability of arrays typically leads to the use of an imperative style when arrays are
being used.
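Where an out-of-bounds access is an expected possibility rather than a programming error, the exception can be caught and converted into an ordinary value. The following sketch introduces a hypothetical helper called safe_get for this purpose; it assumes that bounds checking has not been disabled when compiling.

```ocaml
(* Return Some element if index i is within bounds and None otherwise,
   by catching the exception raised by the bounds check. *)
let safe_get a i =
  try Some a.(i) with Invalid_argument _ -> None

let () =
  let c = [|6; 7; 8|] in
  match safe_get c 3 with
  | Some x -> Printf.printf "c.(3) = %d\n" x
  | None -> print_endline "index 3 is out of bounds"
```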
The core OCaml library provides several functions which act upon arrays in the Array module.
We shall examine some of these functions before looking at the more exotic array functions
offered by OCaml.
The append function concatenates two arrays:
# Array.append a b;;
- : int array = [|1; 2; 3; 4; 5|]
The append function has complexity $\Theta(n)$ where n is the length of the resulting array.
The concat function concatenates a list of arrays:
# let e = Array.concat [a; b; c];;
val e : int array = [|1; 2; 3; 4; 5; 6; 7; 8|]
The concat function has complexity $\Theta(n + m)$ where n is the length of the resulting array
and m is the length of the supplied list.
A new variable created from an existing array refers to the existing array. Thus the complexity
of creating a new variable which refers to an existing array is $\Theta(1)$, i.e. independent of the
number of elements in the array. However, all alterations to the array are visible from any
variables which refer to the array. For example, the following creates a variable called d which
refers to the same array as the variable called c:
# let d = c;;
val d : int array = [|6; 7; 8|]
The effect of altering the array via either c or d can be seen from both c and d:
# d.(0) <- 17;;
- : unit = ()
# (c, d);;
- : int array * int array = ([|17; 7; 8|], [|17; 7; 8|])
# c.(0) <- 6;
  (c, d);;
- : int array * int array = ([|6; 7; 8|], [|6; 7; 8|])
Figure 3.5: The higher-order Array.init function creates an array $a_i = f(i)$ for
$i \in \{0 \ldots n-1\}$ using the given function f.
The copy function returns a new array which contains the same elements as the given array.
For example, the following creates a variable d (superseding the previous d) which is a copy
of c:
# let d = Array.copy c;;
val d : int array = [|6; 7; 8|]
Altering the elements of the copied array d does not alter the elements of the original array c:
# d.(0) <- 17; (c, d);;
- : int array * int array = ([|6; 7; 8|], [|17; 7; 8|])
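The distinction between a second name for an array and a copy of it can also be observed using the physical equality operator ==, which tests whether two values are the very same object in memory. The variable names alias and copy in this sketch are illustrative only.

```ocaml
let c = [|6; 7; 8|]
let alias = c              (* constant time: another name for the same array *)
let copy = Array.copy c    (* linear time: a fresh array with equal elements *)

let () =
  (* Mutating c is visible through alias but not through copy. *)
  c.(0) <- 17;
  Printf.printf "alias.(0) = %d, copy.(0) = %d\n" alias.(0) copy.(0);
  Printf.printf "alias == c: %b, copy == c: %b\n" (alias == c) (copy == c)
```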
The sub function returns a new array which contains a copy of the range of elements specified
by a starting index and a length. For example, the following copies a sub-array of 5 elements
starting at index 2 (the third element):
# Array.sub e 2 5;;
- : int array = [|3; 4; 5; 6; 7|]
In addition to these conventional array functions, OCaml offers some more exotic functions.
We shall now examine these functions in more detail.
The higher-order init function creates an array, filling the elements with the result of applying
the given function to the index $i \in \{0 \ldots n-1\}$ of each element (illustrated in figure 3.5). For
example, the following creates the array $a_i = i^2$ for $i \in \{0 \ldots 3\}$:
# let a = Array.init 4 (fun i -> i*i);;
val a : int array = [|0; 1; 4; 9|]
The Array.init function is analogous to the list_init function we defined on page 37.
The higher-order function iter executes a given function on each element in the given array
in turn and returns the value of type unit. The purpose of the function passed to this
higher-order function must, therefore, lie in any side-effects it incurs. Hence, the iter function is
only of use in the context of imperative, and not functional, programming. For example, the
following prints the elements of the array a:
# Array.iter (fun e -> print_endline (string_of_int e)) a;;
0
1
4
9
- : unit = ()
Figure 3.6: The higher-order Array.map function creates an array containing the result
of applying the given function f to each element in the given array a.
Figure 3.7: The higher-order Array.fold_left function repeatedly applies the given
function f to the current accumulator and the current array element to produce a new
accumulator to be applied with the next array element.
The map function applies a given function to each element in the given array, returning an
array containing each result (illustrated in figure 3.6). For example, the following creates the
array $b_i = a_i^2$:
# let b = Array.map (fun e -> e * e) a;;
val b : int array = [|0; 1; 16; 81|]
The higher-order fold_left and fold_right functions are more general and more useful than
map. The fold functions accumulate the result of applying their function arguments to each
element in turn. The fold_left function is illustrated in figure 3.7.
In the simplest case, the fold functions can be used to accumulate the sum or product of the
elements of an array by folding the addition or multiplication operators over the elements of
the array respectively, starting with a suitable base case. For example, the sum of the elements
of the array referred to by the variable b (now [|0; 1; 16; 81|]) is 0 + 1 + 16 + 81 = 98:
# Array.fold_left ( + ) 0 b;;
- : int = 98
We have already encountered this functionality in the context of the IntRange module devel-
oped in section 2.3.3. However, arrays may contain arbitrary data of arbitrary types.
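To illustrate folds whose accumulators are not simply running integer sums, the following sketch defines three small functions using Array.fold_left. The names mean, array_max and count are chosen here for illustration and are not functions of the Array module.

```ocaml
(* Arithmetic mean of a non-empty float array: fold the sum, then divide. *)
let mean a =
  Array.fold_left ( +. ) 0. a /. float_of_int (Array.length a)

(* Largest element of a non-empty array, using the first element as the
   base case of the fold. *)
let array_max a =
  Array.fold_left max a.(0) a

(* Number of elements satisfying the predicate p. *)
let count p a =
  Array.fold_left (fun n e -> if p e then n + 1 else n) 0 a

let () =
  let b = [|0; 1; 16; 81|] in
  Printf.printf "max = %d, number of even elements = %d\n"
    (array_max b) (count (fun e -> e mod 2 = 0) b)
```

In each case the function passed to the fold takes the accumulator first and the current element second, matching the order illustrated in figure 3.7.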
For example, an array could be converted to a list by prepending elements using the cons
operator:
# let to_list a = Array.fold_right (fun e l -> e :: l) a [];;
val to_list : 'a array -> 'a list = <fun>
# to_list [|0; 1; 4; 9|];;
- : int list = [0; 1; 4; 9]
This to_list function uses fold_right to cumulatively prepend to the list l each element e
of the array a in reverse order, starting with the base-case of an empty list []. The result is
a list containing the elements of the array in the correct order. The to_list function is, in
fact, already in the Array module:
# Array.to_list [|0; 1; 4; 9|];;
- : int list = [0; 1; 4; 9]
Although slightly more complicated than iter and map, the fold_left and fold_right func-
tions are very useful because they can produce results of any type, including accumulated
primitive types as well as different data structures.
When pattern matching, the contents of arrays can be used in patterns. For example, the
vector cross product a × b could be written:
# let vec_cross a b = match (a, b) with
    ([|x1; y1; z1|], [|x2; y2; z2|]) ->
      [|y1*.z2 -. z1*.y2; z1*.x2 -. x1*.z2; x1*.y2 -. y1*.x2|]
  | _ -> raise (Invalid_argument "cross");;
val vec_cross : float array -> float array -> float array = <fun>
Thus, patterns over arrays can be used to test the number and value of all elements, forming
a useful complement to the map and fold algorithms.
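As a brief usage sketch, the vec_cross function can be applied to two unit vectors, and an array pattern can also test element values as well as the number of elements. The definition of vec_cross is repeated so that the example is self-contained; the function is_origin is a hypothetical addition for illustration.

```ocaml
(* Cross product of two 3-element vectors, as defined in the text. *)
let vec_cross a b = match (a, b) with
  | ([|x1; y1; z1|], [|x2; y2; z2|]) ->
      [|y1 *. z2 -. z1 *. y2; z1 *. x2 -. x1 *. z2; x1 *. y2 -. y1 *. x2|]
  | _ -> raise (Invalid_argument "cross")

(* A pattern which tests both the length and the values of the elements. *)
let is_origin = function
  | [|0.; 0.; 0.|] -> true
  | _ -> false

let () =
  let z = vec_cross [|1.; 0.; 0.|] [|0.; 1.; 0.|] in
  Array.iter (fun e -> Printf.printf "%g " e) z;
  print_newline ()
```

Arrays of any other length fall through to the final pattern, so a call such as vec_cross [|1.|] [|2.|] raises the Invalid_argument exception.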
3.3 Lists
Arguably the simplest and most commonly used data structure, lists allow two fundamental
operations (see figure 3.8). The first is decapitation of a list into two parts, the head (the first
element of the list) and the tail (a list containing the remaining elements). The second is the
reverse operation of prepending an element onto a list to create a new list. The complexities of
both operations are $\Theta(1)$, i.e. the time taken to perform these operations is independent of the
number of elements in the list. Thus, lists are ideally suited for the creation of a data structure
containing an unknown number of elements (such as the loading of an arbitrarily-long sequence
of numbers).
As we have already seen, OCaml implements lists in the language itself. In particular, the
cons operator :: can be used both to prepend an element and to represent decapitated lists
in patterns. Unlike arrays, the implementation of lists is functional, so operations on lists
produce new lists.
Figure 3.8: Lists are the simplest, arbitrarily-extensible data structure. Decapitation
splits a list $l_i$, $i \in \{0 \ldots n-1\}$, into the head element h and the tail list $t_i$, $i \in \{0 \ldots n-2\}$.
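The two fundamental operations can be sketched as follows. The function sum decapitates its argument in a pattern, whereas collatz builds a list whose final length is not known in advance by prepending one element at a time. Both names, and the choice of the Collatz sequence as the example, are made here for illustration only.

```ocaml
(* Decapitation: split a list into its head and tail in a pattern. *)
let rec sum = function
  | [] -> 0
  | head :: tail -> head + sum tail

(* Prepending: accumulate a sequence of unknown length. The sequence
   starting from n (assumed to be at least 1) is collected most recent
   element first and reversed once at the end. *)
let collatz n =
  let rec go acc n =
    if n = 1 then List.rev (1 :: acc)
    else go (n :: acc) (if n mod 2 = 0 then n / 2 else 3 * n + 1)
  in
  go [] n

let () =
  List.iter (fun i -> Printf.printf "%d " i) (collatz 6);
  print_newline ()
```

Prepending and then reversing is preferred to appending at the end because each prepend takes constant time, whereas appending to the end of a list takes time proportional to its length.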
The List module contains the iter, map, fold_left and fold_right functions,
equivalent to those for arrays, as well as an append function and a flatten function, which provides
equivalent functionality to that of Array.concat (i.e. to concatenate a list of lists into a single
list). In particular, the append function has the infix synonym @:
# [1; 2] @ [3; 4];;
- : int list = [1; 2; 3; 4]
The List module also contains several functions for sorting and searching. The contents of
a list may be sorted using the higher-order List.sort function, into an order specified by a
given total order function (the compare function in the Pervasives module, in this case):
# List.sort compare [1; 5; 3; 4; 7; 9];;
- : int list = [1; 3; 4; 5; 7; 9]
An element may be tested for membership in a list using the List.mem function:
# List.mem 4 [1; 3; 4; 5; 7; 9];;
- : bool = true
# List.mem 6 [1; 3; 4; 5; 7; 9];;
- : bool = false
Similarly, the first element matching a given predicate function may be extracted using the
higher-order function List.find:
# List.find (fun i -> (i-6)*i > 0) [1; 3; 4; 5; 7; 9];;
- : int = 7
This function raises the Not_found exception if all the elements in the given list fail to match
the given predicate:
# List.find (fun i -> (i-6)*i = 0) [1; 3; 4; 5; 7; 9];;
Exception: Not_found.
The contents of a list of key-value pairs may be searched using the List.assoc function to find
the value corresponding to the first matching key. For example, the following list l contains
$(i, i^2)$ key-value pairs:
# let l = List.map (fun i -> (i, i*i)) [1; 2; 3; 4; 5];;
val l : (int * int) list = [(1, 1); (2, 4); (3, 9); (4, 16); (5, 25)]
Searching l for the key $i = 4$ using the List.assoc function finds the corresponding $i^2 = 16$
value:
# List.assoc 4 l;;
- : int = 16
As we shall see in sections 3.5 and 3.6, equivalent functionality is provided with considerably
better asymptotic performance by the hash table and map data structures.
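As a brief preview, and only as a sketch, the same key-value pairs can be loaded into a hash table using the Hashtbl module of the core library. The names pairs, by_list, table and by_table are illustrative, and the initial table size of 16 is an arbitrary choice.

```ocaml
let pairs = List.map (fun i -> (i, i * i)) [1; 2; 3; 4; 5]

(* Linear search through the list: time proportional to its length. *)
let by_list key = List.assoc key pairs

(* A hash table built once from the same pairs; each subsequent lookup
   then takes constant time on average. *)
let table =
  let t = Hashtbl.create 16 in
  List.iter (fun (k, v) -> Hashtbl.add t k v) pairs;
  t

let by_table key = Hashtbl.find table key

let () =
  Printf.printf "by_list 4 = %d, by_table 4 = %d\n" (by_list 4) (by_table 4)
```

Both lookup functions raise the Not_found exception when the key is absent.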
The ability to grow lists makes them ideal for filtering operations based upon arbitrary
predicate functions. The List.partition function splits a given list into two lists containing those
elements which match the predicate and those which do not. The following example uses the
predicate $x_i \le 3$:
# List.partition (fun x -> x <= 3) [1; 2; 3; 4; 5];;
- : int list * int list = ([1; 2; 3], [4; 5])
Similarly, the List.filter function returns a list containing only those elements which
matched the predicate, i.e. the first list that List.partition would have returned:
# List.filter (fun x -> x <= 3) [1; 2; 3; 4; 5];;
- : int list = [1; 2; 3]
The partition and filter functions are ideally suited to arbitrarily extensible data structures
such as lists because the length of the output(s) cannot be precalculated.
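To see why an extensible container suits this task, partition can be sketched as a fold which conses each element onto one of two growing lists. This is an illustrative version only, under the hypothetical name my_partition, and is not necessarily how the core library implements it:

```ocaml
(* Hypothetical re-implementation of partition using fold_right: each element
   is prepended to either the "yes" list or the "no" list. *)
let my_partition pred l =
  List.fold_right
    (fun x (yes, no) -> if pred x then (x :: yes, no) else (yes, x :: no))
    l ([], [])

let small, large = my_partition (fun x -> x <= 3) [1; 2; 3; 4; 5]
```

Because fold_right works from the end of the list towards the front, both output lists retain the order of the input.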
In addition to the conventional higher-order iter, map, fold, sorting and searching functions,
the List module contains several functions which act upon pairs of lists. These functions all
assume the lists to be of equal length. If they are found to be of different lengths then an
Invalid_argument exception is raised. We shall now elucidate these functions using examples
from vector algebra.
The higher-order function map2 applies a given function to each pair of elements from two
equal-length lists, producing a single list containing the results. The type of the map2 function
is:
val map2 : ('a -> 'b -> 'c) -> 'a list -> 'b list -> 'c list
The map2 function can be used to write a function to convert a pair of lists into a list of pairs:
# let list_combine a b = List.map2 (fun a b -> (a, b)) a b;;
val list_combine : 'a list -> 'b list -> ('a * 'b) list = <fun>
Applying the list_combine function to a pair of lists of equal lengths combines them into a
list of pairs:
# list_combine [1; 2; 3] [2; 3; 4];;
- : (int * int) list = [(1, 2); (2, 3); (3, 4)]
Applying the list_combine function to a pair of lists of unequal lengths causes an exception
to be raised by the map2 function:
# list_combine [1; 2; 3] [2; 3; 4; 5];;
Exception: Invalid_argument "List.map2".
In fact, the functionality of this list_combine function is already provided by the combine
function in the List module.
Vector addition can be written in terms of the map2 function:
# let vec_add = List.map2 ( +. );;
val vec_add : float list -> float list -> float list = <fun>
When given a pair of lists a and b of floating-point numbers, this function creates a list
containing the sum of each corresponding pair of elements from the two given lists, i.e. a + b:
# vec_add [1.; 2.; 3.] [2.; 3.; 4.];;
- : float list = [3.; 5.; 7.]
The higher-order fold_left2 and fold_right2 functions in the List module are similar to
the fold_left and fold_right functions, except that they act upon two lists simultaneously
instead of one. The types of these functions are:
val fold_left2 : ('a -> 'b -> 'c -> 'a) -> 'a -> 'b list -> 'c list -> 'a
val fold_right2 : ('a -> 'b -> 'c -> 'c) -> 'a list -> 'b list -> 'c -> 'c
Thus, the fold_left2 and fold_right2 functions can be used to implement many algorithms
which consider each pair of elements in a pair of lists in turn. For example, the vector dot
product could be written succinctly using fold_left2 by accumulating the products of element
pairs from the two lists:
# let vec_dot = List.fold_left2 (fun d a b -> d +. a *. b) 0.;;
val vec_dot : float list -> float list -> float = <fun>
When given two lists, a and b, of floating-point numbers, this function accumulates the products
$a_i \times b_i$ of each pair of elements from the two given lists, i.e. the vector dot product
$a \cdot b$:
# vec_dot [1.; 2.; 3.] [2.; 3.; 4.];;
- : float = 20.
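Other vector operations follow from the same two building blocks. The sketch below restates vec_dot and derives a Euclidean norm and distance from it; the names vec_sub, vec_norm and vec_dist are ours and are introduced only for illustration:

```ocaml
(* Dot product, as defined in the text. *)
let vec_dot = List.fold_left2 (fun d a b -> d +. a *. b) 0.

(* Element-wise subtraction, by analogy with vector addition. *)
let vec_sub = List.map2 ( -. )

(* Euclidean norm |a| = sqrt (a . a). *)
let vec_norm a = sqrt (vec_dot a a)

(* Euclidean distance |a - b| between two vectors of equal length. *)
let vec_dist a b = vec_norm (vec_sub a b)
```

As with map2 itself, applying these functions to lists of unequal lengths raises an Invalid_argument exception.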
The ability to write such functions using maps and folds is clearly an advantage in the context
of scientific computing. Moreover, this style of programming can be seamlessly converted to
using much more exotic data structures, as we shall see later in this chapter. In some cases,
algorithms over lists cannot be expressed easily in terms of maps and folds. In such cases,
pattern matching can be used instead.
Patterns over lists can not only reflect the number and value of all elements, as they can in
arrays, but can also be used to test initial elements in the list using the cons operator :: to
decapitate the list. In particular, pattern matching can be used to examine sequences of list
elements. For example, the following function "downsamples" a signal, represented as a list of
floating-point numbers, by averaging pairs of elements:
# let rec downsample = function
    [] -> []
  | h1 :: h2 :: t -> (h1 +. h2) /. 2. :: downsample t
  | [_] -> invalid_arg "downsample";;
val downsample : float list -> float list = <fun>
This is a simple, recursive function which uses a pattern match containing three patterns. The
first pattern downsamples the empty list to the empty list. This pattern acts as the base-case
for the recursive calls of the function (equivalent to the base-case of a recurrence relation).
The second pattern matches the first two elements in the list (h1 and h2) and the remainder
of the list (t). Matching this pattern results in prepending the average of h1 and h2 onto
the list resulting from downsampling the remaining list t. The third pattern matches a list
containing any single element, raising an exception if this erroneous input is encountered:
# downsample [5.];;
Exception: Invalid_argument "downsample".
As these three patterns are completely distinct (any input list necessarily matches one and
only one pattern) they could, equivalently, have been presented in any order in the pattern
match.
The downsample function can be used to downsample an eight-element list into a four-element
list by averaging pairs of elements:
# downsample [0.; 1.; 0.; -1.; 0.; 1.; 0.; -1.];;
- : float list = [0.5; -0.5; 0.5; -0.5]
The ability to perform pattern matches over lists is extremely useful, resulting in a very concise
syntax for many operations which act upon lists.
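Patterns can also examine overlapping sequences of elements. As a sketch, the following function (hypothetically named differences) computes the differences between neighbouring elements, using an "as" pattern so that the second element of each pair is retained for the next step:

```ocaml
(* Differences between neighbouring elements of a signal. The pattern
   h1 :: (h2 :: _ as t) binds the first two elements and also names the
   list t, which still begins with h2. *)
let rec differences = function
  | h1 :: (h2 :: _ as t) -> (h2 -. h1) :: differences t
  | [] | [_] -> []
```

In contrast to downsample, which consumes two elements per step, this function advances by one element at a time, so a list of n elements yields n - 1 differences.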
Note that, in the context of lists, the iter and map functions can be expressed succinctly in
terms of fold_left:
let iter f l = List.fold_left (fun accu e -> f e) () l
let map f l = List.rev (List.fold_left (fun accu e -> f e :: accu) [] l)
(fold_left accumulates the results in reverse order, hence the call to List.rev) and that all of
these functions can be expressed, albeit more verbosely, using pattern matching.
The iter function simply applies the given function f to the head h of the list and then recurses
to iterate over the elements in the tail t:
let rec iter f l = match l with
    h :: t -> f h; iter f t
  | [] -> ()
The map function applies the given function f to the head h of the list, prepending the result
f h onto the result map f t of recursively mapping over the elements in the tail t:
let rec map f l = match l with
    h :: t -> f h :: map f t
  | [] -> []
The fold_left function applies the given function f to the current accumulator accu and
the head h of the list, passing the result as the accumulator for folding over the remaining
elements in the tail t of the list:
let rec fold_left f accu l = match l with
    h :: t -> fold_left f (f accu h) t
  | [] -> accu
The fold_right function applies the given function f to the head h of the list and the result
of recursively folding over the remaining elements in the tail t of the list:
let rec fold_right f l accu = match l with
    h :: t -> f h (fold_right f t accu)
  | [] -> accu
Thus, the map and fold functions can be thought of as higher-order functions which have
been factored out of many algorithms. In the context of scientific programming, factoring
out higher-order functions can greatly increase clarity, often providing new insights into the
algorithms themselves. In chapter 9, we shall pursue this, developing several functions which
supplement those in the core library.
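As a small illustration of factoring algorithms into folds, the following sketch expresses several common list operations using only fold_left and fold_right. The definitions are ours and shadow nothing in the List module; maximum is a hypothetical helper:

```ocaml
(* map via fold_right: results are consed on from the right, so the order
   of the input list is preserved. *)
let map f l = List.fold_right (fun h t -> f h :: t) l []

(* Length and sum as left folds over an accumulator. *)
let length l = List.fold_left (fun n _ -> n + 1) 0 l
let sum l = List.fold_left ( +. ) 0. l

(* Largest element of a non-empty list, seeding the fold with the head. *)
let maximum = function
  | [] -> invalid_arg "maximum"
  | h :: t -> List.fold_left max h t
```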
As the algorithms provided over arrays refer to those using lists, the two functions for converting
between lists and arrays are both in the Array module. The Array.of_list function
creates an array from the given list and the Array.to_list function creates a list from the
given array.
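For example, a list can be converted into an array, manipulated in-place and converted back. The following sketch sorts a list by way of an array; the function name sorted_via_array is hypothetical and the approach is shown only to demonstrate the two conversion functions (List.sort would normally be used directly):

```ocaml
(* Convert a list to an array, sort the array in-place and convert back. *)
let sorted_via_array l =
  let a = Array.of_list l in
  Array.sort compare a;
  Array.to_list a
```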
Having examined the two simplest containers provided by the OCaml language itself, we shall
now examine some more sophisticated containers which are provided in the core library.
3.4 Sets
In the context of data structures, the term "set" typically means a sorted, unique, associative
container. Sets are "sorted" containers because the elements in a set are stored in order
according to a given comparison function. Sets are "unique" containers because they do not
duplicate elements (adding an existing element to a set results in the same set). Sets are
"associative" containers because elements determine how they are stored (using the specified
comparison function).
The OCaml core library provides sets which are implemented as balanced binary trees³. This
allows a single element to be added to or removed from a set containing n elements in O(ln n)
time. Moreover, the OCaml implementation also provides functions union, inter and diff
for performing the set-theoretic operations union, intersection and difference, respectively.
In order to implement the set-theoretic operations between sets efficiently, the sets used must
be based upon the same comparison function. The set implementation in the OCaml core
library enforces this requirement using a construct called a functor. Whereas functions map
values to values, functors map modules to modules.
The Set.Make functor transforms a simple module, which must implement the element type
t and a total-ordering function compare for comparing pairs of elements, into a complicated
module which implements a set of elements of type t using the comparison function compare.
For example, elements in a set of integers may be represented by the following module which
we choose to call Key:
# module Key =
    struct
      type t = int
      let compare (i : int) (j : int) = if i < j then -1 else if i = j then 0 else 1
    end;;
module Key : sig type t = int val compare : int -> int -> int end
The type t is used to specify the type of an element in the set. The compare function provides
a total ordering over the elements of the set. This function returns an integer value which
must be less than zero, zero or greater than zero when the first element of the given pair compares
as less than, equal to or greater than the second, respectively. In this case, we have chosen to specify a
comparison function specific to values of type int (the type int -> int -> int of the compare
function follows from the int annotations on its arguments; without them, the polymorphic
< and = operators would give the function the type 'a -> 'a -> int). In general, the slower
but simpler and polymorphic comparison function compare, implemented in the Pervasives
module, can be used to compare pairs of values of various types. The polymorphic comparison
function compare is equivalent to:
# let compare i j = if i < j then -1 else if i = j then 0 else 1;;
val compare : 'a -> 'a -> int = <fun>
and could have been used in the Key module by specifying:
let compare = compare
A module IntSet, representing a set of integers, may then be created by applying the Set.Make
functor to our Key module in a single line of code, producing a module implementing a substantial
number of functions:
³ Balanced binary trees will be discussed in more detail later in this chapter.
# module IntSet = Set.Make(Key);;
module IntSet :
  sig
    type elt = Key.t
    type t = Set.Make(Key).t
    val empty : t
    val is_empty : t -> bool
    val mem : elt -> t -> bool
    val add : elt -> t -> t
    val singleton : elt -> t
    val remove : elt -> t -> t
    val union : t -> t -> t
    val inter : t -> t -> t
    val diff : t -> t -> t
    val compare : t -> t -> int
    val equal : t -> t -> bool
    val subset : t -> t -> bool
    val iter : (elt -> unit) -> t -> unit
    val fold : (elt -> 'a -> 'a) -> t -> 'a -> 'a
    val for_all : (elt -> bool) -> t -> bool
    val exists : (elt -> bool) -> t -> bool
    val filter : (elt -> bool) -> t -> t
    val partition : (elt -> bool) -> t -> t * t
    val cardinal : t -> int
    val elements : t -> elt list
    val min_elt : t -> elt
    val max_elt : t -> elt
    val choose : t -> elt
    val split : elt -> t -> t * bool * t
  end
The IntSet module can be used to manipulate sets of integers. We shall only make use of
some of the functions provided. Naturally, all of the functions are documented in the OCaml
manual [2] and in the Set.Make functor of the core library itself, which can be read using
ocamlbrowser.
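The role of the comparison function can be illustrated by applying the same functor to a module with a different ordering. In this sketch, the module names Descending and DescSet are ours; swapping the arguments of the built-in comparison yields a set which stores its elements in descending order:

```ocaml
module Descending = struct
  type t = int
  (* Swap the arguments of the built-in comparison to reverse the order. *)
  let compare (i : int) (j : int) = compare j i
end

module DescSet = Set.Make (Descending)

(* Build a set from a list containing a duplicate. *)
let s = List.fold_right DescSet.add [1; 5; 3; 5] DescSet.empty

(* Elements are returned in the order given by Descending.compare. *)
let elements = DescSet.elements s

(* The "minimum" is the first element according to that order. *)
let smallest = DescSet.min_elt s
```

Note that DescSet.t and IntSet.t are distinct types, so sets built with different comparison functions cannot be mixed accidentally.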
The type IntSet.t represents a set of elements of the type IntSet.elt (which is the same as
the specified type Key.t, i.e. int).
The set of integers containing no elements can be obtained as:
# IntSet.empty;;
- : IntSet.t = <abstr>
In order to demonstrate the use of sets, we shall define some helper functions to add a list of
integers to a given set and to convert between lists and sets of integers. A function to add a
list l of elements to an existing set s is most easily obtained by folding the add function of
the IntSet module over the list:
# let add_list l s = List.fold_right IntSet.add l s;;
val add_list : IntSet.elt list -> IntSet.t -> IntSet.t = <fun>
A function to create a set from a given list l can then be written in terms of the add_list
function by adding the list of elements to the empty set:
# let of_list l = add_list l IntSet.empty;;
val of_list : IntSet.elt list -> IntSet.t = <fun>
The elements of a set may be extracted using the elements function.
A set containing a single element is called a singleton set and can be created using the
IntSet.singleton function:
# let s = IntSet.singleton 3;;
val s : IntSet.t = <abstr>
# IntSet.elements s;;
- : IntSet.elt list = [3]
As sets are implemented in a functional style, adding an element to a set returns the resulting
set:
# let s = IntSet.add 5 s;;
val s : IntSet.t = <abstr>
# IntSet.elements s;;
- : IntSet.elt list = [3; 5]
By adding some more integers to our set we can check that duplicates are removed:
# let s = add_list [10; 1; 9; 2; 8; 4; 7; 4; 6; 7; 7] s;;
val s : IntSet.t = <abstr>
The number of elements in a set, known as the cardinality of the set, is given by:
# IntSet.cardinal s;;
- : int = 10
This indicates that the duplicates have been removed, leaving only the integers $\{1 \ldots 10\}$. In
order to check this we can convert the set back into a list:
# IntSet.elements s;;
- : IntSet.elt list = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]
Note that the elements in the set were provided in the order prescribed
by our Key.compare function.
We can also demonstrate the set-theoretic union, intersection and difference operations. For
example:
$\{1, 3, 5\} \cup \{3, 5, 7\} = \{1, 3, 5, 7\}$
# IntSet.elements (IntSet.union (of_list [1; 3; 5]) (of_list [3; 5; 7]));;
- : int list = [1; 3; 5; 7]
$\{1, 3, 5\} \cap \{3, 5, 7\} = \{3, 5\}$
# IntSet.elements (IntSet.inter (of_list [1; 3; 5]) (of_list [3; 5; 7]));;
- : int list = [3; 5]
$\{1, 3, 5\} \setminus \{3, 5, 7\} = \{1\}$
# IntSet.elements (IntSet.diff (of_list [1; 3; 5]) (of_list [3; 5; 7]));;
- : int list = [1]
The subset function tests if $A \subset B$. For example, $\{4, 5, 6\} \subset \{1 \ldots 10\}$:
# IntSet.subset (of_list [4; 5; 6]) s;;
- : bool = true
The split function divides a set into those elements less than a given element and those
elements greater than the given element, as well as providing a boolean indicating whether
the given element was present. For example, splitting s at 5 results in the sets $\{1 \ldots 4\}$ and
$\{6 \ldots 10\}$ and the boolean true because $5 \in s$:
# let (l, e, u) = IntSet.split 5 s in
  (IntSet.elements l, e, IntSet.elements u);;
- : IntSet.elt list * bool * IntSet.elt list =
([1; 2; 3; 4], true, [6; 7; 8; 9; 10])
Applying the built-in polymorphic comparison functions (<, <=, =, >=, >, <> and compare) to
many data structures of abstract types, such as sets, will produce unpredictable behaviour.
The reason is simply that these functions perform structural comparison of the data structures
and, in many cases, different structures will be used internally to convey the same semantic
meaning. For example, the internal structure of a value of type IntSet. t depends upon the
order in which the elements were inserted and, consequently, sets containing the same contents
may have different internal structure and, therefore, may compare as unequal when using the
polymorphic comparison functions despite being semantically equivalent:
# (of_list [1; 2; 3; 4; 5]) = (of_list [5; 4; 3; 2; 1]);;
- : bool = false
Sets can be compared correctly (semantically) using the compare function provided by the
Set.Make functor. In this case:
# IntSet.compare (of_list [1; 2; 3; 4; 5]) (of_list [5; 4; 3; 2; 1]);;
- : int = 0
In the context of scientific computing, set data structures are useful for a variety of reasons.
The set-theoretic operations union, intersection and difference are considerably faster when
performed between set data structures than when performed between unsorted arrays and lists.
Inserting new values such that ordering is preserved is also much faster for a set data structure
than for arrays and lists. Consequently, whenever a task requires the use of a sorted container,
the set data structure will, most likely, be more efficient than array- or list-based alternatives.
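As a sketch of using a set as a sorted container, the following self-contained example removes duplicates from a list and tests whether two lists hold the same elements regardless of order. The helper names unique_sorted and same_contents are hypothetical, and the anonymous key module is equivalent to the Key module above:

```ocaml
module IntSet = Set.Make (struct
  type t = int
  let compare (i : int) (j : int) = compare i j
end)

(* Build a set from a list, as in the text. *)
let of_list l = List.fold_right IntSet.add l IntSet.empty

(* Sort a list and remove duplicates by passing it through a set. *)
let unique_sorted l = IntSet.elements (of_list l)

(* Semantic comparison of contents, independent of insertion order. *)
let same_contents a b = IntSet.equal (of_list a) (of_list b)
```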
In chapter 10, we shall use a set data structure to compute the set of nth-nearest neighbours
in a graph and apply this to atomic-neighbour computations on a simulated molecule. In the
mean time, we have more data structures to discover.
3.5 Hash tables
A hash table is an associative container mapping keys to corresponding values. We shall refer
to the key-value pairs stored in a hash table as the elements of the hash table.
In terms of utility, hash tables are an efficient way to implement a mapping from one kind
of value to another, for example, to map strings onto functions. In order to provide their
functionality, hash tables provide add and remove functions to insert and delete mappings,
respectively, and a find function which looks up and returns the value corresponding to a
given key.
Internally, hash tables compute an integer value, known as a hash, from each given key. This
hash of a key is used as an index into an array in order to find the value corresponding to the
key. The hash is computed from the key such that two identical keys share the same hash and
two different keys are likely (but not guaranteed) to produce different hashes. Moreover, hash
computation is restricted to $\Theta(1)$ time complexity, typically by terminating if a maximum
number of computations is reached. Assuming that no two keys in a hash table produce the
same hash, finding the value corresponding to a given key takes $\Theta(1)$ time⁴.
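The principle can be sketched with a toy table of fixed size. This is purely illustrative and is not the implementation used by the core library: the names n_buckets, buckets, index, add and find are ours, and keys whose hashes collide are simply kept together in a short association list:

```ocaml
(* A fixed number of buckets, each holding the key-value pairs whose keys
   hash to that bucket. *)
let n_buckets = 8
let buckets : (string * float) list array = Array.make n_buckets []

(* Reduce the (non-negative) hash of a key to a valid array index. *)
let index key = Hashtbl.hash key mod n_buckets

(* Insert a mapping at the front of the appropriate bucket. *)
let add key value =
  let i = index key in
  buckets.(i) <- (key, value) :: buckets.(i)

(* Look up a key by searching only its own bucket; raises Not_found if
   the key is absent. *)
let find key = List.assoc key buckets.(index key)

let () = add "Carbon" 12.011; add "Oxygen" 15.9994
```

A real hash table also grows its array as mappings are added, so that the buckets remain short.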
The OCaml core library contains an imperative implementation of hash tables in the Hashtbl
module. Hash tables may be used in two different ways. The simplest approach, which we shall
examine here, simply uses polymorphic hash tables generated by the Hashtbl.create function.
A more sophisticated approach involves using the Hashtbl.Make functor to generate a
module implementing a hash table which uses customised equality and hashing functions. The
latter approach is required if the built-in polymorphic equality (=) and hash (Hashtbl.hash)
functions are not applicable. These built-in functions are not applicable to any types of data
structures for which different internal structures can be equal, e.g. two different balanced binary
trees inside two equivalent sets should compare to be equal but the built-in equality
function (=) will indicate that they are not equal because they do not have the same structure
(as discussed in section 3.4).
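As a sketch of the functor-based approach, the following example creates a hash table whose string keys are compared without regard to case. The module names CaseInsensitive and Table are ours, and String.lowercase_ascii is assumed to be available (it is in recent versions of OCaml; older versions provide String.lowercase instead):

```ocaml
module CaseInsensitive = struct
  type t = string
  (* Two keys are equal if they match once converted to lower case. *)
  let equal a b = String.lowercase_ascii a = String.lowercase_ascii b
  (* Equal keys must produce equal hashes, so hash the lower-case form. *)
  let hash s = Hashtbl.hash (String.lowercase_ascii s)
end

module Table = Hashtbl.Make (CaseInsensitive)

let weights : float Table.t = Table.create 5

let () =
  Table.add weights "Carbon" 12.011;
  Table.add weights "Oxygen" 15.9994

(* The key is found despite the difference in case. *)
let carbon = Table.find weights "CARBON"
```

The essential requirement is that the equal and hash functions agree: any two keys deemed equal must hash to the same value.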
For example, a hash table mapping strings to floating-point numbers may be constructed by
first creating a monomorphic hash table over, as yet, unknown types (denoted '_a and '_b in
OCaml):
# let m = Hashtbl.create 5;;
val m : ('_a, '_b) Hashtbl.t = <abstr>
The integer passed to the Hashtbl.create function is intended to indicate the number of keys
likely to be in the hash table.
Mappings may be added to the hash table m using the Hashtbl.add function:
# Hashtbl.add m "Hydrogen" 1.0079;
  Hashtbl.add m "Carbon" 12.011;
  Hashtbl.add m "Nitrogen" 14.00674;
  Hashtbl.add m "Oxygen" 15.9994;
  Hashtbl.add m "Sulphur" 32.06;;
- : unit = ()
⁴ Computing the hash in $\Theta(1)$ time and then using it to access an array element, also in $\Theta(1)$ time.
Note that, as an imperative data structure, adding elements alters the hash table in-place
and, therefore, need not return the hash table.
The resulting hash table m, of type (string, float) Hashtbl.t, represents the following
mapping from strings to floating-point values:
Hydrogen → 1.0079
Carbon → 12.011
Nitrogen → 14.00674
Oxygen → 15.9994
Sulphur → 32.06
Having been filled at run-time, the hash table may be used to look up the values corresponding
to given keys. For example, we can find the average atomic weight of carbon:
# Hashtbl.find m "Carbon";;
- : float = 12.011
If necessary, we can also delete mappings from the hash table, such as the mapping for Oxygen:
# Hashtbl.remove m "Oxygen";;
- : unit = ()
The remaining mappings in the hash table are most easily printed using the iter function in
the Hashtbl module:
# let aux spec weight =
    print_endline (spec ^ " -> " ^ string_of_float weight) in
  Hashtbl.iter aux m;;
Carbon -> 12.011
Nitrogen -> 14.00674
Sulphur -> 32.06
Hydrogen -> 1.0079
- : unit = ()
Note that the order in which the mappings are supplied by Hashtbl.iter (and Hashtbl.fold)
is effectively random. In fact, the order is related to the hash function. Also, hashing is
another form of structural comparison and, like the polymorphic comparison functions, should
not be applied to many abstract types. For example, a hash table cannot be used with sets
as keys as the hash of a set depends upon its internal structure and, therefore, semantically
equivalent sets are likely to produce different hashes.
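When a reproducible order is required, the mappings can first be gathered into a list and then sorted. The following sketch does so using Hashtbl.fold and List.sort; the name sorted_bindings is ours, and the table is rebuilt here only so that the example is self-contained:

```ocaml
let m : (string, float) Hashtbl.t = Hashtbl.create 5

let () =
  List.iter (fun (k, v) -> Hashtbl.add m k v)
    [ "Hydrogen", 1.0079; "Carbon", 12.011;
      "Nitrogen", 14.00674; "Sulphur", 32.06 ]

(* Collect all key-value pairs, in whatever order the table supplies them,
   and then sort the pairs so that the keys are in alphabetical order. *)
let sorted_bindings =
  List.sort compare (Hashtbl.fold (fun k v acc -> (k, v) :: acc) m [])
```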
Hash tables can clearly be useful in the context of scientific programming. However, a func-
tional alternative to these imperative hash tables can sometimes be desirable. We shall now
examine a functional data structure which uses different techniques to implement the same
functionality of mapping keys to corresponding values.
[Plot not reproduced: horizontal axis n, with one curve for Map and one for Hash Table.]
Figure 3.9: Measured performance (time t in seconds) for inserting key-value pairs into
hash tables and functional maps containing $n - 1$ elements. Although the hash table
implementation results in better average-case performance, the $O(n)$ time-complexity
incurred when $n = 2^p - 1 \;\forall\; p > 0 \in \mathbb{Z}$ produces much slower worst-case performance by
the hash table.
3.6 Maps
We described the functional implementation of the set data structure provided by OCaml in
section 3.4. The core OCaml library provides a similar data structure known simply as a
map⁵.
Much like a hash table, the map data structure associates keys with corresponding values.
Consequently, the map data structure also provides add and remove functions to insert and
delete mappings, respectively, and a find function which returns the value corresponding to
a given key.
Unlike hash tables, maps are represented internally by a balanced binary tree, rather than an
array, and maps differentiate between keys using a specified total ordering function, rather
than a hash function.
Due to their design differences, maps have the following advantages over hash tables:
• Functional. Programs using maps are easier to reason about as maps cannot be mutated.
• Persistent. Old versions of maps may be kept and used. Thanks to the functional
  programming style, data is shared automatically between versions.
• Stable $O(\ln n)$ time-complexity for inserting and removing a mapping, compared to
  unstable, amortized $\Theta(1)$ time-complexity in the case of hash tables (which may take
  up to $O(n)$ for some insertions, as illustrated in figure 3.9).
• Customised comparison and hashing functions are required for non-trivial types of key,
  and comparison functions are often simpler to obtain or define than custom hashing
  functions.
• The map data structure can be iterated over using iter, map, mapi and fold functions,
  all of which present mappings ordered by their keys according to the total-order function
  of the map.
⁵ Not to be confused with the higher-order map function provided with many data structures.
However, maps also have the following disadvantages compared to hash tables:
• Logarithmic $O(\ln n)$ time-complexity for finding a mapping, compared to $O(1)$
  time-complexity⁶ in the case of hash tables (see figure 3.9).
• Maps require a total ordering over keys to be specified as a function.
In the same way that the Set module contains the Set.Make functor, so the Map module
contains a Map.Make functor. Also analogously, this functor transforms a module implementing
the key type, and a comparison function giving a total ordering over keys, into a module
implementing a map data structure with keys of this type and polymorphic corresponding
values.
We shall now demonstrate the functionality of the map data structure reusing the example
of mapping strings to floating-point values. In this case, the type of a key is string and,
therefore, we must begin by implementing a Key module providing this type and a total
ordering over this type. This may be written as follows, making use of the String.compare
function to provide a total ordering over strings:
# module Key =
    struct
      type t = string
      let compare = String.compare
    end;;
module Key :
  sig type t = string val compare : String.t -> String.t -> int end
A mapping with keys of type Key.t, i.e. string, may be created from the Key module using
the Map.Make functor:
# module Weights = Map.Make(Key);;
module Weights :
  sig
    type key = Key.t
    type 'a t = 'a Map.Make(Key).t
    val empty : 'a t
    val is_empty : 'a t -> bool
    val add : key -> 'a -> 'a t -> 'a t
    val find : key -> 'a t -> 'a
    val remove : key -> 'a t -> 'a t
    val mem : key -> 'a t -> bool
    val iter : (key -> 'a -> unit) -> 'a t -> unit
    val map : ('a -> 'b) -> 'a t -> 'b t
    val mapi : (key -> 'a -> 'b) -> 'a t -> 'b t
    val fold : (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b
    val compare : ('a -> 'a -> int) -> 'a t -> 'a t -> int
    val equal : ('a -> 'a -> bool) -> 'a t -> 'a t -> bool
  end
⁶ Assuming no two keys in the hash table share the same hash.
A map data structure containing no mappings is then represented by the value:
# let m = Weights.empty;;
val m : 'a Weights.t = <abstr>
Note that this value is polymorphic, representing a mapping from keys of type string to
corresponding values of any type. This value can then be used to construct mappings from
strings to any concrete type, such as float in this example.
Mappings may be added to m using the Weights.add function. As a functional data structure,
adding elements to a map returns a map containing both the new and old mappings. Hence
we repeatedly supersede the old data structure m with a new data structure m:
# let m =
    let m = Weights.add "Hydrogen" 1.0079 m in
    let m = Weights.add "Carbon" 12.011 m in
    let m = Weights.add "Nitrogen" 14.00674 m in
    let m = Weights.add "Oxygen" 15.9994 m in
    Weights.add "Sulphur" 32.06 m;;
val m : float Weights.t = <abstr>
Note that specifying mappings to floating-point values has caused the type attributed to m to
change from a mapping of type 'a Weights.t to a mapping of type float Weights.t, i.e. a
mapping to values of type float.
In fact, there is some subtlety involved here. As we saw in the previous section, newly created
hash tables are monomorphic, their types ossified upon first use:
# let h = Hashtbl.create 1;;
val h : ('_a, '_b) Hashtbl.t = <abstr>
# Hashtbl.add h 1 2.;;
- : unit = ()
# h;;
- : (int, float) Hashtbl.t = <abstr>
In contrast, an empty map is polymorphic and the same empty map can be used to create
separate lineages with different types:
# let parent = Weights.empty;;
val parent : 'a Weights.t = <abstr>
# let child1 = Weights.add "A string" 3 parent
  and child2 = Weights.add "A string" 3. parent;;
val child1 : int Weights.t = <abstr>
val child2 : float Weights.t = <abstr>
The differences between monomorphic and polymorphic types will be discussed in more detail
in section A.5.
We can use the map m to find the average atomic weight of carbon:
# Weights.find "Carbon" m;;
- : float = 12.011
Functions | $\alpha$ array | $\alpha$ list | $\alpha$ set | $(\alpha, \beta)$ hash table | $(\alpha, \beta)$ map
----------+----------------+---------------+--------------+------------------------------+----------------------
Create n  | init           | -             | -            | -                            | -
Insert    | -              | -             | add          | replace                      | add
Find      | -              | find          | mem          | find                         | find
Remove    | -              | remove_assoc  | remove       | remove                       | remove
Sort      | sort           | sort          | -            | N/A                          | -
Mapping   | get            | nth           | N/A          | find                         | find
Table 3.2: Functions implementing common operations over data structures. In the case
of set and map data structures, the functions are implemented in the module created by
the Set.Make or Map.Make functors.
Deleting mappings from the functional map data structure produces a new data structure
containing most of the old data structure:
# Weights.remove "Oxygen" m;;
- : float Weights.t = <abstr>
The remaining mappings are most easily printed using the Weights.iter function:
# let aux spec weight =
    print_endline (spec ^ " -> " ^ string_of_float weight) in
  Weights.iter aux m;;
Carbon -> 12.011
Hydrogen -> 1.0079
Nitrogen -> 14.00674
Oxygen -> 15.9994
Sulphur -> 32.06
- : unit = ()
Note that m still contains the entry for oxygen as we ignored the result of removing this entry,
thus leaving m intact.
The ability to evolve the contents of data structures along many different lineages during
the execution of a program can be very useful. This is clearly much easier when the data
structure provides a functional, rather than an imperative, interface. In an imperative style,
such approaches would most likely involve the inefficiency of explicitly duplicating the data
structure (e.g. using the Hashtbl.copy function) at each fork in its evolution. In contrast,
functional data structures provide this functionality naturally and, in particular, will share
data between lineages.
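The following sketch illustrates two lineages evolving from a common ancestor. For brevity, and as an assumption of this example, the functor is applied directly to the String module (which provides the required type t and compare function) instead of a separate Key module; the names base, without_oxygen, with_sulphur and keys are ours:

```ocaml
module Weights = Map.Make (String)

(* A common ancestor containing two mappings. *)
let base =
  Weights.add "Carbon" 12.011 (Weights.add "Oxygen" 15.9994 Weights.empty)

(* Two independent descendants; neither operation alters base. *)
let without_oxygen = Weights.remove "Oxygen" base
let with_sulphur = Weights.add "Sulphur" 32.06 base

(* List the keys of a map in increasing order. *)
let keys m = List.rev (Weights.fold (fun k _ acc -> k :: acc) m [])
```

After these definitions, all three maps remain valid and may be used side by side, with no explicit copying required.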
Having examined the data structures provided with OCaml, we shall now summarise the
relative advantages and disadvantages of these data structures using the notion of algorithmic
complexity developed in section 3.1.
3.7 Summary
As we have seen, the complexity of operations over data structures can be instrumental in
choosing the appropriate data structure for a task. Consequently, it is beneficial to compare
the asymptotic complexities of common operations over various data structures. Table 3.2
shows the functions provided by the core library to perform common operations. Table 3.3
gives the asymptotic complexities of the algorithms used by these functions, for data structures
containing n elements.
Complexities | $\alpha$ array | $\alpha$ list | $\alpha$ set | $(\alpha, \beta)$ hash table | $(\alpha, \beta)$ map
-------------+----------------+---------------+--------------+------------------------------+----------------------
Create n     | $\Theta(n)$ | $\Theta(n)$† | $O(n \ln n)$† | $O(n \ln n)$† | $O(n \ln n)$†
Insert       | $\Theta(n)$† | $i$th: $\Theta(i)$† | $O(\ln n)$ | $\Theta(1)$†† | $O(\ln n)$
Find         | $O(n)$† | $O(n)$ | $O(\ln n)$ | $\Theta(1)$†† | $O(\ln n)$
Remove       | $\Theta(n)$† | $i$th: $\Theta(i)$ | $O(\ln n)$ | $\Theta(1)$ | $O(\ln n)$
Sort         | $O(n \ln n)$ | $O(n \ln n)$ | $\Theta(1)$† | N/A | $\Theta(1)$†
Mapping      | $\mathbb{I} \to \alpha$: $\Theta(1)$ | $\mathbb{I} \to \alpha$: $\Theta(i)$ | N/A | $\alpha \to \beta$: $O(1)$ | $\alpha \to \beta$: $\Theta(\ln n)$
Table 3.3: Asymptotic algorithmic complexities of operations over different data structures.
The set $\mathbb{I}$ denotes valid indices $i \in \{0 \ldots n - 1\}$ of an array or list containing n
elements.
† not provided with the core OCaml distribution.
†† amortized complexity.
Having examined the various containers built-in to OCaml, we shall now examine a generali-
sation of these containers which is easily handled in OCaml before considering the creation of
new data structures.
3.8 Heterogeneous containers
The array, list, set, hash table and map containers are all homogeneous containers, i.e. for
any such given container, the elements must all be of the same type. Containers of elements
which may be of one of several types can also be useful. These are known as heterogeneous
containers.
Heterogeneous containers can be defined by first creating a variant type which unifies the
types allowed in the container. For example, values of this variant type number may contain
data representing elements from the sets $\mathbb{Z}$, $\mathbb{R}$ or $\mathbb{C}$:
# type number = Integer of int | Real of float | Complex of float * float;;
type number = Integer of int | Real of float | Complex of float * float
A homogeneous container over this unified type may then be used to implement a heteroge-
neous container. The elements of a container of type number list can contain Integer, Real
or Complex values:
# let nums = [Integer 1; Real 2.; Complex (3., 4.)];;
val nums : number list = [Integer 1; Real 2.; Complex (3., 4.)]
Let us consider a simple function to act upon the effectively heterogeneous container type
number list. A function to convert values of type number to the built-in type Complex.t
may be written:
# let complex_of_number = function
    Integer i -> { Complex.re = float_of_int i; im = 0. }
  | Real x -> { Complex.re = x; im = 0. }
  | Complex (re, im) -> { Complex.re = re; im = im };;
val complex_of_number : number -> Complex.t = <fun>
For example, mapping the complex_of_number function over the number list called nums
gives a Complex.t list:
# List.map complex_of_number nums;;
- : Complex.t list =
[{Complex.re = 1.; Complex.im = 0.}; {Complex.re = 2.; Complex.im = 0.};
 {Complex.re = 3.; Complex.im = 4.}]
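Once the elements have been unified in this way, any of the usual list functions can be applied to the container. As a minimal sketch, the following folds over a number list to compute its sum as a complex number. The name sum_numbers is our own and is not part of the core library; Complex.add and Complex.zero are provided by the Complex module.

```ocaml
(* Unified element type, as defined in the text. *)
type number = Integer of int | Real of float | Complex of float * float

(* Convert any number to the standard library's Complex.t. *)
let complex_of_number = function
  | Integer i -> { Complex.re = float_of_int i; im = 0. }
  | Real x -> { Complex.re = x; im = 0. }
  | Complex (re, im) -> { Complex.re = re; im }

(* sum_numbers is a hypothetical helper, not from the text: it folds
   Complex.add over the list, starting from Complex.zero. *)
let sum_numbers nums =
  List.fold_left
    (fun acc x -> Complex.add acc (complex_of_number x))
    Complex.zero nums

let total = sum_numbers [Integer 1; Real 2.; Complex (3., 4.)]
(* total.re is 1 + 2 + 3 and total.im is 4 *)
```

Converting every element to the most general representation before combining them keeps the fold itself free of case analysis.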
The list, set, hash table and map data structures can clearly be useful in a scientific context, in
addition to conventional arrays and the heterogeneous counterparts of these data structures.
However, a major advantage of OCaml lies in its ability to create and manipulate new, custom-made
data structures. We shall now examine this aspect of scientific programming in OCaml.
3.9 Trees
In addition to the built-in data structures, the ease with which the OCaml language allows
tuples, records and variant types to be handled makes it an ideal language for creating and
using new data structures. Trees are the most common such data structure.
A tree is a self-similar data structure used to store data hierarchically. The origin of a tree is,
therefore, itself a tree, known as the root node. As a self-similar, or recursive, data structure,
every node in a tree may contain further trees. A node which contains no further trees marks
the end of a lineage in the tree and is known as a leaf node.
The simplest form of tree is a recursive data structure containing an arbitrarily-long list of
trees. This may be represented in OCaml by the type:
# type tree = Node of tree list;;
type tree = Node of tree list
A balanced binary tree of depth d is represented by an empty node for d = 0 and a node
containing two balanced binary trees, each of depth d − 1, for d > 0. This simple recurrence
relation is most easily implemented as a purely functional, recursive function:
# let rec balanced_tree = function
    0 -> Node []
  | n -> Node [balanced_tree (n-1); balanced_tree (n-1)];;
val balanced_tree : int -> tree = <fun>
The tree depicted in figure 3.10 may then be constructed using:
Figure 3.10: A perfectly-balanced binary tree of depth x = 3 containing $2^{x+1} - 1 = 15$
nodes, including the root node and $2^x = 8$ leaf nodes.
# let example = balanced_tree 3;;
val example : tree =
Node
[Node [Node [Node []; Node []]; Node [Node []; Node []]];
Node [Node [Node []; Node []]; Node [Node []; Node []]]]
We shall use this example tree to demonstrate more sophisticated computations over trees.
Functions over the type tree are easily written. For example, the following function counts
the number of leaf nodes:
# let rec leaf_count = function
    Node [] -> 1
  | Node l -> List.fold_left (fun s t -> s + leaf_count t) 0 l;;
val leaf_count : tree -> int = <fun>
# leaf_count example;;
- : int = 8
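Other structural properties can be computed in the same recursive style. The following sketch counts all of the nodes in a tree and measures its depth; the names node_count and depth are our own. For the example tree of depth 3 we expect 15 nodes, in agreement with figure 3.10.

```ocaml
type tree = Node of tree list

let rec balanced_tree = function
  | 0 -> Node []
  | n -> Node [balanced_tree (n - 1); balanced_tree (n - 1)]

(* node_count and depth are hypothetical companions to leaf_count. *)

(* Every node counts itself (the initial 1) plus all nodes beneath it. *)
let rec node_count (Node l) =
  List.fold_left (fun s t -> s + node_count t) 1 l

(* A leaf has depth 0; otherwise one more than the deepest child. *)
let rec depth (Node l) =
  List.fold_left (fun d t -> max d (1 + depth t)) 0 l

let example = balanced_tree 3
let nodes = node_count example   (* 2^(3+1) - 1 = 15 *)
let levels = depth example       (* 3 *)
```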
Trees represented by the type tree are of limited utility as they cannot contain additional
data in their nodes. An equivalent tree which allows arbitrary, polymorphic data to be placed
in each node may be represented by the type:
# type 'a ptree = PNode of 'a * 'a ptree list;;
type 'a ptree = PNode of 'a * 'a ptree list
As a trivial example, the following function traverses a value of type tree to create an equiv-
alent value of type ptree which contains a zero in each node:
# let rec boring_ptree_of_tree = function
    Node l -> PNode (0, List.map boring_ptree_of_tree l);;
val boring_ptree_of_tree : tree -> int ptree = <fun>
For example:
# boring_ptree_of_tree (Node [Node []; Node []]);;
- : int ptree = PNode (0, [PNode (0, []); PNode (0, [])])
As a slightly more interesting example, the following function converts a value of type tree
to a value of type ptree, storing unique integers in each node of the resulting tree:
Figure 3.11: The result of inserting an integer counter into each node of the tree depicted
in figure 3.10 using the counted_ptree_of_tree function.
# let counted_ptree_of_tree t =
    let rec aux n = function
        Node l ->
          let aux2 t (n, l) =
            let (n2, t) = aux (n+1) t in
            (n2, PNode (n, t) :: l) in
          List.fold_right aux2 l (n, []) in
    let (_, l) = aux 2 t in
    PNode (1, l);;
val counted_ptree_of_tree : tree -> int ptree = <fun>
This function marks the root node with the integer 1 and uses an auxiliary function aux to
cumulatively convert a list of values of type tree into a list of values of type ptree, storing
the result in a root PNode. The aux function folds an auxiliary function aux2 over each child
of the current node, accumulating the counter and a list of child trees.
Applying this function to our example tree produces a more interesting result (illustrated in
figure 3.11):
# counted_ptree_of_tree example;;
- : int ptree =
PNode (1,
 [PNode (9, [PNode (13, [PNode (15, []); PNode (14, [])]);
             PNode (10, [PNode (12, []); PNode (11, [])])]);
  PNode (2, [PNode (6, [PNode (8, []); PNode (7, [])]);
             PNode (3, [PNode (5, []); PNode (4, [])])])])
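Once data are stored in the nodes, it is often convenient to have a general-purpose traversal. The following is a minimal sketch of a fold over the ptree type; ptree_fold, labels and sum are hypothetical names of our own rather than library functions. The fold visits each node before its children, from left to right.

```ocaml
type 'a ptree = PNode of 'a * 'a ptree list

(* ptree_fold is a hypothetical helper: it combines the datum in every
   node with an accumulator, visiting each node before its children
   (pre-order, left to right). *)
let rec ptree_fold f acc (PNode (x, children)) =
  List.fold_left (ptree_fold f) (f acc x) children

(* Collect the node data in visiting order. *)
let labels t = List.rev (ptree_fold (fun acc x -> x :: acc) [] t)

(* Sum integer node data. *)
let sum t = ptree_fold (+) 0 t

let small = PNode (1, [PNode (3, [PNode (4, [])]); PNode (2, [])])
let order = labels small   (* [1; 3; 4; 2] *)
let total = sum small      (* 10 *)
```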
In practice, storing the maximum depth remaining in each branch of a tree can be useful when
writing functions to handle trees. Values of our generic tree type may be converted into the
ptree type, storing the integer depth in each node, using the following function:
# let rec depth_ptree_of_tree = function
    Node l ->
      let aux t (od, l) =
        let t = depth_ptree_of_tree t and depth_of (PNode (d, _)) = d in
        (max od (depth_of t), t :: l) in
      let (d, l) = List.fold_right aux l (-1, []) in
      PNode (d + 1, l);;
val depth_ptree_of_tree : tree -> int ptree = <fun>
This function uses an auxiliary function aux to convert a list of child trees of type tree into a
list of child trees of type ptree whilst accumulating the maximum depth in any child branch.
The result is used to construct a PNode with a maximum depth of that of its children plus
one, and a list of ptree children.
Applying this function to our example tree produces a rather uninteresting, symmetric set of
branch depths:
# let example = depth_ptree_of_tree example;;
val example : int ptree =
  PNode (3,
   [PNode (2,
     [PNode (1, [PNode (0, []); PNode (0, [])]);
      PNode (1, [PNode (0, []); PNode (0, [])])]);
    PNode (2,
     [PNode (1, [PNode (0, []); PNode (0, [])]);
      PNode (1, [PNode (0, []); PNode (0, [])])])])
Using a tree of varying depth provides a more interesting result. The following function creates
an unbalanced binary tree, effectively representing a list of increasingly deep balanced binary
trees in the left children:
# let unbalanced_tree n =
    let rec aux m =
      if m = n then Node [] else
      Node [balanced_tree m; aux (m+1)] in
    aux 0;;
val unbalanced_tree : int -> tree = <fun>
This can be used to create a wonky tree:
# let wonky = unbalanced_tree 3;;
val wonky : tree =
  Node
   [Node [];
    Node
     [Node [Node []; Node []];
      Node [Node [Node [Node []; Node []]; Node [Node []; Node []]]; Node []]]]
Converting this wonky tree into a tree containing the remaining depth in each node we obtain
a more interesting result (illustrated in figure 3.12):
# depth_ptree_of_tree wonky;;
- : int ptree =
PNode (5,
 [PNode (0, []);
  PNode (4,
   [PNode (1, [PNode (0, []); PNode (0, [])]);
    PNode (3,
     [PNode (2,
       [PNode (1, [PNode (0, []); PNode (0, [])]);
        PNode (1, [PNode (0, []); PNode (0, [])])]);
      PNode (0, [])])])])
Figure 3.12: An unbalanced binary tree with the remaining depth stored in every node.
In practice, the ability to express a tree in which each node may have an arbitrary number of
branches often turns out to be a hindrance rather than a benefit. Consequently, the number
of branches allowed at each node in a tree is typically restricted to one of two values:
zero branches for leaf nodes and
a constant number of branches for all other nodes.
Trees which allow only zero or two branches, known as binary trees, are particularly prolific
as they form the simplest class of such trees (footnote 7), simplifying the derivations of the
complexities of operations over this kind of tree.
Although binary trees could be represented using the tree or ptree data structures, this would
require the programmer to ensure that all functions acting upon these types produced node
lists containing either zero or two elements. In practice, this is likely to become a considerable
source of human error and, therefore, of programmer frustration. Fortunately, when writing
in OCaml, the type system can be used to enforce the use of a valid number of branches at
each node. Such automated checking not only removes the need for careful inspection by the
programmer but also removes the need to perform run-time checks of the data, improving
performance. A binary tree analogous to our tree data structure can be defined as:
# type bin_tree = Leaf | Node of bin_tree * bin_tree;;
type bin_tree = Leaf | Node of bin_tree * bin_tree
A binary tree analogous to our ptree data structure can be defined as:
# type 'a pbin_tree =
    Leaf of 'a | Node of 'a * 'a pbin_tree * 'a pbin_tree;;
type 'a pbin_tree = Leaf of 'a | Node of 'a * 'a pbin_tree * 'a pbin_tree
Values of type ptree which represent binary trees may be converted to this pbin_tree type
using the following function:
Footnote 7: If only zero or one "branches" are allowed at each node then the tree is actually a list (see section 3.3).
# let rec pbin_tree_of_ptree = function
    PNode (d, []) -> Leaf d
  | PNode (d, [l; r]) ->
      Node (d, pbin_tree_of_ptree l, pbin_tree_of_ptree r)
  | PNode (_, _) -> invalid_arg "pbin_tree_of_ptree";;
val pbin_tree_of_ptree : 'a ptree -> 'a pbin_tree = <fun>
For example, the arbitrary-branching-factor example tree may be converted into a binary tree
using the pbin_tree_of_ptree function:
# pbin_tree_of_ptree example;;
- : int pbin_tree =
Node (3, Node (2, Node (1, Leaf 0, Leaf 0), Node (1, Leaf 0, Leaf 0)),
Node (2, Node (1, Leaf 0, Leaf 0), Node (1, Leaf 0, Leaf 0)))
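The benefit of encoding the branching factor in the type can be seen by writing functions in the opposite direction. In the following sketch, ptree_of_pbin_tree and pbin_depth are hypothetical names of our own. Neither function needs a failure case, because a value of type pbin_tree cannot have an invalid number of children.

```ocaml
type 'a ptree = PNode of 'a * 'a ptree list
type 'a pbin_tree = Leaf of 'a | Node of 'a * 'a pbin_tree * 'a pbin_tree

(* ptree_of_pbin_tree is a hypothetical inverse of the conversion in the
   text. It cannot fail, because every pbin_tree node has exactly zero or
   two children by construction. *)
let rec ptree_of_pbin_tree = function
  | Leaf d -> PNode (d, [])
  | Node (d, l, r) -> PNode (d, [ptree_of_pbin_tree l; ptree_of_pbin_tree r])

(* Depth computed structurally; no run-time check of the child count. *)
let rec pbin_depth = function
  | Leaf _ -> 0
  | Node (_, l, r) -> 1 + max (pbin_depth l) (pbin_depth r)

let t = Node (1, Leaf 0, Node (1, Leaf 0, Leaf 0))
let d = pbin_depth t            (* 2 *)
let p = ptree_of_pbin_tree t
```

Compare this with pbin_tree_of_ptree, which must raise an exception when given a node with one child or with more than two children.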
Note that the 'a pbin_tree type, which allows arbitrary data of type 'a to be stored in all
nodes, could be usefully altered to an ('a, 'b) pbin_tree type which allows arbitrary data of
type 'a to be stored in leaf nodes and arbitrary data of type 'b to be stored in all other nodes:
# type ('a, 'b) pbin_tree =
    Leaf of 'a
  | Node of 'b * ('a, 'b) pbin_tree * ('a, 'b) pbin_tree;;
type ('a, 'b) pbin_tree =
    Leaf of 'a
  | Node of 'b * ('a, 'b) pbin_tree * ('a, 'b) pbin_tree
Having examined the fundamentals of tree-based data structures, we shall now examine the
two main categories of trees - balanced trees and unbalanced trees.
3.9.1 Balanced trees
Balanced trees, in particular balanced binary trees, are prolific in computer science literature.
As the simplest form of tree, binary trees simplify the derivation of algorithmic complexities.
These complexities often depend upon the depth of the tree. Consequently, in the quest
for efficient algorithms, data structures designed to maintain approximately uniform depth,
known as balanced trees, are used as the foundation for a wide variety of algorithms.
A balanced tree (as illustrated in figure 3.10) can be defined as a tree for which the difference
between the minimum and maximum depths tends to a finite value for any such tree containing
n nodes in the limit n → ∞ (footnote 8). Practically, this condition is often further restricted to be that
the difference between the minimum and maximum depths is no more than 2.
Balanced binary trees are prolific because they are very efficient for many useful operations.
The efficiency of these trees stems from their structure. In terms of the number of nodes
traversed, any node in a tree containing either n nodes or n leaf nodes may be reached by
O(ln n) traversals from the root.
Footnote 8: Although taking limits over integer-valued variables may seem dubious, the required proofs can, in fact,
be made rigorous.
Figure 3.13: An optimally unbalanced binary tree of depth x = 7 containing 2x + 1 = 15
nodes, including the root node and x + 1 = 8 leaf nodes.
For example, the set and map data structures provided in the OCaml core library both make
use of balanced binary trees internally. This allows them to provide single-element insertion,
removal and searching in O(ln n) time-complexity.
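As a brief illustration of these operations, the following sketch builds a set of integers using the Set.Make functor from the core library and then searches and removes an element. The module name IntSet and the helper fill are our own choices.

```ocaml
(* An integer set built with the standard library's Set.Make functor. *)
module IntSet = Set.Make (struct
  type t = int
  let compare = compare
end)

(* fill is a hypothetical helper: it inserts the integers 0 .. n-1 one
   at a time, each insertion taking O(ln n) time. *)
let rec fill n s = if n = 0 then s else fill (n - 1) (IntSet.add (n - 1) s)

let s = fill 1000 IntSet.empty
let found = IntSet.mem 500 s                       (* true *)
let gone = IntSet.mem 500 (IntSet.remove 500 s)    (* false *)
let size = IntSet.cardinal s                       (* 1000 *)
```

Note that IntSet.remove returns a new set and leaves s unchanged, so s still contains all 1000 elements afterwards.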
For detailed descriptions of balanced tree implementation, we refer the eager reader to the
relevant computer science literature [5]. However, although computer science exploits balanced
trees for the efficient asymptotic algorithmic complexities they provide for common operations,
which is underpinned by their balanced structure, the natural sciences can also benefit from
the use of unbalanced trees.
3.9.2 Unbalanced trees
Many forms of data commonly used in scientific computing can be usefully represented hier-
archically, in tree data structures. In particular, trees which store exact information in leaf
nodes and approximate information in non-leaf nodes can be of great utility when writing
algorithms designed to compute approximate quantities. In this section, we shall consider
the development of efficient functions required to simulate the dynamics of particle systems,
providing implementations for one-dimensional systems of gravitating particles. We begin by
describing a simple approach based upon the use of a flat data structure (an array of particles)
before progressing on to a vastly more efficient, hierarchical approach which makes use of
approximate methods and the representation of particle systems as unbalanced binary trees.
Finally, we shall discuss the generalisation of the unbalanced-tree-based approach to higher
dimensionalities and different problems.
In the context of a one-dimensional system of gravitating particles, the mass m ∈ ℝ (m > 0) and
position r ∈ ℝ¹ of a particle may be represented by the record:
# type particle = { m : float; r : float };;
type particle = { m : float; r : float; }
A function force2 to compute the gravitational force (up to a constant coefficient),
$F = \frac{m_1 m_2}{d\,|d|}$ with $d = r_2 - r_1$,
between two particles, p1 and p2, may then be written:
# let force2 p1 p2 =
    let d = p2.r -. p1.r in
    p1.m *. p2.m /. (d *. abs_float d);;
val force2 : particle -> particle -> float = <fun>
For example, the force on a particle p1 of mass m1 = 1 at position r1 = 0.1 due to a particle
p2 of mass m2 = 3 at position r2 = 0.8 is:
$F = \frac{1 \times 3}{(0.8 - 0.1)^2} = \frac{300}{49} \approx 6.12245$
# force2 { m = 1.; r = 0.1 } { m = 3.; r = 0.8 };;
- : float = 6.12244897959183554
The particle type and force2 function underpin both the array-based and tree-based ap-
proaches outlined in the remainder of this section.
The simplest approach to computing the force on one particle due to a collection of other
particles is to store the other particles as a particle array and simply loop through the array,
accumulating the result of applying the force2 function. This can be achieved using a fold:
# let array_force p ps =
    Array.fold_left (fun f p2 -> f +. force2 p p2) 0. ps;;
val array_force : particle -> particle array -> float = <fun>
This function can be demonstrated on randomised particles. A particle with random mass
m ∈ [0, 1) and position r ∈ [0, 1) can be created using the function:
# let random_particle _ = { m = Random.float 1.; r = Random.float 1. };;
val random_particle : 'a -> particle = <fun>
A random array of particles can then be created using the function:
# let random_array n = Array.init n random_particle;;
val random_array : int -> particle array = <fun>
The following function computes the force on a random particle due to a random array of $10^5$
particles, returning a 2-tuple of the time taken in seconds and the answer found (footnote 9):
Footnote 9: Timing functions such as Sys.time will be discussed in more detail in section 8.2.
# let origin = random_particle 0;;
val origin : particle = {m = 0.140791689359313688;
                         r = 0.582751366306423546}
# let sys = random_array 100000;;
val sys : particle array = ...
# let t = Sys.time () in
  let f = array_force origin sys in
  let t = Sys.time () -. t in
  (t, f);;
- : float * float = (0.91, 2178953383.57117701)
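The timing pattern used here can be factored into a reusable function. The following is a minimal sketch in which time_call and the example workload sum_inverse_squares are hypothetical names of our own. Sys.time returns the processor time, in seconds, used by the program since it started, so the difference between two calls gives the time consumed in between.

```ocaml
(* time_call is a hypothetical helper: it applies f to x and returns the
   CPU time used, in seconds, together with the result. *)
let time_call f x =
  let t0 = Sys.time () in
  let result = f x in
  (Sys.time () -. t0, result)

(* Example workload: sum the reciprocals of the squares up to n. *)
let sum_inverse_squares n =
  let s = ref 0. in
  for i = 1 to n do
    s := !s +. 1. /. (float_of_int i *. float_of_int i)
  done;
  !s

let (elapsed, answer) = time_call sum_inverse_squares 1_000_000
```

The measured time will vary from machine to machine and from run to run, so only the answer is reproducible.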
Computing the force on each particle in a system of particles is the most fundamental task
when simulating particle dynamics. Typically, the whole system is simulated in discrete time
steps, the force computed for each particle being used to calculate the velocity and acceleration
of the particle in the next time step. In a system of n particles, the array_force function
applies the force2 function exactly n − 1 times. Thus, using the array_force function to
compute the force on all n particles would require $\Theta(n^2)$ time-complexity. This quadratic
complexity forms the bottleneck of the whole simulation. Hence, the array_force function is
an ideal target for optimisation.
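To make the quadratic cost concrete, the following sketch computes the force on every particle in an array by summing over all of the other particles. The function all_forces is a hypothetical name of our own; it applies force2 a total of n(n − 1) times.

```ocaml
type particle = { m : float; r : float }

(* Pairwise force on p1 due to p2, as in the text. *)
let force2 p1 p2 =
  let d = p2.r -. p1.r in
  p1.m *. p2.m /. (d *. abs_float d)

(* all_forces is a hypothetical helper: for each particle i it sums the
   force due to every other particle j <> i, giving n(n-1) applications
   of force2 in total. *)
let all_forces ps =
  Array.mapi
    (fun i p ->
      let f = ref 0. in
      Array.iteri (fun j q -> if i <> j then f := !f +. force2 p q) ps;
      !f)
    ps

let fs = all_forces [| { m = 1.; r = 0.1 }; { m = 3.; r = 0.8 } |]
(* fs.(0) is the force on the first particle; fs.(1) is equal and opposite *)
```

Skipping the case i = j avoids a division by zero when a particle is compared with itself.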
In this case, the array-based function to compute the force on a particle took 0.82 seconds.
Applying this function to each of the $10^5$ particles would, therefore, be expected to take almost
a day. Thus, computing the update to the particle dynamics for a single time step is likely
to take about a day. This is highly undesirable. Moreover, there is no known approach to
computing the force on a particle which both improves upon the $\Theta(n^2)$ asymptotic complexity
and retains the apparent exactness of the simple, array-based computation we have
just outlined.
In computer science, algorithms are optimised by carefully designing alternative algorithms
which possess better complexities whilst also producing exactly the same results. This pedantry
concerning accuracy is almost always appropriate in computer science. However, many subjects,
including the natural sciences, can benefit enormously from relinquishing this exactness
in favour of artful approximation, in particular the computation of approximations known to
be accurate to within a quantified error. As we shall now see, the performance of the array-based
function to compute the force on a particle can be greatly improved upon by using an
algorithm designed to compute an approximation to the exact result.
Promoting the adoption of approximate techniques in scientific computations can be somewhat
of an uphill struggle. Thus, we shall now devote a little space to the arguments involved.
Often, when encouraged to convert to the use of approximate computations, many scientists
respond by wincing and citing an article concerning the weather and the wings of a butterfly.
Their point is, quite validly, that the physical systems most commonly simulated on computer
are chaotic. Indeed, if the evolution of such a system could be calculated by reducing the
physical properties to a solvable problem, there would be no need to simulate the system
computationally.
The chaotic nature of simulated systems raises the concern that converting to the use of ap-
proximate methods is likely to change the simulation result in an unpredictable way. This is a
valid concern. However, virtually all such simulation methods are already inherently approx-
imate. One approximation is made by the choice of simulation procedure, such as the Verlet
method for numerically integrating particle dynamics over time [6]. Another approximation is
made by the use of finite-precision arithmetic. Consequently, the results of simulations should
never be examined at the microscopic level but, rather, via quantities averaged over the whole
system. Thus, the use of approximate techniques does not worsen the situation.
We shall now develop approximation techniques of controllable accuracy for computing the
force on a particle due to a given system of particles, culminating in the implementation of a
force function which provides a substantially more efficient alternative to the array_force
function for reasonable accuracies.
In general, the strength of particle-particle interactions diminishes with distance. Conse-
quently, the force exerted by a collection of distant particles may be well-approximated by
grouping the collection into a pseudo-particle. In the case of gravitational interactions, this
corresponds to grouping the effects of large numbers of smaller masses into small numbers of
larger masses. This grouping effect can be obtained by storing the particle system in a tree
data structure in which branches of the tree represent spatial subdivision, leaf nodes store
exact particle information and other nodes store the information required to make approxi-
mations pertaining to the particles in the region of space represented by their lineage of the
tree.
The spatial partitioning of a system of particles at positions r_i ∈ ℝ may be represented by an
unbalanced binary tree of the type:
# type partition =
    Leaf of particle list
  | Node of partition * particle * partition;;
type partition =
    Leaf of particle list
  | Node of partition * particle * partition
Leaf nodes in such a tree contain a list of particles at the same or at similar positions. Other
nodes in the tree contain left and right branches (which will be used to represent implicit
subranges [l, ½(l + u)) and [½(l + u), u), respectively) and the mass and position of a pseudo-particle
chosen to approximate the summed effects of all particles farther down the tree.
The mass m_p and position r_p of a pseudo-particle approximating the effects of a list of particles
(m_i, r_i) are given by the sum of the masses and the mass-weighted average of the positions of particles
in the child branches, respectively:
$m_p = \sum_i m_i, \qquad r_p = \frac{1}{m_p} \sum_i m_i r_i$
The following function computes the pseudo-particle approximant of the given list of particles:
# let average l =
    let aux a p = { m = a.m +. p.m; r = a.r +. p.m *. p.r } in
    let pp = List.fold_left aux { m = 0.; r = 0. } l in
    if pp.m = 0. then pp else { m = pp.m; r = pp.r /. pp.m };;
val average : particle list -> particle = <fun>
For example, the pseudo-particle representing two particles {m1 = 1, r1 = −1} and {m2 = 3, r2 = 1}
is {m = 1 + 3, r = ¼(−1 + 3)}:
# average [{ m = 1.; r = -1. }; { m = 3.; r = 1. }];;
- : particle = {m = 4.; r = 0.5}
A function to compute the root node shared by two branches of a partition tree, including
the pseudo-particle approximant in the root node, may then be written:
# let node_of (left, right) =
    let of_child = function Leaf l -> average l | Node (_, p, _) -> p in
    let lp, rp = of_child left, of_child right in
    let m = lp.m +. rp.m in
    let r = if m = 0. then 0. else (lp.m *. lp.r +. rp.m *. rp.r) /. m in
    Node (left, { m = m; r = r }, right);;
val node_of : partition * partition -> partition = <fun>
The nested of_child function extracts the particle representation of a child branch in the
tree, either as the pseudo-particle representation of a list of particles in a leaf node, computed
by the average function, or as the pseudo-particle held in the non-leaf child node.
For example, creating a node from a left leaf containing two particles and an empty right leaf
results in a node containing the left leaf, the pseudo-particle and the right leaf:
# node_of (Leaf [{ m = 1.; r = -1. }; { m = 3.; r = -1. }], Leaf []);;
- : partition =
Node (Leaf [{m = 1.; r = -1.}; {m = 3.; r = -1.}], {m = 4.; r = -1.}, Leaf [])
A particle system consists of the lower and upper bounds of the partition and the partition
itself:
# type system = { lower : float; tree : partition; upper : float };;
type system = { lower : float; tree : partition; upper : float; }
We shall assume that a system is initialised with a range which encompasses the positions
of any particles which will be inserted into it. The task of inserting a particle then requires
traversal of the tree to a leaf node representing a range which includes the position of the
particle and, if necessary, the splitting of this leaf node to insert the new particle. This can
be achieved using the following function:
# let insert p sys =
    let rec aux np l u =
      let aux2 np left right l m u =
        let (left, right) =
          if np.r < m then (aux np l m left, right)
          else (left, aux np m u right) in
        node_of (left, right) in
      function
        Leaf [] -> Leaf [p]
      | Leaf (pph :: ppt as pp) ->
          if pph.r = np.r then Leaf (np :: pph :: ppt) else
          let m = 0.5 *. (l +. u) in
          let left, right = List.partition (fun p -> p.r < m) pp in
          let left, right = Leaf left, Leaf right in
          aux2 np left right l m u
      | Node (left, _, right) -> aux2 np left right l (0.5 *. (l +. u)) u in
    { sys with tree = aux p sys.lower sys.upper sys.tree };;
val insert : particle -> system -> system = <fun>
The nested aux function inserts the given particle into the given partition tree. The aux2
function nested within the aux function propagates insertion into the appropriate branch of
the tree, left or right, by calling aux with the particle (np) to be inserted, the implicit range
(either [l, m) or [m, u)) and the child tree. The aux function inserts a new particle into an
empty leaf by replacing it with a leaf containing a new particle. A leaf already containing
particles is split into a new pair of child partition trees and the aux2 function then used to
insert the new particle into the appropriate child tree. A non-leaf node simply uses the aux2
function, which will replace the appropriate branch of the tree and the pseudo-particle whilst
leaving the other branch intact.
For example, the empty particle system for particles in the range [0, 1) is:
# let empty_sys = { lower = 0.; tree = Leaf []; upper = 1. };;
val empty_sys : system = {lower = 0.; tree = Leaf []; upper = 1.}
Inserting a single particle results in the root node of the tree being a leaf node containing the
particle:
# let sys = insert { m = 3.; r = 0.1 } empty_sys;;
val sys : system = {lower = 0.; tree = Leaf [{m = 3.; r = 0.1}]; upper = 1.}
Inserting a second particle in the other half of the range of the system creates a balanced binary
tree of depth 1, the left-hand branch of the tree containing the particle in the lower-half of
the range and the right-hand branch containing the particle in the upper-half:
# let sys = insert { m = 1.; r = 0.8 } sys;;
val sys : system =
  {lower = 0.;
   tree =
    Node (Leaf [{m = 3.; r = 0.1}], {m = 4.; r = 0.275},
     Leaf [{m = 1.; r = 0.8}]);
   upper = 1.}
Inserting a third particle near an existing particle deepens the tree, producing more interesting
structure and pseudo-particle content:
# let sys = insert { m = 1.; r = 0.82 } sys;;
val sys : system =
  {lower = 0.;
   tree =
    Node (Leaf [{m = 3.; r = 0.1}], {m = 5.; r = 0.384},
     Node (Leaf [], {m = 2.; r = 0.81},
      Node
       (Node (Leaf [{m = 1.; r = 0.8}], {m = 2.; r = 0.81},
         Leaf [{m = 1.; r = 0.82}]),
       {m = 2.; r = 0.81}, Leaf [])));
   upper = 1.}
Figure 3.14: An unbalanced binary tree used to partition the space r ∈ [0, 1) in order to
approximate the gravitational effect of a cluster of particles in a system.
This tree is illustrated in figure 3.14. Note that the pseudo-particle at the root node of the
tree correctly indicates that the total mass of the system is m = 3 + 1 + 1 = 5 and the centre
of mass is at r = (3 × 0.1 + 0.8 + 0.82)/5 = 0.384.
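This consistency between a pseudo-particle and the real particles beneath it can also be checked programmatically. The following sketch recomputes the total mass and centre of mass directly from the leaves of a hand-written copy of the tree. The function summarise is a hypothetical checker of our own and is not needed by the force computation itself.

```ocaml
type particle = { m : float; r : float }

type partition =
  | Leaf of particle list
  | Node of partition * particle * partition

(* summarise is a hypothetical checker: it returns the total mass and the
   mass-weighted sum of positions of the real particles below a node,
   ignoring the stored pseudo-particles. *)
let rec summarise = function
  | Leaf ps ->
      List.fold_left
        (fun (mass, moment) p -> (mass +. p.m, moment +. p.m *. p.r))
        (0., 0.) ps
  | Node (left, _, right) ->
      let (m1, w1) = summarise left and (m2, w2) = summarise right in
      (m1 +. m2, w1 +. w2)

(* The three-particle tree from the text, written out by hand. *)
let tree =
  Node (Leaf [{ m = 3.; r = 0.1 }], { m = 5.; r = 0.384 },
    Node (Leaf [], { m = 2.; r = 0.81 },
      Node (Node (Leaf [{ m = 1.; r = 0.8 }], { m = 2.; r = 0.81 },
                  Leaf [{ m = 1.; r = 0.82 }]),
            { m = 2.; r = 0.81 }, Leaf [])))

let (mass, weighted) = summarise tree
let centre = weighted /. mass   (* close to 0.384, up to rounding *)
```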
We shall now consider the force on the particle at r = 0.1, exerted by the other particles at
r = 0.8 and 0.82. The force can be calculated exactly, in arbitrary units, as:
$F = \sum_j \frac{m_i m_j}{(r_j - r_i)^2} = \frac{3 \times 1}{0.7^2} + \frac{3 \times 1}{0.72^2} \approx 11.9095$
In this case, the force on the particle at r_i = 0.1 can also be well-approximated by grouping
the effect of the other two particles into that of a pseudo-particle. From the tree, the pseudo-particle
for the range 0.5 ≤ r_p < 1 is {m = 2.; r = 0.81}. Thus, the force may be well
approximated by:
$F \approx \frac{m_p m_i}{(r_p - r_i)^2} = \frac{3 \times 2}{0.71^2} \approx 11.9024$
where m_p and r_p are the mass and centre of mass of the pseudo-particle, respectively.
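These two values can be reproduced with the force2 function. The following sketch computes the exact and approximate forces and the fractional error between them; the names exact, approx and fractional_error are our own.

```ocaml
type particle = { m : float; r : float }

let force2 p1 p2 =
  let d = p2.r -. p1.r in
  p1.m *. p2.m /. (d *. abs_float d)

let origin = { m = 3.; r = 0.1 }

(* Exact force: sum over the two real particles. *)
let exact =
  force2 origin { m = 1.; r = 0.8 } +. force2 origin { m = 1.; r = 0.82 }

(* Approximate force: one pseudo-particle at the centre of mass. *)
let approx = force2 origin { m = 2.; r = 0.81 }

(* Fractional error of the approximation, roughly 6 parts in 10^4. *)
let fractional_error = abs_float (approx -. exact) /. exact
```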
Given the representation of a particle system as an unbalanced partition tree, the force on
any given "origin" particle due to the particles in the system can be computed very efficiently
by recursively traversing the tree either until a pseudo-particle in a non-leaf node is found to
approximate the effects of the particles in its branch of the tree to sufficient accuracy or until
real particles are found in a leaf node. This approach can be made more rigorous by bounding
the error of the approximation.
The simplest upper bound of error is obtained by computing the difference between the min-
imum and maximum forces which can be obtained by a particle distribution satisfying the
constraint that it must produce the pseudo-particle with the appropriate mass and position.
If r_i ∉ [l, u), the force F is bounded below by the force due to all the mass at the centre of mass
and above by the force due to masses at either end of the range:
$\frac{m M}{(c - r)^2} \le F \le \frac{m M}{u - l} \left( \frac{c - l}{(u - r)^2} + \frac{u - c}{(l - r)^2} \right)$
For example, the bounds of the force in the previous example are given by r = 0.1, m = 3,
l = 0.5, c = 0.81, u = 1 and M = 2:
$\frac{3 \times 2}{(0.81 - 0.1)^2} \le F \le \frac{3 \times 2}{1 - 0.5} \left( \frac{0.81 - 0.5}{(1 - 0.1)^2} + \frac{1 - 0.81}{(0.5 - 0.1)^2} \right)$
$11.9024 \le F \le 18.8426$
If this error were considered to be too large, the function to approximate the force would recurse
into the smaller-scale spatial range [l, u) = [0.75, 1). This tightens the bound on the force to:
$11.9024 \le F \le 12.5707$
This recursive process can be repeated either until the bound on the force is tight enough or
until an exact result is obtained.
The following function computes the difference between the upper and lower bounds of the
force on an origin particle p due to a pseudo-particle pp representing a particle distribution in
the spatial range from l to u:
# let metric p pp l u =
    if l <= p.r && p.r < u then infinity else
    let r = p.r and c = pp.r in
    let fmin = p.m *. pp.m /. sqr (p.r -. pp.r) in
    let fmax = p.m *. pp.m /. (u -. l) *.
      ((c -. l) /. (sqr (u -. r)) +. (u -. c) /. sqr (l -. r)) in
    fmax -. fmin;;
val metric : particle -> particle -> float -> float -> float = <fun>
Note that the metric function returns an infinite possible error if the particle p lies within
the partition range [l, u) (as the partition might contain another particle at the same position!).
For example, these are the errors resulting from progressively finer approximations:
# metric { m = 3.; r = 0.1 } { m = 2.; r = 0.81 } 0. 1.;;
- : float = infinity
# metric { m = 3.; r = 0.1 } { m = 2.; r = 0.81 } 0.5 1.;;
- : float = 6.94019227519524939
# metric { m = 3.; r = 0.1 } { m = 2.; r = 0.81 } 0.75 1.;;
- : float = 0.668276868664461787
# metric { m = 3.; r = 0.1 } { m = 2.; r = 0.81 } 0.75 0.875;;
- : float = 0.277220270131675051
A function to compute an approximation to the total force on a particle p due to other particles
in a system sys to within an error delta can be written:
# let force p sys delta =
    let rec aux l u = function
        Leaf l -> List.fold_left (fun f p2 -> f +. force2 p p2) 0. l
      | Node (left, pp, right) ->
          if metric p pp l u < delta then force2 p pp else
          let m = 0.5 *. (l +. u) in
          (aux l m left) +. (aux m u right) in
    aux sys.lower sys.upper sys.tree;;
val force : particle -> system -> float -> float = <fun>
Figure 3.15: Measured performance of the tree-based approach relative to a simple
array-based approach for the evaluation of long-range forces, showing the resulting fractional
error δ = |O − E|/E vs time taken t = t_tree/t_array relative to the array-based
method (the time axis is logarithmic, log2 t).
The tree representation of this particle system is easily constructed by folding our insert
function over the array which was used to test the array_force function:
# let sys = Array.fold_left (fun s p -> insert p s) empty_sys sys;;
val sys : system = ...
The tree-based force function can compute controllably accurate approximations to the force
on an origin particle due to a collection of other particles, trading accuracy for performance.
The following time function measures the time taken to compute the force to within the given
permissible error e, returning a 2-tuple of the time taken and the answer obtained:
# let time e =
    let t = Sys.time () in
    let ans = force origin sys e in
    (Sys.time () -. t, ans);;
val time : float -> float * float = <fun>
Applying this function with increasing permitted error results in a significant improvement in
performance:
# time 1e-9;;
- : float * float = (0.490000000000000213, 2178953383.57115459)
# time 1e-6;;
- : float * float = (0.179999999999999716, 2178953383.57127047)
# time 1e-3;;
- : float * float = (0.0500000000000007105, 2178953383.58280039)
From measurements of real-time performance (illustrated in figure 3.15), when requiring a
force computation accurate to one part in one million ($\log_2 \delta = -20$),
the tree-based approach is approximately one thousand times faster ($\log_2 t \approx -10$) than the
array-based approach. Considering that, even when using the array-based approach, such
computations are inherently approximate, a fractional error of $10^{-6}$ is a small price to pay for
three orders of magnitude improvement in performance.
The tree-based approach we have just described is a simple form of what is now known as
the Fast Multipole Method (FMM) [7]. Before being applicable to most physical systems, the
approaches we have described must be generalised to higher dimensionalities. This generalisation
is most easily performed by increasing the branching factor of the tree from 2 to $2^d$ for
a $d$-dimensional problem. A more powerful generalisation involves associating the branches
of the binary tree with subdivision along a particular dimension (either implicitly, typically
by cycling through the dimensions, or explicitly, by storing the index of the subdivided di-
mension in the node of the tree). In particular, this allows anisotropic subdivision of space,
i.e. some dimensions can be subdivided more than others. Anisotropic subdivision is useful
in the context of anisotropic particle distributions, such as those found in many astrophysical
simulations. One such method of anisotropic subdivision is known as the k-D tree.
Chapter 4
Numerical Analysis
Computers can only perform finite computations. Consequently, computers only make use
of finite precision representations of numbers. This has several important implications in the
context of scientific computation.
This chapter provides an overview of the representations and properties of values of types int
and float, used to represent members of the sets $\mathbb{Z}$ and $\mathbb{R}$, respectively. Practical examples
demonstrating the robust use of floating-point arithmetic are then given. Finally, some other
forms of arithmetic are discussed.
4.1 Number representation
In this section, we shall introduce the representation of integer and floating-point numbers
before outlining some properties of these representations.
4.1.1 Integers
Positive integers are represented by their least-significant binary digits (bits). For example,
the number 1 is represented by the bits ...00001 and the number 11 is represented by the
bits ...01011. Negative integers are represented in two's-complement format. For example,
the number -1 is represented by the bits ...11111 and the number -11 is represented by the
bits ...10101.
Figure 4.1: Values $i$ of the type int, called machine-precision integers, are an exact
representation of a consecutive subset of the set of integers $i \in [l \ldots u] \subset \mathbb{Z}$ where $l$ and $u$
are given by min_int and max_int, respectively.
Figure 4.2: Values of the type float, called double-precision floating-point numbers,
are an approximate representation of real-valued numbers, showing: a) full-precision
(normalised) numbers (black), and b) denormalised numbers (red).
Consequently, the representation of integers n E Z by values of the type int is exact within a
finite range of integers (illustrated in figure 4.1). This range is platform specific and may be
obtained as the min_int and max_int values in the Pervasives module. On a 32-bit platform,
the range of representable integers is substantial:
# min_int, max_int;;
- : int * int = (-1073741824, 1073741823)
On a 64-bit platform, the range is even larger.
The binary representation of a value of type int may be obtained using the following function:
# let binary_of_int n =
    let rec aux i =
      let bit = if (n lsr i) land 1 = 0 then "0" else "1" in
      bit ^ (if i = 0 then "" else aux (i - 1)) in
    aux (Sys.word_size - 2);;
val binary_of_int : int -> string = <fun>
On a $w$-bit machine, an int may use $w - 1$ bits (the remaining bit is used by the garbage
collector). This binary_of_int function contains a nested auxiliary function aux. The aux
function considers each bit $i \in \{0 \ldots w - 2\}$, extracting the bit using the expression
(n lsr i) land 1, and prepending a "0" or "1" onto the remaining computation. The aux function
is initially called with $i = w - 2$, where $w$ is given by Sys.word_size.
For example, the 31-bit binary representations of 11 and -11 are:
# binary_of_int 11;;
- : string = "0000000000000000000000000001011"
# binary_of_int (-11);;
- : string = "1111111111111111111111111110101"
As we shall see in this chapter, the exactness of the int type can be used in many ways.
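As a minimal sketch of what this finite but exact representation implies, the following restates the binary_of_int function from above and checks two properties of the type int: the number of bits it occupies and the silent wrap-around at the ends of its range. No particular word size is assumed, so the checks are phrased in terms of Sys.word_size.

```ocaml
(* binary_of_int as defined above, restated so that this block is self-contained. *)
let binary_of_int n =
  let rec aux i =
    let bit = if (n lsr i) land 1 = 0 then "0" else "1" in
    bit ^ (if i = 0 then "" else aux (i - 1)) in
  aux (Sys.word_size - 2)

let () =
  (* An int occupies Sys.word_size - 1 bits, so the string has that many characters. *)
  assert (String.length (binary_of_int 11) = Sys.word_size - 1);
  (* In two's complement, -1 is represented by all bits set. *)
  assert (binary_of_int (-1) = String.make (Sys.word_size - 1) '1');
  (* int arithmetic is exact within its range but wraps around silently outside it. *)
  assert (max_int + 1 = min_int)
```

The last assertion is a reminder that exactness only holds within the representable range: overflow is not reported, so the range must be checked by the programmer where it matters.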
4.1.2 Floating-point numbers
In science, many important numbers are written in scientific notation. For example, Avogadro's
number is conventionally written $N_A = 6.02214 \times 10^{23}$. This notation essentially
specifies the two most important quantities about such a number:
1. the most significant digits called the mantissa, in this case 6.02214, and
2. the offset of the decimal point called the exponent, in this case 23.
Computers use a similar, finite representation called "floating point" which also contains a
mantissa and exponent. In OCaml, floating-point numbers are represented by values of the type
float. Roughly speaking, values of type int approximate real numbers between -max_int
and max_int with a constant absolute error of $\frac{1}{2}$ whereas values of the type float have an
approximately-constant relative error.
In order to enter floating-point numbers succinctly, the OCaml language uses a standard "e"
notation, equivalent to scientific number notation $a \times 10^b$. For example, the number
$5.4 \times 10^{12}$ may be represented by the value:
# 5.4e12;;
- : float = 5.4e+12
As the name "floating point" implies, the use of a mantissa and an exponent allows the point
to be "floated" to any of a wide range of offsets. Naturally, this format uses base-two (binary)
rather than base-ten (decimal) and, hence, numbers are represented by the form $a \times 2^b$ where $a$
is the mantissa and $b$ is the exponent. Double-precision floating-point values consume 64 bits,
of which 53 bits are attributed to the mantissa (including one bit for the sign of the number)
and the remaining 11 bits to the exponent.
Compared to the type int, the exponent in a value of type float allows a huge range of
real-valued numbers to be approximated. As for the type int, this range is given by values in
the Pervasives module:
# min_float, max_float;;
- : float * float = (2.22507385850720138e-308, 1.79769313486231571e+308)
Some useful values not in the set of real numbers $\mathbb{R}$ are also representable in floating-point
number representation. Numbers out of range are expressed by the values -0. (a negative zero,
represented differently from 0.), neg_infinity ($-\infty$) and infinity ($\infty$). For example, in
floating-point arithmetic $-1/\infty = -0$:
# -1. /. infinity;;
- : float = -0.
Also, nan is a special value, reserved for calculations which do not return a real-valued number
$x \in \mathbb{R}$, e.g. when a supplied parameter falls outside the domain of a function. For example,
$\ln(-1) \notin \mathbb{R}$:
# log (-1.);;
- : float = nan
The domain of the function log in the Pervasives module is $0 \leq x$, with log 0. evaluating
to neg_infinity.
In particular, nan is the only float not equal to itself:
# nan <> nan;;
- : bool = true
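The behaviour of these special values can be checked directly. The following is a small sketch, in which is_nan is an illustrative helper name rather than a function discussed in this chapter; classify_float and compare are standard functions of the Pervasives module.

```ocaml
(* A value is nan exactly when it differs from itself. *)
let is_nan (x : float) = x <> x

let () =
  assert (is_nan nan);
  assert (is_nan (log (-1.)));
  assert (not (is_nan infinity));
  (* classify_float distinguishes the special values without relying on comparisons. *)
  assert (classify_float nan = FP_nan);
  assert (classify_float (1. /. 0.) = FP_infinite);
  assert (classify_float (-1. /. infinity) = FP_zero);
  (* Negative zero compares equal to zero, but its sign is revealed by division. *)
  assert (-0. = 0.);
  assert (1. /. (-0.) = neg_infinity);
  (* Unlike the = operator, the compare function treats nan as equal to itself. *)
  assert (compare nan nan = 0)
```

The difference between = and compare matters when floats are used as keys in ordered containers, which rely on compare.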
In the case of $\ln(-1)$, the implementation of complex numbers provided in the Complex module
may be used to calculate the complex-valued result:
# Complex.log (Complex.neg Complex.one);;
- : Complex.t = {Complex.re = 0.; Complex.im = -3.14159265358979312}
As well as min_float, max_float, infinity, neg_infinity and nan, the Pervasives module
also contains an epsilon_float value:
# epsilon_float;;
- : float = 2.22044604925031308e-16
This is the difference between one and the smallest float greater than one, so adding it to one
gives a result which is distinguishable from one:
# 1. +. epsilon_float;;
- : float = 1.00000000000000022
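The value of epsilon_float can be recovered experimentally by repeated halving. This sketch (machine_epsilon is an illustrative name) assumes that float arithmetic is carried out in 64-bit double precision; on hardware which evaluates intermediate results in extended precision, as discussed in section 4.2, the loop may stop at a different value.

```ocaml
(* Halve a candidate until adding half of it to one no longer changes the result. *)
let machine_epsilon () =
  let rec aux e = if 1. +. e /. 2. > 1. then aux (e /. 2.) else e in
  aux 1.

let () =
  assert (machine_epsilon () = epsilon_float);
  (* Adding epsilon_float to one is detectable; adding half of it is not. *)
  assert (1. +. epsilon_float > 1.);
  assert (1. +. epsilon_float /. 2. = 1.)
```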
Consequently, the epsilon_float value is seen in the context of numerical algorithms as it
encodes the accuracy of the mantissa in the floating point representation. In particular, the
square root of this number often appears as the accuracy of numerical approximants computed
using linear approximations (leaving quadratic terms as the largest remaining source of error).
This still leaves a substantially accurate result, suitable for most computations:
# 1. +. sqrt epsilon_float;;
- : float = 1.00000001490116119
The approximate nature of floating-point computations is often seen in simple calculations.
For example, the evaluation of $\frac{1}{3}$ is only correct to 16 fractional digits:
# 1. /. 3.;;
- : float = 0.333333333333333315
In particular, the binary representation of floating-point numbers renders many decimal frac-
tions approximate. For example, although 1 is represented exactly by the type float, the
decimal fraction 0.9 is not:
# 1. -. 0.9;;
- : float = 0.0999999999999999778
Many of the properties of conventional algebra over real-valued numbers can no longer be
relied upon when floating-point numbers are used as a representation. For more details, see
the relevant literature [8].
4.2 Quirks
In the interests of efficiency, float arithmetic uses whatever form of floating-point arithmetic
is provided in hardware and is closest to IEEE double-precision floating point.
However, x86 CPUs represent floating-point numbers in registers on the CPU using additional
precision (80 bits instead of the usual 64). This can occasionally result in unexpected
behaviour. For example, when compiled to byte-code or executed in the top-level, the following
program prints zero as expected:
# let twothirds = 2. /. 3. in
    print_endline (string_of_float (2. /. 3. -. twothirds));;
0.
- : unit = ()
However, when compiled to x86 native-code using ocamlopt (version 3.08) this program pro-
duces a result close, but not equal, to zero:
3.70255041904e-17
This is a consequence of the value of the variable twothirds being stored as a 64-bit value in
memory and the sub-expression 2. /. 3. and result being evaluated in 80-bit registers.
4.3 Algebra
In real arithmetic, addition is associative:
$$(a + b) + c = a + (b + c)$$
In general, this is not true in floating-point arithmetic. For example, in floating-point
arithmetic $(0.1 + 0.2) + 0.3 \neq 0.1 + (0.2 + 0.3)$:
# (0.1 +. 0.2) +. 0.3 = 0.1 +. (0.2 +. 0.3);;
- : bool = false
In this case, approximate number representations have resulted in slightly different approxi-
mations to the exact answer:
# (0.1 +. 0.2) +. 0.3, 0.1 +. (0.2 +. 0.3);;
- : float * float = (0.600000000000000089, 0.6)
Hence, even in seemingly simple calculations, values of type float should not be compared
for exact equality.
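One common alternative to exact equality is a comparison to within a tolerance. The following is a minimal sketch under stated assumptions: the name approx_equal, the default tolerance of sqrt epsilon_float and the mixed absolute and relative scaling are all illustrative choices, not a definitive recipe, and the appropriate tolerance depends on the calculation at hand.

```ocaml
(* Compare two floats to within a tolerance which is absolute for small values
   and relative for large ones. The default tolerance is an assumption. *)
let approx_equal ?(tol = sqrt epsilon_float) a b =
  abs_float (a -. b) <= tol *. max 1. (max (abs_float a) (abs_float b))

let () =
  (* The two groupings of the sum above differ exactly but agree approximately. *)
  assert ((0.1 +. 0.2) +. 0.3 <> 0.1 +. (0.2 +. 0.3));
  assert (approx_equal ((0.1 +. 0.2) +. 0.3) (0.1 +. (0.2 +. 0.3)));
  (* Genuinely different values are still distinguished... *)
  assert (not (approx_equal 1. 1.001));
  (* ...unless a looser tolerance is requested explicitly. *)
  assert (approx_equal ~tol:1e-2 1. 1.001)
```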
More significant errors are obtained when dealing with the addition and subtraction of numbers
with wildly different exponents. For example, in real arithmetic $1.3 + 10^{15} - 10^{15} = 1.3$ but
in the case of float arithmetic:
# 1.3 +. 1e15 -. 1e15;;
- : float = 1.25
The accuracy of this computation is limited by the accuracy of the largest magnitude numbers
in the sum. In this case, these numbers are $10^{15}$ and $-10^{15}$, resulting in a significant
error of 0.05.
The accuracy of calculations performed using floating-point arithmetic may often be improved
by careful rearrangement of the expressions. Such rearrangements often result in more complicated
expressions which are, therefore, slower to execute. For example, this form of the
function $f(x)$:
$$f_1(x) = \sqrt{1 + x} - 1$$
involves the subtraction of a pair of similar numbers when $x \approx 0$. This may be expressed in
OCaml as:
# let f_1 x = sqrt (1. +. x) -. 1.;;
val f_1 : float -> float = <fun>
As expected, results of this function are significantly erroneous in the region $x \approx 0$. For
example:
$$\sqrt{1 + 10^{-15}} - 1 \approx 4.99999999999999875\ldots \times 10^{-16}$$
# f_1 1e-15;;
- : float = 4.44089209850062616e-16
The $f_1$ function may be rearranged into a form which evades the subtraction of similar-sized
numbers around $x \approx 0$:
$$f_2(x) = \frac{x}{1 + \sqrt{1 + x}}$$
This may be expressed in OCaml as:
# let f_2 x = x /. (1. +. sqrt (1. +. x));;
val f_2 : float -> float = <fun>
Although $f_1(x) = f_2(x)\ \forall x \in \mathbb{R}$, the $f_2$ form of the function is better behaved when evaluated
using floating-point arithmetic, particularly in the region $x \approx 0$. For example, the value of
the function at $x = 10^{-15}$ is much better approximated by $f_2$ than it was by $f_1$:
# f_2 1e-15;;
- : float = 4.9999999999999994e-16
This is particularly clear on a graph of the two functions around $x \approx 0$ (illustrated in figure
4.3).
Figure 4.3: Accuracy of two equivalent expressions when evaluated using floating-point
arithmetic, plotted as $f(x)$ against $x$: a) $f_1(x) = \sqrt{1 + x} - 1$ (red line), and
b) $f_2(x) = x/(1 + \sqrt{1 + x})$ (green line).
4.4 Interpolation
Due to the accumulation of round-off error, loops should not use loop variables of type float
but, rather, use the type int and, if necessary, convert to the type float within the loop.
Interpolation is an important example of this.
The following higher-order function tries to fold over an interpolation across a semi-inclusive
range $[l, u)$ making $n$ applications of $f(x)$ with:
$$x \in \{l, l + d, l + 2d, \ldots, u - d\}$$
where $d = (u - l)/n$:
# let interp f accu l u n =
    let d = (u -. l) /. float_of_int n in
    let rec aux accu x = if x < u then aux (f accu x) (x +. d) else accu in
    aux accu l;;
val interp : ('a -> float -> 'a) -> 'a -> float -> float -> int -> 'a = <fun>
However, this function makes inappropriate use of floating-point arithmetic. Specifically,
the step size $d = (u - l)/n$ is precalculated and repeatedly added to the "loop variable" x.
Consequently, this function is prone to cumulative errors in x.
Choosing a range and number of steps for which the floating-point representations happen to
be exact, this function produces the desired behaviour. For example, with l = 0, u = 1 and
n = 4, the function f is invoked exactly n = 4 times, as expected:
# interp (fun l x -> x :: l) [] 0. 1. 4;;
- : float list = [0.75; 0.5; 0.25; 0.]
However, when the required arithmetic does not happen to be exact, unexpected behaviour
can arise. For example, with l = 0, u = 0.9 and n = 3, the function f is invoked four times
instead of n = 3 times:
# interp (fun l x -> x :: l) [] 0. 0.9 3;;
- : float list = [0.899999999999999911; 0.6; 0.3; 0.]
In this case, the result of repeatedly adding the approximate representation of d to that of x,
starting with x = l, produced an approximation which was slightly lower than u. Thus, the
function f was erroneously applied an extra time, with an argument approximately equal to
u = 0.9. This produced a list containing four elements instead of the expected three.
As such functionality is commonly required in scientific computing, a robust alternative must
be found.
Fortunately, this problem is easily solved by resorting to an exact form of arithmetic for the
loop variable, typically int arithmetic, and converting to floating-point representation at a
later stage. For example, the interp function may be written robustly by using an integer
loop variable i:
$$x(i) = l + \frac{i}{n}(u - l)$$
for $i \in \{0 \ldots n - 1\}$:
# let interp f accu l u n =
    let x i = l +. (float_of_int i) /. (float_of_int n) *. (u -. l) in
    let rec aux i accu = if i < n then aux (i + 1) (f accu (x i)) else accu in
    aux 0 accu;;
val interp : ('a -> float -> 'a) -> 'a -> float -> float -> int -> 'a = <fun>
Thanks to the use of an exact form of arithmetic, this function produces the desired behaviour:
# interp (fun l x -> x :: l) [] 0. 0.9 3;;
- : float list = [0.6; 0.3; 0.]
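Because the number of applications is now governed by an integer counter, it can be relied upon by other functions. The sketch below restates the robust interp function and uses it in two ways: to count the applications of f, and to form a simple left Riemann sum. The names count_steps and integrate are illustrative and do not appear elsewhere in this chapter.

```ocaml
(* The robust interp from above, restated so that this block is self-contained. *)
let interp f accu l u n =
  let x i = l +. float_of_int i /. float_of_int n *. (u -. l) in
  let rec aux i accu = if i < n then aux (i + 1) (f accu (x i)) else accu in
  aux 0 accu

(* Count how many times the folded function is applied. *)
let count_steps l u n = interp (fun c _ -> c + 1) 0 l u n

(* Left Riemann sum of f over [l, u) using n equal steps. *)
let integrate f l u n =
  let d = (u -. l) /. float_of_int n in
  interp (fun sum x -> sum +. f x *. d) 0. l u n

let () =
  (* Exactly n applications, even for the troublesome range [0, 0.9). *)
  assert (count_steps 0. 0.9 3 = 3);
  (* The left Riemann sum of f(x) = x over [0, 1) with 1000 steps is 0.4995. *)
  assert (abs_float (integrate (fun x -> x) 0. 1. 1000 -. 0.4995) < 1e-9)
```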
We shall now conclude this chapter with two simple examples of the inaccuracy of floating-
point arithmetic.
4.5 Quadratic solutions
The solutions of the quadratic equation $ax^2 + bx + c = 0$ are well known to be:
$$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
The root $\sqrt{b^2 - 4ac}$ may be productively factored out of these expressions:
$$y = \sqrt{b^2 - 4ac}, \qquad x_1 = \frac{y - b}{2a}, \qquad x_2 = -\frac{y + b}{2a}$$
These values are easily calculated using floating-point arithmetic:
# let quadratic a b c =
    let y = sqrt (b *. b -. 4. *. a *. c) in
    (-. b +. y) /. (2. *. a), (-. b -. y) /. (2. *. a);;
val quadratic : float -> float -> float -> float * float = <fun>
However, when evaluated using floating-point arithmetic, these expressions can be problematic.
Specifically, when $b^2 \gg 4ac$, subtracting $4ac$ from $b^2$ in the subexpression $b^2 - 4ac$
will produce an inaccurate result approximately equal to $b^2$. This results in $-b + \sqrt{b^2 - 4ac}$
becoming equivalent to $-b + b$ and, therefore, an answer of zero.
For example, using the conditions $a = 1$, $b = 10^9$ and $c = 1$, the correct solutions are
$x \approx -10^{-9}$ and $-10^9$ but the above implementation of the quadratic function rounds the
smaller magnitude solution to zero:
# quadratic 1. 1e9 1.;;
- : float * float = (0., -1e+9)
The accuracy of the smaller-magnitude solution is most easily improved by calculating the
smaller-magnitude solution in terms of the larger-magnitude solution, as:
$$x_1 = \begin{cases} -\dfrac{y + b}{2a} & b \geq 0 \\ \dfrac{y - b}{2a} & b < 0 \end{cases} \qquad x_2 = \frac{c}{x_1 a}$$
This formulation, which avoids the subtraction of similar values, may be written:
# let quadratic a b c =
    let y = sqrt (b *. b -. 4. *. a *. c) in
    let x1 = (if b < 0. then y -. b else -. (y +. b)) /. (2. *. a) in
    x1, c /. (x1 *. a);;
val quadratic : float -> float -> float -> float * float = <fun>
This form of the quadratic function is numerically robust, producing a more accurate ap-
proximation for the previous example:
# quadratic 1. 1e9 1.;;
- : float * float = (-1000000000., -1e-09)
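The expression used for the second root follows from the product of the two roots. As a short check, using the definitions of $x_1$, $x_2$ and $y$ given at the start of this section:

```latex
x_1 x_2 = \frac{y - b}{2a} \cdot \frac{-(y + b)}{2a}
        = \frac{b^2 - y^2}{4a^2}
        = \frac{b^2 - (b^2 - 4ac)}{4a^2}
        = \frac{c}{a}
\quad\Longrightarrow\quad
x_2 = \frac{c}{x_1 a}
```

For the example above, $x_1 = -10^9$ gives $x_2 = 1/(-10^9 \times 1) = -10^{-9}$, in agreement with the result printed by the top-level.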
Numerical robustness is required in a wide variety of algorithms. We shall now consider the
evaluation of some simple quantities from statistics.
4.6 Mean and variance
In this section, we shall illustrate the importance of numerical stability using expressions for
the mean and variance of a set of numbers.
The mean value $\bar{x}$ of a set of $n$ numbers $x_k$ is given by:
$$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k$$
This expression may be computed by a function written in terms of a fold_left by accumu-
lating the sum and number of the elements:
# let mean x =
    let (sum, n) =
      List.fold_left (fun (sum, n) e -> (sum +. e, n + 1)) (0., 0) x in
    sum /. float_of_int n;;
For example, the mean of {1, 3, 5, 7} is $\frac{1}{4}(1 + 3 + 5 + 7) = 4$:
# mean [1.; 3.; 5.; 7.];;
- : float = 4.
Although the sum of a list of floating point numbers may be computed more accurately
by accumulating numbers at different scales and then summing the result starting from the
smallest scale numbers, the straightforward algorithm used by this mean function is often
satisfactory. The same cannot be said of the straightforward computation of variance.
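For cases where the straightforward sum is not accurate enough, one well-known remedy is Kahan's compensated summation. Note that this is a different technique from the multi-scale accumulation mentioned above: it carries a running correction term alongside the sum. The following is a minimal sketch, in which kahan_sum is an illustrative name:

```ocaml
(* Kahan compensated summation: c estimates the rounding error accumulated so
   far and is subtracted from the next element before it is added. *)
let kahan_sum xs =
  let sum, _ =
    List.fold_left
      (fun (sum, c) x ->
         let y = x -. c in
         let t = sum +. y in
         (* (t -. sum) is the part of y actually added; the rest was lost. *)
         (t, (t -. sum) -. y))
      (0., 0.) xs in
  sum

let () =
  let xs = List.init 1_000_000 (fun _ -> 0.1) in
  let naive = List.fold_left (+.) 0. xs in
  Printf.printf "naive: %.17g  compensated: %.17g\n" naive (kahan_sum xs)
```

The compensated sum costs a few extra floating-point operations per element, which is the usual trade of speed for accuracy described in section 4.3.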
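The variance $\sigma^2$ of a set $x_k$ of numbers is typically written:
$$\sigma^2 = \frac{1}{n - 1} \left( \sum_{k=1}^{n} x_k^2 - \frac{1}{n} \Bigl( \sum_{k=1}^{n} x_k \Bigr)^2 \right)$$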
Although variance is a strictly non-negative quantity, the subtraction of the sums in this
expression for the variance may produce small, negative results when computed directly using
floating-point arithmetic, due to the accumulation of rounding errors. This problem can be
avoided by computing via a recurrence relation [8]:
$$M_k = M_{k-1} + \frac{x_k - M_{k-1}}{k}$$
$$S_k = S_{k-1} + (x_k - M_{k-1}) \times (x_k - M_k)$$
Thus, the variance may be computed more accurately using the following function:
# let variance =
    let aux (m_k, s_k, k) x_k =
      let m_k2 = m_k +. (x_k -. m_k) /. k in
      (m_k2, s_k +. (x_k -. m_k) *. (x_k -. m_k2), k +. 1.) in
    function [] -> invalid_arg "variance" | x1 :: t ->
      let (_, s, n2) = List.fold_left aux (x1, 0., 2.) t in
      s /. (n2 -. 2.);;
val variance : float list -> float = <fun>
The nested auxiliary function aux accumulates the recurrence relation when applied to fold_left.
The body of the variance function is then a λ-function which tries to decapitate the given
list. If the given list is empty then an Invalid_argument exception is raised. Otherwise, the
first element of the given list is used to initialise the recurrence relation ($M_1 = x_1$, $S_1 = 0$
and $k = 2$) which is then executed over the remaining elements using fold_left, to return a
better-behaved approximation to the variance $\sigma^2 = S_n/(n - 1)$.
For example, the variance of {1, 3, 5, 7} is $\sigma^2 = 20/3$ and the variance function gives an accurate
result:
# variance [1.; 3.; 5.; 7.];;
- : float = 6.66666666666666696
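The benefit of the recurrence is most visible on data with a large common offset. The sketch below restates the variance function and compares it with a direct sum-of-squares formulation, written here as naive_variance (an illustrative name, and an assumption about how the direct computation would be coded). The data are {4, 7, 13, 16} shifted by $10^9$; shifting does not change the variance, which is 30.

```ocaml
(* The recurrence-based variance from above, restated to be self-contained. *)
let variance =
  let aux (m_k, s_k, k) x_k =
    let m_k2 = m_k +. (x_k -. m_k) /. k in
    (m_k2, s_k +. (x_k -. m_k) *. (x_k -. m_k2), k +. 1.) in
  function
  | [] -> invalid_arg "variance"
  | x1 :: t ->
      let (_, s, n2) = List.fold_left aux (x1, 0., 2.) t in
      s /. (n2 -. 2.)

(* Direct computation from the sum and the sum of squares: the final
   subtraction cancels two very large, nearly equal numbers. *)
let naive_variance xs =
  let n = float_of_int (List.length xs) in
  let sum = List.fold_left (+.) 0. xs in
  let sum2 = List.fold_left (fun a x -> a +. x *. x) 0. xs in
  (sum2 -. sum *. sum /. n) /. (n -. 1.)

let () =
  let xs = List.map (fun x -> 1e9 +. x) [4.; 7.; 13.; 16.] in
  Printf.printf "recurrence: %g  direct: %g\n" (variance xs) (naive_variance xs)
```

The squares in naive_variance are of order $10^{18}$, where adjacent floats are more than one hundred apart, so the direct result cannot be expected to resolve a variance of 30.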
The numerical stability of this variance function allows us to write a function to compute
the standard deviation $\sigma$ the obvious way, without having to worry about negative roots:
# let standard_deviation x = sqrt (variance x);;
val standard_deviation : float list -> float = <fun>
Clearly numerically stable algorithms which use floating-point arithmetic can be useful. We
shall now examine some other forms of arithmetic.
4.7 Other forms of arithmetic
As we have seen, the int and float types in OCaml represent numbers to a fixed, finite
precision. Although computers can only perform arithmetic on finite-precision numbers, the
precision allowed and used could be extended indefinitely. Such representations are known as
arbitrary-precision numbers.
In this section, we shall introduce arbitrary-precision rational and floating-point arithmetic as
well as adaptive-precision arithmetic, which solves problems using as little extra precision as
possible.
4.7.1 Rational arithmetic
Rationals, fractions of the form $\frac{p}{q}$ where $q > 0$ and $p \in \mathbb{Z}$, may be represented exactly using rational
arithmetic. This form of arithmetic uses arbitrary-precision integers to represent $p$ and $q$.
Compared to the type float, rational arithmetic allows arbitrary precision to be used for any
value in $\mathbb{R}$. The higher the precision, the slower the calculations.
Arbitrary-precision integer and rational arithmetic are implemented by the Num module (in-
troduced in sections 2.6.1 and 2.7). We shall use the custom top-level built in section 2.7.
As we have seen, a factorial function may be written:
# open Num;;
# let rec factorial n =
    if n = 0 then Int 1 else Int n */ factorial (n - 1);;
# string_of_num (factorial 33);;
- : string = "8683317618811886495518194401280000000"
Rational arithmetic, represented by the constructor Ratio, may then be used to calculate an
approximation to:
$$e = \sum_{i=0}^{\infty} \frac{1}{i!}$$
# let rec e n =
    if n = 0 then Int 1 else Int 1 // factorial n +/ e (n - 1);;
val e : int -> Num.num = <fun>
For example:
$$\sum_{i=0}^{17} \frac{1}{i!} = \frac{7437374403113}{2736057139200}$$
# string_of_num (e 17);;
- : string = "7437374403113/2736057139200"
# float_of_num (e 17);;
- : float = 2.71828182846
# exp 1.;;
- : float = 2.71828182845904509
Rational arithmetic can be useful in many circumstances, including geometric computations.
4.7.2 High-precision floating point
Rational arithmetic is ill-suited to calculations involving numbers with wildly varying magni-
tudes. In such cases, a form of high-precision floating-point arithmetic can be useful. Typically,
this entails a representation with a controllably accurate mantissa and an arbitrary-precision
integer exponent. This functionality is provided by the freely available GNU MP (GMP)
library. OCaml bindings to GMP called mlgmp are freely available.
The OCaml bindings to the GNU MP library encapsulate the interface in a module Gmp.
This module contains submodules F, Q and Z implementing arbitrary-precision floating-point,
rational, and integer arithmetics, respectively.
A top-level which includes the functionality of mlgmp may be created using:
$ ocamlmktop -custom gmp.cma -o gmp.top
This top-level may then be used to execute code using the GMP library.
For example, consider computing $\sqrt{2}$ to 50 decimal places accuracy by applying the
Newton-Raphson root-finding method to the function $f(x) = x^2 - 2$. We can begin by opening the
namespace of the Gmp module:
# open Gmp;;
As we only wish to use high-precision floating-point arithmetic, we can productively replace
the usual operators with their high-precision equivalents:
# let ( +. ), ( -. ), ( *. ), ( /. ) = F.add, F.sub, F.mul, F.div;;
val ( +. ) : Gmp.F.t -> Gmp.F.t -> Gmp.F.t = <fun>
val ( -. ) : Gmp.F.t -> Gmp.F.t -> Gmp.F.t = <fun>
val ( *. ) : Gmp.F.t -> Gmp.F.t -> Gmp.F.t = <fun>
val ( /. ) : Gmp.F.t -> Gmp.F.t -> Gmp.F.t = <fun>
The default precision (for new numbers) can be set in terms of the number of bits of accuracy,
with 100 digits being $\log_2 10^{100} \approx 332$ bits:
# F.default_prec := 332;;
- : unit = ()
A generic Newton-Raphson method may be implemented as a higher-order function which
accepts the function $f : \mathbb{R} \to \mathbb{R}$, derivative function $f' : \mathbb{R} \to \mathbb{R}$, initial estimate of the root $x$
and number of iterations $n$:
# let rec newton_raphson f f' x n =
    if n = 0 then x else
    newton_raphson f f' (x -. ((f x) /. (f' x))) (n - 1);;
val newton_raphson :
  (Gmp.F.t -> Gmp.F.t) -> (Gmp.F.t -> Gmp.F.t) -> Gmp.F.t -> int -> Gmp.F.t
  = <fun>
In order to find $\sqrt{2}$, we use $f(x) = x^2 - 2$ and, therefore, $f'(x) = 2x$:
# let f x = x *. x -. F.from_float 2.;;
val f : Gmp.F.t -> Gmp.F.t = <fun>
# let f' x = x *. F.from_float 2.;;
val f' : Gmp.F.t -> Gmp.F.t = <fun>
Computing the root of f (x) using only 7 iterations produces a result accurate to 50 digits,
represented by a value of the abstract type F. t:
# let x = newton_raphson f f' (F. from_float 1.) 7;;
val x : Gmp.F.t = <abstr>
This value may be converted into a string in the given base with the given number of digits
using the to_string_base_digits function in the F submodule of Gmp¹:
# F.to_string_base_digits ~base:10 ~digits:50 x;;
- : string = "1.4142135623730950488016887242096980785696718753769E0"
Although such high-precision arithmetic can be used to achieve the necessary precision, robust
algorithms using ordinary floating-point arithmetic, or adaptive-precision are likely to be much
more efficient.
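For comparison, the same higher-order function can be used unchanged with ordinary floats, since it only relies on the operators -. and /. that were rebound above. The following sketch uses the standard float operators (that is, it assumes the Gmp rebinding is not in scope) and shows the iteration reaching the limit of double precision within the same 7 steps:

```ocaml
(* Newton-Raphson iteration over ordinary double-precision floats. *)
let rec newton_raphson f f' x n =
  if n = 0 then x else
  newton_raphson f f' (x -. f x /. f' x) (n - 1)

let f x = x *. x -. 2.
let f' x = 2. *. x

let () =
  let root = newton_raphson f f' 1. 7 in
  (* Convergence is quadratic, so a handful of iterations exhausts the
     roughly 16 significant digits available to the type float. *)
  assert (abs_float (root -. sqrt 2.) < 1e-14);
  Printf.printf "%.17g\n" root
```

Each iteration roughly doubles the number of correct digits, which is why the ordinary-precision and high-precision computations need a similar, small number of steps.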
4.7.3 Adaptive precision
Some problems can use ordinary precision floating-point arithmetic most of the time and
resort to higher-precision arithmetic only when necessary. This leads to adaptive-precision
arithmetic, which uses fast, ordinary arithmetic where possible and resorts to suitable higher-
precision arithmetic only when required.
Geometric algorithms are an important class of such problems and a great deal of interesting
work has been done on this subject [9]. This work is likely to be of relevance to scientists
studying the geometrical properties of natural systems.
¹ This uses named arguments (of the form ~name:value) which are described in section A.2.
Chapter 5
Input and Output
In this chapter, we examine the various ways in which an OCaml program can transfer data,
including printing to the screen and saving and loading information on disc. In particular, we
examine some sophisticated tools for OCaml which greatly simplify the task of designing and
implementing programs to load files in well-defined formats.
5.1 Printing to screen
The ability to print information on the screen is useful as a means of conveying the result of
a program (if the result is simple enough) and providing run-time information on the current
state of the program as well as providing extra information to aid debugging. Naturally, OCaml
provides several functions to print to the screen¹ which are in the Pervasives module.
The print_string function can be considered the most primitive function for printing to the
screen. This function prints the given string with no carriage return. For example, using
the print_string function to print the string "Hello " and then the string "world!\n"² is
equivalent to just printing the string "Hello world!\n":
# print_string "Hello ";
  print_string "world!\n";;
Hello world!
- : unit = ()
Built-in types can be converted to strings using functions such as string_of_int. To save
typing, abbreviated printing functions also exist for built-in types:
# print_float 1.3;;
1.3- : unit = ()
A carriage return can be printed using the print_newline function:
# print_newline ();;
- : unit = ()
¹ In Unix systems, these functions send character data to standard output (stdout) which is displayed on
the console by default but which can be piped into a file instead, if desired.
² The character '\n' represents a new line.
The functionality of printing can only be provided by way of a side-effect and, therefore,
printing must be performed by functions. Accidentally omitting the unit argument to the
print_newline function is a common mistake which results in the function being returned
without printing anything:
# print_newline;;
- : unit -> unit = <fun>
A string can be printed with a terminating new line using the print_endline function:
# print_endline "Hello world!";;
Hello world!
- : unit = ()
In addition to printing, the print_newline and print_endline functions also force any pre-
vious printing to be completed. This is known as flushing the output stream. Failing to
flush the output stream, particularly when printing debugging information, can be a source
of confusion. A stream may be flushed explicitly using the flush function in the Pervasives
module:
# flush stdout;;
- : unit = ()
Data can be read from or saved into a file on disc in much the same way as printing to the
screen.
5.2 Reading from and writing to disc
The act of saving data in a file is performed by opening the file, writing data to it and then
closing the file. We shall now examine the basic syntax for performing these operations.
Files can be opened for reading or writing using the open_in and open_out functions, respec-
tively. For example, the following opens a file (either replacing an existing file or creating a
new file) called "test.txt" for output, referring to the result (known as a file handle) as handle:
# let handle = open_out "test.txt";;
val handle : out_channel = <abstr>
The resulting file handle can then be passed to the output_string function in order to
write into the file. For example, the following outputs the string "Hello world!" to the
file "test.txt":
# output_string handle "Hello world!";;
- : unit = ()
Files will be closed automatically when the handle is garbage collected but can also be closed
explicitly using the close_in and close_out functions. For example, the following closes the
"test.txt" file:
# close_out handle;;
- : unit = ()
A file might be closed explicitly to ensure that the file is closed before being reopened, e.g. when
writing to and subsequently reading from the same file.
We can load the contents of this file using equivalent functions:
# let h = open_in "test.txt" in
  let s = input_line h in
  close_in h;
  s;;
- : string = "Hello world!"
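The write-then-read sequence above can be collected into a single function. This is a minimal sketch: round_trip is an illustrative name, and a temporary file obtained from Filename.temp_file is used in place of "test.txt" so that running the example does not overwrite anything. Both channels are closed explicitly, which guarantees that the data have been written before the file is reopened.

```ocaml
(* Write one line of text to a temporary file, read it back and clean up. *)
let round_trip text =
  let path = Filename.temp_file "ocaml_sci" ".txt" in
  let oc = open_out path in
  output_string oc text;
  output_char oc '\n';
  (* Closing flushes the channel, so the data reach the file before we reopen it. *)
  close_out oc;
  let ic = open_in path in
  let line = input_line ic in
  close_in ic;
  Sys.remove path;
  line

let () = assert (round_trip "Hello world!" = "Hello world!")
```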
The functions for reading from disc may also be used to read from the command-line by
supplying the input channel stdin.
In addition to these functions, and equivalent functions dealing with integers, bytes and
characters, OCaml provides functions for performing generic input and output, a technique known
as marshalling.
5.3 Marshalling
A very powerful pair of functions capable of saving and loading any value of any type are also
provided by OCaml. These functions are input_value and output_value. For example, the
following outputs data representing a 3-tuple of type int * float * string list into a file
"test.dat":
# output_value (open_out_bin "test.dat") (0, 3., ["piece"]);;
- : unit = ()
The input_value and output_value functions offer some sophistication in that they auto-
matically detect and handle cyclic and shared data structures. However, they are not type
safe. Consequently, when values are read back, their type must be specified explicitly and
correctly. For example, the 3-tuple written to the file "test.dat" may be read back in using the
input_value function by explicitly declaring the type of the result:
# let a : int * float * string list = input_value (open_in_bin "test.dat");;
val a : int * float * string list = (0, 3., ["piece"])
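The following sketch wraps the two steps in a pair of functions which also close their channels. The names save_value and load_triple are illustrative; fixing the result type of load_triple in one place is one way to keep the explicit type annotation, on which correctness depends, next to the call to input_value.

```ocaml
(* Marshal any value to a file opened in binary mode. *)
let save_value path v =
  let oc = open_out_bin path in
  output_value oc v;
  close_out oc

(* Read the value back. The annotation is not checked against the file
   contents, so it must match the type of the value that was written. *)
let load_triple path : int * float * string list =
  let ic = open_in_bin path in
  let v : int * float * string list = input_value ic in
  close_in ic;
  v

let () =
  let path = Filename.temp_file "ocaml_sci" ".dat" in
  save_value path (0, 3., ["piece"]);
  assert (load_triple path = (0, 3., ["piece"]));
  Sys.remove path
```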
Note that the open_out_bin and open_in_bin functions were used to ensure that the file was
opened in binary mode³. Although these functions are extremely useful, they count among
the more developmental aspects of the language and, therefore, are likely to change in the
future. In particular, the binary format is likely to change, potentially rendering old data
unreadable. Hence, the input_value and output_value functions are most useful for storing
data temporarily. For example, in order to allow a program to be stopped and started without
losing its intermediate data.
³ This distinction is only important when the operating system makes a distinction, e.g. under Microsoft
Windows.
[Figure 5.1 diagram: Characters -> Lexing -> Tokens -> Parsing -> Abstract Syntax Tree; example tokens: IDENT "x", ASSIGN, INTEGER 1, PLUS, INTEGER 2]
Figure 5.1: Parsing character sequences often entails lexing into a token stream and
then parsing to convert patterns of tokens into grammatical constructs represented
hierarchically by a tree data structure.
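As a minimal sketch of storing data temporarily in this way, the following program saves a 3-tuple and loads it back. The names record, save, load and roundtrip are illustrative only. Both channels are closed explicitly, and the result of input_value is given an explicit type because the marshalling functions cannot check it.

```ocaml
(* The type of the data stored in the file (an assumption of this sketch). *)
type record = int * float * string list

(* Marshal a value to a file opened in binary mode. *)
let save filename (v : record) =
  let h = open_out_bin filename in
  output_value h v;
  close_out h                      (* also flushes the data to the file *)

(* Read the value back. The annotation is not checked against the file:
   reading a file which holds a value of another type is unsafe. *)
let load filename : record =
  let h = open_in_bin filename in
  let v = input_value h in
  close_in h;
  v

(* Save to a temporary file and load again. *)
let roundtrip (v : record) =
  let filename = Filename.temp_file "test" ".dat" in
  save filename v;
  let v' = load filename in
  Sys.remove filename;
  v'

let () =
  let n, x, words = roundtrip (1, 3., ["piece"]) in
  Printf.printf "%d %g %s\n" n x (String.concat "," words)
```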
We shall now examine a very sophisticated, and yet safe and easy to use, method for inputting
data from more exotic formats.
5.4 Lexing and Parsing
In addition to the primitive input and output functions offered by OCaml, the language is
bundled with very powerful tools, called ocamllex and ocamlyacc, for deciphering the content
of files according to a formal grammar. This aspect of the language has been particularly
well honed due to the widespread use of this family of languages for writing compilers and
interpreters, i.e. programs which understand, and operate on, other programs.
In the context of scientific computing, providing data in a human readable format which
has a formally defined grammar is highly desirable. This allows a data file to convey useful
information both to a human and to a computer audience. In the case of a human, the file
can be read using a text editor. In the case of a computer, the file can be parsed to produce a
data structure which reflects the information in the file (illustrated in figure 5.1). In the latter
case, the program could then go on to perform computations on the data and also, possibly,
save the data in the same format.
The ability to use ocamllex and ocamlyacc is, therefore, likely to be of great benefit to
scientists. We shall now examine the use of these tools in more detail.
5.4.1 Lexing
The first step in using these tools to interpret a file is called lexing. This stage involves
reading characters from the input, matching them against patterns and outputting a stream
of tokens. A token is a value which represents some of the input. For example, a sequence of
space-separated digits could be lexed into a stream of integer-valued tokens.
In order to produce tokens, patterns are spotted in the characters by matching them against
regular expressions, also known as regexps. Analogously to pattern matches, regexps given to
ocamllex may contain several kinds of structure. Many of the kinds of constructs available
to a regexp for ocamllex are identical to those used in OCaml pattern matching. Specifically:
'x' match the specified, single character.
_ match any single character.
"string" match the given string of characters.
Several other constructs are specific to regexps:
[ 'a' 'c' 'e' ] match any character in the given set. Within a set, ranges of consecutive
characters may be specified using the shorthand notation 'a'-'c' for 'a', 'b' or 'c'.
[^ 'a' 'c' 'e' ] match any character not in the given set.
regexp * match zero or more repetitions of a string matching regexp.
regexp + match one or more repetitions of a string matching regexp.
regexp ? match regexp or the empty string.
regexp1 # regexp2 match any character which matches regexp1 and does not match regexp2; both
must be character sets.
regexp1 | regexp2 match any string which either matches regexp1 or matches regexp2.
regexp1 regexp2 concatenate a string matching regexp1 with a string matching regexp2.
eof match the end of file.
Before delving into the use of ocamllex in writing lexers, some example regular expressions
used to match practically important character sequences will be of use. String representations
of integers are very simple, consisting only of one or more decimal digits. This is easily
represented by the regular expression:
['0'-'9']+
Note that, for this and most other regular expressions to work, the lexer must be greedy. In
this case, given a sequence of digits, the lexer will match them all to this regexp, rather than
matching only the first digit.
String representations of floating-point numbers are somewhat more adventurous. An initial
attempt at a regular expression might match a sequence of digits followed by a full-stop followed
by another sequence of digits:
['0'-'9']+ '.' ['0'-'9']+
This will match "12.3" and "1.23" correctly but will fail to match "123." and ".123". These
can be matched by splitting the regular expression into two variants, one which allows zero or
more digits before the full-stop and one which allows zero or more digits after the full-stop:
['0'-'9']+ '.' ['0'-'9']* | ['0'-'9']* '.' ['0'-'9']+
Before continuing, we can usefully factor this regexp using a let construct provided by
ocamllex for dealing with regexps:
let digit = ['0'-'9']
digit+ '.' digit* | digit* '.' digit+
As we have already seen, a conventional notation (e.g. 1,000,000,000,000=1e12) exists for
decimal exponents. The exponent portion of the string ("e12") may be represented by the
regexp:
let exponent = ['e' 'E'] ['+' '-']? digit+
Thus, a regular expression matching positive floating-point numbers represented as strings
may be written:
let digit = ['0'-'9']
let exponent = ['e' 'E'] ['+' '-']? digit+
(digit+ '.' digit* | digit* '.' digit+) exponent?
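To make the meaning of this regular expression concrete, the following hand-written recogniser tests whether a whole string matches it. This is only an illustrative sketch in plain OCaml, assuming an optional sign in the exponent. The names is_digit, skip_digits and is_floating are hypothetical, and ocamllex itself works quite differently (it compiles regexps into a table-driven automaton).

```ocaml
(* Hand-written recogniser for the regexp
     (digit+ '.' digit* | digit* '.' digit+) exponent?
   where exponent = ['e' 'E'] ['+' '-']? digit+. *)
let is_digit c = '0' <= c && c <= '9'

(* Index of the first non-digit character at or after position i. *)
let rec skip_digits s i =
  if i < String.length s && is_digit s.[i] then skip_digits s (i + 1) else i

let is_floating s =
  let n = String.length s in
  let i = skip_digits s 0 in                  (* digits before the full-stop *)
  if i >= n || s.[i] <> '.' then false        (* the full-stop is mandatory *)
  else
    let j = skip_digits s (i + 1) in          (* digits after the full-stop *)
    if i = 0 && j = i + 1 then false          (* a lone "." has no digits *)
    else if j = n then true                   (* no exponent present *)
    else if s.[j] <> 'e' && s.[j] <> 'E' then false
    else
      let k =                                 (* skip an optional sign *)
        if j + 1 < n && (s.[j + 1] = '+' || s.[j + 1] = '-') then j + 2
        else j + 1 in
      let l = skip_digits s k in
      l > k && l = n                          (* one or more exponent digits *)

let () =
  List.iter
    (fun s -> Printf.printf "%S: %b\n" s (is_floating s))
    ["12.3"; "123."; ".123"; "1.5e-3"; "123"; "."]
```

Of these strings, the first four are accepted and the last two are rejected: "123" contains no full-stop and "." contains no digits.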
On the basis of these example regular expressions for integer and floating-point number rep-
resentations, we shall now develop a lexer. A file giving the description of a lexer for ocamllex
has the suffix ".mll". Although lexer definitions depend upon external declarations, we shall
examine the description of the lexer first. Specifically, we shall consider the file "myLexer.mll":
{
  open MyParser
  let line = ref 1
}
let digit = ['0'-'9']
let exponent = ['e' 'E'] ['+' '-']? digit+
let floating = (digit+ '.' digit* | digit* '.' digit+) exponent?
rule token = parse
    [' ' '\t'] { token lexbuf }
  | '\n'       { incr line; CR }
  | floating   { REAL (float_of_string (Lexing.lexeme lexbuf)) }
  | digit+     { INTEGER (int_of_string (Lexing.lexeme lexbuf)) }
  | eof        { EOF }
  | _          { failwith ("Mistake at line " ^ string_of_int !line) }
A lexer description for ocamllex begins with a header of ordinary OCaml code enclosed in
curly braces. In this case, the header opens the namespace of the MyParser module (which
we have yet to define) and creates a mutable variable line to keep track of the line number.
The header is followed by let constructs which build regular expressions. Finally, the guts
of the lexer appear as a rule called token which parses a sequence of characters, matching
them against regular expressions and executing corresponding actions, most of which produce
tokens.
The first rule matches spaces and tabs, simply absorbing this "whitespace" by recursively
calling the token rule. The second rule matches the new-line character, incrementing the
line count and generating the CR token. The third rule matches the string representation
of a floating-point number. The lexbuf variable contains the current state of the lexer and the
Lexing.lexeme function extracts the matched string from lexbuf. This string is used to generate
a REAL token containing the corresponding value of type float using the float_of_string
function. The fourth rule matches the string representation of an integer, generating an
INTEGER token containing the corresponding value of type int. The fifth rule matches the
end-of-file marker, generating the EOF token. Finally, the sixth rule is a catch-all which matches
any other character and raises an exception containing the line number of the invalid
input.
This lexer can be compiled into an OCaml program using the ocamllex compiler. For example,
from the Unix shell:
$ ocamllex myLexer.mll
12 states, 322 transitions, table size 1360 bytes
The resulting OCaml source code which implements a lexer of this description is placed in the
"myLexer.ml" file.
In many cases, a parser using a lexer would itself be generated from a parser description, using
the ocamlyacc compiler. We shall describe this approach in the next section but, before this,
we shall demonstrate how the functionality of a generated lexer may be exploited without
using ocamlyacc.
Before compiling the OCaml program "myLexer.ml", which implements our lexer, we must
create the MyParser module which it depends upon:
# module MyParser =
    struct
      type token = CR | REAL of float | INTEGER of int | EOF
      let rec main lexer lexbuf = match lexer lexbuf with
          CR ->
            print_endline "CR";
            main lexer lexbuf
        | INTEGER n ->
            print_endline ("INTEGER " ^ string_of_int n);
            main lexer lexbuf
        | REAL x ->
            print_endline ("REAL " ^ string_of_float x);
            main lexer lexbuf
        | EOF -> ()
        | _ -> failwith "Not EOF"
    end;;
module MyParser :
  sig
    type token = CR | REAL of float | INTEGER of int | EOF
    val main : ('a -> token) -> 'a -> unit
  end
Note that the CR, REAL, INTEGER and EOF tokens used by the lexer are actually nothing more
than constructors of a variant type, in this case the MyParser.token type. Having defined the
MyParser module and, in particular, the MyParser.token variant type, we can include the
functionality of the lexer into the top-level using the #use directive⁴:
# #use "myLexer.ml";;
Finally, we can try our lexer by passing an entry point into the lexer and a buffer for the
lexer to the MyParser.main function. The entry point for our lexer is the only rule, token,
represented by the token function (which would be MyLexer.token had the lexer been compiled
as a separate module). A buffer for the lexer can be created using the
Lexing.from_channel function from the OCaml standard library. Using standard input, we have:
# MyParser.main token (Lexing.from_channel stdin);;
The lexer is now being used to interpret standard input. Typing:
4 5 6 7.9
Results in MyParser.main printing:
INTEGER 4
INTEGER 5
INTEGER 6
REAL 7.9
CR
⁴ The top-level spits out a great deal of superfluous output, which we have not included here.
Typing:
Rubbish!
Results in:
Exception: Failure "Mistake at line 2".
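The behaviour of this lexer can be imitated by a hand-written function in plain OCaml, which may help to clarify what the generated code does. The following is only a sketch under simplifying assumptions: the whole input is held in a string, tokens are collected into a list rather than produced on demand, and the names match_floating and tokens are hypothetical. The lexer generated by ocamllex is table-driven and does not look like this.

```ocaml
type token = CR | REAL of float | INTEGER of int | EOF

let is_digit c = '0' <= c && c <= '9'

(* Index of the first non-digit character at or after position i. *)
let rec skip_digits s i =
  if i < String.length s && is_digit s.[i] then skip_digits s (i + 1) else i

(* End of the longest match of the floating regexp starting at i, if any. *)
let match_floating s i =
  let n = String.length s in
  let j = skip_digits s i in
  if j >= n || s.[j] <> '.' then None
  else
    let k = skip_digits s (j + 1) in
    if j = i && k = j + 1 then None           (* a lone "." has no digits *)
    else if k < n && (s.[k] = 'e' || s.[k] = 'E') then
      (let d =
         if k + 1 < n && (s.[k + 1] = '+' || s.[k + 1] = '-') then k + 2
         else k + 1 in
       let e = skip_digits s d in
       (* The exponent is only consumed when it has at least one digit. *)
       Some (if e > d then e else k))
    else Some k

(* Convert a string into a list of tokens, counting lines for error messages. *)
let tokens s =
  let n = String.length s in
  let rec go line i acc =
    if i >= n then List.rev (EOF :: acc)
    else match s.[i] with
      | ' ' | '\t' -> go line (i + 1) acc
      | '\n' -> go (line + 1) (i + 1) (CR :: acc)
      | _ ->
        (match match_floating s i with
         | Some j ->
           go line j (REAL (float_of_string (String.sub s i (j - i))) :: acc)
         | None ->
           let j = skip_digits s i in
           if j > i then
             go line j (INTEGER (int_of_string (String.sub s i (j - i))) :: acc)
           else failwith ("Mistake at line " ^ string_of_int line))
  in
  go 1 0 []
```

Trying the floating-point pattern before the integer pattern mirrors the longest-match behaviour of the real lexer: a run of digits followed by a full-stop is always consumed as a single REAL token.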
The capabilities of a lexer can clearly be useful in a stand-alone configuration. In particular,
programs using lexers, such as that we have just described, will validate their input to some
degree. In contrast, many current scientific applications silently produce erroneous results.
However, the capabilities of a lexer can be greatly supplemented by an associated parser, as
we shall now demonstrate.
5.4.2 Parsing
The parsing stage of interpreting input converts the sequence of tokens from a lexer into a
hierarchical representation (illustrated in figure 5.1) - the abstract syntax tree (AST). This
is performed by accumulating tokens from the lexer either until a valid piece of grammar is
recognised and can be acted upon, or until the sequence of tokens is clearly invalid, in which
case a Parsing.Parse_error exception is raised.
The ocamlyacc compiler can be used to create OCaml programs which implement a specified
parser. The specification for a parser is given by a description in a file with the suffix ".mly".
Formally, these parsers implement LALR(1) grammars described by rules provided in
Backus-Naur form (BNF).
For example, consider a parser, based upon the lexer described in the previous section, which
interprets textual data files. These files are expected to contain an integer followed by three
floating-point numbers on each line, for an unknown number of lines. This format might be
used to represent the chemical element and three dimensional coordinate of each atom in a
molecule.
A program to parse these files can be generated by ocamlyacc from a grammar which we shall
now describe. Given a file "name.mly" describing a grammar, by default ocamlyacc produces
a file "name.ml" containing an OCaml program implementing that grammar. Therefore, in
order to generate a MyParser module for the lexer, we shall place our grammar description in
a "myParser.mly" file.
Grammar description files begin by listing the definitions of the tokens which the lexer may
generate. In this case, the possible tokens are CR, EOF, INTEGER and REAL.
Tokens called NAME1, NAME2 and so on, which carry no associated values, may be defined
by:
%token NAME1 NAME2 ...
A token called NAME which carries an associated value of type type is defined by:
%token <type> NAME
Thus, the tokens for our lexer can be defined as:
%token CR EOF
%token <int> INTEGER
%token <float> REAL
The token definitions are followed by a declaration of the entry point into the parser (i.e. the
rule of the parser used to start the parsing process). For reasons that will become clear, we
shall use:
%start main
This is followed by a declaration of the type returned by the action corresponding to the entry
point. In this case:
%type <(int * float * float * float) list> main
Before the rules and corresponding actions describing the grammar and parser are given, the
token and entry-point definitions are followed by a separator:
%%
The guts of the parser are represented by a sequence of groupings of rules and their corre-
sponding actions:
group:
| rule1
{ action1 }
...
| rulen
{ actionn };
A grouping represents several possible grammatical constructs, all of which are used to produce
OCaml values of the same type, i.e. the types of the expressions action1 to actionn must be
the same. Rules are simply a list of expected tokens and groupings. In particular, rules may
be recursive, i.e. they may refer to themselves, which is useful when building up arbitrarily
long lists.
Our parser will begin with a description of the expected contents of a line of input. We shall
call this single-rule group atom:
atom:
This group contains a single rule - the expected contents of a line of input:
| INTEGER REAL REAL REAL
The action corresponding to this rule will convert the matched tokens into an OCaml data
structure (which will become a branch of the resulting AST). In this case, the four matched
tokens will be converted into a 4-tuple int * float * float * float containing the contents
of each token. In general, the data associated with the i-th token of a rule's pattern may be
referred to using the notation $i in the corresponding action. Thus, our action is simply:
{ ($1, $2, $3, $4) };
Our parser will only contain one other group, main, which we defined as the entry-point into
the parser when we specified %start main. This group contains two rules, the actions of which
return the type (int * float * float * float) list. The first rule handles a line of input
containing an atom followed by the remainder of the file, prepending the 4-tuple generated
for the atom onto the list generated by parsing the remainder of the input. The second rule
matches the end of the input, producing the empty list:
main:
| atom CR main
{ $1 :: $3 }
| EOF
{ [] };
Note that a semicolon terminator is placed at the end of each group.
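The meaning of the atom and main groups can be illustrated by a hand-written function which consumes a list of tokens. This is only a sketch of what the grammar accepts: the token type is redefined here so that the example is self-contained, and the parser generated by ocamlyacc is table-driven and reads tokens on demand from the lexer rather than from a list.

```ocaml
type token = CR | REAL of float | INTEGER of int | EOF

(* atom: INTEGER REAL REAL REAL   { ($1, $2, $3, $4) }
   main: atom CR main             { $1 :: $3 }
       | EOF                      { [] }               *)
let rec main = function
  | INTEGER n :: REAL x :: REAL y :: REAL z :: CR :: rest ->
      (n, x, y, z) :: main rest           (* atom CR main *)
  | [EOF] -> []                           (* end of input *)
  | _ -> raise Parsing.Parse_error        (* any other token sequence *)

let () =
  let atoms =
    main [INTEGER 1; REAL 5.4; REAL 3.9; REAL 3.7; CR;
          INTEGER 12; REAL 5.9; REAL 4.2; REAL 3.1; CR; EOF] in
  List.iter
    (fun (n, x, y, z) -> Printf.printf "%d %g %g %g\n" n x y z)
    atoms
```

The recursion in the first case corresponds directly to the recursive reference to main in the first rule of the grammar.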
An OCaml program "myParser.ml" implementing this parser, described in this "myParser.mly"
file, may be compiled using the ocamlyacc program:
$ ocamlyacc myParser.mly
The MyLexer and MyParser modules can then be compiled using the usual commands:
$ ocamlc -c myParser.mli
$ ocamlc -c myParser.ml
$ ocamlc -c myLexer.ml
Having compiled the lexer and parser into byte-code, we can create a custom top-level which
includes their functionality:
$ ocamlmktop myParser.cmo myLexer.cmo -o myparser.top
We can now demonstrate this lexer and parser working on an example file called "test.dat"
which contains⁵:
1 5.4 3.9 3.7
12 5.9 4.2 3.1
1 5.4 3.9 2.5
Running the custom top-level allows us to play with the lexer and parser. The following parses
the "test.dat" file:
$ ./myparser.top
# let lexbuf = Lexing.from_channel (open_in "test.dat") in
  MyParser.main MyLexer.token lexbuf;;
- : (int * float * float * float) list =
[(1, 5.4, 3.9, 3.7); (12, 5.9, 4.2, 3.1); (1, 5.4, 3.9, 2.5)]
Thus our lexer and parser have worked together to interpret the integer and floating-point
numbers contained in this file as well as the structure of the file in order to convert the
information contained in the file into a data structure which can then be manipulated by
further computation.
Moreover, the lexer and parser help to validate input. For example, a file which erroneously
contains letters is caught by the lexer. In this case, the lexer will raise an exception which
contains a reference to the line number at which the error was noticed, as demonstrated in the
preceding section. A file which erroneously contains a floating-point value where an integer
was expected will be lexed into tokens without fault but the parser will spot the grammatical
error and raise the Parsing.Parse_error exception.
In an application, the call to the main function in the MyParser module can be wrapped in a
try ... with ... to catch the Parsing.Parse_error exception and handle it, most likely
making use of the line variable in the MyLexer module to obtain the line number at which
the grammatical error was noticed.
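Such a wrapper might be sketched as follows. To keep the example self-contained, the line counter and the parsing function are hypothetical stand-ins: in a real program, line would be MyLexer.line and the function passed to parse_or_report would call MyParser.main MyLexer.token on a lexer buffer.

```ocaml
(* Stand-in for the line counter kept by the lexer (MyLexer.line). *)
let line = ref 1

(* Run a parsing function, converting a grammatical error into a message
   which mentions the line that the lexer had reached. *)
let parse_or_report parse =
  try Ok (parse ())
  with Parsing.Parse_error ->
    Error ("Grammatical error at line " ^ string_of_int !line)

let () =
  line := 3;
  (* A stand-in parser which always fails, to exercise the handler. *)
  match parse_or_report (fun () -> raise Parsing.Parse_error) with
  | Ok () -> print_endline "parsed"
  | Error msg -> print_endline msg
```

Returning a value of the result type rather than printing directly leaves the caller free to decide how the error should be presented.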
Currently, the input is expected to end with a new line. In practice, it may be useful to relax
the parser to also allow a file ending directly with an EOF, without the CR. This may be done
by supplementing the main group with an extra rule:
main:
| atom CR main
{ $1 :: $3 }
| atom EOF
{ [ $1 ] }
| EOF
{ [] };
As we have seen, the ocamllex and ocamlyacc compilers can be indispensable in both design-
ing and implementing programs which use a non-trivial file format. These tools are likely to
be of great benefit to scientists wishing to create unambiguous, human-readable formats.
⁵ Note that the last line ends with a new line before the EOF.
Chapter 6
Visualization
The ability to visualise problems and data can be of great use when trying to understand
difficult concepts and, hence, can be of great use to scientists. Perhaps the most obvious
use of computer graphics is the visualisation of atomic and molecular systems. However, a
great many other problems can also be elucidated through the use of real-time graphics. In
particular, the ability to render animated 2D and 3D graphics can be exploited to improve
upon current, made-for-print graph drawing applications.
In this chapter, we shall introduce a powerful library which can be used to render high-fidelity,
real-time 2D and 3D graphics even on modest consumer hardware. We shall then introduce
OCaml bindings to this library which provide access to its functionality from programs written
in the OCaml language. Finally, we shall develop a graphical application written entirely in
OCaml.
6.1 Overview of OpenGL
Over the years, several libraries have been developed with the intent of providing access to
powerful computer graphics hardware whilst presenting a simple, easy-to-use interface to the
functionality. However, only one library has become the de-facto standard for this task - the
Open Graphics Library (OpenGL) by Silicon Graphics Incorporated (SGI). In recent years,
Microsoft have developed a competitor known as DirectX. However, although the capabilities
of OpenGL and DirectX are similar, DirectX only works on Microsoft operating systems
(i.e. Windows) whereas OpenGL is freely available for a wide variety of architectures and
operating systems, most notably Linux, Mac OS X and Windows.
In order to be able to render to the screen using OpenGL, a program must first acquire a
resource known as a rendering context. Obtaining a rendering context directly is often tricky
and the code required is specific to an OS. Consequently, the process of acquiring a rendering
context is typically performed using a cross-platform library. The OpenGL Utility Toolkit
(GLUT) is one such freely available library, which also provides access to input data from
the mouse and keyboard.
Fortunately, Jacques Garrigue, Isaac Trotts and other authors have written OCaml bindings
to OpenGL (called lablGL), GLUT (called lablglut) and several other OpenGL-related libraries.
We shall use these libraries in order to write OCaml programs which use OpenGL.
We shall now introduce a basic template program, based upon glut, which will be used as the
foundation for several visualisation programs written using OpenGL.
6.1.1 GLUT
A template program called "render.ml", which uses lablGL and lablglut, may be written¹:
let () =
  let width = ref 640 and height = ref 480 in
  let argv' = Glut.init Sys.argv in
  Glut.initDisplayMode ();
  Glut.initWindowSize ~w:!width ~h:!height;
  ignore (Glut.createWindow ~title:"The window name");
  let set_projection w h = () in
  let render () = () in
  let reshape ~w ~h =
    GlDraw.viewport 0 0 w h;
    set_projection w h;
    width := w; height := h;
    render () in
  Glut.reshapeFunc ~cb:(reshape);
  Glut.displayFunc ~cb:(render);
  Glut.mainLoop ()
Two functions are currently missing, set_projection and render.
The set_projection function defines the two- or three-dimensional space visualised by the
rendering context. The render function is responsible for making the appropriate calls to
OpenGL to render whatever is required.
This program may be compiled by supplying the lablGL and lablglut archives which it depends
upon, using the syntax described in chapter 2:
$ ocamlopt -I +lablGL lablgl.cmxa lablglut.cmxa render.ml -o render
We shall derive working programs from this template program in the remainder of this chapter.
In the meantime, let us dissect this template program.
The mutable width and height variables are used to store the current width and height of the
OpenGL rendering context, measured in screen pixels. The init function in the Glut module
is used to parse any command-line arguments pertaining to glut. The initDisplayMode
function can be used to request various properties for the rendering context:
an alpha channel using the alpha optional argument - used for transparency.
double buffering using the double_buffer optional argument - used to eliminate flickering during animations.
a depth buffer using the depth optional argument - used to make nearer objects obscure farther objects regardless of the order in which they are drawn.
a stencil buffer using the stencil optional argument - used to restrict rendering to certain pixels.
In this case, none of the properties are requested.
¹ This uses named arguments (of the form ~name:value) which are described in the appendices, in
section A.2, and the ignore function which simply ignores its argument and returns the value of type unit.
The Glut.initWindowSize function is used to request the initial size of the rendering context.
The Glut.createWindow function creates a window with the given title. The resulting window
contains the OpenGL rendering context, which can then be used for visualisation.
Once this code has been executed, call-back functions can be given to glut in order to perform
arbitrary computations on demand, including rendering for visualisation. These call-back
functions are executed by glut under appropriate circumstances, such as a key-press, mouse-
movement or required screen update. Several call-backs may be specified, the most important
of which are:
Render the contents of the window: displayFunc
Resize the window: reshapeFunc
Handle a key-press: keyboardFunc
Handle a mouse button press: mouseFunc
Handle mouse movement while a mouse button is pressed: motionFunc
Handle mouse movement when no mouse buttons are pressed: passiveMotionFunc
Each of these call-backs requires specific arguments. For more details, consult the lablGL
documentation.
Before continuing, let us fill in the set_projection and render functions in order to create
a simple, working program.
The following implementation of the set_projection function sets up an orthogonal projection
covering the two-dimensional space (0 ... w, 0 ... h) where w and h are the width and
height of the window in pixels, respectively:
let set_projection w h =
  GlMat.mode `projection;
  GlMat.load_identity ();
  let w = float_of_int w and h = float_of_int h in
  GlMat.ortho ~x:(0., w) ~y:(0., h) ~z:(0., 1.);
  GlMat.mode `modelview in
The following implementation of the render function asks OpenGL to draw a red triangle on
a light-green background:
let render () =
  GlClear.color (0.8, 1., 0.8);
  GlClear.clear [`color];
  GlDraw.color (1., 0., 0.);
  GlDraw.begins `triangles;
  GlDraw.vertex ~x:0. ~y:0. ();
  GlDraw.vertex ~x:100. ~y:200. ();
  GlDraw.vertex ~x:200. ~y:0. ();
  GlDraw.ends ();
  Gl.flush () in
Figure 6.1: Simple demonstration, rendering a triangle using OpenGL.
The result of this program is shown in figure 6.1. In the next section, we shall alter this
program to demonstrate ways that more complicated objects can be rendered in terms of the
primitives provided by OpenGL.
6.2 Basic rendering
Several decades ago, the raster display emerged as the dominant method for displaying com-
puter graphics. These displays scan a two-dimensional surface in a characteristic pattern to
produce a graphical display. Ultimately, this is manifested by today's computers representing
their display as a matrix of coloured dots called pixels. A wide variety of algorithms have been
invented which determine pixel colours in order to produce meaningful or aesthetic results.
The details of these algorithms can be phenomenally complicated, driven by the desire for more
capable graphics.
In the quest for ever more sophisticated graphics, dedicated hardware was produced in order
to render graphics as quickly as possible. Thanks to the economies of mass production and the
ubiquity of computer games, highly sophisticated rendering hardware is now commonplace.
However, such hardware does not support all of the methods of graphics rendering which have
been developed over the past few decades. In this section, we shall examine the rendering
primitives which are supported by modern graphics hardware and OpenGL.
The information required to render an object can be split into geometry and fill. The geometry
of the object defines the shape of the object in terms of OpenGL primitives. The fill determines
the way in which pixels are coloured in. In the simplest case, the geometry of an object can
be described by a set of triangles and the fill can be described by a single colour. Functions
pertaining to simple geometry and fill specification are encapsulated in the GlDraw module by
lablGL.
Figure 6.2: OpenGL primitives: a) lines (lines, line strip, line loop), b) triangles (triangles,
triangle strip, triangle fan), c) quadrilaterals (quads, quad strip), and d) convex
polygons. Each panel shows the order in which the numbered vertices are specified.
6.2.1 Geometric primitives
Several geometric primitives are provided by OpenGL. These primitives are points, lines,
triangles, quadrilaterals (quads) and convex polygons. Of these primitives, lines, triangles
and quads may be rendered individually or in groups (illustrated in figure 6.2). Lines may
be rendered individually or as a "strip", a set of abutting lines, or as a "loop", a strip with a
closing line. Triangles may be rendered individually or as a "strip", where adjacent triangles
share two vertices, or as a "fan", where the triangles all share a single, common vertex. Quads
may be rendered individually or as a strip.
The lablGL names given to these different primitives are self-explanatory²:
Points: `points
Lines: `lines, `line_strip, `line_loop
Triangles: `triangles, `triangle_strip, `triangle_fan
Quads: `quads, `quad_strip
Polygon: `polygon
² The ` preceding these names denotes a form of variant type called a polymorphic variant. These types have
special semantics which we have not yet described and which are not important in this context. Polymorphic
variants are discussed in more detail in section A.8.
Figure 6.3: A circular annulus drawn as a single triangle strip: a) rendered result, and
b) the underlying geometry (a single triangle strip).
The example implementation of the render function drew a triangle by calling the begins
function in the GlDraw module, with the argument `triangles, and then supplied three
vertices as two-dimensional coordinates using the vertex function in the same module, with
the optional arguments x and y.
The other primitives may be drawn by making similar calls to functions in the GlDraw module,
beginning with an appropriate argument to begins, followed by vertex definitions specified by
calls to the vertex, vertex2, vertex3 or vertex4 functions and delimited by a call to the
ends function. For example, the following implementation of the render function draws a
circular annulus as a single triangle strip:
let strips = 72 in
let render () =
  GlClear.color (0.6, 1., 0.8);
  GlClear.clear [`color];
  GlDraw.color (0.5, 0., 1.);
  GlDraw.begins `triangle_strip;
  for i = 0 to strips do
    let pi = 4. *. atan 1. and i = float_of_int i in
    let theta = 2. *. pi *. i /. float_of_int strips in
    let x, y = sin theta, cos theta in
    let x1, y1 = 150. *. x +. 200., 150. *. y +. 200. in
    let x2, y2 = 200. *. x +. 200., 200. *. y +. 200. in
    GlDraw.vertex ~x:x1 ~y:y1 ();
    GlDraw.vertex ~x:x2 ~y:y2 ();
  done;
  GlDraw.ends ();
  Gl.flush () in
The result is shown in figure 6.3a along with the skeleton of the geometry in figure 6.3b,
showing the triangle strip. The underlying geometry is most easily visualised by specify-
ing that solid primitives are rendered as outlines rather than being filled, using the call
GlDraw.polygon_mode `both `line.
Clearly, relatively simple programs can be used to visualise complicated geometries. Useful
information can also be conveyed using additional information, such as pixel colour.
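The geometry of the annulus can be examined without a rendering context by computing the vertices as an ordinary list. The following is a sketch only: the function annulus_vertices and its labelled parameters are hypothetical and are not part of lablGL.

```ocaml
(* Vertices of a circular annulus in triangle-strip order:
   inner, outer, inner, outer, and so on around the circle. *)
let annulus_vertices ~strips ~inner ~outer ~centre:(cx, cy) =
  let pi = 4. *. atan 1. in
  List.concat
    (List.init (strips + 1) (fun i ->
         let theta = 2. *. pi *. float_of_int i /. float_of_int strips in
         let x, y = sin theta, cos theta in
         [ (inner *. x +. cx, inner *. y +. cy);
           (outer *. x +. cx, outer *. y +. cy) ]))

let () =
  let vs =
    annulus_vertices ~strips:72 ~inner:150. ~outer:200. ~centre:(200., 200.) in
  Printf.printf "%d vertices\n" (List.length vs)
```

With 72 strips the loop runs 73 times, so that the final pair of vertices coincides with the first pair and closes the annulus, giving 146 vertices in total. Each pair in the list could then be passed to the vertex2 function between calls to begins and ends.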
6.2.2 Filling
The simplest form of filling simply fills any covered pixels with a constant colour, specified in
terms of red, green and blue components. The original example implementation of the render
function drew a red triangle by first specifying the colour using a call to the GlDraw. color
function with a 3-tuple specifying the red, green and blue components, respectively, as values
of the type float in the range 0 ... 1, i.e. the colour red was obtained by specifying full red
and no green or blue, giving the 3-tuple (1., 0., 0.). In the previous example, a mauve
annulus was rendered by specifying the 3-tuple (0.5, 0., 1.), i.e. full blue, half red and no
green.
OpenGL supports many more sophisticated forms of filling including smooth shading
(interpolating between different colours at different vertices), alpha blending (for transparency) and
texture mapping. For more information on these additional forms of filling see some of the
many books on OpenGL [10].
6.2.3 Projection
The set_projection function shared by both of the previous examples used a call to the
GlMat.ortho function to specify that the renderable area represented a two-dimensional space
(0 ... w, 0 ... h), i.e. the units of the space are pixels. This is easily altered to represent different
regions of 2D space but OpenGL can also be used to visualise 3D spaces by projecting them
onto the 2D space of the screen.
In order to use 3D rendering, the rendering context should possess a depth buffer. This can
be requested in the call to initDisplayMode:
Glut.initDisplayMode ~depth:true ();
The set_projection function can then use 3D projection by:
setting the properties of the perspective transform (the field-of-vision, the aspect-ratio
and the depth of field) using the perspective function in the GluMat module.
specifying the location, target and up-direction of the camera using the look_at function
also in the GluMat module.
For example, the set_projection function could be altered to:
let set_projection w h =
  GlMat.mode `projection;
  GlMat.load_identity ();
  let w = float_of_int w and h = float_of_int h in
  GluMat.perspective ~fovy:45.0 ~aspect:(w /. h) ~z:(0.1, 1000.);
  GluMat.look_at ~eye:(3., 3., -5.) ~center:(0., 0., 0.) ~up:(0., 1., 0.);
  GlMat.mode `modelview;
  GlMat.load_identity () in
Figure 6.4: 3D perspective projection of the circular annulus.
The render function should also be altered to clear the depth buffer and enable depth testing.
We shall first factor out a function to render the annulus:
let strips = 72 in
let render_annulus () =
  GlDraw.begins `triangle_strip;
  for i = 0 to strips do
    let pi = 4. *. atan 1. and i = float_of_int i in
    let theta = 2. *. pi *. i /. float_of_int strips in
    (* x, y: the point on the unit circle at angle theta *)
    let x = cos theta and y = sin theta in
    List.iter GlDraw.vertex2 [1.5 *. x, 1.5 *. y; 2. *. x, 2. *. y]
  done;
  GlDraw.ends () in
The render function may then be redefined as:
let render () =
  GlClear.color (0.6, 1., 0.8);
  GlClear.clear [`color; `depth];
  Gl.enable `depth_test;
  GlDraw.color (0.5, 0., 1.);
  render_annulus ();
  Gl.flush () in
The result of this 3D perspective view of the circular annulus is shown in figure 6.4.
6.2.4 Animation
In the current version of our OpenGL program, the render function is only called when
necessary, i.e. when all or part of the window containing the rendering context becomes visible.
Two fundamental alterations are required to produce animated graphics. Firstly, the animation
needs to be redrawn constantly. This is easily achieved by requesting that the window be
redrawn when nothing else is happening, by specifying the idle call-back:
Glut.idleFunc ~cb:(Some Glut.postRedisplay);
Secondly, flicker-free animation is typically achieved by using two buffers, only one of which is
displayed at any one time. Implementing this requires asking for a double-buffered rendering
context:
Glut.initDisplayMode ~depth:true ~double_buffer:true ();
The render function must then be altered to swap buffers once rendering is complete. This
can be achieved by calling Glut.swapBuffers after the call to Gl.flush.
We shall begin by defining a time function before the definition of the new render function:
let time =
  let start = Unix.gettimeofday () in
  fun () -> Unix.gettimeofday () -. start in
This function returns the time in seconds since the program started. The render function
may then be defined:
let render () =
  GlClear.color (0.6, 1., 0.8);
  GlClear.clear [`color; `depth];
  Gl.enable `depth_test;
  GlDraw.color (0.5 *. (1. +. sin (time ())), 0., 1.);
  render_annulus ();
  Gl.flush ();
  Glut.swapBuffers () in
The time function requires the gettimeofday function in the Unix module which,
consequently, must now be specified when compiling:
$ ocamlopt -I +lablGL lablgl.cmxa lablglut.cmxa unix.cmxa render.ml -o render
The resulting program produces a result similar to that shown in figure 6.4 but cycling through
the colours blue and purple. Considerably more interesting animations can be obtained by
transforming objects.
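As a minimal sketch of the two ideas used in this section, the following stand-alone fragment separates the clock from the timer so that it can be run without OpenGL or the Unix library. The names make_timer, fake_clock and red_component are hypothetical, introduced only for illustration; in the real program the clock would be Unix.gettimeofday.

```ocaml
(* [make_timer clock] reads [clock] once and returns a function giving the
   time elapsed since that first reading (hypothetical helper). *)
let make_timer (clock : unit -> float) : unit -> float =
  let start = clock () in
  fun () -> clock () -. start

(* A fake clock, assumed here only for testing: each reading is 0.5 later. *)
let fake_clock =
  let now = ref 0. in
  fun () ->
    let t = !now in
    now := t +. 0.5;
    t

(* The red component passed to GlDraw.color by the animated render function. *)
let red_component t = 0.5 *. (1. +. sin t)

let elapsed = make_timer fake_clock
let first = elapsed ()
let second = elapsed ()
let reds = List.map red_component [0.; 1.; 2.; 3.; 4.; 5.; 6.]
```

Because sin t lies in the range -1...1, the red component always lies in the range 0...1 required by GlDraw.color, which is why the colour cycles smoothly between blue and purple.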
6.3 Transformations
The task of transforming the vertices of an object, to shift or stretch the object, can be rather
tedious. More importantly, this task can also be very computationally expensive if millions
of vertices are in play. Fortunately, OpenGL provides a simple and efficient way to transform
vertex coordinates. In particular, computations are automatically off-loaded onto dedicated
hardware when possible, greatly improving performance.
In OpenGL, transformation matrices are held on a stack. By adding a new transformation
onto the stack, or removing the last transformation from the stack, transformations can be
applied hierarchically.
The pedagogical example of a hierarchy of transformations is the rendering of a robot arm.
The arm might contain three joints, a shoulder joint at the base, an elbow and a wrist. The
base of the arm is static and the upper arm, lower arm and hand are each affected by all joint
rotations between them and the base. The upper arm is transformed only by the rotation at
the shoulder joint. The lower arm is transformed by the rotation at the elbow joint as well as
the shoulder joint. Finally, the hand is transformed by the rotations of the shoulder, elbow
and wrist joints. This is a hierarchy of three rotations, one for each of the three joints, with
each rotation affecting the remainder of the arm.
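The hierarchy described above can be sketched without OpenGL for a planar arm. In this hypothetical example, each segment is given as a length and a joint angle relative to its parent, and a fold accumulates the rotations so that each joint affects everything further from the base. The function joint_positions and the particular arm are assumptions made for illustration only.

```ocaml
let pi = 4. *. atan 1.

(* Each segment is (length, joint angle in radians relative to its parent).
   Returns the position of the far end of every segment, base first. *)
let joint_positions (segments : (float * float) list) : (float * float) list =
  let step (x, y, theta, acc) (len, angle) =
    (* The accumulated angle carries all rotations between here and the base. *)
    let theta = theta +. angle in
    let x = x +. len *. cos theta and y = y +. len *. sin theta in
    (x, y, theta, (x, y) :: acc) in
  let _, _, _, acc = List.fold_left step (0., 0., 0., []) segments in
  List.rev acc

(* Upper arm raised by 90 degrees, lower arm bent back by 90 degrees,
   hand in line with the lower arm. *)
let arm = [ (2., pi /. 2.); (1., -. pi /. 2.); (0.5, 0.) ]
let positions = joint_positions arm
```

Changing only the shoulder angle moves all three end points, whereas changing only the wrist angle moves the last one alone, mirroring the behaviour of nested transformations on the matrix stack.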
The current object transformation matrix (known as the "model view" matrix in OpenGL)
can be altered in the render function using the rotate, scale and translate functions in the
GlMat module. The current matrix can also be copied onto the stack using the push function
and the last matrix moved back off the stack into the current matrix using the pop function.
For example, the animated annulus in the previous example may be made to spin by simply
altering the current model view matrix using the rotate function. The resulting render
function must initialise the model view matrix to the identity matrix using the load_identity
function, to remove the effects of any transformations left over from the previous frame of
animation:
let render () =
  GlMat.load_identity ();
  GlMat.rotate ~angle:(100. *. time ()) ~x:0. ~y:1. ~z:0. ();
The result is a spinning, coloured, circular annulus.
A more interesting result can be obtained by replacing the call to render_annulus with a
sequence of transformations and calls:
GlMat.scale ~x:0.1 ~y:0.1 ~z:0.1 ();
for i = 1 to 20 do
  GlMat.rotate ~angle:(-10. *. time ()) ~x:0.25 ~y:1. ~z:0. ();
  GlMat.translate ~x:(-2.5) ~y:0. ~z:0. ()
done;
for i = 1 to 40 do
  GlMat.translate ~x:2.5 ~y:0. ~z:0. ();
  GlMat.rotate ~angle:(10. *. time ()) ~x:0.25 ~y:1. ~z:0. ();
  render_annulus ()
done;
Figure 6.5: Two frames from an animated sequence of transformed circular annuli: a)
0.8, and b) 1.7 seconds into the animation.
The second loop repeatedly translates and rotates the next object relative to the previous
object, accumulating a spiral of 40 objects. The first loop performs the reverse operation
(effectively transforming by the inverse) 20 times, such that the animation is centred on the
20th object. The result is the animated spiral of coloured, circular annuli shown in figure 6.5.
Note that each annulus is still flat: the apparent curvature of the spiral is an illusion caused by
the presence of a sufficient number of small segments (the same illusion which makes the annuli
appear circular rather than polygonal). Also, note that reversing the transformation requires
the individual matrix transformations to be specified in reverse order: $A^{-1} B^{-1} B A = I$.
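The need to reverse the order can be checked with a small sketch in which transformations are modelled as ordinary functions on integer points rather than as OpenGL matrices. All of the names below are hypothetical; a translation and a quarter-turn rotation are used because their inverses are exact.

```ocaml
(* Transformations of 2D integer points, modelled as plain functions. *)
let translate dx (x, y) = (x + dx, y)
let rotate90 (x, y) = (- y, x)       (* anticlockwise quarter turn *)
let unrotate90 (x, y) = (y, - x)     (* its inverse *)

(* Forward transformation: translate first, then rotate. *)
let forward p = rotate90 (translate 2 p)

(* Undoing it: the inverses are applied in the reverse order. *)
let undo_reversed p = translate (-2) (unrotate90 p)

(* Applying the inverses in the original order does not undo it. *)
let undo_same_order p = unrotate90 (translate (-2) p)

let p = (1, 0)
let recovered = undo_reversed (forward p)
let wrong = undo_same_order (forward p)
```

Here recovered equals the starting point whereas wrong does not, which is the function-level analogue of the matrix identity above.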
As we have seen, even the simplest rendering techniques allow quite complicated objects to
be rendered (the last example contains (73 - 2) x 40 = 2,840 triangles). However, in order
to render more complicated objects in real time, more efficient approaches to rendering are
required.
6.4 Efficient rendering
In the interests of correctness, the first version of any visualisation program should be written
using immediate-mode rendering (i.e. calls to the begins, vertex and ends functions in the
GlDraw module). Once a working first version of the required program has been written, the
program may be optimised by using more sophisticated approaches to rendering.
The main overhead of immediate-mode rendering is the number of function calls required to
perform any given rendering task. Thus, optimisations typically allow data to be conveyed to
OpenGL more efficiently in order to reduce the number of function calls required.
Sequences of OpenGL calls, such as those used to define the vertices of polygons, may be
cached in display lists. The contents of display lists cannot be altered without rebuilding
the display list and, thus, display lists are ideal for static geometry. As a rule of thumb,
performance is likely to increase when $10^2$ vertices are stored in a display list. Functions
pertaining to display lists are encapsulated in the GlList module by lablGL.
For example, the circular annulus in the previous example is a static geometry requiring
72 x 2 = 144 unique vertices. Thus, the rendering of an annulus, as performed by the
render_annulus function, may be productively replaced with a display list. This can be
achieved by altering the render_annulus function to compile the calls required to define the
geometry in a display list, reusing the display list in future calls:
let annulus_list = ref None and strips = 72 in
let render_annulus () = match !annulus_list with
  | None ->
      annulus_list := Some (GlList.create `compile_and_execute);
      GlDraw.begins `triangle_strip;
      for i = 0 to strips do
        let pi = 4. *. atan 1. and i = float_of_int i in
        let theta = 2. *. pi *. i /. float_of_int strips in
        (* x, y: the point on the unit circle at angle theta *)
        let x = cos theta and y = sin theta in
        List.iter GlDraw.vertex2 [1.5 *. x, 1.5 *. y; 2. *. x, 2. *. y]
      done;
      GlDraw.ends ();
      GlList.ends ()
  | Some l -> GlList.call l in
The annulus_list variable contains an optional reference to the display list holding the
definition of the annulus. When first called, the render_annulus function creates a new
display list using the create function in the GlList module, storing the resulting list in the
mutable annulus_list variable. A sequence of OpenGL commands is then both compiled
into the display list and simultaneously executed, until the GlList.ends function is called.
Future calls to the render_annulus function find a display list in the annulus_list variable
and execute this display list using the call function in the GlList module.
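The caching pattern itself, an optional value held in a reference and filled on first use, does not depend on OpenGL. The following sketch assumes a stand-in "geometry" (a list of integers) and a hypothetical counter, builds, so that the number of times the expensive construction runs can be observed.

```ocaml
(* Counts how many times the geometry is actually constructed. *)
let builds = ref 0

(* The cache: None until the first call, Some geometry afterwards. *)
let cache : int list option ref = ref None

let get_geometry () =
  match !cache with
  | Some g -> g                        (* reuse the stored result *)
  | None ->
      incr builds;
      let g = List.init 5 (fun i -> i * i) in   (* stand-in for the real work *)
      cache := Some g;
      g

let first = get_geometry ()
let second = get_geometry ()
```

After both calls, builds holds 1: the second call returns the value stored by the first, just as later calls to render_annulus replay the stored display list.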
Alternatively, vertex data may be stored in vertex arrays. These may include vertex positions,
normal-vectors, colours and texture map coordinates. Vertices are then referred to by their
index in the vertex array. Rendering then requires a sequence of such indices to be supplied.
In order to further reduce the number of function calls made, sequences of vertex indices may
be stored in index arrays. Functions pertaining to vertex and index arrays are encapsulated
in the GlArray module by lablGL.
For example, a vertex array containing the 144 two-coordinate vertices of the circular annulus
may be created using:
let strips = 72 in
let vertex_array =
  let a = Array.make (strips * 4) 0. in
  for i = 0 to strips - 1 do
    let j = i * 4 in
    let pi = 4. *. atan 1. and i = float_of_int i in
    let theta = 2. *. pi *. i /. float_of_int strips in
    (* x, y: the point on the unit circle at angle theta *)
    let x = cos theta and y = sin theta in
    a.(j+0) <- 1.5 *. x; a.(j+1) <- 1.5 *. y;
    a.(j+2) <- 2. *. x; a.(j+3) <- 2. *. y;
  done;
  Raw.of_float_array a ~kind:`double in
The render_annulus function may then be altered to enable vertex arrays and to refer to
vertices in the array, rather than specifying their coordinates explicitly:
let render_annulus () =
  GlArray.enable `vertex;
  GlArray.vertex `two vertex_array;
  GlDraw.begins `triangle_strip;
  (* The first two vertices are repeated at the end to close the strip. *)
  for i = 0 to (strips + 1) * 2 - 1 do
    GlArray.element (i mod (strips * 2))
  done;
  GlDraw.ends ();
  GlArray.disable `vertex in
The call to the enable function in the GlArray module enables vertex arrays. The call to the
vertex function tells OpenGL where the vertex array is. Calls to the element function refer
to a vertex in the array, replacing calls to the vertex or vertex2 functions in the GlDraw
module.
Alternatively, the sequence of vertex indices specified to the GlArray.element function may
be contained in an index array:
let index_array =
  let a = Array.init ((strips + 1) * 2) (fun i -> i mod (strips * 2)) in
  Raw.of_array a ~kind:`uint in
The triangle strip may then be rendered with a single call to the draw_elements function in
the GlArray module:
let render_annulus () =
  GlArray.enable `vertex;
  GlArray.vertex `two vertex_array;
  GlArray.draw_elements `triangle_strip ((strips + 1) * 2) index_array;
  GlArray.disable `vertex in
Unlike display lists, vertex arrays allow vertex data to be altered. Consequently, vertex arrays
are most useful when rendering geometries which are constantly changing shape.
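The layout of the vertex and index data can be examined using ordinary OCaml arrays, before any conversion to the Raw representation. The sketch below is an assumption-laden illustration: the names vertices and indices are hypothetical stand-ins for the contents of vertex_array and index_array, and no OpenGL calls are made.

```ocaml
let strips = 72
let pi = 4. *. atan 1.

(* Four floats per step: an inner (x, y) pair followed by an outer (x, y) pair. *)
let vertices =
  let a = Array.make (strips * 4) 0. in
  for i = 0 to strips - 1 do
    let theta = 2. *. pi *. float_of_int i /. float_of_int strips in
    let x = cos theta and y = sin theta in
    let j = i * 4 in
    a.(j) <- 1.5 *. x; a.(j + 1) <- 1.5 *. y;
    a.(j + 2) <- 2. *. x; a.(j + 3) <- 2. *. y
  done;
  a

(* One index per vertex, plus the first two vertices again to close the strip. *)
let indices = Array.init ((strips + 1) * 2) (fun i -> i mod (strips * 2))
```

The array holds 288 floats describing 144 two-coordinate vertices, and the 146 indices all refer to valid vertices, with the final two wrapping around to vertices 0 and 1.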
Having examined most of the fundamentals of rendering using OpenGL, we shall now develop
a program capable of rendering something useful.
6.5 Rendering scientific data
The ability to render simulated atomic structures can be useful in elucidating complicated
geometric properties but can also be used as a diagnostic tool. In this section, we shall develop
a program capable of loading a one-component atomic structure from file and animating the
structure in real-time using OpenGL.
We shall begin by defining a lexer and parser to load the file format. The file format consists
of three numbers, the vector coordinates of an atom, on each line. Thus the lexer, called
"lexer.mll", generates FLOAT, CR and EOF tokens:
{
  open Parser
  let line = ref 1
}
let digit = ['0'-'9']
let exponent = ['e' 'E'] ['+' '-'] digit+
let floating = '-'? (digit+ '.' digit* | digit* '.' digit+) exponent?
rule token = parse
  | [' ' '\t'] { token lexbuf }
  | '\n' { incr line; CR }
  | floating
  | digit+ { FLOAT (float_of_string (Lexing.lexeme lexbuf)) }
  | eof { EOF }
  | _ { failwith ("Mistake at line " ^ string_of_int !line) }
The parser, called "parser.mly", recognises these lines containing triples of numbers and
produces a list of 3-tuples:
%token CR EOF
%token <float> FLOAT
%start main
%type <(float * float * float) list> main
%%
atom: | FLOAT FLOAT FLOAT { $1, $2, $3 };
main: | atom CR main { $1 :: $3 } | atom EOF { [$1] } | EOF { [] };
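For comparison only, a file in this simple format can also be read without a generated lexer and parser. The following sketch swaps in the standard Scanf module instead of ocamllex and ocamlyacc; the functions parse_atom and parse_atoms are hypothetical, and malformed lines are skipped here rather than reported with a line number.

```ocaml
(* Parse one line holding three floats; None if the line is malformed. *)
let parse_atom (line : string) : (float * float * float) option =
  try Some (Scanf.sscanf line " %f %f %f" (fun x y z -> (x, y, z)))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* Keep only the lines which parsed successfully, preserving their order. *)
let parse_atoms (lines : string list) : (float * float * float) list =
  List.filter_map parse_atom lines

let atoms = parse_atoms ["0.0 0.0 0.0"; "1.5 -2.0 3e2"; "not an atom"]
```

The generated lexer and parser remain preferable when errors must be located precisely or when the file format is likely to grow.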
The main program, called "render.ml", begins with the usual preamble:
let () =
  let width = ref 640 and height = ref 480 in
  let argv' = Glut.init Sys.argv in
  Glut.initDisplayMode ~depth:true ~double_buffer:true ();
  Glut.initWindowSize ~w:!width ~h:!height;
  ignore (Glut.createWindow ~title:"Atomic visualisation");
This is followed by a definition which loads the atomic coordinates as a list of 3-tuples, using
the parser:
  let atoms =
    Parser.main Lexer.token (Lexing.from_channel stdin) in
In order to centre the rendered object, the mean coordinate is computed using a fold:
  let offset =
    let add (a, b, c) (d, e, f) = (a +. d, b +. e, c +. f) in
    let n, (x, y, z) =
      let aux (n, t) r = (n+1, add t r) in
      List.fold_left aux (0, (0., 0., 0.)) atoms in
    let n = float_of_int n in
    (-. x /. n, -. y /. n, -. z /. n) in
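The fold can be tried in isolation on a few hand-written coordinates. In this sketch the function name centring_offset and the sample atoms are hypothetical; the computation is the same count-and-sum fold, followed by negating the mean.

```ocaml
(* Returns minus the mean coordinate, i.e. the translation which centres
   the atoms on the origin.  The result is nan for an empty list. *)
let centring_offset (atoms : (float * float * float) list) =
  let add (a, b, c) (d, e, f) = (a +. d, b +. e, c +. f) in
  let n, (x, y, z) =
    List.fold_left (fun (n, t) r -> (n + 1, add t r)) (0, (0., 0., 0.)) atoms in
  let n = float_of_int n in
  (-. x /. n, -. y /. n, -. z /. n)

let atoms = [ (0., 0., 0.); (2., 4., 6.); (4., 8., 12.) ]
let offset = centring_offset atoms
```

The mean of these three atoms is (2, 4, 6), so the offset is (-2, -4, -6). A single fold computes both the count and the sum, so the list is traversed only once.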
The set_projection function initialises the perspective transform to use a 45° field-of-vision
looking down the positive z-axis from the camera position (0, 0, -100):
  let set_projection w h =
    GlMat.mode `projection;
    GlMat.load_identity ();
    let w = float_of_int w and h = float_of_int h in
    GluMat.perspective ~fovy:45.0 ~aspect:(w /. h) ~z:(0.1, 1000.);
    GluMat.look_at
      ~eye:(0., 0., -100.) ~center:(0., 0., 0.) ~up:(0., 1., 0.);
    GlMat.mode `modelview;
    GlMat.load_identity () in
In order to time the animation, we reuse the previously defined time function:
  let time =
    let start = Unix.gettimeofday () in
    fun () -> Unix.gettimeofday () -. start in
The render function begins by rotating the object about the y-axis once the object has been
centred by translating by the mean coordinate:
  let render () =
    GlMat.load_identity ();
    GlMat.rotate ~angle:(30. *. time ()) ~x:0. ~y:1. ~z:0. ();
    GlMat.translate3 offset;
The colour and depth buffers are then cleared, giving a white background:
    GlClear.color (1., 1., 1.);
    GlClear.clear [`color; `depth];
The atoms are drawn as black points, five pixels in diameter, by iterating the vertex3 function
over the list atoms of atomic coordinates:
    GlDraw.color (0., 0., 0.);
    GlDraw.point_size 5.;
    GlDraw.begins `points;
    List.iter GlDraw.vertex3 atoms;
    GlDraw.ends ();
Finally, the render function flushes the stream of OpenGL commands, causing the scene to be
rendered, and swaps buffers to display the result:
    Gl.flush ();
    Glut.swapBuffers () in
The program ends with the previous definition of the reshape function, registering reshape,
display and idle call-backs and entering the glut "main loop" to begin the animation:
  let reshape ~w ~h =
    GlDraw.viewport 0 0 w h;
    set_projection w h;
    width := w; height := h;
    render () in
  Glut.reshapeFunc ~cb:reshape;
  Glut.displayFunc ~cb:render;
  Glut.idleFunc ~cb:(Some render);
  Glut.mainLoop ()
Figure 6.6: A frame from a smoothly animated $10^4$-atom model of amorphous silicon.
This program may be compiled using:
$ ocamllex lexer.mll
$ ocamlyacc parser.mly
$ ocamlopt -c parser.mli
$ ocamlopt -c parser.ml
$ ocamlopt -c lexer.ml
$ ocamlopt -I +lablGL lablgl.cmxa lablglut.cmxa unix.cmxa parser.cmx lexer.cmx \
    render.ml -o render
A snapshot of the animation resulting from the visualisation of a $10^4$-atom model of amorphous
silicon is shown in figure 6.6.
Before concluding this chapter we should reiterate that the lablGL bindings to OpenGL for
OCaml do not yet provide the safe execution environment offered by the core OCaml distribu-
tion. Indeed, the task of writing safe bindings to such an unsafe interface is quite formidable.
Next, we shall examine program transformations which can improve performance.
Chapter 7
Optimization
Despite advances in computer technology, improving efficiency can still be desirable. This task
is greatly simplified by starting with a correct but inefficient version of a program. Compared
to other languages, the features of OCaml make it ideally suited for the rapid creation of
reliable programs. Once a program has been shown to function correctly, it can
be targeted for optimization.
This chapter examines some techniques which can be used to optimize OCaml code. The
overall approach to whole program optimization is to perform each of the following steps in
order:
1. Profile the program compiled with automated optimizations and running on representa-
tive input.
2. Of the sequential computations performed by the program, identify the most time-
consuming one from the profile.
3. Calculate the (possibly asymptotic) algorithmic complexity of this bottleneck in terms
of suitable primitive operations.
4. If possible, manually alter the program such that the algorithm used by the bottleneck
has a lower asymptotic complexity and repeat from step 1.
5. If possible, modify the bottleneck algorithm such that it accesses its data structures less
randomly to increase cache coherence.
6. Perform low-level optimizations on the expressions in the bottleneck.
We shall now consider each of these processes in more detail.
7.1 Profiling
Before beginning to optimize a program, it is vitally important to profile the program running
on representative inputs in order to ascertain quantitative information on any bottlenecks
in the flow of the program. For example, many scientific programs load data, perform a
computation and save the result. If the computation is trivial then most of the time will be
spent loading and saving data. In this case, the I/O routines would be the best targets for
optimization (in particular, the file formats themselves), and not the computational routine.
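Before reaching for a profiler, a coarse breakdown of this kind can be obtained by timing each phase by hand. The sketch below assumes a hypothetical helper called timed and uses in-memory stand-ins for the load, compute and save phases; note that Sys.time reports processor time, so time spent waiting on real I/O would not be counted.

```ocaml
(* Run [f], print the processor time it took under [label], return its result. *)
let timed (label : string) (f : unit -> 'a) : 'a =
  let t0 = Sys.time () in
  let result = f () in
  Printf.printf "%-8s %.3f s\n" label (Sys.time () -. t0);
  result

(* Stand-ins for the three phases of a typical scientific program. *)
let data =
  timed "load" (fun () ->
    List.init 100_000 (fun i -> float_of_int ((i * 7919) mod 100_003)))
let sorted = timed "compute" (fun () -> List.sort compare data)
let total = timed "save" (fun () -> List.fold_left ( +. ) 0. sorted)
```

Such measurements only indicate which phase deserves attention; the function-level detail needed to optimise within a phase comes from the profiler.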
The most useful form of profiling offered by OCaml is on native code compiled with ocamlopt.
In this case, specifying the -p flag (at compilation and at linking) results in the generation
of a "gmon.out" file after the resulting executable is run. This file can be interpreted by the
GNU profiler gprof using the syntax:
gprof name >profile.txt
where name is the name of the executable. The resulting file "profile.txt" is split into three
sections:
1. List of functions in the program in descending order of the time which was spent within
the body of the function, i.e. for a function f, this time excludes the time spent in the
bodies of any other functions which were called by f.
2. A hierarchical representation of the time taken by each function call made in the pro-
gram.
3. A bibliography of function references.
For example, the following test program "sort.ml" loads file "input.dat" of numbers, sorts the
numbers and saves the result as a file "output.dat":
let () =
  let infile = "input.dat" and outfile = "output.dat" in
  let data =
    let ch = open_in infile in
    let rec load l =
      try load (float_of_string (input_line ch) :: l)
      with End_of_file -> l in
    load [] in
  let data = List.sort compare data in
  let ch = open_out outfile in
  List.iter (fun x -> output_string ch (string_of_float x ^ "\n")) data
This program can be compiled into a native-code executable called "sort", with code to perform
profiling measurements inserted by the compiler, using:
$ ocamlopt -p sort.ml -o sort
When executed with a file "input.dat" containing $10^6$ random numbers, this program runs 50%
slower than without profiling, i.e. than if the -p flag had not been specified:
$ ./sort
The gprof program uses the "gmon.out" and "sort" files to create a textual representation of
the profiling information:
$ gprof sort >profile.txt
The same "sort" executable should be used to generate profiling information as was executed
to create the "gmon.out" file. Failing to do so will result in misleadingly erroneous output.
This information can be quite lengthy, hence we have chosen to pipe the output into a file
"profile.txt". In this case, we find the first section of the "profile.txt" to contain the following
profile information:
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative     self                self    total
 time   seconds   seconds      calls   s/call   s/call  name
35.43      5.13     5.13       1837     0.00     0.00  mark_slice
18.54      7.82     2.69   18673252     0.00     0.00  compare_val
11.40      9.47     1.65       1022     0.00     0.00  sweep_slice
 3.32      9.95     0.48   12209450     0.00     0.00  caml_oldify_one
 2.76     10.35     0.40       2859     0.00     0.00  caml_oldify_mopup
 2.69     10.74     0.39    1000001     0.00     0.00  caml_format_float
 2.62     11.12     0.38     174762     0.00     0.00  camlList__rev_merge_275
 2.24     11.44     0.33   12209439     0.00     0.00  caml_alloc_shr
 1.93     11.72     0.28     300950     0.00     0.00  camlList__rev_merge_rev_285
 1.93     12.00     0.28     475712     0.00     0.00  camlList__chop_267
 1.76     12.26     0.26    8870322     0.00     0.00  caml_fl_merge_block
 1.73     12.51     0.25   12209626     0.00     0.00  caml_fl_allocate
 1.38     12.71     0.20   12209439     0.00     0.00  allocate_block
This section indicates the time spent within the body of each profiled function, i.e. excluding
the time spent in child functions. In this case, the mark_slice function, part of the OCaml
garbage collector, is seen to have accounted for approximately 35% of the entire running time
of the program, taking 5.13 seconds in total during 1837 calls to this function. Although this
is interesting information, the second section typically provides more useful details.
The second section of the profiling information decomposes the time taken to execute the
program in terms of the hierarchy of function calls made by the program. Each subsection
concentrates on a different function, showing the functions which called it (including how
much time was spent in them and how many times they called) above the function itself and
the functions called below. In this case, the first subsection concerns the caml_main function:
granularity: each sample hit covers 2 byte(s) for 0.07% of 14.48 seconds
index %time self children called name
0.00 14.38 1/1 main [2]
[1] 99.3 0.00 14.38 1 caml_main [1]
0.00 14.37 1/1 caml_start_program [3]
0.00 0.01 1/1 caml_init_gc [60]
0.00 0.00 1/1 caml_init_custom_operations [133]
0.00 0.00 1/1 caml_init_ieee_floats [134]
0.00 0.00 1/1 parse_camlrunparam [147]
0.00 0.00 1/1 caml_init_signals [135]
0.00 0.00 1/1 init_atoms [142]
0.00 0.00 1/1 caml_executable_name [131]
0.00 0.00 1/1 caml_sys_init [139]
-----------------------------------------------
In this case, the caml_main function is seen to have been called once, by the main
function, and to have called several other functions. Of the functions called by caml_main, the
caml_start_program function accounted for virtually all of the time.
Later subsections, concerning functions which the programmer has control over, provide more
useful information. The subsection concerning the camlList__sort_295 function is of
particular interest:
[6]    67.4    0.03    9.73    1+951424  <cycle 1 as a whole> [6]
               0.00    4.76      349525      camlList__sort_295 <cycle 1> [13]
Firstly, note that the names of the OCaml functions have been mangled. In the name
camlList__sort_295, the prefix caml simply denotes a function generated from OCaml code,
the List__ denotes a function from the List module, the sort is the name of the function
and the _295 is an internal index which allows OCaml to identify the precise function within
the program (which may contain numerous functions called sort).
The function referred to as camlList__sort_295 is, in fact, not the List.sort function called
by our program but another function named sort which is nested within the List.stable_sort
function. This information can, of course, be discovered by examining the second section of
the profile, tracing the function calls made by the program.
In this case, we see that a function in the List module called sort accounted for 67.4% of the
total running time of the program, having been called 349,525 times. The <cycle 1> refers
to the fact that this function is one in a chain of functions calling each other. Examining
the source code shows that the nested sort function is mutually recursive with a rev_sort
function. Indeed, other sections of the profile provide detailed information on the breakdown
of the time spent in these functions. Given that this program spends most of its time sorting,
and not loading or saving data, the sort function is the most suitable target for optimisation.
In this case, perhaps an array-based sort would be faster.
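One way in which such an array-based sort might be tried is sketched below. The function sort_via_array is hypothetical, and whether it is actually faster than List.sort for a given input must be established by profiling again. The type annotation on compare is an additional assumption: it gives the compiler the opportunity to use a float-specific comparison in place of the polymorphic compare_val seen in the flat profile.

```ocaml
(* Sort a list of floats by copying it into an array, sorting in place
   and converting back. *)
let sort_via_array (l : float list) : float list =
  let a = Array.of_list l in
  Array.sort (compare : float -> float -> int) a;
  Array.to_list a

let data = [3.5; -1.; 2.25; 0.; 10.; 2.25]
let sorted = sort_via_array data
```

The result is identical to that of List.sort compare, so the replacement can be made without altering the rest of the program.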
Profiling can be used to identify the performance critical portions of whole programs. These
portions of the program can then be targeted for optimisation. Algorithmic optimisations are
the most important set of optimisations which can be applied.
7.2 Algorithmic optimization
As we saw in chapter 3, the choice of data structure and of algorithm can have a huge impact
on the performance of a program.
In the context of program optimisation, intuition is often terribly misleading. Specifically,
given the profile of a program, intuition often tempts us to perform low-level optimisations on
the function or functions which account for the largest proportion of the running time of the
program. Counterintuitively, the most productive optimisations often stem from attempts to
reduce the number of calls made to the performance-critical functions, rather than trying to
optimise the functions themselves. Thus, a programmer must always strive to "see the forest
for the trees".
If profiling shows that most of the time is spent performing many calls to a single function
then, before trying to optimize this function (which can only improve performance by a con-
stant factor), consider alternative algorithms and data structures which can perform the same
computation whilst executing this primitive operation less often. This can reduce the asymp-
totic complexity of the most time-consuming portion of the program and is likely to provide
the most significant increases in performance.
For example, if profiling a program indicates that 80% of the running time is spent searching
an array for a given element, altering the program to use a set data structure, instead of an
array, is likely to be a productive optimisation, as searching is O(n) for arrays and O(ln n) for
sets (see table 3.3).
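The following sketch illustrates the change of data structure on integers. The names IntSet, array_mem, data and set are hypothetical; the linear search examines up to n elements whereas membership in the balanced tree underlying the set examines O(ln n) elements.

```ocaml
module IntSet = Set.Make (struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end)

(* Linear search: O(n) comparisons in the worst case. *)
let array_mem (x : int) (a : int array) : bool =
  let rec go i = i < Array.length a && (a.(i) = x || go (i + 1)) in
  go 0

let data = Array.init 1000 (fun i -> 3 * i)

(* Built once; each subsequent membership test is O(ln n). *)
let set = Array.fold_left (fun s x -> IntSet.add x s) IntSet.empty data

let found_in_array = array_mem 2997 data
let found_in_set = IntSet.mem 2997 set
```

Both searches give the same answers, so only the cost changes. The set must be built first, which is itself O(n ln n), so the optimisation pays off when many searches are made over the same data.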
An extensive review of the performances of algorithms used in scientific computing is beyond
the scope of this book. The current favourite computationally-intensive algorithm used to
attack scientific problems in any particular subject area is often a rapidly moving target.
Thus, in order to obtain information on the state-of-the-art choice of algorithm it is necessary
to refer to published research in the specific area, or to web sites. When discussing particular
topics, we shall endeavour to reference recent research.
Only once all attempts to reduce the asymptotic complexity have been exhausted should other
forms of optimization be considered. We shall consider such optimizations in the next section.
7.3 Lower-level optimizations
Cache coherency should be considered carefully as the cost of cache-misses can be over an
order of magnitude in performance. There are two principal optimizations to improve cache
coherency. The first is to alter the algorithm such that the asymptotic complexity is preserved
whilst reducing the randomness in accesses to data structures. The second is to use data
structures which consume less memory and, therefore, are more likely to fit into caches. Both
of these optimizations require careful selection of the data structure. We shall now quantify
some of the differences in performance between the built-in data structures.
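As a sketch of the first of these optimizations, consider summing a matrix stored row by row in a single flat array. Both hypothetical functions below perform exactly the same additions, and so have the same asymptotic complexity, but the first visits memory sequentially whereas the second jumps cols elements between consecutive accesses. Any difference in speed is an assumption to be checked by measurement on the machine in question.

```ocaml
let rows = 256
let cols = 256

(* A rows x cols matrix stored row by row in one flat array. *)
let m = Array.init (rows * cols) (fun k -> float_of_int (k mod 7))

(* Visits elements in the order in which they are laid out in memory. *)
let sum_sequential () =
  let s = ref 0. in
  for i = 0 to rows - 1 do
    for j = 0 to cols - 1 do
      s := !s +. m.(i * cols + j)
    done
  done;
  !s

(* Same additions, but consecutive accesses are [cols] elements apart. *)
let sum_strided () =
  let s = ref 0. in
  for j = 0 to cols - 1 do
    for i = 0 to rows - 1 do
      s := !s +. m.(i * cols + j)
    done
  done;
  !s
```

Only the order of the two loops differs, which makes this kind of alteration cheap to try once profiling has identified the bottleneck.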
7.3.1 Benchmarking data structures
Measuring the performance of operations over the most common data structures can be a
productive way to obtain quantitative information which can then be used to justify design
decisions objectively when programming. However, measuring the performance of functions
in any setting, other than those in which the functions are to be used in practice, can easily
produce misleading results. Although we have made every attempt to provide independent
performance measurements, effects such as the requirements put upon the garbage collector
by the different algorithms are always likely to introduce systematic errors. Consequently,
the performance measurements which we now present must be regarded only as indicative
measurements.
Common operations over data structures include:
• the construction of a data structure containing a given number of elements n (e.g. using
  the Array.init function),
• mapping onto a new data structure (e.g. using the List.map and Array.map functions),
  and
• folding a function over the elements in a data structure (e.g. using the higher-order
  List.fold_left function).
Figure 7.1: Measured performance (time t in seconds per element) of the creation of list,
array and set data structures as a function of the number of elements n.
Figure 7.2: Measured performance (time t in seconds per element) of map functions over
list, array and set data structures containing n elements.
Figure 7.3: Measured performance (time t in seconds per element) of the left-fold
functions over list and array data structures containing n elements.
Figure 7.4: Measured performance (time t in seconds per element) of the right-fold
functions over list, array and set data structures containing n elements.
(Each figure plots log2 t against log2 n, with one curve per data structure.)
In order to give some idea of the relative performance of these common operations when
dealing with the most common data structures - lists, arrays and sets - we have measured
the performance of these operations in artificial settings. In all cases, the data structures
contain elements of type float. The set data structure was created as:
module Key = struct
  type t = float
  let compare i j = if i -. j < 0. then -1 else if i = j then 0 else 1
end
module Set = Set.Make (Key)
This data structure may then be compared against lists and arrays.
The Array.init function, used to create arrays, was matched by an equivalent, tail-recursive¹
list_init function to create lists:
# let list_init n f =
    let rec aux n l = if n < 0 then l else aux (n - 1) (f n :: l) in
    aux (n - 1) [];;
val list_init : int -> (int -> 'a) -> 'a list = <fun>
and a tail-recursive set_init function to create sets:
# let set_init n f =
    let rec aux n s = if n < 0 then s else aux (n - 1) (Set.add (f n) s) in
    aux (n - 1) Set.empty;;
val set_init : int -> (int -> Set.elt) -> Set.t = <fun>
¹ Tail recursion is the most important low-level optimisation and is discussed in section 7.3.3.1.
The measured performance of these functions when used to create data structures of different
sizes is shown in figure 7.1. Array initialisation is fastest, followed by list initialisation and,
finally, set initialisation.
Array initialisation consists of allocating the whole array followed by filling in the elements
sequentially. Consequently, array initialisation becomes increasingly efficient for larger n, due
to cache coherency and the lessening significance of the initial allocation.
In contrast, list initialisation requires each element to be allocated individually, to create a
2-tuple of the head of the list and a reference to the tail. As a result, list initialisation is
typically 4- to 6-times slower than array initialisation, dominated by the cost of allocation,
and the time taken per element is roughly constant (independent of the size of the final list).
The efficiency of set initialisation worsens for larger n, as expected from the O(ln n) asymptotic
complexity of element insertion. However, considering the substantially more complicated data
structure (a balanced binary tree) which underpins sets, the performance of set creation is
surprisingly efficient compared to list creation.
The Array.map and non-tail-recursive List.map functions were used with an equivalent,
tail-recursive set_map function for sets:
# let set_map f s = Set.fold (fun e s -> Set.add (f e) s) s Set.empty;;
val set_map : (Set.elt -> Set.elt) -> Set.t -> Set.t = <fun>
to measure performance of map operations (illustrated in figure 7.2). Overall, the
measurements are similar to those of creation as the performance is dominated by the cost of
allocating a new data structure. However, two important features are clearly visible. Firstly,
the performance of the List.map function worsens considerably for n > 2^16. This is due to the
non-tail-recursive nature of this function incurring a significant performance cost for deep
recursion. In contrast, the tail-recursive set_map function does not incur this cost and,
consequently, actually outperforms the List.map function for n > 2^19 despite the additional
sophistication of the data structure. Secondly, the performance of the Array.map function
exhibits transients at n ≈ 2^13 and n ≈ 2^20. These sudden drops in performance can be attributed
to machine-dependent cache effects.
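As a minimal sketch of the set mapping used in these measurements, the following defines set_map over a set of floats. The name FloatSet and the use of the standard Float module as the ordered key are assumptions made for this illustration. Note that mapping a set can reduce its size when the function sends two elements to the same value:

```ocaml
(* FloatSet is a hypothetical name; Float supplies the type t and compare. *)
module FloatSet = Set.Make (Float)

(* Fold over the input set, adding each mapped element to a new set. *)
let set_map f s =
  FloatSet.fold (fun e acc -> FloatSet.add (f e) acc) s FloatSet.empty

let () =
  let s = FloatSet.of_list [-1.; 1.; 2.] in
  (* -1. and 1. both map to 1., so the result has only two elements. *)
  let t = set_map abs_float s in
  List.iter (Printf.printf "%g ") (FloatSet.elements t);
  print_newline ()
```

Recent versions of the standard library also provide a map function in the signature produced by Set.Make, which may be preferable to a hand-written fold.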
The Array.fold_left and tail-recursive List.fold_left functions were used to measure the
performance of left-folds over array and list data structures (illustrated in figure 7.3),
respectively, accumulating the sum of the elements in the data structures. As these operations do
not require the creation of new data structures and, therefore, do not require much allocation,
the performance of the operation for lists is much closer to that for arrays compared to the
two previous operations. Specifically, the performance is insignificantly different in the range
50 < n < 1000. However, the additional memory requirements of lists result in the performance
loss due to cache effects appearing at slightly smaller n, at n ≈ 2^13 compared to n ≈ 2^15
for arrays.
The Array.fold_right, the non-tail-recursive List.fold_right and the tail-recursive Set.fold
functions were used to measure the performance of right-folds over array, list and set data
structures (illustrated in figure 7.4), respectively, again accumulating the sum of the elements
in the data structures. As for the left-fold measurements, these right-fold results did not
require the allocation of new data structures. However, the non-tail-recursive nature of the
List.fold_right function incurs an increasing performance cost for n > 2^11, resulting in
List.fold_right being 35 times slower than Array.fold_right for n = 2^20.
Note that, considering the sophistication of the underlying balanced binary tree data structure,
folds over sets are remarkably fast.
These benchmark results may be used as reasonably objective, quantitative evidence to justify
a choice of data structure. We shall now examine other forms of optimisation, in approximately
decreasing order of productivity.
7.3.2 Automated transformations
The first optimizations to try are automated optimizations because these require little work.
We shall now discuss options given to the compiler which affect performance and a whole
program transformation known as defunctorizing.
7.3.2.1 Compiler optimizations
The most obvious compiler optimization is to use the native-code compiler rather than the
byte-code compiler. This typically results in code executing three times as quickly (often more
in the case of numerically intensive programs).
The native-code compiler also understands three flags (presented to it on the command line)
which affect performance:
The -unsafe flag removes bounds checking when accesses are made to array elements.
The -inline n flag controls the aggressiveness of the compiler's inlining of non-recursive
functions. Specifying a larger integer n causes larger functions to be inlined. The default
is n = 1.
The -noassert flag causes assertions to be skipped when compiling.
Removing bounds checking can increase performance by up to 15% but at the severe cost of
rendering an OCaml program unsafe to run. Consequently, turning bounds checking off is not
recommended.
Inlining a function explicitly substitutes a called function into the body of the caller. This
removes the overhead of making the function call and can facilitate better optimization of the
resulting machine code but often at the cost of increasing the amount of code and, therefore,
reducing program-cache coherence. Consequently, more aggressive inlining can increase or
decrease performance.
Assertions, of the form assert (pred), verify that the given predicate pred evaluates to true,
raising the Assert_failure exception if the predicate is false. This is a useful way to perform
run-time sanity checks which can all be removed to improve performance by compiling with
the -noassert flag.
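As a small illustration of such a sanity check, the following sketch guards a function with an assertion. The function mean is a hypothetical example, not taken from the text:

```ocaml
(* Mean of a non-empty float array. The assertion documents the
   precondition and checks it at run time unless compiled with -noassert. *)
let mean a =
  assert (Array.length a > 0);
  Array.fold_left ( +. ) 0. a /. float_of_int (Array.length a)

let () =
  Printf.printf "%g\n" (mean [| 1.; 2.; 3. |]);
  (* With assertions enabled, the empty array is rejected. *)
  match mean [||] with
  | _ -> print_endline "assertion was compiled out"
  | exception Assert_failure _ -> print_endline "assertion failed as expected"
```

When the same file is compiled with the -noassert flag the check disappears, so the second call is expected to take the first branch of the match instead.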
7.3.2.2 Defunctorizing
The current OCaml native-code compiler does not inline functions from functors. Performing
this inlining can provide a significant increase in performance (typically by less than a factor
of two but, in a contrived example, by a factor of over ten times). This optimization can
be performed by feeding the source code through a defunctorizer, such as the freely available
ocamldefun program before compiling.
7.3.3 Manual transformations
As a last resort, program transformations performed manually should be considered as a
means of optimization. We shall now examine several different approaches. Although we try
to associate quantitative performance benefits with the various approaches, these are only
indicative and are often chosen to represent the best-case.
7.3.3.1 Tail-recursion
Straightforward recursion is very efficient when used in moderation. However, the performance
of deeply recursive functions can suffer. This can be seen in the performance measurements
for the non-tail-recursive List.map and List.fold_right functions (shown in figures 7.2 and
7.4, respectively) at large n. Indeed, for n > 2^18 the performance is dominated by the cost
of recursion in both cases. Performance degradation due to deep recursion can be avoided by
performing tail recursion.
If a recursive function call is not tail recursive, state will be stored such that it may be restored
after the recursive call has completed. This storing, and the subsequent retrieving, of state is
responsible for the performance degradation.
Tail recursion involves writing recursive calls in a form which does not need this state. Most
simply, a tail recursive call returns the result of the recursive call directly, i.e. without
performing any computation on the result.
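For example, the fold_range function from page 36 could have been defined as:
# let rec fold_range1 f accu l u =
    if l < u then f (fold_range1 f accu (l + 1) u) l else accu;;
val fold_range1 : ('a -> int -> 'a) -> 'a -> int -> int -> 'a = <fun>
but was actually defined as:
# let rec fold_range2 f accu l u =
    if l < u then fold_range2 f (f accu (u - 1)) l (u - 1) else accu;;
val fold_range2 : ('a -> int -> 'a) -> 'a -> int -> int -> 'a = <fun>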
These fold_range1 and fold_range2 functions produce the same results:
# fold_range1 (fun t h -> h :: t) [] 0 10;;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]
# fold_range2 (fun t h -> h :: t) [] 0 10;;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]
[Figure 7.5 plots: a) t against n for n up to 8000; b) t against n for n up to 10^6, with series fold_range1 and fold_range2.]
Figure 7.5: Measured performance of the non-tail-recursive fold_range1 and tail-recursive
fold_range2 functions summing n integers, with time taken t in seconds, showing: a) the
non-tail-recursive form is ~15% more efficient for small ranges (n < 2^12), and b) the
tail-recursive form is 5.7x more efficient for large ranges (n > 2^15).
However, the fold_range1 function acts upon the result of its recursive call by passing the
result as an argument to the function f. Conversely, the fold_range2 function uses the result
of the call to f as an argument to the recursive call, returning the result of the recursive call
without acting upon it. This difference is due to the fold_range1 function counting upwards
(i.e. applying f to 1 first) whereas the fold_range2 function counts downwards (i.e. applying
f to u-1 first). Thus, the fold_range1 function is not tail recursive whereas the fold_range2
function is tail recursive.
As we have seen, tail recursiveness affects performance. Performance measurements for the
fold_range1 and fold_range2 functions are illustrated in figure 7.5. In this case, as for most
other functions, the non-tail-recursive form is slightly faster for shallow recursions (small n)
whereas the tail-recursive form is considerably faster for deep recursions (large n).
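The distinction can be made concrete with a self-contained sketch. The two bodies below are written to be consistent with the description and results given above, but their exact form is an assumption:

```ocaml
(* Not tail recursive: f is applied to the result of the recursive call,
   so one stack frame is kept alive per element of the range. *)
let rec fold_range1 f accu l u =
  if l < u then f (fold_range1 f accu (l + 1) u) l else accu

(* Tail recursive: f is applied first and the recursive call is the
   last thing evaluated, so no state needs to be kept. *)
let rec fold_range2 f accu l u =
  if l < u then fold_range2 f (f accu (u - 1)) l (u - 1) else accu

let () =
  let cons t h = h :: t in
  (* Both forms visit the range so as to build the same list. *)
  assert (fold_range1 cons [] 0 10 = fold_range2 cons [] 0 10);
  (* A deep range is handled in constant stack space by the second form. *)
  Printf.printf "%d\n" (fold_range2 ( + ) 0 0 1_000_000)
```

Only the tail-recursive form is applied to the large range here, because the depth of recursion of the first form grows linearly with the size of the range.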
Tail-recursion optimisations lead to several high-level code transformations which can con-
siderably improve performance for large inputs. Specifically, the rev, rev_map, rev_map2
and rev_append functions in the List module are all tail-recursive and can, therefore, be
used to form replacements for the non-tail-recursive append, concat, flatten, map, map2,
fold_right, fold_right2, remove_assoc, remove_assq, split, combine and merge. We
shall now examine some code transformations which can improve performance on large input
lists.
Most simply, applying an operator which is both associative and commutative over a list can
be done equivalently using fold_left or fold_right. As the latter is not tail recursive, the
fold_left function should be preferred. For example, as integer addition commutes, the
following are equivalent:
# let sum1 = List.fold_left ( + ) 0;;
val sum1 : int list -> int = <fun>
# let sum2 l = List.fold_right ( + ) l 0;;
val sum2 : int list -> int = <fun>
The tail-recursive sum1 function will be significantly more efficient than the non-tail-recursive
sum2 function when l has a large number of elements (i.e. ≳ 10^3).
The humble map function is another useful, but not tail-recursive, function. For long lists,
the map function may be replaced by the tail-recursive rev_map function, which maps onto a
reversed list, followed by the rev function, if necessary. For example, the following function
will produce the same results as the map function (provided the function f being applied is
order-independent, e.g. if f is purely functional) but is considerably more efficient for long
lists:
# let map_tr f l = List.rev (List.rev_map f l);;
val map_tr : ('a -> 'b) -> 'a list -> 'b list = <fun>
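A brief sketch of this replacement in use follows. The list length of one million is an arbitrary choice for illustration, and List.init is assumed to be available in the standard library in use:

```ocaml
(* Tail-recursive map: map onto a reversed list, then reverse the result. *)
let map_tr f l = List.rev (List.rev_map f l)

let () =
  (* A list long enough for the depth of recursion to matter. *)
  let l = List.init 1_000_000 (fun i -> i) in
  let m = map_tr (fun x -> 2 * x) l in
  Printf.printf "%d elements, first %d\n" (List.length m) (List.hd m)
```

The price of the constant stack usage is the intermediate reversed list, which is allocated and then discarded.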
As a slightly more sophisticated example, the List.flatten function is not tail recursive and
may be considered equivalent to:
# let flatten l = List.fold_right (fun l accu -> l @ accu) l [];;
val flatten : 'a list list -> 'a list = <fun>
This may be replaced with a tail-recursive version which builds the result up in reverse order
using rev_append before reversing the result using rev:
# let flatten_tr l =
    let aux accu l = List.rev_append l accu in
    List.rev (List.fold_left aux [] l);;
val flatten_tr : 'a list list -> 'a list = <fun>
For example:
# flatten_tr [[1; 2; 3]; [4; 5; 6]; [7; 8; 9]];;
- : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9]
This is useful when inlining sequences, for example when flattening a hierarchical data
structure. However, note that the depth of recursion in the non-tail-recursive implementation of
the flatten function is due to the fold_right and, therefore, is equal to the length of the
input list, not the length of the output list. Thus, the effects of tail recursion on practical
uses of the flatten function are likely to be less significant than for simpler functions, such as
map and fold_right. In general, tail recursion is most important when the depth of recursion
has the same complexity as the algorithm itself. For example, an O(n) algorithm with ln n
deep non-tail recursion is not likely to suffer from performance degradation. The Set.fold
function and the implementation of the discrete wavelet transform presented in section 10.5
are both examples of O(n) algorithms with ln n deep non-tail recursion.
[Figure 7.6 diagrams: a) Input, Temporary and Output lists for List.map f (List.map f l); b) Input and Output lists for List.map (fun e -> f (f e)) l.]
Figure 7.6: Deforestation refers to methods used to reduce the size of temporary data,
such as the use of composite functions to avoid the creation of temporary data structures
illustrated here: a) mapping a function f over a list l twice, and b) mapping the
composite function f ∘ f over the list l once.
7.3.3.2 Deforesting
Functional programming style often results in the creation of temporary data due to the
repeated use of maps, folds and other similar functions. The reduction of such temporary data
is known as deforestation. In particular, functions can be applied sequentially to individual
elements rather than to whole containers (such as lists and arrays) in order to minimize the
number of temporary containers created (illustrated in figure 7.6).
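The transformation in figure 7.6 can be sketched as follows. The names map_twice and map_fused are hypothetical, and the equivalence is only guaranteed when f has no side effects:

```ocaml
(* Two passes: the inner map allocates a temporary list that is
   consumed immediately by the outer map. *)
let map_twice f l = List.map f (List.map f l)

(* One pass: the composite function is applied to each element, so no
   temporary list is created. *)
let map_fused f l = List.map (fun e -> f (f e)) l

let () =
  let f x = x * x + 1 in
  let l = [1; 2; 3] in
  assert (map_twice f l = map_fused f l);
  List.iter (Printf.printf "%d ") (map_fused f l);
  print_newline ()
```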
For example, the Shannon entropy H of a vector v representing a discrete probability distri-
bution is given by:
$$H(v) = \sum_{i=1}^{n} v_i \ln v_i$$
This could be written in OCaml by creating temporary containers, firstly u_i = ln v_i and then
w_i = u_i v_i, and finally calculating the sum H(v) = Σ_i w_i:
# let entropy1 v =
    let u = List.map log v in
    let w = List.map2 ( *. ) v u in
    List.fold_left ( +. ) 0. w;;
val entropy1 : float list -> float = <fun>
This function can be completely deforested by performing all of the arithmetic operations at
once for each element, avoiding the use of the temporary containers u and w:
[Figure 7.7 plot: time taken against n for the entropy1 and entropy2 series, on logarithmic axes.]
Figure 7.7: Measured performance of the entropy1 and entropy2 functions computing
the Shannon entropy of arrays of n floating-point numbers, showing time taken t in
seconds.
# let entropy2 v =
    List.fold_left (fun h v -> h +. v *. log v) 0. v;;
val entropy2 : float list -> float = <fun>
The measured performance of the entropy1 and entropy2 functions is shown in figure 7.7.
The entropy2 function is ~35 times faster for n = 10^6.
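As a check that deforesting has not changed the result, the two forms can be compared on a small distribution. The test vector is an arbitrary choice for illustration:

```ocaml
(* Three passes with two temporary lists, u and w. *)
let entropy1 v =
  let u = List.map log v in
  let w = List.map2 ( *. ) v u in
  List.fold_left ( +. ) 0. w

(* One pass with no temporary lists. *)
let entropy2 v =
  List.fold_left (fun h v -> h +. v *. log v) 0. v

let () =
  let v = [0.5; 0.25; 0.25] in
  (* Both forms follow the definition used in the text, the sum of
     v_i ln v_i, and should agree to within rounding error. *)
  Printf.printf "%g %g\n" (entropy1 v) (entropy2 v)
```

For this vector every term equals -0.5 ln 2, so both functions should return a value close to -1.5 ln 2.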
7.3.3.3 Terminating early
Algorithms may execute more quickly if they are allowed to terminate prematurely. However,
the trade-off between any extra tests required and the savings of exiting early can be difficult to
predict. The only general solution is to try premature termination when performance is likely
to be enhanced and revert to the simpler form if the savings are not found to be significant.
We shall now consider a simple example of premature termination as found in the core library
as well as a more sophisticated example requiring the use of exceptions.
The for_all function in the List module applies a predicate function to elements in a list,
returning true if the predicate was found to be true for all elements and false otherwise.
Note that the predicate need not be applied to all elements in the list, as the result is known
to be false as soon as the predicate returns false for any given element. In the core library,
this function is implemented as:
# let rec for_all p = function
    [] -> true
  | a::l -> p a && for_all p l;;
The premature termination of this function is not immediately obvious. In fact, the &&
operator has the unusual semantics of in-order, short-circuit evaluation. This means that the
expression p a will be evaluated first and only if the result is true will the expression
for_all p l be evaluated. Consequently, this implementation of the for_all function can return
false without recursively applying the predicate function p to all of the elements in the given
list.
When using higher-order functions, such as folds, algorithms can no longer be prematurely
terminated in this way, i.e. by not recursing. The solution is to escape from the higher-order
function by raising an exception. For example, the for_all function could be written in terms
of iter:
# exception Finished;;
exception Finished
# let for_all p l =
    try
      List.iter (fun e -> if not (p e) then raise Finished) l;
      true
    with Finished -> false;;
val for_all : ('a -> bool) -> 'a list -> bool = <fun>
This implementation of the for_all function tries to apply the predicate to all of the elements
of the given list without any applications returning false. If this is achieved then true is
returned. Otherwise, the Finished exception will be raised when the predicate function first
returns false. This exception will be caught by the try construct and the function returns
false.
Effectively, this use of exceptions allows functions to "escape" from deep recursions. This use
of exceptions is both generally applicable and useful.
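The same technique applies to folds. As a hedged sketch, the following computes the product of a list of integers and escapes from the fold as soon as a zero is found. The exception Found_zero and the returned count of visited elements are assumptions added for illustration:

```ocaml
(* Raised to escape from the fold; a hypothetical name. *)
exception Found_zero

(* Return the product of the list together with the number of elements
   actually visited, so that the early exit can be observed. *)
let product l =
  let visited = ref 0 in
  let p =
    try
      List.fold_left
        (fun acc x ->
          incr visited;
          if x = 0 then raise Found_zero else acc * x)
        1 l
    with Found_zero -> 0
  in
  (p, !visited)

let () =
  let p, n = product [2; 0; 4; 5] in
  Printf.printf "product %d after visiting %d elements\n" p n
```

Whether the extra test per element pays for itself depends on how often a zero actually occurs, as discussed at the start of this section.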
7.3.3.4 Specializing data structures
A trade-off often exists between the genericity and the efficiency of data structures and func-
tions. This section concentrates on the performance impact of generic data structures. The
next section deals with generic (polymorphic) functions.
The humble mathematical vector, for example, is ubiquitous in numerical applications. How-
ever, several different data structures can be used to represent a vector. As always, the choice
of data structure can have a strong effect on the performance of the resulting program.
For example, if a vector is represented by a float list, the cross product may then be written
such that it raises an exception if called with vectors of any dimensionality except 3:
# let vec_cross1 a b = match (a, b) with
    ([x1; y1; z1], [x2; y2; z2]) ->
      [y1 *. z2 -. z1 *. y2; z1 *. x2 -. x1 *. z2; x1 *. y2 -. y1 *. x2]
  | _ -> invalid_arg "vec_cross";;
val vec_cross1 : float list -> float list -> float list = <fun>
Alternatively, for a program which only uses 3D vectors (such as a particle simulation) we
may have chosen to represent a 3D vector using the record type:
# type vec3 = {x:float; y:float; z:float};;
type vec3 = { x : float; y : float; z : float; }
The cross product may then be written such that it is always valid:
# let vec_cross2 {x=x1; y=y1; z=z1} {x=x2; y=y2; z=z2} =
    {x = y1 *. z2 -. z1 *. y2; y = z1 *. x2 -. x1 *. z2; z = x1 *. y2 -. y1 *. x2};;
val vec_cross2 : vec3 -> vec3 -> vec3 = <fun>
Essentially, the OCaml type checker verifies at compile-time that the vectors passed to the
vec_cross2 function will always have three elements. Therefore, this need not be checked
at run-time, avoiding some computation. Consequently, the vec_cross2 function is ~45%
faster than the vec_cross1 function².
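The two representations can be compared side by side in a short sketch. The field-access style used for the record version is a stylistic choice made here and is not the form used in the text:

```ocaml
type vec3 = { x : float; y : float; z : float }

(* List representation: the dimensionality is only checked at run time. *)
let vec_cross1 a b = match a, b with
  | [x1; y1; z1], [x2; y2; z2] ->
      [y1 *. z2 -. z1 *. y2; z1 *. x2 -. x1 *. z2; x1 *. y2 -. y1 *. x2]
  | _ -> invalid_arg "vec_cross"

(* Record representation: the type guarantees exactly three components. *)
let vec_cross2 a b =
  { x = a.y *. b.z -. a.z *. b.y;
    y = a.z *. b.x -. a.x *. b.z;
    z = a.x *. b.y -. a.y *. b.x }

let () =
  let ex = { x = 1.; y = 0.; z = 0. } and ey = { x = 0.; y = 1.; z = 0. } in
  let c = vec_cross2 ex ey in
  Printf.printf "(%g, %g, %g)\n" c.x c.y c.z
```

A call such as vec_cross1 [1.] [2.] type checks but raises Invalid_argument at run time, whereas the equivalent mistake cannot be expressed with the record type.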
7.3.3.5 Avoiding polymorphic numerical functions

In OCaml, the creation and use of polymorphic functions (generic over the types which they
can handle) can be subtle. Avoiding polymorphic functions in the primitive operations of
numerically intensive algorithms can significantly improve performance.
For example, consider two functions which add and multiply the elements in an array of
floating-point values, respectively:
# let sum1 a =
    let r = ref 0. in
    for i = 0 to Array.length a - 1 do
      r := !r +. a.(i)
    done;
    !r;;
val sum1 : float array -> float = <fun>
# let product1 a =
    let r = ref 1. in
    for i = 0 to Array.length a - 1 do
      r := !r *. a.(i)
    done;
    !r;;
val product1 : float array -> float = <fun>
The common, higher-order fold_left function can be factored out from sum1 and product1:
val fold_left: ('a -> 'b -> 'a) -> 'a -> 'b array -> 'a
When written in terms of fold_left, the sum and product functions may be written more
concisely:
# let sum2 = Array.fold_left ( +. ) 0.
  and product2 = Array.fold_left ( *. ) 1.;;
val sum2 : float array -> float = <fun>
val product2 : float array -> float = <fun>
Clearly, this has significantly reduced the amount of code required to provide the specified
functionality.
² This optimisation is facilitated by the unboxing of the vec3 type, as we shall see in section 7.3.3.6.
[Figure 7.8 plots: a) t against n up to 10^6 for sum1 and sum2; b) t against n up to 10^6 for product1 and product2.]
Figure 7.8: Measured performance of the sum and product functions when applied to
arrays of n floating-point numbers, showing time taken t in seconds for: a) the sum
functions, and b) the product functions.
However, the fold_left function is polymorphic. Currently, the OCaml compiler only gener-
ates generic implementations of polymorphic functions. Consequently, the fold_left function
contains dispatch code to perform the task appropriately for any given type (although the type
is always float in this case).
Also, the current OCaml compilers do not inline functions which are passed as arguments to
higher-order functions. Consequently, trivial functions passed to Array.fold_left cannot be
inlined and the resulting function call can be a significant overhead.
These overheads result in the sum2 and product2 functions executing significantly more
slowly than the sum1 and product1 functions. The measured performance of the sum1, sum2,
product1 and product2 functions is shown in figure 7.8. The polymorphism-free sum1 and
product1 functions are ~50% faster than the polymorphic-fold-based sum2 and product2
functions. Thus, polymorphic functions should not be used in the performance-critical parts
of programs.
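Before replacing a fold with a hand-written loop it is worth confirming that the two forms agree. A minimal sketch, with an arbitrary test array, is:

```ocaml
(* Specialised loop over a float array. *)
let sum1 a =
  let r = ref 0. in
  for i = 0 to Array.length a - 1 do
    r := !r +. a.(i)
  done;
  !r

(* The same computation expressed with the polymorphic fold. *)
let sum2 a = Array.fold_left ( +. ) 0. a

let () =
  let a = Array.init 1000 float_of_int in
  (* Both functions add the elements in the same order, so the results
     are expected to be identical rather than merely close. *)
  assert (sum1 a = sum2 a);
  Printf.printf "%g\n" (sum1 a)
```

The relative speed of the two forms depends on the compiler version and flags, so the figures quoted above should be re-measured rather than assumed.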
7.3.3.6 Unboxing data structures
Typically in functional languages, most data structures are boxed. This means that data
structures are stored as a reference to a different piece of memory. Although elegant, boxing
can incur significant performance costs.
Figure 7.9: As data structures are boxed by default, an array of complex numbers
z_i = x_i + i y_i stored as a (float * float) array is actually represented by an array of
pointers to pairs of pointers to floating-point numbers.
Figure 7.10: The ocamlopt compiler unboxes records with all fields of the type float.
Consequently, an array of complex numbers stored as a Complex.t array is more efficient
than a (float * float) array.
For example, much of the efficiency of arrays stems from their elements occupying a contiguous
portion of memory and, therefore, accesses to elements with similar indices are cache coherent.
However, if the array elements are boxed, only the references to the data structures will be in
a contiguous portion of memory (see figure 7.10). The data structures themselves may be at
completely random locations. Consequently, cache coherency may be very poor.
Fortunately, ocamlopt will not box values of many types, including float array, decreasing
memory requirements and greatly improving performance. Consequently, in performance-
critical code, unboxed data structures should be used in preference to boxed data structures.
For example, computing the product of an array of complex numbers is unnecessarily inefficient
when the numbers are represented by a (float * float) array (see figure 7.9). A Complex.t
array is more efficient (see figure 7.10), where Complex.t is defined in the Complex module
of the core library as the record type:
type t = { re : float; im : float; }
Figure 7.11: The ocamlopt compiler unboxes values of the type float array. Consequently,
an array of complex numbers stored as a float array of alternate real and
imaginary values is more efficient than a Complex.t array.
A function to compute the product of an array of complex numbers might be written over
the type (float * float) array, where each element is a 2-tuple representing the real and
imaginary parts of a complex number:
# let c_prod1 a =
    let z = ref (1., 0.) in
    for i = 0 to Array.length a - 1 do
      z := match !z, a.(i) with (re1, im1), (re2, im2) ->
        re1 *. re2 -. im1 *. im2, re1 *. im2 +. im1 *. re2
    done;
    !z;;
val c_prod1 : (float * float) array -> float * float = <fun>
A more efficient alternative may be written over the type Complex.t array:
# let c_prod2 a =
    let z = ref Complex.one in
    for i = 0 to Array.length a - 1 do
      z := Complex.mul !z a.(i)
    done;
    (!z).Complex.re, (!z).Complex.im;;
val c_prod2 : Complex.t array -> float * float = <fun>
An even more efficient alternative may be written by storing the array of n complex numbers
in a float array with 2n elements (see figure 7.11):
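# let c_prod3 a =
    let z_re = ref 1. in let z_im = ref 0. in
    for i = 0 to Array.length a / 2 - 1 do
      (* Compute the new real part first and store it last, so that
         both parts are computed from the old values of z_re and z_im. *)
      let re = !z_re *. a.(i * 2) -. !z_im *. a.(i * 2 + 1) in
      z_im := !z_re *. a.(i * 2 + 1) +. !z_im *. a.(i * 2);
      z_re := re
    done;
    !z_re, !z_im;;
val c_prod3 : float array -> float * float = <fun>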
Measuring the performance of the c_prod1, c_prod2 and c_prod3 functions (illustrated in
figure 7.12) shows that the c_prod2 function is ~33% faster than the c_prod1 function and
the c_prod3 function is ~28% faster than the c_prod2 function.
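Since the three functions differ only in representation, they should return the same product. The following self-contained sketch checks this on two complex numbers. In the interleaved version the new real part is held in a temporary so that the imaginary part is computed from the old values, and the destructuring style is a choice made for this illustration:

```ocaml
(* Pairs: each element is a boxed 2-tuple of boxed floats. *)
let c_prod1 a =
  let z = ref (1., 0.) in
  for i = 0 to Array.length a - 1 do
    let (re1, im1), (re2, im2) = !z, a.(i) in
    z := (re1 *. re2 -. im1 *. im2, re1 *. im2 +. im1 *. re2)
  done;
  !z

(* Records from the Complex module of the standard library. *)
let c_prod2 a =
  let z = ref Complex.one in
  for i = 0 to Array.length a - 1 do
    z := Complex.mul !z a.(i)
  done;
  let { Complex.re; im } = !z in
  (re, im)

(* Interleaved real and imaginary parts in a single float array. *)
let c_prod3 a =
  let z_re = ref 1. and z_im = ref 0. in
  for i = 0 to Array.length a / 2 - 1 do
    let re = a.(2 * i) and im = a.(2 * i + 1) in
    let new_re = !z_re *. re -. !z_im *. im in
    z_im := !z_re *. im +. !z_im *. re;
    z_re := new_re
  done;
  (!z_re, !z_im)

let () =
  (* (1 + 2i)(3 + 4i) = -5 + 10i in all three representations. *)
  let p1 = c_prod1 [| (1., 2.); (3., 4.) |] in
  let p2 = c_prod2 [| { Complex.re = 1.; im = 2. }; { Complex.re = 3.; im = 4. } |] in
  let p3 = c_prod3 [| 1.; 2.; 3.; 4. |] in
  assert (p1 = p2 && p2 = p3);
  Printf.printf "%g + %gi\n" (fst p3) (snd p3)
```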
Having examined the many ways OCaml programs may be optimised, we shall now review
existing libraries which may be of use to scientific programmers before presenting a variety of
example functions and programs.
[Figure 7.12 plot: t against n up to 5x10^5 for the c_prod1, c_prod2 and c_prod3 series.]
Figure 7.12: Measured performance of the c_prod1, c_prod2 and c_prod3 functions
computing the product of arrays of n complex numbers showing time taken t in seconds.
Chapter 8
Libraries
8.1 Command-line arguments
Many programs, particularly under Unix, are designed to be run from the command line.
Such programs typically allow arguments to be passed on the command line. For example,
the ocamlopt compiler allows several flags such as -p and -unsafe, as we have seen:
$ ocamlopt -p -unsafe test.ml -o test
The Arg module can be used to parse such command-line arguments and, therefore, can
be useful when writing controllable programs. This module is described in detail in the
OCaml reference manual [2] but we shall provide an overview and examples here because this
functionality is often required in scientific programs.
Parsing is performed by the parse function in the Arg module:
val parse :
  (Arg.key * Arg.spec * Arg.doc) list ->
  Arg.anon_fun -> Arg.usage_msg -> unit
Command-line arguments which begin with a hyphen are known as keyword arguments.
For example, the -unsafe argument to the ocamlopt compiler is a keyword. Such named
arguments may or may not be followed by associated information. For example, the -inline
argument to the ocamlopt compiler is followed by an integer.
Command-line arguments may also be specified without a keyword, known as anonymous
arguments. For example, the file names to be compiled by ocamlopt are specified as anonymous
arguments, such as test.ml in the above example.
The first argument to the parse function is a list of 3-tuples specifying the keyword, action
specification and description of each named command-line argument. The second argument
specifies the function used to handle anonymous command-line arguments (i.e. those without
a keyword). The final argument specifies the usage message, printed if invalid command-line
arguments are given to the program.
In the type of the parse function, the types key, doc and usage_msg are all string and the
type spec is:
type spec =
    Unit of (unit -> unit)
  | Bool of (bool -> unit)
  | Set of bool ref
  | Clear of bool ref
  | String of (string -> unit)
  | Set_string of string ref
  | Int of (int -> unit)
  | Set_int of int ref
  | Float of (float -> unit)
  | Set_float of float ref
  | Tuple of Arg.spec list
  | Symbol of string list * (string -> unit)
  | Rest of (string -> unit)
This spec type allows appropriate actions to be specified for a comprehensive selection of
named arguments. The way in which the Arg module is used to parse command-line arguments
is best understood by example.
Consider a simple program "test.ml" which prints x^y for some specified x and y. The arguments
x and y are most easily specified as anonymous arguments:
let x, y =
  let input = ref [] in
  let usage = "Usage: test <x> <y>" in
  Arg.parse [] (fun x -> input := x :: !input) usage;
  match !input with
    [y; x] -> float_of_string x, float_of_string y
  | _ -> invalid_arg usage in
Note that each new anonymous argument is prepended onto the string list referenced by
input and, consequently, the arguments appear in reverse order in the pattern match.
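The fragment above can be turned into a complete program that is also easy to test. The following sketch uses Arg.parse_argv, which parses an explicit array instead of the real command line. The helper name parse_xy is hypothetical, and the final line printing x ** y is an assumption about how the program continues:

```ocaml
(* Parse exactly two anonymous arguments from argv. Passing a fresh
   ~current counter makes each call start from the first argument. *)
let parse_xy argv =
  let input = ref [] in
  let usage = "Usage: test <x> <y>" in
  Arg.parse_argv ~current:(ref 0) argv []
    (fun s -> input := s :: !input) usage;
  (* Arguments were prepended, so they appear in reverse order. *)
  match !input with
  | [y; x] -> (float_of_string x, float_of_string y)
  | _ -> invalid_arg usage

let () =
  (* In a real program Sys.argv would be passed instead. *)
  let x, y = parse_xy [| "test"; "2"; "3" |] in
  print_endline (string_of_float (x ** y))
```

Unlike Arg.parse, the Arg.parse_argv function raises an exception on invalid input rather than printing the usage message and exiting, which is convenient when experimenting in the top-level.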
This program can be compiled and executed quite simply because the Arg module is provided
in the core OCaml distribution:
$ ./test 2 3
8.
The Arg module also provides a -help argument which causes the program to print its
command-line options and exit:
$ ./test -help
Usage: test <x> <y>
-help Display this list of options
--help Display this list of options
For a more sophisticated example, consider a similar program which allows the arguments x
and y to be specified as named arguments. This may be written by specifying the keywords
-x and -y and using the Float constructor to provide a function to set x and y:
let x, y =
  let x = ref None and y = ref None in
  let usage = "Usage: test -x <x> -y <y>" in
  let set_x = "-x", Arg.Float (fun a -> x := Some a), "the x value" in
  let set_y = "-y", Arg.Float (fun a -> y := Some a), "the y value" in
  Arg.parse [set_x; set_y] ignore usage;
  match !x, !y with
    Some x, Some y -> x, y
  | _ -> invalid_arg usage in
This program may be compiled and executed as simply as the first example:
$ ./test -x 2 -y 3
8.
Naturally, named command-line arguments may be specified in any order:
$ ./test -y 3 -x 2
8.
Specifying the -help argument now provides information on each named argument:
$ ./test -help
Usage: test -x <x> -y <y>
-x the x value
-y the y value
-help Display this list of options
--help Display this list of options
The ability to parse command-line arguments can be a productive first step towards writing
easy-to-use programs.
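The named-argument version can be sketched in the same testable style. Again Arg.parse_argv is used so that the argument vector can be supplied explicitly, and the helper name parse_named is hypothetical:

```ocaml
(* Parse the keyword arguments -x and -y from an explicit argv. *)
let parse_named argv =
  let x = ref None and y = ref None in
  let usage = "Usage: test -x <x> -y <y>" in
  let spec = [
    "-x", Arg.Float (fun a -> x := Some a), "the x value";
    "-y", Arg.Float (fun a -> y := Some a), "the y value";
  ] in
  (* Anonymous arguments are ignored, as in the example above. *)
  Arg.parse_argv ~current:(ref 0) argv spec ignore usage;
  match !x, !y with
  | Some x, Some y -> (x, y)
  | _ -> invalid_arg usage

let () =
  (* The keywords may be given in either order. *)
  let x, y = parse_named [| "test"; "-y"; "3"; "-x"; "2" |] in
  print_endline (string_of_float (x ** y))
```

With Arg.parse_argv, a request for -help is reported by raising the Arg.Help exception carrying the usage text, leaving the caller to decide whether to print it and exit.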
8.2 Timing
Two different timing functions are provided by OCaml when running under Unix:
The Sys.time function returns the time in seconds spent executing the current program.
The Unix.gettimeofday function returns the time in seconds since the "Epoch"¹.
These timing functions can be useful in several circumstances. Most simply, they provide a
means of profiling a program by measuring the amount of time required to perform different
operations. By using timing functions selectively, useful profiling information can be obtained
without introducing the overhead of full profiling caused by compiling with profiling on (as
discussed in section 7.1). Timing functions can also be used in benchmarking (as seen in
section 7.3.1) and in the creation of real-time programs, such as those producing animations
(as discussed in chapter 6).
¹ Midnight UTC, 1st January 1970.
When timing operations, a higher-order function which returns the time taken to compute its
function argument can be useful:
# #load "unix.cma";;
# let time f =
    let t = Unix.gettimeofday () in
    f ();
    Unix.gettimeofday () -. t;;
val time : (unit -> 'a) -> float = <fun>
This function may be used to time any given unit -> unit function. For example, the
resolution of this timing function may be found experimentally by applying it within an
application of Array.map (which is comparatively very fast). For a large array a:
# let a = Array.make 1000000 0;;
val a : int array = ...
The time taken to apply Array.map is small:
# time (fun () -> Array.map (fun i -> i) a);;
- : float = 0.389801025390625
The time spent in the timer function is then very significant:
# let t, b =
    let b = ref [||] in
    let f () = b := Array.map (fun _ -> time (fun i -> i)) a in
    let t = time f in
    t, !b;;
val t : float = 1.5448310375213623
val b : float array =
  [|1.1920928955078125e-06; 1.9073486328125e-06; 9.5367431640625e-07; 0.; ...
The time taken to call this timer function one million times is approximately
1.54 - 0.39 = 1.15 s, i.e. approximately 1.15 µs per call, and the
resolution of this timer is approximately 1 µs (which is not surprising given that this is known
as the microsecond timer!).
In contrast, at least on this platform2, the Sys.time function is remarkably inaccurate by
comparison:
# let time f =
    let t = Sys.time () in
    f ();
    (Sys.time ()) -. t;;
val time : (unit -> 'a) -> float = <fun>
# time (fun () -> Array.map (fun i -> i) a);;
- : float = 0.369999999999999218
# let t, b =
    let b = ref [||] in
    let f () = b := Array.map (fun _ -> time (fun i -> i)) a in
    let t = time f in
    t, !b;;
val t : float = 1.38000000000000078
val b : float array =
  [|0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; ...
2 Athlon/Linux.
Not coincidentally, these results are all very close to multiples of 0.01 s, i.e. the Sys.time
function only has centisecond resolution. Thus, the Unix.gettimeofday function is likely to
be the timer of choice, at least on this platform.
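The timing idiom above discards the result of the timed computation. A minimal sketch of a variant which keeps the result is given below. The names time_with_result and repeat are hypothetical, and Sys.time is used only so that the sketch does not depend on the unix library; Unix.gettimeofday can be substituted where finer resolution is needed.

```ocaml
(* Sketch: time a computation and keep its result.
   [time_with_result] and [repeat] are hypothetical names. *)
let time_with_result f =
  let t = Sys.time () in
  let result = f () in
  result, Sys.time () -. t

(* Average the cost of [f] over [n] calls, which helps when a single call
   is shorter than the resolution of the timer. *)
let repeat n f =
  let (), elapsed =
    time_with_result (fun () -> for _i = 1 to n do ignore (f ()) done) in
  elapsed /. float_of_int n

let () =
  let a = Array.make 1_000_000 0 in
  let b, dt = time_with_result (fun () -> Array.map (fun i -> i + 1) a) in
  Printf.printf "mapped %d elements in %g s\n" (Array.length b) dt;
  Printf.printf "per-call cost of Sys.time: %g s\n" (repeat 100_000 Sys.time)
```

Averaging over many calls is one way to work around a coarse timer, at the cost of measuring only the mean.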
8.3 Big arrays
The array type was described in detail in section 3.2. This type offers a polymorphic,
homogeneous container optimised for use within OCaml programs.
However, the array type suffers from two main drawbacks:
The maximum number of elements, given by Sys.max_array_length, is only around
four million3 on a 32-bit architecture.
Values of the type array are not easily handled by functions written in other languages.
These drawbacks are most important in the context of numerical programming, which
sometimes requires the use of very large arrays of numeric elements (e.g. float) and the use of
functions written in other languages, such as C and Fortran.
These drawbacks are addressed by another type of array, known as big arrays, which can
contain an arbitrary number of elements of various different numeric types and which are stored
in either C or Fortran format. However, big arrays also have some relative disadvantages:
The elegant pattern matching syntax for the array type cannot be used with big arrays.
The top-level currently lacks the ability to print the contents of a big array.
Access to the elements of big arrays from OCaml is slower than access to the array type.
3 In the case of float arrays, the maximum size is half Sys.max_array_length.
Definitions pertaining to big arrays are in the Bigarray module of the core OCaml distribution
and are described in detail in the OCaml reference manual [2]. We shall give only a brief review
of the functionality of big arrays, required to understand the remainder of this book.
A top-level including the functionality of big arrays may be created and entered using:
$ ocamlmktop bigarray.cma -o bigarray.top
$ ./bigarray.top
The Bigarray module is designed to have its namespace opened:
# open Bigarray;;
Specialised big arrays of one-, two- and three-dimensions may be handled using definitions
in the submodules Array1, Array2 and Array3, respectively, as well as arrays of arbitrary
dimensionality in the Genarray submodule.
Elements in a big array essentially have two different types associated with them, defined by
the kind of big array:
The OCaml type used to handle the elements, e.g. int or float.
The storage type used to represent each element in memory, e.g. int32_elt or float64_elt.
For example, the "kind" of a big array which uses the OCaml type float but which actually
stores the floating-point values using the 32-bit IEEE single-precision format (i.e. sacrificing
precision for memory usage) is denoted by the value:
# let mykind = float32;;
val mykind : (float, Bigarray.float32_elt) Bigarray.kind = <abstr>
Note that the abstract type of this value prescribes the use of the OCaml type float and the
storage type float32_elt.
The array "layout" used by the C language may be specified using:
# let mylayout = c_layout;;
val mylayout : Bigarray.c_layout Bigarray.layout = <abstr>
A 1D array of this kind using this layout may then be created from a value of the type array
using the of_array function in the Array1 module:
# let a = Array.init 4 (fun i -> 1. /. float_of_int (1 + i));;
val a : float array = [|1.; 0.5; 0.333333333333333315; 0.25|]
# let a = Array1.of_array mykind mylayout a;;
val a : (float, Bigarray.float32_elt, Bigarray.c_layout)
  Bigarray.Array1.t = <abstr>
An efficient iter function over such a big array may be defined by specializing the big array
type:
# let iter f (a : (float, float32_elt, c_layout) Array1.t) =
    let len = Array1.dim a in
    if len = 0 then () else
      for i = 0 to len - 1 do
        f (Array1.get a i)
      done;;
val iter : (float -> 'a) -> (float, Bigarray.float32_elt,
  Bigarray.c_layout) Bigarray.Array1.t -> unit = <fun>
The contents of a may then be printed using this iter function:
# iter (fun x -> print_endline (string_of_float x)) a;;
1.
0.5
0.333333343267
0.25
- : unit = ()
Note the reduced precision of the representation of 1/3.
We shall use big arrays in the context of vector-matrix computations and the Fourier transform.
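The loss of precision incurred by single-precision storage can be examined with a short, self-contained sketch. The helper sum is hypothetical, and the sketch assumes a recent OCaml release in which the Bigarray module is part of the standard library and needs no separate linking.

```ocaml
(* Sketch: store OCaml floats in 32-bit storage and observe the rounding.
   [sum] is a hypothetical helper. *)
open Bigarray

(* Sum the elements of a single-precision, C-layout, 1D big array. *)
let sum (a : (float, float32_elt, c_layout) Array1.t) =
  let total = ref 0. in
  for i = 0 to Array1.dim a - 1 do
    total := !total +. a.{i}
  done;
  !total

let () =
  let a = Array1.of_array float32 c_layout [| 1.; 0.5; 1. /. 3.; 0.25 |] in
  (* 0.5 is exactly representable in single precision; 1/3 is not. *)
  Printf.printf "%.17g %.17g\n" a.{1} a.{2};
  Printf.printf "error in 1/3: %g\n" (abs_float (a.{2} -. 1. /. 3.));
  Printf.printf "sum: %g\n" (sum a)
```

Annotating the argument of sum with a specific kind and layout follows the same specialisation used for the iter function in the text.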
8.4 Vector-Matrix
Many scientific problems can be phrased in terms of vector-matrix algebra. Thus, the ability to
handle vectors and matrices can be instrumental in writing scientific programs. In particular,
the ability to perform some complicated computations on them (e.g. finding the eigenvalues
of a matrix) can be pivotal in scientific programs. Such computations are often prone to
numerical error and, therefore, can be tedious to program robustly.
Fortunately, LAPACK is a well known, freely-available library of functions for performing
many such vector and matrix computations [11]. Much of the functionality of LAPACK
is available from OCaml through freely-available bindings called lacaml written by Markus
Mottl, Christophe Troestler, Oleg Trott and Liam Stewart.
Once LAPACK and lacaml are installed, a top-level which includes the functionality of the
lacaml bindings may be created using:
$ ocamlmktop -custom -cclib -llapack2 -I +lacaml bigarray.cma lacaml.cma -o lacaml.top
Programs may be compiled similarly to byte-code and native code:
$ ocamlc -custom -cclib -llapack2 -I +lacaml bigarray.cma lacaml.cma file.ml -o file
$ ocamlopt -cclib -llapack2 -I +lacaml bigarray.cmxa lacaml.cmxa file.ml -o file
The ability to use the LAPACK library to perform vector-matrix computations is a great boon
for scientific programming in OCaml.
8.5 Fourier transform
Many scientific computations require the use of the Fourier transform, or an algorithm based
upon the Fourier transform (such as fast convolution). In the context of numerical algorithms,
the Fourier transform cannot be performed. This stems from the fact that the Fourier
transform is obtained by taking the limit of infinite period and, of course, computers cannot handle
infinite amounts of data. However, the coefficients of a Fourier series may be computed given
uniformly spaced sampling data. Hence, Fourier series are typically computed in the place of
the Fourier transform and, misleadingly, algorithms for computing Fourier series are referred
to as Fourier transform algorithms. In particular, the Fast Fourier Transform (FFT)
algorithm computes the Fourier series of a set of n uniform samples in Θ(n ln n) time
complexity for any4 n.
Implementing a FFT which works for any n is decidedly tricky. Fortunately, this hard work
has already been done for us. Matteo Frigo and Steven G. Johnson have written and
distributed an excellent implementation of the FFT, called the Fastest Fourier Transform in the
West (FFTW). This implementation is freely available on the web. Christophe Troestler has
written OCaml bindings for the FFTW library and also distributed them for free on the web.
We shall now describe the use of the FFTW library via these bindings, assuming both FFTW
and the bindings have already been installed.
A top-level which includes the functionality of the FFTW library may be created using:
$ ocamlmktop fftw.cma -o fftw.top
This top-level may then be used to compute Fourier series.
$ ./fftw.top
# open Bigarray;;
In the interests of efficiency, the FFTW library provides a function to generate partially
specialised functions to compute the FFT for a given n. The OCaml bindings to the FFTW
library present this functionality as a curried function Fftw.create. The resulting function
acts upon a big array. We shall use the following fourier function which acts upon normal
arrays of the type Complex.t array:
# let (fourier, ifourier) =
    let to_big a =
      let n = Array.length a in
      let big_a = Array1.create Fftw.complex c_layout n in
      Array.iteri (fun i z -> Array1.set big_a i z) a;
      big_a in
    let of_big big_a =
      let n = Array1.dim big_a in
      Array.init n (fun i -> Array1.get big_a i) in
    let fft norm dir a =
      let n, big_a = Array.length a, to_big a in
      of_big (Fftw.create ~normalize:norm dir n big_a) in
    (fft false Fftw.forward, fft true Fftw.backward);;
val fourier : Complex.t array -> Complex.t array = <fun>
val ifourier : Complex.t array -> Complex.t array = <fun>
4 Although many numerical texts aimed at scientists (e.g. the infamous Numerical Recipes [12]) mistakenly
claim that the Fast Fourier Transform can only be performed on integer power of two numbers of samples, this
is not true. Indeed, this has not been true for decades.
This fourier function will compute the Fourier series $v_s$ from $u_r$:
$$v_s = \sum_{r=0}^{n-1} u_r e^{-2\pi i r s/n}$$
The ifourier function will compute $u_r$ from the Fourier series $v_s$:
$$u_r = \frac{1}{n} \sum_{s=0}^{n-1} v_s e^{2\pi i r s/n}$$
Note the asymmetric normalisations, unconventional in the physical sciences.
In the interests of clarity, we shall use the following function to create a string representing a
complex number:
# let string_of_complex z = match z.Complex.re, z.Complex.im with
    0., 0. -> "0"
  | x, 0. -> string_of_float x
  | 0., y -> (string_of_float y)^"i"
  | x, y -> (string_of_float x)^" + "^(string_of_float y)^"i";;
val string_of_complex : Complex.t -> string = <fun>
and the following function to create a string representing an array of complex numbers:
# let string_of_complex_array a =
    let l = Array.to_list a in
    "[|"^(String.concat "; " (List.map string_of_complex l))^"|]";;
val string_of_complex_array : Complex.t array -> string = <fun>
Let us create variables to use as short-hand notations for values n = -1, z = 0 and p = 1 of
type Complex.t:
# let (n, z, p) =
    let p = Complex.one in
    (Complex.neg p, Complex.zero, p);;
val n : Complex.t = {Complex.re = -1.; Complex.im = -0.}
val z : Complex.t = {Complex.re = 0.; Complex.im = 0.}
val p : Complex.t = {Complex.re = 1.; Complex.im = 0.}
The following creates an array a containing (0,1,0,-1,0,1,0,-1,0,1,0,-1,0,1,0,-1), so that the
number of samples is n = 16:
# let a = [|z; p; z; n; z; p; z; n; z; p; z; n; z; p; z; n|];;
val a : Complex.t array = ...
# string_of_complex_array a;;
- : string = "[|0; 1.; 0; -1.; 0; 1.; 0; -1.; 0; 1.; 0; -1.; 0; 1.; 0; -1.|]"
[Figure 8.1: Fourier series of a discretely sampled sine wave, showing: a) the samples $u_r$,
$r \in [0,16)$, and Fourier series and b) the corresponding Fourier coefficients $\mathrm{Im}[v_s]$
computed numerically using FFTW.]
This discrete sampling is illustrated in figure 8.1a. We shall now calculate the functional form
of the Fourier series via the numerically computed series found using FFTW.
Taking the samples to be unit-separated samples, the Nyquist frequency is $\nu_{Ny} = \frac{1}{2}$. The signal
a may be considered to be a sampling of a real-valued sinusoid $A \sin(2\pi\nu r)$ with amplitude
$A = 1$ and frequency $\nu = \frac{1}{2}\nu_{Ny} = \frac{1}{4}$.
The FFT of a is:
# string_of_complex_array (fourier a);;
- : string = "[|0; 0; 0; 0; -8.i; 0; 0; 0; 0; 0; 0; 0; 8.i; 0; 0; 0|]"
Each element $v_s$ of this array may be taken to represent the frequency $\nu(s) = 2 s \nu_{Ny}/n = s/n$
and the amplitude of a Fourier component in the signal. As we are dealing with Fourier series,
the indices s are periodic over n. Consequently, we may productively interpret the second half
of this result as representing negative frequencies, as illustrated in figure 8.1b.
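In this case, the only non-zero elements are $v_4 = -8i$ and $v_{12} = 8i$. This shows that the signal
can be represented as the sum of two plane waves which, in fact, partly cancel to give a sine
wave:
$$\begin{aligned}
f(r) &= \frac{1}{n} \left( \sum_{0 \le s < \frac{1}{2}n} v_s e^{2\pi i r s/n} + \sum_{\frac{1}{2}n \le s < n} v_s e^{2\pi i r (s/n - 1)} \right) \\
&= \frac{1}{16} \left( v_4 e^{2\pi i r 4/16} + v_{12} e^{2\pi i r (12/16 - 1)} \right) \\
&= \frac{1}{16} \left( -8i e^{i\pi r/2} + 8i e^{-i\pi r/2} \right) \\
&= \sin\left(2\pi \frac{r}{4}\right)
\end{aligned}$$
as expected.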
Also, the inverse FFT of the FFT of a recovers the original a:
# string_of_complex_array (ifourier (fourier a));;
- : string = "[|0; 1.; 0; -1.; 0; 1.; 0; -1.; 0; 1.; 0; -1.; 0; 1.; 0; -1.|]"
The FFTW library and the OCaml bindings for FFTW allow the Fourier series of large,
possibly multidimensional data sets to be computed quickly and with relative ease.
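The Fourier series defined in this section can be checked without FFTW by evaluating the sums directly. The following is a minimal sketch of a naive discrete Fourier transform, which takes quadratic rather than n ln n time and is not the FFT algorithm. The names dft, naive_fourier and naive_ifourier are hypothetical and use the same sign and normalisation conventions as the fourier and ifourier functions above.

```ocaml
(* Sketch: a naive O(n^2) discrete Fourier transform using the sign and
   normalisation conventions of the text. All names are hypothetical. *)
let pi = 4. *. atan 1.

(* [sign] selects the sign of the exponent; [normalize] divides by n. *)
let dft sign normalize u =
  let n = Array.length u in
  let scale = if normalize then 1. /. float_of_int n else 1. in
  Array.init n (fun s ->
    let sum = ref Complex.zero in
    Array.iteri (fun r ur ->
      let theta =
        sign *. 2. *. pi *. float_of_int (r * s) /. float_of_int n in
      sum := Complex.add !sum (Complex.mul ur (Complex.polar 1. theta))) u;
    Complex.mul { Complex.re = scale; im = 0. } !sum)

let naive_fourier = dft (-1.) false
let naive_ifourier = dft 1. true

let () =
  let z = Complex.zero and p = Complex.one in
  let n = Complex.neg p in
  let a = [| z; p; z; n; z; p; z; n; z; p; z; n; z; p; z; n |] in
  let v = naive_fourier a in
  (* Only v.(4) and v.(12) differ noticeably from zero. *)
  Printf.printf "v4 = %g + %gi, v12 = %g + %gi\n"
    v.(4).Complex.re v.(4).Complex.im v.(12).Complex.re v.(12).Complex.im
```

A direct evaluation such as this is only practical for small n, but it is a convenient reference when testing code that calls an FFT library.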
Chapter 9
Simple Examples
In this chapter, we shall design and develop many simple functions. These derivations are
provided for readers who wish to expose themselves to simple examples of OCaml code before
attempting to decipher the more involved examples presented in chapter 10. Readers fluent
in OCaml may wish to skip this chapter.
9.1 Arithmetic
Many useful functions simply perform computations on the built-in number types. In this
section, we shall examine progressively more sophisticated numerical computations.
The Heaviside step function:
$$H(x) = \begin{cases} 0 & x < 0 \\ 1 & x \geq 0 \end{cases}$$
is a simple numerical function which may be implemented trivially in OCaml:
# let heaviside x = if x < 0. then 0. else 1.;;
val heaviside : float -> float = <fun>
The Kronecker δ-function:
$$\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$$
may be written as:
# let kronecker i j = if i = j then 1 else 0;;
val kronecker : 'a -> 'a -> int = <fun>
However, this implementation is polymorphic, which may be undesirable for two reasons:
Static type checking will not enforce the application of this function to integer types
only.
Polymorphism incurs run-time performance penalties and this function may be used in
performance-critical code.
Consequently, we may wish to restrict the type of this function by adding an explicit type
annotation:
# let kronecker (i : int) j = if i = j then 1 else 0;;
val kronecker : int -> int -> int = <fun>
Erroneous applications of this function to inappropriate types will now be caught at compile-
time by the OCaml compilers and this function will execute more quickly due to removal of
polymorphism.
Computations involving trigonometric functions may be performed using the sin, cos, tan,
asin (arcsin), acos (arccos), atan (arctan), atan2 (as for atan but accepting signed numerator
and denominator to determine which quadrant the result lies in), cosh, sinh and tanh
functions. For example, the constant π (= 3.14159265358979312...) is most easily calculated
as π = 4 arctan 1 using the atan function:
# let pi = 4. *. atan 1.;;
val pi : float = 3.14159265358979312
The conventional mathematical functions √x (sqrt x) and e^x (exp x) are required to compute
the Gaussian:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
Thus, a function to calculate the Gaussian may be written:
# let gaussian mu sigma x =
    exp (-. sqr (x -. mu) /. (2. *. sqr sigma)) /.
      (sqrt (2. *. pi) *. sigma);;
val gaussian : float -> float -> float -> float = <fun>
As this implementation of the Gaussian function is curried, a function representing a
probability distribution with a given μ and σ may be obtained by applying the first two arguments:
# let f = gaussian 1. 0.5;;
val f : float -> float = <fun>
# Array.init 21 (fun i -> f (float_of_int i /. 10.));;
- : float array =
[|0.107981933026376126; 0.157900316601788299; 0.221841669358911087;
  0.299454931271489755; 0.38837210996642596; 0.483941449038286731;
  0.579383105522965458; 0.666449205783599341; 0.73654028060664678;
  0.78208538795091187; 0.797884560802865406; 0.782085387950911759;
  0.73654028060664678; 0.666449205783599341; 0.57938310552296568;
  0.483941449038286731; 0.388372109966425849; 0.299454931271489755;
  0.221841669358911087; 0.157900316601788382; 0.107981933026376126|]
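The normalisation of the Gaussian can be sanity-checked numerically. The sketch below defines the helpers it needs locally: sqr is assumed to square a float, and integrate is a hypothetical trapezium-rule helper introduced only for this check.

```ocaml
(* Sketch: the Gaussian with its helpers defined locally.
   [sqr] is assumed to square a float; [integrate] is hypothetical. *)
let pi = 4. *. atan 1.
let sqr x = x *. x

let gaussian mu sigma x =
  exp (-. sqr (x -. mu) /. (2. *. sqr sigma)) /. (sqrt (2. *. pi) *. sigma)

(* Trapezium rule over [a, b] with n equal intervals. *)
let integrate f a b n =
  let h = (b -. a) /. float_of_int n in
  let sum = ref ((f a +. f b) /. 2.) in
  for i = 1 to n - 1 do
    sum := !sum +. f (a +. float_of_int i *. h)
  done;
  !sum *. h

let () =
  let f = gaussian 1. 0.5 in
  (* The peak value is 1 / (sigma sqrt(2 pi)); the area should be close to 1. *)
  Printf.printf "peak = %g, area = %g\n" (f 1.) (integrate f (-4.) 6. 1000)
```

The integration range spans ten standard deviations either side of the mean, so the truncated tails are negligible.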
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
Figure 9.1: The first seven rows of Pascal's triangle.
We have already seen the factorial function:
# let rec factorial n = if n < 1 then 1 else n * factorial (n - 1);;
The binomial coefficient is typically defined in mathematics as:
$$\binom{n}{r} = \frac{n!}{r!(n - r)!}$$
Naturally, this may be written directly in terms of the factorial function:
# let binomial n r = factorial n / (factorial r * factorial (n - r));;
val binomial : int -> int -> int = <fun>
# List.map (binomial 6) [0; 1; 2; 3; 4; 5; 6];;
- : int list = [1; 6; 15; 20; 15; 6; 1]
However, even when the result can be represented within machine precision, this naïve
implementation of the binomial function can fail because a subexpression (specifically n!) overflows
(here on a 32-bit platform, where the int type has 31 bits):
# List.map (binomial 13) [0; 1; 2; 3; 4; 5; 6];;
- : int list = [1; 0; -2; -9; -24; -44; -59]
In this case, the correct result $\binom{13}{6} = 1716$ is well within machine precision and the erroneous
results due to overflowing arithmetic are likely to be deemed unacceptable.
This problem with numerical precision is most easily circumvented by resorting to computing
Pascal's triangle, where each number in the triangle is the sum of its two "parents" from the
row before (illustrated in figure 9.1). This may be represented as the recurrence relation:
$$\binom{n}{r} = \begin{cases} 1 & r = 0 \\ 1 & r = n \\ \binom{n-1}{r-1} + \binom{n-1}{r} & \text{otherwise} \end{cases}$$
Computing binomial coefficients using Pascal's triangle is more robust than computing via
factorials because the numbers involved now increase monotonically, only overflowing if the
result overflows.
The recurrence relation may be implemented as a recursive function:
# let rec binomial n r =
    if r = 0 || r = n then 1 else
      binomial (n - 1) r + binomial (n - 1) (r - 1);;
val binomial : int -> int -> int = <fun>
However, the double recursion and lack of reuse of previous results leads to an asymptotic
algorithmic time-complexity of $O\left(\binom{n}{r}\right)$. Although this implementation is numerically robust,
its complexity may be unacceptable.
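The cost of the double recursion can be observed by counting calls. This is only an illustrative sketch: the names calls and counted_binomial are hypothetical, and the counter is a global reference purely for simplicity.

```ocaml
(* Sketch: count the calls made by the doubly-recursive binomial.
   [calls] and [counted_binomial] are hypothetical names. *)
let calls = ref 0

let rec counted_binomial n r =
  incr calls;
  if r = 0 || r = n then 1 else
    counted_binomial (n - 1) r + counted_binomial (n - 1) (r - 1)

let () =
  let result = counted_binomial 10 5 in
  (* Each leaf of the call tree contributes 1 to the result, so there are
     [result] leaves and [result - 1] internal calls. *)
  Printf.printf "result = %d, calls = %d\n" result !calls
  (* prints: result = 252, calls = 503 *)
```

The number of calls therefore grows in proportion to the result itself, which is why reusing previous results pays off.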
The complexity may be improved by reusing previous results. This can be achieved by
computing rows of Pascal's triangle up to row n and then extracting the rth element. Such an
algorithm may be implemented using a list to represent each row of the triangle:
# let binomial n r =
    let rec aux n =
      let rec aux2 = function
          [] -> []
        | [h] -> [h]
        | h1::h2::t -> (h1 + h2) :: aux2 (h2 :: t) in
      if n = 0 then [1] else aux2 (0 :: aux (n - 1)) in
    List.nth (aux n) r;;
Alternatively, an equivalent function may be written using arrays, by mutating a single array
in-place:
# let binomial n r =
    let b = Array.init (n + 1) (fun i -> if i = 0 then 1 else 0) in
    for i = 1 to n do
      b.(i) <- 1;
      for j = i - 1 downto 1 do
        b.(j) <- b.(j) + b.(j - 1);
      done;
    done;
    b.(r);;
val binomial : int -> int -> int = <fun>
Both the list- and array-based implementations can compute the previous example without
overflowing:
# List.map (binomial 13) [0; 1; 2; 3; 4; 5; 6];;
- : int list = [1; 13; 78; 286; 715; 1287; 1716]
Although the asymptotic time-complexity has been worsened from O(n) for the factorial-based
implementation to O(n²) for the Pascal's triangle implementations, this complexity is
still acceptable because n is limited to small values by the rapid growth of the result. Also, note
that the asymptotic space-complexity is O(n) for the implementations based upon Pascal's
triangle.
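The in-place construction of a row of Pascal's triangle can be checked against two well-known identities. The sketch below is a restatement of the array-based approach under the hypothetical name pascal_row, which returns the whole row rather than a single element.

```ocaml
(* Sketch: build one row of Pascal's triangle in place and check it
   against two well-known identities. [pascal_row] is a hypothetical name. *)
let pascal_row n =
  let b = Array.make (n + 1) 0 in
  b.(0) <- 1;
  for i = 1 to n do
    b.(i) <- 1;
    (* Update right-to-left so that b.(j - 1) still holds the previous row. *)
    for j = i - 1 downto 1 do
      b.(j) <- b.(j) + b.(j - 1)
    done
  done;
  b

let binomial n r = (pascal_row n).(r)

let () =
  let row = pascal_row 13 in
  (* The entries of row n sum to 2^n and the row is symmetric. *)
  assert (Array.fold_left (+) 0 row = 1 lsl 13);
  assert (row.(6) = row.(7));
  Printf.printf "%d\n" (binomial 13 6)   (* prints 1716 *)
```

Only one array of n + 1 integers is allocated, which is the O(n) space-complexity noted above.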
The recurrence-relation-based implementation may also be made more efficient by simply
storing previous results. This can be achieved by storing results in a hash table which maps
2-tuples (n, r) onto answers $\binom{n}{r}$:
# let rec binomial =
    let memory = Hashtbl.create 1 in
    fun n -> fun r ->
      if r = 0 || r = n then 1 else
        try Hashtbl.find memory (n, r)
        with Not_found ->
          let ans = binomial (n - 1) r + binomial (n - 1) (r - 1) in
          Hashtbl.add memory (n, r) ans;
          ans;;
Storing and recalling previously computed results in this manner is known as memoizing (see
section A.7). This memoized implementation of the binomial function only computes results
as necessary (lazy evaluation). Consequently, the asymptotic time-complexity of computing
$\binom{n}{c}$ for some constant c is now O(n) rather than O(n²), as for the list- and array-based
implementations. However, the asymptotic space-complexity has increased from O(n) to O(n²)
in the memoized implementation.
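The growth of the memoization table can be inspected directly if the table is made visible. In the following sketch the table memory is placed at the top level purely for illustration, whereas the implementation above keeps it private to the function; Hashtbl.find_opt is used in place of exception handling and assumes a reasonably recent standard library.

```ocaml
(* Sketch: the memoized binomial with its table exposed so that the number
   of stored results can be inspected. [memory] is top-level here purely
   for illustration. *)
let memory : (int * int, int) Hashtbl.t = Hashtbl.create 16

let rec binomial n r =
  if r = 0 || r = n then 1 else
    match Hashtbl.find_opt memory (n, r) with
    | Some ans -> ans
    | None ->
        let ans = binomial (n - 1) r + binomial (n - 1) (r - 1) in
        Hashtbl.add memory (n, r) ans;
        ans

let () =
  Printf.printf "%d\n" (binomial 30 15);   (* prints 155117520 *)
  Printf.printf "stored results: %d\n" (Hashtbl.length memory)
```

Because the table persists between calls, later computations reuse earlier results at the cost of the memory they occupy.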
In addition to arithmetic1, many useful functions are related to data structures.
9.2 List related
In this section, we shall examine a variety of functions which act upon lists. These functions
are often polymorphic and typically make use of either recursion and pattern matching or
higher-order functions. These concepts can all be very useful but are rarely seen in current
scientific programs.
9.2.1 count
The ability to count the number of elements in a list for which a predicate returns true is
sometimes useful. A function to perform this task may be written by accumulating the count,
folding a test over each element in turn. As addition is both associative and commutative and
fold_right is not tail recursive, this task is best performed using the fold_left function:
# let list_count pred l =
    List.fold_left (fun count e -> count + if pred e then 1 else 0) 0 l;;
val list_count : ('a -> bool) -> 'a list -> int = <fun>
For example, the following counts the number of elements which are exactly divisible by three
(0, 3, 6 and 9):
# list_count (fun x -> x mod 3 = 0) [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int = 4
As a polymorphic function, list_count may be applied to lists of any type. For example, this
counts the number of lists in the given list which are greater than or equal to2 the list
[2; 3; 4] according to the built-in lexicographic ordering:
# list_count (( <= ) [2; 3; 4]) [[1; 2; 3]; [2; 3; 4]; [3; 4; 5]];;
- : int = 2
The actual lists counted are easily extracted by applying the List.filter function instead:
# List.filter (( <= ) [2; 3; 4]) [[1; 2; 3]; [2; 3; 4]; [3; 4; 5]];;
- : int list list = [[2; 3; 4]; [3; 4; 5]]
1 Pun intended.
2 As ( <= ) a b means a <= b.
The list_count function applies the given predicate function to all n elements in the given
list. Consequently, the asymptotic complexity of this function in terms of the number of
predicate tests performed n is Θ(n).
9.2.2 position
The ability to prepend elements to lists indefinitely makes them the ideal data structure for
many operations where the length of the output depends upon the input. Let us examine a
function which composes an arbitrary length list as the result.
A function like list_count but which returns a list of the indices of the matching elements can
also be useful. This functionality can be obtained by folding with an accumulator containing
both the current index i and the resulting index list is:
# let list_position pred l =
    let aux (i, is) e = i + 1, if pred e then i :: is else is in
    snd (List.fold_left aux (0, []) l);;
val list_position : ('a -> bool) -> 'a list -> int list = <fun>
As the fold returns a 2-tuple containing the list length and the list of indices, the snd function
from the Pervasives module is used to extract the second element of the 2-tuple (the result).
The auxiliary function aux prepends the current index i onto the result is if the predicate
pred matches, and increments the current index i.
For example, the following extracts the list of indices of the elements which are exactly divisible
by three:
# list_position (fun x -> x mod 3 = 0) [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list = [9; 6; 3; 0]
Note that the indices are returned in reverse order because each index is prepended onto the
accumulated result.
Like list_count, the list_position function is useful for general purpose list dissection.
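Where the indices are wanted in ascending order, the result of the fold can simply be reversed. A minimal sketch follows, in which list_position_fwd is a hypothetical name.

```ocaml
(* Sketch: indices of matching elements in ascending order.
   [list_position_fwd] is a hypothetical name. *)
let list_position pred l =
  let aux (i, is) e = i + 1, if pred e then i :: is else is in
  snd (List.fold_left aux (0, []) l)

(* The fold prepends, so one List.rev restores ascending order. *)
let list_position_fwd pred l = List.rev (list_position pred l)

let () =
  list_position_fwd (fun x -> x mod 3 = 0) [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]
  |> List.iter (Printf.printf "%d ");   (* prints 0 3 6 9 *)
  print_newline ()
```

Both the fold and the reversal are tail recursive, so this variant remains suitable for long lists.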
9.2.3 mapi
In addition to the conventional higher-order functions map and rev_map, analogous functions
which present the integer index as well as the value of each element can be useful, i.e. to
find {f(0, l_0), f(1, l_1), ..., f(n-1, l_{n-1})}. These functions are conventionally called mapi and
rev_mapi, the former currently being provided for arrays (described in section 3.2) but neither
is provided for lists. For example, the Array.mapi function may be used to convert an array
of values into an array of index-value pairs:
# Array.mapi (fun i e -> i, e) [|'a'; 'c'; 'e'; 'g'; 'i'|];;
- : (int * char) array = [|(0, 'a'); (1, 'c'); (2, 'e'); (3, 'g'); (4, 'i')|]
The mapi function for lists could be written using pattern matching, with an auxiliary function
to accumulate the current index:
# let list_mapi f l =
    let rec aux n = function
        h::t -> let h = f n h in h :: aux (n + 1) t
      | [] -> [] in
    aux 0 l;;
val list_mapi : (int -> 'a -> 'b) -> 'a list -> 'b list = <fun>
This implementation of the list_mapi function uses a 2-argument nested auxiliary function
aux which accepts the current index n and the remaining list. The aux function is initially
called with the arguments 0 and the input list l. The aux function repeatedly decapitates
the remaining list, applying the given function f to the current index n and the head h of the
remaining list and prepending the resulting value f n h onto the list formed by recursing on
the tail t, until no elements remain. Note that, by using the form let h = f n h in, this
function can guarantee to apply the given function f in forwards order to each element in the
given list, i.e. the first application of f is to the first element of the given list.
This list_mapi function provides the same functionality for lists as the Array.mapi function
does for arrays. For example:
# list_mapi (fun i e -> i, e) ['a'; 'c'; 'e'; 'g'; 'i'];;
- : (int * char) list = [(0, 'a'); (1, 'c'); (2, 'e'); (3, 'g'); (4, 'i')]
However, the list_mapi function is not tail-recursive. A tail-recursive alternative may be
written by composing the result in reverse.
A rev_mapi function for lists could be written using pattern matching, with an auxiliary
function to accumulate the current index:
# let list_rev_mapi f l =
    let rec aux n accu = function
        h::t -> aux (n + 1) (f n h :: accu) t
      | [] -> accu in
    aux 0 [] l;;
val list_rev_mapi : (int -> 'a -> 'b) -> 'a list -> 'b list = <fun>
This implementation of the list_rev_mapi function uses a 3-argument nested auxiliary
function aux which accepts the current index n, the accumulated result accu and the remaining
list. This auxiliary function repeatedly decapitates the remaining list, recursing with three
arguments: the incremented index n+1, the result f n h of applying the given function f to the
current index and the current element prepended onto the accumulator accu, and the tail of
the remainder. The way in which the aux function recurses is important in several ways:
By repeatedly decapitating the input and prepending the result onto the accumulator,
the accumulator is built in reverse order.
As the function application f n h appears in an argument to the recursive call, this
application of the function f must be evaluated before recursing3 and, therefore, the first
application of the given function f can again be guaranteed to be to the first element of
the given list. Hence, there is no need to use a let h = f n h in construct to guarantee
application order, as in the previous example.
This implementation of the list_rev_mapi function is tail recursive because the result
of the recursive call is not acted upon. Consequently, the list_rev_mapi function will
be considerably faster than the list_mapi function when applied to large lists.
The list_rev_mapi function produces the reverse of the result of the list_mapi function:
# list_rev_mapi (fun i e -> i, e) ['a'; 'c'; 'e'; 'g'; 'i'];;
- : (int * char) list = [(4, 'i'); (3, 'g'); (2, 'e'); (1, 'c'); (0, 'a')]
Interestingly, an equivalent list_rev_mapi function may be rewritten in terms of a fold, by
accumulating a 2-tuple containing the current index n and the resulting list l2:
# let list_rev_mapi f l =
    snd (List.fold_left (fun (n, l2) e -> n + 1, (f n e :: l2)) (0, []) l);;
val list_rev_mapi : (int -> 'a -> 'b) -> 'a list -> 'b list = <fun>
However, an equivalent list_mapi function cannot be written in terms of a fold without
performing 2 traversals of the input list. Specifically, the functionality may be obtained
either by reversing the result of list_rev_mapi or by using fold_right, in which case the
accumulated index must count down from the length of the list which can only be obtained
by explicitly counting the number of elements in the list using length:
# let list_mapi f l = List.rev (list_rev_mapi f l);;
val list_mapi : (int -> 'a -> 'b) -> 'a list -> 'b list = <fun>
# let list_mapi f l =
    let aux e (n, l) = (n - 1, (f (n - 1) e :: l)) in
    snd (List.fold_right aux l (List.length l, []));;
val list_mapi : (int -> 'a -> 'b) -> 'a list -> 'b list = <fun>
Thus, for small input lists, list_mapi is best written in terms of pattern matching.
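For large input lists, the two-traversal form built from list_rev_mapi has the advantage of being tail recursive. The sketch below exercises it on a long list. It assumes an OCaml release whose standard library provides List.init, which postdates the original text; the same release also provides a built-in List.mapi.

```ocaml
(* Sketch: a tail-recursive list_mapi built by reversing list_rev_mapi.
   List.init is assumed to be available in the standard library. *)
let list_rev_mapi f l =
  let rec aux n accu = function
    | h :: t -> aux (n + 1) (f n h :: accu) t
    | [] -> accu in
  aux 0 [] l

let list_mapi f l = List.rev (list_rev_mapi f l)

let () =
  let big = List.init 1_000_000 (fun i -> i) in
  (* Both traversals are tail recursive, so a long list is not a problem. *)
  let sums = list_mapi (fun i e -> i + e) big in
  Printf.printf "%d\n" (List.nth sums 999_999)   (* prints 1999998 *)
```

The extra traversal doubles the work but keeps stack usage constant.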
9.2.4 chop
Consider a function to chop a list into two lists at a given index i. This can be achieved by
recursively chopping the tail at index i - 1 until i = 0 and the 2-tuple of the empty list (the
front list) and the remaining tail (the back list) is returned. When completed, the recursive
calls have the decapitated head prepended onto the front list fr:
# let rec list_chop i l = match i, l with
    | 0, l -> ([], l)
    | i, h::t -> (fun (fr, ba) -> h :: fr, ba) (list_chop (i - 1) t)
    | _ -> invalid_arg "list_chop";;
val list_chop : int -> 'a list -> 'a list * 'a list = <fun>
3 Formally, this can be attributed to the fact that OCaml is a strict language, meaning that function
arguments are always evaluated before the function application takes place. In contrast, some other languages,
known as lazy languages, only evaluate expressions when their result is required.
As this implementation of the chop function only traverses the list to the given index i, the
algorithm is Θ(i). Also, this implementation is not tail recursive, as the λ-function acts upon
the result of the recursive call. A tail-recursive alternative may be written using an auxiliary
function which accumulates the front list in reverse order, applying rev to obtain the correct
result. As a function which returns the front list in reverse order can be useful when defining
other functions, we shall separate this into a list_rev_chop function:
# let list_rev_chop i l =
    let rec aux i fr ba = match i, fr, ba with
        0, fr, ba -> (fr, ba)
      | i, fr, h::t -> aux (i - 1) (h :: fr) t
      | _ -> invalid_arg "list_rev_chop" in
    aux i [] l;;
val list_rev_chop : int -> 'a list -> 'a list * 'a list = <fun>
A tail-recursive function equivalent to list_chop may then be written in terms of list_rev_chop:
# let list_chop_tr i l =
    (fun (fr, ba) -> List.rev fr, ba) (list_rev_chop i l);;
val list_chop_tr : int -> 'a list -> 'a list * 'a list = <fun>
For example, chopping the first five elements off the list {0 ... 9} gives the lists {0 ... 4} and
{5 ... 9}:
# list_chop_tr 5 [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list * int list = ([0; 1; 2; 3; 4], [5; 6; 7; 8; 9])
The list_rev_chop function supplies the first list in reverse order:
# list_rev_chop 5 [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list * int list = ([4; 3; 2; 1; 0], [5; 6; 7; 8; 9])
As a tail-recursive function, list_chop_tr will be considerably faster than list_chop for
large i.
As we shall see, these functions can be used in the creation of several, more sophisticated
functions.
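The behaviour of the chop functions at the boundaries is worth checking before building upon them. The following sketch restates the tail-recursive definitions and exercises the edge cases; chop_opt is a hypothetical wrapper which returns None rather than raising an exception.

```ocaml
(* Sketch: the tail-recursive chop with its edge cases exercised.
   [chop_opt] is a hypothetical wrapper. *)
let list_rev_chop i l =
  let rec aux i fr ba = match i, ba with
    | 0, _ -> (fr, ba)
    | _, h :: t -> aux (i - 1) (h :: fr) t
    | _, [] -> invalid_arg "list_rev_chop" in
  aux i [] l

let list_chop_tr i l =
  let fr, ba = list_rev_chop i l in
  List.rev fr, ba

(* Return None rather than raising when the index is out of range. *)
let chop_opt i l =
  try Some (list_chop_tr i l) with Invalid_argument _ -> None

let () =
  assert (list_chop_tr 0 [1; 2; 3] = ([], [1; 2; 3]));
  assert (list_chop_tr 3 [1; 2; 3] = ([1; 2; 3], []));
  assert (chop_opt 4 [1; 2; 3] = None);
  (* A negative index never reaches 0, so it also runs off the end. *)
  assert (chop_opt (-1) [1; 2; 3] = None)
```

Chopping at the length of the list is valid and yields an empty back list, whereas any index beyond it, or below zero, is rejected.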
9.2.5 dice
Consider a function called list_dice which splits a list containing nm elements into n lists of
m elements each. This function may be written in terms of the list_chop function developed
in section 9.2.4.
# let rec list_dice m l =
    match list_chop m l with
      (l, []) -> [l]
    | (l1, l2) -> l1 :: list_dice m l2;;
val list_dice : int -> 'a list -> 'a list list = <fun>
For example, the list_dice function may be used to dice the list {1 ... 9} into lists containing
3 elements each:
# list_dice 3 [1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list list = [[1; 2; 3]; [4; 5; 6]; [7; 8; 9]]
This function could be used, for example, to convert a stream of numbers into 3D vectors
represented by lists containing three elements.
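Note that list_dice raises Invalid_argument (from list_chop) when the length of the input is not a multiple of m. Where a short final chunk is acceptable, a more lenient variant might be sketched as follows. The name list_dice_lenient and its behaviour on the empty list are assumptions made for this illustration, not part of the original text.

```ocaml
(* Hypothetical variant of list_dice: the final chunk may hold fewer than m
   elements, and the empty list yields no chunks at all. *)
let list_dice_lenient m l =
  if m <= 0 then invalid_arg "list_dice_lenient";
  (* acc: finished chunks, newest first; chunk: current chunk, reversed;
     k: number of elements in the current chunk. *)
  let rec aux acc chunk k = function
    | [] ->
        (match chunk with
         | [] -> List.rev acc
         | _ -> List.rev (List.rev chunk :: acc))
    | h :: t ->
        if k = m then aux (List.rev chunk :: acc) [h] 1 t
        else aux acc (h :: chunk) (k + 1) t in
  aux [] [] 0 l
```

This version is tail recursive, as both the chunks and the elements within each chunk are accumulated in reverse.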
9.2.6 replace
The ability to replace the ith element of a list is sometimes useful. As the ith element of a
list may be reached by traversing the previous i elements, this task can be done in Θ(i) time
complexity. A function to perform this task may be written in terms of the list_rev_chop
function (described in section 9.2.4) by replacing the head of the back list before appending
the front list in reverse order using the rev_append function:
# let list_replace x i l = match list_rev_chop i l with
    fr, ba -> List.rev_append fr (x :: List.tl ba);;
val list_replace : 'a -> int -> 'a list -> 'a list = <fun>
For example, the following replaces the 6th element of the given list (remember, indices
conventionally start at zero in OCaml) with the number 100:
# list_replace 100 5 [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list = [0; 1; 2; 3; 4; 100; 6; 7; 8; 9]
More sophisticated functions may also be written in terms of the chop function.
9.2.7 sub
Another function found in the Array module but not in the List module is the sub function.
This function extracts a subset of consecutive elements, a sub-array in the context of arrays.
A tail-recursive equivalent for lists may be written:
# let list_sub i j l =
    fst (list_chop_tr (j - i) (snd (list_rev_chop i l)));;
val list_sub : int -> int -> 'a list -> 'a list = <fun>
This implementation takes the back list (using snd) after chopping at i and then chops this
list at j - i, giving the result as the front list (extracted using the fst function).
For example, the sublist with indices [3,7) of the list {0 ... 9} is the list {3 ... 6}:
# list_sub 3 7 [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list = [3; 4; 5; 6]
Just as Array.sub can be useful, so this list_sub function can come in handy in many
different circumstances.
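The two functions use different argument conventions: list_sub takes the half-open index range [i, j), whereas Array.sub takes a start index and a length. The following sketch, which repeats the relevant definitions, cross-checks one against the other. The name list_sub_via_array is hypothetical.

```ocaml
let list_rev_chop i l =
  let rec aux i fr ba = match i, fr, ba with
    | 0, fr, ba -> (fr, ba)
    | i, fr, h :: t -> aux (i - 1) (h :: fr) t
    | _ -> invalid_arg "list_rev_chop" in
  aux i [] l

let list_chop_tr i l =
  let fr, ba = list_rev_chop i l in
  (List.rev fr, ba)

(* Elements with indices i <= k < j. *)
let list_sub i j l = fst (list_chop_tr (j - i) (snd (list_rev_chop i l)))

(* Hypothetical cross-check: the same slice obtained through Array.sub,
   which expects the start index i and the length j - i. *)
let list_sub_via_array i j l =
  Array.to_list (Array.sub (Array.of_list l) i (j - i))

let () =
  let l = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9] in
  assert (list_sub 3 7 l = list_sub_via_array 3 7 l)
```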
9.2.8 extract
A function similar to the list_replace function (described in section 9.2.6) but which extracts
the ith element of a list, giving a 2-tuple containing the element and a list without that element,
can also be useful. As for list_replace, the list_extract function may be written in terms
of the list_rev_chop function:
# let list_extract i l = match list_rev_chop i l with
      fr, h::t -> h, List.rev_append fr t
    | _ -> invalid_arg "list_extract";;
val list_extract : int -> 'a list -> 'a * 'a list = <fun>
For example, extracting the element with index five from the list {0 ... 9} gives the element 5
and the list {0 ... 4, 6 ... 9}:
# list_extract 5 [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int * int list = (5, [0; 1; 2; 3; 4; 6; 7; 8; 9])
This function has many uses, such as randomizing the order of elements in lists.
9.2.9 randomize
This function can be used to randomize the order of the elements in a list, by repeatedly
extracting randomly chosen elements to build up a new list:
# let list_randomize l =
    let extract_rand l = list_extract (Random.int (List.length l)) l in
    let rec aux accu = function
        [] -> accu
      | l -> (fun (h, t) -> aux (h :: accu) t) (extract_rand l) in
    aux [] l;;
val list_randomize : 'a list -> 'a list = <fun>
This implementation contains a nested function extract_rand which extracts a random
element from the given list and an auxiliary function aux which repeatedly extracts random
elements, prepending them onto an accumulator to build up a randomized list. The aux
function uses a λ-function to prepend the extracted element onto the accumulator and recurse.
Although the recursive call to aux is within this λ-function, the result is not acted upon and,
therefore, this implementation of the list_randomize function is tail recursive.
For example, applying the list_randomize function to the list {0 ... 9} gives a random
permutation containing the elements 0 ... 9 in a random order:
# list_randomize [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list = [6; 9; 8; 5; 1; 0; 3; 2; 7; 4]
This function is useful in many situations. For example, the programs used to measure the
performance of various algorithms presented in this book used this list_randomize function
to evaluate the necessary tests in a random order, to reduce systematic effects of garbage
collection.
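As each step of list_randomize computes the length of the remaining list and then traverses it to extract an element, the total work grows quadratically with the length of the input. For long lists, a linear-time alternative might be sketched by shuffling an intermediate array in place (the Fisher-Yates shuffle). The name list_shuffle is hypothetical and this is a swapped-in technique, not the approach used in the text.

```ocaml
(* Hypothetical alternative to list_randomize: copy the list into an array,
   shuffle the array in place and convert the result back into a list. *)
let list_shuffle l =
  let a = Array.of_list l in
  for i = Array.length a - 1 downto 1 do
    (* Choose j uniformly from 0 .. i and swap elements i and j. *)
    let j = Random.int (i + 1) in
    let tmp = a.(i) in
    a.(i) <- a.(j);
    a.(j) <- tmp
  done;
  Array.to_list a
```

The result contains exactly the elements of the input, so sorting the output recovers the sorted input.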
9.2.10 permute
The ability to compute all permutations of a list is sometimes useful. Permutations may be
computed using a simple recurrence relation, by inserting the head of a list into all positions
of the permutations of the tail of the list. Thus, a function to permute a list is most easily
written in terms of a function which inserts the given element into the given n-element list at
all n + 1 possible positions:
# let rec distribute e = function
      (h :: t) as l -> (e :: l) :: (List.map (fun x -> h :: x) (distribute e t))
    | [] -> [[e]];;
val distribute : 'a -> 'a list -> 'a list list = <fun>
This distribute function builds its result from two parts: the list obtained by prepending the
element e onto the given list l, followed by the lists obtained by prepending the head h onto
each of the distributions of the element e over the tail t of the given list.
For example, the following inserts the element 3 at each of the three possible positions in the
list [1; 2]:
# distribute 3 [1; 2];;
- : int list list = [[3; 1; 2]; [1; 3; 2]; [1; 2; 3]]
A function to permute a given list may then be written:
# let rec permute = function
      e :: rest -> List.flatten (List.map (distribute e) (permute rest))
    | [] -> [[]];;
val permute : 'a list -> 'a list list = <fun>
This permute function then operates by distributing the head of the given list over the
permutations of the tail.
For example, there are 3! = 6 permutations of three values:
# permute [1; 2; 3];;
- : int list list =
[[1; 2; 3]; [2; 1; 3]; [2; 3; 1]; [1; 3; 2]; [3; 1; 2]; [3; 2; 1]]
The permute function has many uses, including combinatorial optimisation.
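As a quick sanity check on the recurrence, the following sketch repeats the two definitions and verifies that a list of four distinct elements yields 4! = 24 distinct permutations, each a rearrangement of the input. The factorial helper is introduced only for this illustration.

```ocaml
let rec distribute e = function
  | (h :: t) as l -> (e :: l) :: List.map (fun x -> h :: x) (distribute e t)
  | [] -> [[e]]

let rec permute = function
  | e :: rest -> List.flatten (List.map (distribute e) (permute rest))
  | [] -> [[]]

(* Helper for the check only. *)
let rec factorial n = if n <= 1 then 1 else n * factorial (n - 1)

let () =
  let ps = permute [1; 2; 3; 4] in
  (* n! permutations in total ... *)
  assert (List.length ps = factorial 4);
  (* ... all of them distinct ... *)
  assert (List.length (List.sort_uniq compare ps) = factorial 4);
  (* ... and each one a rearrangement of the input. *)
  assert (List.for_all (fun p -> List.sort compare p = [1; 2; 3; 4]) ps)
```

As the number of permutations grows factorially, this function is only practical for short lists.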
9.2.11 Run-length encoding
A transformation called run-length encoding, often used for data compression, converts a list
x_i into a list of 2-tuples (x, n)_i representing n_i consecutive repeats of each x_i. A function to
perform this task using a given comparison function may be written:
# let rle_eq eq l =
    let rec aux l2 x n = function
        [] -> List.rev ((x, n) :: l2)
      | h::t when eq x h -> aux l2 x (n + 1) t
      | h::t -> aux ((x, n) :: l2) h 1 t in
    match l with [] -> [] | h::t -> aux [] h 1 t;;
val rle_eq : ('a -> 'a -> bool) -> 'a list -> ('a * int) list = <fun>
The body of this rle_eq function either maps the empty list onto the empty list or applies
the nested auxiliary function aux with an empty accumulator, the head of the input list, one
(signifying one repeat of the head) and the tail of the input list as the remainder. The aux
function then repeatedly decapitates the remaining list, either incrementing the repeat counter
if the new head is the same as the previous head, creating a new 2-tuple (h, 1) if the new head
is different or returning the reverse of the accumulator if there are no remaining elements,
i.e. the remaining list is empty.
For example, the following run-length encodes an int list by comparing elements using the
polymorphic equality operator =:
# rle_eq (=) [1; 1; 1; 2; 2; 3; 4; 5; 6; 6; 7; 7; 7];;
- : (int * int) list =
[(1, 3); (2, 2); (3, 1); (4, 1); (5, 1); (6, 2); (7, 3)]
Clearly, many useful functions can be written in a functional style. However, functions over
strings and, particularly, over arrays are often better suited to an imperative style.
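The encoding is lossless, which can be illustrated by a sketch of the inverse transformation. The decoder rle_decode is a hypothetical function, not part of the original text. It expands each 2-tuple back into its run and is tail recursive.

```ocaml
(* Encoder repeated from the text so that the sketch is self-contained. *)
let rle_eq eq l =
  let rec aux l2 x n = function
    | [] -> List.rev ((x, n) :: l2)
    | h :: t when eq x h -> aux l2 x (n + 1) t
    | h :: t -> aux ((x, n) :: l2) h 1 t in
  match l with [] -> [] | h :: t -> aux [] h 1 t

(* Hypothetical decoder: expand each (x, n) into n copies of x. *)
let rle_decode pairs =
  (* Prepend n copies of x onto the reversed accumulator. *)
  let rec repeat acc x n = if n <= 0 then acc else repeat (x :: acc) x (n - 1) in
  List.rev (List.fold_left (fun acc (x, n) -> repeat acc x n) [] pairs)

let () =
  let l = [1; 1; 1; 2; 2; 3; 4; 5; 6; 6; 7; 7; 7] in
  (* Decoding the encoded list recovers the original list. *)
  assert (rle_decode (rle_eq (=) l) = l)
```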
9.3 String related
Programs are often required to produce human-readable output. Many string-related functions
can be used to simplify the task of creating such output. In this section, we shall describe
the conventional factoring of string-related functions for printing data structures and develop
a few such functions.
In the remainder of this chapter, we shall use a fold right function for strings not supplied by
the Core library:
# let string_fold_right f s x =
    let r = ref x in
    for i = String.length s - 1 downto 0 do
      r := f s.[i] !r
    done;
    !r;;
val string_fold_right : (char -> 'a -> 'a) -> string -> 'a -> 'a = <fun>
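As a brief illustration of how such a fold may be used, the following sketch defines two small helpers in terms of string_fold_right. The names explode and count_char are hypothetical and are introduced only for this example.

```ocaml
let string_fold_right f s x =
  let r = ref x in
  for i = String.length s - 1 downto 0 do
    r := f s.[i] !r
  done;
  !r

(* Hypothetical helper: the characters of a string, in order, as a list. *)
let explode s = string_fold_right (fun c l -> c :: l) s []

(* Hypothetical helper: the number of occurrences of c in s. *)
let count_char c s =
  string_fold_right (fun d n -> if c = d then n + 1 else n) s 0
```

As the fold runs from the last character to the first, prepending each character yields the list in the original order without a final reversal.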
Printing and reading data structures as strings is often accomplished by factoring the
conversion into separate functions for:
1. Converting the individual parts of the data structure to and from strings.
2. Converting a whole data structure to and from a string.
3. Printing or reading the string using the usual IO functions (which were described in
chapter 5).
We shall now demonstrate the development of such functions.
The ability to print a list can often be useful. In the interests of consistency, the output
may be productively written using OCaml syntax. In order to write a polymorphic function,
capable of converting any list into a string, a function is required to convert an individual
element. Thus, a string_of_list function is most usefully implemented as a higher-order function:
# let string_of_list string_of l =
    "[" ^ String.concat "; " (List.map string_of l) ^ "]";;
val string_of_list : ('a -> string) -> 'a list -> string = <fun>
An int list may then be converted into a string by supplying the string_of_int function
to the string_of_list function:
# string_of_list string_of_int [1; 2; 3; 4; 5];;
- : string = "[1; 2; 3; 4; 5]"
Naturally, an equivalent string_of_array function is easily defined. We shall now examine
a slightly more sophisticated example.
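One possible definition of the string_of_array function mentioned above is sketched here, assuming that the output should follow OCaml's array literal syntax. As both functions are higher order, they may also be nested to print lists of lists.

```ocaml
let string_of_list string_of l =
  "[" ^ String.concat "; " (List.map string_of l) ^ "]"

(* A sketch of the array equivalent, using the [| ... |] array syntax. *)
let string_of_array string_of a =
  "[|" ^ String.concat "; " (Array.to_list (Array.map string_of a)) ^ "|]"

let () =
  print_endline (string_of_array string_of_float [|1.; 2.5|]);
  (* Nesting the higher-order function prints a list of lists. *)
  print_endline (string_of_list (string_of_list string_of_int) [[1; 2]; [3]])
```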
9.3.2 DNA sequence IO
The following variant type may be used to represent the set of DNA nucleotides:
# type nucleotide = Adenine | Cytosine | Guanine | Thymine;;
type nucleotide = Adenine | Cytosine | Guanine | Thymine
A DNA sequence may then be represented by the type:
# type sequence = nucleotide list;;
type sequence = nucleotide list
In order to write a function capable of reading DNA sequences, we begin by writing a function
capable of reading a single nucleotide:
# let nucleotide_of_char = function
      'A' -> Adenine | 'C' -> Cytosine | 'G' -> Guanine | 'T' -> Thymine
    | _ -> invalid_arg "nucleotide_of_char";;
val nucleotide_of_char : char -> nucleotide = <fun>
For example, the Guanine constructor is the representation of the nucleotide corresponding to
the character G:
# nucleotide_of_char 'G';;
- : nucleotide = Guanine
This function may then be folded over a string to build up a list of nucleotides, converting a
string into a DNA sequence:
# let sequence_of_string s =
    string_fold_right (fun c seq -> nucleotide_of_char c :: seq) s [];;
val sequence_of_string : string -> nucleotide list = <fun>
For example, the string GATTACA may be converted into a list of explicitly-named
nucleotides:
# let gattaca = sequence_of_string "GATTACA";;
val gattaca : nucleotide list =
[Guanine; Adenine; Thymine; Thymine; Adenine; Cytosine; Adenine]
Finally, a function to read a line of characters as a DNA sequence is easily written in terms
of the input_line function in the Pervasives module:
# let input_sequence ch = sequence_of_string (input_line ch);;
val input_sequence: in_channel -> nucleotide list = <fun>
This function may then be used to read DNA sequences from a file or from standard input.
The converse operations, used to print a DNA sequence, are written in a similar fashion,
beginning with a function to convert a single nucleotide into a string:
# let string_of_nucleotide = function
    Adenine -> "A" | Cytosine -> "C" | Guanine -> "G" | Thymine -> "T";;
val string_of_nucleotide : nucleotide -> string = <fun>
The string_of_nucleotide function may then be used to write a function to convert a list of
nucleotides into a string by simply concatenating the string representations of each nucleotide:
# let string_of_sequence s =
    String.concat "" (List.map string_of_nucleotide s);;
val string_of_sequence : nucleotide list -> string = <fun>
For example, the previously generated nucleotide list can be converted back into the string
GATTACA:
# string_of_sequence gattaca;;
- : string = "GATTACA"
Strings generated by string_of_sequence may, of course, be printed using the simple function:
# let print_sequence seq = print_endline (string_of_sequence seq);;
val print_sequence : nucleotide list -> unit = <fun>
The input_sequence and print_sequence functions may then be used to perform IO on
DNA sequence information in human readable form. We shall now consider the slightly more
difficult task of printing matrices.
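Files of DNA data typically hold many sequences. The following sketch, which repeats the definitions above in compact form, reads one sequence per line until the end of the channel is reached. The function input_sequences and the one-sequence-per-line file layout are assumptions made for this illustration.

```ocaml
type nucleotide = Adenine | Cytosine | Guanine | Thymine

let nucleotide_of_char = function
  | 'A' -> Adenine | 'C' -> Cytosine | 'G' -> Guanine | 'T' -> Thymine
  | _ -> invalid_arg "nucleotide_of_char"

let string_of_nucleotide = function
  | Adenine -> "A" | Cytosine -> "C" | Guanine -> "G" | Thymine -> "T"

let sequence_of_string s =
  let r = ref [] in
  for i = String.length s - 1 downto 0 do
    r := nucleotide_of_char s.[i] :: !r
  done;
  !r

let string_of_sequence s = String.concat "" (List.map string_of_nucleotide s)

(* Hypothetical helper: read sequences, one per line, until end of file.
   Handling End_of_file in the match keeps the recursive call in tail position. *)
let input_sequences ch =
  let rec aux acc =
    match input_line ch with
    | line -> aux (sequence_of_string line :: acc)
    | exception End_of_file -> List.rev acc in
  aux []

let () =
  (* Converting to a sequence and back recovers the original string. *)
  assert (string_of_sequence (sequence_of_string "GATTACA") = "GATTACA")
```

A line containing any character other than A, C, G or T causes nucleotide_of_char to raise Invalid_argument, so malformed input is rejected rather than silently skipped.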
9.3.3 Matrix IO
Consider the more complicated problem of printing and reading matrices, represented as values
of the type float array array such as:
# let i3 = [| [|1.; 0.; 0.|];
              [|0.; 1.; 0.|];
              [|0.; 0.; 1.|] |];;
val i3 : float array array =
  [|[|1.; 0.; 0.|]; [|0.; 1.; 0.|]; [|0.; 0.; 1.|]|]
A simple function to print such matrices may be written:
# let string_of_matrix m =
    let row r =
      String.concat " " (List.map string_of_float (Array.to_list r)) in
    String.concat "\n" (List.map row (Array.to_list m));;
val string_of_matrix : float array array -> string = <fun>
When applied to i3, this implementation of the string_of_matrix function works perfectly:
# print_endline (string_of_matrix i3);;
1. 0. 0.
0. 1. 0.
0. 0. 1.
- : unit = ()
However, when given a matrix with elements whose string representations are of different
widths, the results produced by this implementation of the string_of_matrix function are
not always desirable. For example, a matrix containing the numbers 0.1234 and 0:
# let m = Array.map (fun r -> Array.map (( *. ) 0.1234) r) i3;;
val m : float array array =
  [|[|0.1234; 0.; 0.|]; [|0.; 0.1234; 0.|]; [|0.; 0.; 0.1234|]|]
In this case, the result is confusing because the columns are not aligned:
# print_endline (string_of_matrix m);;
0.1234 0. 0.
0. 0.1234 0.
0. 0. 0.1234
- : unit = ()
This can be remedied by padding the columns to the maximum width for each column. A
function to pad a string to the given length may be written:
# let string_pad_left s n =
    let len = String.length s in
    if len >= n then s else String.make (n - len) ' ' ^ s;;
val string_pad_left : string -> int -> string = <fun>
For example, padding the string "0.1234" out to ten characters inserts four spaces:
# string_pad_left "0.1234" 10;;
- : string = "    0.1234"
This string_pad_left function can then be used to create a string representation of a matrix
more carefully, by padding columns out to their maximum width:
# let string_of_matrix m =
    let m = Array.map (Array.map string_of_float) m in
    let width =
      let aux w s = max w (String.length s) in
      Array.init (Array.length m.(0))
        (fun j -> Array.fold_left (fun w r -> aux w r.(j)) 0 m) in
    let m =
      Array.map (Array.mapi (fun j x -> string_pad_left x width.(j))) m in
    let row r = String.concat " " (Array.to_list r) in
    String.concat "\n" (List.map row (Array.to_list m));;
val string_of_matrix : float array array -> string = <fun>
In this implementation of the string_of_matrix function, the nested width variable contains
the maximum width of each column.
For example, this string_of_matrix function can be used to create much more readable
representations of matrices:
# print_endline (string_of_matrix m);;
0.1234     0.     0.
    0. 0.1234     0.
    0.     0. 0.1234
- : unit = ()
Further enhancements to this function might include the ability to align the decimal place
down each column (although this would not be very useful with scientific notation).
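The padding may also be delegated to the Printf module, whose "%*s" conversion takes the minimum field width as an extra integer argument and right-justifies the string. The following is a sketch along those lines, assuming a rectangular (not necessarily square) matrix. The name string_of_matrix_printf is hypothetical.

```ocaml
(* Hypothetical variant of string_of_matrix which pads with Printf. *)
let string_of_matrix_printf m =
  let m = Array.map (Array.map string_of_float) m in
  let cols = if Array.length m = 0 then 0 else Array.length m.(0) in
  (* width.(j) is the length of the longest string in column j. *)
  let width =
    Array.init cols (fun j ->
      Array.fold_left (fun w r -> max w (String.length r.(j))) 0 m) in
  let row r =
    String.concat " "
      (Array.to_list
         (Array.mapi (fun j s -> Printf.sprintf "%*s" width.(j) s) r)) in
  String.concat "\n" (Array.to_list (Array.map row m))

let () =
  print_endline
    (string_of_matrix_printf [| [|1.; 0.25|]; [|10.5; 2.|]; [|0.; 0.|] |])
```

Measuring widths down each column, rather than along each row, is what allows matrices with differing numbers of rows and columns to be handled.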
9.4 Array related
Many useful functions are provided by the List module which are not provided by the Array
module. In particular, the List module provides the fold_left2, fold_right2 and map2 functions,
which handle pairs of lists. As we saw in section 3.3, these functions are useful when
implementing binary operators which act over pairs of vectors.
9.4.1 map2
The map2 function may be written in terms of the existing init function:
# let array_map2 f a b =
    let len = Array.length a in
    if len <> Array.length b then invalid_arg "array_map2";
    Array.init len (fun i -> f a.(i) b.(i));;
val array_map2 : ('a -> 'b -> 'c) -> 'a array -> 'b array -> 'c array = <fun>
For example, the array_map2 function may be used to implement vector addition over vectors
represented by the type float array:
# let vec_add a b = array_map2 (fun a b -> a +. b) a b;;
val vec_add : float array -> float array -> float array = <fun>
# vec_add [|1.; 2.; 3.|] [|2.; 3.; 4.|];;
- : float array = [|3.; 5.; 7.|]
Thus the array_map2 function clearly has a use in scientific computing.
9.4.2 Double folds
Mimicking the existing Array. fold_left function, we can write:
# let array_fold_left2 f x a b =
    let len = Array.length a in
    if len <> Array.length b then invalid_arg "array_fold_left2";
    let r = ref x in
    for i = 0 to len - 1 do
      r := f !r a.(i) b.(i)
    done;
    !r;;
val array_fold_left2 :
  ('a -> 'b -> 'c -> 'a) -> 'a -> 'b array -> 'c array -> 'a = <fun>
A fold_right2 function may be written equivalently to the array_fold_left2 function:
# let array_fold_right2 f a b x =
    let len = Array.length a in
    if len <> Array.length b then invalid_arg "array_fold_right2";
    let r = ref x in
    for i = len - 1 downto 0 do
      r := f a.(i) b.(i) !r
    done;
    !r;;
val array_fold_right2 :
  ('a -> 'b -> 'c -> 'c) -> 'a array -> 'b array -> 'c -> 'c = <fun>
As we have already seen in section 3.3, these fold functions have natural uses in vector algebra,
such as computing the vector dot product.
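For instance, the dot product of two vectors represented as float arrays might be sketched in terms of array_fold_left2 as follows. The name vec_dot is introduced only for this illustration.

```ocaml
let array_fold_left2 f x a b =
  let len = Array.length a in
  if len <> Array.length b then invalid_arg "array_fold_left2";
  let r = ref x in
  for i = 0 to len - 1 do
    r := f !r a.(i) b.(i)
  done;
  !r

(* Accumulate the sum of the products of corresponding elements. *)
let vec_dot a b = array_fold_left2 (fun s x y -> s +. x *. y) 0. a b

let () =
  Printf.printf "%g\n" (vec_dot [|1.; 2.; 3.|] [|2.; 3.; 4.|])
```

Vectors of different lengths are rejected with Invalid_argument by the length check in the fold.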
9.4.3 rotate
The ability to rotate the elements of an array can sometimes be of use. This can be achieved
by creating a new array, the elements of which are given by looking up the elements with
rotated indices in the given array:
# let array_rotate i a =
    let n = Array.length a in
    let aux k =
      let k = (k + i) mod n in
      a.(if k < 0 then n + k else k) in
    Array.init n aux;;
val array_rotate : int -> 'a array -> 'a array = <fun>
This function creates an array with the elements of a rotated left by i. For example, rotating
two places to the left:
# array_rotate 2 [|0; 1; 2; 3; 4; 5; 6; 7; 8; 9|];;
- : int array = [|2; 3; 4; 5; 6; 7; 8; 9; 0; 1|]
Rotating right can be achieved by specifying a negative value for i. For example, rotating
right three places:
# array_rotate (-3) [|0; 1; 2; 3; 4; 5; 6; 7; 8; 9|];;
- : int array = [|7; 8; 9; 0; 1; 2; 3; 4; 5; 6|]
Considering this function alone, the performance can be improved significantly by rotating the
array elements in-place, by swapping pairs of elements. This can be regarded as a deforesting
optimisation (see section 7.3.3.2). However, the more elegant approach presented here can be
refactored in the case of many subsequent rotations (and other, similar operations) such that
no intermediate arrays need be created. In section 10.2, this optimisation is used to improve
the asymptotic complexity of a commonly implemented global minimization algorithm.
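One well-known way to rotate in place by swapping pairs of elements is to reverse the first i elements, reverse the remainder and then reverse the whole array. The following sketch uses that technique. It is offered as an illustration of in-place rotation, not as the implementation used in section 10.2, and the names reverse_range and array_rotate_in_place are hypothetical.

```ocaml
(* Reverse the elements a.(lo) .. a.(hi) in place by swapping pairs. *)
let reverse_range a lo hi =
  let lo = ref lo and hi = ref hi in
  while !lo < !hi do
    let tmp = a.(!lo) in
    a.(!lo) <- a.(!hi);
    a.(!hi) <- tmp;
    incr lo;
    decr hi
  done

(* Hypothetical in-place counterpart of array_rotate: rotate a left by i
   places (right for negative i) without allocating a new array. *)
let array_rotate_in_place i a =
  let n = Array.length a in
  if n > 0 then begin
    (* Normalise i into the range 0 .. n - 1. *)
    let i = ((i mod n) + n) mod n in
    reverse_range a 0 (i - 1);
    reverse_range a i (n - 1);
    reverse_range a 0 (n - 1)
  end
```

Unlike array_rotate, this function returns unit and mutates its argument, so it must not be applied to an array which is still needed in its original order.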
9.4.4 Matrix trace
A useful quantity in the context of matrices is the trace of a square matrix, defined as the
sum of the diagonal elements.
# let trace a =
    let aux (i, tr) r = i + 1, tr +. r.(i) in
    snd (Array.fold_left aux (0, 0.) a);;
val trace : float array array -> float = <fun>
This function folds a nested auxiliary function aux over the rows of the matrix M, accumulating
the current row index i and the trace tr. The aux function increments i and adds the M_ii
element to the trace.
For example, the trace of the 3 × 3 identity matrix is simply 3:
# trace [| [|1.; 0.; 0.|]; [|0.; 1.; 0.|]; [|0.; 0.; 1.|] |];;
- : float = 3.
Clearly, functions written in an imperative style can be useful. We shall now consider the
high-level factorisation of the functional and imperative functions we have just developed.
9.5 Higher-order functions
As we have already hinted, aggressively factoring higher-order functions can greatly reduce
code size and sometimes even lead to a better understanding of the problem. In this section,
we shall consider various different forms of higher-order functions which can be productively
used to aid brevity and, therefore, clarity.
9.5.1 Data structures of functions
In section 7.3.3.2, we introduced the concept of deforesting computations by composing
composite functions. This task can be aided by the development of data structures (e.g. a list) of
functions.
The task of mapping some functions over a list may be productively generalised to the task
of repeatedly mapping a list of functions over a given list. This can be implemented naïvely
by the following function:
# let maps fs l = List.fold_left (fun l f -> List.map f l) l fs;;
val maps : ('a -> 'a) list -> 'a list -> 'a list = <fun>
For example, the following multiplies each element by three and then adds two:
# maps [( * ) 3; ( + ) 2] [1; 2; 3];;
- : int list = [5; 8; 11]
However, as discussed in section 7.3.3.2, the efficiency of this implementation of the maps
function may be considerably improved by first compositing the list of functions into a single
function and then mapping the composite function over the input list. This functionality
is most easily achieved by first writing a higher-order function to composite a given list of
functions:
# let compose fs = fun x -> List.fold_left (fun x f -> f x) x fs;;
val compose : ('a -> 'a) list -> 'a -> 'a = <fun>
For example, the composite of the two functions used in the previous example is given by:
# let f = compose [( * ) 3; ( + ) 2];;
val f : int -> int = <fun>
This function may then be applied to an individual value, e.g. f(2) = 3 × 2 + 2 = 8:
# f 2;;
- : int = 8
A deforested version of the maps function may then be written:
# let maps fs l = List.map (compose fs) l;;
val maps : ('a -> 'a) list -> 'a list -> 'a list = <fun>
For a list of n functions to be applied to a list of m values, this implementation of the maps
function produces the same result as the previous implementation but without producing the
intermediate lists:
# maps [( * ) 3; ( + ) 2] [1; 2; 3];;
- : int list = [5; 8; 11]
However, this is only likely to be of benefit when n m. Moreover, functions in the list must
have the same type and, therefore, the type has been inferred to be 'a -> 'a for all 'a. Thus,
functions representing a series of transformations between different types may not be passed
by list.
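When the transformations do change type, they can still be chained by composing them directly instead of storing them in a list. The following sketch uses a forward-composition operator for this purpose. The operator >> and the function describe are hypothetical names introduced for illustration and are not part of the original text.

```ocaml
(* Composition over a homogeneous list, as in the text. *)
let compose fs = fun x -> List.fold_left (fun x f -> f x) x fs

(* Hypothetical forward-composition operator: (f >> g) x = g (f x).
   Each stage may have a different type, unlike the elements of a list. *)
let ( >> ) f g x = g (f x)

(* int -> float -> float -> string: three stages with differing types. *)
let describe = float_of_int >> sqrt >> string_of_float

let () =
  print_endline (describe 16);
  (* The composite may be mapped over a list in a single pass. *)
  List.iter print_endline (List.map describe [1; 4; 9])
```

The trade-off is that the pipeline is fixed in the source code, whereas a list of functions can be assembled at run time.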
9.5.2 Tuple related
Functions to perform operations such as map over tuples of a particular arity are also useful.
For example, the following implements some useful functions over 2-tuples:
# let map_2 f (a, b) = (f a, f b)
  and list_of_2 (a, b) = [a; b]
  and array_of_2 (a, b) = [|a; b|];;
val map_2 : ('a -> 'b) -> 'a * 'a -> 'b * 'b = <fun>
val list_of_2 : 'a * 'a -> 'a list = <fun>
val array_of_2 : 'a * 'a -> 'a array = <fun>
For example, mapping a function f over a 2-tuple (a, b) results in the 2-tuple (f (a), f (b)):
# map_2 string_of_float (5.7, 9.3);;
- : string * string = ("5.7", "9.3")
Such functions can be used to reduce code size in many cases.
9.5.3 Generalised products
The vector dot product is a specialised form of inner product. The inner and outer products
may, therefore, be productively written as higher-order functions which can then be used as a
basis for more specialised products, such as the dot product.
The inner product is most easily written in terms of a given fold_left2 function:
# let inner fold_left2 base f l1 l2 g =
    fold_left2 (fun accu e1 e2 -> g accu (f e1 e2)) base l1 l2;;
val inner :
  (('a -> 'b -> 'c -> 'd) -> 'e -> 'f -> 'g -> 'h) ->
  'e -> ('b -> 'c -> 'i) -> 'f -> 'g -> ('a -> 'i -> 'd) -> 'h = <fun>
The vector dot product for vectors represented by values of the type float list may then be
written in terms of this inner function:
# let dot a b = inner List.fold_left2 0. ( *. ) a b ( +. );;
val dot : float list -> float list -> float = <fun>
For example, (1,2,3) · (2,3,4) = 20:
# dot [1.; 2.; 3.] [2.; 3.; 4.];;
- : float = 20.
The outer product is not easily generalised over data structures. Thus, we shall
content ourselves with a tail-recursive implementation specific to lists:
# let outer f l1 l2 =
    let aux l e1 =
      List.fold_left (fun l e2 -> f e1 e2 :: l) [] l2 :: l in
    List.rev_map List.rev (List.fold_left aux [] l1);;
For example:
$$(1,2,3) \otimes (2,3,4) = \begin{pmatrix} 2 & 3 & 4 \\ 4 & 6 & 8 \\ 6 & 9 & 12 \end{pmatrix}$$
# outer ( *. ) [1.; 2.; 3.] [2.; 3.; 4.];;
- : float list list = [[2.; 3.; 4.]; [4.; 6.; 8.]; [6.; 9.; 12.]]
Aggressive factoring of higher-order functions can clearly be useful in the context of numerical
computation. In fact, the inner and outer functions may be further generalised to apply to
tensors of different ranks. We shall leave this as an exercise for the interested reader!
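To illustrate how the generalised inner product yields other specialised products, the following sketch derives two further functions from it by changing only the element-wise function and the combining function. The names hamming and max_abs_diff are hypothetical.

```ocaml
let inner fold_left2 base f l1 l2 g =
  fold_left2 (fun accu e1 e2 -> g accu (f e1 e2)) base l1 l2

(* The dot product from the text: multiply element-wise, combine by adding. *)
let dot a b = inner List.fold_left2 0. ( *. ) a b ( +. )

(* Hypothetical example: the number of positions at which two lists differ. *)
let hamming a b =
  inner List.fold_left2 0 (fun x y -> if x = y then 0 else 1) a b ( + )

(* Hypothetical example: the largest absolute element-wise difference. *)
let max_abs_diff a b =
  inner List.fold_left2 0. (fun x y -> abs_float (x -. y)) a b max

let () =
  assert (dot [1.; 2.; 3.] [2.; 3.; 4.] = 20.);
  assert (hamming ['G'; 'A'; 'T'] ['G'; 'C'; 'T'] = 1)
```

As List.fold_left2 is used throughout, lists of different lengths cause Invalid_argument to be raised.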
9.5.4 Converting between container types
The elements in a container may be copied into a container of a different type by folding an
insertion function over the input, given the fold function of the input container and the insertion
function of the output container. A function to convert a container into a list using a given
fold function (with the interface of a fold_right function) may, therefore, be written:
# let list_of fold c = fold (fun h t -> h :: t) c [];;
val list_of : (('a -> 'a list -> 'a list) -> 'b -> 'c list -> 'd) -> 'b -> 'd
  = <fun>
This may be used to create a list_of_array function, equivalent to the existing
to_list function in the Array module, by passing the Array.fold_right function to the
higher-order list_of function:
# let list_of_array a = list_of Array.fold_right a;;
val list_of_array : 'a array -> 'a list = <fun>
The functionality provided by the higher-order list_of function may also be applied to
considerably more sophisticated containers with ease. For example, the following implements a
set of strings:
# module StringSet = Set.Make(String);;
A function to convert a StringSet back into a string list may be written in terms of
list_of:
# let list_of_string_set = list_of StringSet.fold;;
val list_of_string_set : StringSet.t -> StringSet.elt list = <fun>
Equivalently to the list_of function, a higher-order function to convert data structures of
strings into a StringSet may be written by taking the fold_right function of the data
structure as an argument:
# let string_set_of fold c =
    fold (fun e s -> StringSet.add e s) c StringSet.empty;;
val string_set_of :
  ((StringSet.elt -> StringSet.t -> StringSet.t) -> 'a -> StringSet.t ->
   'b) -> 'a -> 'b = <fun>
A list of strings may be converted into a StringSet by passing the List.fold_right function
as an argument to string_set_of:
# let string_set_of_list = string_set_of List.fold_right;;
val string_set_of_list : StringSet.elt list -> StringSet.t = <fun>
For example, let us create a set called myset by inserting "tree", "plug", "bug" and then
"slug":
# let myset = string_set_of_list ["slug"; "bug"; "plug"; "tree"];;
val myset : StringSet.t = <abstr>
However, as the fold_right function in the List module is not tail recursive, the resulting
string_set_of_list function will be unnecessarily inefficient on input lists with many elements.
This is most easily addressed by using a higher-order function rev_fold to convert between
the argument-orders of left and right folds:
# let rev_fold fold f a b = fold (fun a b -> f b a) b a;;
val rev_fold :
  (('a -> 'b -> 'c) -> 'd -> 'e -> 'f) -> ('b -> 'a -> 'c) -> 'e -> 'd -> 'f
  = <fun>
When the order of the elements is unimportant, this rev_fold function may then be used to
apply a left fold where a right fold was expected and vice-versa. In the context of filling a
set, the order of insertion makes no difference. Thus, the string_set_of_list function may be
written using the more efficient, tail-recursive List.fold_left function:
# let string_set_of_list = string_set_of (rev_fold List.fold_left);;
val string_set_of_list : StringSet.elt list -> StringSet.t = <fun>
This may be used to create a set called myset2 by inserting "slug", "bug", "plug" and then
"tree":
# let myset2 = string_set_of_list ["slug"; "bug"; "plug"; "tree"];;
val myset2 : StringSet.t = <abstr>
As expected, the two different versions of the string_set_of_list function produced the
same result (as a set is a sorted container) despite inserting the elements into the set in
different orders:
# List.map list_of_string_set [myset; myset2];;
- : StringSet.elt list list =
[["bug"; "plug"; "slug"; "tree"]; ["bug"; "plug"; "slug"; "tree"]]
Factoring higher-order functions dealing with data structures can clearly be very productive,
not only in terms of brevity but also because alterations required to change functionality
become more localised. For example, in some circumstances, the ideal choice of data structure
is not obvious and, therefore, the ability to pick and mix different data structures can be useful.
This can be achieved by providing consistent interfaces to data structures in terms of higher-
order functions.
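Containers whose fold function has a slightly different interface can be accommodated with a small adapter. For example, Hashtbl.fold passes the key and the value as separate arguments, so the following sketch wraps it to present each binding as a 2-tuple before handing it to list_of. The names hashtbl_fold_pairs and list_of_hashtbl are hypothetical.

```ocaml
let list_of fold c = fold (fun h t -> h :: t) c []

(* Hypothetical adapter: present each binding of a hash table as a
   (key, value) pair so that the fold has the interface list_of expects. *)
let hashtbl_fold_pairs f tbl init =
  Hashtbl.fold (fun k v acc -> f (k, v) acc) tbl init

let list_of_hashtbl tbl = list_of hashtbl_fold_pairs tbl

let () =
  let tbl = Hashtbl.create 4 in
  Hashtbl.add tbl "slug" 1;
  Hashtbl.add tbl "bug" 2;
  (* The order in which Hashtbl.fold visits bindings is unspecified,
     so the result is sorted before it is compared. *)
  assert (List.sort compare (list_of_hashtbl tbl) = [("bug", 2); ("slug", 1)])
```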
We shall now examine the design and implementation of some practically useful programs.
Chapter 10
Complete Examples
In this chapter, we shall develop several complete programs used in scientific computing. In
particular, we shall take examples from each of the most generic computational problems
encountered in scientific computing. The programs presented in this chapter could be optimised
to improve performance but we have chosen to illustrate the advantages of clear and
succinct code. In particular, we use comments for the first time, to describe the purpose and
specification for portions of code. Comments should be used to clarify all but the simplest of
programs.
10.1 Maximum entropy method
In this section, we shall develop a program which makes use of two important concepts commonly
required in scientific computing as well as an arguably under-appreciated third concept:
Fourier transform - a transform which converts between temporal and spectral representations
of signals, commonly occurring in the mathematical descriptions of natural
systems and often used in analysis.
Local function minimization - algorithms used to find a minimum of a given function in
the region of given initial arguments.
Maximum entropy method - a technique used to extend available data whilst introducing
minimal new information.
Specifically, we shall develop a program to arbitrarily extend experimentally observed diffrac-
tion data in order to facilitate transformation into real space via the Fourier transform.
Experimental measurements of a function of interest are typically limited to measuring over a
finite range. In many cases, an interesting or important property can be represented in terms
of the function over an infinite range. Diffraction experiments are one example of this.
[Plot of S(k) against k.]
Figure 10.1: Experimentally measured static structure factor S(k) of amorphous silicon
measured over a finite range 0.424 < k < 23.001 in a neutron-diffraction experiment [13].
[Plot of F(k) against k.]
Figure 10.2: Reduced static structure factor F(k) = k(S(k) - 1) interpolated to F(0) = 0
for k < k_l and clamped to F(k) = 0 for k > k_u.
10.1.1 Formulation
In a diffraction experiment, the scattering of incident waves diffracted by a material is mea-
sured as a function of the wavelength of the incident waves (illustrated in figure 10.1). The
measured function, known as the static structure factor S(k) where k is the wavelength, may
only be measured over a finite range of wavelengths k_l ≤ k ≤ k_u.
When diffracting neutrons, for example, the lower limit k_l is determined by experimental
errors due to slow moving neutrons and the upper limit k_u is determined by the maximum
momentum which can be imparted to a neutron. However, the real-space radial distribution
function g(r), which conveys information about the atomic structure on length scales r around
1 Å, is related to S(k) by a Fourier sine transform over all k ≥ 0 [14]:
1 1
00
g(r) = 1 + -4- (S(k) - l)ksin(kr)dk
7rpor 0
In the remainder of this discussion we shall concentrate on the treatment of the subexpression
F(k) = k(S(k) - 1), known as the reduced static structure factor.
The missing data for $0 \le k < k_l$ and $k > k_u$ may be treated in several different ways. The
most naïve approach is to interpolate F(k) to F(0) = 0 for $k < k_l$ and truncate $k > k_u$ by
setting F(k) = 0 in this range (illustrated in figure 10.2). The interpolation is typically very
reasonable, thanks to the linearity of the function in this region. However, the truncation is
a poor approximation to the true signal, which is expected to continue oscillating to much
higher k. Despite the fact that this truncation introduces severe oscillations when Fourier
transformed, this approach is commonly used in practice as more appropriate approaches are
regarded as being too difficult to implement. One such approach is the Maximum Entropy
Method (MEM), which we shall now describe and, amazingly, implement as a little OCaml
program.
The MEM regards observed data as constants and extends these data by adding variables.
The values of these variables are then determined by maximising a suitably chosen measure of
entropy with respect to the variables. In practice, the Shannon entropy of the discrete Fourier
power spectrum is often used as the measure of entropy.
Thus, the reduced static structure factor shown in figure 10.2 may be objectively extended to
arbitrarily higher k using the maximum entropy method. As F(k) appears in a Fourier sine
transform, we shall use the Shannon entropy of the sine transform.
10.1.2 Implementation
Our program is split into a lexer, a parser and the main program which performs the core
computation.
10.1.2.1 Lexer
We begin by defining a lexer "mem_lexer.mll". This lexer is based upon the parser "mem_parser.mly"
and tracks the current line number in order to provide helpful error messages for unexpected
input:
{
open Mem_parser
let line = ref 1
}
The lexer must be able to handle signed floating point numbers or integers. Thus we define
the necessary regular expressions:
let digit = ['0'-'9']
let mantissa = digit+ '.' digit* | digit* '.' digit+
let exponent = ['e' 'E'] ['+' '-'] digit+
let floating = ['+' '-']? mantissa exponent?
let integer = ['+' '-']? digit+
The lexer contains a single rule which ignores whitespace, counts new lines and lexes curly
braces, commas and numbers (which are treated as floating-point numbers):
rule token = parse
    [' ' '\t'] { token lexbuf }
  | '\n' { incr line; token lexbuf }
  | '{' { OPEN }
  | '}' { CLOSE }
  | ',' { COMMA }
  | floating
  | integer { FLOAT(float_of_string (Lexing.lexeme lexbuf)) }
  | eof { EOF }
  | _ { failwith ("Mistake at line " ^ string_of_int !line) }
As usual, the tokens used in the lexer are defined in and used by the parser.
10.1.2.2 Parser
The parser "mem_parser.mly" simply reads a list of comma separated numbers enclosed in
curly braces. The EOF, OPEN, CLOSE, COMMA and FLOAT tokens produced by the lexer are
declared first:
%token EOF OPEN CLOSE COMMA
%token <float> FLOAT
The main rule of the parser simply returns a float list:
%start main
%type <float list> main
%%
The parser contains only two rules. The recursive tail rule interprets the remainder of the
list, finishing with a close brace:
tail:
| FLOAT COMMA tail
    { $1 :: $3 }
| FLOAT CLOSE
    { [$1] };
The main rule interprets a whole list, beginning with an open brace:
main:
| OPEN tail EOF
    { $2 }
| OPEN CLOSE EOF
    { [] };
The result produced by this simple parser is then ready to be analyzed by the main program.
The program implementing the maximum entropy method uses the FFTW library (described
in section 8.5) to perform the discrete Fourier transforms. The OCaml bindings to this library
represent data as big arrays (described in section 8.3). Thus, we begin by opening the names-
pace of the Bigarray module in order to access its members without having to prefix them
with Bigarray. each time:
open Bigarray
We shall use the square root of the machine epsilon to determine the accuracy required by the
local minimisation algorithm:
let delta = sqrt epsilon_float
A map2 function over arrays will also be used:
let array_map2 f a b =
  let len = Array.length a in
  if len <> Array.length b then invalid_arg "array_map2";
  Array.init len (fun i -> f a.(i) b.(i))
The local function minimization algorithm, which will be applied to the cost function f(x),
requires a function to compute $\nabla_x f$. This is most simply achieved by computing:
$$(\nabla_x f)_k \approx \frac{f(y^{(k)}) - f(x)}{\delta}$$
where:
$$y_i^{(k)} = \begin{cases} x_i & i \ne k \\ x_i + \delta & i = k \end{cases}$$
This may be implemented by the following numerical grad function:
(* Numerical approximation to the gradient of "f" at "x". *)
let n_grad f x =
  let n = Array.length x in
  let f_x = f x in
  let f' = Array.make n 0. in
  for i = 0 to n - 1 do
    let old_x_i = x.(i) in
    x.(i) <- x.(i) +. delta;
    f'.(i) <- (f x -. f_x) /. delta;
    x.(i) <- old_x_i;
  done;
  f'
Note that the old value of $x_i$ is stored, rather than trying to recompute it using the expression
$x_i + \delta - \delta$ which would be prone to numerical error.
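The accuracy of the forward difference can be checked against a function whose gradient is known analytically. The following sketch restates n_grad so that it is self-contained; the test function f and the starting point are illustrative assumptions, not part of the program:

```ocaml
(* Step size: square root of the machine epsilon, as in the text. *)
let delta = sqrt epsilon_float

(* Forward-difference approximation to the gradient of "f" at "x". *)
let n_grad f x =
  let n = Array.length x in
  let f_x = f x in
  let g = Array.make n 0. in
  for i = 0 to n - 1 do
    let old_x_i = x.(i) in
    x.(i) <- old_x_i +. delta;
    g.(i) <- (f x -. f_x) /. delta;
    x.(i) <- old_x_i
  done;
  g

(* Hypothetical test function f(x, y) = x^2 + 3y with gradient (2x, 3). *)
let f x = x.(0) *. x.(0) +. 3. *. x.(1)

let g = n_grad f [| 1.; 2. |]

let () = Printf.printf "%g %g\n" g.(0) g.(1)
```

At the point (1, 2) the analytic gradient is (2, 3), and the numerical estimate should agree to roughly the square root of the machine epsilon.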
The local function minimization can be performed using the gradient descent algorithm. This
algorithm repeatedly tries to step in the opposite direction to the grad by an amount $\lambda$:
$$x_{n+1} = x_n - \lambda \nabla_x f$$
If $f(x_{n+1}) < f(x_n)$ then the step is accepted and the step size $\lambda$ is increased slightly. If
$f(x_{n+1}) \ge f(x_n)$ then the step is not accepted and the step size $\lambda$ is greatly reduced. In
particular, steps which do not alter f(x) give $f(x_{n+1}) = f(x_n)$ to within machine precision
and are rejected. Consequently, when x is as close to a local minimum as possible, small
proposed steps will not alter f(x) and $\lambda$ will be reduced rapidly. The algorithm may then
terminate.
The following function implements this algorithm:
(* Gradient-descent local-minimisation algorithm *)
let grad_descent f f' x =
  let rec aux lambda x f_x =
    if lambda < delta then x else
    let new_x = array_map2 (fun x d -> x -. lambda *. d) x (f' x) in
    let f_new_x = f new_x in
    if f_new_x >= f_x then aux (0.5 *. lambda) x f_x else
    aux (1.1 *. lambda) new_x f_new_x in
  aux delta x (f x)
Note that the value of $f(x_n)$ is passed as an argument to the nested auxiliary function aux,
thus avoiding the need to recompute $f(x_n)$ each iteration.
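As an illustration of the accept/reject rule described above, the following self-contained sketch applies the same scheme to a simple quadratic bowl. The test function, its analytic gradient and the starting point are assumptions chosen for illustration; Array.mapi is used here in place of array_map2 only to keep the sketch short:

```ocaml
let delta = sqrt epsilon_float

(* Step against the gradient; grow the step by 10% when f decreases,
   halve it otherwise, and stop once the step size falls below delta. *)
let grad_descent f f' x =
  let rec aux lambda x f_x =
    if lambda < delta then x else
    let g = f' x in
    let new_x = Array.mapi (fun i xi -> xi -. lambda *. g.(i)) x in
    let f_new_x = f new_x in
    if f_new_x >= f_x then aux (0.5 *. lambda) x f_x
    else aux (1.1 *. lambda) new_x f_new_x in
  aux delta x (f x)

(* Hypothetical cost function with its minimum at (1, -2). *)
let f x = (x.(0) -. 1.) ** 2. +. (x.(1) +. 2.) ** 2.
let f' x = [| 2. *. (x.(0) -. 1.); 2. *. (x.(1) +. 2.) |]

let x_min = grad_descent f f' [| 0.; 0. |]

let () = Printf.printf "%g %g\n" x_min.(0) x_min.(1)
```

Starting from a step size of delta, the step grows geometrically while steps are accepted, overshoots, and is then cut back, so the iterate settles at the minimum before the step size collapses below delta.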
The gradient descent and numerical grad functions may be combined to produce a higher-order
n_grad_descent function which will minimize the given function f using numerical
approximations to the grad $\nabla_x f$:
(* Gradient descent using numerical gradient. *)
let n_grad_descent f = grad_descent f (n_grad f)
As described in section 8.5, a fourier function, to compute the discrete Fourier transform
of an array using the FFTW library, may be written by converting to and from big array
formats:
(* Fast Fourier Transform. *)
let (fourier, ifourier) =
  let to_big a =
    let big_a = Array1.create Fftw.complex c_layout (Array.length a) in
    Array.iteri (fun i z -> Array1.set big_a i z) a;
    big_a in
  let of_big big_a =
    Array.init (Array1.dim big_a) (fun i -> Array1.get big_a i) in
  let fft norm dir a =
    let (n, big_a) = (Array.length a, to_big a) in
    of_big (Fftw.create dir n big_a) in
  (fft false Fftw.forward, fft true Fftw.backward)
The discrete Fourier sine transform of an array¹ $x = \{0, x_1, \ldots, x_{n-1}\}$ may be computed using
the fourier function by transforming to a double-length array with odd symmetry:
$$y = \{0, y_1, \ldots, y_{2n-1}\} = \{0, x_1, \ldots, x_{n-1}, 0, -x_{n-1}, \ldots, -x_1\}$$
The Fourier sine transform $\hat{x}$ can then be extracted as the first half of the discrete Fourier
transform $\hat{y}$ of y:
$$\hat{x}_i = \hat{y}_i \quad \forall\, i \in \{0 \ldots n-1\}$$
Thus, the Fourier sine transform is implemented by the following function:
(* Fourier Sine Transform in terms of FFT. *)
let fist a =
  let n = Array.length a in
  let aux i = {Complex.re = a.(i); im = 0.} in
  let aux i =
    if i = 0 || i = n then Complex.zero else
    if i < n then aux i else
    Complex.neg (aux (2 * n - i)) in
  let b = Array.init (2 * n) aux in
  let b = fourier b in
  Array.init n (fun i -> b.(i).Complex.im)
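The odd-extension trick can be checked without FFTW by comparing a naive discrete Fourier transform of the extended array with a direct sine sum. The sketch below is only a check under stated assumptions: the forward kernel is taken to be exp(-2 pi i j k / m), and the helper names odd_extend, dft_im and sine_sum are hypothetical. Under that sign convention the imaginary part of the transform equals -2 times the sine sum:

```ocaml
let pi = 4. *. atan 1.

(* Odd extension {0, x1 .. x(n-1), 0, -x(n-1) .. -x1} of length 2n. *)
let odd_extend x =
  let n = Array.length x in
  Array.init (2 * n) (fun i ->
    if i = 0 || i = n then 0.
    else if i < n then x.(i)
    else -. x.(2 * n - i))

(* Imaginary part of a naive O(m^2) forward DFT of a real array,
   assuming the kernel exp(-2 pi i j k / m). *)
let dft_im y =
  let m = Array.length y in
  Array.init m (fun k ->
    let sum = ref 0. in
    Array.iteri (fun j yj ->
      sum := !sum -. yj *. sin (2. *. pi *. float_of_int (j * k) /. float_of_int m))
      y;
    !sum)

(* Direct sine sum s_k = sum_j x_j sin(pi j k / n). *)
let sine_sum x =
  let n = Array.length x in
  Array.init n (fun k ->
    let sum = ref 0. in
    Array.iteri (fun j xj ->
      sum := !sum +. xj *. sin (pi *. float_of_int (j * k) /. float_of_int n))
      x;
    !sum)

let x = [| 0.; 1.; 2.; 3. |]
let via_dft = Array.sub (dft_im (odd_extend x)) 0 (Array.length x)
let direct = sine_sum x
```

The constant factor and sign are irrelevant to the entropy computed later, which takes absolute values and normalises.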
The Shannon entropy H(x) is conventionally defined for a probability distribution x as:
$$H(x) = -\sum_i x_i \ln x_i$$
where x is assumed to be normalised such that:
$$\sum_i x_i = 1$$
In order to compute the Shannon entropy of an unnormalised distribution, such as $\hat{y}_i$, we
must account for the normalisation, giving:
$$H(x) = \ln \sum_i x_i - \frac{\sum_i x_i \ln x_i}{\sum_i x_i}$$
This is most easily computed by accumulating the two sums simultaneously:
(* Compute the Shannon entropy of the constants and variables. *)
let entropy consts vars =
  let a = fist (Array.append consts vars) in
  let aux (s, h) x =
    let x = abs_float x in
    (* Skip negligible terms, for which x ln x tends to zero. *)
    if x < delta then (s, h) else
    (s +. x, h +. x *. log x) in
  let s, h = Array.fold_left aux (0., 0.) a in
  log s -. h /. s
1 As we are analysing real-valued functions related by the Fourier sine transform, the first element (which
represents zero frequency) is always zero.
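The normalised form of the entropy can be sanity-checked on distributions whose entropy is known in closed form. The following sketch assumes a hypothetical helper normalised_entropy acting directly on an array, without the sine transform: a uniform distribution over m outcomes has entropy ln m whatever its overall scale, and a distribution concentrated on one outcome has zero entropy:

```ocaml
(* Shannon entropy of an unnormalised distribution:
   H = ln (sum x) - (sum x ln x) / (sum x). *)
let normalised_entropy x =
  let s, h =
    Array.fold_left
      (fun (s, h) xi ->
        let xi = abs_float xi in
        (* x ln x tends to zero as x tends to zero, so skip zeros. *)
        if xi = 0. then (s, h) else (s +. xi, h +. xi *. log xi))
      (0., 0.) x in
  log s -. h /. s

let uniform = normalised_entropy [| 2.; 2.; 2.; 2. |]
let peaked = normalised_entropy [| 5.; 0.; 0. |]

let () = Printf.printf "%g %g\n" uniform peaked
```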
The main body of the program begins by parsing the command-line arguments in order to
obtain the desired length n to which the input data is to be extended:
let _ =
  (* Parse command-line arguments. *)
  let n =
    let iters = ref [] in
    Arg.parse [] (fun s -> iters := s :: !iters) "mem <n>";
    match !iters with
      [n] -> int_of_string n
    | _ -> invalid_arg "Usage: mem <n>" in
The data themselves are then loaded using the parser and converted into array form:
(* Load the experimental data as the constants. *)
let consts =
  Mem_parser.main Mem_lexer.token (Lexing.from_channel stdin) in
let consts = Array.of_list consts in
The number of input data provided is referred to as i and a check is performed to ensure that
i < n:
let i = Array.length consts in
if n <= i then invalid_arg "n too small";
The new variables vars are initialised to zero before being determined by locally maximizing
the Shannon entropy with respect to the variable values, which is achieved by minimizing the
negated entropy:
(* Locally maximise entropy. *)
let vars =
  let vars = Array.init (n - i) (fun _ -> 0.) in
  n_grad_descent (fun vars -> -. entropy consts vars) vars in
Finally, the resulting data are output in the same form as the input:
(* Output extended data. *)
let out = Array.to_list (Array.append consts vars) in
let out = List.map string_of_float out in
print_endline ("{" ^ (String.concat ", " out) ^ "}")
In the interests of efficiency, the lexer, parser and main program should be compiled into
native code before being executed.
This program, implementing the maximum entropy method approach to the extension of
experimentally observed diffraction data, may be compiled using:
$ ocamllex mem_lexer.mll
$ ocamlyacc mem_parser.mly
$ ocamlopt -c mem_parser.mli
$ ocamlopt -I +fftw -cclib -lfftw_stub bigarray.cmxa fftw.cmxa mem_parser.ml \
    mem_lexer.ml mem.ml -o mem
Figure 10.3: Reduced static structure factor F(k) extended to $k \approx 50$.
The resulting executable mem may then be used to extend diffraction data.
10.1.2.5 Results
The mem program may be used to extend the experimentally observed data shown in figure 10.2.
The number of samples can be extended² from 910 to 2,048 in under 4 hours, the result of
which is illustrated in figure 10.3.
The ability of the maximum entropy method to extend such data is almost magical.
10.1.2.6 Optimisation
As this program spends most of its time computing FFTs, it is ideally written in a language
such as OCaml where FFTs are easily computed and the important, but not performance-
critical, remainder of the program may be written clearly and succinctly. However, there is
still room for optimisation.
In order to optimise this program, we must first consider improvements to the asymptotic com-
plexity. This is tricky because the complexity of the gradient descent algorithm is unknown.
However, two potential improvements spring to mind:
Use previous $x_n$, $f(x_n)$ and $[\nabla f](x_n)$. For example, by using another approach to local
function minimization such as the conjugate gradient algorithm.
Use the correlation between the values of nearby $x_i$. For example, by partially solving
the problem for a subset of the variables and reusing the partial solution to create
progressively more complete solutions. This could be done by interpolating the values
of missing variables.
2 We chose to extend to an integral power of two, $2^{11} = 2048$ samples, because the FFT is most
efficient when applied to products of small primes.
Having introduced a local function-minimization algorithm (gradient descent) in this section,
we shall now examine the topic of global function minimization.
10.2 Global minimization
Finding a deeper, more global minimum of an arbitrary function is a significantly more chal-
lenging problem than local function minimization, and is also extremely important in the
context of scientific computing. Several different global function minimization algorithms
exist, many of which make repeated use of local function-minimization algorithms.
Simulated annealing is one such algorithm. This is a Monte-Carlo approach³ which considers
a randomly altered set of parameter values $x \in \Omega$ and accepts or probabilistically rejects the
proposed change based upon the increase $E \in \mathbb{R}$ in f(x), $f : \Omega \to \mathbb{R}$. If the change does not
increase f(x) ($\Rightarrow E \le 0$) then the change is always accepted. If the change increases f(x)
($\Rightarrow E > 0$) then the change is randomly accepted with a probability $P(E) = e^{-\beta E}$ for some
$\beta \in \mathbb{R}$. This probabilistic process is repeated many times with progressively larger values for
$\beta$.
As the value of $\beta$ is analogous to $(k_B T)^{-1}$ in thermodynamics, $\beta$ may be considered to be
the inverse of a fictitious temperature. Increasing the value of $\beta$ as the simulation progresses
therefore corresponds to cooling the system, hence the name simulated annealing. As the
fictitious temperature falls, proposed changes which increase the energy of the system are
progressively less likely to be taken, and the system tends to fall into local minima.
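The acceptance rule described above can be written as a pair of small functions. This is a minimal sketch; the names acceptance_probability and accept are illustrative and do not appear in the program developed below:

```ocaml
(* Probability of accepting a change which raises f(x) by delta_e
   at inverse temperature beta. *)
let acceptance_probability beta delta_e =
  if delta_e <= 0. then 1. else exp (-. beta *. delta_e)

(* Randomly decide whether to accept a proposed change. *)
let accept beta delta_e =
  delta_e <= 0. || Random.float 1. < exp (-. beta *. delta_e)

let () =
  (* The same uphill step becomes less likely to be accepted as beta grows. *)
  List.iter
    (fun beta ->
      Printf.printf "beta = %g: P = %g\n" beta (acceptance_probability beta 0.5))
    [0.1; 1.; 10.]
```

Changes which do not increase f(x) are accepted without drawing a random number, while the probability of accepting an uphill change falls exponentially as the fictitious temperature is lowered.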
Unlike our local function minimization example, we shall use a discrete problem to demon-
strate global function minimization. Both discrete and continuous minimization problems are
commonplace in science. In particular, the task of annealing a real system of atoms may be
considered either continuously (in terms of the vector coordinates of the atoms $r \in (\mathbb{R}^3)^n$ [6])
or discretely (in terms of the nearest neighbour topology $\{N_1 \ldots N_n\}$ [15]). Discrete, global
minimization problems also appear outside science. For example, in complicated routing prob-
lems such as integrated circuits and printed circuit boards in electronics.
We shall address the d-dimensional travelling salesman problem, defined by a list of vertex
coordinates $r \in (\mathbb{R}^d)^n$. The task is to find the route which traverses each vertex in the graph
exactly once and has the shortest length. For a path P, defined as a list of vertex indices
$P \in \mathbb{I}^n$ where $\mathbb{I} = \{1 \ldots n\}$ for some $n > 1$, the path length l(P) is:
$$l(P) = \sum_{i=1}^{n-1} \left| r_{P_i} - r_{P_{i+1}} \right|$$
In theory, this problem may be solved exactly by considering all permutations $P \in \mathbb{I}^n$ and
finding the shortest length permutation. However, the number of permutations is n! which,
even for reasonably small problems, is too vast to compute explicitly. Simulated annealing
provides a practical solution to this problem by relinquishing exactness in favour of an
approximate solution.
3 Known as randomised algorithms in computer science.
Figure 10.4: Reducing the length of a path by swapping the order of a pair (i, i + 1) of
adjacent vertices in the path P, showing: a) removed edges (red), and b) inserted edges
(blue).
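The path length l(P) can be evaluated directly for a small example. The sketch below assumes zero-based vertex indices and vertex coordinates stored as a float list array; the names distance and path_length and the unit-square data are illustrative only:

```ocaml
(* Euclidean distance between two d-dimensional points. *)
let distance a b =
  sqrt (List.fold_left2 (fun d x y -> d +. (x -. y) *. (x -. y)) 0. a b)

(* Sum of the edge lengths along an open path of vertex indices. *)
let path_length coords path =
  let len = ref 0. in
  for i = 0 to Array.length path - 2 do
    len := !len +. distance coords.(path.(i)) coords.(path.(i + 1))
  done;
  !len

(* The four corners of the unit square. *)
let square = [| [0.; 0.]; [1.; 0.]; [1.; 1.]; [0.; 1.] |]

let around = path_length square [| 0; 1; 2; 3 |]
let crossed = path_length square [| 0; 2; 1; 3 |]

let () = Printf.printf "%g %g\n" around crossed
```

Walking around the perimeter traverses three unit edges, whereas the path which crosses the square uses both diagonals and is therefore longer.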
10.2.1 The mutate function
The practical solution to this problem, using simulated annealing, requires an auxiliary func-
tion to mutate a given path. The capabilities of this function are very important. Indeed,
improvements to the mutate function are often more productive than any of the possible whole
program optimisations, even the more important low-level optimisations such as deforesting.
In fact, improvements to the mutate function are likely to fall into the category of algorith-
mic optimisations but this is difficult to prove rigorously as the complexities of the travelling
salesman problem are difficult to quantify in sufficient detail.
Understanding the ways in which the mutate function can be improved requires a deeper
knowledge of the mechanism by which simulated annealing finds possible solutions. Although
we shall concentrate on solving the travelling salesman problem, the points made are equally
applicable to other applications of simulated annealing, including continuous optimisation
problems.
Essentially, simulated annealing improves upon the naïve approach of trying randomly selected
paths P by evolving the path gradually. In order to evolve the path gradually, simulated
annealing must use a function which mutates a path but only makes small alterations to the
path. However, the meaning of the phrase "small alterations to the path" is not obvious.
Intuitively, this might mean introducing mutations which leave most of the $P_i$ unaltered. For
example, by altering only two vertex indices i and j in P at a time, by swapping $P_i$ with $P_j$
(illustrated in figure 10.4). In theory, all possible paths, i.e. permutations, can be reached by
repeatedly swapping pairs of elements in P.
In fact, this intuitive picture is horribly misleading. The phrase "small alterations to the
path" actually relates to exchanging small numbers of edges, i.e. altering m edges at a time
where m(n) is O(1). Thus, swapping pairs of adjacent vertices is only one of several ways to
exchange edges in the path. Several other forms of mutation are also possible, all of which
may be written in terms of O(n) adjacent-vertex swaps, i.e. these mutations are asymptotically
faster:
Figure 10.5: Reducing the length of a path by rotating the path P by one vertex, showing:
a) 1 removed edge (red), and b) 1 inserted edge (blue).
Figure 10.6: Reducing the length of a path by reversing the order of indices between an
arbitrary pair (i, j) in the path P, showing: a) 2 removed edges (red), and b) 2 inserted
edges (blue).
Figure 10.7: Reducing the length of a path by moving a vertex from i to j in the path P,
showing: a) 3 removed edges (red), and b) 3 inserted edges (blue).
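Rotation:
$$\{P_1, \ldots, P_n\} \to \{P_{1+i}, \ldots, P_n, P_1, \ldots, P_i\}$$
Reversal:
$$\{P_1, \ldots, P_i, \ldots, P_j, \ldots, P_n\} \to \{P_1, \ldots, P_{i-1}, P_j, \ldots, P_i, P_{j+1}, \ldots, P_n\}$$
Splice:
$$\{P_1, \ldots, P_i, \ldots, P_j, \ldots, P_k, P_{k+1}, \ldots, P_n\} \to \{P_1, \ldots, P_{i-1}, P_{j+1}, \ldots, P_k, P_i, \ldots, P_j, P_{k+1}, \ldots, P_n\}$$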
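The three mutations can be illustrated by applying them explicitly to a small array. The following sketch uses zero-based indices and builds a new array for each mutation, which is the conventional O(n) approach rather than the implicit representation developed below; the function names are illustrative:

```ocaml
(* Rotate the path left by i places. *)
let rotate_array i p =
  let n = Array.length p in
  Array.init n (fun k -> p.((k + i) mod n))

(* Reverse the order of the elements between i and j inclusive. *)
let reverse_array i j p =
  Array.init (Array.length p)
    (fun k -> if i <= k && k <= j then p.(j + i - k) else p.(k))

(* Move the segment [i, j] so that it follows element k, for i <= j < k. *)
let splice_array i j k p =
  let n = Array.length p in
  Array.concat
    [ Array.sub p 0 i;
      Array.sub p (j + 1) (k - j);
      Array.sub p i (j - i + 1);
      Array.sub p (k + 1) (n - k - 1) ]

let p = [| 0; 1; 2; 3; 4; 5 |]
let rotated = rotate_array 2 p       (* [|2; 3; 4; 5; 0; 1|] *)
let reversed = reverse_array 1 3 p   (* [|0; 3; 2; 1; 4; 5|] *)
let spliced = splice_array 1 2 4 p   (* [|0; 3; 4; 1; 2; 5|] *)
```

In each case only a constant number of adjacent pairs, i.e. edges, differ between the old and new paths, even though many elements change position.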
We shall now define a program capable of approximating the solution to the travelling salesman
problem for arbitrary n and d using the rotation, reverse and splice mutations.
10.2.2 Efficiency
We shall use an unconventional implementation of this algorithm which is asymptotically
faster than the conventional, array-based implementation presented in most monographs [12].
Thus, before describing our implementation we shall review the conventional approach.
The rotate, reverse and splice functions are conventionally implemented by altering the
array of indices used to represent the path. For randomly chosen mutations, the number of
altered vertex indices is O(n) where n is the number of vertices on the path. As these O(n)
operations form the bottleneck of the whole algorithm they are conventionally optimised by:
Deforesting - rather than creating new arrays, the old arrays are altered in-place by
swapping elements.
Premature termination - the cost of a proposed mutation is calculated in O(1) time
in terms of the lengths of added and removed edges. Only accepted mutations are then
performed explicitly.
However, these low-level optimisations do not improve the asymptotic complexity.
In theory, the rotate, reverse and splice functions only alter O(1) edges in the path. Thus, a
better implementation should be able to improve upon the O(n) complexity of the conventional
approach. We shall use the simplest improvement of representing the path implicitly, as an
array of vertex indices and a composite indirection function. This approach begins by storing
the array explicitly and indirecting through the identity function. A mutation is represented
by:
The change in path length.
An indirection function $f : \mathbb{I} \to \mathbb{I}$ ($\mathbb{I} = \{0 \ldots n-1\}$) which reorders the indices as
necessary.
If a mutation is accepted then the path is altered implicitly by composing f with the current
indirection function g to give $[g \circ f](i) = g(f(i)) : \mathbb{I} \to \mathbb{I}$.
When k indirections have been accumulated, the cost of vertex lookups in the path becomes
O(k) rather than O(1). Thus, the number of indirections must be controlled in order to obtain
good performance.
Indirections can be removed by explicitly generating a new array $P'_i = P_{g(i)}$ and replacing the
current indirection function g with the identity function. We shall refer to this as flattening.
As this requires copying each vertex index, flattening is O(nk). If the algorithm flattens
when $k \ge x$ for some unknown $x \in \mathbb{R}$, the average complexity of flattening is O(nk/x). The
average complexity of the O(x) indirections between flattens is O(xk). Thus, the asymptotic
complexity of this algorithm is optimal when:
$$\frac{nk}{x} = xk \quad \Rightarrow \quad x = \sqrt{n}$$
Therefore, we shall flatten whenever $k \ge \sqrt{n}$, giving the mutate function $O(\sqrt{n})$ asymptotic
complexity.
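The implicit representation can be sketched in isolation for a path of nine vertices, so that flattening occurs after three accepted mutations. The helper apply, which composes a mutation onto the current indirection and flattens when required, is a hypothetical name introduced for illustration; the rest mirrors the approach described above:

```ocaml
let n = 9
let path = Array.init n (fun i -> i)
let indirects = ref 0
let indirect = ref (fun i -> i)

(* Look up the i-th vertex through the accumulated indirections. *)
let get i = path.(!indirect i)

(* Compose the mutation f onto the current indirection, flattening
   once the number of indirections reaches sqrt n. *)
let apply f =
  let g = !indirect in
  indirect := (fun i -> g (f i));
  incr indirects;
  if !indirects * !indirects >= n then begin
    let flat = Array.init n get in
    Array.blit flat 0 path 0 n;
    indirects := 0;
    indirect := (fun i -> i)
  end

let rotate i k = (k + i) mod n
let reverse i j k = if i <= k && k <= j then j + i - k else k

let () =
  apply (rotate 2);
  apply (reverse 1 3);
  apply (rotate 1)

let current = Array.init n get
```

After the third mutation the array has been flattened, so path holds the mutated order explicitly and lookups pass through the identity function once more.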
This implementation is split into a lexer, parser and main program.
10.2.3.1 Lexer
We begin by defining a lexer and parser to load a description of the problem. The lexer,
described by the file "salesman_lexer.mll", understands floating-point numbers and begins by
opening the namespace of the parser and initialising the current line number to one:
{
open Salesman_parser
let line = ref 1
}
Regular expressions matching integer and floating point types are then defined before the
lexing rule:
let digit = ['0'-'9']
let exponent = ['e' 'E'] ['+' '-'] digit+
let floating = (digit+ '.' digit* | digit* '.' digit+) exponent?
The lexer contains a single lexing rule which ignores whitespace, emits CR tokens for new lines,
FLOAT tokens for floating-point numbers (in either usual or integer notation) and an EOF token
at the end of the input:
rule token = parse
    [' ' '\t'] { token lexbuf }
  | '\n' { incr line; CR }
  | floating
  | ['0'-'9']+ { FLOAT(float_of_string (Lexing.lexeme lexbuf)) }
  | eof { EOF }
  | _ { failwith ("Mistake at line " ^ string_of_int !line) }
As usual, the tokens used in the lexer are defined in and used by the parser.
10.2.3.2 Parser
The parser, described by the file "salesman_parser.mly", comprehends a list of vectors as lines
of space-separated floating-point numbers. The CR, FLOAT and EOF tokens are declared first:
%token CR EOF
%token <float> FLOAT
This is followed by the definition of the entry point main and its expected type:
%start main
%type <float list list> main
%%
The parser uses two rules to parse input. The list rule reads a list of whitespace-separated
floating-point numbers ending with a new line:
list:
| FLOAT list
    { $1 :: $2 }
| CR
    { [] };
The main rule reads a list of these lists, ending with EOF:
main:
| list main
    { $1 :: $2 }
| EOF
    { [] };
The result produced by this parser is then ready to be analyzed by the main program.
The main program, in the file "salesman.ml", begins with the definitions of infix operators⁴ to
perform vector arithmetic:
let ( +| ) = List.map2 ( +. )
let ( -| ) = List.map2 ( -. )
let dot = List.fold_left2 (fun d a b -> d +. a *. b) 0.
let length a = sqrt (dot a a)
A helper function is then defined:
(* (i in {0 .. n-1}, j in {0 .. n-2}) -> k <> i in {0 .. n-1} *)
let including i j = if j >= i then j + 1 else j
The including function is intended to act upon values $i \in \{0 \ldots n-1\}$ and $j \in \{0 \ldots n-2\}$
to produce a value $k \ne i \in \{0 \ldots n-1\}$.
Two helper functions for dealing with random values of type int are then defined:
(* 0 -> 0 | n -> k in {0 .. n-1} *)
let rand = function 0 -> 0 | n -> Random.int n
(* (i, n) -> k <> i in {0 .. n-1} *)
let rand_except i n = including i (rand (n - 1))
The rand function is a wrapper around the Random.int function which, given n > 0, returns a
value of type int in the range {0 ... n - 1} and, given n = 0, returns 0. The rand_except
function produces a random number $k \ne i \in \{0 \ldots n-1\}$.
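The effect of including is easiest to see by enumerating its results for a fixed i. The following sketch restates the helper functions so that it is self-contained; the particular values i = 2 and n = 5 are arbitrary choices for illustration:

```ocaml
let including i j = if j >= i then j + 1 else j
let rand = function 0 -> 0 | n -> Random.int n
let rand_except i n = including i (rand (n - 1))

(* Mapping j in {0 .. 3} with i = 2 skips over the index 2. *)
let mapped = List.map (including 2) [0; 1; 2; 3]

(* Every draw lies in {0 .. 4} and differs from 2. *)
let draws = List.init 1000 (fun _ -> rand_except 2 5)
```

Here mapped is [0; 1; 3; 4], so drawing j uniformly from n - 1 values yields k uniformly from the n - 1 values other than i.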
The main body of the program is then defined in the usual construct:
let _ =
The command-line arguments are then parsed to extract the required number of iterations:
(* Extract the number of iterations as a command-line argument. *)
let iters =
  let iters = ref [] in
  Arg.parse [] (fun s -> iters := s :: !iters) "salesman <iters>";
  match !iters with
    [i] -> int_of_string i
  | _ -> invalid_arg "Usage: salesman <iters>" in
The input list of vertex coordinates is then parsed from standard input:
(* Parse the vertex coordinates from stdin. *)
let vert_coords =
  let lexbuf = Lexing.from_channel stdin in
  try Salesman_parser.main Salesman_lexer.token lexbuf
Parsing errors raise the Parsing.Parse_error exception which is caught and handled by
naming the line number on which the error was noticed:
  with Parsing.Parse_error ->
    let line = string_of_int (!Salesman_lexer.line) in
    failwith ("Syntax error at line " ^ line);
    exit 1 in
4 Infix operator definitions are discussed in section A.3 of the appendices.
The list of vertex coordinates vert_coords is then converted from a float list list into a
float list array:
let vert_coords = Array.of_list vert_coords in
The number of vertices is denoted n:
let n = Array.length vert_coords in
The path is initialised to {0 ... n - 1} which, for randomly-ordered input, is a random path:
let path = Array.init n (fun i -> i) in
The number of indirections through rotate, reverse or splice is zero and the accumulated
indirection function is the identity function:
let indirects, indirect = ref 0, ref (fun i -> i) in
A function to get the indirected i-th vertex on the path:
let get i = path.(!indirect i) in
An edge function to calculate the distance between a given pair of vertices on the
path, returning zero if either vertex is invalid:
(* Calculate separation of vertices i and j. *)
let edge i j =
  if i >= 0 && i < n && j >= 0 && j < n then
    length (vert_coords.(get i) -| vert_coords.(get j))
  else 0. in
A rotate_cost function to calculate the change in path length due to a proposed rotation:
let rotate_cost i =
(edge (n - 1) 0) -. (edge (i - 1) i) in
The rotate function indirects array lookups to k using modulo arithmetic:
(* Rotate by i in {1 .. n-1}. *)
let rotate i k = (k + i) mod n in
A reverse_cost function to calculate the change in path length due to a proposed reversal:
let reverse_cost i j =
let edge i j =
if i >= 0 && i < n && j >= 0 && j < n then edge i j else O. in
edge (i -1) j +. edge i (j + 1) -. edge (i -1) i -. edge j (j + 1) in
The reverse function indirects array lookups to k, reversing the order of vertices between i
and j inclusive:
(* Reverse indices [i .. j]. *)
let reverse i j k =
if i <= k && k <= j then j + i - k else k in
The splice function is more sophisticated than the rotate and reverse functions. Indices
into the intermediate form (see figure 10.7) are first indirected through a function f. Indices
into the final form are indirected through a function g which indirects through f if required:
$$f(k) = \begin{cases} k & k < i \\ k + l + 1 & i \le k \end{cases}$$
$$g(k) = \begin{cases} f(k) & k < j \\ k - j + i & j \le k \le j + l \\ f(k - l - 1) & j + l < k \end{cases}$$
The f(k) function, also a function of l and i in the program, is productively factored from
splice_cost and splice:
let f l i k = if k < i then k else k + l + 1 in
The splice_cost function calculates the change in path length due to a proposed splice:
let splice_cost l i j =
  let j1, j2 = f l i (j - 1), f l i j in
  edge j1 i +. edge (i + l) j2 +. edge (i - 1) (i + l + 1)
  -. edge (i - 1) i -. edge (i + l) (i + l + 1) -. edge j1 j2 in
The splice function is easily implemented in OCaml:
(* Splice indices from [i, i+l] into {0 .. j-1, ..., j .. n-1}. *)
let splice l i j k =
  if k < j then f l i k else
  if k <= j + l then k - j + i else
  f l i (k - l - 1) in
The mutate function randomly chooses to use either rotate, reverse or splice, passing
randomly generated arguments and returning a 2-tuple of the change in path length and the
indirection function:
(* Randomly rotate, reverse or splice. *)
let mutate () =
  match rand 3 with
    0 ->
      let i = 1 + rand (n - 1) in
      rotate_cost i, rotate i
  | 1 ->
      let i = rand n in let j = rand_except i n in
      let i, j = min i j, max i j in
      reverse_cost i j, reverse i j
  | _ ->
      let l = rand (n - 1) in
      let i = rand (n - l) in
      let j = rand_except i (n - l - 1) in
      splice_cost l i j, splice l i j in
A path_length function to compute the total length of a path simply loops through the edges,
accumulating the edge lengths:
(* Compute the length of the given path. *)
let path_length path =
if n < 2 then O. else begin
let len = ref O. in
for i = 0 to n - 2 do
len := !len +. edge i (i + 1)
done;
!len
end in
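The shortest path and its length are stored as a mutable 2-tuple, initialised to the initial path
and its length. The current path length is also recorded in a mutable value len:
let shortest = ref (Array.copy path, path_length path) in
let len = ref (snd !shortest) in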
Some subjective algorithm is required to decrease the fictitious temperature as the simulation
progresses. We choose to decrease the temperature exponentially, increasing the inverse
temperature $\beta$ from $t_0$ to $t_1$:
(* Initial and final inverse fictional temperatures, beta. *)
let t0 = 100. and t1 = 10000. in
let beta = ref t0 in
This can be achieved by multiplying $\beta$ by:
$$\delta = \left( \frac{t_1}{t_0} \right)^{1/p}$$
where p is the number of iterations, called iters in this program, giving:
let delta = (t1 /. t0) ** (1. /. float_of_int iters) in
The program then loops through iters iterations:
for i = 1 to iters do
At each iteration, $\beta$ is increased by multiplying by $\delta$:
(* Lower the temperature. *)
beta := !beta *. delta;
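That the multiplicative factor carries the inverse temperature from its initial to its final value can be confirmed by running the schedule on its own. This sketch assumes 1,000 iterations for illustration; the names t0 and t1 denote the initial and final inverse temperatures:

```ocaml
let t0 = 100. and t1 = 10000.
let iters = 1000

(* Factor by which beta grows at each iteration. *)
let delta = (t1 /. t0) ** (1. /. float_of_int iters)

(* Apply the schedule for the full number of iterations. *)
let final_beta =
  let beta = ref t0 in
  for _i = 1 to iters do
    beta := !beta *. delta
  done;
  !beta

let () = Printf.printf "delta = %g, final beta = %g\n" delta final_beta
```

After iters multiplications the inverse temperature has reached t1, up to accumulated rounding error.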
A mutation to the path is proposed, returning the change in path length delta_E and an
indirection function f:
let delta_E, f = mutate () in
The mutation is accepted if it shortens the path or probabilistically accepted if it lengthens
the path:
if delta_E < 0. || Random.float 1. < exp (-. !beta *. delta_E) then
begin
The record of the current path length is updated.
len := !len +. delta_E;
The number of indirections is incremented:
incr indirects;
The new indirection function indirect is obtained by applying the new mutate function f to
a given index i before applying the current indirect⁵:
indirect := let g = !indirect in fun i -> g (f i)
end;
If the new path is believed to be the shortest path so far then the path length is computed
explicitly, to remove any accumulated errors:
if !len < snd !shortest then len := path_length ();
If the new path really is the shortest path so far then the mutable 2-tuple shortest is updated
to contain a copy of the path and its length:
if !len < snd !shortest then
  (* Accept the shortened path. *)
  shortest := (Array.copy path, !len);
5 Note that !indirect has been factored out as this must be evaluated now. Delaying the evaluation of this
subexpression would result in the !indirect function executing itself and, therefore, looping indefinitely.
In order to be user friendly, the current shortest path is printed to stderr whenever the
current iteration number i satisfies $\log_2 i \in \mathbb{Z}$:
if 0 = i land (i - 1) then begin
  let len = string_of_float (snd !shortest) in
  output_string stderr ("Length = " ^ len ^ "\n");
  flush stderr;
end;
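The bitwise test used above relies on the fact that a power of two has exactly one bit set, so subtracting one clears that bit and sets all of the lower bits. A brief sketch, with the hypothetical helper is_power_of_two, lists the iteration numbers up to 20 on which output would be produced:

```ocaml
(* i is a power of two when it shares no set bits with i - 1. *)
let is_power_of_two i = i > 0 && i land (i - 1) = 0

(* Iteration numbers in {1 .. 20} which trigger a report. *)
let reported = List.filter is_power_of_two (List.init 20 (fun i -> i + 1))

let () = List.iter (Printf.printf "%d ") reported
```

The reports therefore become exponentially less frequent as the simulation progresses, giving eight lines of output for 10,000 iterations less the iterations beyond 8,192.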
If the current number of indirections exceeds $\sqrt{n}$ then the implicit indirections are flattened
by replacing the array representation of the path and implicit indirections by an explicit copy
of the path, zero indirections and an identity indirection function:
if !indirects * !indirects >= n then begin
  Array.blit (Array.init n (fun i -> get i)) 0 path 0 n;
  indirects := 0;
indirect := (fun i -> i);
end;
done;
In the interests of efficiency, the lexer, parser and main program should be compiled into
native code.
This program, which implements the simulated annealing approach to the travelling salesman
problem, may be compiled using:
$ ocamllex salesman_lexer.mll
$ ocamlyacc salesman_parser.mly
$ ocamlopt -c salesman_parser.mli
$ ocamlopt unix.cmxa salesman_parser.ml salesman_lexer.ml salesman.ml -o salesman
For input in a file "verts.dat" of the form:
0.140791689359 0.582751366306
0.410471260708 0.330020920806
0.126148465548 0.315248400694
The salesman program may then be used to find a short path, stored in "path.dat", between
the vertices by performing, say, $10^4$ iterations:
$ ./salesman 10000 <verts.dat >path.dat
Length = 26.7949782142
Length = 26.7949782142
Length = 25.5792771459
Length = 25.435009196
Length = 23.2092782558
Length = 19.5810248816
Length = 15.6036848984
Length = 13.4418641956
We shall now use this program to find short paths in a randomly generated array of vertices.
10.2.3.5 Results
The shortest paths found after various numbers of iterations of the full simulated annealing
implementation are shown in figure 10.8.
Comparing the shortest path found against the number of iterations (illustrated in figure 10.9), the
more capable mutate function (rotate, reverse and splice) is clearly substantially more efficient
than swapping pairs of vertices.
10.3 Finding nth-nearest neighbours
In this section, we shall develop a program which uses some operations required by advanced
scientific programs:
Set-theoretic operations: union, intersection and difference.
Graph-theoretic operations: nth-nearest neighbours.
The graph-theoretic problem of finding the nth-nearest neighbours allows useful topological
information to be gathered from many forms of data produced by other scientific computations.
For example, in the case of simulated atomic structures, topological information can
aid the interpretation of experimental results when trying to understand molecular structure.
Such topological information can also be used indirectly, in the computation of interesting
properties such as the decomposition of correlation functions over neighbour shells [16], and
shortest-path ring statistics [17].
We shall describe our unconventional formulation of the problem of computing the nth-nearest
neighbours of atoms in an atomic structure simulated under periodic boundary conditions
before describing a program for solving this problem and presenting demonstrative results.
10.3.1 Formulation
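The notion of the nth-nearest neighbours $N_i^n$ of a vertex i in a graph is rigorously defined by
a recurrence relation based upon the set of first nearest neighbours $N_i^1$ of any atom i:
$$N_i^n = \begin{cases} \{i\} & n = 0 \\ N_i^1 \text{ (given)} & n = 1 \\ \left( \bigcup_{j \in N_i^{n-1}} N_j^1 \right) \setminus N_i^{n-1} \setminus N_i^{n-2} & n \geq 2 \end{cases}$$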
Figure 10.8: Shortest paths found using simulated annealing after: a) $10^4$, b) $10^5$, c) $10^6$,
d) $10^7$, e) $10^8$ and f) $10^9$ iterations.
Figure 10.9: Performance of different mutate functions as number of iterations i vs
excess path length $l - l_{\min}$ over the minimum path length $l_{\min}$ found, for: a) swapping
pairs of vertices only (blue), and b) rotate, reverse and splice (red).
As a recurrence relation, this computational task naturally lends itself to recursion. As this
recurrence relation only makes use of the set-theoretic operations union and difference, the
data structure manipulated by the recursive function is most naturally a set (described in
section 3.4).
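The recurrence can be sketched on a small finite graph before the periodic case is considered. The example below is an illustration only: the ring of six vertices, the IntSet module and the neighbours function are assumptions, and vertices are plain integers rather than the supercell-aware indices introduced next.

```ocaml
(* Sets of integer vertex indices. *)
module IntSet = Set.Make (struct type t = int let compare = compare end)

(* Hypothetical example graph: a ring of six vertices 0..5. *)
let neighbours i = IntSet.of_list [ (i + 5) mod 6; (i + 1) mod 6 ]

(* The nth-nearest neighbours of [i]: the union of the first neighbours of
   the previous shell, minus the two previous shells. *)
let rec nth_nn n i =
  match n with
  | 0 -> IntSet.singleton i
  | 1 -> neighbours i
  | n ->
      let pprev = nth_nn (n - 2) i in
      let prev = nth_nn (n - 1) i in
      let t =
        IntSet.fold (fun j t -> IntSet.union (neighbours j) t) prev IntSet.empty in
      IntSet.diff (IntSet.diff t prev) pprev

let () =
  for n = 0 to 4 do
    Printf.printf "shell %d: %s\n" n
      (String.concat " "
         (List.map string_of_int (IntSet.elements (nth_nn n 0))))
  done
```

On this ring the shells around vertex 0 are {0}, {1, 5}, {2, 4}, {3} and then the empty set, because a finite graph is eventually exhausted. This version recomputes earlier shells many times, which is the cost that memoization removes.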
In order to develop a useful scientific program, we shall use an infinite graph to represent
the topology of a d-dimensional crystal, i.e. a periodic tiling. Computer simulations of
non-crystalline materials are typically formulated as a crystal with the largest possible unit cell,
known as the supercell. Conventional approaches to the analysis of these structures reference
atoms by their index $i \in \mathbb{I} = \{1 \ldots N\}$ within the origin supercell. Edges in the graph
representing bonded pairs of atoms in different cells are then handled by treating displacements
modulo the supercell (illustrated in figure 10.10). However, this conventional approach is
well-known to be flawed when applied to insufficiently large supercells [17, 6], requiring erroneous
results to be identified and weeded out manually.
Instead, we shall choose to reference atoms by an index $i = (i_o, i_i)$ where $i_o \in \mathbb{Z}^d$ and $i_i \in \mathbb{I}$.
This explicitly includes the offset $i_o$ of the supercell as well as the index $i_i$ within the supercell
(illustrated in figure 10.11). Neighbouring vertices in the graph representing the topology are
defined not only by the index of the neighbouring vertex but also by the supercell containing
this neighbour (assuming the origin vertex to be in the origin supercell at $\{0\}^d$).
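As a minimal sketch of this representation, consider a hypothetical one-dimensional chain with two atoms per supercell. The bonds table and the neighbours function below are assumptions made for illustration. Each neighbour is stored relative to a vertex in the origin supercell, and the neighbours of a vertex elsewhere are obtained by adding that vertex's own offset.

```ocaml
(* A vertex is the 1-based index within the supercell paired with the
   integer offset of the supercell that contains it. *)
type vertex = int * int list

(* Hypothetical 1D chain ... 1 2 | 1 2 | 1 2 ... with two atoms per cell.
   Entry i - 1 lists the neighbours of atom i in the origin supercell. *)
let bonds : vertex list array =
  [| [ (2, [0]); (2, [-1]) ];   (* atom 1: atom 2 here and in the cell to the left *)
     [ (1, [0]); (1, [1]) ] |]  (* atom 2: atom 1 here and in the cell to the right *)

(* Element-wise sum of two offsets. *)
let add_i = List.map2 ( + )

(* Neighbours of a vertex in any supercell: translate the stored offsets. *)
let neighbours ((i, io) : vertex) : vertex list =
  List.map (fun (j, jo) -> (j, add_i io jo)) bonds.(i - 1)

let () =
  List.iter
    (fun (j, jo) ->
       Printf.printf "atom %d in cell %s\n" j
         (String.concat "," (List.map string_of_int jo)))
    (neighbours (2, [3]))
```

Because offsets are never reduced modulo the supercell, two images of the same atom remain distinct vertices, which is the property that the conventional representation lacks.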
We shall now develop a complete program for computing the nth-nearest neighbours of a given
vertex with index i from a list of lists ri of the indices of the neighbours of each vertex. We
begin by defining a lexer and parser to interpret a file defining the graph-representation ri as
a list of space-separated integers on each line.
Figure 10.10: Conventionally, atoms are referenced only by their index $i \in \mathbb{I} = \{1 \ldots N\}$
within the supercell. Consequently, atoms $i, j \in \mathbb{I}$ at opposite ends of the supercell are
considered to be bonded.
Figure 10.11: We use an unconventional representation which allows all atoms to be
indexed by the supercell they are in as well as their index within the supercell. In this
case, the pair of bonded atoms are referenced as $((0,0), i_i)$ and $((1,0), j_i)$, i.e. with i in the
origin supercell (0,0) and j in the supercell with offset (1,0).
10.3.2.1 Lexer
The "nth_lexer.mll" file, defining the lexer, begins by opening the namespace of the parser
and initialising the line number to one:
{
open Nth_parser
let line = ref 1
}
Regular expressions floating and integer are then defined, capable of handling signed numbers:
let digit = ['0'-'9']
let exponent = ['e' 'E'] ['+' '-']? digit+
let floating = ['+' '-']? (digit+ '.' digit* | digit* '.' digit+) exponent?
let integer = ['+' '-']? digit+
Lexing uses a single rule (called token) which ignores whitespace and new lines and produces
OPEN and CLOSE tokens for curly braces, a COMMA token for commas, INT and FLOAT tokens for
integers and floating-point numbers and an EOF token when the end of the input is reached:
rule token = parse
    [' ' '\t'] { token lexbuf }
  | '\n' { incr line; token lexbuf }
  | '{' { OPEN }
  | '}' { CLOSE }
  | ',' { COMMA }
  | integer { INT(int_of_string (Lexing.lexeme lexbuf)) }
  | floating { FLOAT(float_of_string (Lexing.lexeme lexbuf)) }
  | eof { EOF }
  | _ { failwith ("Mistake at line "^string_of_int !line) }
As usual, the tokens used in the lexer are actually defined in the parser.
10.3.2.2 Parser
The "nth_parser.mly" file, defining the parser, begins with a header defining the tokens, entry
point and type returned by the parser:
%token EOF OPEN CLOSE COMMA
%token <int> INT
%token <float> FLOAT
%start main
%type <float list * (float list * (int * int list) list) list> main
%%
The type returned by the parser contains an initial float list representing the vector extent
of the periodic supercell, followed by a list of vertex descriptions. Each vertex is described by
a float list giving the vertex coordinate and a list of neighbours. Each neighbour is defined
by an int index to another vertex and an int list giving the supercell offset.
A comma-separated list of integers enclosed in curly braces is parsed by the int_list and
int_list_tail rules:
int_list_tail:
| INT COMMA int_list_tail
    { $1 :: $3 }
| INT CLOSE
    { [$1] };
int_list:
| OPEN int_list_tail
    { $2 }
| OPEN CLOSE
    { [] };
A comma-separated list of numbers enclosed in curly braces is parsed in a similar way using
a generic number rule which interprets floating-point numbers and integers, converting the
latter into floating-point numbers:
number:
| FLOAT
    { $1 }
| INT
    { float_of_int $1 };
This number rule is then used to parse lists of numbers:
float_list_tail:
| number COMMA float_list_tail
    { $1 :: $3 }
| number CLOSE
    { [$1] };
float_list:
| OPEN float_list_tail
    { $2 }
| OPEN CLOSE
    { [] };
Neighbour information is parsed with the supercell offset appearing first, as an int list (see footnote 6),
followed by the int index of the neighbour:
neighbour:
| OPEN int_list COMMA INT CLOSE
    { $4, $2 };
Footnote 6: Supercell offsets could be represented by arbitrary-precision integers but the int type is more than
satisfactory in practice.
The description of a single vertex is parsed by decapitating the vertex coordinate and then
loading the list of neighbours:
vertex_tail:
| neighbour COMMA vertex_tail
    { $1 :: $3 }
| neighbour CLOSE
    { [$1] };
vertex:
| OPEN float_list COMMA OPEN vertex_tail CLOSE
    { $2, $5 };
The body of the input contains a list of vertex descriptions:
vertex_list_tail:
| vertex COMMA vertex_list_tail
    { $1 :: $3 }
| vertex CLOSE
    { [$1] };
vertex_list:
| OPEN vertex_list_tail
    { $2 };
Finally, the whole input is parsed as the supercell extent followed by the vertex list:
main:
| OPEN float_list COMMA vertex_list CLOSE EOF
    { $2, $4 };
The result of parsing the input is then ready to be analyzed by the main program.
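To make the grammar concrete, the following hypothetical input (invented for illustration, not taken from a real simulation) describes a two-dimensional supercell of extent {1, 1} containing two atoms. Each vertex gives its coordinate followed by its neighbours, and each neighbour gives its supercell offset followed by its 1-based index. Tracing the rules above by hand suggests that this text should be accepted, although it has not been run through ocamlyacc here.

```text
{{1, 1},
 {{{0.25, 0.5}, {{{0, 0}, 2}, {{-1, 0}, 2}}},
  {{0.75, 0.5}, {{{0, 0}, 1}, {{1, 0}, 1}}}}}
```

Under these assumptions the parser would return the extent [1.; 1.] paired with two vertex descriptions, the first being ([0.25; 0.5], [(2, [0; 0]); (2, [-1; 0])]).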
The "nth.ml" file, containing the main program, may then be written. This begins with the
definition of a data structure to represent a set of vertices. The key of this set is a vertex
description, the int index and int list supercell offset:
(* A vertex in a set. *)
module VertexKey = struct
type t = int * int list
let compare = compare
end
We could have defined a custom compare function for the VertexKey module but, in the
interests of simplicity, we shall use the built-in polymorphic compare function.
A set of these elements may then be defined as:
(* A set of vertices. *)
module VertexSet = Set.Make(VertexKey)
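For comparison, a custom comparison function could look like the following sketch. It is an assumption about how such a function might be written, not the approach taken in this program: the index is compared first and the offsets are then compared element by element, using comparisons specialised to integers.

```ocaml
(* A vertex key with a hand-written, monomorphic comparison. *)
module VertexKey = struct
  type t = int * int list

  (* Lexicographic comparison of two integer offset lists. *)
  let rec compare_offsets (a : int list) (b : int list) =
    match a, b with
    | [], [] -> 0
    | [], _ :: _ -> -1
    | _ :: _, [] -> 1
    | x :: a', y :: b' ->
        let c = compare (x : int) y in
        if c <> 0 then c else compare_offsets a' b'

  (* Compare the index first, then the supercell offset. *)
  let compare ((i, io) : t) ((j, jo) : t) =
    let c = compare (i : int) j in
    if c <> 0 then c else compare_offsets io jo
end

module VertexSet = Set.Make (VertexKey)

let () =
  let s = VertexSet.of_list [ (2, [0]); (1, [1]); (1, [0]); (1, [0]) ] in
  Printf.printf "%d distinct vertices\n" (VertexSet.cardinal s)
```

The ordering only needs to be total and consistent for the set to work, so any such choice gives the same set-theoretic results.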
The program then contains several helper functions (similar to those defined in chapter 9) used
to manipulate and act upon lists and sets.
A function to initialise a list:
let list_init n f =
  let rec aux l = function
      0 -> l
    | n -> aux (f n :: l) (n-1) in
  aux [] n
A function to add a pair of integer lists $i_o + j_o$:
let add_i = List.map2 ( + )
The higher-order list_iteri function applies its function argument to the index and value of
each element of the given list in turn:
let list_iteri f l =
  ignore (List.fold_left (fun n e -> f n e; n+1) 0 l)
The list_of_set and set_of_list functions convert between lists and sets of vertices:
let list_of_set s = VertexSet.fold (fun e l -> e::l) s []
let set_of_list l =
  List.fold_left (fun s e -> VertexSet.add e s) VertexSet.empty l
A list_map3 function, equivalent to the List.map2 function but acting over three lists
simultaneously:
let rec list_map3 f l1 l2 l3 = match l1, l2, l3 with
    h1::t1, h2::t2, h3::t3 ->
      f h1 h2 h3 :: list_map3 f t1 t2 t3
  | [], [], [] -> []
  | _ -> invalid_arg "list_map3"
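The behaviour of three of these helpers can be checked on small inputs. The definitions are repeated so that the sketch is self-contained, and the example values and the slots array are assumptions chosen for illustration.

```ocaml
(* Element-wise sum of two integer lists of equal length. *)
let add_i = List.map2 ( + )

(* Apply [f] to the index and value of each element in turn. *)
let list_iteri f l =
  ignore (List.fold_left (fun n e -> f n e; n + 1) 0 l)

(* Map a three-argument function over three lists of equal length. *)
let rec list_map3 f l1 l2 l3 = match l1, l2, l3 with
  | h1 :: t1, h2 :: t2, h3 :: t3 ->
      f h1 h2 h3 :: list_map3 f t1 t2 t3
  | [], [], [] -> []
  | _ -> invalid_arg "list_map3"

(* Fill an array from a list, as the main program does for its arrays. *)
let slots = Array.make 3 ""
let () = list_iteri (fun i s -> slots.(i) <- s) ["x"; "y"; "z"]

let () =
  let show l = String.concat "; " (List.map string_of_int l) in
  print_endline (show (add_i [1; 0] [-1; 2]));
  print_endline (show (list_map3 (fun a b c -> a + b * c) [1; 2] [3; 4] [5; 6]));
  print_endline (String.concat "; " (Array.to_list slots))
```

Both add_i and list_map3 raise Invalid_argument when the lengths of their list arguments differ, which the main program relies upon implicitly when it checks dimensions.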
After these definitions, the main part of the program is nested within the conventional construct:
let () =
The command-line arguments are parsed (see section 8.1) to extract the values of n and i:
  (* Extract the centre-atom index "i" and neighbour shell "n". *)
  let n, i =
    let input = ref [] in
    Arg.parse [] (fun x -> input := x :: !input) "nth_nn <n> <i>";
    match !input with
      [i; n] -> int_of_string n, int_of_string i
    | _ -> invalid_arg "Usage: nth_nn <n> <i>" in
The lexer and parser are then used to interpret the input from stdin, returning the supercell
extent and the list of vertex descriptions:
(* Load the supercell extent, atomic coordinates and bonds. *)
let supercell, pos, bonds =
let supercell, atoms =
try
let lexbuf = Lexing.from_channel stdin in
Nth_parser.main Nth_lexer.token lexbuf
In the event of an error during parsing (due to invalid input) the exception is caught and a
helpful error message generated which includes the line of input at which the error was noticed:
        with Parsing.Parse_error ->
          let line = string_of_int !Nth_lexer.line in
          print_endline ("Syntax error at line "^line);
          exit 1 in
The list is the ideal data structure for loading the data because the number of vertices in the
graph is determined by the size of the input and, therefore, is not already known.
However, the core of the program will be randomly accessing the sets of nearest neighbours
for each vertex, before performing set-theoretic operations on these sets. Thus, the list of lists
of vertices may be productively converted into an array of sets of vertices. This is most easily
done by creating arrays and then filling in the vertex coordinate and nearest neighbour set for
each vertex using the list_iteri function:
(* Make an array of atomic coordinates and an array of sets of
nearest-neighbours. *)
    let bonds = Array.make (List.length atoms) VertexSet.empty
    and pos = Array.make (List.length atoms) [] in
    let aux i (r, l) =
      pos.(i) <- r;
      bonds.(i) <- set_of_list l in
    list_iteri aux atoms;
This expression then returns the vector extent supercell of the supercell, the array pos of
vertex coordinates and the array bonds of nearest neighbour sets:
    supercell, pos, bonds in
Before performing any neighbour computations, we perform some sanity checks on the input:
(* Check dimension consistency. *)
  let () =
    let d = List.length supercell in
The vertex coordinates are tested to make sure they are all of the same dimensionality as the
supercell:
    let test l = assert (d = List.length l) in
    Array.fold_left (fun () -> test) () pos;
The neighbour supercell offsets are tested similarly:
    let aux () l =
      VertexSet.fold (fun (_, l) () -> test l) l () in
    Array.fold_left aux () bonds;
assert (0 0) supercell in
The core of the program is the nth_nn function which computes the set of nth-nearest
neighbours of a vertex i. The nth_nn function is defined as a λ-function in order to provide a local
hash table called memory:
(* Compute the "n"th nearest neighbours of "i". *)
let rec nth_nn =
The λ-function implementing the nth_nn function accepts two arguments representing n and
i. However, i is represented by a 2-tuple giving the int index i and the int list supercell
offset io:
    fun n (i, io) ->
In order to improve performance, the result of the nth_nn function is memoized (described in
section A.7) in the hash table memory. Thus, the function begins by checking the hash table
for a previously computed result, returning a previous result if one was found:
(* Look for a previous result. *)
try Hashtbl.find memory (n, i)
If a previous result was not found then the result is computed:
with Not_found -> match n with
The set of 0th-nearest neighbours of a vertex i is the singleton set {i}:
          0 -> VertexSet.singleton (i, io)
The set of 1st-nearest neighbours of a vertex $i \in \{1 \ldots N\}$ was given in the input and is stored
as element i - 1 of the array bonds:
        | 1 ->
            let nn = bonds.(i - 1) in
If the vertex i is in the origin supercell then the neighbour is in the supercell given by its
offset:
if io = zero then nn else
Otherwise, the neighbour's offsets jo in the set should be translated by the offset io of the
vertex i using the add_i function:
            let aux (j, jo) s = VertexSet.add (j, add_i io jo) s in
            VertexSet.fold aux nn VertexSet.empty
The nth-nearest neighbours for n > 1 are given by set-theoretic operations on the sets $N_i^{n-2}$
and $N_i^{n-1}$, denoted pprev and prev, respectively:
        | n ->
            let pprev = nth_nn (n - 2) (i, io) in
            let prev = nth_nn (n - 1) (i, io) in
The union:
$$\bigcup_{j \in N_i^{n-1}} N_j^1$$
can be computed using only two lines of code by folding the union function over the set $N_i^{n-1}$:
            let aux j t = VertexSet.union (nth_nn 1 j) t in
            let t = VertexSet.fold aux prev VertexSet.empty in
The remainder of the computation involves removing the two previous neighbour shells:
            let t = VertexSet.diff (VertexSet.diff t prev) pprev in
Finally, the result is stored in the hash table memory for future reference, before being returned:
Hashtbl.add memory (n, i) t;
t in
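The memoization pattern used by nth_nn (look up the hash table, compute on Not_found, store the result and return it) can be seen in isolation in the following sketch. The function fib_like and the calls counter are stand-ins invented for illustration and are not part of the program.

```ocaml
(* Hash table of previously computed results, keyed by the argument. *)
let memory : (int, int) Hashtbl.t = Hashtbl.create 16

(* Counts how many results were actually computed rather than looked up. *)
let calls = ref 0

(* A doubly recursive stand-in function, memoized in [memory]. *)
let rec fib_like n =
  try Hashtbl.find memory n
  with Not_found ->
    incr calls;
    let r = if n < 2 then 1 else fib_like (n - 1) + fib_like (n - 2) in
    Hashtbl.add memory n r;
    r

let () =
  Printf.printf "fib_like 30 = %d after %d computations\n" (fib_like 30) !calls
```

Without the table this function would make over a million recursive calls for n = 30. With it, each argument from 0 to 30 is computed exactly once, which is the same saving that nth_nn obtains when it asks for the two previous shells.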
A function pos_of is then defined to compute a string representation of the vector coordinate
of a vertex at the given supercell offset, using the list_map3 function to offset the vertex
coordinate pos.(i-1) by the supercell offset io multiples of the supercell extent supercell:
  (* String representation of the coordinate of atom "i" in
     supercell "io". *)
  let pos_of (i, io) =
    let aux io s r = r +. s *. float_of_int io in
    let r = list_map3 aux io supercell pos.(i - 1) in
    let r = List.map string_of_float r in
    "{"^(String.concat ", " r)^"}" in
The program ends with the appropriate invocation of the nth_nn function, conversion of the
result to a string and output by printing to stdout:
(* Invoke "nth_nn". *)
  let nn = list_of_set (nth_nn n (i, zero)) in
  let nn = String.concat ", " (List.map pos_of nn) in
  print_endline ("{"^nn^"}")
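The coordinate arithmetic performed by pos_of can be checked by hand on a small case. In the sketch below the supercell extent, the two atomic positions and the function name coord are assumptions chosen so that the floating-point results are exact.

```ocaml
(* Map a three-argument function over three lists of equal length. *)
let rec list_map3 f l1 l2 l3 = match l1, l2, l3 with
  | h1 :: t1, h2 :: t2, h3 :: t3 ->
      f h1 h2 h3 :: list_map3 f t1 t2 t3
  | [], [], [] -> []
  | _ -> invalid_arg "list_map3"

(* Hypothetical 2D supercell of extent 2 x 3 containing two atoms. *)
let supercell = [2.0; 3.0]
let pos = [| [0.5; 0.5]; [1.5; 2.0] |]

(* Coordinate of atom [i] (1-based) in the supercell with offset [io]:
   each component is shifted by a whole number of supercell extents. *)
let coord (i, io) =
  list_map3 (fun io s r -> r +. s *. float_of_int io) io supercell pos.(i - 1)

let () =
  let r = coord (2, [1; -1]) in
  print_endline (String.concat ", " (List.map string_of_float r))
```

Here atom 2 at (1.5, 2.0) is displaced by one supercell along the first axis and minus one along the second, giving (3.5, -1.0).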
Before the components of this program may be executed, they must be compiled.
The lexer, parser and main program may be compiled into a native code executable nth using:
$ ocamllex nth_lexer.mll
$ ocamlyacc nth_parser.mly
$ ocamlopt -c nth_parser.mli
$ ocamlopt unix.cmxa nth_parser.ml nth_lexer.ml nth.ml -o nth
For appropriate input "cfg", the nth executable may be invoked to find the nth-nearest
neighbours of the ith atom, producing a list of the coordinates of the neighbours in a file
"neighbours.dat", by:
$ ./nth n i <cfg >neighbours.dat
We shall now examine some demonstrative results obtained using this program.
10.3.3 Results
In condensed matter physics, the set of nth-nearest neighbours is often known as the nth-nearest
neighbour shell. This terminology is self-evident from the approximately-spherical
shape of a neighbour shell for $n \gg 1$. Figure 10.12a shows the shell formed by the 150th-nearest
neighbours of a randomly chosen atom in a $10^5$-atom model of amorphous silicon. In
contrast, figure 10.12b shows the icosahedral shell formed by the 150th-nearest neighbours of
an ideal diamond crystal. These examples are computed in only 10 minutes using the program
we have just presented.
Figure 10.12: The 150th-nearest neighbour shells: a) 83,272 neighbours from a $10^5$-atom
model of amorphous silicon [18], and b) 56,252 neighbours from a perfect diamond-structure
crystal.
These approximately-spherical shells in amorphous structures have been shown to propagate
order across surprisingly large distances, even in strongly disordered materials, and have been
shown to be responsible for some of the anomalous properties of amorphous materials [19].
Of course, these results are considerably more compelling when visualised using real-time 3D
graphics, e.g. a program similar to that developed in section 6.5.
10.4 Eigen problems
In this section, we shall develop a program which demonstrates two important concepts often
seen in scientific computing:
Matrix computations using LAPACK [11], specifically eigenvalue computation.
Properties of random matrices.
A wide variety of naturally occurring phenomena may be modelled in terms of vectors and
matrices. Most notable, perhaps, is the representation of quantum-mechanical operators as
matrices, the eigenvalues of which are well known to have special importance [20]. Solving
matrix problems can require various different forms of matrix manipulation, particularly forms
of factorization. One prevalent task is the computation of eigenvalues which we shall examine
here.
An interesting avenue of theoretical research aims to elucidate the properties of random ma-
trices (ensembles of matrices with elements chosen randomly according to defined probability
distributions). Thus, we shall now develop a program capable of generating a random matrix
and computing its eigenvalues.
As this program only requires the desired extent of the matrix, there is no need for a lexer and
parser and the input can be read as a command-line argument instead. Thus, the program
consists of a single "eigen.ml" file. This file begins by opening the namespaces of the Bigarray
and Lacaml.S modules:
open Bigarray
open Lacaml.S
The Lacaml.S module provides functions for handling matrices using single (32-bit) precision
floating-point arithmetic.
A function init_matrix to create an n x m matrix as an array of arrays, using a given
element-generating function f, may be written in terms of Array.init:
(* Initialise a matrix as an array of arrays. *)
let init_matrix n m f =
  Array.init n (fun i -> Array.init m (f i))
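As a brief usage sketch, the element-generating function receives the row and column indices in curried form. The identity and random_signs functions below are examples invented for illustration; the second mirrors the random elements of value +1 or -1 used later in this section.

```ocaml
(* Initialise a matrix as an array of arrays. *)
let init_matrix n m f =
  Array.init n (fun i -> Array.init m (f i))

(* The n x n identity matrix. *)
let identity n = init_matrix n n (fun i j -> if i = j then 1. else 0.)

(* An n x n matrix whose elements are +1 or -1 with equal probability. *)
let random_signs n =
  init_matrix n n (fun _ _ -> float_of_int (2 * Random.int 2 - 1))

let () =
  Array.iter
    (fun row ->
       print_endline
         (String.concat " " (Array.to_list (Array.map string_of_float row))))
    (identity 3)
```

Because f i is partially applied once per row, any work that depends only on the row index could be done there before the columns are generated.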
The eigenvalues resulting from a solution appear along the diagonal of a 2D big array (big
arrays are discussed in section 8.3). The elements along the diagonal can be extracted as an
array using the function:
(* Extract the diagonal of a 2D big array as an array. *)
let array_of_diag m =
  let n = Array2.dim1 m in
  if n <> Array2.dim2 m then invalid_arg "array_of_diag";
  Array.init n (fun i -> Array2.get m (i + 1) (i + 1))
The eigenvalues of a matrix can be found using the geev function [11]. This function alters a
given big array in-place, leaving the eigenvalues along the diagonal. Thus, the eigenvalues of
a matrix may be computed using the function:
(* Compute the eigenvalues of a matrix. *)
let eigenvalues m =
  let m = Mat.of_array m in
  ignore (geev m);
  array_of_diag m
The main body of the program begins by parsing the command-line arguments to extract the
desired number of rows and columns in the random matrix:
let () =
  let n =
    let iters = ref [] in
    Arg.parse [] (fun s -> iters := s :: !iters) "eigen <n>";
    match !iters with
      [n] -> int_of_string n
    | _ -> invalid_arg "Usage: eigen <n>" in
A random matrix is then generated using the init_matrix function and the eigenvalues $\lambda_k$
are found using the eigenvalues function:
  (* Compute the eigenvalues of a random matrix. *)
  let f _ _ = float_of_int (2 * Random.int 2 - 1) in
  let m = init_matrix n n f in
  let lambda = eigenvalues m in
The resulting eigenvalues are sorted into ascending order and output to stdout:
(* Sort and output the eigenvalues. *)
Array.sort compare lambda;
  let lambda = Array.to_list lambda in
  let lambda = List.map string_of_float lambda in
  print_endline ("{"^(String.concat ", " lambda)^"}")
This program can be used to compute the eigenvalues $\lambda_k$ of a randomly generated dense
matrix. For example, the eigenvalues of a randomly generated 1024 x 1024 matrix may be
generated and stored in the file "eigenvalues.dat" using:
$ ocamlopt -cclib -llapack2 -I +lacaml bigarray.cmxa lacaml.cmxa eigen.ml -o eigen
$ ./eigen 1024 >eigenvalues.dat
This computation only takes 1-2 minutes.
10.4.2 Results
A famous result of random matrix theory is the semi-circle law of eigenvalue densities for
random n x n matrices from the Gaussian Orthogonal Ensemble (GOE):
$$P(\lambda) = \begin{cases} \frac{2}{\pi n} \sqrt{n - \lambda^2} & -\sqrt{n} < \lambda < \sqrt{n} \\ 0 & \text{otherwise} \end{cases}$$
Although the derivation of the semi-circle law only applies to GOE matrices, the distributions
of the eigenvalues found by this program (for $M_{ij} = \pm 1$) are also well approximated by the
semi-circle law (illustrated in figure 10.13).
In fact, matrix computations have given empirical evidence that the semi-circle law is far more
widely applicable than its current derivation would suggest [21, 22].
Figure 10.13: The approximately semi-circular eigenvalue density $P(\lambda)$ for a dense,
random, square matrix $M_{ij} = \pm 1$ with n = 1024, showing the prediction of the semi-circle
law (blue line) and the computed eigenvalue distribution (red dots).
10.5 Discrete wavelet transform
In this section, we shall examine a simple form of wavelet transform known as the Haar wavelet
transform. Remarkably, the definition of this transform is more comprehensible when given
as a program, rather than as a mathematical formulation or English description.
The Haar wavelet transform of a float list of length $n = 2^p$ (for integer $p \geq 0$) is given by
the following function:
# let haar l =
    let rec aux l s d = match l, s, d with
        [s], [], d -> s :: d
      | [], s, d -> aux s [] d
      | h1::h2::t, s, d -> aux t (h1 +. h2 :: s) (h1 -. h2 :: d)
      | _ -> invalid_arg "haar" in
    aux l [] [];;
val haar : float list -> float list = <fun>
For example, the Haar wavelet transform of the sequence (1,2,3,4, -4, -3, -2, -1) is the more
redundant sequence (0,20,4,4, -1, -1, -1, -1):
# haar [1.; 2.; 3.; 4.; -4.; -3.; -2.; -1.];;
- : float list = [0.; 20.; 4.; 4.; -1.; -1.; -1.; -1.]
The aux function, nested inside the haar function, implements the transform by tail recursively
taking pairs of elements off the input list and prepending the sum and difference of each pair
onto two internal lists called s and d, respectively. When the input is exhausted, the process
is repeated using the list of sums of pairs as the new input. Finally, when the input contains
only a single element, the result is obtained by prepending this element (the total sum) onto
the list of differences. This algorithm is difficult to describe any other way.
The inverse transform may be written:
# let ihaar =
    let rec aux l s d = match l, s, d with
        l, [], [] -> l
      | s, [], d -> aux [] s d
      | t, h1::s, h2::d -> aux (0.5 *. (h1 +. h2) :: 0.5 *. (h1 -. h2) :: t) s d
      | _ -> invalid_arg "ihaar" in
    function [] -> [] | s::d -> aux [] [s] d;;
# ihaar [0.; 20.; 4.; 4.; -1.; -1.; -1.; -1.];;
- : float list = [1.; 2.; 3.; 4.; -4.; -3.; -2.; -1.]
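That the two functions are mutual inverses can be checked mechanically. The sketch below repeats both definitions so that it is self-contained and tests a round trip. The input values are arbitrary small integers, chosen so that all intermediate sums and halves are exact in floating point.

```ocaml
(* Forward Haar transform of a list whose length is a power of two. *)
let haar l =
  let rec aux l s d = match l, s, d with
    | [s], [], d -> s :: d
    | [], s, d -> aux s [] d
    | h1 :: h2 :: t, s, d -> aux t (h1 +. h2 :: s) (h1 -. h2 :: d)
    | _ -> invalid_arg "haar" in
  aux l [] []

(* Inverse transform: rebuild each pair from its sum and difference. *)
let ihaar =
  let rec aux l s d = match l, s, d with
    | l, [], [] -> l
    | s, [], d -> aux [] s d
    | t, h1 :: s, h2 :: d ->
        aux (0.5 *. (h1 +. h2) :: 0.5 *. (h1 -. h2) :: t) s d
    | _ -> invalid_arg "ihaar" in
  function [] -> [] | s :: d -> aux [] [s] d

let () =
  let signal = [1.; 2.; 3.; 4.] in
  let coeffs = haar signal in
  let show l = String.concat "; " (List.map string_of_float l) in
  print_endline (show coeffs);
  print_endline (show (ihaar coeffs))
```

For the input (1, 2, 3, 4) the transform is (10, 4, -1, -1): the total sum, then the difference between the sums of the two halves, then the differences within each pair. For general inputs the round trip is exact only up to floating-point rounding.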
We shall now describe and formulate the fundamentals of wavelet transforms in order to put
these functions into context.
All wavelet transforms consider their input (taken to be a function of time) in terms of
oscillating functions (wavelets) which are localised in terms of both time and frequency.
Specifically, wavelet transforms compute the inner product of the input with child wavelets which
are translated dilates of a single, mother wavelet. As the mother wavelet is both temporally
and spectrally localised, the child wavelets (as dilated translates) are scattered over the
time-frequency plane. Thus, the wavelet transform of a signal conveys both
temporal and spectral content simultaneously. This property is the foundation of the utility
of wavelets.
Discrete wavelet transforms of a length n input restrict the translation and dilation parameters
to n discrete values. Typically, the mother wavelet is defined such that the resulting child
wavelets form an orthogonal basis. In 1988, Ingrid Daubechies introduced a particularly
elegant construction which allows progressively finer scale child wavelets to be derived via a
recurrence relation [23]. This formulation restricts the wavelet to a finite width, a property
known as compact support. In particular, the pyramidal algorithm [24, 25] implementing
Daubechies' transform (used by the above functions) requires only O(n) time,
asymptotically faster than the O(n log n) time required by the FFT. The Haar wavelet transform
is the simplest such wavelet transform and the haar function above implements all of these features.
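The linear cost of the pyramidal algorithm can be read directly from the haar function above. Each pass halves the length of its input, and a pass over m elements processes m/2 pairs. For an input of length $n = 2^p$, the total number of pairs processed is therefore a geometric sum:

```latex
\sum_{k=1}^{p} \frac{n}{2^k} = n \left( 1 - \frac{1}{2^p} \right) = n - 1, \qquad n = 2^p
```

Each pair costs one addition and one subtraction, giving $2(n - 1)$ floating-point operations in total. For the length-8 example above this is 4 + 2 + 1 = 7 pairs.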
The Haar wavelet transform is our last example to demonstrate the remarkable expressiveness
of OCaml.
Bibliography
[1] E. Chailloux, P. Manoury, and B. Pagano, Developing applications with Objective Caml.
Cambridge, England: O'Reilly, 2000.
[2] X. Leroy, D. Doligez, J. Garrigue, D. Remy, and J. Vouillon, The Objective Caml system.
2004.
[3] C. A. Gunter and J. C. Mitchell, Theoretical Aspects of Object-Oriented Programming.
Boston, MA, USA: MIT Press, 1994.
[4] M. Abadi and L. Cardelli, A Theory of Objects. New York, USA: Springer-Verlag, 1996.
[5] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms.
Cambridge, MA, USA: MIT Press, 2001.
[6] D. Frenkel and B. Smit, Understanding Molecular Simulation from Algorithms to Appli-
cations. New York, USA: Academic Press, 1996.
[7] W. Rankin and J. Board, "A portable distributed implementation of the parallel multipole
tree algorithm," in IEEE Symposium on High Performance Distributed Computing, (Los
Alamitos), pp. 17-22, IEEE Computer Society Press, 1995.
[8] D. E. Knuth, The Art of Computer Programming. Boston, MA, USA: Addison Wesley,
1997.
[9] J. Shewchuk, "Adaptive precision floating-point arithmetic and fast robust geometric
predicates," Discrete & Computational Geometry, vol. 18, no. 3, pp. 305-363, 1997.
[10] D. Shreiner, M. Woo, J. Neider, and T. Davis, OpenGL Programming Guide: The Official
Guide to Learning OpenGL, Version 1.4. Harlow, England: Addison Wesley, 2004.
[11] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz,
A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users' Guide.
Philadelphia, USA: Society for Industrial and Applied Mathematics, third ed., 1999.
[12] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes
in C: The Art of Scientific Computing. Cambridge, UK: Cambridge University Press,
1992.
[13] S. Kugler, L. Pusztai, L. Rosta, P. Chieux, and R. Bellissent, "The structure of evaporated
pure amorphous silicon. neutron diffraction and reverse monte carlo investigations," Phys.
Rev. B, vol. 48, p. 7685, 1993.
[14] J.-P. Hansen and I. R. McDonald, Theory of Simple Fluids. New York, USA: Academic
Press, 1990.
[15] N. Mousseau and G. T. Barkema, "Traveling through potential energy landscapes of
disordered materials: the activation-relaxation technique," Phys. Rev. E, vol. 57, p. 2419,
1998.
[16] S. R. Elliott, The physics and chemistry of solids. New York, USA: John Wiley & sons,
2000.
[17] D. S. Franzblau, "Computation of ring statistics for network models of solids," Phys. Rev.
B, vol. 44, no. 10, pp. 4925-4930, 1991.
[18] G. T. Barkema and N. Mousseau, "High-quality continuous random networks," Phys. Rev.
B, vol. 62, p. 4985, 2000.
[19] J. D. Harrop, Structural properties of amorphous materials. PhD thesis, Cambridge
University, UK, 2004.
[20] S. Gasiorowicz, Quantum physics. London, England: John Wiley and Sons, 2003.
[21] T. A. Brody, J. Flores, J. B. French, P. A. Mello, A. Pandey, and S. S. M. Wong, "Random-
matrix physics - spectrum and strength fluctuations," Rev. Mod. Phys., vol. 53, p. 385,
1981.
[22] P. A. Lee and T. V. Ramakrishnan, "Disordered electronic systems," Rev. Mod. Phys.,
vol. 57, p. 287, 1985.
[23] I. Daubechies, "Orthonormal bases of compactly supported wavelets," Comm. Pure Appl.
Math., vol. 41, p. 909, 1988.
[24] S. Mallat, "Multiresolution approximations and wavelet orthonormal bases of $L^2(\mathbb{R})$,"
Transactions of the American Mathematical Society, vol. 315, no. 1, pp. 69-87, 1989.
[25] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet
representation," IEEE Trans. on Patt. Anal. and Mach. Intel., vol. 11, no. 7, pp. 674-693, 1989.
Appendix A
Advanced Topics
A.1 Data sharing
An important concept which sometimes arises in OCaml is the ability to share mutable data
between several data structures. This can be achieved using double indirection, e.g. a reference
to a reference to the shared data.
For example, this creates a (trivial) imperative data structure which we wish to share:
# let data = ref 3;;
val data : int ref = {contents = 3}
Two different data structures may then reference data:
# let a = ref data;;
val a : int ref ref = {contents = {contents = 3}}
# let b = ref data;;
val b : int ref ref = {contents = {contents = 3}}
The original data is now shared between a and b (illustrated in figure A.1). Therefore, the
contents of a and of b may be altered simultaneously by altering data:
# data := 4;;
- : unit = ()
# a;;
- : int ref ref = {contents = {contents = 4}}
Figure A.1: By sharing a reference data to the value 3 between two references a and b,
the value 3 may be shared.
The ability to share data in this way is useful in many circumstances.
However, an important caveat arises from the sharing of doubly indirected references. The
make function in the Array module creates an array with all elements set to the given value.
Thus, if the given value is mutable (e.g. a reference) then, as mutable data structures them-
selves, the elements of the resulting array will all share the same data. For example, the
following array will have all elements sharing a single reference to zero:
# let a = Array.make 3 (ref 0);;
val a : int ref array = [|{contents = 0}; {contents = 0}; {contents = 0}|]
Altering any element in the array then affects the value of all other elements in the array:
# a.(0) := 7; a;;
- : int ref array = [|{contents = 7}; {contents = 7}; {contents = 7}|]
This is usually not the desired behaviour. Solutions to this problem are given in section B.6.
A.2 Labelled and optional arguments
The OCaml language supports two unusual forms of function argument:
Labelled arguments allow function arguments to be named and then supplied in any order.
Optional arguments allow function arguments to be omitted, replaced either by a default
value (specified in the definition of the function) or by an option type.
Labelled arguments use the syntax ~arg:var where arg is the name of the argument label
and var is the name of the variable bound to this argument in the body of the function. For
example, the ipow function may be written using named arguments:
# let rec ipow ~x:x ~n:n = if n=0 then 1. else x *. ipow ~x:x ~n:(n-1);;
val ipow : x:float -> n:int -> float = <fun>
The arguments to this function may be specified conventionally, ignoring the labelling:
# ipow 5. 3;;
- : float = 125.
Alternatively, the labelled arguments to this function may then be specified in any order when
the function is called by referring to them by name:
# ipow ~n:3 ~x:5.;;
- : float = 125.
In particular, this facility may be used to create more specialised functions by specifying some
of the arguments of more general functions in any order. In the case of ipow, we are more
likely to want to raise to the power of a constant rather than to raise a constant to any given
power. For example, we may wish to define a function to cube a given number:
# let pow3 = ipow ~n:3;;
val pow3 : x:float -> float = <fun>
# pow3 5.;;
- : float = 125.
The ability to name function arguments is of particular use when dealing with complicated
interfaces to libraries, such as the interfaces to graphical libraries discussed in chapter 6.
Optional arguments are defined using the syntax ?(var=val) where var is the argument and
variable name and val is the default value of this variable, used if no value is specified. When
called, optional arguments are specified using the same syntax as labelled arguments. For
example, the following function creates a vector from the given pair of values, each defaulting
to zero if unspecified:
# let make_vec ?(x=0.) ?(y=0.) () = (x, y);;
val make_vec : ?x:float -> ?y:float -> unit -> float * float = <fun>
Applying this function with both, only one or none of its arguments results in the omitted
coordinates being substituted with zero:
# make_vec ~x:1. ~y:2. ();;
- : float * float = (1., 2.)
# make_vec ~y:2. ();;
- : float * float = (0., 2.)
# make_vec ();;
- : float * float = (0., 0.)
In order to infer which optional arguments have not been specified, optional arguments must
always be accompanied by a non-labelled argument. Hence the trailing value of type unit
in the previous example. The compilers spot failure to specify a non-labelled argument and
complain:
# let f ?(x=0.) = ();;
Warning: This optional argument cannot be erased
val f : ?x:float -> unit = <fun>
Optional arguments may also be specified without default values, in which case the corre-
sponding variable in the function body becomes an option type. This is best illustrated by a
function which returns the argument it receives:
# let f ?x () = x;;
val f : ?x:'a -> unit -> 'a option = <fun>
Note that the return type of this function is 'a option, rather than simply 'a.
Specifying the optional argument results in a Some value being passed to the function:
# f ~x:5 ();;
- : int option = Some 5
Omitting the optional argument results in None being passed to the function:
# f ();;
- : 'a option = None
A short-hand notation exists for function definitions and calls with labelled arguments when
the argument or variable name is the same as that of the label. In such cases, the argument
or variable name and preceding colon may be omitted, i.e. ~x:x may be written ~x.
Optional arguments are particularly useful in the context of interfaces, such as libraries im-
plementing graphical user interfaces, where optional arguments allow full functionality to be
accessible whilst providing a simpler alternative for specifying common subsets of function
arguments. This is used to good effect in the glut bindings for OCaml, described in chapter 6.
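The following sketch combines labelled arguments, an optional argument with a default and the short-hand notation in one small numerical function. The function linspace and its argument names are assumptions made for illustration; it is not part of any library discussed in this book.

```ocaml
(* [linspace ?n ~lo ~hi ()] returns n evenly spaced samples from lo to hi
   inclusive. n is optional and defaults to 5; the trailing unit argument
   lets the compiler decide when n has been omitted. *)
let linspace ?(n = 5) ~lo ~hi () =
  if n < 2 then invalid_arg "linspace: n must be at least 2";
  let step = (hi -. lo) /. float_of_int (n - 1) in
  Array.init n (fun i -> lo +. float_of_int i *. step)

(* Labels may be given in any order; n falls back to its default. *)
let default_grid = linspace ~hi:1. ~lo:0. ()

(* Short-hand: variables named like the labels are passed as ~lo and ~hi. *)
let coarse_grid =
  let lo = 0. and hi = 10. in
  linspace ~n:3 ~lo ~hi ()
```

Callers who are content with the default resolution never need to mention n, whilst the full functionality remains available.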
A.3 Defining binary infix operators
In the context of scientific computing, data types are often used to represent mathematical
objects, such as vectors, matrices, quaternions and hyper-complex numbers. However, mathe-
matical expressions written in terms of function calls are obfuscated compared to conventional
mathematical notation.
As we have already seen, the arithmetic binary infix operators may be referred to as conven-
tional functions by enclosing them in parentheses. For example, integer addition:
# ( + );;
- : int -> int -> int = <fun>
# ( + ) 3 4;;
- : int = 7
Similarly, binary infix operators may be defined by writing the function name in this syntax.
For example, a += operator which increments and returns an int ref:
# let ( += ) a b = a := !a + b; a;;
val ( += ) : int ref -> int -> int ref = <fun>
# (ref 3) += 4;;
- : int ref = {contents = 7}
A handy trick when defining infix operators over a type is to define the infix operators in a
module called Infixes which is nested within the module which defines the type and associated
functions. This allows the namespace of the Name.Infixes module to be opened, providing
access to the infix operators without providing access to other functions implemented by the
module. This trick is exploited by the GMP bindings for arbitrary-precision integer arithmetic
(Gmp.Z.Infixes).
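A minimal sketch of this trick, assuming a hypothetical two-dimensional vector module Vec2 (the module, its functions and the operator names are all illustrative):

```ocaml
module Vec2 = struct
  type t = { x : float; y : float }

  let make x y = { x; y }
  let add a b = { x = a.x +. b.x; y = a.y +. b.y }
  let scale s a = { x = s *. a.x; y = s *. a.y }
  let dot a b = a.x *. b.x +. a.y *. b.y

  (* Only the operators live here, so opening Vec2.Infixes does not
     bring make, add, scale or dot into scope. *)
  module Infixes = struct
    let ( +| ) = add
    let ( *| ) = scale
  end
end

open Vec2.Infixes

let u = Vec2.make 1. 2.
let v = Vec2.make 3. 4.

(* Operators beginning with * bind tighter than those beginning with +. *)
let w = 2. *| u +| v

(* Ordinary functions must still be qualified by the module name. *)
let d = Vec2.dot u v
```

The precedence of a user-defined operator is inherited from its leading character, which is why the expression for w needs no parentheses.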
A.4 Installing top-level pretty printers
In an earlier section we created a FloatRange module for handling ranges [l, u). However, when
playing with this module from the top-level, values of type FloatRange.t are printed <abstr> as the
type t is abstract:
# let a = FloatRange.make 1. 3. and b = FloatRange.make 2. 5.;;
val a : FloatRange.t = <abstr>
val b : FloatRange.t = <abstr>
The ability to print such values in the form [l, u) would be very useful.
This can be achieved by installing a custom pretty printer in the top-level, for printing any
values of the type FloatRange.t. We shall begin by defining a function capable of converting
such a value to a string:
# let string_of_floatrange r = match FloatRange.to_pair r with
    (l, u) -> "[" ^ string_of_float l ^ ", " ^ string_of_float u ^ ")";;
val string_of_floatrange : FloatRange.t -> string = <fun>
A function to print a range to a given format stream¹ may then be written:
# let print_floatrange f r = Format.fprintf f "%s" (string_of_floatrange r);;
val print_floatrange : Format.formatter -> FloatRange.t -> unit = <fun>
This function may then be installed as the top-level pretty printer for values of the type
FloatRange.t:
# #install_printer print_floatrange;;
For example:
# let a = FloatRange.make 1. 3. and b = FloatRange.make 2. 5.;;
val a : FloatRange.t = [1., 3.)
val b : FloatRange.t = [2., 5.)
This functionality can be useful in many circumstances, particularly when dealing with math-
ematical constructs.
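The same printer can also be reused outside the top-level. The sketch below assumes a hypothetical stand-in module Range with the same shape as FloatRange (an abstract type with make and to_pair), so that it is self-contained; the name pp_range is likewise illustrative.

```ocaml
(* Hypothetical stand-in for the FloatRange module: the type is abstract
   outside the module, so the top-level would print values as <abstr>. *)
module Range : sig
  type t
  val make : float -> float -> t
  val to_pair : t -> float * float
end = struct
  type t = float * float
  let make l u = (l, u)
  let to_pair r = r
end

(* Printer with the signature expected by #install_printer. *)
let pp_range fmt r =
  let l, u = Range.to_pair r in
  Format.fprintf fmt "[%g, %g)" l u

(* In the top-level:  #install_printer pp_range;;
   In compiled code the same printer plugs into the %a conversion. *)
let shown = Format.asprintf "%a" pp_range (Range.make 1. 3.)
```

Writing the printer directly against a formatter, rather than via an intermediate string, lets it compose with other Format-based printers.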
¹ The Format module is described in the manual [2].
A.5 Monomorphism
Although we have previously dedicated little discussion to the topic of monomorphic types,
such as '_a, we have occasionally presented code which has been deemed to contain monomorphic
types by OCaml. For example, the type of an empty hash table, as discussed in section
3.5:
# Hashtbl.create 1;;
- : ('_a, '_b) Hashtbl.t = <abstr>
Monomorphic types can also appear as the result of performing an η-reduction, when a closure
is formed by the application of some of a polymorphic function's arguments. For example,
when applying the first argument to the List.map2 function:
# let combine a b = List.map2 (fun a b -> (a, b)) a b;;
val combine : 'a list -> 'b list -> ('a * 'b) list = <fun>
# let combine = List.map2 (fun a b -> (a, b));;
val combine : '_a list -> '_b list -> ('_a * '_b) list = <fun>
They can even appear as the result of seemingly trivial expressions. For example, although
the list containing the empty array is polymorphic, the array containing the empty list is
monomorphic:
# [[||]];;
- : 'a array list = [[||]]
# [|[]|];;
- : '_a list array = [|[]|]
The appearance of monomorphic types can be something of a mystery. Firstly note that,
whereas a polymorphic type 'a denotes an expression which is valid for all types 'a, a
monomorphic type '_a denotes an expression which is valid for some specific type '_a.
Consequently, a monomorphic type will be ossified into a specific type as soon as its type can be
inferred.
Thus, monomorphism is often a result of mutability, as demonstrated by the last example.
The empty array is not mutable. Therefore, the list containing the empty array is not mutable.
Consequently, the type of the list containing the empty array is truly polymorphic. In contrast,
the array containing the empty list is mutable - the empty list could be replaced by a list of
elements of some particular type.
This appearance of monomorphic types can be undesirable. For example, the η-reduced combine
function was, most likely, intended to be a polymorphic function. Fortunately, monomorphic
types can be generalised to polymorphic types by wrapping the offending expression in
a polymorphic function (η-expansion).
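The difference can be seen by using both forms at more than one type. This is a sketch only; the names combine_weak, ints_strings, p1 and p2 are introduced here for illustration.

```ocaml
(* Eta-reduced: the right-hand side is an application, not a syntactic
   function, so its type variables are only weakly polymorphic. *)
let combine_weak = List.map2 (fun a b -> (a, b))

(* The first use fixes the element types to int and string. *)
let ints_strings = combine_weak [1; 2] ["one"; "two"]

(* A second use at other types would now be rejected, e.g.
   combine_weak [1.; 2.] [true; false] is a type error. *)

(* Eta-expanded: an explicit function is generalised as usual. *)
let combine a b = List.map2 (fun x y -> (x, y)) a b

let p1 = combine [1; 2] ["one"; "two"]
let p2 = combine [1.; 2.] [true; false]
```

The η-expanded form costs only a few extra characters and keeps the function usable at every type.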
A.6 Functors
As we have seen, functors act as functions which map modules to modules, such as the
Set.Make and Map.Make functors introduced in chapter 3. In the case of the Set and Map
modules, a functor is used to enforce the correct use of operations between sets and maps,
such as set unions.
The functionality of Set and Map could have been provided without functors, in either of two
ways.
First, each of the member functions could have been made to accept the comparison function
as an argument. However, users of Set and Map could then accidentally pass the wrong
comparison function, resulting in unexpected and undefined behaviour.
Second, the functor could be replaced by a higher-order function which accepts the comparison
function and returns a record containing all of the member functions. Sharing the
comparison function improves safety. However, the resulting records are then only distinguished
by their type. Therefore, the compiler would not ensure that functions from
one record were not accidentally applied to data from another record which happened
to have the same type, e.g. when the records represent sets of integers with different
comparison functions. Such a mistake would also result in unexpected and undefined
behaviour.
Thus, the use of functors in the Set and Map modules improves safety.
Functors can be used not only to add safety assurance but also to provide a form of speciali-
sation. For example, a particle simulator may be written as a functor which maps a module
representing a particle onto a module representing a simulation of such particles.
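A heavily simplified sketch of such a functor is given below. The signature PARTICLE, the functor Simulation and the Point module are all assumptions made for illustration, and the physics is reduced to one dimension.

```ocaml
(* What the functor needs to know about a particle. *)
module type PARTICLE = sig
  type t
  val mass : t -> float
  val velocity : t -> float          (* one-dimensional, for brevity *)
  val with_velocity : t -> float -> t
end

(* Maps any particle module onto a module simulating such particles. *)
module Simulation (P : PARTICLE) = struct
  type state = P.t list

  (* Apply a uniform acceleration a for a time step dt. *)
  let accelerate ~a ~dt (s : state) : state =
    List.map (fun p -> P.with_velocity p (P.velocity p +. a *. dt)) s

  let kinetic_energy (s : state) =
    List.fold_left
      (fun e p -> e +. 0.5 *. P.mass p *. P.velocity p ** 2.) 0. s
end

module Point = struct
  type t = { m : float; v : float }
  let mass p = p.m
  let velocity p = p.v
  let with_velocity p v = { p with v }
end

module PointSim = Simulation (Point)

let start = [ { Point.m = 2.; v = 0. }; { Point.m = 4.; v = 1. } ]
let later = PointSim.accelerate ~a:1. ~dt:1. start
let energy = PointSim.kinetic_energy later
```

Applying Simulation to a different particle module yields a separate, specialised simulator without any change to the simulation code.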
A.7 Memoization
Caching is a productive way to optimise a program or function. In many cases, when the result
of a function depends only upon its argument values and the function produces no side-effects,
a mapping from arguments to the resulting return value can be used to cache the effect of the
function. This is known as memoization.
For example, the following function fib computes the nth Fibonacci number:
# let rec fib n = if n < 3 then 1 else fib (n - 2) + fib (n - 1);;
val fib : int -> int = <fun>
# Array.init 10 (fun i -> fib (i + 1));;
- : int array = [|1; 1; 2; 3; 5; 8; 13; 21; 34; 55|]
This implementation is quite slow, even for small n. A higher-order timer function can be
used to measure the performance of different implementations of the fib function:
# let time f x =
    let t = Sys.time () in let fx = f x in fx, Sys.time () -. t;;
val time : ('a -> 'b) -> 'a -> 'b * float = <fun>
For example, using this fib function to compute fib 35 takes 2.26 seconds:
# time fib 35;;
- : int * float = (9227465, 2.26)
An important cause of the inefficiency of this function is the lack of reuse of previous results.
This can be addressed by caching the return value of the fib function for each argument value
n in a hash table memory:
# let memory = Hashtbl.create 1;;
val memory : ('_a, '_b) Hashtbl.t = <abstr>
# let rec cached_fib =
    fun n ->
      try Hashtbl.find memory n
      with Not_found ->
        let fn =
          if n < 3 then 1 else cached_fib (n - 2) + cached_fib (n - 1) in
        Hashtbl.add memory n fn;
        fn;;
val cached_fib : int -> int = <fun>
Caching the effect of the recursive fib function greatly improves its performance, to the extent
that the previous benchmark now takes an immeasurably small time to execute:
# time cached_fib 35;;
- : int * float = (9227465, 0.)
Moreover, the process of caching the effect of a recursive single-argument function, such as the
fib function, can be factored out into a higher-order function. However, this is non-trivial in
the case of recursive functions as they must call the memoized version of themselves and not
the original unmemoized version. This requires the fib function to be rewritten in the form
of a higher-order function which accepts the function f which it is to call as an argument:
# let fib' f n = if n < 3 then 1 else f (n - 2) + f (n - 1);;
val fib' : (int -> int) -> int -> int = <fun>
Functions such as this can then be memoized using the higher-order memoize1 function:
# let memoize1 f =
    let cache = Hashtbl.create 1 in
    let rec f' n =
      try Hashtbl.find cache n
      with Not_found -> (fun fn -> Hashtbl.add cache n fn; fn) (f f' n) in
    f';;
val memoize1 : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun>
For example, a memoized variant of the original fib function may be created by simply
applying the memoize1 function to the unmemoized fib' function:
# let memoized_fib = memoize1 fib';;
val memoized_fib : int -> int = <fun>
# time memoized_fib 35;;
- : int * float = (9227465, 0.)
The memoize1 function may be productively used to memoize many other functions.
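As one further illustration, a function of two arguments can be memoized by packing its arguments into a tuple. The sketch below repeats a memoize1 with the same type so that it is self-contained (here with an explicit let in place of the λ-function, and an arbitrary initial table size of 16); the binomial example and its names are assumptions made for illustration.

```ocaml
(* A memoize1 combinator with the same type as the one in the text. *)
let memoize1 f =
  let cache = Hashtbl.create 16 in
  let rec f' n =
    try Hashtbl.find cache n
    with Not_found ->
      let fn = f f' n in
      Hashtbl.add cache n fn;
      fn
  in
  f'

(* Binomial coefficient via Pascal's rule, written so that self is the
   function to call for the sub-problems. The two arguments are packed
   into one tuple to fit memoize1. *)
let binomial' self (n, k) =
  if k = 0 || k = n then 1
  else self (n - 1, k - 1) + self (n - 1, k)

let binomial = memoize1 binomial'
```

Without the cache, Pascal's rule recomputes the same coefficients many times over; with it, each pair (n, k) is evaluated only once.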
A.8 Polymorphic variants
The OCaml language allows the use of a special type known as a polymorphic variant, the
names of which are denoted by an initial backtick, e.g. `Stomp. The utility of polymorphic
variants lies in their typing.
Polymorphic variants are used in the same way as variant types:
# let extract1 = function
    `None -> 0
  | `Some a -> a;;
val extract1 : [< `None | `Some of int ] -> int = <fun>
Note that there was no need to declare the polymorphic variants `None and `Some. Instead,
the type [< `None | `Some of int ], attributed to the argument of the extract1 function,
denotes any subset of the set containing these two types. For example:
# let a = [ `None; `Some 1; `Some 2 ];;
val a : [> `None | `Some of int ] list = [`None; `Some 1; `Some 2]
# List.map extract1 a;;
- : int list = [0; 1; 2]
Writing this function in a different style results in a different inferred type for the polymorphic
variant argument:
# let extract2 = function
    `Some a -> a
  | _ -> 0;;
val extract2 : [> `Some of int ] -> int = <fun>
In this case, the type [> `Some of int ] denotes any superset of the set containing `Some.
For example:
# let a = [ `None; `Some 3; `Other ];;
val a : [> `None | `Other | `Some of int ] list =
  [`None; `Some 3; `Other]
# List.map extract2 a;;
- : int list = [0; 3; 0]
Polymorphic variants can be said to "weaken the type system" as they allow a wider range of
potentially incorrect uses compared to conventional variants. However, polymorphic variants
can be used in many productive ways. Most simply, polymorphic variants can be used to evade
the verbosity associated with namespaces. For example, the lablGL bindings to OpenGL
(described in chapter 6) use polymorphic variants to refer to the considerable number of
enumerated values used by OpenGL.
Some discussion of the implementation and use of polymorphic variants is given in the litera-
ture [2].
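The use of polymorphic variants as undeclared enumerated values can be sketched as follows. The tags and function names here are chosen for illustration and are not taken from the lablGL interface.

```ocaml
(* No type declaration is needed: the tags are introduced by use. *)
let string_of_primitive = function
  | `Points -> "points"
  | `Lines -> "lines"
  | `Triangles -> "triangles"

(* Vertices consumed per primitive; accepts the same closed set of tags. *)
let vertices_per = function
  | `Points -> 1
  | `Lines -> 2
  | `Triangles -> 3

(* The same tag is passed to two independently written functions. *)
let describe p n =
  Printf.sprintf "%d %s" (n / vertices_per p) (string_of_primitive p)

let summary = describe `Triangles 12
```

Had conventional variants been used, both functions would have needed to refer to a shared type declaration, typically qualified by a module name.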
A.9 Phantom types
Thus far, we have only examined straightforward use of the OCaml type system. This type
system is clearly very sophisticated, automating a great many tedious and error-prone chores.
In addition to simple use, the type system can be exploited in some non-trivial ways.
In particular, types may be made to include constraints which are then statically verified
by the compiler. As OCaml is statically typed, the process of verifying the correct use of
the constraints is performed at compile-time. Consequently, these methods do not incur any
run-time performance cost.
For example, when writing a program which deals with both sorted and unsorted lists, a
module can be written which implements sorted lists, verifying their correct use at compile-
time:
# module SortedList :
  sig
    type ('a, 'b) t
    val sorted_of : 'a list -> ('a, [ `Up ]) t
    val rev_up : ('a, [ `Up ]) t -> ('a, [ `Down ]) t
    val rev_down : ('a, [ `Down ]) t -> ('a, [ `Up ]) t
    val list_of : ('a, [< `Up | `Down ]) t -> 'a list
  end =
  struct
    type ('a, 'b) t = 'a list
    let sorted_of l = List.sort compare l
    let rev_up l = List.rev l and rev_down l = List.rev l
    let list_of l = l
  end;;
module SortedList :
  sig
    type ('a, 'b) t
    val sorted_of : 'a list -> ('a, [ `Up ]) t
    val rev_up : ('a, [ `Up ]) t -> ('a, [ `Down ]) t
    val rev_down : ('a, [ `Down ]) t -> ('a, [ `Up ]) t
    val list_of : ('a, [< `Up | `Down ]) t -> 'a list
  end
In this case, sorting a list of type 'a list using the SortedList.sorted_of function produces
a list of type ('a, [ `Up ]) SortedList.t:
# let l = SortedList.sorted_of [1; 3; 2; 8; 6; 9];;
val l : (int, [ `Up ]) SortedList.t = <abstr>
# SortedList.list_of l;;
- : int list = [1; 2; 3; 6; 8; 9]
The phantom type [ `Up ] in the type of l is used to indicate that the elements in the list are
in non-descending order. Applying the SortedList.rev_up function to l produces a list in
descending order:
# let rl = SortedList.rev_up l;;
val rl : (int, [ `Down ]) SortedList.t = <abstr>
# SortedList.list_of rl;;
- : int list = [9; 8; 6; 3; 2; 1]
Again, the type [ `Down ] is used to convey whether or not the list is sorted, in this case in
non-ascending order.
The OCaml type system will enforce the appropriate use of functions restricted by their
phantom type. For example, trying to apply the rev_up function to a down sorted list results
in a type error caught at compile-time:
# SortedList.rev_up rl;;
This expression has type (int, [ `Down ]) SortedList.t
but is here used with type (int, [ `Up ]) SortedList.t
These two variant types have no intersection
Thus, phantom types may well be useful in the context of scientific computing, for example
to enforce the correct use of different types of similar mathematical objects, such as row and
column vectors.
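The row and column vector case might be sketched as follows. The module Vec and all of its function names are assumptions made for illustration; only the phantom-type technique itself is taken from the text above.

```ocaml
module Vec : sig
  type 'orientation t
  val row : float array -> [ `Row ] t
  val col : float array -> [ `Col ] t
  val transpose_row : [ `Row ] t -> [ `Col ] t
  val transpose_col : [ `Col ] t -> [ `Row ] t
  (* Inner product: only row-times-column is accepted. *)
  val dot : [ `Row ] t -> [ `Col ] t -> float
end = struct
  (* The orientation parameter never appears on the right-hand side. *)
  type 'orientation t = float array
  let row a = Array.copy a
  let col a = Array.copy a
  let transpose_row v = v
  let transpose_col v = v
  let dot r c =
    if Array.length r <> Array.length c then invalid_arg "Vec.dot";
    let sum = ref 0. in
    Array.iteri (fun i x -> sum := !sum +. x *. c.(i)) r;
    !sum
end

let r = Vec.row [| 1.; 2.; 3. |]
let c = Vec.col [| 4.; 5.; 6. |]
let rc = Vec.dot r c
let rr = Vec.dot r (Vec.transpose_row r)
(* Vec.dot r r and Vec.dot c r are rejected at compile time. *)
```

As both orientations share the same run-time representation, the transpose functions are free; the distinction exists only in the types.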
A.10 Exponential type growth
Surprisingly, the ML type inference algorithm has exponential complexity. In particular,
the inferred type of an expression may grow exponentially with the size of the expression.
For example, the following nested expression results in a type which grows exponentially in
complexity for each repeated nesting:
# let x x y z = z x y in
let x y = x (x y) in
x; ;
- : 'a -> 'b -> 'c -> 'd -> 'e -> 'f -> 'g -> 'h -> 'i -> 'j -> 'k ->
'l -> 'm -> 'n -> 'o -> 'p -> 'q ->
((((((((((((((((((((((((((((((('a -> 'q -> 'r) -> 'r) -> 'p -> 's) ->
's) -> 'o -> 't) -> 't) -> 'n -> 'u) -> 'u) -> 'm -> 'v) -> 'v) -> 'l -> 'w) -> 'w) -> 'k
-> 'x) -> 'x) -> 'j -> 'y) -> 'y) -> 'i -> 'z) -> 'z) -> 'h -> 'a1) -> 'a1) -> 'g -> 'b1)
-> 'b1) -> 'f -> 'c1) -> 'c1) -> 'e -> 'd1) -> 'd1) -> 'd -> 'e1) -> 'e1) -> 'c -> 'f1)
-> 'f1) -> 'b -> 'g1) -> 'g1 = <fun>
Exponential types are totally useless but fun.
Appendix B
Troubleshooting
This appendix describes and solves some non-trivial problems typically encountered when
learning the OCaml language.
B.1 Dangerous if
The if construct is easily misused in the absence of sufficient bracketing. Specifically, many
functions begin with preliminary tests, such as sanity checks, which are often written as single-
line if constructs such as:
# let rec factorial n =
    if n=0 then 1 else
    n * factorial (n-1);;
val factorial : int -> int = <fun>
# factorial 5;;
- : int = 120
In this case, the expression following the else may be equivalently bracketed in begin and
end:
# let rec factorial n =
    if n=0 then 1 else begin
      n * factorial (n-1)
    end;;
val factorial : int -> int = <fun>
# factorial 5;;
- : int = 120
However, the latter form is more robust to seemingly simple alterations. For example, an
incorrect attempt to supplement the termination of the function with some printed, debugging
output:
# let rec factorial n =
    if n=0 then print_endline "Finished!"; 1 else
    n * factorial (n-1);;
Syntax error
In this case, the language is trying to parse this input as the nonsensical:
let rec factorial n =
  if n=0 then print_endline "Finished!";
  1
Bracketing can save the day:
# let rec factorial n =
    if n=0 then (print_endline "Finished!"; 1) else
    n * factorial (n-1);;
val factorial : int -> int = <fun>
# factorial 5;;
Finished!
- : int = 120
More confusingly, the grammar of the OCaml language is also eager to bind the first valid
expression after an else. An incorrect attempt to print debugging information after the else
is caught as a type error:
# let rec factorial n =
    if n=0 then 1 else
    print_endline "Working...";
    n * factorial (n-1);;
This expression has type unit but is here used with type int
In this case, the language has parsed this input as an attempt to present the result of the
expression print_endline "Working...", which is of type unit, as an alternative in the if
expression to the 1 of type int.
Such mistakes will not always be caught by the type checker. The following example is
supposed to print out whether or not the given integer is an integer power of two, incrementing
a counter if it was not:
# let counter = ref 0;;
val counter : int ref = {contents = 0}
# let ipow_of_2 i =
    if i land (i - 1) = 0 then print_endline "yes" else
    incr counter;
    print_endline "no";;
val ipow_of_2 : int -> unit = <fun>
Applying this function to a value which is not an integer power of two produces the expected
response:
# ipow_of_2 3;;
no
- : unit = ()
However, when applied to an integer power of two, the ipow_of_2 function appears somewhat
indecisive:
# ipow_of_2 4;;
yes
no
- : unit = ()
In fact, the function is incorrectly printing "no" in all cases because only the incr counter
expression has been associated with the else of the if construct, i.e. the function was parsed
as:
# let ipow_of_2 i =
    if i land (i - 1) = 0 then
      print_endline "yes"
    else
      incr counter;
    print_endline "no";;
val ipow_of_2 : int -> unit = <fun>
This problem is easily fixed by inserting correct bracketing:
# let ipow_of_2 i =
    if i land (i - 1) = 0 then print_endline "yes" else begin
      incr counter;
      print_endline "no"
    end;;
val ipow_of_2 : int -> unit = <fun>
Giving:
# ipow_of_2 3;;
no
- : unit = ()
# ipow_of_2 4;;
yes
- : unit = ()
as expected.
In general this problem can be avoided by adhering to a simple guideline: bracket all non-
trivial expressions appearing after the then or else keywords. Notably, this guideline may be
broken when the expression raises an exception and, therefore, control will never propagate
beyond this point.
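Applied to the factorial function, the guideline might look as follows. This is a sketch: the negative-argument check and both debugging messages are additions made here for illustration.

```ocaml
let rec factorial n =
  (* A branch which only raises an exception may stay unbracketed. *)
  if n < 0 then invalid_arg "factorial: negative argument";
  if n = 0 then begin
    print_endline "Finished!";
    1
  end else begin
    print_endline "Working...";
    n * factorial (n - 1)
  end
```

With both branches bracketed, debugging output can be added to or removed from either branch without changing how the if expression is parsed.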
B.2 Scoping subtleties
The region of a program in which a bound variable name may be referred to is known as
the scope of the variable. Although scoping is relatively simple in the context of imperative
languages, the presence of locally defined functions can make things more interesting in OCaml.
For example, the following function creates and returns a function for raising to the power of
twice the given number:
# let f y =
    let z = 2. *. y in
    (fun x -> x ** z);;
val f : float -> float -> float = <fun>
In this case, the variable z is used in the λ-function result.
This can be a source of confusion and errors when nested functions use the same or similar
variable names. Slight mistakes can then result in the wrong variable being used in a given
context. As the typing of the program is likely to be correct, the compiler is unlikely to pick
up such mistakes.
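A small sketch of how a reused name can silently refer to an older definition (the names pow_z, captured and current are illustrative):

```ocaml
let z = 2.

(* pow_z captures the z which is in scope here, i.e. 2. *)
let pow_z x = x ** z

(* This introduces a new, unrelated z; it does not modify the old one. *)
let z = 3.

let captured = pow_z 2.   (* still uses z = 2. *)
let current = 2. ** z     (* uses the new z = 3. *)
```

Both uses of z have type float, so the type checker has no reason to complain whichever definition was intended.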
B.3 Evaluation order
Unlike other languages, arguments are evaluated in an unspecified order in OCaml. In the
context of purely functional programs this makes no difference as expressions are independent.
In the context of imperative programs, this can affect the result. Consequently, programmers
must strive to avoid the temptation of assuming that arguments are evaluated in any particular
order.
For example, the following mutable variable can be used to determine evaluation order:
# let x = ref 1;;
val x : int ref = {contents = 1}
The following function doubles its mutable argument:
# let double x = x := 2 * !x;;
val double : int ref -> unit = <fun>
Evaluating the following expression shows that, in this case, OCaml has chosen to evaluate
the subexpressions in reverse order, incrementing x to 2, doubling it to 4 and only then reading it:
# (!x, double x, incr x);;
- : int * unit * unit = (4, (), ())
The order of evaluation of expressions may be guaranteed in a number of ways. In particular,
the let ... in construct guarantees to evaluate the new definition before proceeding. For
example, this extracts the current value of x (4) before doubling and incrementing it:
# let a = !x in let b = double x in (a, b, incr x);;
- : int * unit * unit = (4, (), ())
In the context of imperative programming, evaluation order is also guaranteed by compound
expressions formed by the ; operator. For example, the following guarantees to double x
before incrementing it:
# double x;
  incr x;
  !x;;
- : int = 19
In summary, programs should always be written in an evaluation-order independent manner.
Short-circuit evaluated operators (i.e. && and ||) are notable exceptions as these operators
are guaranteed to evaluate arguments in order, only as necessary.
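For instance, when two calls to a side-effecting function must happen in a known order, each result can be bound with let ... in before being used. The counter and the function next below are hypothetical.

```ocaml
let counter = ref 0

(* Side-effecting function: each call returns the next integer. *)
let next () = incr counter; !counter

(* Unspecified order: (next (), next ()) could be (1, 2) or (2, 1). *)

(* Sequencing with let ... in pins the order down. *)
let ordered_pair =
  let first = next () in
  let second = next () in
  (first, second)
```

The extra bindings cost nothing at run-time and make the intended order explicit to the reader as well as to the compiler.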
B.4 Constructor arguments
Multiple-argument variant-type constructors cannot have their arguments supplied in a tuple.
This confusion arises because the syntax used to supply the arguments of a variant constructor
looks like that of a tuple.
For example, in the context of the 2-argument constructor On:
# type button = Off | On of int * string;;
type button = Off | On of int * string
The following attempt to use the On constructor by supplying the two arguments as a 2-tuple
mine fails:
# let mine = (1, "mine");;
val mine: int * string = (1, "mine")
# On mine;;
The constructor On expects 2 argument(s),
but is here applied to 1 argument(s)
This problem is easily circumvented by using a function to map a tuple to the variant type,
such as:
# let button_of (i, s) = On (i, s);;
val button_of: int * string -> button = <fun>
# button_of mine;;
- : button = On (1, "mine")
Although problems arising from the distinction between constructor arguments and tuples are
comparatively uncommon, they can be a source of confusion.
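An alternative, sketched below, is to parenthesise the argument type in the type definition, which declares a constructor taking a single argument that is itself a pair. The primed names button', Off' and On' are introduced here only to keep both definitions side by side.

```ocaml
(* Two-argument constructor, as in the text. *)
type button = Off | On of int * string

(* One-argument constructor whose argument is a pair. *)
type button' = Off' | On' of (int * string)

let mine = (1, "mine")

(* On mine would be rejected; On' mine is accepted as it stands. *)
let b' = On' mine

(* The pair can also be recovered whole by pattern matching. *)
let label_of = function
  | Off' -> None
  | On' pair -> Some (snd pair)

(* For the two-argument form the components are matched separately. *)
let label_of_button = function
  | Off -> None
  | On (_, s) -> Some s
```

The two-argument form can be represented more compactly, so the parenthesised form is best reserved for cases where the pair is genuinely passed around as a unit.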
B.5 Recycled types
Type definitions are always treated uniquely by OCaml. Even if a type definition is identical
to a previous definition, the language guarantees to treat it separately.
For example, the following defines two types (mytype1 and mytype2) and two values (a and
b) of these types:
# type mytype1 = On;;
type mytype1 = On
# let a = On;;
val a : mytype1 = On
# type mytype2 = On;;
type mytype2 = On
# let b = On;;
val b : mytype2 = On
The values are clearly incomparable because they are of different types.
# a = b;;
This expression has type mytype2 but is here used with type mytype1
In this case, the error is entirely self-explanatory. However, the same situation arises even if
the types have the same name, in which case the error message is more confusing:
# type mytype2 = On;;
type mytype2 = On
# let c = On;;
val c : mytype2 = On
# b = c;;
This expression has type mytype2 but is here used with type mytype2
Future versions of the OCaml compilers may produce a more useful error message in such
cases but this is clearly worth remembering.
B.6 Mutable array contents
If an array is initialised using the Array.make function¹ with mutable contents then the
contents will be shared between all elements in the array. This is usually not the desired effect.
For example, the following creates an array containing 3 elements, all of which reference the
same integer:
# let a = Array.make 3 (ref 0);;
val a : int ref array = [|{contents = 0}; {contents = 0}; {contents = 0}|]
Assigning to any element then affects all elements:
# a.(0) := 7; a;;
- : int ref array = [|{contents = 7}; {contents = 7}; {contents = 7}|]
This problem is most easily solved by using the Array.init function to create the array,
specifying a function which returns a new reference at each invocation:
# let a = Array.init 3 (fun _ -> ref 0);;
val a : int ref array = [|{contents = 0}; {contents = 0}; {contents = 0}|]
# a.(0) := 7; a;;
- : int ref array = [|{contents = 7}; {contents = 0}; {contents = 0}|]
The λ-function creates a new reference to zero for each array element.
¹ Or, equivalently, the deprecated Array.create function.
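The same pitfall commonly appears when building matrices as arrays of arrays, since arrays are themselves mutable. The following sketch shows the problem and two ways around it; the variable names are illustrative.

```ocaml
(* Pitfall: every row is the very same inner array. *)
let shared = Array.make 2 (Array.make 2 0.)
let () = shared.(0).(0) <- 1.
let leaked = shared.(1).(0)   (* also changed *)

(* Fix 1: build a fresh row for each index. *)
let fresh = Array.init 2 (fun _ -> Array.make 2 0.)
let () = fresh.(0).(0) <- 1.

(* Fix 2: the standard library helper for this common case. *)
let matrix = Array.make_matrix 2 2 0.
let () = matrix.(0).(0) <- 1.
```

Array.make_matrix allocates a separate row for each index, so it is the simplest choice when every element starts from the same immutable value.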
B.7 Polymorphic problems
As we have seen, the built-in polymorphic functions (e.g. =) can be inappropriately applied
to:
Data structures containing functions.
Values of abstract types.
Also, note that this set of polymorphic functions includes compare and hash:
# compare;;
- : 'a -> 'a -> int = <fun>
# Hashtbl.hash;;
- : 'a -> int = <fun>
If applied to data structures containing function values, these polymorphic functions are likely
to raise the Invalid_argument exception:
# let a = (1, fun i -> 1) and b = (1, fun i -> 2) ;;
val a : int * ('a -> int) = (1, <fun>)
val b : int * ('a -> int) = (1, <fun>)
# a = b;;
Exception: Invalid_argument "equal: functional value".
Inappropriate application of these polymorphic functions is likely to become a problem during
the development of a program, as the data structures used by a program evolve. For example, if
the development of a program results in a non-performance-critical portion of a whole program
becoming performance critical then subsequent optimisation is likely to involve the use of more
sophisticated data structures, such as replacing an association list with a map or hash table.
Any applications of built-in polymorphic functions to this data structure (e.g. comparing sets
of mappings by applying =) then become erroneous.
Such errors are currently difficult to track down. However, in theory, a tool may be written
which finds some of these errors automatically by searching for applications of these polymorphic
functions to types which contain functions or abstract types.
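In the meantime, a defensive habit is to compare such values only through functions written for the purpose. The sketch below uses the equal function provided by Map.Make and a hand-written equality for a hypothetical record type model which contains a function; all names other than those of the standard library are illustrative.

```ocaml
module IntMap = Map.Make (struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end)

(* The same bindings, inserted in different orders; the underlying
   balanced trees are not guaranteed to have the same shape. *)
let m1 = IntMap.(empty |> add 1 "a" |> add 2 "b" |> add 3 "c")
let m2 = IntMap.(empty |> add 3 "c" |> add 2 "b" |> add 1 "a")

(* Compare through the module's own interface instead of ( = ). *)
let same = IntMap.equal String.equal m1 m2

(* For records holding functions, compare only the plain-data fields. *)
type model = { name : string; order : int; eval : float -> float }
let equal_model a b = a.name = b.name && a.order = b.order

let lin = { name = "linear"; order = 1; eval = (fun x -> 2. *. x) }
let lin' = { lin with eval = (fun x -> x +. x) }
let same_model = equal_model lin lin'
```

Whether two models with different eval functions should count as equal is a design decision; the point is that the decision is made explicitly rather than left to the built-in polymorphic equality.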
B.8 Local and non-local variable definitions
Although similar in appearance, the let keyword is used to create two different constructs.
Specifically, non-nested (outermost) let ... = ... ;; constructs make new definitions in the
current namespace whereas nested let ... = ... in ... constructs make local definitions.
The difference between nested and non-nested definitions can sometimes be confusing. For
example, the following is valid OCaml code which defines a variable a:
# let a =
    let b = 4 in
    b * b;;
val a : int = 16
In contrast, the following tries to make a non-local definition for a within the nested expression
for b, which is invalid:
# let b = 4 in let a = b * b;;
Syntax error
Regardless, the latter code can appear when programmers are drunk or tired.
