
CompTools for Economics and Finance - Paolo Pellizzari

In progress: Feb 2020

Prologue
This course was conceived in 2013 and gradually developed to improve and refine the
knowledge of the R software and to experiment with some new teaching techniques, based on flipped
classes and the automatic generation of hundreds of exams/exercises.

0 Intro
1. This course aims at providing:
(a) better understanding of theory, techniques, and problems encountered in your
math, stat and economics courses.
(b) ability to “translate” a problem into workable R code, which gets a numerical
solution and provides insights on the relevant issues of the problem. Virtually
all the models and exercises of your previous, loosely quantitative, courses are
amenable to this “treatment”.
(c) practical knowledge of the R programming environment. Basic programming skills
will be acquired together with ideas on how to “compute” everything that has a
formal structure (and can, say, be described in terms of functions, derivatives,
integrals, estimates, graphs, optimization, utility. . . )
2. Study material:
(a) Download the handout by Emmanuel Paradis, http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf,
and the classic “Introduction to R” by W. N. Venables, D. M. Smith and the R Core Team,
http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
We will not cover the two handouts in full but most of the material can be
studied and understood using parts of the two files.
(b) The “official” support page for the course is on Moodle, the learning platform of
Ca’ Foscari University. Go to moodle.unive.it, register as a Moodle user, search
the course under Studenti lauree e lauree magistrali / Area economica / Corsi di
laurea triennale / Economia e commercio / search for the string “computational”.
The direct link is https://moodle.unive.it/course/view.php?id=2610
(c) This document will expand over time into a full-fledged set of notes.
I have included a few sections where I present things that smart students have
taught me during the years. The material is very useful and can improve your
understanding and final performance.

(d) Some of the lectures will be flipped. This means that you will have to study the
material before the lecture. Yes, you study theory at home and we’ll work on
practical sessions in class. The first flipped class is scheduled for February 5th
2020 and below is a list of the URLs with the video material.
i. Flipped lecture 3 (fundamental principles): http://www.youtube.com/watch?v=l73G_bOFDvg
ii. Flipped lecture 4 (graphics, low vs high level): http://www.youtube.com/watch?v=U19K2zNVOvA
iii. Flipped lecture 6 (root-finding): http://youtu.be/r9N0izLKQB0 (part A)
and http://youtu.be/kIR4GC20n8M (part B)
iv. Flipped lecture 8 (optim and 2-dim optimization): http://youtu.be/b_r7u4IgOhY
(part A) and http://youtu.be/ViUy3BTuBwI (part B).
v. Flipped lecture 9 (constrOptim and linear constraints): http://youtu.be/MCvz-c6UUkw
(e) Additional self-study material can be found at
http://www.stat.berkeley.edu/share/rvideos/R_Videos/R_Videos.html
(f) There is no other way to understand programming than doing it yourself: practice,
practice, practice, practice...

3. Assessment will be focused on your practical ability to solve problems using R.
The exam will be held in the PC Lab at Palazzo Moro: you will have approximately one hour
to answer 17 questions, which will require you to know R functions, enter commands
and code, interpret the results and so on.
Grades will tentatively be assigned as follows: 2 points for each correct reply, -0.5 for a wrong
reply and 0 for “no reply”, i.e., blank. A few questions are very basic and grading is stricter:
+1 if right and -1 if wrong or blank.

Tentative schedule: see Table 1.

1 I week
1.1 Install
Introduction, installation and overview: download R at http://cran.r-project.org.
You can also download RStudio at http://www.rstudio.com/. RStudio has a nice
graphical interface that lets you see the console, graphs, files and the help pages in the
same window. Most students prefer RStudio to the basic R interface, which many find too
spartan. However, take care: I don’t know whether it will be possible to run the exams
with RStudio (but it should not be a big problem at all).

Week      Date   Topic
I week    3/2    Intro and scientific notation
          4/2    R commands
          5/2    FLIPPED: fundamental principles, user-defined functions
II week   10/2   FLIPPED: graphics, high vs low level commands
          11/2   Image, persp, outer, contour
          12/2   FLIPPED: root finding
III week  17/2   Optimize 1-dim functions
          18/2   FLIPPED: optim for 2-dim functions
          19/2   Revision and FLIPPED: linear constraints Ax ≤ b and constrOptim
IV week   24/2   State preference model
          25/2   State preference model and examples
          26/2   State preference model and examples
V week    2/3    Sample spaces and simulation
          3/3    Simulation
          4/3    Simulation examples and mock exam
(if everything goes smoothly, there is no need for a “recovery week”)
Table 1: Tentative schedule.

1.2 Scientific notation


Scientific notation is a convenient way to write numbers that are too big or too small for
the standard numeric format. In Italy, the debt of the public sector is about
2,380,306 million euros, i.e., 2380306000000 (Wikipedia, accessed on 02-02-2020; the figure refers
to 2018). First, it’s hard to grasp the meaning of such a number. You can help
yourself by rewriting it as
2,380,306,000,000
but I’m not sure this is really successful. Second, it would be useful to read the number aloud,
as wording can help understanding. This is where scientific notation comes in. A number
can always be written as
$$a \times 10^{b},$$
where $a$ is a real number called the mantissa and the exponent $b$ is a (positive or negative)
integer. $a$ can be picked so that its absolute value is at least 1 and less than 10, and I like to think
that $b$ tells how many times the decimal point of the mantissa should be shifted to the
right (left) if $b$ is positive (negative).
The Italian debt (see the Italian debt clock at http://www.nationaldebtclocks.org/debtclock/italy) is

$$2{,}380{,}306{,}000{,}000 = 2.380306 \times 10^{12} = \texttt{2.380306e+12}$$

where on the right I have shown the so-called e-notation. This means that once you take 2.380306
you should move the decimal point to the right 12 times to recover the debt in standard
notation. Try it yourself. Scientific notation is helpful because it makes the
number easier to read. Recall that 1 billion is 1,000,000,000 = 1e9 (nine zeros after the 1) and 1 thousand
is 1,000 = 1e3. Hence, to reach the exponent 12, you multiply billions by thousands and
realize that the Italian debt is about 2 thousand 3 hundred billion (precisely, 2.380306
thousand billion or 2.380306 trillion).
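R reads and prints e-notation natively, so it is a handy way to check conversions like the one above. A minimal sketch, where debt and debt_e are our own variable names:

```r
# Italian public debt in euros, as quoted in the text
debt <- 2380306000000
# the same number typed in e-notation: 2.380306 with the point moved 12 places right
debt_e <- 2.380306e12
all.equal(debt, debt_e)                           # TRUE
format(debt, scientific = TRUE)                   # "2.380306e+12"
format(debt, big.mark = ",", scientific = FALSE)  # "2,380,306,000,000"
# the debt in trillions (1 trillion = 1e12)
debt / 1e12                                       # 2.380306
```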
Let’s forget about debt; the next example is less gloomy... I commute by train to reach
unive and it turns out (footnote 1) that in the EU-27 the rail fatality rate is 7.8125e-11 deaths per passenger-km.
This means that for each km travelled by a passenger there are 7.8125e-11 deaths. Can you express such
a number in words? It’s about 8 hundredths of a billionth (in Italian, 8 centesimi di miliardesimo). If
you want to write the number in standard notation, you have

0.000000000078125,

obtained by shifting the decimal point to the left 11 times (as b is -11). Even if I travel roughly
8000 km a year, I can survive the risk, literally!
More details are at http://en.wikipedia.org/wiki/Scientific_notation (also
http://en.wikipedia.org/wiki/Names_of_large_numbers and
http://en.wikipedia.org/wiki/Names_of_small_numbers may be useful).
The exam will always (take note!) have one question on scientific notation. Please
observe that the e you see in scientific notation has nothing to do with the
exponential function: again, it means “10 raised to the following number” and tells
you to shift (move) the decimal point.
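The distinction between e-notation and the exponential function is easy to check in R, and the same tools reproduce the rail example. A minimal sketch; rate is our own variable name, and 8000 km is the yearly distance mentioned above:

```r
# "e" inside a number literal means "times 10 to the power of"
1e3                               # 1000: shift the decimal point 3 places right
exp(3)                            # 20.08554: the exponential function, unrelated
# rail fatality rate per passenger-km, from the text
rate <- 7.8125e-11
format(rate, scientific = FALSE)  # "0.000000000078125"
# expected number of deaths for about 8000 km travelled in a year
rate * 8000                       # 6.25e-07
```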

1.3 Computing with R


1. Sections 1.7, 1.8, 1.9 and chapter 2 of R-intro.

2. Sections 2.2, 2.3 of RfB (R for Beginners, the handout by Paradis).

Precedence rules: R uses standard precedence rules when computing mathematical
expressions. In general:

1. Powers are computed first;

2. Multiplications and divisions next (in the order they appear);

3. Finally, sums and subtractions are performed.

Some operators sit in between: the unary minus and the operator : that creates sequences,
as in 1:4, are executed after powers but before multiplications, divisions, sums and
subtractions. A couple of examples will help.

> 1*2^3-4/5
[1] 7.2
Footnote 1: http://pedestrianobservations.wordpress.com/2011/06/02/comparative-rail-safety/
The power is executed first, so the expression becomes 1*8-4/5; next, the multiplication
and the division are carried out, giving 8-0.8; finally, the subtraction yields the
result 7.2.
Guess what is computed when you evaluate $\exp(-x^2/2)$ at $x = 1$. The correct answer is 0.6065307,
as can be easily checked using R. However, here is an interesting experience: open Excel
and type in a cell the formula =exp(-1^2/2); press enter; you will get 1.6487213. This is
unfortunate, to say the least. Excel is probably the most commonly used software on earth
and yet it fails to “understand” standard mathematical precedence rules. Indeed, Excel
computes
$$\exp((-1)^2/2) = \exp(1/2) = 1.6487213,$$
the problem being that -1 is squared. Instead, you must square the one first
and only then apply the minus. This experience is the main reason why I do not want
to use Excel unless I’m brutally coerced.
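A few one-liners make these rules visible at the R prompt. A minimal sketch; the values in the comments are what R prints, with R's default rounding:

```r
# powers bind tighter than the unary minus: -1^2 is -(1^2)
-1^2             # -1
(-1)^2           # 1
exp(-1^2/2)      # 0.6065307, the correct value of exp(-x^2/2) at x = 1
exp((-1)^2/2)    # 1.648721, what Excel returns for =exp(-1^2/2)
# ":" binds looser than ^ and the unary minus, tighter than * and +
1:2^2            # 1 2 3 4  (2^2 is computed first)
-1:2             # -1 0 1 2 (the minus applies to 1 only)
2*1:3            # 2 4 6    (the sequence is built first)
```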
Variables, regular sequences (seq), vectors, matrices, extraction. See chapter 5 in R-intro
and section 3.4 in RfB.

1.4 Fundamental principles


There is a flipped lecture on fundamental principles: component-wise operations, recycling and the
fact that R gives little or no feedback and few error signals. For user-defined functions,
see also sections 10.1 to 10.5 in R-intro.
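The three principles can be seen in a few lines: a length mismatch, in particular, produces only a warning, not an error. A minimal sketch with an arbitrary vector x of our own:

```r
x <- c(1, 2, 3, 4)
# component-wise: operations act element by element
x^2                    # 1 4 9 16
x * c(10, 20, 30, 40)  # 10 40 90 160
# recycling: the shorter vector is reused until the lengths match
x + c(100, 200)        # 101 202 103 204 (4 is a multiple of 2, no message)
x + 1                  # 2 3 4 5 (the scalar 1 is recycled)
# little feedback: lengths 4 and 3 do not match, yet R only warns
# and still returns a result, recycling c(1, 2, 3) as 1 2 3 1
y <- x + c(1, 2, 3)
y                      # 2 4 6 5
```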
The following code shows how to define and plot a function together with numerical approximations of its first and second derivatives.

> f <- function(x) x**3-3*x
> curve(f,-3,3)
> grid(col=1)
> df <- function(x,h=0.001) (f(x+h)-f(x-h))/2/h   # central difference quotient
> curve(df,add=T,lty=2)
> ddf <- function(x,h=0.001) (f(x+2*h)-2*f(x)+f(x-2*h))/4/h**2   # second difference
> curve(ddf,add=T,lty=4)

[Figure: f(x) on [−3, 3] (solid), with its numerical first derivative (dashed) and second derivative (dot-dashed)]
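How accurate are these difference quotients? For this particular cubic, expanding $(x+h)^3 - (x-h)^3$ shows that the central quotient equals $3x^2 - 3 + h^2$ exactly, so its error is $h^2$. A small sketch checking this at x = 2, where $f'(2) = 9$ (quot and hs are our own names):

```r
f <- function(x) x^3 - 3*x
# central difference quotient, as defined in the text
df <- function(x, h = 0.001) (f(x + h) - f(x - h)) / 2 / h
# for this cubic the quotient is 3x^2 - 3 + h^2, so at x = 2 the error is h^2
hs <- c(0.1, 0.01, 0.001)
quot <- sapply(hs, function(h) df(2, h))
quot - 9    # approximately 1e-02, 1e-04, 1e-06
```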

Example 1 Taken from http://arxiv.org/abs/1110.1319 on the number of Facebook
users. The population is
$$P(t) = \frac{K P_0 e^{rt}}{K + P_0 (e^{rt} - 1)}.$$

Estimates in the article can be used to get the following graph, with the point (2, 0.850)
approximately showing the current time:

> logi <- function(t,K=1.11,r=1.40,P0=0.234)
+ K*P0*exp(r*t)/(K+P0*(exp(r*t)-1))
> curve(logi(x),-5,10,xlab="Time",ylab="FB Users",ylim=c(0,1.5))
> grid(col=1)
> points(2,0.85)

[Figure: logistic curve of FB Users against Time on [−5, 10]; the point (2, 0.85) is marked]
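Two quick checks help read the formula: at t = 0 it returns $P_0$, and for large t it levels off at the saturation value K. A minimal sketch reusing the parameter values above:

```r
# logistic model with the parameters used in the text
logi <- function(t, K = 1.11, r = 1.40, P0 = 0.234)
  K*P0*exp(r*t) / (K + P0*(exp(r*t) - 1))
logi(0)     # 0.234: at t = 0 the formula returns P0
logi(20)    # 1.11: for large t the curve levels off at K
```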

Example 2 If you wish to define the two-dimensional function

$$g(x, y) = x^2 + y^2 + xy + 3y - 1,$$

type:

> g <- function(x,y) x**2+y**2+x*y+3*y-1
> x <- seq(-3,3,length=51)
> y <- seq(-3,3,length=51)
> z <- outer(x,y,g)

The seq commands define the domain in the x and y space, respectively. outer computes
a matrix z of values of g, one for each pair (x[i], y[j]) with x[i] taken from x and y[j]
taken from y, so that z[i,j] = g(x[i],y[j]). Different graphs can be drawn:

> image(x,y,z)

[Figure: image (heat map) of z over x, y in [−3, 3]]

> contour(x,y,z);grid(col=1)
[Figure: contour lines of z = g(x, y) over [−3, 3] × [−3, 3], with a grid]

> image(x,y,z);contour(x,y,z,add=T)
> grid(col=1)

[Figure: heat map of z with contour lines added]

> persp(x,y,z,theta=-30,ticktype="detailed")

[Figure: perspective plot of the surface z = g(x, y), rotated by theta = -30, with detailed tick marks]
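To see what outer actually computes, it helps to compare one entry of z with a direct call to g. Note that outer calls the function once on long vectors, so the function must work component-wise; Vectorize() is one way to adapt a function that does not. A minimal sketch; gmax and zmax are illustrative names of ours:

```r
g <- function(x, y) x^2 + y^2 + x*y + 3*y - 1
x <- seq(-3, 3, length = 51)
y <- seq(-3, 3, length = 51)
z <- outer(x, y, g)
dim(z)                         # 51 51: one row per x value, one column per y value
# entry [i, j] is g evaluated at the pair (x[i], y[j])
z[10, 40] == g(x[10], y[40])   # TRUE
# max(x, y) returns a single number, so outer(x, y, gmax) would fail;
# Vectorize() makes R apply gmax pair by pair
gmax <- function(x, y) max(x, y)
zmax <- outer(x, y, Vectorize(gmax))
zmax[51, 1]                    # 3, i.e. max(x[51], y[1]) = max(3, -3)
```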
2 II week
2.1 High vs low level graphs
There is a flipped lecture on high and low level graphical commands; see chapter 12 in R-intro,
skipping sections 12.1.2, 12.1.3, 12.2.2, 12.3 and everything from 12.5.4 onwards. See in particular
section 4.5 in RfB (do not cover the more advanced material of section 4.6).

2.2 Root-finding
It is important to compute solutions (aka roots) of equations. Formally, we want to solve
$$f(x) = 0,$$
where $f$ is a univariate function and, as is clearly seen, the right-hand side of the equation
is zero. The following example will clarify issues and concepts. We wish to solve
$$e^{x/2} = x^2 - x.$$

> curve(x**2-x,-2,3)
> curve(exp(x/2),-2,3,add=T)
> grid(col=1)
[Figure: the curves y = x^2 − x and y = exp(x/2) on [−2, 3]]
The R command uniroot can be used to find one root at a time, provided that the
equation is of the form $f(x) = 0$ and an interval containing a unique root is given as an argument.
This interval is called a bracketing interval, because it brackets a unique root; in practice,
uniroot requires $f$ to take values of opposite sign at the two endpoints. As the
previous equation does not have the correct form (there is no 0 on the right-hand side), it must
be rewritten as
$$e^{x/2} - x^2 + x = 0.$$
The bracketing intervals can be seen in the graph and, for instance, they are [-1,0] and
[2,3]. Alternatively, a plot of $f(x) = e^{x/2} - x^2 + x$ can be drawn to determine the intervals.
In each of the two intervals there is a unique root and uniroot can be used.

> f <- function(x) exp(x/2)-x^2+x
> x1 <- uniroot(f,c(-1,0))
> x2 <- uniroot(f,c(2,3))
> x1$root
[1] -0.5119985
> x2$root
[1] 2.381285
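The bracketing idea is at the heart of the simplest root-finder, bisection: if f changes sign on [a, b], halve the interval and keep the half where the sign change survives. A minimal sketch for illustration only; bisect is our own function, and uniroot itself uses a faster, more refined algorithm (based on Brent's method), so its iterations differ:

```r
f <- function(x) exp(x/2) - x^2 + x
# bisection: repeatedly halve the bracketing interval [a, b]
bisect <- function(f, a, b, tol = 1e-10) {
  if (f(a) * f(b) > 0) stop("f(a) and f(b) must have opposite signs")
  while (b - a > tol) {
    m <- (a + b) / 2
    # keep the half where f still changes sign
    if (f(a) * f(m) <= 0) b <- m else a <- m
  }
  (a + b) / 2
}
bisect(f, -1, 0)   # close to uniroot's -0.5119985
bisect(f, 2, 3)    # close to uniroot's 2.381285
```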

Exercise 1 Find the roots of $f(x) = x^3 - 3x$ [plot a graph to bracket the roots, then use
uniroot; otherwise polyroot]. See the solutions in the last pages of this handout.

Once you know how to solve equations, it is possible to find extremal points by setting the
derivative to zero, $f'(x) = 0$ (it’s an equation, baby!).
The first problem is to define the derivative: base R has only very limited symbolic capabilities
(the functions D and deriv handle simple expressions) and, in general, cannot differentiate our
functions for us. However, the derivative is a function that can be well
approximated via difference quotients, see http://en.wikipedia.org/wiki/Derivative.
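For simple formulas, base R does ship a small symbolic tool, D() from the stats package, which differentiates expressions (not arbitrary R functions). A minimal sketch comparing it with the central difference quotient; dexpr is our own name:

```r
# symbolic derivative of the expression x^3 - 3*x with respect to x
dexpr <- D(quote(x^3 - 3*x), "x")
dexpr                       # 3 * x^2 - 3
# evaluate the symbolic derivative at x = 2
eval(dexpr, list(x = 2))    # 9
# numerical central difference quotient, as in the text
f <- function(x) x^3 - 3*x
df <- function(x, h = 0.001) (f(x + h) - f(x - h)) / 2 / h
df(2)                       # 9.000001
```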
Consider the function $f(x) = x^3 - 3x$.

> f <- function(x) x^3-3*x
> df <- function(x,h=0.001) (f(x+h)-f(x-h))/2/h
> curve(df(x),-3,3,main="Derivative of f(x)" )
> grid(col=1)
> e1 <- uniroot(df,c(-2,0))$root # first extremal point
> e1
[1] -0.9999996
> e2 <- uniroot(df,c(0,2))$root # second extremal point
> e2
[1] 0.9999996

[Figure: Derivative of f(x), i.e., df(x) on [−3, 3]]

At e1 and e2 the derivative vanishes. Check the second derivative to see whether they
are maxima or minima.

> ddf <- function(x,h=0.001) (df(x+h)-df(x-h))/2/h
> curve(ddf(x),-3,3,main="Second derivative of f(x)")
> grid(col=1)
> flex <- uniroot(ddf,c(-1,1))$root # change in convexity/concavity
> ddf(e1) # this is negative: e1 is a max
[1] -5.999997
> ddf(e2) # this is positive: e2 is a min
[1] 5.999997

[Figure: Second derivative of f(x), i.e., ddf(x) on [−3, 3]]

Exercise 2 Verify analytically that our results are correct, writing and using the hand-computed
first and second derivatives of $f(x) = x^3 - 3x$.
Exercise 3 Use polyroot to find the roots of $f(x) = x^3 - 3x$. Can you use the same
method for the function df?

3 III week
3.1 Maximization/minimization of functions
Generalities: objective function, decision variables, constraints. More often than not, poor
decision making results from unclear objectives, partial knowledge of the variables
(the objects that you have responsibility over) and lack of understanding of the constraints
(limitations, technical or conceptual) that one faces.
There is a fundamental difference between 1-dim and multidimensional optimization.
The problem
$$\max_{x \in \mathbb{R}} f(x)$$
is solved with optimize. In contrast, the problem
$$\max_{x \in \mathbb{R}^n} f(x) = \max_{x_1, x_2, \dots, x_n} f(x_1, x_2, \dots, x_n)$$
must be solved with optim.

Example 3 (1-dim.) Consider again $f(x) = x^3 - 3x$, to be minimized. optimize needs a
function as its first argument and an interval as its second argument.

> f <- function(x) x**3-3*x
> curve(f(x),-3,3);grid(col=1)
> r <- optimize(f,c(0,3))
> r
$minimum
[1] 0.9999889

$objective
[1] -2
[Figure: f(x) on [−3, 3], with a grid]

To maximize, use the option maximum=T or see help(optimize).
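For instance, a sketch of the maximization case on the interval [-3, 0], where f increases up to x = -1 and decreases afterwards, so optimize has a single peak to find:

```r
f <- function(x) x^3 - 3*x
# maximum = TRUE switches optimize from minimization to maximization
r <- optimize(f, c(-3, 0), maximum = TRUE)
r$maximum      # close to -1
r$objective    # close to 2, since f(-1) = -1 + 3 = 2
```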

Example 4 (2-dim.) Consider the function $g(x, y) = x^2 + y^2 + xy + 3y - 1$. This is the
standard way to define a function g of two variables x and y, and it’s the preferred way to
draw pictures (surfaces with persp, images or contours).

> g <- function(x,y) x**2+y**2+x*y+3*y-1
> x <- seq(-3,3,length=51)
> y <- seq(-3,3,length=51)
> zg <- outer(x,y,g)
> contour(x,y,zg)
> grid(col=1)
[Figure: contour lines of g over [−3, 3] × [−3, 3], with a grid]

However, it’s important to realize that g can also be defined as

$$gb(x) = x_1^2 + x_2^2 + x_1 x_2 + 3x_2 - 1,$$

where $x = (x_1, x_2)$. In other words, while g is formally a function of two variables, gb is a
function of one thing alone, x, whose two components are used in the computation. This
second form is the one optim requires in multidimensional optimization.

> gb <- function(x) x[1]**2+x[2]**2+x[1]*x[2]+3*x[2]-1
> res <- optim(c(1,1),gb)
> res
$par
[1] 1.000032 -1.999912

$value
[1] -4

$counts
function gradient
91 NA

$convergence
[1] 0

$message
NULL
Observe that gb is defined, as stated above, in terms of a single vector $x \in \mathbb{R}^2$. Also notice
that optim needs an initial guess as its first argument and a function of a vector as its second argument.
> gb <- function(x) g(x[1],x[2])
> res <- optim(c(1,1),gb)
> res
$par
[1] 1.000032 -1.999912

$value
[1] -4

$counts
function gradient
91 NA

$convergence
[1] 0

$message
NULL
This alternative definition of gb exploits the fact that we have often already defined g, and it
saves a good deal of typing.
Please observe that our starting point c(1,1) in optim is a point in $\mathbb{R}^2$: it is not
an interval, as many risk-loving students keep repeating...
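By default optim uses the Nelder-Mead method, which needs only function values. When the gradient is easy to compute by hand, it can be supplied through the gr argument together with a gradient-based method such as "BFGS". A minimal sketch; gr_gb is our own hand-computed gradient of gb:

```r
gb <- function(x) x[1]^2 + x[2]^2 + x[1]*x[2] + 3*x[2] - 1
# hand-computed gradient of gb: partial derivatives w.r.t. x[1] and x[2]
gr_gb <- function(x) c(2*x[1] + x[2], 2*x[2] + x[1] + 3)
res <- optim(c(1, 1), gb, gr = gr_gb, method = "BFGS")
res$par       # close to c(1, -2)
res$value     # close to -4
res$counts    # number of function and gradient evaluations used
```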

3.2 Systems of equations


You may wish to take a more challenging path to find extrema of multidimensional functions,
solving for the critical points that zero the partial derivatives:
$$\begin{cases} g'_x(x, y) = 0 \\ g'_y(x, y) = 0 \end{cases}$$
Solving equations in several variables is harder (to be formally correct, “solving nonlinear
systems of equations in several variables”) and you must use some ingenuity.
> gx <- function(x,y,h=0.001) (g(x+h,y)-g(x-h,y))/(2*h)
> gy <- function(x,y,h=0.001) (g(x,y+h)-g(x,y-h))/(2*h)
> zgx <- outer(x,y,gx)
> zgy <- outer(x,y,gy)
> contour(x,y,zgx,levels=0)
> contour(x,y,zgy,levels=0,lty=4,add=T)
> grid(col=1)
[Figure: zero-level curves of the partial derivatives g'_x (solid) and g'_y (dot-dashed) over [−3, 3] × [−3, 3]]

The graph shows the intersection of the 0-level curves of the x- and y-partial derivatives.
Along each curve, one partial derivative vanishes and at the intersection they are
both zero. The point is approximately (1,-2).
We can also solve the same system by turning it into an optimization problem. Consider
the minimization
$$\min_{x, y}\; g'_x(x, y)^2 + g'_y(x, y)^2.$$

The objective cannot be smaller than 0 and, when both partial derivatives vanish, the
sum of squared zeros is zero and therefore the minimum is attained. Cheap advice: this is
really a kind of magic! Solving systems is difficult and it turns out that it’s much easier to
solve this minimization problem. Sometimes in life you have a difficult problem... try to
reframe it in totally different terms. You may more easily find a solution.
> gbx <- function(x,...) gx(x[1],x[2])
> gby <- function(x,...) gy(x[1],x[2])
> sumsqu <- function(x) gbx(x)**2+gby(x)**2
> res <- optim(c(1,1),sumsqu)
> res
$par
[1] 1.000233 -2.000196

$value
[1] 9.782539e-08

$counts
function gradient
89 NA

$convergence
[1] 0

$message
NULL

You should be cautious when using such a trick. Indeed, you can be confident in a
solution of the system only if the objective sumsqu at the solution is (numerically) zero, as in the previous
case.
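A one-dimensional example shows why this check matters: the equation $x^2 + 1 = 0$ has no real solution, yet minimizing the squared left-hand side still returns a minimizer, only with a positive objective value. A minimal sketch (h is our own name):

```r
# the equation h(x) = 0 has no real solution
h <- function(x) x^2 + 1
# minimizing h(x)^2 still returns a "minimizer"...
r <- optimize(function(x) h(x)^2, c(-2, 2))
r$minimum      # close to 0
r$objective    # close to 1, not 0: x = 0 is NOT a root of h
```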

Exercise 4 (easy) Can you analytically prove that (1,-2) is a minimizer? Solve the resulting
linear system by hand and using R.

Exercise 5 (funny) Find local extrema for the cubic function $g(x, y) = x^3 + y^2 + xy + 3y - 1$.

Exercise 6 (challenging) Does the function $g(x, y) = x^3 + 2y^2 x + y - 1$ have local extrema?
Why does the sumsqu problem provide a solution? Does the function have a global max or
min?

Hint: you can solve, graphically or manually, the first-order conditions, zeroing the derivatives.

3.3 Integrals
Use integrate.
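A minimal sketch of integrate on two integrals with known values, showing how $value extracts the number (the integrands are our own examples):

```r
# integral of x^2 over [0, 1]; the exact value is 1/3
res <- integrate(function(x) x^2, 0, 1)
res           # prints the value together with an error estimate
res$value     # the number alone, usable in further computations
# infinite limits are allowed: the integral of exp(-x) over [0, Inf) is 1
integrate(function(x) exp(-x), 0, Inf)$value
```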

Exercise 7 Define and sketch the graph of
$$f(x) = \frac{x^2 - 1}{x^2 + 1}.$$
Then solve the equation in x
$$\int_1^x f(t)\, dt = 2.$$

Hint: define an auxiliary function with integrate and $value, then use uniroot.

4 IV week
4.1 Constrained optimization
There are two approaches: penalty methods and constrOptim.
Assume you want to maximize a function over some set D, as in
$$\max_{x \in D} g(x).$$

As you do not want to obtain solutions outside D, you penalize points that do not belong
to D and solve the free problem
$$\max_x\; g(x) - \lambda \cdot (\text{“penalty if } x \notin D\text{”}).$$

The idea is that infeasible points get some positive penalty and, hence, worse values.
Implicitly, we assume there is zero penalty in the domain D. This is very convenient, as it
allows us to turn a constrained problem, possibly with a complex D, into an unconstrained (free)
optimization problem that can be tackled by optim.
More formally, define D in terms of q constraints by a function $h : \mathbb{R}^n \to \mathbb{R}^q$:
$$D = \{x : h(x) \ge 0\}$$
and let
$$I(x) = \begin{cases} 0 & h(x) \ge 0; \\ 1 & \text{otherwise.} \end{cases}$$
In other words, I(x) is an indicator function that takes the value 0 if $x \in D$ but flags with
a 1 the points $x \notin D$ (that will be penalized). The problem we finally solve is
$$\max_x\; g(x) - \lambda I(x),$$

where $\lambda \gg 0$ is a parameter that suitably magnifies the penalty (signalled by I(x) = 1). We
now solve the problem
$$\max_{x, y}\; x^2 + y^2 + xy + 3y - 1,$$

under the three constraints $x \ge 0$, $y \ge 0$, $x + y \le 4$.
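Since the feasible set here is a small triangle, a brute-force grid search gives an independent benchmark for whatever optim returns. A minimal sketch using the outer approach of Example 2; the grid step 0.01 and the rounding tolerance 1e-9 are arbitrary choices of ours:

```r
# objective of the constrained problem
g <- function(x, y) x^2 + y^2 + x*y + 3*y - 1
# grid over [0, 4] x [0, 4], which contains the feasible triangle
x <- seq(0, 4, by = 0.01)
y <- seq(0, 4, by = 0.01)
z <- outer(x, y, g)
# discard points violating x + y <= 4 (small tolerance for rounding)
feasible <- outer(x, y, function(x, y) x + y <= 4 + 1e-9)
z[!feasible] <- -Inf
# locate the best grid point
best <- which(z == max(z), arr.ind = TRUE)
c(x = x[best[1, 1]], y = y[best[1, 2]], value = max(z))
```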

> g
function(x,y) x**2+y**2+x*y+3*y-1
<bytecode: 0x1034e0c00>
> gb
function(x) g(x[1],x[2])
<bytecode: 0x112220d78>
> Ib <- function(x) x[1]<0 | x[2]<0 | (x[1]+x[2]>4)
> gbpen <- function(x,lambda=100) gb(x)-lambda*Ib(x)
> res <- optim(c(1,1),gbpen,control=list(fnscale=-1))
> res
$par
[1] 3.54465e-09 4.00000e+00

$value
[1] 27

$counts
function gradient
233 NA

$convergence
[1] 0

$message
NULL

The solution is the border point (0,4). Remember to start the optimization from a
feasible point. It is useful to visualize how penalties work and, in order to do so, we build
a graph in the standard way, defining x and y.

> I <- function(x,y) x<0 | y<0 |(x+y>4)
> gpen <- function(x,y,lambda=1000) g(x,y)-lambda*I(x,y)
> x <- seq(-5,5,len=51)
> y <- seq(-5,5,len=51)
> zgpen <- outer(x,y,gpen,lambda=100)
> persp(x,y,zgpen,theta=-30,ticktype="detailed")

[Figure: perspective plot of the penalized objective zgpen over x, y in [−5, 5]]

It is interesting to see a heat-map with contours:

> image(x,y,zgpen)
> contour(x,y,zgpen,add=T)
> grid(col=1)

[Figure: heat map of zgpen over x, y in [−5, 5], with contour lines]

When the constraints are linear, another method to solve the problem uses constrOptim:

> a <- matrix(c(1,0,-1,0,1,-1),3,2)
> b <- c(0,0,-4)
> res <- constrOptim(c(1,1),gb,NULL,a,b,control=list(fnscale=-1))
> res
$par
[1] 3.213127e-08 4.000000e+00

$value
[1] 27

$counts
function gradient
294 NA

$convergence
[1] 0

$message
NULL

$outer.iterations
[1] 3

$barrier.value
[1] 0.0005545176

Exercise 8 The previous computations solve the same problem as before.
Read help(constrOptim) and explain in detail the meaning of a, b and all the other
parameters used.

Exercise 9 Solve the problem

$$\max_{x, y}\; x^2 + y^2 + xy + 3y - 1,$$

under the three constraints $x \ge 0$, $y \ge 0$, $x + y \le 4$, drawing the contour lines of the
objective function and the domain D, i.e., three lines. Then argue, looking at the graph, that
the curve with the highest level touches the set D...

Exercise 10 Maximize/minimize the usual $x^2 + y^2 + xy + 3y - 1$ inside the circle $x^2 + y^2 \le 4$.
Is the maximum higher than 10? [it can be done in many ways: directly, with a penalty
approach; graphically, even though it may be difficult to say whether it’s larger or smaller
than 10; switch to polar coordinates so that the domain is rectangular and use L-BFGS-B,
which allows bounds; study the restriction on the circle as a function of the angle θ]

A consumer gets utility
$$u(x, y) = \sqrt{2x(y - 2)}$$
from the consumption of x units of beef and y units of bread. Assume that one unit of
beef and one unit of bread both cost p = 1 and that the consumer’s income is 4.

Exercise 11 Sketch the domain of u(x, y).

Exercise 12 Draw a heat-map of u, as well as its contours and a surface.

Exercise 13 On the contour plot, draw the domain of the utility maximization problem
under budget constraints.

Exercise 14 Solve the problem. This can be done both graphically and by optimizing with
penalties. Can the problem be solved with constrOptim?

Exercise 15 Argue that utility is increasing in any north-east direction. Hence, the maximum
will be on the constraint x + y = 4. Define a function of one variable alone, embedding
the constraint into the function (it’s also called a restriction). Plot this function of one
variable in the right domain. Solve the 1-dim optimization problem. Does this procedure
give you the same results as the previous ones?

4.2 State preference model
What you know about systems of linear equations can be used to understand a beautiful
model of a financial market, in which assets are traded in two periods of time. The model,
known as “State preference model” or “Asset pricing model”, is a powerful tool to understand
important ideas related to portfolios, replication of assets, risk-hedging and arbitrage.
If you are interested in the topic you should read
http://economia.unipr.it/DOCENTI/DEDONNO/docs/files/CastBrevissima.pdf. The text is sparkling,
insightful and written in Italian (a good reason to learn it!)
The model is as follows:
1. There are two periods, today and tomorrow, corresponding to t = 0 and t = 1. You
can buy or (short-)sell assets in t = 0 and will get the payoff in t = 1.
2. There is uncertainty over the future and m possible states can materialize tomorrow.
We often think that nature will pick the future state, as opposed to agents, who do
not have the knowledge to predict what lies ahead. Hence, depending on the state of
nature at t = 1, the assets will produce different payoffs.
3. There are n available assets. Each of them is a vector of future payoffs y (at t = 1),
which can be purchased/sold at t = 0 for a price π. As there are n assets, it is
convenient to describe this market using an m × n payoff matrix collecting all the column
vectors $y_1, y_2, \dots, y_n$:
$$Y = \begin{pmatrix} \vdots & \vdots & & \vdots \\ y_1 & y_2 & \cdots & y_n \\ \vdots & \vdots & & \vdots \end{pmatrix},$$
with the agreement that $\pi_1$ is the cost of $y_1$, $\pi_2$ is the cost of $y_2$ and so on.
The description of the market is over and an example will help. Assume that there are
m = 3 possible states of nature tomorrow and n = 3 assets:

              Bank account   Stock   Risky stock
      Good         104         108       112
Y =   Fair         104         102       106
      Gloomy       104         100        93
Assume that all assets cost 100, i.e., $\pi_1 = \pi_2 = \pi_3 = 100$. Rows refer to future
states and you can think, say, that the first row collects the payoffs that you will get
if tomorrow the economy is good. In this case, the payoffs are high and you will cash
104 if you are the holder of the first asset, 108 if you are the holder of the second asset
and so on. Clearly, if the gloomy state prevails tomorrow, the owner of the
third asset will get only 93: taking into account that it was paid 100, it’s a bad loss!
Columns should be thought of as assets. The first column $y_1$ always pays 104 and, hence,
it’s like a (safe) bank account that pays the same amount regardless of the state. The other
two columns have fluctuating payoffs and behave like stocks that depend on the future state
of the economy. The third column is dubbed “risky stock” because its payoffs are much more
variable than those of the second column.

Portfolio: a vector $x = (x_1, x_2, \dots, x_n)' \in \mathbb{R}^n$ of the quantities held of each asset. In the
financial jargon, we sometimes refer to x as “the weights of the portfolio”. The payoff
of a portfolio x is $Yx$, as this product is exactly a weighted sum of the columns of
Y, i.e., a weighted sum of the assets.
The cost of a portfolio is $\pi_1 x_1 + \pi_2 x_2 + \dots + \pi_n x_n$, which can also be computed as the
row-times-column product $\pi' x$.

Replication: we say that a vector b can be replicated if there is a portfolio that has the
same payoffs. In other words, b can be replicated if there is x such that $Yx = b$.

Arbitrage: this is a situation in which there is a portfolio x with null payoff and non-null
price (footnote 2).
An arbitrage is a free lunch: you sell for a positive price something that always pays
zero and, hence, should cost zero. It’s a no-loss gamble that can be sold for a profit
but will produce no payoff for the buyer.
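These definitions translate directly into matrix products in R. As an illustration, a hypothetical helper is_arbitrage of ours checks the arbitrage condition for a given portfolio, using the modified market discussed later in this section (third asset paying (112, 106, 104) and priced 104); the tolerance 1e-6 is an arbitrary choice to absorb rounding errors:

```r
# hypothetical helper: TRUE if portfolio x has (numerically) null payoff
# in every state and a non-null price
is_arbitrage <- function(Y, prices, x, tol = 1e-6) {
  payoff <- as.vector(Y %*% x)   # payoff in each state: Y x
  price  <- sum(prices * x)      # cost of the portfolio: pi' x
  all(abs(payoff) < tol) && abs(price) > tol
}
# modified market from later in this section (columns are assets)
Y <- matrix(c(104, 104, 104, 108, 102, 100, 112, 106, 104), 3, 3)
prices <- c(100, 100, 104)
is_arbitrage(Y, prices, c(-0.03846154, -1, 1))   # TRUE
is_arbitrage(Y, prices, c(1/3, 1/3, 1/3))        # FALSE
```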

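Consider the financial market with Y and π described above: the portfolio $x_0 = (1/3, 1/3, 1/3)'$
has a future payoff of
$$Y x_0 = \begin{pmatrix} 104 & 108 & 112 \\ 104 & 102 & 106 \\ 104 & 100 & 93 \end{pmatrix} \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix} = \begin{pmatrix} 108 \\ 104 \\ 99 \end{pmatrix}.$$

The cost of the portfolio is
$$\pi_1 x_1 + \pi_2 x_2 + \pi_3 x_3 = \tfrac{1}{3}\,100 + \tfrac{1}{3}\,100 + \tfrac{1}{3}\,100 = 100.$$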
So, if you pay 100 for the portfolio that mixes the original three assets in equal proportions,
you get $(108, 104, 99)'$. Let’s notice in passing the power of diversification:
mixing the assets has reduced the maximum losses (and the gains) in every state of the
world.
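The same product and cost can be computed in R. A minimal sketch; the names prices and x0 are ours, and the matrix is filled column by column, one asset per column, with the payoff 93 in the gloomy state as in the table above:

```r
Y <- matrix(c(104, 104, 104,    # bank account
              108, 102, 100,    # stock
              112, 106,  93),   # risky stock
            3, 3)
prices <- c(100, 100, 100)
x0 <- c(1/3, 1/3, 1/3)
Y %*% x0           # payoffs 108, 104, 99 in the good, fair, gloomy states
sum(prices * x0)   # cost 100
```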
Can the asset $b = (108, 104, 99)'$ be replicated? Yes, we have just seen that there is
a portfolio, namely $x = (1/3, 1/3, 1/3)'$, such that $Yx = b$, and the cost of the replicating
portfolio is 100.
Can the asset $b = (112, 106, 104)'$ be replicated? We look for the solution x of the
system $Yx = b$:

Footnote 2: Arbitrage is a serious topic and, here, I’m just scratching the surface. We only cover one special type
of arbitrage and much more can (and should) be said on this matter, see
http://economia.unipr.it/DOCENTI/FAVERO/docs/files/CastBreve.pdf.
> Y <- matrix(c(104,104,104,108,102,100,112,106,93),3,3)
> b <- c(112,106,104)
> x <- solve(Y,b)
> x
[1] 0.03846154 1.00000000 0.00000000

The answer is positive: b can be replicated with $x_1 \approx 0.038$ units of the first asset (putting a
small amount in the bank account) and $x_2 = 1$ unit of the second asset. The cost of replication
(i.e., the cost of the replicating portfolio) is

> sum(c(100,100,100)*x)
[1] 103.8462

This is an interesting case, as b keeps the better payoffs coming from the risky stock
but avoids the heavy loss that would be incurred in the gloomy state if the risky stock were bought. Don’t you
think this is nice? Mmhh... where is the trick? Well, on the one hand there is no trick:
this can be done just by solving linear systems with R. On the other hand, however, there
are two other equally interesting ways to interpret what happened here.

1. I pay 103.8462 to get b instead of 100 to buy the risky stock. Therefore, the additional
amount 3.8462 is the cost of insuring my position in the bad state of the world: I have
paid 3.8462 to reduce my risk in one specific future occurrence.

2. The portfolio x blends a stock with a bond. Investors often reduce the variability of
their cash flows using the (certain) proceeds of a bond to compensate for poor results of other
investments in some states. Here, if the future state is “gloomy”, then the stock
makes a null profit (it was paid 100, it gives 100) but the 0.03846154 units in the
bank account still pay 4%, so that 0.03846154 · 104 = 4 is gained and “compensates”
for the outcome of the stock.

Food for thought: can you replicate any b? Hint: can you solve the system $Yx = b$
for any b, given this specific Y? Can you imagine why such a market is called complete?
Next, we’ll discuss an arbitrage arising in a slightly modified market. Let Y be defined
as
$$Y = \begin{pmatrix} 104 & 108 & 112 \\ 104 & 102 & 106 \\ 104 & 100 & 104 \end{pmatrix},$$
and let the prices of the assets be $\pi = (100, 100, 104)'$. Observe that the third column of
Y is the asset $b = (112, 106, 104)'$ that we have discussed a few lines ago, and recall that
b can be replicated using the first and second columns (assets) at a cost of about 103.85.
Now, something interesting goes on: the third asset can be sold on the market at a price
of 104, but the same asset can be replicated (mixing the first two columns) at
a cost of 103.85. Hence, a clever trader may start a money pump, creating replicates of b
for 103.85 each and selling them immediately for 104. For each round, he will make the
difference 104 - 103.85 = 0.15, regardless of the future state.
We now check the definition stated before: there should be a portfolio x with null
payoff vector and strictly positive price. Consider $x = (-0.03846154, -1, 1)'$. Selling this
portfolio, as the trader does, means that 0.038 units of the first asset and 1 unit of the second are bought,
whereas 1 unit of the third asset (b, really) is sold. The payoff $Yx$ and the cost of portfolio
x are given by:
> Y <- matrix(c(104,104,104,108,102,100,112,106,104),3,3)
> x <- c(-0.03846154,-1,1)
> pai <- c(100,100,104)
> Y %*% x
[,1]
[1,] -1.6e-07
[2,] -1.6e-07
[3,] -1.6e-07
> sum(pai*x)
[1] 0.153846

The trader cashes 0.15 euros by replicating and selling the asset, and his profits can be inflated
by repeating the same operation many times (i.e., trading large volumes). Once
an arbitrage is spotted, traders are likely to wipe it out soon: large volumes affect the prices
and, in particular, the price of the third asset will tend to go back to 103.846, closing the
opportunity for gains at no risk.
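The replication cost of about 103.85 can be recomputed directly from Y. A minimal sketch, assuming we use base R’s qr.solve, which for a non-square matrix of full column rank returns the least-squares solution; since b is exactly replicable with the first two assets, that solution is exact here (the variable names are illustrative):

```r
# Payoff matrix of the modified market (columns = assets) and prices
Y <- matrix(c(104, 104, 104, 108, 102, 100, 112, 106, 104), 3, 3)
prices <- c(100, 100, 104)

# Replicate the third asset b with the first two:
# solve Y[, 1:2] %*% x12 = b (3 equations, 2 unknowns)
b <- Y[, 3]
x12 <- qr.solve(Y[, 1:2], b)
x12                                      # about 0.03846154 and 1

# Cost of the replica and profit per round of the money pump
cost_replica <- sum(prices[1:2] * x12)
profit <- prices[3] - cost_replica
c(cost = cost_replica, profit = profit)  # about 103.846 and 0.154

# Check: the replica pays exactly b in every state
all.equal(drop(Y[, 1:2] %*% x12), b)
```

Buying x12 and selling the third asset is the money pump described in the text.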
This final example illustrates how to deal with singular systems, which are “problematic”
in R... and in life! Let
$$Y = \begin{pmatrix} 105 & 120 & 110 \\ 105 & 110 & 102 \\ 105 & 90 & 98 \\ 105 & 80 & 90 \end{pmatrix}$$
and assume all the prices are 100, π = (100, 100, 100)'. Observe that we have m = 4 states
of the world, or rows, and n = 3 assets, or columns. Can the vector b = (119, 119, 95, 95)'
be replicated? Well, we just have to check whether there is a vector x such that Yx = b.
This is what you get when you try to solve the problem with solve(Y,c(119,119,95,95))
(the exact wording of the message depends on the version of R):
Error in solve.default(Y, c(119, 119, 95, 95)) :
'a' (4 x 3) must be square
The problem lies in the fact that solve can only be used for square systems with an
invertible matrix of coefficients. Here, we have a 4 × 3 matrix, which is not square and
therefore has no inverse.
However, there is a way out. First, you may check whether the system
has a solution using the Rouché-Capelli theorem, i.e., comparing the ranks of Y and of the augmented matrix (Y|b):
> Y <- matrix(c(105, 120, 110,
+ 105, 110, 102,
+ 105, 90, 98,
+ 105, 80, 90),4,3,byrow=T)
> b <- c(119,119,95,95)
> qr(Y)$rank
[1] 3
> qr(cbind(Y,b))$rank
[1] 3

In this case, the ranks of the complete and incomplete matrices are equal and hence
the system has $\infty^{n-r} = \infty^{3-3} = 1$ solution, where n = 3 is the number of unknowns and r = 3 the common rank. This solution can be worked out using the
generalized inverse of Y, computed using the function ginv of the package MASS.³

> require(MASS)
> giY <- ginv(Y) # compute the generalized inverse
> sol <- giY %*% b # multiply the gen inverse with b
> sol
[,1]
[1,] 1.4
[2,] 1.6
[3,] -2.0

We are ready to answer: yes, indeed (119, 119, 95, 95)' can be replicated by the portfolio
x = (1.4, 1.6, −2.0)': buying 1.4 units of the first asset and 1.6 units of the second asset, and
(short-)selling 2 units of the third asset.
So far, so good... but be aware! You can always compute the generalized inverse and
you can always multiply it by b, getting a vector. This vector, however, is a solution
of the original system only if the system has solutions, as in the previous case; otherwise
it is not a solution.
So, for safety, it is absolutely necessary to check whether what you have found is indeed
a solution:

> Y %*% sol
[,1]
[1,] 119
[2,] 119
[3,] 95
[4,] 95
> b # the two are equal: sol is indeed a solution
[1] 119 119 95 95

³ When you solve a system, say Ax = b, and A is invertible, the solution can be obtained as x = A⁻¹b.
When the inverse A⁻¹ does not exist, as in this case, you can instead use the generalized inverse. In a way,
if you know that there are solutions, you can always find one by computing A⁺b, where A⁺ is the inverse (if
it exists) or the generalized inverse (if the inverse does not exist).

To show that things can be different, consider the vector b = (119, 118, 117, 115)': can
it be replicated?
> b <- c(119,118,117,115)
> sol <- giY %*% b
> sol
[,1]
[1,] 0.94206349
[2,] 0.01666667
[3,] 0.16666667
> Y %*% sol
[,1]
[1,] 119.25
[2,] 117.75
[3,] 116.75
[4,] 115.25

We have used the giY matrix, computed before, to easily obtain sol.
You may think that sol replicates b, but a look at our check shows that this is false,
false, false, false...
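The candidate-then-check routine can be wrapped into a small function. A sketch: replicate_payoff is a hypothetical helper written for these notes (not part of MASS), and the tolerance 1e-8 passed to all.equal is an arbitrary choice. When Y has full column rank, as here, ginv(Y) %*% b is the least-squares solution, i.e. the portfolio whose payoff is as close as possible to b; this is why a vector is always returned, replicable or not.

```r
library(MASS)   # provides ginv(), the generalized inverse

# Hypothetical helper: candidate solution of Y x = b via the generalized
# inverse, plus the check that it really solves the system
replicate_payoff <- function(Y, b, tol = 1e-8) {
  x <- drop(ginv(Y) %*% b)                       # candidate portfolio
  ok <- isTRUE(all.equal(drop(Y %*% x), b, tolerance = tol))
  list(x = x, replicable = ok)
}

Y <- matrix(c(105, 120, 110,
              105, 110, 102,
              105,  90,  98,
              105,  80,  90), 4, 3, byrow = TRUE)
prices <- c(100, 100, 100)

r1 <- replicate_payoff(Y, c(119, 119, 95, 95))
r1$replicable        # TRUE: x is about (1.4, 1.6, -2)
sum(prices * r1$x)   # cost of the replicating portfolio, about 100

r2 <- replicate_payoff(Y, c(119, 118, 117, 115))
r2$replicable        # FALSE: only the closest payoff is found
```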
To sum up:
1. take extra care with singular and rectangular systems.

2. find a candidate solution using ginv of MASS.

3. always check your result.

4. attend a course in linear algebra to understand what’s really going on here!

Exercise 16 With the previous 4×3 matrix Y and prices, can the vector b = (119, 118, 117, 116)'
be replicated? If so, how much does it cost if no arbitrage is assumed?

4.3 On ranks, replication and rounding errors


I’m indebted to Elena “Cooper Supertramp” who first pointed out this issue.

Consider the following two assets, (104, 104, 104)' and (101, 116, 103)', whose prices are
93.60 and 95.85, respectively. In such a market, how much would an asset whose
payoff is (102.71, 109.14, 103.57)' cost?
To see whether the third asset (column) is replicable, we check the ranks:
> Y <- matrix(c(104,104,104,101,116,103),3,2)
> Yb <- cbind(Y,c(102.71,109.14,103.57))
> qr(Y)$rank
[1] 2
> qr(Yb)$rank
[1] 3

Ranks are different and b is not replicable.


Hey, wait a minute! I (paolop) know how b was built because I wrote the program
which created the exercise. The vector b was 4/7 times the first asset + 3/7 times the
second. In other words, this is b:
> 4/7*Y[,1]+3/7*Y[,2]
[1] 102.7143 109.1429 103.5714

As you see, the only difference between this vector and the b used before is that the latter was rounded
to the second decimal digit. So far, so good. But let me now call bright the right b and
show that bright is indeed replicable:
> bright <- 4/7*Y[,1]+3/7*Y[,2]
> Ybright <- cbind(Y,bright)
> qr(Ybright)$rank # this is 2, as r(Y)
[1] 2

Hence, bright can be replicated but b cannot, and the only reason is rounding!
But there is something more going on. You do not want rounding to stress your
life: rounding should be a good thing, allowing you to forget lots of irrelevant digits,
and you do not want it to interfere with replication and arbitrage, which are very relevant
practical issues. There are two ways to understand this.
1. Ranks are delicate objects and are often computed looking at determinants. If the
rank of a matrix is 2, then there is a 2 by 2 matrix extracted from it whose determinant is
non-zero, and it is not possible to find a 3 by 3 one with non-null determinant; if
the rank of Yb is 3, then there is a 3 by 3 matrix extracted from Yb (here, Yb itself) whose determinant
is non-zero. However, the determinant of Yb (the rounded version) is

> det(Yb)
[1] 4.16

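which is relatively close to 0 if you compare 4.16 with the size of the elements of the
matrix, which exceed 100: the determinant of a 3 by 3 matrix is a sum of products of three
entries, so with entries around 100 its natural scale is around 100³ = 10⁶. R decided that this
was not zero (and, consequently, that the rank is 3), but I showed you that this is only due to the
effects of rounding to the second decimal digit.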
2. The second explanation looks more carefully at the way the rank is computed:

> qr(Yb)
$qr
[,1] [,2] [,3]
[1,] -180.1332840 -184.7520861 -1.821078e+02
[2,] 0.5773503 -11.5181017 -4.936577e+00
[3,] 0.5773503 -0.1382626 2.005019e-03

$rank
[1] 3

$qraux
[1] 1.577350269 1.990395604 0.002005019

$pivot
[1] 1 2 3

attr(,"class")
[1] "qr"

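Notice that $qraux shows 3 numbers, two of which are clearly different from zero.
The third, well, is different from zero but small. Loosely speaking, qr counts a column
as independent when the part of it that is not explained by the previous columns is not
negligible; for the last column, the size of this remaining part is here the third element of
$qraux (it coincides with the last diagonal entry of $qr), and qr compares it, relative to the
length of the column, with the threshold tol, which is 1e-07 by default. So the rank is 3,
and that’s ok; but you should be aware that the 0.002005019 is telling you that the matrix
Yb is not “far” from being of rank 2.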
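Since the verdict depends on a tolerance, rerunning qr with a looser threshold changes the rank. A sketch; the value 1e-4 is an arbitrary choice for illustration, and the ratio computed at the end is, roughly, the quantity that qr compares with tol for the third column:

```r
Y <- matrix(c(104, 104, 104, 101, 116, 103), 3, 2)
Yb <- cbind(Y, c(102.71, 109.14, 103.57))

qr(Yb)$rank              # 3 with the default tol = 1e-07
qr(Yb, tol = 1e-4)$rank  # 2: with a looser tol the third column is negligible

# Part of the third column not explained by the first two, relative to
# the length of the whole third column
qrb <- qr(Yb)
abs(qrb$qr[3, 3]) / sqrt(sum(Yb[, 3]^2))   # about 1.1e-05
```

No tolerance is "right" in general: it encodes how small a number must be before you are willing to call it zero.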
Therefore, b is almost replicable, as can be seen using the generalized inverse:

> require(MASS)
> sol <- ginv(Y) %*% c(102.71,109.14,103.57)
> sol
[,1]
[1,] 0.571379
[2,] 0.428593
> Y %*% sol
[,1]
[1,] 102.7113
[2,] 109.1402
[3,] 103.5685

Indeed, the previous vector is extremely close to b = (102.71, 109.14, 103.57)'.
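This also answers the question that opened the subsection, at least for the exactly replicable bright: with no arbitrage, its price is the cost of the replicating portfolio (4/7, 3/7). A short sketch using the prices given in the text:

```r
prices <- c(93.60, 95.85)
w <- c(4/7, 3/7)                  # replicating portfolio of bright
price_bright <- sum(prices * w)
price_bright                      # about 94.56

# The generalized-inverse portfolio for the rounded b costs almost the same
library(MASS)
Y <- matrix(c(104, 104, 104, 101, 116, 103), 3, 2)
sol <- drop(ginv(Y) %*% c(102.71, 109.14, 103.57))
sum(prices * sol)
```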

Take-home message: arbitrages are ways to make money in which you aggressively
exploit a mispricing; if the mispricing is 0, there is no arbitrage; but if you see a mispricing
that is close to zero, is it a true mispricing or just due to rounding? Well, you have to
check things very carefully, as computers have limited accuracy (all computers, not only
mine or yours), and exploiting a tiny mispricing requires trading huge quantities, as you
cash a tiny amount for each round.
Whether or not you want to engage in massive trading to get 1/1000 of a euro per unit
also depends on practical considerations (transaction costs and taxes, say, could wipe out your
profits). However, ranks (and $qraux!) are here to tell you that linear algebra is important,
rounding exists, and extra care and wisdom are needed when checking whether something is zero
on a computer.

5 V week
Broadly speaking, this Section deals with simulation, a useful tool to solve or shed light
on complex problems. Simulation is based on the idea that you can run experiments to
gain insight and understand, to some extent, what’s going on. This may be the first step
to build a model or the only way to approximately solve a problem. In a way, simulation
may be thought of as an example of the scientific method in action: given a “problem”, many
experiments may be run to figure out what a solution may look like. Galileo did that using
a ball and an inclined plane; we can do the same performing experiments with the PC and R.
Experiments produce results that may be diverse and are affected by noise. This randomness
must be generated on a computer by drawing what we call random numbers. Typically,
simulation aims at answering two kinds of questions:

• How likely is it that my outcome is acceptable? Given a definition of “acceptability”, this
question is tackled by running many experiments and recording whether each experiment
“is successful”. Hence, after N experiments are run, you’ll have a sequence of N
1/0, standing for acceptable or unacceptable. Often you store the whole history of
experiments in a vector, say res for results, filled with 1 and 0. The reply to the
question, i.e., the probability we look for, is then just the number of 1’s in the vector
over the number of experiments N, or sum(res)/N.

• What’s the average outcome of my experiment? Here, we do not ask about the
(rate of) success of an experiment but require a numeric summary⁴ of the range of
possible outcomes of the experiments. Again, experiments are run and, this time,
the outcomes are recorded (as opposed to 1 for “success” and 0 for “failure”). Ultimately,
you’ll have a vector res whose components record the numeric outcomes.
The aggregate measure we look for is now mean(res).
⁴ Such a summary of many results into one or two numbers is an art as well as a science. Here we have
decided that the summary we are looking for is the average, barely scratching the surface of the fascinating
methods and tools that can be used to analyze randomness, risk, uncertainty, dispersion, cost of making
the wrong decisions and so forth.

The following sections briefly describe the notions that are needed to answer one of
the two kinds of questions we have just mentioned.
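Both kinds of questions lead to the same skeleton of code. A minimal sketch, where the experiment (flipping a coin 10 times with sample, discussed in Section 5.3) and the definition of “acceptable” (at least 7 heads) are arbitrary choices made only for illustration:

```r
set.seed(123)          # makes the random draws reproducible
N <- 10000             # number of experiments

success <- rep(0, N)   # question 1: 1 if acceptable, 0 otherwise
outcome <- rep(0, N)   # question 2: the numeric outcome itself

for (i in 1:N) {
  flips <- sample(c(0, 1), 10, replace = TRUE)  # one experiment: 10 coin flips
  heads <- sum(flips)                           # number of heads (1s)
  outcome[i] <- heads
  success[i] <- if (heads >= 7) 1 else 0        # "acceptable": at least 7 heads
}

sum(success) / N   # estimated probability; the exact value is 176/1024
mean(outcome)      # estimated average; the exact value is 5
```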

5.1 Drawing random numbers


R has a vast number of functions that can be used to draw different kinds of random numbers. A
working definition of random number for this course is

The numerical (or alpha-numerical) result of a random experiment, such as an
ideal die roll.

What I like about the previous “definition” is that it doesn’t really say much and indeed it
fosters further thinking: even if you accept the idea of a numerical result, everything depends
on the specific experiment or experiments that are run. If you flip a coin you can get heads
or tails (0 or 1) with 50-50% probability; if you roll two dice you can get 2, 3, ..., 12 with
different frequencies; if you compute the fraction of students that correctly solved exercise 1
in an exam, you get numbers in the interval [0, 1], like 0.15 or 0.87 (15 or 87% of correct
answers to the first exercise, depending on the preparation of the class).
Hence, it should be clear that a random number, coming from whatever experiment,
needs to be described in terms of what can be observed and how likely each possible
outcome is. Without being fussy, I call such a description a distribution.⁵
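The two-dice example can be turned into a distribution by running the experiment many times and tabulating the outcomes. A short sketch (sample, used here to roll a die, is discussed in Section 5.3; N and the seed are arbitrary):

```r
set.seed(1)
N <- 10000
die1 <- sample(1:6, N, replace = TRUE)
die2 <- sample(1:6, N, replace = TRUE)
s <- die1 + die2                     # N sums of two dice, between 2 and 12

freq <- table(s) / N                 # observed relative frequencies
exact <- (6 - abs(2:12 - 7)) / 36    # exact probabilities of 2, 3, ..., 12
round(rbind(simulated = as.vector(freq), exact = exact), 3)
```

Both rows describe the same distribution: 7 is the most likely sum, 2 and 12 the least likely.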
As mentioned before, R offers many ways to draw random numbers from several distributions
(i.e., you can run several typical and often useful experiments and R tells you the
outcome). The functions have a common structure rname(n,parameters), where name
recalls the distribution you are drawing from, n is the number of experiments that are run
and parameters provide additional details (on the experiment, if you wish).

• runif, R(andom)UNIF(orm), samples uniformly from an interval of values. This
means that any number in the interval is as likely as any other (uniform) to be
drawn:

> runif(3) # produces 3 random numbers (in [0,1], by default)
[1] 0.3279207 0.9545036 0.8895393
> runif(2,min=-3,max=3) # 2 random numbers in [-3,3]
[1] 1.1568204 0.8430409
> curve(dunif(x,-3,3),-4,4)
⁵ You’ll attend whole courses devoted to statistics and probability. On an entirely different note, there
is much to say about the possibility of creating pseudo-random numbers using a computer, which is a
programmable machine where nothing is left to chance, see http://plato.stanford.edu/entries/chance-randomness/
or http://www.bbc.co.uk/programmes/b00x9xjb

[Figure: the uniform density dunif(x, −3, 3), plotted for x from −4 to 4]

The picture shows the density which, loosely speaking, tells how likely it is for a number to
be drawn. The graph shows that all numbers in [−3, 3] have the same (constant)
likelihood and, hence, are treated equally in the sampling process. At the same time,
observe that there is no chance to obtain a number outside [−3, 3], as the density is
null there.
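Since the total area under a density is 1 and [−3, 3] has length 6, the constant height in the picture must be 1/6 ≈ 0.167. A quick check (seed and sample size are arbitrary):

```r
dunif(0, min = -3, max = 3)      # 1/6: the height of the flat density
dunif(3.5, min = -3, max = 3)    # 0: outside [-3, 3]

set.seed(42)
u <- runif(10000, min = -3, max = 3)
range(u)                         # all draws fall inside [-3, 3]
mean(u > 0 & u < 1)              # about 1/6, as for any interval of length 1 inside
```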

• rnorm, R(andom)NORM(al), samples random numbers giving more weight to “central”
outcomes.

> rnorm(5) # 5 random numbers
[1] 2.5283366 0.5490967 0.2382129 -1.0488931 1.2947633
> rnorm(5,mean=3) # 5 random numbers, more likely to be around 3
[1] 3.825540 2.944314 2.215618 2.266497 2.784135
> curve(dnorm(x,3),0,6)

[Figure: the normal density dnorm(x, 3), plotted for x from 0 to 6]

The picture of the normal density shows that the likelihood of sampling around 3
(the mean) is higher than elsewhere.
The normal distribution is possibly the most important one and there are fundamental
reasons to use it in many models of random phenomena (indeed, many kinds
of experiments produce results where outcomes are concentrated around some mean,
with a likelihood that is nicely spread in a bell-shaped fashion. Students be aware,
there is a universe behind my words... but I’m intrepid enough to attempt to give
you a simple description!)
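One way to see the bell shape at work: for a normal with standard deviation 1 (the default of rnorm), about 68% of the draws fall within one unit of the mean, and about 95% within two. A sketch comparing simulation with exact values from pnorm, the normal distribution function (sample size and seed are arbitrary):

```r
set.seed(7)
z <- rnorm(100000, mean = 3)              # sd = 1 by default
mean(abs(z - 3) < 1)                      # simulated share within 1 of the mean
pnorm(4, mean = 3) - pnorm(2, mean = 3)   # exact value, about 0.6827
mean(abs(z - 3) < 2)                      # within 2 of the mean: about 0.954
```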

• There are plenty of other distributions: rbeta, rlnorm, rt, rlogis... draw from the
Beta, lognormal, Student’s t and logistic distributions, respectively.
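Unlike runif and rnorm, rbeta and rt have parameters without defaults that must be supplied; rlnorm and rlogis have defaults, written out here for clarity. A sketch; the parameter values are arbitrary:

```r
set.seed(3)
rbeta(3, shape1 = 2, shape2 = 5)     # Beta: shape parameters are required
rlnorm(3, meanlog = 0, sdlog = 1)    # lognormal (these are the defaults)
rt(3, df = 4)                        # Student's t: df is required
rlogis(3, location = 0, scale = 1)   # logistic (these are the defaults)
```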

5.2 How to simulate an event with probability p?


The uniform distribution and the runif function can be used to simulate an event with a
given probability. Reflect on the sentence: “The event A has a 40% probability of happening”.
You may agree that this means that attempting to generate A many times will be successful
about 40% of the time, for a large number of attempts.⁶
⁶ If A is “Paolo prepares a good apple pie”, that’s exactly the situation: I can expect 4 out of 10 pies to
be perfect (or 40 out of 100...); the other pies all have some defect or another. Don’t worry too much: my
probability to prepare stunning tomato-and-basil spaghetti is close to 100%!
The following R lines produce 100 random uniform numbers in [0, 1] (that’s the default
for runif). How many of them do you expect to be smaller than 0.4?
> rn <- runif(100)
> rn[1:5] #show the first 5 out of 100
[1] 0.3688455 0.1524447 0.1388061 0.2330341 0.4659625
> (rn<0.4)[1:5] #show the first five
[1] TRUE TRUE TRUE TRUE FALSE
> sum(rn<0.4)
[1] 40

Well, runif uniformly samples numbers in [0, 1], without any bias in favor of any
specific number. Hence, falling in any given tenth of the unit interval has probability 1/10,
and falling in [0, 0.4] is equivalent to falling in any one of the 4 parts [0, 0.1], [0.1, 0.2], [0.2, 0.3]
and [0.3, 0.4]. We conclude that the probability to fall in [0, 0.4] is 4/10 = 40% = 0.4.
This is not a coincidence and the argument can be easily adjusted to prove that
A uniform number in [0, 1] being smaller than p is an event occurring with
probability p.
In other words, if you want to generate an event whose probability is 0.65, type:
> runif(1)<0.65
[1] TRUE

You’ll get TRUE with probability 65% = 0.65. For a confirmation, try the experiment,
say, 10000 times.
> experiments <- runif(10000)<0.65
> sum(experiments)
[1] 6548

We should get about 6500 successes, i.e., a frequency of about 0.65, shouldn’t we? And indeed
6548/10000 = 0.6548 is obtained, a result that is quite close to the target 0.65.
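The recipe fits in a one-line function; event is a hypothetical name chosen for these notes. Since R treats TRUE as 1 and FALSE as 0, mean applied to the resulting logical vector directly gives the frequency of successes:

```r
# Hypothetical helper: n independent events, each occurring with probability p
event <- function(n, p) runif(n) < p

set.seed(2020)
experiments <- event(10000, 0.65)
mean(experiments)       # frequency of TRUE, close to 0.65
sum(experiments)        # number of successes, close to 6500
```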

5.3 sample
An insanely powerful R function is named sample. The function does what it promises: it
samples from a set of values, with given probabilities, with or without replacement. The
syntax is
sample(x,size,replace=FALSE,prob=NULL),
meaning that size elements are drawn from the set x. If the default options are left
unchanged, sampling is done without replacement and probabilities are uniform, i.e., all
elements of x have the same chance to be selected. A few examples will clarify the idea:

• sample(c(0,1),10) will give an error, as it’s impossible to sample 10 items without
replacement from the set x containing only 2 elements.

• Sampling 10 elements with replacement from {0, 1, 5}:

> sample(c(0,1,5),10,replace=T)
[1] 1 1 5 1 0 0 1 0 5 5

The elements of x are drawn with the same probability, as prob is not specified and
this is the default behavior.

• This is a special case mentioned in the documentation: x is a single positive integer.

> sample(8)
[1] 7 3 4 1 6 8 2 5

Here (by default, it’s a convention) you sample 8 elements from the sequence of 8
numbers 1:8, with no replacement and uniform probability. It’s the R way to give a
random permutation of the first eight positive integers. In other and simpler words,
this produces a random shuffling of the eight numbers.

• You can also use sample to generate an event with probability p, see the previous
subsection.

> # draws either 1 or 0, with prob 0.4 and 0.6
> sample(c(1,0),1,prob=c(0.4,0.6))
[1] 1
> # 10 experiments: careful with the order, here 0 has prob 0.4
> # and 1 (success) has prob 0.6
> sample(c(0,1),10,T,prob=c(0.4,0.6))
[1] 0 1 1 1 1 0 0 1 1 1

We can repeat the 10000 experiments run in the previous subsection using sample:

> experiments <- sample(c(1,0),10000,rep=T,prob=c(0.65,0.35))
> sum(experiments)
[1] 6562

• sample lets you specify the experiment, in that both the outcomes and the probabilities
can be provided:

> sample(c(-0.5,0,0.1,0.2,0.3),10,rep=T,prob=c(1,2,3,4,5))
[1] 0.1 0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.2 0.2

You can think of x as the set of possible yearly returns from an investment: you
can lose 50%, make 0 return or get 10, 20 or 30%, resembling a situation in which
typically some money can be made in most cases but there is the chance of a
large loss.
You may notice that the vector c(1,2,3,4,5), used to define the probabilities, is
strange, as probabilities should not exceed 1 (as well as being nonnegative). In this case,
R normalizes the vector, dividing it by its sum: the weights mean that if -0.5 has some
probability to be drawn, then 0 has twice as much, 0.1 three times as much and so
on. Hence, the true probabilities used in this case are

> 1:5/sum(1:5)
[1] 0.06666667 0.13333333 0.20000000 0.26666667 0.33333333

from which you see, for instance, that a return of 30% will be obtained 1/3 of the times
and a large loss is experienced with probability 6.7%.
What is the average return of the investment? There is a simple analytical way
to compute this figure but, in this course, we will approximate the answer using the
power available in R: we generate many random experiments and evaluate the average
outcome as follows

> returns <- sample(c(-0.5,0,0.1,0.2,0.3),10000,rep=T,
+ prob=c(1,2,3,4,5))
> mean(returns)
[1] 0.13973

Despite the chance of a sizeable loss, the average return is still a healthy 14% per
year.
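For comparison, the simple analytical way mentioned above is the expected value: multiply each outcome by its probability and add up.

```r
x <- c(-0.5, 0, 0.1, 0.2, 0.3)   # possible yearly returns
p <- 1:5 / sum(1:5)              # normalized weights, as sample uses them
sum(x * p)                       # expected return: 2.1/15 = 0.14
```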

5.4 Some interesting or “realistic” examples


1. The yearly payoff of a project whose cost is 100 is 1000, 100 or 1 with probabilities
1/6, 2/6 and 3/6, respectively. What is the expected net gain?
Observe that this example is quite similar to the previous one about returns. Now
raw payoffs (gross of the cost) are given and some additional care must be used. We simulate
(many) payoffs as before and then subtract the (fixed) cost to get an answer.

> N <- 1000 # N experiments
> payoffs <- sample(c(1000,100,1),N, rep=T,
+ prob=c(1/6,2/6,3/6))
> mean(payoffs-100)
[1] 109.494

The average gain (i.e., average payoff less the fixed cost) is about 100.
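As in the previous example, the simulation can be compared with the exact expected net gain, (1000·1 + 100·2 + 1·3)/6 − 100 = 100.5. The gap with the simulated 109.494 is sampling noise: payoffs are very dispersed, so 1000 experiments give only a rough estimate. A sketch with a larger N (N and the seed are arbitrary):

```r
payoff_values <- c(1000, 100, 1)
probs <- c(1, 2, 3) / 6

sum(payoff_values * probs) - 100       # exact expected net gain: 100.5

set.seed(10)
N <- 1e6
payoffs <- sample(payoff_values, N, replace = TRUE, prob = probs)
mean(payoffs - 100)                    # typically much closer to 100.5
```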

2. You can buy a lottery in which, with 50-50 probability, you cash either a (standard) uniform
random number or a (standard) normal random number.
(a) Can you experience a loss?
(b) What’s the average revenue?
(c) What is the probability to get more than 0.5?
(d) What is the probability to get less than -1?
The first question may be answered at once: yes, a loss can be experienced when the
normal is selected (by chance) and the draw turns out to be negative (this is possible
with the normal). To answer the other questions we need to simulate the lottery
many times: begin by drawing “uniform” or “normal” with the same probability; then
record the outcome accordingly, repeating the process N times.

> N <- 1000
> res <- rep(0,N) # vector to store results
> for(i in 1:N)
+ {
+ uorn <- sample(c("uniform","normal"),1) # uniform or normal
+ if( uorn=="uniform")
+ res[i] <- runif(1) else res[i] <- rnorm(1)
+ }

The vector res now contains 1000 simulations of the lottery. The average revenue is
mean(res)=0.25.
To estimate the probabilities to get more than 0.5 or less than -1, use the code:

> sum(res>0.5)/N # should be about 40%
[1] 0.399
> sum(res< -1)/N # about 8%
[1] 0.086

Please take care and remember to enter a space when typing res< -1, to avoid
assigning the value 1 to the variable res: <- is the assignment operator, whereas < - means
“smaller than negative (something)”.
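The loop can also be replaced by vectorized code, and here the exact answers are available too: each branch has probability 1/2, so, for instance, P(revenue > 0.5) = 0.5·P(U > 0.5) + 0.5·P(Z > 0.5). A sketch using punif and pnorm, the distribution functions of the uniform and of the normal (N and the seed are arbitrary):

```r
set.seed(99)
N <- 100000
use_uniform <- runif(N) < 0.5                    # TRUE: the uniform is selected
res <- ifelse(use_uniform, runif(N), rnorm(N))   # one lottery outcome each

mean(res)             # exact value: 0.5 * 0.5 + 0.5 * 0 = 0.25
mean(res > 0.5)       # simulated P(more than 0.5)
0.5 * (1 - punif(0.5)) + 0.5 * (1 - pnorm(0.5))  # exact, about 0.4043
mean(res < -1)        # simulated P(less than -1)
0.5 * pnorm(-1)       # exact (a uniform is never below -1), about 0.0793
```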
3. What’s the probability that a standard uniform is bigger than a uniform number in
[0.1,1.1]?
We can solve the problem by simulation, along the following lines: create many
standard uniforms and store them in the vector x; create many uniforms in [0.1,1.1]
and store them in y; compare x and y and count how often x>y (in order to do that, x
and y must have the same number of elements); divide by the number of experiments.

> N <- 10000 # this means "many"
> x <- runif(N) # N uniforms (in [0,1])
> y <- runif(N,0.1,1.1) # N uniforms in [0.1,1.1]
> sum(x>y)
[1] 3991
> sum(x>y)/N
[1] 0.3991

A captivating way to conceptually “compute” this result is to draw the following
picture:

[Figure: the rectangle of possible points (x, y), with x in [0, 1] and y in [0.1, 1.1], in aquamarine; both axes range from 0 to 1.2]
Notice that whenever a point (x, y) is sampled, with x ∈ [0, 1] and y ∈ [0.1, 1.1],
it must be in the aquamarine region. As sampling is uniform, any point in the
aquamarine rectangle is as likely as any other to be selected.
We are interested in the cases in which x > y so draw the line y = x and look at the
portion of the rectangle that is below the line, in red:

> plot(0,t="n",xlim=c(0,1.2),ylim=c(0,1.2))
> polygon(c(0,0,1,1),c(0.1,1.1,1.1,0.1),col="aquamarine")
> abline(0,1)
> polygon(c(0.1,1,1),c(0.1,1,0.1),col="red")
[Figure: the same rectangle with the line y = x; the part below the line, where x > y, is the red triangle with vertices (0.1, 0.1), (1, 1) and (1, 0.1)]

The probability to draw a red point in the aquamarine rectangle depends on the
respective areas: the rectangle has area 1 and some trivial computations (verify yourself!)
show that the red triangle has area 0.9²/2 = 0.405 = 40.5%. Is this in agreement with the
result obtained through simulation? If this is not the case, try increasing N.
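Following the suggestion to increase N, a sketch with one million experiments (N and the seed are arbitrary):

```r
set.seed(5)
N <- 1e6                     # many more experiments than before
x <- runif(N)                # standard uniforms
y <- runif(N, 0.1, 1.1)      # uniforms in [0.1, 1.1]
mean(x > y)                  # simulated probability
0.9^2 / 2                    # area of the red triangle: 0.405
```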

Exercise 17 How often is a uniform number in [-1,1] smaller than a uniform number
in [-0.5,1.5]?

4. A random walk can be used to represent several interesting situations in which one,
physically or figuratively, moves to the right or to the left in some random way.
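A minimal sketch, assuming the simplest case: steps of +1 (to the right) or −1 (to the left) with equal probability. The position is then the cumulative sum of the steps, computed with cumsum; the number of steps and the seed are arbitrary:

```r
set.seed(11)
n_steps <- 100
steps <- sample(c(-1, 1), n_steps, replace = TRUE)  # left or right, 50-50
position <- cumsum(steps)                           # position after each step

plot(0:n_steps, c(0, position), type = "s",
     xlab = "step", ylab = "position")
```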

5. How probable is it that some people in a group have a common birthday (same day, but different years are
allowed)? Based on your experience, you may think that this is quite a rare event.
Mmhh, let’s work on this using simulation.
Assume we have a group of S people, with their nice birthdays coded as a day in
1:365, where I neglect leap years for simplicity. We’ll simulate N such groups, for a
large N, and check how often the same day is duplicated.
The simulation will be run as follows:

• create one group with S persons;
• scan the group for duplicates (duplicated birthdays in that group); record 1 if
there are duplicates or 0 if there are none;
• repeat the previous steps N (i.e., many) times;
• finally, count the recorded number of 1’s (i.e., the number of groups in which at
least one duplicate, a common birthday!, is present) and divide by N.

> N <- 1000
> S <- 30
> res <- rep(0,N) # vector to store results 1/0
> for(i in 1:N) {
+ group <- sample(1:365,S,rep=T)
+ if(any(duplicated(group))) res[i] <- 1 else res[i] <- 0
+ }
> res[1:10]
[1] 1 0 1 0 1 0 1 1 0 1
> sum(res/N)
[1] 0.728

Let me do three things: explain this result; clarify some finer details of the code we
have used; and generalize the solution to the birthday problem.
First, the previous result shows that a common birthday is indeed quite likely
in a group with S = 30 persons: a joint party can be held in about 70% of cases
(to be fussy, if you pick a random group of 30 persons, with probability 70% there
will be a common birthday). The interesting thing is that one may intuitively have
guessed this probability to be about 30/365, which is much, much, much lower than
70%.
Second, the most important part of the simulation, as always, is inside the for loop,
where experiments are run. We repeat the same experiment N times, for i taking values
1, 2, 3, ..., 999, 1000: we create group by sampling with replacement S birthdays in
1:365; then we ask R to check for duplicates using duplicated, a function telling
whether each element of group repeats an earlier one. In other words, duplicated(group)
is a vector of TRUE/FALSE (remember, a logical vector) that is TRUE for each person
whose birthday already appeared earlier in the group. Finally, if there is any TRUE in
duplicated(group) we record 1 in the ith component of res; else we record a 0.
Third, the fact that S = 30 yields a high probability of a common birthday prompts
us to investigate how this probability depends on S. The following code can be
used to plot this unexpected dependence (I’m leaving the code here for the brave
willing to dig into this example to build some programming experience: may the
force be with you! If you don’t care about the code then ok, I accept your choice...
but still you must study and understand the figure and its meaning).

> birthday <- function(S,N=1000){
+ res <- rep(0,N) # vector to store results 1/0
+ for(i in 1:N) {
+ group <- sample(1:365,S,rep=T)
+ if(any(duplicated(group))) res[i] <- 1 else res[i] <- 0
+ }
+ sum(res/N)
+ } # birthday is a function collecting the previous instructions
> s <- seq(5,250,by=5)
> plot(s,sapply(s,birthday),t="b",
+ xlab="S(ize of group)",ylab="Prob of common birthday")

[Figure: “Prob of common birthday” (from 0 to 1) against “S(ize of group)” (from 0 to 250), for S = 5, 10, ..., 250; the points reach values close to 1 very quickly]

A look at the graph makes clear that the probability of a common birthday quickly
increases to 100% for large groups. The intuition that this is a rare event is (very!)
false for medium to large groups and, say, S = 50 is enough to have a probability of
birthday(50)=0.976. Good point to stop: party over, oops out of time...⁷
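The simulated values can be checked against the exact probability. The chance that S people all have different birthdays is (365/365)·(364/365)···((365 − S + 1)/365), so a common birthday has probability one minus this product. A sketch; p_common is a hypothetical name, while pbirthday from the stats package computes the same quantity:

```r
# Exact probability that at least two of S people share a birthday
# (365 equally likely days, leap years ignored)
p_common <- function(S) 1 - prod((365 - 0:(S - 1)) / 365)

p_common(30)      # about 0.706 (the simulation gave 0.728)
p_common(50)      # about 0.970 (the simulation gave 0.976)
pbirthday(30)     # same value as p_common(30)

s <- seq(5, 100, by = 5)
round(sapply(s, p_common), 3)   # exact counterpart of the plotted points
```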

6. This is a managerial example related to performance assessment. Things, organizations
and even human beings can (and sometimes must) be ranked along many
dimensions. As a prototype, consider the following situation: an item has three features
whose quality is measured using three (independent) uniform random numbers, so that
1 means perfect and 0 means entirely defective. For instance, let (0.35,0.71,0.28)
be the three measures: along the first and third dimension, this item is well below
average (0.5), but the second feature scored an adequate value of 0.71, 1.00 being the
highest possible value.
Now assume that the item is deemed satisfactory if the smallest score (among the three)
is larger than 0.4. What’s the probability to have a satisfactory item?

> N <- 1000
> res <- rep(0,N) # vector to store the results
> for(i in 1:N){
+ x1 <- runif(1) # first score
+ x2 <- runif(1) # second score
+ x3 <- runif(1) # third score
+ minx <- min(c(x1,x2,x3))
+ if(minx>0.4) res[i] <- 1 else res[i] <- 0
+ }

⁷ Give http://www.youtube.com/watch?v=UjivDeA7Qu0 a chance. If you are more serious, try
http://en.wikipedia.org/wiki/Birthday_problem

As done several times, res keeps track of our experiments based on repeated simulations
of three uniform scores (x1, x2 and x3). We are ready to answer:

> mean(res)
[1] 0.22

In other words, less than 25% of the items will be cleared. Do you have any thoughts
on this assessment procedure? Can you think of any other way to provide a sound
judgement on multi-faceted items?
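Here the exact answer is easy: the three scores are independent and each exceeds 0.4 with probability 0.6, so the smallest exceeds 0.4 only if all three do, which happens with probability 0.6³ = 0.216. A vectorized sketch, using pmin (element-wise minimum) instead of the loop; N and the seed are arbitrary:

```r
set.seed(8)
N <- 100000
x1 <- runif(N); x2 <- runif(N); x3 <- runif(N)   # three scores per item
minx <- pmin(x1, x2, x3)                          # smallest score of each item
mean(minx > 0.4)      # simulated share of satisfactory items
0.6^3                 # exact probability: 0.216
```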

Exercise 18 Divers’ performances get grades from several judges. Then the
highest and the lowest grade are discarded and only the middle grades contribute
to the final “mean” grade.
Rework the previous exercise discarding the smallest and biggest random numbers
(i.e., keep the middle one). What’s the probability that the middle score exceeds
0.4? Compare with the previous result: I personally find the difference quite striking.
Hint: the code is pretty much the same: if you have three numbers like 1, 4.5, 2.1,
the middle one can always be obtained using sort(c(1,4.5,2.1))[2] ...

7. The Monty Hall Problem.

Suppose you’re on a game show, and you’re given the choice of three doors:
Behind one door is a car; behind the others, goats. You pick a door, say
No. 1, and the host, who knows what’s behind the doors, opens another
door, say No. 3, which has a goat. He then says to you, “Do you want to
pick door No. 2?” Is it to your advantage to switch your choice?

The description is taken from http://en.wikipedia.org/wiki/Monty_Hall_problem
(in Italian, http://it.wikipedia.org/wiki/Problema_di_Monty_Hall), where you
can read a lot about this problem. I hope you’ll enjoy looking through the solution
there but... this is a course on R and simulation!
We’ll tackle the problem by simulating many times the two strategies of keeping the first
door or switching to the other one that is offered. I’ll call the two strategies “keep”
and “switch”. Without changing the problem, I assume that the car is always placed
behind the first door, but this is unknown to the player, who honestly picks a random
door in the first place and cannot exploit the knowledge that the only goat-free door
is No. 1. This is only done to keep the R code simpler: no trick, no cheating.

Figure 1: The Monty Hall problem. A car is behind one of three doors. After the first
pick, another door with a goat is opened. Should you “keep” or “switch” to the other closed
door? Source: http://en.wikipedia.org/wiki/File:Monty_open_door.svg, in the public domain.
First, we simulate the probability of winning the car with “keep”:

> N <- 1000
> res <- rep(0,N) # to store results
> for(i in 1:N){
+ pick <- sample(c(1,2,3),1) # pick door at random
+ # there is nothing else to do as you always "keep"
+ # no matter what you are offered
+ if(pick==1) res[i] <- 1 else res[i] <- 0
+ }

So, “keep” gives you a sum(res)/N=0.323 probability to win the car (indeed, it’s 1/3).
Good, but this is not the end of the story: systematic “switch” can be simulated as

> N <- 1000


> res <- rep(0,N) # to store results
> for(i in 1:N){
+ pick <- sample(c(1,2,3),1) # pick door at random
+ if(pick==1) shown <- sample(c(2,3),1)
+ if(pick==2) shown <- 3
+ if(pick==3) shown <- 2
+ # shown is neither the first nor the door that was picked

46
+ # negative indexes take away from the vector
+ # "switch": the remaining is alway chosen
+ taken <- c(1,2,3)[c(-pick,-shown)]
+ if(taken==1) res[i] <- 1 else res[i] <- 0
+ }
> sum(res)/N
[1] 0.7

Wow! A few lines of R and a couple of thousand computerized experiments have solved a famous puzzle that ignited furious discussions about the optimal strategy: you should switch, no doubt, and get the car with probability 2/3 (unless you are very fond of goats).
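The loops above can also be written without a for loop. The following is a minimal vectorized sketch, added here and not taken from the original notes. It keeps the text's assumption that the car is always behind door 1; the seed and N = 100000 are arbitrary choices. It also makes the key insight explicit: since the host always opens a goat door, switching wins exactly when the first pick was wrong.

```r
set.seed(1)                  # arbitrary seed, for reproducibility
N <- 100000                  # number of simulated games
car <- 1                     # as in the text, the car is always behind door 1
pick <- sample(1:3, N, replace = TRUE)  # first choice, uniformly at random

# "keep" wins only if the first pick is the car
keep_win <- mean(pick == car)

# "switch": the host opens a goat door among the two not picked and the
# player takes the remaining closed door. If the first pick was a goat, the
# only closed door left is the car; if it was the car, switching loses.
switch_win <- mean(pick != car)

print(c(keep = keep_win, switch = switch_win))
```

With this many games, both estimates typically land within a few thousandths of 1/3 and 2/3.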

Exercise 19 Almost every night on RAI 1 you could see “Affari tuoi”. Interestingly, players win if they keep the right parcel (the one that contains the money), and they are often offered the chance to “switch” by the cruel lady at the other end of the phone. The situation is different, but can you draw some conclusions on the optimal way to behave in light of the Monty Hall discussion?

Exercise 20 The following code carries out “switch” in a different but equivalent
way. Can you explain why in some detail?

> N <- 1000
> res <- rep(0,N) # to store results
> for(i in 1:N){
+ pick <- sample(c(1,2,3),1) # pick door at random
+ shown <- sample(c(1,2,3),1,prob=c(0,1-(pick==2),1-(pick==3)))
+ # shown is neither the first nor the door that was picked
+ # negative indexes take away from the vector
+ # "switch": the remaining door is always chosen
+ taken <- c(1,2,3)[c(-pick,-shown)]
+ if(taken==1) res[i] <- 1 else res[i] <- 0
+ }
> sum(res)/N
[1] 0.682

Selected solutions

1. > f <- function(x) x**3-3*x
> curve(f,-3,3)
> grid(col=1)
> r1 <- uniroot(f,c(-2,-1))$root;r1
[1] -1.732051
> r2 <- uniroot(f,c(-0.5,0.5))$root;r2
[1] 0
> r3 <- uniroot(f,c(1,2))$root;r3
[1] 1.73205
> roots <- c(r1,r2,r3)
> points(roots,f(roots))
[Figure: graph of f(x) on [−3, 3] with a grid; the three roots are marked as points on the horizontal axis.]

3. Type help(polyroot) for details. As the function is a polynomial, its coefficients must be provided to polyroot starting from the constant term up to the highest degree (think, in other words, that $f(x) = 0 - 3x + 0x^2 + x^3$).

> polyroot(c(0,-3,0,1))
[1] 0.000000+0i 1.732051-0i -1.732051+0i

The roots are shown in complex notation, but it should be clear that they are 0 and $\pm\sqrt{3} \approx \pm 1.732051$.
No, the method cannot be used for df, which was defined numerically using a small h and is not a polynomial (even though it could be done with a little effort).
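You may need the real roots as ordinary numbers, for instance to pass them to points(). In that case you can drop the numerically negligible imaginary parts. A small sketch; the tolerance 1e-6 is an arbitrary choice:

```r
# Coefficients from the constant term up: f(x) = 0 - 3x + 0x^2 + x^3
z <- polyroot(c(0, -3, 0, 1))
# keep the roots whose imaginary part is numerically zero, as real numbers
real_roots <- sort(Re(z[abs(Im(z)) < 1e-6]))
print(real_roots)
```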

16. You have to solve the system using ginv and check the result.

> require(MASS) # for ginv
> Y <- matrix(c(105, 120, 110,
+ 105, 110, 102,
+ 105, 90, 98,
+ 105, 80, 90),4,3,byrow=T)
> b <- c(119,118,117,116)
> sol <- ginv(Y) %*% b
> Y %*% sol
[,1]
[1,] 119
[2,] 118
[3,] 117
[4,] 116

The result is equal to b and we conclude that the asset can be replicated. In the absence of arbitrage, the price of b must be equal to the cost of the replicating portfolio sol (each of the three assets costs 100):

> sum(sol*c(100,100,100))
[1] 112.4603

The price is 112.4603 (more precisely, 112.460317460318) and any other price for b would generate an arbitrage opportunity.
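The replication check can be made explicit with all.equal, rather than by eyeballing the printed matrix. A hedged sketch of the same computation, taking the current price of each of the three assets to be 100 as in the text:

```r
library(MASS)   # for ginv (Moore-Penrose generalized inverse)

Y <- matrix(c(105, 120, 110,
              105, 110, 102,
              105,  90,  98,
              105,  80,  90), 4, 3, byrow = TRUE)  # rows: states, cols: assets
b <- c(119, 118, 117, 116)      # payoff of the asset to be priced
prices <- c(100, 100, 100)      # current prices of the three assets

sol <- ginv(Y) %*% b            # candidate replicating portfolio
# b is replicable only if the portfolio reproduces it in every state
replicable <- isTRUE(all.equal(as.vector(Y %*% sol), b))
price <- if (replicable) sum(sol * prices) else NA
print(price)
```

Returning NA when the check fails is just one design choice. It avoids reporting a "price" for a payoff that cannot actually be replicated.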

6 Useful things, happy disasters and insightful errors


This section collects a few examples of bizarre but insightful R code that I have learned from students.

1. On the use of optimize, credit to Filippo Toneatti. Consider the problem of minimizing $f(x) = -x^4 - x^3 + 4x^2 + 4x - 1$ on the interval [−5, 5]. This is simple stuff, isn’t it?

> f<-function(x) -x**4-x**3+4*x**2+4*x-1
> curve(f,-5,5)
> optimize(f,c(-5,5))
$minimum
[1] -4.999944

$objective
[1] -420.9783
> f(-5)
[1] -421
> f(5)
[1] -631
[Figure: curve(f,-5,5), the graph of f(x) on [−5, 5].]

Well, clearly R got it wrong! It can immediately be seen from the graph that the minimizer is x = 5 and the minimum value is −631. Care is needed when you use numerical algorithms on functions with multiple local minima (at x = −5 and x = 5, and something is also happening in [−2, 2]). The algorithm used by optimize is a rather sophisticated one (a combination of golden section search and successive parabolic interpolation, see help(optimize)). However, it doesn’t see the minimizer on the right and is misled into x = −5. A slightly modified definition of f can be used to see where R evaluates the function during the minimization process:

> f2<-function(x) {cat(x,"\n"); -x**4-x**3+4*x**2+4*x-1}
> optimize(f2,c(-5,5))
-1.18034
1.18034
-2.63932
-3.54102
-4.098301
-4.442719
-4.655581
-4.787138
-4.868444
-4.918694
-4.94975
-4.968944
-4.980806
-4.988138
-4.992669
-4.995469
-4.9972
-4.998269
-4.99893
-4.999339
-4.999591
-4.999747
-4.999844
-4.999904
-4.999944
-4.999944
$minimum
[1] -4.999944

$objective
[1] -420.9783

It is as if R explored up to 1.18 and then decided to give up on the right. The first lesson is that caution is always needed when using numerical methods. Use a graph to check whenever you can: use a graph, use a graph...
But there is a second lesson for the curious: how can R be so dumb? Have a look at the messy situation in [−2, 2] and think about the gravitational metaphor.

> curve(f,-2,2)
> points(1.18,f(1.18))
[Figure: graph of f(x) on [−2, 2], with the point (1.18, f(1.18)) marked.]

Clearly, a lot is going on here, with two local maxima and a local minimum in between. R probes the function at 1.18, where f is higher than at the other probe −1.18, so the ball falls back to the left... unlucky!
More advanced stuff
You may need to optimize hundreds of functions, and in that case you do not have the possibility or time to look at all the graphs manually. This happened to me while doing research: I had to maximize thousands of utility functions, one for every agent in the model. Then you need a more robust procedure:

> myoptimize <- function(f,interval,maximum=FALSE,...){
+ a <- interval[1]
+ b <- interval[2]
+ q1 <- 3/4*a+1/4*b # first quarter point
+ q2 <- (a+b)/2 # midpoint
+ q3 <- 1/4*a+3/4*b # third quarter point
+ res <- list() # will hold the three optimize() results
+ res[[1]] <- optimize(f,c(a,q2),maximum=maximum,...)
+ res[[2]] <- optimize(f,c(q1,q3),maximum=maximum,...)
+ res[[3]] <- optimize(f,c(q2,b),maximum=maximum,...)
+ x <- c(res[[1]]$objective,res[[2]]$objective,res[[3]]$objective)
+ trueres <- if(maximum) which.max(x) else which.min(x) # best of the three
+ res[[trueres]]
+ }

The previous code does a simple and practical thing. It splits the given optimization interval [a, b] into four equal parts, with endpoints a, q1, q2, q3, b, and runs optimize on each of the three partly overlapping intervals [a, q2], [q1, q3] and [q2, b]. Then the best of the three solutions found is returned. While there is no guarantee that this will work for any f, it is much more difficult for this method to be fooled. See what happens in the previous case: problem solved!

> myoptimize(f,c(-5,5))
$minimum
[1] 4.999922

$objective
[1] -630.9586
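Another safeguard along the same lines evaluates f on a coarse grid first and then lets optimize refine the search only around the best grid point. This is our addition, not the author's method. A sketch, assuming f is vectorized (as the f defined above is) and using an arbitrary grid of 101 points:

```r
f <- function(x) -x^4 - x^3 + 4*x^2 + 4*x - 1

grid_then_optimize <- function(f, interval, n = 101) {
  xs <- seq(interval[1], interval[2], length.out = n)  # coarse grid
  i <- which.min(f(xs))            # best grid point (f must be vectorized)
  h <- xs[2] - xs[1]               # grid spacing
  lo <- max(interval[1], xs[i] - h)
  hi <- min(interval[2], xs[i] + h)
  optimize(f, c(lo, hi))           # local refinement around the best point
}

res <- grid_then_optimize(f, c(-5, 5))
print(res)
```

Like myoptimize, this can still be fooled, for instance by a very narrow minimum that falls between two grid points.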

2. On the importance of starting points in optimization, credit to Simone Scalco. Assume you have to maximize $f(x, y) = -2x^2 + 3xy + 4y^2 - 5x - 4y + 4$ subject to the constraints x ≤ 5, y ≤ 3 and 3x + y ≥ 5. After defining the function and the constraints, one standard way to tackle the problem is to use constrOptim with a simple initial point satisfying the constraints, with no further worries.

> f <- function(x,y) -2*x^2+3*x*y+4*y^2-5*x-4*y+4
> fb <- function(x) f(x[1],x[2])
> A <- matrix(c(-1,0,3,0,-1,1),3,2) # rows: -x, -y, 3x+y
> b <- c(-5,-3,5) # A %*% c(x,y) - b >= 0 means x<=5, y<=3, 3x+y>=5
> constrOptim(c(4.9,2.9),fb,NULL,A,b,
+ control=list(fnscale=-1))
$par
[1] 1.000648 3.000000

$value
[1] 30

$counts
function gradient
278 NA

$convergence
[1] 0

$message
NULL

$outer.iterations
[1] 3

$barrier.value
[1] 0.0003544281

It looks as if (1, 3) is the maximizer, but we didn’t look at any graph and tried only a single convenient starting point. Let’s check.

> x <- seq(0,5,len=51)
> y <- seq(-5,5,len=51)
> z <- outer(x,y,f)
> image(x,y,z);contour(x,y,z,add=T)
> abline(h=3);abline(v=5)
> abline(5,-3)
> points(4.9,2.9);points(1,3,pch=19)

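[Figure: image and contour plot of f for x in [0, 5] and y in [−5, 5], with the constraint lines y = 3, x = 5 and y = 5 − 3x; the starting point (4.9, 2.9) and the solution (1, 3), filled, are marked.]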

constrOptim started from the point in the upper right and ended up at the higher level in (1, 3), the filled point. However, the contours show that the function rises again towards the lower tip of the triangular domain, which lies outside the plotted region. Plotting the whole domain and starting from (4, −4), which is strictly feasible, indeed produces the correct result.

> x <- seq(0,5,len=51)
> y <- seq(-10,5,len=51)
> z <- outer(x,y,f)
> image(x,y,z);contour(x,y,z,add=T)
> abline(h=3);abline(v=5)
> abline(5,-3)
> points(4,-4);points(5,-10,pch=19)
> constrOptim(c(4,-4),fb,NULL,A,b,
+ control=list(fnscale=-1))
$par
[1] 5.000000 -9.999999

$value
[1] 219

$counts
function gradient
502 NA

$convergence
[1] 0

$message
NULL

$outer.iterations
[1] 3

$barrier.value
[1] 0.002334433

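[Figure: image and contour plot of f for x in [0, 5] and y in [−10, 5], showing the whole triangular domain; the starting point (4, −4) and the maximizer (5, −10), filled, are marked.]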

This example stresses once more the importance of selecting good starting points for optimization algorithms (or of using alternative techniques).
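One such alternative is a multi-start strategy: run constrOptim from several strictly feasible starting points and keep the best result. This is sketched here as our own illustration. The list of starting points is arbitrary; the function and constraints are those of this example.

```r
f  <- function(x, y) -2*x^2 + 3*x*y + 4*y^2 - 5*x - 4*y + 4
fb <- function(p) f(p[1], p[2])
A  <- matrix(c(-1, 0, 3, 0, -1, 1), 3, 2)  # rows: -x, -y, 3x + y
b  <- c(-5, -3, 5)                         # x <= 5, y <= 3, 3x + y >= 5

# constrOptim needs starting points strictly inside the feasible region
strictly_feasible <- function(p) all(A %*% p - b > 0)
starts <- list(c(4.9, 2.9), c(2, 2), c(4, -4), c(4.5, -8))
starts <- Filter(strictly_feasible, starts)

fits <- lapply(starts, function(p)
  constrOptim(p, fb, NULL, A, b, control = list(fnscale = -1)))
values <- sapply(fits, function(r) r$value)
best <- fits[[which.max(values)]]   # maximization: keep the largest value
print(best$par)
print(best$value)
```

More starting points make it less likely that the best region is missed, at the cost of more function evaluations.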
More advanced stuff
An interesting representation of the situation can be obtained as follows:

> f2 <- function(x,y) (if(min(A%*%c(x,y)-b)>0) 1 else NA)*f(x,y)
> f2v <- Vectorize(f2)
> z2 <- outer(x,y,f2v)
> contour(x,y,z2)
> abline(h=3);abline(v=5)
> abline(5,-3)

[Figure: contour plot of f2, that is, of f restricted to the feasible triangle, with the constraint lines.]

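I show this code as one way to plot a function on a given domain only. Here f2 returns NA (“not available” in R) if any of the constraints is not strictly satisfied, that is, if some component of A %*% c(x,y) - b is not positive. Otherwise, inside the domain, the if expression produces 1 and f2(x,y) is the same as f(x,y).
Plain outer does not work on f2. The reason is that outer calls the function once with whole vectors of x and y values, while the if inside f2 expects a single point. You need to vectorize f2 (with Vectorize) to compute the matrix of values of f2 for all values of x and y.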
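The failure of outer on a scalar-only function is easy to reproduce on a toy example. In this sketch, g is a hypothetical function written with if, so it can only handle one point at a time. outer(xs, ys, g) raises an error (the exact message depends on the R version), while the Vectorize-d version works:

```r
g <- function(x, y) if (x > y) 1 else 0   # hypothetical scalar-only function
xs <- 1:3
ys <- 1:2

# outer() calls g once with whole vectors: this fails
msg <- tryCatch(outer(xs, ys, g), error = function(e) conditionMessage(e))
print(msg)

gv <- Vectorize(g)          # evaluates g point by point (via mapply)
m <- outer(xs, ys, gv)
print(m)
```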
It is also possible to depict with different colors the starting points that erroneously end up in the wrong solution and the ones that correctly discover (5, −10).

> f3 <- function(x,y){
+ if(is.na(f2(x,y))) NA else {
+ r <- constrOptim(c(x,y),fb,NULL,A,b, control=list(fnscale=-1))
+ r$par[1]
+ }
+ }
> f3v <- Vectorize(f3)
> z3 <- outer(x,y,f3v)
> radiantorchid <- rgb(218/255,112/255,214/255)
> mimosa <- rgb(239/255,192/255,80/255)
> image(x,y,z3,col=c(radiantorchid,mimosa))

> abline(h=3);abline(v=5)
> abline(5,-3)
[Figure: image of z3 over the feasible triangle in two colors, radiant orchid and mimosa, with the constraint lines.]

Clearly, initiating constrOptim in the upper radiant orchid area yields (1, 3), whereas a starting point in the lower mimosa area[8] provides the correct maximizer (5, −10).

[8] Radiant Orchid is Pantone color of the year 2014, https://www.pantone.com/pages/index.aspx?pg=21129; Mimosa is Pantone color of the year 2009, https://www.pantone.com/pages/pantone/pantone.aspx?pg=20634&ca=10. Why should we use the standard boring colors?

3. A parenthesis is a parenthesis is a parenthesis..., credit to Regina Poniridis and Anna Scaramuzza. Assume you want to solve a standard assessment problem like

At a cost of 90 you can get one of the random future payoffs 200, 110, 4. Assume payoffs are drawn uniformly. What’s the net expected gain of the investment?

The possible answers were -0.4; 15.7; 9.3; 28.5. What’s wrong with the following code?

> N <- 10000
> r <- sample(200,110,4)
> mean(r)-90
[1] 6.590909

The answer may superficially look plausible, but it’s really a sort of R nonsense (and unfortunately doesn’t generate any error). Indeed, N was never used inside sample (this is suspicious), and likewise replace was forgotten. Some defaults nevertheless allowed R to proceed, interpreting sample(x, size, replace = FALSE, prob = NULL) as follows:

(a) x=200 and in this case R samples from 1:200;
(b) size=110 and, hence, 110 numbers are sampled from 1:200;
(c) the third argument, 4, is matched to replace. When R expects a logical value, either TRUE or FALSE, by convention a non-zero number is interpreted as TRUE and zero as FALSE. In this case, replace is TRUE.

Therefore, r is a vector of 110 random integers drawn with replacement from 1:200. This is entirely different from the intended simulation. The point is that R showed no error message, and one may believe he/she’s doing well... Type r to verify this statement.
The correct answer is of course given by:

> N <- 10000
> r <- sample(c(200,110,4),N,replace=T)
> mean(r)-90
[1] 15.4908
> str(r)
num [1:10000] 4 110 110 110 110 4 4 4 4 200 ...

The answer is correct and r is now a vector with 10000 components (not 110!).
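As a cross-check (our addition), with three equally likely payoffs the exact expected net gain is (200 + 110 + 4)/3 − 90 = 314/3 − 90 ≈ 14.67. The sketch below compares it with a simulation and its Monte Carlo standard error, which is about 0.8 here. The seed is arbitrary, so the simulated value will differ from the 15.4908 shown above, but it should typically lie within a couple of standard errors of the exact value.

```r
payoffs <- c(200, 110, 4)
cost <- 90
exact_gain <- mean(payoffs) - cost        # equally likely payoffs: 314/3 - 90

set.seed(2)                               # arbitrary seed
N <- 10000
r <- sample(payoffs, N, replace = TRUE)   # N and replace are both used here
sim_gain <- mean(r) - cost
se <- sd(r) / sqrt(N)                     # Monte Carlo standard error

print(c(exact = exact_gain, simulated = sim_gain, std.error = se))
```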
Another striking example is related to the generation of 100 random normal numbers with mean -4 and standard deviation 1. Does this work?

> v <- c(rnorm(100),-4,1)

The answer is negative: v is a vector of 102 numbers, made of 100 standard normal components followed by -4 and 1. The correct vector is instead generated by v <- rnorm(100,-4,1). Even v <- c(rnorm(100,-4,1)) would work: the c() operator is not needed, as rnorm already creates a vector, but it’s not harmful either.
The take-home message is: be cautious with parentheses, and be even more cautious because in some cases no error message is given.
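Two cheap safeguards against this kind of slip are naming the arguments explicitly and checking the length of the result with stopifnot. A short sketch (the seed is arbitrary):

```r
set.seed(3)                               # arbitrary seed
v_wrong <- c(rnorm(100), -4, 1)           # 100 standard normals, then -4 and 1
v_right <- rnorm(100, mean = -4, sd = 1)  # named arguments make the intent clear

length(v_wrong)                           # 102, not 100
stopifnot(length(v_right) == 100)         # fails loudly if the length is wrong
```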

Lessons from some past exams
Some exercises assigned in the past had particularly low grades (an average below 1 when the full score is 2). This holds for exercises on set.seed and matrices/vectors, optimization in 2d, constrained optimization, arbitrage and assessment through simulation.
As students may have extra difficulties in solving such problems, these notes discuss some of the issues related to the aforementioned exercises.

6.1 set.seed
One of the problems with this exercise may be related to the role of set.seed. This command sets the random number generator to a specific starting state (the seed), so that a controlled, reproducible sequence of numbers is generated.
If you want to generate a random (3 × 3) matrix of normal numbers with mean 2 and standard deviation 3, you can type

> m <- matrix(rnorm(9,2,3),3,3)
> m # show m
[,1] [,2] [,3]
[1,] 3.8762030 5.369065 -3.8001601
[2,] 0.9821722 4.733989 -0.4554212
[3,] 1.7026076 1.066936 -2.3283255

You will not be able to recover that specific matrix after you quit R, unless you save your data in one of several ways. If you type the same command again, the matrix will be different:

> m <- matrix(rnorm(9,2,3),3,3)
> m
[,1] [,2] [,3]
[1,] 2.8075892 2.2169911 0.3995738
[2,] -0.2920926 1.8072854 5.6722845
[3,] 1.6663722 0.1851708 5.5228658

Setting the seed with set.seed allows you to recover the very same matrix at any time.

> set.seed(123)
> m <- matrix(rnorm(9,2,3),3,3)
> m
[,1] [,2] [,3]
[1,] 0.3185731 2.211525 3.38274862
[2,] 1.3094675 2.387863 -1.79518370
[3,] 6.6761249 7.145195 -0.06055856

Now, you may work with R and destroy or alter the variable m, for instance with

> b <- runif(3)
> m %*% b
[,1]
[1,] 5.224463
[2,] 1.111739
[3,] 8.955485
> m <- 1
> m # now m is a number
[1] 1

The original m is now lost forever in R memory, unless you reset the seed to the value that was used immediately before m was created. Hence, to re-create it you have to type

> set.seed(123) # this resets the seed
> m <- matrix(rnorm(9,2,3),3,3) # this is the same m as before
> m
[,1] [,2] [,3]
[1,] 0.3185731 2.211525 3.38274862
[2,] 1.3094675 2.387863 -1.79518370
[3,] 6.6761249 7.145195 -0.06055856

It’s important to remember that the computations must be done right after the set.seed command, before you enter other commands that generate random numbers or alter m in any way.
If in doubt, just re-enter the set.seed command to re-initialize everything and type the needed commands again.
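The following sketch checks reproducibility with identical(), instead of comparing printed matrices by eye. The call to runif stands for any other random computation done in between.

```r
set.seed(123)
m1 <- matrix(rnorm(9, 2, 3), 3, 3)
b <- runif(3)                           # other random draws advance the generator
m_next <- matrix(rnorm(9, 2, 3), 3, 3)  # no reset: a different matrix

set.seed(123)                           # reset the generator to the same state
m2 <- matrix(rnorm(9, 2, 3), 3, 3)      # the very same matrix as m1

print(identical(m1, m2))                # TRUE
print(identical(m1, m_next))            # FALSE
```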

6.2 Optimization
There is a video on optimization and it is a good idea to review the material at http://youtu.be/b_r7u4IgOhY (part A) and http://youtu.be/ViUy3BTuBwI (part B).
Let me spell out the major steps to solve an optimization problem:

1. define the function f(x, y) (no typos, of course);

2. define the “b-version” as explained in the video, that is, a function of a single vector argument such as fb <- function(x) f(x[1],x[2]);

3. use optim with an appropriate starting point c(x0,y0) (if you want to maximize, remember to set control=list(fnscale=-1));

4. interpret and understand the results.
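The four steps can be seen at work on a deliberately simple example. The function g below is hypothetical, chosen so that the maximizer (1, −2) and the maximum 3 are known in advance. The starting point (0, 0) is arbitrary.

```r
# Step 1: define the function of two variables
g <- function(x, y) 3 - (x - 1)^2 - (y + 2)^2

# Step 2: the "b-version", a function of a single vector argument
gb <- function(p) g(p[1], p[2])

# Step 3: optim from a starting point; fnscale = -1 turns it into a maximization
out <- optim(c(0, 0), gb, control = list(fnscale = -1))

# Step 4: interpret the results
print(out$par)          # the maximizer, a point (x, y): close to (1, -2)
print(out$value)        # the maximum value of g at that point: close to 3
print(out$convergence)  # 0 means that optim reports successful convergence
```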

One conceptual problem is that the solution depends on the starting point (x0, y0), as suggested by the gravitational metaphor. Minimizing resembles the fall of a ball on a surface and, depending on where you initially drop the ball, different resting points or holes may be reached. So, if you start from the wrong initial point, you may end up in the wrong solution (more technically, you may end in a local minimizer instead of a global minimizer, the same holding for maximizers).
How do you select the proper (x0, y0)? Easy: you draw a picture, inspect the contour lines to spot a good candidate (global!) minimizer/maximizer and start from there. Often you need to add some contour lines to better see where the best minimizer/maximizer is located. This can be done with something like
contour(x,y,z, add=T, levels=c(oneValue,anotherValue)),
specifying appropriate levels depending on what you see.
As an example, consider the following function to be minimized:

$$f(x, y) = \frac{x^2 + xy + y^2 - y}{2} + e^{-x^2 - x} + e^{-y^2 - 2y}.$$
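The notes do not show how f, fb and the grid x, y, z were defined for this example. One possible set of definitions is sketched below. It is consistent with the formula above and with the [−3, 3] range of the plots; the 61-point grid is our assumption.

```r
f  <- function(x, y) (x^2 + x*y + y^2 - y)/2 + exp(-x^2 - x) + exp(-y^2 - 2*y)
fb <- function(p) f(p[1], p[2])    # "b-version" for optim

x <- seq(-3, 3, len = 61)          # assumed plotting grid
y <- seq(-3, 3, len = 61)
z <- outer(x, y, f)                # f is vectorized, so outer works directly
contour(x, y, z)
```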
The standard contour graph produces
[Figure: default contour plot of f on [−3, 3] × [−3, 3], with levels 2, 4, ..., 12.]

The minimizer appears to be in the orange area, which is too wide to yield a single meaningful candidate. By default the function was plotted using levels ..., 10, 8, 6, 4, 2, and there is no zero contour (the function does not reach 0). So, we add a couple of levels between 0 and 2:

> contour(x,y,z)
> contour(x,y,z,add=T,levels=c(1.5,1))

Clearly, the minimizer is inside one of the two small circular contours of level 1, depicted
below in aquamarine:
[Figure: the same contour plot with the added levels 1.5 and 1; the two small closed contours of level 1 are drawn in aquamarine.]

Using the grid command we see that good candidates are close to (0.5, 0.5) and
(−1.5, 1.5). We minimize starting from both points:

> optim(c(0.5,0.5),fb)
$par
[1] 0.5433489 0.7173225

$value
[1] 0.8158205

$counts
function gradient
49 NA

$convergence
[1] 0

$message
NULL
> optim(c(-1.5,1.5),fb)
$par
[1] -1.555756 1.332890

$value
[1] 0.8281947

$counts
function gradient
47 NA

$convergence
[1] 0

$message
NULL

The global minimizer is at (0.54, 0.72) and the global minimum is about 0.816. Observe that (−1.56, 1.33) is only a local minimizer, at which the value of the function is about 0.828: this result would be considered wrong!
The following 3D graph may help you visualize the two minimizers and get a feeling for what is going on. In my view, the contour plot does a much better job of describing the situation, but surfaces are suggestive.

> persp(x,y,z,theta=30,phi=-10,ticktype="detailed")

[Figure: perspective plot of f over [−3, 3] × [−3, 3].]

Please also read the following subsection for an example (it’s about constrained optimization, but the problem of selecting a proper starting point is exactly the same).
Finally, as far as the interpretation of the results is concerned, recall that the minimizer/maximizer is a point, with an x and a y, that can be retrieved in the $par component of the output. The objective, that is, the optimal value of the function, is a single number, namely the value of f at the minimizer/maximizer ($value).

6.3 Constrained optimization / production function


There is a video on the topic, see http://youtu.be/MCvz-c6UUkw. You must solve the problem using the previous steps and defining the constraints using linear algebra, as shown in the video.
It is of the utmost importance to provide a good starting point strictly inside the feasible region. Again, a good graph is needed to start the procedure. Consider the problem of maximizing $f(x, y) = -x^2 + 4xy + 5y^2 + 3x - 5y + 3$ with the constraints x ≤ 5, y ≤ 3, 3x + y ≥ 3.
A “good graph” is one in which you see the whole domain:

> f <- function(x,y) -x^2+4*x*y+5*y^2+3*x-5*y+3
> x <- seq(-1,5,len=51)
> y <- seq(-15,3,len=51)
> z <- outer(x,y,f)
> image(x,y,z)

> contour(x,y,z,add=T)
> abline(v=5) # x<=5
> abline(h=3) # y<=3
> abline(3,-3) # y>=3-3x

[Figure: image and contour plot of f for x in [−1, 5] and y in [−15, 3], with the constraint lines x = 5, y = 3 and y = 3 − 3x.]

Some experimentation with the ranges of the sequences for x and y may be needed to
see the whole feasible region and you may need to try several graphs before succeeding. In
this case, the feasible region is the shaded triangle below.

[Figure: the same plot with the feasible triangle shaded and the starting point (4.9, −10) marked as a filled point.]

First, you can easily select starting points strictly inside the region. Second, you see from the contours that the maximizer is close to the lower right vertex of the triangle. Indeed, in this case the graph already reveals that the maximizer is the vertex (5, −12), and no additional computation would be needed. We can solve the problem picking as starting point (4.9, −10), the filled point in the graph.
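Before running the call below, note that f has changed, so fb must be redefined. The third constraint is now 3x + y ≥ 3, so b also differs from the previous example. The notes do not show this step. A sketch, assuming the same row order of the constraints as before (x ≤ 5, y ≤ 3, 3x + y ≥ 3):

```r
f  <- function(x, y) -x^2 + 4*x*y + 5*y^2 + 3*x - 5*y + 3
fb <- function(p) f(p[1], p[2])           # "b-version" of the new f
A  <- matrix(c(-1, 0, 3, 0, -1, 1), 3, 2) # rows: -x, -y, 3x + y
b  <- c(-5, -3, 3)                        # x <= 5, y <= 3, 3x + y >= 3

start <- c(4.9, -10)
stopifnot(all(A %*% start - b > 0))       # start strictly inside the region
sol <- constrOptim(start, fb, NULL, A, b, control = list(fnscale = -1))
print(sol$par)
print(sol$value)
```

With the previous b = c(-5, -3, 5), the point (4.9, −10) would not be strictly feasible (3 · 4.9 − 10 = 4.7 < 5), and constrOptim would stop with an error about the initial value.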

> constrOptim(c(4.9,-10),fb,NULL,A,b,
+ control=list(fnscale=-1))
$par
[1] 5 -12

$value
[1] 533

$counts
function gradient
668 NA

$convergence
[1] 0

$message

NULL

$outer.iterations
[1] 3

$barrier.value
[1] 0.003062075

The maximum, obtained after defining proper A and b to describe the constraints, is f(5, −12) = 533.
A very common error is to use another popular starting point like (4.9, 2.9), which, however, produces the (wrong!) local maximizer at the (wrong!) vertex (5, 3).

> constrOptim(c(4.9,2.9),fb,NULL,A,b,
+ control=list(fnscale=-1))
$par
[1] 5 3

$value
[1] 83

$counts
function gradient
336 NA

$convergence
[1] 0

$message
NULL

$outer.iterations
[1] 3

$barrier.value
[1] 0.003062073
