
Prologue

This course was conceived in 2013 and has gradually developed to improve and refine knowledge of the R software and to experiment with some new teaching techniques, based on flipped classes and the automatic generation of hundreds of exams/exercises.

0 Intro

1. This course aims at providing:

(a) a better understanding of the theory, techniques, and problems encountered in your math, stat and economics courses.

(b) the ability to “translate” a problem into workable R code, which yields a numerical solution and provides insights on the relevant issues of the problem. Virtually all the models and exercises of your previous, loosely quantitative, courses are amenable to this “treatment”.

(c) practical knowledge of the R programming environment. Basic programming skills will be acquired together with ideas on how to “compute” everything that has a formal structure (and can, say, be described in terms of functions, derivatives, integrals, estimates, graphs, optimization, utility. . . )

2. Study material:

(a) Download the handouts by Emmanuel Paradis, http://cran.r-project.org/

doc/contrib/Paradis-rdebuts_en.pdf and the classic “Introduction to R” by

W. N. Venables, D. M. Smith and the R Core Team, http://cran.r-project.org/

doc/manuals/r-release/R-intro.pdf

We will not cover the two handouts in full, but most of the material can be studied and understood using parts of the two files.

(b) The “official” support page for the course is on Moodle, the learning platform of

Ca’ Foscari University. Go to moodle.unive.it, register as a Moodle user, and look for the course under Studenti lauree e lauree magistrali / Area economica / Corsi di laurea triennale / Economia e commercio, searching for the string “computational”.

The direct link is https://moodle.unive.it/course/view.php?id=2610

(c) This document will expand, as time goes by, into a full-fledged set of notes.

I have included a few sections presenting things that smart students have taught me over the years. The material is very useful and can improve your understanding and final performance.


(d) Some of the lectures will be flipped. This means that you will have to study the material before the lecture. Yes, you study the theory at home and we’ll work on practical sessions in class. The first flipped class is scheduled for February 5th, 2020; below is a list of the URLs of the video material.

i. Flipped lecture 3 (fundamental principles): http://www.youtube.com/watch?v=

l73G_bOFDvg.

ii. Flipped lecture 4 (graphics, low vs high level): http://www.youtube.com/

watch?v=U19K2zNVOvA

iii. Flipped lecture 6 (root-finding): http://youtu.be/r9N0izLKQB0 (part A)

and http://youtu.be/kIR4GC20n8M (part B)

iv. Flipped lecture 8 (optim and 2-dim optimization): http://youtu.be/b_r7u4IgOhY

(part A) and http://youtu.be/ViUy3BTuBwI (part B).

v. Flipped lecture 9 (constrOptim and linear constraints): http://youtu.be/

MCvz-c6UUkw

(e) Additional self-study material can be found at http://www.stat.berkeley.edu/

share/rvideos/R_Videos/R_Videos.html

(f) There is no other way to understand programming than do it yourself: practice,

practice, practice, practice. . .

The exam will be held in the PC Lab at Palazzo Moro: approximately one hour of time to answer 17 questions, which will require you to know R functions, enter commands and code, interpret the results, and so on.

Grades will tentatively be assigned giving +2 for each correct reply, −0.5 for a wrong reply and 0 for a “no-reply”, i.e., a blank. A few questions are very basic and their grading is starker: +1 if right and −1 if wrong or blank.

1 I week

1.1 Install

Introduction, installation and overview: download R at http://cran.r-project.org.

You can also download Rstudio at http://www.rstudio.com/. Rstudio has a nice

graphical interface that lets you see in the same window the console, graphs, files and the help pages. Most students like Rstudio more than basic R, which is believed to be too spartan. However, take care: I don’t know whether it will be possible to run the exams with Rstudio (but it should not be a big problem at all).


I week     3/2  Intro and scientific notation
           4/2  R commands
           5/2  FLIPPED: fundamental principles, user-defined functions

II week   10/2  FLIPPED: graphics, high vs low level commands
          11/2  Image, persp, outer, contour
          12/2  FLIPPED: root finding

III week  17/2  Optimize 1-dim functions
          18/2  FLIPPED: optim for 2-dim functions
          19/2  Revision and FLIPPED: linear constraints Ax ≤ b and constrOptim

IV week   24/2  State preference model
          25/2  State preference model and examples
          26/2  State preference model and examples

V week     2/3  Sample spaces and simulation
           3/3  Simulation
           4/3  Simulation examples and mock exam

(if everything goes smoothly there is no need for a “recovery week”)

Table 1: Tentative schedule.

Scientific notation is a convenient way to write numbers that are too big or too small for the standard numeric format. In Italy, the debt of the public sector is about 2,380,306 million euro, i.e., 2380306000000 (Wikipedia on 02-02-2020; the figure refers to 2018). First, it’s hard to grasp the meaning of such a number. You can help yourself by rewriting it as

2,380,306,000,000

but I’m not sure this is really successful. Second, it would be useful to be able to read the number, as wording can help understanding. This is where scientific notation comes in. A number can always be written as

a × 10^b,

where a is a real number called the mantissa and the exponent b is a (positive or negative) integer. The number a can be picked so that its absolute value is between 1 and 10, and I like to think that b tells how many times the decimal point of the mantissa should be shifted to the right (left) if b is positive (negative).

The Italian debt is 2.380306e12 (see http://www.nationaldebtclocks.org/debtclock/italy for the Italian figure), where I have shown the so-called e-notation. This means that once you take 2.380306 you should move the decimal point to the right 12 times to recover the debt in standard notation. Try it yourself. Scientific notation is helpful because it helps in reading the number. Recall that 1 billion is 1,000,000,000 = 1e9 (nine zeros after the 1) and 1 thousand


is 1,000 = 1e3. Hence, to reach the exponent 12, you multiply billions by thousands and realize that the Italian debt is about 2 thousand 3 hundred billions (precisely, 2.380306 thousand billions, or 2.380306 trillions).

Let’s forget about debt; the next example is less gloomy... I commute by train to reach unive and it turns out¹ that in the EU-27 the rail fatality rate is 7.8125e-11 deaths per passenger-km. This means that for each travelled km there are 7.8125e-11 deaths. Can you express such a number in words? It’s about 8 hundredths of a billionth (in Italian: 8 centesimi di miliardesimo). If you want to write the number in standard notation, you have

0.000000000078125,

obtained by shifting the decimal point to the left 11 times (as b is −11). Even if I travel roughly 8000 km a year, I can survive the risk, literally!

More details are at http://en.wikipedia.org/wiki/Scientific_notation (also http:

//en.wikipedia.org/wiki/Names_of_large_numbers and http://en.wikipedia.org/wiki/

Names_of_small_numbers may be useful).

The exam will always (take note!) have one question on scientific notation. Please observe that the e that you see in the scientific notation has nothing to do with the exponential function: again, it means “10 raised to the following number” and tells you to shift (move) the decimal point.
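These readings can be checked directly in R; a small sketch using the numbers above:

```r
# e-notation in R: the digits after "e" tell how far to move the decimal point
debt <- 2.380306e12                  # Italian public debt in euro
debt == 2380306000000                # TRUE: shift the point 12 places right
rate <- 7.8125e-11                   # rail fatality rate per passenger-km
format(rate, scientific = FALSE)     # standard notation: 11 shifts to the left
```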

1. Sections 1.7, 1.8, 1.9 and chapter 2 of R-intro.

Arithmetic operators follow the standard precedence rules in expressions. In general: powers are computed first, then multiplications and divisions, and finally additions and subtractions. Some operators, such as : to create sequences like in 1:4, have higher precedence than *, /, + and - and are executed before them. A couple of examples will help.

> 1*2^3-4/5

[1] 7.2

¹ http://pedestrianobservations.wordpress.com/2011/06/02/comparative-rail-safety/

The first operation executed is the power, giving 1*8; then the division 4/5 gives 0.8; we are left with 1*8-0.8, and multiplication is done before subtraction: 8-0.8. The final result is 7.2.

Guess what is computed by typing exp(-x^2/2) when x = 1. The correct answer is 0.6065307, as can easily be checked using R. However, here is an interesting experiment: open Excel and type in a cell the formula =exp(-1^2/2); press enter; you will get 1.6487213. This is unfortunate, to say the least. Excel is probably the most commonly used software on earth and yet it fails to “understand” standard mathematical precedence rules. Indeed, Excel computes

exp((−1)^2/2) = exp(1/2) = 1.6487213,

the problem being that -1 is squared first. Instead, you must square the one first and only then apply the minus. This experience is the main reason why I do not use Excel unless I’m brutally coerced.
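R, unlike Excel, follows the standard precedence rules; a quick sketch of the examples above:

```r
# power first, then multiplication/division, then addition/subtraction
1*2^3 - 4/5        # 7.2, as computed step by step in the text
# the square binds tighter than the unary minus, as in ordinary math
x <- 1
exp(-x^2/2)        # 0.6065307 (Excel's =exp(-1^2/2) wrongly gives 1.6487213)
```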

Variables, regular sequences (seq), vectors, matrices, extraction: see chapter 5 in R-intro and 3.4 in RfB.

There is a flipped lecture on the fundamental principles: component-wise operations, recycling, and the fact that R gives little or no feedback and error signalling. For user-defined functions, see also 10.1 to 10.5 in R-intro.

The following code shows how to define and plot a function and its derivatives (numerically approximated by difference quotients).

> f <- function(x) x**3-3*x   # for instance, the cubic used in later sections
> curve(f,-3,3)
> grid(col=1)
> df <- function(x,h=0.001) (f(x+h)-f(x-h))/2/h
> curve(df,add=T,lty=2)
> ddf <- function(x,h=0.001) (f(x+2*h)-2*f(x)+f(x-2*h))/4/h**2
> curve(ddf,add=T,lty=4)

[Figure: f(x) on [−3, 3] with its first (lty=2) and second (lty=4) numerical derivatives superimposed.]
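The accuracy of these difference quotients can be checked against a hand-computed derivative. A minimal sketch, assuming f(x) = x^3 − 3x (the cubic used in the following sections), whose exact derivative is 3x^2 − 3:

```r
f  <- function(x) x**3 - 3*x
df <- function(x, h=0.001) (f(x+h) - f(x-h))/2/h   # central difference quotient
df(2)               # very close to the exact value 3*2^2 - 3 = 9
abs(df(2) - 9)      # the error is of order h^2
```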

Example 1 Taken from http://arxiv.org/abs/1110.1319 on the number of Facebook users. The population is

P(t) = K P0 e^{rt} / (K + P0 (e^{rt} − 1)).

Estimates in the article can be used to get the following graph, with the point (2, 0.850) approximately showing the current time:

> logi <- function(t)            # K, P0, r set to the article's estimates
+ K*P0*exp(r*t)/(K+P0*(exp(r*t)-1))
> curve(logi(x),-5,10,xlab="Time",ylab="FB Users",ylim=c(0,1.5))
> grid(col=1)
> points(2,0.85)

[Figure: logistic curve of FB users vs. time, with the point (2, 0.85) marked.]

Consider the function of two variables

g(x, y) = x^2 + y^2 + xy + 3y − 1.

> g <- function(x,y) x**2+y**2+x*y+3*y-1
> x <- seq(-3,3,length=51)
> y <- seq(-3,3,length=51)
> z <- outer(x,y,g)

The seq commands define the grid on the x and y axes, respectively. outer computes a matrix z with the values of g at each pair (x, y), for x ∈ x and y ∈ y. Different graphs can be drawn:

> image(x,y,z)

[Figure: image plot of g on [−3, 3] × [−3, 3].]

> contour(x,y,z);grid(col=1)

[Figure: contour plot of g; the levels range from 0 to 30.]

> image(x,y,z);contour(x,y,z,add=T)

> grid(col=1)

[Figure: image plot of g with contour lines superimposed.]

> persp(x,y,z,theta=-30,ticktype="detailed")

[Figure: perspective (persp) plot of the surface z = g(x, y).]

2 II week

2.1 High vs low level graphs

There is a flipped lecture on high- and low-level graphical commands; see chapter 12 in R-intro, skipping sections 12.1.2, 12.1.3, 12.2.2, 12.3, and from 12.5.4 onwards. See in particular section 4.5 in RfB (do not cover the more advanced material of section 4.6).

2.2 Root-finding

It is important to compute solutions (aka roots) of equations. Formally, we want to solve

f(x) = 0,

where f is a univariate function and, as is clearly seen, the right-hand side of the equation is null. The following example will clarify issues and concepts. We wish to solve

e^{x/2} = x^2 − x.

> curve(x**2-x,-2,3)

> curve(exp(x/2),-2,3,add=T)

> grid(col=1)

[Figure: the parabola x^2 − x and the exponential e^{x/2} on [−2, 3]; the two curves cross twice.]

The R command uniroot can be used to find one root at a time, provided that the equation is of the form f(x) = 0 and an interval with a unique root is given as an argument. This interval is called a bracketing interval, because it brackets a unique root. As the previous equation does not have the correct form (there is no 0 on the right-hand side), it must be rewritten as

e^{x/2} − x^2 + x = 0.

The bracketing intervals can be seen in the graph: for instance, [-1,0] and [2,3]. Alternatively, a plot of f(x) = e^{x/2} − x^2 + x can be drawn to determine the intervals. In each of the two intervals there is a unique root and uniroot can be used.

> f <- function(x) exp(x/2)-x**2+x
> x1 <- uniroot(f,c(-1,0))
> x2 <- uniroot(f,c(2,3))
> x1$root
[1] -0.5119985
> x2$root
[1] 2.381285
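A good habit is to check that f is (numerically) zero at the roots returned by uniroot; a short sketch with the same f:

```r
f  <- function(x) exp(x/2) - x**2 + x   # left-hand side of f(x) = 0
x1 <- uniroot(f, c(-1,0))$root
x2 <- uniroot(f, c(2,3))$root
c(f(x1), f(x2))     # both values are negligibly small
```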

Exercise 1 Find the roots of f(x) = x^3 − 3x [plot a graph to bracket the roots, then use uniroot; otherwise use polyroot]. See the solutions in the last pages of this handout.

Once you know how to solve equations, it is possible to find extremal points by setting the derivative to zero, f′(x) = 0 (it’s an equation, baby!).

The first problem is to define a derivative, as R has no symbolic capabilities and cannot differentiate in ordinary terms. However, the derivative is a function that can be well approximated via difference quotients, see http://en.wikipedia.org/wiki/Derivative. Consider the function f(x) = x^3 − 3x.

> f <- function(x) x**3-3*x
> df <- function(x,h=0.001) (f(x+h)-f(x-h))/2/h

> curve(df(x),-3,3,main="Derivative of f(x)" )

> grid(col=1)

> e1 <- uniroot(df,c(-2,0))$root # first extremal point

> e1

[1] -0.9999996

> e2 <- uniroot(df,c(0,2))$root # second extremal point

> e2

[1] 0.9999996

[Figure: the derivative of f(x) on [−3, 3]; it crosses zero near x = ±1.]

At e1 and e2 the derivative vanishes. Check the second derivative to see whether they are maxima or minima.

> curve(ddf(x),-3,3,main="Second derivative of f(x)")

> grid(col=1)

> flex <- uniroot(ddf,c(-1,1))$root # change in convexity/concavity

> ddf(e1) # this is negative: e1 is a max

[1] -5.999997

> ddf(e2) # this is positive: e2 is a min

[1] 5.999997

[Figure: the second derivative of f(x) on [−3, 3]; it is negative for x < 0 and positive for x > 0.]

Exercise 2 Verify analytically that our results are correct, writing and using the hand-computed first and second derivatives of f(x) = x^3 − 3x.

Exercise 3 Use polyroot to find the roots of f(x) = x^3 − 3x. Can you use the same method for the function df?

3 III week

3.1 Maximization/minimization of functions

Generalities: objective function, decision variables, constraints. More often than not, poor decision making results from an unclear objective, partial knowledge of the variables (the objects that you have responsibility over) and a lack of understanding of the constraints (limitations, technical or conceptual) that one faces.

There is a fundamental difference between 1-dim and multidimensional optimization. The problem

max_{x ∈ R} f(x)

is solved with optimize. In contrast, the problem

max_{x ∈ R^n} f(x) = max_{x1,x2,...,xn} f(x1, x2, ..., xn)

is solved with optim.


Example 3 (1-dim.) Consider again f(x) = x^3 − 3x, to be minimized. optimize needs a function as its first argument and an interval as its second.

> f <- function(x) x**3-3*x
> curve(f(x),-3,3);grid(col=1)

> r <- optimize(f,c(0,3))

> r

$minimum

[1] 0.9999889

$objective

[1] -2

[Figure: f(x) on [−3, 3]; optimize finds the local minimum at x ≈ 1.]

There are two ways to handle a function of two variables. The first, used below, is the standard way to define a function g of two variables x and y, and it’s the preferred way to draw pictures (surfaces with persp, images or contours).

> g <- function(x,y) x**2+y**2+x*y+3*y-1

> x <- seq(-3,3,length=51)

> y <- seq(-3,3,length=51)

> zg <- outer(x,y,g)

> contour(x,y,zg)

> grid(col=1)

[Figure: contour plot of g on [−3, 3] × [−3, 3].]

The second way is to define a function of one thing alone, a vector x, whose two components are used in the computation. This second way must be used in multi-dim optimization.

> gb <- function(x) x[1]**2+x[2]**2+x[1]*x[2]+3*x[2]-1
> res <- optim(c(1,1),gb)

> res

$par

[1] 1.000032 -1.999912

$value

[1] -4


$counts

function gradient

91 NA

$convergence

[1] 0

$message

NULL

Observe that gb is defined, as stated above, in terms of a unique vector x ∈ R^2. Also notice that optim takes an initial guess as its first argument and a (vectorial) function as its second.

> gb <- function(x) g(x[1],x[2])

> res <- optim(c(1,1),gb)

> res

$par

[1] 1.000032 -1.999912

$value

[1] -4

$counts

function gradient

91 NA

$convergence

[1] 0

$message

NULL

This alternative definition of gb exploits the fact that often we already have defined g and

we could avoid a good deal of typing.

Please observe that our candidate minimizer c(1,1) in optim is a point in R^2; it is not an interval, as many risk-loving students keep repeating...

You may wish to take a more challenging path to find extrema of multidimensional functions, solving for the critical points that zero the partial derivatives:

gx′(x, y) = 0
gy′(x, y) = 0


Solving equations in several variables is harder (to be formally correct, “solving nonlinear

systems of equations in several variables”) and you must use some ingenuity.

> gx <- function(x,y,h=0.001) (g(x+h,y)-g(x-h,y))/(2*h)

> gy <- function(x,y,h=0.001) (g(x,y+h)-g(x,y-h))/(2*h)

> zgx <- outer(x,y,gx)

> zgy <- outer(x,y,gy)

> contour(x,y,zgx,level=0)

> contour(x,y,zgy,level=0,lty=4,add=T)

> grid(col=1)

[Figure: the 0-level curves of the two partial derivatives; they intersect near (1, −2).]

The graph shows the intersections of the 0-level curves of the x- and y-partial derivatives. Along each curve one partial derivative vanishes, and at their intersection both are null. The point is approximately (1, −2).
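The reading of the graph can be confirmed numerically: at (1, −2) both approximated partial derivatives vanish. A minimal check, with g and the difference quotients defined as above:

```r
g  <- function(x,y) x**2 + y**2 + x*y + 3*y - 1
gx <- function(x,y,h=0.001) (g(x+h,y) - g(x-h,y))/(2*h)
gy <- function(x,y,h=0.001) (g(x,y+h) - g(x,y-h))/(2*h)
c(gx(1,-2), gy(1,-2))   # both partials are (numerically) zero at (1, -2)
```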

We can solve the same system also by morphing the problem into an optimization. Consider the minimization

min_{x,y} gx′(x, y)^2 + gy′(x, y)^2.

The objective cannot be smaller than 0 and, when both partial derivatives vanish, the sum of squared zeros is zero and therefore the minimum is attained. Cheap advice: this is really a kind of magic! Solving systems is difficult and it turns out that it’s much easier to solve this minimization problem. Sometimes in life you have a difficult problem. . . try to reframe it in totally different terms. You may find a solution more easily.


> gbx <- function(x,...) gx(x[1],x[2])

> gby <- function(x,...) gy(x[1],x[2])

> sumsqu <- function(x) gbx(x)**2+gby(x)**2

> res <- optim(c(1,1),sumsqu)

> res

$par

[1] 1.000233 -2.000196

$value

[1] 9.782539e-08

$counts

function gradient

89 NA

$convergence

[1] 0

$message

NULL

You should be cautious when using such a trick. Indeed, you can be confident in a solution of the system only if the objective sumsqu at the solution is null (as in the previous case).

Exercise 4 (easy) Can you analytically prove that (1, −2) is a minimizer? Solve, by hand and using R, the resulting linear system.

Exercise 5 (funny) Find local extrema for the cubic function g(x, y) = x^3 + y^2 + xy + 3y − 1. Does the cubic have extrema? Why does the sumsqu problem provide a solution? Does the function have a global max or min?

Hint: you can solve, graphically or manually, the first-order conditions, zeroing the derivatives.

3.3 Integrals

Use integrate.
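integrate takes a function and the two endpoints, and returns a list whose $value component holds the numerical result. A minimal sketch (the integrand is my choice, not from the text): the standard normal density integrates to 1 over the whole real line.

```r
r <- integrate(dnorm, -Inf, Inf)   # numerical integration; endpoints may be infinite
r$value                            # very close to 1
```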


Exercise 7 Define and sketch the graph of

f(x) = (x^2 − 1)/(x^2 + 1).

Then solve the equation in x

∫_1^x f(t) dt = 2.

Hint: define an auxiliary function with integrate and $value, then use uniroot.

4 IV week

4.1 Constrained optimization

There are two approaches: penalty methods and constrOptim.

Assume you want to maximize a function over some set D, as in

max_{x ∈ D} g(x).

As you do not want to obtain solutions outside D, you penalize points that do not belong to D and solve the free problem

max_x g(x) − λ·(“penalty if x ∉ D”).

The idea is that you get some positive penalty and, hence, worse values for unfeasibility. Implicitly, we assume there is zero penalty inside the domain D. This is very convenient, as it allows us to turn a constrained problem, possibly with a complex D, into an unconstrained (free) optimization problem that can be tackled by optim.

More formally, define D in terms of q constraints by a function h : R^n → R^q:

D = {x : h(x) ≥ 0}

and let

I(x) = 0 if h(x) ≥ 0, and I(x) = 1 otherwise.

In other words, I(x) is an indicator function that takes the value 0 if x ∈ D but flags with a 1 the points x ∉ D (which will be penalized). The problem we solve is finally

max_x g(x) − λ I(x),

where λ ≫ 0 is a parameter that suitably magnifies the penalty (signalled by I(x) = 1). We now solve the problem

max_{x,y} x^2 + y^2 + xy + 3y − 1   subject to x ≥ 0, y ≥ 0, x + y ≤ 4

(the three constraints are encoded below in the indicator Ib):


> g

function(x,y) x**2+y**2+x*y+3*y-1

<bytecode: 0x1034e0c00>

> gb

function(x) g(x[1],x[2])

<bytecode: 0x112220d78>

> Ib <- function(x) x[1]<0 | x[2]<0 | (x[1]+x[2]>4)

> gbpen <- function(x,lambda=100) gb(x)-lambda*Ib(x)

> res <- optim(c(1,1),gbpen,control=list(fnscale=-1))

> res

$par

[1] 3.54465e-09 4.00000e+00

$value

[1] 27

$counts

function gradient

233 NA

$convergence

[1] 0

$message

NULL

The solution is the border point (0, 4). Remember to start the optimization from a feasible point. It is useful to visualize how penalties work and, in order to do so, we build a graph in the standard way, defining x and y.

> I <- function(x,y) (x<0) | (y<0) | (x+y>4)
> gpen <- function(x,y,lambda=1000) g(x,y)-lambda*I(x,y)

> x <- seq(-5,5,len=51)

> y <- seq(-5,5,len=51)

> zgpen <- outer(x,y,gpen,lambda=100)

> persp(x,y,zgpen,theta=-30,ticktype="detailed")

[Figure: perspective plot of the penalized objective; the surface drops by λ outside the feasible region.]

> image(x,y,zgpen)

> contour(x,y,zgpen,add=T)

> grid(col=1)

21

−60 −2

0

−4 −3

0 0

−5

4

0

20

−3

0

−1

0 −8

0

2

10

−2

0

−4

0

−5

0

0

0

y

−2

−4

−7

0

−5

−6

0

−4 −2 0 2 4

Another method, which applies when the constraints are linear, uses constrOptim. The constraints are written as a x ≥ b, here encoding x ≥ 0, y ≥ 0 and −x − y ≥ −4:

> a <- rbind(c(1,0),c(0,1),c(-1,-1))
> b <- c(0,0,-4)
> res <- constrOptim(c(1,1),gb,NULL,a,b,control=list(fnscale=-1))

> res

$par

[1] 3.213127e-08 4.000000e+00

$value

[1] 27

$counts

function gradient

294 NA

$convergence

[1] 0

$message

NULL


$outer.iterations

[1] 3

$barrier.value

[1] 0.0005545176

Exercise 8 The previous computations solve the same problem as before. Read help(constrOptim) and explain in detail the meaning of a, b and all the other parameters used.

Consider again the problem

max_{x,y} x^2 + y^2 + xy + 3y − 1.

Draw the level curves of the objective function and the domain D, i.e., three lines. Then argue, looking at the graph, that the level curve with the highest level touches the set D. . .

Is the maximum higher than 10? [It can be done in many ways: directly, with a penalty approach; graphically, even though it may be difficult to say whether it’s larger or smaller than 10; switching to polar coordinates so that the domain is rectangular and using L-BFGS-B, which allows bounds; studying the restriction on the circle as a function of the angle θ.]

A consumer derives the utility

u(x, y) = 2x(y − 2)

from the consumption of x units of beef and y units of bread. Assume that one unit of beef and one unit of bread both cost p = 1 and that the consumer’s income is 4.

Exercise 13 On the contour plot, draw the domain of the utility maximization problem under the budget constraint.

Exercise 14 Solve the problem. This can be done graphically and by optimizing with penalties. Can the problem be solved with constrOptim?

Exercise 15 Argue that utility is increasing in any north-east direction. Hence, the maximum will be on the x + y = 4 constraint. Define a function of one variable alone, embedding the constraint into the function (this is also called a restriction). Plot this function of one variable on the right domain. Solve the 1-dim optimization problem. Does this procedure give you the same results as the previous ones?


4.2 State preference model

What you know about systems of linear equations can be used to understand a beautiful

model of a financial market, in which assets are traded in two periods of time. The model,

known as“State preference model”or“Asset pricing model”, is a powerful tool to understand

important ideas related to portfolios, replication of assets, risk-hedging and arbitrage.

If you are interested in the topic you should read http://economia.unipr.it/DOCENTI/

DEDONNO/docs/files/CastBrevissima.pdf. The text is sparkling, insightful and written

in Italian (a good reason to learn it!)

The model is as follows:

1. There are two periods, today and tomorrow, corresponding to t = 0 and t = 1. You

can buy or (short-)sell assets in t = 0 and will get the payoff in t = 1.

2. There is uncertainty over the future and m possible states can materialize tomorrow.

We often think that nature will pick the future state, as opposed to agents that do

not have the knowledge to predict what lies ahead. Hence, depending on the state of

nature at t = 1, the assets will produce different payoffs.

3. There are n available assets. Each of them is a vector of future payoffs y (at t = 1), which can be purchased/sold at t = 0 for a price π. As there are n assets, it is convenient to describe this market using a payoff matrix collecting all the column vectors y1, y2, ..., yn:

Y = [ y1 y2 ... yn ],

with the agreement that π1 is the cost of y1, π2 is the cost of y2 and so on.

The description of the market is over and an example will help. Assume that there are m = 3 possible states of nature tomorrow and n = 3 assets:

              Bank account   Stock   Risky stock
      Good        104         108        112
Y  =  Fair        104         102        106
      Gloomy      104         100         93

Assume that all assets cost 100, i.e., π1 = π2 = π3 = 100. Rows correspond to future states: you can think, say, that the first row collects the payoffs that you will get if tomorrow the economy is good. In this case, the payoffs are high and you will cash 104 if you are the holder of the first asset, 108 if you are the holder of the second asset and so on. Clearly, if the gloomy state is the prevailing one tomorrow, the owner of the third asset will get only 93: taking into account that it was paid 100, it’s a bad loss!

Columns should be thought of as assets. The first column y1 always pays 104 and, hence, it’s like a (safe) bank account that pays the same amount regardless of the state. The other two columns have fluctuating payoffs and behave like stocks that depend on the future state of the economy. The third column is dubbed “risky stock” because its payoffs are much more variable than those of the second column.

Portfolio: a vector x = (x1, x2, ..., xn)′ ∈ R^n of the quantities held of each asset. In the financial jargon, we sometimes refer to x as “the weights of the portfolio”. The payoff of a portfolio x is Y x, as this product is exactly a weighted sum of the columns of Y, i.e., a weighted sum of the assets. The cost of a portfolio is π1 x1 + π2 x2 + ... + πn xn, which can also be computed as the row-times-column product π′x.

Replication: we say that a vector b can be replicated if there is a portfolio that has the same payoffs. In other words, b can be replicated if there is x such that Y x = b.

Arbitrage: this is a situation in which there is a portfolio x with null payoff and non-null price.²

An arbitrage is a free lunch: you sell for a positive price something that always pays zero and, hence, should cost zero. It’s a no-loss gamble that can be sold for a profit but will produce no payoff for the buyer.

Consider the financial market with Y and π described above: the portfolio x = (1/3, 1/3, 1/3)′ has future payoff

        [ 104 108 112 ] [ 1/3 ]   [ 108 ]
Y x  =  [ 104 102 106 ] [ 1/3 ] = [ 104 ]
        [ 104 100  93 ] [ 1/3 ]   [  99 ]

and cost

π1 x1 + π2 x2 + π3 x3 = 100 · 1/3 + 100 · 1/3 + 100 · 1/3 = 100.

So, if you pay 100 for the portfolio that mixes the original three assets in equal proportions, you get (108, 104, 99)′. Let’s notice in passing the power of diversification: mixing the assets has reduced the maximum losses (and the gains) in every state of the world.
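The same payoff and cost can be reproduced in R, with Y and the prices taken from the table above:

```r
Y   <- matrix(c(104,104,104, 108,102,100, 112,106,93), 3, 3)  # columns are the assets
pai <- c(100, 100, 100)            # prices
x   <- rep(1/3, 3)                 # equal-weight portfolio
Y %*% x                            # payoff: (108, 104, 99)
sum(pai * x)                       # cost: 100
```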

Can the asset b = (108, 104, 99)′ be replicated? Yes, we have just seen that there is a portfolio, namely x = (1/3, 1/3, 1/3)′, such that Y x = b, and the cost of the replicating portfolio is 100.

Can the asset b = (112, 106, 104)′ be replicated? We look for the solution x of the system Y x = b:

² Arbitrage is a serious topic and, here, I’m just scratching the surface. We only cover one special type of arbitrage and much more can (and should) be said on this matter; see http://economia.unipr.it/DOCENTI/FAVERO/docs/files/CastBreve.pdf.


> Y <- matrix(c(104,104,104,108,102,100,112,106,93),3,3)

> b <- c(112,106,104)

> x <- solve(Y,b)

> x

[1] 0.03846154 1.00000000 0.00000000

The answer is positive: b can be replicated with x1 ≈ 0.038 of the first asset (putting a

small amount in the bank account) and x2 = 1 of the second asset. The cost of replication

(i.e., the cost of the replicating portfolio) is

> sum(c(100,100,100)*x)

[1] 103.8462
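As always with linear systems, it is worth verifying the result: the payoff of the replicating portfolio must reproduce b. A quick check with the same numbers:

```r
Y <- matrix(c(104,104,104, 108,102,100, 112,106,93), 3, 3)
b <- c(112, 106, 104)
x <- solve(Y, b)        # the replicating portfolio
Y %*% x                 # equals b: the replication is exact
```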

This is an interesting case, as b keeps the better payoffs coming from the risky stock but avoids the heavy losses that would be incurred if the risky stock were bought. Don’t you think this is nice? Mmhh... where is the trick? Well, on the one hand there is no trick: this can be done just by solving linear systems with R. On the other hand, however, there are two other equally interesting ways to interpret what happened here.

1. I pay 103.8462 to get b instead of 100 to buy the risky stock. Therefore, the additional amount 3.8462 is the cost of insuring my position in the bad state of the world: I have paid 3.8462 to reduce my risk in one specific future occurrence.

2. The portfolio x blends a stock with a bond. Investors often reduce the variability of their cashflows by using the (certain) proceeds to compensate for the poor results of other investments in some states. Here, if the future state is “gloomy” then the stock makes a null profit (it was paid 100, it gives 100), but the 0.03846154 units in the bank account still pay 4%, so that 0.03846154 · 104 = 4 is gained and “compensates” for the outcome of the stock.

Food for thought: can you replicate any b? Hint: can you solve the system Y x = b

for any b, given this specific Y ? Can you imagine why such a market is called complete?

Next, we’ll discuss an arbitrage arising in a slightly modified market. Let Y be defined as

    [ 104 108 112 ]
Y = [ 104 102 106 ] ,
    [ 104 100 104 ]

and let the prices of the assets be π = (100, 100, 104). Observe that the third column of

Y is the asset b = (112, 106, 104)′ that we discussed a few lines ago; recall that b can be replicated using the first and second columns (assets) at a cost of about 103.85. Now, something interesting goes on: the third asset can be sold on the market at a price of 104, but the same asset can be replicated (mixing the first two columns) at the cost 103.85. Hence, a clever trader may start a money pump, creating replicates of b for 103.85 each and selling them immediately for 104. For each round, he will pocket the difference 104 − 103.85 = 0.15, regardless of the future state.

We now check the definition stated before: there should be a portfolio x with null payoff vector and strictly positive price. Consider x = (−0.03846154, −1, 1)′: selling this portfolio means taking the opposite position, i.e., buying 0.038 units of the first asset and 1 unit of the second while selling 1 unit of the third (b, really). The payoff Y x and the cost of portfolio x are given by:

> Y <- matrix(c(104,104,104,108,102,100,112,106,104),3,3)

> x <- c(-0.03846154,-1,1)

> pai <- c(100,100,104)

> Y %*% x

[,1]

[1,] -1.6e-07

[2,] -1.6e-07

[3,] -1.6e-07

> sum(pai*x)

[1] 0.153846

The trader cashes 0.15 euro by replicating and selling the asset, and his profits can be inflated by doing the same operation many times (i.e., trading large volumes). It is likely that once an arbitrage is spotted, traders will soon wipe it out, as large volumes affect the prices and, in particular, the cost of the third asset will tend to go back to 103.846, closing the opportunity for gains at no risk.

This final example illustrates how to deal with singular systems, which is “problematic” in R... and in life! Let

    [ 105 120 110 ]
Y = [ 105 110 102 ] ,
    [ 105  90  98 ]
    [ 105  80  90 ]

and assume all the prices are 100, π = (100, 100, 100)′. Observe that we have m = 4 states

of the world or rows and n = 3 assets or columns. Can the vector b = (119, 119, 95, 95)0

be replicated? Well, we just have to check whether there is a vector x such that Y x = b.

This is what you get trying to solve the problem with solve(Y,c(119,119,95,95)):

Error in solve.default(Y, c(119, 119, 95, 95)):
singular matrix a in solve

The problem lies in the fact that solve can be used to solve square systems with an

invertible matrix of coefficients. Here, we have a 4 × 3 matrix, which is singular as finding

an inverse is impossible.

However, there is a way out. First, you may preliminary try to figure out if the system

has a solution using the Rouché-Capelli theorem and checking the ranks of Y and Y|b:

27

> Y <- matrix(c(105, 120, 110,

+ 105, 110, 102,

+ 105, 90, 98,

+ 105, 80, 90),4,3,byrow=T)

> b <- c(119,119,95,95)

> qr(Y)$rank

[1] 3

> qr(cbind(Y,b))$rank

[1] 3

In this case, the ranks of the complete and incomplete matrices are equal and hence

the system has ∞n−r = ∞3−3 = 1 solution. This solution can be worked out using the

generalized inverse of Y , computed using the function ginv of the package MASS.3

> require(MASS)

> giY <- ginv(Y) # compute the generalized inverse

> sol <- giY %*% b # multiply the gen inverse with b

> sol

[,1]

[1,] 1.4

[2,] 1.6

[3,] -2.0

We are ready to answer: yes, indeed (119, 119, 95, 95)0 can be replicated by the portfolio

x = (1.4, 1.6, −2.0)0 : buying 1.4 units of the first asset, 1.6 units of the second asset and

(short-)selling 2 units of the third asset.

So far, so good... but be aware! You can always compute the generalized inverse and

you can always multiply it with the b, getting a vector. This result, however, is a solution

of the original system, as in the previous case, only if the system has solutions; otherwise

the result is not a solution.

So, for safety, it is absolutely necessary to check whether what you have found is indeed

a solution:

[,1]

[1,] 119

[2,] 119

[3,] 95

[4,] 95

3

When you solve a system, say Ax = b and A is invertible, the solution can be obtained by x = A−1 b.

When the inverse A−1 does not exist, as in this case, you can instead use the generalized inverse. In a way,

if you know that there are solutions, you can always find one multiplying A+ b, where A+ is the inverse (if

it exists) or the generalized inverse (if the inverse doe not exist).

28

> b # the two are equal: sol is indeed a solution

[1] 119 119 95 95

To show that things can be different, consider the vector b = (119, 118, 117, 115)0 : can

it be replicated?

> b <- c(119,118,117,115)

> sol <- giY %*% b

> sol

[,1]

[1,] 0.94206349

[2,] 0.01666667

[3,] 0.16666667

> Y %*% sol

[,1]

[1,] 119.25

[2,] 117.75

[3,] 116.75

[4,] 115.25

We have used the giY matrix, which was computed before, and easily obtained sol.

You may think that sol replicates b but it a look at our check shows that this is false,

false, false, false. . .

To sum up:

1. take extra care with singular and rectangular systems.

Exercise 16 With the previous 4×3 matrix Y and prices, can the vector b = (119, 118, 117, 116)0

be replicated? If so, how much does it cost if no arbitrage is assumed?

I’m indebted to Elena “Cooper Supertramp” who first pointed out this issue.

Consider the following two assets (104, 104, 104)0 and (101, 116, 103)0 whose prices are

93.60 and 95.85, respectively. In such a market, how much would cost an asset whose

payoff is (102.71, 109.14, 103.57)0 ?

To see whether the third asset (column) is replicable, we check the ranks:

29

> Y <- matrix(c(104,104,104,101,116,103),3,2)

> Yb <- cbind(Y,c(102.71,109.14,103.57))

> qr(Y)$rank

[1] 2

> qr(Yb)$rank

[1] 3

Hey, wait a minute! I (paolop) know how b was built because I wrote the program

which created the exercise. The vector b was 4/7 times the first asset + 3/7 times the

second. In other words, this is b:

> 4/7*Y[,1]+3/7*Y[,2]

[1] 102.7143 109.1429 103.5714

As you see, the only difference between this vector and the b used before is that we rounded

to the second decimal digit. So far, so good. But let me know call bright the right b and

show that bright is indeed replicable:

> bright <- 4/7*Y[,1]+3/7*Y[,2]

> Ybright <- cbind(Y,bright)

> qr(Ybright)$rank # this is 2, as r(Y)

[1] 2

Hence, bright can be replicated but b cannot. And the reason is only due to rounding!

But there is something more going on. You see, you do not want rounding to stress your

life: rounding should be a good thing, allowing to forget lots of irrelevant figures (cifre)

and you do not want it to interfere with replication and arbitrage, which are very relevant

practical issues. There are two ways to understand this.

1. Ranks are delicate objects and are often computed looking at determinants. If the

rank of Y is 2, then there is a 2 by 2 matrix extracted from Y whose determinat is

non-zero and it is not possible to find a 3 by 3 matrix with non-null determinant; if

the rank of Yb is 3, then there is a 3 by 3 matrix extracted from Yb whose determinat

is non-zero. However, the determinant of Yb (the rounded version) is

> det(Yb)

[1] 4.16

which is relatively close to 0, if you compare 4.16 to the sizes of the elements of the

matrix that exceed 100. R decided that this was not zero (and, consequently, rank

is 3) but I showed you that this is only due to the effects of rounding to the second

decimal digit.

30

2. The second explanation looks more carefully at the way the rank is computed:

> qr(Yb)

$qr

[,1] [,2] [,3]

[1,] -180.1332840 -184.7520861 -1.821078e+02

[2,] 0.5773503 -11.5181017 -4.936577e+00

[3,] 0.5773503 -0.1382626 2.005019e-03

$rank

[1] 3

$qraux

[1] 1.577350269 1.990395604 0.002005019

$pivot

[1] 1 2 3

attr(,"class")

[1] "qr"

Notice that $qraux shows 3 numbers, two of which are clearly different from zero.

The third, well, is different from zero but it is small. The rank is exactly counting

how many of the elements of $qraux are non-zero. If you see 3 non-zero numbers,

that’s ok; but you should be aware that the 0.002005019 is telling you that the matrix

Yb is not “far” from being of rank 2.

Therefore, b is almost replicable, as can be seen using the generalized inverse:

> require(MASS)

> sol <- ginv(Y) %*% c(102.71,109.14,103.57)

> sol

[,1]

[1,] 0.571379

[2,] 0.428593

> Y %*% sol

[,1]

[1,] 102.7113

[2,] 109.1402

[3,] 103.5685

31

Take home message: arbitrages are ways to make money in which you aggressively

exploit a mispricing; if the mispricing is 0, then no arbitrage; but if you see a mispricing

that is close to zero, is that a true mispricing or just due to rounding? Well, you have to

check things very carefully as computers have limited accuracy (all computers, not only

mine or yours) and exploiting a tiny mispricing requires to trade huge quantities as you

cash a tiny amount for each round.

Whether or not you want to engage in massive trading to get 1/1000 of euro per unit is

also related to practical considerations (transaction costs and taxes, say, could wipe your

profits), However, ranks (and $qraux!) are here to tell you that linear algebra is important,

rounding exists and extra care and wisdom is needed when checking if something is zero

on a computer.

5 V week

Broadly speaking, this Section deals with simulation, a useful tool to solve or shed light

on complex problems. Simulation is based on the idea that you can run experiments to

gain insight and understand, to some extent, what’s going on. This may be the first step

to build a model or the only way to approximately solve a problem. In a way, simulation

may be thought as an example of the scientific method in action: given a “problem”, many

experiments may be run to figure out what a solution may look like. Galileo did that using

a ball and a plane, we can do the same performing experiments with the PC and R.

Experiments produce results that may be diverse and are affected by noise. This ran-

domness must be generated on a computer drawing what we call random numbers. Typi-

cally, simulation aims at answering two kind of questions:

question is tackled running many experiments and recording whether the experiment

“is successful”. Hence, after N experiments are run, you’ll have a sequence of N

1/0, standing for acceptable or unacceptable. Often you store the whole history of

experiments in a vector, say res for results, filled with 1 and 0. The reply to the

question, i.e., the probability we look for, is then just the number of 1’s in the vector

over the number of experiments N , or sum(res)/N.

• What’s the average outcome of my experiment? Here, we do not ask about the

(rate of) success of an experiment but require a numeric summary4 of the range of

possible outcomes of the experiments. Again, experiments are run and, this time,

the outcomes are recorded (as opposed to 1 for “success” and 0 for “failure”). Ul-

timately, you’ll have a vector res whose components record the numeric outcomes.

The aggregate measure we look for is now mean(res)

4

Such a summary of many results into one or two numerals is an art as well as a science. Here we have

decided that the summary we are looking for is the average, barely scratching the surface of the fascinating

methods and tools that can be used to analyze randomness, risk, uncertainty, dispersion, cost of making

the wrong decisions and so forth.

32

The following sections briefly describe the notions that are needed to answer to one of

the two kinds of questions we have just mentioned.

R has a vast number of functions that can be used to draw different random numbers. A

working definition of random number for this course is

ideal dice roll.

What I like of the previous “definition” is that it doesn’t really say much and indeed it

fosters further thinking: even you accept the idea of a numerical results, everything depends

on the specific experiment or experiments that are run. If you flip a coin you can have head

or tail (0 or 1) with 50-50% probability; if you roll two dice you can get 2, 3, ..., 12 with

different frequencies; if you count the fraction of students that correctly solved exercise 1

in an exam, you get numbers in the interval [0, 1], like 0.15 or 0.87 (15 or 87% of correct

answers to the first exercise, depending on the preparation of the class).

Hence, it should be clear that a random number, coming from whatever experiment,

needs to be described in terms of what can be observed and how likely is every possible

outcome. Without being fussy, I call distribution such a description.5

As mentioned before, R offers many way to draw random numbers from several distri-

butions (i.e., you can run several typical and often useful experiments and R tells you the

outcome). The functions have a common structure rname(n,parameters), where name

recall the distribution you are drawing from, n is the number of experiments that are run

and parameters provide additional details (on the experiment, if you wish).

means that any number in the interval is as likely as any other (uniform) to be

drawn

[1] 0.3279207 0.9545036 0.8895393

> runif(2,min=-3,max=3) # 2 random numbers in [-3,3]

[1] 1.1568204 0.8430409

> curve(dunif(x,-3,3),-4,4)

5

You’ll attend whole courses devoted to statistics and probability. On an entirely different note, there

is much to say about the possibility to create pseudo-random numbers using a computer, which is a pro-

grammable machine where nothing is left to chance, see http://plato.stanford.edu/entries/chance-

randomness/ or http://www.bbc.co.uk/programmes/b00x9xjb

33

0.15

0.10

dunif(x, −3, 3)

0.05

0.00

−4 −2 0 2 4

The picture shows the density, loosely speaking how likely it is for a number to

be drawn. The graph shows that all numbers in [−3, 3] have the same (constant)

likelihood and, hence, are treated equally in the sampling process. At the same time,

observe that there is no chance to obtain a number outside [-3,3], being null the

density.

tral” outcomes.

[1] 2.5283366 0.5490967 0.2382129 -1.0488931 1.2947633

> rnorm(5,mean=3) # 5 random numbers, more likely to be around 3

[1] 3.825540 2.944314 2.215618 2.266497 2.784135

> curve(dnorm(x,3),0,6)

34

0.4

0.3

dnorm(x, 3)

0.2

0.1

0.0

0 1 2 3 4 5 6

The picture of the normal density shows that the likelihood of sampling around 3

(the mean) is higher than elsewhere.

The normal density is possibly the most important distribution and there are funda-

mental reasons to use it in many models of random phenomena (indeed, many kinds

of experiments produce results where outcomes are concentrated around some mean,

with a likelihood that is nicely spread in a bell-shaped fashion. Students be aware,

there is a universe behind my words... but I’m intrepid enough to attempt to give

you a simple description!)

• There are plenty of other distributions: rbeta, rlnorm, rt, rlogi... draw from a

Beta, lognormal, Student’s t and logistic distributions, respectively.

The uniform distribution and the runif function can be used to simulate an event with a

given probability. Reflect on the sentence: “The event A has 40% probability to happen”.

You may agree that this means that attempting to generate A many times will be successful

40% of times, for a large number of attempts.6

6

If A is “Paolo prepares a good apple pie”, that’s exactly the situation: I can expect 4 out of 10 pies to

be perfect (or 40 out of 100...); the other pies have all some defect or another. Don’t worry too much: my

probability to prepare stunning tomato-and-basil spaghetti is close to 100%!

35

The following R lines produces 100 random uniform numbers in [0, 1] (that’s the default

for runif). How many of them do you expect to be smaller than 0.4?

> rn <- runif(100)

> rn[1:5] #show the first 5 out of 100

[1] 0.3688455 0.1524447 0.1388061 0.2330341 0.4659625

> (rn<0.4)[1:5] #show the first five

[1] TRUE TRUE TRUE TRUE FALSE

> sum(rn<0.4)

[1] 40

Well, runif uniformly samples numbers in [0, 1], without any bias in favor of any

1

specific number. Hence, falling in every one tenth of the unit interval has probability 10

and falling in [0, 0.4] is equivalent to fall in any one of the 4 parts [0, 0.1], [0.1, 0.2], [0.2, 0.3]

4

and [0.3, 0.4]. We conclude that the probability to fall in [0, 0.4] is 10 = 40% = 0.4.

This is not a coincidence and the argument can be easily adjusted to prove that

A uniform number in [0, 1] being smaller than p is an event occurring with

probability p.

In other words, if you want to generate an event whose probability is 0.65, type:

> runif(1)<0.65

[1] TRUE

You’ll get a true with probability 65%=0.65. For a confirmation, try the experiment,

say, 10000 times.

> experiments <- runif(10000)<0.65

> sum(experiments)

[1] 6548

We should get 0.65, shouldn’t we? And indeed 0.6548 is obtained, a result that is quite

close to the target 0.65.

5.3 sample

An insanely powerful R function is named sample. The functions does what it promises, it

samples from a set of values, with given probabilities, with or without replacement. The

syntax is

sample(x,size,replace=F,prob=NULL),

meaning that size elements are drawn from the set x. If the default options are left

unchanged, sampling is done without replacement and probabilities are uniform, i.e., all

elements of x have the same chances to be selected. A few examples will clarify the idea:

36

• sample(c(0,1),10) will give an error as it’s impossible to sample without replace-

ment 10 items from the set x containing only 2 elements.

> sample(c(0,1,5),10,replace=T)

[1] 1 1 5 1 0 0 1 0 5 5

The elements of x are drawn with the same probability, as prob is not specified and

this is the default behavior.

> sample(8)

[1] 7 3 4 1 6 8 2 5

Here (by default, it’s a convention) you sample 8 elements from the sequence of 8

numbers 1:8, with no replacement and uniform probability. It’s the R way to give a

random permutation of the first eight positive integers. In other and simpler words,

this produces a random shuffling of the eight numbers.

• You can also use sample to generate an event with probability p, see the previous

subsection.

> sample(c(1,0),1,prob=c(0.4,0.6))

[1] 1

> # 10 experiments, success prob is 0.4 (40%)

> sample(c(0,1),10,T,prob=c(0.4,0.6))

[1] 0 1 1 1 1 0 0 1 1 1

We can repeat the 10000 experiments run in the previous subsection using sample:

> sum(experiments)

[1] 6562

• sample lets you specify the experiment in that both the outcomes and the probability

can be provided:

37

> sample(c(-0.5,0,0.1,0.2,0.3),10,rep=T,prob=c(1,2,3,4,5))

[1] 0.1 0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.2 0.2

You can think to x as the set of possible yearly returns from an investment: you

can loose 50%, make 0 return or get 10, 20 or 30%, resembling a situation in which

typically some money can be made in most cases but there is the chance to get a

large loss.

You may notice that the vector c(1,2,3,4,5), used to define the probabilities is

strange as these should not exceed 1 (as well as being nonnegative). In this case,

R by default normalizes the vector assuming that this means that if -0.5 has some

probability to be drawn, then 0 has twice as much, 0.1 three times as much and so

on. Hence, the true probabilities used in this case are

> 1:5/sum(1:5)

[1] 0.06666667 0.13333333 0.20000000 0.26666667 0.33333333

1

from which you see, for instance, that a return of 30% will be obtained 3

of the times

and a large loss is experienced with probability 6.7%.

How much is the average return of the investment? There is a simple analytical way

to compute this figure but, in this course, we will approximate the answer using the

power available in R: we generate many random experiments and evaluate the average

outcome as follows

+ prob=c(1,2,3,4,5))

> mean(returns)

[1] 0.13973

Despite the chance of a sizeable loss, the average return is still a healthy 14% per

year.

1. The yearly payoff of a project whose cost is 100 is 1000, 100, 1 with probabilities

1 2 3

, , , respectively. What is the expected net gain?

6 6 6

Observe that this example is quite similar to the previous one about returns. Now

raw payoffs net of costs are given and some additional care must be used. We simulate

(many) payoffs as before and then subtract the (fixed) cost to get an answer.

38

> N <- 1000 # N experiments

> payoffs <- sample(c(1000,100,1),N, rep=T,

+ prob=c(1/6,2/6,3/6))

> mean(payoffs-100)

[1] 109.494

The average gain (i.e., average payoff less the fixed cost) is about 100.

2. You can buy a lottery in which with 50-50 probability you cash either a uniform

random number or a normal random number.

(b) What’s the average revenue?

(c) Which is the probability to get more than 0.5?

(d) Which is the probability to get less than -1?

The fist question may be answered at once: yes, a loss can be experienced when the

normal is selected (by chance) and a negative outcome is following (this is possible

with the normal). To answer the other questions we need to simulate the lottery

many times: begin drawing “uniform” or “normal” with the same probability; then

accordingly record the outcome, repeating the process N times.

> res <- rep(0,N) # vector to store results

> for(i in 1:N)

+ {

+ uorn <- sample(c("uniform","normal"),1) # uniform or normal

+ if( uorn=="uniform")

+ res[i] <- runif(1) else res[i] <- rnorm(1)

+ }

The vector res now contains 1000 simulation of the lottery. The average revenue is

mean(res)=0.25.

To estimate the probabilities to get more than 0.5 or less than 1, use the code:

[1] 0.399

> sum(res< -1)/N # about 8%

[1] 0.086

39

Please take care and remember to enter a space when typing res< -1 to avoid to

assign the variable res the value 1: <- is the assignment operator whereas < - means

“smaller than negative (something)”.

3. What’s the probability that a standard uniform is bigger that a uniform number in

[0.1,1.1]?

We can solve the problem by simulation, along the following lines: create many

standard uniforms and store them in the vector x; create many uniforms in [0.1,1.1]

and store them in y; compare x and y and count how often x>y (in order to do that x

and y must have the same number of elements); divide by the number of experiments.

> x <- runif(N) # N uniforms (in [0,1])

> y <- runif(N,0.1,1.1) # N uniforms in ([0.1,1.1])

> sum(x>y)

[1] 3991

> sum(x>y)/N

[1] 0.3991

picture

1.2

1.0

0.8

0.6

0

0.4

0.2

0.0

Index

40

Notice that whenever a point (x, y) is sampled, with x ∈ [0, 1] and y ∈ [0.1, 1.1],

it must be in the aquamarine region. As sampling is uniform, any point in the

aquamarine rectangle is as likely as any other to be selected.

We are interested in the cases in which x > y so draw the line y = x and look at the

portion of the rectangle that is below the line, in red:

> plot(0,t="n",xlim=c(0,1.2),ylim=c(0,1.2))

> polygon(c(0,0,1,1),c(0.1,1.1,1.1,0.1),col="aquamarine")

> abline(0,1)

> polygon(c(0.1,1,1),c(0.1,1,0.1),col="red")

1.2

1.0

0.8

0.6

0

0.4

0.2

0.0

Index

The probability to draw a red point in the aquamarine rectangle depends on the

2

respective areas and some trivial computations (verify yourself!) show it is 0.91 =

0.405 = 40.5%. Is this in agreement with the result obtained through simulation? If

this is not the case, try increasing N .

Exercise 17 How often is a uniform number in [-1,1] smaller than a uniform number

in [-0.5,1.5]?

4. A random walk can be used to represent several interesting situations in which one,

physically or figuratively, moves to the right or to the left in some random way.

41

5. How probable is that we have common birthdays (same day, but different years are

allowed)? Based on your experience, you may think that this is quite a rare event.

Mmhh, let’s work on this using simulation.

Assume we have a group of S people with their nice birthdays coded as a one day in

1:365, where I neglect leap years for simplicity. We’ll simulate N such groups, for a

large N , and check how often the same day is duplicated.

The simulation will be run as follows

• scan the group for duplicates (duplicated birthdays in that group); record 1 if

there are duplicates or 0 if there is none;

• repeat the previous steps N (i.e., many) times;

• finally, count the recorded number of 1’s (i.e., the number of groups in which at

least one duplicate - a common birthday! - is present) and divide by N .

> S <- 30

> res <- rep(0,N) # vector to store results 1/0

> for(i in 1:N) {

+ group <- sample(1:365,S,rep=T)

+ if(any(duplicated(group))) res[i] <- 1 else res[i] <- 0

+ }

> res[1:10]

[1] 1 0 1 0 1 0 1 1 0 1

> sum(res/N)

[1] 0.728

Let me do three things: explain this result; clarify some finer details on the code we

have used; and generalize the solution to the birthday problem.

First, the previous result shows that it’s indeed quite likely that a common birthday

is in a group with S = 30 persons: a joint party can be held in about 70% of cases

(to be fussy, if you pick a random group of 30 persons, with probability 70% there

will be a common birthday). The interesting thing is that one may intuitively have

guessed this probability to be about 30/365 which is much, much, much lower than

70%.

Second, the most important part of the simulation, as always, is inside the for loop,

where experiments are run. We repeat N times, for i taking values 1, 2, 3, ..., 999, 1000,

the same experiment: we create group sampling with replacement S birthdays in

42

1:365; then we ask R to check for duplicates using duplicated, a function telling

whether each element of group is duplicated. In other words, duplicated(group)

is a vector of TRUE/FALSE (remember, a boolean vector) saying for each person if

there is another guy/lady with the same birthday. Finally, if there is any TRUE in

duplicated(group) we record 1 in the ith component of res; else we record a 0.

Third, the fact that S = 30 yields a high probability of a common birthday prompts

to investigate a bit how this probability depends on S. The following code can be

used to plot this unexpected dependence (I’m leaving the code here for the brave

willing to dig into this example to build some programming experience: may the

force be with you! If you don’t care about the code then ok, I accept your faith...

but still you must study and understand the figure and its meaning).

+ res <- rep(0,N) # vector to store results 1/0

+ for(i in 1:N) {

+ group <- sample(1:365,S,rep=T)

+ if(any(duplicated(group))) res[i] <- 1 else res[i] <- 0

+ }

+ sum(res/N)

+ } # birthday is a function collecting the previous instructions

> s <- seq(5,250,by=5)

> plot(s,sapply(s,birthday),t="b",

+ xlab="S(ize of group)",ylab="Prob of common birthday")

43

1.0

●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●

●

●

0.8

●

Prob of common birthday

●

0.6

●

0.4

●

0.2

●

0.0

S(ize of group)

A look at the graph makes clear that the probability of a common birthday is quickly

increasing to 100% for large groups. The intuition that this is a rare event is (very!)

false for medium to large groups and, say, S = 50 is enough to have a probability of

birthday(50)=0.976. Good point to stop: party over, oops out of time... 7

zations and even human beings can (and sometimes must) be ranked along many

dimensions. As a prototype, consider the following situation: an item has three fea-

tures whose quality is measured using three (independent) random numbers, so that

1 means perfect and 0 means entirely defective. For instance, let (0.35,0.71,0.28)

be the three measures: along the first and third dimension, this item is well below

average (0.5) but the second feature scored an adequate value of 0.71, being 1.00 the

highest possible value.

Now assume that the item is deemed satisfactory if the smallest (among the three)

score is larger than 0.4. What’s the probability to have a satisfactory item?

> res <- rep(0,N) # vector to store the results

> for(i in 1:N){

7

Give http://www.youtube.com/watch?v=UjivDeA7Qu0 a chance. If you are more serious try http:

//en.wikipedia.org/wiki/Birthday_problem

44

+ x1 <- runif(1) # first score

+ x2 <- runif(1) # second score

+ x3 <- runif(1) # third score

+ minx <- min(c(x1,x2,x3))

+ if(minx>0.4) res[i] <- 1 else res[i] <- 0

+ }

As done several times, res keeps track of our experiments based on repeated simu-

lations of three uniform scores (x1, x2 and x3). We are ready to answer:

> mean(res)

[1] 0.22

In other words, less than 25% of the items will be cleared. Do you have any thought

on this assessment procedure? Can you think to any other way to provide a sound

judgement on multi-faceted items?

Exercise 18 Divers’ performances get several grades from several judges. Then the

highest and the smallest grade are discarded and only the medium grades contribute

to the final “mean” grade.

Rework the previous exercise discarding the smallest and biggest random numbers

(i.e., keep the medium one). What’s the probability that the medium score exceeds

0.4? Compare with the previous result, I personally find the difference quite striking.

Hint: the code is pretty much the same: if you have three numbers like 1, 4.5, 2.1

the medium one can be always obtained using sort(c(1,4.5,2.1))[2] ...

Suppose you’re on a game show, and you’re given the choice of three doors:

Behind one door is a car; behind the others, goats. You pick a door, say

No. 1, and the host, who knows what’s behind the doors, opens another

door, say No. 3, which has a goat. He then says to you, “Do you want to

pick door No. 2?” Is it to your advantage to switch your choice?

(in italian, http://it.wikipedia.org/wiki/Problema_di_Monty_Hall) where you

can read a lot about this problem. I hope you’ll enjoy looking through the solution

there but... this is a course on R and simulation!

We’ll tackle the problem simulating many times the two strategies of keeping the first

door or switching to the other one that is offered. I’ll call the two strategies “keep”

and “switch”. Without changing the problem, I assume that the car is always placed

45

Figure 1: The Monty Hall problem. A car is behind one of three doors. After the first

pick, another door with a goat is opened. Should you “keep” or “switch” to the other closed

door? Source: http://en.wikipedia.org/wiki/File:Monty_open_door.svg, in the pub-

lic domain.

behind the first door, but this is unknown to the player who honestly pick a random

door in the first place and cannot exploit the knowledge that the only goat-free door

is No. 1. This is only done to keep the R simpler, no trick, no cheating.

First, we simulate what’s the probability of winning the car with “keep”:

> res <- rep(0,N) # to store results

> for(i in 1:N){

+ pick <- sample(c(1,2,3),1) # pick door at random

+ # there is nothing else to do as you always "keep"

+ # no matter of what you are offered

+ if(pick==1) res[i] <- 1 else res[i] <- 0

+ }

So, “keep” gives you a sum(res)/N=0.323 probability to win the car (indeed, it’s 13 ).

Good, but this is not the end of the story: systematic “switch” can be simulated as

> res <- rep(0,N) # to store results

> for(i in 1:N){

+ pick <- sample(c(1,2,3),1) # pick door at random

+ if(pick==1) shown <- sample(c(2,3),1)

+ if(pick==2) shown <- 3

+ if(pick==3) shown <- 2

+ # shown is neither the first nor the door that was picked

46

+ # negative indexes take away from the vector

+ # "switch": the remaining is alway chosen

+ taken <- c(1,2,3)[c(-pick,-shown)]

+ if(taken==1) res[i] <- 1 else res[i] <- 0

+ }

> sum(res)/N

[1] 0.7

bit of coding have solved a famous puzzle that ignited furious discussions about the

optimal strategy to use: you should switch, no doubt, and get the car with probability

2

3

(unless you are very fond of goats).

Exercise 19 Almost every night on RAI 1 you could see “Affari tuoi”. Interestingly,

players win if they keep the right parcel (as it contains money) and they are often

offered the chance to “switch” by the cruel lady at the other end of the phone.

The situation is different but can you draw some conclusions on the optimal way to

behave after the Monty Hall problem discussion?

Exercise 20 The following code carries out “switch” in a different but equivalent

way. Can you explain why in some detail?

> res <- rep(0,N) # to store results

> for(i in 1:N){

+ pick <- sample(c(1,2,3),1) # pick door at random

+ shown <- sample(c(1,2,3),1,prob=c(0,1-(pick==2),1-(pick==3)))

+ # shown is neither the first nor the door that was picked

+ # negative indexes take away from the vector

+ # "switch": the remaining is alway chosen

+ taken <- c(1,2,3)[c(-pick,-shown)]

+ if(taken==1) res[i] <- 1 else res[i] <- 0

+ }

> sum(res)/N

[1] 0.682

47

Selected solutions

> curve(f,-3,3)

> grid(col=1)

> r1 <- uniroot(f,c(-2,-1))$root;r1

[1] -1.732051

> r2 <- uniroot(f,c(-0.5,0.5))$root;r2

[1] 0

> r3 <- uniroot(f,c(1,2))$root;r3

[1] 1.73205

> roots <- c(r1,r2,r3)

> points(roots,f(roots))

15

10

5

f(x)

● ● ●

0

−5

−10

−15

−3 −2 −1 0 1 2 3

be provided to polyroot starting from the constant up the highest degree (think, in

other words, that f (x) = 0 − 3x + 0x2 + x3 ).

48

> polyroot(c(0,-3,0,1))

[1] 0.000000+0i 1.732051-0i -1.732051+0i

The

p roots are shown in complex notation but it should be clear that they are 0,

± (3) ≈ ±1.732051.

No, the method cannot be used for df, which was defined numerically using a small

h and it is not a polynomial (even though it could be done with a little effort).

16. You have to solve the system using ginv and check the result.

> Y <- matrix(c(105, 120, 110,

+ 105, 110, 102,

+ 105, 90, 98,

+ 105, 80, 90),4,3,byrow=T)

> b <- c(119,118,117,116)

> sol <- ginv(Y) %*% b

> Y %*% sol

[,1]

[1,] 119

[2,] 118

[3,] 117

[4,] 116

The result is equal to b and we conclude that the asset can be replicated. In the

absence of arbitrage, the cost of replication should be equal to the cost of sol:

> sum(sol*c(100,100,100))

[1] 112.4603

The price is 112.460317460318 and any other price for b would generate an arbitrage.

This section collects a few examples of bizarre but insightful R code that I have been

learning from students.

1. On the use of optimize, credit to Filippo Toneatti. Consider the problem of minimizing f(x) = −x⁴ − x³ + 4x² + 4x − 1 on the interval [−5, 5]. This is simple stuff, isn't it?


> f<-function(x) -x**4-x**3+4*x**2+4*x-1

> curve(f,-5,5)

> optimize(f,c(-5,5))

$minimum

[1] -4.999944

$objective

[1] -420.9783

> f(-5)

[1] -421

> f(5)

[1] -631

[Figure: graph of f on [−5, 5]]

Well, clearly R got it wrong! It can immediately be seen from the graph that the minimizer is x = 5 and the minimum value is −631. Care is needed when you use numerical algorithms and there are multiple local minima (at x = −5, at x = 5, and something is also happening in [−2, 2]). The algorithm used by R is a rather sophisticated one but it doesn't see the minimizer on the right and is misled into x = −5. A slightly


modified definition of f can be used to see where R computes the function in the

minimization process:
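The modified definition is not shown in the notes; a plausible sketch is a wrapper that prints every point at which the function is evaluated:

```r
f  <- function(x) -x^4 - x^3 + 4*x^2 + 4*x - 1
f2 <- function(x) { cat(x, "\n"); f(x) }   # print the probe, then return f(x)
```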

> optimize(f2,c(-5,5))

-1.18034

1.18034

-2.63932

-3.54102

-4.098301

-4.442719

-4.655581

-4.787138

-4.868444

-4.918694

-4.94975

-4.968944

-4.980806

-4.988138

-4.992669

-4.995469

-4.9972

-4.998269

-4.99893

-4.999339

-4.999591

-4.999747

-4.999844

-4.999904

-4.999944

-4.999944

$minimum

[1] -4.999944

$objective

[1] -420.9783

It is as if R explored up to 1.18 and then decided to give up on the right. The first lesson is that caution is always needed when using numerical methods. Use a graph to check when you can: use a graph, use a graph...

But there is a second lesson for the curious: how can R be so obtuse? Have a look at the messy situation in [−2, 2] and think about the gravitational metaphor.


> curve(f,-2,2)

> points(1.18,f(1.18))

[Figure: graph of f on [−2, 2], with the point (1.18, f(1.18)) marked]

Clearly, a lot is going on, with many local extrema: R probes the function at 1.18 but the ball then falls back to the left... unlucky!

More advanced stuff

You may need to optimize hundreds of functions and in this case you do not have

the possibility or time to manually look at all the graphs (this happened to me while

doing research: I had to maximize thousands of utility functions, one for every agent

in the model). Then you need a better algorithm:
> myoptimize <- function(f,interval,maximum=FALSE,...){

+ a <- interval[1]

+ b <- interval[2]

+ q1 <- 3/4*a+1/4*b

+ q2 <- (a+b)/2

+ q3 <- 1/4*a+3/4*b

+ res <- NULL

+ res[[1]] <- optimize(f,c(a,q2),maximum=maximum,...)

+ res[[2]] <- optimize(f,c(q1,q3),maximum=maximum,...)

+ res[[3]] <- optimize(f,c(q2,b),maximum=maximum,...)


+ x <- c(res[[1]]$objective,res[[2]]$objective,res[[3]]$objective)

+ trueres <- if(maximum) which.max(x) else which.min(x)

+ res[[trueres]]

+ }

The previous code does something simple and practical: it splits the given optimization interval [a, b] into four equal parts [a, q1, q2, q3, b] and optimizes on each of the three partly overlapping intervals [a, q2], [q1, q3] and [q2, b]. Then the best of the three solutions found is returned. While there is no guarantee that this will work for any f, it is much more difficult for this method to be fooled. See what happens in the previous case: problem solved!

> myoptimize(f,c(-5,5))

$minimum

[1] 4.999922

$objective

[1] -630.9586

Assume you have to maximize f(x, y) = −2x² + 3xy + 4y² − 5x − 4y + 4 subject to the constraints x ≤ 5, y ≤ 3 and 3x + y ≥ 5. After the definition of the function and of the constraints, one standard way to tackle the problem is to use constrOptim with a simple initial point satisfying the constraints, with no further worries.
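The definition of f itself precedes the snippet below but is not shown; a hypothetical reconstruction (the sign of the y² term is chosen to match the reported values f(1, 3) = 30 and f(5, −10) = 219):

```r
# objective (reconstruction): the +4y^2 term matches f(1,3)=30 and f(5,-10)=219
f <- function(x, y) -2*x^2 + 3*x*y + 4*y^2 - 5*x - 4*y + 4
```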

> fb <- function(x) f(x[1],x[2])

> A <- matrix(c(-1,0,3,0,-1,1),3,2)

> b <- c(-5,-3,5)

> constrOptim(c(4.9,2.9),fb,NULL,A,b,

+ control=list(fnscale=-1))

$par

[1] 1.000648 3.000000

$value

[1] 30

$counts

function gradient

278 NA


$convergence

[1] 0

$message

NULL

$outer.iterations

[1] 3

$barrier.value

[1] 0.0003544281

It looks as if (1, 3) is the maximizer, but we didn't look at any graph and tried (only) a single convenient starting point. Let's check.
> x <- seq(0,5,len=51)

> y <- seq(-5,5,len=51)

> z <- outer(x,y,f)

> image(x,y,z);contour(x,y,z,add=T)

> abline(h=3);abline(v=5)

> abline(5,-3)

> points(4.9,2.9);points(1,3,pch=19)

[Figure: image/contour plot of f with the constraint lines; the starting point (4.9, 2.9) and the solution (1, 3) are marked]

constrOptim started from the point in the upper right and ended up at the highest level it could reach, in (1, 3), the filled point. However, the graph makes clear that the function rises again in the lower tip of the triangular domain (not shown in the previous graph). Looking at the whole domain and starting from (4, −4), which is strictly feasible, indeed produces the exact result.

> y <- seq(-10,5,len=51)

> z <- outer(x,y,f)

> image(x,y,z);contour(x,y,z,add=T)

> abline(h=3);abline(v=5)

> abline(5,-3)

> points(4,-4);points(5,-10,pch=19)

> constrOptim(c(4,-4),fb,NULL,A,b,

+ control=list(fnscale=-1))

$par

[1] 5.000000 -9.999999

$value

[1] 219

$counts

function gradient

502 NA

$convergence

[1] 0

$message

NULL

$outer.iterations

[1] 3

$barrier.value

[1] 0.002334433

[Figure: image/contour plot over the whole domain; the starting point (4, −4) is marked and the maximizer (5, −10) is the filled point]

This example stresses, once again, the importance of selecting good starting points for optimization algorithms (or of using alternative techniques).

More advanced stuff

An interesting representation of the situation can be obtained as follows:

> f2v <- Vectorize(f2)

> z2 <- outer(x,y,f2v)

> contour(x,y,z2)

> abline(h=3);abline(v=5)

> abline(5,-3)

[Figure: contour plot of f2, drawn only on the feasible region]

I'm writing the code to show one way to plot a function on some domain only. Here f2 is a function that returns NA (not available in R) if any of the constraint expressions is negative; otherwise, in the domain, the if statement produces a 1 and f2(x,y) is the same as f(x,y).
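A definition consistent with that description might look like this (hypothetical, since the notes do not show it; the objective f is the one of this example, with the sign of the y² term inferred from the reported optima):

```r
f <- function(x, y) -2*x^2 + 3*x*y + 4*y^2 - 5*x - 4*y + 4   # objective (reconstruction)
# each constraint is rewritten so that it is negative exactly when violated;
# the if produces NA outside the region and 1 inside, so NA*f(x,y) masks the outside
f2 <- function(x, y) {
  ok <- if (min(5 - x, 3 - y, 3*x + y - 5) < 0) NA else 1
  ok * f(x, y)
}
```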

The standard outer does not work on f2 and you need to vectorize it to compute the matrix of values of f2 for all values of x and y.

It is also possible to depict with different colors the starting points that erroneously

end up in the wrong solution and the ones who correctly discover (5, −10).
> f3 <- function(x,y){

+ if(is.na(f2(x,y))) NA else {

+ r <- constrOptim(c(x,y),fb,NULL,A,b, control=list(fnscale=-1))

+ r$par[1]

+ }

+ }

> f3v <- Vectorize(f3)

> z3 <- outer(x,y,f3v)

> radiantorchid <- rgb(218/255,112/255,214/255)

> mimosa <- rgb(239/255,192/255,80/255)

> image(x,y,z3,col=c(radiantorchid,mimosa))


> abline(h=3);abline(v=5)

> abline(5,-3)

[Figure: two-color image of the starting points: radiant orchid points lead to (1, 3), mimosa points to (5, −10)]

Clearly, initiating constrOptim in the upper radiant orchid area yields (1, 3), whereas a starting point in the lower mimosa area⁸ provides the correct maximizer (5, −10).

Credit to Scaramuzza. Assume you want to solve a standard assessment problem like

At a cost of 90 you can get one of the random future payoffs 200, 110, 4. Assume payoffs are drawn uniformly. What's the net expected gain of the investment?

The possible answers were -0.4; 15.7; 9.3; 28.5. What's wrong with the following code?

> r <- sample(200,110,4)

> mean(r)-90

[1] 6.590909

⁸ Radiant Orchid is the Pantone color of the year 2014, https://www.pantone.com/pages/index.aspx?pg=21129; Mimosa is the Pantone color of the year 2009, https://www.pantone.com/pages/pantone/pantone.aspx?pg=20634&ca=10. Why should we use the standard boring colors?

The answer may superficially look plausible but it's really a sort of R nonsense (and unfortunately doesn't generate any error). Indeed, N was never used inside sample – this is suspicious – and likewise replace was forgotten. Some defaults nevertheless allowed R to proceed, interpreting sample(x, size, replace = FALSE, prob = NULL) as follows:
(a) x=200: when x is a single number, sample draws from the vector 1:200

(b) size=110 and, hence, 110 numbers are sampled in 1:200

(c) if R expects a boolean value, either TRUE or FALSE, by convention a non-zero

number is interpreted as TRUE and zero as FALSE. In this case, replace is TRUE

Therefore, r is a vector of 110 random integers drawn with replacement from the 200

numbers in 1:200. This is entirely different from the intended simulation but the

point is that R showed no error message and one may believe he/she’s doing well...

type r to verify this statement.
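A minimal check of this reading:

```r
r <- sample(200, 110, 4)   # read as sample(x = 1:200, size = 110, replace = TRUE)
length(r)                  # 110
all(r %in% 1:200)          # TRUE
```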

The correct answer is of course given by:

> r <- sample(c(200,110,4),N,replace=T)

> mean(r)-90

[1] 15.4908

> str(r)

num [1:10000] 4 110 110 110 110 4 4 4 4 200 ...

The answer is correct and r is now a vector with 10000 components (not 110!).

Another striking example is related to the generation of 100 random normal numbers

with mean -4 and standard deviation 1. Is this working?
> v <- c(rnorm(100),-4,1)

The answer is negative: v is a vector of 102 numbers, made of 100 random normal components plus the numbers -4 and 1. The correct vector is instead generated by v <- rnorm(100,-4,1). Indeed, even v <- c(rnorm(100,-4,1)) would work: the c() operator is not needed, as rnorm already creates a vector, but it's not harmful either. The take-home message is: be cautious with parentheses and be even more cautious as no error messages are given in some cases.
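The two expressions can be compared directly:

```r
v_wrong <- c(rnorm(100), -4, 1)   # 100 N(0,1) draws, then the constants -4 and 1
v_right <- rnorm(100, -4, 1)      # 100 N(-4,1) draws
length(v_wrong)                   # 102
length(v_right)                   # 100
```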


Lessons from some past exams

Some exercises assigned in the past had particularly low grades (average below 1 when full

score is 2): this holds for exercises with set.seed and matrix/vector, optimization in 2d,

constrained optimization, arbitrage and assessment through simulation.

As students may have extra difficulties in solving such problems, these notes discuss

some of the issues related to the aforementioned exercises.

6.1 set.seed

One of the problems of this exercise may be related to the role of set.seed: this command is used to reset the random number generator to a specific state and to generate a controlled sequence of numbers.

If you want to generate a random (3 × 3) matrix with normal numbers having mean 2 and standard deviation 3, you can type
> m <- matrix(rnorm(9,2,3),3,3)

> m # show m

[,1] [,2] [,3]

[1,] 3.8762030 5.369065 -3.8001601

[2,] 0.9821722 4.733989 -0.4554212

[3,] 1.7026076 1.066936 -2.3283255

You will not be able to recover that specific matrix when you quit R, unless you save

your data in one of several ways, and if you type the same thing again the matrix will be

different

> m

[,1] [,2] [,3]

[1,] 2.8075892 2.2169911 0.3995738

[2,] -0.2920926 1.8072854 5.6722845

[3,] 1.6663722 0.1851708 5.5228658

Setting the seed with set.seed allows you to recover the very same matrix at any time.

> set.seed(123)

> m <- matrix(rnorm(9,2,3),3,3)

> m

[,1] [,2] [,3]

[1,] 0.3185731 2.211525 3.38274862

[2,] 1.3094675 2.387863 -1.79518370

[3,] 6.6761249 7.145195 -0.06055856


Now, you may work with R and destroy or alter the variable m as, for instance, with

> m %*% b

[,1]

[1,] 5.224463

[2,] 1.111739

[3,] 8.955485

> m <- 1

> m # now m is a number

[1] 1

The original m is now lost forever from R's memory, unless you reset the seed to the value that was used immediately before the creation of m. Hence, to re-create it you have to type

> set.seed(123)
> m <- matrix(rnorm(9,2,3),3,3) # this is the same m as before

> m

[,1] [,2] [,3]

[1,] 0.3185731 2.211525 3.38274862

[2,] 1.3094675 2.387863 -1.79518370

[3,] 6.6761249 7.145195 -0.06055856

It's important to remember that computations must be done after the set.seed command and before you enter other commands involving random number generation or altering m in any way.

If needed, just re-enter set.seed to re-initialize everything and type again the needed commands.
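The whole mechanism can be verified in a few lines:

```r
set.seed(123)
m1 <- matrix(rnorm(9, 2, 3), 3, 3)
set.seed(123)                 # reset the generator to the same state
m2 <- matrix(rnorm(9, 2, 3), 3, 3)
identical(m1, m2)             # TRUE: the very same matrix is recovered
```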

6.2 Optimization

There is a video on optimization and it is a good idea if you revise the material at http:

//youtu.be/b_r7u4IgOhY (part A) and http://youtu.be/ViUy3BTuBwI (part B).

Let me spell out the major steps to solve an optimization problem:

1. define the two-variable function f(x,y) and its vector version fb <- function(v) f(v[1],v[2]);

2. draw the function with image and contour and inspect the graph to select an appropriate starting point (x0, y0);

3. use optim with the starting point c(x0,y0) (if you want to maximize, remember to set control=list(fnscale=-1));


One conceptual problem is that the solution depends on the starting point (x0, y0), as suggested by the gravitational metaphor: minimizing resembles the fall of a ball on a surface and, depending on where you initially drop the ball, different resting points or holes may be reached. So, if you start from the wrong initial point, you may end up in the wrong solution (more technically, you may end up in a local minimizer instead of a global minimizer, the same holding for maximizers).

How do you select the proper (x0 , y0 )? Easy, you draw a picture, inspect the contour

lines to see a good candidate (global!) minimizer/maximizer and start from there. Often

you need to add some contour lines to better spot where the best minimizer/maximizer is

located. This can be done with something like

contour(x,y,z, add=T, levels=c(oneValue,anotherValue)),

specifying appropriate levels depending on what you see.

As an example, consider the following function to be minimized

f(x, y) = (x² + xy + y² − y)/2 + e^(−x² − x) + e^(−y² − 2y).
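The definitions used by the calls below are not shown in the notes; a sketch under the obvious assumptions (f as above, fb its vector version for optim, and a grid for image/contour):

```r
# objective and its vector-argument version required by optim
f  <- function(x, y) (x^2 + x*y + y^2 - y)/2 + exp(-x^2 - x) + exp(-y^2 - 2*y)
fb <- function(v) f(v[1], v[2])
x <- seq(-3, 3, len = 51)
y <- seq(-3, 3, len = 51)
z <- outer(x, y, f)              # matrix of values for contour(x, y, z)
```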

The standard contour graph produces

[Figure: default contour plot of f on [−3, 3] × [−3, 3]]

The minimizer appears to be in the orange area, which is too wide to get a single

meaningful candidate. The function was plotted by default using levels ...10, 8, 6, 4, 2 and

does not reach 0 (as there is no zero-contour). So, we add a couple of levels between 0 and

2:


> contour(x,y,z)

> contour(x,y,z,add=T,levels=c(1.5,1))

Clearly, the minimizer is inside one of the two small circular contours of level 1, depicted

below in aquamarine:

[Figure: contour plot with the extra levels 1 and 1.5 added; two small circular level-1 contours are visible]

Using the grid command we see that good candidates are close to (0.5, 0.5) and

(−1.5, 1.5). We minimize starting from both points:

> optim(c(0.5,0.5),fb)

$par

[1] 0.5433489 0.7173225

$value

[1] 0.8158205

$counts

function gradient

49 NA

$convergence

[1] 0


$message

NULL

> optim(c(-1.5,1.5),fb)

$par

[1] -1.555756 1.332890

$value

[1] 0.8281947

$counts

function gradient

47 NA

$convergence

[1] 0

$message

NULL

The global minimizer is at (0.54, 0.72) and the global minimum is about 0.82. Observe that (−1.56, 1.33) is only a local minimizer, at which the value of the function is 0.83: this result would be considered wrong!

The following 3d graph may help visualizing the two minimizers and having the feeling

of what is going on (in my view, the contour plot does a much better job in describing the

situation but surfaces are suggestive).

> persp(x,y,z,theta=30,phi=-10,ticktype="detailed")

[Figure: 3-D perspective plot of f showing the two basins]

Please, read also the following subsection for an example (it's about constrained optimization, but the problem of selecting a proper starting point is exactly the same).

Finally, as far as interpretation of the results is concerned, recall that the minimizer/maximizer is a point, with an x and a y, that can be retrieved in the $par component of the output; the objective, or optimal value, or value of the function at the optimum, is a single number, namely the value of f at the minimizer/maximizer ($value).
6.3 Constrained optimization

There is a video on the topic, see http://youtu.be/MCvz-c6UUkw. You must solve the

problem using the previous steps and defining the constraints using linear algebra, as

shown in the video.

It is of utmost importance to provide a good starting point strictly inside the feasible region. Again, a good graph is needed to start the procedure. Consider the problem of maximizing f(x, y) = −x² + 4xy + 5y² + 3x − 5y + 3 with the constraints x ≤ 5, y ≤ 3, 3x + y ≥ 3. "Good graph" means that you see the whole domain:
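The definitions assumed by the snippets below are not shown; a sketch, with the constraints encoded as A %*% c(x,y) − b ≥ 0, the form constrOptim expects for its ui and ci arguments:

```r
# objective, its vector version, and the constraint system (a sketch)
f  <- function(x, y) -x^2 + 4*x*y + 5*y^2 + 3*x - 5*y + 3
fb <- function(v) f(v[1], v[2])
# x <= 5    ->  -x     >= -5
# y <= 3    ->  -y     >= -3
# 3x+y >= 3 ->  3x + y >=  3
A <- matrix(c(-1,  0,
               0, -1,
               3,  1), 3, 2, byrow = TRUE)
b <- c(-5, -3, 3)
```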

> x <- seq(-1,5,len=51)

> y <- seq(-15,3,len=51)

> z <- outer(x,y,f)

> image(x,y,z)


> contour(x,y,z,add=T)

> abline(v=5) # x<=5

> abline(h=3) # y<=3

> abline(3,-3) # y>=3-3x

[Figure: image/contour plot of f with the three constraint lines]

Some experimentation with the ranges of the sequences for x and y may be needed to

see the whole feasible region and you may need to try several graphs before succeeding. In

this case, the feasible region is the shaded triangle below.


[Figure: the triangular feasible region; the starting point (4.9, −10) is the filled point]

First, you can easily select starting points strictly inside the region. Second, you see

from the contours that the maximizer is close to the lower right vertex of the triangle

(indeed, we already know in this case that the maximizer is at (5, −12), no additional

computation would be needed as the graph is fully revealing). We can solve the problem

picking as starting point (4.9, −10), the filled point in the graph.

> constrOptim(c(4.9,-10),fb,NULL,A,b,

+ control=list(fnscale=-1))

$par

[1] 5 -12

$value

[1] 533

$counts

function gradient

668 NA

$convergence

[1] 0

$message


NULL

$outer.iterations

[1] 3

$barrier.value

[1] 0.003062075

The maximum, obtained after the definition of proper A and b to describe the constraints, is f(5, −12) = 533.

A very common error is to use another popular starting point like (4.9, 2.9) that, however, produces the (wrong!) local maximizer at the (wrong!) vertex (5, 3).

> constrOptim(c(4.9,2.9),fb,NULL,A,b,

+ control=list(fnscale=-1))

$par

[1] 5 3

$value

[1] 83

$counts

function gradient

336 NA

$convergence

[1] 0

$message

NULL

$outer.iterations

[1] 3

$barrier.value

[1] 0.003062073

