
1 Useful mathematics, notation and definitions

Fact 1.1. If we multiply a column vector x by a matrix A of dimension m×n, this defines a function A: ℝⁿ → ℝᵐ.

Notation 1. A = [a_(1), …, a_(n)], a_(i) ∈ ℝᵐ, refers to columns 1 through n of matrix A.

Notation 2. A = (a_1′, …, a_m′)′ (rows stacked), a_i ∈ ℝⁿ, refers to the rows of A.

Proposition 1.1. D = AB and B is nonsingular ⟹ rank(D) = rank(A).

Proposition 1.2. Let A be m×n; then rank(A) ≤ min{m, n}.

Proposition 1.3. D = AB ⟹ rank(D) ≤ min{rank(A), rank(B)}.

Proposition 1.4. Let A be m×n ⟹ rank(A) = rank(AA′) = rank(A′A).
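The rank facts above are easy to sanity-check numerically. The following is a minimal sketch (not part of the original notes), assuming NumPy is available:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 3))                    # an arbitrary 4x3 matrix
    B = rng.normal(size=(3, 3)) + 3 * np.eye(3)    # shifted so B is (almost surely) nonsingular

    # Proposition 1.1: rank(AB) = rank(A) when B is nonsingular
    assert np.linalg.matrix_rank(A @ B) == np.linalg.matrix_rank(A)

    # Proposition 1.2: rank(A) <= min{m, n}
    assert np.linalg.matrix_rank(A) <= min(A.shape)

    # Proposition 1.4: rank(A) = rank(AA') = rank(A'A)
    assert (np.linalg.matrix_rank(A)
            == np.linalg.matrix_rank(A @ A.T)
            == np.linalg.matrix_rank(A.T @ A))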

Definition 1.1. C(A) = ⟨a_(1), …, a_(n)⟩ refers to the space spanned by the columns of A.

1.1 Some Vector Calculus


Consider the least-squares objective S(β) = (Y − Xβ)′(Y − Xβ), the function minimized by OLS, where Y is n×1, X is n×k, and β is k×1.

Fact 1.2. Derivative w.r.t. a column vector:

    ∂S(β)/∂β = (∂S(β)/∂β_1, …, ∂S(β)/∂β_k)′,

a k×1 column vector.
Fact 1.3. Derivative w.r.t. a row vector:

    ∂S(β)/∂β′ = (∂S(β)/∂β_1, …, ∂S(β)/∂β_k),

a 1×k row vector.

Fact 1.4. Let φ(β): ℝᵏ → ℝᵐ, φ(β) = (φ_1(β), φ_2(β), …, φ_m(β))′, be a vector-valued mapping; then

    ∂φ(β)/∂β′ = (∂φ_1(β)/∂β′; ∂φ_2(β)/∂β′; …; ∂φ_m(β)/∂β′)   (rows stacked),

where each element is a row vector as in Fact 1.3, and ∂φ(β)/∂β′ is an m×k matrix.¹

Fact 1.5. Let φ(β): ℝᵏ → ℝᵐ, φ(β) = (φ_1(β), …, φ_m(β))′, be a vector-valued mapping; then

    ∂φ(β)′/∂β = (∂φ_1(β)/∂β, …, ∂φ_m(β)/∂β),

where each element is a column vector as in Fact 1.2, and ∂φ(β)′/∂β is a k×m matrix.

¹ The ′ symbol means transpose.
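As a quick check of Fact 1.2 applied to S(β): its gradient has the well-known closed form −2X′(Y − Xβ). A minimal numerical sketch (assuming NumPy; not part of the original notes):

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 50, 3
    X = rng.normal(size=(n, k))
    Y = rng.normal(size=n)
    b = rng.normal(size=k)

    def S(beta):
        r = Y - X @ beta
        return r @ r                      # S(beta) = (Y - X beta)'(Y - X beta)

    # Closed-form column-vector derivative (Fact 1.2): dS/dbeta = -2 X'(Y - X beta)
    grad = -2 * X.T @ (Y - X @ b)

    # Finite-difference check, coordinate by coordinate
    eps = 1e-6
    for j, e in enumerate(np.eye(k)):
        fd = (S(b + eps * e) - S(b - eps * e)) / (2 * eps)
        assert abs(fd - grad[j]) < 1e-4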

Fact 1.6. Let y = Ax where A does not depend on x; then ∂y/∂x′ = A, where ∂y/∂x′ is an m×n matrix.

Fact 1.7. Let y = Ax where A does not depend on x; then ∂y′/∂x = A′, where ∂y′/∂x is an n×m matrix.

Proposition 1.5. Suppose y = Ax(α), where α = (α_1, …, α_r)′; then, if A does not depend on α,

    ∂y(α)/∂α′ = (∂y/∂x′)(∂x(α)/∂α′) = A ∂x(α)/∂α′,

and the resulting matrix is m×r.


Note 1. A ∂x(α)′/∂α is not well defined; to see this:

    ∂x′/∂α =
        ( ∂x_1/∂α_1   ∂x_2/∂α_1   ∂x_3/∂α_1   ⋯   ∂x_n/∂α_1 )
        ( ∂x_1/∂α_2   ∂x_2/∂α_2   ∂x_3/∂α_2   ⋯   ∂x_n/∂α_2 )
        (     ⋮            ⋮            ⋮               ⋮     )
        ( ∂x_1/∂α_r   ∂x_2/∂α_r   ∂x_3/∂α_r   ⋯   ∂x_n/∂α_r )

which is r×n, but A is m×n, so the product A(∂x′/∂α) does not conform.
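To illustrate Proposition 1.5 concretely, here is a minimal sketch (not from the original notes) with a hypothetical nonlinear map x(α): ℝ² → ℝ³, checking A ∂x/∂α′ against a finite-difference Jacobian (assumes NumPy):

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0]])        # 2x3, fixed: does not depend on alpha

    def x(alpha):                          # hypothetical smooth map R^2 -> R^3
        a1, a2 = alpha
        return np.array([a1**2, a1 * a2, np.sin(a2)])

    def Jx(alpha):                         # dx/dalpha' (3x2), computed by hand
        a1, a2 = alpha
        return np.array([[2 * a1, 0.0],
                         [a2, a1],
                         [0.0, np.cos(a2)]])

    alpha = np.array([0.7, -1.2])
    eps = 1e-6

    # Finite-difference Jacobian of y(alpha) = A x(alpha), column by column
    J_fd = np.column_stack([
        (A @ x(alpha + eps * e) - A @ x(alpha - eps * e)) / (2 * eps)
        for e in np.eye(2)
    ])
    assert np.allclose(J_fd, A @ Jx(alpha), atol=1e-5)   # Proposition 1.5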

Fact 1.8. y = x′Bx = (x′Bx)′ = x′B′x = y′, since y is a scalar.

Proposition 1.6. If Fact 1.8 holds ⟹ we can always find a symmetric matrix A such that x′Ax = x′Bx.

Proof. Let A = ½(B + B′), a symmetric matrix; then

    y = ½(x′Bx + x′B′x) = ½[x′(B + B′)x] = ½(2x′Ax) = x′Ax = y′.

Proposition 1.7. y = x′Ax (A symmetric) ⟹ ∂y/∂x′ = 2x′A and ∂y/∂x = 2Ax.
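A small numerical sketch of Propositions 1.6 and 1.7 (illustrative only; assumes NumPy):

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.normal(size=(4, 4))        # arbitrary, not symmetric
    A = (B + B.T) / 2                  # symmetrization from Proposition 1.6
    x = rng.normal(size=4)

    assert np.isclose(x @ B @ x, x @ A @ x)   # same quadratic form (Fact 1.8 / Prop 1.6)

    # Proposition 1.7: the gradient of x'Ax is 2Ax for symmetric A
    eps = 1e-6
    grad_fd = np.array([
        ((x + eps * e) @ A @ (x + eps * e) - (x - eps * e) @ A @ (x - eps * e)) / (2 * eps)
        for e in np.eye(4)
    ])
    assert np.allclose(grad_fd, 2 * A @ x, atol=1e-5)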

1.2 Geometry of Least Squares

y = x_1β_1 + … + x_kβ_k implies that y lives in the linear space generated by ⟨x_1, …, x_k⟩ ⊆ ℝⁿ; however, when we add ε, we account for all aspects of y that do not live in ⟨x_1, …, x_k⟩.

Definition 1.2. We call P a projection matrix if:
1. P: ℝⁿ → ℝⁿ,
2. P = P′,
3. P² = P.

Fact 1.9. In our particular case, P = X(X′X)⁻¹X′.
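The three defining properties are immediate to verify numerically. A minimal sketch (assumes NumPy; np.linalg.inv is used for transparency, not numerical stability):

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(20, 3))
    P = X @ np.linalg.inv(X.T @ X) @ X.T   # Fact 1.9

    assert np.allclose(P, P.T)             # symmetric: P = P'
    assert np.allclose(P @ P, P)           # idempotent: P^2 = P
    assert np.allclose(P @ X, X)           # P leaves the column space of X fixed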

2 Maximum likelihood method

Definition 2.1. Suppose that random variables X_1, …, X_n have a joint density or frequency function f(x_1, x_2, …, x_n | θ). Given observed values X_i = x_i, where i = 1, …, n, the likelihood of θ as a function of x_1, x_2, …, x_n is defined as

    lik(θ) = f(x_1, …, x_n | θ).    (1)

[Figure 1: Geometry of Least Squares — the residual e = y − Xβ̂ is orthogonal to the plane spanned by x_1 and x_2.]

Definition 2.2. The maximum likelihood estimate θ̂ is the value of θ which maximizes (1). In the particular case where the X_i are assumed to be i.i.d., their joint density is the product of the marginal densities and the likelihood is

    lik(θ) = ∏_{i=1}^{n} f(X_i | θ).    (2)

Rather than maximizing the likelihood itself, it is usually easier to maximize its natural logarithm (which is equivalent, since the logarithm is a monotonic function). For an i.i.d. sample, the log likelihood is

    l(θ) = ∑_{i=1}^{n} ln[f(X_i | θ)].    (3)
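As a worked illustration (a sketch, not part of the original notes): maximize the log likelihood (3) numerically for a hypothetical i.i.d. normal sample and compare with the closed-form MLE, assuming NumPy and SciPy are available:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    x = rng.normal(loc=2.0, scale=1.5, size=500)   # i.i.d. sample, true (mu, sigma) = (2, 1.5)

    def neg_loglik(theta):
        # -l(theta) from eq. (3); sigma parameterized on the log scale to stay positive
        mu, log_sigma = theta
        return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

    res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
    mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

    # For the normal model the MLE is known in closed form: the sample mean and
    # the 1/n (not 1/(n-1)) standard deviation; the numeric optimum should agree.
    assert np.isclose(mu_hat, x.mean(), atol=1e-2)
    assert np.isclose(sigma_hat, x.std(ddof=0), atol=1e-2)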

2.1 Hypothesis testing

Definition 2.3. Type I error. The procedure may lead to rejection of the null hypothesis when it is true.

Definition 2.4. Type II error. The procedure may fail to reject the null hypothesis when it is false, i.e., accepting H0 when it is false.

                          State of nature (truth)
    Decision              H0                                    Ha
    -----------------------------------------------------------------------------------
    Accept H0             Correct decision (p = 1 − α)          Type II error (p = β)
    Reject H0             Type I error (p = α, the size of      Correct decision (p = 1 − β,
                          the test or significance level)       the power of the test)

    Table 1: Type I and Type II errors.

Definition 2.5. Power of a test. The power of a test is the probability that it will correctly lead to rejection of a false null hypothesis: Power = 1 − β = 1 − P(Type II error). Some difficulty may arise because Ha: μ < μ0 only specifies a range, which in turn complicates by-hand calculations.

Example 2.1. Suppose that under H0, E[D] = μ0, and under Ha, E[D] = μ1. Then the power of a given test is π(μ1) = P(X̄ > c | μ1), that is, the probability that we reject H0 when Ha is true.

Remark 2.1. Notice that in the above example we are testing at a specific value, Ha: μ = μ1. For a case such as Ha: μ < μ0, see Casella–Berger, p. 383.
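A minimal numerical sketch of Example 2.1 (illustrative assumptions only: a one-sided z-test on normal data with known σ, specifics the notes leave open), assuming NumPy and SciPy:

    import numpy as np
    from scipy.stats import norm

    # One-sided z-test of H0: mu = mu0 vs Ha: mu = mu1 > mu0, known sigma
    mu0, mu1, sigma, n, alpha = 0.0, 0.5, 1.0, 25, 0.05

    se = sigma / np.sqrt(n)
    c = mu0 + norm.ppf(1 - alpha) * se           # reject H0 when xbar > c
    power = 1 - norm.cdf(c, loc=mu1, scale=se)   # pi(mu1) = P(Xbar > c | mu1)
    print(f"critical value c = {c:.3f}, power = {power:.3f}")   # roughly 0.80 here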

2.2 Distributions used in hypothesis testing under normality


    z = √n (x̄ − μ)/s ~ t[n − 1]    (4)

    c = (n − 1)s²/σ² ~ χ²[n − 1]    (5)

See page 1062 of Econometric Analysis 7th edition for an example.
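A quick Monte Carlo sketch of (4) and (5) (not from the notes; assumes NumPy and SciPy): simulate many normal samples and compare the empirical distributions of z and c with t[n − 1] and χ²[n − 1] via Kolmogorov–Smirnov tests:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    mu, sigma, n, reps = 1.0, 2.0, 10, 20000

    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)
    s = x.std(axis=1, ddof=1)

    z = np.sqrt(n) * (xbar - mu) / s       # eq. (4): should follow t[n-1]
    c = (n - 1) * s**2 / sigma**2          # eq. (5): should follow chi2[n-1]

    # Large KS p-values (typically well above 0.05) indicate agreement
    print(stats.kstest(z, stats.t(df=n - 1).cdf).pvalue)
    print(stats.kstest(c, stats.chi2(df=n - 1).cdf).pvalue)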

2.3 Wald Test

2.3.1 One linear restriction

Also known as the distance test or significance test:

    W_k = (β̂_k − β_k) / σ̂_{β̂_k} ~ t_{n−k},    (6)

where σ̂_{β̂_k} is the estimated standard error of β̂_k. The latter is the standard form of the test of a single restriction. In the case of one linear restriction on multiple parameters, denote by c a 1×k vector containing a restriction of the form cβ = d, where β is a k×1 vector:

    W = (cβ̂ − d) / √(s² c(X′X)⁻¹c′) ~ t_{n−k}.    (7)
Notice that the denominator in (7) is the standard deviation of the linear combination of the β̂_i's determined by c. Finally, W is the distance from cβ̂ to d in standard-deviation units.

2.3.2 Multiple linear restrictions

For multiple linear restrictions Rβ = q (R being q×k), the test statistic takes the form

    W = (Rβ̂ − q)′ [σ² R(X′X)⁻¹R′]⁻¹ (Rβ̂ − q) ~ χ²_q.    (8)

Notice that (8) follows the same pattern, except now it is in quadratic terms, specifically in squared standard-deviation units.
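A minimal simulated sketch of (7) and (8) (illustrative; assumes NumPy and SciPy, with hypothetical data and restrictions, and with s² substituted for the unknown σ² in (8), which makes the χ² form asymptotic):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n, k = 200, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    beta_true = np.array([1.0, 0.5, 0.0])
    y = X @ beta_true + rng.normal(size=n)

    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                   # OLS estimate
    e = y - X @ b
    s2 = e @ e / (n - k)                    # s^2 = u'u/(n - k)

    # Eq. (7): one linear restriction c beta = d, here beta_2 + beta_3 = 0.5 (true)
    c = np.array([0.0, 1.0, 1.0]); d = 0.5
    W1 = (c @ b - d) / np.sqrt(s2 * c @ XtX_inv @ c)        # ~ t[n-k] under H0
    print("t-form Wald:", W1, "p =", 2 * stats.t(n - k).sf(abs(W1)))

    # Eq. (8): q = 2 restrictions R beta = q0 (beta_2 = 0.5, beta_3 = 0, both true)
    R = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    q0 = np.array([0.5, 0.0])
    diff = R @ b - q0
    W2 = diff @ np.linalg.inv(s2 * R @ XtX_inv @ R.T) @ diff   # ~ chi2[2] (asymptotically)
    print("chi2-form Wald:", W2, "p =", stats.chi2(2).sf(W2))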

Definition 2.6. Let z_1 ~ χ²(p) and z_2 ~ χ²(m) be independent. Then their ratio, with each divided by its respective degrees of freedom, is a random variable with distribution F(p, m):

    (z_1/p) / (z_2/m) ~ F(p, m).

In order to build a test statistic for multiple linear restrictions, first notice that (n − k)s² = û′û, hence

    (n − k)s²/σ² = û′û/σ² ~ χ²_{n−k},    (9)

where σ² in the denominator is used to standardize our distribution. Using the RHS of (9) and (8), and dividing each by its respective degrees of freedom, we obtain

    F = (W/q) / [ (û′û/σ²)/(n − k) ]
      = [ (Rβ̂ − q)′ [σ² R(X′X)⁻¹R′]⁻¹ (Rβ̂ − q) / q ] / [ (û′û/σ²)/(n − k) ]
      = (Rβ̂ − q)′ [s² R(X′X)⁻¹R′]⁻¹ (Rβ̂ − q) / q ~ F(q, n − k),    (10)

since the unknown σ² cancels between numerator and denominator, leaving s² = û′û/(n − k) in its place.
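Continuing the numerical sketch from section 2.3.2 (same simulated data and variables; still illustrative), the F form of (10) is:

    # F statistic of eq. (10), built from the same ingredients as W2 above
    F = diff @ np.linalg.inv(R @ XtX_inv @ R.T) @ diff / (2 * s2)   # q = 2 restrictions
    print("F:", F, "p =", stats.f(2, n - k).sf(F))                  # ~ F(2, n-k) under H0
    # Here F = W2/2, since W2 already used s^2 in place of sigma^2.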

3 Asymptotic theory

Definition 3.1. Convergence in probability. Let X be a r.v. and {X_n} a sequence of random variables, with

    lim_{n→∞} P(|X_n − X| > ε) = 0,   or equivalently   lim_{n→∞} P(|X_n − X| ≤ ε) = 1.

It is denoted X_n →p X.

Definition 3.2. Almost sure convergence. Let X be a r.v. and {X_n} a sequence of random variables, with

    P(lim_{n→∞} |X_n − X| > ε) = 0,   or equivalently   P(sup_{m≥n} |X_m − X| ≤ ε) → 1 as n → ∞.

It is denoted X_n →a.s. X.

Definition 3.3. Convergence in rth mean. Let X be a r.v. and {X_n} a sequence of random variables, with

    E|X_n − X|^r → 0 as n → ∞.

Definition 3.4. A sequence {X_n} is at most of order n^λ in probability, denoted O_p(n^λ), if there exists a non-stochastic O(1) (bounded) sequence {a_n} s.t.

    X_n/n^λ − a_n →p 0,   i.e.   lim_n P(|X_n/n^λ − a_n| > ε) = 0.

Definition 3.5. A sequence {X_n} is of order smaller than n^λ in probability, denoted o_p(n^λ), if

    X_n/n^λ →p 0,   i.e.   lim_n P(|X_n/n^λ| > ε) = 0.

Definition 3.6. A sequence of random variables {X_n} is stochastically bounded, denoted O_p(1), if for any arbitrarily small ε > 0 there exists a finite M s.t. P(|X_n| > M) ≤ ε.

Note 2. See Wooldridge, page 35.
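A small simulation of Definition 3.1 (illustrative, assuming NumPy): the sample mean of i.i.d. Uniform(0, 1) draws converges in probability to μ = 1/2, so the exceedance probability shrinks with n:

    import numpy as np

    rng = np.random.default_rng(7)
    mu, eps, reps = 0.5, 0.05, 1000

    for n in [10, 100, 1000, 10000]:
        xbar = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
        print(n, np.mean(np.abs(xbar - mu) > eps))   # estimates P(|Xbar_n - mu| > eps) -> 0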
