
Lecture 1: Introduction and Review of Prerequisite Concepts

Dr Jay Lee
jay.lee@unsw.edu.au

UNSW

1 / 33
What is Econometrics?

I Ragnar Frisch: “It is the unification of all three [statistics,
economic theory, and mathematics] that is powerful. And it is
this unification that constitutes econometrics.”
I Econometric theory and Applied econometrics

2 / 33
Questions

I What is the causal relationship of interest? e.g., the effect
of class size on children’s test scores
I What would be the appropriate model?
I What econometric techniques could be used?

3 / 33
Econometric Terms and Notation

I Data, dataset, or sample: a set of repeated measurements on a
set of (random) variables.
I Observation: an element of the data, dataset, or sample; it
often corresponds to a specific economic unit.
I Upper case letters: X, Y, Z are random variables (vectors).
I Lower case letters: x, y, z are realizations or specific values
of X, Y, Z.
I Greek letters: α, β, γ, θ, σ are unknown parameters of an
econometric model.

4 / 33
Standard Data Structures

I Cross-section, Time series, Panel


I We focus on cross-section data and assume the data are
independent and identically distributed (iid).
I Random sample: the observations are iid.
I We view an observation as a realization of a random variable.
Complete knowledge of the probabilistic nature of the random
variables is called the population.

5 / 33
Econometric Software

I STATA: easy to use, popular among applied econometricians
I MATLAB, R, GAUSS: more flexible, popular among
theoretical econometricians
I Fortran, C: gains in computational speed, but less popular
among econometricians

6 / 33
Review of Statistics

I A working knowledge of statistics is critical for understanding
econometric methods.


I References:
I Appendix of Hansen’s Econometrics lecture notes
I Casella and Berger (2002), Statistical Inference, Duxbury
I Hogg, Craig, and McKean (2004), Introduction to
Mathematical Statistics, Pearson

7 / 33
Probability Theory

I The set, S, of all possible outcomes of a particular experiment
is called the sample space for the experiment.
I Examples:
1. Coin toss
2. Time remaining in a soccer game
I An event is any collection of possible outcomes of an
experiment, that is, any subset of S (including S itself).

8 / 33
Probability Theory

I A collection of subsets of S is called a σ-field (aka σ-algebra
or Borel field), denoted by B, if it satisfies the following
three properties:
1. ∅ ∈ B,
2. If A ∈ B, then Ac ∈ B,
3. If A1, A2, ... ∈ B, then ∪_{i=1}^∞ Ai ∈ B.
I Example: If S = {1, 2, 3}, then what is B?
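As a quick sanity check (my own sketch, not part of the original slides), the code below enumerates the power set of S = {1, 2, 3} and verifies that it satisfies the three σ-field properties; the helper name power_set is just for illustration.

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, returned as frozensets."""
    s = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))}

S = frozenset({1, 2, 3})
B = power_set(S)  # candidate sigma-field: the power set of S

# 1. The empty set belongs to B.
assert frozenset() in B
# 2. B is closed under complements.
assert all(S - A in B for A in B)
# 3. B is closed under unions; with finitely many sets,
#    checking pairwise unions is enough.
assert all(A | C in B for A in B for C in B)
print(f"The power set of S has {len(B)} elements and is a sigma-field.")
```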

9 / 33
Probability Theory

I (Kolmogorov Axioms) Given S and B, a probability function is a
function P with domain B that satisfies
1. P(A) ≥ 0 for all A ∈ B,
2. P(S) = 1,
3. If A1, A2, ... ∈ B are pairwise disjoint, then
P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai).
I If P is a probability function and A is any set in B, then
1. P(∅) = 0,
2. P(A) ≤ 1,
3. P(Ac) = 1 − P(A).

10 / 33
Probability Theory

I If A and B are events in S, and P(B) > 0, then the conditional
probability of A given B, written P(A|B), is

P(A|B) = P(A ∩ B) / P(B).

I Example: Monty Hall problem (google it by yourself; a
simulation sketch also follows below)
I A collection of events A1, ..., An are mutually independent if
for any subcollection Ai1, ..., Aik, we have

P(∩_{j=1}^k Aij) = ∏_{j=1}^k P(Aij).
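A quick Monte Carlo illustration (my own sketch, not from the slides): simulating the Monty Hall game shows that switching wins about 2/3 of the time, which matches the conditional-probability calculation.

```python
import random

def monty_hall(n_games=100_000):
    """Estimate win probabilities for the 'stay' and 'switch' strategies."""
    stay_wins = switch_wins = 0
    for _ in range(n_games):
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # The host opens a door that is neither the pick nor the car.
        opened = random.choice([d for d in doors if d != pick and d != car])
        switched = next(d for d in doors if d != pick and d != opened)
        stay_wins += (pick == car)
        switch_wins += (switched == car)
    return stay_wins / n_games, switch_wins / n_games

p_stay, p_switch = monty_hall()
print(f"P(win | stay)   = {p_stay:.3f}")    # about 1/3
print(f"P(win | switch) = {p_switch:.3f}")  # about 2/3
```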

11 / 33
Probability Theory

I A random variable is a function from a sample space S into the
real numbers.

Experiment                Random variable
Toss two dice             X = sum of the numbers
Toss a coin 25 times      X = number of heads in 25 tosses
Get a PhD                 X = indicator of having a PhD degree

I Example: Three coin tosses (X = number of heads)
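To make this example concrete (a small sketch of my own, not from the lecture), the code below enumerates the 8 equally likely outcomes of three fair coin tosses and tabulates the pmf of X = number of heads.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# All 2^3 = 8 equally likely outcomes of three coin tosses.
outcomes = list(product("HT", repeat=3))
counts = Counter(seq.count("H") for seq in outcomes)

# pmf of X = number of heads: P(X = x) = (# outcomes with x heads) / 8
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(pmf)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```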

12 / 33
Probability Theory

I The cumulative distribution function (cdf) of a random variable
X, denoted by FX(x), is defined by

FX(x) = PX(X ≤ x), for all x.

I The function F(x) is a cdf if and only if the following three
conditions hold:
1. lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.
2. F(x) is a nondecreasing function of x.
3. F(x) is right-continuous; that is, for every number x0,
lim_{x↓x0} F(x) = F(x0).

13 / 33
Probability Theory

I A random variable X is continuous if FX(x) is a continuous
function of x. A random variable X is discrete if FX(x) is a
step function of x.
I The following two statements are equivalent:
1. The random variables X and Y are identically distributed.
2. FX (x) = FY (x) for every x.

14 / 33
Probability Theory

I The probability mass function (pmf) of a discrete random
variable X is given by

fX(x) = P(X = x) for all x.

I The probability density function (pdf), fX(x), of a continuous
random variable X is the function that satisfies

FX(x) = ∫_{−∞}^{x} fX(t) dt for all x.

I A function fX(x) is a pdf (or pmf) of a random variable X if
and only if
1. fX(x) ≥ 0 for all x.
2. Σ_x fX(x) = 1 (pmf) or ∫_{−∞}^{∞} fX(x) dx = 1 (pdf).
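As an illustration (my own sketch, assuming SciPy is available and using the standard normal purely as an example density), the code checks numerically that the pdf integrates to one and that the cdf is the integral of the pdf up to x.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# The standard normal pdf integrates to 1 over the real line.
total, _ = quad(stats.norm.pdf, -np.inf, np.inf)
print(round(total, 6))  # 1.0

# F_X(x) = integral of f_X from -infinity to x, checked at x = 1.3.
x = 1.3
cdf_by_integration, _ = quad(stats.norm.pdf, -np.inf, x)
print(round(cdf_by_integration, 6), round(stats.norm.cdf(x), 6))  # both approx. 0.9032
```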

15 / 33
Expectations

I The expected value or mean of a random variable X, denoted by
EX, is

EX = ∫_{−∞}^{∞} x · fX(x) dx               if X is continuous,
EX = Σ_x x · fX(x) = Σ_x x · P(X = x)      if X is discrete,

provided that the integral or sum exists.


I The expectation is a linear operator: that is, for any
constants a and b,

E(aX + b) = a · EX + b.
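A small numerical sketch (not from the slides): computing EX for a fair die by the discrete formula and checking the linearity property E(aX + b) = a · EX + b on the same distribution.

```python
from fractions import Fraction

# Fair six-sided die: pmf puts mass 1/6 on each face.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

EX = sum(x * p for x, p in pmf.items())
print(EX)  # 7/2

# Linearity: E(aX + b) = a*EX + b, here with a = 2, b = 3.
a, b = 2, 3
E_aXb = sum((a * x + b) * p for x, p in pmf.items())
print(E_aXb, a * EX + b)  # both equal 10
```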

16 / 33
Expectations

I For each integer n, the nth moment of X is EX^n. The nth
central moment of X is E(X − µ)^n, where µ = EX.
I The variance of a random variable X is its second central
moment, Var(X) = E(X − EX)^2. The positive square root of
Var(X) is the standard deviation of X.
I If X has a finite variance, then for any constants a and b,

Var(aX + b) = a^2 · Var(X).

I Alternative variance formula: Var(X) = EX^2 − (EX)^2.
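Another quick check (my own sketch): simulating a random variable and verifying both Var(aX + b) = a^2 · Var(X) and the alternative formula Var(X) = EX^2 − (EX)^2 on the sample moments.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)  # any distribution works here

a, b = 3.0, -1.0
var_x = x.var()
print(np.allclose((a * x + b).var(), a**2 * var_x))      # True: Var(aX + b) = a^2 Var(X)
print(np.allclose(var_x, (x**2).mean() - x.mean()**2))   # True: Var(X) = EX^2 - (EX)^2
```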

17 / 33
Multiple Random Variables
Joint and Marginal Distributions

I Let (X, Y) be a discrete bivariate random vector. Then the
function f(x, y) from R^2 into R defined by
f(x, y) = P(X = x, Y = y) is called the joint probability mass
function or joint pmf of (X, Y).
I Let (X, Y) be a discrete bivariate random vector with joint
pmf f(x, y). Then the marginal pmfs of X and Y,
fX(x) = P(X = x) and fY(y) = P(Y = y), are given by

fX(x) = Σ_y f(x, y)  and  fY(y) = Σ_x f(x, y).
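To anticipate the dice example on the next slide (my own sketch): for two fair dice, let X be the first roll and Y the sum of both rolls. The code builds the joint pmf and recovers the marginals by summing over the other variable.

```python
from fractions import Fraction
from collections import defaultdict

# Two fair dice: X = first roll, Y = sum of the two rolls.
joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1, d1 + d2)] += Fraction(1, 36)

# Marginals: f_X(x) = sum over y, f_Y(y) = sum over x.
f_X = defaultdict(Fraction)
f_Y = defaultdict(Fraction)
for (x, y), p in joint.items():
    f_X[x] += p
    f_Y[y] += p

print(dict(f_X))       # each face of the first die has probability 1/6
print(f_Y[7], f_Y[2])  # 1/6 and 1/36, the familiar pmf of the sum of two dice
```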

18 / 33
Multiple Random Variables
Joint and Marginal Distributions

I For any real-valued function g(x, y),

Eg(X, Y) = Σ_{(x,y)} g(x, y) f(x, y).

I Example: Joint and marginal pmf for dice (see the sketch above)
I A function f(x, y) from R^2 into R is called a joint probability
density function or joint pdf of the continuous random
vector (X, Y) if, for every A ⊂ R^2,

P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy.

19 / 33
Multiple Random Variables
Joint and Marginal Distributions

I The marginal pdfs of X and Y are given by

fX(x) = ∫_{−∞}^{∞} f(x, y) dy,   −∞ < x < ∞,
fY(y) = ∫_{−∞}^{∞} f(x, y) dx,   −∞ < y < ∞.

I The definitions of joint and marginal distributions for a
bivariate random vector can be easily generalized to
multivariate random vectors.

20 / 33
Multiple Random Variables
Conditional Distributions and Independence

I Let (X, Y) be a discrete bivariate random vector with joint
pmf f(x, y) and marginal pmfs fX(x) and fY(y). For any x
such that P(X = x) = fX(x) > 0, the conditional pmf of Y
given that X = x is the function of y denoted by f(y|x) and
defined by

f(y|x) = P(Y = y|X = x) = f(x, y) / fX(x).

The conditional pmf of X given Y = y is defined similarly.

21 / 33
Multiple Random Variables
Conditional Distributions and Independence

I Let (X, Y) be a continuous bivariate random vector with joint
pdf f(x, y) and marginal pdfs fX(x) and fY(y). For any x such
that fX(x) > 0, the conditional pdf of Y given that X = x is
the function of y denoted by f(y|x) and defined by

f(y|x) = f(x, y) / fX(x).

The conditional pdf of X given Y = y is defined similarly.

22 / 33
Multiple Random Variables
Conditional Distributions and Independence

I If g(Y) is a function of Y, then the conditional expected value
of g(Y) given that X = x is denoted by E(g(Y)|x) and is given by

E(g(Y)|x) = Σ_y g(y) f(y|x)  and  E(g(Y)|x) = ∫_{−∞}^{∞} g(y) f(y|x) dy,

in the discrete and continuous cases, respectively.
I The conditional variance of Y given X = x is given by
Var(Y|x) = E(Y^2|x) − (E(Y|x))^2.
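Continuing the two-dice sketch from above (again my own illustration): with X the first roll and Y the sum, the code computes the conditional pmf f(y|x), the conditional mean E(Y|x), and the conditional variance Var(Y|x) for x = 3.

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of (X, Y): X = first roll, Y = sum of two fair dice.
joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1, d1 + d2)] += Fraction(1, 36)

x = 3
f_X_x = sum(p for (xx, _), p in joint.items() if xx == x)         # marginal fX(3) = 1/6
cond = {y: p / f_X_x for (xx, y), p in joint.items() if xx == x}  # conditional pmf f(y|x)

E_Y_given_x = sum(y * p for y, p in cond.items())
E_Y2_given_x = sum(y**2 * p for y, p in cond.items())
Var_Y_given_x = E_Y2_given_x - E_Y_given_x**2

print(cond)            # uniform on {4, ..., 9}, each value with probability 1/6
print(E_Y_given_x)     # 13/2
print(Var_Y_given_x)   # 35/12
```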

23 / 33
Multiple Random Variables
Conditional Distributions and Independence

I If X and Y are independent, then
1. f(x, y) = fX(x) fY(y).
2. P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B).
3. E(g(X)h(Y)) = Eg(X) · Eh(Y), where g(x) is a function of x
and h(y) is a function of y.

24 / 33
Multiple Random Variables
Conditional Distributions and Independence

I Law of Iterated Expectations: If X and Y are any two random
variables, then

EX = E(E(X|Y)).

I Conditioning Theorem: For any function g(x),

E(g(X) · Y|X) = g(X) · E(Y|X).
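A simulation sketch of the Law of Iterated Expectations (my own example, assuming X = 2Y + noise with Y a discrete group indicator): averaging the within-group means of X over the distribution of Y reproduces the unconditional mean EX.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Y is a discrete group indicator; X depends on Y plus noise.
y = rng.integers(0, 3, size=n)
x = 2.0 * y + rng.normal(size=n)

# E(X|Y = j): average of X within each group j.
cond_means = np.array([x[y == j].mean() for j in range(3)])
p_y = np.array([(y == j).mean() for j in range(3)])

# E(E(X|Y)) = sum_j E(X|Y=j) * P(Y=j) should match EX.
print(cond_means @ p_y, x.mean())  # both approx. 2.0
```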

25 / 33
Covariance and Correlation

I The covariance of X and Y is the number defined by

Cov(X, Y) = E((X − EX)(Y − EY)).

I The correlation of X and Y is the number defined by

ρXY = Cov(X, Y) / (σX σY),   |ρXY| ≤ 1,

where σX = √Var(X) and σY = √Var(Y).

26 / 33
Covariance and Correlation

I For any random variable X and Y,

Cov(X, Y) = EXY − EXEY.

I If X and Y are independent random variables, then
Cov(X, Y) = 0 and ρXY = 0.
I For any constants a and b,

Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab · Cov(X, Y).
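A numerical sketch (mine, not from the slides): estimating Cov(X, Y) and ρXY with NumPy on simulated data and checking the variance-of-a-sum identity.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)  # correlated with x by construction

cov_xy = np.cov(x, y, ddof=0)[0, 1]
rho_xy = np.corrcoef(x, y)[0, 1]
print(round(cov_xy, 3), round(rho_xy, 3))  # approx. 0.5 and 0.447

# Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y), here a = 2, b = -1.
a, b = 2.0, -1.0
lhs = (a * x + b * y).var()
rhs = a**2 * x.var() + b**2 * y.var() + 2 * a * b * cov_xy
print(np.allclose(lhs, rhs))  # True
```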

27 / 33
Matrix Algebra: Notation
I A scalar a is a single number.
I A vector a is a k × 1 list of numbers:

        | a1 |
        | a2 |
    a = | .  |
        | .  |
        | ak |

I A matrix A is a k × r rectangular array of numbers:

        | a11  a12  · · ·  a1r |
        | a21  a22  · · ·  a2r |
    A = |  .    .          .   | = [ a1  a2  · · ·  ar ],
        | ak1  ak2  · · ·  akr |

where ai, i = 1, ..., r, is a k × 1 column vector.


28 / 33
Matrix Algebra: Notation (cont.)

I Transpose of a matrix: A′ is obtained by flipping A on its
diagonal:

         | a11  a21  · · ·  ak1 |
         | a12  a22  · · ·  ak2 |
    A′ = |  .    .          .   |
         | a1r  a2r  · · ·  akr |

I A matrix is square if k = r.
I A square matrix is symmetric if A = A′.
I A square matrix is diagonal if the off-diagonal elements are
all zero (similarly for upper/lower triangular matrices).

29 / 33
Matrix Algebra: Notation (cont.)

I The identity matrix is a diagonal (thus square) matrix whose
diagonal terms are all ones.
I The k × k identity matrix is denoted as

         | 1  0  · · ·  0 |
         | 0  1  · · ·  0 |
    Ik = | .  .         . |
         | 0  0  · · ·  1 |
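If you want to play with these objects in code, here is a sketch in Python with NumPy (any of the packages on the software slide would do equally well): transposes, a symmetry check, a diagonal matrix, and the identity.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])       # a 2 x 3 matrix

print(A.T.shape)                      # the transpose A' is 3 x 2

S = A @ A.T                           # A A' is square and symmetric
print(np.allclose(S, S.T))            # True: S = S'

D = np.diag([1.0, 2.0, 3.0])          # diagonal matrix built from its diagonal entries
I3 = np.eye(3)                        # the 3 x 3 identity matrix
print(np.allclose(D @ I3, D))         # multiplying by the identity changes nothing
```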

30 / 33
Matrix Addition

I For matrices A and B with the same number of rows and columns,

A + B = (aij + bij),

where aij, bij are elements of A and B, respectively.
I The commutative and associative laws hold:

A + B = B + A
A + (B + C) = (A + B) + C.
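These laws are easy to verify numerically (a tiny NumPy sketch of my own):

```python
import numpy as np

rng = np.random.default_rng(3)
A, B, C = (rng.normal(size=(2, 3)) for _ in range(3))  # three 2 x 3 matrices

print(np.allclose(A + B, B + A))              # commutative law
print(np.allclose(A + (B + C), (A + B) + C))  # associative law
```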

31 / 33
Matrix Multiplication

I For a scalar c, cA = Ac = (aij c).
I For k × 1 vectors a and b,

a′b = a1 b1 + a2 b2 + · · · + ak bk = Σ_{j=1}^k aj bj = b′a.

I Two vectors a and b are orthogonal if a′b = 0.
I To multiply matrices A and B, say A × B, the number of
columns of A should be equal to the number of rows of B.
I AB is then the k × m matrix whose (i, j) element is the inner
product of the ith row of A and the jth column of B:
(AB)ij = Σ_{l=1}^r ail blj, for A of dimension k × r and B of
dimension r × m.

32 / 33
Matrix Multiplication (cont.)

I Not commutative: AB ≠ BA in general.


I Associative and distributive:

A(BC) = (AB)C
A(B + C) = AB + AC

I For the identity matrix and a k × r matrix A,

AIr = A, Ik A = A.
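A closing NumPy sketch (my own illustration): inner products, orthogonality, non-commutativity of matrix products, and multiplication by the identity.

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, -1.0, 0.0])
print(a @ b)  # a'b = 1*2 + 2*(-1) + 2*0 = 0, so a and b are orthogonal

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
print(np.allclose(A @ B, B @ A))       # False: matrix products do not commute in general

C = rng.normal(size=(3, 2))            # a k x r matrix with k = 3, r = 2
print(np.allclose(C @ np.eye(2), C))   # C I_r = C
print(np.allclose(np.eye(3) @ C, C))   # I_k C = C
```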

33 / 33
