
Lecture 1: Introduction and Review of Prerequisite Concepts

Dr Jay Lee
jay.lee@unsw.edu.au

UNSW

1 / 33
What is Econometrics?

I Ragnar Frisch: “It is the unification of all three [statistics,
economic theory, and mathematics] that is powerful. And it is
this unification that constitutes econometrics.”
I Econometric theory and Applied econometrics

2 / 33
Questions

I What is the causal relationship of interest? e.g., the effect
of class size on children’s test scores
I What would be the appropriate model?
I What econometric techniques could be used?

3 / 33
Econometric Terms and Notation

I Data, dataset, or sample: a set of repeated measurements on a
set of (random) variables.
I Observation: an element of the data, dataset, or sample; it
often corresponds to a specific economic unit.
I Upper case letters: X, Y, Z are random variables (vectors).
I Lower case letters: x, y, z are realizations or specific values
of X, Y, Z.
I Greek letters: α, β, γ, θ, σ are unknown parameters of an
econometric model.

4 / 33
Standard Data Structures

I Cross-section, Time series, Panel


I We focus on cross-section data and assume the data are
independent and identically distributed (iid).
I Random sample: the observations are iid.
I We view an observation as a realization of a random variable.
Complete knowledge of the probabilistic nature of the random
variables is called the population.

5 / 33
Econometric Software

I STATA: easy to use, popular among applied econometricians
I MATLAB, R, GAUSS: more flexible, popular among
theoretical econometricians
I Fortran, C: gains in computational speed, but less popular
among econometricians

6 / 33
Review of Statistics

I A working knowledge of statistics is critical for understanding
econometric methods.


I References:
I Appendix of Hansen’s Econometrics lecture notes
I Casella and Berger (2002), Statistical Inference, Duxbury
I Hogg, Craig, and McKean (2004), Introduction to
Mathematical Statistics, Pearson

7 / 33
Probability Theory

I The set, S, of all possible outcomes of a particular experiment
is called the sample space for the experiment.
I Examples:
1. Coin toss
2. Time remaining in a soccer game
I An event is any collection of possible outcomes of an
experiment, that is, any subset of S (including S itself).

8 / 33
Probability Theory

I A collection of subsets of S is called a σ-field (aka σ-algebra
or Borel field), denoted by B, if it satisfies the following
three properties:
1. ∅ ∈ B,
2. If A ∈ B, then Ac ∈ B,
3. If A1, A2, ... ∈ B, then ∪_{i=1}^∞ Ai ∈ B.
I Example: If S = {1, 2, 3}, then what is B?
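As a quick sanity check (my own sketch, not part of the original slides), the code below enumerates the power set of S = {1, 2, 3} and verifies that it satisfies the three σ-field properties; the helper name power_set is just for illustration.

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, returned as frozensets."""
    s = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))}

S = frozenset({1, 2, 3})
B = power_set(S)  # candidate sigma-field: the power set of S

# 1. The empty set belongs to B.
assert frozenset() in B
# 2. B is closed under complements.
assert all(S - A in B for A in B)
# 3. B is closed under unions; with finitely many sets,
#    checking pairwise unions is enough.
assert all(A | C in B for A in B for C in B)
print(f"The power set of S has {len(B)} elements and is a sigma-field.")
```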

9 / 33
Probability Theory

I (Kolmogorov Axioms) Given S and B, a probability function is a
function P with domain B that satisfies
1. P(A) ≥ 0 for all A ∈ B,
2. P(S) = 1,
3. If A1, A2, ... ∈ B are pairwise disjoint, then
P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai).
I If P is a probability function and A is any set in B, then
1. P(∅) = 0,
2. P(A) ≤ 1,
3. P(Ac) = 1 − P(A).

10 / 33
Probability Theory

I If A and B are events in S, and P(B) > 0, then the conditional
probability of A given B, written P(A|B), is

P(A|B) = P(A ∩ B) / P(B).

I Example: Monty Hall problem (google it by yourself; a
simulation sketch also follows below)
I A collection of events A1, ..., An are mutually independent if
for any subcollection Ai1, ..., Aik, we have

P(∩_{j=1}^k Aij) = ∏_{j=1}^k P(Aij).
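A quick Monte Carlo illustration (my own sketch, not from the slides): simulating the Monty Hall game shows that switching wins about 2/3 of the time, which matches the conditional-probability calculation.

```python
import random

def monty_hall(n_games=100_000):
    """Estimate win probabilities for the 'stay' and 'switch' strategies."""
    stay_wins = switch_wins = 0
    for _ in range(n_games):
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # The host opens a door that is neither the pick nor the car.
        opened = random.choice([d for d in doors if d != pick and d != car])
        switched = next(d for d in doors if d != pick and d != opened)
        stay_wins += (pick == car)
        switch_wins += (switched == car)
    return stay_wins / n_games, switch_wins / n_games

p_stay, p_switch = monty_hall()
print(f"P(win | stay)   = {p_stay:.3f}")    # about 1/3
print(f"P(win | switch) = {p_switch:.3f}")  # about 2/3
```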

11 / 33
Probability Theory

I A random variable is a function from a sample space S into the
real numbers.

Experiment                Random variable
Toss two dice             X = sum of the numbers
Toss a coin 25 times      X = number of heads in 25 tosses
Get a PhD                 X = indicator of having a PhD degree

I Example: Three coin tosses (X = number of heads)
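To make this example concrete (a small sketch of my own, not from the lecture), the code below enumerates the 8 equally likely outcomes of three fair coin tosses and tabulates the pmf of X = number of heads.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# All 2^3 = 8 equally likely outcomes of three coin tosses.
outcomes = list(product("HT", repeat=3))
counts = Counter(seq.count("H") for seq in outcomes)

# pmf of X = number of heads: P(X = x) = (# outcomes with x heads) / 8
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(pmf)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```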

12 / 33
Probability Theory

I The cumulative distribution function (cdf) of a random variable
X, denoted by FX(x), is defined by

FX(x) = PX(X ≤ x), for all x.

I The function F(x) is a cdf if and only if the following three
conditions hold:
1. lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.
2. F(x) is a nondecreasing function of x.
3. F(x) is right-continuous; that is, for every number x0,
lim_{x↓x0} F(x) = F(x0).

13 / 33
Probability Theory

I A random variable X is continuous if FX(x) is a continuous
function of x. A random variable X is discrete if FX(x) is a
step function of x.
I The following two statements are equivalent:
1. The random variables X and Y are identically distributed.
2. FX (x) = FY (x) for every x.

14 / 33
Probability Theory

I The probability mass function (pmf) of a discrete random
variable X is given by

fX(x) = P(X = x) for all x.

I The probability density function (pdf), fX(x), of a continuous
random variable X is the function that satisfies

FX(x) = ∫_{−∞}^{x} fX(t) dt for all x.

I A function fX(x) is a pdf (or pmf) of a random variable X if
and only if
1. fX(x) ≥ 0 for all x.
2. Σ_x fX(x) = 1 (pmf) or ∫_{−∞}^{∞} fX(x) dx = 1 (pdf).
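As an illustration (my own sketch, assuming SciPy is available and using the standard normal purely as an example density), the code checks numerically that the pdf integrates to one and that the cdf is the integral of the pdf up to x.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# The standard normal pdf integrates to 1 over the real line.
total, _ = quad(stats.norm.pdf, -np.inf, np.inf)
print(round(total, 6))  # 1.0

# F_X(x) = integral of f_X from -infinity to x, checked at x = 1.3.
x = 1.3
cdf_by_integration, _ = quad(stats.norm.pdf, -np.inf, x)
print(round(cdf_by_integration, 6), round(stats.norm.cdf(x), 6))  # both approx. 0.9032
```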

15 / 33
Expectations

I The expected value or mean of a random variable X, denoted by
EX, is

EX = ∫_{−∞}^{∞} x · fX(x) dx               if X is continuous,
EX = Σ_x x · fX(x) = Σ_x x · P(X = x)      if X is discrete,

provided that the integral or sum exists.


I The expectation is a linear operator: that is, for any
constants a and b,

E(aX + b) = a · EX + b.
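A small numerical sketch (not from the slides): computing EX for a fair die by the discrete formula and checking the linearity property E(aX + b) = a · EX + b on the same distribution.

```python
from fractions import Fraction

# Fair six-sided die: pmf puts mass 1/6 on each face.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

EX = sum(x * p for x, p in pmf.items())
print(EX)  # 7/2

# Linearity: E(aX + b) = a*EX + b, here with a = 2, b = 3.
a, b = 2, 3
E_aXb = sum((a * x + b) * p for x, p in pmf.items())
print(E_aXb, a * EX + b)  # both equal 10
```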

16 / 33
Expectations

I For each integer n, the nth moment of X is EX^n. The nth
central moment of X is E(X − µ)^n, where µ = EX.
I The variance of a random variable X is its second central
moment, Var(X) = E(X − EX)^2. The positive square root of
Var(X) is the standard deviation of X.
I If X has a finite variance, then for any constants a and b,

Var(aX + b) = a^2 · Var(X).

I Alternative variance formula: Var(X) = EX^2 − (EX)^2.
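Another quick check (my own sketch): simulating a random variable and verifying both Var(aX + b) = a^2 · Var(X) and the alternative formula Var(X) = EX^2 − (EX)^2 on the sample moments.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)  # any distribution works here

a, b = 3.0, -1.0
var_x = x.var()
print(np.allclose((a * x + b).var(), a**2 * var_x))      # True: Var(aX + b) = a^2 Var(X)
print(np.allclose(var_x, (x**2).mean() - x.mean()**2))   # True: Var(X) = EX^2 - (EX)^2
```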

17 / 33
Multiple Random Variables
Joint and Marginal Distributions

I Let (X, Y) be a discrete bivariate random vector. Then the
function f(x, y) from R^2 into R defined by
f(x, y) = P(X = x, Y = y) is called the joint probability mass
function or joint pmf of (X, Y).
I Let (X, Y) be a discrete bivariate random vector with joint
pmf f(x, y). Then the marginal pmfs of X and Y,
fX(x) = P(X = x) and fY(y) = P(Y = y), are given by

fX(x) = Σ_y f(x, y)  and  fY(y) = Σ_x f(x, y).
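To anticipate the dice example on the next slide (my own sketch): for two fair dice, let X be the first roll and Y the sum of both rolls. The code builds the joint pmf and recovers the marginals by summing over the other variable.

```python
from fractions import Fraction
from collections import defaultdict

# Two fair dice: X = first roll, Y = sum of the two rolls.
joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1, d1 + d2)] += Fraction(1, 36)

# Marginals: f_X(x) = sum over y, f_Y(y) = sum over x.
f_X = defaultdict(Fraction)
f_Y = defaultdict(Fraction)
for (x, y), p in joint.items():
    f_X[x] += p
    f_Y[y] += p

print(dict(f_X))       # each face of the first die has probability 1/6
print(f_Y[7], f_Y[2])  # 1/6 and 1/36, the familiar pmf of the sum of two dice
```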

18 / 33
Multiple Random Variables
Joint and Marginal Distributions

I For any real-valued function g(x, y),

Eg(X, Y) = Σ_{(x,y)} g(x, y) f(x, y).

I Example: Joint and marginal pmf for dice (see the sketch above)
I A function f(x, y) from R^2 into R is called a joint probability
density function or joint pdf of the continuous random
vector (X, Y) if, for every A ⊂ R^2,

P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy.

19 / 33
Multiple Random Variables
Joint and Marginal Distributions

I The marginal pdfs of X and Y are given by

fX(x) = ∫_{−∞}^{∞} f(x, y) dy,   −∞ < x < ∞,
fY(y) = ∫_{−∞}^{∞} f(x, y) dx,   −∞ < y < ∞.

I The definitions of joint and marginal distributions for a
bivariate random vector can be easily generalized to
multivariate random vectors.

20 / 33
Multiple Random Variables
Conditional Distributions and Independence

I Let (X, Y) be a discrete bivariate random vector with joint
pmf f(x, y) and marginal pmfs fX(x) and fY(y). For any x
such that P(X = x) = fX(x) > 0, the conditional pmf of Y
given that X = x is the function of y denoted by f(y|x) and
defined by

f(y|x) = P(Y = y|X = x) = f(x, y) / fX(x).

The conditional pmf of X given Y = y is defined similarly.

21 / 33
Multiple Random Variables
Conditional Distributions and Independence

I Let (X, Y) be a continuous bivariate random vector with joint
pdf f(x, y) and marginal pdfs fX(x) and fY(y). For any x such
that fX(x) > 0, the conditional pdf of Y given that X = x is
the function of y denoted by f(y|x) and defined by

f(y|x) = f(x, y) / fX(x).

The conditional pdf of X given Y = y is defined similarly.

22 / 33
Multiple Random Variables
Conditional Distributions and Independence

I If g(Y) is a function of Y, then the conditional expected value
of g(Y) given that X = x is denoted by E(g(Y)|x) and is given by

E(g(Y)|x) = Σ_y g(y) f(y|x)  and  E(g(Y)|x) = ∫_{−∞}^{∞} g(y) f(y|x) dy,

in the discrete and continuous cases, respectively.
I The conditional variance of Y given X = x is given by
Var(Y|x) = E(Y^2|x) − (E(Y|x))^2.
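Continuing the two-dice sketch from above (again my own illustration): with X the first roll and Y the sum, the code computes the conditional pmf f(y|x), the conditional mean E(Y|x), and the conditional variance Var(Y|x) for x = 3.

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of (X, Y): X = first roll, Y = sum of two fair dice.
joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1, d1 + d2)] += Fraction(1, 36)

x = 3
f_X_x = sum(p for (xx, _), p in joint.items() if xx == x)         # marginal fX(3) = 1/6
cond = {y: p / f_X_x for (xx, y), p in joint.items() if xx == x}  # conditional pmf f(y|x)

E_Y_given_x = sum(y * p for y, p in cond.items())
E_Y2_given_x = sum(y**2 * p for y, p in cond.items())
Var_Y_given_x = E_Y2_given_x - E_Y_given_x**2

print(cond)            # uniform on {4, ..., 9}, each value with probability 1/6
print(E_Y_given_x)     # 13/2
print(Var_Y_given_x)   # 35/12
```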

23 / 33
Multiple Random Variables
Conditional Distributions and Independence

I If X and Y are independent, then
1. f(x, y) = fX(x) fY(y).
2. P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B).
3. E(g(X)h(Y)) = Eg(X) · Eh(Y), where g(x) is a function of x
and h(y) is a function of y.

24 / 33
Multiple Random Variables
Conditional Distributions and Independence

I Law of Iterated Expectations: If X and Y are any two random
variables, then

EX = E(E(X|Y)).

I Conditioning Theorem: For any function g(x),

E(g(X) · Y|X) = g(X) · E(Y|X).
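A simulation sketch of the Law of Iterated Expectations (my own example, assuming X = 2Y + noise with Y a discrete group indicator): averaging the within-group means of X over the distribution of Y reproduces the unconditional mean EX.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Y is a discrete group indicator; X depends on Y plus noise.
y = rng.integers(0, 3, size=n)
x = 2.0 * y + rng.normal(size=n)

# E(X|Y = j): average of X within each group j.
cond_means = np.array([x[y == j].mean() for j in range(3)])
p_y = np.array([(y == j).mean() for j in range(3)])

# E(E(X|Y)) = sum_j E(X|Y=j) * P(Y=j) should match EX.
print(cond_means @ p_y, x.mean())  # both approx. 2.0
```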

25 / 33
Covariance and Correlation

I The covariance of X and Y is the number defined by

Cov(X, Y) = E((X − EX)(Y − EY)).

I The correlation of X and Y is the number defined by

ρXY = Cov(X, Y) / (σX σY),   |ρXY| ≤ 1,

where σX = √Var(X) and σY = √Var(Y).

26 / 33
Covariance and Correlation

I For any random variable X and Y,

Cov(X, Y) = EXY − EXEY.

I If X and Y are independent random variables, then
Cov(X, Y) = 0 and ρXY = 0.
I For any constants a and b,

Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab · Cov(X, Y).
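A numerical sketch (mine, not from the slides): estimating Cov(X, Y) and ρXY with NumPy on simulated data and checking the variance-of-a-sum identity.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)  # correlated with x by construction

cov_xy = np.cov(x, y, ddof=0)[0, 1]
rho_xy = np.corrcoef(x, y)[0, 1]
print(round(cov_xy, 3), round(rho_xy, 3))  # approx. 0.5 and 0.447

# Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y), here a = 2, b = -1.
a, b = 2.0, -1.0
lhs = (a * x + b * y).var()
rhs = a**2 * x.var() + b**2 * y.var() + 2 * a * b * cov_xy
print(np.allclose(lhs, rhs))  # True
```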

27 / 33
Matrix Algebra: Notation
I A scalar a is a single number.
I A vector a is a k × 1 list of numbers:

        | a1 |
        | a2 |
    a = | .  |
        | .  |
        | ak |

I A matrix A is a k × r rectangular array of numbers:

        | a11  a12  · · ·  a1r |
        | a21  a22  · · ·  a2r |
    A = |  .    .          .   | = [ a1  a2  · · ·  ar ],
        | ak1  ak2  · · ·  akr |

where ai, i = 1, ..., r, is a k × 1 column vector.


28 / 33
Matrix Algebra: Notation (cont.)

I Transpose of a matrix: A′ is obtained by flipping A on its
diagonal:

         | a11  a21  · · ·  ak1 |
         | a12  a22  · · ·  ak2 |
    A′ = |  .    .          .   |
         | a1r  a2r  · · ·  akr |

I A matrix is square if k = r.
I A square matrix is symmetric if A = A′.
I A square matrix is diagonal if the off-diagonal elements are
all zero (similarly for upper/lower triangular matrices).

29 / 33
Matrix Algebra: Notation (cont.)

I The identity matrix is a diagonal (thus square) matrix whose
diagonal terms are all ones.
I The k × k identity matrix is denoted as

         | 1  0  · · ·  0 |
         | 0  1  · · ·  0 |
    Ik = | .  .         . |
         | 0  0  · · ·  1 |
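If you want to play with these objects in code, here is a sketch in Python with NumPy (any of the packages on the software slide would do equally well): transposes, a symmetry check, a diagonal matrix, and the identity.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])       # a 2 x 3 matrix

print(A.T.shape)                      # the transpose A' is 3 x 2

S = A @ A.T                           # A A' is square and symmetric
print(np.allclose(S, S.T))            # True: S = S'

D = np.diag([1.0, 2.0, 3.0])          # diagonal matrix built from its diagonal entries
I3 = np.eye(3)                        # the 3 x 3 identity matrix
print(np.allclose(D @ I3, D))         # multiplying by the identity changes nothing
```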

30 / 33
Matrix Addition

I For matrices A and B with the same number of rows and columns,

A + B = (aij + bij),

where aij, bij are elements of A and B, respectively.
I The commutative and associative laws hold:

A + B = B + A
A + (B + C) = (A + B) + C.
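These laws are easy to verify numerically (a tiny NumPy sketch of my own):

```python
import numpy as np

rng = np.random.default_rng(3)
A, B, C = (rng.normal(size=(2, 3)) for _ in range(3))  # three 2 x 3 matrices

print(np.allclose(A + B, B + A))              # commutative law
print(np.allclose(A + (B + C), (A + B) + C))  # associative law
```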

31 / 33
Matrix Multiplication

I For a scalar c, cA = Ac = (aij c).
I For k × 1 vectors a and b,

a′b = a1 b1 + a2 b2 + · · · + ak bk = Σ_{j=1}^k aj bj = b′a.

I Two vectors a and b are orthogonal if a′b = 0.
I To multiply matrices A and B, say A × B, the number of
columns of A should be equal to the number of rows of B.
I AB is then the k × m matrix whose (i, j) element is the inner
product of the ith row of A and the jth column of B:
(AB)ij = Σ_{l=1}^r ail blj, for A of dimension k × r and B of
dimension r × m.

32 / 33
Matrix Multiplication (cont.)

I Not commutative: AB ≠ BA in general.


I Associative and distributive:

A(BC) = (AB)C
A(B + C) = AB + AC

I For the identity matrix and a k × r matrix A,

AIr = A, Ik A = A.
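A closing NumPy sketch (my own illustration): inner products, orthogonality, non-commutativity of matrix products, and multiplication by the identity.

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, -1.0, 0.0])
print(a @ b)  # a'b = 1*2 + 2*(-1) + 2*0 = 0, so a and b are orthogonal

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
print(np.allclose(A @ B, B @ A))       # False: matrix products do not commute in general

C = rng.normal(size=(3, 2))            # a k x r matrix with k = 3, r = 2
print(np.allclose(C @ np.eye(2), C))   # C I_r = C
print(np.allclose(np.eye(3) @ C, C))   # I_k C = C
```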

33 / 33
