
Economics 420

Introduction to Econometrics
Professor Woodbury
Fall Semester 2015
Experiments and Basic Statistics
1. The probability framework for statistical
inference
2. Estimation
3. Hypothesis testing
4. Confidence intervals

1. The probability framework for statistical inference
a. Population, random variable, distribution
b. Moments of a distribution: mean, variance, standard deviation
c. Two random variables: joint distributions, covariance, correlation
d. Conditional distributions and conditional means
e. Distribution of a sample of data drawn randomly from a population: Y1, …, Yn

a. Population, random variable, distribution


Population
The group or collection of all possible entities of interest (for
example, school districts)
We will think of populations as infinitely large (∞ is an approximation to "very big")

Random variable Y
Numerical summary of a random outcome: the outcome of a coin toss, the weight of an individual chosen at random from a group of people, a school district's average test score or its STR
The outcome is subject to chance: each possible outcome of a random variable is associated with some probability of occurrence
A random variable may be discrete (finite number of outcomes, like the roll of a die) or continuous (takes on a continuum of values, so any particular value occurs with probability 0)

Population distribution (or probability density function)

Every RV is associated with a probability function: a function that gives the probabilities of the different values of Y that occur in the population
For example, for a discrete RV like a coin toss, the possible
outcomes for the number of heads are y = 0, 1, and the
associated probabilities are 1/2 and 1/2
So the probability density function can be written
f(y) = 1/2, y = 0, 1

In general, for a discrete random variable Y, the probability


distribution can be written:

pj = Pr(Y = yj), for j = 1, 2, 3, ..., k

A probability density function summarizes what we know about


the outcomes of Y:

f(yj) = pj , for j = 1, 2, 3, ..., k

For example, for the roll of a die, this would be:


f(1) = 1/6

f(2) = 1/6

f(3) = 1/6
f(4) = 1/6

f(5) = 1/6

f(6) = 1/6

Or, to be better organized:


Outcome:             Y = 1    Y = 2    Y = 3    Y = 4    Y = 5    Y = 6
Probability f(y):    0.167    0.167    0.167    0.167    0.167    0.167

Note: We will be using mainly discrete random variables (think of a histogram)

Cumulative distribution function


The cumulative probability distribution is the probability that the
random variable is less than or equal to a particular value:

F(y) = Pr(Y ≤ y)
Outcome:                        Y ≤ 1    Y ≤ 2    Y ≤ 3    Y ≤ 4    Y ≤ 5    Y ≤ 6
Cumulative probability F(y):    0.167    0.333    0.50     0.667    0.833    1.0

The cumulative probability is more useful with continuous


distributions than with discrete distributions, but the idea is the
same (more later)
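
As an aside (not part of the original slides), here is a minimal Python sketch that builds the pdf and cdf of the die example above; the names f and F are just illustrative.

# pdf of a fair die: f(y) = 1/6 for y = 1, ..., 6
f = {y: 1/6 for y in range(1, 7)}

# cdf: F(y) = Pr(Y <= y), the running sum of the pdf
def F(y):
    return sum(p for value, p in f.items() if value <= y)

print(F(3))   # 0.5, matching the table above
print(F(6))   # 1.0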

b. Moments of a population distribution: mean, variance, standard deviation
We want to get a better understanding of the distribution of a
random variable
Specifically, what would we like to know?
Where is the distribution centered?
How dispersed is it?
Is it symmetric or skewed?
Does it have a lot of outliers?

mean = expected value (expectation) of Y
     = E(Y)
     = μY

You can think of this as the long-run average value of Y over repeated realizations of Y
This is our measure of central tendency

Key Concept 2.1
For a discrete random variable (one that takes on a finite or discrete set of values, like the outcome of rolling a die: 1, 2, 3, 4, 5, 6), the expected value is

E(Y) = y1 p1 + y2 p2 + ... + yk pk = μY

Moments (continued)

variance = E[(Y − μY)²]
         = σY²
         = measure of the squared spread of the distribution

standard deviation = √variance
                   = σY
                   = measure of the spread of a distribution

These are both measures of dispersion: why might they be interesting?

Again, for a discrete random variable, the variance is

var(Y) = σY² = (y1 − μY)² p1 + (y2 − μY)² p2 + ... + (yk − μY)² pk
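
As a sketch (not from the Pearson slides), the two formulas above can be coded directly; the helper names are made up for illustration.

# E(Y) = sum_j y_j * p_j   and   var(Y) = sum_j (y_j - mu_Y)^2 * p_j
def discrete_mean(values, probs):
    return sum(y * p for y, p in zip(values, probs))

def discrete_var(values, probs):
    mu = discrete_mean(values, probs)
    return sum((y - mu) ** 2 * p for y, p in zip(values, probs))

die_values = [1, 2, 3, 4, 5, 6]
die_probs = [1/6] * 6
print(discrete_mean(die_values, die_probs))   # 3.5
print(discrete_var(die_values, die_probs))    # about 2.92 (= 35/12)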

Exercise
Let Y be the number of heads that occur when two coins are
tossed.
a. What is the probability density function of Y?
b. What is the cumulative distribution of Y?
c. What are the mean, variance, and standard deviation of Y?


a. What is the probability density function of Y?

Outcome (number of heads):   Y = 0    Y = 1    Y = 2
Probability f(y):            0.25     0.50     0.25

b. What is the cumulative distribution of Y?

Outcome (number of heads):       Y ≤ 0    Y ≤ 1    Y ≤ 2
Cumulative probability F(y):     0.25     0.75     1.0

c. What are the mean, variance, and standard deviation of Y?

μY = E(Y) = (0 × 0.25) + (1 × 0.50) + (2 × 0.25) = 1.00

var(Y) = [(0 − 1)² × 0.25] + [(1 − 1)² × 0.50] + [(2 − 1)² × 0.25]
       = (1 × 0.25) + (0 × 0.50) + (1 × 0.25) = 0.50

σY = √var(Y) = √0.50 ≈ 0.707
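
A quick way to check these numbers, and to see the "long-run average over repeated realizations" interpretation of E(Y), is to simulate many two-coin tosses; this sketch assumes NumPy is available and is not part of the original exercise.

import numpy as np

rng = np.random.default_rng(0)
# Y = number of heads in two fair coin tosses, repeated one million times
Y = rng.binomial(n=2, p=0.5, size=1_000_000)

print(Y.mean())        # close to E(Y) = 1.00
print(Y.var(ddof=0))   # close to var(Y) = 0.50
print(Y.std(ddof=0))   # close to sd(Y) = 0.707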

APPENDICES

Notes on continuous distributions

[Figure B.2: The probability that X lies between the points a and b, shown as the area under the density f(x).]

When computing probabilities for continuous random variables, it is easiest to work with the cumulative distribution function (cdf). If X is any random variable, then its cdf is defined for any real number x by

F(x) ≡ P(X ≤ x).   [B.6]

(Cengage Learning, 2013)
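
As an illustration (not from the appendix itself), the cdf gives P(a < X ≤ b) = F(b) − F(a) for a continuous random variable; this sketch assumes SciPy is available and uses the standard normal as an example.

from scipy.stats import norm

# For a standard normal X, P(a < X <= b) = F(b) - F(a)
a, b = -1.0, 1.0
print(norm.cdf(b) - norm.cdf(a))   # about 0.683: the area within one standard deviation of the mean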

Moments (continued)

skewness = E[(Y − μY)³] / σY³

= measure of asymmetry of a distribution

skewness = 0 implies the distribution is symmetric


skewness > 0 implies the distribution has a long right tail (is
skewed right)
skewness < 0 implies the distribution has a long left tail (is
skewed left)

Moments (continued)

kurtosis = E[(Y − μY)⁴] / σY⁴

= measure of mass in the tails
= measure of the probability of large values

The kurtosis of a normal distribution is 3
kurtosis > 3 implies heavy tails (a leptokurtic distribution)
Why might we care about the amount of mass in the tails of a
distribution?
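
A short Python sketch (not in the slides) that computes skewness and kurtosis from these moment formulas for a discrete distribution; the function names are illustrative only.

def skewness(values, probs):
    mu = sum(y * p for y, p in zip(values, probs))
    sd = sum((y - mu) ** 2 * p for y, p in zip(values, probs)) ** 0.5
    # E[(Y - mu)^3] / sigma^3
    return sum((y - mu) ** 3 * p for y, p in zip(values, probs)) / sd ** 3

def kurtosis(values, probs):
    mu = sum(y * p for y, p in zip(values, probs))
    var = sum((y - mu) ** 2 * p for y, p in zip(values, probs))
    # E[(Y - mu)^4] / sigma^4
    return sum((y - mu) ** 4 * p for y, p in zip(values, probs)) / var ** 2

# The two-coin distribution from the earlier exercise: symmetric, so skewness = 0
print(skewness([0, 1, 2], [0.25, 0.50, 0.25]))   # 0.0
print(kurtosis([0, 1, 2], [0.25, 0.50, 0.25]))   # 2.0 (lighter tails than the normal's 3)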


Some examples

Both of these distributions are symmetric (skewness = 0)


But one is normal, whereas the other has thick tails


Both of these distributions are skewed


Which is skewed right? Which is skewed left?


c. Two random variables: joint distributions, covariance, correlation

Random variables X and Y have a joint distribution


The covariance between X and Y is

cov(X,Y) = E[(X − μX)(Y − μY)] = σXY

The covariance is a measure of the linear association between X and Y
Its units are (units of X) × (units of Y), which has no natural interpretation (that's why we have the correlation coefficient)
cov(X,Y) > 0 means a positive relation between X and Y
If X and Y are independently distributed, then cov(X,Y) = 0 (but not vice versa!)

The estimator of the covariance is

σ̂XY = Σi (Xi − X̄)(Yi − Ȳ) / (n − 1),  where the sum runs over i = 1, …, n
This is called an estimator because it is a formula that we use to obtain an estimate of a parameter
A parameter is a population statistic, which in general we do not know; that is why we take a random sample of the population and try to learn (or infer) something about that population
In this case, the parameter we are trying to estimate is

σXY = cov(X,Y) = E[(X − μX)(Y − μY)]

and σ̂XY is the estimator (or formula)
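
A minimal Python version of this estimator (an illustration, not code from the course; the data below are made up):

def sample_cov(x, y):
    # s_XY = sum_i (X_i - Xbar)(Y_i - Ybar) / (n - 1)
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical (STR, test score) pairs for five districts
str_ = [15, 18, 20, 22, 25]
score = [690, 680, 665, 655, 640]
print(sample_cov(str_, score))   # -75.0: negative, as for the CA districts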

Why (and how) does this estimator work?


Think about what you are doing when you calculate a sample covariance using the estimator σ̂XY

You are calculating the product of deviations from the mean



(Xi − X̄)(Yi − Ȳ)

for each observation in the sample, then taking the average of


those products
In the figure, the products for observations in quadrants I and III are always positive, whereas the products for observations in quadrants II and IV are always negative (see the two examples annotated in the figure)


So for data with a joint distribution like the one in the graph, the negative products outweigh the positive products, and σ̂XY will be negative (which makes sense)

Illustration of how σ̂XY is calculated

[Figure: scatterplot of Test score (vertical axis) against Student-teacher ratio (horizontal axis), divided into quadrants at the mean student-teacher ratio (20) and the mean test score (660).]

For a district with STR = 21 and test score = 670 (both above their means):
(Xi − X̄)(Yi − Ȳ) = (21 − 20)(670 − 660) = (1)(10) = 10 > 0

For a district with STR = 25 and test score = 620 (STR above its mean, test score below its mean):
(Xi − X̄)(Yi − Ȳ) = (25 − 20)(620 − 660) = (5)(−40) = −200 < 0

For the CA school districts, the covariance between Test Score


and STR is slightly negative


The correlation coefficient is defined using the covariance:



corr(X,Y) = cov(X,Y) / √[var(X) var(Y)] = σXY / (σX σY) = rXY

−1 ≤ corr(X,Y) ≤ 1
corr(X,Y) = +1 means perfect positive linear association
corr(X,Y) = −1 means perfect negative linear association
corr(X,Y) = 0 means no linear association

Nice because, unlike the covariance, corr(X,Y) is unit free

For CA school districts, the correlation between Test Score and STR is about −0.2
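
A sketch (again not from the slides, reusing the same made-up district data as before) of computing the correlation from the covariance and checking that it is unit free; it assumes NumPy is available.

import numpy as np

str_ = np.array([15, 18, 20, 22, 25])          # hypothetical student-teacher ratios
score = np.array([690, 680, 665, 655, 640])    # hypothetical test scores

# r_XY = s_XY / (s_X * s_Y)
s_xy = np.cov(str_, score, ddof=1)[0, 1]
r = s_xy / (str_.std(ddof=1) * score.std(ddof=1))
print(r)                                       # about -0.99 for these made-up points

# Unit free: rescaling X (say, measuring STR in tens) leaves the correlation unchanged
print(np.corrcoef(str_ / 10, score)[0, 1])     # same value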

Note that the correlation coefficient measures linear association


Footnote
The covariance of a random variable with itself is its variance:

cov(X,X) = E[(X − μX)(X − μX)] = E[(X − μX)²] = σX²

d. Conditional distributions and conditional means


Conditional distributions

The distribution of Y, given value(s) of some other random


variable, X
For example, the distribution of test scores, given that (or
conditional on) STR < 20


Conditional expectations (or means)

conditional mean = the mean of a conditional distribution


= E(Y|X = x),
where x is a specific value of X
This is an especially important concept, and the notation is
also important!
Example: E(Test scores|STR < 20) = the mean of test scores
among districts with small class sizes
This is read "the expected value (or expectation) of test scores, conditional on the STR being less than 20"


The conditional mean is simply a (perhaps new) term for the


familiar idea of the group mean
Conditional moments

conditional variance = the variance of a conditional


distribution
etc.


A difference in means is typically the difference


between the means of two conditional distributions
= E(Test scores|STR < 20) − E(Test scores|STR ≥ 20)
Other examples of conditional means (remember, E(Y|X = x)):
Wages of all female workers (Y = wages, X = gender)
Mortality rate of those given an experimental treatment (Y = died or not, X = treatment)
If E(X|Z) is constant, then the mean of X does not vary with Z, so corr(X,Z) = 0 (but not necessarily the other way around)
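
A small Python sketch of a difference in conditional means (hypothetical data, not the CA data set):

import numpy as np

# Hypothetical district-level data
str_ = np.array([17, 18, 19, 21, 23, 25])          # student-teacher ratios
score = np.array([672, 668, 665, 655, 650, 645])   # test scores

small = str_ < 20
mean_small = score[small].mean()    # estimate of E(Test score | STR < 20)
mean_large = score[~small].mean()   # estimate of E(Test score | STR >= 20)
print(mean_small - mean_large)      # the difference in (conditional) means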


e. Distribution of a sample of data drawn randomly from a population: Y1, …, Yn
We will assume simple random sampling
Choose an individual (district, entity) at random from the
population
Randomness and data
Before sample selection, the value of Y is random because the
individual selected is random
Once the individual is selected and the value of Y is observed,
then Y is just a number no longer random
The data set is (Y1, Y2, …, Yn), where Yi = value of Y for the ith individual (district, entity, unit) sampled

Distribution of Y1, …, Yn under simple random sampling


Because individuals #1 and #2 are selected at random, the
value of Y1 has no information content for Y2
It follows that:
Y1 and Y2 are independently distributed
Y1 and Y2 come from the same distribution; that is, Y1 and Y2 are identically distributed
So under simple random sampling, Y1 and Y2 are independently and identically distributed (i.i.d. for short)
More generally, under simple random sampling, {Yi}, i = 1, …, n, are i.i.d.
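
A sketch of this sampling scheme in Python (illustrative only): drawing with replacement from a large population makes the draws independent and identically distributed, which matches the i.i.d. framework above.

import numpy as np

rng = np.random.default_rng(42)
# A large "population" of district test scores (made-up distribution)
population = rng.normal(loc=660, scale=20, size=100_000)

# Simple random sample of n = 50: each Y_i is an independent draw from the same population
n = 50
sample = rng.choice(population, size=n, replace=True)

print(sample.mean())   # the sample mean, an estimate of the population mean (about 660)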


This framework allows rigorous statistical inferences about


moments of population distributions using a sample of data from
that population

