
Economics 420

Introduction to Econometrics
Professor Woodbury
Fall Semester 2015
Experiments and Basic Statistics
1. The probability framework for statistical
inference
2. Estimation
3. Hypothesis testing
4. Confidence intervals

1. The probability framework for statistical inference
a. Population, random variable, distribution
b. Moments of a distribution: mean, variance, standard deviation
c. Two random variables: joint distributions, covariance, correlation
d. Conditional distributions and conditional means
e. Distribution of a sample of data drawn randomly from a population: Y1, …, Yn

a. Population, random variable, distribution


Population
The group or collection of all possible entities of interest (for
example, school districts)
We will think of populations as infinitely large (∞ is an approximation to "very big")

Random variable Y
Numerical summary of a random outcome: the outcome of a coin toss, the weight of an individual chosen at random from a group of people, a school district's average test score or its STR
The outcome is subject to chance: each possible outcome of a random variable is associated with some probability of occurrence
A random variable may be discrete (finite number of outcomes, like the roll of a die) or continuous (takes on a continuum of values, so any particular value occurs with probability 0)

Population distribution (or probability density function)

Every RV is associated with a probability function: a function that gives the probabilities of the different values of Y that occur in the population
For example, for a discrete RV like a coin toss, the possible
outcomes for the number of heads are y = 0, 1, and the
associated probabilities are 1/2 and 1/2
So the probability density function can be written
f(y) = 1/2, y = 0, 1

In general, for a discrete random variable Y, the probability


distribution can be written:

pj = Pr(Y = yj), for j = 1, 2, 3, ..., k

A probability density function summarizes what we know about


the outcomes of Y:

f(yj) = pj , for j = 1, 2, 3, ..., k

For example, for the roll of a die, this would be:


f(1) = 1/6

f(2) = 1/6

f(3) = 1/6
f(4) = 1/6

f(5) = 1/6

f(6) = 1/6

Or, to be better organized:


Outcome:             Y = 1    Y = 2    Y = 3    Y = 4    Y = 5    Y = 6
Probability f(y):    0.167    0.167    0.167    0.167    0.167    0.167

Note: We will be using mainly discrete random variables (think of a histogram)

Cumulative distribution function


The cumulative probability distribution is the probability that the
random variable is less than or equal to a particular value:

F(y) = Pr(Y ≤ y)
Outcome:                        Y ≤ 1    Y ≤ 2    Y ≤ 3    Y ≤ 4    Y ≤ 5    Y ≤ 6
Cumulative probability F(y):    0.167    0.333    0.50     0.667    0.833    1.0

The cumulative probability is more useful with continuous


distributions than with discrete distributions, but the idea is the
same (more later)
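
As an aside (not part of the original slides), here is a minimal Python sketch that builds the pdf and cdf of the die example above; the names f and F are just illustrative.

# pdf of a fair die: f(y) = 1/6 for y = 1, ..., 6
f = {y: 1/6 for y in range(1, 7)}

# cdf: F(y) = Pr(Y <= y), the running sum of the pdf
def F(y):
    return sum(p for value, p in f.items() if value <= y)

print(F(3))   # 0.5, matching the table above
print(F(6))   # 1.0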

b. Moments of a population distribution: mean, variance, standard deviation
We want to get a better understanding of the distribution of a
random variable
Specifically, what would we like to know?
Where is the distribution centered?
How dispersed is it?
Is it symmetric or skewed?
Does it have a lot of outliers?

mean = expected value (expectation) of Y
     = E(Y)
     = μY

You can think of this as the long-run average value of Y over repeated realizations of Y
This is our measure of central tendency

Key Concept 2.1
For a discrete random variable (one that takes on a finite or discrete set of values, like the outcome of rolling a die: 1, 2, 3, 4, 5, 6), the expected value is

E(Y) = y1 p1 + y2 p2 + ... + yk pk = μY

Moments (continued)

variance = E[(Y − μY)²]
         = σY²
         = measure of the squared spread of the distribution

standard deviation = √variance
                   = σY
                   = measure of the spread of a distribution

These are both measures of dispersion: why might they be interesting?

Again, for a discrete random variable, the variance is

var(Y) = σY² = (y1 − μY)² p1 + (y2 − μY)² p2 + ... + (yk − μY)² pk
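
As a sketch (not from the Pearson slides), the two formulas above can be coded directly; the helper names are made up for illustration.

# E(Y) = sum_j y_j * p_j   and   var(Y) = sum_j (y_j - mu_Y)^2 * p_j
def discrete_mean(values, probs):
    return sum(y * p for y, p in zip(values, probs))

def discrete_var(values, probs):
    mu = discrete_mean(values, probs)
    return sum((y - mu) ** 2 * p for y, p in zip(values, probs))

die_values = [1, 2, 3, 4, 5, 6]
die_probs = [1/6] * 6
print(discrete_mean(die_values, die_probs))   # 3.5
print(discrete_var(die_values, die_probs))    # about 2.92 (= 35/12)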

Exercise
Let Y be the number of heads that occur when two coins are
tossed.
a. What is the probability density function of Y?
b. What is the cumulative distribution of Y?
c. What are the mean, variance, and standard deviation of Y?


a. What is the probability density function of Y?

Outcome (number of heads):   Y = 0    Y = 1    Y = 2
Probability f(y):            0.25     0.50     0.25

b. What is the cumulative distribution of Y?

Outcome (number of heads):       Y ≤ 0    Y ≤ 1    Y ≤ 2
Cumulative probability F(y):     0.25     0.75     1.0

c. What are the mean, variance, and standard deviation of Y?

μY = E(Y) = (0 × 0.25) + (1 × 0.50) + (2 × 0.25) = 1.00

var(Y) = [(0 − 1)² × 0.25] + [(1 − 1)² × 0.50] + [(2 − 1)² × 0.25]
       = (1 × 0.25) + (0 × 0.50) + (1 × 0.25) = 0.50

σY = √var(Y) = √0.50 ≈ 0.707
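
A quick way to check these numbers, and to see the "long-run average over repeated realizations" interpretation of E(Y), is to simulate many two-coin tosses; this sketch assumes NumPy is available and is not part of the original exercise.

import numpy as np

rng = np.random.default_rng(0)
# Y = number of heads in two fair coin tosses, repeated one million times
Y = rng.binomial(n=2, p=0.5, size=1_000_000)

print(Y.mean())        # close to E(Y) = 1.00
print(Y.var(ddof=0))   # close to var(Y) = 0.50
print(Y.std(ddof=0))   # close to sd(Y) = 0.707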

APPENDICES

Notes on continuous distributions

[Figure B.2: The probability that X lies between the points a and b, shown as the area under the density f(x).]

When computing probabilities for continuous random variables, it is easiest to work with the cumulative distribution function (cdf). If X is any random variable, then its cdf is defined for any real number x by

F(x) ≡ P(X ≤ x).   [B.6]

(Cengage Learning, 2013)
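
As an illustration (not from the appendix itself), the cdf gives P(a < X ≤ b) = F(b) − F(a) for a continuous random variable; this sketch assumes SciPy is available and uses the standard normal as an example.

from scipy.stats import norm

# For a standard normal X, P(a < X <= b) = F(b) - F(a)
a, b = -1.0, 1.0
print(norm.cdf(b) - norm.cdf(a))   # about 0.683: the area within one standard deviation of the mean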

Moments (continued)

skewness = E[(Y − μY)³] / σY³

= measure of asymmetry of a distribution

skewness = 0 implies the distribution is symmetric


skewness > 0 implies the distribution has a long right tail (is
skewed right)
skewness < 0 implies the distribution has a long left tail (is
skewed left)

Moments (continued)

kurtosis = E[(Y − μY)⁴] / σY⁴

= measure of mass in the tails
= measure of the probability of large values

The kurtosis of a normal distribution is 3
kurtosis > 3 implies heavy tails (a leptokurtic distribution)
Why might we care about the amount of mass in the tails of a
distribution?
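
A short Python sketch (not in the slides) that computes skewness and kurtosis from these moment formulas for a discrete distribution; the function names are illustrative only.

def skewness(values, probs):
    mu = sum(y * p for y, p in zip(values, probs))
    sd = sum((y - mu) ** 2 * p for y, p in zip(values, probs)) ** 0.5
    # E[(Y - mu)^3] / sigma^3
    return sum((y - mu) ** 3 * p for y, p in zip(values, probs)) / sd ** 3

def kurtosis(values, probs):
    mu = sum(y * p for y, p in zip(values, probs))
    var = sum((y - mu) ** 2 * p for y, p in zip(values, probs))
    # E[(Y - mu)^4] / sigma^4
    return sum((y - mu) ** 4 * p for y, p in zip(values, probs)) / var ** 2

# The two-coin distribution from the earlier exercise: symmetric, so skewness = 0
print(skewness([0, 1, 2], [0.25, 0.50, 0.25]))   # 0.0
print(kurtosis([0, 1, 2], [0.25, 0.50, 0.25]))   # 2.0 (lighter tails than the normal's 3)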


Some examples

Both of these distributions are symmetric (skewness = 0)


But one is normal, whereas the other has thick tails


Both of these distributions are skewed


Which is skewed right? Which is skewed left?


c. Two random variables: joint distributions, covariance, correlation

Random variables X and Y have a joint distribution


The covariance between X and Y is

cov(X,Y) = E[(X − μX)(Y − μY)] = σXY

The covariance is a measure of the linear association between X and Y
Its units are (units of X) × (units of Y), which has no natural interpretation (that's why we have the correlation coefficient)
cov(X,Y) > 0 means a positive relation between X and Y
If X and Y are independently distributed, then cov(X,Y) = 0 (but not vice versa!)

The estimator of the covariance is

σ̂XY = Σi (Xi − X̄)(Yi − Ȳ) / (n − 1),  where the sum runs over i = 1, …, n
This is called an estimator because it is a formula that we use to obtain an estimate of a parameter
A parameter is a population statistic, which in general we do not know; that is why we take a random sample of the population and try to learn (or infer) something about that population
In this case, the parameter we are trying to estimate is

σXY = cov(X,Y) = E[(X − μX)(Y − μY)]

and σ̂XY is the estimator (or formula)
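
A minimal Python version of this estimator (an illustration, not code from the course; the data below are made up):

def sample_cov(x, y):
    # s_XY = sum_i (X_i - Xbar)(Y_i - Ybar) / (n - 1)
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical (STR, test score) pairs for five districts
str_ = [15, 18, 20, 22, 25]
score = [690, 680, 665, 655, 640]
print(sample_cov(str_, score))   # -75.0: negative, as for the CA districts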

Why (and how) does this estimator work?


Think about what you are doing when you calculate a sample covariance using the estimator σ̂XY

You are calculating the product of deviations from the mean



(Xi − X̄)(Yi − Ȳ)

for each observation in the sample, then taking the average of


those products
In the figure, the products for observations in quadrants I and III are always positive, whereas the products for observations in quadrants II and IV are always negative (see the two examples annotated in the figure)


So for data with a joint distribution like the one in the graph, the negative products outweigh the positive products, and σ̂XY will be negative (which makes sense)

Illustration of how σ̂XY is calculated

[Figure: scatterplot of Test score (vertical axis) against Student-teacher ratio (horizontal axis), divided into quadrants at the mean student-teacher ratio (20) and the mean test score (660).]

For a district with STR = 21 and test score = 670 (both above their means):
(Xi − X̄)(Yi − Ȳ) = (21 − 20)(670 − 660) = (1)(10) = 10 > 0

For a district with STR = 25 and test score = 620 (STR above its mean, test score below its mean):
(Xi − X̄)(Yi − Ȳ) = (25 − 20)(620 − 660) = (5)(−40) = −200 < 0

For the CA school districts, the covariance between Test Score


and STR is slightly negative


The correlation coefficient is defined using the covariance:



corr(X,Y) = cov(X,Y) / √[var(X) var(Y)] = σXY / (σX σY) = rXY

−1 ≤ corr(X,Y) ≤ 1
corr(X,Y) = +1 means perfect positive linear association
corr(X,Y) = −1 means perfect negative linear association
corr(X,Y) = 0 means no linear association

Nice because, unlike the covariance, corr(X,Y) is unit free

For CA school districts, the correlation between Test Score and STR is about −0.2
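
A sketch (again not from the slides, reusing the same made-up district data as before) of computing the correlation from the covariance and checking that it is unit free; it assumes NumPy is available.

import numpy as np

str_ = np.array([15, 18, 20, 22, 25])          # hypothetical student-teacher ratios
score = np.array([690, 680, 665, 655, 640])    # hypothetical test scores

# r_XY = s_XY / (s_X * s_Y)
s_xy = np.cov(str_, score, ddof=1)[0, 1]
r = s_xy / (str_.std(ddof=1) * score.std(ddof=1))
print(r)                                       # about -0.99 for these made-up points

# Unit free: rescaling X (say, measuring STR in tens) leaves the correlation unchanged
print(np.corrcoef(str_ / 10, score)[0, 1])     # same value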

Note that the correlation coefficient measures linear association


Footnote
The covariance of a random variable with itself is its variance:

cov(X,X) = E[(X − μX)(X − μX)] = E[(X − μX)²] = σX²

d. Conditional distributions and conditional means


Conditional distributions

The distribution of Y, given value(s) of some other random


variable, X
For example, the distribution of test scores, given that (or
conditional on) STR < 20


Conditional expectations (or means)

conditional mean = the mean of a conditional distribution


= E(Y|X = x),
where x is a specific value of X
This is an especially important concept, and the notation is
also important!
Example: E(Test scores|STR < 20) = the mean of test scores
among districts with small class sizes
This is read "the expected value (or expectation) of test scores, conditional on the STR being less than 20"


The conditional mean is simply a (perhaps new) term for the


familiar idea of the group mean
Conditional moments

conditional variance = the variance of a conditional


distribution
etc.


A difference in means is typically the difference


between the means of two conditional distributions
= E(Test scores|STR < 20) − E(Test scores|STR ≥ 20)
Other examples of conditional means (remember, E(Y|X = x)):
Wages of all female workers (Y = wages, X = gender)
Mortality rate of those given an experimental treatment (Y = died or not, X = treatment)
If E(X|Z) is constant, then the mean of X does not vary with Z, so corr(X,Z) = 0 (but not necessarily the other way around)
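
A small Python sketch of a difference in conditional means (hypothetical data, not the CA data set):

import numpy as np

# Hypothetical district-level data
str_ = np.array([17, 18, 19, 21, 23, 25])          # student-teacher ratios
score = np.array([672, 668, 665, 655, 650, 645])   # test scores

small = str_ < 20
mean_small = score[small].mean()    # estimate of E(Test score | STR < 20)
mean_large = score[~small].mean()   # estimate of E(Test score | STR >= 20)
print(mean_small - mean_large)      # the difference in (conditional) means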


e. Distribution of a sample of data drawn randomly from a population: Y1, …, Yn
We will assume simple random sampling
Choose an individual (district, entity) at random from the
population
Randomness and data
Before sample selection, the value of Y is random because the
individual selected is random
Once the individual is selected and the value of Y is observed,
then Y is just a number no longer random
The data set is (Y1, Y2, …, Yn), where Yi = value of Y for the ith individual (district, entity, unit) sampled

Distribution of Y1, …, Yn under simple random sampling


Because individuals #1 and #2 are selected at random, the
value of Y1 has no information content for Y2
It follows that:
Y1 and Y2 are independently distributed
Y1 and Y2 come from the same distribution; that is, Y1 and Y2 are identically distributed
So under simple random sampling, Y1 and Y2 are independently and identically distributed (i.i.d. for short)
More generally, under simple random sampling, {Yi}, i = 1, …, n, are i.i.d.
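
A sketch of this sampling scheme in Python (illustrative only): drawing with replacement from a large population makes the draws independent and identically distributed, which matches the i.i.d. framework above.

import numpy as np

rng = np.random.default_rng(42)
# A large "population" of district test scores (made-up distribution)
population = rng.normal(loc=660, scale=20, size=100_000)

# Simple random sample of n = 50: each Y_i is an independent draw from the same population
n = 50
sample = rng.choice(population, size=n, replace=True)

print(sample.mean())   # the sample mean, an estimate of the population mean (about 660)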


This framework allows rigorous statistical inferences about


moments of population distributions using a sample of data from
that population

