Review notes, Econometrics 2010-11.

Notes on probability theory and statistics: For use in Econometrics


Econometrics relies heavily on the probability theory and statistics you learned in Advanced Statistics or Further Statistics, and it is important that you feel comfortable with that material. These notes are a rough guideline to help you review the material from Advanced Statistics or Further Statistics. They are written as lecture notes and build on Wooldridge, J.M. (2009), Introductory Econometrics: A Modern Approach, Appendices B and C. There is also a set of exercises to help you with your revision, named Review Exercises, on the course website. Sketch answers to these exercises will be released on the course website later in the term. Since you have no tutorials for the first couple of weeks of the course, the idea is that you use the time you would have spent preparing for tutorials to do this revision.

I Random Variables and Distributions of Random Variables

Random Variable: Definition. A random variable, X, is a mapping that assigns numerical values to events in a sample space.

Example 1: Flip a coin twice and count the number of heads. Sample space = {(H,H), (H,T), (T,H), (T,T)}. X = number of heads. Possible outcomes of X: {0, 1, 2}. X(H,H) = 2, X(H,T) = 1, X(T,H) = 1, X(T,T) = 0. Before flipping the coin we do not know the outcome, and another trial of the experiment can produce a different outcome. This is the case for all random variables: before the event that the random variable describes takes place, we do not know the outcome, and another trial can produce a different outcome.

Example 2: Y indicates whether a person is unemployed. Sample space = {yes, no}. Possible outcomes: Y(yes) = 1, Y(no) = 0. Suppose you have a sample of 10 people. Each person is then to be considered a trial of the experiment of assigning values to Y. Before you ask, you don't know whether they are unemployed or not. You ask the first person, and the answer is yes, hence Y = 1 for that person. Then you repeat the trial and ask the next person, whose answer is no, hence Y = 0 for that second person, and so on.

Example 3: Z = education level in years. Sample space = {9, 10, 12, 13, 17, 18}. Possible outcomes: {9, 10, 12, 13, 17, 18}.

Example 4: U = number of pints drunk by a student in a week. Sample space = {0, 1, 2, 3, …}. Possible outcomes: U = 0, U = 1, U = 2, etc.

Example 5: V = earned income in a given month. Sample space = [0, ∞). Possible outcomes: V = 600, V = 250, etc.

In the examples above, X, Y, Z and U are discrete random variables; Y is also binary. V is a continuous random variable.
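The following minimal sketch (in Python with numpy, which these notes otherwise do not assume) simulates Example 1: each trial produces one realisation of X, the outcome varies from trial to trial, and outcome frequencies across many trials settle near fixed probabilities.

```python
# A minimal simulation sketch of Example 1: each run of the experiment
# ("trial") can produce a different outcome of X.
import numpy as np

rng = np.random.default_rng(seed=1)

# One trial: flip a fair coin twice, X = number of heads.
flips = rng.integers(0, 2, size=2)   # 1 = heads, 0 = tails
X = flips.sum()
print("outcome of X in this trial:", X)

# Repeat the trial many times: the outcomes vary across trials.
many_X = rng.integers(0, 2, size=(10_000, 2)).sum(axis=1)
for x in (0, 1, 2):
    print(f"Pr(X={x}) is roughly", (many_X == x).mean())  # ~0.25, 0.50, 0.25
```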

Distribution of a Random Variable: The discrete case

Let X be a discrete random variable with possible realisations {x_1, x_2, …, x_k}. How likely is a particular realisation of X? The answer is given by the probability distribution function (the pdf):

$$\Pr(X = x_j) = p_j, \quad j = 1, \dots, k.$$

The pdf describes how probabilities are distributed across all possible outcomes of X.

Example: X = number of spots showing when a die is rolled once. Possible realisations: 1, 2, 3, 4, 5, 6, with Pr(X = j) = 1/6, j = 1, 2, …, 6.
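As an illustrative sketch (Python; hypothetical code, not part of the course material), the pdf of the dice example can be stored as a simple mapping from realisations to probabilities:

```python
# The pmf of the dice example as a dictionary mapping each possible
# realisation x_j to its probability p_j.
pmf = {x: 1 / 6 for x in range(1, 7)}   # Pr(X = j) = 1/6, j = 1,...,6

assert abs(sum(pmf.values()) - 1.0) < 1e-12   # probabilities sum to one

# The probability of an event is the sum of the pmf over the outcomes in it:
pr_even = sum(p for x, p in pmf.items() if x % 2 == 0)
print("Pr(X even) =", pr_even)   # 0.5
```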

Distribution of a Random Variable: The continuous case

Let X be a continuous random variable. Remember that Pr(X = x) = 0 for all x when X is continuous, hence there is no function that gives Pr(X = x) for some number x. Instead, we work with the probability that X lies in a certain range, say a < X < b, where a and b are some constants. The cumulative distribution function (cdf) is defined by

$$F_X(x) \equiv F(x) = \Pr(X \le x) \quad \text{for all } x.$$

The Fundamental Theorem of Calculus implies that if

$$f_X(x) \equiv f(x) = \frac{dF(x)}{dx},$$

then f is the probability density function of X, with

$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad F(b) - F(a) = \int_{a}^{b} f(x)\,dx.$$
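A quick numeric check of this cdf/pdf relation, sketched in Python under the assumption that scipy is available, using the standard normal density as a concrete f:

```python
# Integrating the density from a to b should reproduce F(b) - F(a).
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1.0, 2.0
integral, _err = quad(norm.pdf, a, b)   # numerically computes the integral of f from a to b
print(integral)                         # ~0.8186
print(norm.cdf(b) - norm.cdf(a))        # the same number, obtained via the cdf
```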

Examples of continuous distributions

The normal distribution: The normal distribution and those derived from it are the most widely used distributions in statistics and econometrics, because they are easy to work with and many real-life random variables have distributions that are well approximated by the normal distribution; one example is ln(income), see p. 738 in Wooldridge for more examples. The pdf of the normal distribution is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty.$$

The standard normal distribution:

$$f(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right), \quad -\infty < z < \infty.$$

The chi-square distribution: Let Z_i, i = 1, …, n, be independent standard normal random variables. Define the random variable X as

$$X = \sum_{i=1}^{n} Z_i^2,$$

i.e. the sum of the squared Z_i. Then X has a chi-square distribution with n degrees of freedom.
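A small simulation sketch (Python with numpy; illustrative only) of this construction: the sum of n squared independent standard normals should have mean n and variance 2n, matching the chi-square distribution with n degrees of freedom.

```python
# Empirical mean and variance of the sum of n squared standard normals.
import numpy as np

rng = np.random.default_rng(seed=2)
n, reps = 5, 100_000
Z = rng.standard_normal(size=(reps, n))
X = (Z ** 2).sum(axis=1)          # X = sum of Z_i^2, one draw per row

print(X.mean(), X.var())          # roughly 5 and 10, i.e. n and 2n
```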

The t distribution: Let Z have a standard normal distribution and let X have a chi-square distribution with n degrees of freedom. Assume that Z and X are independent. Then T, defined by

$$T = \frac{Z}{\sqrt{X/n}},$$

has a t distribution with n degrees of freedom.

The F distribution: Let X and Y be chi-square distributed random variables with k and m degrees of freedom, respectively. Assume that X and Y are independent. Then the random variable F = (X/k)/(Y/m) has an F distribution with (k, m) degrees of freedom.

What we have seen until now are distributions of a single random variable. Next, we turn to distributions of two random variables.

Joint and Conditional Distributions

We will usually be interested in how one random variable is related to one or more other random variables.

Example: Overbooking of flights. We want to decide the optimal number of seat reservations for an airline with 100 seats on a given route. For a given reservation we would, for example, want to consider the probability that the person shows up and the probability that the person travels on business. This means that we are interested in two random variables:

X = person shows up
Y = person travels on business

Joint distribution: the person shows up AND travels on business.
Conditional distribution: conditional on the person travelling on business, what is the probability of the person showing up?

Formally: Let X and Y be discrete random variables. The joint pdf of X and Y is defined by

$$f_{X,Y}(x, y) = \Pr(X = x, Y = y).$$
The conditional pdf of Y given X is defined by (compare with Bayes' Theorem!)

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}.$$

The most we can know about how X affects Y is contained in the conditional distribution of Y given X, since this gives us the probability distribution of Y for each possible value of X.

Independence: Let X and Y be random variables with joint and conditional distributions given by $f_{X,Y}(x, y)$ and $f_{Y|X}(y|x)$, respectively. X and Y are INDEPENDENT if and only if

$$f_{X,Y}(x, y) = f_X(x) f_Y(y).$$
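The following Python sketch illustrates these definitions with a made-up joint pmf for the overbooking example (the probabilities are invented for illustration, not taken from data): X = person shows up (1/0), Y = person travels on business (1/0).

```python
# Illustrative joint pmf: the four probabilities are assumptions, not data.
joint = {(1, 1): 0.35, (1, 0): 0.40, (0, 1): 0.05, (0, 0): 0.20}

# Marginal pdfs: sum the joint pdf over the other variable.
f_X = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}
f_Y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

# Conditional pdf from the definition f_{X|Y}(x|y) = f_{X,Y}(x,y) / f_Y(y):
# Pr(shows up | travels on business)
print(joint[(1, 1)] / f_Y[1])          # 0.35 / 0.40 = 0.875

# Independence check: joint pdf vs product of marginals.
print(joint[(1, 1)], f_X[1] * f_Y[1])  # 0.35 vs 0.30 -> X and Y are NOT independent
```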

If X and Y are independent, then the conditional and marginal distributions of Y are equal, i.e. $f_{Y|X}(y|x) = f_Y(y)$. This means that knowledge of the value that X takes on tells us nothing about the probability of Y taking on a certain value.

II Mean, Variance, Covariance and Correlation

This section repeats the calculation rules for means (expectations), variances, covariances and correlations. It is essential that you are comfortable with these!

Means of Random Variables

The mean (expected value) is a measure of central tendency in the distribution. The mean is the weighted average of all possible values of X, the weights being the probabilities of the outcomes of X:

$$E(X) = x_1 f(x_1) + \dots + x_k f(x_k).$$

Properties of expected values:

E(c) = c for any constant c
E(aX + b) = aE(X) + b for all a and b
E(a_1 X_1 + … + a_n X_n) = a_1 E(X_1) + … + a_n E(X_n)
E(X_1 + … + X_n) = E(X_1) + … + E(X_n)
E(g(X)) ≠ g(E(X)) for nonlinear g, in general
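A short numeric sketch (Python) of these rules, using the die pmf from earlier: linearity holds exactly, while E(g(X)) and g(E(X)) differ for nonlinear g.

```python
# E(X) as a probability-weighted average, and E(g(X)) != g(E(X)) for nonlinear g.
pmf = {x: 1 / 6 for x in range(1, 7)}

EX = sum(x * p for x, p in pmf.items())
print(EX)                                      # 3.5

# Linearity: E(aX + b) = a E(X) + b
a, b = 2.0, 1.0
E_aXb = sum((a * x + b) * p for x, p in pmf.items())
print(E_aXb, a * EX + b)                       # both 8.0

# Nonlinear g(x) = x^2: E(X^2) differs from (E(X))^2
EX2 = sum(x ** 2 * p for x, p in pmf.items())
print(EX2, EX ** 2)                            # 15.1666... vs 12.25
```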
Variances of Random Variables

The variance is a measure of how much X varies around its mean; in other words, how far X is from its mean, on average. Let $E(X) = \mu_X$. The variance is the expected squared distance from X to its mean:

$$\mathrm{Var}(X) = E[(X - \mu_X)^2] = \sigma_X^2.$$

Note (show this yourself):

$$\mathrm{Var}(X) = E(X^2) - \mu_X^2.$$

Properties of variance:

Var(X) = 0 if and only if Pr(X = c) = 1 for some constant c
Var(aX + b) = a²Var(X) (show this yourself)
Var(aX + bY) = a²Var(X) + b²Var(Y) + 2abCov(X, Y) (show this yourself)

How is the standard deviation of X defined? You should know the answer to this immediately.
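As a sketch (Python), the rule Var(aX + b) = a²Var(X) can be verified exactly on the die pmf: the shift b drops out, while a rescales the spread.

```python
# Exact check of Var(aX + b) = a^2 Var(X) on the die pmf.
pmf = {x: 1 / 6 for x in range(1, 7)}
EX = sum(x * p for x, p in pmf.items())
VarX = sum((x - EX) ** 2 * p for x, p in pmf.items())   # E[(X - mu)^2] = 35/12

a, b = 3.0, 7.0
E_new = sum((a * x + b) * p for x, p in pmf.items())
Var_new = sum((a * x + b - E_new) ** 2 * p for x, p in pmf.items())
print(Var_new, a ** 2 * VarX)    # both 26.25 = 9 * 35/12; b plays no role
```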

Covariances and Correlations between Random Variables

The covariance is a measure of how, on average, two random variables vary with one another. The covariance between X and Y is defined as

$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sigma_{XY}.$$

Note that (show this yourself):

$$\mathrm{Cov}(X, Y) = E(XY) - \mu_X \mu_Y.$$

If Cov(X, Y) > 0 then, on average, when X is above its mean, Y is also above its mean. If Cov(X, Y) < 0 then, on average, when X is above its mean, Y is below its mean.

Properties of covariance:

(1) If E(X) = 0 or E(Y) = 0, then Cov(X, Y) = E(XY)
(2) If X and Y are independent, then Cov(X, Y) = 0
(3) For all a_1, a_2, b_1, b_2, we have Cov(a_1 X + b_1, a_2 Y + b_2) = a_1 a_2 Cov(X, Y)

Proof of (2): Assume X and Y are independent. We then have that E(XY) = E(X)E(Y). As Cov(X, Y) = E(XY) − E(X)E(Y), independence implies that Cov(X, Y) = E(X)E(Y) − E(X)E(Y) = 0.

Note: Covariance measures the amount of linear dependence between two variables. Thus, the converse of (2) is NOT true in general! That is, zero covariance does NOT imply independence. From (3) it follows that covariance depends on the units of measurement of the random variables.
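The following Python simulation sketches the warning above: take X standard normal and Y = X². Then Y is a deterministic function of X, so the two are certainly not independent, yet their covariance is zero (up to simulation error).

```python
# Zero covariance without independence: Cov(X, X^2) = E(X^3) = 0 for a
# standard normal X, even though Y = X^2 is fully determined by X.
import numpy as np

rng = np.random.default_rng(seed=3)
X = rng.standard_normal(1_000_000)
Y = X ** 2

print(np.cov(X, Y)[0, 1])   # close to 0 despite complete dependence
```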

We therefore define another measure of how two variables vary with one another which does not depend on the units of measurement: correlation. The correlation coefficient between X and Y is

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Cov}(X, X)\,\mathrm{Cov}(Y, Y)}} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.$$

We have $-1 \le \rho(X, Y) \le 1$.
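A Python sketch of why correlation, unlike covariance, is free of units: rescaling both variables (say, converting pounds to pence) multiplies the covariance by the product of the scale factors but leaves the correlation coefficient unchanged.

```python
# Covariance scales with the units of measurement; correlation does not.
import numpy as np

rng = np.random.default_rng(seed=4)
X = rng.standard_normal(100_000)
Y = 0.5 * X + rng.standard_normal(100_000)

print(np.cov(X, Y)[0, 1], np.corrcoef(X, Y)[0, 1])
print(np.cov(100 * X, 100 * Y)[0, 1], np.corrcoef(100 * X, 100 * Y)[0, 1])
# covariance is blown up by 100*100; correlation is identical (and in [-1, 1])
```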


III Population and Sample

Statistical inference is concerned with learning about a population from a sample drawn from that population. In other words, statistical inference is concerned with learning about unknown characteristics (or parameters) characterising a population from a sample from that population. The population can consist of individuals, firms, states, countries, etc. Think of the population as all the units we want to know about, or think of the population as being described by some underlying distribution: sampling from a population is equivalent to sampling from the probability distribution of a random variable. A sample is a subset of that population. Say, for example, we want to learn about the return to education in the UK. We pick a sample of people from the UK and collect data on their education levels and wages.

Consider the simplest case: Let X be a random variable representing a population. X has probability distribution f(x; θ), where θ is unknown. We want to learn about θ.

Random Sampling: If X_1, …, X_n are independent random variables with a common probability distribution function f(x; θ), then {X_1, …, X_n} is said to be a random sample from f(x; θ) (or a random sample from the population represented by f(x; θ)). A (random) sample consists of a set of random variables: think of the i-th random variable as what can be obtained in the i-th draw from the population.

IV Estimation

Continuing the example above, we want to learn about the parameter characterising the population (θ), so we want to obtain an estimate of it. The parameter is unknown to us, and we will never recover its true value. Loosely speaking, an estimate of an unknown parameter is a qualified guess at the value of the unknown parameter. There are many different ways of estimating! An estimator is a rule for calculating an estimate. An estimator is a random variable; its distribution is called the sampling distribution. The sampling distribution describes the likelihood of the various outcomes of the estimator across different samples. A general idea underlying many estimation methods (i.e. underlying many of the rules for calculating an estimate) is to minimise our ignorance about the value of the unknown parameter. The estimation method in Econometrics, Semester 1, is least squares.

Estimator and Estimate

An estimator is a sample statistic, i.e. a random variable which is a function of the sample random variables. An estimator can also be viewed as a known function (or a known rule) that assigns a value for the unknown parameter to each possible outcome of the sample. An estimator: W = h(X_1, …, X_n). An estimate is the number calculated from the sample values; in other words, an estimate is a realised value of the random variable that is the estimator. Since an estimator is a random variable, it has a probability distribution.

Unbiasedness

An estimator is a random variable, so for different sample values (i.e. for different samples drawn) we get different values of our estimator. The first property we would like our estimator to satisfy is unbiasedness. An estimator W is an unbiased estimator of θ if E(W) = θ. Unbiasedness says that, on average, we get it right. In other words, if we could draw infinitely many random samples on X from the population and compute W for each sample, the average value of the estimates would equal θ.
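A simulation sketch (Python; the exponential population is an arbitrary illustrative choice) of the estimator-as-random-variable idea: each sample yields a different value of the sample mean, and the spread of those values across samples is the sampling distribution.

```python
# Each row is one random sample; the sample mean varies across samples.
import numpy as np

rng = np.random.default_rng(seed=5)
theta = 2.0                                    # population mean (known to us here, unknown in practice)
samples = rng.exponential(scale=theta, size=(10_000, 50))

W = samples.mean(axis=1)                       # one estimate per sample
print(W.mean())                                # close to theta: the estimates vary around it
print(W.std())                                 # spread across samples = sampling variability
```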

An important result: The sample mean is an unbiased estimator of the population mean, regardless of the underlying population distribution.

Population X: distribution with mean μ_X
Random sample: {X_1, …, X_n}, where each X_i has a distribution with mean μ_X
Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

How strong a property is unbiasedness? Not very: for example, you can estimate the mean of a population by the first observation of the random sample, and that is also an unbiased estimator of the population mean; i.e. unbiasedness is NOT a strong property. We therefore also use the sampling variance, i.e. the variance of the estimator, to further assess an estimator. Example: the sampling variance of the sample mean (derive this yourself; you should have seen this in your statistics course) is

$$\mathrm{Var}(\bar{X}) = \dots = \frac{\sigma^2}{n}.$$
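The following Python sketch contrasts the two unbiased estimators just discussed: both are centred on the population mean, but the sample mean has sampling variance σ²/n, far smaller than the variance σ² of the first-observation estimator.

```python
# Two unbiased estimators of the population mean, very different precision.
import numpy as np

rng = np.random.default_rng(seed=6)
mu, sigma, n = 10.0, 3.0, 25
samples = rng.normal(mu, sigma, size=(100_000, n))

xbar = samples.mean(axis=1)       # sample mean, one per sample
x1 = samples[:, 0]                # "first observation" estimator, one per sample

print(xbar.mean(), x1.mean())     # both near 10: both unbiased
print(xbar.var(), sigma ** 2 / n) # near 0.36 = sigma^2 / n
print(x1.var())                   # near 9 = sigma^2: much less precise
```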

Definition (Efficiency): Let W and V be unbiased estimators of θ. Then W is efficient relative to V when Var(W) ≤ Var(V) for all θ, with strict inequality for at least one θ. In other words, one estimator is efficient relative to another when it has the smaller variance of the two. An estimator is said to be efficient when it has the smallest variance of all possible unbiased estimators of that parameter.

V Hypothesis Testing (Inference)

This section briefly outlines some concepts from hypothesis testing that you must be familiar with. It also outlines the steps in a hypothesis test about one parameter.

Concepts in hypothesis testing: ALWAYS(!!!) when doing hypothesis testing, you must write down the null hypothesis and the alternative hypothesis. We will be interested in null hypotheses of the form

H0: θ = θ_0, where θ_0 is some known number.

There are three possible alternative hypotheses to this null hypothesis:

(1) H1: θ > θ_0 (one-sided test)
(2) H1: θ < θ_0 (one-sided test)
(3) H1: θ ≠ θ_0 (two-sided test)

Remember that we are testing a hypothesis about an unknown parameter and we will never know the true value of that parameter. There are two possible errors we can make when testing:

Type I error: Reject the null when the null is in fact true.
Type II error: Fail to reject the null when the null is in fact false.

We will never know with certainty whether we committed an error or not, but we can calculate the probability of making either a Type I or a Type II error. The significance level is defined as the probability of making a Type I error, i.e. the probability of rejecting a true null hypothesis:

Significance level = Pr(rejecting the null | the null is true) = α.

The significance level is something we choose at the outset of the test. It is a measure of our tolerance for making a Type I error. For example, choosing a significance level of 5% means that we are willing to reject a true null hypothesis 5% of the time: if the null is true, you test at the 5% level and you draw 100 samples, then on the grounds of probability you would expect to reject the null about 5 times.

Test statistic T: T is some known function f of the random sample, so T is itself a random variable: T = f(your sample). Since T is a random variable, T has a distribution under the null. We always use the distribution of T under the null, i.e. the distribution of T given that the null is true.

How do we decide when to reject the null? Suppose you are testing against alternative (3). Intuitively, the further the value of the test statistic is from θ_0, the less likely you would think it is that the null is true. How do we assess what is sufficiently far away from θ_0? By probabilities, i.e. by using the distribution of the test statistic. The critical value depends on the significance level and the alternative hypothesis. Suppose we choose a significance level of 5% and the alternative hypothesis is (3). Then the critical value c is the value such that the probability of getting a test statistic smaller than −c or larger than c is exactly 5%: from the distribution of T under the null (i.e. when the null is true) we determine c such that

Pr(|T| > c | H0) = α.

The p-value is the largest significance level at which we could carry out the test and still fail to reject the null:

p-value = Pr(|T| > t | H0) = 1 − Pr(|T| ≤ t | H0).

Loosely speaking, the p-value is the probability of getting a value of the test statistic at least as extreme as the one we got from our sample. You can also think of the p-value as a measure of support for the null hypothesis.
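A simulation sketch (Python) of the meaning of the significance level: if the null is true and we test at the 5% level, we should reject in roughly 5% of repeated samples.

```python
# Under a TRUE null, a 5%-level test rejects in roughly 5% of samples.
import numpy as np

rng = np.random.default_rng(seed=7)
n, reps, c = 100, 10_000, 1.96                    # two-sided 5% critical value (normal approx.)

samples = rng.normal(0.0, 1.0, size=(reps, n))    # null H0: mu = 0 is true here
t = samples.mean(axis=1) / (samples.std(axis=1, ddof=1) / np.sqrt(n))

print((np.abs(t) > c).mean())                     # roughly 0.05: the Type I error rate
```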

Cookbook for doing hypothesis testing: Step 1: Write down the null hypothesis and the alternative hypothesis.

Step 2: Estimate θ. Let $\hat{\theta}$ denote the estimate of θ obtained from the sample you have.

Step 3: Calculate the test statistic for your sample. How to do this depends on the particular test you are performing. In other words, the test statistic is always calculated by a known rule, and what this known rule is depends on what sort of test you are performing (in Semester 1 we will see t tests and F tests). A test statistic usually involves both $\hat{\theta}$ and an estimate of the variance of $\hat{\theta}$, though. Denote the value of the test statistic from your sample by t.

Step 4: Decide on a significance level for your test. Common values are 5% and 1%. If you are performing the test manually, follow Step 4a; if you have an econometrics software package available, follow Step 4b.

Step 4a: In the correct table of critical values (the table for the distribution of your test statistic), find the critical value c corresponding to your chosen significance level. Reject according to the following rules for the three different alternative hypotheses:

(1) Reject if t > c (see Fig. C.5 in App. C in Wooldridge)
(2) Reject if t < −c
(3) Reject if |t| > c (see Fig. C.6 in App. C in Wooldridge)

Step 4b: Calculate the p-value associated with your calculated test statistic. An econometrics software package will calculate the p-value automatically every time you do a test. Often the p-value you get out of the package will already take into account whether the alternative hypothesis is one-sided or two-sided (usually the latter), in which case you can simply reject the null hypothesis if p-value < significance level (see Figs. C.7 and C.8 in App. C in Wooldridge).

Confidence interval: Suppose the sampling distribution of your estimator is normal with mean θ, and suppose that from your sample you have estimated the parameter to be $\hat{\theta}$ and the variance of your estimator to be S². Then the 95% confidence interval for your sample is given by

$$[\hat{\theta} - 1.96\sqrt{S^2},\; \hat{\theta} + 1.96\sqrt{S^2}].$$

The interpretation of a 95% confidence interval is that if we draw repeated samples and estimate the interval for each of them, then 95% of the time θ will lie in the confidence interval.

An Example: The government increases the tax on petrol in order to lower car mileage per person. We are advisors to the government, and our job is to find out whether the tax increase has had the desired effect. We are given data (a random sample of individuals) on the difference in car mileage between the year before and the year after the tax increase for n = 1256 persons. For person i, Y_i denotes the change in car mileage: Y_i = car mileage before − car mileage after. We treat the data as a random sample from a N(μ, σ²) distribution. Since we are interested in finding out whether car mileage has decreased, sensible null and alternative hypotheses are

H0: μ = 0
H1: μ > 0

In this way, if we reject the null in favor of the alternative, we reject that there is no change in car mileage in favor of car mileage before the tax increase being larger than car mileage after the tax increase. From the data we get the following estimates of μ and σ²:
$$\hat{\mu} = \bar{Y} = 12.8, \qquad \hat{\sigma}^2 = S^2 = 553.7.$$

This means that the estimated average change in car mileage is a decline of 12.8 miles per person. Is this a significant decline? We employ a t test to find out. We choose a 5% significance level, and because of the large sample size, we can use the standard normal distribution. Because $\mathrm{Var}(\bar{Y}) = \sigma^2/n$ (you proved that earlier), the test statistic is

$$t = \frac{\bar{Y} - 0}{\sqrt{S^2/n}} = \frac{12.8}{\sqrt{553.7/1256}} = 19.278.$$

From a table of the standard normal distribution (for example Table G.1 in Wooldridge), we find that the critical value for a one-sided test at significance level 5% is approximately 1.64. Since 19.278 is far larger than 1.64, we strongly reject the null in favor of the alternative (if we were to do the test at a 1% significance level, we would still reject, since the critical value at a 1% significance level is approximately 2.33). We thus conclude that car mileage has significantly decreased after the tax increase. Note that this analysis implicitly assumes that nothing else has affected car mileage in the year before and after the tax increase. But if for example the price of train tickets had decreased in the same period, it might also be that car mileage had decreased because train tickets had become cheaper.
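The arithmetic of this example can be reproduced in a few lines; the following Python sketch (scipy assumed for the normal cdf) uses the numbers from the text and also adds the corresponding p-value and 95% confidence interval.

```python
# One-sided test of H0: mu = 0 vs H1: mu > 0, with the values from the text.
import math
from scipy.stats import norm

ybar, S2, n = 12.8, 553.7, 1256

t = (ybar - 0) / math.sqrt(S2 / n)
print(t)                          # ~19.28, far above the 5% critical value 1.64

p_value = norm.sf(t)              # Pr(T > t | H0), one-sided, via the normal approximation
print(p_value)                    # essentially 0: reject H0 at any usual level

ci = (ybar - 1.96 * math.sqrt(S2 / n), ybar + 1.96 * math.sqrt(S2 / n))
print(ci)                         # 95% confidence interval for mu
```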
