
Lecture 3 Basic probability theory

Lectures 1 and 2 (and Labs 1 and 2) covered basic work with economic data. Very valuable!
However, economists often want to use more sophisticated statistical techniques to examine relationships between economic variables in more detail.
The remainder of the class works towards basic single-variable regression analysis.
The first step is basic probability theory.
Gujarati, Chapters 2 and 3: read all of both; the key points are covered here.

EC203 Introduction to Empirical Economics. KT.

Random variables

A random (or statistical) experiment is a process leading to at least two possible outcomes, with uncertainty as to which outcome will occur.
Example: rolling a fair die and observing the number shown uppermost. The possible outcomes are 1, 2, 3, 4, 5 or 6.
For a random experiment, we know in advance all the possible outcomes. What we do NOT know in advance is which outcome will occur in any particular experiment.
Sample space (population): the set of all possible outcomes of the experiment. Here, {1, 2, 3, 4, 5, 6}.

A random variable is a variable whose (numerical) value is determined by the outcome of a random experiment.
Example: toss two fair coins. Let H denote a head and T a tail. There are four possible outcomes: {HH, HT, TH, TT}.
Now consider a variable X, defined as the number of heads observed in the throw of two fair coins. The situation is as follows:

Possible outcome    TT    TH    HT    HH
Number of heads     0     1     1     2


The variable X, "number of heads", is a random (or stochastic) variable with 3 possible values: X = 0, 1 or 2.
Random variables (r.v.s) may be discrete or continuous.
A discrete r.v. takes on only a finite (or countably infinite) number of particular values.
A continuous r.v. can take on any value in some interval of values.
Both the roll of the die and the toss of the coins we have looked at give discrete r.v.s.
An example of a continuous r.v. is the annual rainfall in Glasgow.
Focusing initially on discrete r.v.s makes the concepts easier to grasp.

Probability

Logical reasoning and/or empirical evidence may give us some feeling for how likely different outcomes are.
E.g. throw of the die: the outcomes {1, 2, 3, 4, 5, 6} are all equally likely.
Basic coin toss, H or T: both outcomes are equally likely.

For the two-coin toss example, we can expect the value 1 to occur with twice the likelihood of the value 0, while the values 0 and 2 are equally likely.
Let's use the notation Pr for the probability of a particular outcome. We can now deduce the probabilities for this example:

Possible outcome    0 heads    1 head    2 heads
Pr of outcome       1/4        2/4       1/4
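The classical probabilities for the two-coin example can be checked by enumerating the sample space and counting favourable outcomes. The following is only a minimal sketch; the variable names (sample_space, heads_counts, pr) are illustrative and not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for tossing two fair coins: HH, HT, TH, TT.
sample_space = ["".join(pair) for pair in product("HT", repeat=2)]

# X = number of heads; count how many outcomes give each value of X.
heads_counts = Counter(outcome.count("H") for outcome in sample_space)

# Classical probability: favourable outcomes / number of equally likely outcomes.
pr = {x: Fraction(n, len(sample_space)) for x, n in sorted(heads_counts.items())}

for x, p in pr.items():
    print(f"Pr(X = {x}) = {p}")
```

Exact fractions are used rather than floats so that the probabilities can be compared directly with the hand-derived values.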


Or, we can write Pr(1 head) = 2/4 (or 1/2), Pr(2 heads) = 1/4, etc.
Note that the probabilities sum to one, as we are distributing a total of 1. The value of 1 corresponds to a certainty: we know that one of the outcomes will occur.
The outcomes are mutually exclusive (i.e. they cannot occur at the same time).
Note that this classical definition of probability is what we call an a priori definition: the probabilities are derived from purely deductive reasoning.
However, what if the outcomes of an experiment are not finite and cannot be stated with certainty?

E.g. what is the probability that GDP will rise by a certain amount?

Relative frequency or empirical definition of probability

Distinguish between absolute and relative frequency.
The absolute frequency is the number of occurrences of a given event, e.g. 10 students in this class get an exam mark of 70% or above.
If there are 50 students in the class, the relative frequency of the event "achieving a first-class mark" is 10/50 = 1/5.
The frequency distribution of marks achieved by all 50 students in the class would show the different marking bands and how students are distributed across them, in both relative and absolute terms.
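To see how a relative frequency behaves as the number of observations grows, one can simulate tosses of a fair coin and compare the share of heads with the classical value of 1/2. This is only an illustrative sketch: the function name, the fixed seed and the chosen sample sizes are assumptions, not part of the lecture.

```python
import random

# Exam example from the lecture: 10 first-class marks among 50 students.
absolute_frequency = 10
class_size = 50
relative_frequency = absolute_frequency / class_size  # 0.2, i.e. 1/5


def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the share of heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The share of heads should settle near 0.5 as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```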


Can we treat relative frequencies as probabilities? Yes, provided the number of observations that the relative frequencies are based on is reasonably large. This is the empirical, or relative frequency, definition of probability.
See Gujarati on the properties of probabilities, but ignore Bayes' Theorem.

Probability of random variables

First, discrete r.v.s, which take only a finite number of values.
If X is an r.v. with distinct values $x_1, x_2, \ldots, x_n$, the function f defined by

$f(x) = P(X = x_i)$ for $x = x_i$, $i = 1, 2, \ldots, n$
$f(x) = 0$ for $x \neq x_i$

is called the probability mass function (PMF) or probability function (PF).
Note that $0 \le f(x_i) \le 1$, i.e. the probability of X taking the value $x_i$ lies between 0 and 1, and $\sum_{i=1}^{n} f(x_i) = 1$.
From slide 5:

Number of heads, X    0 heads    1 head    2 heads    Sum
PF, f(X)              1/4        1/2       1/4        1

Geometrically?
(Insert the PMF of the number of heads in a two-coin toss; see Gujarati Fig 2.2)

Expected value of a random variable, X


$E(X) = \sum_{i=1}^{n} x_i f(x_i)$
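As a worked instance of the expected-value formula, the sketch below applies it to the number of heads in two fair coin tosses, after first checking the two PMF conditions. The helper names is_valid_pmf and expected_value are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

# PMF of X = number of heads in two fair coin tosses (the lecture's example).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def is_valid_pmf(f):
    """Check that each f(x_i) lies in [0, 1] and that the f(x_i) sum to 1."""
    return all(0 <= p <= 1 for p in f.values()) and sum(f.values()) == 1


def expected_value(f):
    """E(X): the sum over i of x_i * f(x_i)."""
    return sum(x * p for x, p in f.items())


print(is_valid_pmf(pmf))
print(expected_value(pmf))  # 0*(1/4) + 1*(1/2) + 2*(1/4)
```

Written out by hand, E(X) = 0(1/4) + 1(1/2) + 2(1/4) = 1, so on average one head is expected per pair of tosses.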



Probability distribution of a continuous r.v.

Instead of a probability mass function we have a probability density function (PDF).
Because a continuous r.v. can take an infinite number of values, the probability of it taking any one particular value is zero, so probability is always measured over an interval.
Formally, this means using the integral rather than the summation operator (used for discrete r.v.s):

$P(x_1 < X < x_2) = \int_{x_1}^{x_2} f(x)\,dx$

for all $x_1 < x_2$ (dx is a small interval of x values).
However, the key is the geometric representation.



(Insert diagram and/or see Gujarati Fig 2.3)

Note that $P(X = x_i) = 0$ for any single value $x_i$.

Properties of a PDF
1. The total area under the curve f(x) is 1.
2. $P(x_1 < X < x_2)$ is the area under the curve between $x_1$ and $x_2$, where $x_2 > x_1$.
3. Because the probability that a continuous r.v. takes a particular value is 0, we can say that $P(x_1 \le X \le x_2) = P(x_1 < X \le x_2) = P(x_1 \le X < x_2) = P(x_1 < X < x_2)$.
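The PDF properties can be illustrated numerically. The sketch below assumes a standard normal density as the example f(x), which the lecture does not specify, and approximates areas with a simple midpoint rule; the function names and the number of steps are likewise illustrative choices.

```python
import math


def normal_pdf(x):
    """Standard normal density, used here only as an example f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def area_under(f, x1, x2, steps=10_000):
    """Midpoint-rule approximation of the integral of f from x1 to x2."""
    width = (x2 - x1) / steps
    return sum(f(x1 + (i + 0.5) * width) for i in range(steps)) * width


total_area = area_under(normal_pdf, -10.0, 10.0)   # property 1: close to 1
interval_prob = area_under(normal_pdf, -1.0, 1.0)  # property 2: P(-1 < X < 1)
point_prob = area_under(normal_pdf, 1.0, 1.0)      # zero-width interval: 0

print(total_area, interval_prob, point_prob)
```

The zero-width case shows why including or excluding the endpoints makes no difference to the probability of an interval.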



Gujarati: skip the material on cumulative distribution functions for now.
Also skip the section on multivariate probability density functions (covered in a later class).
Statistical independence will also be key in later courses.
For now, move on to Chapter 3, "Characteristics of probability distributions", which are also referred to as moments of PDFs.
Next slide.


Moments of PDFs

The first moment of a PDF is the expected value of the random variable it represents: the weighted average of all possible values, where the probabilities of these values serve as weights.
It is also called the average or mean value, or the population mean value.
E.g. throwing a die: the outcomes are {1, 2, 3, 4, 5, 6}, each with a probability of 1/6.
So, E(X) = 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 21/6 = 3.5
Odd, since this is a discrete r.v. and 3.5 is not a possible outcome? Think of it this way: if someone gave you £1 for each number on the die (i.e. £6 for the 6, £1 for the 1), then over a large number of rolls you would anticipate receiving an average of £3.50 per roll.
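The long-run interpretation of the expected value can be checked with a short simulation. In this sketch the payout per roll equals the face shown; the seed and the number of rolls are arbitrary assumptions made only so the run is repeatable.

```python
import random
from fractions import Fraction

# Exact expected value of a fair die: each face 1..6 weighted by 1/6.
expected = sum(Fraction(face, 6) for face in range(1, 7))  # 21/6 = 7/2

# Simulated payouts: receive an amount equal to the face shown on each roll.
rng = random.Random(42)  # fixed seed (an assumption) for repeatability
rolls = [rng.randint(1, 6) for _ in range(100_000)]
average_payout = sum(rolls) / len(rolls)

print(float(expected), average_payout)
```

No single roll ever pays 3.5, but the average payout over many rolls should lie close to it.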

Insert or take Gujarati Fig 3.1

Gujarati: read the section on the properties of the expected value.
The key point here is that the expected value is a measure of the central tendency of the PDF.
Ignore for now the section on the expected value of multivariate PDFs.
Our next focus is the second moment of the PDF: the variance, a measure of dispersion.

Variance of a PDF

In Lecture 2, we looked at the standard deviation:

$s = \sqrt{\dfrac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}}$

i.e. we sum the squared deviations of each observation of Y from the sample mean, divide by the number of observations minus 1, and take the square root.
In empirical economics we normally replace s with $\sigma_x$ as notation for the standard deviation of a variable X. The variance is defined as

$\mathrm{var}(X) = \sigma_x^2$

that is, the square of the standard deviation.
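The Lecture 2 formula for the standard deviation, and the variance as its square, can be followed step by step in code. The data below are hypothetical, and the standard library's statistics.stdev (which also divides by N - 1) is used only as a cross-check.

```python
import math
import statistics

y = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical observations

n = len(y)
y_bar = sum(y) / n                               # sample mean
sum_sq_dev = sum((yi - y_bar) ** 2 for yi in y)  # sum of squared deviations
variance = sum_sq_dev / (n - 1)                  # divide by N - 1
std_dev = math.sqrt(variance)                    # standard deviation

print(y_bar, variance, std_dev)
print(statistics.stdev(y))  # library cross-check, same N - 1 convention
```

For these numbers the mean is 5, the squared deviations sum to 24, and the variance is 24/5 = 4.8.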



We won't go into the details of computing the variance here. See Gujarati, but it is not very intuitive, so we'll move on.
Several r.v.s may have the same expected value but different variances. Geometrically:
(Insert, or see, Gujarati Fig 3.2)

See Gujarati on the properties of variance. The key one for this particular class is the first: the variance of a constant is zero (by definition, a constant has no variability).
The final 3 concepts today link to Lecture 4.

In Gujarati, skip until we get to the coefficient of variation.
This is a measure of relative variation: the ratio of the standard deviation of an r.v. to its mean.
Because the mean and standard deviation are measured in the same units, the coefficient of variation is independent of units. It is therefore useful for comparison across different r.v.s.
Then there is the covariance. This is a special kind of expected value that measures how two variables vary or move together. It can be positive, negative or zero.
Again, the computation of the covariance is not too intuitive and we won't be focusing on it here, other than its role in calculating the correlation coefficient, but you will in future applied classes.
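The unit independence of the coefficient of variation can be seen by measuring the same quantity in two different units. The sketch below uses hypothetical height data and sample statistics from Python's statistics module; the function name is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


heights_cm = [160.0, 170.0, 175.0, 180.0, 190.0]  # hypothetical data
heights_m = [h / 100.0 for h in heights_cm]       # same data in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

# The standard deviation changes with the units, but the ratio does not.
print(statistics.stdev(heights_cm), statistics.stdev(heights_m))
print(cv_cm, cv_m)
```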



The last but one thing we are interested in from Gujarati Ch. 3 is the (population) correlation coefficient.
This is found by taking the covariance of two r.v.s and dividing by the product of their standard deviations.
Thus, it is a measure of linear association between two variables. This is the focus of our next lecture.
Finally, skip forward to the section titled "From the population to the sample" and read about the sample mean, variance, covariance and correlation coefficient. This will be our focus in Lecture 6.
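The recipe "covariance divided by the product of the standard deviations" can be sketched as follows. Note the assumptions: the data are hypothetical, and sample (N - 1) versions of the covariance and standard deviation are used in place of the population quantities described in the lecture. The N - 1 factors cancel in the ratio.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance: average co-movement of x and y around their means."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)


def std_dev(x):
    """Sample standard deviation: square root of the covariance of x with itself."""
    return math.sqrt(covariance(x, x))


def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / (std_dev(x) * std_dev(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]

print(covariance(x, y), correlation(x, y))
```

Dividing by the standard deviations rescales the covariance so that the result always lies between -1 and 1, which makes it comparable across pairs of variables.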
