
ECON 1005

INTRODUCTION TO STATISTICS
ESTIMATION
Introduction
• In the last two lectures, we discussed the characteristics and
properties of the probability distributions of random variables

• These characteristics were the parameters:


– n and p for the Binomial Distribution
– λ for the Poisson Distribution
– μ and σ for the Normal Distribution

• In the real world we frequently do not know the values of
these parameters and will have to estimate them.

• Three words become critical for this chapter: estimation (the
process), estimate (the result) and estimator (the facilitator, i.e.
the formula that produces the estimate).
Three approaches to estimating unknown population parameters.

First approach -
• Perform a complete enumeration of the population
(also known as a census) and calculate the mean and
variance from the dataset so derived. Unfortunately:
– It can be expensive
– It can be time consuming
– It consumes large quantities of resources
– It can be destructive to the elements of the population
– It may yield a level of accuracy that is not cost effective
when compared with the results of an appropriately sized
sample.
Second Approach -

• Guess the average value or proportion from
our knowledge of the population.

Unfortunately, this approach:
– is very unscientific
– may produce results that vary widely
– leaves the researcher with no method of judging
how close the ‘guesstimate’ is to the real value of
the population parameter.
Third Approach -

• The preferred method: draw a random
sample of appropriate size from the
population and use the sample data and your
choice of a formula (called a sample statistic)
to estimate the unknown population
parameter.
Definition of Estimation

Estimation, then, is the process by which we
estimate the value of an unknown population
parameter by making use of the data from a
random sample that was drawn from that
population.
THE ESTIMATION PROCESS

• Identify the Unknown Population Parameter

• Decide on the Size of the Random Sample - n

• Select the Random Sample of Size n

• Choose an Appropriate Sample Statistic [Estimator]

• Substitute the Sample Data into the Sample Statistic

• Calculate the estimate and interpret
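The steps above can be sketched in Python. This is a minimal illustration under a loudly stated assumption: the population of ages is simulated (hypothetical data, not real campus records), so that the estimate can be compared with the true mean, which in practice would be unknown.

```python
import random
from statistics import mean

# HYPOTHETICAL population: ages of 2,000 Level I students, simulated for illustration only
rng = random.Random(1005)
population = [rng.randint(17, 30) for _ in range(2000)]

# Step 1: the unknown parameter is the population mean age, mu
# Step 2: decide on the sample size
n = 100
# Step 3: select a simple random sample of size n (without replacement)
sample = rng.sample(population, n)
# Steps 4 and 5: choose the sample mean as the estimator and substitute the data
estimate = mean(sample)
# Step 6: report the point estimate; mu is printed only because the population is simulated
print(f"point estimate = {estimate:.2f}, true mu = {mean(population):.2f}")
```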


Two Types of Estimates
Suppose we seek to estimate the mean age of
Level I students on the Campus. We may draw
a random sample of 100 Level I students from
the Campus, record their ages, substitute the
100 values into the formula for the mean of a
sample (also called the sample statistic or
estimator), and read off the estimate.

The resulting estimate can be a single value,
e.g. 20, or an interval of values, e.g. (18, 22).
Two Types of Estimates

• The single-valued estimate of an unknown
population parameter derived through
estimation is called a point estimate.

• On the other hand, an estimate comprising
an interval of values derived through
estimation is called an interval estimate for
the unknown population parameter.
Estimators
• How do we use the data from our random
sample to arrive at an estimate?
• We substitute the sample data into a formula
better known as a sample statistic.
• These sample statistics are called estimators.
• A point estimator for an unknown population
parameter is a sample statistic into which the
data from the random sample is substituted,
so as to yield a point estimate of that
parameter.
Commonly Used Point Estimators
Population Parameter    Sample Statistic (Point Estimator)
Mean μ                  Sample mean x̄
                        Sample median
                        Sample mode
Std Deviation σ         Sample std deviation s
Proportion p            Sample proportion p̂
Example
The mean and standard deviation of the teaching experience of
faculty members in a department at a University are unknown. A
random sample of 5 faculty members was selected; their teaching
experience in years was as follows: 7 8 14 7 20

• Identify suitable point estimators for the mean teaching experience
of the entire faculty.

• Identify a suitable point estimator for the standard deviation of
teaching experience of the entire faculty.

• Find a point estimate of the mean teaching experience of the entire
faculty.

• Find a point estimate of the std deviation of the teaching
experience of the entire faculty.
Solution
1. We can use any of three point estimators to estimate the population mean viz.
sample mean, sample mode or sample median.

2. We can use the sample standard deviation (s) as the point estimator for the
population standard deviation.

On the basis of the three estimators declared in 1. above, we can compute three
point estimates.

– Sample Mean = 1/5 ( 7 + 8 + 14 + 7 + 20 ) = 11.2

– Sample Mode = 7

– Sample Median = 8

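– The point estimate of the population standard deviation is the value of s.
Each term below is the deviation of an observation from x̄ = 11.2, and we divide by n − 1 = 4:

$s = \sqrt{\tfrac{1}{4}\left[(-4.2)^2 + (-3.2)^2 + (2.8)^2 + (-4.2)^2 + (8.8)^2\right]} = \sqrt{\tfrac{130.8}{4}} = 5.718$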
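As a check on the arithmetic, here is a short sketch using Python's standard-library `statistics` module (our choice of tool, not the course's). `statistics.stdev` divides by n − 1, which matches the 1/4 in the formula for s.

```python
import statistics

# Teaching experience (years) of the 5 sampled faculty members
experience = [7, 8, 14, 7, 20]

# Three candidate point estimators of the population mean
mean_est = statistics.mean(experience)      # 56 / 5 = 11.2
median_est = statistics.median(experience)  # middle of 7, 7, 8, 14, 20 -> 8
mode_est = statistics.mode(experience)      # most frequent value -> 7

# Point estimator of the population standard deviation (divides by n - 1)
s = statistics.stdev(experience)            # sqrt(130.8 / 4), about 5.718

print(mean_est, median_est, mode_est, round(s, 3))
```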
Some Realities
• Since we must estimate population parameters from
samples, it is inevitable that we make errors.

• Different sample sizes can give rise to different point
estimates when the same estimator is used.

• Different estimators can give rise to different point
estimates when the same sample is used.

• Different estimators and different sample sizes can give rise
to different point estimates.

• Some estimates will agree with the true value of the
population parameter; others will not.
Error in Estimation
• The difference between the point estimate and the true
value of the population parameter is known as the total
error in the estimate.

• This total error can be the result of both
sampling error and non-sampling error.

• Sampling error occurs by chance: a random sample need not
mirror the population exactly, and different samples give
different estimates.

• Other errors arise from human mistakes rather than chance, and
they impair the results obtained. Such errors are called
non-sampling errors.

• TOTAL ERROR IN THE ESTIMATE = SAMPLING ERROR + NON-SAMPLING ERROR
Sources of Non-Sampling Error

There are many sources of non-sampling
error. Some of these sources are:
• Inability to obtain all the required information from
all elements of the sample
• Difficulties in defining terms
• Differences in interpretation of questions
• Errors in the data collection such as in recording or
coding
• Errors made in the data tabulation activity.
Example
Consider a history class of five students. Their exam scores were 70, 78, 80,
80 & 95.

• Find the population mean. (μ = 80.6)

Suppose that a random sample of three students was drawn, with scores 70, 80 and 95.

• Use the sample data and the sample mean to estimate the population
mean. (x̄ = 81.67)

• What is the difference due to chance? (81.67 − 80.6 = 1.07)

Now suppose that we mistakenly recorded 82 instead of 80.

• What would be the new estimate of the population mean? (x̄ = 82.33)

• What is the new difference between the population mean and the point
estimate? (82.33 − 80.6 = 1.73)
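The arithmetic of this example can be reproduced in a few lines of Python. This sketch works with unrounded means, so the non-sampling error comes out as 2/3 ≈ 0.67 rather than the 0.66 obtained by subtracting the rounded figures 1.73 − 1.07.

```python
from statistics import mean

population = [70, 78, 80, 80, 95]     # all five history scores
sample = [70, 80, 95]                 # the sample as it should have been recorded
sample_misrecorded = [70, 82, 95]     # the same sample with 80 recorded as 82

mu = mean(population)                          # 80.6
xbar = mean(sample)                            # about 81.67
xbar_misrecorded = mean(sample_misrecorded)    # about 82.33

sampling_error = xbar - mu                          # due to chance, about 1.07
total_error = xbar_misrecorded - mu                 # about 1.73
non_sampling_error = total_error - sampling_error   # due to the recording mistake, about 0.67

print(f"sampling {sampling_error:.2f}, non-sampling {non_sampling_error:.2f}, total {total_error:.2f}")
```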
Example (cont’d)
• It is this difference of 1.73 that we call the total error in
the estimate. It is subdivided into two components:
– The sampling error of 1.07
– The non-sampling error of 0.66 (= 1.73 − 1.07; exactly 2/3 ≈ 0.67 before rounding)

• As this error grows, the sample statistic becomes
less useful as an estimator of the population
parameter.

• We must therefore be able to determine the impact of
the error on the inferences that we will be making by
subjecting the estimators to specific tests. These are
discussed in the next chapter.
Unbiased Point Estimators
SAMPLING DISTRIBUTION OF THE MEAN
Return to our population of history scores for the class comprising five
students A, B, C, D and E.
A = 70, B = 78, C = 80, D = 80, E = 95
Population Mean = 80.6 Population Std Deviation = 8.09

We will now perform the following activities.

1. Consider all possible samples of three scores from this population; there
are 10 such samples.

2. Compute the sample mean for each of the 10 samples.

3. Construct the Frequency Distribution of Sample Means.

4. Construct the Relative Frequency Distribution of Sample Means.

5. Rename Relative Frequency as Probability to create the Probability
Distribution of the Sample Means.
1 & 2. Generating the 10 Random Samples of Size 3

Sample   Scores in the Sample   Sample Mean
ABC      70, 78, 80             76.00
ABD      70, 78, 80             76.00
ABE      70, 78, 95             81.00
ACD      70, 80, 80             76.67
ACE      70, 80, 95             81.67
ADE      70, 80, 95             81.67
BCD      78, 80, 80             79.33
BCE      78, 80, 95             84.33
BDE      78, 80, 95             84.33
CDE      80, 80, 95             85.00
3. The Frequency Distribution of Sample Means
x̄        f
76.00    2
76.67    1
79.33    1
81.00    1
81.67    2
84.33    2
85.00    1
∑f = 10
4. The Relative Frequency Distribution of Sample Means
x̄        Relative Frequency
76.00    0.2
76.67    0.1
79.33    0.1
81.00    0.1
81.67    0.2
84.33    0.2
85.00    0.1
∑Rel. Freq. = 1
5. The Probability Distribution of Sample Means
(or The Sampling Distribution of the Mean)
x̄        Probability
76.00    0.2
76.67    0.1
79.33    0.1
81.00    0.1
81.67    0.2
84.33    0.2
85.00    0.1
∑Probability = 1
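Steps 1 to 5 can be automated. The sketch below is one way to do it in Python, assuming sampling without replacement and rounding the means to two decimals as the tables do; the dictionary `scores` and the other variable names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

# Steps 1 & 2: every sample of three students (without replacement) and its mean
samples = {
    "".join(names): sum(scores[name] for name in names) / 3
    for names in combinations(sorted(scores), 3)
}

# Step 3: frequency distribution of the sample means, rounded to 2 d.p. as in the tables
freq = Counter(round(m, 2) for m in samples.values())

# Steps 4 & 5: relative frequency = frequency / total, read as a probability
total = sum(freq.values())
prob = {x: Fraction(f, total) for x, f in sorted(freq.items())}

for x, p in prob.items():
    print(f"{x:.2f}  {float(p):.1f}")
```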
Sampling Distributions in this Course
• In general, the probability distribution of a
Sample Statistic is called its sampling distribution.
• We will focus on two sampling distributions:
– Sampling Distribution of the Mean
– Sampling Distribution of the Proportion
• In the Sampling Distribution of the Mean, the
random variable is the sample mean x̄.
• In the Sampling Distribution of the Proportion,
the random variable is the sample proportion p̂.
The Mean of the Sampling Distribution of the Mean

• The mean of the sampling distribution of the
mean is equal to the population mean μ.

• Class Activity
Compute the mean of the Sampling Distribution of
the Mean History Score based on the ten random
samples of size 3.
Show that it is indeed equal to the population mean.
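One way to carry out the Class Activity is to weight each value of x̄ by its probability. The sketch keeps the sample means as exact fractions (for example 230/3 rather than 76.67), so the comparison with μ = 80.6 is not disturbed by rounding.

```python
from fractions import Fraction

# Sampling distribution of the mean, with exact means (the tables round to 2 d.p.)
dist = {
    Fraction(228, 3): Fraction(2, 10),  # 76.00
    Fraction(230, 3): Fraction(1, 10),  # 76.67
    Fraction(238, 3): Fraction(1, 10),  # 79.33
    Fraction(243, 3): Fraction(1, 10),  # 81.00
    Fraction(245, 3): Fraction(2, 10),  # 81.67
    Fraction(253, 3): Fraction(2, 10),  # 84.33
    Fraction(255, 3): Fraction(1, 10),  # 85.00
}

# E(xbar) = sum over all values of xbar * P(xbar)
mean_of_means = sum(x * p for x, p in dist.items())
population_mean = Fraction(70 + 78 + 80 + 80 + 95, 5)

print(float(mean_of_means), float(population_mean), mean_of_means == population_mean)
```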
The Standard Deviation of the Sampling Distribution
of the Mean
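• The Standard Deviation of the Sampling Distribution of the
Mean is given by σx̄ where
σx̄ = σ/√n.
(This assumes the population is large relative to the sample. For a small
population of N elements sampled without replacement, such as the five
history scores, σ/√n is multiplied by √((N − n)/(N − 1)).)

• σx̄ is also called the standard error, or the standard
error of the mean.

• For n > 1, the spread of the Sampling Distribution of the Mean is
smaller than the spread of the corresponding population distribution.

• The standard deviation of the Sampling Distribution of the
Mean decreases as the sample size increases.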
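The sketch below compares σ/√n with the actual spread of the ten history-score sample means. Because that population has only N = 5 members and the samples are drawn without replacement, the actual spread is smaller than σ/√n; multiplying by the standard finite population correction √((N − n)/(N − 1)) recovers it exactly. The final loop, using σ = 20 as an illustrative value, shows the standard error shrinking as n grows.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

scores = [70, 78, 80, 80, 95]      # the five history scores (the whole population)
N, n = len(scores), 3

sigma = pstdev(scores)             # population standard deviation, about 8.09
sample_means = [mean(c) for c in combinations(scores, n)]
actual_se = pstdev(sample_means)   # spread of the 10 sample means, about 3.30

se_large_pop = sigma / math.sqrt(n)   # sigma / sqrt(n), about 4.67
fpc = math.sqrt((N - n) / (N - 1))    # finite population correction
se_small_pop = se_large_pop * fpc     # about 3.30, matches actual_se

print(round(actual_se, 3), round(se_small_pop, 3), round(se_large_pop, 3))

# With sigma = 20 the standard error halves each time n is quadrupled
for size in (25, 100, 400):
    print(size, 20 / math.sqrt(size))
```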
What kind of distribution will the Sampling
Distribution of the Mean have?

• If the population from which the samples are
drawn is normally distributed with mean μ
and standard deviation σ, then the Sampling
Distribution of the Mean will also be normally
distributed with mean μ and standard
deviation σx̄ (irrespective of the sample size).

• Does the above result hold true if the
population is not normally distributed?
What kind of Probability Distribution does the Sampling
Distribution of the Mean possess when the population is not
Normal ?

The Central Limit Theorem assures us that

• If the sample size is large, the Sampling
Distribution of the Mean will be approximately
normally distributed with mean μ and standard
deviation σx̄ irrespective of the distribution of
the population.

• Large is taken to mean n ≥ 30.

• What happens when the sample size is small, i.e.
n < 30?
Example
• The population mean and standard deviation
of the final exam grades of ECON1005
students are known to be 65 and 20
respectively. If a sample of 100 students is
drawn at random, what is the probability that
the sample mean will be greater than 70?
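A possible worked solution, sketched with Python's `statistics.NormalDist`, assuming (as the slides do for n ≥ 30) that the Central Limit Theorem applies: the sampling distribution of x̄ is approximately Normal with mean 65 and standard error 20/√100 = 2, so z = (70 − 65)/2 = 2.5.

```python
import math
from statistics import NormalDist

mu, sigma, n = 65, 20, 100
se = sigma / math.sqrt(n)               # standard error of the mean: 20/10 = 2.0

# n >= 30, so by the CLT the sample mean is approximately N(mu, se)
sampling_dist = NormalDist(mu, se)
z = (70 - mu) / se                      # standardized value: 2.5
p_greater = 1 - sampling_dist.cdf(70)   # P(sample mean > 70)

print(f"z = {z}, P(sample mean > 70) = {p_greater:.4f}")
```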
What kind of Probability Distribution does the Sampling
Distribution of the Mean possess when σ is unknown and the
sample size is small, i.e. n < 30?
• Provided the population is (approximately) Normal, we must look to the Student t Distribution

• The Student t Distribution is a specific type of bell-shaped distribution with a lower
height and a wider spread than the Standard Normal Distribution.

• The Student t Distribution has only one parameter, i.e. the number of degrees of
freedom, abbreviated df.

• The number of degrees of freedom is the number of observations that can be
freely chosen.

• The mean of the Student t Distribution is 0 (for df > 1).

• The standard deviation of the Student t Distribution is √(df/(df – 2)) (for df > 2).

• As the degrees of freedom increase, the Student t Distribution approaches the
Standard Normal Distribution.
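The last two bullets can be illustrated numerically. This sketch (the function name `t_standard_deviation` is hypothetical) evaluates √(df/(df − 2)) for a few values of df: every value exceeds 1, the standard deviation of the Standard Normal, and the values shrink towards 1 as df grows.

```python
import math

def t_standard_deviation(df):
    """Standard deviation of the Student t distribution; finite only for df > 2."""
    if df <= 2:
        raise ValueError("the t distribution has no finite standard deviation for df <= 2")
    return math.sqrt(df / (df - 2))

# The spread is wider than the Standard Normal's (1) and approaches 1 as df grows
for df in (3, 5, 10, 29, 100, 1000):
    print(df, round(t_standard_deviation(df), 4))
```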
What kind of Probability Distribution does the Sampling Distribution
of the Mean possess when σ is unknown and the sample
size is small, i.e. n < 30?

• If the population from which the samples are
drawn is normally distributed with mean μ, but its standard
deviation σ is unknown and is estimated by the sample standard
deviation s, then the standardized sample mean follows the Student t
Distribution with n – 1 degrees of freedom.

• If the population is markedly non-Normal and n < 30, neither the
Normal nor the Student t result can be relied upon.

• The random variable of the Student t Distribution
is given by T where

$T = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
Activity
• The mean age of all diabetes patients in
Tobago is 50 with a Std.Dev. of 15. If a sample
of 25 patients is drawn, determine the
probability that the mean age of these
patients will be less than 60 years.
The Sampling Distribution of Proportion
• The probability distribution of the sample
proportion is called the Sampling Distribution
of the Proportion.
• The random variable of the Sampling
Distribution of the Proportion is p̂.
• The mean of the Sampling Distribution of the
Proportion is the population proportion p.
• The standard deviation of the Sampling
Distribution of the Proportion is given by
√(pq/n), where q = 1 – p.
What is the shape of the Sampling Distribution
of the Proportion?

The Central Limit Theorem assures us that

• If the sample size is sufficiently large, the
Sampling Distribution of the Proportion will be
approximately normally distributed with mean
p and standard deviation √(pq/n).

• Sufficiently large means np > 5 and nq > 5.
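A small helper, sketched here with a hypothetical name and illustrative values of p and n, returns the mean p and standard deviation √(pq/n) of the sampling distribution and applies the np > 5, nq > 5 rule.

```python
import math

def proportion_sampling_distribution(p, n):
    """Return (mean, standard deviation, normal_ok) for the sampling distribution of p-hat."""
    q = 1 - p
    mean_p_hat = p                         # the mean equals the population proportion
    sd_p_hat = math.sqrt(p * q / n)        # sqrt(pq/n)
    normal_ok = n * p > 5 and n * q > 5    # "sufficiently large" rule from the slide
    return mean_p_hat, sd_p_hat, normal_ok

# Illustrative values only: p = 0.3 with n = 100, then with n = 5 (np = 1.5, too small)
print(proportion_sampling_distribution(0.3, 100))
print(proportion_sampling_distribution(0.3, 5))
```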


Interval Estimates: Confidence Intervals
• So far we have been speaking about Unbiased Point
Estimators.
• Instead of assigning a single value to an unknown
population parameter, we can construct an interval of
values around the point estimate and make a probabilistic
statement that the interval contains the value of the
corresponding population parameter.

• Such activity is called interval estimation, and the intervals
it produces are called Confidence Intervals.

• These estimators, when applied to the data from a random
sample, define an interval that is likely to contain the true
value of the population parameter being estimated.
Interval Estimates
• The likelihood is expressed as a level of confidence, written as a
percentage: it is the probability that an interval constructed in this
way contains the parameter.

• An interval that is constructed based on the confidence level is
called a confidence interval.
• A 90% Confidence Interval means a 10% significance level, i.e. α = 10%
• A 95% Confidence Interval means a 5% significance level, i.e. α = 5%

• Confidence Interval Estimates in this course are as follows:

– For the population mean based on large samples
– For the population mean based on small samples
– For the population mean based on large samples with σ unknown
– For the population mean based on small samples with σ unknown
– For the population proportion
A 100 (1 - α)% Confidence Interval Estimate for the
Population Mean μ
A 100 (1 - α)% Confidence Interval Estimate for the
Population Mean μ when σ is unknown.
• Let X ~ N(μ, σ) where σ is unknown. A single sample of size n was drawn
and the sample mean x̄ was computed. On the basis of this single sample
mean, find a 100(1 – α)% Confidence Interval Estimate for μ.

• Here we substitute s for the unknown σ.

• However, it matters whether

– n is large, i.e. (n ≥ 30)

– n is small, i.e. (n < 30)

• If n ≥ 30, the CLT allows us to use the Normal Distribution N(μ, s/√n) as the
Sampling Distribution.

• If n < 30, the normality of the population (rather than the CLT) allows us to
use the Student-t Distribution with n – 1 df as the Sampling Distribution.
A 100 (1 - α)% Confidence Interval Estimate for the
Population Mean μ when σ is unknown and n ≥ 30.

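A 100(1 – α)% interval estimate for the population
mean μ when n ≥ 30 and σ is unknown is given by

$\bar{x} - Z_{\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\,\frac{s}{\sqrt{n}}$

or

$\left(\bar{x} - Z_{\alpha/2}\,\frac{s}{\sqrt{n}},\; \bar{x} + Z_{\alpha/2}\,\frac{s}{\sqrt{n}}\right)$

where $Z_{\alpha/2}$ comes from the Std Normal Distribution and
s is the sample standard deviation.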

Diagram & Interpretation???
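The interval formula translates directly into code. This is a minimal sketch (the function name `z_interval` is ours) that obtains Z_{α/2} from `statistics.NormalDist().inv_cdf` instead of a printed table; it is applied to the figures of the Grenada household-income activity in this lecture (n = 49, x̄ = $3000, s = $450, 80% confidence).

```python
import math
from statistics import NormalDist

def z_interval(xbar, s, n, confidence):
    """100(1 - alpha)% interval for mu using the Normal distribution (n >= 30, s in place of sigma)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}: upper-tail area alpha/2
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Grenada activity figures: n = 49, xbar = 3000, s = 450, 80% confidence
low, high = z_interval(3000, 450, 49, 0.80)
print(f"({low:.2f}, {high:.2f})")
```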


A 100 (1 - α)% Confidence Interval Estimate for the
Population Mean μ when σ is unknown and n < 30.
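A 100(1 – α)% interval estimate for the population mean
μ when n < 30 and σ is unknown is given by

$\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$

or

$\left(\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}},\; \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}\right)$

where $t_{\alpha/2}$ comes from the Student-t Distribution with
(n – 1) degrees of freedom and s is the sample standard
deviation.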

• Diagram & Interpretation???


Class Exercise 1

The standard deviation for a population is 14.8.
A sample of 100 observations selected from this
population gave a mean of 143.72.

a. Construct a 99% confidence interval for μ.
b. Construct a 95% confidence interval for μ.
c. Construct a 90% confidence interval for μ.
d. Does the width of the confidence intervals constructed
in parts a. to c. decrease as the confidence level
decreases? Explain.
Interpretation
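• If the experiment of drawing a random sample
of 100 observations and constructing a 99% interval
is repeated a very large number of times, about 99%
of the intervals so constructed will contain the true
value of the population mean (insert what you
are studying). Our single sample, with mean 143.72,
gives the interval from 139.909 to 147.531, and we are
99% confident that it is one of the intervals that contain μ.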
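The repeated-sampling interpretation can be checked by simulation. In this sketch the population is assumed Normal with a made-up mean μ = 143 (hypothetical, chosen only so that we know the true value) and σ = 14.8; 2,000 samples of 100 are drawn, a 99% interval is built around each sample's own mean, and we count how many of the intervals contain μ.

```python
import math
import random
from statistics import NormalDist, fmean

# HYPOTHETICAL set-up: a Normal population whose mean we fix in advance (mu = 143)
mu, sigma, n = 143.0, 14.8, 100
confidence, repetitions = 0.99, 2000

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2}, about 2.576
se = sigma / math.sqrt(n)

rng = random.Random(42)
hits = 0
for _ in range(repetitions):
    xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))   # a fresh sample each time
    if xbar - z * se <= mu <= xbar + z * se:                # did this interval capture mu?
        hits += 1

coverage = hits / repetitions
print(f"{coverage:.3f} of the intervals contained mu")      # expected to be close to 0.99
```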
Answer to Class Exercise 1
• 99% CI is (139.92, 147.52)
• 95% CI is (140.82, 146.62)
• 90% CI is (141.28, 146.16)
• Notice that the width of the Confidence
Interval decreases as the Confidence level
decreases.
• Does this make sense? Why?
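The answers can be checked with the sketch below. It uses exact Normal quantiles, so the 99% and 90% limits (about 139.91 to 147.53 and 141.29 to 146.15) differ from the table-based answers in the second decimal place; the 95% interval agrees.

```python
import math
from statistics import NormalDist

sigma, n, xbar = 14.8, 100, 143.72
se = sigma / math.sqrt(n)               # standard error: 14.8/10 = 1.48

intervals = {}
for confidence in (0.99, 0.95, 0.90):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2}
    low, high = xbar - z * se, xbar + z * se
    intervals[confidence] = (low, high)
    print(f"{confidence:.0%} CI: ({low:.2f}, {high:.2f}), width {high - low:.2f}")
```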
Another Class Exercise
A sample of 10 observations taken from a
normally distributed population produced the
following data:

44 52 31 48 46 39 47 36 41 57

a. What is the point estimate of μ?

Sample Mean x̄ = 44.1 and s = 7.67

b. Construct a 95% confidence interval for μ.
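For part b the population is Normal, n = 10 and σ is unknown, so the Student t Distribution with 9 df applies. Python's standard library has no t quantile function, so this sketch hard-codes the table value t(0.025, 9) = 2.262 (if SciPy is installed, `scipy.stats.t.ppf(0.975, 9)` returns it to more decimals).

```python
import math
from statistics import mean, stdev

data = [44, 52, 31, 48, 46, 39, 47, 36, 41, 57]
n = len(data)

xbar = mean(data)      # point estimate of mu: 44.1
s = stdev(data)        # sample standard deviation (divides by n - 1): about 7.67

# t(0.025) with n - 1 = 9 degrees of freedom, copied from a t table (assumed value)
t_crit = 2.262

margin = t_crit * s / math.sqrt(n)
print(f"95% CI for mu: ({xbar - margin:.2f}, {xbar + margin:.2f})")
```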


Activity
• A sample of 49 households in Grenada yielded
an average monthly income of $3000 with a
Std.Dev. of $450. Construct an 80% confidence
interval for the monthly average income of all
households in Grenada, if the underlying
population is normally distributed.
A 100 (1 - α)% Confidence Interval Estimate for the
Population Proportion p.

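A 100(1 – α)% interval estimate for the population
proportion p is given by

$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$

or

$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\; \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$

where $Z_{\alpha/2}$ comes from the Std Normal Distribution and $\hat{q} = 1 - \hat{p}$
(the unknown p and q are replaced by their sample estimates).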


Exercise
• In a sample of 100 patients from a clinic, 30%
have been found to have a history of heart
disease in their family. Compute a 90%
confidence interval for the proportion of all
patients having a history of heart disease in
their family.
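A sketch of this exercise in Python, using the sample proportion p̂ = 0.30 in place of p in the standard error and `NormalDist().inv_cdf(0.95)` for Z(0.05); the variable names are ours.

```python
import math
from statistics import NormalDist

n, p_hat = 100, 0.30
q_hat = 1 - p_hat

# Normal approximation check from the sampling-distribution slide: np > 5 and nq > 5
if not (n * p_hat > 5 and n * q_hat > 5):
    raise ValueError("sample too small for the Normal approximation")

z = NormalDist().inv_cdf(0.95)          # Z(0.05) for 90% confidence, about 1.645
se = math.sqrt(p_hat * q_hat / n)       # estimated standard error, about 0.0458
low, high = p_hat - z * se, p_hat + z * se
print(f"90% CI for p: ({low:.4f}, {high:.4f})")
```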
Working out Confidence Intervals:
Minitab Version

Exhibit I

Variable   N     Mean    Median   TrMean   StDev   SE Mean
Group A    135   51.44   51.00    51.40    9.37    *
Group B    108   51.80   53.00    52.05    **      1.168

Variable   Minimum   Maximum   Q1      Q3
Group A    18.00     71.00     44.00   57.00
Group B    13.00     81.00     43.00   61.00
Question: Assuming normality calculate a 91% confidence
interval for the mean Group B score and give an
interpretation to the result obtained.

91% Confidence Interval for the mean Group B score is given by:

$\left(\bar{x} - Z_{\alpha/2}\,\sigma_{\bar{x}},\; \bar{x} + Z_{\alpha/2}\,\sigma_{\bar{x}}\right)$ where $Z_{0.09/2} = Z_{0.045} = 1.70$ and $\sigma_{\bar{x}}$ is estimated by the SE Mean.

Substituting our known values:

51.8 – 1.7(1.168) < μ < 51.8 + 1.7(1.168)
49.814 < μ < 53.786

Therefore, if random samples of size 108 were drawn a large number of times and a 91%
confidence interval were constructed from each, about 91% of those intervals would
contain the true population mean. The interval from our sample runs from 49.8 to 53.8.
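The Minitab calculation can be reproduced with the sketch below. With the exact quantile Z(0.045) ≈ 1.695 instead of the rounded 1.70, the limits move slightly inward (about 49.82 to 53.78).

```python
from statistics import NormalDist

xbar, se_mean = 51.80, 1.168       # Group B mean and SE Mean from Exhibit I
confidence = 0.91
alpha = 1 - confidence

z = NormalDist().inv_cdf(1 - alpha / 2)    # Z_{0.045}, about 1.695 (rounded to 1.70 on the slide)
low, high = xbar - z * se_mean, xbar + z * se_mean
print(f"z = {z:.3f}; 91% CI for mu: ({low:.3f}, {high:.3f})")
```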
End of Lecture 8
• We have reviewed the Confidence Intervals
that form an integral part of the 5 stages of a
statistical analysis.
• Next we move on to another level of
investigation with respect to sample data.
• This involves Hypothesis testing.
