Applied Marketing (Market Research Methods) Topic 6: Inference, confidence intervals and sample size determination
Dr James Abdey
Sampling distribution of $\bar{X}$
Sampling distribution properties; Sample size and sampling fraction; Central Limit Theorem; Principle of confidence intervals; Construction: CI for $\bar{X}$, variance known
Overview
Here we consider sample size determination in simple random sampling.
Properties of the sampling distribution are discussed.
We describe the required adjustments to statistically determined sample sizes to account for incidence and completion rates.
Non-response issues in sampling are also covered, with ways of improving response rates.
Estimation
A population has particular characteristics of interest such as the mean, variance etc.
Collectively we refer to these characteristics as parameters.
If we do not have population data, the parameter values will be unknown.
Statistical inference is the process of estimating the (unknown) parameter values using the (known) sample data.
We use a statistic (estimator) calculated from sample observations to provide a point estimate.
Estimation Example
Returning to our example, recall there are 15 different samples of size 2 from a population of size 6.
Suppose the variable of interest is income:

Individual          A   B   C   D   E   F
Income (in 000s)    3   6   4   9   7   7
If we seek the population mean, $\mu$, we will use the sample mean, $\bar{X}$, as our estimator:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
For example, if the observed sample was AB, the sample mean is (3000 + 6000)/2 = 4,500
Estimation Example
Clearly, different observed samples will lead to different sample means.
Consider $\bar{X}$ for all possible samples (in 000s):

Sample   Values   Mean
AB       3, 6     4.5
AC       3, 4     3.5
AD       3, 9     6.0
AE       3, 7     5.0
AF       3, 7     5.0
BC       6, 4     5.0
BD       6, 9     7.5
BE       6, 7     6.5
BF       6, 7     6.5
CD       4, 9     6.5
CE       4, 7     5.5
CF       4, 7     5.5
DE       9, 7     8.0
DF       9, 7     8.0
EF       7, 7     7.0
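The enumeration of all 15 samples can be reproduced with a short Python sketch. The variable names (`incomes`, `sample_means`) are illustrative choices, not part of the slides; the data are the six incomes from the example.

```python
from itertools import combinations

# Population incomes (in 000s) from the example.
incomes = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}

# Every simple random sample of size n = 2, drawn without replacement.
samples = list(combinations(sorted(incomes), 2))

# Sample mean for each possible sample, keyed by a label such as "AB".
sample_means = {
    "".join(pair): sum(incomes[i] for i in pair) / len(pair)
    for pair in samples
}

for label, sample_mean in sample_means.items():
    print(label, sample_mean)
```

Each of the 15 samples is equally likely under simple random sampling, which is why the list of means can later be treated as a probability distribution.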
Sampling distribution of $\bar{X}$
The previous slide showed all possible values of the estimator $\bar{X}$.
Since we have the population data here, we can actually compute the population mean (in 000s):

$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i = \frac{3+6+4+9+7+7}{6} = 6$$

So even with SRS, we obtain some $\bar{X}$ values far from $\mu$.
Here only one sample (AD) results in $\bar{X} = \mu$.
Sampling distribution of $\bar{X}$
Let's now consider the maximum $|\bar{X} - \mu|$, counting the samples whose mean lies within each distance of $\mu$:

Maximum $|\bar{X} - \mu|$    Number of samples
0.0                          1
0.5                          6
1.0                          10
1.5                          12
2.0                          14
2.5                          15

So, for example, there is an 80% chance (12 of the 15 samples) of being within 1.5 units of $\mu$.
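The counts of samples within a given distance of the population mean can be checked directly. This is a minimal sketch using the six incomes from the example; the helper name `share_within` is a hypothetical label introduced here.

```python
from itertools import combinations

incomes = [3, 6, 4, 9, 7, 7]          # population values (in 000s)
mu = sum(incomes) / len(incomes)      # population mean
means = [sum(pair) / 2 for pair in combinations(incomes, 2)]

def share_within(tolerance):
    """Return (count, proportion) of samples with |sample mean - mu| <= tolerance."""
    hits = sum(1 for m in means if abs(m - mu) <= tolerance)
    return hits, hits / len(means)

for tol in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(tol, share_within(tol))
```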
Sampling distribution of $\bar{X}$
We now represent this as a frequency distribution.
That is, we record the frequency of each possible value of $\bar{X}$:

Mean   Frequency   Relative frequency
3.5    1           1/15 = 0.067
4.5    1           1/15 = 0.067
5.0    3           3/15 = 0.200
5.5    2           2/15 = 0.133
6.0    1           1/15 = 0.067
6.5    3           3/15 = 0.200
7.0    1           1/15 = 0.067
7.5    1           1/15 = 0.067
8.0    2           2/15 = 0.133

This is known as the sampling distribution of $\bar{X}$.
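The frequency table can be built mechanically by tallying the 15 sample means. The sketch below uses exact fractions to avoid rounding; the names `freq` and `sampling_dist` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

incomes = [3, 6, 4, 9, 7, 7]  # population values (in 000s)

# Exact sample means for all samples of size 2.
means = [Fraction(a + b, 2) for a, b in combinations(incomes, 2)]

# Frequency of each distinct sample mean, then relative frequency.
freq = Counter(means)
sampling_dist = {m: Fraction(c, len(means)) for m, c in sorted(freq.items())}

for m, p in sampling_dist.items():
    print(float(m), freq[m], round(float(p), 3))
```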
Sampling distribution of $\bar{X}$
The sampling distribution is a central and vital concept in statistics.
It can be used to evaluate how good an estimator is.
Specifically, we care about how close the estimator is to the population parameter of interest.
As we have seen, different samples yield different $\bar{X}$ values, as a consequence of the random sampling procedure.
Hence estimators (of which $\bar{X}$ is an example) are random variables.
So, $\bar{X}$ is our estimator of $\mu$.
The observed value of $\bar{X}$ is a point estimate.
Like any distribution, we care about a sampling distribution's mean and variance.
Together, these let us assess how good an estimator is.
First, consider the mean: we seek an estimator which does not mislead us systematically.
So the average (mean) value of an estimator, over all possible samples, should be equal to the population parameter.
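Mean    Frequency   Product
3.5     1           3.5
4.5     1           4.5
5.0     3           15.0
5.5     2           11.0
6.0     1           6.0
6.5     3           19.5
7.0     1           7.0
7.5     1           7.5
8.0     2           16.0
Total   15          90.0

The mean of the sampling distribution is therefore 90.0/15 = 6, which equals $\mu$.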
Fortunately the sample mean, $\bar{X}$, is always an unbiased estimator of $\mu$ in SRS, regardless of:
the sample size, n
the distribution of the (parent) population
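Unbiasedness for every sample size can be verified by brute force on the small example population. This is only a check on this one population, not a proof; the function name `expected_sample_mean` is an illustrative choice.

```python
from fractions import Fraction
from itertools import combinations

incomes = [3, 6, 4, 9, 7, 7]           # population values (in 000s)
N = len(incomes)
mu = Fraction(sum(incomes), N)         # population mean

def expected_sample_mean(n):
    """Average of the sample mean over all equally likely SRS samples of size n."""
    means = [Fraction(sum(s), n) for s in combinations(incomes, n)]
    return sum(means) / len(means)

for n in range(1, N + 1):
    print(n, expected_sample_mean(n), expected_sample_mean(n) == mu)
```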
This is a good illustration of a population parameter, $\mu$, being estimated by its sample counterpart, $\bar{X}$.
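The variance of the sampling distribution is computed in the same way, using squared deviations from $\mu = 6$:

Mean    Deviation   Squared deviation   Frequency   Product
3.5     -2.5        6.25                1           6.25
4.5     -1.5        2.25                1           2.25
5.0     -1.0        1.00                3           3.00
5.5     -0.5        0.25                2           0.50
6.0     0.0         0.00                1           0.00
6.5     0.5         0.25                3           0.75
7.0     1.0         1.00                1           1.00
7.5     1.5         2.25                1           2.25
8.0     2.0         4.00                2           8.00
Total                                   15          24.00

Hence $\mathrm{Var}(\bar{X}) = 24.00/15 = 1.6$.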
The population itself has a variance, the population variance, $\sigma^2$:

X       Deviation   Squared deviation   Frequency   Product
3       -3          9                   1           9
6       0           0                   1           0
4       -2          4                   1           4
9       3           9                   1           9
7       1           1                   2           2
Total                                   6           24

So $\sigma^2 = 24/6 = 4$.
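So for our example,

$$\mathrm{Var}(\bar{X}) = \frac{6-2}{6-1} \times \frac{4}{2} = 1.6$$

We use the term standard error to refer to the standard deviation of the sampling distribution:

$$\mathrm{S.E.}(\bar{X}) = \sqrt{\mathrm{Var}(\bar{X})} = \sqrt{\frac{N-n}{N-1} \times \frac{\sigma^2}{n}} = \sigma_{\bar{X}}$$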
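As a cross-check, the variance obtained from the formula with the factor (N - n)/(N - 1) can be compared against the variance computed directly from all 15 sample means. This is a sketch on the example data only; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt

incomes = [3, 6, 4, 9, 7, 7]           # population values (in 000s)
N, n = len(incomes), 2
mu = Fraction(sum(incomes), N)

# Population variance (divisor N, as in the slides).
sigma2 = sum((x - mu) ** 2 for x in incomes) / N

# Variance of the sample mean computed directly from every possible sample.
means = [Fraction(sum(s), n) for s in combinations(incomes, n)]
var_brute = sum((m - mu) ** 2 for m in means) / len(means)

# Variance of the sample mean from the formula.
var_formula = Fraction(N - n, N - 1) * sigma2 / n

standard_error = sqrt(var_formula)
print(sigma2, var_brute, var_formula, round(standard_error, 4))
```

Both routes give 8/5 = 1.6, so the standard error is the square root of 1.6.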
Implications:
as the sample size, n, increases, the sampling variance decreases, i.e. the precision increases (although greater precision is desirable, data collection costs will rise with n; remember why we sample in the first place!)
provided the sampling fraction, n/N, is small, the term $\frac{N-n}{N-1} \approx 1$, so it can be ignored
the precision then depends effectively on n only
The factor $\frac{N-n}{N-1}$ decreases steadily as $n \to N$.
When n = 1 the factor equals 1, and when n = N it equals 0.
When sampling without replacement, increasing n must increase precision since less of the population is left out.
In much practical sampling N is very large (e.g. several million), while n is comparatively small (e.g. at most 1,000, say).
Therefore in such cases the effect of the factor becomes negligible (it is approximately 1), hence

$$\mathrm{Var}(\bar{X}) = \frac{N-n}{N-1} \times \frac{\sigma^2}{n} \approx \frac{\sigma^2}{n} = \sigma^2_{\bar{X}}$$

So $\sigma^2_{\bar{X}_1} \approx \sigma^2_{\bar{X}_2}$, despite $N_1 \ll N_2$: for the same n and $\sigma^2$, two populations of very different sizes give essentially the same sampling variance.
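The behaviour of the factor (N - n)/(N - 1) is easy to tabulate. The function name `fpc` and the population sizes below are illustrative assumptions, chosen only to mirror the "several million versus 1,000" scenario in the text.

```python
def fpc(N, n):
    """Factor (N - n) / (N - 1) that multiplies sigma^2 / n under SRS without replacement."""
    return (N - n) / (N - 1)

# Small population: the factor falls from 1 (n = 1) to 0 (n = N).
for n in range(1, 7):
    print("N = 6, n =", n, "factor =", round(fpc(6, n), 3))

# Large populations with a modest sample: the factor is very close to 1.
for N in (100_000, 5_000_000):
    print("N =", N, "n = 1000, factor =", round(fpc(N, 1000), 5))
```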
The Normal approximation to the sampling distribution of $\bar{X}$ given by the Central Limit Theorem (CLT) is reasonable for n at least 30, as a rule-of-thumb.
Though because this is an asymptotic approximation (i.e. as $n \to \infty$), the bigger n is, the better the Normal approximation.
Special case: if the population distribution is itself Normal, $\bar{X}$ will have an exact Normal distribution for any sample size n.
[Figure: sampling distribution of the sample mean; horizontal axis: Sample mean]
Although the shape of the population distribution does not affect the generality of the CLT result, it does affect the speed of convergence of the sampling distribution of $\bar{X}$ to the Normal distribution.
Obviously a symmetric population distribution would converge faster in n.
In practice, n = 30 is usually adequate to make the Normal approximation reasonable.
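The effect of sample size on the shape of the sampling distribution can be illustrated by simulation. This is a rough sketch under stated assumptions: the population is taken to be Exponential(1) (a strongly right-skewed choice made here, not in the slides), observations are drawn independently, and skewness is used as a simple measure of departure from the symmetric Normal shape. The function names are hypothetical.

```python
import random

def skewness(values):
    """Skewness as the mean cubed standardised deviation (0 for a symmetric shape)."""
    k = len(values)
    m = sum(values) / k
    s = (sum((v - m) ** 2 for v in values) / k) ** 0.5
    return sum(((v - m) / s) ** 3 for v in values) / k

def simulate_sample_means(n, reps=2000, seed=1):
    """Means of `reps` samples of size n from an assumed Exponential(1) population."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

means_small = simulate_sample_means(2)
means_large = simulate_sample_means(30)

print("skewness of sample means, n = 2: ", round(skewness(means_small), 2))
print("skewness of sample means, n = 30:", round(skewness(means_large), 2))
```

The skewness of the simulated sample means should be noticeably closer to zero for n = 30 than for n = 2, in line with the rule-of-thumb above.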
Remember the CLT is based on SRS.
Without probability sampling methods, there is absolutely no basis for the use of the CLT.
This is principally why we insist on probability (random) sampling.
Otherwise the whole structure of statistical inference collapses!
The CLT also makes the use of the variance more reasonable.
The Normal distribution is completely characterised by its mean and variance.
Hence it is sensible to focus attention on these two characteristics of the sampling distribution.
A point estimate is our best guess of an unknown population parameter based on sample data.
But as it is based on a sample, there is some uncertainty/imprecision.
Confidence intervals (CIs) communicate the level of imprecision.
The general format (for our purposes) for a confidence interval is:

statistic ± (multiplier coefficient) × standard error

For the sample mean, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, with known variance this is $\bar{X} \pm z \times \frac{\sigma}{\sqrt{n}}$.
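A confidence interval for the mean with known variance can be sketched as follows. The function name `ci_known_variance` and the inputs (sample mean 6, sigma 2, n = 100) are hypothetical values for illustration; the multiplier z is taken from the standard Normal distribution via Python's `statistics.NormalDist`, and the finite population factor is assumed negligible.

```python
from math import sqrt
from statistics import NormalDist

def ci_known_variance(xbar, sigma, n, level=0.95):
    """Interval xbar +/- z * sigma / sqrt(n), assuming sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical inputs, not taken from the slides.
lower, upper = ci_known_variance(xbar=6.0, sigma=2.0, n=100)
print(round(lower, 3), round(upper, 3))
```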
Choosing sample size Adjusting the statistically determined sample size Adjusting for non-response
Unfortunately, to use the approach just discussed requires knowledge of the population variance, $\sigma^2$.
This is because it is used in the standard error:

$$\bar{X} \pm z \times \frac{\sigma}{\sqrt{n}}$$

In practice, we are unlikely to know $\sigma^2$.
After all, it is a population characteristic, and so if we do not know $\mu$, why would we know $\sigma^2$?
When $\sigma^2$ is unknown, the sampling variance is estimated using the sample variance, $s^2$:

$$\widehat{\mathrm{Var}}(\bar{X}) = \frac{N-n}{N} \times \frac{s^2}{n} \approx \frac{s^2}{n}$$

where

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$$

The approximation holds since typically $\frac{N-n}{N} \approx 1$ in the social sciences.
Once we have estimated this, we proceed as before to construct a CI using the estimate of the standard error in place of the actual standard error.
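The estimated standard error and the resulting interval can be sketched as below. The data list is hypothetical, the function names are illustrative, and the z multiplier is kept to follow the "proceed as before" wording, which presumes a reasonably large n. The optional argument `N` applies the factor (N - n)/N when the population size is supplied.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def sample_variance_shortcut(data):
    """s^2 via the computational form (sum of x^2 - n * xbar^2) / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    return (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

def ci_estimated_variance(data, level=0.95, N=None):
    """Interval xbar +/- z * estimated standard error, with optional (N - n) / N factor."""
    n = len(data)
    s2 = variance(data)                       # divisor n - 1
    factor = (N - n) / N if N is not None else 1.0
    se = sqrt(factor * s2 / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    xbar = mean(data)
    return xbar - z * se, xbar + z * se

# Hypothetical sample, not taken from the slides.
data = [4, 8, 6, 5, 3, 7, 6, 9, 5, 7]
lower, upper = ci_estimated_variance(data)
print(round(lower, 3), round(upper, 3))
```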
Note the trade-off between accuracy and data cost.
Solution: fix the desired precision and find the smallest n which achieves this.
If we want the sample mean to be within a tolerance d of $\mu$ with a specified probability, then

$$d = z \times \frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \frac{z^2\sigma^2}{d^2}$$
n is the minimum sample size required to achieve the desired precision.
n must be an integer, so always round up!
Note that the required sample size in this type of calculation needs to be rounded up from a decimal fraction, since rounding down would result in a value not quite large enough!
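The sample size calculation, including the rounding-up step, can be sketched as follows. The function name `required_sample_size` and the example inputs are hypothetical; sigma is assumed known (or replaced by a prior estimate), and the finite population factor is ignored.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, d, level=0.95):
    """Smallest integer n with z * sigma / sqrt(n) <= d, i.e. n = z^2 sigma^2 / d^2 rounded up."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ceil((z * sigma / d) ** 2)

# Hypothetical inputs: sigma = 10, tolerance d = 1, 95% level.
print(required_sample_size(sigma=10.0, d=1.0))
```

Halving the tolerance d roughly quadruples the required n, which is the accuracy versus cost trade-off in numerical form.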
Subjective estimates
When it is no longer feasible to increase the response rate by sub-sampling, replacement, or substitution, it may be possible to arrive at subjective estimates of the nature and effect of non-response bias.
This involves evaluating the likely effects of non-response based on experience and available information.
Imputation
Imputation involves imputing, or assigning, the characteristic of interest to the non-respondents based on the similarity of the variables available for both non-respondents and respondents.
For example, a respondent who does not report brand usage may be imputed the usage of a respondent with similar demographic characteristics.
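One simple way to realise this idea is to copy the missing value from the most similar respondent. The sketch below is only one possible approach (a nearest-neighbour style donor imputation); the records, field names (`age`, `income`, `brand_usage`) and the distance measure are all hypothetical assumptions made for illustration.

```python
def impute_from_nearest(respondents, target, keys, field):
    """Fill `field` in `target` with the value from the donor closest on `keys`.

    Closeness is the sum of absolute differences over the demographic keys,
    an assumed and deliberately simple similarity measure.
    """
    donors = [r for r in respondents if r.get(field) is not None]

    def distance(record):
        return sum(abs(record[k] - target[k]) for k in keys)

    donor = min(donors, key=distance)
    return {**target, field: donor[field]}

# Hypothetical survey records; None marks the unreported item.
respondents = [
    {"age": 25, "income": 30, "brand_usage": "weekly"},
    {"age": 52, "income": 60, "brand_usage": "never"},
]
non_respondent = {"age": 50, "income": 58, "brand_usage": None}

completed = impute_from_nearest(
    respondents, non_respondent, keys=("age", "income"), field="brand_usage"
)
print(completed)
```

In practice the demographic variables would usually be rescaled before distances are compared, since otherwise the variable with the largest units dominates.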