
Sampling Design

SI-5098 Metoda Penelitian


Research Staging
Flow of stages, in order:
• OBSERVATION: Broad area of research interest identified
• PRELIMINARY DATA GATHERING: Interviewing, literature survey
• PROBLEM DEFINITION: Research problem delineated
• THEORETICAL FRAMEWORK: Variables clearly identified and labeled
• GENERATION OF HYPOTHESES
• SCIENTIFIC RESEARCH DESIGN
• DATA COLLECTION, ANALYSIS AND INTERPRETATION
• DEDUCTION: Hypotheses substantiated? Research question answered? (a NO answer loops back to an earlier stage)
• Report Writing
Issues Involved in the Research Design
PROBLEM STATEMENT
DETAILS OF STUDY
• Purpose of the study: exploration, description, hypotheses testing
• Type of investigation: establishing causal relations, correlations, group differences, rank, etc.
• Extent of researcher interference: studying events, manipulation, control, simulation
• Study setting: contrived, non-contrived
• Unit of analysis: individuals, dyads, groups, organizations, etc.
• Sampling design: sampling method, sample size
• Time horizon: cross-sectional, longitudinal
MEASUREMENT
• Measurements and measures: operational definition, items, scaling, categorizing, coding
• Data collection method: observation, interview, questionnaire, physical measurements, unobtrusive
DATA ANALYSIS
• Feel for data
• Goodness of data
• Hypotheses testing
Data Collection Process
Preliminary Planning → Sample Design → Questionnaire Design → Pilot Survey → Survey Administration → Editing → Coding → Presentation of Results
Definition of sampling
Procedure by which some members of a given population are selected as representatives of the entire population
Why do we use sampling?
Get information from large populations with:
– Reduced costs
– Reduced field time
– Increased accuracy
– Enhanced methods
Basic Concepts
• UNIVERSE: Totality of items or units in any field of enquiry
• POPULATION: Total of items about which information is desired; an aggregate of elementary units (finite or infinite, of size N) that possess at least one common characteristic, real or hypothetical
• ELEMENTARY UNITS: Units possessing the relevant characteristics, i.e., the attributes that are the object of study (operational definition)
Basic Concepts
• SAMPLING FRAME:
– A list of all the units of the population
– Perfect frames seldom exist; for a finite population, elementary units, clusters of such units, or groups form the basis of the frame
– The frame is either constructed by the researcher or taken from an existing list of the population
– The sampling frame should represent the population well and be as free as possible from:
– Incompleteness
– Inaccuracy
– Inadequacy
– Being out of date
(In other words: complete, no missing elements, no ineligibles, no element appearing more than once, and up to date)
• SAMPLING DESIGN: (Target population → Survey population)
– A definite plan for obtaining a sample from the sampling frame
– Refers to the technique or procedure adopted by the researcher
Basic Concepts
• PARAMETER(S): A characteristic of a population
• STATISTIC(S): A characteristic of a sample (estimating a parameter from a statistic is the prime objective of sampling analysis)
• SAMPLING ERROR (or ERROR OF VARIANCE): Errors which arise on account of sampling. A certain amount of inaccuracy in the information collected can be due to sampling
– Total Error = Sampling Error + Measurement Error + Non-sampling Error
– Sampling Error = Frame Error + Chance Error + Response Error
– Sampling errors occur randomly in either direction
– The magnitude of the errors depends on the homogeneity of the universe and the size of the sample
Basic Concepts
• PRECISION: Range within which the population parameter will lie in accordance with the reliability specified in the confidence level (expressed as ± a percentage of the estimate or as a numerical quantity). In other words, the range within which the answer may vary and still be acceptable (the probable accuracy of a sample is affected by the variability, size and design of the sample)
• RELIABILITY OR CONFIDENCE LEVEL: Expected percentage of times that the actual value will fall within the stated precision limits, i.e. the likelihood that the answer will fall within that range (acceptance region of the normal curve)
Basic Concepts
• SIGNIFICANCE LEVEL: The likelihood that the answer will fall outside the range (rejection region of the normal curve)
• SAMPLING DISTRIBUTION: The values of a particular statistic computed from a number of samples, together with their relative frequencies, form the sampling distribution of that statistic. For large samples it tends to be close to a normal distribution, and the mean of the sampling distribution can be taken as the mean of the universe
Basic Concepts
• STANDARD ERROR:
– The standard deviation of the sampling distribution of a statistic is its standard error; it is a key to sampling theory
– Helps in testing whether a difference between observed and expected frequencies could arise due to chance
– Gives an idea about the reliability and precision of a sample
– Makes it possible to specify the limits within which the parameters of the population are expected to lie with a specified degree of confidence
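The definition of the standard error can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population (normal, mean 50, standard deviation 10), the sample size of 25 and the function name `simulate_standard_error` are all hypothetical choices, not values from these slides. It draws many samples, records each sample mean, and compares the standard deviation of those means with the textbook value σ/√n.

```python
import math
import random
import statistics


def simulate_standard_error(mu, sigma, n, n_samples, seed=0):
    """Return the standard deviation of n_samples sample means,
    each computed from n draws of a normal(mu, sigma) population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


# Hypothetical population: mean 50, standard deviation 10.
empirical = simulate_standard_error(mu=50, sigma=10, n=25, n_samples=2000)
theoretical = 10 / math.sqrt(25)  # sigma / sqrt(n) = 2.0
print(f"empirical SE = {empirical:.2f}, theoretical SE = {theoretical:.2f}")
```

The two numbers should be close, which is what lets a single sample's standard error stand in for the spread of the whole sampling distribution.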
Sampling
• Population → (select) → Sample
• Parameter ← (estimate) ← Statistic
• The true proportion is estimated by the sample proportion
• The true mean is estimated by the sample mean
Sampling and representativeness
Target Population → Sampling Population → Sample
[Figure: the sample is drawn from the sampling population, which in turn lies within the target population.]
Types of Survey Error
Total survey error
• Random sampling error
• Systematic error
– Sample design error: frame error, population specification error, selection error
– Measurement error: surrogate information error, interviewer error, instrument bias, processing error, nonresponse bias, response error
Sampling error
• Random difference between sample and population from which
sample drawn
• Size of error can be measured in probability samples
• Expressed as “standard error”
– of mean, proportion…
• Standard error (or precision) depends upon:
– Size of the sample
– Distribution of the characteristic of interest in the population
Sampling errors
• This is not an "error" in the sense of making a mistake. Rather, it
is a measure of the possible range of approximation in the
results because a sample was used
• Interviews with a representative sample of 1,000 adults can accurately reflect the opinions of nearly 2 million adults
• This range of possible results is called the error due to sampling,
often called the Margin Of Error (MOE)
Sampling errors
[Figure: a population distribution (e.g. income) with population mean $\mu$, and a sample with sample mean $\bar{x}$. The gap between the two is the sampling error: the sample mean falls where it does only because certain randomly selected observations were included in the sample.]
Non-sampling errors…
– Process errors:
• Examples include measurement error, interviewer
error, and processing error.
• It can be minimised by proper interviewer training,
good questionnaire design, pre-testing, and careful
management of the data recording process.
– The problem is most serious when a bias is created.
Non-sampling errors…
• Errors in data acquisition:
– Selection bias
• Select people randomly; do not let them (or you) choose who is included
– Non-response errors
• Reduce them through anonymity, questionnaire design and relevance
• Handle them with call-backs, substitution or re-weighting of the data
Non-sampling error
[Figure: population versus sample. When non-sampling error is added to sampling error, the sample mean is affected.]


Sample Accuracy
• Sample accuracy: refers to how close a random sample’s statistic
(e.g. mean, variance, proportion) is to the population’s value it
represents (mean, variance, proportion)

• Important points:
• Sample size is NOT related to representativeness … you
could sample 20,000 persons walking by a street corner and
the results would still not represent the city; however, an n of
100 could be “right on.”
Sample Accuracy
• Important points:
• Sample size, however, IS related to accuracy. How
close the sample statistic is to the actual population
parameter (e.g. sample mean vs. population mean) is
a function of sample size.
Sample Size AXIOMS
To properly understand how to determine sample size, it helps to understand the following AXIOMS…
Sample Size Axioms
• The only perfectly accurate sample is a census.
• A probability sample will always have some inaccuracy
(sample error).
• The larger a probability sample is, the more accurate it is
(less sample error).
• Probability sample accuracy (error) can be calculated with a simple formula, and expressed as a ± % value.
Sample Size Axioms
• You can take any finding in the survey, replicate the survey with the same probability sample plan and size, and you will be “very likely” to find the same result within the ± range of the original findings.
• In almost all cases, the accuracy (sample error) of a
probability sample is independent of the size of the
population.
Sample Size Axioms
• A probability sample can be a very tiny percentage of the
population size and still be very accurate (have little
sample error).
• The size of the probability sample depends on the client’s
desired accuracy (acceptable sample error) balanced
against the cost of data collection for that sample size.
Central Limit Theorem
The Central Limit Theorem allows us to use the logic of the Normal Curve Distribution
• Since 95% of samples drawn from a population will fall within ± 1.96 × sample error (this logic is based upon our understanding of the normal curve), we can make the following statement: …
Central Limit Theorem

If we conducted our study over and over, e.g. 1,000 times, we would expect our result to fall within a known range (± 1.96 s.d.’s of the mean). Based upon this, there are 95 chances in 100 that the true population value (proportion, share, mean) falls within this range!
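The "95 chances in 100" statement can be checked by simulation. The sketch below repeats a hypothetical study 1,000 times and counts how often the sample mean lands within ± 1.96 standard errors of the true mean. The population values (mean 100, standard deviation 15), the sample size of 50 and the function name `coverage` are assumptions made for illustration only.

```python
import math
import random
import statistics


def coverage(mu, sigma, n, repetitions, z=1.96, seed=1):
    """Share of repeated studies whose sample mean falls within
    z standard errors of the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(statistics.fmean(sample) - mu) <= half_width:
            hits += 1
    return hits / repetitions


# Hypothetical population: mean 100, standard deviation 15.
share = coverage(mu=100, sigma=15, n=50, repetitions=1000)
print(f"{share:.1%} of the repeated studies fell within the range")
```

The printed share should sit near 95%, with some run-to-run wobble because 1,000 repetitions is itself a sample.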
Normal Distribution
± 1.96 × s.d. defines the endpoints for 95% of the distribution
[Figure: sampling distributions for n = 500 and n = 1000.]
We also know that, given the amount of variability in the population, the sample size affects the size of the confidence interval: as n goes down, the interval widens (becomes more “sloppy”).
Central Limit Theorem
So, what have we learned thus far?
There is a relationship among:
• the level of confidence we desire that our results be repeated
within some known range if we were to conduct the study
again, and…
• the variability (in responses) in the population and…
• the amount of acceptable sample error (desired accuracy) we
wish to have and…
• the size of the sample.
Sample Size Formula
• The formula requires that we
a. specify the amount of confidence we wish to have,
b. estimate the variance in the population, and
c. specify the level of desired accuracy we want.
• When we specify the above, the formula tells us what sample size, n, we need to use.
Sample Size Formula for Estimating
a Mean

Sample Size Formula
Estimating a Mean
This requires a different formula from the one used for a proportion:
$n = \frac{z^2 s^2}{e^2}$

z is determined the same way (1.96 for 95% confidence or 2.58 for 99% confidence).

e is expressed in terms of the units we are estimating, i.e. if we are measuring attitudes on a 1-7 scale, we may want our error to be no more than ± 0.5 scale units. If we are estimating dollars being paid for a product, we may want our error to be no more than ± $3.00.
s is a little more difficult to estimate, but must be in the same units as e.
Sample Size Formula
Estimating “s” in the Formula to Determine the Sample Size
Required to Estimate a Mean
Since we are estimating a mean, we can assume that our data are either interval or ratio. When we have interval or ratio data, the standard deviation of the sample, s, may be used as a measure of variability.

How to estimate s?
• Use standard deviation of the sample from a previous study on the target
population
• Conduct a pilot study of a few members of the target population and
calculate s
Sample Size Calculation
Example: Estimating the Mean of a Population
What is the required sample size, n?

• Management wants to know customers’ level of satisfaction with their service. They
propose conducting a survey and asking for satisfaction on a scale from 1 to 10
(since there are 10 possible answers, the range = 10).
• Management wants to be 99% confident in the results (99 chances in 100 that the true value is captured) and they do not want the allowed error to be more than ± 0.5 scale points.
• What is n?
Sample Size Calculation
s = 1.7 (from a pilot study),
z = 2.58 (99% confidence), and
e = 0.5 scale points
• What is n? $n = z^2 s^2 / e^2 = (2.58)^2 (1.7)^2 / (0.5)^2 \approx 77$. Assume the survey average score was 7.3. What does this tell us? A 10 is very satisfied and a 1 is not satisfied at all.
• Answer: “Our most likely estimate of the level of consumer satisfaction is 7.3 on a 10-point scale. In addition, we are 99% confident that the true level of satisfaction in our consumer population falls between 6.8 and 7.8 on the scale.”
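The arithmetic of this example can be sketched in a few lines. The function name `sample_size_for_mean` is hypothetical, and rounding up to the next whole respondent is an assumption of this sketch; the inputs are the ones given in the example.

```python
import math


def sample_size_for_mean(z, s, e):
    """n = z^2 * s^2 / e^2, rounded up to a whole respondent."""
    return math.ceil((z * s / e) ** 2)


# Inputs from the satisfaction-survey example.
n = sample_size_for_mean(z=2.58, s=1.7, e=0.5)

# Interval around the assumed survey average of 7.3.
low, high = round(7.3 - 0.5, 1), round(7.3 + 0.5, 1)
print(n, low, high)
```

Because e enters the formula squared, tightening the allowed error is the most expensive of the three choices.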
Sample Size Formula
Interval Estimation of a Population Mean:
Large-Sample Case

• Sampling Error
• Probability Statements about the Sampling Error
• Interval Estimation: σ Assumed Known
• Interval Estimation: σ Estimated by s
Sample Size Formula
Sampling Error
• The absolute value of the difference between an unbiased point estimate and the population parameter it estimates is called the sampling error.
• For the case of a sample mean estimating a population mean, the sampling error is
Sampling Error = $|\bar{x} - \mu|$
Sample Size Formula
Probability Statements about the Sampling error

• Knowledge of the sampling distribution of $\bar{x}$ enables us to make probability statements about the sampling error even though the population mean $\mu$ is not known.
• A probability statement about the sampling error is a precision statement.
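Probability Statements about the Sampling Error
Precision Statement
There is a $1 - \alpha$ probability that the value of a sample mean will provide a sampling error of $z_{\alpha/2}\,\sigma_{\bar{x}}$ or less.
[Figure: sampling distribution of $\bar{x}$ centered on $\mu$; $1 - \alpha$ of all $\bar{x}$ values lie in the middle, with an area of $\alpha/2$ in each tail.]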
Interval Estimate of a Population Mean:
Large-Sample Case (n > 30)
• σ Assumed Known
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
where: $\bar{x}$ is the sample mean
$1 - \alpha$ is the confidence coefficient
$z_{\alpha/2}$ is the z value providing an area of $\alpha/2$ in the upper tail of the standard normal probability distribution
σ is the population standard deviation
n is the sample size
Interval Estimate of a Population Mean:
Large-Sample Case (n > 30)
• σ Estimated by s
In most applications the value of the population standard deviation is unknown. We simply use the value of the sample standard deviation, s, as the point estimate of the population standard deviation.
$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$
Example: Airport Check-in Counter
In an effort to estimate the mean amount of time spent per customer at an airport check-in counter, data were collected for a sample of 49 customers. Assume a population standard deviation of 5 minutes.
Example: Airport Check-in Counter
At 95% confidence, what is the margin of error?
Margin of error is defined as: $z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Given: σ = 5 minutes, n = 49, α = 1 − 0.95 = 0.05
Margin of error:
$z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = z_{0.05/2} \frac{5}{\sqrt{49}} = z_{0.025} \frac{5}{7} = 1.96 \times \frac{5}{7} = 1.4$
Example: Airport Check-in Counter
If the sample mean is 24.80 minutes, what is a 95% confidence interval estimate of the population mean?
A 95% confidence interval estimate of $\mu$ is
24.80 minutes ± margin of error = 24.80 minutes ± 1.40 minutes
The 95% confidence interval estimate ranges from 23.40 minutes to 26.20 minutes.
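The airport example can be reproduced with Python's standard library. This is a minimal sketch: the function name `z_interval` is hypothetical, and the z value is taken from `statistics.NormalDist` rather than a printed table, so it is about 1.95996 instead of the rounded 1.96.

```python
import math
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence=0.95):
    """Large-sample interval: mean +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper-tail area alpha/2
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin, margin


# Values from the airport check-in example.
low, high, margin = z_interval(mean=24.80, sigma=5, n=49)
print(f"margin = {margin:.2f}, interval = {low:.2f} to {high:.2f}")
```

Passing `confidence=0.99` instead shows how a higher confidence level widens the interval for the same data.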
Interval Estimation of a Population Mean:
Small-Sample Case (n < 30)
Population is Not Normally Distributed
The only option is to increase the sample size to n > 30 and use the large-sample interval-estimation procedures.
Interval Estimation of a Population Mean:
Small-Sample Case (n < 30)
Population is Normally Distributed: σ Assumed Known
The large-sample interval-estimation procedure can be used.
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Interval Estimation of a Population Mean:
Small-Sample Case (n < 30)
Population is Normally Distributed: σ Estimated by s
The appropriate interval estimate is based on a probability distribution known as the t distribution.
t Distribution
• The t distribution is a family of similar probability distributions.
• A specific t distribution depends on a parameter known as the
degrees of freedom.
• As the number of degrees of freedom increases, the difference
between the t distribution and the standard normal probability
distribution becomes smaller and smaller.
• A t distribution with more degrees of freedom has less
dispersion.
• The mean of the t distribution is zero.
t Distribution
[Figure: the standard normal distribution compared with t distributions with 20 and 10 degrees of freedom, all centered on 0 (horizontal axis: z, t).]
t Distribution
[Figure: t distribution centered on 0, with an area (probability) of $\alpha/2$ in the upper tail beyond $t_{\alpha/2}$.]
Interval Estimation of a Population Mean:
Small-Sample Case (n < 30) and σ Estimated by s
Interval Estimate
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
where $1 - \alpha$ = the confidence coefficient
$t_{\alpha/2}$ = the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n − 1 degrees of freedom
s = the sample standard deviation
Example
Interval Estimation of a Population Mean:
Small-Sample Case (n < 30) with σ Estimated by s
In the testing of a new method, 18 employees were selected
randomly and asked to try the new method. The sample mean
production rate for the 18 employees was 80 parts per hour and
the sample standard deviation was 10 parts per hour. Provide a
95% confidence interval estimate for the population mean
production rate for the new method. Assume the population has
a normal probability distribution.
Example
Given: $\bar{x} = 80$, s = 10, n = 18
$\alpha = 1 - 0.95 = 0.05$, $\alpha/2 = 0.025$
$\frac{s}{\sqrt{n}} = \frac{10}{\sqrt{18}} = 2.36$
$t_{0.025,\,17} = 2.11$
$\bar{x} \pm t_{0.025,\,17} \frac{s}{\sqrt{n}} = 80 \pm 2.11(2.36) = 80 \pm 4.98$
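Summary of Interval Estimation Procedures
for a Population Mean
• n > 30? Yes:
– σ known? Yes: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
– σ known? No: use s to estimate σ, $\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$
• n > 30? No: is the population approximately normal?
– Yes, σ known: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
– Yes, σ not known: use s to estimate σ, $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
– No: increase n to > 30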
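The decision procedure summarized above can be written as a small function. This is a sketch, not a definitive rule: the function name `choose_interval_procedure` and the returned labels are hypothetical, and treating n = 30 exactly as a small sample is an assumption, since the slides only distinguish n > 30 from n < 30.

```python
def choose_interval_procedure(n, sigma_known, population_normal):
    """Return which interval formula the summary chart points to."""
    if n > 30:
        # Large sample: z is used whether or not sigma is known.
        return "z with sigma" if sigma_known else "z with s"
    if not population_normal:
        # Small sample from a non-normal population: no valid interval yet.
        return "increase n to > 30"
    # Small sample from an (approximately) normal population.
    return "z with sigma" if sigma_known else "t with s"


print(choose_interval_procedure(49, sigma_known=True, population_normal=False))
print(choose_interval_procedure(18, sigma_known=False, population_normal=True))
```

The two calls correspond to the airport example (n = 49, σ known) and the production-rate example (n = 18, σ estimated by s).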
Sample Size for an Interval Estimate
of a Population Mean
• Let e = the maximum sampling error mentioned in the precision statement.
• e is the amount added to and subtracted from the point estimate to obtain an interval estimate.
• e is often referred to as the margin of error.
Sample Size for an Interval Estimate
of a Population Mean
Margin of Error
$e = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Necessary Sample Size
$n = \frac{(z_{\alpha/2})^2 \sigma^2}{e^2}$
Example: Starting Salaries of College Graduates
Sample Size for an Interval Estimate of a Population Mean
Annual starting salaries for college graduates with business administration degrees are believed to have a standard deviation of approximately $2,000. Assume a 95% confidence interval estimate of the mean annual starting salary is desired. How large a sample should be taken if the desired margin of error is $200?
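Example: Starting Salaries of College Graduates
Given: σ = 2,000 dollars, e = 200 dollars, 1 − α = 0.95, α = 0.05, α/2 = 0.025, $z_{\alpha/2} = 1.96$
$n = \frac{(z_{\alpha/2})^2 \sigma^2}{e^2} = \frac{(1.96)^2 (2{,}000)^2}{(200)^2} = \frac{3.8416 \times 4{,}000{,}000}{40{,}000} = 384.16$
Rounded up to the next whole graduate, n = 385.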
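Because the margin of error appears squared in the denominator, halving it roughly quadruples the required sample. The sketch below makes this visible for the salary example; the function name `required_n` and the extra margins of 400 and 100 dollars are hypothetical additions for comparison, and only the 200-dollar case comes from the slides.

```python
import math


def required_n(z, sigma, e):
    """n = (z * sigma / e)^2, rounded up to a whole observation."""
    return math.ceil((z * sigma / e) ** 2)


# sigma = 2,000 dollars and z = 1.96 as in the salary example.
sizes = {e: required_n(1.96, 2000, e) for e in (400, 200, 100)}
for e, n in sizes.items():
    print(f"margin of error {e:>3} dollars -> n = {n}")
```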
The Confidence Interval Method
• Confidence interval approach: applies the concepts of accuracy,
variability, and confidence interval to create a “correct” sample
size
• Two types of error:
• Non-sampling error: pertains to all sources of error other than
sample selection method and sample size
• Sampling error: involves sample selection and sample size…this
is the error that we are controlling through formulas
• Sample error formula: $\text{error} = \pm z \sqrt{\frac{pq}{n}}$
Interpreting the meaning of a confidence
interval estimate
• $\bar{x}$ and $\bar{p}$, the point estimates of $\mu$ and p respectively, provide the best guesses of these population parameters.
• The estimated standard errors provide information about the sampling variability.
• Confidence interval estimates not only give us an idea about the value of the estimated parameter, but also inform us about the sampling variability via the estimated standard error and the level of confidence ($1 - \alpha$).
The Confidence Interval Method
The relationship between sample size and sample error:
How to Calculate Sample Error (Accuracy)
$\text{error} = \pm z \sqrt{\frac{pq}{n}}$
where z = 1.96 (95%) or 2.58 (99%)
Sample Size and Accuracy
[Figure: accuracy (sample error, 0% to 16%) plotted against sample size (50 to 2,000).]
Accuracy Levels for Different Sample Sizes
At 95% confidence (z = 1.96), by the “p” you found in your sample:

n        p = 50%    p = 70%    p = 90%
10       ±31.0%     ±28.4%     ±18.6%
100      ±9.8%      ±9.0%      ±5.9%
250      ±6.2%      ±5.7%      ±3.7%
500      ±4.4%      ±4.0%      ±2.6%
1,000    ±3.1%      ±2.8%      ±1.9%

Each entry is 1.96 × $s_p$. 95% confidence interval: p ± 1.96 × $s_p$
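The accuracy levels listed for each sample size follow directly from the sample error formula, error = z√(pq/n). A minimal sketch that regenerates them (the function name `sample_error` is hypothetical):

```python
import math


def sample_error(p, n, z=1.96):
    """Sample error for a proportion: z * sqrt(p * q / n), with q = 1 - p."""
    return z * math.sqrt(p * (1 - p) / n)


print("    n   p=50%   p=70%   p=90%")
for n in (10, 100, 250, 500, 1000):
    cells = "  ".join(f"±{sample_error(p, n):5.1%}" for p in (0.5, 0.7, 0.9))
    print(f"{n:>5}  {cells}")
```

Reading down a column shows the diminishing returns of larger samples: going from 500 to 1,000 respondents at p = 50% only moves the error from about ±4.4% to about ±3.1%.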
Proportions, Variability
• Variability: refers to how similar or dissimilar responses
are to a given question
• p (%): share that “have” or “are” or “will do” etc.
• q (%): 100% − p%, share of “have nots” or “are nots” or “won’t dos” etc.

The more variability in the population being studied, the larger the sample size needed to achieve the stated accuracy level.
With nominal data (i.e. Yes, No), we can conceptualize answer variability with bar charts: the highest variability is a 50/50 split.
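Interval Estimation of a Population
Proportion
Normal approximation of the sampling distribution of $\bar{p}$ when np > 5 and n(1 − p) > 5:
$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$
[Figure: sampling distribution of $\bar{p}$ centered on p, with an area of $\alpha/2$ in each tail beyond $p \pm z_{\alpha/2}\,\sigma_{\bar{p}}$.]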
Interval Estimation of a Population
Proportion
Interval Estimate
$\bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$
where: $1 - \alpha$ is the confidence coefficient
$z_{\alpha/2}$ is the z value providing an area of $\alpha/2$ in the upper tail of the standard normal probability distribution
$\bar{p}$ is the sample proportion
Sample Size Formula for
Estimating a Proportion
The sample size formula for estimating a proportion (also called a percentage or share):
$n = \frac{z^2 p q}{e^2}$
Sample Size Formula for
Estimating a Proportion
How to estimate variability (p and q shares) in
the population ?

• Expect the worst case (p=50%; q=50%)

• Estimate variability: use the results of previous studies or conduct a pilot study
Sample Size Formula for
Estimating a Proportion
How to determine the amount of desired sample
error ?
• Researchers should work with team to make this
decision. How much error is the client willing to
tolerate (less error = more accuracy)?
• Convention is ± 5%
• The more important the decision, the less should be
the acceptable level of the sample error
Sample Size Formula for
Estimating a Proportion
How to decide on the level of confidence desired ?
• Researchers should work with team to make this decision.
The higher the desired confidence level, the larger the
sample size needed
• Convention is a 95% confidence level (z = 1.96, which is ± 1.96 s.d.’s)
• The more important the decision, the more likely the client
will want more confidence. For example, a 99% confidence
level has a z=2.58.
Sample Size Formula for
Estimating a Proportion
Example:
Estimating a Percentage (proportion or share) in the Population What is the
Required Sample Size?
• Five years ago a survey showed that 42% of consumers were aware of
the company’s brand (Consumers were either “aware” or “not aware”)
• After an intense ad campaign, management will conduct another survey. They want to be 95% confident (95 chances in 100) that the survey estimate will be within ± 5% of the true share of “aware” consumers in the population.
• What is n?
Sample Size Calculation for
Estimating a Proportion

z = 1.96 (95% confidence)
p = 42% (p, q and e must be in the same units)
q = 100% − p = 58%
e = ± 5%
What is n?
Sample Size Calculation for
Estimating a Proportion
n = 374. What does this mean?
It means that if we use a sample size of 374, after the survey we can say the following of the results (assume the results show that 55% are aware):
“Our most likely estimate of the percentage of consumers that are ‘aware’ of our brand name is 55%. In addition, we are 95% confident that the true share of ‘aware’ customers in the population falls between 50% and 60%.”
Note that the ± 5% is an absolute margin in percentage points (55% ± 5%), because e is in the same units as p and q. It is not 5% of the 55% estimate.
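The brand-awareness numbers can be checked with a short sketch. The function names are hypothetical, and p and e are entered as fractions (0.42, 0.05) rather than percentages. The formula gives about 374.3, which this example reports as 374; rounding up to be conservative would give 375. With n = 374 and an observed share of 55%, the margin of error works out to about ± 5 percentage points.

```python
import math


def sample_size_for_proportion(z, p, e):
    """n = z^2 * p * q / e^2, with q = 1 - p (unrounded)."""
    return z ** 2 * p * (1 - p) / e ** 2


def margin_of_error(z, p, n):
    """Sample error for a proportion: z * sqrt(p * q / n)."""
    return z * math.sqrt(p * (1 - p) / n)


n_exact = sample_size_for_proportion(z=1.96, p=0.42, e=0.05)
margin = margin_of_error(z=1.96, p=0.55, n=374)
print(f"n = {n_exact:.1f}, margin at 55% = ±{margin:.1%}")
```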
Sample Size Calculation for
Estimating a Proportion
A survey by the Society for Human Resource Management asked 346 job seekers why employees change jobs so frequently (Wall Street Journal, March 28, 2000). The answer selected most often (152 times) was “higher compensation elsewhere.”
What is the 95% confidence interval estimate of the population
proportion of job seekers who would select “higher compensation
elsewhere” as the reason for changing jobs?
Sample Size Calculation for
Estimating a Proportion
The point estimate of the population proportion is: $\bar{p} = \frac{152}{346} = 0.44$
The standard error of the proportion is:
$\sigma_{\bar{p}} = \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = \sqrt{\frac{0.44(0.56)}{346}} = \sqrt{\frac{0.2464}{346}} = \sqrt{0.00071} = 0.027$ (0.0267 before rounding)
The margin of error is:
$E = z_{\alpha/2}\,\sigma_{\bar{p}} = z_{0.05/2}\,\sigma_{\bar{p}} = z_{0.025}\,\sigma_{\bar{p}} = 1.96(0.0267) = 0.052$
The 95% confidence interval estimate of the population proportion is $\bar{p} \pm E$:
0.44 ± 0.052
or from 0.388 to 0.492.
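The job-seeker interval can be recomputed from the raw counts. In this sketch the function name `proportion_interval` is hypothetical, and the proportion is not rounded to 0.44 first, so the endpoints differ from the hand calculation in the third decimal place.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Point estimate, standard error, margin and interval for a proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    margin = z * se
    return p, se, margin, (p - margin, p + margin)


# 152 of 346 job seekers chose "higher compensation elsewhere".
p, se, margin, (low, high) = proportion_interval(152, 346)
print(f"p = {p:.3f}, se = {se:.4f}, margin = {margin:.3f}")
print(f"95% interval: {low:.3f} to {high:.3f}")
```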
Sample Size Calculation for
Estimating a Proportion
How large should the sample have been if the client desired a margin of error of 0.03?
$E = z_{\alpha/2}\,\sigma_{\bar{p}} = z_{0.025} \sqrt{\frac{p(1-p)}{n}}$
$E^2 = z_{0.025}^2 \, \frac{p(1-p)}{n}$
$n = z_{0.025}^2 \, \frac{p(1-p)}{E^2} = (1.96)^2 \, \frac{0.5(1-0.5)}{(0.03)^2} = 3.8416 \left( \frac{0.25}{0.0009} \right) \approx 1068$ (1067.1, rounded up)
Notice that the value of p in this example is 0.5 and not 0.44. Why?
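One way to see why 0.5 is used: p(1 − p) is largest at p = 0.5, so planning with 0.5 is the worst case and gives a sample that is large enough whatever the true proportion turns out to be. The sketch below compares the two planning values; the function name `planned_sample_size` is hypothetical and results are rounded up to whole respondents.

```python
import math


def planned_sample_size(p, e, z=1.96):
    """n = z^2 * p * (1 - p) / e^2, rounded up to a whole respondent."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


worst_case = planned_sample_size(p=0.5, e=0.03)    # worst-case variability
using_prior = planned_sample_size(p=0.44, e=0.03)  # earlier sample proportion
print(worst_case, using_prior)
```

The gap between the two results is small here because 0.44 is already close to 0.5; it grows quickly for proportions near 0 or 1.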
Other Methods of Sample Size Determination
Arbitrary “percentage rule of thumb” sample size:
• Arbitrary sample size approaches rely on erroneous rules of thumb (e.g. “n must be at least 5% of the population”).
• Arbitrary sample sizes are simple and easy to apply, but they are neither efficient nor economical (e.g. using the “5 percent rule,” if the universe is 12 million, n = 600,000, a very large and costly result).
Other Methods of Sample Size Determination
Conventional sample size specification
• The conventional approach follows some “convention” or number believed somehow to be the right sample size (e.g. 1,000 to 1,200 used for national opinion polls with ± 3% error).
• Using a conventional sample size can result in a sample that may be too large or too small.
• Conventional sample sizes ignore the special circumstances of the survey at hand.
Other Methods of Sample Size Determination
Statistical analysis requirements of sample size specification
• Sometimes the researcher’s desire to use a particular statistical technique influences sample size. As the number of cross comparisons goes up, the cell-size requirements go up and so does n.
Cost basis of sample size specification
• Under the “all you can afford” method, the sample size is based on budget factors, instead of the value of the information to be gained from the survey being the primary consideration.
Other Methods of Sample Size Determination
Special Sample Size Determination Situations
Sample Size Using Nonprobability Sampling
When using nonprobability sampling, sample size is unrelated to accuracy, so cost-benefit considerations must be used.
