
Unit II

Sampling and Sampling Distributions

• Basic Concepts
• Need for Sampling
• Types of Sampling
• Sampling Process
• Sampling Error
• Sampling Distribution
• Central Limit Theorem


Basic Concepts
• A population is the collection of all the elements of interest.

• A sampling frame is the part of the population from which the sample is actually taken. It is the list of all the accessible elements of the population.

• A sample is a subset of the population.

• Sampling is the process of obtaining a sample from a population.

• A parameter is a numerical descriptive measure of a population. Because it is based on all the observations in the population, its value is almost always unknown.

• A statistic is a numerical descriptive measure of a sample. It is calculated from the observations in the sample.
• An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.

• An estimate of a parameter is a particular numerical value of a sample statistic obtained through sampling.

• A point estimate is a single value used as an estimate of a population parameter.

• A statistic used to estimate a population parameter is also called a point estimator.

• An interval estimate is a range of values used to estimate a population parameter.

• Standard error: the standard deviation of the sampling distribution of a sample statistic.


Parameters and Statistics

Measure              Parameter   Statistic
Mean                 μ           x̄
Standard deviation   σ           s
Proportion           P           p̂
No. of elements      N           n
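To make the parameter/statistic distinction concrete, the following minimal Python sketch computes each parameter from a whole population and the matching statistic from one random sample. The population is simulated (hypothetical data), and the "proportion" is assumed, purely for illustration, to be the share of values above 60.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of N = 1000 measurements (simulated, not real data)
population = [rng.gauss(50, 10) for _ in range(1000)]

# Parameters: computed from every element of the population (usually unknown in practice)
N = len(population)
mu = statistics.mean(population)
sigma = statistics.pstdev(population)          # population SD: divisor N
P = sum(x > 60 for x in population) / N        # assumed attribute: value above 60

# Statistics: computed only from the n observations in one random sample
n = 50
sample = rng.sample(population, n)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)                   # sample SD: divisor n - 1
p_hat = sum(x > 60 for x in sample) / n

print(f"mu = {mu:.2f}       x_bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}    s = {s:.2f}")
print(f"P = {P:.3f}        p_hat = {p_hat:.3f}")
```

Each statistic differs somewhat from its parameter; that difference is the sampling error discussed below.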
Need for Sampling
• Reduces the cost of research.

• Allows us to generalize about a larger population.

• In some cases the analysis is destructive (the units are destroyed by testing), so sampling is necessary.

• Saves the time of the researcher and of those being surveyed.

• Makes it easier to handle confidentiality, anonymity, and other ethical issues.

• Avoids interfering with the population: surveying a large part of it could alter the nature of the population itself, e.g., in opinion surveys.

• Requires the cooperation of fewer respondents (individuals, firms, administrative agencies).

• Sometimes partial data is all that is available, e.g., fossils, historical records, and climate change records.
Sampling Error
The sampling error is the difference between the value of the statistic and the value of the parameter (for example, x̄ - μ).

This is the error caused by observing only a subset of the elements of a population rather than all of them.

A researcher hopes to minimize the sampling error, but every sample has some such error associated with it.

Different samples will yield different sampling errors.

The sampling error may be positive or negative (x̄ may be greater than or less than μ).

The typical size of the sampling error decreases as the sample size increases.
Sampling Errors

Measure              Parameter   Statistic (point estimator)   Sampling error
Mean                 μ           x̄                             x̄ - μ
Standard deviation   σ           s                             s - σ
Proportion           P           p̂                             p̂ - P
No. of elements      N           n
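A small simulation can show the properties of sampling error listed above: individual errors x̄ - μ change from sample to sample and vary in sign, while their typical size shrinks as n grows. This is a sketch on a simulated, right-skewed population (hypothetical data); `sampling_error` and `mean_abs_error` are illustrative helper names.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical right-skewed population (simulated values, mean about 20)
population = [rng.expovariate(1 / 20) for _ in range(10_000)]
mu = statistics.fmean(population)

def sampling_error(n):
    """x_bar - mu for one simple random sample (without replacement) of size n."""
    return statistics.fmean(rng.sample(population, n)) - mu

def mean_abs_error(n, reps=1000):
    """Average size |x_bar - mu| of the sampling error over many samples of size n."""
    return statistics.fmean(abs(sampling_error(n)) for _ in range(reps))

# Individual errors differ from sample to sample and can be positive or negative
print([round(sampling_error(10), 2) for _ in range(5)])

# Their typical size shrinks as the sample size grows
for n in (10, 100, 1000):
    print(n, round(mean_abs_error(n), 3))
```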
Types of Sampling Methods

Sampling methods
• Probability samples: simple random, systematic, stratified, cluster
• Non-probability samples: convenience, judgment, quota, snowball
Random Sampling (or) Probability Sampling
• Probability sampling is a method of sampling that ensures that every unit in the population has a known, non-zero chance of being selected. This allows us to assess objectively the estimates of population characteristics that result from our sample.
• There are various ways of making the random selection, e.g., the lottery method, random number tables, and computer-generated random numbers.
• The main probability sampling methods are simple random, systematic, stratified, and cluster sampling.

Simple Random Sampling (SRS)


• The simplest form of random sampling is called simple random sampling.
• In simple random sampling, every unit in the population has an equal chance of being included in the sample, and every possible sample of the same size has an equal chance of being selected.
• Simple random sampling can be carried out either with or without replacement.
SRS with Replacement
• Let the population consist of N units, and suppose we want to select a sample of size n.
• Choose a unit from the population such that each of the N (finite) units has an equal chance, 1/N, of being included in the sample.

• After the first unit is drawn, record it and return it to the population.

• Draw the second unit in the same way as the first. Continue this procedure until all n required units of the sample are selected.

• This ensures that all elements have an equal probability of being included in the sample.
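The draw-record-replace procedure can be written as a short Python function. This is a sketch; `srs_with_replacement` is a hypothetical helper name, and the standard library's `random.choices(population, k=n)` draws with replacement in the same way.

```python
import random

def srs_with_replacement(population, n, rng=random):
    """Draw n units one at a time; at every draw each of the N units has
    probability 1/N, because each drawn unit is put back (replacement)."""
    N = len(population)
    sample = []
    for _ in range(n):
        index = rng.randrange(N)            # each position 0..N-1 equally likely
        sample.append(population[index])    # record the unit; the population is unchanged
    return sample

units = [f"U{i}" for i in range(1, 11)]     # hypothetical population, N = 10
print(srs_with_replacement(units, 5, random.Random(7)))   # the same unit may appear more than once
```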

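SRS without Replacement

• Choose a unit from the population such that each of the N units has an equal chance, 1/N, of being included in the sample.

• After the first unit is drawn, record it and do not return it to the population.

• Draw each subsequent unit in the same way from the units that remain (N - 1 units at the second draw, and so on) until all n units are selected.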
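Without replacement, each draw is made only from the units still left. The sketch below spells this out step by step; in practice the standard library's `random.sample(population, n)` does the same job. The function name is illustrative.

```python
import random

def srs_without_replacement(population, n, rng=random):
    """Draw n distinct units; each draw picks uniformly among the units that remain."""
    if n > len(population):
        raise ValueError("sample size n cannot exceed population size N")
    remaining = list(population)                # copy, so the caller's data is not modified
    sample = []
    for _ in range(n):
        index = rng.randrange(len(remaining))   # N, then N - 1, ... units to choose from
        sample.append(remaining.pop(index))     # the drawn unit is not put back
    return sample

print(srs_without_replacement(range(1, 101), 10, random.Random(3)))
```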
Systematic Sampling
• In systematic random sampling, units are drawn from the population at regular, clearly defined intervals.
• Systematic sampling differs from SRS in that, although every unit has an equal chance of being selected, not every possible sample has an equal chance of being selected.
• The steps involved in systematic sampling are:
 - number the units in the population from 1 to N
 - decide on the sample size n that you want or need
 - compute the sampling interval k = N/n
 - select a random number between 1 and k
 - start with this number and select every kth unit until all n units are selected.
Ex: If we want to interview every twentieth student on a college campus, we would choose a random starting point among the first 20 names in the student directory and then pick every twentieth name thereafter.
[Figure: Illustration of systematic sampling]
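The systematic-sampling steps translate directly into code. In this sketch the directory of 400 students is hypothetical, and rounding k = N/n down when it is not a whole number is an assumption (the steps do not say how to handle that case).

```python
import random

def systematic_sample(population, n, rng=random):
    """1-in-k systematic sample of the units numbered 1..N (their list order)."""
    N = len(population)
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    k = N // n                                  # sampling interval, rounded down (assumption)
    start = rng.randint(1, k)                   # random start between 1 and k, inclusive
    numbers = [start + i * k for i in range(n)] # start, start + k, start + 2k, ...
    return [population[j - 1] for j in numbers] # unit number j sits at list index j - 1

# Hypothetical student directory of 400 names; interview every 20th student
directory = [f"student_{i}" for i in range(1, 401)]
print(systematic_sample(directory, 20, random.Random(5))[:3])
```

Only the starting point is random; once it is fixed, the whole sample is determined, which is why not every possible sample can be chosen.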
Stratified Random Sampling
• When considerable heterogeneity is present in the population under consideration, simple random sampling may not provide a representative sample, because certain segments may be underrepresented.

• In such situations the stratified random sampling technique is used. This involves dividing the population into homogeneous subgroups (strata) and then taking a simple random sample in each subgroup, thus ensuring representation from all relevant subgroups.

• There are two types of stratified random sampling:

 * Proportionate stratified random sampling
 * Disproportionate stratified random sampling

• Proportionate stratified random sampling method:

 - Divide the population of N units into k non-overlapping groups (strata) of sizes N1, N2, ..., Nk, such that N1 + N2 + ... + Nk = N.
 - Divide the sample into k non-overlapping groups of sizes n1, n2, ..., nk.
 - Draw a simple random sample independently from each stratum such that

$\frac{n_1}{N_1} = \frac{n_2}{N_2} = \cdots = \frac{n_k}{N_k} = \frac{n}{N}$ and $n = n_1 + n_2 + \cdots + n_k$

where n is the stratified random sample size from a population of size N.

Condition (when this allocation is appropriate): $\sigma_1 = \sigma_2 = \cdots = \sigma_k$ (the standard deviations of the strata are equal).
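Proportionate allocation sets n_i = n × N_i / N for each stratum. Because these shares are rarely whole numbers, the sketch below rounds them with the largest-remainder method; that rounding rule is an assumption, not part of the method as described. The stratum sizes in the example are hypothetical.

```python
import math

def proportionate_allocation(stratum_sizes, n):
    """Split a total sample size n so that n_i / N_i = n / N for every stratum.

    The exact shares n * N_i / N are rounded with the largest-remainder method
    (an assumed rounding rule) so that the allocations still add up to n.
    """
    N = sum(stratum_sizes)
    exact = [n * N_i / N for N_i in stratum_sizes]
    alloc = [math.floor(share) for share in exact]
    leftover = n - sum(alloc)
    # hand the leftover units to the strata with the largest fractional parts
    by_remainder = sorted(range(len(exact)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc

print(proportionate_allocation([500, 300, 200], 50))   # [25, 15, 10]
print(proportionate_allocation([333, 333, 334], 100))  # [33, 33, 34]
```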


Disproportionate Stratified Random Sampling

• If the variances of the strata are not equal, the disproportionate method of stratified random sampling should be used.
• In this method, a stratum with more variability receives proportionately more sampling units than a stratum with less variability.
• The number of sampling units in the ith stratum is given by

$n_i = \frac{q_i \sigma_i}{\sum_{j=1}^{k} q_j \sigma_j}\, n$

where n_i = number of sampling units in the ith stratum,
σ_i = standard deviation of the ith stratum,
q_i = N_i/N,
n = total sample size.
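The allocation formula can be evaluated directly; this rule is commonly known as Neyman (optimum) allocation. The sketch returns unrounded n_i, and the stratum sizes and standard deviations in the example are hypothetical. Compared with proportionate allocation (50, 30, 20 for n = 100), the high-variability third stratum receives more units.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n):
    """n_i = q_i * sigma_i * n / sum_j(q_j * sigma_j), where q_i = N_i / N.

    stratum_sizes: N_1..N_k; stratum_sds: standard deviations sigma_1..sigma_k.
    Returns unrounded allocations.
    """
    N = sum(stratum_sizes)
    weights = [(N_i / N) * sd for N_i, sd in zip(stratum_sizes, stratum_sds)]  # q_i * sigma_i
    total = sum(weights)
    return [n * w / total for w in weights]

alloc = neyman_allocation([500, 300, 200], [10, 20, 40], 100)
print([round(a, 1) for a in alloc])   # [26.3, 31.6, 42.1]
```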
Stratified Sampling

• Advantages
 - Ensures that all required groups of the population are represented in the sample.
 - Characteristics of each stratum can be estimated and comparisons made.
 - Reduces the variability of the estimates.

• Disadvantages
 - Requires accurate information on the proportion of the population in each stratum.
 - Stratified lists are costly to prepare.
Cluster Sampling (or) Area Sampling

• Cluster sampling may be used when it is either impossible or impractical to compile an exhaustive list of the elements that make up the population.
• The population is divided into subgroups (clusters), such as families.
• A simple random sample of the clusters is taken, and then all members of the selected clusters are surveyed.
• When the population under consideration is dispersed across a wide geographic region, it is very difficult to take a simple random sample.

• Cluster (or area) random sampling was devised precisely for this problem.

In cluster sampling, we follow these steps:
 - divide the population into clusters (usually along geographic boundaries)
 - randomly sample clusters
 - measure all units within the sampled clusters.
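The three cluster-sampling steps can be sketched as follows, assuming the clusters are already defined. The village and household labels are hypothetical; only the clusters are sampled at random, and every unit inside a chosen cluster is kept.

```python
import random

def cluster_sample(clusters, m, rng=random):
    """One-stage cluster sample: SRS of m clusters, then every unit in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), m)         # sort labels so a seeded rng is reproducible
    return {label: list(clusters[label]) for label in chosen}

# Hypothetical area frame: 10 villages, each listing its 5 households
villages = {f"V{i}": [f"V{i}-H{j}" for j in range(1, 6)] for i in range(1, 11)}
picked = cluster_sample(villages, 3, random.Random(2))
print(sorted(picked))
```

Note that a household list is needed only for the villages actually selected.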

Clustering (vs) Stratification


• In both stratified and cluster sampling, the population is divided into well-defined groups.

• We use stratified sampling when each stratum has small variation within itself but there is wide variation between strata.

• We use cluster sampling in the opposite case: when there is considerable variation within each group but the groups are essentially similar to each other.
Cluster Sampling
Types of Cluster Samples

• Area sample:
 - The primary sampling unit is a geographical area.

• Multistage area sample:
 - Involves a combination of two or more types of probability sampling techniques. Typically, progressively smaller geographical areas are randomly selected in a series of steps.

• Advantages
 - Low cost; frequently used.
 - Requires a list of all clusters, but a list of individuals only within the chosen clusters.
 - Can estimate characteristics of both the clusters and the population.
 - Multistage designs have the strengths of the methods they combine.
• Disadvantages
 - Larger sampling error for a comparable sample size than other probability methods.
 - Multistage designs are very expensive, and their validity depends on the other methods used.
Non-Probability Sampling Methods
In non-probability sampling, the units are selected "without using the principle of probability".

The following are some of these sampling methods.

1. Convenience Sampling:
• Convenience sampling is a non-probability sampling technique in which units are selected because of their convenient accessibility and proximity to the researcher.
• Many researchers prefer this technique because it is fast, inexpensive, and easy, and the units are readily available.
• This method is useful in exploratory research for generating hypotheses.
• The main problem with this method is that its estimates are likely to be biased.

2. Expert Opinion Sampling or Judgment Sampling:

• This method involves assembling a group of people who have knowledge and expertise in key areas that are crucial for decision making.

• Often, we convene such a sample under the auspices of a "panel of experts".

• There are two reasons to use expert sampling. First, it may be the best way to elicit the views of persons who have specific expertise. Second, expert sampling can provide evidence for the validity of another sampling approach that has been chosen.

3. Quota Sampling:
• In quota sampling, we select units non-randomly according to some fixed quota.

• Quota sampling is similar to stratified random sampling, except that the principle of probability is not applied when selecting the units of the sample.

• There are two types of quota sampling: proportional and non-proportional.

• In proportional quota sampling, we want to represent the major characteristics of the population by sampling a proportional amount of each.

• In non-proportional quota sampling, we specify the minimum number of sampled units we want in each category. Here, we are not concerned with having numbers that match the proportions in the population.
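To show how quota sampling differs from stratified random sampling, the sketch below accepts respondents in the order they happen to turn up, with no random selection, until each quota is full. The arrival stream, categories, and quotas are hypothetical; for proportional quota sampling the quotas would be set in proportion to the population shares.

```python
def fill_quotas(arrivals, quotas):
    """Accept respondents in the order they arrive (no random selection)
    until every category has reached its quota.

    arrivals: iterable of (person, category) pairs
    quotas:   dict mapping category -> number of respondents wanted
    """
    counts = {category: 0 for category in quotas}
    accepted = []
    for person, category in arrivals:
        if category in quotas and counts[category] < quotas[category]:
            accepted.append(person)
            counts[category] += 1
        if counts == quotas:            # all quotas met: stop interviewing
            break
    return accepted, counts

stream = [("p1", "M"), ("p2", "M"), ("p3", "M"), ("p4", "F"), ("p5", "F"), ("p6", "M")]
print(fill_quotas(stream, {"M": 2, "F": 2}))   # (['p1', 'p2', 'p4', 'p5'], {'M': 2, 'F': 2})
```

Whoever arrives first fills the quota, which is exactly where the selection bias of this method comes from.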
4. Snowball Sampling (or) Chain Sampling
• It is a non-probability sampling method.
• It is particularly useful for sampling rare populations.
• Snowball sampling is well suited to situations where members of a population are hidden and difficult to locate.
• The term "snowball sampling" reflects an analogy to a snowball increasing in size as it rolls downhill.
• Snowball sampling uses a small pool of initial informants to nominate, through their social networks, other participants who meet the eligibility criteria and could potentially contribute to a specific study.
• Snowball sampling can be used to identify experts in a certain field, such as medicine, manufacturing processes, or customer relations methods.
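Snowball sampling can be viewed as a traversal of a referral network. In this sketch the network, the eligibility rule, and the stopping size `max_size` are hypothetical, and approaching contacts in the order they were nominated (breadth-first) is an assumption about how referrals are followed.

```python
from collections import deque

def snowball_sample(network, seeds, eligible, max_size):
    """Chain-referral sample: start from seed informants; every eligible recruit
    nominates contacts, who are approached in the order they were nominated.

    network:  dict mapping each person to the contacts they would nominate
    eligible: function returning True if a person meets the study's criteria
    """
    sample = []
    approached = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        if not eligible(person):
            continue                        # ineligible people are not recruited and nominate no one
        sample.append(person)
        for contact in network.get(person, []):
            if contact not in approached:   # never approach the same person twice
                approached.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network
network = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "e": ["f"]}
print(snowball_sample(network, ["a"], lambda p: True, 10))       # ['a', 'b', 'c', 'd', 'e', 'f']
print(snowball_sample(network, ["a"], lambda p: p != "c", 10))   # ['a', 'b', 'd']
```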
Sampling Process
The sampling process involves 6 stages:
1. Defining the population of interest
2. Identifying a sampling frame, or list of individuals or households to measure (covering as much of the population of interest as possible)
3. Specifying a sampling method for selecting individuals or households from the frame
4. Determining the sample size
5. Implementing the sampling plan to select the sample
6. Collecting data from each sample member (i.e., conducting the survey)
Sampling Distributions
• The concept of a sampling distribution is the link between sample and population.

• The sampling distribution of a statistic (not a parameter) is the probability distribution of the values taken by the statistic in all possible random samples of the same size from the same population.

• The probability distribution of the means of all possible samples is a distribution of sample means. Statisticians call this the sampling distribution of the mean.

• We can also have a sampling distribution of the median, of the proportion, and so on.
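For a tiny population, the sampling distribution of the mean can be found exactly by listing every possible sample. This sketch uses a hypothetical population {2, 4, 6, 8} and samples of size n = 2 drawn without replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8]                        # hypothetical population, N = 4
n = 2

samples = list(combinations(population, n))     # all C(4, 2) = 6 samples, no replacement
means = [mean(s) for s in samples]
dist = Counter(means)                            # how often each sample mean occurs

for value in sorted(dist):
    print(value, Fraction(dist[value], len(samples)))   # probability of that sample mean
```

It prints each possible sample mean with its probability: 3 and 4 with probability 1/6 each, 5 with 1/3, and 6 and 7 with 1/6 each. The mean of these sample means is 5, the population mean.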
The Distribution of Sample Means
• The distribution of sample means is defined as the set of means from all the possible random samples of a specific size (n) selected from a specific population.

• The sample mean is a random variable: its value varies from sample to sample according to the laws of probability.

• This distribution has well-defined (and predictable) properties that are specified in the Central Limit Theorem.
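Properties of the distribution of sample means:
• The mean of the sampling distribution of the mean equals the population mean: $E(\bar{X}) = \mu$
• The standard deviation of the sample mean (its standard error), for a sample of size n drawn without replacement from a finite population of size N: $SD(\bar{X}) = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$
• If the population is infinite: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$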
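Both properties can be checked by brute force. The sketch below enumerates every sample of size 2 (without replacement) from a hypothetical population {2, 4, 6, 8, 10} and compares the result with the standard error formula including the finite population correction; `standard_error_of_mean` is an illustrative helper name.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def standard_error_of_mean(sigma, n, N=None):
    """sigma / sqrt(n), times the finite population correction sqrt((N - n) / (N - 1))
    when a finite population size N is given (sampling without replacement)."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

population = [2, 4, 6, 8, 10]                     # hypothetical population, N = 5
N, n = len(population), 2
mu, sigma = mean(population), pstdev(population)  # mu = 6, sigma = sqrt(8)

# Means of every possible sample of size 2 drawn without replacement
sample_means = [mean(s) for s in combinations(population, n)]

print(mean(sample_means), mu)                                      # 6 6
print(pstdev(sample_means), standard_error_of_mean(sigma, n, N))   # both about 1.732
```

Here the formula gives √8/√2 × √(3/4) = √3 ≈ 1.732, which matches the standard deviation of the ten sample means. Without the correction factor, σ/√n = 2 would overstate the spread.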
[Figure: Sampling distribution of the mean]
Standard Error
• The standard deviation of the distribution of a sample statistic is called the standard error of the statistic.

• Thus, the standard deviation of the distribution of the sample mean is called the standard error of the mean, the standard deviation of the distribution of the sample median is called the standard error of the median, and so on.

Examples of standard errors:

When we wish to refer to the...               We use the conventional term
Standard deviation of the sample means        Standard error of the mean ($\sigma/\sqrt{n}$)
Standard deviation of the sample median       Standard error of the median
Standard deviation of the sample proportion   Standard error of the proportion
The Distribution of Sample Means

• We have a population.
• We take a sample of size n and compute its mean.
• We do this again and again until we have taken every possible sample.
• The distribution of these sample means is called the sampling distribution of sample means.
• The resulting distribution tends to look approximately normal, increasingly so as n grows.
[Figure: Sampling distribution of the mean (x̄) for a population of 100 young people with μ = 25 years]
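Sampling Distribution of Proportion
• Sample proportion: $\hat{p} = x/n$, where x is the number of sample units having the attribute; population proportion: P.
• Standard deviation (standard error) of the sample proportion: $\sigma_{\hat{p}} = \sqrt{P(1-P)/n}$, estimated in practice by $\sqrt{\hat{p}(1-\hat{p})/n}$.
• For large n, $\hat{p}$ approximately follows a normal distribution with mean P and variance $P(1-P)/n$.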
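A quick simulation can check the standard error formula for p̂. This sketch assumes independent draws from an effectively infinite population, with a hypothetical P = 0.3 and n = 100; each draw has the attribute with probability P.

```python
import math
import random
import statistics

P, n = 0.3, 100                          # hypothetical population proportion and sample size
se = math.sqrt(P * (1 - P) / n)          # theoretical standard error of p_hat

rng = random.Random(0)
p_hats = []
for _ in range(5000):                    # 5000 independent samples of size n
    x = sum(rng.random() < P for _ in range(n))   # number of units with the attribute
    p_hats.append(x / n)                          # sample proportion p_hat = x / n

print(round(se, 4))                                                      # 0.0458
print(round(statistics.fmean(p_hats), 4), round(statistics.pstdev(p_hats), 4))

# Estimated standard error from a single sample, plugging p_hat in for P
p1 = p_hats[0]
print(round(math.sqrt(p1 * (1 - p1) / n), 4))
```

The simulated p̂ values centre on P, and their standard deviation lands close to the theoretical 0.0458.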
Central Limit Theorem
• The Central Limit Theorem is perhaps the most important theorem in statistical inference.

• The central limit theorem describes the relationship between the shape of the population distribution and the shape of the sampling distribution of the mean.

• It says that the sampling distribution of the mean will be (at least approximately) normally distributed if EITHER:
 1. the parent population from which we are sampling is normally distributed, or
 2. the sample size is large (a common rule of thumb is n > 30).

• The significance of the central limit theorem is that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the population distribution other than what we get from the sample.
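The sketch below illustrates the central limit theorem with a strongly skewed population (exponential with mean 1, chosen here only as an example). It measures the skewness of the simulated sample means, which is 0 for a normal distribution; for this population, theory gives a skewness of 2/√n, so the shape becomes more symmetric as n grows while the mean stays at 1 and the standard deviation follows 1/√n.

```python
import random
import statistics

rng = random.Random(1)

def sample_means(n, reps=5000):
    """Means of `reps` independent samples of size n from an exponential population with mean 1."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(values):
    """Moment coefficient of skewness; 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

for n in (1, 5, 30, 100):
    means = sample_means(n)
    # theory for this population: mean of x_bar = 1, SD = 1/sqrt(n), skewness = 2/sqrt(n)
    print(n, round(statistics.fmean(means), 3), round(statistics.pstdev(means), 3),
          round(skewness(means), 2))
```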
Regardless of the shape of the population, the sampling distribution of x̄ (x-bar) becomes approximately normal as the sample size n increases.

Caution: this applies only to the shape of the distribution, not to its mean or standard deviation.

[Figure: population distribution, random samples drawn from the population, and the resulting distribution of x̄]
Student’s t Distribution
If the population standard deviation σ is unknown, replace it with the sample standard deviation s. If the population is normal, the resulting statistic

$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

has a t distribution with (n - 1) degrees of freedom.

• The t distribution is a family of bell-shaped, symmetric distributions, one for each number of degrees of freedom.
• The mean of t is 0 (for df > 1).
• The variance of t is greater than 1 (for df > 2) but approaches 1 as the number of degrees of freedom increases. The t distribution is flatter and has fatter tails than the standard normal.
• The t distribution approaches the standard normal as the number of degrees of freedom increases.

[Figure: standard normal density compared with t densities for df = 20 and df = 10]
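For a single sample, the t statistic and its degrees of freedom can be computed as below. The ages and the hypothesized mean of 25 are hypothetical; `t_variance` uses the standard result that the variance of t with df degrees of freedom is df/(df - 2) for df > 2, which shows numerically how the variance exceeds 1 and approaches 1.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), returned with its degrees of freedom n - 1."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)                  # sample SD, divisor n - 1
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1

def t_variance(df):
    """Variance of Student's t distribution: df / (df - 2), finite only for df > 2."""
    if df <= 2:
        raise ValueError("the variance of t is finite only for df > 2")
    return df / (df - 2)

ages = [24.1, 25.3, 26.0, 23.8, 25.5, 24.9, 26.4, 25.0]   # hypothetical sample
t, df = t_statistic(ages, 25)
print(round(t, 3), df)

for d in (5, 10, 20, 100):
    print(d, round(t_variance(d), 3))      # 1.667, 1.25, 1.111, 1.02
```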
MIT Open Course

• A population is a collection of objects, items, humans/animals ("units") about which information is sought.
• A sample is a part of the population that is observed.
• A parameter is a numerical characteristic of a population, e.g., percent unemployment in the US.
• A statistic is a numerical function of the sampled data, used to estimate an unknown parameter, e.g., percent unemployment in a sample.
Don't get confused between the population and the sample! There is separate notation for the population and the sample, so make sure you know what you can calculate and what you can't. You can't calculate population quantities if you only have a sample.

• A sampling frame is a list of all units in a finite population. Often, we do not have a sampling frame and need to choose which units to sample.

• A representative sample does not differ in systematic and important ways from the population.
• Convenience sampling involves using a sample that is easily available.
• Judgment sampling involves a trained sample collector (who may use …). Bias is possible with these sampling methods. To avoid bias, sample randomly from the population.

• A simple random sample (SRS) of size n from a population of size N is drawn so that each possible sample of size n has the same chance of being chosen.

• An SRS is not always practical to obtain, for instance for a highly diverse or very large population.
• Stratified random sampling involves dividing the population into homogeneous subpopulations and drawing an SRS from each one. This is useful when you want to compute statistics both for the subpopulations and for the whole population.
• Multistage cluster sampling: "tree-structured" sampling, in which the units are different at each stage. Useful for sampling large populations, e.g.:
 1) Draw an SRS of states
 2) Draw an SRS of districts from each selected state
 3) Draw an SRS of mandals from each selected district
 4) Draw an SRS of people from each selected mandal

• 1-in-k systematic sampling consists of selecting every kth unit. It is useful for sampling items coming off assembly lines.
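The four-stage multistage scheme (states, districts, mandals, people) can be sketched as a recursive SRS over a nested frame. The frame below is hypothetical (3 states × 3 districts × 3 mandals × 10 people), as are the stage sample sizes.

```python
import random

def multistage_sample(tree, sizes, rng=random):
    """Take an SRS of sizes[0] units at the top stage, then recurse into each
    selected unit with the remaining stage sizes.

    tree:  nested dicts (e.g. state -> district -> mandal) ending in lists of people
    sizes: number of units to draw at each stage, top stage first
    """
    if isinstance(tree, list):                        # final stage: a list of people
        return rng.sample(tree, min(sizes[0], len(tree)))
    labels = sorted(tree)
    chosen = rng.sample(labels, min(sizes[0], len(labels)))
    people = []
    for label in chosen:
        people.extend(multistage_sample(tree[label], sizes[1:], rng))
    return people

# Hypothetical frame: 3 states x 3 districts x 3 mandals x 10 people
frame = {
    f"S{s}": {
        f"S{s}D{d}": {
            f"S{s}D{d}M{m}": [f"S{s}D{d}M{m}-P{p}" for p in range(10)]
            for m in range(3)
        }
        for d in range(3)
    }
    for s in range(3)
}
sample = multistage_sample(frame, [2, 2, 2, 3], random.Random(4))
print(len(sample))   # 2 * 2 * 2 * 3 = 24
```

Only the selected branches of the frame ever need to be listed in detail, which is what makes multistage designs practical for large populations.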
