[Figure: Population and Sample (a sample is a subset of the population)]
Basic Definitions
• Population: the total group under discussion, or the group to which the results will be generalized.
• Sample: a part of the population. Data are collected from the selected units in the sample (respondents) in order to obtain estimates about the population from which the sampling units were selected.
• A parameter is a value that represents a certain population characteristic, e.g. the population mean (μ), variance (σ²), and proportion (P).
• Within a population, a parameter is a fixed value.
• A parameter is usually unknown.
• Statistic: a statistic is a quantity calculated from a sample. It is used to give information about the corresponding unknown population parameter, i.e. it serves as a guess (estimate) of that parameter, e.g. the sample mean (X̄), variance (S²), and proportion (p̂).
• A statistic is a variable: its value varies from sample to sample.
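A small simulation makes the distinction concrete: the population mean (a parameter) is computed once and never changes, while the sample mean (a statistic) changes with every new random sample. The population here is hypothetical (1,000 values drawn from a normal distribution with mean 50 and standard deviation 10) and is used only for illustration.

```python
import random
import statistics

# Hypothetical population of 1,000 values (illustration only, not data from the slides).
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(1000)]

# Parameter: computed from every unit of the population, so it is one fixed number.
mu = statistics.mean(population)

# Statistic: recomputed from each new random sample of 30 units.
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(5)]

print(f"population mean (parameter): {mu:.2f}")
for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {xbar:.2f}")
```

Each run of the loop draws a different sample, so the five printed sample means differ from each other and from μ, while μ itself stays the same.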
[Figure: Parameter vs. Statistic]
Why Sample Survey?
• Lower cost and time
  • A company may have only a limited budget for market testing.
  • A market-research study must be completed within a certain amount of time, and time limitations often make it impossible to interview an entire population.
• Greater scope: because fewer units are surveyed, a more detailed questionnaire can be used, so more information can be collected.
• Greater accuracy: because the workload is smaller, better-trained personnel can be engaged for data collection, and better supervision and checks increase the accuracy of the estimates.
• Avoiding physical destruction of sampling units: in some instances, the sampling units are destroyed in the process of sampling. A manufacturer is not going to ignite every match to demonstrate the quality of his product, because nothing would be left if he did so.
Statistical Methods
[Figure: Statistical Methods divide into Descriptive Statistics and Inferential Statistics]
Descriptive statistics consists of the tools and techniques designed to describe data, such as charts, graphs, and numerical measures like the mean and variance.
Inference Process
[Figure: a sample is drawn from the population; the sample statistic is used to make estimates and tests about the population]
Function of Hypothesis Testing
Hypothesis testing begins with an assumption, called a hypothesis, that we make about a population parameter. For example, we might assume a certain value for a population mean.
To test the validity of our assumption, we:
• Collect sample data
• Produce sample statistics
• Use this information to decide how likely it is that our
hypothesized population parameter is correct.
We then determine the difference between the hypothesized value and the observed value of the sample mean.
Then we judge whether the difference is significant or
non-significant.
Unfortunately, the difference between the hypothesized population parameter and the observed statistic is, more often than not, neither so large that we automatically reject our hypothesis nor so small that we can just as quickly decide not to reject it.
So in hypothesis testing, as in most significant real-life decisions, clear-cut solutions are the exception, not the rule.
When to Reject the Hypothesis and When Not to Reject It?
Suppose I claim that the average FSC marks of UAF students are at least 70 percent. How can you test the validity of my hypothesis?
Using sampling methods, we could record the marks of a sample of students. If we did this and the sample statistic came out to be 85 percent, we would readily say "don't reject the statement". However, if the sample statistic were 46 percent, we would reject the statement.
We can interpret both of these outcomes, 85 percent and 46 percent, using common sense.
[Figure: The Basic Problem]
STEPS FOR A TEST OF HYPOTHESIS
1) Construction of hypotheses
2) Level of significance
3) Test statistic
4) Decision rule
5) Conclusion
1/5 Construction of hypotheses
[Null and Alternative Hypotheses]
A statistical hypothesis is an assumption made about a population parameter, which may or may not be true.
[Construction of Hypotheses]
• Null and Alternative Hypotheses
Hypotheses    Conclusion and Action
H0: μ         The emergency service is meeting the response goal; no follow-up action is necessary.
H1: μ         The emergency service is not meeting the response goal; appropriate follow-up action is necessary.
where μ = mean response time for the population of medical emergency requests.
2/5 Level of significance
[Type I and Type II errors]
Whenever sample evidence is used to draw a conclusion about a population, there is a risk of making a wrong decision because of sampling variation. Such errors are called inferential errors, because they entail drawing an incorrect inference from the sample about the value of the population parameter.
On the basis of sample information, we may reject a true statement about the population or fail to reject a false one:
Type I error = rejecting H0 when H0 is true
Type II error = not rejecting H0 when H0 is false
The probability of a Type I error is the level of significance, α.
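To make the level of significance concrete, the simulation that follows draws many samples from a population in which H0 is true and counts how often a two-sided Z-test at α = 0.05 rejects H0. Every rejection is a Type I error, so the rejection rate should come out close to α. The population mean (100), standard deviation (15), sample size (25), and number of trials are hypothetical values chosen only for illustration.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
MU0, SIGMA, N = 100.0, 15.0, 25   # hypothetical H0 value, known sigma, sample size
TRIALS = 10_000

rng = random.Random(2024)
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value, about 1.96

rejections = 0
for _ in range(TRIALS):
    # Every sample comes from a population where H0 (mu = MU0) is TRUE.
    sample = [rng.gauss(MU0, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU0) / (SIGMA / N ** 0.5)
    if abs(z) > z_crit:
        rejections += 1   # rejecting a true H0 is a Type I error

type_i_rate = rejections / TRIALS
print(f"observed Type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
```

Lowering α (say to 0.01) widens the rejection boundary and lowers the Type I error rate, at the cost of more Type II errors when H0 is actually false.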
[Table: Decision versus State of Nature (Type I and Type II errors)]
MiniTab output
Test of mu = 3 vs < 3
Variable   N    Mean   StDev  SE Mean  95% Upper Bound      T      P
weight    36  2.9200  0.0956   0.0159           2.9469  -5.02  0.000
The p-value (0.000) provides essentially no support for the null hypothesis and is small enough to reject H0: we find sufficient statistical evidence to reject the null hypothesis at the .01 level of significance.
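The T value in this output can be reproduced from the summary statistics. This sketch recomputes the standard error and T, then applies the critical-value form of the decision rule. The critical value t(0.01, 35) ≈ 2.438 is taken from a standard t table; it is an assumption of the sketch, since the slides do not show it.

```python
import math

# Summary statistics copied from the MiniTab output above.
n, xbar, s = 36, 2.9200, 0.0956
mu0 = 3.0        # H0: mu = 3 versus H1: mu < 3 (lower-tailed)
T_CRIT = 2.438   # t(0.01, 35) from a t table; assumed, not shown in the slides

se = s / math.sqrt(n)     # standard error of the mean
t = (xbar - mu0) / se     # one-sample t statistic with n - 1 = 35 df

print(f"SE Mean = {se:.4f}, T = {t:.2f}")
# Lower-tailed rule: reject H0 when T falls below -t(0.01, 35).
print("reject H0" if t < -T_CRIT else "do not reject H0")
```

The critical-value rule and the p-value rule lead to the same decision here: T = -5.02 lies far below -2.438, just as P = 0.000 lies below .01.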
Example (Data file: Rating.mtw)
MiniTab output
One-Sample T: rating
Test of mu = 7 vs > 7
Variable   N   Mean  StDev  SE Mean  95% Lower Bound     T      P
rating    60  7.283  1.195    0.154            7.026  1.84  0.036
The p-value (0.036) is smaller than α = .05, so we reject H0: we find sufficient statistical evidence to reject the null hypothesis at the .05 level of significance and conclude that Heathrow should be classified as a superior service airport.
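For an upper-tailed test, two equivalent checks lead to the same decision: the p-value is below α, and the hypothesized mean μ0 = 7 lies below the 95% lower bound. This sketch recomputes T from the summary statistics and applies both checks, using the P and bound values reported by MiniTab rather than recomputing them.

```python
import math

n, xbar, s = 60, 7.283, 1.195        # from the MiniTab output above
mu0, alpha = 7.0, 0.05               # H0: mu = 7 versus H1: mu > 7
p_value, lower_bound = 0.036, 7.026  # as reported by MiniTab

t = (xbar - mu0) / (s / math.sqrt(n))

# Two equivalent decision rules for this upper-tailed test:
reject_by_p = p_value < alpha        # p-value rule
reject_by_bound = lower_bound > mu0  # mu0 lies below the 95% lower bound

# Prints 1.83; MiniTab shows 1.84 because it uses the unrounded sample mean.
print(f"T = {t:.2f}")
print(reject_by_p, reject_by_bound)
```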
Example (Data file: GolfDistance.mtw)
The U.S. Golf Association (USGA) establishes rules that manufacturers of golf equipment must meet if their products are to be acceptable for use in USGA events. MaxFlight uses a high-technology manufacturing process to produce golf balls with a mean driving distance of 275 yards. Sometimes, however, the process gets out of adjustment and produces golf balls with a mean driving distance different from 275 yards. When the mean distance falls below 275 yards, the company worries about losing sales because the golf balls do not provide as much distance as advertised. When the mean distance exceeds 275 yards, MaxFlight's golf balls may be rejected by the USGA for exceeding the overall distance standard concerning carry and roll. MaxFlight's quality control program involves taking periodic samples of 50 golf balls to monitor the manufacturing process. For each sample, a hypothesis test is conducted to determine whether the process has fallen out of adjustment.
Solution:
Construction of hypotheses
H0: μ = 275 (the process is functioning correctly, i.e. no adjustment is needed)
H1: μ ≠ 275 (the process requires adjustment)
MiniTab output
One-Sample T: distance
Test of mu = 275 vs not = 275
Variable N Mean StDev SE Mean 95% CI T P
distance 50 277.08 11.96 1.69 (273.68, 280.48) 1.23 0.225
The p-value (0.225) is larger than α = .05, so we do not reject H0: we do not find sufficient statistical evidence to reject the null hypothesis at the .05 level of significance, and we conclude that no action will be taken to adjust the MaxFlight manufacturing process.
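For a two-sided test, H0 is rejected at α = .05 exactly when μ0 falls outside the 95% confidence interval. This sketch recomputes T and the interval from the summary statistics. The critical value t(0.025, 49) ≈ 2.0096 is taken from a t table and is an assumption of the sketch.

```python
import math

n, xbar, s = 50, 277.08, 11.96   # from the MiniTab output above
mu0 = 275.0                      # H0: mu = 275 versus H1: mu != 275
T_CRIT = 2.0096                  # t(0.025, 49) from a t table; assumed

se = s / math.sqrt(n)
t = (xbar - mu0) / se
ci = (xbar - T_CRIT * se, xbar + T_CRIT * se)

# Two-sided rule: reject H0 exactly when mu0 lies outside the 95% CI.
reject = not (ci[0] <= mu0 <= ci[1])
print(f"SE = {se:.2f}, T = {t:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("reject H0" if reject else "do not reject H0")
```

The interval (273.68, 280.48) contains 275, which matches the decision not to reject H0.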
EXAMPLE: It has been found from experience that the mean breaking strength of a particular brand of thread is 9.63 N, with a standard deviation of 1.40 N. Recently, a sample of 36 pieces of thread showed a mean breaking strength of 8.93 N. Can we conclude that the thread has become inferior? Use the 5% level of significance.
Population: μ = 9.63, σ = 1.40
Sample: n = 36, X̄ = 8.93
Construction of hypotheses
H0: μ ≥ 9.63 (thread has not become inferior)
H1: μ < 9.63 (thread has become inferior)
MiniTab output
One-Sample Z
Test of mu = 9.63 vs < 9.63
The assumed standard deviation = 1.4
 N   Mean  SE Mean  95% Upper Bound      Z      P
36  8.930    0.233            9.314  -3.00  0.001
The p-value (0.001) is smaller than α = .05, so we reject H0: we find sufficient statistical evidence to reject the null hypothesis at the .05 level of significance and conclude that the thread has become inferior.
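Because σ is known here, the test statistic is Z, and every number in this MiniTab output can be reproduced with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a sketch of the calculation, not MiniTab's own code.

```python
import math
from statistics import NormalDist

mu0, sigma = 9.63, 1.40   # population values known from experience
n, xbar = 36, 8.93        # sample
alpha = 0.05              # H0: mu >= 9.63 versus H1: mu < 9.63 (lower-tailed)

std_normal = NormalDist()
se = sigma / math.sqrt(n)                                 # 1.40 / 6
z = (xbar - mu0) / se
p_value = std_normal.cdf(z)                               # lower-tail area
upper_bound = xbar + std_normal.inv_cdf(1 - alpha) * se   # one-sided 95% upper bound

print(f"SE = {se:.3f}, Z = {z:.2f}, P = {p_value:.3f}, 95% upper bound = {upper_bound:.3f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The upper bound 9.314 lies below μ0 = 9.63, which is another way of seeing that H0 is rejected.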
EXAMPLE: The mean lifetime of bulbs produced by a company has in the past been 1120 hours. A sample of 9 electric light bulbs recently chosen from a supply of newly produced bulbs showed a mean lifetime of 1170 hours with a standard deviation of 120 hours. Test the hypothesis that the mean lifetime of the bulbs has not changed. Use the 5% level of significance.
Population: μ = 1120
Sample: n = 9, X̄ = 1170, S = 120
Construction of hypotheses
H0: μ = 1120 (mean lifetime of bulbs has not changed)
H1: μ ≠ 1120 (mean lifetime of bulbs has changed)
MiniTab output
One-Sample T
Test of mu = 1120 vs not = 1120
N    Mean  StDev  SE Mean  95% CI              T      P
9  1170.0  120.0     40.0  (1077.8, 1262.2)  1.25  0.247
The p-value (0.247) is larger than α = .05, so we do not reject H0: we do not find sufficient statistical evidence to reject the null hypothesis at the .05 level of significance, and we conclude that there is no evidence that the mean lifetime of the bulbs has changed.
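Here σ is unknown and n is small, so the test uses t with n − 1 = 8 degrees of freedom. This sketch reproduces T and the 95% CI from the summary figures; the critical value t(0.025, 8) ≈ 2.306 is taken from a t table and is an assumption of the sketch.

```python
import math

mu0 = 1120.0                   # H0: mu = 1120 versus H1: mu != 1120
n, xbar, s = 9, 1170.0, 120.0  # sample summary
T_CRIT = 2.306                 # t(0.025, 8) from a t table; assumed

se = s / math.sqrt(n)          # 120 / 3 = 40
t = (xbar - mu0) / se          # 50 / 40 = 1.25
ci = (xbar - T_CRIT * se, xbar + T_CRIT * se)

print(f"T = {t:.2f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
print("reject H0" if abs(t) > T_CRIT else "do not reject H0")
```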
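Test of Hypothesis for Two Population Means
Are the population variances known?
• Yes: use
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
• No: are the population variances equal (checked by using the F-test)?
  • Yes: use the pooled t statistic
$$t_p = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
  • No: use
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$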
Example (Data file: SoftwareTest.mtw)
A new computer software package has been developed to help systems
analysts reduce the time required to design, develop, and implement
an information system. To evaluate the benefits of the new software
package, a random sample of 24 systems analysts is selected. Each
analyst is given specifications for a hypothetical information
system. Then 12 of the analysts are instructed to produce the
information system by using current technology. The other 12
analysts are trained in the use of the new software package and then
instructed to use it to produce the information system. The time
required to complete the information system project by using both
systems is recorded. The researcher in charge of the new software
evaluation project hopes to show that the new software package will
provide a shorter mean project completion time.
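Construction of hypotheses
H0: μ1 ≤ μ2 (the new system does not give a shorter mean completion time than the current system)
H1: μ1 > μ2 (the new system gives a shorter mean completion time than the current system)
where μ1 and μ2 are the mean project completion times with the current technology and with the new software, respectively.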
MiniTab output
Test for equality of two population variances
Null hypothesis: Variance(current) / Variance(new) = 1; Ratio of variances = 0.826
Method           DF1  DF2  Statistic  P-Value
F Test (normal)   11   11       0.83    0.757
(p-value > 0.05: do not reject the hypothesis of equal population variances, so the population variances are treated as equal.)
Two-sample t-test for equality of two population means (equal population variances)
          N   Mean  StDev  SE Mean
current  12  325.0   40.0       12
new      12  288.0   44.0       13
Difference = mu (current) - mu (new)
T-Test of difference = 0 (vs >): T-Value = 2.16  P-Value = 0.021  DF = 22
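This sketch follows the flowchart for the software example: compute the F ratio of the sample variances, then (because equal variances are not rejected) the pooled t statistic with n1 + n2 − 2 = 22 df. The critical value t(0.05, 22) ≈ 1.717 is taken from a t table and is an assumption of the sketch. MiniTab's P-Value of 0.021 is below .05, so the same conclusion (reject H0) follows from either rule.

```python
import math

# Summary statistics from the MiniTab output above (1 = current, 2 = new).
n1, xbar1, s1 = 12, 325.0, 40.0
n2, xbar2, s2 = 12, 288.0, 44.0
T_CRIT = 1.717   # t(0.05, 22) from a t table, upper-tailed; assumed

# Step 1: F statistic for equality of variances (ratio of sample variances).
f_ratio = s1**2 / s2**2

# Step 2: pooled-variance t statistic with n1 + n2 - 2 degrees of freedom.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t_p = (xbar1 - xbar2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(f"F = {f_ratio:.3f}, Sp^2 = {sp2:.1f}, t = {t_p:.2f}, df = {df}")
print("reject H0" if t_p > T_CRIT else "do not reject H0")
```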
EXAMPLE: Masood Textile versus Shahzad Textile
Sample: n1 = 50, n2 = 70; X̄1 = 7000, X̄2 = 6800; S1 = 500, S2 = 300
Construction of hypotheses
H0: μ1 ≤ μ2 (Masood Textile is not paying more than Shahzad Textile)
H1: μ1 > μ2 (Masood Textile is paying more than Shahzad Textile)
MiniTab output
Test for equality of two population variances
Null hypothesis: Variance(1) / Variance(2) = 1; Ratio of variances = 3.37
Method           DF1  DF2  Statistic  P-Value
F Test (normal)   49   69       2.78    0.000
(p-value < 0.05: reject the hypothesis of equal population variances, so the population variances are treated as unequal.)
Two-sample t-test for equality of two population means (unequal population variances)
Sample   N  Mean  StDev  SE Mean
1       50  7000    500       71
2       70  6800    300       36
Difference = mu (1) - mu (2)
T-Test of difference = 0 (vs >): T-Value = 2.52  P-Value = 0.007  DF = 73
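With unequal variances, the flowchart's unpooled t statistic is used. This sketch reproduces the F statistic and T from the sample data and computes the degrees of freedom with the Welch-Satterthwaite approximation (the approximation is named here as an assumption about how the DF value is obtained; it gives about 73.97, and MiniTab's DF = 73 matches this value rounded down). MiniTab's P-Value of 0.007 is below .05, so H0 would be rejected at the 5% level.

```python
import math

# Sample data from above (1 = Masood, 2 = Shahzad).
n1, xbar1, s1 = 50, 7000.0, 500.0
n2, xbar2, s2 = 70, 6800.0, 300.0

f_ratio = s1**2 / s2**2           # F statistic with (n1 - 1, n2 - 1) = (49, 69) df

v1, v2 = s1**2 / n1, s2**2 / n2   # squared standard error of each sample mean
t = (xbar1 - xbar2) / math.sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"F = {f_ratio:.2f}, t = {t:.2f}, df = {df:.2f} (reported as {math.floor(df)})")
```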
EXAMPLE (Data file: HopPlants)
An experiment was performed with seven hop plants. One half of each plant was pollinated and the other half was not. The seed yield of each half was recorded.
Test whether the pollinated half of the plant gives a higher average seed yield than the non-pollinated half.
Sample: n1 = n2 = 7 (one pollinated and one non-pollinated half from each of the seven plants)
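Because each plant contributes one pollinated and one non-pollinated half, the observations come in pairs, and a paired t-test on the within-plant differences is a natural way to analyse them. This is a sketch under that assumption (the slides do not show their own method or the HopPlants data for this example); the function name `paired_t` is hypothetical.

```python
import math
from statistics import fmean, stdev

def paired_t(pollinated, not_pollinated):
    """Paired t statistic for H0: mu_d <= 0 vs H1: mu_d > 0,
    where d = pollinated yield - non-pollinated yield for each plant."""
    if len(pollinated) != len(not_pollinated):
        raise ValueError("each plant must contribute one value to each list")
    d = [a - b for a, b in zip(pollinated, not_pollinated)]
    n = len(d)
    t = fmean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1   # statistic and its degrees of freedom
```

Called with the two yield columns from the HopPlants file, the function returns t with 6 degrees of freedom; H0 would be rejected at the 5% level if t exceeds t(0.05, 6) ≈ 1.943 (a t-table value, assumed here).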