
Quantitative Analysis Report – SPSS & MINITAB

By

SARA RAFIQ MUSANI – 21701

Submitted to

Muhammad Shahbaz Khan

Course: Statistical Inference – Term Report


Acknowledgement

First and foremost, we are grateful to Almighty Allah.

Secondly, we would like to express our special gratitude to our mentor and instructor, Mr. Muhammad Shahbaz Khan, for his valuable guidance in the Statistical Inference course, drawing on his immense expertise and related experience in the practical field. We are all grateful to him for assigning this project report, which has helped us evaluate many interrelated dimensions of analysis across statistical fields. The semester's learning has been instrumental in preparing this report.


INTRODUCTION
SPSS
The purpose of this report is to show how SPSS can be used to carry out statistical analyses. SPSS
is one of the most popular statistical packages; it can perform highly complex data manipulation and
analysis with simple instructions. It is frequently used in the social sciences. SPSS has four windows:
the Data Editor, the Output Viewer, the Syntax Editor, and the Script window. Many tasks can be
performed with the menus and dialog boxes, but some powerful features are available only through
command syntax. In SPSS, the Graphs command is used to create charts. SPSS mostly produces the graphs
commonly used in the social sciences, such as histograms, scatter plots, and regression lines. The
Graphs command allows changing parts of the axes, including text, changing color and font, and
copying, pasting, and exporting. You can also edit graphs manually for publication.

MINITAB
Minitab, originally intended as a tool for teaching statistics, is a general-purpose statistical
software package designed for easy interactive use. Minitab is well suited for instructional
applications, but is also powerful enough to be used as a primary tool for analyzing research data.
Statistics instructors have been choosing Minitab for more than 40 years because of its user-friendly
interface, affordable price, and free online teaching resources. Minitab is the leading software used
for statistics education at more than 4,000 colleges and universities worldwide, and more students learn
statistics with Minitab than with any other software. Minitab 18 for PCs includes a comprehensive
collection of statistics for beginning through advanced courses. The following features of Minitab
are useful to a statistician for data analysis:

1. Familiar worksheets make it easy to type, load, or copy and paste your data.
2. Menus are organized to complement leading textbooks.
3. Clean, straightforward dialogs are easy to complete.
4. Rich, informative graphs help students visualize and explore data.
5. All analyses are archived for easy review.
Chap# 6 *SAMPLING DISTRIBUTION
SAMPLING DISTRIBUTION:
A sampling distribution of sample means is a distribution using the means
computed from all possible random samples of a specific size taken from a population.

SAMPLING ERROR:

Sampling error is the difference between the sample measure and the corresponding
population measure due to the fact that the sample is not a perfect representation of the
population.

CENTRAL LIMIT THEOREM:

As the sample size n increases without limit, the shape of the distribution of the sample means
taken with replacement from a population with mean μ and standard deviation σ will approach a
normal distribution. This distribution will have a mean μ and a standard deviation σ/√n.
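A short simulation makes the theorem concrete. The sketch below is only an illustration under assumed inputs that do not come from this report: a Uniform(0, 1) population (μ = 0.5, σ = √(1/12)), samples of size n = 30, 5,000 repetitions, and a fixed random seed. It compares the mean and standard deviation of the simulated sample means with μ and σ/√n.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=1):
    """Draw `reps` samples of size n from Uniform(0, 1) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

# Population parameters of Uniform(0, 1) (assumed example population)
mu = 0.5
sigma = math.sqrt(1 / 12)

n, reps = 30, 5000
means = simulate_sample_means(n, reps)

print("mean of sample means:", round(statistics.fmean(means), 4), "vs mu =", mu)
print("sd of sample means:  ", round(statistics.pstdev(means), 4),
      "vs sigma/sqrt(n) =", round(sigma / math.sqrt(n), 4))
```

Increasing n shrinks the spread of the sample means in proportion to 1/√n, which is the σ/√n term above.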

With replacement:

Q.1 Five students from a statistics class each collected a random sample of times on how long it took
students to take the quiz, and the results were:

Student   Time (mins)
1         22
2         31
3         18
4         27
5         20

Compute:

a) Sample and Population Mean of the data

ΣẊi = 590, N = 25, μẊ = 23.6
ΣX = 118, N = 5, μ = 23.6
b) Obtain the sampling distribution of the mean with sample size 2 with replacement

Notations  Sample    Ẋi      Ẋi-μ     (Ẋi-μ)²
A,A        22 22     22.0    -1.60     2.56
A,B        22 31     26.5     2.90     8.41
A,C        22 18     20.0    -3.60    12.96
A,D        22 27     24.5     0.90     0.81
A,E        22 20     21.0    -2.60     6.76
B,A        31 22     26.5     2.90     8.41
B,B        31 31     31.0     7.40    54.76
B,C        31 18     24.5     0.90     0.81
B,D        31 27     29.0     5.40    29.16
B,E        31 20     25.5     1.90     3.61
C,A        18 22     20.0    -3.60    12.96
C,B        18 31     24.5     0.90     0.81
C,C        18 18     18.0    -5.60    31.36
C,D        18 27     22.5    -1.10     1.21
C,E        18 20     19.0    -4.60    21.16
D,A        27 22     24.5     0.90     0.81
D,B        27 31     29.0     5.40    29.16
D,C        27 18     22.5    -1.10     1.21
D,D        27 27     27.0     3.40    11.56
D,E        27 20     23.5    -0.10     0.01
E,A        20 22     21.0    -2.60     6.76
E,B        20 31     25.5     1.90     3.61
E,C        20 18     19.0    -4.60    21.16
E,D        20 27     23.5    -0.10     0.01
E,E        20 20     20.0    -3.60    12.96
Total                590.0            283.00
c) Compute sample and population variances and standard deviation

σ² = Σ(X - μ)²/N = 113.2/5 = 22.64
σẊ² = Σ(Ẋi - μ)²/N = 283.00/25 = 11.32

σ = 4.75815
σẊ = 3.36452

d) Verify:
(i) $E(\bar{X}) = \mu$

ΣẊi = 590, N = 25, μẊ = 590/25 = 23.6
ΣX = 118, N = 5, μ = 118/5 = 23.6

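(ii) $\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}$, so that $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ (sampling with replacement)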

σ = 4.7581
σẊ = 3.3645
√n = √2 = 1.4142

Verification: σẊ = σ/√n = 4.7581/1.4142 = 3.3645

Hence verified.
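The enumeration in (b) and the checks in (d) can be reproduced mechanically. The sketch below uses plain Python with the standard library (an illustration, not the SPSS/Minitab steps used in the report): it lists all 5² = 25 ordered samples of size 2 and compares μẊ with μ and σẊ² with σ²/n.

```python
import itertools
import math

values = [22, 31, 18, 27, 20]  # Q.1 quiz times in minutes, students A to E
N, n = len(values), 2

mu = sum(values) / N
sigma2 = sum((x - mu) ** 2 for x in values) / N  # population variance

# With replacement: every ordered pair, including (A, A), is a sample, so N**n = 25 samples
sample_means = [sum(s) / n for s in itertools.product(values, repeat=n)]
mu_xbar = sum(sample_means) / len(sample_means)
var_xbar = sum((m - mu) ** 2 for m in sample_means) / len(sample_means)

print(len(sample_means), mu, mu_xbar)                              # -> 25 23.6 23.6
print(round(sigma2, 4), round(var_xbar, 4), round(sigma2 / n, 4))  # -> 22.64 11.32 11.32
print(round(math.sqrt(var_xbar), 4), round(math.sqrt(sigma2) / math.sqrt(n), 4))
```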
Without replacement

Q.2 The following data show the cost of cell phones (in thousands), labelled A to E:

45, 50, 80, 60, 35

Compute:

a) Sample and Population Mean of the data

ΣẊi = 540, N = 10, μẊ = 54.0
ΣX = 270, N = 5, μ = 54.0

b) Obtain the sampling distribution of the mean with sample size 2 without replacement

Notations  Sample         Ẋi       Ẋi-μ      (Ẋi-μ)²
A,B        45.00  50.00   47.50    -6.500     42.25
A,C        45.00  80.00   62.50     8.500     72.25
A,D        45.00  60.00   52.50    -1.500      2.25
A,E        45.00  35.00   40.00   -14.000    196.00
B,C        50.00  80.00   65.00    11.000    121.00
B,D        50.00  60.00   55.00     1.000      1.00
B,E        50.00  35.00   42.50   -11.500    132.25
C,D        80.00  60.00   70.00    16.000    256.00
C,E        80.00  35.00   57.50     3.500     12.25
D,E        60.00  35.00   47.50    -6.500     42.25
Total                     540.00              877.50
c) Compute sample and population variances and standard deviation

σ² = Σ(X - μ)²/N = 1170.00/5 = 234
σẊ² = Σ(Ẋi - μ)²/N = 877.5/10 = 87.75

d) Verify:
(i) $E(\bar{X}) = \mu$

ΣẊi = 540, N = 10, μẊ = 540/10 = 54.0
ΣX = 270, N = 5, μ = 270/5 = 54.0

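(ii) $\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$, so that $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\cdot\frac{\sqrt{N-n}}{\sqrt{N-1}}$ (sampling without replacement)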

σ = 15.297
σẊ = 9.367
√n = √2 = 1.4142
√(N - n) = √3 = 1.73205
√(N - 1) = √4 = 2

Verification: σẊ = (σ/√n)·(√(N - n)/√(N - 1)) = (15.297/1.4142)(1.73205/2) = 9.37

Hence verified.
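The same check without replacement only changes how samples are generated: unordered pairs of distinct items, C(5, 2) = 10 of them. This standard-library sketch (again an illustration, not the report's software workflow) also evaluates the finite population correction from (ii).

```python
import itertools
import math

prices = [45, 50, 80, 60, 35]  # Q.2 cell-phone costs (thousands), labelled A to E
N, n = len(prices), 2

mu = sum(prices) / N
sigma2 = sum((x - mu) ** 2 for x in prices) / N  # population variance

# Without replacement: unordered samples of distinct items, so C(5, 2) = 10 samples
sample_means = [sum(s) / n for s in itertools.combinations(prices, n)]
mu_xbar = sum(sample_means) / len(sample_means)
var_xbar = sum((m - mu) ** 2 for m in sample_means) / len(sample_means)

# Finite population correction: Var(X-bar) = (sigma^2 / n) * (N - n) / (N - 1)
var_formula = sigma2 / n * (N - n) / (N - 1)

print(len(sample_means), mu, mu_xbar)                                    # -> 10 54.0 54.0
print(round(sigma2, 2), round(var_xbar, 2), round(var_formula, 2))      # -> 234.0 87.75 87.75
print(round(math.sqrt(var_xbar), 3))
```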
Chap# 7 * CONFIDENCE INTERVALS AND SAMPLE SIZE
CONFIDENCE INTERVAL:

A confidence interval is a specific interval estimate of a parameter determined by using data
obtained from a sample and by using the specific confidence level of the estimate.

 CONFIDENCE INTERVAL FOR THE MEAN:

Assumptions for the confidence interval of the mean when σ is known:

1. The sample is a random sample.
2. Either n ≥ 30 or the population is normally distributed if n < 30.
3. There are no significant outliers.

Assumptions for the confidence interval of the mean when σ is unknown:
1. The sample is a random sample.
2. Either n ≥ 30 or the population is normally distributed if n < 30.
3. There are no significant outliers.


Q.3
Shopping Survey: Shoppers at Hyper Mall are said to spend Rs. 8000 per visit. The amounts that a
random sample of 35 shoppers spent per visit are as follows. Find the 90% confidence interval for the
true mean amount spent per visit.

5000   3400   3400
15000  3200   3800
4000   6700   3100
3000   8800   5100
6000   6500   6700
7000   4300   4100
11000  4900   4700
20000  4900   3200
4500   3400   2600
3000   6800   5200
7000   5900   6300

SOLUTION:
RESULT:
At the 90% confidence level, the mean amount spent per visit at Hyper Mall lies between Rs. 4575 and
Rs. 7150. The sample mean of Rs. 5862 is much lower than the population mean of Rs. 8000, which lies
outside the interval.
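The report takes this interval from software output. As a hedged sketch of the textbook calculation behind it, the function below computes x̄ ± t·s/√n; the critical value t must be supplied from a t table (for 90% confidence and 35 observations, df = 34 gives t ≈ 1.691). The five-value sample in the usage line is hypothetical.

```python
import math
import statistics

def t_interval(data, t_crit):
    """Confidence interval for a mean when sigma is unknown: x-bar +/- t * s / sqrt(n).

    t_crit is the critical value for df = n - 1 at the chosen confidence level,
    taken from a t table because the standard library has no t quantile function.
    """
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical five-value sample; t for df = 4 at 90% confidence is 2.132
low, high = t_interval([4, 6, 5, 7, 8], t_crit=2.132)
print(round(low, 2), round(high, 2))
```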

 CONFIDENCE INTERVAL FOR A PROPORTION:

Assumptions for the confidence interval of a proportion:

1. The sample is a random sample.
2. The conditions for a binomial experiment are satisfied.
3. np ≥ 5 and nq ≥ 5.

Q.4

According to the Kaiser Family Foundation, 84% of Pakistani children ages 8 to 18 had Internet
access at home as of August 2009. Researchers wonder if this percentage has changed since then.
They survey 500 randomly selected children (ages 8 to 18) and find that 430 of them have Internet
access at home. Use α = 0.05 (a 95% confidence level) to construct a confidence interval for the true
proportion.

SOLUTION:

RESULT:
The 95% confidence interval for the proportion is 0.8296 to 0.8904, i.e. about 83% to 89%.

Hence, at the 95% confidence level, the true proportion of children (ages 8 to 18) with Internet access
at home corresponds to about 415 to 445 out of every 500 children.
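A minimal sketch of the interval for a single proportion, assuming the usual large-sample (Wald) formula p̂ ± z·√(p̂q̂/n); `NormalDist` from Python's standard library supplies z. Applied to Q.4 (430 of 500), it gives roughly 0.8296 to 0.8904.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample (Wald) interval p-hat +/- z * sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Q.4: 430 of 500 children have Internet access at home
low, high = proportion_interval(430, 500)
print(round(low, 4), round(high, 4))
print(round(low * 500), round(high * 500))  # bounds expressed as children out of 500
```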
 CONFIDENCE INTERVAL FOR THE VARIANCE AND STANDARD DEVIATION:

Assumptions for the confidence intervals of the standard deviation and variance:
1. The sample is a random sample. Each individual in the population has an equal probability of
being selected in the sample.
2. The population must be normally distributed.
3. The data are continuous (not discrete).

Q.5
A random sample of stock prices of OGDC per share (in dollars) is shown below. Find the 90% confidence
interval for the variance and standard deviation of the prices. Assume the variable is normally
distributed.

32.4    35.5
45.53   21.2
11.9    58.6
78.4    33.3
33.4    41.5
53.4    56.2
21.2    61.4
SOLUTION:

RESULT:

At the 90% confidence level, the standard deviation of the prices lies between 13.2 and 29.7 dollars (so the variance lies between about 174 and 882).
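The textbook chi-square interval for a variance can be sketched as below, assuming the formula ((n − 1)s²/χ²right, (n − 1)s²/χ²left) with critical values read from a chi-square table; for the 14 OGDC prices (df = 13) at 90% confidence those table values are 22.362 and 5.892. Statistical software may also offer other interval methods, so its bounds need not match this formula exactly. The sample in the usage line is hypothetical.

```python
import math
import statistics

def variance_interval(data, chi2_right, chi2_left):
    """Chi-square interval for sigma^2 and sigma.

    chi2_right and chi2_left are the upper and lower critical values for df = n - 1,
    read from a chi-square table (the standard library has no chi-square quantile).
    """
    n = len(data)
    ss = (n - 1) * statistics.variance(data)  # (n - 1) * s^2
    var_low, var_high = ss / chi2_right, ss / chi2_left
    return (var_low, var_high), (math.sqrt(var_low), math.sqrt(var_high))

# Hypothetical sample of 8 values; 90% table values for df = 7 are 14.067 and 2.167
(var_low, var_high), (sd_low, sd_high) = variance_interval([2, 4, 4, 4, 5, 5, 7, 9], 14.067, 2.167)
print(round(var_low, 3), round(var_high, 3), round(sd_low, 3), round(sd_high, 3))
```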
Chap# 8 *HYPOTHESIS TESTING
STATISTICAL HYPOTHESIS:
A statistical hypothesis is a conjecture about a population parameter. This conjecture may or
may not be true.
LEVEL OF SIGNIFICANCE:
The level of significance is the maximum probability of committing a type I error. This probability
is symbolized by α (Greek letter alpha).

CRITICAL REGION:
The critical or rejection region is the range of values of the test value that indicates that there is a
significant difference and that the null hypothesis should be rejected. The noncritical or
nonrejection region is the range of values of the test value that indicates that the difference was
probably due to chance and that the null hypothesis should not be rejected.

 t TEST FOR A MEAN

Assumptions for the t Test for a Mean When σ Is Unknown

1. The sample is a random sample.
2. Either n ≥ 30 or the population is normally distributed if n < 30.
Q.6
A teacher wishes to study the amount of time students in his statistics course spend each week
studying for the course. He believes that the average should be the nominal 6 hours (two hours
outside class for every hour in class). So he has the students keep track of and report the time spent
studying during a typical week. A total of 12 students respond. With this information, he wishes to
perform a hypothesis test at α = 0.01.

1   5
2   4
2   7
2   3
5   4
4   8
3   6
2   3
5   7.5
4   9
3   10
1   8
2

SOLUTION:
Identification Of Hypothesis:

H0: μ = 6 hours
H1: μ ≠ 6 hours

Testing The Hypothesis:


RESULT:
Since the p-value (0.00) is less than α = 0.01, we reject the null hypothesis and accept the
alternative hypothesis, i.e. the average time students spend each week studying for the statistics
course is not equal to 6 hours.
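The test value behind this decision can be sketched as t = (x̄ − μ0)/(s/√n) with df = n − 1. The helper below is a minimal illustration: the decision rule compares |t| with a two-tailed table value (3.106 for α = 0.01 and df = 11), and the twelve study times in the usage line are hypothetical, not the report's data.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Test value t = (x-bar - mu0) / (s / sqrt(n)) and its degrees of freedom n - 1."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

def reject_two_tailed(t, t_crit):
    """Two-tailed decision: reject H0 when |t| exceeds the table critical value."""
    return abs(t) > t_crit

# Hypothetical weekly study hours for 12 students (not the report's data)
hours = [3, 4, 2, 5, 3, 4, 2, 3, 5, 4, 3, 2]
t, df = one_sample_t(hours, mu0=6)
print(round(t, 2), df, reject_two_tailed(t, t_crit=3.106))  # 3.106: alpha = 0.01, df = 11
```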

Z TEST FOR A PROPORTION:

Assumptions for Testing a Proportion
1. The sample is a random sample.
2. The conditions for a binomial experiment are satisfied.
3. np ≥ 5 and nq ≥ 5.

Q.7

According to the Kaiser Family Foundation, 84% of Pakistani children ages 8 to 18 had Internet
access at home as of August 2009. Researchers wonder if this percentage has changed since then.
They survey 500 randomly selected children (ages 8 to 18) and find that 430 of them have Internet
access at home. Use a level of significance of α = 0.05 for this hypothesis test.

SOLUTION:
RESULT:
Since the p-value (about 0.22, with z ≈ 1.22) is greater than α = 0.05, the null hypothesis is accepted. There is
not enough evidence to conclude that the proportion of children with Internet access at home has changed from 84%.
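A minimal sketch of the one-proportion z test, assuming the standard formula z = (p̂ − p0)/√(p0q0/n) and a two-tailed p-value from the standard normal distribution (`statistics.NormalDist`). For Q.7 it gives z ≈ 1.22 and p ≈ 0.22, which is above α = 0.05.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 with z = (p-hat - p0) / sqrt(p0 * q0 / n)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = one_proportion_z_test(430, 500, 0.84)  # Q.7
print(round(z, 3), round(p, 3))
```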

 χ² TEST FOR A VARIANCE OR STANDARD DEVIATION

Assumptions for the Chi-Square Test for a Single Variance


1. The sample must be randomly selected from the population.
2. The population must be normally distributed for the variable under study.
3. The observations must be independent of one another.
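The test value for this chi-square test is χ² = (n − 1)s²/σ0² with df = n − 1. The sketch below is a hedged illustration of a two-tailed decision, assuming the lower and upper critical values are read from a chi-square table; the sample, σ0² = 4, and the df = 7 table values (1.690 and 16.013 for α = 0.05) are hypothetical inputs chosen for the example.

```python
import statistics

def chi_square_variance_test(data, sigma2_0, chi2_left, chi2_right):
    """Two-tailed test of H0: sigma^2 = sigma2_0 using chi^2 = (n - 1) s^2 / sigma2_0.

    chi2_left and chi2_right are the lower and upper critical values for df = n - 1
    from a chi-square table; H0 is rejected if the test value falls outside them.
    """
    n = len(data)
    chi2 = (n - 1) * statistics.variance(data) / sigma2_0
    return chi2, n - 1, not (chi2_left <= chi2 <= chi2_right)

# Hypothetical sample, H0: sigma^2 = 4; df = 7 critical values at alpha = 0.05
chi2, df, reject = chi_square_variance_test([2, 4, 4, 4, 5, 5, 7, 9], 4, 1.690, 16.013)
print(round(chi2, 3), df, reject)
```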

Q.8

A travel agent claims that the average number of rooms in hotels in a large city is 500. At α = 0.05,
is the claim realistic? The data for a sample of six hotels are shown.

310   200   330
550   220   280
500   210   400

SOLUTION:
Identification Of Hypothesis:

H0: σ² = 500
H1: σ² ≠ 500

Testing The Hypothesis:


RESULT:

Since the p-value (significance value) of 0.003 is less than α = 0.05, we reject the null hypothesis
and accept the alternative hypothesis. This means we can conclude that the average number of
rooms in a hotel is not equal to 500.

Chap# 9 *TESTING THE DIFFERENCE BETWEEN TWO MEANS, TWO PROPORTIONS, AND TWO VARIANCES

 CONFIDENCE INTERVAL FOR THE MEAN DIFFERENCES:


Q.9: The number of grams of sugar contained in a 1-ounce serving of Hershey's chocolate syrup and
caramel syrup is listed here. Use a 95% confidence interval to draw a conclusion about the difference
between the two means.

Chocolate Syrup   Caramel Syrup
29                20
26                30
15                36
32                19
29                18
25                17
16                39
20                10
38                29
34                55
24                29
27                21
29                30
SOLUTION:

RESULT:
At the 95% confidence level, the mean amount of sugar lies between 21 and 32 grams for chocolate syrup
and between 18 and 37 grams for caramel syrup. Because these intervals overlap heavily, the data do
not show a significant difference in the amount of sugar in the two syrups.
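One way to check this conclusion is a confidence interval for μ1 − μ2 itself. The sketch below assumes independent samples, the unpooled standard error √(s1²/n1 + s2²/n2), and a conservative critical value with df equal to the smaller n minus 1 (t = 2.179 for df = 12 at 95%). With the Q.9 data the interval contains 0.

```python
import math
import statistics

def mean_difference_interval(x1, x2, t_crit):
    """(x-bar1 - x-bar2) +/- t * sqrt(s1^2/n1 + s2^2/n2) for two independent samples.

    t_crit comes from a t table; here the conservative df = smaller n - 1 is assumed.
    """
    diff = statistics.fmean(x1) - statistics.fmean(x2)
    se = math.sqrt(statistics.variance(x1) / len(x1) + statistics.variance(x2) / len(x2))
    return diff - t_crit * se, diff + t_crit * se

chocolate = [29, 26, 15, 32, 29, 25, 16, 20, 38, 34, 24, 27, 29]
caramel = [20, 30, 36, 19, 18, 17, 39, 10, 29, 55, 29, 21, 30]
low, high = mean_difference_interval(chocolate, caramel, t_crit=2.179)  # df = 12, 95%
print(round(low, 2), round(high, 2))
```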

TESTING THE DIFFERENCE BETWEEN TWO MEANS USING THE T TEST

Assumptions for the t Test for Two Means When the Samples Are Dependent
1. The sample or samples are random.
2. The sample data are dependent.
3. When the sample size or sample sizes are less than 30, the population or populations must
be normally or approximately normally distributed.

Q.10
The temperatures on a random sample of 10 cold days in November and in December in Pakistan are
listed below. At α = 0.05, can it be concluded that there is a difference in the average cold-day
temperature between the two months?

November   December
31         13
31         19
38         21
24         23
24         25
42         10
22         9
43         10
35         16
42         17

SOLUTION:

Identification Of Hypothesis:

Ho: μ1 = μ2
Ha: μ1 ≠ μ2

Testing The Hypothesis:


RESULT:

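Since the p-value (significance value) is less than α = 0.05, we reject the null hypothesis and conclude that the average cold-day temperature differs between November and December.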
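A hedged sketch of the test value, assuming the two months are treated as independent samples with unequal variances (Welch's t with Satterthwaite degrees of freedom). This is a swapped-in choice for illustration; a textbook alternative is the conservative df = smaller n − 1 (df = 9, two-tailed t = 2.262 at α = 0.05). With the Q.10 data, t is well above either critical value.

```python
import math
import statistics

def welch_t(x1, x2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = statistics.variance(x1) / len(x1)
    v2 = statistics.variance(x2) / len(x2)
    t = (statistics.fmean(x1) - statistics.fmean(x2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x1) - 1) + v2 ** 2 / (len(x2) - 1))
    return t, df

november = [31, 31, 38, 24, 24, 42, 22, 43, 35, 42]
december = [13, 19, 21, 23, 25, 10, 9, 10, 16, 17]
t, df = welch_t(november, december)
print(round(t, 2), round(df, 1))
```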

 CONFIDENCE INTERVAL FOR THE PROPORTION DIFFERENCES:

 The interval relies on the same assumptions as the z test for two proportions below.

 TESTING THE DIFFERENCE BETWEEN PROPORTIONS USING Z TEST:

Assumptions for the z Test for Two Proportions


1. The samples must be random samples.
2. The sample data are independent of one another.
3. For both samples, np ≥ 5 and nq ≥ 5.
Q.11

In a sample of 60 Pakistanis, 44 wished that they were rich, whereas in a sample of 30 Africans, 12
wished that they were rich.

A- Find the 99% confidence interval for the difference of the two proportions.

B- Is there a difference in the proportions? Test the claim using α = 0.01.

SOLUTION:

RESULT:
A- With p̂1 = 44/60 = 0.733 and p̂2 = 12/30 = 0.400, at the 99% confidence level the difference of the two
proportions lies between about 0.060 and 0.607.

B- Since the p-value (about 0.002, with z ≈ 3.07) is less than α = 0.01, we reject the null hypothesis, i.e. the
proportion of Pakistanis who wished to be rich differs from the proportion of Africans.
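The interval and test for Q.11 can be sketched with the standard large-sample formulas: an unpooled standard error for the 99% interval and a pooled proportion for the z test. This is an illustration using the standard library's `NormalDist`, not the report's software output. With 44/60 and 12/30 it gives an interval of about 0.060 to 0.607, z ≈ 3.07 and p ≈ 0.002.

```python
import math
from statistics import NormalDist

def two_proportion_analysis(x1, n1, x2, n2, confidence=0.99):
    """Wald CI for p1 - p2 (unpooled SE) and a two-tailed z test of H0: p1 = p2 (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se_pooled = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return ci, z, p_value

ci, z, p = two_proportion_analysis(44, 60, 12, 30)  # Q.11: Pakistanis vs Africans
print([round(bound, 3) for bound in ci], round(z, 2), round(p, 4))
```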

 CONFIDENCE INTERVAL FOR THE TWO VARIANCES DIFFERENCES:

 The interval relies on the same assumptions as the F test below.

 TESTING THE DIFFERENCE BETWEEN VARIANCES USING F TEST:

Assumptions for Testing the Difference between Two Variances:
1. The samples must be random samples.
2. The populations from which the samples were obtained must be normally distributed.
(Note: The test should not be used when the distributions depart from normality.)
3. The samples must be independent of one another.
Q.12
A psychologist surveyed a random sample of 12 male college students and a random sample of 12 female
college students about the fastest speed each had driven.

Speed of male students   Speed of female students
210                      150
200                      160
230                      140
180                      170
190                      210
199                      220
180                      160
220                      140
230                      200
240                      210
220                      230
170                      200

A- Construct a 95% confidence interval for comparing the two variances.
B- Is there sufficient evidence at the α = 0.05 level to conclude that the standard deviation of the
fastest speed driven by male college students differs from the standard deviation of the fastest speed
driven by female college students?

SOLUTION:
Identification Of Hypothesis:

Ho: σ1² = σ2²
Ha: σ1² ≠ σ2²

Testing The Hypothesis:

RESULT:

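A- At the 95% confidence level, the standard deviation of the female students' speeds lies between 26 and
47, whereas that of the male students' speeds lies between 17 and 35.

B- Since the p-value (0.087) is greater than α = 0.05, we accept the null hypothesis: there is not enough
evidence that the standard deviations of the speeds of male and female college students differ.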
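The classical F test statistic places the larger sample variance in the numerator. The sketch below computes it for the Q.12 speeds as an illustration; for a two-tailed test at α = 0.05 it would be compared with the right-tail F table value for α/2 = 0.025 with (11, 11) degrees of freedom.

```python
import statistics

def f_test_statistic(x1, x2):
    """F = larger sample variance / smaller sample variance, with matching (df_num, df_den)."""
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    if v1 >= v2:
        return v1 / v2, len(x1) - 1, len(x2) - 1
    return v2 / v1, len(x2) - 1, len(x1) - 1

male = [210, 200, 230, 180, 190, 199, 180, 220, 230, 240, 220, 170]
female = [150, 160, 140, 170, 210, 220, 160, 140, 200, 210, 230, 200]
F, df_num, df_den = f_test_statistic(male, female)
print(round(F, 3), df_num, df_den)
```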
Chap# 10 CORRELATIONS AND REGRESSION
CORRELATION COEFFICIENT:

The correlation coefficient computed from the sample data measures the strength
and direction of a linear relationship between two quantitative variables. The symbol for the sample
correlation coefficient is r. The symbol for the population correlation coefficient is ρ (Greek letter
rho).
STANDARD ERROR:

The standard error of the estimate, denoted by s_est, is the standard deviation of the observed y
values about the predicted values.

REGRESSION:

A technique for determining the statistical relationship between two or more variables where a
change in a dependent variable is associated with, and depends on, a change in one or more
independent variables.

Assumptions for Valid Predictions in Regression

1. The sample is a random sample.


2. For any specific value of the independent variable x, the value of the dependent variable y must be
normally distributed about the regression line.
3. The standard deviation of each of the dependent variables must be the same for each value of the
independent variable.

Assumptions for the Correlation Coefficient


1. The sample is a random sample.
2. The data pairs fall approximately on a straight line and are measured at the interval or ratio level.
3. The variables have a joint normal distribution. (This means that given any specific value of x, the
y values are normally distributed; and given any specific value of y, the x values are normally
distributed.)
Q.13

The number of fat calories and grams of saturated fat are listed on the menu of Bellavita restaurant
for its non-breakfast entrees. The data for the non-breakfast items are shown below. Is there
sufficient evidence to conclude that there is a significant relationship between the two variables?
Calculate the regression, correlation and standard error for the data below. Test the hypothesis to
prove the claim at α = 0.05.

Non-breakfast item      Fat calories   Saturated fat (g)
Cheese omelette         180            9
Pancakes                200            8
Bread pudding           270            13
Porridge                350            17
Chocolate Cornflakes    460            23
Butter sautéed beans    540            27

SOLUTION:
Identification Of Hypothesis:

Ho: ρ = 0
Ha: ρ ≠ 0

Testing The Hypothesis:


RESULT:
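Since the p-value (significance value) is less than α = 0.05, we reject the null hypothesis and conclude that there is a significant linear relationship between fat calories and grams of saturated fat.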
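The quantities asked for in Q.13 can be sketched with the textbook least-squares formulas, taking fat calories as x and saturated fat as y (an assumption about which variable is independent). The test value for H0: ρ = 0 is t = r√((n − 2)/(1 − r²)) with df = n − 2; for α = 0.05 and df = 4 the two-tailed table value is 2.776. With these data the sketch gives r ≈ 0.996 and t ≈ 23.

```python
import math

def regression_summary(x, y):
    """Least-squares line y' = a + b*x, Pearson r, standard error of the estimate, and t for H0: rho = 0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # sample correlation coefficient
    sse = syy - b * sxy                # residual (error) sum of squares
    s_est = math.sqrt(sse / (n - 2))   # standard error of the estimate
    t = r * math.sqrt((n - 2) / (1 - r ** 2))  # test value, df = n - 2
    return a, b, r, s_est, t

fat_calories = [180, 200, 270, 350, 460, 540]   # x
saturated_fat = [9, 8, 13, 17, 23, 27]          # y (grams)
a, b, r, s_est, t = regression_summary(fat_calories, saturated_fat)
print(round(a, 3), round(b, 4), round(r, 4), round(s_est, 3), round(t, 2))
```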

Chap# 11 CHI-SQUARE TESTS


 GOODNESS-OF-FIT TEST
Assumptions for the Chi-Square Goodness-of-Fit Test
1. The data are obtained from a random sample.
2. The expected frequency for each category must be 5 or more.

Q.14

A university surveyed the reasons why its students dropped out. It found that, after admission,
38% were unable to manage the tough routine, 32% were eager to start their own business, 23%
had financial issues, and 7% went abroad. To see whether these percentages are consistent with those of
other universities, a local researcher was hired to survey 300 dropouts of other universities
and found that 122 were unable to manage the tough routine, 85 were eager to start their own
business, 76 had financial issues, and 17 went abroad. At α = 0.10, test the claim that the
percentages are the same for both groups of universities.

SOLUTION:
Identification Of Hypothesis:

Ho: The reasons for dropping out are distributed as claimed (38% tough routine, 32% own business,
23% financial issues, 7% went abroad).

Ha: The distribution of the reasons for dropping out differs from the claim.

Testing The Hypothesis:

RESULT:

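Since the p-value is greater than α = 0.10, we accept the null hypothesis: the percentages at the other universities are consistent with those of the surveyed university.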
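A minimal sketch of the goodness-of-fit test value, with expected counts E = n × (claimed percentage). For the Q.14 counts it gives χ² ≈ 3.29 with df = 3, below the α = 0.10 table value of 6.251.

```python
def chi_square_gof(observed, expected_props):
    """Goodness-of-fit test value: sum of (O - E)^2 / E with E = n * claimed proportion."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

# Q.14: tough routine, own business, financial issues, went abroad
chi2, df = chi_square_gof([122, 85, 76, 17], [0.38, 0.32, 0.23, 0.07])
print(round(chi2, 3), df, chi2 < 6.251)  # 6.251: table value for df = 3, alpha = 0.10
```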


 TEST FOR INDEPENDENCE:

Assumptions for the Chi-Square Independence and Homogeneity Tests
1. The data are obtained from a random sample.
2. The expected value in each cell must be 5 or more.

Q.15

To test the right market for its diet plans, a nutritionist firm researched the choice of diet plan among
both males and females. The results of the study are shown here. At the 0.05 level of significance, can
the researcher conclude that there is a relationship between gender and choice of diet plan?

          Males   Females
Keto      50      10
Paleo     25      10
Atkins    30      10
SOLUTION:

Identification Of Hypothesis:

Ho: The choice of diet plan is independent of gender.

Ha: The choice of diet plan is dependent on gender.
Testing The Hypothesis:

RESULT:
Since the p-value is greater than α = 0.05, we accept the null hypothesis: there is not enough evidence of a relationship between gender and choice of diet plan.
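The independence test value can be sketched as below, with expected counts (row total × column total)/grand total. The column order (males, then females) is assumed from the question; the test value does not depend on it. For df = 2 the chi-square upper-tail probability has the closed form e^(−χ²/2), which the sketch uses to report a p-value of about 0.36.

```python
import math

def chi_square_independence(table):
    """Test value for independence in an r x c table: sum of (O - E)^2 / E."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Q.15 counts; column order (males, females) is an assumption
diet = [[50, 10],   # Keto
        [25, 10],   # Paleo
        [30, 10]]   # Atkins
chi2, df = chi_square_independence(diet)
# Valid only because df == 2: the chi-square upper-tail probability is then exp(-chi2 / 2)
p_value = math.exp(-chi2 / 2)
print(round(chi2, 3), df, round(p_value, 3))
```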

Chap# 12 ANALYSIS OF VARIANCE


ANOVA TEST

An ANOVA test is a way to find out whether survey or experiment results are significant. In other words,
it helps you decide whether to reject the null hypothesis and accept the alternative hypothesis.
Basically, you are testing groups to see whether there is a difference between their means.

“ONE-WAY” OR “TWO-WAY”:

One-way or two-way refers to the number of independent variables (IVs) in your analysis of variance
test. A one-way ANOVA has one independent variable (with two or more levels), and a two-way ANOVA has
two independent variables (each can have multiple levels). For example, a one-way analysis of variance
could have one IV (brand of cereal), and a two-way analysis of variance could have two IVs (brand of
cereal, calories).

 ONE-WAY ANALYSIS OF VARIANCE (ANOVA):

ASSUMPTIONS:
1. The populations from which the samples were obtained must be normally or approximately
normally distributed.
2. The samples must be independent of one another.
3. The variances of the populations must be equal.
Q.16

A state employee wishes to see if there is a significant difference in the number of employees sitting
late in functional departments. The data are shown below. At α = 0.05, can it be concluded that there
is a significant difference in the number of employees sitting late in each department?

HR Department   Finance Department   Customer Service Department
7               10                   1
14              1                    12
32              1                    1
19              0                    9
10              11                   1
11              1                    11

Mean = 15.5     Mean = 4.0           Mean = 5.8
Variance = 81.9 Variance = 25.6      Variance = 29

Solution:

Identification Of Hypothesis:

Ho: μ1 = μ2= μ3
Ha: at least one of the means is different from the others.
Testing The Hypothesis:

RESULT:
Since the p-value (0.02) is less than α = 0.05, we reject the null hypothesis: the mean number of employees sitting late differs in at least one department.
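A minimal sketch of the one-way ANOVA F value, F = MS_between / MS_within with (k − 1, N − k) degrees of freedom. For the Q.16 departments it gives F ≈ 5.04 with (2, 15) degrees of freedom, above the α = 0.05 table value of 3.68.

```python
def one_way_anova(*groups):
    """F = MS_between / MS_within with (k - 1, N - k) degrees of freedom."""
    k = len(groups)
    values = [v for g in groups for v in g]
    n_total = len(values)
    grand_mean = sum(values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k

hr = [7, 14, 32, 19, 10, 11]
finance = [10, 1, 1, 0, 11, 1]
customer_service = [1, 12, 1, 9, 1, 11]
F, df_between, df_within = one_way_anova(hr, finance, customer_service)
print(round(F, 2), df_between, df_within, F > 3.68)  # 3.68: table value F(2, 15), alpha = 0.05
```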

 TWO-WAY ANALYSIS OF VARIANCE (ANOVA):

ASSUMPTIONS:
1. The populations from which the samples were obtained must be normally or approximately
normally distributed.
2. The samples must be independent.
3. The variances of the populations from which the samples were selected must be equal.
4. The groups must be equal in sample size.

Q.17

Sun-silk has three variants of shampoo: Thick and Long, Intense Repair and Black Shine. The manager
decides to see whether the geographical area in Karachi and the type of item affect monthly sales.
At α = 0.05, analyze the data shown, using a two-way ANOVA. Sales are given in hundreds of thousands
for a randomly selected month, and five salespeople were selected for each group.

Business Strategies      Geographical Area
On Variants              North and Central   West and South
Thick and Long           56                  16
                         23                  14
                         52                  18
                         28                  27
                         35                  31
Intense Repair           43                  58
                         25                  62
                         16                  68
                         27                  72
                         32                  83
Black Shine              47                  15
                         43                  14
                         52                  22
                         61                  16
                         74                  27
SOLUTION:

Identification of hypothesis:

H0: There is no interaction effect between the geographical region of Karachi and the type of item on sales.
H1: There is an interaction effect between the geographical region of Karachi and the type of item on sales.

H0: There is no difference in mean sales among the three shampoo variants.
H1: There is a difference in mean sales among the three shampoo variants.

H0: There is no difference in mean sales between the two geographical regions of Karachi, i.e. North and
Central & West and South.
H1: There is a difference in mean sales between the two geographical regions of Karachi, i.e. North and
Central & West and South.

Testing The Hypothesis:


RESULT:

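The p-values for the interaction effect and for the shampoo variants are less than α = 0.05, so we reject
those null hypotheses. The p-value for the geographical region is greater than 0.05, so there is not
enough evidence that mean sales differ between the two regions.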
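A hedged sketch of the two-way ANOVA sums of squares for a balanced design with replication, written in plain Python rather than Minitab. Rows are the shampoo variants and columns the regions, as in the table above. The F values are compared with the α = 0.05 table values F(2, 24) ≈ 3.40 for the variants and the interaction, and F(1, 24) ≈ 4.26 for the region.

```python
def two_way_anova(cells):
    """Two-way ANOVA with replication for a balanced design.

    cells[i][j] holds the observations for level i of factor A (rows)
    and level j of factor B (columns); every cell must have the same size.
    Returns the F values for A, B and the A x B interaction and the error df.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([v for row in cells for cell in row for v in cell])
    row_means = [mean([v for cell in row for v in cell]) for row in cells]
    col_means = [mean([v for row in cells for v in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_a = b * r * sum((m - grand) ** 2 for m in row_means)
    ss_b = a * r * sum((m - grand) ** 2 for m in col_means)
    ss_cells = r * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_ab = ss_cells - ss_a - ss_b  # interaction sum of squares
    ss_within = sum((v - cell_means[i][j]) ** 2
                    for i, row in enumerate(cells)
                    for j, cell in enumerate(row)
                    for v in cell)

    df_a, df_b = a - 1, b - 1
    df_ab, df_within = df_a * df_b, a * b * (r - 1)
    ms_within = ss_within / df_within
    return {
        "A": ss_a / df_a / ms_within,
        "B": ss_b / df_b / ms_within,
        "AxB": ss_ab / df_ab / ms_within,
        "df_within": df_within,
    }

# Q.17: rows are shampoo variants (factor A), columns are regions (factor B)
sales = [
    [[56, 23, 52, 28, 35], [16, 14, 18, 27, 31]],  # Thick and Long
    [[43, 25, 16, 27, 32], [58, 62, 68, 72, 83]],  # Intense Repair
    [[47, 43, 52, 61, 74], [15, 14, 22, 16, 27]],  # Black Shine
]
result = two_way_anova(sales)
print({name: round(value, 2) for name, value in result.items()})
```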
