Inference
Adnan Butt
4:28 AM
Course Outline
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
Mode of Teaching
Lecture
SPSS Workshop
Discussion Session
Marks Distribution
Mid term: 25 Marks
Final: 40 Marks
Quizzes: 15 Marks
SPSS: 15 Marks
Class Participation: 5 Marks
Total: 100 Marks
Variable
A characteristic or
property that varies
from individual to
individual.
Constant
A characteristic or
property that does not
change from individual
to individual.
Types of Variables
Qualitative
Quantitative: Discrete, Continuous
Nominal Scale
Variable categories are mutually
exclusive and exhaustive.
Variable categories have no
logical order.
Eye Color, Hair Color, Gender.
Ordinal Scale
Data categories are mutually
exclusive and exhaustive.
Data classifications are ranked or ordered according to the
particular trait they possess.
Level of Knowledge about SPSS
Interval Scale
Data categories are mutually exclusive
and exhaustive.
Data classifications are ranked or ordered
according to the particular trait they
possess.
Equal differences in the characteristic are
represented by equal differences in
the measurements.
Temperature, Shoe Size and IQ scores
Ratio Scale
Data categories are mutually exclusive and
exhaustive.
Data classifications are ranked or ordered
according to the particular trait they possess.
Equal differences in the characteristic are
represented by equal differences in the
measurements.
The zero point is the absence of the
characteristic.
Height, Weight, Distance.
Measurement Scales
Nominal. Examples: Eye Color, Hair Color, Gender.
Ordinal. Data are ranked. Example: Level of Knowledge about SPSS.
Interval. Examples: Temperature, Shoe Size, IQ Scores.
Ratio. Meaningful zero point and ratio between values. Examples: Height, Weight, Distance.
Data
The information collected
for any kind of investigation.
Usually Numerical but can
be Qualitative.
Primary Data
The initial material collected
during the research process.
The information collected
directly from the respondent.
Personal Investigation, Through Investigator, Through Questionnaire,
Through Local Sources, Through Telephone.
Secondary Data
The information
collected and processed
by people other than
the researcher.
Government Organizations, Semi-Government
Organizations,
Data Collection
Any of the following methods may be
adopted:
(a) Personal interview
(b) Direct observation
(c) Mail interview (internet interview)
(d) Telephone interview
What are the cons and pros of each?
Data management
Office Editing,
Post Coding,
Data entry and Verification.
Mode
Arithmetic Mean
A value obtained by dividing the sum of all the observations by
their number.
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Arithmetic Mean
The marks obtained by 8 students are:
67 72 68 70 65 68 75 63
$$\bar{X} = \frac{67 + 72 + \cdots + 63}{8} = \frac{548}{8} = 68.5 \text{ Marks}$$
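The calculation for the 8 students can be checked with a short Python sketch. The function name `arithmetic_mean` is only an illustrative choice, not something from the slides.

```python
# Marks obtained by the 8 students (taken from the slide).
marks = [67, 72, 68, 70, 65, 68, 75, 63]


def arithmetic_mean(values):
    """Sum of all observations divided by their number."""
    return sum(values) / len(values)


total_marks = sum(marks)
mean_marks = arithmetic_mean(marks)
print(total_marks, mean_marks)
```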
Quantiles
For individual observations or a discrete frequency distribution:
$$Q_i = \frac{i(n+1)}{4}\text{th observation in the distribution}, \quad i = 1, 2, 3$$
$$D_j = \frac{j(n+1)}{10}\text{th observation in the distribution}, \quad j = 1, 2, \ldots, 9$$
$$P_k = \frac{k(n+1)}{100}\text{th observation in the distribution}, \quad k = 1, 2, \ldots, 99$$
Quartiles
The weekly TV Watching times (Hours):
25 41 27 32 43 66 35 31 15 5
34 26 32 38 16 30 38 30 20 21
Arranged in ascending order:
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 37 38 41 43 66
Quartiles
$$Q_1 = \frac{1(20+1)}{4}\text{th observation in the distribution}$$
= 5.25th observation in the distribution
= 5th obs. + 0.25{6th obs. - 5th obs.}
= 21 + 0.25{25 - 21} = 22.0 Hours
Quartiles
$$Q_2 = \frac{2(20+1)}{4}\text{th observation in the distribution}$$
= 10.50th observation in the distribution
= 10th obs. + 0.50{11th obs. - 10th obs.}
= 30 + 0.50{31 - 30} = 30.5 Hours
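The quartile steps above follow one pattern: locate the i(n+1)/parts-th position, then interpolate between the two neighbouring ordered observations. A minimal Python sketch of that pattern is given here. The function name `quantile` and its clamping at the ends of the data are assumptions made for illustration. Statistical packages often use other interpolation rules, so their answers can differ slightly.

```python
# Ordered weekly TV watching times (hours), as listed on the slide.
tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]


def quantile(sorted_values, i, parts):
    """i-th quantile out of `parts` using the (n + 1) position rule."""
    n = len(sorted_values)
    position = i * (n + 1) / parts   # 1-based position, may be fractional
    whole = int(position)            # observation just below the position
    fraction = position - whole      # how far to move towards the next one
    if whole < 1:                    # position before the first observation
        return float(sorted_values[0])
    if whole >= n:                   # position at or beyond the last one
        return float(sorted_values[-1])
    lower = sorted_values[whole - 1]
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)


q1 = quantile(tv_hours, 1, 4)
q2 = quantile(tv_hours, 2, 4)
q3 = quantile(tv_hours, 3, 4)
print(q1, q2, q3)
```

Deciles and percentiles use the same function with `parts=10` or `parts=100`.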
Mode
The mode is the value that occurs
most frequently in a set of data.
Mode
The total automobile sales (in millions) in
the United States for the last 14 years.
9.0 8.2 8.0 9.1 10.3 11.0 11.5
10.3 10.5 9.8 9.3 8.2 8.2 8.5
The value 8.2 occurs three times, more often than any other value, so the mode is 8.2 million.
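A small Python sketch of finding the mode by counting how often each value occurs. It uses the 14 sales figures from the slide. The variable names are illustrative, and the list comprehension returns every value tied for the highest count in case a data set has more than one mode.

```python
from collections import Counter

# Automobile sales (in millions) for the 14 years, from the slide.
sales = [9.0, 8.2, 8.0, 9.1, 10.3, 11.0, 11.5,
         10.3, 10.5, 9.8, 9.3, 8.2, 8.2, 8.5]

counts = Counter(sales)                 # value -> number of occurrences
highest_count = max(counts.values())
modes = [value for value, count in counts.items() if count == highest_count]
print(modes, highest_count)
```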
Absolute Measures of
Dispersion
Range
Quartile Deviation
Mean (Average) Deviation
Variance and Standard Deviation
Relative Measures of
Dispersion
Coefficient of Range
Coefficient of Quartile Deviation
Coefficient of Mean Deviation
Coefficient of Variation (CV)
Range
Difference between the largest
and the smallest observations
$$\text{Range} = X_{\text{Largest}} - X_{\text{Smallest}}$$
[Figure: two data sets plotted on the same axis, each with Range = 12 - 7 = 5]
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
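The effect of a single outlier on the range can be reproduced with the two data sets above. This is a minimal sketch, and `data_range` is an illustrative name.

```python
def data_range(values):
    """Difference between the largest and the smallest observation."""
    return max(values) - min(values)


# The two data sets from the slide: identical except for the last value.
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

range_without = data_range(without_outlier)
range_with = data_range(with_outlier)
print(range_without, range_with)
```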
Inter-quartile Range
[Figure: ordered data X divided into four parts of 25% each: minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, maximum = 70]
$X - \bar{X}$: 3, 0, -3 (sum 0)
$|X - \bar{X}|$: 3, 0, 3 (sum 6)
$$MD = \frac{\sum |x - \bar{x}|}{n} = \frac{6}{3} = 2$$
Variance
Variance is the average
of the squared
deviations taken from
the mean value.
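(i) $$S^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{102}{6} = 17 \text{ cm}^2$$
(ii) $$S^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{702}{6} - \left(\frac{60}{6}\right)^2 = 17 \text{ cm}^2$$
X cm     (X-Mean)^2     X^2
4        36             16
6        16             36
9        1              81
12       4              144
13       9              169
16       36             256
Total 60 102            702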
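Both variance formulas can be checked numerically. The sketch below assumes the six X values 4, 6, 9, 12, 13 and 16 cm, which are the values implied by the squares column and by the totals 60, 102 and 702. The variable names are illustrative.

```python
# X values in cm, assumed from the X^2 column (16, 36, 81, 144, 169, 256).
lengths_cm = [4, 6, 9, 12, 13, 16]

n = len(lengths_cm)
mean = sum(lengths_cm) / n

# (i) Definition: average of the squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in lengths_cm)
variance_definition = sum_sq_dev / n

# (ii) Shortcut: mean of the squares minus the square of the mean.
sum_sq = sum(x * x for x in lengths_cm)
variance_shortcut = sum_sq / n - (sum(lengths_cm) / n) ** 2

print(variance_definition, variance_shortcut)
```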
Standard Deviation
Standard deviation is the positive square root of
the mean-square deviations of the observations
from their arithmetic mean.
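$$SD = \sqrt{\text{variance}}$$
Population: $$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}}$$
Sample: $$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}}$$
For a frequency distribution: $$S = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}$$
Where $$\bar{x} = \frac{\sum f_i x_i}{N}, \quad N = \sum f_i$$
Simplified formula
$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$$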
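Family No.   Size (xi)   xi - x̄   (xi - x̄)^2   xi^2
1            3           -2        4             9
2            3           -2        4             9
3            4           -1        1             16
4            4           -1        1             16
5            5           0         0             25
6            5           0         0             25
7            6           1         1             36
8            6           1         1             36
9            7           2         4             49
10           7           2         4             49
Total        50          0         20            270
Here, $$\bar{x} = \frac{\sum x_i}{n} = \frac{50}{10} = 5$$
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{20}{10} = 2$$
$$s = \sqrt{2} = 1.41$$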
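A short Python sketch of the family-size example. The ten sizes 3, 3, 4, 4, 5, 5, 6, 6, 7, 7 are an assumption reconstructed from the surviving deviations, squares and totals (sum 50, sum of squares 270). The divisor is n, as in the worked example, not n - 1.

```python
import math

# Family sizes, assumed from the totals and squared values on the slide.
family_sizes = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]

n = len(family_sizes)
mean_size = sum(family_sizes) / n
sum_sq_dev = sum((x - mean_size) ** 2 for x in family_sizes)

variance = sum_sq_dev / n            # divides by n, as on the slide
std_dev = math.sqrt(variance)        # positive square root of the variance

print(mean_size, variance, round(std_dev, 2))
```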
[Figure: three data sets plotted on a common axis from 11 to 21, all with Mean = 15.5. Data A: S = 3.338. Data B: S = 0.926. Data C: S = 4.567.]
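$$\text{Coefficient of Range} = \frac{X_{\text{Largest}} - X_{\text{Smallest}}}{X_{\text{Largest}} + X_{\text{Smallest}}}$$
$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
$$\text{Coefficient of Mean Deviation} = \frac{MD}{\text{Mean}}$$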
$$CV = \frac{S}{\bar{X}} \times 100\%$$
Can be used to compare two or more
sets of data measured in different
units, or in the same units but with a different
average size.
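Stock A:
Average price last year = $50
Standard deviation = $5
$$CV_A = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\%$$
Stock B:
Average price last year = $100
Standard deviation = $5
$$CV_B = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\%$$
Both stocks have the same standard deviation,
but stock B is less variable relative to its price.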
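The stock comparison can be sketched in a few lines of Python. The function name `coefficient_of_variation` is illustrative. The inputs are the standard deviations and average prices quoted in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(5, 50)    # Stock A: S = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: S = $5, mean = $100
print(cv_a, cv_b)
```

Because CV is unit-less, the same function can compare data sets measured in different units.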
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 37 38 41 43 66
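LOCATION of Q1: 1(20 + 1)/4 = 5.25th observation
VALUE of Q1: 5th obs. + 0.25{6th obs. - 5th obs.} = 21 + 0.25{25 - 21} = 22.0 Hrs
LOCATION of Q2: 2(20 + 1)/4 = 10.50th observation
VALUE of Q2: 10th obs. + 0.50{11th obs. - 10th obs.} = 30 + 0.50{31 - 30} = 30.5 Hrs
LOCATION of Q3: 3(20 + 1)/4 = 15.75th observation
VALUE of Q3: 15th obs. + 0.75{16th obs. - 15th obs.} = 35 + 0.75{37 - 35} = 36.5 Hrs
Minimum value = 5.0
Maximum value = 66.0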
Construction of Box-Plot
1. Start the box from Q1 and end at Q3.
2. Within the box draw a line to represent Q2.
3. Draw the lower whisker from the Min. Value up to Q1.
4. Draw the upper whisker from Q3 up to the Max. Value.
[Figure: vertical box plot labelled, from top to bottom, Max Value, Q3, Q2, Q1, Min Value]
Construction of Box-Plot
Q1=22.0, Q2=30.5, Q3=36.5
Minimum Value=5.0
Maximum Value=66.0
[Figure: box plot of these values on a vertical axis from 0 to 70]
Interpretation of Box-Plot
IQR = Q3 - Q1.
A longer box indicates more variability in the data.
Outliers
An outlier is a value that falls well outside the overall
pattern of the data. It might be
Q1=22.0
Q2=30.5
Q3=36.5
[Figure: box plot on a vertical axis from 10 to 80, with the value 66 marked by an asterisk]
Only 66 is a mild outlier.
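The slides do not show the rule used to label 66 as a mild outlier. The sketch below assumes the common fence convention: values more than 1.5 × IQR beyond the quartiles are mild outliers, and values more than 3 × IQR beyond them are extreme outliers. Under that assumption it gives the same conclusion for the TV watching data.

```python
# Ordered weekly TV watching times (hours) and the quartiles found earlier.
tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]
q1, q3 = 22.0, 36.5
iqr = q3 - q1

# Assumed convention: inner fences at 1.5 * IQR, outer fences at 3 * IQR.
inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

mild_outliers = [x for x in tv_hours
                 if outer_low <= x < inner_low or inner_high < x <= outer_high]
extreme_outliers = [x for x in tv_hours if x < outer_low or x > outer_high]

print(iqr, (inner_low, inner_high), (outer_low, outer_high))
print(mild_outliers, extreme_outliers)
```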
[Figure: comparison of Male and Female groups]
Standardized Variable
A variable that has mean 0 and variance 1 is
called a standardized variable.
Values of a standardized variable are called
standard scores.
Standard scores are unit-less.
Construction: $$Z = \frac{X - \bar{X}}{S}$$
Standardized Variable
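X        (X - X̄)^2    Z          (Z - Z̄)^2
3        25            -1.3624    1.8561
6        4             -0.5450    0.2970
11       9             0.81741    0.6682
12       16            1.0899     1.1879
Total 32 54                       4.009
$$\bar{X} = \frac{\sum X}{n} = \frac{32}{4} = 8$$
$$S_x^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{54}{4} = 13.5, \quad S_x = 3.67$$
$$Z = \frac{X - \bar{X}}{S_x} = \frac{X - 8}{3.67}$$
$$S_z^2 = \frac{\sum (Z - \bar{Z})^2}{n} = \frac{4.009}{4} \approx 1$$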
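A Python sketch of the standardization example. The four X values 3, 6, 11 and 12 are an assumption recovered from the totals (sum 32, sum of squared deviations 54) and the Z column. The slide rounds the standard deviation to 3.67, so its figures differ slightly from the unrounded values computed here.

```python
import math

# X values assumed from the totals and standard scores on the slide.
x_values = [3, 6, 11, 12]

n = len(x_values)
mean_x = sum(x_values) / n
var_x = sum((x - mean_x) ** 2 for x in x_values) / n   # divides by n
sd_x = math.sqrt(var_x)

# Standard scores: subtract the mean, divide by the standard deviation.
z_scores = [(x - mean_x) / sd_x for x in x_values]

mean_z = sum(z_scores) / n
var_z = sum((z - mean_z) ** 2 for z in z_scores) / n

print([round(z, 4) for z in z_scores])
print(round(mean_z, 4), round(var_z, 4))
```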
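Means: $\bar{X}_B$ = $2,500, $\bar{X}_P$ = $4,800
Standard deviations: $S_B$ = $500, $S_P$ = $600
Values: $X_B$ = $4,000, $X_P$ = $6,000
$$Z_B = \frac{X_B - \bar{X}_B}{S_B} = \frac{4{,}000 - 2{,}500}{500} = 3$$
$$Z_P = \frac{X_P - \bar{X}_P}{S_P} = \frac{6{,}000 - 4{,}800}{600} = 2$$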
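For a bell-shaped distribution:
$\bar{X} \pm 1S$ contains about 68% of values
$\bar{X} \pm 2S$ contains about 95% of values
$\bar{X} \pm 3S$ contains about 99.7% of values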
Measures of Skewness
A distribution in which the values equidistant from
the centre have equal frequencies is defined to be
symmetrical and any departure from symmetry is
called skewness.
Tail
2. Mean = Median = Mode
3. Sk=0
a) Sk=(Mean-Mode)/SD
b) Sk=(Q3-2Q2+Q1)/(Q3-Q1)
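The two coefficients can be written as small Python functions. The names `pearson_skewness` and `bowley_skewness` are the usual textbook names for formulas (a) and (b), not names taken from the slides. As a usage example, formula (b) is applied to the quartiles of the TV watching data (Q1 = 22.0, Q2 = 30.5, Q3 = 36.5).

```python
def pearson_skewness(mean, mode, std_dev):
    """Formula (a): Sk = (Mean - Mode) / SD."""
    return (mean - mode) / std_dev


def bowley_skewness(q1, q2, q3):
    """Formula (b): Sk = (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


# Quartiles of the weekly TV watching times found earlier.
sk_tv = bowley_skewness(22.0, 30.5, 36.5)
print(round(sk_tv, 4))

# A symmetrical distribution has Mean = Mode, so formula (a) gives 0.
sk_symmetric = pearson_skewness(10.0, 10.0, 2.0)
print(sk_symmetric)
```

A negative value from formula (b) means the median lies closer to Q3 than to Q1.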
Measures of Skewness
A distribution is positively skewed, if the observations
tend to concentrate more at the lower end of the possible
values of the variable than the upper end. A positively
skewed frequency curve has a longer tail on the right
hand side
Measures of Skewness
A distribution is negatively skewed, if the
observations tend to concentrate more at the upper
end of the possible values of the variable than the
Measures of Kurtosis
$$\text{Coefficient of Kurtosis} = \frac{n \sum (X - \bar{X})^4}{\left[\sum (X - \bar{X})^2\right]^2}$$
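A minimal Python sketch of the coefficient, assuming the moment-based form n Σ(X - X̄)^4 / [Σ(X - X̄)^2]^2. The data set 1, 2, 3, 4, 5 is a made-up illustration and does not come from the slides.

```python
def kurtosis_coefficient(values):
    """n * sum of fourth-power deviations / (sum of squared deviations)^2."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    sum_fourth = sum((x - mean) ** 4 for x in values)
    return n * sum_fourth / sum_sq ** 2


# Hypothetical data: deviations are -2, -1, 0, 1, 2.
example = [1, 2, 3, 4, 5]
kurt = kurtosis_coefficient(example)
print(kurt)
```

For this example the sum of squared deviations is 10 and the sum of fourth-power deviations is 34, so the coefficient is 5 × 34 / 100 = 1.7.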