RE-TEST
INDEX
QUESTION 1
Analysis
Step 1
Step 2
Step 3: KMO and Bartlett's Test
Step 4: Communalities
Step 5: To determine number of components
Step 6: To determine variables in each component
Step 7: To check the reliability of each component
Step 8: Remove component 3 and component 4 as they are not significant
Step 9
Step 10: Check reliability of each component
QUESTION 2
ANALYSIS
Block 1: Method = Backward Stepwise (Likelihood Ratio)
The Regression Equation
QUESTION 1:
A music firm has collected data on music preferences of 1500 respondents. The data are in
the file music.sav. The music preferences cover a broad range of categories (look at the
variables in the file). The music firm wants a broad segmentation of the music preferences.
Do a factor analysis to ascertain the factor structure of the data. What are the segments
you can identify? Write down the table of factor loadings.
Analysis:
Step 1:
Open the music.sav file in SPSS. The file contains respondents' music preferences for
Big Band Music, Bluegrass Music, Country Western Music, Blues or R & B Music, Jazz Music,
Broadway Musicals, Classical Music, Folk Music, Opera, Rap Music and Heavy Metal Music.
A 9-point Likert scale is used for the survey.
Step 2:
Step 4: Communalities
The values in this Communalities column indicate the proportion of each variable's variance
that can be explained by the retained factors. Variables with high values are well
represented in the common factor space, while variables with low values are not well
represented.
Minimum Criteria: The extraction value for each variable > 0.5
Here, all variables are well represented, since the extraction value for each variable is > 0.5
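As a minimal sketch of where an extraction value comes from: with orthogonal components, a variable's communality is the sum of its squared loadings on the retained components. The loadings below are hypothetical illustration values, not taken from the music.sav output, and the cutoff assumes the minimum criterion reads "greater than 0.5".

```python
# Hypothetical loadings of one variable on four retained components
# (illustrative values only; NOT taken from the music.sav output).
hypothetical_loadings = [0.70, 0.20, 0.10, 0.05]


def communality(loadings):
    """Sum of squared loadings across the retained (orthogonal) components."""
    return sum(value * value for value in loadings)


def is_well_represented(loadings, cutoff=0.5):
    """Assumed minimum criterion: extraction value greater than the cutoff."""
    return communality(loadings) > cutoff


h2 = communality(hypothetical_loadings)
print(round(h2, 4), is_well_represented(hypothetical_loadings))
```

A variable whose squared loadings sum to less than the cutoff would be flagged as poorly represented by the retained components.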
The initial number of components is the same as the number of variables used in the factor
analysis. However, not all 11 components will be retained.
Here, each variable has a variance of 1 and the first component will account for variance of
3.276 followed by component 2, which will account for variance of 1.661 and so on.
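The share of total variance behind these figures can be sketched as follows. Only the two eigenvalues quoted above are used; the remaining eigenvalues are not given in this report, so the sketch stops at component 2.

```python
# Eigenvalues quoted in the report for the first two components.
eigenvalues = {1: 3.276, 2: 1.661}
n_variables = 11  # each standardized variable contributes a variance of 1


def percent_of_variance(eigenvalue, total_variance):
    """Percentage of the total variance accounted for by one component."""
    return 100.0 * eigenvalue / total_variance


cumulative = 0.0
for component, value in eigenvalues.items():
    share = percent_of_variance(value, n_variables)
    cumulative += share
    print(f"Component {component}: {share:.2f}% (cumulative {cumulative:.2f}%)")
```

On these numbers the first two components together account for roughly 45% of the variance, which is why more than two components are needed to pass a 60% cumulative threshold.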
Criteria for the components:
Eigenvalue > 1
Cumulative % > 60%
Scree Plot: find the point where the graph becomes almost flat, i.e. parallel to the x-axis
Based on criteria,
Number of components = 4
Here, we will consider the rotated component matrix, because in this case Varimax Rotation
tries to maximize the variance of each of the factors, so the total amount of variance
accounted for is redistributed over the four extracted factors.
Component 1: Classical Music, Opera, Broadway Musicals, Folk Music, Big Band Music
Component 2: Blues or R & B Music, Jazz Music
Component 3: Country Western Music, Bluegrass Music
Component 4: Heavy Metal Music, Rap Music
In Communalities,
Minimum Criteria: The extraction value for each variable > 0.5
Here, for Folk Music, extraction value = 0.462 => not significant
For Bigband Music, extraction value = 0.491 => not significant
So remove the variables Folk Music and Bigband Music and run the factor analysis again.
Step 9:
QUESTION 2:
Banks are concerned about people who default on loans that they have taken. In order to
identify potential defaulters, a bank has collected data on 850 people who have taken loans
in the past. The file bankloan.sav contains information about the loans and also information
about which of these people defaulted on their loans (variable: default. default=0 if not
previously defaulted). To help the bank identify potential defaulters, build a model to
predict defaulters using the other variables in the file as possible predictors. Write a report
of your analysis.
Here is the profile of a potential loan taker:
Age: 30, level of education: 4, years employed: 6, years at current address: 3, income: 100,
debt-to-income ratio: 10, credit card debt: 10, other debt: 10. What is the probability that
this person will default on his loan?
ANALYSIS:
Step 1: Open the file bankloan.sav. The file contains data of 850 people. The dependent
variable is categorical and denotes whether a person has defaulted on a bank loan or not. The
remaining variables are independent variables (predictors).
Step 2:
Analyze > Regression > Binary Logistic
Method: Backward LR
Variables not in the Equation (Step 0)
Variable               Score      Sig.
age                    13.265     .000
ed                     9.205      .002
years_employ           56.054     .000
years_stay             18.931     .000
income                 3.526      .060
debtinc                106.238    .000
creddebt               41.928     .000
othdebt                14.863     .000
Overall Statistics     201.873    .000
Classification Table (Step 0)
                                 Predicted: Previously defaulted
Observed                         No       Yes      Percentage Correct
Previously defaulted    No       517      0        100.0
                        Yes      183      0        .0
Overall Percentage                                 73.9
Omnibus Tests of Model Coefficients
                     Chi-square    Sig.
Step 1     Step      252.695       .000
           Block     252.695       .000
           Model     252.695       .000
Step 2a    Step      -.539         .463
           Block     252.156       .000
           Model     252.156       .000
Step 3     Step      -.810         .368
           Block     251.347       .000
           Model     251.347       .000
Step 4     Step      -.159         .690
           Block     251.188       .000
           Model     251.188       .000
a. A negative Chi-squares value indicates that the Chi-squares value has decreased from the previous step.
Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       551.669              .303                    .444
2       552.208              .302                    .443
3       553.017              .302                    .442
4       553.176              .302                    .441
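As a quick cross-check, the step chi-squares in the omnibus test can be recovered from the -2 Log likelihood values in the Model Summary: each backward step raises -2LL by the amount reported (with a negative sign) as that step's chi-square. A minimal sketch using only the values printed in this report:

```python
# -2 Log likelihood values from the Model Summary, keyed by step.
minus_2ll = {1: 551.669, 2: 552.208, 3: 553.017, 4: 553.176}


def step_changes(values):
    """Increase in -2LL from each step to the next, keyed by the later step."""
    steps = sorted(values)
    return {
        later: round(values[later] - values[earlier], 3)
        for earlier, later in zip(steps, steps[1:])
    }


changes = step_changes(minus_2ll)
for step, change in changes.items():
    print(f"Step {step}: -2LL increased by {change}")
```

The results (about 0.539, 0.809 and 0.159) agree with the step chi-squares of -.539, -.810 and -.159 up to rounding of the printed values. Changes this small indicate that dropping a variable at each step costs very little model fit.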
Classification Table (Steps 1 to 4)
                                           Predicted: Previously defaulted
          Observed                         No       Yes      Percentage Correct
Step 1    Previously defaulted    No       472      45       91.3
                                  Yes      90       93       50.8
          Overall Percentage                                 80.7
Step 2    Previously defaulted    No       476      41       92.1
                                  Yes      89       94       51.4
          Overall Percentage                                 81.4
Step 3    Previously defaulted    No       475      42       91.9
                                  Yes      89       94       51.4
          Overall Percentage                                 81.3
Step 4    Previously defaulted    No       476      41       92.1
                                  Yes      89       94       51.4
          Overall Percentage                                 81.4
At the start (Step 0), the overall percentage in the classification table was only 73.9%, i.e. 73.9% of the
cases are classified correctly.
After 4 steps of the Backward LR method, three of the variables are removed and the overall percentage rises
to 81.4%, i.e. 81.4% of the cases are now classified correctly.
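The overall percentages can be reproduced from the classification counts. The sketch below uses the Step 0 and Step 4 counts from the classification tables; the zero predicted-Yes counts at Step 0 are an inference from the 100.0 and .0 row percentages, not values printed in the output.

```python
def overall_percentage(no_as_no, no_as_yes, yes_as_no, yes_as_yes):
    """Percentage of cases whose predicted class matches the observed class."""
    correct = no_as_no + yes_as_yes
    total = no_as_no + no_as_yes + yes_as_no + yes_as_yes
    return 100.0 * correct / total


# Step 0: every case is predicted as "No" (assumed zeros, see lead-in).
step0 = overall_percentage(517, 0, 183, 0)
# Step 4: counts from the final classification table.
step4 = overall_percentage(476, 41, 89, 94)

print(f"Step 0: {step0:.1f}%  Step 4: {step4:.1f}%")
```

Note that the gain comes mostly from the defaulters: only 94 of the 183 observed defaulters (51.4%) are identified at Step 4, so the model is much better at recognizing non-defaulters than defaulters.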
Variables in the Equation
                          B         S.E.     Wald      Sig.     Exp(B)
Step 1a   age             .034      .017     3.924     .048     1.035
          ed              .091      .123     .542      .462     1.095
          years_employ    -.258     .033     60.645    .000     .772
          years_stay      -.105     .023     20.442    .000     .900
          income          -.009     .008     1.159     .282     .991
          debtinc         .067      .031     4.863     .027     1.070
          creddebt        .626      .113     30.742    .000     1.869
          othdebt         .063      .077     .655      .418     1.065
          Constant        -1.554    .619     6.294     .012     .211
Step 2a   age             .034      .017     3.776     .052     1.034
          years_employ    -.265     .032     68.612    .000     .767
          years_stay      -.104     .023     20.094    .000     .901
          income          -.008     .008     .864      .352     .992
          debtinc         .065      .031     4.541     .033     1.067
          creddebt        .628      .114     30.512    .000     1.874
          othdebt         .070      .078     .818      .366     1.073
          Constant        -1.378    .572     5.810     .016     .252
Step 3a   age             .034      .017     3.740     .053     1.034
          years_employ    -.258     .031     70.200    .000     .773
          years_stay      -.103     .023     19.857    .000     .902
          income          -.003     .006     .160      .689     .997
          debtinc         .086      .020     18.433    .000     1.090
          creddebt        .595      .105     32.207    .000     1.814
          Constant        -1.591    .522     9.281     .002     .204
Step 4a   age             .033      .017     3.594     .058     1.033
          years_employ    -.261     .030     75.023    .000     .770
          years_stay      -.104     .023     20.157    .000     .902
          debtinc         .089      .019     23.162    .000     1.093
          creddebt        .573      .087     43.101    .000     1.773
          Constant        -1.631    .513     10.124    .001     .196
a. Variable(s) entered on step 1: age, ed, years_employ, years_stay, income, debtinc, creddebt,
othdebt.
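The Exp(B) column is the exponential of B and can be read as an odds ratio: the factor by which the odds of default are multiplied for a one-unit increase in the predictor. A minimal sketch using the Step 4 B values as printed (rounded to three decimals, so the results can differ from the printed Exp(B) in the last digit):

```python
import math

# Step 4 coefficients (B) as printed in the Variables in the Equation table.
step4_b = {
    "age": 0.033,
    "years_employ": -0.261,
    "years_stay": -0.104,
    "debtinc": 0.089,
    "creddebt": 0.573,
}

# Exp(B): multiplicative change in the odds per one-unit increase.
odds_ratios = {name: math.exp(b) for name, b in step4_b.items()}

for name, ratio in odds_ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: Exp(B) = {ratio:.3f} ({direction} the odds of default)")
```

On these values, each additional unit of credit card debt multiplies the odds of default by about 1.77, while each additional year with the current employer multiplies them by about 0.77.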
Model if Term Removed
                          Model Log     Change in -2       Sig. of the
          Variable        Likelihood    Log Likelihood     Change
Step 1    age             -277.777      3.885              .049
          ed              -276.104      .539               .463
          years_employ    -318.127      84.585             .000
          years_stay      -286.880      22.092             .000
          income          -276.367      1.065              .302
          debtinc         -278.274      4.878              .027
          creddebt        -298.555      45.441             .000
          othdebt         -276.161      .652               .419
Step 2    age             -277.973      3.738              .053
          years_employ    -326.097      99.987             .000
          years_stay      -286.961      21.714             .000
          income          -276.510      .813               .367
          debtinc         -278.398      4.589              .032
          creddebt        -298.687      45.167             .000
          othdebt         -276.509      .810               .368
Step 3    age             -278.360      3.702              .054
          years_employ    -326.509      100.001            .000
          years_stay      -287.231      21.445             .000
          income          -276.588      .159               .690
          debtinc         -285.939      18.861             .000
          creddebt        -298.966      44.915             .000
Step 4    age             -278.366      3.556              .059
          years_employ    -330.797      108.418            .000
          years_stay      -287.499      21.822             .000
          debtinc         -288.488      23.799             .000
          creddebt        -309.280      65.385             .000
Variables not in the Equation (Steps 2 to 4)
                                      Score     Sig.
Step 2    Variables    ed             .542      .461
          Overall Statistics          .542      .461
Step 3    Variables    ed             .702      .402
                       othdebt        .820      .365
          Overall Statistics          1.389     .499
Step 4    Variables    ed             .402      .526
                       income         .159      .690
                       othdebt        .157      .692
          Overall Statistics          1.588     .662
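Thus we see that level of education (ed), household income in thousands (income) and other debt in thousands
(othdebt) are removed from the equation because their sig. values are greater than 0.05 (.462, .689 and .366
respectively, at the step before each was removed).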
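The probability asked for in the question can be sketched from the Step 4 model. This is an illustration under stated assumptions, not the report's own regression equation: it uses the rounded Step 4 coefficients printed above, assumes that default is coded 1 for defaulters, and assumes the profile fields map to the columns age, years_employ, years_stay, debtinc and creddebt. Education, income and other debt are ignored because they are not in the Step 4 model.

```python
import math

# Step 4 constant and coefficients as printed in Variables in the Equation.
CONSTANT = -1.631
COEFFICIENTS = {
    "age": 0.033,
    "years_employ": -0.261,
    "years_stay": -0.104,
    "debtinc": 0.089,
    "creddebt": 0.573,
}

# Profile from the question (assumed mapping of fields to column names).
profile = {"age": 30, "years_employ": 6, "years_stay": 3, "debtinc": 10, "creddebt": 10}


def default_probability(values):
    """Logistic model: p = 1 / (1 + exp(-(constant + sum of B * x)))."""
    logit = CONSTANT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-logit))


p = default_probability(profile)
print(f"Estimated probability of default: {p:.3f}")
```

With these inputs the logit is about 4.10, giving an estimated probability of default of roughly 0.98. The high value is driven almost entirely by the credit card debt term (0.573 x 10).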