Business Analytics Assignment-Sajitha

BUSINESS ANALYTICS ASSIGNMENT
RE-TEST
Submitted by: Sajitha Nair

Roll no: 2014A53
INDEX
QUESTION 1:
Analysis:
Step 1:
Step 2:
Step 3: KMO and Bartlett's Test
Step 4: Communalities
Step 5: To determine number of components
Step 6: To determine variables in each component
Step 7: To check the reliability of each component
Step 8: Remove component 3 and component 4 as they are not significant
Step 9:
Step 10: Check Reliability of each component
QUESTION 2:
ANALYSIS:
Block 1: Method = Backward Stepwise (Likelihood Ratio)
The Regression Equation
QUESTION 1:
A music firm has collected data on music preferences of 1500 respondents. The data are in
the file music.sav. The music preferences cover a broad range of categories (look at the
variables in the file). The music firm wants a broad segmentation of the music preferences.
Do a factor analysis to ascertain the factor structure of the data. What are the segments
you can identify? Write down the table of factor loadings.
Analysis:
Step 1:
Open the music.sav file in SPSS. The file contains the respondents' music preferences for
Big Band Music, Bluegrass Music, Country Western Music, Blues or R & B Music, Broadway
Musicals, Classical Music, Folk Music, Jazz Music, Opera, Rap Music and Heavy Metal
Music. A 9-point Likert scale is used for the survey.
Step 2:
Analyze > Dimension Reduction > Factor

Select all the music variables into the Variables list.
Under Descriptives, select:
  Coefficients
  Anti-image
  KMO and Bartlett's test of sphericity
Under Extraction, select:
  Scree plot
Under Rotation, select:
  Varimax
Under Scores, select:
  Save as variables
Under Options, select:
  Sorted by size
Run the analysis.
Step 3: KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: this measure varies between 0 and 1,
and values closer to 1 are better. A value of 0.5 is a suggested minimum.
Here, KMO = 0.748 > 0.5, so the sampling is adequate.
Bartlett's Test of Sphericity
The null hypothesis states that the correlation matrix is an identity matrix.
Here, Sig. = .000 < 0.05
We reject the null hypothesis, so the variables are correlated with one another.
Taken together, the KMO measure and Bartlett's test indicate that the data are suitable for
factor analysis.
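For reference, Bartlett's statistic can be reproduced by hand from the determinant of the correlation matrix. The sketch below uses the standard formula chi-square = -[(n - 1) - (2p + 5)/6] * ln|R| with p(p - 1)/2 degrees of freedom. The function name is illustrative, and the determinant passed in is a placeholder, because the value of |R| for music.sav is not reported in this write-up.

```python
import math

def bartlett_sphericity(n_cases, n_vars, det_r):
    """Return (chi_square, df) for Bartlett's test of sphericity.

    det_r is the determinant of the correlation matrix (0 < det_r <= 1).
    """
    if not 0.0 < det_r <= 1.0:
        raise ValueError("determinant of a correlation matrix must be in (0, 1]")
    chi_square = -((n_cases - 1) - (2 * n_vars + 5) / 6.0) * math.log(det_r)
    df = n_vars * (n_vars - 1) // 2
    return chi_square, df

# 1500 respondents and 11 music variables, as in music.sav.
# det_r = 1.0 is the identity-matrix case (the null hypothesis): chi-square is 0.
chi_sq_null, df = bartlett_sphericity(1500, 11, 1.0)
print(df)  # 55
```

The smaller the determinant (the stronger the correlations), the larger the chi-square and the smaller the Sig. value.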
Step 4: Communalities
The values in this Communalities column indicate the proportion of each variable's variance
that can be explained by the retained factors. Variables with high values are well
represented in the common factor space, while variables with low values are not well
represented.
Minimum criterion: the extraction value for each variable should be ≥ 0.5
Here, all variables are adequately represented, since the extraction value for each variable is ≥ 0.5
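To illustrate what the extraction value measures: a variable's communality is the sum of its squared loadings on the retained components. The loadings in this sketch are made-up numbers for illustration only, not values from the music.sav output.

```python
def communality(loadings):
    """Sum of squared loadings of one variable across the retained components."""
    return sum(l * l for l in loadings)

# Hypothetical loadings of one variable on four retained components.
example_loadings = [0.70, 0.20, 0.10, 0.05]
h2 = communality(example_loadings)
print(round(h2, 4))                       # 0.5425
print("keep" if h2 >= 0.5 else "review")  # keep
```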
Step 5: To determine number of components
The initial number of components is the same as the number of variables used in the factor
analysis. However, not all 11 components will be retained.
Because each standardized variable has a variance of 1, the total variance is 11. The first
component accounts for a variance (eigenvalue) of 3.276, followed by component 2, which
accounts for a variance of 1.661, and so on.
Criteria for retaining components:
Eigenvalue > 1
Cumulative % of variance > 60%
Scree plot: find the point where the graph becomes almost flat, i.e. parallel to the x-axis
Based on these criteria,
Number of components = 4
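As a small worked check of the variance figures, each eigenvalue can be converted into a percentage of the total variance (11 units, one per variable). Only the two eigenvalues quoted above are used; the helper name is an assumption made for this sketch.

```python
N_VARIABLES = 11  # one unit of variance per standardized variable

def percent_of_variance(eigenvalue, n_variables=N_VARIABLES):
    """Share of total variance (in percent) carried by one component."""
    return 100.0 * eigenvalue / n_variables

# Only the first two eigenvalues are quoted in the text.
quoted_eigenvalues = [3.276, 1.661]
shares = [percent_of_variance(e) for e in quoted_eigenvalues]
cumulative = sum(shares)

print([round(s, 2) for s in shares])  # [29.78, 15.1]
print(round(cumulative, 2))           # 44.88
```

The first two components reach only about 45% cumulatively, which is consistent with needing further components to pass the 60% mark.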
Step 6: To determine variables in each component
Here, we will consider the rotated component matrix, because in this case Varimax rotation
tries to maximize the variance of each of the factors, so the total amount of variance
accounted for is redistributed over the four extracted factors.
Component 1: Classical Music, Opera, Broadway Musicals, Folk Music, Big Band Music
Component 2: Blues or R & B Music, Jazz Music
Component 3: Country Western Music, Bluegrass Music
Component 4: Heavy Metal Music, Rap Music
Step 7: To check the reliability of each component

Analyze > Scale > Reliability Analysis
In Statistics, select:
  Scale if item deleted
Criteria:
Cronbach's alpha is a measure of internal consistency, that is, how closely related a set of
items are as a group.
Minimum criterion: Cronbach's alpha ≥ 0.7
Also check the column "Cronbach's Alpha if Item Deleted" in the Item-Total Statistics table.
If the value in the table for any variable is greater than the Cronbach's alpha for that
component, then delete that variable and run the whole analysis again.
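The reliability criterion can be made concrete with a minimal sketch of the usual Cronbach's alpha formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score). The ratings below are hypothetical and only show the calculation; they are not taken from music.sav, and SPSS remains the tool used for the reported values.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item score lists (one list per item).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical ratings from five respondents on three items (9-point scale).
ratings = [
    [7, 8, 3, 5, 9],
    [6, 8, 2, 5, 9],
    [7, 9, 3, 4, 8],
]
alpha = cronbach_alpha(ratings)
print(alpha >= 0.7)  # True
```

Recomputing alpha with one item left out at a time gives the "if item deleted" column used in the check above.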
Step 7a: Select Component 1 variables
Here, Cronbach's alpha = 0.795 > 0.7
The "Cronbach's Alpha if Item Deleted" value is less than 0.795 for all variables.
So Component 1 is reliable.
Step 7b: Select Component 2 variables
Here, Cronbach's alpha = 0.714 > 0.7
So Component 2 is reliable.
Step 7c: Select Component 3 variables
Here, Cronbach's alpha = 0.580 < 0.7
So Component 3 is not reliable.
Step 7d: Select Component 4 variables
Here, Cronbach's alpha = 0.529 < 0.7
So Component 4 is not reliable.
Step 8: Remove component 3 and component 4 as they are not significant

Remove the variables of components 3 and 4, as these components are not reliable, and run
the factor analysis again.
Now,
KMO = 0.781 > 0.5
In Communalities,
Minimum criterion: the extraction value for each variable should be ≥ 0.5
Here, for Folk Music, extraction value = 0.462 => not adequate
For Big Band Music, extraction value = 0.491 => not adequate
So remove the variables Folk Music and Big Band Music and run the factor analysis again.
Step 9:
After removing Folk Music and Big Band Music:
KMO = 0.695 > 0.5
Here, extraction values for all variables > 0.5
Total components to be considered = 2
Component 1: Classical Music, Opera, Broadway Musicals
Component 2: Blues or R & B Music, Jazz Music

Step 10: Check Reliability of each component

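Step 10 a: Reliability Analysis of Component 1
Here, Cronbach's alpha = 0.761 > 0.7
Step 10 b: Reliability Analysis of Component 2
Here, Cronbach's alpha = 0.714 > 0.7
Therefore, two reliable segments are identified:
Component 1: Classical Music, Opera, Broadway Musicals
Component 2: Blues or R & B Music, Jazz Music
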
QUESTION 2:
Banks are concerned about people who default on loans that they have taken. In order to
identify potential defaulters, a bank has collected data on 850 people who have taken loans
in the past. The file bankloan.sav contains information about the loans and also information
about which of these people defaulted on their loans (variable: default. default=0 if not
previously defaulted). To help the bank identify potential defaulters, build a model to
predict defaulters using the other variables in the file as possible predictors. Write a report
of your analysis.
Here is the profile of a potential loan taker:
Age: 30, level of education: 4, years employed: 6, years at current address: 3, income: 100,
debt-to-income ratio: 10, credit card debt: 10, other debt: 10. What is the probability that
this person will default on his loan?
ANALYSIS:
Step 1: Open the file bankloan.sav. The file contains data on 850 people. The dependent
variable is categorical and denotes whether a person has defaulted on a bank loan or not. The
remaining variables are the independent variables (predictors).
Step 2:
Analyze > Regression > Binary Logistic
Method: Backward LR
Block 1: Method = Backward Stepwise (Likelihood Ratio)
Variables not in the Equation

Step 0   Variable               Score     Sig.
         age                   13.265     .000
         ed                     9.205     .002
         years_employ          56.054     .000
         years_stay            18.931     .000
         income                 3.526     .060
         debtinc              106.238     .000
         creddebt              41.928     .000
         othdebt               14.863     .000
         Overall Statistics   201.873     .000

Classification Table (a, b)

                                Predicted: Previously defaulted
         Observed               No      Yes     Percentage Correct
Step 0   No                     517       0     100.0
         Yes                    183       0        .0
         Overall Percentage                      73.9

a. Constant is included in the model.
b. The cut value is .500

Omnibus Tests of Model Coefficients

                      Chi-square    Sig.
Step 1      Step        252.695     .000
            Block       252.695     .000
            Model       252.695     .000
Step 2 (a)  Step          -.539     .463
            Block       252.156     .000
            Model       252.156     .000
Step 3 (a)  Step          -.810     .368
            Block       251.347     .000
            Model       251.347     .000
Step 4 (a)  Step          -.159     .690
            Block       251.188     .000
            Model       251.188     .000

a. A negative chi-square value indicates that the chi-square value has decreased from the previous step.

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      551.669 (a)         .303                   .444
2      552.208             .302                   .443
3      553.017             .302                   .442
4      553.176             .302                   .441

a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

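The Step chi-square values in the Omnibus Tests table are simply the changes in -2 log likelihood between consecutive steps of the Model Summary. The sketch below recomputes them and their p-values, assuming 1 degree of freedom per step (one single-coefficient variable is dropped each time); that assumption is mine, since the df column is not shown here. For 1 degree of freedom the chi-square tail probability equals erfc(sqrt(x/2)).

```python
import math

def chi2_sf_df1(x):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# -2 log likelihood at steps 1 to 4, copied from the Model Summary table.
neg2ll = [551.669, 552.208, 553.017, 553.176]

results = []
for step, (before, after) in enumerate(zip(neg2ll, neg2ll[1:]), start=2):
    change = after - before  # rise in -2LL caused by dropping one term
    results.append((step, change, chi2_sf_df1(change)))
    print(step, round(change, 3), round(results[-1][2], 3))
```

The recomputed p-values agree with the Sig. column of the Step rows to within rounding, and none of the removals worsens the fit significantly.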
Classification Table (a)

                                Predicted: Previously defaulted
         Observed               No      Yes     Percentage Correct
Step 1   No                     472      45     91.3
         Yes                     90      93     50.8
         Overall Percentage                     80.7
Step 2   No                     476      41     92.1
         Yes                     89      94     51.4
         Overall Percentage                     81.4
Step 3   No                     475      42     91.9
         Yes                     89      94     51.4
         Overall Percentage                     81.3
Step 4   No                     476      41     92.1
         Yes                     89      94     51.4
         Overall Percentage                     81.4

a. The cut value is .500

At the start (Step 0, the constant-only model), the overall percentage in the classification table was only 73.9%,
which means that 73.9% of the cases were classified correctly.
After four steps of the Backward LR method, some of the variables have been removed and the overall percentage
has risen to 81.4%, so 81.4% of the cases are now classified correctly.
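The percentages in the classification tables follow directly from the cell counts. A minimal sketch, with a helper name chosen for this example and counts copied from the Step 0 and Step 4 tables (observed in rows, predicted in columns):

```python
def classification_rates(tn, fp, fn, tp):
    """Percent correct for non-defaulters, defaulters, and overall."""
    total = tn + fp + fn + tp
    return {
        "no": 100.0 * tn / (tn + fp),
        "yes": 100.0 * tp / (fn + tp),
        "overall": 100.0 * (tn + tp) / total,
    }

# Step 0 predicts "No" for everyone; Step 4 is the final model.
step0 = classification_rates(tn=517, fp=0, fn=183, tp=0)
step4 = classification_rates(tn=476, fp=41, fn=89, tp=94)
print(round(step0["overall"], 1))  # 73.9
print({k: round(v, 1) for k, v in step4.items()})
# {'no': 92.1, 'yes': 51.4, 'overall': 81.4}
```

Note that at the .500 cut value the final model still identifies only about half of the actual defaulters (51.4%), even though overall accuracy is 81.4%.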
Variables in the Equation

                               B     S.E.      Wald    Sig.   Exp(B)
Step 1 (a)  age              .034    .017     3.924    .048    1.035
            ed               .091    .123      .542    .462    1.095
            years_employ    -.258    .033    60.645    .000     .772
            years_stay      -.105    .023    20.442    .000     .900
            income          -.009    .008     1.159    .282     .991
            debtinc          .067    .031     4.863    .027    1.070
            creddebt         .626    .113    30.742    .000    1.869
            othdebt          .063    .077      .655    .418    1.065
            Constant       -1.554    .619     6.294    .012     .211
Step 2 (a)  age              .034    .017     3.776    .052    1.034
            years_employ    -.265    .032    68.612    .000     .767
            years_stay      -.104    .023    20.094    .000     .901
            income          -.008    .008      .864    .352     .992
            debtinc          .065    .031     4.541    .033    1.067
            creddebt         .628    .114    30.512    .000    1.874
            othdebt          .070    .078      .818    .366    1.073
            Constant       -1.378    .572     5.810    .016     .252
Step 3 (a)  age              .034    .017     3.740    .053    1.034
            years_employ    -.258    .031    70.200    .000     .773
            years_stay      -.103    .023    19.857    .000     .902
            income          -.003    .006      .160    .689     .997
            debtinc          .086    .020    18.433    .000    1.090
            creddebt         .595    .105    32.207    .000    1.814
            Constant       -1.591    .522     9.281    .002     .204
Step 4 (a)  age              .033    .017     3.594    .058    1.033
            years_employ    -.261    .030    75.023    .000     .770
            years_stay      -.104    .023    20.157    .000     .902
            debtinc          .089    .019    23.162    .000    1.093
            creddebt         .573    .087    43.101    .000    1.773
            Constant       -1.631    .513    10.124    .001     .196

a. Variable(s) entered on step 1: age, ed, years_employ, years_stay, income, debtinc, creddebt,
othdebt.
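The Exp(B) column is e raised to the coefficient B, i.e. the factor by which the odds of default are multiplied for a one-unit increase in the variable. The sketch below recomputes it from the rounded Step 4 coefficients, so the third decimal can differ slightly from the SPSS output, which uses unrounded values.

```python
import math

# Step 4 coefficients (B) copied from the Variables in the Equation table.
step4_b = {
    "age": 0.033,
    "years_employ": -0.261,
    "years_stay": -0.104,
    "debtinc": 0.089,
    "creddebt": 0.573,
}

odds_ratios = {name: math.exp(b) for name, b in step4_b.items()}
for name, ratio in odds_ratios.items():
    pct = 100.0 * (ratio - 1.0)  # percent change in the odds per one-unit increase
    print(f"{name:13s} Exp(B) = {ratio:.3f}  ({pct:+.1f}% odds per unit)")
```

For example, each additional thousand of credit card debt multiplies the odds of default by about 1.77, while each additional year with the current employer multiplies them by about 0.77.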
Model if Term Removed

                          Model Log     Change in -2      Sig. of the
         Variable         Likelihood    Log Likelihood    Change
Step 1   age              -277.777          3.885          .049
         ed               -276.104           .539          .463
         years_employ     -318.127         84.585          .000
         years_stay       -286.880         22.092          .000
         income           -276.367          1.065          .302
         debtinc          -278.274          4.878          .027
         creddebt         -298.555         45.441          .000
         othdebt          -276.161           .652          .419
Step 2   age              -277.973          3.738          .053
         years_employ     -326.097         99.987          .000
         years_stay       -286.961         21.714          .000
         income           -276.510           .813          .367
         debtinc          -278.398          4.589          .032
         creddebt         -298.687         45.167          .000
         othdebt          -276.509           .810          .368
Step 3   age              -278.360          3.702          .054
         years_employ     -326.509        100.001          .000
         years_stay       -287.231         21.445          .000
         income           -276.588           .159          .690
         debtinc          -285.939         18.861          .000
         creddebt         -298.966         44.915          .000
Step 4   age              -278.366          3.556          .059
         years_employ     -330.797        108.418          .000
         years_stay       -287.499         21.822          .000
         debtinc          -288.488         23.799          .000
         creddebt         -309.280         65.385          .000

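A variable is significant when its Sig. of the Change is < 0.05.

In the Model if Term Removed table at Step 4, every remaining variable has a Sig. of the Change below 0.10, the
default removal threshold of the Backward LR method in SPSS; age (.059) is the only one above 0.05. So we do not
remove any more variables from the equation.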
Variables not in the Equation

                                      Score    Sig.
Step 2 (a)  Variables   ed             .542    .461
            Overall Statistics         .542    .461
Step 3 (b)  Variables   ed             .702    .402
                        othdebt        .820    .365
            Overall Statistics        1.389    .499
Step 4 (c)  Variables   ed             .402    .526
                        income         .159    .690
                        othdebt        .157    .692
            Overall Statistics        1.588    .662

a. Variable(s) removed on step 2: ed.
b. Variable(s) removed on step 3: othdebt.
c. Variable(s) removed on step 4: income.
Thus, level of education (ed), household income in thousands (income) and other debt in thousands (othdebt) are
removed from the equation because their significance values are greater than 0.05.
The Regression Equation

Log-odds of defaulting: ln(p/(1 - p)) = -1.631 + 0.573(credit card debt in thousands) + 0.089(debt to income ratio (x100))
- 0.104(years at current address) - 0.261(years with current employer) + 0.033(age in years)
where p is the probability of defaulting.
If
0.5 < p < 1
The person is predicted to default on the bank loan

0 < p < 0.5
The person is predicted not to default on the bank loan
For the profile given in the problem:
Age = 30 years
Credit card debt (in thousands) = 10
Debt to income ratio (x100) = 10
Years at current address = 3
Years with current employer = 6
ln(p/(1 - p))
= -1.631 + 0.573 * 10 + 0.089 * 10 - 0.104 * 3 - 0.261 * 6 + 0.033 * 30
= -1.631 + 5.73 + 0.89 - 0.312 - 1.566 + 0.99
= 4.101
p/(1 - p) = e^4.101
p/(1 - p) = 60.4006
p = 60.4006 - 60.4006p
61.4006p = 60.4006
p = 0.984
As p > 0.5 (the cut value), the person is predicted to default; the probability of default is about 0.98.
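The hand calculation can be checked with a short sketch that applies the logistic function p = 1 / (1 + e^(-logit)) to the Step 4 coefficients. The dictionary keys reuse the variable names from bankloan.sav; the function name is an assumption made for this example.

```python
import math

# Step 4 coefficients, copied from the Variables in the Equation table.
INTERCEPT = -1.631
COEFFICIENTS = {
    "creddebt": 0.573,
    "debtinc": 0.089,
    "years_stay": -0.104,
    "years_employ": -0.261,
    "age": 0.033,
}

def default_probability(profile):
    """Return (logit, probability of default) for one applicant profile."""
    logit = INTERCEPT + sum(COEFFICIENTS[k] * profile[k] for k in COEFFICIENTS)
    return logit, 1.0 / (1.0 + math.exp(-logit))

# Profile from the question; education, income and other debt are not in the final model.
applicant = {"age": 30, "years_employ": 6, "years_stay": 3, "debtinc": 10, "creddebt": 10}
logit, p = default_probability(applicant)
print(round(logit, 3), round(p, 3))
```

The logit comes out at 4.101 and the probability of default at roughly 0.98, in line with the manual steps.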
