
BUSINESS ANALYTICS ASSIGNMENT

RE-TEST

Submitted by: Sajitha Nair


Roll no: 2014A53

INDEX
QUESTION 1:
  Analysis:
  Step 1:
  Step 2:
  Step 3: KMO and Bartlett's Test
  Step 4: Communalities
  Step 5: To determine number of components
  Step 6: To determine variables in each component
  Step 7: To check the reliability of each component
  Step 8: Remove component 3 and component 4 as they are not significant
  Step 9:
  Step 10: Check Reliability of each component
QUESTION 2:
  ANALYSIS:
  Block 1: Method = Backward Stepwise (Likelihood Ratio)
  The Regression Equation

QUESTION 1:
A music firm has collected data on music preferences of 1500 respondents. The data are in
the file music.sav. The music preferences cover a broad range of categories (look at the
variables in the file). The music firm wants a broad segmentation of the music preferences.
Do a factor analysis to ascertain the factor structure of the data. What are the segments
you can identify? Write down the table of factor loadings.

Analysis:
Step 1:

Open the music.sav file in SPSS. The file contains the music preferences of respondents for
Big Band Music, Bluegrass Music, Country Western Music, Blues or R & B Music, Broadway
Musicals, Classical Music, Folk Music, Jazz Music, Opera, Rap Music and Heavy Metal Music.
A 9-point Likert scale is used for the survey.

Step 2:

Analyze > Dimension Reduction > Factor

Move all the music-preference variables into the Variables list.
Under Descriptives, select: Coefficients, Anti-image, and KMO and Bartlett's test of sphericity.
Under Extraction, select: Scree plot.
Under Rotation, select: Varimax.
Under Scores, select: Save as variables.
Under Options, select: Sorted by size.
Run the analysis.

Step 3: KMO and Bartlett's Test

Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy: this measure varies between 0 and 1,
and values closer to 1 are better. A value of 0.5 is a suggested minimum.
Here, KMO = 0.748 > 0.5, so sampling adequacy is acceptable.
Bartlett's Test of Sphericity: the null hypothesis states that the correlation matrix is an
identity matrix (i.e. the variables are uncorrelated).
Here, Sig. = .000 < 0.05, so we reject the null hypothesis.
Taken together, the KMO measure and Bartlett's test indicate that the data are suitable for
factor analysis.
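For readers who want to check these two statistics outside SPSS, the following is a minimal NumPy sketch, assuming the responses are held in a respondents x variables array; the function name kmo_and_bartlett and the synthetic demo data are illustrative only, not part of the SPSS output. KMO compares the squared correlations with the squared partial (anti-image) correlations, and Bartlett's statistic is -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom. The sketch returns the statistic and its df but does not compute the p-value.

```python
import numpy as np

def kmo_and_bartlett(data):
    """Overall KMO, Bartlett chi-square statistic and its df for an
    n x p array of responses (rows = respondents, columns = variables)."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    r = np.corrcoef(x, rowvar=False)          # p x p correlation matrix
    r_inv = np.linalg.inv(r)
    # Partial (anti-image) correlations derived from the inverse correlation matrix
    d = np.sqrt(np.diag(r_inv))
    partial = -r_inv / np.outer(d, d)
    off = ~np.eye(p, dtype=bool)               # use off-diagonal elements only
    r2 = np.sum(r[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    kmo = r2 / (r2 + a2)
    # Bartlett's test of sphericity (H0: the correlation matrix is an identity matrix)
    _, logdet = np.linalg.slogdet(r)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return kmo, chi2, df

# Synthetic demo (not music.sav): 11 items that share one common factor
rng = np.random.default_rng(0)
common = rng.normal(size=(1500, 1))
demo = common + rng.normal(size=(1500, 11))
kmo, chi2, df = kmo_and_bartlett(demo)
```

With the 11 music variables, the test has 11 x 10 / 2 = 55 degrees of freedom.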

Step 4: Communalities

The values in this Communalities column indicate the proportion of each variable's variance
that can be explained by the retained factors. Variables with high values are well
represented in the common factor space, while variables with low values are not well
represented.
Minimum criterion: the extraction value (communality) of each variable should be ≥ 0.5.
Here, every variable has an extraction value ≥ 0.5, so all variables are adequately represented
by the retained components.
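A communality is the sum of a variable's squared loadings on the retained components. The sketch below, using made-up loadings rather than the music.sav output, computes it and flags variables below the 0.5 cut-off; the same check is the one applied to Folk Music and Big Band Music in Step 8.

```python
import numpy as np

def communalities(loadings):
    """Communality of each variable = sum of its squared loadings
    on the retained components (rows = variables, columns = components)."""
    L = np.asarray(loadings, dtype=float)
    return np.sum(L ** 2, axis=1)

# Hypothetical loadings: 3 variables x 2 retained components
loadings = np.array([[0.80, 0.10],
                     [0.60, 0.20],
                     [0.30, 0.55]])
h2 = communalities(loadings)          # extraction values
weak = np.where(h2 < 0.5)[0]          # indices of variables below the 0.5 cut-off
```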

Step 5: To determine number of components

The initial number of components is the same as the number of variables used in the factor
analysis. However, not all 11 components will be retained.
Because the variables are standardized, each has a variance of 1, so the total variance to be
explained is 11. The first component accounts for a variance (eigenvalue) of 3.276, the second
for 1.661, and so on.
Criteria for retaining components:
Eigenvalue > 1
Cumulative % of variance explained > 60%
Scree plot: retain the components before the point where the curve becomes almost flat
(nearly parallel to the x-axis)
Based on these criteria,
Number of components = 4
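The eigenvalue and cumulative-variance criteria can be expressed as a short function. Because each standardized variable has a variance of 1, an eigenvalue divided by the number of variables gives the share of variance explained; for example, 3.276 / 11 ≈ 29.8% for the first component and 1.661 / 11 ≈ 15.1% for the second. The eigenvalues in the demo are hypothetical, the function name is illustrative, and the scree-plot judgement is left out because it is visual.

```python
def retained_components(eigenvalues, min_eigen=1.0, min_cum_pct=60.0):
    """Kaiser rule: keep components with eigenvalue > min_eigen, then report the
    cumulative % of variance they explain and whether it reaches min_cum_pct."""
    eig = sorted(eigenvalues, reverse=True)
    total = sum(eig)   # for a correlation matrix this equals the number of variables
    k = sum(1 for ev in eig if ev > min_eigen)
    cum_pct = 100.0 * sum(eig[:k]) / total
    return k, cum_pct, cum_pct >= min_cum_pct

# Hypothetical eigenvalues for 5 standardized variables (they sum to 5)
k, cum_pct, enough = retained_components([2.1, 1.3, 0.8, 0.5, 0.3])
```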

Step 6: To determine variables in each component

Here, we consider the rotated component matrix, because Varimax rotation maximizes the
variance of the squared loadings within each factor, so that each variable loads highly on as
few factors as possible. The total amount of variance accounted for is unchanged but is
redistributed over the four extracted factors.

Component 1: Classical Music, Opera, Broadway Musicals, Folk Music, Big Band Music
Component 2: Blues or R & B Music, Jazz Music
Component 3: Country Western Music, Bluegrass Music
Component 4: Heavy Metal Music, Rap Music
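Varimax can be sketched with Kaiser's original pairwise scheme: for each pair of factors, the rotation angle that maximizes the variance of the squared loadings has a closed form, and sweeps over all pairs repeat until the angles become negligible. This is a minimal NumPy illustration, not SPSS's exact routine, so signs, column order or the last decimal may differ from SPSS output. Rows are normalized to unit length before rotating (Kaiser normalization, which SPSS applies to Varimax by default), and the demo loadings are hypothetical.

```python
import numpy as np

def varimax(loadings, normalize=True, max_sweeps=50, tol=1e-8):
    """Varimax rotation by Kaiser's pairwise (planar) method.
    Returns (rotated loadings, orthogonal rotation matrix T) with rotated = loadings @ T."""
    A0 = np.asarray(loadings, dtype=float)
    p, k = A0.shape
    h = np.sqrt(np.sum(A0 ** 2, axis=1, keepdims=True))   # sqrt of communalities
    L = A0 / h if normalize else A0.copy()                 # Kaiser normalization
    T = np.eye(k)
    for _ in range(max_sweeps):
        largest = 0.0
        for j in range(k - 1):
            for m in range(j + 1, k):
                x, y = L[:, j].copy(), L[:, m].copy()
                u, v = x ** 2 - y ** 2, 2 * x * y
                A, B = u.sum(), v.sum()
                C, D = np.sum(u ** 2 - v ** 2), 2 * np.sum(u * v)
                # Closed-form angle that maximizes the Varimax criterion for this pair
                phi = np.arctan2(D - 2 * A * B / p, C - (A ** 2 - B ** 2) / p) / 4
                largest = max(largest, abs(phi))
                c, s = np.cos(phi), np.sin(phi)
                L[:, j], L[:, m] = c * x + s * y, -s * x + c * y
                tj, tm = T[:, j].copy(), T[:, m].copy()
                T[:, j], T[:, m] = c * tj + s * tm, -s * tj + c * tm
        if largest < tol:
            break                      # no pair needs further rotation
    # Row scaling commutes with the rotation, so this undoes the normalization
    return A0 @ T, T

# Hypothetical simple structure hidden by a 30-degree rotation
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
t = np.deg2rad(30)
rot30 = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
rotated, T = varimax(simple @ rot30)
```

Because the rotation is orthogonal, each variable's communality is unchanged; only the distribution of variance across factors changes, which is why the rotated matrix is used to assign variables to components.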

Step 7: To check the reliability of each component


Analyze > Scale > Reliability Analysis
In Statistics, select Scale if item deleted.
Criteria:
Cronbach's alpha is a measure of internal consistency, that is, how closely related a set of
items are as a group.
Minimum criterion: Cronbach's alpha ≥ 0.7
Also check the column Cronbach's Alpha if Item Deleted in the Item-Total Statistics table.
If this value for any variable is greater than the Cronbach's alpha of the component, delete
that variable and run the whole analysis again.
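The reliability statistics in Step 7 follow the usual formula alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score). A minimal NumPy sketch with a hypothetical 4 x 3 score matrix (not music.sav data; the function names are illustrative):

```python
import numpy as np

def cronbach_alpha(items):
    """items: n x k array (respondents x items) with k >= 2."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()     # sum of the item variances
    total_var = X.sum(axis=1).var(ddof=1)       # variance of the summed scale
    return k / (k - 1) * (1 - item_vars / total_var)

def alpha_if_item_deleted(items):
    """Alpha of the scale with each item left out in turn (needs k >= 3)."""
    X = np.asarray(items, dtype=float)
    if X.shape[1] < 3:
        raise ValueError("need at least 3 items")
    return [cronbach_alpha(np.delete(X, j, axis=1)) for j in range(X.shape[1])]

# Hypothetical 4 respondents x 3 items
scores = np.array([[1, 2, 1], [2, 1, 3], [3, 4, 2], [4, 3, 4]])
alpha = cronbach_alpha(scores)
dropped = alpha_if_item_deleted(scores)
```

Here alpha ≈ 0.724, and leaving out the second item would raise it to about 0.889, so under the rule above that item would be a candidate for deletion. Alpha if item deleted is only meaningful for scales with at least three items, hence the guard.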
Step 7a: Select Component 1 variables

Here, Cronbach's alpha = 0.795 > 0.7

Cronbach's alpha if item deleted is less than 0.795 for all variables.
So Component 1 is reliable.

Step 7b: Select Component 2 variables

Here, Cronbach's alpha = 0.714 > 0.7

Cronbach's alpha if item deleted is less than 0.714 for all variables.
So Component 2 is reliable.

Step 7c: Select Component 3 variables

Here, Cronbach's alpha = 0.580 < 0.7

So Component 3 is not reliable.

Step 7d: Select Component 4 variables

Here, Cronbach's alpha = 0.529 < 0.7

So Component 4 is not reliable.

Step 8: Remove component 3 and component 4 as they are not significant


Remove the variables of components 3 and 4, as these components are not reliable, and run the
factor analysis again.
Now,
KMO = 0.781 > 0.5

In Communalities,
Minimum criterion: the extraction value of each variable should be ≥ 0.5.
Here, Folk Music has an extraction value of 0.462 < 0.5, so it is not adequately represented.
Big Band Music has an extraction value of 0.491 < 0.5, so it is not adequately represented.
So remove the variables Folk Music and Big Band Music and run the factor analysis again.

Step 9:

After removing Folk Music and Big Band Music


KMO = 0.695 > 0.5

Here, extraction values for all variables > 0.5

Total Components to be considered = 2

Component 1: Classical Music, Opera, Broadway Musicals


Component 2: Blues or R & B Music, Jazz Music

Step 10: Check Reliability of each component


Step 10a: Reliability Analysis of Component 1

Here, Cronbach's alpha = 0.761 > 0.7

Cronbach's alpha if item deleted is less than 0.761 for all variables.
So Component 1 is reliable.
Step 10b: Reliability Analysis of Component 2

Here, Cronbach's alpha = 0.714 > 0.7

Cronbach's alpha if item deleted is less than 0.714 for all variables.
So Component 2 is reliable.
Therefore, two reliable components (music-preference segments) are formed:

Component 1: Classical Music, Opera, Broadway Musicals

Component 2: Blues or R & B Music, Jazz Music

QUESTION 2:
Banks are concerned about people who default on loans that they have taken. In order to
identify potential defaulters, a bank has collected data on 850 people who have taken loans
in the past. The file bankloan.sav contains information about the loans and also information
about which of these people defaulted on their loans (variable: default. default=0 if not
previously defaulted). To help the bank identify potential defaulters, build a model to
predict defaulters using the other variables in the file as possible predictors. Write a report
of your analysis.
Here is the profile of a potential loan taker:
Age: 30, level of education: 4, years employed: 6, years at current address: 3, income: 100,
debt-to-income ratio: 10, credit card debt: 10, other debt: 10. What is the probability that
this person will default on his loan?

ANALYSIS:

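Step 1: Open the file bankloan.sav. The file contains data on 850 people. The dependent
variable, default, is categorical and denotes whether a person has previously defaulted on a
bank loan (0 = No, 1 = Yes). The remaining variables (age, ed, years_employ, years_stay,
income, debtinc, creddebt, othdebt) are the independent variables (predictors).
Step 2:
Analyze > Regression > Binary Logistic
Dependent: default; Covariates: the eight predictors listed above
Method: Backward: LR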

Block 1: Method = Backward Stepwise (Likelihood Ratio)

Variables not in the Equation
                                     Score   df   Sig.
Step 0   Variables   age            13.265    1   .000
                     ed              9.205    1   .002
                     years_employ   56.054    1   .000
                     years_stay     18.931    1   .000
                     income          3.526    1   .060
                     debtinc       106.238    1   .000
                     creddebt       41.928    1   .000
                     othdebt        14.863    1   .000
         Overall Statistics        201.873    8   .000

Classification Table(a,b)
                                                 Predicted
                                       Previously defaulted    Percentage
Observed                                   No         Yes      Correct
Step 0   Previously defaulted   No        517           0        100.0
                                Yes       183           0           .0
         Overall Percentage                                       73.9
a. Constant is included in the model.
b. The cut value is .500

Omnibus Tests of Model Coefficients
                       Chi-square   df   Sig.
Step 1   Step             252.695    8   .000
         Block            252.695    8   .000
         Model            252.695    8   .000
Step 2   Step(a)            -.539    1   .463
         Block            252.156    7   .000
         Model            252.156    7   .000
Step 3   Step(a)            -.810    1   .368
         Block            251.347    6   .000
         Model            251.347    6   .000
Step 4   Step(a)            -.159    1   .690
         Block            251.188    5   .000
         Model            251.188    5   .000
a. A negative Chi-squares value indicates that the Chi-squares value has decreased from the previous step.

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1             551.669(a)                   .303                  .444
2             552.208                      .302                  .443
3             553.017                      .302                  .442
4             553.176                      .302                  .441
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

Classification Table(a)
                                                 Predicted
                                       Previously defaulted    Percentage
Observed                                   No         Yes      Correct
Step 1   Previously defaulted   No        472          45         91.3
                                Yes        90          93         50.8
         Overall Percentage                                       80.7
Step 2   Previously defaulted   No        476          41         92.1
                                Yes        89          94         51.4
         Overall Percentage                                       81.4
Step 3   Previously defaulted   No        475          42         91.9
                                Yes        89          94         51.4
         Overall Percentage                                       81.3
Step 4   Previously defaulted   No        476          41         92.1
                                Yes        89          94         51.4
         Overall Percentage                                       81.4
a. The cut value is .500

At Step 0 (constant only), the model predicts "No" for every case, so its overall percentage
correct in the classification table is only 73.9% (517 of the 700 cases with a known default
status), which is simply the share of non-defaulters.
After four steps of the Backward LR method, three variables have been removed and the overall
percentage correct rises to 81.4% (570 of 700). In other words, at the 0.5 cut value the final
model classifies about 81% of these cases correctly: 92.1% of non-defaulters but only 51.4%
of defaulters.
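The percentages in the classification tables can be recomputed from the counts. This small sketch uses the Step 0 and Step 4 counts reported above (rows = observed, columns = predicted); the helper name and its argument names are hypothetical.

```python
def classification_summary(no_no, no_yes, yes_no, yes_yes):
    """Percentages from a 2 x 2 SPSS classification table.
    Arguments are counts named observed_predicted (e.g. yes_no = observed Yes, predicted No)."""
    total = no_no + no_yes + yes_no + yes_yes
    return {
        "pct_correct_no": 100.0 * no_no / (no_no + no_yes),       # specificity
        "pct_correct_yes": 100.0 * yes_yes / (yes_no + yes_yes),  # sensitivity
        "overall_pct": 100.0 * (no_no + yes_yes) / total,
    }

step0 = classification_summary(no_no=517, no_yes=0, yes_no=183, yes_yes=0)
step4 = classification_summary(no_no=476, no_yes=41, yes_no=89, yes_yes=94)
```

The overall gain comes mainly from identifying defaulters: the final model correctly flags 94 of the 183 defaulters, whereas the constant-only model flags none.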
Variables in the Equation
                              B    S.E.     Wald   df   Sig.   Exp(B)
Step 1(a)   age            .034    .017    3.924    1   .048    1.035
            ed             .091    .123     .542    1   .462    1.095
            years_employ  -.258    .033   60.645    1   .000     .772
            years_stay    -.105    .023   20.442    1   .000     .900
            income        -.009    .008    1.159    1   .282     .991
            debtinc        .067    .031    4.863    1   .027    1.070
            creddebt       .626    .113   30.742    1   .000    1.869
            othdebt        .063    .077     .655    1   .418    1.065
            Constant     -1.554    .619    6.294    1   .012     .211
Step 2      age            .034    .017    3.776    1   .052    1.034
            years_employ  -.265    .032   68.612    1   .000     .767
            years_stay    -.104    .023   20.094    1   .000     .901
            income        -.008    .008     .864    1   .352     .992
            debtinc        .065    .031    4.541    1   .033    1.067
            creddebt       .628    .114   30.512    1   .000    1.874
            othdebt        .070    .078     .818    1   .366    1.073
            Constant     -1.378    .572    5.810    1   .016     .252
Step 3      age            .034    .017    3.740    1   .053    1.034
            years_employ  -.258    .031   70.200    1   .000     .773
            years_stay    -.103    .023   19.857    1   .000     .902
            income        -.003    .006     .160    1   .689     .997
            debtinc        .086    .020   18.433    1   .000    1.090
            creddebt       .595    .105   32.207    1   .000    1.814
            Constant     -1.591    .522    9.281    1   .002     .204
Step 4      age            .033    .017    3.594    1   .058    1.033
            years_employ  -.261    .030   75.023    1   .000     .770
            years_stay    -.104    .023   20.157    1   .000     .902
            debtinc        .089    .019   23.162    1   .000    1.093
            creddebt       .573    .087   43.101    1   .000    1.773
            Constant     -1.631    .513   10.124    1   .001     .196
a. Variable(s) entered on step 1: age, ed, years_employ, years_stay, income, debtinc, creddebt, othdebt.

Model if Term Removed
                         Model Log     Change in -2          Sig. of the
Variable                 Likelihood    Log Likelihood   df   Change
Step 1   age               -277.777             3.885    1   .049
         ed                -276.104              .539    1   .463
         years_employ      -318.127            84.585    1   .000
         years_stay        -286.880            22.092    1   .000
         income            -276.367             1.065    1   .302
         debtinc           -278.274             4.878    1   .027
         creddebt          -298.555            45.441    1   .000
         othdebt           -276.161              .652    1   .419
Step 2   age               -277.973             3.738    1   .053
         years_employ      -326.097            99.987    1   .000
         years_stay        -286.961            21.714    1   .000
         income            -276.510              .813    1   .367
         debtinc           -278.398             4.589    1   .032
         creddebt          -298.687            45.167    1   .000
         othdebt           -276.509              .810    1   .368
Step 3   age               -278.360             3.702    1   .054
         years_employ      -326.509           100.001    1   .000
         years_stay        -287.231            21.445    1   .000
         income            -276.588              .159    1   .690
         debtinc           -285.939            18.861    1   .000
         creddebt          -298.966            44.915    1   .000
Step 4   age               -278.366             3.556    1   .059
         years_employ      -330.797           108.418    1   .000
         years_stay        -287.499            21.822    1   .000
         debtinc           -288.488            23.799    1   .000
         creddebt          -309.280            65.385    1   .000

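A variable is significant when its Sig. value is below 0.05.
In the Step 4 Model if Term Removed table, dropping years_employ, years_stay, debtinc or
creddebt would significantly worsen the fit (Sig. of the change = .000 for each). Age has
Sig. = .059; it stays in the model because, by default, the SPSS backward stepwise procedure
removes a term only when its Sig. exceeds 0.10. No remaining term meets this removal criterion,
so the procedure stops at Step 4 and no more variables are removed from the equation.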
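The removal decision can be reproduced from the Model Summary and Model if Term Removed tables. The change in -2 log likelihood is compared with a chi-square distribution with 1 df, whose upper-tail probability equals erfc(sqrt(c/2)), so only the standard library is needed. This is a sketch under the assumption of SPSS's default removal probability of 0.10; SPSS drops, at each step, the term with the largest Sig. above that threshold, whereas this hypothetical helper only tests one term at a time.

```python
import math

def lr_removal_test(neg2ll_current, loglik_without_term, removal_p=0.10):
    """Likelihood-ratio test for dropping one term (1 df).
    neg2ll_current: -2 log likelihood of the current model (Model Summary table).
    loglik_without_term: 'Model Log Likelihood' from the Model if Term Removed table.
    Returns (change in -2LL, Sig. of the change, True if the term would be removed)."""
    change = max(-2.0 * loglik_without_term - neg2ll_current, 0.0)
    # Upper-tail probability of a chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2))
    p_value = math.erfc(math.sqrt(change / 2.0))
    return change, p_value, p_value > removal_p

# Step 1, ed: -2LL = 551.669; model log likelihood without ed = -276.104
ed_change, ed_p, ed_removed = lr_removal_test(551.669, -276.104)
# Step 4, age: -2LL = 553.176; model log likelihood without age = -278.366
age_change, age_p, age_removed = lr_removal_test(553.176, -278.366)
```

This reproduces the tables: ed (change .539, Sig. about .463) would be removed, while age at Step 4 (change 3.556, Sig. about .059) is kept.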
Variables not in the Equation
                                      Score   df   Sig.
Step 2(a)   Variables   ed             .542    1   .461
            Overall Statistics         .542    1   .461
Step 3(b)   Variables   ed             .702    1   .402
                        othdebt        .820    1   .365
            Overall Statistics        1.389    2   .499
Step 4(c)   Variables   ed             .402    1   .526
                        income         .159    1   .690
                        othdebt        .157    1   .692
            Overall Statistics        1.588    3   .662
a. Variable(s) removed on step 2: ed.
b. Variable(s) removed on step 3: othdebt.
c. Variable(s) removed on step 4: income.

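Thus level of education (ed, removed at Step 2), other debt in thousands (othdebt, Step 3) and
household income in thousands (income, Step 4) are removed from the equation, because the Sig.
values of the change in -2 log likelihood when each is dropped (.463, .368 and .690) are well
above the removal criterion.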

The Regression Equation


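The final model (Step 4) gives the log-odds (logit) of default, where p is the probability that
the person defaults (default = 1):

$$\ln\left(\frac{p}{1-p}\right) = -1.631 + 0.573\,(\text{credit card debt in thousands}) + 0.089\,(\text{debt to income ratio (x100)}) - 0.104\,(\text{years at current address}) - 0.261\,(\text{years with current employer}) + 0.033\,(\text{age in years})$$

The probability of default is then p = 1 / (1 + e^(-logit)). With the cut value of 0.5:
If 0 < p < 0.5, the person is predicted not to default on the bank loan.
If 0.5 < p < 1, the person is predicted to default on the bank loan.

In the problem given:
Age = 30 years
Credit card debt (in thousands) = 10
Debt to income ratio (x100) = 10
Years at current address = 3
Years with current employer = 6
Level of education, income and other debt are not in the final model, so they do not affect
the prediction.

$$\begin{aligned}
\ln\left(\frac{p}{1-p}\right) &= -1.631 + 0.573(10) + 0.089(10) - 0.104(3) - 0.261(6) + 0.033(30) \\
&= -1.631 + 5.73 + 0.89 - 0.312 - 1.566 + 0.99 = 4.101 \\
\frac{p}{1-p} &= e^{4.101} = 60.4006 \\
p &= 60.4006 - 60.4006\,p \\
61.4006\,p &= 60.4006 \\
p &= 0.9837 \approx 0.984
\end{aligned}$$

As p ≈ 0.98 > 0.5, the model predicts that this person will default on the loan; the estimated
probability of default is about 98%.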
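The same calculation can be scripted so that other applicants can be scored. This sketch hard-codes the Step 4 coefficients (B) from the Variables in the Equation table and applies the logistic function p = 1 / (1 + e^(-logit)); the dictionary keys follow the SPSS variable names, and the function name default_probability is hypothetical.

```python
import math

# Step 4 coefficients (B) from the Variables in the Equation table
COEF = {
    "age": 0.033,
    "years_employ": -0.261,
    "years_stay": -0.104,
    "debtinc": 0.089,
    "creddebt": 0.573,
}
CONSTANT = -1.631

def default_probability(profile):
    """Return (logit, probability of default) for a dict of predictor values."""
    logit = CONSTANT + sum(b * profile[name] for name, b in COEF.items())
    return logit, 1.0 / (1.0 + math.exp(-logit))

applicant = {"age": 30, "years_employ": 6, "years_stay": 3,
             "debtinc": 10, "creddebt": 10}
logit, p = default_probability(applicant)
```

Because the coefficients are rounded to three decimals, the result agrees with the hand calculation (logit 4.101, p ≈ 0.984) but should not be read as more precise than that.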
