
BEO2255

APPLIED STATISTICS FOR BUSINESS

ASSIGNMENT 3

NAME VU ID
BEH YANG CHENG 4520339

JAMIE OOI ZHI YUAN 4538241

TEE HUEY YEE 4520382

TUTOR’S NAME: MS. MOY TOW YOON

DUE DATE: 19th MAY 2016

DATE SUBMITTED: 19th MAY 2016

Part A

Question 1

The regression model is shown below:

WEEKLYRENT = β0 + β1 BEDROOMS + β2 BATHROOMS + β3 GARAGE + β4 LOCATION + β5 TYPE + ε

LOCATION and TYPE are dummy variables. LOCATION is coded 1 for City and 0 for Suburb, and
TYPE is coded 1 for House and 0 for Unit, so Suburb and Unit are the base levels.

Table 1 below shows the estimated regression model for the regression model above.

Table 1: Coefficients (a)
                Unstandardized Coefficients    Standardized Coefficients
Model 1         B          Std. Error          Beta                         t         Sig.
(Constant)      271.904    28.392                                           9.577     .000
Bedrooms        21.538     9.512               .231                         2.264     .026
Bathrooms       62.796     13.831              .409                         4.540     .000
Garage          15.533     9.844               .124                         1.578     .118
Location        -35.011    17.208              -.165                        -2.035    .045
Type            36.390     25.024              .139                         1.454     .149
a. Dependent Variable: WeeklyRent

The estimated Equation for the regression model is:


WEEKLYRENT= 271.904 + 21.538 BEDROOMS + 62.796 BATHROOMS + 15.533
GARAGE – 35.011 LOCATION + 36.390 TYPE

Question 2

WEEKLYRENT = β0 + β1 BEDROOMS + β2 BATHROOMS + β3 GARAGE + β4 LOCATION + β5 TYPE + ε

Based on experience or theories, the a priori signs for each coefficient of the regression model
are:

1. β1 should be positive because as BEDROOMS (the number of bedrooms) increases, the
property is larger (in square metres) and more spacious, so the landlord can accommodate
more tenants. Hence, the WEEKLYRENT of the property will increase. Therefore, the
a priori sign of BEDROOMS is positive. From the SPSS output, the estimated sign for
BEDROOMS is positive, the same as the a priori sign.

2. β2 should be positive because as BATHROOMS (the number of bathrooms) increases, the
tenants will find it more convenient to live in the property. Hence, the WEEKLYRENT of
the property will increase. Therefore, the a priori sign of BATHROOMS is positive. From
the SPSS output, the estimated sign for BATHROOMS is positive, the same as the a priori
sign.

3. β3 should be positive because as GARAGE (the number of car spaces) increases, more
cars can be accommodated. Hence, the WEEKLYRENT of the property will increase.
Therefore, the a priori sign of GARAGE is positive. From the SPSS output, the estimated
sign for GARAGE is positive, the same as the a priori sign.

4. β4 should be positive because a property located in the city is expected to command a
higher weekly rent: a city location is more convenient, and its facilities are better
than those in suburban locations. Therefore, the a priori sign of LOCATION is positive.
However, from the SPSS output, the estimated sign for LOCATION is negative, which
differs from the a priori sign.

5. β5 should be positive because when TYPE (the type of property) is a HOUSE, the size of
the house is bigger, more spacious and can accommodate more people compared to the
UNIT type of property. Hence, the WEEKLYRENT of the property will increase.
Therefore, the a priori sign of TYPE is positive. From the SPSS output, the estimated
sign for TYPE is positive, same as the a priori sign.

Question 3

Based on Table 1 with reference to each of the coefficient value of the estimated Model, it can be
observed that:

BEDROOMS

Hypothesis:

H0: The number of bedrooms is not significant in explaining the weekly rent of properties.

H1: The number of bedrooms is significant in explaining the weekly rent of properties.

Decision:

1. The number of bedrooms in the property

● When the number of bedrooms in the property increases by one, the weekly rent
increases by A$21.538 on average, other things being constant.
● The p-value for BEDROOMS is 0.026, which is smaller than 0.05, the level of
significance for the test.
● The significance value of 0.026 indicates that there is sufficient evidence to reject
the null hypothesis at the 5% level of significance.
● Therefore, we reject the null hypothesis that the number of bedrooms is not
significant in explaining the weekly rent of properties.
Conclusion:

At 5% significance level, there is sufficient evidence to conclude that the number of bedrooms is
significant in explaining the weekly rent of properties.

BATHROOMS

H0: The number of bathrooms is not significant in explaining the weekly rent of properties.

H1: The number of bathrooms is significant in explaining the weekly rent of properties.

Decision:

1. The number of bathrooms in the property


● When the number of bathrooms in the property increases by one, the weekly rent
increases by A$62.796 on average, other things being constant.
● The p-value for BATHROOMS is 0.000, which is smaller than 0.05, the level of
significance for the test.
● The significance value of 0.000 indicates that there is strong evidence to reject
the null hypothesis at the 5% level of significance.
● Therefore, we reject the null hypothesis that the number of bathrooms is not
significant in explaining the weekly rent of properties.
Conclusion:

At 5% significance level, there is sufficient evidence to conclude that the number of bathrooms is
significant in explaining the weekly rent of properties.

GARAGE

H0: The number of car spaces (GARAGE) is not significant in explaining the weekly rent of properties.

H1: The number of car spaces (GARAGE) is significant in explaining the weekly rent of properties.

Decision:

1. The number of car spaces (GARAGE) in the property


● When the number of car spaces in the property increases by one, the weekly rent
increases by A$15.533 on average, other things being constant.
● The p-value for GARAGE is 0.118, which is larger than 0.05, the level of
significance for the test.
● The significance value of 0.118 indicates that there is insufficient evidence to reject
the null hypothesis at the 5% level of significance.
● Therefore, we do not reject the null hypothesis that the number of car spaces is not
significant in explaining the weekly rent of properties.
Conclusion:

At 5% significance level, there is insufficient evidence to conclude that the number of car spaces
is significant in explaining the weekly rent of properties.

LOCATION:

H0: The type of location is not significant in explaining the weekly rent of properties.

H1: The type of location is significant in explaining the weekly rent of properties.

Decision:

1. The type of location (City, Suburb)


● When the property is located in the city, the WEEKLYRENT is A$35.011 lower than
for a suburban location on average, other things being constant.
● The p-value for LOCATION is 0.045, which is smaller than 0.05, the level of
significance for the test.
● The significance value of 0.045 indicates that there is sufficient evidence to reject
the null hypothesis at the 5% level of significance.
● Therefore, we reject the null hypothesis that the type of location is not significant in
explaining the weekly rent of properties.
Conclusion:

At 5% significance level, there is sufficient evidence to conclude that the type of location is
significant in explaining the weekly rent of properties.

TYPE OF PROPERTY

H0: The type of property is not significant in explaining the weekly rent of properties.

H1: The type of property is significant in explaining the weekly rent of properties.

Decision:

1. The type of property (House, Unit).


● When the property is a house, the WEEKLYRENT is A$36.390 higher than for a
unit on average, other things being constant.
● The p-value for the type of property is 0.149, which is larger than 0.05, the level
of significance for the test.
● The significance value of 0.149 indicates that there is insufficient evidence to reject
the null hypothesis at the 5% level of significance.
● Therefore, we do not reject the null hypothesis that the type of property is not
significant in explaining the weekly rent of properties.
Conclusion:

At 5% significance level, there is insufficient evidence to conclude that the type of property is
significant in explaining the weekly rent of properties.
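The five tests above all follow the same rule, which can be sketched in a few lines. This is a minimal illustration, not the SPSS procedure itself: the coefficients, standard errors and p-values are copied from Table 1, while the names `t_statistic` and `decision` are hypothetical helpers introduced here.

```python
# Values copied from Table 1: (coefficient B, standard error, reported p-value).
coefficients = {
    "Bedrooms":  (21.538, 9.512, 0.026),
    "Bathrooms": (62.796, 13.831, 0.000),
    "Garage":    (15.533, 9.844, 0.118),
    "Location":  (-35.011, 17.208, 0.045),
    "Type":      (36.390, 25.024, 0.149),
}
ALPHA = 0.05  # level of significance used in the report


def t_statistic(b, std_error):
    """t = estimated coefficient divided by its standard error (H0: beta = 0)."""
    return b / std_error


def decision(p_value, alpha=ALPHA):
    """Reject H0 only when the p-value is below alpha; otherwise do not reject."""
    return "reject H0" if p_value < alpha else "do not reject H0"


results = {
    name: (t_statistic(b, se), decision(p))
    for name, (b, se, p) in coefficients.items()
}

for name, (t, verdict) in results.items():
    print(f"{name:<10} t = {t:7.3f}  ->  {verdict}")
```

The recomputed t values agree with the t column of Table 1 to three decimals, and a p-value above 0.05 leads to "do not reject H0" rather than "accept H0".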

Question 4

Table 2 & Table 3 show the summary of the regression model and the descriptive statistic for the
regression.

Table 2: Model Summary (b)

Model    R           R Square    Adjusted R Square    Std. Error of the Estimate
1        .695 (a)    .483        .456                 74.521
a. Predictors: (Constant), Type, Garage, Bathrooms, Location, Bedrooms
b. Dependent Variable: WeeklyRent

Table 3: Descriptive Statistics


Mean Std. Deviation N
WeeklyRent 452.33 100.995 100
Bedrooms 3.09 1.083 100
Bathrooms 1.35 .657 100
Garage 1.44 .808 100
Location .66 .476 100
Type .82 .386 100

CV = (SE / Mean) × 100%
   = (74.521 / 452.33) × 100%
   = 16.47%

Adj R² = 0.456 = 45.6%
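The coefficient of variation can be checked with a short calculation. This sketch assumes, as the figures used in the report suggest, that SE is the standard error of the estimate from Table 2 and Mean is the mean of WeeklyRent from Table 3. The function name is illustrative.

```python
def coefficient_of_variation(std_error_of_estimate, mean_of_dependent):
    """Standard error of the estimate as a percentage of the dependent-variable mean."""
    return std_error_of_estimate / mean_of_dependent * 100


# 74.521 is the Std. Error of the Estimate (Table 2);
# 452.33 is the mean of WeeklyRent (Table 3).
cv = coefficient_of_variation(74.521, 452.33)
print(f"CV = {cv:.2f}%")
```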

The goodness of fit of the model:
● The adjusted R² is 0.456, so the model is moderately satisfactory: after adjusting for
the number of predictors, it explains 45.6% of the variation in the weekly rent of properties.
● This means that 45.6% of the variation in WEEKLYRENT can be explained by the
variation in the independent variables (BEDROOMS, BATHROOMS, GARAGE,
LOCATION and TYPE).
● The model is considered to fit well because it has a low coefficient of variation
of 16.47% and a moderate adjusted R² of 0.456.

Question 5
H0: The model is not significant.

H1: The model is significant.

Table 4 below shows the ANOVA test for the model.

Table 4: ANOVA (a)
Model 1       Sum of Squares    df    Mean Square    F         Sig.
Regression    487783.106        5     97556.621      17.567    .000 (b)
Residual      522011.004        94    5553.309
Total         1009794.110       99
a. Dependent Variable: WeeklyRent
b. Predictors: (Constant), Type, Garage, Bathrooms, Location, Bedrooms

Decision:

● The significance value of the ANOVA test is 0.000, which is smaller than 0.05, the level of
significance for the test.
● The significance value of 0.000 indicates that there is strong evidence to reject the null
hypothesis.
● Therefore, we reject the null hypothesis that the model is not significant.

Conclusion:

At 5% significance level, we have sufficient evidence to conclude that the model is significant.
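The entries of Table 4 are linked by standard identities, and recomputing them is a useful check on the transcription. The sketch below uses only the sums of squares and degrees of freedom reported in Table 4; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from Table 4.
ss_regression, df_regression = 487783.106, 5
ss_residual, df_residual = 522011.004, 94
ss_total = ss_regression + ss_residual

# Mean squares and the overall F statistic.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# R-squared, adjusted R-squared and the standard error of the estimate.
n = df_regression + df_residual + 1          # number of observations (100)
r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
std_error_of_estimate = ms_residual ** 0.5

print(f"F = {f_statistic:.3f}")
print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adjusted_r_squared:.3f}")
print(f"Std. error of the estimate = {std_error_of_estimate:.3f}")
```

The results reproduce F = 17.567 from Table 4 and the R Square, Adjusted R Square and Std. Error of the Estimate values in Table 2 up to rounding.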

Question 6

a) Multi-collinearity

Table 5 shows the Pearson Correlation Test to check the multi-collinearity problems on the
regression model.
Table 5: Correlations

Pearson Correlation
              WeeklyRent   Bedrooms   Bathrooms   Garage   Location   Type
WeeklyRent    1.000        .536       .579        .304     -.177      .401
Bedrooms      .536         1.000      .509        .197     .001       .522
Bathrooms     .579         .509       1.000       .316     .061       .171
Garage        .304         .197       .316        1.000    .078       .127
Location      -.177        .001       .061        .078     1.000      -.336
Type          .401         .522       .171        .127     -.336      1.000

Sig. (1-tailed)
              WeeklyRent   Bedrooms   Bathrooms   Garage   Location   Type
WeeklyRent    .            .000       .000        .001     .039       .000
Bedrooms      .000         .          .000        .025     .495       .000
Bathrooms     .000         .000       .           .001     .272       .044
Garage        .001         .025       .001        .        .221       .104
Location      .039         .495       .272        .221     .          .000
Type          .000         .000       .044        .104     .000       .

N = 100 for every pair of variables.

Based on Table 5, it can be observed that:

● The largest Pearson correlation coefficient between a pair of independent variables is
that between BEDROOMS and TYPE, which is 0.522.
● The squared correlation, 0.522² = 0.2725, is smaller than the adjusted R² of the
regression model, which is 0.456. Therefore, the correlation between BEDROOMS
and TYPE shows no sign of a collinearity problem.
● Since the largest correlation between pairs of independent variables in Table 5 is not
a problem, the remaining, smaller correlations are not a problem either. Thus, the
regression does not have a multi-collinearity problem.
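The screening rule used above (compare the largest squared pairwise correlation between predictors with the adjusted R²) can be sketched as follows. The correlations are copied from Table 5; the rule itself is the rule of thumb applied in this report, not a formal test, and the variable names are illustrative.

```python
# Pairwise Pearson correlations between the independent variables (Table 5).
correlations = {
    ("Bedrooms", "Bathrooms"): 0.509,
    ("Bedrooms", "Garage"): 0.197,
    ("Bedrooms", "Location"): 0.001,
    ("Bedrooms", "Type"): 0.522,
    ("Bathrooms", "Garage"): 0.316,
    ("Bathrooms", "Location"): 0.061,
    ("Bathrooms", "Type"): 0.171,
    ("Garage", "Location"): 0.078,
    ("Garage", "Type"): 0.127,
    ("Location", "Type"): -0.336,
}
ADJUSTED_R_SQUARED = 0.456  # from Table 2

# Find the pair with the largest absolute correlation and square it.
largest_pair = max(correlations, key=lambda pair: abs(correlations[pair]))
largest_r_squared = correlations[largest_pair] ** 2

# Rule of thumb from the report: flag a problem only if the squared
# correlation exceeds the adjusted R-squared of the regression.
collinearity_flag = largest_r_squared > ADJUSTED_R_SQUARED

print(largest_pair, round(largest_r_squared, 4), collinearity_flag)
```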

b) Heteroskedasticity

Figure 1: Scatterplot of Dependent Variable (WeeklyRent)

Based on Figure 1, it can be observed that:

● Ignoring the possible outliers circled in Figure 1, the residuals are fairly evenly
distributed against the dependent variable, WEEKLYRENT.
● This pattern of residuals suggests that the regression model does not have a
heteroskedasticity problem.

c) Non-normality
Table 6: Descriptive Statistics
                         N      Minimum     Maximum    Skewness                  Kurtosis (peakedness)
                                                       Statistic   Std. Error    Statistic   Std. Error
Standardized Residual    100    -2.34236    3.41423    .613        .241          1.282       .478
Valid N (listwise)       100

Figure 2: Histogram of Dependent Variable (WeeklyRent)

If the residuals are normally distributed, the histogram should have a normal profile, and the
skewness and kurtosis statistics should both be close to zero.

Based on Table 6, there are 100 observations, which is a fairly small sample size. The skewness
statistic is 0.613, which is moderately larger than 0; this indicates that the distribution is
skewed to the right. The kurtosis of the distribution of residuals is 1.282, which is larger than
0; this shows that the distribution is more peaked than a normal distribution.

Based on Figure 2, the histogram shows a normal profile with a bell-shaped curve. In addition,
the histogram is relatively symmetric. Therefore, the residuals are approximately normally
distributed.

Conclusion
Based on Tables 5 and 6 and Figures 1 and 2, we can conclude that the regression model shows
no non-normality, heteroskedasticity or multi-collinearity problem.

Question 7
The estimated equation for the regression model is:
WEEKLYRENT= 271.904 + 21.538 BEDROOMS + 62.796 BATHROOMS + 15.533
GARAGE – 35.011 LOCATION + 36.390 TYPE

Based on the information given in Question 7, the values of the variables are:

TYPE = 1 (House)
LOCATION = 0 (Suburb)
BEDROOMS = 5 (number of bedrooms)
BATHROOMS = 3 (number of bathrooms)
GARAGE = 3 (car spaces)

Substituting these values into the estimated equation gives:

WEEKLYRENT = 271.904 + 21.538 (5) + 62.796 (3) + 15.533 (3) – 35.011 (0) + 36.390 (1)
= $650.971
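The substitution can be reproduced with a small function. The coefficients are those of the estimated equation; the function name `predict_weekly_rent` and its parameter names are illustrative.

```python
# Estimated coefficients from Table 1.
COEFFICIENTS = {
    "constant": 271.904,
    "bedrooms": 21.538,
    "bathrooms": 62.796,
    "garage": 15.533,
    "location": -35.011,   # 1 = City, 0 = Suburb
    "type": 36.390,        # 1 = House, 0 = Unit
}


def predict_weekly_rent(bedrooms, bathrooms, garage, location, type_):
    """Evaluate the estimated regression equation for one property."""
    values = {
        "bedrooms": bedrooms,
        "bathrooms": bathrooms,
        "garage": garage,
        "location": location,
        "type": type_,
    }
    return COEFFICIENTS["constant"] + sum(
        COEFFICIENTS[name] * value for name, value in values.items()
    )


# The property described in Question 7: a suburban house with
# 5 bedrooms, 3 bathrooms and 3 car spaces.
predicted = predict_weekly_rent(bedrooms=5, bathrooms=3, garage=3, location=0, type_=1)
print(f"Predicted weekly rent: ${predicted:.3f}")
```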

Question 8
The goodness of fit of the model is considered good, as the adjusted R² shows that the five
variables (BEDROOMS, BATHROOMS, GARAGE, LOCATION and TYPE of property) explain
45.6% of the variation in the weekly rent of properties. The remaining 54.4% is unexplained
and may be due to other factors that affect the weekly rent but are not included in the model.
Of the five variables, three (BEDROOMS, BATHROOMS and LOCATION) are statistically
significant at the 5% level in explaining the weekly rent of properties in the western suburbs of
Melbourne. We consider these variables good predictors of the weekly rent. A greater number
of bedrooms and bathrooms raises the weekly rent, as the property is more spacious and
convenient and can accommodate more people. Therefore, the real estate agent is able to set a
higher weekly rent for such properties. In addition, the weekly rent of a property in a suburban
area is higher than that of a comparable property in a city area. This is because the western
suburbs of Melbourne are close to the city of Melbourne and are mainly residential, so demand
is high: many people want to live there, close to their workplaces. Therefore, the real estate
agent can set a higher price in the western suburbs of Melbourne when competing with other
real estate agents, as they are the price setters for the property.
In conclusion, this information can be very useful to the real estate agent in determining the
weekly rent of a property.

PART B

Question 9 & Question 10

Discriminant Model (1)

Table 7 shows the results for group statistics.

Table 7: Group Statistics

Subscriber    Variable      Mean     Std. Deviation    Valid N (listwise)
                                                       Unweighted    Weighted
Otherwise     Age           48.76    11.296            66            66.000
              Income        47.02    7.449             66            66.000
              Marital       .50      .504              66            66.000
              Profession    .21      .412              66            66.000
              Gender        .62      .489              66            66.000
Subscriber    Age           29.30    6.680             84            84.000
              Income        52.70    7.133             84            84.000
              Marital       .64      .482              84            84.000
              Profession    .20      .404              84            84.000
              Gender        .76      .428              84            84.000
Total         Age           37.86    13.208            150           150.000
              Income        50.20    7.783             150           150.000
              Marital       .58      .495              150           150.000
              Profession    .21      .406              150           150.000
              Gender        .70      .460              150           150.000

Based on Table 7, it can be observed that:

● The means of AGE and INCOME look different between Subscriber and Otherwise, so
these variables may be predictors.
● The means of MARITAL, PROFESSION and GENDER look similar between Subscriber
and Otherwise. Therefore, these variables may not be predictors.

Table 8 shows the tests of equality of group means.

Table 8: Tests of Equality of Group Means

Wilks' Lambda F df1 df2 Sig.

Age .462 172.655 1 148 .000


Income .868 22.595 1 148 .000
Marital .979 3.120 1 148 .079
Profession 1.000 .021 1 148 .885
Gender .977 3.519 1 148 .063

The null hypotheses are as follows:

(a) The mean of AGE is the same for the Subscriber and Otherwise groups.
(b) The mean of INCOME is the same for the Subscriber and Otherwise groups.
(c) The mean of MARITAL is the same for the Subscriber and Otherwise groups.
(d) The mean of PROFESSION is the same for the Subscriber and Otherwise groups.
(e) The mean of GENDER is the same for the Subscriber and Otherwise groups.

The corresponding alternative hypotheses are as follows:

(a) The mean of AGE is not the same for the Subscriber and Otherwise groups.
(b) The mean of INCOME is not the same for the Subscriber and Otherwise groups.
(c) The mean of MARITAL is not the same for the Subscriber and Otherwise groups.
(d) The mean of PROFESSION is not the same for the Subscriber and Otherwise groups.
(e) The mean of GENDER is not the same for the Subscriber and Otherwise groups.

Based on Table 8, it can be observed that:


(a) Age
Decision:
● The significance value of AGE is 0.000, which is smaller than 0.05, the level of
significance for the test.
● Therefore, we reject the null hypothesis that the mean of AGE is the same for the
Subscriber and Otherwise groups.
● The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
● At 5% significance level, we conclude that AGE is a useful predictor.

(b) Income
Decision:
● The significance value of INCOME is 0.000, which is smaller than 0.05, the level of
significance for the test.
● Therefore, we reject the null hypothesis that the mean of INCOME is the same for the
Subscriber and Otherwise groups.
● The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
● At 5% significance level, we conclude that INCOME is a useful predictor.

(c) Marital
Decision:
● The significance value of MARITAL is 0.079, which is larger than 0.05, the level of
significance for the test.
● Therefore, we do not reject the null hypothesis that the mean of MARITAL is the same
for the Subscriber and Otherwise groups.
● The significance value of 0.079 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
● At 5% significance level, we conclude that MARITAL is not a useful predictor.

(d) Profession
Decision:
● The significance value of PROFESSION is 0.885, which is larger than 0.05, the level of
significance for the test.
● Therefore, we do not reject the null hypothesis that the mean of PROFESSION is the
same for the Subscriber and Otherwise groups.
● The significance value of 0.885 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
● At 5% significance level, we conclude that PROFESSION is not a useful predictor.

(e) Gender
Decision:
● The significance value of GENDER is 0.063, which is larger than 0.05, the level of
significance for the test.
● Therefore, we do not reject the null hypothesis that the mean of GENDER is the same
for the Subscriber and Otherwise groups.
● The significance value of 0.063 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
● At 5% significance level, we conclude that GENDER is not a useful predictor.

Table 9 shows the correlation coefficients (pooled within-groups matrices).


Table 9: Pooled Within-Groups Matrices

Correlation    Age      Income    Marital    Profession    Gender
Age            1.000    .004      .108       -.012         -.023
Income         .004     1.000     -.095      -.101         .241
Marital        .108     -.095     1.000      -.301         -.019
Profession     -.012    -.101     -.301      1.000         -.024
Gender         -.023    .241      -.019      -.024         1.000

Based on Table 9, it can be observed that:

● The largest absolute correlation between a pair of predictors is that between
PROFESSION and MARITAL, which is 0.301. This is relatively small, so there is no
multi-collinearity problem.

Table 10 shows canonical correlation.
Table 10: Eigenvalues

Function    Eigenvalue    % of Variance    Cumulative %    Canonical Correlation
1           1.429 (a)     100.0            100.0           .767
a. First 1 canonical discriminant functions were used in the analysis.

Based on Table 10, it can be observed that:

● The discriminant function explains 58.83% (0.767² = 0.5883) of the variation in the model.
● This large value suggests that the predictor variables are important.
● The eigenvalue of 1.429 is relatively large, which supports the usefulness of the
model.
● We can conclude that the model may be useful because it explains a relatively large
percentage of the variation in the whole model.

Table 11 shows the test of the discriminant function.


Table 11: Wilks' Lambda

Test of Function(s) Wilks' Lambda Chi-square df Sig.

1 .412 129.143 5 .000

H0: The discriminant function is not significant.


H1: The discriminant function is significant.

Decision:
● Based on Table 11, the significance value of the test is 0.000, which is smaller than 0.05,
the level of significance for the test.
● Therefore, we reject the null hypothesis that the discriminant function is not significant.
● The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
● At 5% significance level, we have sufficient evidence to conclude that the discriminant
function is significant.
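For a single discriminant function, the eigenvalue in Table 10 determines both the canonical correlation and the Wilks' Lambda in Table 11. The sketch below checks those standard identities against the reported values; the variable names are illustrative, and small differences are expected because the eigenvalue is rounded to three decimals.

```python
import math

eigenvalue = 1.429  # from Table 10

# Canonical correlation: sqrt(lambda / (1 + lambda)).
canonical_correlation = math.sqrt(eigenvalue / (1 + eigenvalue))

# Share of variation explained: squared canonical correlation.
explained_share = canonical_correlation ** 2

# Wilks' Lambda for one function: 1 / (1 + lambda).
wilks_lambda = 1 / (1 + eigenvalue)

print(f"Canonical correlation = {canonical_correlation:.3f}")
print(f"Explained share       = {explained_share:.4f}")
print(f"Wilks' Lambda         = {wilks_lambda:.3f}")
```

Note that the explained share and Wilks' Lambda sum to one, so a small Lambda and a large squared canonical correlation are two views of the same result.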

Table 12 shows the standardized canonical discriminant function coefficients.

Table 12: Standardized Canonical Discriminant Function Coefficients

Function

Age .934
Income -.362
Marital -.288
Profession -.102
Gender -.028

Based on Table 12, it can be interpreted that:

● The coefficient of AGE is 0.934, which is the largest absolute coefficient of the five
variables. Therefore, AGE is the most important variable.
● The absolute coefficient of INCOME is 0.362, which is the second largest of the five
variables. Therefore, INCOME is the second most important variable.
● The absolute coefficient of MARITAL is 0.288, which is the third largest of the five
variables. Therefore, MARITAL is the third most important variable.
● The absolute coefficient of PROFESSION is 0.102, which is the fourth largest of the
five variables. Therefore, PROFESSION is the fourth most important variable.
● The absolute coefficient of GENDER is 0.028, which is the smallest of the five
variables. Therefore, GENDER is the least important variable.
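The ranking of the predictors by the absolute size of their standardized coefficients can be reproduced directly from Table 12. This is a minimal sketch with illustrative variable names.

```python
# Standardized canonical discriminant function coefficients (Table 12).
standardized = {
    "Age": 0.934,
    "Income": -0.362,
    "Marital": -0.288,
    "Profession": -0.102,
    "Gender": -0.028,
}

# Rank by absolute value: the sign shows direction, not importance.
ranking = sorted(standardized, key=lambda name: abs(standardized[name]), reverse=True)

for position, name in enumerate(ranking, start=1):
    print(f"{position}. {name:<10} |coefficient| = {abs(standardized[name]):.3f}")
```

The ordering places Age first and Gender last.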

Table 13 shows the unstandardized canonical discriminant function coefficients.

Table 13: Canonical Discriminant Function Coefficients

Function

Age .104
Income -.050
Marital -.587
Profession -.251
Gender -.061
(Constant) -.996

Unstandardized coefficients

The discriminant function is given by:


Subscriber = β0 + β1 AGE + β2 INCOME + β3 MARITAL + β4 PROFESSION + β5 GENDER

The estimated equation from Table 13 is:


Subscriber = - 0.996 + 0.104 AGE - 0.050 INCOME - 0.587 MARITAL - 0.251 PROFESSION -
0.061 GENDER

Table 14 shows the unstandardized canonical discriminant function evaluated at group means.

Table 14: Functions at Group Centroids

Subscriber Function

Otherwise 1.340
Subscriber -1.053

Unstandardized canonical discriminant functions evaluated at group means

Based on Table 14, the average centroid (mid-point) can be calculated as follows:

Average centroid = (1.340 + (–1.053)) / 2
= 0.287 / 2
= 0.1435

Table 15 shows the hit ratio classification results.

Table 15: Classification Results (a, b)

                                                    Predicted Group Membership
Sample                Measure    Subscriber         Otherwise    Subscriber      Total
Cases Selected        Count      Otherwise          55           11              66
(Original)                       Subscriber         6            78              84
                      %          Otherwise          83.3         16.7            100.0
                                 Subscriber         7.1          92.9            100.0
Cases Not Selected    Count      Otherwise          10           10              20
(Original)                       Subscriber         3            27              30
                      %          Otherwise          50.0         50.0            100.0
                                 Subscriber         10.0         90.0            100.0

a. 88.7% of selected original grouped cases correctly classified.
b. 74.0% of unselected original grouped cases correctly classified.
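The two percentages in the footnotes are hit ratios: correctly classified cases divided by all cases in the sample. A minimal sketch using the counts from Table 15 follows; the function name `hit_ratio` and the nested-dictionary layout are illustrative.

```python
def hit_ratio(confusion):
    """Percentage of cases whose predicted group equals the actual group.

    `confusion[actual][predicted]` holds the number of cases.
    """
    correct = sum(confusion[group][group] for group in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total * 100


# Counts copied from Table 15 (actual group -> predicted group -> count).
analysis_sample = {
    "Otherwise": {"Otherwise": 55, "Subscriber": 11},
    "Subscriber": {"Otherwise": 6, "Subscriber": 78},
}
holdout_sample = {
    "Otherwise": {"Otherwise": 10, "Subscriber": 10},
    "Subscriber": {"Otherwise": 3, "Subscriber": 27},
}

analysis_hit_ratio = hit_ratio(analysis_sample)
holdout_hit_ratio = hit_ratio(holdout_sample)

print(f"Analysis sample hit ratio: {analysis_hit_ratio:.1f}%")
print(f"Held-out sample hit ratio: {holdout_hit_ratio:.1f}%")
```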

Based on Table 15, it can be interpreted that:

● The hit ratio for the analysis sample is 88.7%, while the hit ratio for the held-out sample
is 74.0%.
● The hit ratio for the analysis sample is relatively high, as it is above 80%.
● On the other hand, the hit ratio for the held-out sample is lower, as it is below 80%.
● This indicates that the discriminant function classifies new cases less well, so it does
not appear to be very satisfactory.

Question 12

The estimated discriminant function is:


Subscriber = - 0.996 + 0.104 AGE - 0.050 INCOME - 0.587 MARITAL - 0.251 PROFESSION -
0.061 GENDER

Based on the information given in Question 12, the following are:


Age: 40 years old

Income: $ 65,000 / $65K

Marital: 0 (Single)

Profession: 1 (Full-time)

Gender: 1 (Male)

Substituting these values into the estimated function gives:

Subscriber = - 0.996 + 0.104 (40) - 0.050 (65) - 0.587 (0) - 0.251 (1) - 0.061 (1)
= - 0.398

From the estimation above, D = - 0.398, which is smaller than the average centroid of 0.1435.
The score therefore lies on the same side of the cut-off as the Subscriber centroid (-1.053)
rather than the Otherwise centroid (1.340). Therefore, a 40-year-old single male in full-time
employment who earns an annual income of $65,000 is likely to be a subscriber to the online
newspaper.
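The classification step can be sketched as follows, under the assumption that a case is assigned to the group whose centroid (Table 14) is nearest to its discriminant score, which is equivalent to comparing the score with the mid-point cut-off of 0.1435. The coefficients are from Table 13; the function names are illustrative. The score is a unitless index, not a dollar amount.

```python
# Unstandardized coefficients (Table 13) and group centroids (Table 14).
COEFFICIENTS = {
    "constant": -0.996,
    "age": 0.104,
    "income": -0.050,      # income in $'000
    "marital": -0.587,
    "profession": -0.251,
    "gender": -0.061,
}
CENTROIDS = {"Otherwise": 1.340, "Subscriber": -1.053}


def discriminant_score(age, income, marital, profession, gender):
    """Evaluate the estimated discriminant function for one person."""
    values = {
        "age": age,
        "income": income,
        "marital": marital,
        "profession": profession,
        "gender": gender,
    }
    return COEFFICIENTS["constant"] + sum(
        COEFFICIENTS[name] * value for name, value in values.items()
    )


def classify(score):
    """Assign the group whose centroid is closest to the score."""
    return min(CENTROIDS, key=lambda group: abs(score - CENTROIDS[group]))


score = discriminant_score(age=40, income=65, marital=0, profession=1, gender=1)
cutoff = sum(CENTROIDS.values()) / 2
predicted_group = classify(score)

print(f"D = {score:.3f}, cut-off = {cutoff:.4f}, predicted group = {predicted_group}")
```

Because the Subscriber centroid is negative and the Otherwise centroid is positive, a score below the cut-off falls on the Subscriber side.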

Part C (Factor Analysis)
Question 13 & Question 14

Table 16 shows mean and standard deviations of each variable for 200 observations from a
random sample of persons over the age of 18 who are in full-time or part-time employment.

Table 16: Descriptive Statistics

Mean Std. Deviation Analysis N

Informative 2.48 1.378 200


Current 2.43 1.180 200
Economic 2.91 1.366 200
Mode 3.69 1.233 200
Payment 3.05 .923 200
Useful 3.79 1.235 200

Based on Table 16, it can be observed that:

● Informative and Current have close mean values (2.48 and 2.43). Therefore, these two
variables may belong to one factor.
● Economic and Payment have close mean values (2.91 and 3.05). Therefore, these two
variables may belong to one factor.
● Mode and Useful have close mean values (3.69 and 3.79). Therefore, these two
variables may belong to one factor.

Table 17 shows the correlation matrix between variables.

Table 17: Correlation Matrix

Correlation    Informative    Current    Economic    Mode     Payment    Useful
Informative    1.000          .695       .038        -.166    -.043      -.128
Current        .695           1.000      .166        .051     .063       .002
Economic       .038           .166       1.000       -.044    .466       -.131
Mode           -.166          .051       -.044       1.000    .067       .767
Payment        -.043          .063       .466        .067     1.000      -.052
Useful         -.128          .002       -.131       .767     -.052      1.000

Based on Table 17, it can be observed that:

● The correlation between Mode and Useful is 0.767, which is very high.
● Therefore, factor analysis is recommended due to the high correlation between variables.

Table 18 shows the total variance explained by the factors.
Table 18: Total Variance Explained

             Initial Eigenvalues                          Rotation Sums of Squared Loadings
Component    Total    % of Variance    Cumulative %       Total    % of Variance    Cumulative %
1            1.915    31.925           31.925             1.795    29.911           29.911
2            1.628    27.137           59.062             1.701    28.345           58.256
3            1.440    23.997           83.059             1.488    24.803           83.059
4            .523     8.723            91.781
5            .287     4.789            96.570
6            .206     3.430            100.000

Extraction Method: Principal Component Analysis.

Based on Table 18, it can be observed that:

● Three components have eigenvalues greater than 1, which suggests the presence of three
factors.
● The first factor explains 31.925% of the variation in the responses.
● The second factor explains 27.137% of the variation in the responses.
● The third factor explains 23.997% of the variation in the responses.
● In total, the first three factors explain 83.059% of the variation in the responses.

Figure 3: Scree Plot of the 6 Factors

Based on Figure 3, the scree plot profile shows that three factors are doing most of the work,
since the plot levels out after the first three factors.
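The eigenvalue-greater-than-one rule and the percentages of variance in Table 18 can be reproduced from the initial eigenvalues. In this sketch each percentage is the eigenvalue divided by the number of variables (six); results differ from Table 18 in the second decimal because the eigenvalues are rounded. Variable names are illustrative.

```python
# Initial eigenvalues from Table 18.
eigenvalues = [1.915, 1.628, 1.440, 0.523, 0.287, 0.206]
n_variables = len(eigenvalues)

# Retain components whose eigenvalue exceeds 1.
retained = [value for value in eigenvalues if value > 1]

# Percentage of total variance explained by each component.
percent_of_variance = [value / n_variables * 100 for value in eigenvalues]
cumulative_retained = sum(percent_of_variance[: len(retained)])

print(f"Components retained: {len(retained)}")
for index, percent in enumerate(percent_of_variance[: len(retained)], start=1):
    print(f"Factor {index}: {percent:.2f}% of variance")
print(f"Cumulative for retained factors: {cumulative_retained:.2f}%")
```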

Table 19 shows the rotated component matrix of the factors.

Table 19: Rotated Component Matrixa

Component

1 2 3

Informative -.143 .916 -.057


Current .079 .921 .122
Economic -.088 .098 .847
Mode .943 -.031 .050
Payment .043 -.038 .861
Useful .932 -.031 -.098

Extraction Method: Principal Component Analysis.


Rotation Method: Varimax with Kaiser
Normalization.
a. Rotation converged in 4 iterations.
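One way to read the rotated matrix is to assign each variable to the component on which its absolute loading is largest. The sketch below does this for the loadings in Table 19. The labels 'Cost', 'Educational' and 'Functional' are the names this report gives to Factors 1, 2 and 3; the helper `dominant_factor` is illustrative.

```python
# Rotated loadings from Table 19: variable -> (Factor 1, Factor 2, Factor 3).
loadings = {
    "Informative": (-0.143, 0.916, -0.057),
    "Current": (0.079, 0.921, 0.122),
    "Economic": (-0.088, 0.098, 0.847),
    "Mode": (0.943, -0.031, 0.050),
    "Payment": (0.043, -0.038, 0.861),
    "Useful": (0.932, -0.031, -0.098),
}
# Factor labels as assigned in this report.
FACTOR_NAMES = ("Cost", "Educational", "Functional")


def dominant_factor(row):
    """Return the label of the factor with the largest absolute loading."""
    index = max(range(len(row)), key=lambda i: abs(row[i]))
    return FACTOR_NAMES[index]


assignment = {variable: dominant_factor(row) for variable, row in loadings.items()}

for variable, factor in assignment.items():
    print(f"{variable:<12} -> {factor}")
```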

Based on Table 19, it can be observed that:

● The variables Mode and Useful load heavily onto Factor 1.
● The variables Informative, Current, Economic and Payment do not load heavily onto
Factor 1.
● This suggests that Factor 1 is something like a ‘Cost’ factor.

● The variables Informative and Current load heavily onto Factor 2.
● The variables Economic, Mode, Payment and Useful do not load heavily onto Factor 2.
● This suggests that Factor 2 is something like an ‘Educational’ factor.

● The variables Economic and Payment load heavily onto Factor 3.
● The variables Informative, Current, Mode and Useful do not load heavily onto Factor 3.
● This suggests that Factor 3 is something like a ‘Functional’ factor.

Question 15
Discriminant Model (2)
Table 20 shows the group statistics.
Table 20: Group Statistics

Subscriber    Variable       Mean          Std. Deviation    Valid N (listwise)
                                                             Unweighted    Weighted
Otherwise     Age            48.7575758    11.29609954       66            66.000
              Income         47.0151515    7.44930028        66            66.000
              Marital        .5000000      .50383147         66            66.000
              Profession     .2121212      .41194292         66            66.000
              Gender         .6212121      .48880235         66            66.000
              Cost           .2883733      .80274019         66            66.000
              Educational    -.3671864     .84978613         66            66.000
              Functional     .0411726      .90975376         66            66.000
Subscriber    Age            29.2976190    6.67991011        84            84.000
              Income         52.7023810    7.13346054        84            84.000
              Marital        .6428571      .48203527         84            84.000
              Profession     .2023810      .40418777         84            84.000
              Gender         .7619048      .42847580         84            84.000
              Cost           -.3209875     1.05707360        84            84.000
              Educational    .2655815      .98969432         84            84.000
              Functional     -.0176387     .98875186         84            84.000
Total         Age            37.8600000    13.20820876       150           150.000
              Income         50.2000000    7.78313441        150           150.000
              Marital        .5800000      .49521197         150           150.000
              Profession     .2066667      .40627076         150           150.000
              Gender         .7000000      .45979278         150           150.000
              Cost           -.0528687     .99783006         150           150.000
              Educational    -.0128364     .97978065         150           150.000
              Functional     .0082383      .95210300         150           150.000

Based on Table 20, it can be observed that:

● The means of AGE, INCOME, COST and EDUCATIONAL look different between
Subscriber and Otherwise, so these variables may be predictors.
● The means of MARITAL, PROFESSION, GENDER and FUNCTIONAL look similar
between Subscriber and Otherwise. Therefore, these variables may not be predictors.

Table 21 shows the test of equality of group means.
Table 21: Tests of Equality of Group Means

Wilks' Lambda F df1 df2 Sig.

Age .462 172.655 1 148 .000


Income .868 22.595 1 148 .000
Marital .979 3.120 1 148 .079
Profession 1.000 .021 1 148 .885
Gender .977 3.519 1 148 .063
Cost .907 15.087 1 148 .000
Educational .897 17.079 1 148 .000
Functional .999 .140 1 148 .709

The null hypotheses are as follows:


(a) The mean of AGE for subscriber and otherwise are the same.
(b) The mean of INCOME for subscriber and otherwise are the same.
(c) The mean of MARITAL for subscriber and otherwise are the same.
(d) The mean of PROFESSION for subscriber and otherwise are the same.
(e) The mean of GENDER for subscriber and otherwise are the same.
(f) The mean of COST for subscriber and otherwise are the same.
(g) The mean of EDUCATIONAL for subscriber and otherwise are the same.
(h) The mean of FUNCTIONAL for subscriber and otherwise are the same.

The corresponding alternative hypotheses are as follows:

(a) The mean of AGE is not the same for Subscriber and Otherwise.
(b) The mean of INCOME is not the same for Subscriber and Otherwise.
(c) The mean of MARITAL is not the same for Subscriber and Otherwise.
(d) The mean of PROFESSION is not the same for Subscriber and Otherwise.
(e) The mean of GENDER is not the same for Subscriber and Otherwise.
(f) The mean of COST is not the same for Subscriber and Otherwise.
(g) The mean of EDUCATIONAL is not the same for Subscriber and Otherwise.
(h) The mean of FUNCTIONAL is not the same for Subscriber and Otherwise.

Based on Table 21, it can be observed that:
(a) Age
Decision:
• The significance value of AGE is 0.000, which is smaller than 0.05, the level of
significance for the test.
• Therefore, we reject the null hypothesis that the mean of AGE is the same for
Subscriber and Otherwise.
• The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
• At the 5% significance level, we conclude that AGE is a useful predictor.

(b) Income
Decision:
• The significance value of INCOME is 0.000, which is smaller than 0.05, the level of
significance for the test.
• Therefore, we reject the null hypothesis that the mean of INCOME is the same for
Subscriber and Otherwise.
• The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
• At the 5% significance level, we conclude that INCOME is a useful predictor.

(c) Marital
Decision:
• The significance value of MARITAL is 0.079, which is larger than 0.05, the level of
significance for the test.
• Therefore, we do not reject the null hypothesis that the mean of MARITAL is the same
for Subscriber and Otherwise.
• The significance value of 0.079 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
• At the 5% significance level, we conclude that MARITAL is not a useful predictor.

(d) Profession
Decision:
• The significance value of PROFESSION is 0.885, which is larger than 0.05, the level of
significance for the test.
• Therefore, we do not reject the null hypothesis that the mean of PROFESSION is the
same for Subscriber and Otherwise.
• The significance value of 0.885 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
• At the 5% significance level, we conclude that PROFESSION is not a useful predictor.

(e) Gender
Decision:
• The significance value of GENDER is 0.063, which is larger than 0.05, the level of
significance for the test.
• Therefore, we do not reject the null hypothesis that the mean of GENDER is the same
for Subscriber and Otherwise.
• The significance value of 0.063 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
• At the 5% significance level, we conclude that GENDER is not a useful predictor.

(f) Cost
Decision:
• The significance value of COST is 0.000, which is smaller than 0.05, the level of
significance for the test.
• Therefore, we reject the null hypothesis that the mean of COST is the same for
Subscriber and Otherwise.
• The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
• At the 5% significance level, we conclude that COST is a useful predictor.

(g) Educational
Decision:
• The significance value of EDUCATIONAL is 0.000, which is smaller than 0.05, the level
of significance for the test.
• Therefore, we reject the null hypothesis that the mean of EDUCATIONAL is the same
for Subscriber and Otherwise.
• The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
• At the 5% significance level, we conclude that EDUCATIONAL is a useful predictor.

(h) Functional
Decision:
• The significance value of FUNCTIONAL is 0.709, which is larger than 0.05, the level of
significance for the test.
• Therefore, we do not reject the null hypothesis that the mean of FUNCTIONAL is the
same for Subscriber and Otherwise.
• The significance value of 0.709 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
• At the 5% significance level, we conclude that FUNCTIONAL is not a useful predictor.

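The eight decisions from Table 21 all follow the same rule: reject the null hypothesis when the significance value is below 0.05. As a minimal sketch of that rule (the variable names are ours; the significance values are copied from Table 21), the predictors can be sorted into the two groups as follows:

```python
ALPHA = 0.05  # level of significance used for every test in Table 21

# Significance values copied from Table 21.
sig_values = {
    "Age": 0.000,
    "Income": 0.000,
    "Marital": 0.079,
    "Profession": 0.885,
    "Gender": 0.063,
    "Cost": 0.000,
    "Educational": 0.000,
    "Functional": 0.709,
}

# Reject H0 (equal group means) when the significance value is below ALPHA.
useful = [name for name, p in sig_values.items() if p < ALPHA]
not_useful = [name for name, p in sig_values.items() if p >= ALPHA]

print("Reject H0:", useful)
print("Do not reject H0:", not_useful)
```

This agrees with the conclusions for (a) to (h): AGE, INCOME, COST and EDUCATIONAL are the useful predictors at the 5% level.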
Table 22 shows the correlation coefficients (pooled within-groups matrices).
Table 22: Pooled Within-Groups Matrices

Correlation    Age     Income   Marital   Profession   Gender   Cost    Educational   Functional
Age            1.000   .004     .108      -.012        -.023    .014    -.042         .030
Income         .004    1.000    -.095     -.101        .241     .163    .075          -.039
Marital        .108    -.095    1.000     -.301        -.019    -.079   -.266         -.099
Profession     -.012   -.101    -.301     1.000        -.024    .028    .031          .131
Gender         -.023   .241     -.019     -.024        1.000    .022    .159          .015
Cost           .014    .163     -.079     .028         .022     1.000   .084          -.046
Educational    -.042   .075     -.266     .031         .159     .084    1.000         -.013
Functional     .030    -.039    -.099     .131         .015     -.046   -.013         1.000

Based on Table 22, it can be observed that:

• The largest absolute correlation is found between PROFESSION and MARITAL, which is
0.301. As this largest correlation is relatively small, there is no problem of
multi-collinearity.
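
The largest off-diagonal correlation can also be located programmatically. The following is only an illustrative sketch using the values from Table 22; the variable names are ours.

```python
names = ["Age", "Income", "Marital", "Profession",
         "Gender", "Cost", "Educational", "Functional"]

# Pooled within-groups correlation matrix copied from Table 22.
corr = [
    [1.000, 0.004, 0.108, -0.012, -0.023, 0.014, -0.042, 0.030],
    [0.004, 1.000, -0.095, -0.101, 0.241, 0.163, 0.075, -0.039],
    [0.108, -0.095, 1.000, -0.301, -0.019, -0.079, -0.266, -0.099],
    [-0.012, -0.101, -0.301, 1.000, -0.024, 0.028, 0.031, 0.131],
    [-0.023, 0.241, -0.019, -0.024, 1.000, 0.022, 0.159, 0.015],
    [0.014, 0.163, -0.079, 0.028, 0.022, 1.000, 0.084, -0.046],
    [-0.042, 0.075, -0.266, 0.031, 0.159, 0.084, 1.000, -0.013],
    [0.030, -0.039, -0.099, 0.131, 0.015, -0.046, -0.013, 1.000],
]

# Consider each pair of different variables once (upper triangle only).
pairs = [
    (abs(corr[i][j]), names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
]
largest = max(pairs)
print(largest)
```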

Table 23 shows the canonical correlation.

Table 23: Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          1.705a       100.0           100.0          .794

a. First 1 canonical discriminant functions were used in the analysis.

Based on Table 23, it can be observed that:

• The discriminant function explains 63.04% (0.794² = 0.6304) of the variation in the
dependent variable.
• The large value suggests the relative importance of the predictor variables.
• The eigenvalue of 1.705 is relatively large, which supports the usefulness of the
model.
• We can conclude that the model may be useful because it explains a relatively large
percentage of the variation in the dependent variable.
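
With a single discriminant function, the canonical correlation and the eigenvalue in Table 23 are linked by the standard relation: canonical correlation = sqrt(eigenvalue / (1 + eigenvalue)). The sketch below is an illustrative check of that relation only; it starts from the rounded eigenvalue, so the last decimals may differ from the SPSS output.

```python
import math

eigenvalue = 1.705  # from Table 23

# Canonical correlation implied by the eigenvalue.
canonical_corr = math.sqrt(eigenvalue / (1.0 + eigenvalue))

# Squared canonical correlation: proportion of variation explained.
explained = canonical_corr ** 2

print(f"Canonical correlation: {canonical_corr:.3f}")
print(f"Variation explained:   {explained * 100:.2f}%")
```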

Table 24 shows the test of the discriminant function.
Table 24: Wilks' Lambda

Test of Function(s) Wilks' Lambda Chi-square df Sig.

1 .370 143.273 8 .000

H0: The discriminant function is not significant.


H1: The discriminant function is significant.

Decision:
• Based on Table 24, the significance value of the test is 0.000, which is smaller than 0.05,
the level of significance for the test.
• Therefore, we reject the null hypothesis that the discriminant function is not significant.
• The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
• At the 5% significance level, we have sufficient evidence to conclude that the discriminant
function is significant.
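
The figures in Table 24 can be approximately reproduced from the eigenvalue in Table 23. The sketch below assumes that Wilks' Lambda equals 1 / (1 + eigenvalue) for a single function and that the chi-square is Bartlett's approximation, -(n - 1 - (p + g) / 2) x ln(Lambda), with p(g - 1) degrees of freedom. That SPSS uses exactly this formula is our assumption; n = 150 is the analysis sample size, p = 8 predictors and g = 2 groups.

```python
import math

n_cases = 150      # analysis sample (66 Otherwise + 84 Subscriber)
n_predictors = 8   # AGE ... FUNCTIONAL
n_groups = 2       # Subscriber and Otherwise
eigenvalue = 1.705 # from Table 23

# Wilks' Lambda for a single discriminant function.
wilks_lambda = 1.0 / (1.0 + eigenvalue)

# Bartlett's chi-square approximation (assumed, see lead-in).
chi_square = -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(wilks_lambda)
df = n_predictors * (n_groups - 1)

print(f"Wilks' Lambda = {wilks_lambda:.3f}")
print(f"Chi-square    = {chi_square:.3f} with df = {df}")
```

The small differences from Table 24 come from using the rounded eigenvalue.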

Table 25 shows the standardized canonical discriminant function coefficients.

Table 25: Standardized Canonical Discriminant Function Coefficients

               Function
Age            .848
Income         -.379
Marital        -.335
Profession     -.114
Gender         .027
Cost           .295
Educational    -.311
Functional     -.026

Based on Table 25, it can be observed that (ranking by absolute value):

• The coefficient of AGE is 0.848, which is the highest of the eight variables.
Therefore, AGE is the most important variable.
• The absolute coefficient of INCOME is 0.379, which is the second highest of the
eight variables. Therefore, INCOME is the second most important variable.
• The absolute coefficient of MARITAL is 0.335, which is the third highest of the
eight variables. Therefore, MARITAL is the third most important variable.
• The absolute coefficient of EDUCATIONAL is 0.311, which is the fourth highest of
the eight variables. Therefore, EDUCATIONAL is the fourth most important
variable.
• The coefficient of COST is 0.295, which is the fifth highest of the eight variables.
Therefore, COST is the fifth most important variable.
• The absolute coefficient of PROFESSION is 0.114, which is the sixth highest of the
eight variables. Therefore, PROFESSION is the sixth most important variable.
• The coefficient of GENDER is 0.027, which is the seventh highest of the eight
variables. Therefore, GENDER is the seventh most important variable.
• The absolute coefficient of FUNCTIONAL is 0.026, which is the smallest of the
eight variables. Therefore, FUNCTIONAL is the least important variable.
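
The ranking above orders the variables by the absolute value of their standardized coefficients. A minimal sketch of that ordering, with the coefficients copied from Table 25 and variable names of our own choosing:

```python
# Standardized canonical discriminant function coefficients (Table 25).
standardized = {
    "Age": 0.848,
    "Income": -0.379,
    "Marital": -0.335,
    "Profession": -0.114,
    "Gender": 0.027,
    "Cost": 0.295,
    "Educational": -0.311,
    "Functional": -0.026,
}

# Sort by absolute value, largest first; the sign only shows direction.
ranking = sorted(standardized, key=lambda name: abs(standardized[name]), reverse=True)

for position, name in enumerate(ranking, start=1):
    print(position, name, abs(standardized[name]))
```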

Table 26 shows the unstandardized canonical discriminant function coefficients.

Table 26: Canonical Discriminant Function Coefficients

Function

Age .094
Income -.052
Marital -.681
Profession -.279
Gender .060
Cost .309
Educational -.334
Functional -.027
(Constant) -.530

Unstandardized coefficients

The discriminant function is given by:


Subscriber = β0 + β1 AGE + β2 INCOME + β3 MARITAL + β4 PROFESSION + β5 GENDER +
β6 COST + β7 EDUCATIONAL + β8 FUNCTIONAL

The estimated equation from Table 26 is:


Subscriber = – 0.530 + 0.094 AGE – 0.052 INCOME – 0.681 MARITAL – 0.279 PROFESSION
+ 0.060 GENDER + 0.309 COST – 0.334 EDUCATIONAL – 0.027 FUNCTIONAL
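
As an illustration of how the estimated equation is applied, the sketch below evaluates it at the Total means reported in Table 20. The function name discriminant_score is ours. Because discriminant scores are centred on the overall mean, the result should be close to zero, apart from rounding of the coefficients.

```python
CONSTANT = -0.530

# Unstandardized coefficients copied from Table 26.
COEFFICIENTS = {
    "Age": 0.094,
    "Income": -0.052,
    "Marital": -0.681,
    "Profession": -0.279,
    "Gender": 0.060,
    "Cost": 0.309,
    "Educational": -0.334,
    "Functional": -0.027,
}


def discriminant_score(case):
    """Return the discriminant score for one case (dict of variable values)."""
    return CONSTANT + sum(COEFFICIENTS[name] * case[name] for name in COEFFICIENTS)


# Total (all 150 cases) means copied from Table 20.
total_means = {
    "Age": 37.86,
    "Income": 50.2,
    "Marital": 0.58,
    "Profession": 0.2066667,
    "Gender": 0.70,
    "Cost": -0.0528687,
    "Educational": -0.0128364,
    "Functional": 0.0082383,
}

score_at_means = discriminant_score(total_means)
print(f"Score at the total means: {score_at_means:.4f}")
```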

Table 27 shows the unstandardized canonical discriminant function evaluated at group means.
Table 27: Functions at Group Centroids

Subscriber Function

Otherwise 1.463
Subscriber -1.150

Unstandardized canonical discriminant functions evaluated at group means

Based on Table 27, the average centroid (mid-point) can be calculated as follows:
Average centroid = (1.463 – 1.150) / 2
= 0.313 / 2
= 0.1565
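
One way to use the average centroid is as a cutting score: a case whose discriminant score lies above it is closer to the Otherwise centroid (1.463), and a case below it is closer to the Subscriber centroid (-1.150). This simple cut-off rule is our illustrative assumption; the classification in SPSS is based on posterior probabilities and may not coincide with it for every case.

```python
# Group centroids copied from Table 27.
CENTROIDS = {"Otherwise": 1.463, "Subscriber": -1.150}

# Mid-point between the two centroids.
cutoff = sum(CENTROIDS.values()) / 2


def classify(score):
    """Assign a case to the group whose centroid lies on the same side of the cut-off."""
    return "Otherwise" if score > cutoff else "Subscriber"


print(f"Cut-off: {cutoff:.4f}")
print(classify(1.0))    # a score on the Otherwise side
print(classify(-1.0))   # a score on the Subscriber side
```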

Table 28 shows the hit ratio classification results.

Table 28: Classification Resultsa,b

                                                  Predicted Group Membership
                                     Subscriber   Otherwise   Subscriber   Total
Cases Selected       Original  Count  Otherwise    54          12           66
                                      Subscriber   4           80           84
                               %      Otherwise    81.8        18.2         100.0
                                      Subscriber   4.8         95.2         100.0
Cases Not Selected   Original  Count  Otherwise    12          8            20
                                      Subscriber   0           30           30
                               %      Otherwise    60.0        40.0         100.0
                                      Subscriber   .0          100.0        100.0

a. 89.3% of selected original grouped cases correctly classified.
b. 84.0% of unselected original grouped cases correctly classified.

Based on Table 28, it can be interpreted that:

• The hit ratio for the analysis sample is 89.3%, while the hit ratio for the held-out sample
is 84.0%.
• The hit ratio for the analysis sample is high as it is above 80%, almost reaching 90%.
• The hit ratio for the held-out sample is also relatively high as it is above 80%.
• This indicates that the discriminant function appears to be very satisfactory.

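The hit ratios in Table 28 are the correctly classified cases (the diagonal of each classification matrix) divided by the total number of cases. A minimal sketch of that arithmetic, using the counts from Table 28 and a helper name of our own:

```python
def hit_ratio(matrix):
    """Percentage of cases on the diagonal of a square classification matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Rows: actual Otherwise, actual Subscriber.
# Columns: predicted Otherwise, predicted Subscriber.
analysis_sample = [[54, 12], [4, 80]]
holdout_sample = [[12, 8], [0, 30]]

print(f"Analysis sample: {hit_ratio(analysis_sample):.1f}%")
print(f"Held-out sample: {hit_ratio(holdout_sample):.1f}%")
```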
Question 15

By comparing the two discriminant models, it can be observed that:

• Discriminant Model 1 explains 58.83% (0.767²) of the variation in the dependent
variable. In comparison, discriminant Model 2 explains 63.04% (0.794²) of the variation
in the dependent variable.
• The eigenvalue of discriminant Model 1 is 1.429. It is relatively large, which supports
the usefulness of the model. The eigenvalue of discriminant Model 2 is 1.705, which is
larger than that of Model 1 and also supports the usefulness of the model.
• We conclude that discriminant Model 1 may be useful because it explains a relatively
large percentage of the variation in the dependent variable.
• However, we conclude that discriminant Model 2 may be more useful because it explains
a larger percentage of the variation in the dependent variable than discriminant Model 1.
• Discriminant Model 1 shows that the hit ratio for the analysis sample is 88.7%, while the hit
ratio for the held-out sample is 74.0%. The hit ratio for the analysis sample is high as it is
above 80%. However, the hit ratio for the held-out sample is not as high, as it is lower
than 80%. This indicates that discriminant Model 1 does not appear to be very
satisfactory.
• Discriminant Model 2 shows that the hit ratio for the analysis sample is 89.3%, while the hit
ratio for the held-out sample is 84.0%. The hit ratio for the analysis sample is high as it is
above 80%, almost reaching 90%. The hit ratio for the held-out sample is also high as it is
above 80%. This indicates that discriminant Model 2 appears to be very satisfactory.
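
The comparison above can be summarised side by side. The sketch below is illustrative only: it takes the canonical correlations and hit ratios quoted above, and adds the drop in hit ratio from the analysis sample to the held-out sample, which is our own derived figure.

```python
# Figures quoted in the comparison of the two discriminant models.
models = {
    "Model 1": {"canonical_corr": 0.767, "analysis_hit": 88.7, "holdout_hit": 74.0},
    "Model 2": {"canonical_corr": 0.794, "analysis_hit": 89.3, "holdout_hit": 84.0},
}

summary = {}
for name, m in models.items():
    summary[name] = {
        # Squared canonical correlation as a percentage.
        "explained_pct": m["canonical_corr"] ** 2 * 100,
        # How much accuracy is lost on cases not used to build the model.
        "hit_ratio_drop": m["analysis_hit"] - m["holdout_hit"],
    }
    print(name,
          f"explained = {summary[name]['explained_pct']:.2f}%",
          f"drop = {summary[name]['hit_ratio_drop']:.1f} points")
```

Model 2 explains more variation and loses less accuracy on the held-out sample.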

Conclusion:

We conclude that discriminant Model 2 is the better model. Therefore, the use of
factor analysis has improved the discriminant analysis.
