BEO2255 Applied Statistics For Business Assignment 3: Name Vu Id

BEO2255
APPLIED STATISTICS FOR BUSINESS
ASSIGNMENT 3
NAME VU ID
BEH YANG CHENG 4520339
JAMIE OOI ZHI YUAN 4538241
TEE HUEY YEE 4520382
TUTOR’S NAME: MS. MOY TOW YOON
DUE DATE: 19th MAY 2016
DATE SUBMITTED: 19th MAY 2016
Part A
Question 1
The regression model is:
WEEKLYRENT = β0 + β1 BEDROOMS + β2 BATHROOMS + β3 GARAGE + β4 LOCATION + β5 TYPE + ε
LOCATION and TYPE are dummy variables. LOCATION = 1 denotes City (0 = Suburb) and TYPE = 1 denotes
House (0 = Unit), so Suburb and Unit are the base levels.
Table 1 shows the estimated coefficients for this regression model.
Table 1: Coefficients (a)
Model 1       B          Std. Error   Beta     t        Sig.
(Constant)    271.904    28.392                9.577    .000
Bedrooms       21.538     9.512        .231    2.264    .026
Bathrooms      62.796    13.831        .409    4.540    .000
Garage         15.533     9.844        .124    1.578    .118
Location      -35.011    17.208       -.165   -2.035    .045
Type           36.390    25.024        .139    1.454    .149
B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: WeeklyRent
The estimated equation for the regression model is:
WEEKLYRENT = 271.904 + 21.538 BEDROOMS + 62.796 BATHROOMS + 15.533 GARAGE - 35.011 LOCATION
+ 36.390 TYPE
Question 2
WEEKLYRENT = β0 + β1 BEDROOMS + β2 BATHROOMS + β3 GARAGE + β4 LOCATION + β5 TYPE + ε
Based on experience and theory, the a priori signs of the coefficients of the regression model are:
1. β1 should be positive. A property with more bedrooms is bigger and more spacious, so the
landlord can accommodate more tenants and the WEEKLYRENT of the property will be higher.
Therefore, the a priori sign of BEDROOMS is positive. From the SPSS output, the estimated sign
for BEDROOMS is positive, the same as the a priori sign.
2. β2 should be positive. When the number of bathrooms increases, tenants find the property more
convenient to live in, so the WEEKLYRENT of the property will be higher. Therefore, the a priori
sign of BATHROOMS is positive. From the SPSS output, the estimated sign for BATHROOMS is
positive, the same as the a priori sign.
3. β3 should be positive. When GARAGE (the number of car spaces) increases, more cars can be
accommodated, so the WEEKLYRENT of the property will be higher. Therefore, the a priori sign of
GARAGE is positive. From the SPSS output, the estimated sign for GARAGE is positive, the same
as the a priori sign.
4. β4 should be positive. When the property is located in the city, the weekly rent should be
higher because a city location is more convenient and its facilities are more efficient than
those in suburban locations. Therefore, the a priori sign of LOCATION is positive. However, from
the SPSS output, the estimated sign for LOCATION is negative, which differs from the a priori
sign.
5. β5 should be positive. When TYPE (the type of property) is a house, the property is bigger,
more spacious and can accommodate more people than a unit, so the WEEKLYRENT of the property
will be higher. Therefore, the a priori sign of TYPE is positive. From the SPSS output, the
estimated sign for TYPE is positive, the same as the a priori sign.
Question 3
Based on Table 1 and the coefficient values of the estimated model, it can be observed that:
BEDROOMS
Hypothesis:
H0: The number of bedrooms is not significant in explaining the weekly rent of properties.
H1: The number of bedrooms is significant in explaining the weekly rent of properties.
Decision:
1. The number of bedrooms in the property
● When the number of bedrooms in the property increases by one, the weekly rent increases by
A$21.538 on average, other things being constant.
● The p-value for BEDROOMS is 0.026, which is smaller than 0.05, the level of significance for
the test.
● The p-value of 0.026 indicates that there is sufficient evidence to reject the null hypothesis
at the 5% level of significance.
● Therefore, we reject the null hypothesis that the number of bedrooms is not significant in
explaining the weekly rent of properties.
Conclusion:
At the 5% significance level, there is sufficient evidence to conclude that the number of bedrooms
is significant in explaining the weekly rent of properties.
BATHROOMS
Hypothesis:
H0: The number of bathrooms is not significant in explaining the weekly rent of properties.
H1: The number of bathrooms is significant in explaining the weekly rent of properties.
Decision:
1. The number of bathrooms in the property
● When the number of bathrooms in the property increases by one, the weekly rent increases by
A$62.796 on average, other things being constant.
● The p-value for BATHROOMS is 0.000, which is smaller than 0.05, the level of significance for
the test.
● Therefore, we reject the null hypothesis that the number of bathrooms is not significant in
explaining the weekly rent of properties.
Conclusion:
At the 5% significance level, there is sufficient evidence to conclude that the number of bathrooms
is significant in explaining the weekly rent of properties.
GARAGE
Hypothesis:
H0: The number of car spaces (GARAGE) is not significant in explaining the weekly rent of properties.
H1: The number of car spaces (GARAGE) is significant in explaining the weekly rent of properties.
Decision:
1. The number of car spaces (GARAGE) in the property
● When the number of car spaces in the property increases by one, the weekly rent increases by
A$15.533 on average, other things being constant.
● The p-value for GARAGE is 0.118, which is larger than 0.05, the level of significance for the
test.
● The p-value of 0.118 indicates that there is insufficient evidence to reject the null hypothesis
at the 5% level of significance.
● Therefore, we do not reject the null hypothesis that the number of car spaces is not significant
in explaining the weekly rent of properties.
Conclusion:
At the 5% significance level, there is insufficient evidence to conclude that the number of car
spaces is significant in explaining the weekly rent of properties.
LOCATION
Hypothesis:
H0: The type of location is not significant in explaining the weekly rent of properties.
H1: The type of location is significant in explaining the weekly rent of properties.
Decision:
1. The type of location (City, Suburb)
● When the property is located in the city, the WEEKLYRENT is A$35.011 lower on average than for
a property in a suburb, other things being constant.
● The p-value for LOCATION is 0.045, which is smaller than 0.05, the level of significance for
the test.
● Therefore, we reject the null hypothesis that the type of location is not significant in
explaining the weekly rent of properties.
Conclusion:
At the 5% significance level, there is sufficient evidence to conclude that the type of location is
significant in explaining the weekly rent of properties.
TYPE OF PROPERTY
Hypothesis:
H0: The type of property is not significant in explaining the weekly rent of properties.
H1: The type of property is significant in explaining the weekly rent of properties.
Decision:
1. The type of property (House, Unit)
● When the property is a house, the WEEKLYRENT is A$36.390 higher on average than for a unit,
other things being constant.
● The p-value for the type of property is 0.149, which is larger than 0.05, the level of
significance for the test.
● The p-value of 0.149 indicates that there is insufficient evidence to reject the null hypothesis
at the 5% level of significance.
● Therefore, we do not reject the null hypothesis that the type of property is not significant in
explaining the weekly rent of properties.
Conclusion:
At the 5% significance level, there is insufficient evidence to conclude that the type of property
is significant in explaining the weekly rent of properties.
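The five decisions in Question 3 follow one rule, which can be sketched in a few lines of Python. This is only an illustrative check, not part of the SPSS output: the numbers are copied from Table 1, and the names `coefficients`, `ALPHA` and `results` are our own. Each t statistic is the coefficient divided by its standard error, and a predictor is treated as significant when its p-value is below 0.05.

```python
# Values copied from Table 1: (B, Std. Error, Sig.) for each predictor.
coefficients = {
    "Bedrooms": (21.538, 9.512, 0.026),
    "Bathrooms": (62.796, 13.831, 0.000),
    "Garage": (15.533, 9.844, 0.118),
    "Location": (-35.011, 17.208, 0.045),
    "Type": (36.390, 25.024, 0.149),
}
ALPHA = 0.05  # level of significance used throughout the assignment

results = {}
for name, (b, std_error, p_value) in coefficients.items():
    t_stat = b / std_error           # t statistic = coefficient / standard error
    significant = p_value < ALPHA    # reject H0 when the p-value is below alpha
    results[name] = (t_stat, significant)

for name, (t_stat, significant) in results.items():
    decision = "reject H0" if significant else "do not reject H0"
    print(f"{name:<10} t = {t_stat:7.3f}  ->  {decision}")
```

The t values reproduce the t column of Table 1 to three decimal places, and only BEDROOMS, BATHROOMS and LOCATION lead to rejection of H0.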
Question 4
Table 2 and Table 3 show the summary of the regression model and the descriptive statistics for the
regression.
Table 2: Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .695 (a)   .483       .456                74.521
a. Predictors: (Constant), Type, Garage, Bathrooms, Location, Bedrooms
b. Dependent Variable: WeeklyRent
Table 3: Descriptive Statistics
             Mean     Std. Deviation   N
WeeklyRent   452.33   100.995          100
Bedrooms       3.09     1.083          100
Bathrooms      1.35      .657          100
Garage         1.44      .808          100
Location        .66      .476          100
Type            .82      .386          100
CV = (Std. Error of the Estimate / Mean of WeeklyRent) × 100%
= (74.521 / 452.33) × 100%
= 16.47%
Adjusted R² = 0.456 = 45.6%
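The coefficient of variation above can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is our own; the inputs are the standard error of the estimate from Table 2 and the mean of WeeklyRent from Table 3.

```python
def coefficient_of_variation(std_error, mean):
    """Return the coefficient of variation as a percentage."""
    return std_error / mean * 100

# 74.521 is the Std. Error of the Estimate (Table 2);
# 452.33 is the mean of WeeklyRent (Table 3).
cv = coefficient_of_variation(74.521, 452.33)
print(f"CV = {cv:.2f}%")
```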
The goodness of fit of the model:
● The adjusted R² is 0.456, so the model is moderately satisfactory: it explains 45.6% of the
variation in the weekly rent of properties.
● This means that 45.6% of the variation in WEEKLYRENT can be explained by the variation in the
independent variables (BEDROOMS, BATHROOMS, GARAGE, LOCATION and TYPE).
● The model is considered to have a good fit because it has a low coefficient of variation of
16.47% and a moderately high adjusted R² of 0.456.
Question 5
H0: The model is not significant.
H1: The model is significant.
Table 4 below shows the ANOVA test for the model.
Table 4: ANOVA (a)
Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression     487783.106       5   97556.621     17.567   .000 (b)
Residual       522011.004      94    5553.309
Total         1009794.110      99
a. Dependent Variable: WeeklyRent
b. Predictors: (Constant), Type, Garage, Bathrooms, Location, Bedrooms
Decision:
● The significance value of the ANOVA test is 0.000, which is smaller than 0.05, the level of
significance for the test.
● The significance value of 0.000 indicates that there is strong evidence to reject the null
hypothesis.
● Therefore, we reject the null hypothesis that the model is not significant.
Conclusion:
At 5% significance level, we have sufficient evidence to conclude that the model is significant.
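As a cross-check, the figures in Table 4 can be tied back to Table 2 with a few lines of Python. This is a sketch with variable names of our own choosing; it uses only the sums of squares and degrees of freedom reported in Table 4.

```python
# Sums of squares and degrees of freedom copied from Table 4.
ss_regression, df_regression = 487783.106, 5
ss_residual, df_residual = 522011.004, 94
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

ms_regression = ss_regression / df_regression   # mean square for regression
ms_residual = ss_residual / df_residual         # mean square for residual
f_statistic = ms_regression / ms_residual       # F = MSR / MSE

r_squared = ss_regression / ss_total
# Adjusted R-squared compares the residual and total variances.
adjusted_r_squared = 1 - ms_residual / (ss_total / df_total)
std_error_of_estimate = ms_residual ** 0.5

print(f"F = {f_statistic:.3f}")
print(f"R Square = {r_squared:.3f}, Adjusted R Square = {adjusted_r_squared:.3f}")
print(f"Std. Error of the Estimate = {std_error_of_estimate:.3f}")
```

The results agree with the F value in Table 4 and with the R Square, Adjusted R Square and Std. Error of the Estimate in Table 2.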
Question 6
a) Multi-collinearity
Table 5 shows the Pearson Correlation Test to check the multi-collinearity problems on the
regression model.
Table 5: Correlations
                      WeeklyRent   Bedrooms   Bathrooms   Garage   Location   Type
Pearson Correlation
  WeeklyRent           1.000        .536       .579        .304    -.177       .401
  Bedrooms              .536       1.000       .509        .197     .001       .522
  Bathrooms             .579        .509      1.000        .316     .061       .171
  Garage                .304        .197       .316       1.000     .078       .127
  Location             -.177        .001       .061        .078    1.000      -.336
  Type                  .401        .522       .171        .127    -.336      1.000
Sig. (1-tailed)
  WeeklyRent            .           .000       .000        .001     .039       .000
  Bedrooms              .000        .          .000        .025     .495       .000
  Bathrooms             .000        .000       .           .001     .272       .044
  Garage                .001        .025       .001        .        .221       .104
  Location              .039        .495       .272        .221     .          .000
  Type                  .000        .000       .044        .104     .000       .
N = 100 for every pair of variables.
Based on Table 5, it can be observed that:
● The largest Pearson correlation coefficient between two independent variables is 0.522, found
between BEDROOMS and TYPE.
● The squared correlation of 0.2725 (0.522²) is smaller than the adjusted R² of the regression
model, which is 0.456. Therefore, the correlation between BEDROOMS and TYPE shows no sign of a
collinearity problem.
● Since the largest correlation between pairs of independent variables in Table 5 is not a
problem, the remaining, smaller correlations are not a problem either. Thus, the regression does
not have a multi-collinearity problem.
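The rule of thumb used above can be written out as a small Python sketch. The correlations are copied from the predictor rows of Table 5; the dictionary name and the flag variable are our own, and the comparison with the adjusted R² simply mirrors the rule applied in this assignment.

```python
# Pairwise Pearson correlations between the independent variables (Table 5).
predictor_correlations = {
    ("Bedrooms", "Bathrooms"): 0.509,
    ("Bedrooms", "Garage"): 0.197,
    ("Bedrooms", "Location"): 0.001,
    ("Bedrooms", "Type"): 0.522,
    ("Bathrooms", "Garage"): 0.316,
    ("Bathrooms", "Location"): 0.061,
    ("Bathrooms", "Type"): 0.171,
    ("Garage", "Location"): 0.078,
    ("Garage", "Type"): 0.127,
    ("Location", "Type"): -0.336,
}
ADJUSTED_R_SQUARED = 0.456  # from Table 2

# Find the pair with the largest absolute correlation.
largest_pair, largest_r = max(
    predictor_correlations.items(), key=lambda item: abs(item[1])
)
squared_correlation = largest_r ** 2

# Flag a possible problem only if the squared correlation exceeds adjusted R-squared.
collinearity_flag = squared_correlation > ADJUSTED_R_SQUARED

print(largest_pair, round(squared_correlation, 4), collinearity_flag)
```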
b) Heteroskedasticity
Figure 1: Scatterplot of Dependent Variable (WeeklyRent)
Based on Figure 1, it can be observed that:
● Ignoring the possible outliers circled in Figure 1, the residuals are spread fairly evenly
across the values of the dependent variable WEEKLYRENT.
● The distribution pattern of the residuals suggests that the regression model does not have a
heteroskedasticity problem.
c) Non-normality
Table 6: Descriptive Statistics
                                                   Skewness                 Kurtosis (peakedness)
                        N     Minimum    Maximum   Statistic   Std. Error   Statistic   Std. Error
Standardized Residual   100   -2.34236   3.41423   .613        .241         1.282       .478
Valid N (listwise)      100
Figure 2: Histogram of Dependent Variable (WeeklyRent)
If the data are normally distributed, the histogram should have a normal profile, and the skewness
and kurtosis statistics should both be close to zero.
Based on Table 6, there are 100 observations, which is a fairly small sample size. The skewness
statistic is 0.613, which is moderately larger than 0. This indicates that the distribution is
skewed to the right. The kurtosis of the distribution of residuals is 1.282, which is larger than 0.
This shows that the distribution is more peaked than a normal distribution.
Based on Figure 2, the histogram has a normal profile with a bell-shaped curve. In addition, the
histogram is relatively symmetric. Therefore, the distribution is approximately normal.
Conclusion
Based on Tables 5 and 6 and Figures 1 and 2, we conclude that the regression model does not show
a non-normality, heteroskedasticity or multi-collinearity problem.
Question 7
The estimated equation for the regression model is:
WEEKLYRENT= 271.904 + 21.538 BEDROOMS + 62.796 BATHROOMS + 15.533
GARAGE – 35.011 LOCATION + 36.390 TYPE
The values given in Question 7 are:
TYPE = 1 (House)
LOCATION = 0 (Suburb)
BEDROOMS = 5 (number of bedrooms)
BATHROOMS = 3 (number of bathrooms)
GARAGE = 3 (car spaces)
Substituting these values into the estimated equation gives:
WEEKLYRENT = 271.904 + 21.538 (5) + 62.796 (3) + 15.533 (3) - 35.011 (0) + 36.390 (1)
= A$650.971
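The same substitution can be sketched in Python. The function `predict_weekly_rent` and the dictionary `COEFFICIENTS` are illustrative names of our own; the coefficient values come from Table 1.

```python
# Estimated coefficients from Table 1.
COEFFICIENTS = {
    "constant": 271.904,
    "bedrooms": 21.538,
    "bathrooms": 62.796,
    "garage": 15.533,
    "location": -35.011,  # 1 = City, 0 = Suburb
    "type": 36.390,       # 1 = House, 0 = Unit
}

def predict_weekly_rent(bedrooms, bathrooms, garage, location, property_type):
    """Return the predicted weekly rent from the estimated regression equation."""
    return (
        COEFFICIENTS["constant"]
        + COEFFICIENTS["bedrooms"] * bedrooms
        + COEFFICIENTS["bathrooms"] * bathrooms
        + COEFFICIENTS["garage"] * garage
        + COEFFICIENTS["location"] * location
        + COEFFICIENTS["type"] * property_type
    )

# House in a suburb with 5 bedrooms, 3 bathrooms and 3 car spaces.
rent = predict_weekly_rent(5, 3, 3, location=0, property_type=1)
print(f"Predicted weekly rent: {rent:.3f}")
```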
Question 8
The goodness of fit of the model is considered good, as the adjusted R² shows that the five
variables (BEDROOMS, BATHROOMS, GARAGE, LOCATION and TYPE) explain 45.6% of the variation in the
weekly rent of properties. The remaining 54.4% is unexplained, and there may be other factors
outside the model that affect the weekly rent of a property.
Of the five variables, three (BEDROOMS, BATHROOMS and LOCATION) are significant at the 5% level in
explaining the weekly rent of properties in the western suburbs of Melbourne. We consider these
variables good predictors of the weekly rent. A greater number of bedrooms and bathrooms raises
the weekly rent because the property is more spacious and convenient and can accommodate more
people. Therefore, the real estate agent is able to set a higher weekly rent. In addition, a
property located in a suburb has a higher weekly rent than one located in the city. This may be
because the western suburbs of Melbourne are close to the city and are mainly residential, so
demand is high among people who want to live near their workplace. Therefore, the real estate
agent, as the price setter for the property, can set a higher rent in the western suburbs to
compete with other real estate agents.
In conclusion, this kind of information can be very useful to the real estate agent in determining
the weekly rent of a property.
PART B
Question 9 & Question 10
Discriminant Model (1)
Table 7 shows the results for group statistics.
Table 7: Group Statistics
                                                          Valid N (listwise)
Subscriber   Variable     Mean    Std. Deviation   Unweighted   Weighted
Otherwise    Age          48.76   11.296           66           66.000
             Income       47.02    7.449           66           66.000
             Marital        .50     .504           66           66.000
             Profession     .21     .412           66           66.000
             Gender         .62     .489           66           66.000
Subscriber   Age          29.30    6.680           84           84.000
             Income       52.70    7.133           84           84.000
             Marital        .64     .482           84           84.000
             Profession     .20     .404           84           84.000
             Gender         .76     .428           84           84.000
Total        Age          37.86   13.208           150          150.000
             Income       50.20    7.783           150          150.000
             Marital        .58     .495           150          150.000
             Profession     .21     .406           150          150.000
             Gender         .70     .460           150          150.000
Based on Table 7, it can be observed that:
● The means of AGE and INCOME look different between Subscriber and Otherwise, so these
variables may be predictors.
● The means of MARITAL, PROFESSION and GENDER look similar between Subscriber and Otherwise.
Therefore, these variables may not be predictors.
Table 8 shows the tests of equality of group means.
Table 8: Tests of Equality of Group Means
             Wilks' Lambda   F         df1   df2   Sig.
Age           .462           172.655   1     148   .000
Income        .868            22.595   1     148   .000
Marital       .979             3.120   1     148   .079
Profession   1.000              .021   1     148   .885
Gender        .977             3.519   1     148   .063
The null hypotheses are as follows:
(a) The mean of AGE is the same for Subscriber and Otherwise.
(b) The mean of INCOME is the same for Subscriber and Otherwise.
(c) The mean of MARITAL is the same for Subscriber and Otherwise.
(d) The mean of PROFESSION is the same for Subscriber and Otherwise.
(e) The mean of GENDER is the same for Subscriber and Otherwise.
The corresponding alternative hypotheses are as follows:
(a) The mean of AGE is not the same for Subscriber and Otherwise.
(b) The mean of INCOME is not the same for Subscriber and Otherwise.
(c) The mean of MARITAL is not the same for Subscriber and Otherwise.
(d) The mean of PROFESSION is not the same for Subscriber and Otherwise.
(e) The mean of GENDER is not the same for Subscriber and Otherwise.
(a) Age
Decision:
● The significance value of AGE is 0.000, which is smaller than 0.05, the level of significance
for the test.
● Therefore, we reject the null hypothesis that the mean of AGE is the same for Subscriber and
Otherwise.
● The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
● At the 5% significance level, we conclude that AGE is a useful predictor.
(b) Income
Decision:
● The significance value of INCOME is 0.000, which is smaller than 0.05, the level of
significance for the test.
● Therefore, we reject the null hypothesis that the mean of INCOME is the same for Subscriber and
Otherwise.
● The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
● At the 5% significance level, we conclude that INCOME is a useful predictor.
(c) Marital
Decision:
● The significance value of MARITAL is 0.079, which is larger than 0.05, the level of
significance for the test.
● Therefore, we do not reject the null hypothesis that the mean of MARITAL is the same for
Subscriber and Otherwise.
● The significance value of 0.079 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
● At the 5% significance level, we conclude that MARITAL is not a useful predictor.
(d) Profession
Decision:
● The significance value of PROFESSION is 0.885, which is larger than 0.05, the level of
significance for the test.
● Therefore, we do not reject the null hypothesis that the mean of PROFESSION is the same for
Subscriber and Otherwise.
● The significance value of 0.885 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
● At the 5% significance level, we conclude that PROFESSION is not a useful predictor.
(e) Gender
Decision:
● The significance value of GENDER is 0.063, which is larger than 0.05, the level of
significance for the test.
● Therefore, we do not reject the null hypothesis that the mean of GENDER is the same for
Subscriber and Otherwise.
● The significance value of 0.063 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
● At the 5% significance level, we conclude that GENDER is not a useful predictor.
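For a single variable, the Wilks' Lambda and F columns of Table 8 carry the same information, since Lambda = 1 / (1 + F × df1 / df2). The sketch below, with names of our own choosing, recovers the Wilks' Lambda column from the F values as a consistency check on the table.

```python
# F statistics copied from Table 8; df1 = 1 and df2 = 148 for every variable.
f_values = {
    "Age": 172.655,
    "Income": 22.595,
    "Marital": 3.120,
    "Profession": 0.021,
    "Gender": 3.519,
}
DF1, DF2 = 1, 148

def wilks_lambda_from_f(f_stat, df1, df2):
    """Univariate Wilks' Lambda (within / total sum of squares) from F."""
    return 1 / (1 + f_stat * df1 / df2)

lambdas = {name: wilks_lambda_from_f(f, DF1, DF2) for name, f in f_values.items()}
for name, value in lambdas.items():
    print(f"{name:<11} Wilks' Lambda = {value:.3f}")
```

A value close to 1 (PROFESSION) means the group means are almost identical, while a small value (AGE) means the groups differ strongly on that variable.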
Table 9 shows the correlation coefficients (pooled within-groups matrices).

Table 9: Pooled Within-Groups Matrices
Correlation   Age     Income   Marital   Profession   Gender
Age           1.000    .004     .108     -.012        -.023
Income         .004   1.000    -.095     -.101         .241
Marital        .108   -.095    1.000     -.301        -.019
Profession    -.012   -.101    -.301     1.000        -.024
Gender        -.023    .241    -.019     -.024        1.000
Based on Table 9, it can be observed that:
● The largest absolute correlation is 0.301, found between PROFESSION and MARITAL. This
correlation is relatively small, thus there is no problem of multi-collinearity.
Table 10 shows canonical correlation.
Table 10: Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          1.429 (a)    100.0           100.0          .767
a. First 1 canonical discriminant functions were used in the analysis.
● The discriminant function explains 58.83% (0.767²) of the variation in the grouping variable
(Subscriber).
● The large value suggests the relative importance of the predictor variables.
● The eigenvalue of 1.429 is relatively large, which supports the usefulness of the model.
● We can conclude that the model may be useful because it explains a relatively large percentage
of that variation.
Table 11 shows the test of the discriminant function.

Table 11: Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .412            129.143      5    .000
H0: The discriminant function is not significant.
H1: The discriminant function is significant.
Decision:
● Based on Table 11, the significance value of the test is 0.000, which is smaller than 0.05,
the level of significance for the test.
● Therefore, we reject the null hypothesis that the discriminant function is not significant.
● The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
● At the 5% significance level, we have sufficient evidence to conclude that the discriminant
function is significant.
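With a single discriminant function, the figures in Tables 10 and 11 are linked by standard identities, which the Python sketch below checks. The variable names are our own. The chi-square line assumes Bartlett's approximation, -(N - 1 - (p + g) / 2) × ln(Lambda), with N = 150 cases, p = 5 predictors and g = 2 groups; this is our assumption about how the reported value arises, and small differences from the table are due to rounding of the inputs.

```python
import math

# Values copied from Table 10.
canonical_correlation = 0.767
eigenvalue = 1.429
n_cases, n_predictors, n_groups = 150, 5, 2

# Share of variation explained by the discriminant function.
explained = canonical_correlation ** 2

# For one function: eigenvalue = r^2 / (1 - r^2) and Lambda = 1 / (1 + eigenvalue).
eigenvalue_from_r = explained / (1 - explained)
wilks_lambda = 1 / (1 + eigenvalue)

# Bartlett's chi-square approximation (assumed, see the note above).
chi_square = -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(wilks_lambda)

print(f"Explained variation: {explained:.4f}")
print(f"Eigenvalue from r:   {eigenvalue_from_r:.3f}")
print(f"Wilks' Lambda:       {wilks_lambda:.3f}")
print(f"Chi-square (approx): {chi_square:.2f}")
```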
Table 12 shows the standardized canonical discriminant function coefficients.
Table 12: Standardized Canonical Discriminant Function Coefficients
Function
Age .934
Income -.362
Marital -.288
Profession -.102
Gender -.028
Based on Table 12, it can be interpreted that:
● The coefficient of AGE is 0.934, which is the largest absolute coefficient of the five
variables. Therefore, AGE is the most important variable.
● The absolute coefficient of INCOME is 0.362, which is the second largest of the five
variables. Therefore, INCOME is the second most important variable.
● The absolute coefficient of MARITAL is 0.288, which is the third largest of the five
variables. Therefore, MARITAL is the third most important variable.
● The absolute coefficient of PROFESSION is 0.102, which is the fourth largest of the five
variables. Therefore, PROFESSION is the fourth most important variable.
● The absolute coefficient of GENDER is 0.028, which is the smallest of the five variables.
Therefore, GENDER is the least important variable.
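The ranking above depends only on the absolute size of each standardized coefficient. A minimal Python sketch of that ordering, using the values from Table 12 and variable names of our own, is:

```python
# Standardized canonical discriminant function coefficients (Table 12).
standardized = {
    "Age": 0.934,
    "Income": -0.362,
    "Marital": -0.288,
    "Profession": -0.102,
    "Gender": -0.028,
}

# Rank variables by absolute coefficient, largest first; the sign is ignored.
ranking = sorted(standardized, key=lambda name: abs(standardized[name]), reverse=True)

for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name} (|coefficient| = {abs(standardized[name]):.3f})")
```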
Table 13 shows the unstandardized canonical discriminant function coefficients.
Table 13: Canonical Discriminant Function Coefficients
Function
Age .104
Income -.050
Marital -.587
Profession -.251
Gender -.061
(Constant) -.996
Unstandardized coefficients
The discriminant function is given by:

Subscriber = β0 + β1 AGE + β2 INCOME + β3 MARITAL + β4 PROFESSION + β5 GENDER
The estimated equation from Table 13 is:

Subscriber = - 0.996 + 0.104 AGE - 0.050 INCOME - 0.587 MARITAL - 0.251 PROFESSION -
0.061 GENDER
Table 14 shows the unstandardized canonical discriminant function evaluated at group means.
Table 14: Functions at Group Centroids
Subscriber Function
Otherwise 1.340
Subscriber -1.053
Unstandardized canonical discriminant functions evaluated at group means
Based on Table 14, the average centroid (the mid-point between the two group centroids) can be
calculated as follows:
Average centroid = (1.340 + (-1.053)) / 2
= 0.287 / 2
= 0.1435
Table 15 shows the hit ratio classification results.
Table 15: Classification Results (a, b)
Sample               Measure   Actual group   Predicted Otherwise   Predicted Subscriber   Total
Cases Selected       Count     Otherwise      55                    11                     66
Cases Selected       Count     Subscriber      6                    78                     84
Cases Selected       %         Otherwise      83.3                  16.7                   100.0
Cases Selected       %         Subscriber      7.1                  92.9                   100.0
Cases Not Selected   Count     Otherwise      10                    10                     20
Cases Not Selected   Count     Subscriber      3                    27                     30
Cases Not Selected   %         Otherwise      50.0                  50.0                   100.0
Cases Not Selected   %         Subscriber     10.0                  90.0                   100.0
a. 88.7% of selected original grouped cases correctly classified.
b. 74.0% of unselected original grouped cases correctly classified.
● The hit ratio for the analysis sample is 88.7%, while the hit ratio for the hold-out sample is
74.0%.
● The hit ratio for the analysis sample is relatively high, as it is above 80%.
● On the other hand, the hit ratio for the hold-out sample is not as high, as it is below 80%.
● This indicates that the discriminant function does not appear to be very satisfactory.
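The two hit ratios can be reproduced from the counts in Table 15. In the sketch below, the function `hit_ratio` and the nested dictionaries are our own illustrative names; each dictionary maps an actual group to its predicted counts.

```python
def hit_ratio(confusion):
    """Percentage of cases whose predicted group equals the actual group."""
    correct = sum(confusion[group][group] for group in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total * 100

# Counts from Table 15: actual group -> predicted group -> number of cases.
analysis_sample = {
    "Otherwise": {"Otherwise": 55, "Subscriber": 11},
    "Subscriber": {"Otherwise": 6, "Subscriber": 78},
}
holdout_sample = {
    "Otherwise": {"Otherwise": 10, "Subscriber": 10},
    "Subscriber": {"Otherwise": 3, "Subscriber": 27},
}

analysis_hit = hit_ratio(analysis_sample)
holdout_hit = hit_ratio(holdout_sample)
print(f"Analysis sample: {analysis_hit:.1f}%")
print(f"Hold-out sample: {holdout_hit:.1f}%")
```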
Question 12
The estimated discriminant function is:
Subscriber = -0.996 + 0.104 AGE - 0.050 INCOME - 0.587 MARITAL - 0.251 PROFESSION - 0.061 GENDER
The values given in Question 12 are:
Age: 40 years old
Income: $65,000 (entered as 65, in thousands of dollars)
Marital: 0 (Single)
Profession: 1 (Full-time)
Gender: 1 (Male)
Substituting these values into the estimated function gives:
Subscriber = -0.996 + 0.104 (40) - 0.050 (65) - 0.587 (0) - 0.251 (1) - 0.061 (1)
= -0.398
The discriminant score D = -0.398 is smaller than the average centroid of 0.1435, so it lies on
the side of the Subscriber centroid (-1.053) rather than the Otherwise centroid (1.340).
Therefore, a 40-year-old single male in full-time employment who earns an annual income of
$65,000 is likely to be a subscriber to the online newspaper.
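The classification step can be sketched in Python. The coefficients come from Table 13 and the centroids from Table 14; the function names are our own. The rule applied here is to assign a case to the group whose centroid is nearest to its discriminant score, which for two groups is the same as comparing the score with the mid-point of 0.1435. Because the Subscriber centroid (-1.053) lies below the mid-point, a score below 0.1435 falls in the Subscriber group.

```python
# Unstandardized coefficients (Table 13) and group centroids (Table 14).
COEFFICIENTS = {
    "constant": -0.996,
    "age": 0.104,
    "income": -0.050,      # income in thousands of dollars
    "marital": -0.587,
    "profession": -0.251,
    "gender": -0.061,
}
CENTROIDS = {"Otherwise": 1.340, "Subscriber": -1.053}

def discriminant_score(age, income, marital, profession, gender):
    """Evaluate the estimated discriminant function for one person."""
    return (
        COEFFICIENTS["constant"]
        + COEFFICIENTS["age"] * age
        + COEFFICIENTS["income"] * income
        + COEFFICIENTS["marital"] * marital
        + COEFFICIENTS["profession"] * profession
        + COEFFICIENTS["gender"] * gender
    )

def classify(score):
    """Assign the group whose centroid is closest to the score."""
    return min(CENTROIDS, key=lambda group: abs(score - CENTROIDS[group]))

cutoff = sum(CENTROIDS.values()) / 2
score = discriminant_score(age=40, income=65, marital=0, profession=1, gender=1)
group = classify(score)
print(f"score = {score:.3f}, cutoff = {cutoff:.4f}, predicted group = {group}")
```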
Part C (Factor Analysis)
Question 13 & Question 14
Table 16 shows mean and standard deviations of each variable for 200 observations from a
random sample of persons over the age of 18 who are in full-time or part-time employment.
Table 16: Descriptive Statistics
              Mean   Std. Deviation   Analysis N
Informative   2.48   1.378            200
Current       2.43   1.180            200
Economic      2.91   1.366            200
Mode          3.69   1.233            200
Payment       3.05    .923            200
Useful        3.79   1.235            200
● The Informative and Current variables have quite close mean values and appear to be similar.
Therefore, these two variables may be grouped into one factor.
● The Economic and Payment variables have quite close mean values and appear to be similar.
Therefore, these two variables may be grouped into one factor.
● The Mode and Useful variables have quite close mean values and appear to be similar.
Therefore, these two variables may be grouped into one factor.
Table 17 shows the correlation matrix between variables.
Table 17: Correlation Matrix
Correlation   Informative   Current   Economic   Mode    Payment   Useful
Informative   1.000          .695      .038      -.166   -.043     -.128
Current        .695         1.000      .166       .051    .063      .002
Economic       .038          .166     1.000      -.044    .466     -.131
Mode          -.166          .051     -.044      1.000    .067      .767
Payment       -.043          .063      .466       .067   1.000     -.052
Useful        -.128          .002     -.131       .767   -.052     1.000
● The correlation between Mode and Useful is 0.767, which is very high.
● Therefore, factor analysis is recommended due to the high correlation between variables.
Table 18 shows the total variance explained by the factors.
Table 18: Total Variance Explained
            Initial Eigenvalues                      Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %     Total   % of Variance   Cumulative %
1           1.915   31.925           31.925          1.795   29.911          29.911
2           1.628   27.137           59.062          1.701   28.345          58.256
3           1.440   23.997           83.059          1.488   24.803          83.059
4            .523    8.723           91.781
5            .287    4.789           96.570
6            .206    3.430          100.000
Extraction Method: Principal Component Analysis.
● Three components have eigenvalues greater than 1, which suggests the presence of three
factors.
● The first factor explains 31.925% of the variation in the responses.
● The second factor explains 27.137% of the variation in the responses.
● The third factor explains 23.997% of the variation in the responses.
● In total, the first three factors explain 83.059% of the variation in the responses.
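The eigenvalue rule and the percentages above can be sketched in Python. The eigenvalues are copied from Table 18 and the variable names are our own. With six standardized variables, each eigenvalue divided by 6 gives its share of the total variance, so the results differ from the table only by rounding of the reported eigenvalues.

```python
# Initial eigenvalues copied from Table 18 (rounded to three decimals).
eigenvalues = [1.915, 1.628, 1.440, 0.523, 0.287, 0.206]
n_variables = len(eigenvalues)

# Retain components with an eigenvalue greater than 1.
retained = [value for value in eigenvalues if value > 1]

# Each component's share of the total variance, in percent.
percent_of_variance = [value / n_variables * 100 for value in eigenvalues]
cumulative_retained = sum(percent_of_variance[: len(retained)])

print(f"Components retained: {len(retained)}")
print(f"Variance explained by retained components: {cumulative_retained:.2f}%")
```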
Figure 3: Scree Plot of the 6 Factors
Based on Figure 3, the scree plot profile shows that three factors are doing most of the work,
since the plot levels out after the first three factors.
Table 19 shows the rotated component matrix of the factors.
Table 19: Rotated Component Matrix (a)
              Component
              1       2       3
Informative   -.143    .916   -.057
Current        .079    .921    .122
Economic      -.088    .098    .847
Mode           .943   -.031    .050
Payment        .043   -.038    .861
Useful         .932   -.031   -.098
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.
● The variables Mode and Useful load heavily onto Factor 1.
● The variables Informative, Current, Economic and Payment do not load heavily onto Factor 1.
● This suggests that Factor 1 is something like a ‘Cost’ factor.
● The variables Informative and Current load heavily onto Factor 2.
● The variables Economic, Mode, Payment and Useful do not load heavily onto Factor 2.
● This suggests that Factor 2 is something like an ‘Educational’ factor.
● The variables Economic and Payment load heavily onto Factor 3.
● The variables Informative, Current, Mode and Useful do not load heavily onto Factor 3.
● This suggests that Factor 3 is something like a ‘Functional’ factor.
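The grouping of variables into factors can be sketched in Python by assigning each variable to the component on which its absolute loading is largest. The loadings are copied from Table 19; the names `loadings`, `dominant_factor` and `groups` are our own.

```python
# Rotated loadings from Table 19: variable -> (Factor 1, Factor 2, Factor 3).
loadings = {
    "Informative": (-0.143, 0.916, -0.057),
    "Current": (0.079, 0.921, 0.122),
    "Economic": (-0.088, 0.098, 0.847),
    "Mode": (0.943, -0.031, 0.050),
    "Payment": (0.043, -0.038, 0.861),
    "Useful": (0.932, -0.031, -0.098),
}

def dominant_factor(row):
    """Return the 1-based factor number with the largest absolute loading."""
    return max(range(len(row)), key=lambda index: abs(row[index])) + 1

groups = {}
for variable, row in loadings.items():
    groups.setdefault(dominant_factor(row), []).append(variable)

for factor in sorted(groups):
    print(f"Factor {factor}: {', '.join(groups[factor])}")
```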
Question 15
Discriminant Model (2)
Table 20 shows the group statistics.
Table 20: Group Statistics
                                                                  Valid N (listwise)
Subscriber   Variable      Mean         Std. Deviation   Unweighted   Weighted
Otherwise    Age           48.7575758   11.29609954      66           66.000
             Income        47.0151515    7.44930028      66           66.000
             Marital         .5000000     .50383147      66           66.000
             Profession      .2121212     .41194292      66           66.000
             Gender          .6212121     .48880235      66           66.000
             Cost            .2883733     .80274019      66           66.000
             Educational    -.3671864     .84978613      66           66.000
             Functional      .0411726     .90975376      66           66.000
Subscriber   Age           29.2976190    6.67991011      84           84.000
             Income        52.7023810    7.13346054      84           84.000
             Marital         .6428571     .48203527      84           84.000
             Profession      .2023810     .40418777      84           84.000
             Gender          .7619048     .42847580      84           84.000
             Cost           -.3209875    1.05707360      84           84.000
             Educational     .2655815     .98969432      84           84.000
             Functional     -.0176387     .98875186      84           84.000
Total        Age           37.8600000   13.20820876      150          150.000
             Income        50.2000000    7.78313441      150          150.000
             Marital         .5800000     .49521197      150          150.000
             Profession      .2066667     .40627076      150          150.000
             Gender          .7000000     .45979278      150          150.000
             Cost           -.0528687     .99783006      150          150.000
             Educational    -.0128364     .97978065      150          150.000
             Functional      .0082383     .95210300      150          150.000
● The means of AGE, INCOME, COST and EDUCATIONAL look different between Subscriber and
Otherwise, so these variables may be predictors.
● The means of MARITAL, PROFESSION, GENDER and FUNCTIONAL look similar between Subscriber and
Otherwise. Therefore, these variables may not be predictors.
Table 21 shows the test of equality of group means.
Table 21: Tests of Equality of Group Means
              Wilks' Lambda   F         df1   df2   Sig.
Age            .462           172.655   1     148   .000
Income         .868            22.595   1     148   .000
Marital        .979             3.120   1     148   .079
Profession    1.000              .021   1     148   .885
Gender         .977             3.519   1     148   .063
Cost           .907            15.087   1     148   .000
Educational    .897            17.079   1     148   .000
Functional     .999              .140   1     148   .709
The null hypotheses are as follows:
(a) The mean of AGE is the same for Subscriber and Otherwise.
(b) The mean of INCOME is the same for Subscriber and Otherwise.
(c) The mean of MARITAL is the same for Subscriber and Otherwise.
(d) The mean of PROFESSION is the same for Subscriber and Otherwise.
(e) The mean of GENDER is the same for Subscriber and Otherwise.
(f) The mean of COST is the same for Subscriber and Otherwise.
(g) The mean of EDUCATIONAL is the same for Subscriber and Otherwise.
(h) The mean of FUNCTIONAL is the same for Subscriber and Otherwise.
The corresponding alternative hypotheses are as follows:
(a) The mean of AGE is not the same for Subscriber and Otherwise.
(b) The mean of INCOME is not the same for Subscriber and Otherwise.
(c) The mean of MARITAL is not the same for Subscriber and Otherwise.
(d) The mean of PROFESSION is not the same for Subscriber and Otherwise.
(e) The mean of GENDER is not the same for Subscriber and Otherwise.
(f) The mean of COST is not the same for Subscriber and Otherwise.
(g) The mean of EDUCATIONAL is not the same for Subscriber and Otherwise.
(h) The mean of FUNCTIONAL is not the same for Subscriber and Otherwise.
(a) Age
Decision:
 The significance value of AGE is 0.000, which is smaller than 0.05, the level of
hypothesis.
Conclusion:
 At 5% significance level, we conclude that AGE is a useful predictor.
(b) Income
Decision:
 The significance value of INCOME is 0.000, which is smaller than 0.05, the level of
 Therefore, we reject the null hypothesis that the mean of INCOME for subscriber and
hypothesis.
Conclusion:
 At 5% significance level, we conclude that INCOME is a useful predictor.
(c) Marital
Decision:
 The significance value of MARITAL is 0.079, which is larger than 0.05, the level of
 Therefore, we do not reject the null hypothesis that the mean of MARITAL for subscriber
null hypothesis.
Conclusion:
 At 5% significance level, we conclude that MARITAL is not a useful predictor.
34
(d) Profession
Decision:
 The significance value of PROFESSION is 0.885, which is larger than 0.05, the level of
 Therefore, we do not reject the null hypothesis that the mean of PROFESSION for
null hypothesis.
Conclusion:
 At 5% significance level, we conclude that PROFESSION is not a useful predictor.
(e) Gender
Decision:
 The significance value of GENDER is 0.063, which is larger than 0.05, the level of
 Therefore, we do not reject the null hypothesis that the mean of GENDER for subscriber
null hypothesis.
Conclusion:
 At 5% significance level, we conclude that GENDER is not a useful predictor.
(f) Cost
Decision:
 The significance value of COST is 0.000, which is smaller than 0.05, the level of
hypothesis.
Conclusion:
 At 5% significance level, we conclude that COST is a useful predictor.
35
(g) Educational
Decision:
• The significance value of EDUCATIONAL is 0.000, which is smaller than 0.05, the level
of significance for the test.
• Therefore, we reject the null hypothesis that the mean of EDUCATIONAL is the same for the
Subscriber and Otherwise groups.
Conclusion:
• At the 5% significance level, we conclude that EDUCATIONAL is a useful predictor.
(h) Functional
Decision:
• The significance value of FUNCTIONAL is 0.709, which is larger than 0.05, the level of
significance for the test.
• Therefore, we do not reject the null hypothesis that the mean of FUNCTIONAL is the same
for the Subscriber and Otherwise groups.
Conclusion:
• At the 5% significance level, we conclude that FUNCTIONAL is not a useful predictor.
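The eight decisions above all apply the same rule: reject the null hypothesis of equal group means when the significance value is below 0.05. The short Python sketch below restates that rule using the significance values quoted in this section; the names ALPHA, sig_values and is_useful_predictor are illustrative only and do not come from the SPSS output.

```python
# Significance values (Sig.) quoted in decisions (a) to (h) above.
ALPHA = 0.05  # level of significance used throughout the report

sig_values = {
    "AGE": 0.000, "INCOME": 0.000, "MARITAL": 0.079, "PROFESSION": 0.885,
    "GENDER": 0.063, "COST": 0.000, "EDUCATIONAL": 0.000, "FUNCTIONAL": 0.709,
}


def is_useful_predictor(sig, alpha=ALPHA):
    """Reject H0 (equal group means) when the significance value is below alpha."""
    return sig < alpha


useful = [name for name, sig in sig_values.items() if is_useful_predictor(sig)]
not_useful = [name for name in sig_values if name not in useful]

print("Useful predictors:", useful)
print("Not useful predictors:", not_useful)
```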
Table 22 shows the correlation coefficients (pooled within-groups matrices).
Table 22: Pooled Within-Groups Matrices
Correlation  Age     Income  Marital  Profession  Gender  Cost    Educational  Functional
Age          1.000   .004    .108     -.012       -.023   .014    -.042        .030
Income       .004    1.000   -.095    -.101       .241    .163    .075         -.039
Marital      .108    -.095   1.000    -.301       -.019   -.079   -.266        -.099
Profession   -.012   -.101   -.301    1.000       -.024   .028    .031         .131
Gender       -.023   .241    -.019    -.024       1.000   .022    .159         .015
Cost         .014    .163    -.079    .028        .022    1.000   .084         -.046
Educational  -.042   .075    -.266    .031        .159    .084    1.000        -.013
Functional   .030    -.039   -.099    .131        .015    -.046   -.013        1.000
Based on Table 22, it can be observed that:

• The largest absolute correlation between two different predictors is 0.301, between
PROFESSION and MARITAL. This correlation is relatively small, thus there
is no problem of multi-collinearity.
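As a quick check on the reading of Table 22, the Python sketch below scans the off-diagonal entries of the matrix for the largest absolute correlation. The values are copied from Table 22; the variable names (names, corr, pairs) are chosen for illustration only.

```python
# Pooled within-groups correlations copied from Table 22.
names = ["Age", "Income", "Marital", "Profession",
         "Gender", "Cost", "Educational", "Functional"]
corr = [
    [1.000, 0.004, 0.108, -0.012, -0.023, 0.014, -0.042, 0.030],
    [0.004, 1.000, -0.095, -0.101, 0.241, 0.163, 0.075, -0.039],
    [0.108, -0.095, 1.000, -0.301, -0.019, -0.079, -0.266, -0.099],
    [-0.012, -0.101, -0.301, 1.000, -0.024, 0.028, 0.031, 0.131],
    [-0.023, 0.241, -0.019, -0.024, 1.000, 0.022, 0.159, 0.015],
    [0.014, 0.163, -0.079, 0.028, 0.022, 1.000, 0.084, -0.046],
    [-0.042, 0.075, -0.266, 0.031, 0.159, 0.084, 1.000, -0.013],
    [0.030, -0.039, -0.099, 0.131, 0.015, -0.046, -0.013, 1.000],
]

# Scan only the upper triangle (i < j) so the diagonal of 1.000 is excluded
# and each pair of predictors is counted once.
pairs = [
    (abs(corr[i][j]), names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
]
largest, var_a, var_b = max(pairs)

print(f"Largest absolute correlation: {largest:.3f} ({var_a} and {var_b})")
```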
Table 23 shows the canonical correlation.

Table 23: Eigenvalues
Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         1.705a      100.0          100.0         .794
a. First 1 canonical discriminant functions were used in the analysis.

• The discriminant function explains 63.04% (0.794²) of the variation in the dependent
variable (Subscriber).
• This relatively large value suggests that the predictor variables, taken together,
discriminate well between the two groups.
• The eigenvalue of 1.705 seems to be relatively big, which supports the usefulness of the
model.
• We can conclude that the model may be useful because it can explain a relatively large
percentage of the variation in the whole model.
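The eigenvalue and the canonical correlation in Table 23 are linked: with a single discriminant function, the canonical correlation equals the square root of eigenvalue / (1 + eigenvalue). The Python sketch below checks this relationship and the 63.04% figure. It is only an arithmetic check on the reported values, not a re-run of the SPSS analysis.

```python
import math

eigenvalue = 1.705          # Table 23
reported_canonical = 0.794  # Table 23

# Canonical correlation implied by the eigenvalue of the single function.
canonical_corr = math.sqrt(eigenvalue / (1 + eigenvalue))

# Squared canonical correlation: share of variation explained, in percent.
explained_pct = 100 * reported_canonical ** 2

print(f"Canonical correlation from eigenvalue: {canonical_corr:.3f}")
print(f"Variation explained: {explained_pct:.2f}%")
```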
Table 24 shows the test of the discriminant function.
Table 24: Wilks' Lambda
Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    .370           143.273     8   .000
H0: The discriminant function is not significant.
H1: The discriminant function is significant.
Decision:
• Based on Table 24, the significance value of the test is 0.000, which is smaller than 0.05,
the level of significance for the test.
• Therefore, we reject the null hypothesis that the discriminant function is not significant.
Conclusion:
• At the 5% significance level, we have sufficient evidence to conclude that the discriminant
function is significant.
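The figures in Table 24 can be roughly reproduced from the eigenvalue in Table 23. The Python sketch below assumes that, for a single function, Wilks' Lambda equals 1 / (1 + eigenvalue) and that the chi-square statistic follows Bartlett's approximation. The sample size of 150 is taken from the analysis sample in Table 28 (66 + 84 cases). Small differences from the SPSS output are expected because the eigenvalue is rounded to three decimals.

```python
import math

eigenvalue = 1.705  # Table 23 (rounded)
n_cases = 150       # analysis sample: 66 Otherwise + 84 Subscriber
n_predictors = 8
n_groups = 2

# Wilks' Lambda for a single discriminant function.
wilks_lambda = 1 / (1 + eigenvalue)

# Bartlett's chi-square approximation (assumed to be the one behind Table 24).
chi_square = -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(wilks_lambda)
df = n_predictors * (n_groups - 1)

print(f"Wilks' Lambda: {wilks_lambda:.3f}")
print(f"Chi-square: {chi_square:.2f} with {df} degrees of freedom")
```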
Table 25 shows the standardized canonical discriminant function coefficients.
Table 25: Standardized Canonical Discriminant Function Coefficients
             Function
Age          .848
Income       -.379
Marital      -.335
Profession   -.114
Gender       .027
Cost         .295
Educational  -.311
Functional   -.026

• The coefficient of AGE is 0.848, which is the largest in absolute value among the eight
variables. Therefore, AGE is the most important variable.
• The absolute coefficient of INCOME is 0.379, which is the second largest among the eight
variables. Therefore, INCOME is the second most important variable.
• The absolute coefficient of MARITAL is 0.335, which is the third largest among the eight
variables. Therefore, MARITAL is the third most important variable.
• The absolute coefficient of EDUCATIONAL is 0.311, which is the fourth largest among the
eight variables. Therefore, EDUCATIONAL is the fourth most important variable.
• The coefficient of COST is 0.295, which is the fifth largest in absolute value among the
eight variables. Therefore, COST is the fifth most important variable.
• The absolute coefficient of PROFESSION is 0.114, which is the sixth largest among the eight
variables. Therefore, PROFESSION is the sixth most important variable.
• The coefficient of GENDER is 0.027, which is the seventh largest in absolute value among
the eight variables. Therefore, GENDER is the seventh most important variable.
• The absolute coefficient of FUNCTIONAL is 0.026, which is the smallest among the eight
variables. Therefore, FUNCTIONAL is the least important variable.
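The ranking above orders the variables by the absolute value of their standardized coefficients. The Python sketch below reproduces that ordering from the Table 25 values; the dictionary name std_coefficients is illustrative.

```python
# Standardized canonical discriminant function coefficients from Table 25.
std_coefficients = {
    "AGE": 0.848, "INCOME": -0.379, "MARITAL": -0.335, "PROFESSION": -0.114,
    "GENDER": 0.027, "COST": 0.295, "EDUCATIONAL": -0.311, "FUNCTIONAL": -0.026,
}

# Sort by absolute value, largest first, because the sign only gives direction.
ranking = sorted(std_coefficients,
                 key=lambda name: abs(std_coefficients[name]),
                 reverse=True)

for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name} (|coefficient| = {abs(std_coefficients[name]):.3f})")
```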
Table 26 shows the unstandardized canonical discriminant function coefficients.
Table 26: Canonical Discriminant Function Coefficients
             Function
Age          .094
Income       -.052
Marital      -.681
Profession   -.279
Gender       .060
Cost         .309
Educational  -.334
Functional   -.027
(Constant)   -.530
Unstandardized coefficients
The discriminant function is given by:

Subscriber = β0 + β1 AGE + β2 INCOME + β3 MARITAL + β4 PROFESSION + β5 GENDER +
β6 COST + β7 EDUCATIONAL + β8 FUNCTIONAL
The estimated equation from Table 26 is:

Subscriber = – 0.530 + 0.094 AGE – 0.052 INCOME – 0.681 MARITAL – 0.279 PROFESSION
+ 0.060 GENDER + 0.309 COST – 0.334 EDUCATIONAL – 0.027 FUNCTIONAL
Table 27 shows the unstandardized canonical discriminant function evaluated at group means.
Table 27: Functions at Group Centroids
Subscriber Function
Otherwise 1.463
Subscriber -1.150
Unstandardized canonical discriminant functions evaluated at group means
Based on Table 27, the average centroid (mid-point) can be calculated as follows:
Average centroid = (1.463 + (-1.150)) / 2
= 0.313 / 2
= 0.1565
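To illustrate how the estimated equation and the mid-point work together, the Python sketch below computes a discriminant score and compares it with the cut-off of 0.1565. The coefficients and centroids come from Tables 26 and 27. The case values are purely hypothetical, since the coding and units of the variables are not restated in this section, and the function names are illustrative.

```python
# Unstandardized coefficients and constant from Table 26.
COEFFICIENTS = {
    "AGE": 0.094, "INCOME": -0.052, "MARITAL": -0.681, "PROFESSION": -0.279,
    "GENDER": 0.060, "COST": 0.309, "EDUCATIONAL": -0.334, "FUNCTIONAL": -0.027,
}
CONSTANT = -0.530

# Group centroids from Table 27 and their simple mid-point.
CENTROIDS = {"Otherwise": 1.463, "Subscriber": -1.150}
cutoff = sum(CENTROIDS.values()) / 2


def discriminant_score(case):
    """Evaluate the estimated discriminant function for one case."""
    return CONSTANT + sum(COEFFICIENTS[name] * case[name] for name in COEFFICIENTS)


def classify(score):
    """Otherwise has the higher centroid, so scores above the cut-off go to it."""
    return "Otherwise" if score > cutoff else "Subscriber"


# Hypothetical case: these values are made up for illustration only.
hypothetical_case = {
    "AGE": 30, "INCOME": 50, "MARITAL": 1, "PROFESSION": 0,
    "GENDER": 1, "COST": 3, "EDUCATIONAL": 4, "FUNCTIONAL": 3,
}

score = discriminant_score(hypothetical_case)
print(f"Cut-off: {cutoff:.4f}")
print(f"Score: {score:.3f} -> {classify(score)}")
```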
Table 28 shows the hit ratio classification results.
Table 28: Classification Results (a, b)
                                           Predicted Group Membership
                             Subscriber    Otherwise    Subscriber    Total
Cases Selected      Count    Otherwise     54           12            66
(Original)                   Subscriber    4            80            84
                    %        Otherwise     81.8         18.2          100.0
                             Subscriber    4.8          95.2          100.0
Cases Not Selected  Count    Otherwise     12           8             20
(Original)                   Subscriber    0            30            30
                    %        Otherwise     60.0         40.0          100.0
                             Subscriber    .0           100.0         100.0
a. 89.3% of selected original grouped cases correctly classified.
b. 84.0% of unselected original grouped cases correctly classified.
• The hit ratio for the analysis sample is 89.3%, while the hit ratio for the held-out sample
is 84.0%.
• The hit ratio for the analysis sample is high as it is above 80%, almost reaching 90%.
• The hit ratio for the held-out sample is also relatively high as it is above 80%.
• This indicates that the discriminant function appears to be very satisfactory.
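The two hit ratios can be recomputed from the counts in Table 28 as the number of correctly classified cases divided by the total number of cases. The Python sketch below does this for both samples; the function name hit_ratio and the dictionary layout are illustrative.

```python
def hit_ratio(confusion):
    """Percentage of cases whose predicted group matches the actual group."""
    correct = sum(confusion[group][group] for group in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return 100 * correct / total


# Counts from Table 28: outer key = actual group, inner key = predicted group.
analysis_sample = {
    "Otherwise": {"Otherwise": 54, "Subscriber": 12},
    "Subscriber": {"Otherwise": 4, "Subscriber": 80},
}
holdout_sample = {
    "Otherwise": {"Otherwise": 12, "Subscriber": 8},
    "Subscriber": {"Otherwise": 0, "Subscriber": 30},
}

analysis_hit = hit_ratio(analysis_sample)
holdout_hit = hit_ratio(holdout_sample)

print(f"Analysis sample hit ratio: {analysis_hit:.1f}%")
print(f"Held-out sample hit ratio: {holdout_hit:.1f}%")
```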
Question 15
By comparing the two discriminant models, it can be observed that:

• Discriminant Model 1 explains 58.83% (0.767²) of the variation in the dependent variable
(Subscriber), whereas discriminant Model 2 explains 63.04% (0.794²).
• The eigenvalue of discriminant Model 1 is 1.429. It seems to be relatively big, which
supports the usefulness of the model. The eigenvalue of discriminant Model 2 is 1.705, which
is larger than that of Model 1 and also supports the usefulness of the model.
• We conclude that discriminant Model 1 may be useful because it can explain a relatively
large percentage of the variation in the whole model.
• However, we conclude that discriminant Model 2 may be more useful because it can explain
a larger percentage of the variation than discriminant Model 1.
• Discriminant Model 1 shows that the hit ratio for the analysis sample is 88.7%, while the hit
ratio for the held-out sample is 74.0%. The hit ratio for the analysis sample is high as it is
above 80%. However, the hit ratio for the held-out sample is not as high, as it is lower than
80%. This indicates that discriminant Model 1 does not appear to be very satisfactory.
• Discriminant Model 2 shows that the hit ratio for the analysis sample is 89.3%, while the hit
ratio for the held-out sample is 84.0%. The hit ratio for the analysis sample is high as it is
above 80%, almost reaching 90%. The hit ratio for the held-out sample is also high as it is
above 80%. This indicates that discriminant Model 2 appears to be very satisfactory.
Conclusion:
In conclusion, discriminant Model 2 is the better model. Therefore, the use of
factor analysis has improved the discriminant analysis.
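The comparison can be laid out side by side. The Python sketch below uses only the figures quoted in this answer and adds one derived quantity, the gap between the analysis and held-out hit ratios. That gap is not part of the SPSS output; it is included here as an illustrative way of seeing how well each model carries over to cases not used in estimation.

```python
# Figures quoted in the comparison above.
models = {
    "Model 1": {"canonical_corr": 0.767, "eigenvalue": 1.429,
                "hit_analysis": 88.7, "hit_holdout": 74.0},
    "Model 2": {"canonical_corr": 0.794, "eigenvalue": 1.705,
                "hit_analysis": 89.3, "hit_holdout": 84.0},
}

summary = {}
for name, m in models.items():
    explained_pct = round(100 * m["canonical_corr"] ** 2, 2)
    # Gap between analysis and held-out hit ratios (illustrative measure).
    gap = round(m["hit_analysis"] - m["hit_holdout"], 1)
    summary[name] = (explained_pct, gap)
    print(f"{name}: explains {explained_pct}%, eigenvalue {m['eigenvalue']}, "
          f"hit ratio gap {gap} percentage points")

# The model with the higher held-out hit ratio is preferred here.
better = max(models, key=lambda name: models[name]["hit_holdout"])
print("Better model on the held-out sample:", better)
```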