ASSIGNMENT 3
NAME VU ID
BEH YANG CHENG 4520339
Part A
Question 1
LOCATION and TYPE are dummy variables, with LOCATION = 1 (City) and TYPE = 1 (House) as the
base levels.
Table 1 below shows the estimated coefficients of the regression model.
Table 1: Coefficients (a)
Model 1       Unstandardized Coefficients   Standardized Coefficients   t        Sig.
              B          Std. Error         Beta
(Constant)    271.904    28.392                                         9.577    .000
Bedrooms      21.538     9.512              .231                        2.264    .026
Bathrooms     62.796     13.831             .409                        4.540    .000
Garage        15.533     9.844              .124                        1.578    .118
Location      -35.011    17.208             -.165                       -2.035   .045
Type          36.390     25.024             .139                        1.454    .149
a. Dependent Variable: WeeklyRent
Question 2
Based on experience or theories, the a priori signs for each coefficient of the regression model
are:
3. β3 should be positive because as GARAGE (the number of car spaces) increases, more cars
can be accommodated, so the WEEKLYRENT of the property will increase. Therefore, the
a priori sign of GARAGE is positive. From the SPSS output, the estimated sign for
GARAGE is positive, the same as the a priori sign.
4. β4 should be positive because when the property is located in a city, the weekly rental will
rise because it is more convenient to be located in a city as the facilities provided are
more efficient compared to suburb locations. Therefore, the a priori sign of LOCATION
is positive. However, from the SPSS output, the estimated sign for LOCATION is
negative, which differs from the a priori sign.
5. β5 should be positive because when TYPE (the type of property) is a HOUSE, the size of
the house is bigger, more spacious and can accommodate more people compared to the
UNIT type of property. Hence, the WEEKLYRENT of the property will increase.
Therefore, the a priori sign of TYPE is positive. From the SPSS output, the estimated
sign for TYPE is positive, same as the a priori sign.
Question 3
Based on the coefficient values of the estimated model in Table 1, it can be
observed that:
BEDROOMS
Hypothesis:
H0: The number of bedrooms is not significant in explaining the weekly rent of properties.
H1: The number of bedrooms is significant in explaining the weekly rent of properties.
Decision:
At 5% significance level, there is sufficient evidence to conclude that the number of bedrooms is
significant in explaining the weekly rent of properties.
BATHROOMS
H0: The number of bathrooms is not significant in explaining the weekly rent of properties.
H1: The number of bathrooms is significant in explaining the weekly rent of properties.
Decision:
At 5% significance level, there is sufficient evidence to conclude that the number of bathrooms is
significant in explaining the weekly rent of properties.
GARAGE
H0: The number of garage spaces is not significant in explaining the weekly rent of properties.
H1: The number of garage spaces is significant in explaining the weekly rent of properties.
Decision:
At 5% significance level, there is insufficient evidence to conclude that the number of garage
spaces is significant in explaining the weekly rent of properties.
LOCATION:
H0: The type of location is not significant in explaining the weekly rent of properties.
H1: The type of location is significant in explaining the weekly rent of properties.
Decision:
At 5% significance level, there is sufficient evidence to conclude that the type of location is
significant in explaining the weekly rent of properties.
TYPE OF PROPERTY
H0: The type of property is not significant in explaining the weekly rent of properties.
H1: The type of property is significant in explaining the weekly rent of properties.
Decision:
At 5% significance level, there is insufficient evidence to conclude that the type of property is
significant in explaining the weekly rent of properties.
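The five decisions above all follow the same rule: reject H0 when the Sig. value in Table 1 is below 0.05. A minimal Python sketch of that rule is shown here; the dictionary layout and the function name `is_significant` are illustrative assumptions, not part of the SPSS output.

```python
# Sig. values copied from Table 1 (p-values of the t-test for each coefficient).
ALPHA = 0.05  # 5% significance level used throughout the report

p_values = {
    "Bedrooms": 0.026,
    "Bathrooms": 0.000,
    "Garage": 0.118,
    "Location": 0.045,
    "Type": 0.149,
}


def is_significant(p_value, alpha=ALPHA):
    """Return True when H0 is rejected, i.e. the p-value is below alpha."""
    return p_value < alpha


# Apply the decision rule to every explanatory variable.
decisions = {name: is_significant(p) for name, p in p_values.items()}

for name, significant in decisions.items():
    verdict = "reject H0" if significant else "do not reject H0"
    print(f"{name:<10} p = {p_values[name]:.3f} -> {verdict}")
```

Under this rule, BEDROOMS, BATHROOMS and LOCATION are flagged as significant, while GARAGE and TYPE are not.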
Question 4
Table 2 & Table 3 show the summary of the regression model and the descriptive statistics for the
regression.
CV = (SE / Mean) × 100%
= (74.521 / 452.33) × 100%
= 16.47%
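As a quick check of the arithmetic, the coefficient of variation can be recomputed in Python. This is only a sketch; the variable names are illustrative, and the two inputs are the figures used in the calculation above.

```python
# Figures taken from the calculation above.
standard_error = 74.521     # standard error of the estimate (SE)
mean_weekly_rent = 452.33   # mean of WEEKLYRENT

# Coefficient of variation: SE as a percentage of the mean.
cv_percent = standard_error / mean_weekly_rent * 100

print(f"CV = {cv_percent:.2f}%")
```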
The goodness of fit of the model:
● The adjusted R2 is 0.456, so the model is moderately satisfactory: it explains 45.6% of
the variation in the weekly rent of properties.
● In other words, 45.6% of the variation in WEEKLYRENT can be explained by the
variation in the independent variables (BEDROOMS, BATHROOMS, GARAGE,
LOCATION and TYPE).
● Overall, the fit of the model is considered good because it has a low coefficient of
variation of 16.47% and a moderately high adjusted R2 of 0.456.
Question 5
H0: The model is not significant.
H1: The model is significant.
Table 4: ANOVA (a)
Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    487783.106       5    97556.621     17.567   .000 (b)
Residual      522011.004       94   5553.309
Total         1009794.110      99
a. Dependent Variable: WeeklyRent
b. Predictors: (Constant), Type, Garage, Bathrooms, Location, Bedrooms
Decision:
● The significance value of the ANOVA test is 0.000, which is smaller than 0.05, the level of
significance for the test.
● The significance value of 0.000 indicates that there is strong evidence to reject the null
hypothesis.
● Therefore, we reject the null hypothesis that the model is not significant.
Conclusion:
At 5% significance level, we have sufficient evidence to conclude that the model is significant.
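The F statistic, the adjusted R2 and the standard error quoted in this report can all be recovered from the sums of squares in Table 4. The sketch below uses the standard formulas; the variable names are illustrative and nothing in it comes from SPSS beyond the table values.

```python
# Sums of squares and degrees of freedom copied from Table 4.
ss_regression, df_regression = 487783.106, 5
ss_residual, df_residual = 522011.004, 94
ss_total, df_total = 1009794.110, 99

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F statistic for the overall significance of the model.
f_statistic = ms_regression / ms_residual

# R2 and adjusted R2 (the latter penalises the number of predictors).
r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (ss_residual / df_residual) / (ss_total / df_total)

# Standard error of the estimate is the square root of the residual mean square.
standard_error = ms_residual ** 0.5

print(f"F = {f_statistic:.3f}")
print(f"Adjusted R2 = {adjusted_r_squared:.3f}")
print(f"SE = {standard_error:.3f}")
```

The recomputed values agree with the F of 17.567 in Table 4, the adjusted R2 of 0.456 and the SE of 74.521 used in Question 4.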
Question 6
a) Multi-collinearity
Table 5 shows the Pearson Correlation Test to check the multi-collinearity problems on the
regression model.
Table 5: Correlations
                                  WeeklyRent  Bedrooms  Bathrooms  Garage  Location  Type
Pearson Correlation  WeeklyRent   1.000       .536      .579       .304    -.177     .401
                     Bedrooms     .536        1.000     .509       .197    .001      .522
                     Bathrooms    .579        .509      1.000      .316    .061      .171
                     Garage       .304        .197      .316       1.000   .078      .127
                     Location     -.177       .001      .061       .078    1.000     -.336
                     Type         .401        .522      .171       .127    -.336     1.000
Sig. (1-tailed)      WeeklyRent   .           .000      .000       .001    .039      .000
                     Bedrooms     .000        .         .000       .025    .495      .000
                     Bathrooms    .000        .000      .          .001    .272      .044
                     Garage       .001        .025      .001       .       .221      .104
                     Location     .039        .495      .272       .221    .         .000
                     Type         .000        .000      .044       .104    .000      .
N                    WeeklyRent   100         100       100        100     100       100
                     Bedrooms     100         100       100        100     100       100
                     Bathrooms    100         100       100        100     100       100
                     Garage       100         100       100        100     100       100
                     Location     100         100       100        100     100       100
                     Type         100         100       100        100     100       100
Based on Table 5, it can be observed that:
● The largest Pearson Correlation coefficient is found between BEDROOMS and TYPE,
which is 0.522.
● The squared correlation of 0.2725 (0.522^2) is smaller than the adjusted R2 of the
regression model, which is 0.456. Therefore, the correlation between BEDROOMS and
TYPE shows no sign of a collinearity problem.
● Since the largest correlation between any pair of independent variables in Table 5 is
not a problem, the remaining, smaller correlations are not a problem either. Thus, the
regression model does not have a multi-collinearity problem.
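The check described above (square the largest pairwise correlation between independent variables and compare it with the adjusted R2) can be sketched in Python as follows. The dictionary holds the off-diagonal predictor correlations from Table 5; the names `strongest_pair` and `collinearity_flag` are illustrative assumptions.

```python
# Pairwise Pearson correlations between the independent variables (Table 5).
correlations = {
    ("Bedrooms", "Bathrooms"): 0.509,
    ("Bedrooms", "Garage"): 0.197,
    ("Bedrooms", "Location"): 0.001,
    ("Bedrooms", "Type"): 0.522,
    ("Bathrooms", "Garage"): 0.316,
    ("Bathrooms", "Location"): 0.061,
    ("Bathrooms", "Type"): 0.171,
    ("Garage", "Location"): 0.078,
    ("Garage", "Type"): 0.127,
    ("Location", "Type"): -0.336,
}
ADJUSTED_R_SQUARED = 0.456  # from the model summary used in Question 4

# Pick the pair with the largest correlation in absolute value.
strongest_pair = max(correlations, key=lambda pair: abs(correlations[pair]))
squared_correlation = correlations[strongest_pair] ** 2

# Rule used in the report: a problem is flagged only if the squared
# correlation exceeds the adjusted R2 of the regression model.
collinearity_flag = squared_correlation > ADJUSTED_R_SQUARED

print(strongest_pair, round(squared_correlation, 4), collinearity_flag)
```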
b) Heteroskedasticity
Based on Figure 1, it can be observed that:
● Ignoring the possible outliers circled in Figure 1, the residuals appear fairly evenly
distributed against the dependent variable WEEKLYRENT.
● The distribution pattern of the residual in the regression model suggests that the model
does not have a heteroskedasticity problem.
c) Non-normality
Table 6: Descriptive Statistics
                        N          Minimum    Maximum    Skewness               Kurtosis (peakedness)
                        Statistic  Statistic  Statistic  Statistic  Std. Error  Statistic  Std. Error
Standardized Residual   100        -2.34236   3.41423    .613       .241        1.282      .478
Valid N (listwise)      100
If the data is normally distributed, the histogram should have a normal profile, the skewness
statistic should be close to zero and the kurtosis statistic should be close to zero.
Based on Table 6, there are 100 observations, which is a fairly small sample size. The skewness
statistic is 0.613, which is moderately larger than 0. This indicates that the distribution is skewed
to the right. The kurtosis of the distribution of residuals is 1.282, which is larger than 0. This
shows that the distribution is more peaked than a normal distribution.
Based on Figure 2, the histogram has a roughly normal, bell-shaped profile and is relatively
symmetric. Therefore, the residuals are approximately normally distributed.
Conclusion
Based on Table 5 & 6 and Figure 1 & 2, we can conclude that the regression model does not
show a non-normality problem or a heteroskedasticity problem. In addition, the regression
model shows no multi-collinearity problem.
Question 7
The estimated equation for the regression model is:
WEEKLYRENT = 271.904 + 21.538 BEDROOMS + 62.796 BATHROOMS + 15.533 GARAGE
- 35.011 LOCATION + 36.390 TYPE
Substituting the relevant figures given above, the result is as shown below:
WEEKLYRENT = 271.904 + 21.538 (5) + 62.796 (3) + 15.533 (3) - 35.011 (0) + 36.390 (1)
= $650.971
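The substitution can be reproduced with a short Python sketch built from the Table 1 coefficients. The function name `predict_weekly_rent` and its argument names are illustrative assumptions; the input values (5, 3, 3, 0, 1) are the ones used in the calculation above.

```python
# Estimated coefficients from Table 1.
CONSTANT = 271.904
COEFFICIENTS = {
    "bedrooms": 21.538,
    "bathrooms": 62.796,
    "garage": 15.533,
    "location": -35.011,
    "property_type": 36.390,
}


def predict_weekly_rent(bedrooms, bathrooms, garage, location, property_type):
    """Evaluate the estimated regression equation for one property."""
    values = {
        "bedrooms": bedrooms,
        "bathrooms": bathrooms,
        "garage": garage,
        "location": location,
        "property_type": property_type,
    }
    return CONSTANT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


# Same inputs as the worked calculation: 5 bedrooms, 3 bathrooms, 3 car spaces,
# LOCATION = 0 and TYPE = 1.
predicted_rent = predict_weekly_rent(5, 3, 3, 0, 1)
print(f"Predicted weekly rent: ${predicted_rent:.3f}")
```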
Question 8
The goodness of fit of the model is considered good, as the adjusted R2 demonstrates that the
five variables (BEDROOMS, BATHROOMS, GARAGE, LOCATION and TYPE of property) are
able to explain 45.6% of the variation in the weekly rent of properties. The remaining 54.4%
remains unexplained, so there may be other factors outside the model that affect the weekly
rent of the property.
Out of the five variables, three (BEDROOMS, BATHROOMS and LOCATION) are statistically
significant at the 5% level in explaining the weekly rent of properties in the western suburbs of
Melbourne. We consider these variables good predictors of the weekly rent of the properties.
A greater number of bedrooms and bathrooms in the property raises the weekly rent because the
property is more spacious and convenient and can accommodate more people. Therefore, the
real estate agent is able to set the weekly rent at a higher price. In addition, the negative
LOCATION coefficient indicates that the weekly rent of a property in a suburban area is higher
than that of a property in a city area. A possible reason is that the western suburbs of Melbourne
are close to the city of Melbourne and are mainly residential, so demand is high among people
who want to live there, close to their workplace. Therefore, the real estate agent, as the price
setter for the property, can set a higher price in the western suburbs of Melbourne while
competing with other real estate agents.
In conclusion, these types of data and information can be extremely useful and valuable
for the real estate agent in determining the weekly rent of the property.
PART B
Table 8 shows the tests of equality of group means.
(b) Income
Decision:
The significance value of INCOME is 0.000, which is smaller than 0.05, the level of
significance for the test.
Therefore, we reject the null hypothesis that the mean of INCOME is the same for the
Subscriber and Otherwise groups.
The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
At 5% significance level, we conclude that INCOME is a useful predictor.
(c) Marital
Decision:
The significance value of MARITAL is 0.079, which is larger than 0.05, the level of
significance for the test.
Therefore, we do not reject the null hypothesis that the mean of MARITAL is the same for
the Subscriber and Otherwise groups.
The significance value of 0.079 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
At 5% significance level, we conclude that MARITAL is not a useful predictor.
(d) Profession
Decision:
The significance value of PROFESSION is 0.885, which is larger than 0.05, the level of
significance for the test.
Therefore, we do not reject the null hypothesis that the mean of PROFESSION is the same
for the Subscriber and Otherwise groups.
The significance value of 0.885 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
At 5% significance level, we conclude that PROFESSION is not a useful predictor.
(e) Gender
Decision:
The significance value of GENDER is 0.063, which is larger than 0.05, the level of
significance for the test.
Therefore, we do not reject the null hypothesis that the mean of GENDER is the same for
the Subscriber and Otherwise groups.
The significance value of 0.063 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
At 5% significance level, we conclude that GENDER is not a useful predictor.
Table 10 shows canonical correlation.
Table 10: Eigenvalues
Decision:
Based on Table 11, the significance value of the test is 0.000, which is smaller than 0.05,
the level of significance for the test.
Therefore, we reject the null hypothesis that the discriminant function is not significant.
The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
At 5% significance level, we have sufficient evidence to conclude that the discriminant
function is significant.
Table 12 shows the standardized canonical discriminant function coefficients.
Function
Age .934
Income -.362
Marital -.288
Profession -.102
Gender -.028
Table 13 shows the unstandardized canonical discriminant function coefficients.
Function
Age .104
Income -.050
Marital -.587
Profession -.251
Gender -.061
(Constant) -.996
Unstandardized coefficients
Table 14 shows the unstandardized canonical discriminant function evaluated at group means.
Subscriber Function
Otherwise 1.340
Subscriber -1.053
Based on Table 14, the average centroid (mid-point) can be calculated as follows:
Average centroid = (1.340-1.053) / 2
= 0.287 / 2
= 0.1435
Table 15 shows the hit ratio classification results.
                                            Otherwise  Subscriber  Total
Cases Selected      Original  Count  Otherwise   55         11          66
                                     Subscriber  6          78          84
                              %      Otherwise   83.3       16.7        100.0
                                     Subscriber  7.1        92.9        100.0
Cases Not Selected  Original  Count  Otherwise   10         10          20
                                     Subscriber  3          27          30
                              %      Otherwise   50.0       50.0        100.0
                                     Subscriber  10.0       90.0        100.0
The hit ratio for the analysis sample is 88.7%, while the hit ratio for the held-out sample
is 74.0%.
The hit ratio for the analysis sample is relatively high as it is above 80%.
On the other hand, the hit ratio for the held-out sample is lower, as it is below 80%.
This indicates that the discriminant function does not appear to be very satisfactory.
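The two hit ratios are the share of correctly classified cases, that is, the diagonal counts of Table 15 divided by the sample size. A minimal sketch, with the function name `hit_ratio` and the nested-list layout as illustrative assumptions:

```python
def hit_ratio(counts):
    """Percentage of cases on the diagonal of a square classification table."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return 100 * correct / total


# Counts from Table 15; rows and columns are ordered Otherwise, Subscriber.
analysis_sample = [[55, 11], [6, 78]]   # cases selected
held_out_sample = [[10, 10], [3, 27]]   # cases not selected

analysis_hit_ratio = hit_ratio(analysis_sample)
held_out_hit_ratio = hit_ratio(held_out_sample)

print(f"Analysis sample: {analysis_hit_ratio:.1f}%")
print(f"Held-out sample: {held_out_hit_ratio:.1f}%")
```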
Question 12
Age: 40
Income: 65 (annual income of $65,000)
Marital: 0 (Single)
Profession: 1 (Full-time)
Gender: 1 (Male)
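Substituting the relevant figures given above, the result is as shown below:
D = - 0.996 + 0.104 (40) - 0.050 (65) - 0.587 (0) - 0.251 (1) - 0.061 (1)
= - 0.398
From the estimation above, D = - 0.398, which is smaller than the average centroid of 0.1435.
The score is a unitless discriminant score, not a dollar amount. In Table 14 the Subscriber
group centroid (-1.053) lies below the average centroid and the Otherwise group centroid
(1.340) lies above it, so a score below 0.1435 falls on the Subscriber side. Therefore, a
40-year-old single male in full-time employment who earns an annual income of $65,000 is
likely to be a subscriber to the online newspaper.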
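The discriminant score and the cut-off it is compared with can be recomputed as follows. This is a sketch using the unstandardized coefficients of Table 13 and the group centroids of Table 14; the dictionary keys and variable names are illustrative assumptions, and income is entered in thousands of dollars as in the calculation above.

```python
# Unstandardized canonical discriminant function coefficients (Table 13).
CONSTANT = -0.996
coefficients = {
    "age": 0.104,
    "income": -0.050,
    "marital": -0.587,
    "profession": -0.251,
    "gender": -0.061,
}

# Functions at group centroids (Table 14).
centroids = {"Otherwise": 1.340, "Subscriber": -1.053}

# Profile from Question 12: age 40, income 65 (thousand dollars),
# single (0), full-time (1), male (1).
person = {"age": 40, "income": 65, "marital": 0, "profession": 1, "gender": 1}

# Discriminant score D for this person.
score = CONSTANT + sum(coefficients[name] * person[name] for name in coefficients)

# Average centroid (mid-point between the two group centroids).
cutoff = sum(centroids.values()) / 2

print(f"D = {score:.3f}, average centroid = {cutoff:.4f}")
```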
Part C (Factor Analysis)
Question 13 & Question 14
Table 16 shows mean and standard deviations of each variable for 200 observations from a
random sample of persons over the age of 18 who are in full-time or part-time employment.
Table 18 shows the total variance explained by the factors.
Table 18: Total Variance Explained
Figure 3: Scree Plot of the 6 Factors
Based on Figure 3, the scree plot profile shows that three factors are doing most of the work,
since the plot levels out after the first three factors.
Table 19 shows the rotated component matrix of the factors.
Component
1 2 3
Question 15
Discriminant Model (2)
Table 20 shows the group statistics.
Table 20: Group Statistics
Table 21 shows the test of equality of group means.
Table 21: Tests of Equality of Group Means
Based on Table 21, it can be observed that:
(a) Age
Decision:
The significance value of AGE is 0.000, which is smaller than 0.05, the level of
significance for the test.
Therefore, we reject the null hypothesis that the mean of AGE is the same for the
Subscriber and Otherwise groups.
The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
At 5% significance level, we conclude that AGE is a useful predictor.
(b) Income
Decision:
The significance value of INCOME is 0.000, which is smaller than 0.05, the level of
significance for the test.
Therefore, we reject the null hypothesis that the mean of INCOME is the same for the
Subscriber and Otherwise groups.
The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
At 5% significance level, we conclude that INCOME is a useful predictor.
(c) Marital
Decision:
The significance value of MARITAL is 0.079, which is larger than 0.05, the level of
significance for the test.
Therefore, we do not reject the null hypothesis that the mean of MARITAL is the same for
the Subscriber and Otherwise groups.
The significance value of 0.079 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
At 5% significance level, we conclude that MARITAL is not a useful predictor.
(d) Profession
Decision:
The significance value of PROFESSION is 0.885, which is larger than 0.05, the level of
significance for the test.
Therefore, we do not reject the null hypothesis that the mean of PROFESSION is the same
for the Subscriber and Otherwise groups.
The significance value of 0.885 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
At 5% significance level, we conclude that PROFESSION is not a useful predictor.
(e) Gender
Decision:
The significance value of GENDER is 0.063, which is larger than 0.05, the level of
significance for the test.
Therefore, we do not reject the null hypothesis that the mean of GENDER is the same for
the Subscriber and Otherwise groups.
The significance value of 0.063 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
At 5% significance level, we conclude that GENDER is not a useful predictor.
(f) Cost
Decision:
The significance value of COST is 0.000, which is smaller than 0.05, the level of
significance for the test.
Therefore, we reject the null hypothesis that the mean of COST is the same for the
Subscriber and Otherwise groups.
The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
At 5% significance level, we conclude that COST is a useful predictor.
(g) Educational
Decision:
The significance value of EDUCATIONAL is 0.000, which is smaller than 0.05, the level
of significance for the test.
Therefore, we reject the null hypothesis that the mean of EDUCATIONAL is the same for
the Subscriber and Otherwise groups.
The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
At 5% significance level, we conclude that EDUCATIONAL is a useful predictor.
(h) Functional
Decision:
The significance value of FUNCTIONAL is 0.709, which is larger than 0.05, the level of
significance for the test.
Therefore, we do not reject the null hypothesis that the mean of FUNCTIONAL is the same
for the Subscriber and Otherwise groups.
The significance value of 0.709 indicates that there is insufficient evidence to reject the
null hypothesis.
Conclusion:
At 5% significance level, we conclude that FUNCTIONAL is not a useful predictor.
Table 22 shows the correlation coefficients (pooled within-groups matrices).
Table 22: Pooled Within-Groups Matrices
Table 24 shows the test of the discriminant function.
Table 24: Wilks' Lambda
Decision:
Based on Table 24, the significance value of the test is 0.000, which is smaller than 0.05,
the level of significance for the test.
Therefore, we reject the null hypothesis that the discriminant function is not significant.
The significance value of 0.000 indicates that there is strong evidence against the null
hypothesis.
Conclusion:
At 5% significance level, we have sufficient evidence to conclude that the discriminant
function is significant.
Table 25 shows the standardized canonical discriminant function coefficients.
Function
Age .848
Income -.379
Marital -.335
Profession -.114
Gender .027
Cost .295
Educational -.311
Functional -.026
Table 26 shows the unstandardized canonical discriminant function coefficients.
Function
Age .094
Income -.052
Marital -.681
Profession -.279
Gender .060
Cost .309
Educational -.334
Functional -.027
(Constant) -.530
Unstandardized coefficients
Table 27 shows the unstandardized canonical discriminant function evaluated at group means.
Table 27: Functions at Group Centroids
Subscriber Function
Otherwise 1.463
Subscriber -1.150
Based on Table 27, the average centroid (mid-point) can be calculated as follows:
Average centroid = (1.463-1.150) / 2
= 0.313 / 2
= 0.1565
Table 28 shows the hit ratio classification results.
                                            Otherwise  Subscriber  Total
Cases Selected      Original  Count  Otherwise   54         12          66
                                     Subscriber  4          80          84
                              %      Otherwise   81.8       18.2        100.0
                                     Subscriber  4.8        95.2        100.0
Cases Not Selected  Original  Count  Otherwise   12         8           20
                                     Subscriber  0          30          30
                              %      Otherwise   60.0       40.0        100.0
                                     Subscriber  .0         100.0       100.0
The hit ratio for the analysis sample is 89.3%, while the hit ratio for the held-out sample
is 84.0%.
The hit ratio for the analysis sample is high as it is above 80%, almost reaching 90%.
The hit ratio for the held-out sample is also relatively high as it is above 80%.
This indicates that the discriminant function appears to be very satisfactory.
Question 15
Conclusion:
Discriminant Model 2 is the better model: its hit ratio for the held-out sample is 84.0%,
compared with 74.0% for the first discriminant model. Therefore, the use of factor analysis
has improved the discriminant analysis.