DS II Tut-3 Solved

TUTORIAL -3
Question 1 (20 points)
In a medical research study, 36 patients suffering from severe clinical depression were prescribed
different treatments and the effectiveness of these treatments in managing severe clinical
depression was recorded. Note that for the duration of the study, each patient was uniquely
prescribed one and only one of the four available treatments (referred to as Treatments A, B, C, and
D). Moreover, each treatment was prescribed to the exact same number of patients. Finally, all of
the variables recorded in the study are given in Table 3.1.
Table 3.1. Variable names, descriptions, and types
Variable        Variable Description                                                          Variable Type
Effectiveness   The effectiveness of the prescribed treatment, measured on a scale of 0-100   Numerical
Age             Age of the patient                                                            Numerical
Gender          Gender of the patient                                                         Categorical (Male/Female)
Treatment       Type of treatment prescribed to a patient                                     Categorical (A, B, C, D)
The pairwise correlation matrix for all these variables is first computed as shown below (Table 3.2):
Table 3.2 Correlation matrix
                Effectiveness   Age       Gender    Treatment A   Treatment B   Treatment C   Treatment D
Effectiveness   1
Age             0.7967          1
Gender          0.2262          0.0721    1
Treatment A     0.3589          0.0355    0.1360    1
Treatment B     -0.1231         0.0667    -0.1360   -0.3333       1
Treatment C     -0.2489         -0.0756   0.1360    -0.3333       -0.3333       1
Treatment D     0.0131          -0.0266   -0.1360   -0.3333       -0.3333       -0.3333       1
Question 3.1 (1 point)
As the correlation coefficient value between Effectiveness and Treatment C is negative, it was
concluded that Treatment C is not effective in treating clinical depression. State whether this claim is
true or false and justify your conclusion.
Solution: False. The negative correlation only shows that average effectiveness is lower when the
Treatment C dummy equals 1 than when it equals 0, i.e., patients who were not prescribed Treatment C
had higher effectiveness levels on average than patients who were. The effectiveness of Treatment C
is therefore measured relative to the other treatments: it is fair to state that Treatment C is not
as effective as the other treatments in the study, but not that it is ineffective in treating
clinical depression. To make that claim, we would have to compare patients on Treatment C with
patients who were not given any treatment at all, and this option is not available in the data.
Model 1:
A linear regression model was initially constructed to predict effectiveness based only on the
treatment used and the following SPSS output was obtained.
Model 1 Coefficients (a)
Model 1       Unstandardized B   Std. Error   Standardized Beta   t   Sig.
(Constant)    55.444
TreatmentA    7.333                           .259
TreatmentB    -2.889                          -.102
TreatmentC    -5.555                          -.197
a. Dependent Variable: Effectiveness
Question 3.2. (2 points)

Using the regression coefficients given in Model 1, rank the different treatments in the order of their
effectiveness, starting from best to worst and clearly indicate the average effectiveness of each
treatment.
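With Treatment D as the base category, the prediction equation is:
Y_eff = 55.444 + 7.333*Treatment A - 2.889*Treatment B - 5.555*Treatment C
The average effectiveness of each treatment is therefore:
Treatment D = 55.444
Treatment A = 55.444 + 7.333 = 62.777
Treatment B = 55.444 - 2.889 = 52.555
Treatment C = 55.444 - 5.555 = 49.889
Ranking, best to worst: Treatment A (62.78) > Treatment D (55.44) > Treatment B (52.56) > Treatment C (49.89)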
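The group means can be reproduced directly from the Model 1 coefficients. The following is a minimal Python sketch; the variable names are illustrative and not part of the SPSS output.

```python
# Model 1 coefficients as quoted in the output; Treatment D is the base category,
# so its dummy coefficient is taken to be 0.
intercept = 55.444
dummy_coefs = {"A": 7.333, "B": -2.889, "C": -5.555, "D": 0.0}

# With one dummy per treatment, the fitted mean of a treatment is
# the intercept plus that treatment's dummy coefficient.
means = {t: round(intercept + b, 3) for t, b in dummy_coefs.items()}

# Rank from most to least effective on average.
ranking = sorted(means, key=means.get, reverse=True)

for t in ranking:
    print(f"Treatment {t}: {means[t]:.3f}")
```

Because the model contains only treatment dummies, these fitted values are simply the sample means of the four groups.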
Question 3.3. (1 point)
What would the prediction equation be if the base category was taken to be Treatment B?
If the base category is taken to be Treatment B, the intercept becomes the Treatment B mean and each
coefficient becomes the difference from Treatment B:
Y_eff = (55.444 - 2.889) + (7.333 + 2.889)*Treatment A + (-5.555 + 2.889)*Treatment C + (0 + 2.889)*Treatment D
Y_eff = 52.555 + 10.222*Treatment A - 2.666*Treatment C + 2.889*Treatment D
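Changing the base category only shifts the coefficients; the fitted group means stay the same. A small Python sketch of that shift follows, where `rebase` is a hypothetical helper written for this illustration.

```python
def rebase(intercept, coefs, new_base):
    """Re-express dummy coefficients relative to a different base category.

    `coefs` maps every category to its coefficient under the old base
    (the old base category itself has coefficient 0).
    """
    shift = coefs[new_base]
    new_intercept = intercept + shift  # mean of the new base category
    new_coefs = {c: b - shift for c, b in coefs.items() if c != new_base}
    return new_intercept, new_coefs


# Model 1 coefficients with Treatment D as the original base.
old = {"A": 7.333, "B": -2.889, "C": -5.555, "D": 0.0}
b0, b = rebase(55.444, old, "B")

print(f"intercept = {b0:.3f}")
for c in sorted(b):
    print(f"Treatment {c}: {b[c]:+.3f}")
```

The intercept plus each new coefficient returns the same treatment means as under the original base.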

Question 3.4. (3 points)
It was noted that the total variability in effectiveness explained by the above regression model is only
15.46%. Moreover, when a reputed clinical psychologist was shown the rankings of the treatments,
she stated that based on her clinical experience all treatments are likely to have the same
effectiveness on average. Perform an appropriate test to verify this claim at a 95% confidence level,
clearly stating the null and alternate hypotheses.
Ho: beta1 = beta2 = beta3 = 0 (all treatments have the same average effectiveness)
Ha: at least one beta is not 0
R^2 = 0.1546 = SSR/SST = (MSR*k)/SST ----- eqn 1
1 - R^2 = 0.8454 = SSE/SST = (MSE*(n-k-1))/SST ----- eqn 2
Dividing eqn 1 by eqn 2:
0.1546/0.8454 = (MSR*k)/(MSE*(n-k-1))
With k = 3 and n = 36:
F = MSR/MSE = (0.1546/3)/(0.8454/32) = 1.9506
F_critical = F(0.05; 3, 32) = 2.90
As F < F_critical, we cannot reject the null hypothesis, and therefore we cannot reject the claim that
all treatments are the same on average.
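The overall F statistic can be computed from R^2 alone, as the derivation shows. Below is a minimal Python sketch; the critical value is hard-coded from the solution rather than computed, and the function name is an assumption made for illustration.

```python
def overall_f(r2, n, k):
    """Overall F statistic for a regression with k predictors and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


f_stat = overall_f(0.1546, n=36, k=3)

# Critical value for F with (3, 32) degrees of freedom at the 5% level,
# taken from the solution text (not computed here).
f_crit = 2.90

reject_null = f_stat > f_crit
print(f"F = {f_stat:.4f}, reject Ho: {reject_null}")
```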
Model 2
Recognizing that ‘Age’ of the patient might be a factor when measuring the effectiveness of a
received treatment, Model 1 was enhanced by using an interaction variable, namely
“Age*Treatment” (all combinations) and the regression output for this new model is given below.
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .937 (a)   .878       .848                4.84794
a. Predictors: (Constant), Age_TreatD, Age_TreatC, Age_TreatB, Age_TreatA, TreatmentC, TreatmentD, TreatmentB
b. Dependent Variable: Effectiveness
Coefficients (a)
Model 1       Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)    49.470             5.864                            8.437    .000
TreatmentB    -22.030            7.882        -.779               -2.795   .009
TreatmentC    -44.931            7.512        -1.589              -5.981   .000
TreatmentD    -20.964            7.844        -.742               -2.673   .012
Age_TreatA    .296               .125         .496                2.361    .025
Age_TreatB    .549               .110         .948                5.010    .000
Age_TreatC    1.074              .104         1.742               10.286   .000
Age_TreatD    .620               .114         1.018               5.439    .000
Question 3.5 (3 points)
Based on the regression output, plot the regression equations for each of the four treatments (on the
same graph), if the minimum and maximum ages of the patients in the dataset are 19 and 67 years,
respectively. Based on your regression plots, which treatment would you recommend as the best on
average for a patient of age 60 years?
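Y_treatA = 49.470 + 0.296*age
Y_treatB = (49.470-22.030) + 0.549*age = 27.440 + 0.549*age
Y_treatC = (49.470-44.931) + 1.074*age = 4.539 + 1.074*age
Y_treatD = (49.470-20.964) + 0.620*age = 28.506 + 0.620*age
[Plot not reproduced: the four regression lines above, drawn for ages 19 to 67.]
At age 60 the predicted effectiveness is 67.23 (A), 60.38 (B), 68.98 (C), and 65.71 (D), so Treatment C
is recommended as the best on average for a 60-year-old patient.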
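The four regression lines follow from the Model 2 coefficient table: each treatment's intercept is the constant plus its dummy coefficient (none for Treatment A), and its slope is the matching Age_Treat coefficient. A short Python sketch that evaluates them at age 60 follows; the names `lines` and `predict` are illustrative.

```python
# (intercept, slope) per treatment, built from the Model 2 coefficient table.
constant = 49.470
lines = {
    "A": (constant, 0.296),
    "B": (constant - 22.030, 0.549),
    "C": (constant - 44.931, 1.074),
    "D": (constant - 20.964, 0.620),
}


def predict(treatment, age):
    """Predicted effectiveness for a patient of the given age on the given treatment."""
    intercept, slope = lines[treatment]
    return intercept + slope * age


preds_at_60 = {t: predict(t, 60) for t in lines}
best_at_60 = max(preds_at_60, key=preds_at_60.get)

for t, y in preds_at_60.items():
    print(f"Treatment {t} at age 60: {y:.2f}")
print("Best on average at age 60:", best_at_60)
```

Evaluating `predict` at ages 19 and 67 gives the endpoints needed to draw each line over the observed age range.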
Question 3.6. (1 point)

At what age does the average effectiveness of Treatment C surpass that of Treatment A? Based on
your answer, what treatment(s) would you recommend for younger patients and older patients?
Equating the prediction equations for Treatments A and C (49.470 + 0.296*age = 4.539 + 1.074*age)
gives age = 44.931/0.778 = 57.75 years as the threshold age where Treatment C surpasses Treatment A.
(In the plot above, it is the intersection point between the blue and grey lines.) Therefore, based on
the prediction equations, on average, Treatment A works better for younger patients and Treatment C
works better for elderly patients.
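The threshold age is the intersection of two straight lines. As a sketch in Python, using the rounded coefficients printed in the Model 2 table (the helper `crossover_age` is hypothetical), the arithmetic gives about 57.75 years:

```python
def crossover_age(line1, line2):
    """Age at which two lines (intercept, slope) predict the same effectiveness."""
    (a1, b1), (a2, b2) = line1, line2
    return (a1 - a2) / (b2 - b1)


treat_a = (49.470, 0.296)
treat_c = (49.470 - 44.931, 1.074)

age_star = crossover_age(treat_a, treat_c)
print(f"Treatment C overtakes Treatment A at about {age_star:.2f} years")
```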
Question 3.7. (2 points)
What is the probability that the effectiveness level would be at least 60 for a 55-year-old patient who
is administered Treatment B?
Y_treatB = 27.44 + 0.549*55 = 57.635
Given sqrt(MSE) = 4.84794 (the standard error of the estimate in the Model 2 summary),
Y|x ~ N(57.635, 4.84794^2)
P(Y > 60 | age = 55) = P(Z > (60 - 57.635)/4.84794) = P(Z > 0.4878)
P = 1 - NORM.S.DIST(0.4878, TRUE) = 0.312846
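The same probability can be checked outside a spreadsheet. This is a minimal sketch using `statistics.NormalDist` from the Python standard library, treating the standard error of the estimate as the standard deviation of an individual outcome, as the solution does.

```python
from statistics import NormalDist

# Predicted mean effectiveness for a 55-year-old on Treatment B.
mean = 27.44 + 0.549 * 55

# Standard error of the estimate from the Model 2 summary.
sd = 4.84794

z = (60 - mean) / sd
prob = 1 - NormalDist(mu=mean, sigma=sd).cdf(60)

print(f"mean = {mean:.3f}, z = {z:.4f}, P(Y > 60) = {prob:.4f}")
```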
Model 3
A stepwise regression procedure was initiated to predict effectiveness using all the original
explanatory variables given in Table 3.1 as well as interaction variables: Age*Treatment A;
Age*Treatment B; and Age*Treatment C. The following SPSS output obtained in the first two models
of the stepwise regression procedure is shown below.
Variables Entered/Removed (a)
Step   Variables Entered   Variables Removed   Method
1      Age                 .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2      TreatmentA          .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: Effectiveness
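Question 3.8 (3 points)
At the end of Step 1 of the regression procedure, determine the partial and part correlations for
Treatment A. Correspondingly, determine what percentage of the variability in “effectiveness” is
explained by Step 1 and Step 2, respectively?
With y = Effectiveness, x1 = Age, and x2 = Treatment A, Table 3.2 gives r_yx1 = 0.7967, r_yx2 = 0.3589,
and r_x2x1 = 0.0355.
Partial correlation: r_yx2,x1 = (r_yx2 - r_yx1*r_x2x1)/sqrt((1 - r_yx1^2)*(1 - r_x2x1^2)) = 0.547
(this formula was discussed in the last class)
Part correlation: sr_yx2,x1 = (r_yx2 - r_yx1*r_x2x1)/sqrt(1 - r_x2x1^2) = 0.331 (as was this formula)
R^2 = 0.7967^2 = 0.6347 (Step 1), i.e., 63.47% of the variability is explained
R^2 = 0.6347 + 0.331^2 = 0.7443 (Step 2), i.e., 74.43% of the variability is explained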
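Both correlations can be computed from the three pairwise correlations in Table 3.2. The following Python sketch uses function names chosen for illustration; the small difference in the last digit of the Step 2 R^2 comes from squaring the unrounded part correlation.

```python
import math


def part_corr(r_y2, r_y1, r_12):
    """Semi-partial (part) correlation of y with x2, with x1 removed from x2 only."""
    return (r_y2 - r_y1 * r_12) / math.sqrt(1 - r_12 ** 2)


def partial_corr(r_y2, r_y1, r_12):
    """Partial correlation of y with x2, with x1 removed from both y and x2."""
    return (r_y2 - r_y1 * r_12) / math.sqrt((1 - r_y1 ** 2) * (1 - r_12 ** 2))


# Pairwise correlations from Table 3.2 (y = Effectiveness, x1 = Age, x2 = Treatment A).
r_y_age, r_y_trA, r_age_trA = 0.7967, 0.3589, 0.0355

sr = part_corr(r_y_trA, r_y_age, r_age_trA)     # about 0.331
pr = partial_corr(r_y_trA, r_y_age, r_age_trA)  # about 0.547

r2_step1 = r_y_age ** 2          # Age alone
r2_step2 = r2_step1 + sr ** 2    # Age plus Treatment A

print(f"part = {sr:.3f}, partial = {pr:.3f}")
print(f"R^2 step 1 = {r2_step1:.4f}, R^2 step 2 = {r2_step2:.4f}")
```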

Question 3.9 (3 points)
After Step 2 of the regression model, the excluded variable with the largest semi-partial (part)
correlation value = 0.219, was “Age*Treatment A”. Conduct an appropriate test at the 95%
confidence level to determine if this excluded variable can be added to the regression model. State
the null and alternate hypotheses and show all calculations.
Ho: beta_age*treatA = 0
Ha: beta_age*treatA =/= 0
We do a partial F-test to test this hypothesis.
Full model: age, treatA, age*treatA
Reduced model: age, treatA
R^2_red = 0.7443
R^2_full = 0.7443 + 0.219^2 = 0.79226
With k = 3, m = 2, n = 36:
F_part = ((R^2_full - R^2_red)/(k-m)) / ((1 - R^2_full)/(n-k-1)) = 0.04796/(0.20774/32) = 7.39
F_crit = F(0.05; 1, 32) = 4.15
As F_part > F_crit, we reject the null hypothesis, and therefore “Age*Treatment A” should be added
to the model in Step 3.
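The partial F statistic depends only on the two R^2 values and the degrees of freedom. The Python sketch below takes the reduced-model R^2 as 0.7443 (the Step 2 value) and hard-codes an approximate tabled critical value; both are assumptions of this illustration, not SPSS output.

```python
def partial_f(r2_full, r2_red, n, k, m):
    """Partial F statistic for adding (k - m) predictors to a model with m predictors."""
    numerator = (r2_full - r2_red) / (k - m)
    denominator = (1 - r2_full) / (n - k - 1)
    return numerator / denominator


r2_red = 0.7443                 # R^2 after Step 2 (Age, Treatment A)
r2_full = r2_red + 0.219 ** 2   # adding the squared part correlation of Age*Treatment A

f_part = partial_f(r2_full, r2_red, n=36, k=3, m=2)

# Approximate tabled value of F(0.05; 1, 32); hard-coded, not computed.
f_crit = 4.15

print(f"R^2_full = {r2_full:.5f}, F_part = {f_part:.2f}, add variable: {f_part > f_crit}")
```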
Question 3.10 (1 point)

In Model 3, only three interaction variables, namely Age*Treatment A; Age*Treatment B; and
Age*Treatment C are considered as possible explanatory variables. Why is the fourth interaction
variable, namely “Age*Treatment D” not included in this case, while it was included in Model 2?
Explain.
In Model 3, note that “Age” is also an explanatory variable. Since every patient receives exactly one
treatment, “Age” equals the sum of the four “Age*Treatment” variables; hence, if we include all
possible combinations of “Age*Treatment” together with “Age”, it leads to linear dependency, and
therefore we must exclude one of them and treat it as the base category. However, in Model 2, as
“Age” was not an explanatory variable, it is okay to include all possible combinations of
“Age*Treatment” as explanatory variables.
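The linear dependency can be seen by building the interaction columns for a few patients. The following is a small Python sketch; the patient records are made up purely for illustration and are not taken from the study.

```python
treatments = ["A", "B", "C", "D"]

# Hypothetical (age, treatment) records, for illustration only.
patients = [(25, "A"), (40, "B"), (55, "C"), (67, "D"), (33, "A")]


def interaction_row(age, treatment):
    """Age*Treatment values for one patient: age in the prescribed column, 0 elsewhere."""
    return [age if treatment == t else 0 for t in treatments]


for age, treatment in patients:
    row = interaction_row(age, treatment)
    # The four interaction columns always add up to Age itself, so Age plus all
    # four interactions would be perfectly collinear in a regression.
    print(age, row, sum(row) == age)
```

Dropping either "Age" (as in Model 2) or one interaction column (as in Model 3) removes the exact dependency.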
