Raymond Guo
2020-02-19
Exercise 1
blood_pressure %>%
  gather(Age:Pulse, key = "measurement", value = "value") %>%
  ggplot() +
  geom_point(mapping = aes(x = value, y = Systol)) +
  facet_wrap(~ measurement, scales = "free_x")
[Figure: scatter plots of Systol (y axis) against value (x axis), one panel per measurement with free x scales. Recovered panel titles: Age, Calf, Chin, Weight, Years.]
Exercise 2
The Years graph shows a negative correlation.
The variables that show a positive correlation are Forearm, Weight, Calf, and Height.
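The direction read off the scatter plots can also be checked numerically. The following is a minimal sketch in base R, assuming a small invented data frame named toy. Its column names mirror the report, but its values are hypothetical and are not the blood_pressure data.

```r
# Hypothetical toy data: column names mirror the report, values are invented.
toy <- data.frame(
  Systol = c(110, 118, 125, 131, 140, 152),
  Weight = c(55, 60, 62, 68, 72, 80),
  Years  = c(30, 24, 20, 15, 9, 3)
)

# Pearson correlation of every predictor column with Systol.
predictors <- setdiff(names(toy), "Systol")
correlations <- sapply(predictors, function(v) cor(toy[[v]], toy$Systol))

# A positive sign indicates an upward trend, a negative sign a downward trend.
print(sign(correlations))
```

Applied to the real data, the same call would give one signed number per panel, which is easier to compare than judging slopes by eye.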
Exercise 3
Exercise 4
Exercise 5
systol_urban_frac_model %>%
tidy()
systol_urban_frac_model %>%
glance() %>%
select(r.squared)
r.squared
0.0762564
Exercise 6
Exercise 7
We can tell that the model is reliable if the dependent variable Y has a linear relationship with the
independent variable X.
ggplot(systol_urban_frac_df) +
geom_point(mapping = aes(x = urban_frac_life, y = Systol)) +
geom_abline(slope = systol_urban_frac_model$coefficients[2], intercept = systol_urban_frac_mo
[Figure: scatter plot of Systol against urban_frac_life with the fitted line.]
Exercise 8
ggplot(systol_urban_frac_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )
[Figure: scatter plot of Systol against pred with a red reference line of slope 1 and intercept 0.]
ggplot(systol_urban_frac_df) +
geom_point(aes(pred, resid)) +
geom_ref_line(h = 0)
[Figure: scatter plot of resid against pred with a horizontal reference line at 0.]
i. The plots suggest the condition was not violated. Neither plot shows any visible curvature.
ii. The plots suggest the condition was not violated. The points are balanced above and below
the line.
Exercise 9
ggplot(data = systol_urban_frac_df) +
  geom_histogram(
    mapping = aes(x = resid), binwidth = 5
  )
[Figure: histogram of resid (x axis) against count (y axis), binwidth 5.]
i. The distribution looks strongly right-skewed, and its center is around a resid value of 5.
ii. The skew violates the nearly normal residuals condition because there is a disproportionate
number of negative residual values compared to the positive ones.
Exercise 10
ggplot(data = systol_urban_frac_df) +
geom_qq(mapping = aes(sample = resid)) +
geom_qq_line(mapping = aes(sample = resid))
[Figure: normal Q-Q plot of resid, theoretical quantiles (x axis) against sample quantiles (y axis), with reference line.]
This graph clearly shows a violation of the nearly normal residuals condition. More points are
plotted above the reference line than below it, which matches the right-skewed shape of the
histogram.
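The skew described in the last two exercises can be illustrated with a quick numeric check. This is a minimal sketch in base R on an invented vector named resid_example, not the residuals of the actual model. It relies on the common rule of thumb that a right-skewed sample has many small negative values, a few large positive ones, and a mean above its median.

```r
# Hypothetical residuals: many small negative values, a few large positive ones.
resid_example <- c(-8, -6, -5, -4, -3, -2, -1, 2, 10, 17)

# Count how many residuals fall on each side of zero.
n_negative <- sum(resid_example < 0)
n_positive <- sum(resid_example > 0)
print(c(negative = n_negative, positive = n_positive))

# In a right-skewed sample the long upper tail pulls the mean above the median.
print(c(mean = mean(resid_example), median = median(resid_example)))
```

This is only a rough indicator. The histogram and the Q-Q plot remain the primary diagnostics.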
Exercise 11
systol_weight_model %>%
glance() %>%
select(r.squared)
r.squared
0.2718207
ggplot(systol_weight_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )
[Figure: scatter plot of Systol against pred with a red reference line of slope 1 and intercept 0.]
ggplot(systol_weight_df) +
geom_point(aes(pred, resid)) +
geom_ref_line(h = 0)
[Figure: scatter plot of resid against pred with a horizontal reference line at 0.]
For the first condition, there is a linear relationship between pred and Systol. For the second
condition, there is a single outlier on the graph, which might distort the bell-shaped curve, but
not by much, so this is not a serious violation. For the third condition, the points are balanced
around h = 0. All three conditions are met, so the new model is reliable.
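The pred and resid columns that these plots rely on can be produced in several ways. The following is a minimal sketch in base R, assuming the built-in mtcars data as a stand-in for the blood pressure data. The names model, diag_df and lower_half are illustrative and do not come from the report, which may have built its columns differently.

```r
# Built-in mtcars data stands in for the blood pressure data (assumption).
model <- lm(mpg ~ wt, data = mtcars)

# Attach predictions and residuals under the column names used in the report.
diag_df <- mtcars
diag_df$pred  <- fitted(model)
diag_df$resid <- residuals(model)

# Residuals of a least-squares fit with an intercept sum to (numerically) zero,
# so the points always average out around h = 0; the spread is what matters.
print(sum(diag_df$resid))

# Crude constant-variability check: compare the residual spread in the lower
# and upper halves of the predicted values.
lower_half <- diag_df$pred <= median(diag_df$pred)
print(sd(diag_df$resid[lower_half]))
print(sd(diag_df$resid[!lower_half]))
```

If the two standard deviations differ greatly, the spread of the residuals changes with the prediction, which would be a sign that the third condition is in doubt.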
Exercise 12
ggplot(systol_combo_model_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )
[Figure: scatter plot of Systol against pred with a red reference line of slope 1.]
ggplot(systol_combo_model_df) +
geom_point(aes(pred, resid)) +
geom_ref_line(h = 0)
[Figure: scatter plot of resid against pred with a horizontal reference line at 0.]
For the first condition, there is a linear relationship between pred and Systol. For the second
condition, there is a single outlier on the graph, which might distort the bell-shaped curve, but
not by much, so this is not a serious violation. For the third condition, the points are balanced
around h = 0. All three conditions are met, so the new model is reliable.
systol_combo_model %>%
glance() %>%
select(r.squared)
r.squared
0.4731078
This multi-variable model performed better because its r.squared (0.4731078) is closer to 1 than
that of the single-variable models (0.0762564 and 0.2718207).
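One caveat about this comparison: r.squared never decreases when a predictor is added to a least-squares model, so adjusted R-squared is often used as a fairer comparison. The following is a minimal sketch in base R, assuming the built-in mtcars data as a stand-in. The models single_model and combo_model are illustrative and are not the models fitted in this report.

```r
# Built-in mtcars data stands in for the blood pressure data (assumption).
single_model <- lm(mpg ~ wt, data = mtcars)
combo_model  <- lm(mpg ~ wt + hp, data = mtcars)

# Plain R-squared: cannot go down when a predictor is added.
r2 <- c(
  single = summary(single_model)$r.squared,
  combo  = summary(combo_model)$r.squared
)

# Adjusted R-squared: penalizes each extra predictor.
adj_r2 <- c(
  single = summary(single_model)$adj.r.squared,
  combo  = summary(combo_model)$adj.r.squared
)

print(r2)
print(adj_r2)
```

If the adjusted value also rises for the combined model, the extra predictor is likely contributing more than noise.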