
Assignment 5: Under (blood) pressure

Raymond Guo
2020-02-19

Exercise 1

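# Packages used throughout: broom supplies tidy()/glance(), modelr supplies
# add_predictions(), add_residuals(), and geom_ref_line().
library(tidyverse)
library(broom)
library(modelr)

# Reshape to long form so each measurement gets its own facet against Systol.
blood_pressure %>%
  gather(Age:Pulse, key = "measurement", value = "value") %>%
  ggplot() +
  geom_point(mapping = aes(x = value, y = Systol)) +
  facet_wrap(~ measurement, scales = "free_x")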

[Figure: scatterplots of Systol (y) against value (x), one panel per measurement: Age, Calf, Chin, Forearm, Height, Pulse, Weight, Years.]

Exercise 2
The Years panel shows a negative correlation with Systol.
The variables that show a positive correlation with Systol are Forearm, Weight, Calf, and Height.
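
A visual reading like this can be cross-checked numerically. The sketch below is only an illustration: the `toy` data frame is invented and is not the assignment's blood_pressure data, so its numbers say nothing about the real variables. It shows one way to compute the Pearson correlation of every column with Systol in base R.

```r
# Toy data, invented for illustration only (NOT the blood_pressure data).
toy <- data.frame(
  Systol = c(110, 118, 125, 131, 140, 152),
  Weight = c(55, 60, 63, 70, 74, 80),
  Years  = c(30, 22, 25, 12, 8, 3)
)

# Every column except the response is treated as a candidate predictor.
predictors <- setdiff(names(toy), "Systol")

# Pearson correlation of each predictor with Systol, as a named vector.
cors <- sapply(toy[predictors], function(x) cor(x, toy$Systol))

# The sign gives the direction of the linear association.
print(round(cors, 2))
```

Applied to the real data, the same call would take blood_pressure in place of `toy`; a positive value would agree with an upward-sloping panel and a negative value with a downward-sloping one.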

Exercise 3

blood_pressure_updated <- blood_pressure %>%
  mutate(urban_frac_life = Years / Age)

Exercise 4

systol_urban_frac_model <- lm(Systol ~ urban_frac_life, data = blood_pressure_updated)

Exercise 5

systol_urban_frac_model %>%
  tidy()

term             estimate   std.error  statistic  p.value
(Intercept)      133.49572  4.038011   33.059770  0.0000000
urban_frac_life  -15.75182  9.012962   -1.747686  0.0888139

systol_urban_frac_model %>%
  glance() %>%
  select(r.squared)

r.squared
0.0762564
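
To make the coefficient table concrete: the fitted line is Systol = 133.49572 - 15.75182 * urban_frac_life. The sketch below plugs the two reported estimates into that formula. The helper name `predict_systol` is hypothetical and exists only for this illustration; with the fitted model in hand, predict() would do the same job.

```r
# Estimates copied from the tidy() output above.
intercept <- 133.49572
slope <- -15.75182

# Hypothetical helper: predicted Systol for a given fraction of life spent urban.
predict_systol <- function(urban_frac_life) {
  intercept + slope * urban_frac_life
}

# Predictions at 0%, 50%, and 100% of life spent in an urban setting.
print(predict_systol(c(0, 0.5, 1)))
```

At urban_frac_life = 0.5 this gives 133.49572 - 7.87591 = 125.61981, so each additional 0.1 of life spent urban lowers the predicted Systol by about 1.6 under this model.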

Exercise 6

systol_urban_frac_df <- blood_pressure_updated %>%
  add_predictions(systol_urban_frac_model) %>%
  add_residuals(systol_urban_frac_model)

i. The column that holds the predicted response values is pred.
ii. The column that holds the residuals is resid.
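
The two columns are tied together: each residual is the observed Systol minus the prediction. The base-R sketch below illustrates that relationship on invented toy data (not the assignment's data). It uses predict() and residuals() as stand-ins for modelr's add_predictions() and add_residuals(), which is an assumption made so the example runs without extra packages.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  urban_frac_life = c(0.1, 0.2, 0.4, 0.5, 0.7, 0.9),
  Systol = c(135, 128, 131, 122, 126, 117)
)

toy_model <- lm(Systol ~ urban_frac_life, data = toy)

# pred: the model's predicted response for each row.
toy$pred <- unname(predict(toy_model))

# resid: observed response minus predicted response.
toy$resid <- toy$Systol - toy$pred

# The hand-computed residuals agree with residuals() from the model object.
print(all.equal(toy$resid, unname(residuals(toy_model))))
print(toy)
```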

Exercise 7
We can tell whether the model is reliable by checking whether the dependent variable Y (Systol) has a
linear relationship with the independent variable X (urban_frac_life).
ggplot(systol_urban_frac_df) +
  geom_point(mapping = aes(x = urban_frac_life, y = Systol)) +
  geom_abline(slope = systol_urban_frac_model$coefficients[2], intercept = systol_urban_frac_mo

[Figure: scatterplot of Systol (y) against urban_frac_life (x) with the line drawn by geom_abline().]

Exercise 8

ggplot(systol_urban_frac_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )

[Figure: observed Systol (y) against pred (x) with the red reference line y = x.]

ggplot(systol_urban_frac_df) +
  geom_point(aes(pred, resid)) +
  geom_ref_line(h = 0)

[Figure: resid (y) against pred (x) with a horizontal reference line at 0.]

i. The plots suggest the linearity condition was not violated: the points show no visible curvature.
ii. The plots suggest the constant variability condition was not violated: the points are balanced
above and below the line.

Exercise 9

ggplot(data = systol_urban_frac_df) +
  geom_histogram(
    mapping = aes(x = resid), binwidth = 5
  )

[Figure: histogram of resid (x) against count (y), binwidth 5.]

i. The histogram is strongly right-skewed, and its center is around a resid value of 5.
ii. The skew violates the nearly normal residuals condition: there is a disproportionate number
of negative residuals compared to positive ones.

Exercise 10

ggplot(data = systol_urban_frac_df) +
  geom_qq(mapping = aes(sample = resid)) +
  geom_qq_line(mapping = aes(sample = resid))

[Figure: normal Q-Q plot of resid, sample quantiles (y) against theoretical quantiles (x), with the Q-Q reference line.]

This graph clearly shows a violation of the nearly normal residuals condition. More points fall
above the reference line than below it, which is consistent with the right skew seen in the
histogram.
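
A visual judgment like this can be supplemented with a formal test. The sketch below uses the Shapiro-Wilk test (shapiro.test() from base R's stats package), which is not part of the original assignment and is swapped in here purely as an illustration. The residual vector is invented, right-skewed toy data, not the model's actual resid column.

```r
# Invented, right-skewed values standing in for a resid column (illustration only).
toy_resid <- c(-8, -7, -7, -6, -6, -5, -5, -4, -4, -3,
               -2, -1, 0, 2, 5, 9, 15, 24, 38)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed.
sw <- shapiro.test(toy_resid)

# W close to 1 is consistent with normality; a small p-value is evidence against it.
print(sw$statistic)
print(sw$p.value)
```

With the real model, the same call would be shapiro.test(systol_urban_frac_df$resid). The test is best read alongside the histogram and Q-Q plot, not in place of them.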

Exercise 11

systol_weight_model <- lm(Systol ~ Weight, data = blood_pressure_updated)

systol_weight_model %>%
  glance() %>%
  select(r.squared)

r.squared
0.2718207

Yes, because r.squared for the Weight model (0.2718207) is closer to 1 than r.squared for the urban_frac_life model (0.0762564).


# Predictions and residuals must come from the Weight model, not the earlier one.
systol_weight_df <- blood_pressure_updated %>%
  add_predictions(systol_weight_model) %>%
  add_residuals(systol_weight_model)

ggplot(systol_weight_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )

[Figure: observed Systol (y) against pred (x) with the red reference line y = x.]

ggplot(systol_weight_df) +
  geom_point(aes(pred, resid)) +
  geom_ref_line(h = 0)

[Figure: resid (y) against pred (x) with a horizontal reference line at 0.]

For the first condition, there is a linear relationship between pred and Systol. For the second
condition, there is a single outlier on the graph, which might distort the bell-shaped curve
slightly, but this is not much of a violation. For the third condition, the points are balanced
above and below h = 0. All three conditions are met, so the new model is reliable.

Exercise 12

systol_combo_model <- lm(Systol ~ urban_frac_life + Weight, data = blood_pressure_updated)

# Predictions and residuals must come from the combined model.
systol_combo_model_df <- blood_pressure_updated %>%
  add_predictions(systol_combo_model) %>%
  add_residuals(systol_combo_model)

ggplot(systol_combo_model_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )

[Figure: observed Systol (y) against pred (x) with the red reference line.]

ggplot(systol_combo_model_df) +
  geom_point(aes(pred, resid)) +
  geom_ref_line(h = 0)

[Figure: resid (y) against pred (x) with a horizontal reference line at 0.]

For the first condition, there is a linear relationship between pred and Systol. For the second
condition, there is a single outlier on the graph, which might distort the bell-shaped curve
slightly, but this is not much of a violation. For the third condition, the points are balanced
above and below h = 0. All three conditions are met, so the combined model is reliable.

systol_combo_model %>%
  glance() %>%
  select(r.squared)

r.squared
0.4731078

This multi-variable model performed better: its r.squared (0.4731078) is closer to 1 than that of
either single-variable model (0.0762564 for urban_frac_life and 0.2718207 for Weight).
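
One caveat when comparing models this way: adding a predictor to a least-squares model can never lower r.squared, so adjusted r.squared, which penalizes extra predictors, is often reported alongside it. The sketch below illustrates the comparison on invented toy data (not the assignment's data). The helper `fit_stats` is hypothetical, and summary() is used in place of broom's glance() so the example needs only base R.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  Systol = c(118, 125, 121, 134, 140, 129, 146, 138),
  Weight = c(56, 61, 58, 66, 72, 63, 78, 70),
  urban_frac_life = c(0.8, 0.5, 0.9, 0.4, 0.2, 0.6, 0.1, 0.3)
)

single <- lm(Systol ~ Weight, data = toy)
combo <- lm(Systol ~ urban_frac_life + Weight, data = toy)

# Hypothetical helper: pull both fit statistics out of a model summary.
fit_stats <- function(model) {
  s <- summary(model)
  c(r.squared = s$r.squared, adj.r.squared = s$adj.r.squared)
}

# One row per model, one column per statistic.
comparison <- rbind(single = fit_stats(single), combo = fit_stats(combo))
print(round(comparison, 3))
```

If adjusted r.squared also rises when the second predictor is added, the improvement is less likely to be an artifact of simply having more terms in the model.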
