
Jessica Yue

STAT 212 Homework 3


1. Use melanoma.csv to answer the following questions.
a. Generate a contingency table of these two variables. Here we consider using site as the
explanatory variable, and tumor as the response variable. For the table, put tumor in the rows and
site in the columns.
melanoma= read.csv("melanoma.csv", header= TRUE)
table(melanoma$tumor, melanoma$site)

              Head Trunk
  Nodular       19    33
  Superficial   16    54

b. Draw a grouped absolute frequency bar graph of Tumor grouped by Site.


cont_table= table(melanoma$tumor, melanoma$site)
barplot(cont_table, beside= TRUE, legend.text= rownames(cont_table))

c. Draw a segmented absolute frequency bar graph of Tumor grouped by Site.


cont_table= table(melanoma$tumor, melanoma$site)
barplot(cont_table, beside= FALSE, legend.text= rownames(cont_table))
d. Draw a segmented relative frequency bar graph of Tumor grouped by Site.
col1= cont_table[,1]
col1_total= sum(cont_table[,1])
col2= cont_table[,2]
col2_total= sum(cont_table[,2])
rel_table= cbind("Head"= col1/col1_total, "Trunk"= col2/col2_total)
barplot(rel_table, beside = FALSE, ylim= c(0,1), legend.text= rownames(cont_table))
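
The column-by-column division above can also be done with base R's prop.table(). A minimal sketch, assuming the counts are re-typed by hand from the table in part (a); the names `counts` and `rel_counts` are illustrative stand-ins for `cont_table` and `rel_table`:

```r
# Counts re-typed from the contingency table in part (a);
# this matrix stands in for cont_table so the sketch runs without melanoma.csv.
counts <- matrix(c(19, 16, 33, 54), nrow = 2,
                 dimnames = list(c("Nodular", "Superficial"),
                                 c("Head", "Trunk")))

# margin = 2 divides each column by its own total, so every column sums to 1.
rel_counts <- prop.table(counts, margin = 2)
print(round(rel_counts, 3))
```

The resulting matrix can be passed to barplot() exactly as rel_table is above.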

e. Calculate the risk of nodular tumor when the location is head (p1). Calculate the risk of nodular
tumor when the location is trunk (p0).
p1 = 19/35 = 0.543
p0 = 33/87 = 0.379
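
Both risks come straight from the column counts in part (a). A short sketch of the arithmetic, with the counts typed in by hand rather than read from melanoma.csv (the variable names are illustrative):

```r
# Counts taken from the contingency table in part (a).
nodular_head  <- 19; superficial_head  <- 16
nodular_trunk <- 33; superficial_trunk <- 54

# Risk = nodular cases / all cases at that site.
p1 <- nodular_head  / (nodular_head  + superficial_head)   # head:  19/35
p0 <- nodular_trunk / (nodular_trunk + superficial_trunk)  # trunk: 33/87

print(round(c(p1 = p1, p0 = p0), 3))
```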

f. Using the results you obtain from part (e), calculate the estimated risk difference, relative risk
and odds ratio of getting a nodular tumor in the head relative to the trunk.
Risk Difference = p1 - p0 = 0.164
Relative Risk = p1/p0 = 1.43
Odds Ratio = (p1/(1-p1))/(p0/(1-p0)) = 1.94
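
The three measures can be reproduced in a few lines of R. This is a sketch that assumes the risks from part (e) are entered as exact fractions (19/35 and 33/87) so that rounding does not carry through; the variable names are illustrative:

```r
# Risks from part (e), kept as exact fractions to avoid rounding error.
p1 <- 19 / 35  # risk of a nodular tumor on the head
p0 <- 33 / 87  # risk of a nodular tumor on the trunk

risk_difference <- p1 - p0
relative_risk   <- p1 / p0
# Odds = p / (1 - p); the odds ratio is the ratio of the two odds.
odds_ratio      <- (p1 / (1 - p1)) / (p0 / (1 - p0))

print(round(c(RD = risk_difference, RR = relative_risk, OR = odds_ratio), 3))
```

Because the odds on the head are 19/16 and the odds on the trunk are 33/54, the odds ratio also equals (19 × 54)/(16 × 33).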
g. From the plot you obtain from part (d) and the results you obtain from part(f), do you think the
location and type of tumor are associated? Why?
I believe that site and type of tumor are associated. If there were no association, the risk
difference would be 0 and both the relative risk and the odds ratio would be 1. Here the risk
difference is 0.164, the relative risk is 1.43, and the odds ratio is 1.94, so the risk of a
nodular tumor is higher on the head than on the trunk.

2. Input worldcup.csv to R and answer the following questions.


a. Create an absolute frequency table the Position variable. How many different player positions
are there?
worldcup= read.csv("worldcup.csv", header= TRUE)
table(worldcup$Position)

  Defender    Forward Goalkeeper Midfielder
       188        143         35        228

There are 4 different player positions.


b. How many different countries participated in World Cup 2010? Hint: use the same method you
used in part a) to find the number of different player positions.
table(worldcup$Team)

There are 32 countries.
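
Counting the entries of the table by eye is error-prone with 32 teams, so the count can be computed directly. A small sketch on a made-up vector (`team` is a hypothetical stand-in for worldcup$Team, not real data):

```r
# Hypothetical stand-in for worldcup$Team; the real column comes from worldcup.csv.
team <- c("Spain", "Germany", "Spain", "Uruguay", "Germany", "Spain")

# table() has one entry per distinct value, so its length is the number of teams.
n_from_table <- length(table(team))
# unique() gives the same count without building the frequency table.
n_from_unique <- length(unique(team))

print(c(n_from_table, n_from_unique))
```

On the real data the same idea would be length(table(worldcup$Team)).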

c. Draw a boxplot of Time grouped by Team.


boxplot(Time ~ Team, data= worldcup, las=2)

d. From the plot you obtain from part (c), can you infer which four teams entered semi-finals?
Explain your answer.
I infer that Germany, Netherlands, Spain, and Uruguay entered the semifinals because these
countries on average have the most playing time in minutes, meaning they have played more
games and therefore have spent more time on the field.

3. Consider the same world cup dataset as problem (2), and answer the following question.
a. Draw a boxplot of Shots grouped by Position.
boxplot(Shots ~ Position, data= worldcup, xlab= "Position", ylab= "Number of shots")
b. Draw a boxplot of Tackles grouped by Position.
boxplot(Tackles ~ Position, data= worldcup, xlab= "Position", ylab= "Number of tackles")

c. From the plots you obtain from parts (a) and (b), answer the following questions: Players from
which position focus more on shots? Players from which position focus more on tackles?
It appears from the plots that players in the Forward position are more focused on shots,
while players in the Defender position are more focused on tackles.

d. Create a summary table of Passes grouped by Position, including the following descriptive
statistics: mean, sd, median and IQR. Create a table like the one in slide 10 of lecture 10
tapply(worldcup$Passes, worldcup$Position, mean)
tapply(worldcup$Passes, worldcup$Position, sd)
tapply(worldcup$Passes, worldcup$Position, median)
tapply(worldcup$Passes, worldcup$Position, IQR)

        Defender   Forward  Goalkeeper  Midfielder
Mean   102.64362  50.82517    55.63889    95.27193
SD      75.60887  53.27755    29.19115    88.24947
Median      89.0      35.0        52.0        72.5
IQR        94.75     49.00       31.50      100.75
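
The four tapply() results can be stacked into one table with rbind() instead of being copied by hand. A minimal sketch on made-up numbers (`toy` is a hypothetical data frame, not the worldcup data):

```r
# Hypothetical toy data standing in for worldcup.csv (values are made up).
toy <- data.frame(
  Position = c("Defender", "Defender", "Defender",
               "Forward", "Forward", "Forward"),
  Passes   = c(80, 100, 120, 30, 40, 65)
)

# One tapply() call per statistic, stacked into a single table with rbind().
summary_table <- rbind(
  Mean   = tapply(toy$Passes, toy$Position, mean),
  SD     = tapply(toy$Passes, toy$Position, sd),
  Median = tapply(toy$Passes, toy$Position, median),
  IQR    = tapply(toy$Passes, toy$Position, IQR)
)
print(summary_table)
```

Replacing toy with worldcup should give a table laid out like the one above, with one row per statistic and one column per position.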

4. Use ozone.csv to answer the following questions.


a. Calculate the correlation coefficient between O3 and each of the other variables in the
dataset.
ozone= read.csv("ozone.csv", header= TRUE)
cor(ozone$O3, ozone$vh)
cor(ozone$O3, ozone$wind)
cor(ozone$O3, ozone$temp)
cor(ozone$O3, ozone$doy)
Vh: 0.6073438
Wind: 0.002471078
Temp: 0.7807028
Doy: 0.06763357
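
The four cor() calls can be collapsed into one sapply() over the other columns. A sketch on made-up numbers, assuming the dataset holds only the columns O3, vh, wind, temp and doy (`toy_ozone` is hypothetical and is not the real ozone.csv):

```r
# Hypothetical toy data with the same column names as ozone.csv (values are made up).
toy_ozone <- data.frame(
  O3   = c(3, 5, 8, 12, 15, 20),
  vh   = c(5600, 5650, 5700, 5760, 5800, 5850),
  wind = c(4, 6, 3, 5, 4, 6),
  temp = c(50, 55, 63, 70, 76, 84),
  doy  = c(10, 80, 150, 220, 290, 360)
)

# Correlate O3 with every other column in one call instead of one cor() per variable.
predictors <- setdiff(names(toy_ozone), "O3")
cors <- sapply(toy_ozone[predictors], function(x) cor(toy_ozone$O3, x))
print(round(cors, 3))
```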

b. From the results you obtain from part (a), which variables do you think are associated with
ozone concentration? Explain your answer.
From the results, ozone concentration appears to be associated with 500 millibar pressure height
(vh, r = 0.61) and temperature (temp, r = 0.78): both correlation coefficients reflect a
moderately strong to strong positive association. Wind (r = 0.002) and doy (r = 0.07) show
almost no linear association with O3.
c. Draw a scatter plot between temp and vh. Is there an association between these two
variables? If so, is it positive or negative?
plot(vh ~ temp, data= ozone, xlab= "Temperature", ylab= "Vh")

5. Suppose there are 5 red balls and 3 blue balls in a bag. We randomly pick a ball and record its color.
Then we put it back, and randomly pick another ball and record its color again. Let events A = “The first
ball is red”, B = “The second ball is blue”.
a. What is the sample space of this random trial?
The sample space is {(red, red), (red, blue), (blue, red), (blue, blue)}, where each pair lists
the colors of the first and second pick in order.
b. What is the complement of A?
The complement of A is the event “The first ball is not red”, that is, “The first ball is blue”.
c. Calculate P(A) and P(B).
P(A)= 5/8
P(B)= 3/8
d. Are events A and B independent?
Events A and B are independent because what is picked first does not change the probability of
the outcome of the second pick as the balls are put back after each pick.
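
The independence claim can be checked by brute force. The sketch below enumerates all 8 × 8 equally likely ordered draws with replacement and compares P(A and B) with P(A) × P(B); the vector `balls` and the other variable names are illustrative, not part of the assignment:

```r
# The bag: 5 red balls and 3 blue balls.
balls <- c(rep("red", 5), rep("blue", 3))

# With replacement, every ordered pair of balls is equally likely (8 * 8 = 64 pairs).
draws <- expand.grid(first = balls, second = balls, stringsAsFactors = FALSE)

A <- draws$first == "red"    # event A: the first ball is red
B <- draws$second == "blue"  # event B: the second ball is blue

p_A  <- mean(A)      # 40/64 = 5/8
p_B  <- mean(B)      # 24/64 = 3/8
p_AB <- mean(A & B)  # 15/64

# A and B are independent exactly when P(A and B) = P(A) * P(B).
print(isTRUE(all.equal(p_AB, p_A * p_B)))
```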
e. If we do not put the first ball back, and randomly pick another ball from the remaining ones.
Are events A and B independent? Explain your answer.
If we do not put the first ball back, then events A and B are not independent: the outcome of
the first pick changes the probability of the outcome of the second pick, because there are
fewer balls in the bag and the color counts have changed.
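
The same enumeration idea works without replacement. This sketch lists the 8 × 7 = 56 equally likely ordered pairs of distinct balls and shows that P(B | A) differs from P(B); the names `balls`, `pairs` and the probability variables are illustrative:

```r
# The bag: 5 red balls and 3 blue balls, identified by position 1..8.
balls <- c(rep("red", 5), rep("blue", 3))

# Without replacement, the same ball cannot be drawn twice: keep pairs with first != second.
pairs <- expand.grid(first = 1:8, second = 1:8)
pairs <- pairs[pairs$first != pairs$second, ]

A <- balls[pairs$first] == "red"    # event A: the first ball is red
B <- balls[pairs$second] == "blue"  # event B: the second ball is blue

p_A         <- mean(A)             # 35/56 = 5/8
p_B         <- mean(B)             # 21/56 = 3/8
p_AB        <- mean(A & B)         # 15/56
p_B_given_A <- p_AB / p_A          # 3/7, which is not equal to P(B) = 3/8

# Independence would require P(A and B) = P(A) * P(B); here 15/56 != 15/64.
print(isTRUE(all.equal(p_AB, p_A * p_B)))
```

Knowing that the first ball was red raises the chance of a blue second ball from 3/8 to 3/7, which is exactly what dependence means.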
