
Modern Statistical Computing in R

Instructors: Albert Satorra and Ferran Carrascosa


July 26, 2018, 15:30 to 17:00, UPF (4th quarter)

This is the Final Exam of Modern Statistical Computing in R. The exam consists of 10 short queries
(Q1 to Q10) plus one Question. Each query and the Question has a context that is implicitly defined by the related
R syntax. Be very careful with the syntax! Q1-Q10 account for 80% of the grade,
the Question for the remaining 20%. Time for this exam is 1:30h.

Q1

x<- ("A","B","C")
is.numeric(x)

The result of the above R syntax will be:

1. TRUE
2. FALSE
3. there is a syntax error
Write the response number of your choice with a brief justification comment (one/two lines)
-
-> x<- ("A","B","C")
Error: unexpected ',' in "x<- ("A","

Option 3 because it should have been x<-c("A","B","C"). (student text in the exam 26-7-2018)
-
-
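For comparison, a minimal sketch of the corrected assignment, assuming the intended object was a character vector built with c():

```r
# Build the character vector with c(), which the exam line omits on purpose
x <- c("A", "B", "C")

is.numeric(x)    # FALSE: x holds character strings, not numbers
is.character(x)  # TRUE
```

With c() in place the line parses, and is.numeric() returns FALSE because the elements are strings.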

Q2

x<- rep(1,10)
sd(x)

[1] 0
The result of the above R syntax will be:
1. 0
2. 1
3. there is a syntax error
Write the response number of your choice with a brief justification comment (one/two lines)
-
-Option 1 because x will be a vector of ten 1's, and because all the
numbers are the same, the variance (and so the standard deviation) will be 0.
(student text in the exam 26-7-2018)

-
-

Q3
Consider the following syntax (which simulates data for x and y) and the results of the regression analysis on that
data
set.seed(3984) ; # seed number
x<-runif(1000)
u<-runif(1000)-0.5
y <- 0+ .3*x + 0.6*u
plot(x,y)
[Figure: scatterplot of y (vertical axis) against x (horizontal axis, 0 to 1)]
res<- lm(y~x)
A<-coefficients(summary(res))
A

                Estimate Std. Error    t value     Pr(>|t|)
(Intercept) -0.009588047 0.01088475 -0.8808694 3.786006e-01
x            0.321875626 0.01869217 17.2198105 2.210016e-58
e<-res$residuals
qqnorm(e)
qqline(e, col="red")

[Figure: Normal Q-Q plot of the residuals; horizontal axis: Theoretical Quantiles, vertical axis: Sample Quantiles]

One of our colleagues argues: “In this regression the standard error cannot be trusted since the residuals are clearly
non-normal”. To check this further, we run the following syntax
MT<-c()
for (i in 1:1000){
x<-runif(1000)
u<-runif(1000)-0.5
y <- 0+ .3*x + 0.6*u
res<- lm(y~x)
A<-coefficients(summary(res))[2,1]
MT<-c(MT,A)
}
sd(MT)

[1] 0.018433
In view of all the above, we can say that the statement of our colleague
1. is correct
2. is not correct
3. the value obtained of sd(MT) has nothing to do with the statement of our colleague.
Write the response number of your choice with a brief justification comment (one/two lines)
-
-By simulation we re-estimate the coefficient beta1 one thousand times.
After that we compute the standard deviation of the estimates.
It is practically the same as the standard error computed by the regression,
thus we can trust the regression.
(student text in the exam 26-7-2018)
-
-
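The comparison behind this answer can be sketched as follows. This is an illustrative reconstruction, not the instructors' code: the number of replications (500) and the names se_model, se_mc and slopes are assumptions introduced here.

```r
set.seed(3984)
n <- 1000

# One sample: model-based standard error of the slope, as reported by lm()
x <- runif(n)
u <- runif(n) - 0.5            # uniform (non-normal) errors
y <- 0 + 0.3 * x + 0.6 * u
se_model <- coefficients(summary(lm(y ~ x)))["x", "Std. Error"]

# Monte Carlo: draw fresh samples from the same model and keep each slope
slopes <- replicate(500, {
  xs <- runif(n)
  ys <- 0.3 * xs + 0.6 * (runif(n) - 0.5)
  coef(lm(ys ~ xs))[2]
})
se_mc <- sd(slopes)            # empirical variability of the slope estimate

c(model = se_model, monte_carlo = se_mc)
```

If the two numbers are close, the standard error from lm() describes the actual sampling variability of the slope well, even though the errors are uniform rather than normal.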

Q4
Executing the syntax below produces
X<-matrix(rbinom(2*1000,4,.2),1000,2)
M<-t(X)%*%X
iM<- solve(M)
sum(diag(iM%*%M))

[1] 2
1. a random number
2. 1000
3. 2
Write the response number of your choice with a brief justification comment (one/two lines)
-
-
-Option 3 because it sums two ones (the diagonal of the 2x2 identity matrix). (student text in the exam 26-7-2018)
-
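A short sketch of why the result is not random: the product of a matrix with its inverse is the identity, whose trace equals its dimension. The seed and the names I2 and tr are assumptions added for illustration.

```r
set.seed(1)                       # assumed seed, only for reproducibility
X <- matrix(rbinom(2 * 1000, 4, 0.2), 1000, 2)
M <- t(X) %*% X                   # 2 x 2 cross-product matrix
I2 <- solve(M) %*% M              # inverse times matrix: identity up to rounding

round(I2, 10)                     # prints the 2 x 2 identity
tr <- sum(diag(I2))               # trace of the identity = its dimension
all.equal(tr, 2)
```

The data in X are random, but the trace depends only on the number of columns of X.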

Q5
In relation to the y and x above in Q3
library(MASS)

boots<-c()
for (i in 1:300){
n<- length(y)
ind<- sample(1:n,n)
res<- rlm(y[ind]~x[ind])
A<-coefficients(summary(res))[2,2]
boots<- c(boots,A)
}
sd(boots)

[1] 9.181486e-18
Be careful with the syntax of sample(). The value produced by the syntax will be
1. approximately .018
2. 0 (at machine precision)
3. approximately .0.31
Write the response number of your choice with a brief justification comment (one/two lines)
-
-
-Option 2 because the sample function does not use replacement by default, so every resample is just a permutation of the same data.
(student text in the exam 26-7-2018)
-
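The effect of the replace argument can be sketched as below. This is a simplified illustration under stated assumptions: it uses lm() instead of rlm() and tracks the slope estimate rather than its reported standard error; the helper slope_of_resample and the 200 replications are hypothetical choices made here.

```r
set.seed(3984)
x <- runif(1000)
y <- 0.3 * x + 0.6 * (runif(1000) - 0.5)
n <- length(y)

# Refit the regression on one resample of the (x, y) pairs
slope_of_resample <- function(replace) {
  ind <- sample(1:n, n, replace = replace)
  coef(lm(y[ind] ~ x[ind]))[2]
}

perm <- replicate(200, slope_of_resample(FALSE))  # permutations of the same data
boot <- replicate(200, slope_of_resample(TRUE))   # genuine bootstrap resamples

sd(perm)  # zero up to floating-point rounding
sd(boot)  # bootstrap estimate of the slope's standard error
```

Without replacement each resample contains exactly the original pairs in a different order, so the fit never changes; with replace = TRUE the resamples differ and the spread becomes informative.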

Q6
The following syntax produces
data<-sample(1:4, 100, replace=TRUE)
prop.table(table(data))

data
   1    2    3    4
0.22 0.20 0.26 0.32
The value obtained will be
1.
data
   1    2    3    4
0.23 0.27 0.23 0.27

2.
data
 1  2  3  4
23 27 23 27

3. There is a syntax error


Write the response number of your choice with a brief justification comment (one/two lines)
-
- Option 1 because it is supposed to give the proportions of each value in the variable "data"
(student text in the exam 26-7-2018)
-
-

Q7
The syntax
data<- 1:30
a<- data > 3
is.numeric(a)

[1] FALSE
results in
1. FALSE
2. TRUE
3. there is a syntax error
Write the response number of your choice with a brief justification comment (one/two lines)
-
- Option 1 because we assign a logical vector to "a": for every value
it answers the question with TRUEs and FALSEs, which is not numeric but logical.
(student text in the exam 26-7-2018)
-
-

Q8
The following syntax
length(rep(1:3,3))

[1] 9
results in
1. 3
2. 9
3. 6
Write the response number of your choice with a brief justification comment (one/two lines)
-
-
- Option 2 because it is the length of a vector of 9 elements.
(student text in the exam 26-7-2018)
-

Q9
The following R object
> da
  x fx
1 1  1
2 1  1
3 2  2
4 3  3

results in
> mean(da$x)
[1] 1.75
> mean(da$fx)
[1] NA
Warning message:
In mean.default(da$fx) : argument is not numeric or logical: returning NA
>

This implies that


is.matrix(da)

results in
1. TRUE
2. FALSE
3. error of computation
Write the response number of your choice with a brief justification comment (one/two lines)
-
-
- Option 2 because if "da" contains both numeric and factor variables
then it cannot be a matrix, it is a data frame.
(student text in the exam 26-7-2018)
-
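The reasoning above can be checked with a small sketch. The object below is a hypothetical reconstruction of da, assuming fx is a factor (which is what the warning from mean() suggests):

```r
# Hypothetical reconstruction: x numeric, fx a factor with the same labels
da <- data.frame(x = c(1, 1, 2, 3), fx = factor(c(1, 1, 2, 3)))

is.matrix(da)      # FALSE: columns of different types live in a data frame
is.data.frame(da)  # TRUE
is.factor(da$fx)   # TRUE, which is why mean(da$fx) returns NA with a warning

# Converting the factor labels back to numbers recovers the mean
mean(as.numeric(as.character(da$fx)))  # 1.75
```

Going through as.character() first matters: as.numeric() applied directly to a factor returns the internal level codes, not the labels.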

Q10
With respect to the number P computed in the following syntax
data<-sample(1:10,1000,replace=TRUE)
O<-table(data)
E<-1000*rep(1/10,10)
T<- sum(((O-E)**2)/E)
P<-1-pchisq(T,9)

[1] 0.2077303
1. It is very likely, probability 95%, that P > 0.05.
2. It is very likely that P > 0.95.
3. P follows a chi-square distribution with 9 degrees of freedom
Write the response number of your choice with a brief justification comment (one/two lines)
-
- The null hypothesis of the chi-square goodness-of-fit test holds in this case, so P
has approximately a uniform distribution on the interval (0,1);
so it is approximately 95% probable to be above .05.
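The uniformity argument can be illustrated by repeating the experiment many times. This is a sketch under assumptions made here: the seed, the 2000 replications and the helper one_p are not part of the exam, and the statistic is stored in stat rather than T to avoid masking the shorthand for TRUE.

```r
set.seed(2018)                    # assumed seed, only for reproducibility

# One replication of the goodness-of-fit test under a true null hypothesis
one_p <- function() {
  data <- sample(1:10, 1000, replace = TRUE)
  O <- table(factor(data, levels = 1:10))  # keep all 10 cells, even if empty
  E <- rep(1000 / 10, 10)
  stat <- sum((O - E)^2 / E)               # chi-square statistic
  1 - pchisq(stat, df = 9)
}

p <- replicate(2000, one_p())
mean(p > 0.05)                    # share of P-values above 0.05
```

Because the data really are uniform on 1:10, the P-values spread roughly evenly over (0,1), and the share above 0.05 should be close to 0.95.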

Question
The file logitConditionalEffectPlot.R, on the course website, models the variable Vot (voting yes/no for
party A) as a function of the log of income (Lrenda) and gender (Genere; Gender = 0 is male). Below is the
syntax and its execution in the R Console.
# source("/Users/albert/FUNCTIONS/logitConditionalEffectPlot.R")
library(foreign)
data<- as.data.frame(read.spss("http://84.89.132.1/~satorra/dades/M2014dadesSIM.sav"))
dim(data)

[1] 800 4
head(data)

  Lrenda Ldespeses Genere Vot
1  9.477     4.503      1   1
2 11.435     6.147      1   0
3 10.686     4.961      0   0
4 10.407     3.993      0   0
5 10.814     5.746      0   0
6  9.944     4.950      0   1

attach(data)

# this is my syntax for the conditional effect plot


reg <- glm(Vot ~ Lrenda , binomial)
summary(reg)

Call:
glm(formula = Vot ~ Lrenda, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.5206  -0.9692   0.4540   0.9087   2.5511

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 12.389 1.027 12.07 <2e-16 ***
Lrenda -1.208 0.101 -11.96 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1101.02  on 799  degrees of freedom
Residual deviance:  900.73  on 798  degrees of freedom
AIC: 904.73

Number of Fisher Scoring iterations: 4


beta<-reg$coefficients
x<-seq(min(Lrenda)-3,max(Lrenda)+2, length=200)

Logit <- beta[1] + beta[2]*x


prob <- 1/(1 + exp(-Logit))

reg3 <- glm(Vot ~ Lrenda + Genere, binomial)


beta3 <- reg3$coefficients
# x<-seq(min(Lrenda)-3,max(Lrenda)+2, length=200)
Logit1 <- beta3[1] + beta3[2]*x+ beta3[3]*1
Logit0 <- beta3[1] + beta3[2]*x+ beta3[3]*0
prob1 <- 1/(1 + exp(-Logit1))
prob0 <- 1/(1 + exp(-Logit0))
plot(Lrenda, Vot , main ="conditional effect plot: Vot vs Lrenda + Gender ")
lines(x, prob, col="gray", lwd=3)
lines(x, prob0, col="blue", lwd=3)
lines(x, prob1, col="orange", lwd=3)
abline(v = 10, lty = 3, lwd=0.8)
abline(v = 13, lty = 3, lwd=0.8)
legend(11, 0.9, c("Gender=0","Gender=1","overall"), col=c("blue","orange","grey"),lwd=3)

[Figure: conditional effect plot: Vot vs Lrenda + Gender. Vertical axis: Vot (0.0 to 1.0); horizontal axis: Lrenda (7 to 13); legend: Gender=0, Gender=1, overall]
Comment briefly on the main actions of the code and the results obtained in the analysis. Use language that
can be understood by a non-statistician. Maximum length: one page.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
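As a reading aid for the code in the Question, the sketch below shows that the hand-written inverse-logit formula gives the same fitted probabilities as predict() with type = "response". It runs on simulated data, not the course file; the seed, the simulated coefficients and the names grid, prob_manual and prob_pred are assumptions made here.

```r
set.seed(42)
n <- 800
Lrenda <- rnorm(n, mean = 10, sd = 1)            # simulated, not the course data
Vot <- rbinom(n, 1, plogis(12 - 1.2 * Lrenda))   # lower income, higher chance of Vot = 1

reg <- glm(Vot ~ Lrenda, family = binomial)
beta <- coef(reg)

# Fitted probabilities on a grid of log-income values, computed two ways
grid <- data.frame(Lrenda = seq(7, 13, length.out = 200))
prob_manual <- 1 / (1 + exp(-(beta[1] + beta[2] * grid$Lrenda)))
prob_pred <- predict(reg, newdata = grid, type = "response")

all.equal(unname(prob_manual), unname(prob_pred))
```

The curves drawn with lines() in the exam code are these fitted probabilities: each one traces how the estimated chance of voting for party A changes with log income.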
