
Universidad Industrial de Santander, Facultad de Ingenierías Físico Mecánicas,
Escuela de Ingeniería de Sistemas e Informática
Professor: Andrés Leonardo González Gómez, MSc.

Statistics 2: In-Class Activity No. 10

Correlation workshop

Andres Ricardo Hernandez Torres

Student ID: 2122274

1. See Table E11-1 for data on the ratings of quarterbacks for the 2008 National Football League season (The
Sports Network). It is suspected that the rating (y) is related to the average number of yards gained per pass
attempt (x).

format long
x=[8.39,7.67,7.66,7.98,7.21,7.53,8.01,7.66,7.21,7.16,7.93,7.10,6.33,6.76,6.86,7.35,7.22,7.94,6.
y=[105.5,97.4,96.9,96.2,95,93.8,92.7,91.4,90.2,89.4,87.7,87.5,87,86.4,86.4,86,85.4,84.7,84.3,81
[b1,b0,s]=regresion_lineal(x,y,1);

(a) Calculate R2 for this model and provide a practical interpretation of this quantity.

yy=0;    % Sxy: corrected sum of cross-products
yy1=0;   % Syy: total sum of squares of y
n=length(y);
for i=1:1:n
    yy=((y(i)-mean(y))*(x(i)-mean(x)))+yy;
    yy1=((y(i)-mean(y))*(y(i)-mean(y)))+yy1;
end
% R^2 = b1*Sxy/Syy = SSR/SST
R21=(b1*yy)/yy1

R21 =
0.671801770078746

The model fits reasonably well: the coefficient of determination R21 ≈ 0.672 means that about 67% of the variability in quarterback rating is explained by the average number of yards gained per pass attempt.
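The same R² computation can be sketched in Python with a small made-up dataset (the x and y vectors below are illustrative, not the quarterback data, which is truncated in this copy). It confirms that b1·Sxy/Syy equals the squared sample correlation coefficient:

```python
import numpy as np

# Illustrative data only (not the quarterback ratings)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

Sxy = np.sum((x - x.mean()) * (y - y.mean()))  # corrected cross-product sum
Sxx = np.sum((x - x.mean()) ** 2)
Syy = np.sum((y - y.mean()) ** 2)              # total sum of squares

b1 = Sxy / Sxx          # least squares slope
R2 = b1 * Sxy / Syy     # same formula as the MATLAB loop above

r = np.corrcoef(x, y)[0, 1]
print(R2, r ** 2)       # both equal 0.6 for this data
```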

(b) Prepare a normal probability plot of the residuals from the least squares model. Does the normality
assumption seem to be satisfied?

e=[];

for i=1:1:n
e(i)=(y(i)-(b0+(b1*x(i)))) ;

end

figure()
normplot(e)

(c) Plot the residuals versus the fitted values and against x. Interpret these graphs (The linear regression model
appears to be appropriate).

figure()
yhat=b0+(b1*x);          % fitted values
plot(yhat,e,'o')
title("Residuals vs fitted values")
xlabel("fitted value")
ylabel("e")

figure()
plot(x,e,'o')
title("Residuals vs x")
xlabel("x")
ylabel("e")

2. An article in Technometrics by S. C. Narula and J. F. Wellington [“Prediction, Linear Regression, and a
Minimum Sum of Relative Errors” (1977, Vol. 19)] presents data on the selling price and annual taxes for 24
houses. The data are in the Table E11-2. Refer to the data in table on house-selling price y and taxes paid x.

x2=[5.0500,8.2464,6.6969,7.7841,9.0384,5.9894,7.5422,8.7951,6.0831,8.3607,8.1400,9.1416];
y2=[30.0,36.9,41.9,40.5,43.9,37.5,37.9,44.5,37.9,38.9,36.9,45.8];
[b12,b02,s2]=regresion_lineal(x2,y2,1);

(a) Find the residuals for the least squares model.

e2=[];
n2=length(y2);
for j=1:1:n2
    e2(j)=y2(j)-(b02+(b12*x2(j)));   % residual: observed minus fitted value
end
disp("Residuals")
disp(e2)

Residuals
-3.3490   -4.0960    4.6110    0.6100    1.0093    1.9036   -1.4113    2.1914    2.0794   -2.3694   -3.8414    2.6624
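As a cross-check (a sketch in Python/NumPy, outside the MATLAB workflow), the same least squares fit and residuals can be recomputed with np.polyfit; by construction, least squares residuals sum to zero:

```python
import numpy as np

x2 = np.array([5.0500, 8.2464, 6.6969, 7.7841, 9.0384, 5.9894,
               7.5422, 8.7951, 6.0831, 8.3607, 8.1400, 9.1416])
y2 = np.array([30.0, 36.9, 41.9, 40.5, 43.9, 37.5,
               37.9, 44.5, 37.9, 38.9, 36.9, 45.8])

b1, b0 = np.polyfit(x2, y2, 1)    # slope ~ 2.392, intercept ~ 21.268
e2 = y2 - (b0 + b1 * x2)          # residuals: observed minus fitted

print(e2.round(4))
print(e2.sum())                   # ~0, a basic sanity check on the fit
```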

(b) Prepare a normal probability plot of the residuals and interpret this display.

figure()
normplot(e2)

(c) Plot the residuals versus ŷi and versus xi. Does the normality assumption seem to be satisfied?

figure()
yhat2=b02+(b12*x2);      % fitted values
plot(yhat2,e2,'o')
title("Residuals vs fitted values")
xlabel("fitted value")
ylabel("e")

figure()
plot(x2,e2,'o')
title("Residuals vs x")
xlabel("x")
ylabel("e")

(d) What proportion of total variability is explained by the regression model?

yy2=0;    % Sxy
yy22=0;   % Syy
n=length(y2);
for i=1:1:n
    yy2=((y2(i)-mean(y2))*(x2(i)-mean(x2)))+yy2;
    yy22=((y2(i)-mean(y2))*(y2(i)-mean(y2)))+yy22;
end
% R^2 = b12*Sxy/Syy
R22=(b12*yy2)/yy22

R22 =
0.544645445033293

About 54.5% of the total variability in selling price is explained by the regression on taxes paid.

3. The number of pounds of steam used per month by a chemical plant is thought to be related to the average
ambient temperature (in ◦F) for that month. The past year’s usage and temperatures are in the following table:

x3=[21,24,32,47,50,59,68,74,62,50,41,30];
y3=[185.79,214.47,288.03,424.84,454.58,539.03,621.55,675.06,562.03,452.93,369.95,273.98];
[b13,b03,s3]=regresion_lineal(x3,y3,1);

(a) What proportion of total variability is accounted for by the simple linear regression model?

yy3=0;    % Sxy
yy32=0;   % Syy
n=length(y3);
for i3=1:1:n
    yy3=((y3(i3)-mean(y3))*(x3(i3)-mean(x3)))+yy3;
    yy32=((y3(i3)-mean(y3))*(y3(i3)-mean(y3)))+yy32;
end
R23=(b13*yy3)/yy32

R23 =
0.9999

Nearly all (about 99.99%) of the variability in steam usage is explained by the ambient temperature.
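A quick Python/NumPy cross-check of this proportion (the steam data is very nearly linear in temperature, so R² should be extremely close to 1):

```python
import numpy as np

x3 = np.array([21, 24, 32, 47, 50, 59, 68, 74, 62, 50, 41, 30], dtype=float)
y3 = np.array([185.79, 214.47, 288.03, 424.84, 454.58, 539.03,
               621.55, 675.06, 562.03, 452.93, 369.95, 273.98])

Sxy = np.sum((x3 - x3.mean()) * (y3 - y3.mean()))
Sxx = np.sum((x3 - x3.mean()) ** 2)
Syy = np.sum((y3 - y3.mean()) ** 2)

b1 = Sxy / Sxx            # slope: ~9.21 pounds of steam per degree F
R2 = b1 * Sxy / Syy       # proportion of total variability explained

print(b1, R2)
```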

(b) Prepare a normal probability plot of the residuals and interpret this graph.

e3=[];
n3=length(y3);
for j3=1:1:n3
    e3(j3)=y3(j3)-(b03+(b13*x3(j3)));   % residual: observed minus fitted value
end
figure()
normplot(e3)

(c) Plot residuals versus ŷi and xi.

figure()
yhat3=b03+(b13*x3);      % fitted values
plot(yhat3,e3,'o')
title("Residuals vs fitted values")
xlabel("fitted value")
ylabel("e")

figure()
plot(x3,e3,'o')
title("Residuals vs x")
xlabel("x")
ylabel("e")

4. Suppose that data are obtained from 20 pairs of (x, y) and the sample correlation coefficient is 0.8.

(a) Test the hypothesis H0 : ρ = 0 against H1 : ρ ≠ 0 with α = 0.05. Calculate the P-value.

T0=(0.8*sqrt(18))/(sqrt(1-(0.8*0.8)))

T0 =
5.656854249492381

Pvalue=2*(1-tcdf(abs(T0),18))

Pvalue =
2.292887199439875e-05

Since the P-value is smaller than α = 0.05, H0 is rejected: there is significant evidence of correlation.
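The test statistic only needs r and n, so it can be reproduced with the Python standard library (a sketch; the P-value above still comes from the t distribution with n − 2 = 18 degrees of freedom):

```python
import math

r, n = 0.8, 20

# t statistic for H0: rho = 0 vs H1: rho != 0
T0 = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(T0)   # 5.6569 = 4*sqrt(2), far beyond t_{0.025,18} ~ 2.101
```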

(b) Test the hypothesis H0 : ρ = 0.5 against H1 : ρ ≠ 0.5 with α = 0.05. Calculate the P-value.

Because the null value is not zero, the test uses Fisher's z-transform rather than the t statistic:

Z02=(atanh(0.8)-atanh(0.5))*sqrt(17)

Z02 =
2.264847

Pvalue2=2*(1-normcdf(abs(Z02)))

Pvalue2 =
0.023522

Since the P-value is smaller than α = 0.05, H0 : ρ = 0.5 is rejected.
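For H0: ρ = ρ0 with ρ0 ≠ 0, the textbook procedure is based on Fisher's z-transform Z0 = (atanh r − atanh ρ0)·√(n − 3), referred to the standard normal distribution. A standard-library sketch, using Φ(z) = (1 + erf(z/√2))/2:

```python
import math

r, rho0, n = 0.8, 0.5, 20

# Fisher z-transform test statistic for H0: rho = 0.5
Z0 = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)

# two-sided P-value from the standard normal CDF
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
p = 2.0 * (1.0 - Phi(abs(Z0)))
print(Z0, p)   # Z0 ~ 2.26, p ~ 0.024, so H0 is rejected at alpha = 0.05
```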

(c) Construct a 95% two-sided confidence interval for the correlation coefficient.

z=norminv(0.975)

z =
1.959963984540054

lower=tanh(atanh(0.8)-(z/sqrt(17)))

lower =
0.553387644453858

upper=tanh(atanh(0.8)+(z/sqrt(17)))

upper =
0.917655484096945

The 95% confidence interval for ρ is approximately (0.553, 0.918).
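The same interval follows from tanh/atanh in the Python standard library (the lower bound uses the minus sign on the atanh scale, the upper the plus sign):

```python
import math

r, n = 0.8, 20
z = 1.959963984540054          # 97.5% standard normal quantile

half = z / math.sqrt(n - 3)    # half-width on the atanh (Fisher z) scale
lower = math.tanh(math.atanh(r) - half)
upper = math.tanh(math.atanh(r) + half)
print(lower, upper)            # ~ (0.5534, 0.9177)
```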
