
Introduction to R:
Data Management and Statistical Analysis

Regression and Correlation Analysis
Leilani Nora
Assistant Scientist

CORRELATION ANALYSIS

DATAFRAME : corr.csv

• Consider the data for grain yield and N, P, K content of the plant taken from several samples.

DATA FRAME : GYNPK

Read data file corr.csv

> GYNPK <- read.table("corr.csv", header=TRUE, sep=",")
> GYNPK
   GY14   N   P   K
1  1678 1.0 0.1 0.4
2  4265 1.2 0.1 0.4
3  2431 1.1 0.1 0.4
4  2431 1.0 0.1 0.4
5  4461 1.2 0.1 0.4
. . .
48 5483 1.7 0.2 0.3
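As a minimal sketch of the reading step, the block below writes a stand-in file containing only the six rows printed on the slide (rows 1 to 5 and 48) and reads it back. The temporary file and the name GYNPK_demo are assumptions for illustration; the real corr.csv has 48 rows.

```r
# Stand-in for corr.csv: only the rows printed on the slide, not the full file.
csv_lines <- c(
  "GY14,N,P,K",
  "1678,1.0,0.1,0.4",
  "4265,1.2,0.1,0.4",
  "2431,1.1,0.1,0.4",
  "2431,1.0,0.1,0.4",
  "4461,1.2,0.1,0.4",
  "5483,1.7,0.2,0.3"
)
tmp <- tempfile(fileext = ".csv")   # hypothetical path, used only for this demo
writeLines(csv_lines, tmp)

# read.csv() is read.table() with header=TRUE and sep="," as defaults
GYNPK_demo <- read.csv(tmp)

str(GYNPK_demo)       # check column names and types before any analysis
head(GYNPK_demo, 3)   # first three rows
```

Checking the structure with str() right after reading helps catch columns that were read as text instead of numbers.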
CORRELATION ANALYSIS : correlation()

• correlation() obtains the coefficients of correlation and p-value between all the variables of a data table. The results are similar to SAS.

• Required package is agricolae.

Usage
> correlation(x, y=NULL, method="pearson", alternative="two.sided", …)

# x and y – table, matrix or vector
# method – "pearson", "kendall", "spearman"
# alternative – "two.sided", "less", "greater"

> library(agricolae)
> corrGY <- correlation(GYNPK)
> corrGY
$correlation
      GY14     N     P     K
GY14  1.00  0.72  0.38 -0.40
N     0.72  1.00  0.02 -0.34
P     0.38  0.02  1.00 -0.35
K    -0.40 -0.34 -0.35  1.00

$pvalue
             GY14            N           P           K
GY14 1.000000e+00 1.084596e-08 0.007979778 0.005289776
N    1.084596e-08 1.000000e+00 0.868611288 0.017985414
P    7.979778e-03 8.686113e-01 1.000000000 0.016134208
K    5.289776e-03 1.798541e-02 0.016134208 1.000000000

$n.obs
[1] 48
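For readers without agricolae installed, a similar list of results can be sketched with base R's cor() and cor.test(). The helper pairwise_cor() below is hypothetical (it is not part of agricolae), and the built-in trees data set is used only as a stand-in for GYNPK.

```r
# Hypothetical helper, not an agricolae function: pairwise correlations
# and p-values computed with base R's cor() and cor.test().
pairwise_cor <- function(df, method = "pearson", alternative = "two.sided") {
  vars <- names(df)
  k <- length(vars)
  r <- cor(df, method = method)
  # Diagonal p-values are set to 1, as in the correlation() output above
  p <- matrix(1, k, k, dimnames = list(vars, vars))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      p[i, j] <- p[j, i] <- cor.test(df[[i]], df[[j]], method = method,
                                     alternative = alternative)$p.value
    }
  }
  list(correlation = r, pvalue = p, n.obs = nrow(df))
}

# 'trees' (Girth, Height, Volume) ships with R; it stands in for GYNPK here.
res <- pairwise_cor(trees)
round(res$correlation, 2)
res$pvalue
res$n.obs
```

The sketch assumes complete data; rows with missing values would need to be handled before calling it.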

CORRELATION ANALYSIS : cor.matrix()

• Package 'Deducer' provides intuitive graphical data analysis for use with JGR.

• JGR is a Java GUI for R: a cross-platform, universal and unified graphical user interface for R.

• This package was released on August 2, 2009 with 33 functions.

• One of the functions in package Deducer is cor.matrix().

• cor.matrix() creates a correlation matrix with a function to test the significance of the correlation coefficient, r.

Usage
> cor.matrix(variables, data, test=cor.test, method …)

# variables – an expression denoting a set of variables
# data – a data frame
# test – a function to test the significance of the correlation coefficient
# method – "pearson", "kendall", "spearman"
CORRELATION ANALYSIS: cor.matrix()

> library(Deducer)
> corrGY2 <- cor.matrix(GY14:K, data=GYNPK)
> corrGY2

Pearson's product-moment correlation

               GY14   N                P                K
GY14  cor      1      0.7157           0.3785           -0.3964
      N        48     48               48               48
      CI*             (0.5417,0.8309)  (0.1058,0.5983)  (-0.6116,-0.1265)
      stat**          6.95 (46)        2.774 (46)       -2.928 (46)
      p-value         0.0000           0.0080           0.0053
-----------
N     . . .
P     . . .
K     . . .
-----------
HA: two.sided

CORRELATION ANALYSIS : print.cor.matrix()

• print.cor.matrix() prints an object of class "cor.matrix" in a nice layout

Usage
> print.cor.matrix(x, digits=4, N=TRUE, CI=TRUE, stat=TRUE, p.value=TRUE, …)

# x - object of class "cor.matrix"
# digits - number of digits to round
# N - logical, prints a row for sample size
# CI - logical, prints a row for CI if they exist
# stat - logical, prints a row for test statistics
# p.value - logical, prints a row for p-values
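The stat, CI and p-value rows of the cor.matrix() output can be checked by hand. The sketch below uses the printed GY14 and N correlation (r = 0.7157, n = 48) and assumes the usual Pearson formulas: a t statistic on n - 2 degrees of freedom and a 95% confidence interval from Fisher's z transformation.

```r
r <- 0.7157   # correlation between GY14 and N, as printed on the slide
n <- 48       # sample size

# t statistic with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via Fisher's z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(t_stat, 2)   # compare with the stat** row: 6.95 (46)
round(ci, 4)       # compare with the CI* row: (0.5417,0.8309)
p_val              # compare with the p-value row (prints as 0.0000 at 4 digits)
```

Because r is rounded to four digits here, the results agree with the slide only to about that precision.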

CORRELATION ANALYSIS: cor.matrix()

> print.cor.matrix(corrGY2, digits=4, N=FALSE, CI=FALSE, stat=FALSE)
Pearson's product-moment correlation

               GY14     N        P        K
GY14  cor      1        0.7157   0.3785   -0.3964
      p-value           0.0000   0.0080   0.0053
-----------
N     cor      0.7157   1        0.02452  -0.3402
      p-value  0.0000            0.8686   0.0180
-----------
P     cor      0.3785   0.02452  1        -0.3456
      p-value  0.0080   0.8686            0.0161
-----------
K     cor      -0.3964  -0.3402  -0.3456  1
      p-value  0.0053   0.0180   0.0161
-----------
HA: two.sided

REGRESSION ANALYSIS
DATAFRAME : SRATE.csv

• Consider grain yield data for six levels of seeding rate.

DATA FRAME : SRATE

Read data file SRATE.csv

> SRATE <- read.table("SRATE.csv", header=TRUE, sep=",")
> SRATE
  Seedrate  GYield
1       25 5.30425
2       50 5.12400
3       75 5.07025
4      100 4.84775
5      125 4.70800
6      150 4.70325

REGRESSION ANALYSIS : lm()

• lm(), which stands for Linear Model, fits linear models that can be used to carry out regression, single stratum ANOVA, ANCOVA and multiple linear regression.

Usage
> lm(formula, data, na.action, model=TRUE, …)

# formula – a model formula. A typical model has the form "response ~ terms"
# data – data frame
# na.action – what to do when the data contain NAs; the default is "na.omit", and "na.exclude" can also be useful
# model – logical, if TRUE the corresponding components of the fit are returned.

REGRESSION ANALYSIS : lm()

> ModelGY <- lm(SRATE$GYield~SRATE$Seedrate)
> ModelGY

Call:
lm(formula = SRATE$GYield ~ SRATE$Seedrate)

Coefficients:
   (Intercept)  SRATE$Seedrate
      5.324283       -0.004168

• The result of lm() is a model object.
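The data argument listed in the usage can be sketched as follows. The data frame is re-typed from the six rows printed on the SRATE slide, so the estimates it prints need not match the slide output exactly; the name fit and the new seeding rate of 80 are assumptions for illustration.

```r
# Six rows re-typed from the SRATE slide (stand-in for reading SRATE.csv)
SRATE <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, 5.07025, 4.84775, 4.70800, 4.70325)
)

# Using data= keeps the formula short and gives clean coefficient names
fit <- lm(GYield ~ Seedrate, data = SRATE)

coef(fit)   # intercept and slope as a named numeric vector

# Predicted yield at a seeding rate that was not observed (hypothetical value)
new_rate <- data.frame(Seedrate = 80)
pred <- predict(fit, newdata = new_rate)
pred
```

One reason to prefer the data argument over SRATE$GYield ~ SRATE$Seedrate is that predict() can then match the variable name Seedrate in newdata.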
REGRESSION ANALYSIS : summary()

• The function summary() is used to obtain and print a summary of the fitted model: residuals, coefficient table, R-squared and the overall F-test.

> summary(ModelGY)

Residuals:
        1         2         3         4         5         6
 0.292567 -0.096083 -0.045633 -0.059733 -0.095283  0.004167

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     5.324283   0.154081  34.555 4.18e-06 ***
SRATE$Seedrate -0.004168   0.001583  -2.634    0.058 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1655 on 4 degrees of freedom
Multiple R-squared: 0.6342, Adjusted R-squared: 0.5428
F-statistic: 6.936 on 1 and 4 DF, p-value: 0.05796

SCATTERPLOT : plot() and abline()

> plot(SRATE$Seedrate, SRATE$GYield, main="ScatterPlot of Mean Yield",
       xlab="Seedrate", ylab="Mean Yield", col="Red")
> abline(ModelGY, col="blue", lty=3)

• abline(lm.object) draws the fitted line, using the intercept (a) and slope (b) from the lm object.

• lm.object – regression object whose first two coefficients are taken to be the intercept and slope.
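Two quantities in the summary output are linked by simple arithmetic when there is a single predictor. The sketch below uses only the numbers printed on the slide: the correlation r is the square root of R-squared with the sign of the slope, and the F statistic is the square of the slope's t value.

```r
r_squared <- 0.6342     # Multiple R-squared from the summary output
slope     <- -0.004168  # estimate for Seedrate
t_slope   <- -2.634     # t value for Seedrate

# Correlation coefficient: square root of R-squared, signed like the slope
r <- sign(slope) * sqrt(r_squared)
round(r, 4)

# With one predictor, the overall F statistic equals the squared t value
f_from_t <- t_slope^2
round(f_from_t, 2)   # compare with F-statistic: 6.936 (up to rounding)
```

These identities hold only for simple linear regression with an intercept; with several predictors, R-squared no longer corresponds to one pairwise correlation.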

SCATTERPLOT : mtext()

• mtext(text, side=3, …) displays text in the margin at the top of the plot
# text – a character expression specifying the text to be written
# side – on which side of the plot you want to display the text
    1 – bottom    2 – left
    3 – top       4 – right

SCATTERPLOT : title() and mtext()

> plot(…) # same as previous slide
> abline(…) # same as previous slide
> mtext("GYield=(5.324-0.0042Seedrate) with r=-0.7964", side=3, cex=0.7)
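Instead of typing the equation into mtext() by hand, the label can be built from the fitted model. This is a sketch under assumptions: the data frame is re-typed from the six rows on the SRATE slide (so the numbers in the label need not match the slide), and the names fit and eq_label are introduced here for illustration.

```r
# Six rows re-typed from the SRATE slide (stand-in for reading SRATE.csv)
SRATE <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, 5.07025, 4.84775, 4.70800, 4.70325)
)
fit <- lm(GYield ~ Seedrate, data = SRATE)

b <- coef(fit)                             # b[1] = intercept, b[2] = slope
r <- cor(SRATE$Seedrate, SRATE$GYield)     # Pearson correlation

# "%+.4f" prints the slope with an explicit sign
eq_label <- sprintf("GYield=(%.3f%+.4fSeedrate) with r=%.4f", b[1], b[2], r)

plot(GYield ~ Seedrate, data = SRATE, main = "ScatterPlot of Mean Yield",
     xlab = "Seedrate", ylab = "Mean Yield", col = "red")
abline(fit, col = "blue", lty = 3)         # fitted regression line
mtext(eq_label, side = 3, cex = 0.7)       # side = 3 places the text on top
```

Building the label from coef() and cor() keeps the annotation in step with the data if the file changes.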
SCATTERPLOT

[Figure: scatterplot titled "ScatterPlot of Mean Yield" with the margin text "GYield=(5.324-0.0042Seedrate) with r=-0.7964"; x-axis: Seedrate (20 to 140), y-axis: Mean of yield (4.7 to 5.3)]

RESIDUAL PLOT

> plot(ModelGY$fitted.values, ModelGY$residuals, main="Residual Plot",
       xlab="Fitted", ylab="Residuals", col="red")
> abline(h=0, col="blue", lty=3)
# draws a horizontal blue dotted line at Y=0
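The same residual plot can be sketched with the accessor functions fitted() and resid() instead of the $ components. The data frame is again re-typed from the six rows on the SRATE slide, and the names fit, fv and res are assumptions for illustration.

```r
# Six rows re-typed from the SRATE slide (stand-in for reading SRATE.csv)
SRATE <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, 5.07025, 4.84775, 4.70800, 4.70325)
)
fit <- lm(GYield ~ Seedrate, data = SRATE)

fv  <- fitted(fit)   # fitted values, same as fit$fitted.values
res <- resid(fit)    # residuals, same as fit$residuals

plot(fv, res, main = "Residual Plot", xlab = "Fitted", ylab = "Residuals",
     col = "red")
abline(h = 0, col = "blue", lty = 3)   # reference line at zero

# With an intercept in the model, least-squares residuals sum to zero
sum(res)
```

Points scattered evenly around the zero line, with no visible curve or funnel shape, are what a well-behaved linear fit would show.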

RESIDUAL PLOT

THANK YOU! ☺
Please do Exercise E
