Introduction To R

Basic Introduction to R
By
Dr. Manika Gupta
Assistant Professor
Department of Geology
Delhi University
24-Mar-17 10:00 AM
Benefits of R
• Open source language

• Packages are continuously being developed and made available
Installation
Install R:
https://cran.r-project.org/bin/windows/base/
Install R studio:
https://www.rstudio.com/products/rstudio/download3/
Resources
• http://cran.us.r-project.org/doc/manuals/R-intro.pdf
• https://onlinecourses.science.psu.edu/statprogram/sites/onlinecourses.science.psu.edu.statprogram/files/lesson00/Short-refcard.pdf
• https://www.datacamp.com/courses/free-introduction-to-r
• http://www.cyclismo.org/tutorial/R/
• http://rseek.org/
Help in R
help.start() # general help
help("mean") # help about the function mean
?mean # same thing
apropos("mean") # list all functions whose names contain the string "mean"
example(mean) # show an example of the function mean

Basics!!!
Variables:
integers
characters
logical
Assigning values to variables:

x <- 5
x
y <- 3
z <- x+y
z
class(X)
Error: object 'X' not found
R is case sensitive
class(x)
[1] "numeric"
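The variable list above mentions integers, yet class(x) reports "numeric". A minimal sketch of how each basic type reports its class follows; the variable names n, i, s and b are chosen here for illustration and are not from the slides.

```r
# Each basic type reports its own class
n <- 5          # a plain number is stored as numeric (double) by default
i <- 5L         # the L suffix asks for an integer
s <- "geology"  # character string
b <- TRUE       # logical value

class(n)  # "numeric"
class(i)  # "integer"
class(s)  # "character"
class(b)  # "logical"
```

This is why x <- 5 gives "numeric" rather than "integer".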
Operators in R
Binary operators work on vectors and matrices as well as scalars.
Arithmetic Operators include:
Logical Operators include:
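The operator tables from the original slides are not reproduced in this text. As a rough sketch, the following shows the common arithmetic and logical operators of base R applied to two vectors; the result names (res_mod, res_and, and so on) are assumptions made for this example.

```r
a <- c(1, 3, 5, 7)
b <- c(1, 2, 4, 8)

# Arithmetic operators (all work element by element)
a + b              # addition
a - b              # subtraction
a * b              # multiplication
a / b              # division
a ^ 2              # exponentiation
res_mod <- a %% 2  # modulus (remainder): 1 1 1 1
res_div <- a %/% 2 # integer division: 0 1 2 3

# Comparison and logical operators (return logical vectors)
res_gt <- a > b                 # FALSE TRUE TRUE FALSE
res_eq <- a == b                # TRUE FALSE FALSE FALSE
res_not <- !(a > b)             # negation: TRUE FALSE FALSE TRUE
res_and <- (a > 2) & (b < 5)    # element-wise AND: FALSE TRUE TRUE FALSE
res_or  <- (a > 2) | (b < 5)    # element-wise OR: TRUE TRUE TRUE TRUE
```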

Data Formats
Vector
A vector is a sequence of data elements of the same basic type.
Suppose we have a vector containing the three numeric values 2, 3 and 5 -
> aa <- c(2, 3, 5)
> aa
[1] 2 3 5
And a vector of logical values -
> bb <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
> bb
[1] TRUE FALSE TRUE FALSE FALSE
A vector can contain character strings -
> cc <- c("aa", "bb", "cc", "dd", "ee")
> cc
[1] "aa" "bb" "cc" "dd" "ee"
> length(aa)
[1] 3
Arithmetic operations of vectors are performed member-by-member
> a = c(1, 3, 5, 7)
> b = c(1, 2, 4, 8)
>5*a
[1] 5 15 25 35
>a+b
[1] 2 5 9 15
>a-b
[1] 0 1 1 -1
>a*b
[1] 1 6 20 56
>a/b
[1] 1.000 1.500 1.250 0.875
> a <- c(1,2,3,4)
> sqrt(a)
[1] 1.000000 1.414214 1.732051 2.000000
> exp(a)
[1] 2.718282 7.389056 20.085537 54.598150
> log(a)
[1] 0.0000000 0.6931472 1.0986123 1.3862944
> exp(log(a))
[1] 1 2 3 4
> cc <- (a + sqrt(a))/(exp(2)+1)

> cc
[1] 0.2384058 0.4069842 0.5640743 0.7152175
> a <- c(1,-2,3,-4)
> b <- c(-1,2,-3,4)
> min(a,b)
[1] -4
> pmin(a,b)
[1] -1 -2 -3 -4
> max(a)
[1] 3
> sum(a)
[1] -2
Recycling Rule
If two vectors are of unequal length, the shorter one will be recycled
in order to match the longer vector.
> u = c(10, 20, 30)
> v = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
>u+v
[1] 11 22 33 14 25 36 17 28 39
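In the example above the longer vector has 9 members, a whole multiple of 3, so recycling is silent. The sketch below, using a hypothetical 4-member vector w, shows what happens when the lengths do not divide evenly: R still recycles, but issues a warning.

```r
u <- c(10, 20, 30)
w <- c(1, 2, 3, 4)   # hypothetical vector; 4 is not a multiple of 3

# A single number is recycled across the whole vector
scaled <- 5 * u      # 50 100 150

# Unequal lengths still recycle: u is reused as 10 20 30 10
total <- suppressWarnings(u + w)   # 11 22 33 14

# Capture the warning text instead of printing it
msg <- tryCatch(u + w, warning = function(cond) conditionMessage(cond))
```

The warning is a useful hint that two vectors were probably not meant to be combined.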
Vector Index
We retrieve values in a vector by declaring an index inside a single square bracket "[]"
operator.
> s <- c("aa", "bb", "cc", "dd", "ee")

> s[3]
[1] "cc"
> s[c(2, 3)]
[1] "bb" "cc"
Negative Index
If the index is negative, it strips the member whose position has the same absolute
value as the negative index.
> s[-3]
[1] "aa" "bb" "dd" "ee"
Out-of-Range Index
If an index is out-of-range, a missing value will be reported via the symbol NA.
> s[10]
[1] NA
> length(s)
[1] 5
Duplicate Indexes
The index vector allows duplicate values.
> s[c(2, 3, 3)]
[1] "bb" "cc" "cc"
Out-of-Order Indexes
The index vector can even be out-of-order.
> s[c(2, 1, 3)]
[1] "bb" "aa" "cc"
Range Index
To produce a vector slice between two indexes, we can use the colon operator ":".
> s[2:4]
[1] "bb" "cc" "dd"
Matrix
A matrix is a collection of data elements arranged in a two-dimensional rectangular
layout. The following is an example of a matrix with 2 rows and 3 columns.
> A = matrix(
+ c(2, 4, 3, 1, 5, 7), # the data elements
+ nrow=2, # number of rows
+ ncol=3, # number of columns
+ byrow = TRUE) # fill matrix by rows
> A # print the matrix
     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    1    5    7
An element at the mth row, nth column of A can be accessed by the expression
A[m, n]
> A[2, 3] # element at 2nd row, 3rd column
[1] 7
The entire mth row of A can be extracted as A[m, ].
> A[2, ] # the 2nd row
[1] 1 5 7
Similarly, the entire nth column of A can be extracted as A[ ,n].
> A[ ,3] # the 3rd column
[1] 3 7
We can also extract more than one row or column at a time.
> A[ ,c(1,3)] # the 1st and 3rd columns
     [,1] [,2]
[1,]    2    3
[2,]    1    7
If we assign names to the rows and columns of the matrix, then we can access the
elements by names.
> dimnames(A) = list( c("row1", "row2"), # row names
+ c("col1", "col2", "col3")) # column names
> A # print A
     col1 col2 col3
row1    2    4    3
row2    1    5    7
> A["row2", "col3"] # element at 2nd row, 3rd column
[1] 7
Transpose
We construct the transpose of a matrix by interchanging its columns and rows
with the function t.
> t(A) # transpose of A
Combining Matrices
The columns of two matrices having the same number of rows can be combined into a
larger matrix. For example, suppose B is a matrix with 3 rows and we have another matrix C, also with 3 rows.
> C = matrix( c(7, 4, 2), nrow=3, ncol=1)
Then we can combine the columns of B and C with cbind.
> cbind(B, C)
Similarly, we can combine the rows of two matrices if they have the same number of
columns with the rbind function.
> D = matrix( c(6, 2), nrow=1, ncol=2)
> rbind(B, D)
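The slides use a matrix B without defining it. The sketch below assumes a hypothetical B with 3 rows and 2 columns, which is the shape that cbind(B, C) and rbind(B, D) require; its values are illustrative only.

```r
# Hypothetical 3 x 2 matrix (not defined in the slides), filled by column
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
C <- matrix(c(7, 4, 2), nrow = 3, ncol = 1)
D <- matrix(c(6, 2), nrow = 1, ncol = 2)

BC <- cbind(B, C)   # same number of rows: result is 3 x 3
BD <- rbind(B, D)   # same number of columns: result is 4 x 2
Bt <- t(B)          # transpose: 2 x 3

dim(BC)   # 3 3
dim(BD)   # 4 2
dim(Bt)   # 2 3
```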
List
A list is a generic vector containing other objects.
For example, the following variable x is a list containing copies of three vectors n, s, b, and a
numeric value 3.
> n = c(2, 3, 5)
> s = c("aa", "bb", "cc", "dd", "ee")
> b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
> x = list(n, s, b, 3) # x contains copies of n, s, b
List Slicing
We retrieve a list slice with the single square bracket "[]" operator. The following is a slice
containing the second member of x, which is a copy of s.
> x[2]
With an index vector, we can retrieve a slice with multiple members. Here a slice containing
the second and fourth members of x.
> x[c(2, 4)]
Member Reference
In order to reference a list member directly, we have to use the double square bracket "[[]]"
operator.
> x[[2]]
[1] "aa" "bb" "cc" "dd" "ee"
We can modify its content directly.
> x[[2]][1] = "ta"
> x[[2]]
[1] "ta" "bb" "cc" "dd" "ee"
>s
[1] "aa" "bb" "cc" "dd" "ee" # s is unaffected
Data Frame
A data frame is used for storing data tables. It is a list of vectors of equal length. For
example, the following variable df is a data frame containing three vectors n, s, b.
> n = c(2, 3, 5)
> s = c("aa", "bb", "cc")
> b = c(TRUE, FALSE, TRUE)
> df = data.frame(n, s, b) # df is a data frame
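As a short sketch of what can be done with df once it exists, the following inspects its structure and pulls out columns and rows. The name true_rows is an assumption made for this example.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b)

str(df)      # compact summary of column names and types
dim(df)      # 3 rows, 3 columns
df$s         # one column, selected by name
df[2, ]      # the second row, all columns

# Keep only the rows where column b is TRUE
true_rows <- df[df$b, ]
```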
Built-in Data Frame
For example, here is a built-in data frame in R, called mtcars.
> mtcars
Here is the cell value from the first row, second column of mtcars.
> mtcars[1, 2]
[1] 6
Moreover, we can use the row and column names instead of the numeric coordinates.
> mtcars["Mazda RX4", "cyl"]
[1] 6
Lastly, the number of data rows in the data frame is given by the nrow function.
> nrow(mtcars) # number of data rows
[1] 32
And the number of columns of a data frame is given by the ncol function.
> ncol(mtcars) # number of columns
[1] 11
> head(mtcars)
> names(mtcars)
> dimnames(mtcars)
Retrieve column vectors
> mtcars[[9]]
[1] 1 1 1 0 0 0 0 0 0 0 0 ...
> mtcars[["am"]]
[1] 1 1 1 0 0 0 0 0 0 0 0 ...
> mtcars$am
[1] 1 1 1 0 0 0 0 0 0 0 0 ...
> mtcars[,"am"]
[1] 1 1 1 0 0 0 0 0 0 0 0 ...
Read and write data files
Working Directory
The data files need to be located in the R working directory, which can
be found with the function getwd.
> getwd() # get current working directory
When we need to set a directory other than the current one -

> setwd("<new path>") # set working directory
Note that the forward slash should be used as the path separator even
on the Windows platform.
> setwd("C:/Users/Geology/Desktop/R_workshop")
Data Import
It is often necessary to import data into R.
Excel File
We can use the function read.xlsx from the xlsx package. It reads from an Excel spreadsheet
and returns a data frame.
> library(xlsx) # load xlsx package (require(xlsx) also works)
> mydata <- read.xlsx("my_data.xlsx", 1) # read from first sheet
Alternatively, we can use the function loadWorkbook from the XLConnect package to read
the entire workbook, and then load the worksheets with readWorksheet.
The XLConnect package requires Java to be pre-installed.
> library(XLConnect) # load XLConnect package
> wk <- loadWorkbook("my_data.xlsx")
> df <- readWorksheet(wk, sheet="Sheet1")

Text File
Load txt file into the workspace with the function read.table.
> mydata <- read.table("my_data.txt") # read text file

> mydata # print data frame
V1 V2 V3
1 100 a1 b1
2 200 a2 b2
3 300 a3 b3
4 400 a4 b4
For further details on the function read.table, please consult the R documentation.
> help(read.table)
CSV File
Sample data in .csv file -

100,a1,b1
200,a2,b2
300,a3,b3
After we copy and paste the data above into a file named "my_data.csv", we can read the data
with the function read.csv. The sample has no header row, so we set header=FALSE.
> mydata <- read.csv("my_data.csv", header=FALSE) # read csv file

> mydata
> help(read.csv)
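The sample data above has no header row, and read.csv treats the first line as column names unless told otherwise. The sketch below shows the difference, reading the same three lines from a string through the text argument so that no file is needed. The names csv_text, with_default and no_header are assumptions made for this example.

```r
# The three sample lines, held in a string instead of a file
csv_text <- "100,a1,b1\n200,a2,b2\n300,a3,b3"

# Default header = TRUE: the first data line is consumed as column names
with_default <- read.csv(text = csv_text)
nrow(with_default)   # 2

# header = FALSE: all three lines are kept as data
no_header <- read.csv(text = csv_text, header = FALSE)
nrow(no_header)      # 3
names(no_header)     # "V1" "V2" "V3"
```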
> tree <- read.csv(file="trees91.csv", header=TRUE, sep=",")
> attributes(tree)
> names(tree)
> tree$C
> summary(tree$C)
> tree$C <- factor(tree$C)
> tree$C
> summary(tree$C)
> levels(tree$C)
Minitab File
If the data file is in Minitab Portable Worksheet format, it can be opened with the function
read.mtp from the foreign package.
It returns a list of components in the Minitab worksheet.
> library(foreign) # load the foreign package
> help(read.mtp) # documentation
> mydata <- read.mtp("my_data.mtp") # read from .mtp file
SPSS File
Data files in SPSS format can be opened with the function read.spss, also from the
foreign package. There is a "to.data.frame" option for choosing whether a data frame is to
be returned. By default, it returns a list of components instead.
> library(foreign) # load the foreign package
> help(read.spss) # documentation
> mydata <- read.spss("international.sav", to.data.frame=TRUE)

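Data Export
write.table(mydata, "mydata.txt", sep="\t") # tab-delimited text file
write.csv(mydata, "mydata.csv") # csv file
write.xlsx(mydata, "mydata.xlsx") # Excel file (xlsx package)
write.foreign(mydata, "mydata.txt", "mydata.sps", package="SPSS") # foreign package
write.foreign(mydata, "mydata.txt", "mydata.sas", package="SAS") # foreign package
write.dta(mydata, "mydata.dta") # Stata file (foreign package)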
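A quick way to check an export is to write a file and read it straight back. The sketch below does this with write.csv and read.csv on a small made-up data frame and a temporary file; the column names id and code and the variable names out_file and back are assumptions for illustration.

```r
# Small illustrative data frame (values are made up)
mydata <- data.frame(id = c(100, 200, 300), code = c("a1", "a2", "a3"))

# Write to a temporary file so nothing is left in the working directory
out_file <- tempfile(fileext = ".csv")
write.csv(mydata, out_file, row.names = FALSE)  # skip the row-number column

# Read the file back and compare with the original
back <- read.csv(out_file)
dim(back)     # 3 2
names(back)   # "id" "code"

unlink(out_file)  # remove the temporary file
```

Setting row.names = FALSE keeps write.csv from adding an extra first column of row numbers, which would otherwise come back as a column named X.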
Missing Values
is.na(x) # returns TRUE if x is missing
y <- c(1,2,3,NA)
is.na(y) # returns a vector (F F F T)
View(mydata)
> mydata$m_illit[mydata$m_illit==3.0]<-NA
> head(mydata)
x <- c(1,2,NA,3)
mean(x) # returns NA
mean(x, na.rm=TRUE) # returns 2
# list rows of data that have missing values

mydata[!complete.cases(mydata),]
# create new dataset without missing data

newdata <- na.omit(mydata)
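The commands above assume a data frame mydata already loaded from a file. As a self-contained sketch, the following builds a small made-up data frame with a few missing values and applies the same functions; the column names site, depth and ph are hypothetical.

```r
# Made-up data with two missing values
mydata <- data.frame(
  site  = c("s1", "s2", "s3", "s4"),
  depth = c(1.2, NA, 3.4, 2.0),
  ph    = c(7.1, 6.8, NA, 7.4)
)

# Count the missing values in each column
na_counts <- colSums(is.na(mydata))

# Rows that contain at least one missing value
incomplete <- mydata[!complete.cases(mydata), ]

# New dataset with those rows dropped
newdata <- na.omit(mydata)

# Mean of a column, ignoring the missing entry
depth_mean <- mean(mydata$depth, na.rm = TRUE)
```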
Basic plots
attach(mtcars)
plot(wt, mpg)
abline(lm(mpg ~ wt))
title("Regression of MPG on Weight")
jpeg("myplot.jpg")
plot(wt, mpg)
dev.off()
Error Bars
x <- rnorm(10,sd=5,mean=20)
y <- 2.5*x - 1.0 + rnorm(10,sd=9,mean=0)
plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
xHigh <- x
yHigh <- y + abs(rnorm(10,sd=3.5))
xLow <- x
yLow <- y - abs(rnorm(10,sd=3.1))
arrows(xHigh,yHigh,xLow,yLow,col=2,angle=90,length=0.1,code=3)
# Simple Histogram
hist(mtcars$mpg)
# Colored Histogram with Different Number of Bins
hist(mtcars$mpg, breaks=12, col="red")
# Kernel Density Plot

d <- density(mtcars$mpg) # returns the density data
plot(d) # plots the results
# Filled Density Plot

d <- density(mtcars$mpg)
plot(d, main="Kernel Density of Miles Per Gallon")
polygon(d, col="red", border="blue")
# Compare MPG distributions for cars with
# 4,6, or 8 cylinders
library(sm)
attach(mtcars)
# create value labels

cyl.f <- factor(cyl, levels= c(4,6,8),
labels = c("4 cylinder", "6 cylinder", "8 cylinder"))
# plot densities
sm.density.compare(mpg, cyl, xlab="Miles Per Gallon")
title(main="MPG Distribution by Car Cylinders")
# add legend via mouse click

colfill<-c(2:(1+length(levels(cyl.f))))
legend(locator(1), levels(cyl.f), fill=colfill)
# Simple Dotplot
dotchart(mtcars$mpg,labels=row.names(mtcars),cex=.7,
main="Gas Mileage for Car Models",
xlab="Miles Per Gallon")
# Dotplot: Grouped Sorted and Colored
# Sort by mpg, group and color by cylinder
x <- mtcars[order(mtcars$mpg),] # sort by mpg
x$cyl <- factor(x$cyl) # it must be a factor

x$color[x$cyl==4] <- "red"
x$color[x$cyl==6] <- "blue"
x$color[x$cyl==8] <- "darkgreen"
dotchart(x$mpg,labels=row.names(x),cex=.7,groups= x$cyl,
main="Gas Mileage for Car Models\ngrouped by cylinder",
xlab="Miles Per Gallon", gcolor="black", color=x$color)
# Simple Bar Plot
counts <- table(mtcars$gear)
barplot(counts, main="Car Distribution",
xlab="Number of Gears")
# Simple Horizontal Bar Plot with Added Labels

barplot(counts, main="Car Distribution", horiz=TRUE,
names.arg=c("3 Gears", "4 Gears", "5 Gears"))
# Stacked Bar Plot with Colors and Legend

counts <- table(mtcars$vs, mtcars$gear)
barplot(counts, main="Car Distribution by Gears and VS",
xlab="Number of Gears", col=c("darkblue","red"),
legend = rownames(counts))
# Grouped Bar Plot
counts <- table(mtcars$vs, mtcars$gear)
barplot(counts, main="Car Distribution by Gears and VS",
xlab="Number of Gears", col=c("darkblue","red"),
legend = rownames(counts), beside=TRUE)
# Fitting Labels
par(las=2) # make label text perpendicular to axis
par(mar=c(5,8,4,2)) # increase y-axis margin.

barplot(counts, main="Car Distribution", horiz=TRUE,
names.arg=c("3 Gears", "4 Gears", "5 Gears"), cex.names=0.8)
Line plots
x <- c(1:5); y <- x # create some data

par(pch=22, col="red") # plotting symbol and color
par(mfrow=c(2,4)) # all plots on one page
opts = c("p","l","o","b","c","s","S","h")
for(i in 1:length(opts)){
heading = paste("type=",opts[i])
plot(x, y, type="n", main=heading)
lines(x, y, type=opts[i])
}
# Create Line Chart
# convert factor to numeric for convenience
Orange$Tree <- as.numeric(Orange$Tree)

ntrees <- max(Orange$Tree)
# get the range for the x and y axis

xrange <- range(Orange$age)
yrange <- range(Orange$circumference)
# set up the plot
plot(xrange, yrange, type="n", xlab="Age (days)",
ylab="Circumference (mm)" )
colors <- rainbow(ntrees)
linetype <- c(1:ntrees)
plotchar <- seq(18,18+ntrees,1)
# add lines
for (i in 1:ntrees) {
tree <- subset(Orange, Tree==i)
lines(tree$age, tree$circumference, type="b", lwd=1.5,
lty=linetype[i], col=colors[i], pch=plotchar[i])
}
# add a title and subtitle
title("Tree Growth", "example of line plot")
# add a legend
legend(xrange[1], yrange[2], 1:ntrees, cex=0.8, col=colors,
pch=plotchar, lty=linetype, title="Tree")
# Simple Pie Chart
slices <- c(10, 12,4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie(slices, labels = lbls, main="Pie Chart of Countries")
# Pie Chart with Percentages

slices <- c(10, 12, 4, 16, 8)
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct) # add percents to labels
lbls <- paste(lbls,"%",sep="") # add % to labels
pie(slices,labels = lbls, col=rainbow(length(lbls)),
main="Pie Chart of Countries")
# 3D Exploded Pie Chart
library(plotrix)
slices <- c(10, 12, 4, 16, 8)
pie3D(slices,labels=lbls,explode=0.1,
main="Pie Chart of Countries ")
# Boxplot of MPG by Car Cylinders
boxplot(mpg~cyl,data=mtcars, main="Car Mileage Data",
xlab="Number of Cylinders", ylab="Miles Per Gallon")
# Notched Boxplot of Tooth Growth Against 2 Crossed Factors

# boxes colored for ease of interpretation
boxplot(len~supp*dose, data=ToothGrowth, notch=TRUE,
col=(c("gold","darkgreen")),
main="Tooth Growth", xlab="Supplement and Dose")
# In the notched boxplot, if two boxes' notches do not overlap, this is
# 'strong evidence' that their medians differ (Chambers et al., 1983, p. 62)
# Simple Scatterplot
attach(mtcars)
plot(wt, mpg, main="Scatterplot Example",
xlab="Car Weight ", ylab="Miles Per Gallon ", pch=19)
# Add fit lines

abline(lm(mpg~wt), col="red") # regression line (y~x)
lines(lowess(wt,mpg), col="blue") # lowess line (x,y)
# Enhanced Scatterplot of MPG vs. Weight
# by Number of Car Cylinders
library(car)
scatterplot(mpg ~ wt | cyl, data=mtcars,
xlab="Weight of Car", ylab="Miles Per Gallon",
main="Enhanced Scatter Plot",
labels=row.names(mtcars))
# Basic Scatterplot Matrix

pairs(~mpg+disp+drat+wt,data=mtcars,
main="Simple Scatterplot Matrix")
# Scatterplot Matrices from the gclus Package
library(gclus)
dta <- mtcars[c(1,3,5,6)] # get data
dta.r <- abs(cor(dta)) # get correlations
dta.col <- dmat.color(dta.r) # get colors
# reorder variables so those with highest correlation
# are closest to the diagonal
dta.o <- order.single(dta.r)
cpairs(dta, dta.o, panel.colors=dta.col, gap=.5,
main="Variables Ordered and Colored by Correlation" )
# 3D Scatterplot
library(scatterplot3d)
attach(mtcars)
scatterplot3d(wt,disp,mpg, main="3D Scatterplot")
# 3D Scatterplot with Coloring and Vertical Drop Lines

library(scatterplot3d)
attach(mtcars)
scatterplot3d(wt,disp,mpg, pch=16, highlight.3d=TRUE,
type="h", main="3D Scatterplot")
Thank You!!!
Additional slides
# Display the Student's t distributions with various
# degrees of freedom and compare to the normal distribution
x <- seq(-4, 4, length=100)
hx <- dnorm(x)
degf <- c(1, 3, 8, 30)

colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c("df=1", "df=3", "df=8", "df=30", "normal")
plot(x, hx, type="l", lty=2, xlab="x value",

ylab="Density", main="Comparison of t Distributions")
for (i in 1:4){
lines(x, dt(x,degf[i]), lwd=2, col=colors[i])
}
legend("topright", inset=.05, title="Distributions",

labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)
x_func<- function(a,b)
{
x<-a+b
}
ab<-x_func(5,5)
x
x_func<- function(a,b){
x<-a+b
y<-(2*a)-b
}
x
ab<-x_func(5,5)
x_func<- function(a,b){
x<-a+b
y<-a-b
result<- c(x,y)
return(result)}
x
ab<-x_func(5,5)
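The three versions of x_func above illustrate that variables assigned inside a function are local to it, and that several results can be returned together as one vector. A minimal sketch of both points follows; the starting value x <- 100 and the element names sum and diff are assumptions added for this example.

```r
x_func <- function(a, b) {
  x <- a + b            # x here is local to the function
  y <- a - b
  c(sum = x, diff = y)  # the last expression is the return value
}

x <- 100                # a global x, set before the call
ab <- x_func(5, 5)

ab   # sum is 10, diff is 0
x    # still 100: the assignment inside x_func did not touch it
```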
