By
Dr. Manika Gupta
Assistant Professor
Department of Geology
Delhi University
24-Mar-17 10:00 AM
Benefits of R
Installation
Install R:
https://cran.r-project.org/bin/windows/base/
Install RStudio:
https://www.rstudio.com/products/rstudio/download3/
Resources
• http://cran.us.r-project.org/doc/manuals/R-intro.pdf
• https://onlinecourses.science.psu.edu/statprogram/sites/onlinecourses.science.psu.edu.statprogram/files/lesson00/Short-refcard.pdf
• https://www.datacamp.com/courses/free-introduction-to-r
• http://www.cyclismo.org/tutorial/R/
• http://rseek.org/
Help in R
class(X)
Error: object 'X' not found
R is case sensitive
class(x)
[1] "numeric"
Operators in R
Binary operators work on vectors and matrices as well as scalars.
> length(aa)
[1] 3
Arithmetic operations of vectors are performed member-by-member
> a = c(1, 3, 5, 7)
> b = c(1, 2, 4, 8)
> 5*a
[1] 5 15 25 35
> a+b
[1] 2 5 9 15
> a-b
[1] 0 1 1 -1
> a*b
[1] 1 6 20 56
> a/b
[1] 1.000 1.500 1.250 0.875
> a <- c(1,2,3,4)
> sqrt(a)
[1] 1.000000 1.414214 1.732051 2.000000
> exp(a)
[1] 2.718282 7.389056 20.085537 54.598150
> log(a)
[1] 0.0000000 0.6931472 1.0986123 1.3862944
> exp(log(a))
[1] 1 2 3 4
Recycling Rule
If two vectors are of unequal length, the shorter one will be recycled
in order to match the longer vector.
> u = c(10, 20, 30)
> v = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
> u+v
[1] 11 22 33 14 25 36 17 28 39
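When the longer vector's length is not a whole multiple of the shorter one's, R still recycles but issues a warning. The following is a minimal sketch of that case; the names p, q and total are chosen only for illustration.

```r
p <- c(10, 20, 30)
q <- c(1, 2, 3, 4)

# p is reused from its start, so the addition behaves like
# c(10, 20, 30, 10) + q. R warns that the longer length is not a
# multiple of the shorter length; the warning is silenced here.
total <- suppressWarnings(p + q)
print(total)
```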
Vector Index
We retrieve values in a vector by declaring an index inside a single square bracket "[]"
operator.
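As a small sketch, assume s is the five-member character vector used in the examples that follow (this definition is an assumption, chosen to match the outputs shown there). Positions in R count from 1.

```r
# Assumed definition of s, consistent with the later slides
s <- c("aa", "bb", "cc", "dd", "ee")

# Retrieve the third member; R indexes from 1, not 0
third <- s[3]
print(third)
```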
Negative Index
If the index is negative, R strips the member whose position has the same absolute
value as the negative index.
> s[-3]
[1] "aa" "bb" "dd" "ee"
Out-of-Range Index
If an index is out-of-range, a missing value will be reported via the symbol NA.
> s[10]
[1] NA
> length(s)
[1] 5
Duplicate Indexes
The index vector allows duplicate values.
> s[c(2, 3, 3)]
[1] "bb" "cc" "cc"
Out-of-Order Indexes
The index vector can even be out-of-order.
> s[c(2, 1, 3)]
[1] "bb" "aa" "cc"
Range Index
To produce a vector slice between two indexes, we can use the colon operator ":".
> s[2:4]
[1] "bb" "cc" "dd"
Matrix
A matrix is a collection of data elements arranged in a two-dimensional rectangular
layout. The following is an example of a matrix with 2 rows and 3 columns.
> A = matrix(
+ c(2, 4, 3, 1, 5, 7), # the data elements
+ nrow=2, # number of rows
+ ncol=3, # number of columns
+ byrow = TRUE) # fill matrix by rows
Transpose
We construct the transpose of a matrix by interchanging its columns and rows
with the function t .
> t(A) # transpose of A
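The construction and transpose can be tried together. This sketch rebuilds the 2 x 3 matrix A from the slide above and checks the dimensions before and after t; the name tA is introduced only for illustration.

```r
# Same matrix as above: 2 rows, 3 columns, filled row by row
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)
print(A)

# Transposing swaps rows and columns
tA <- t(A)
print(tA)

print(dim(A))   # rows, then columns
print(dim(tA))
```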
Combining Matrices
The columns of two matrices having the same number of rows can be combined into a
larger matrix with the cbind function. For example, suppose B is a matrix with 3 rows
and we have another matrix C, also with 3 rows.
> C = matrix( c(7, 4, 2), nrow=3, ncol=1)
> cbind(B, C)
Similarly, we can combine the rows of two matrices if they have the same number of
columns with the rbind function.
> rbind(B, D)
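The slides do not show how B and D are defined, so the values below are assumptions picked only to make the shapes line up: B has 3 rows and 2 columns, and D has 1 row and 2 columns. A runnable sketch:

```r
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)  # assumed values
C <- matrix(c(7, 4, 2), nrow = 3, ncol = 1)           # as on the slide
D <- matrix(c(6, 2), nrow = 1, ncol = 2)              # assumed values

# cbind needs equal row counts: 3 x 2 next to 3 x 1 gives 3 x 3
BC <- cbind(B, C)
print(BC)

# rbind needs equal column counts: 3 x 2 on top of 1 x 2 gives 4 x 2
BD <- rbind(B, D)
print(BD)
```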
List
A list is a generic vector containing other objects.
For example, the following variable x is a list containing copies of three vectors n, s, b, and a
numeric value 3.
> n = c(2, 3, 5)
> s = c("aa", "bb", "cc", "dd", "ee")
> b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
> x = list(n, s, b, 3) # x contains copies of n, s, b
List Slicing
We retrieve a list slice with the single square bracket "[]" operator. The following is a slice
containing the second member of x, which is a copy of s.
> x[2]
With an index vector, we can retrieve a slice with multiple members. Here a slice containing
the second and fourth members of x.
> x[c(2, 4)]
Member Reference
In order to reference a list member directly, we have to use the double square bracket "[[]]"
operator.
> x[[2]]
[1] "aa" "bb" "cc" "dd" "ee"
We can modify its content directly.
> x[[2]][1] = "ta"
> x[[2]]
[1] "ta" "bb" "cc" "dd" "ee"
> s
[1] "aa" "bb" "cc" "dd" "ee" # s is unaffected
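The difference between the two bracket operators is easiest to see by checking the class of what each returns. A short sketch using the same list x; the names slice and member are illustrative.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc", "dd", "ee")
b <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
x <- list(n, s, b, 3)

slice  <- x[2]     # single bracket: still a list, of length 1
member <- x[[2]]   # double bracket: the character vector itself

print(class(slice))
print(class(member))
```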
Data Frame
A data frame is used for storing data tables. It is a list of vectors of equal length. For
example, the following variable df is a data frame containing three vectors n, s, b.
> n = c(2, 3, 5)
> s = c("aa", "bb", "cc")
> b = c(TRUE, FALSE, TRUE)
> df = data.frame(n, s, b) # df is a data frame
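To see what the data frame looks like, the same construction can be printed and inspected. This is a minimal sketch; str is a base R function that summarises the type of each column.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b)

print(df)   # one row per position, one column per vector
str(df)     # column types and the first few values
```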
> mtcars
Here is the cell value from the first row, second column of mtcars.
> mtcars[1, 2]
[1] 6
Moreover, we can use the row and column names instead of the numeric coordinates.
> mtcars["Mazda RX4", "cyl"]
[1] 6
Lastly, the number of data rows in the data frame is given by the nrow function.
> nrow(mtcars) # number of data rows
[1] 32
And the number of columns of a data frame is given by the ncol function.
> ncol(mtcars) # number of columns
[1] 11
> head(mtcars)
> names(mtcars)
> dimnames(mtcars)
Retrieve column vectors
> mtcars[[9]]
[1] 1 1 1 0 0 0 0 0 0 0 0 ...
> mtcars[["am"]]
[1] 1 1 1 0 0 0 0 0 0 0 0 ...
> mtcars$am
[1] 1 1 1 0 0 0 0 0 0 0 0 ...
> mtcars[,"am"]
[1] 1 1 1 0 0 0 0 0 0 0 0 ...
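The four forms above should return the same vector. The sketch below checks that with identical, and also shows that a single bracket without a comma keeps the data frame structure instead. The variable names are illustrative.

```r
by_position <- mtcars[[9]]      # am is the ninth column of mtcars
by_name     <- mtcars[["am"]]
by_dollar   <- mtcars$am
by_comma    <- mtcars[, "am"]

same <- identical(by_position, by_name) &&
        identical(by_name, by_dollar) &&
        identical(by_dollar, by_comma)
print(same)

# A single bracket with no comma returns a one-column data frame
slice <- mtcars["am"]
print(class(slice))
```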
Read and write data files
Working Directory
The data files need to be located in the R working directory, which can
be found with the function getwd.
> getwd() # get current working directory
Note that the forward slash should be used as the path separator even
on the Windows platform.
> setwd("C:/Users/Geology/Desktop/R_workshop")
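Because setwd changes the directory for the rest of the session, it can help to remember the old one and switch back. A small sketch, using the session's temporary directory as a stand-in for a real project folder (an assumption made so the example runs anywhere):

```r
old_dir <- getwd()     # remember where we started

setwd(tempdir())       # stand-in for a real project folder
print(getwd())

setwd(old_dir)         # switch back when finished
```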
Data Import
It is often necessary to import data into R.
Excel File
We can use the function read.xlsx from the xlsx package. It reads from an Excel spreadsheet
and returns a data frame.
Alternatively, we can use the function loadWorkbook from the XLConnect package to read
the entire workbook, and then load the worksheets with readWorksheet.
The XLConnect package requires Java to be pre-installed.
> library(XLConnect) # load XLConnect package
A text file can be loaded into the workspace with the function read.table. Without a header row, the columns are named V1, V2, V3, and so on.
V1 V2 V3
1 100 a1 b1
2 200 a2 b2
3 300 a3 b3
4 400 a4 b4
For further detail of the function read.table, please consult the R documentation.
> help(read.table)
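The table above can be reproduced end to end. This sketch writes the four rows to a temporary file (the path is generated, not a file from the workshop) and reads them back with read.table.

```r
# Write the sample rows to a temporary whitespace-separated file
path <- tempfile(fileext = ".txt")
writeLines(c("100 a1 b1",
             "200 a2 b2",
             "300 a3 b3",
             "400 a4 b4"), path)

# No header row in the file, so columns get default names
mydata <- read.table(path)
print(mydata)
```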
CSV File
After we copy and paste the data above into a file named "my_data.csv", we can read the data
with the function read.csv.
> help(read.csv)
> tree <- read.csv(file="trees91.csv", header=TRUE, sep=",")
> attributes(tree)
> names(tree)
> tree$C
> summary(tree$C)
> tree$C <- factor(tree$C)
> tree$C
> summary(tree$C)
> levels(tree$C)
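The effect of factor on summary can be seen without the trees91.csv file. The sketch below uses a tiny made-up CSV; the column names and values are hypothetical and are not taken from the workshop data set.

```r
# Hypothetical stand-in for trees91.csv
path <- tempfile(fileext = ".csv")
writeLines(c("C,height",
             "1,0.43",
             "1,0.40",
             "2,0.45",
             "3,0.38"), path)

tree <- read.csv(file = path, header = TRUE, sep = ",")
print(summary(tree$C))     # numeric column: min, quartiles, mean, max

tree$C <- factor(tree$C)   # treat the codes as categories
print(summary(tree$C))     # factor column: a count per level
print(levels(tree$C))
```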
Minitab File
If the data file is in Minitab Portable Worksheet format, it can be opened with the function
read.mtp from the foreign package.
It returns a list of components in the Minitab worksheet.
> library(foreign) # load the foreign package
SPSS File
Data files in SPSS format can be opened with the function read.spss, also from the
foreign package. There is a "to.data.frame" option for choosing whether a data frame is to
be returned. By default, it returns a list of components instead.
> library(foreign) # load the foreign package
write.csv(mydata, "mydata.csv") # comma-separated text
write.xlsx(mydata, "mydata.xlsx") # Excel; needs the xlsx package
write.dta(mydata, "mydata.dta") # Stata; needs the foreign package
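A quick way to check an export is to read the file back. In this sketch mydata is a small made-up data frame and the file goes to the session's temporary directory; both are assumptions for illustration.

```r
# Hypothetical data frame standing in for mydata
mydata <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

path <- file.path(tempdir(), "mydata.csv")

# row.names = FALSE avoids writing an extra column of row numbers
write.csv(mydata, path, row.names = FALSE)

check <- read.csv(path)
print(check)
```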
Missing Values
is.na(x) # returns TRUE if x is missing
y <- c(1,2,3,NA)
is.na(y) # returns a vector (F F F T)
View(mydata)
> mydata$m_illit[mydata$m_illit==3.0]<-NA
> head(mydata)
x <- c(1,2,NA,3)
mean(x) # returns NA
mean(x, na.rm=TRUE) # returns 2
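The pieces above combine naturally: count the missing values, drop them, or recode a placeholder value as NA. In this sketch the placeholder -99 and the vector z are hypothetical.

```r
y <- c(1, 2, 3, NA)

n_missing  <- sum(is.na(y))   # TRUE counts as 1
y_complete <- y[!is.na(y)]    # keep only the observed values

# Recode a hypothetical placeholder value as missing
z <- c(4, -99, 7)
z[z == -99] <- NA

print(n_missing)
print(y_complete)
print(mean(z, na.rm = TRUE))
```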
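jpeg("myplot.jpg") # open a JPEG graphics device
plot(mtcars$wt, mtcars$mpg) # weight against miles per gallon
dev.off() # close the device so the file is written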
Error Bars
x <- rnorm(10,sd=5,mean=20)
y <- 2.5*x - 1.0 + rnorm(10,sd=9,mean=0)
plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
xHigh <- x
yHigh <- y + abs(rnorm(10,sd=3.5))
xLow <- x
yLow <- y - abs(rnorm(10,sd=3.1))
arrows(xHigh,yHigh,xLow,yLow,col=2,angle=90,length=0.1,code=3)
# Simple Histogram
hist(mtcars$mpg)
# Colored Histogram with Different Number of Bins
hist(mtcars$mpg, breaks=12, col="red")
# plot densities
library(sm) # sm.density.compare comes from the sm package
sm.density.compare(mtcars$mpg, mtcars$cyl, xlab="Miles Per Gallon")
title(main="MPG Distribution by Car Cylinders")
# Simple Dotplot
dotchart(mtcars$mpg,labels=row.names(mtcars),cex=.7,
main="Gas Mileage for Car Models",
xlab="Miles Per Gallon")
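# Dotplot: Grouped Sorted and Colored
# Sort by mpg, group and color by cylinder
x <- mtcars[order(mtcars$mpg),] # sort by mpg
x$cyl <- factor(x$cyl) # groups must be a factor
x$color <- c("red","blue","darkgreen")[as.integer(x$cyl)] # illustrative colors for 4, 6, 8 cylinders
dotchart(x$mpg,labels=row.names(x),cex=.7,groups= x$cyl,
main="Gas Mileage for Car Models\ngrouped by cylinder",
xlab="Miles Per Gallon", gcolor="black", color=x$color)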
# Simple Bar Plot
counts <- table(mtcars$gear)
barplot(counts, main="Car Distribution",
xlab="Number of Gears")
# Fitting Labels
par(las=2) # make label text perpendicular to axis
par(mar=c(5,8,4,2)) # increase y-axis margin.
# add a legend
legend(xrange[1], yrange[2], 1:ntrees, cex=0.8, col=colors,
pch=plotchar, lty=linetype, title="Tree")
# Simple Pie Chart
slices <- c(10, 12,4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie(slices, labels = lbls, main="Pie Chart of Countries")
for (i in 1:4){
lines(x, dt(x,degf[i]), lwd=2, col=colors[i])
}
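The loop above relies on x, degf and colors being defined beforehand. The sketch below supplies assumed values for them (the grid, the degrees of freedom and the colors are all illustrative) and draws Student's t densities over a normal curve.

```r
# Assumed setup: none of these values come from the slides
x      <- seq(-4, 4, length.out = 100)
degf   <- c(1, 3, 8, 30)
colors <- c("red", "blue", "darkgreen", "gold")

# Start with the standard normal density as a dashed reference line
plot(x, dnorm(x), type = "l", lty = 2,
     xlab = "x value", ylab = "Density")

# Add one t density curve per degrees-of-freedom value
for (i in 1:4) {
  lines(x, dt(x, degf[i]), lwd = 2, col = colors[i])
}
```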
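x_func <- function(a, b) {
  x <- a + b
  y <- (2*a) - b   # the value of the last expression is returned, invisibly
}
x   # x and y inside x_func are local; this refers to whatever x exists outside the function
ab <- x_func(5, 5)   # ab is 5: only the value of y comes back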
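x_func <- function(a, b) {
  x <- a + b
  y <- a - b
  result <- c(x, y)
  return(result)   # return both values as one vector
}
x   # still not the function's local x
ab <- x_func(5, 5)   # ab is c(10, 0)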
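When a function needs to hand back several values, a named list is a common alternative to a plain vector because each result can be picked out by name. This is a sketch only; the function name sum_and_diff and the element names are hypothetical.

```r
# Hypothetical helper: returns both results in a named list
sum_and_diff <- function(a, b) {
  total      <- a + b
  difference <- a - b
  list(sum = total, diff = difference)   # last expression is the return value
}

ab <- sum_and_diff(5, 5)
print(ab$sum)
print(ab$diff)
```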