David Chiu
2/20/15
Background of R
What is R?
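GNU project created by Ross Ihaka and Robert Gentleman; it implements the S language developed by John Chambers and colleagues at Bell Labs
Free software environment for statistical computing and graphics
Functional programming language; the interpreter is written primarily in C, Fortran, and R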
R Language
R is a functional programming language
R is an interpreted language
R is an object-oriented language
Why Use R
Statistical analysis on the fly
Built-in mathematical functions and graphics modules
FREE! & Open Source!
http://cran.r-project.org/src/base/
Kaggle
R is the language most widely used by
Kaggle participants
http://www.kaggle.com/
Revolution R
The Community version is free
http://www.revolutionanalytics.com/downloads/
Matrix Calculation
Columns: Base R 2.14.2 64, Revolution R (1-core), Revolution R (4-core), Speedup (4 core)
17.4 sec, 2.9 sec, 2.0 sec, 7.9x
2.0 sec, 1.2 sec, 7.8x
2.7 sec, 2.7 sec, Not Appreciable
http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
IDE
RStudio
RGUI
http://www.rstudio.com/
http://www.r-project.org/
http://www.rstudio.com/shiny/
Package Management
CRAN (Comprehensive R Archive Network)
Repository: URL
CRAN: http://cran.r-project.org/web/packages/
Bioconductor: http://www.bioconductor.org/packages/release/Software.html
R-Forge: http://r-forge.r-project.org/
R Basics
Basic Commands
help()
help(demo)
demo()
demo(is.things)
q()
ls()
rm()
rm(x)
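A short session can make the workspace commands above concrete. This is a minimal sketch; the object names `x` and `y` are made up for illustration.

```r
# Illustrative session; the objects x and y are hypothetical.
x <- 1:10
y <- "hello"

ls()      # lists the objects defined in the current workspace
rm(x)     # removes the object x from the workspace

# Check which of the two objects are still defined.
x_defined <- "x" %in% ls()
y_defined <- "y" %in% ls()

# help() and demo() open documentation and demos; they are most useful
# in an interactive session, so they are guarded here.
if (interactive()) {
  help(rm)
}
```

Calling `q()` ends the session, so it is left out of the sketch.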
Basic Objects
Vector
List
Factor
Array
Matrix
Data Frame
Vectors
Subscripting
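x = c(1,2,3,4,5,6,7,8,9,10)
x[1:3]; x[c(1,3,5)]             # first three elements; elements 1, 3 and 5
x[c(1,3,5)] * 2 + x[c(2,2,2)]   # element-wise arithmetic on two subsets
x[-(1:6)]                       # negative indices drop elements 1 to 6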
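Beyond positive and negative positions, vectors can also be subscripted with logical conditions and with names. The sketch below assumes the same vector `x`; the named vector `scores` is a hypothetical example.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Logical subscripting: keep the elements for which the condition is TRUE.
evens <- x[x %% 2 == 0]
big   <- x[x > 7]

# Subscripting by name (hypothetical named vector).
scores <- c(math = 90, art = 75)
art_score <- scores[["art"]]

# Subscripts also work on the left-hand side of an assignment.
y <- x
y[y > 7] <- 0
```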
Lists
Contain a heterogeneous selection of objects
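A minimal sketch of a list holding values of different types. The record `person` and its field names are hypothetical.

```r
# Hypothetical record mixing a string, a numeric vector and a logical value.
person <- list(name = "david", scores = c(80, 90), passed = TRUE)

person$name           # access an element by name
person[["scores"]]    # [[ ]] returns the element itself (a numeric vector)
person["scores"]      # [ ] returns a list of length one

person$age <- 30      # elements can be added after creation
str(person)           # compact summary of the structure
```

The difference between `[[ ]]` and `[ ]` is the usual stumbling block: the first extracts the element, the second returns a smaller list.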
Factor
Ordered collection of items used to represent categorical values
The different values that a factor can take are called levels
Factors
phone = factor(c('iphone', 'htc', 'iphone', 'samsung', 'iphone',
'samsung'))
levels(phone)
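The relationship between a factor's values, its levels and its underlying integer codes can be sketched as follows, reusing the `phone` factor from the slide.

```r
phone <- factor(c("iphone", "htc", "iphone", "samsung", "iphone", "samsung"))

lv     <- levels(phone)       # the distinct values, sorted
counts <- table(phone)        # number of observations per level
codes  <- as.integer(phone)   # position of each value in levels(phone)
```

Each element is stored as an integer code pointing into the levels, which is why `as.integer()` on a factor returns codes rather than the original labels.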
Matrices
Data Frame
Useful way to represent tabular data
Essentially a matrix with named columns that may also
include non-numerical variables
Example
df = data.frame(a=c(1,2,3,4,5),b=c(2,3,4,5,6));df
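A minimal sketch of working with a data frame that mixes numeric and non-numeric columns. The `label` column is a hypothetical addition for illustration.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 4, 5, 6))

# Add a non-numeric column (hypothetical labels).
df$label <- c("u", "v", "w", "x", "y")

col_a  <- df$a                           # one column as a vector
subset <- df[df$b > 3, ]                 # rows where b is greater than 3
means  <- colMeans(df[, c("a", "b")])    # column means of the numeric part

str(df)                                  # column names and types
```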
Function
`%myop%` <- function(a, b) {2*a + 2*b}; 1 %myop% 1
f <- function(x) {return(x^2 + 3)}
create.vector.of.ones <- function(n) {
  return.vector <- numeric(n)   # preallocate a vector of length n
  for (i in seq_len(n)) {
    return.vector[i] <- 1
  }
  return.vector                 # the last expression is the return value
}
create.vector.of.ones(3)
Control Structures
If else
Repeat, for, while
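The control structures listed above can be sketched as follows. The function name `classify` and the variables are made up for illustration.

```r
# if / else: both branches are expressions, so the result can be returned directly.
classify <- function(n) {
  if (n %% 2 == 0) "even" else "odd"
}

# for: iterate over the elements of a vector.
total <- 0
for (i in 1:5) {
  total <- total + i
}

# while: loop as long as the condition holds.
k <- 1
while (k < 100) {
  k <- k * 2
}

# repeat: loop until an explicit break.
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break
}
```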
Anonymous Function
A characteristic of functional languages: functions can be passed as arguments without being named
apply.to.three <- function(f) {f(3)}
apply.to.three(function(x) {x * 7})
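Anonymous functions are most often passed to the higher-order functions that ship with base R. A minimal sketch using `sapply`, `Filter` and `Reduce`; the input ranges are arbitrary.

```r
# Apply an unnamed function to every element.
squares <- sapply(1:5, function(x) x^2)

# Keep only the elements for which the unnamed predicate returns TRUE.
evens <- Filter(function(x) x %% 2 == 0, 1:10)

# Combine the elements pairwise with an unnamed binary function.
total <- Reduce(function(a, b) a + b, 1:5)
```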
S3 & S4 Object
Many R functions are implemented using S3 methods
S version 4 (hence S4) introduced formal classes and methods
that allow
Method dispatch on multiple arguments
Abstract types
Inheritance
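For contrast with the formal S4 system, S3 is informal: a class is just an attribute, and a method is a function named `generic.class`. The sketch below is an assumption-laden illustration; the class `student` and the generic `describe` are hypothetical names.

```r
# An S3 object is an ordinary list with a class attribute (hypothetical class).
s <- structure(list(name = "david", score = 80), class = "student")

# A method for the existing print generic, found by naming convention.
print.student <- function(x, ...) {
  cat("Student:", x$name, "score:", x$score, "\n")
  invisible(x)
}

# A user-defined generic (hypothetical) with a class method and a fallback.
describe <- function(x, ...) UseMethod("describe")
describe.student <- function(x, ...) paste(x$name, "scored", x$score)
describe.default <- function(x, ...) "unknown object"

print(s)
describe(s)
describe(42)
```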
OOP of S4
S4 OOP Example
setClass("Student", representation(name = "character", score = "numeric"))
studenta = new("Student", name = "david", score = 80)
studentb = new("Student", name = "andy", score = 90)
# Custom show method: prints the score plus 100 when the object is displayed
setMethod("show", signature("Student"),
  function(object) {
    cat(object@score + 100)
  })
setGeneric("getscore", function(object) standardGeneric("getscore"))
studenta
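The example above declares the generic `getscore` but does not attach a method to it. One way to complete it, together with a small illustration of S4 inheritance, is sketched below. The method body and the subclass `GradStudent` with its `thesis` slot are assumptions, not part of the original slides.

```r
library(methods)

setClass("Student", representation(name = "character", score = "numeric"))
setGeneric("getscore", function(object) standardGeneric("getscore"))

# Assumed method body: return the score slot unchanged.
setMethod("getscore", signature("Student"), function(object) object@score)

# Hypothetical subclass; it inherits the slots and methods of Student.
setClass("GradStudent", contains = "Student",
         representation(thesis = "character"))

studenta <- new("Student", name = "david", score = 80)
grad     <- new("GradStudent", name = "andy", score = 90, thesis = "R")

score_a    <- getscore(studenta)
score_grad <- getscore(grad)   # dispatches to the Student method via inheritance
```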
Packages
A package is a related set of functions, help files, and
data files that have been bundled together.
Basic Commands
library(rpart)
Install from CRAN with install.packages()
(.packages())
Apply
Returns a vector or array or list of values obtained by applying a
function to margins of an array or matrix.
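A minimal sketch of `apply` over the margins of a matrix: margin 1 means rows, margin 2 means columns. The matrix `m` is an arbitrary example.

```r
# 2 x 3 matrix filled column by column: row 1 is 1, 3, 5 and row 2 is 2, 4, 6.
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)   # one value per row
col_max  <- apply(m, 2, max)   # one value per column
```

For simple sums and means, the built-in `rowSums()` and `colMeans()` do the same job and are usually faster.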
lapply
returns a list of the same length as X, each element of which is
the result of applying FUN to the corresponding element of X.
sapply
is a user-friendly version and wrapper of lapply, by default
returning a vector, matrix or, if simplify = "array", an array where appropriate.
vapply
is similar to sapply, but has a pre-specified type of return value,
so it can be safer (and sometimes faster) to use.
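The three functions differ mainly in the shape and type guarantees of the result. A minimal sketch on a hypothetical list of strings:

```r
words <- list("r", "stats", "graphics")   # hypothetical input

lens_list <- lapply(words, nchar)               # always a list, same length as the input
lens_vec  <- sapply(words, nchar)               # simplified to a vector when possible
lens_safe <- vapply(words, nchar, integer(1))   # result type is declared up front

# vapply raises an error when a result does not match the declared type.
mismatch <- tryCatch(
  vapply(words, nchar, character(1)),
  error = function(e) "type mismatch"
)
```

Because `vapply` checks every result against the template, it tends to be the safer choice inside functions, where a silently changed return shape from `sapply` can cause bugs further downstream.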
File IO
Save and Load
x = USPersonalExpenditure
save(x, file="~/test.RData")
rm(x)
load("~/test.RData")
x
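`save` and `load` restore objects under their original names. Two alternatives are sketched below under stated assumptions: `saveRDS`/`readRDS` for a single object that can be assigned to any name, and `write.csv`/`read.csv` for plain-text exchange. The temporary file paths are illustrative.

```r
x <- USPersonalExpenditure

# Single-object serialization; the result can be read into a new name.
rds_path <- tempfile(fileext = ".rds")
saveRDS(x, rds_path)
y <- readRDS(rds_path)

# Plain-text round trip; row names go into the first column of the file.
csv_path <- tempfile(fileext = ".csv")
write.csv(as.data.frame(x), csv_path)
z <- read.csv(csv_path, row.names = 1, check.names = FALSE)
```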
Plotting Example
years = as.numeric(colnames(USPersonalExpenditure))
xrange = range(years)
yrange = range(USPersonalExpenditure)
# Empty plot frame, then one line per expenditure category (row)
plot(xrange, yrange, type="n", xlab="Year", ylab="Expenditure (billions of dollars)")
for (i in 1:5) {
  lines(years, USPersonalExpenditure[i,], type="b", lwd=1.5)
}
IRIS Dataset
data()
The Iris flower data set, or Fisher's Iris data set, is a
multivariate data set introduced by Sir Ronald Fisher
(1936) as an example of discriminant analysis. It is
sometimes called Anderson's Iris data set.
http://en.wikipedia.org/wiki/Iris_flower_data_set
Iris setosa
Iris versicolor
Iris virginica
Classification of IRIS
Classification Example
install.packages("e1071")
library(e1071)  # provides naiveBayes() and svm()
pairs(iris[1:4], main="Iris Data (red=setosa,green=versicolor,blue=virginica)", pch=21,
  bg=c("red","green3","blue")[unclass(iris$Species)])
# Naive Bayes: confusion table of predicted vs. actual species
classifier <- naiveBayes(iris[,1:4], iris[,5])
table(predict(classifier, iris[,-5]), iris[,5])
# Support vector machine trained on the same four features
classifier <- svm(iris[,1:4], iris[,5])
table(predict(classifier, iris[,1:4]), iris[,5])
prediction = predict(classifier, iris[,1:4])
http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/Na%C3%AFve_Bayes
Performance Tips
Use Built-in Math Functions
Use Environments for Lookup Tables
Use a Database to Query Large Data Sets
Preallocate Memory
Monitor How Much Memory You Are Using
Cleaning Up Objects
Functions for Big Data Sets
Parallel Computation with R
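Several of the tips above can be sketched in a few lines: preallocating a result vector, preferring built-in vectorized operations, using an environment as a lookup table, and checking and releasing memory. The sizes and key names are arbitrary choices for illustration.

```r
n <- 10000

# Preallocate memory: size the vector once instead of growing it in the loop.
squares <- numeric(n)
for (i in seq_len(n)) {
  squares[i] <- i^2
}

# Use built-in math functions: the vectorized form avoids the loop entirely.
squares_vec <- (1:n)^2

# Use environments for lookup tables: hashed access by key (hypothetical keys).
lookup <- new.env(hash = TRUE)
assign("htc", 1, envir = lookup)
lookup[["iphone"]] <- 2
htc_code    <- get("htc", envir = lookup)
has_samsung <- exists("samsung", envir = lookup, inherits = FALSE)

# Monitor memory use and clean up objects that are no longer needed.
size_bytes <- object.size(squares)
tmp <- numeric(n)
rm(tmp)
invisible(gc())
```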
??base::delim
# Search for 'delim' in all help files for functions in 'base'
help.search("delimited")
# Search for 'delimited' in all help files
RSiteSearch("parsing text")
# Search for the term 'parsing text' on the R site.
Study Material
R in a Nutshell
Online Reference
Resources
Websites
Stack Overflow
Cross Validated
R-help
R-devel
R-sig-*
Package-specific mailing list
Blog
R-bloggers
Twitter
https://twitter.com/#rstats
Quora
http://www.quora.com/R-software
Resources (Cont'd)
Conference
useR!
R in Finance
R in Insurance
Others
Joint Statistical Meetings
Royal Statistical Society Conference
Thank You!