
R Language Tutorial

David Chiu

2/20/15

Confidential | Copyright 2013 Trend Micro Inc.

Background of R


What is R?
GNU project; an implementation of the S language, which John Chambers and colleagues developed at Bell Labs
Free software environment for statistical computing and graphics
Functional programming language written primarily in C, Fortran, and R itself


R Language
R is a functional programming language
R is an interpreted language
R is an object-oriented language

Why Use R
Statistical analysis on the fly
Built-in mathematical functions and graphics modules
Free and open source!
http://cran.r-project.org/src/base/

Kaggle
R is the language most widely used by Kaggle participants

http://www.kaggle.com/

Data Scientists at These Companies Use R

What is your programming language of choice, R, Python or something else?
I use R, and occasionally matlab, for data analysis. There is a large, active and extremely knowledgeable R community at Google.
http://simplystatistics.org/2013/02/15/interview-with-nick-chamandy-statistician-at-google/

Expert knowledge of SAS (With Enterprise Guide/Miner) required and candidates with strong knowledge of R will be preferred
http://www.kdnuggets.com/jobs/13/03-29-apple-sr-data-scientist.html?utm_source=twitterfeed&utm_medium=facebook&utm_campaign=tfb&utm_content=FaceBook&utm_term=analytics#.UVXibgXOpfc.facebook


Commercial support for R

In 2007, Revolution Analytics began providing commercial support for Revolution R
http://www.revolutionanalytics.com/products/revolution-r.php
http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php

Oracle Big Data Appliance integrates R, Apache Hadoop, Oracle Enterprise Linux, and a NoSQL database with the Exadata hardware
http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html

Revolution R
The Community version is free
http://www.revolutionanalytics.com/downloads/

                     Base R 2.14.2 64   Revolution R (1-core)   Revolution R (4-core)   Speedup (4 core)
Matrix Calculation   17.4 sec           2.9 sec                 2.0 sec                 7.9x
Matrix Functions     10.3 sec           2.0 sec                 1.2 sec                 7.8x
Program Control      2.7 sec            2.7 sec                 2.7 sec                 Not Appreciable

http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php

IDE

RStudio
http://www.rstudio.com/

RGUI
http://www.r-project.org/

Web App Development


Shiny makes it super simple for R users like you to turn
analyses into interactive web applications that anyone can
use

http://www.rstudio.com/shiny/

Package Management
CRAN (Comprehensive R Archive Network)

Repository     URL
CRAN           http://cran.r-project.org/web/packages/
Bioconductor   http://www.bioconductor.org/packages/release/Software.html
R-Forge        http://r-forge.r-project.org/

R Basic


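Basic Commands
help()            # show help on using the help system
help(demo)        # show help for a specific function, here demo
demo()            # list the available demos
demo(is.things)   # run the is.things demo
q()               # quit R
ls()              # list the objects in the workspace
rm(x)             # remove the object x from the workspace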


Basic Objects
Vector
List
Factor
Array
Matrix
Data Frame


Objects & Arithmetic


Scalar
x = 3; y <- 5; x + y

Vectors
x = c(1,2,3,7); y = c(2,3,5,1); x + y; x * y; x - y; x / y
x = seq(1,10); y = 2:11; x + y
x = seq(1,10,by=2); y = seq(1,10,length.out=2)
rep(c(5,8), 3)
x = c(1,2,3); length(x)
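
As a minimal sketch of the vector operations above: arithmetic on vectors is element-wise, and a shorter operand (here a single number) is recycled across the longer one. The variable names are illustrative only.

```r
# Element-wise arithmetic on two vectors of equal length
x <- c(1, 2, 3, 7)
y <- c(2, 3, 5, 1)
sums <- x + y         # 3 5 8 8
products <- x * y     # 2 6 15 7

# Recycling: the single value 10 is reused for every element of x
scaled <- x * 10      # 10 20 30 70

# seq() and rep() build regular vectors
odds <- seq(1, 10, by = 2)    # 1 3 5 7 9
pattern <- rep(c(5, 8), 3)    # 5 8 5 8 5 8
```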


Summaries and Subscripting


Summary
x = c(1,2,3,4,5,6,7,8,9,10)
mean(x); min(x); median(x); max(x); var(x)
summary(x)

Subscripting
x = c(1,2,3,4,5,6,7,8,9,10)
x[1:3]; x[c(1,3,5)]
x[c(1,3,5)] * 2 + x[c(2,2,2)]
x[-(1:6)]
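
Besides positive and negative positions, a subscript can also be a logical condition. The sketch below reuses the same vector; the result names are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

first_three <- x[1:3]         # positions 1 to 3
without_head <- x[-(1:6)]     # drop positions 1 to 6, leaving 7 8 9 10

# Logical subscripts keep the elements where the condition is TRUE
evens <- x[x %% 2 == 0]       # 2 4 6 8 10
above_mean <- x[x > mean(x)]  # mean is 5.5, so 6 7 8 9 10
```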


Lists
A list can contain a heterogeneous selection of objects

e <- list(thing="hat", size="8.25"); e


l <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
l$j
man = list(name="Qoo", height=183); man$name
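
A short sketch of the ways to reach into a list, assuming the same man list as above. The weight element is a hypothetical value added for illustration.

```r
man <- list(name = "Qoo", height = 183)

by_name <- man$name          # "Qoo": access an element by name
element <- man[["height"]]   # 183: double brackets return the element itself
sublist <- man["height"]     # single brackets return a list of length 1

man$weight <- 75             # add a new element (hypothetical value)
element_names <- names(man)  # "name" "height" "weight"
str(man)                     # compact display of the list structure
```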

Factor
An ordered collection of items used to represent categorical values
The different values that a factor can take are called levels
phone = factor(c('iphone', 'htc', 'iphone', 'samsung', 'iphone', 'samsung'))
levels(phone)
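
To make the idea of levels concrete, the sketch below counts how often each level occurs and shows the integer codes R stores internally. It assumes the same phone factor as above.

```r
phone <- factor(c("iphone", "htc", "iphone", "samsung", "iphone", "samsung"))

phone_levels <- levels(phone)   # "htc" "iphone" "samsung" (sorted by default)
n_levels <- nlevels(phone)      # 3
counts <- table(phone)          # frequency of each level
codes <- as.integer(phone)      # each value stored as an index into the levels
```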


Matrices & Array


Array
An extension of a vector to two or more dimensions
a <- array(c(1,2,3,4,5,6,7,8,9,10,11,12), dim=c(3,4))

Matrices
An extension of a vector to two dimensions (a 2-d array)
x = c(1,2,3); y = c(4,5,6); rbind(x,y); cbind(x,y)
x = rbind(c(1,2,3), c(4,5,6)); dim(x)
x <- matrix(c(1,2,3,4,5,6), nrow=3)
x <- matrix(c(1,2,3,4,5,6), nrow=3, byrow=TRUE)
x <- matrix(c(1,2,3,4), nrow=2); y <- matrix(c(5,6), nrow=2); x %*% y
t(matrix(c(1,2,3,4), nrow=2))
solve(matrix(c(1,2,3,4), nrow=2))
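
A quick way to see what solve() does is to multiply a matrix by its inverse and compare the result with the identity matrix. The sketch below also uses the two-argument form of solve() to solve a linear system; the names are illustrative.

```r
x <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column: rows are (1, 3) and (2, 4)

x_inv <- solve(x)                      # matrix inverse
identity_check <- x %*% x_inv          # the 2 x 2 identity, up to rounding error
is_identity <- isTRUE(all.equal(identity_check, diag(2)))

# Solve the linear system x %*% b = y without forming the inverse
y <- c(5, 6)
b <- solve(x, y)
```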

Data Frame
Useful way to represent tabular data
Essentially a matrix with named columns; may also include non-numerical variables
Example
df = data.frame(a=c(1,2,3,4,5),b=c(2,3,4,5,6));df
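
The sketch below shows a data frame that mixes character, numeric, and logical columns, which a plain matrix cannot do. The column names and values are hypothetical.

```r
# Hypothetical records: one row per student
students <- data.frame(
  name  = c("david", "andy", "amy"),
  score = c(80, 90, 85),
  stringsAsFactors = FALSE
)

students$passed <- students$score >= 85            # add a logical column
high <- students[students$score > 80, ]            # filter rows by a condition
average <- mean(students$score)                    # summarize one column
str(students)                                      # show column types
```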

Function
`%myop%` <- function(a, b) {2*a + 2*b}; 1 %myop% 1
f <- function(x) {return(x^2 + 3)}
create.vector.of.ones <- function(n) {
  return.vector <- numeric(n)   # preallocate instead of growing from NA
  for (i in seq_len(n)) {
    return.vector[i] <- 1
  }
  return.vector
}
create.vector.of.ones(3)

Control Structures
if / else
repeat, for, while
Catch errors with tryCatch
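
The control structures listed above can be sketched as follows. The function and variable names (classify, safe_log, and so on) are hypothetical and chosen only for illustration.

```r
# if / else: choose a branch based on a condition
classify <- function(n) {
  if (n < 0) {
    "negative"
  } else if (n == 0) {
    "zero"
  } else {
    "positive"
  }
}

# for: iterate over a sequence
total <- 0
for (i in 1:5) {
  total <- total + i
}

# while: loop for as long as the condition holds
value <- 100
halvings <- 0
while (value >= 1) {
  value <- value / 2
  halvings <- halvings + 1
}

# repeat: loop until an explicit break
k <- 0
repeat {
  k <- k + 1
  if (k >= 3) break
}

# tryCatch: turn warnings and errors into a fallback value
safe_log <- function(v) {
  tryCatch(
    log(v),
    warning = function(w) NA_real_,   # e.g. log(-1) signals a warning
    error   = function(e) NA_real_    # e.g. log("a") signals an error
  )
}
```

Calling safe_log(-1) or safe_log("a") returns NA instead of stopping the script, while safe_log(1) returns 0.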

Anonymous Function
A characteristic of functional languages
apply.to.three <- function(f) {f(3)}
apply.to.three(function(x) {x * 7})
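
Anonymous functions are most often passed straight to higher-order functions. A minimal sketch using sapply, Filter, and Reduce from base R (result names are illustrative):

```r
# Apply an unnamed function to every element
squares <- sapply(1:5, function(x) x^2)             # 1 4 9 16 25

# Keep only the elements for which the function returns TRUE
evens <- Filter(function(x) x %% 2 == 0, 1:10)      # 2 4 6 8 10

# Combine elements pairwise with an unnamed two-argument function
total <- Reduce(function(acc, x) acc + x, 1:5)      # 15
```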

Objects and Classes


All R code manipulates objects.
Every object in R has a type.
In assignment statements, R copies the object, not just the reference to the object.
Objects can have attributes.
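
The copy-on-assignment behaviour and the notion of attributes can be checked with a few lines. This is a sketch of the visible semantics only; the "units" attribute is a hypothetical example.

```r
a <- c(1, 2, 3)
b <- a            # b behaves as an independent copy of a
b[1] <- 100       # changing b leaves a untouched

a_type <- typeof(a)    # "double": every object has a type
a_class <- class(a)    # "numeric"

# Attributes are named metadata attached to an object
attr(a, "units") <- "cm"   # hypothetical attribute
a_attrs <- attributes(a)   # a list holding the units attribute
```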

S3 & S4 Object
Many R functions were implemented using S3 methods.
In S version 4 (hence S4), formal classes and methods were introduced, allowing method dispatch on multiple arguments, abstract types, and inheritance.
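
For contrast with the formal S4 system, here is a minimal S3 sketch: the class is just an attribute on an ordinary list, and a method is a function named generic.class. The student class and the describe generic are hypothetical names used for illustration.

```r
# An S3 object: a list tagged with a class attribute
student <- structure(list(name = "david", score = 80), class = "student")

# A method for the existing print generic
print.student <- function(x, ...) {
  cat("Student", x$name, "scored", x$score, "\n")
  invisible(x)
}

# A user-defined generic with a default method and a student method
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "an object"
describe.student <- function(obj, ...) paste("student", obj$name)

print(student)
student_description <- describe(student)   # dispatches to describe.student
other_description <- describe(42)          # falls back to describe.default
```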

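OOP of S4
S4 OOP Example
setClass("Student", representation(name = "character", score = "numeric"))
studenta = new("Student", name="david", score=80)
studentb = new("Student", name="andy", score=90)
# Custom display method: prints the score plus 100
setMethod("show", signature("Student"),
  function(object) {
    cat(object@score + 100, "\n")
  })
# Declare a generic, then supply its method for Student
setGeneric("getscore", function(object) standardGeneric("getscore"))
setMethod("getscore", signature("Student"), function(object) object@score)
studenta             # calls show, which prints 180
getscore(studenta)   # 80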
Packages
A package is a related set of functions, help files, and
data files that have been bundled together.
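Basic Commands
library(rpart)               # load an installed package
install.packages("e1071")    # install a package from CRAN
(.packages())                # list the packages currently attached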

Packages Used in Machine Learning for Hackers

Apply
Returns a vector, array, or list of values obtained by applying a function to the margins of an array or matrix.

data <- cbind(c(1,2), c(3,4))
data.rowsum <- apply(data, 1, sum)   # margin 1: sum across each row
data.colsum <- apply(data, 2, sum)   # margin 2: sum down each column
data


Apply
lapply
returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.

sapply
is a user-friendly version and wrapper of lapply, by default returning a vector, matrix or, if simplify = "array", an array if appropriate.

vapply
is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.
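
The difference between the three functions shows up in the shape of the result. A minimal sketch, with a hypothetical list of words as input:

```r
words <- list(a = "R", b = "is", c = "free")

lens_list <- lapply(words, nchar)   # always a list: one element per input
lens_vec  <- sapply(words, nchar)   # simplified to a named integer vector

# vapply checks every result against the declared template
lens_safe <- vapply(words, nchar, FUN.VALUE = integer(1))

# A wrong template is reported as an error instead of silently changing shape
mismatch <- tryCatch(
  vapply(words, nchar, FUN.VALUE = character(1)),
  error = function(e) "type mismatch caught"
)
```

Because vapply fails loudly when a result does not match FUN.VALUE, it is the safer choice inside functions and scripts.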


File IO
Save and Load

x = USPersonalExpenditure
save(x, file="~/test.RData")
rm(x)
load("~/test.RData")
x

Charts and Graphics

Plotting Example
years = as.numeric(colnames(USPersonalExpenditure))
xrange = range(years)
yrange = range(USPersonalExpenditure)
# Empty plot with the right axes, then one line per expenditure category
plot(xrange, yrange, type="n", xlab="Year", ylab="Expenditure")
for (i in 1:5) {
  lines(years, USPersonalExpenditure[i,], type="b", lwd=1.5)
}

IRIS Dataset
data()   # list the available data sets, including iris

The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by Sir Ronald Fisher (1936) as an example of discriminant analysis. It is sometimes called Anderson's Iris data set.
http://en.wikipedia.org/wiki/Iris_flower_data_set

Species: Iris setosa, Iris versicolor, Iris virginica

Classification of IRIS
install.packages("e1071")
library(e1071)   # provides naiveBayes() and svm()
pairs(iris[1:4], main="Iris Data (red=setosa,green=versicolor,blue=virginica)",
  pch=21, bg=c("red","green3","blue")[unclass(iris$Species)])
# Naive Bayes: confusion table of predicted against actual species
classifier <- naiveBayes(iris[,1:4], iris[,5])
table(predict(classifier, iris[,-5]), iris[,5])
# Support vector machine on the same features
classifier <- svm(iris[,1:4], iris[,5])
table(predict(classifier, iris[,1:4]), iris[,5])
prediction = predict(classifier, iris[,1:4])
http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/Na%C3%AFve_Bayes

Performance Tips
Use Built-in Math Functions
Use Environments for Lookup Tables
Use a Database to Query Large Data Sets
Preallocate Memory
Monitor How Much Memory You Are Using
Cleaning Up Objects
Functions for Big Data Sets
Parallel Computation with R
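
A few of these tips can be sketched in base R: preallocating a result vector, preferring vectorized built-ins, using an environment as a lookup table, and checking and releasing memory. The sizes and keys below are hypothetical.

```r
n <- 10000

# Preallocate memory: create the full-length vector once, then fill it
squares <- numeric(n)
for (i in seq_len(n)) {
  squares[i] <- i^2
}

# Built-in vectorized math gives the same result without an explicit loop
squares_vec <- (1:n)^2
same_result <- isTRUE(all.equal(squares, squares_vec))

# Environment as a lookup table keyed by name
lookup <- new.env(hash = TRUE)
assign("iphone", 3, envir = lookup)
assign("htc", 1, envir = lookup)
iphone_count <- get("iphone", envir = lookup)
has_nokia <- exists("nokia", envir = lookup, inherits = FALSE)

# Monitor memory use, then clean up objects that are no longer needed
print(object.size(squares), units = "Kb")
rm(squares_vec)
invisible(gc())
```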

R for Machine Learning


Getting Help on a Topic


?read.delim
# Access a function's help file

??base::delim
# Search for 'delim' in all help files for functions in 'base'

help.search("delimited")
# Search for 'delimited' in all help files

RSiteSearch("parsing text")
# Search for the term 'parsing text' on the R site.

Sample Code of Chapter 1


https://github.com/johnmyleswhite/ML_for_Hackers.git


Reference & Resource


Study Material
R in a Nutshell


Online Reference


Community Resources for R help


Resources
Websites

Stack Overflow
Cross Validated
R-help
R-devel
R-sig-*
Package-specific mailing list

Blog
R-bloggers

Twitter
https://twitter.com/#rstats

Quora
http://www.quora.com/R-software

Resources (Cont'd)
Conferences

useR!
R in Finance
R in Insurance
Others
Joint Statistical Meetings
Royal Statistical Society Conference

Local User Group


http://blog.revolutionanalytics.com/local-r-groups.html

Taiwan R User Group


http://www.facebook.com/Tw.R.User
http://www.meetup.com/Taiwan-R/


Thank You!

