
R Programming

© 2016 SMART Training Resources Pvt. Ltd.


Overview

1. The Basics

2. R Data Structures

3. Data Input/Output

4. In-Built Functions

5. Data Visualization
What R does and does not
What R does:
• data handling and storage: numeric, textual
• matrix algebra
• hash tables and regular expressions
• high-level data analytic and statistical functions
• classes ("OO")
• graphics
• programming language: loops, branching, subroutines

What R does not:
• is not a database, but connects to DBMSs
• has no graphical user interfaces, but connects to Java, TclTk
• language interpreter can be very slow, but allows calling your own C/C++ code
• no spreadsheet view of data, but connects to Excel / MsOffice
• no professional / commercial support
R and statistics
• Packaging: a crucial infrastructure to efficiently
produce, load and keep consistent software libraries
from (many) different sources / authors
• Statistics: most packages deal with statistics and data
analysis
• State of the art: many statistical researchers provide
their methods as R packages
R as a Calculator
> 1550+2000
[1] 3550
or various calculations in the same row
> 2+3; 5*9; 6-6
[1] 5
[1] 45
[1] 0
> log2(32)
[1] 5
> sqrt(2)
[1] 1.414214
> seq(0, 5, length=6)
[1] 0 1 2 3 4 5
> plot(sin(seq(0, 2*pi, length=100)))
[Figure: sin(seq(0, 2 * pi, length = 100)) plotted against Index, 0 to 100 on the x axis and -1.0 to 1.0 on the y axis]
Variables
> i = 81
> sqrt(i)                                   # numeric
[1] 9
> prov = "All that Glitters are not Gold"   # character string
> sub("Glitters", "Glisters", prov)
[1] "All that Glisters are not Gold"
> 1>2                                       # logical
[1] FALSE
Object orientation
Primitive (or: atomic) data types in R are:
• numeric (integer, double, complex)
• character
• logical
• function (functions are objects as well, though not an atomic vector type)
Numbers in R: NaN and NA
• NaN (not a number)
• NA (missing value)
o Basic handling of missing values
> x
[1] 1 2 3 4 5 6 7 8 NA
> mean(x)
[1] NA
> mean(x,na.rm=TRUE)
[1] 4.5
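The difference between NaN and NA can be sketched as follows. This is a minimal illustration; the variable names undefined, missing, n_missing and x_complete are made up for this example.

```r
# NaN arises from undefined arithmetic; NA marks a missing observation
undefined <- 0 / 0          # NaN
missing   <- NA             # NA

is.nan(undefined)           # TRUE
is.na(undefined)            # TRUE: is.na() also flags NaN
is.nan(missing)             # FALSE: NA is not NaN

x <- c(1:8, NA)
n_missing  <- sum(is.na(x)) # number of missing values in x
x_complete <- x[!is.na(x)]  # drop the missing values by hand
mean(x_complete)            # same result as mean(x, na.rm = TRUE)
```

Dropping missing values by hand gives the same mean as na.rm = TRUE; the argument is usually the more convenient choice.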
Objects in R

• Objects in R obtain values by assignment.
• This is conventionally done with the gets arrow, <-. The equal sign, =, also assigns at the top level, but <- is the preferred style.
• Objects can be of different kinds.
R Data Structures

Vector
Matrix
Array
Factor
Data Frame
List
Vectors
• vector: an ordered collection of data of the same type

> a = c(1,2,3)
> a*2
[1] 2 4 6

• In R, a single number is the special case of a vector with 1 element.
• Other vector types: character strings, logical
Vectors
• Create a vector
> x <- 1:10
• Give the elements some names
> names(x) <- c("first","second","third","fourth","fifth")
• Select elements based on another vector
> i <- c(1,5)
> x[i]
first fifth
    1     5
> x[-c(i,8)]
second  third fourth   <NA>   <NA>   <NA>   <NA>
     2      3      4      6      7      9     10
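Besides positions, a vector can also be indexed by element names or by a logical condition. A small sketch, reusing the x defined above; the names by_name, by_logical and by_index are only illustrative.

```r
x <- 1:10
names(x) <- c("first", "second", "third", "fourth", "fifth")

by_index   <- x[c(1, 5)]              # select by position
by_name    <- x[c("first", "fifth")]  # select by element name
by_logical <- x[x > 7]                # select by a logical condition
```

Here by_index and by_name pick the same two elements, while by_logical keeps every element greater than 7.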
Matrices

• matrix: a rectangular table of data of the same type
• array: a 3-, 4-, ... dimensional generalization of a matrix
• example: the red and green foreground and background values for 20000 spots on 120 chips: a 4 x 20000 x 120 (3D) array.
Matrices
• Create an array
> x <- array(1:10, dim = c(2, 5))
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> attributes(x)
$dim
[1] 2 5
> dim(x)
[1] 2 5
Matrices
• Set column or row names
> colnames(x) <- c("col1", "col2", "col3", "col4", "col5")
> x
     col1 col2 col3 col4 col5
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> colnames(x)[1] <- "column1"
> x
     column1 col2 col3 col4 col5
[1,]       1    3    5    7    9
[2,]       2    4    6    8   10
Matrix
• Set row and column names using dimnames (the existing column names are passed along so that they are kept)
> dimnames(x) <- list(c("first", "second"), colnames(x))
> x
       column1 col2 col3 col4 col5
first        1    3    5    7    9
second       2    4    6    8   10
• Setting dimension names (NULL removes the column names)
> dimnames(x) <- list(my.rows = c("first", "second"), my.cols = NULL)
> x
        my.cols
my.rows  [,1] [,2] [,3] [,4] [,5]
  first     1    3    5    7    9
  second    2    4    6    8   10
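Once a matrix has row and column names, they can be used for indexing. A minimal sketch, assuming the row and column names from the earlier slides:

```r
x <- array(1:10, dim = c(2, 5))
dimnames(x) <- list(c("first", "second"),
                    c("column1", "col2", "col3", "col4", "col5"))

x["second", "col3"]     # single element by row and column name
x[, "col2"]             # a whole column as a named vector
x[1, ]                  # a whole row by position
t(x)                    # transpose: 5 rows, 2 columns
```

Leaving an index empty, as in x[, "col2"], selects everything along that dimension.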
Lists
• vector: an ordered collection of data of the same type.
> a = c(7,5,1)
> a[2]
[1] 5
• list: an ordered collection of data of arbitrary types.
> doe = list(name="john",age=28,married=FALSE)
> doe$name
[1] "john"
> doe$age
[1] 28
• Typically, vector elements are accessed by their index (an integer),
list elements by their name (a character string). But both types
support both access methods.
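Both access methods can be sketched for vectors and lists alike. The element names given to the vector a here (first, second, third) are assumptions added for illustration.

```r
a <- c(first = 7, second = 5, third = 1)   # a named vector
a[2]            # by index
a["second"]     # by name

doe <- list(name = "john", age = 28, married = FALSE)
doe$age         # by name
doe[["age"]]    # by name, using double brackets
doe[[2]]        # by index: the second component
doe[2]          # single brackets return a sub-list, not the element
```

For lists, double brackets (or $) extract the element itself, whereas single brackets return a shorter list.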
Data frames
• data frame: is supposed to represent the typical data table that
researchers come up with – like a spreadsheet.

• It is a rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types.

Example:
> a
      localisation tumorsize progress
XX348     proximal       6.3    FALSE
XX234       distal       8.0     TRUE
XX987     proximal      10.0    FALSE
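One way such a table might be built and queried is sketched below, using data.frame() with the values from the example; the construction itself is not shown in the original slide.

```r
a <- data.frame(
  localisation = c("proximal", "distal", "proximal"),
  tumorsize    = c(6.3, 8.0, 10.0),
  progress     = c(FALSE, TRUE, FALSE),
  row.names    = c("XX348", "XX234", "XX987")
)

str(a)                  # shows that each column has its own type
a$tumorsize             # a column as a vector
a["XX234", ]            # a row by its name
a[a$tumorsize > 7, ]    # rows that satisfy a condition
```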
Factors
• A character string can contain arbitrary text. Sometimes it is useful
to use a limited vocabulary, with a small number of allowed
words. A factor is a variable that can only take such a limited
number of values, which are called levels.
• Example
• a family of two girls (1) and four boys (0),
> kids = factor(c(1,0,1,0,0,0),levels=c(0,1),
labels=c("boy","girl"))
> kids
[1] girl boy girl boy boy boy
Levels: boy girl
> class(kids)
[1] "factor"
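A few common operations on the factor above, as a short sketch:

```r
kids <- factor(c(1, 0, 1, 0, 0, 0), levels = c(0, 1),
               labels = c("boy", "girl"))

levels(kids)        # the allowed values
table(kids)         # counts per level: 4 boys, 2 girls
as.integer(kids)    # underlying integer codes (1 = boy, 2 = girl)
```

The integer codes refer to positions in levels(kids), not to the original 0/1 values.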
Data Input/Output
Directory management
• dir() list files in directory
• setwd(path) set working directory
• getwd() get working directory
• ?files File and Directory Manipulation

Standard ASCII Format


• read.csv read comma-delimited file
• write.csv write comma-delimited file
Reading

> sets <- read.csv("Sets_All.csv", header = TRUE)
> sets$Ordered.Year <- ordered(sets$Year)
> sets$SpotCd.Fac <- factor(sets$SpotCd, exclude = NULL)
> spotted.sets <- sets[sets$Sp1Cd == 2, ]
> write.csv(spotted.sets, file = "spotted.txt", row.names = FALSE)
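Since Sets_All.csv is not available here, the same write/read cycle can be tried with a temporary file. This is a sketch with hypothetical data: the values are made up, and only the column names Year and Sp1Cd come from the example above.

```r
# Hypothetical data, for illustration only
sets <- data.frame(Year = c(2001, 2003, 2002), Sp1Cd = c(2, 1, 2))

path <- tempfile(fileext = ".csv")            # temporary file location
write.csv(sets, file = path, row.names = FALSE)

sets2   <- read.csv(path, header = TRUE)      # read the file back
spotted <- sets2[sets2$Sp1Cd == 2, ]          # keep rows with Sp1Cd equal to 2
unlink(path)                                  # remove the temporary file
```

Setting row.names = FALSE avoids writing an extra column of row numbers, so the file reads back with the same columns.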
Data Visualization

• plot() is the main graphing function


• Automatically produces simple plots for vectors, functions or data
frames
Sample Data Set
Plotting a Vector

• plot(v) plots the elements of the vector v against their index
# Plot height for each observation
> plot(dataset$Height)
# Plot values against their ranks
> plot(sort(dataset$Height))
Common Parameters for plot()
• Specifying labels:
o main: provides a title
o xlab: label for the x axis
o ylab: label for the y axis
• Specifying range limits
o xlim: 2-element vector gives range for x axis
o ylim: 2-element vector gives range for y axis
• Example
o plot(sort(dataset$Height), ylim = c(120,200), ylab = "Height (in cm)", xlab = "Rank", main = "Distribution of Heights")
Plotting Two Vectors

• plot() can pair elements from 2 vectors to produce x-y coordinates
• plot() and pairs() can also produce composite plots that pair all the variables in a data frame.
• Example
o plot(dataset$Hip, dataset$Waist, xlab = "Hip", ylab = "Waist", main =
"Circumference (in cm)", pch = 2, col = "blue")
Histograms

• Generated by the hist() function


• The parameter breaks is key
o Specifies the number of categories to plot
or
o Specifies the breakpoints for each category
• The xlab, ylab, xlim, ylim options work as expected
• Example
o hist(dataset$bp.sys, col = "lightblue", xlab = "Systolic Blood Pressure", main = "Blood Pressure")
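The two uses of breaks can be compared without drawing anything by passing plot = FALSE, which returns the computed categories. A sketch with hypothetical readings; the vector bp and its values are made up for illustration.

```r
# Hypothetical systolic blood pressure readings, for illustration only
bp <- c(110, 118, 121, 125, 130, 132, 138, 141, 150, 162)

# breaks as a vector: explicit breakpoints for each category
h <- hist(bp, breaks = seq(100, 180, by = 20), plot = FALSE)
h$breaks   # the breakpoints used
h$counts   # observations per category

# breaks as a single number: a suggested number of categories
h2 <- hist(bp, breaks = 4, plot = FALSE)
h2$breaks  # breakpoints chosen by R
```

When breaks is a single number, R treats it as a suggestion and picks round breakpoints, so the number of categories may differ from the one requested.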
End of Session
Thank you…

