Chapter 3.

Working with R
CHAPTER OUTLINE
Introduction to R
Basic R Concepts
R Programming
ANOVA Using R
INTRODUCTION

Chapter Objectives
At the end of the chapter, the student should be able to:
1. Understand R-language concepts;
2. Learn the basic syntax of the R language; and
3. Use R in ANOVA analyses.
What is R?
R is a programming language for statistical computing and graphics.

Advantages of R:
1. It is free.
2. It is extensible.
3. It is widely used.
4. It is open source.
R Facts
1. It was developed by Robert Gentleman and Ross Ihaka at the University of Auckland (New Zealand) to teach statistical programming to students.
2. The syntax of R is based on the popular S language, which was developed at Bell Laboratories (then part of AT&T).
Installing and Using R
To use R, install the R base package and, optionally, an IDE called RStudio.
Install the R base package first, before installing RStudio.

Download
To download R base package, go to:
http://cran.stat.upd.edu.ph/
To download RStudio, go to:
https://www.rstudio.com/products/rstudio/download/
Installing and Using R (cont.)

Choose your version based on your operating system.
RStudio IDE
Bottom left: console window (command window), where commands are typed and executed. Take note that code in R is CASE SENSITIVE.
Top left: editor window (script window), where scripts are written and saved; commands are sent from here to the console for execution.
Top right: workspace/history window, which shows the data and values R has in its memory.
Bottom right: files/plots/packages/help window, where you can open files, view plots (including previous plots), install and load packages, or use the help function.
BASIC R CONCEPTS
Working Directory
The working directory is the folder on the computer that R currently reads files from and writes files to (file opening/importing and saving/exporting).
Always set your working directory at the start of each session.

Syntax:
To set the working directory, use the following syntax (don't forget the quotation marks):
setwd("<directory>")

Example:
setwd("E:/R")
(sets the directory to the R folder on drive E)
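A minimal sketch of checking and changing the working directory. The use of tempdir() as the target folder is an assumption made only so that the example runs on any machine; replace it with your own folder.

```r
old_dir <- getwd()        # remember the current working directory
target  <- tempdir()      # assumed target folder (it always exists)
setwd(target)             # change the working directory

# Confirm that R is now working in the target folder
now_in_target <- normalizePath(getwd()) == normalizePath(target)
print(now_in_target)

setwd(old_dir)            # restore the original working directory
```

Calling getwd() before and after setwd() is a quick way to confirm which folder R will read from and write to.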


Libraries
R can perform many statistical and data analyses using packages (libraries).
To view the packages available in the existing installation, type the following command:
library()
Libraries (cont.)
To install and use a package, use either of the following methods:

Using GUI
Install the package: click Install Packages in the packages window and type geometry.
Load the package: check the box in front of geometry.

Using Command Line
Type the following code in the command window:
install.packages("geometry")
library("geometry")
Variables
Unlike in BASIC, declaring and populating a variable are done at the same time in R, using the <- symbol (a combination of the less-than and hyphen symbols).

Refer to the examples below:


a<-10 #puts the value 10 to variable a
#note that comments are started with the number sign

b<-c(seq(1,2,by=0.5))
#seq(<lowerbound>, <upperbound>, by=<interval>)
#puts the sequence 1, 1.5, and 2 to variable b
c<-c(1:3)
# puts the sequence 1, 2, and 3 to variable c
# use colon for a sequence of interval 1
Variables (Cont.)
A variable in R can contain a value of (1) numeric; (2) character; or (3) Boolean mode.

1. Numeric. Any number written with the digits 0 to 9.
2. Character. Basically strings. Input is enclosed in quotation marks, e.g.: MyString<-"Sample String"
3. Boolean. A logical value, either TRUE or FALSE (take note of the capitalization).
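The three modes can be inspected with mode(). This is a small sketch; the variable names are chosen only for illustration.

```r
num_value  <- 3.14              # numeric
chr_value  <- "Sample String"   # character
bool_value <- TRUE              # Boolean (R reports this mode as "logical")

modes <- c(mode(num_value), mode(chr_value), mode(bool_value))
print(modes)
```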
Variables (Cont.)
R organizes values of these modes into seven basic data types: (1) scalars; (2) vectors; (3) matrices; (4) arrays; (5) data frames; (6) lists; and (7) factors.

1. Scalars are single-value variables which can contain a single string, numeric, or Boolean value. R stores a scalar as a vector of length one.

e.g.:
X<-20 #puts the value 20 to variable X
Y<-"String" #puts the value "String" to variable Y
Z<-TRUE #puts the value TRUE to variable Z
Variables (Cont.)
2. Vectors are multi-value, one-dimensional variables which can contain multiple string, numeric, or Boolean values. Note that a vector can hold only one mode at a time.
Syntax: c(<values>)
e.g.:
a <- c(1:6) # numeric vector
b <- c("one","two","three") # character vector
c <- c(TRUE,TRUE,TRUE,FALSE) # logical vector

You can access the values of a vector's elements as shown below:
a[c(2,4)] #returns 2 and 4 from the above example. Take note that brackets are used to obtain the value of an element.
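A short sketch of vector indexing and of the one-mode-per-vector rule. The variable names picked and mixed are illustrative only.

```r
a <- c(1:6)
picked <- a[c(2, 4)]          # the 2nd and 4th elements
print(picked)

# Mixing modes in c() converts every value to a single common mode
mixed <- c(1, "two", TRUE)
print(mixed)
print(mode(mixed))
```

Because a vector holds one mode at a time, the numeric and Boolean values in mixed are stored as character strings.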
Variables (Cont.)
3. Matrices are multi-value, two-dimensional variables which can contain multiple string, numeric, or Boolean values. As with vectors, all values in a matrix must be of the same mode.
A matrix is populated from a vector, which is repeated until all elements of the matrix are filled.
Syntax: matrix(<vector>, nrow=<number of rows>, ncol=<number of columns>, byrow=FALSE, dimnames=list(char_vector_rowname, char_vector_colname))
#byrow and dimnames are optional.
#byrow=FALSE (the default) means that the matrix is filled column by column, from left to right.
Variables (Cont.)
3. Matrices (cont.)
e.g.:
y <- matrix(1:4, nrow=2, ncol=4)
#returns a 2x4 matrix in which the numbers 1, 2, 3, and 4 are repeated:

1 3 1 3
2 4 2 4
#note that y[2,3] returns 2 and y[1,3] returns 1.
#note that y[c(5,7)] returns 1 and 3, because elements are counted down the columns.
#dimnames assigns a name to each row and column in place of the default numeric index.
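The sketch below repeats the matrix above with dimnames and contrasts it with byrow=TRUE. The row and column names (r1, c1, and so on) are hypothetical labels chosen for illustration.

```r
y <- matrix(1:4, nrow = 2, ncol = 4,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3", "c4")))
print(y)
print(y["r2", "c3"])   # same element as y[2, 3]

# With byrow = TRUE the vector fills the matrix row by row instead
z <- matrix(1:4, nrow = 2, ncol = 4, byrow = TRUE)
print(z)
```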
Variables (Cont.)
4. Arrays are multi-value, multi-dimensional variables which can contain multiple string, numeric, or Boolean values. Arrays are simply multi-dimensional vectors.
Syntax: array(<data>, dim = <dimensions>, dimnames = <list>)
#dimnames is optional.
#data is a vector whose values are repeated among the elements of the array.
#dim specifies the dimensions of the array; e.g.: dim = 2 is a one-dimensional array of length 2, while dim = c(3,3,3) is a 3x3x3 array.
Variables (Cont.)
4. Arrays (cont.)
e.g.:
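A minimal sketch of a three-dimensional array. The dimensions and the name arr are assumptions chosen for illustration.

```r
# 2 rows, 3 columns, 2 layers; the vector 1:4 is repeated to fill 12 cells
arr <- array(1:4, dim = c(2, 3, 2))
print(dim(arr))
print(arr[1, 2, 1])   # row 1, column 2, layer 1
print(arr[, , 2])     # the whole second layer, as a 2x3 matrix
```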
Variables (Cont.)
5. Data frames are general-purpose tables in which each column can be of a different mode.
Syntax: data.frame(<vector for column 1>, <vector for column 2>, ..., <vector for column N>)
#Note that all columns should have the same number of rows.

e.g.:
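A small sketch of a data frame with one column per mode. The column names and values are made up for illustration.

```r
students <- data.frame(
  name   = c("Ana", "Ben", "Cara"),   # character column
  score  = c(90, 85, 77),             # numeric column
  passed = c(TRUE, TRUE, FALSE)       # logical column
)
print(students)
print(students$score)    # a single column, returned as a vector
print(nrow(students))    # number of rows
```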
Variables (Cont.)
6. Lists are collections of values.
Syntax: list(<collection name 1> = <values>, <collection name 2> = <values>, ..., <collection name N> = <values>)
#Note that each collection can be of a different mode and data type.

e.g.:
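A brief sketch of a list whose collections differ in mode and length. The names numbers, label, and flags are hypothetical.

```r
my_list <- list(numbers = 1:3, label = "sample", flags = c(TRUE, FALSE))
print(my_list$label)             # access a collection by name with $
print(my_list[["numbers"]][2])   # second value of the "numbers" collection
print(length(my_list))           # number of collections
```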
Variables (Cont.)
7. Factors are collections of categorical values, optionally ordered.
Syntax: factor(<vector data>, levels=<vector containing the list of values>, ordered=<TRUE or FALSE; the default is FALSE for ordinary vectors>)
#levels specifies the list of values in increasing order.
e.g.:
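A minimal sketch of an ordered factor. The size categories are made-up values for illustration.

```r
sizes <- c("small", "large", "medium", "small")
f <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
print(f)
print(table(f))        # frequency of each level
print(f[1] < f[2])     # comparison works because the factor is ordered
```

Setting ordered = TRUE is what allows comparisons such as "small" < "large"; with an unordered factor the same comparison is not meaningful.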
R Programming
Basic Commands
Command                          Function
ls()                             Lists all declared objects.
rm(list=ls())                    Clears all declared objects.
rnorm(<N>)                       Generates N random numbers from a normal distribution.
table(<vector>)                  Shows a frequency table of the values in a vector object.
plot(<data>)                     Generates a scatter plot from the given data.
curve(<function>, a, b)          Plots the curve of a function from the lower bound, a, to the upper bound, b.
sum(<vector>)                    Calculates the sum of a given vector.
rep(value, N)                    Repeats the specified value N times.
print(value)                     Displays the value in the console window.
mean(<vector>)                   Calculates the mean of a given vector.
solve(<matrix A>)                Returns the inverse of matrix A.
solve(<matrix A>, <matrix B>)    Solves AX = B for X, i.e., X = A^-1 B.
AMat*BMat                        Element-wise multiplication of matrix AMat and matrix BMat.
AMat%*%BMat                      Matrix multiplication.
crossprod(AMat,BMat)             Cross-product of AMat and BMat, i.e., t(AMat) %*% BMat.
read.csv(file="<filename.csv>", header=TRUE, sep=",")
                                 Loads a csv file from the working directory into a data frame.
summary(<data>)                  Displays the statistical summary of the values in a variable.
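The sketch below exercises several of the commands in the table on small inputs. The vector v and the matrices A and B are made-up values for illustration.

```r
v <- c(2, 4, 6)
total   <- sum(v)          # sum of the vector
average <- mean(v)         # mean of the vector
threes  <- rep(3, 4)       # the value 3 repeated 4 times

A <- matrix(c(2, 1, 1, 3), nrow = 2)   # 2x2 matrix, filled by column
B <- matrix(c(1, 2), nrow = 2)         # 2x1 matrix

A_inv       <- solve(A)        # inverse of A
X           <- solve(A, B)     # solves A %*% X = B
elementwise <- A * A           # element-wise multiplication
product     <- A %*% A         # matrix multiplication
cp          <- crossprod(A, B) # same as t(A) %*% B

print(X)
print(summary(v))
```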
For Loop
The syntax for the for loop in R is shown below:
for (<variable> in <sequence>)
{repeated code}
Example:
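A minimal sketch of a for loop that adds the numbers 1 to 5. The variable names are illustrative.

```r
total <- 0
for (i in 1:5) {
  total <- total + i    # runs once for each value in the sequence 1:5
}
print(total)            # 15
```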
While Loop
The syntax for the while loop in R is shown below:
while (condition)
{repeated code}
Example:
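A minimal sketch of a while loop that doubles a number until it reaches at least 100. The variable names are illustrative. Note that while is written in lowercase, since R is case sensitive.

```r
n <- 1
steps <- 0
while (n < 100) {
  n <- n * 2            # repeated as long as the condition is TRUE
  steps <- steps + 1
}
print(n)                # 128
print(steps)            # 7
```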
If Structure
The syntax for the if structure in R is shown below:
if (condition) {<code to run if the condition is TRUE>
} else {<code to run otherwise>}
Example:
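A minimal sketch of an if structure that classifies a number as even or odd. The variable names are illustrative.

```r
x <- 7
if (x %% 2 == 0) {      # %% gives the remainder of the division
  parity <- "even"
} else {
  parity <- "odd"
}
print(parity)           # "odd"
```

Keeping else on the same line as the closing brace avoids a syntax error when the code is typed directly into the console.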
Example 1
Calculate the value of the following expression:
\sum_{i=10}^{100} (i^3 + 4i^2)
Soln:
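One possible solution, assuming the expression is the sum of i^3 + 4i^2 for i from 10 to 100. It uses a vector of indices instead of a loop.

```r
i <- 10:100                       # all values of the index
result1 <- sum(i^3 + 4 * i^2)     # evaluate each term, then add them up
print(result1)                    # 26852735
```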

Example 2
Calculate the value of the following expression:
\sum_{i=1}^{25} (2^i / i + 3^i / i^2)

Soln:
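One possible solution, under the assumption that the expression is the sum of 2^i / i + 3^i / i^2 for i from 1 to 25. If the original expression differs, only the formula inside sum() needs to change.

```r
i <- 1:25                               # all values of the index
result2 <- sum(2^i / i + 3^i / i^2)     # assumed form of the expression
print(result2)
```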
Example 3
Solve the following system of linear equations in 5 unknowns:
x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5 = 7
2x_1 + x_2 + 2x_3 + 3x_4 + 4x_5 = 1
3x_1 + 2x_2 + x_3 + 2x_4 + 3x_5 = 3
4x_1 + 3x_2 + 2x_3 + x_4 + 2x_5 = 5
5x_1 + 4x_2 + 3x_3 + 2x_4 + x_5 = 17
Soln: Using inverse matrix method,
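A sketch of the inverse matrix method, assuming the unknowns are x_1 to x_5 with the coefficients read row by row from the equations and the right-hand sides taken exactly as printed (7, 1, 3, 5, 17).

```r
# Coefficient matrix, entered row by row
A <- matrix(c(1, 2, 3, 4, 5,
              2, 1, 2, 3, 4,
              3, 2, 1, 2, 3,
              4, 3, 2, 1, 2,
              5, 4, 3, 2, 1), nrow = 5, byrow = TRUE)
b <- c(7, 1, 3, 5, 17)        # right-hand sides as printed

x <- solve(A) %*% b           # inverse matrix method: x = A^-1 b
x_direct <- solve(A, b)       # same result without forming the inverse
print(x)
```

Multiplying A by the result (A %*% x) should reproduce b, which is a quick check on the solution.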
ANOVA
Analysis of Variance (ANOVA) is a method of determining whether there are significant differences among the means of groups of measurements.

Syntax:
aov(<response header>~<factor header>, data=<dataframe>)
Example 4

Given the following data, perform a one-way ANOVA to determine whether drug formulations A, B, and C differ significantly in pain alleviation.
Example 4 (cont.)
1. Set the working directory and load the file into a data frame.

2. Check that the data is actually loaded.

3. First evaluate graphically whether there are possible variations.
Example 4 (cont.)
4. Use the ANOVA function and store the result in results.

5. Display the ANOVA table.

Since the F-statistic is 11.91 and the p-value is approx. 0.0003, we reject the null hypothesis that the means are equal. Therefore, there are significant differences among the means.
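The steps can be sketched as follows. The data table is not reproduced in the text, so the pain scores below, the column names pain and drug, and the data frame name drug_data are assumptions for illustration; the data is also typed in directly instead of being loaded with read.csv().

```r
# Assumed data: 9 pain scores for each of the drug formulations A, B, and C
pain <- c(4, 5, 4, 3, 2, 4, 3, 4, 4,
          6, 8, 4, 5, 4, 6, 5, 8, 6,
          6, 7, 6, 6, 7, 5, 6, 5, 5)
drug <- factor(rep(c("A", "B", "C"), each = 9))
drug_data <- data.frame(pain, drug)

head(drug_data)                                 # check that the data is loaded

results <- aov(pain ~ drug, data = drug_data)   # one-way ANOVA
anova_table <- summary(results)
print(anova_table)                              # display the ANOVA table

# Extract the F-statistic and p-value of the drug factor
f_value <- anova_table[[1]][["F value"]][1]
p_value <- anova_table[[1]][["Pr(>F)"]][1]
```

With these assumed values the F-statistic is close to 11.91. For the graphical check, boxplot(pain ~ drug, data = drug_data) draws one box per formulation.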
Tukey's Honest Significant Difference Test
Since the ANOVA F-test only answers the question of whether a significant difference among the means exists, a rejection of the null hypothesis requires an additional test to determine which specific treatment means are the source of the difference.
Common multiple comparison procedures are Tukey's Honest Significant Difference test (THSD) and Duncan's Multiple Range Test (DMRT).
Below is the syntax for THSD:
TukeyHSD(x, conf.level=0.95)
#x is the fitted model returned by aov().
Example 4 (cont.)
6. Use THSD to compare the differences among the means of the treatments.

Since their p-values are below 0.05, the B-A and C-A treatment pairs differ significantly, while there is no significant difference between treatments B and C (p>0.05).
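A sketch of the THSD step. The pain scores and the names pain and drug are assumptions for illustration, since the data table is not reproduced in the text.

```r
# Assumed data: 9 pain scores for each of the drug formulations A, B, and C
pain <- c(4, 5, 4, 3, 2, 4, 3, 4, 4,
          6, 8, 4, 5, 4, 6, 5, 8, 6,
          6, 7, 6, 6, 7, 5, 6, 5, 5)
drug <- factor(rep(c("A", "B", "C"), each = 9))

results <- aov(pain ~ drug)                     # fit the one-way ANOVA
tukey <- TukeyHSD(results, conf.level = 0.95)   # pairwise comparisons
print(tukey)

# Adjusted p-values for each pair of treatments
p_adj <- tukey$drug[, "p adj"]
print(p_adj)
```

R labels each pair with the later level first, so the comparison between B and C appears as C-B in the output.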
Summary

R is a language specifically designed for numerical calculations, particularly statistical analyses, and for displaying these calculations through its graphical capabilities.
R is a programming language whose usability can be enhanced by using the RStudio environment.
Since R is highly extensible and has a large user base, functions are continuously added.
To learn more about R, many how-to guides are available in various online resources.
Thank you very much!
