
KD Lab 1 – Introduction to R [1]

If at any point you'd like more information on a particular topic related to R, you can type help.start() at the
prompt, which will open a menu of resources.

Basic Building Blocks

In its simplest form, R can be used as an interactive calculator. Type 5 + 7 and press Enter.
> 5+7
[1] 12
R simply prints the result of 12 by default. However, R is a programming language and often the
reason we use a programming language as opposed to a calculator is to automate some process or
avoid unnecessary repetition. In this case, we may want to use our result from above in a second
calculation. Instead of retyping 5 + 7 every time we need it, we can just create a new variable
that stores the result. The way you assign a value to a variable in R is by using the assignment
operator, which is just a 'less than' symbol followed by a 'minus' sign. It looks like this: <-
Think of the assignment operator as an arrow. You are assigning the value on the right side of
the arrow to the variable name on the left side of the arrow. To assign the result of 5 + 7 to a new
variable called x, you type x <- 5 + 7. This can be read as 'x gets 5 plus 7'. Give it a try now.
> x <- 5 + 7
You'll notice that R did not print the result of 12 this time. When you use the assignment
operator, R assumes that you don't want to see the result immediately, but rather that you intend
to use the result for something else later on. To view the contents of the variable x, just type x
and press Enter. Try it now.
> x
[1] 12
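A minimal sketch of reusing a stored result: once x holds 12, it can feed further calculations without retyping 5 + 7. The names y and w are arbitrary examples. R also accepts `=` for assignment at the top level, although `<-` is the conventional style.

```r
# Store the result of 5 + 7 in x, then reuse it
x <- 5 + 7
y <- x - 3        # y gets 12 minus 3
y                 # prints [1] 9

# `=` also works for assignment at the top level,
# but `<-` is the conventional R style
w = x / 4
w                 # prints [1] 3
```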
Now, let's create a small collection of numbers called a vector. Any object that contains data is
called a data structure and numeric vectors are the simplest type of data structure in R. In fact,
even a single number is considered a vector of length one. The easiest way to create a vector is
with the c() function, which stands for 'concatenate' or 'combine'. To create a vector containing
the numbers 1.1, 9, and 3.14, type c(1.1, 9, 3.14). Try it now and store the result in a variable
called z.
> z <- c(1.1, 9, 3.14)
Anytime you have questions about a particular function, you can access R's built-in help files via
the `?` command. For example, if you want more information on the c() function, type ?c without
the parentheses that normally follow a function name.
> ?c
Type z to view its contents. Notice that there are no commas separating the values in the output.
> z
[1] 1.10 9.00 3.14
You can combine vectors to make a new vector. Create a new vector that contains z, 555, then z
again in that order. Don't assign this vector to a new variable, so that we can just see the result
immediately.
> c(z, 555, z)
[1] 1.10 9.00 3.14 555.00 1.10 9.00 3.14
Numeric vectors can be used in arithmetic expressions. Type the following to see what happens:
z * 2 + 100.
> z * 2 + 100
[1] 102.20 118.00 106.28
Other common arithmetic operators are `+`, `-`, `/`, and `^` (where x^2 means 'x squared'). To
take the square root, use the sqrt() function and to take the absolute value, use the abs() function.
When given two vectors of the same length, R simply performs the specified arithmetic
operation (`+`, `-`, `*`, etc.) element-by-element. If the vectors are of different lengths, R
'recycles' the shorter vector until it is the same length as the longer vector. When we did z * 2 +
100 in our earlier example, z was a vector of length 3, but technically 2 and 100 are each vectors
of length 1. Behind the scenes, R is 'recycling' the 2 to make a vector of 2s and the 100 to make a
vector of 100s. In other words, when you ask R to compute z * 2 + 100, what it really computes
is this: z * c(2, 2, 2) + c(100, 100, 100).
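The element-by-element rules and recycling described above can be checked directly. This sketch reuses z from earlier; the other vectors are arbitrary examples. In the last line the length-2 vector divides evenly into the length-6 vector, so it is recycled three times.

```r
z <- c(1.1, 9, 3.14)

# Recycling made explicit: both lines compute the same vector
a <- z * 2 + 100
b <- z * c(2, 2, 2) + c(100, 100, 100)
identical(a, b)                 # TRUE

# Element-by-element operations on vectors of equal length
c(1, 2, 3) + c(10, 20, 30)      # 11 22 33
c(2, 3)^2                       # 4 9

# sqrt() and abs() also work element by element
sqrt(c(4, 9, 16))               # 2 3 4
abs(c(-1.5, 0, 2))              # 1.5 0.0 2.0

# A length-2 vector is recycled across a length-6 vector
c(1, 2, 3, 4, 5, 6) + c(10, 20) # 11 22 13 24 15 26
```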

Workspace and Files

Now you'll learn how to examine your local workspace in R and begin to explore the relationship
between your workspace and the file system of your machine.
Determine which directory your R session is using as its current working directory using
getwd().
> getwd()
[1] "/home/user1/Desktop"
List all the objects in your local workspace using ls().
> ls()
[1] "x" "z"
List all the files in your working directory using list.files() or dir().
> list.files()
[1] "Weka Datasets" "MATLAB" "Custom Office Templates"
[4] "R" "RapidMiner" "SQL Server Management"
Access information about a file, such as its size, whether it is a directory, its permissions (mode)
and its modification time, by using file.info().
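A minimal sketch of file.info() on a scratch file. To avoid touching your working directory, it assumes a throw-away file created in R's session temporary directory: tempfile() only generates a unique path, and writeLines() actually creates the file. The exact size depends on the platform's line endings, so it is not hard-coded here.

```r
# Create a small scratch file in R's session temp directory
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), path)

# file.info() returns a data frame with one row of metadata per path
info <- file.info(path)
info$size      # size in bytes
info$isdir     # FALSE: it is a regular file
info$mtime     # last modification time

# list.files() can also list a directory other than the working one
basename(path) %in% list.files(tempdir())   # TRUE

unlink(path)   # delete the scratch file
file.exists(path)                            # FALSE
```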
Sequences of Numbers

The simplest way to create a sequence of numbers in R is by using the `:` operator. Type 1:10 to
see how it works.
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
If we're interested in creating a vector that contains 10 zeros, we can use rep(0, times = 10).
> rep(0, times = 10)
[1] 0 0 0 0 0 0 0 0 0 0
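Beyond `:` and rep(0, times = 10), the seq() and rep() functions take a few more arguments. This sketch uses arbitrary example values; `by` sets the step, `length.out` sets how many values to produce, and `each` repeats every element in turn instead of the whole vector.

```r
# seq() generalizes the `:` operator
seq(1, 10)                  # same values as 1:10
seq(0, 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00
seq(5, 10, length.out = 3)  # 5.0 7.5 10.0

# seq_along() gives 1, 2, ..., length(v) for any vector v
v <- c(3.5, -1, 8)
seq_along(v)                # 1 2 3

# rep() with `times` repeats the whole vector;
# with `each` it repeats every element in turn
rep(c(1, 2), times = 3)     # 1 2 1 2 1 2
rep(c(1, 2), each = 3)      # 1 1 1 2 2 2
```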
Vectors

The simplest and most common data structure in R is the vector. Vectors come in two different
flavors: atomic vectors and lists. An atomic vector contains exactly one data type, whereas a list
may contain multiple data types. We'll explore atomic vectors further before we get to lists. In
previous lessons, we dealt entirely with numeric vectors, which are one type of atomic vector.
Other types of atomic vectors include logical, character, integer, and complex. In this lesson,
we'll take a closer look at logical and character vectors. Logical vectors can contain the values
TRUE, FALSE, and NA (for 'not available'). These values are generated as the result of logical
'conditions'. Create a numeric vector num_vect that contains the values 0.5, 55, -10, and 6.
> num_vect <- c(0.5, 55, -10, 6)
Now, create a variable called tf that gets the result of num_vect < 1, which is read as 'num_vect
is less than 1'.
> tf <- num_vect < 1
> tf
[1] TRUE FALSE TRUE FALSE
The `<` symbol in this example is called a 'logical operator'. Other logical operators include
`>`, `>=`, `<=`, `==` for exact equality, and `!=` for inequality. If we have two logical
expressions, A and B, we can ask whether at least one is TRUE with A | B (logical 'or', a.k.a.
'union') or whether both are TRUE with A & B (logical 'and', a.k.a. 'intersection'). Lastly, !A
is the negation of A and is TRUE when A is FALSE, and vice versa.
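The operators above can be tried on num_vect. In this sketch A and B are simply two example conditions; every operator works element by element, and any comparison involving NA gives NA.

```r
num_vect <- c(0.5, 55, -10, 6)

num_vect > 5          # FALSE  TRUE FALSE  TRUE
num_vect == 6         # FALSE FALSE FALSE  TRUE
num_vect != 6         #  TRUE  TRUE  TRUE FALSE

# Combine two conditions element by element
A <- num_vect > 0
B <- num_vect < 10
A & B                 #  TRUE FALSE FALSE  TRUE (both TRUE)
A | B                 #  TRUE  TRUE  TRUE  TRUE (at least one TRUE)
!A                    # FALSE FALSE  TRUE FALSE (negation)

# Comparisons involving NA give NA
c(1, NA, 3) > 2       # FALSE    NA  TRUE
```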
Character vectors are also very common in R. Double quotes are used to distinguish character
objects, as in the following example. Create a character vector that contains the following words:
"My", "name", "is". Remember to enclose each word in its own set of double quotes, so that R
knows they are character strings. Store the vector in a variable called my_char.

> my_char <- c("My", "name", "is")
> my_char
[1] "My" "name" "is"
> paste(my_char, collapse = " ")
[1] "My name is"
> my_name <- c(my_char, "xyz")
> my_name
[1] "My" "name" "is" "xyz"
> paste(my_name, collapse = " ")
[1] "My name is xyz"
In this example, we used the paste() function to collapse the elements of a single character
vector. paste() can also be used to join the elements of multiple character vectors.
> paste("Hello", "world!", sep = " ")
[1] "Hello world!"
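The difference between `sep` and `collapse` is easiest to see side by side. In this sketch the input vectors are arbitrary examples: `sep` is placed between the elements being joined, `collapse` glues the resulting strings into one, and a shorter vector is recycled just as in arithmetic.

```r
# sep goes between the pieces being joined
paste(c("a", "b", "c"), 1:3, sep = "_")                   # "a_1" "b_2" "c_3"
# collapse then glues the results into a single string
paste(c("a", "b", "c"), 1:3, sep = "_", collapse = "+")   # "a_1+b_2+c_3"

# paste0() is shorthand for paste(..., sep = "")
paste0("R", 4)                                            # "R4"

# The length-1 vector "id" is recycled across 1:3
paste("id", 1:3)                                          # "id 1" "id 2" "id 3"
```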

Subsetting Vectors

In this lesson, we'll see how to extract elements from a vector based on some conditions that we
specify. For example, we may only be interested in the first 20 elements of a vector, or only the
elements that are not NA, or only those that are positive or correspond to a specific variable of
interest. By the end of this lesson, you'll know how to handle each of these scenarios.
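Start by drawing 10 values at random from the numbers 1 to 20 plus NA and storing them in x
(your values will differ, because sample() picks them at random), then print x.
> x <- sample(c(1:20, NA), size = 10)
> x
[1] 8 14 20 1 NA 13 4 7 11 15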
The way you tell R that you want to select some particular elements (i.e. a 'subset') from a vector
is by placing an 'index vector' in square brackets immediately following the name of the vector.
For a simple example, try x[1:5] to view the first five elements of x.
> x[1:5]
[1] 8 14 20 1 NA
Index vectors come in four different flavors -- logical vectors, vectors of positive integers,
vectors of negative integers, and vectors of character strings -- each of which we'll cover in this
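lesson. Let's start by indexing with logical vectors. One common scenario when working with
real-world data is that we want to extract all elements of a vector that are not NA (i.e. missing
data). The is.na() function returns TRUE for each element that is NA, so x[is.na(x)] keeps only
the missing values:
> x[is.na(x)]
[1] NA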
Recall that `!` gives us the negation of a logical expression, so !is.na(x) can be read as 'is not
NA'. Therefore, if we want to create a vector called y that contains all of the non-NA values from
x, we can use y <- x[!is.na(x)]. Give it a try.
> y <- x[!is.na(x)]
> y
[1] 8 14 20 1 13 4 7 11 15
Now that we've isolated the non-missing values of x and put them in y, we can subset y as we
please. Type y[y > 10] to get all of the elements of y that are greater than 10.
> y[y > 10]
[1] 14 20 13 11 15
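Positive and negative integer index vectors can be sketched on a fixed copy of x. The values are typed in explicitly here (they are the ones printed earlier), because sample() returns different values on every run. Positive integers pick positions, negative integers drop them, and asking for a position past the end returns NA rather than an error.

```r
x <- c(8, 14, 20, 1, NA, 13, 4, 7, 11, 15)

x[2]             # a single element: 14
x[c(1, 4)]       # positive integers pick positions: 8 1
x[-1]            # negative integers drop positions: all but the first
x[-c(1, 2, 3)]   # drop the first three: 1 NA 13 4 7 11 15
x[11]            # past the end of x: NA, not an error
```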
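So far, we've covered logical and positive integer index vectors. Negative integers work the
other way around: they exclude positions, so x[-1] returns every element of x except the first.
The only remaining type, character vectors, requires us to introduce the concept of 'named' elements.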
Create a numeric vector with three named elements using vect <- c(foo = 11, bar = 2, norf =
NA).
> vect <- c(foo = 11, bar = 2, norf = NA)
> vect
foo bar norf
11 2 NA
We can also get the names of vect by passing vect as an argument to the names() function. Give
that a try.
> names(vect)
[1] "foo" "bar" "norf"
Alternatively, we can create an unnamed vector vect2 with c(11, 2, NA). Do that now.
> vect2 <- c(11, 2, NA)
Then, we can add the `names` attribute to vect2 after the fact with names(vect2) <- c("foo",
"bar", "norf"). Go ahead.
> names(vect2) <- c("foo", "bar", "norf")
Now, let's check that vect and vect2 are the same by passing them as arguments to the identical()
function.
> identical(vect, vect2)
[1] TRUE
> vect["bar"]
bar
2
> vect[c("foo", "bar")]
foo bar
11 2
Matrices and Data Frames

In this lesson, we'll cover matrices and data frames. Both represent 'rectangular' data types,
meaning that they are used to store tabular data, with rows and columns. The main difference, as
you'll see, is that matrices can only contain a single class of data, while data frames can consist
of many different classes of data.
Let's create a vector containing the numbers 1 through 10 using the `:` operator. Store the result
in a variable called my_vector.
> my_vector <- 1:10
> my_vector
[1] 1 2 3 4 5 6 7 8 9 10
> dim(my_vector)
NULL
> length(my_vector)
[1] 10
The dim() function tells us the 'dimensions' of an object. Clearly, that's not very helpful! Since
my_vector is a vector, it doesn't have a `dim` attribute (so it's just NULL).
> dim(my_vector) <- c(2, 5)
> dim(my_vector)
[1] 2 5
> my_vector
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
> class(my_vector)
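[1] "matrix"
In R 4.0.0 and later, this prints [1] "matrix" "array" instead, because matrices now also inherit
from class "array".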
The example that we've used so far was meant to illustrate the point that a matrix is simply an
atomic vector with a dimension attribute. A more direct method of creating the same matrix uses
the matrix() function.
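Bring up the help file for the matrix() function now using the `?` function. Then use matrix() to
create the same 2 x 5 matrix, store it in my_matrix, and use identical() to confirm that it
matches my_vector.
> ?matrix
> my_matrix <- matrix(1:10, nrow = 2, ncol = 5)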
> identical(my_matrix, my_vector)
[1] TRUE
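As the output of my_vector shows, matrix() fills its values column by column by default. A small sketch with an arbitrary 2 x 3 example shows the byrow argument, element access with [row, column], and the transpose function t():

```r
# matrix() fills column by column by default...
m1 <- matrix(1:6, nrow = 2)
m1
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# ...and row by row with byrow = TRUE
m2 <- matrix(1:6, nrow = 2, byrow = TRUE)
m2
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

dim(m2)      # 2 3
m2[2, 3]     # row 2, column 3: 6
t(m1)        # transpose: 3 rows, 2 columns
```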
Now, imagine that the numbers in our table represent some measurements from a clinical
experiment, where each row represents one patient and each column represents one variable for
which measurements were taken. We may want to label the rows, so that we know which
numbers belong to each patient in the experiment. One way to do this is to add a column to the
matrix that contains the names of both patients. Let's start by creating a character vector
containing the names of our patients -- A and B. Remember that double quotes tell R that
something is a character string. Store the result in a variable called patients.
> patients <- c("A", "B")
Now we'll use the cbind() function to 'combine columns'. Don't worry about storing the result in
a new variable. Just call cbind() with two arguments -- the patients vector and my_matrix.
> cbind(patients, my_matrix)
patients
[1,] "A" "1" "3" "5" "7" "9"
[2,] "B" "2" "4" "6" "8" "10"
Something is fishy about our result! It appears that combining the character vector with our
matrix of numbers caused everything to be enclosed in double quotes. This means we're left with
a matrix of character strings, which is no good. If you remember back to the beginning of this
lesson, matrices can only contain ONE class of data. Therefore, when we tried to combine a
character vector with a numeric matrix, R was forced to 'coerce' the numbers to characters, hence
the double quotes.
So, we're left with the question of how to include the names of our patients in the table without
destroying the integrity of our numeric data. Try the following.
> my_data <- data.frame(patients, my_matrix)
> my_data
patients X1 X2 X3 X4 X5
1 A 1 3 5 7 9
2 B 2 4 6 8 10
It looks like the data.frame() function allowed us to store our character vector of names right
alongside our matrix of numbers. That's exactly what we were hoping for! Behind the scenes, the
data.frame() function takes any number of arguments and returns a single object of class
`data.frame` that is composed of the original objects.
> class(my_data)
[1] "data.frame"
It's also possible to assign names to the individual rows and columns of a data frame, which
presents another possible way of determining which row of values in our table belongs to each
patient. However, since we've already solved that problem, let's solve a different problem by
assigning names to the columns of our data frame so that we know what type of measurement
each column represents. Since we have six columns (including patient names), we'll need to first
create a vector containing one element for each column. Create a character vector called cnames
that contains the following values (in order) -- "patient", "age", "weight", "bp", "rating", "test".
> cnames <- c("patient", "age", "weight", "bp", "rating", "test")
Now, use the colnames() function to set the `colnames` attribute for our data frame. This is
similar to the way we used the dim() function earlier in this lesson.
> colnames(my_data) <- cnames
> my_data
patient age weight bp rating test
1 A 1 3 5 7 9
2 B 2 4 6 8 10
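To confirm that the data frame really kept the numbers numeric, this sketch rebuilds my_data and checks each column's class. It assumes stringsAsFactors = FALSE so the patient column is character in every R version (before R 4.0.0, data.frame() turned character vectors into factors by default).

```r
# Rebuild my_data from the lesson
my_matrix <- matrix(1:10, nrow = 2, ncol = 5)
patients  <- c("A", "B")
my_data   <- data.frame(patients, my_matrix, stringsAsFactors = FALSE)
colnames(my_data) <- c("patient", "age", "weight", "bp", "rating", "test")

# Unlike a matrix, each column keeps its own class:
# patient is "character", every other column is "integer"
sapply(my_data, class)

my_data$age            # one column as a vector: 1 2
my_data[2, "weight"]   # row 2, column "weight": 4
nrow(my_data)          # 2
ncol(my_data)          # 6
```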
Functions

Functions are one of the fundamental building blocks of the R language. They are small pieces of
reusable code that can be treated like any other R object. As you've worked through any previous
part of this course, you've probably used some functions already. Functions are usually
characterized by the name of the function followed by parentheses.
Let's try using a few basic functions just for fun. The Sys.Date() function returns today's date
(as an object of class "Date", which R prints like a string). Type Sys.Date() and see what happens.
> Sys.Date()
[1] "2017-03-29"
The mean() function takes a vector of numbers as input, and returns the average of all of the
numbers in the input vector. Inputs to functions are often called arguments. Providing arguments
to a function is also sometimes called passing arguments to that function. Arguments you want to
pass to a function go inside the function's parentheses. Try passing the argument c(2, 4, 5) to the
mean() function.
> mean(c(2, 4, 5))
[1] 3.666667
You're about to write your first function! Just like you would assign a value to a variable with
the assignment operator, you assign functions in the following way:
function_name <- function(arg1, arg2){
  # Manipulate arguments in some way
  # Return a value
}

The "variable name" you assign will become the name of your function. arg1 and arg2 represent
the arguments of your function. You can manipulate the arguments you specify within the
function. After sourcing the function, you can use the function by typing:

function_name(value1, value2)

Below we will create a function called boring_function. This function takes the argument `x` as
input, and returns the value of x without modifying it.
> boring_function <- function(x) {
+ x
+ }
> boring_function("This is my function")
[1] "This is my function"
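The template can be extended with a default value, which R uses whenever the caller leaves that argument out. The function power() below is a hypothetical example, not part of the lesson; like boring_function(), it returns the value of its last evaluated expression.

```r
# Hypothetical example: raise `base` to the power `exponent`,
# where `exponent` defaults to 2 if it is not supplied
power <- function(base, exponent = 2) {
  base^exponent      # the last evaluated expression is returned
}

power(3)             # uses the default exponent: 9
power(3, 3)          # both arguments supplied: 27
```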
The idea of passing functions as arguments to other functions is an important and fundamental
concept in programming. You may be surprised to learn that you can pass a function as an
argument without first defining the passed function. Functions that are not named are
appropriately known as anonymous functions.
Let's use the evaluate() function to explore how anonymous functions work. evaluate() is not part
of base R; in the swirl lesson it is a small helper, evaluate(func, dat), that simply returns
func(dat). For its first argument we're going to write a tiny function that fits on one line. In the
second argument we'll pass some data to that anonymous function.
> evaluate(function(x){x+1}, 6)
[1] 7
> evaluate(function(x){x[1]}, c(8, 4, 0))
[1] 8
> evaluate(function(x){x[length(x)]}, c(8, 4, 0))
[1] 0
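To run these examples outside swirl, evaluate() has to be defined first. The definition below is an assumption that matches the outputs shown above (it just calls func on dat). Anonymous functions are also commonly passed to base R functions such as sapply(), which applies a function to every element of a vector.

```r
# Assumed definition of the lesson's helper: apply `func` to `dat`
evaluate <- function(func, dat) {
  func(dat)
}

evaluate(function(x) { x + 1 }, 6)          # 7
evaluate(function(x) { x[1] }, c(8, 4, 0))  # 8

# sapply() with an anonymous function: square every element of 1:4
sapply(1:4, function(x) x^2)                # 1 4 9 16
```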
The ellipsis (...) can be used to pass arguments on to other functions that are called within the
function you're writing. Usually a function that takes the ellipsis as an argument has it as the
last argument. The usage of such a function would look like:

# ellipses_func(arg1, arg2 = TRUE, ...)

In the above example arg1 has no default value, so a value must be provided for arg1. arg2 has a
default value, and other arguments can come after arg2 depending on how they're defined in the
ellipses_func() documentation.
Interestingly, the usage for the paste function is as follows:

# paste (..., sep = " ", collapse = NULL)

Notice that the ellipsis is the first argument, and all other arguments after the ellipsis have
default values. This follows an important rule in R: an argument that comes after the ellipsis can
only be supplied by its full name (R cannot match it by position or by partial name), so such
arguments are almost always given default values.
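The naming rule can be seen with a small wrapper. shout() is a hypothetical example, not part of the lesson: it forwards everything in `...` to paste() and has a `prefix` argument after the ellipsis, so `prefix` is only set when it is passed by name; an unnamed value is swallowed by `...` instead.

```r
# Hypothetical wrapper: forwards `...` to paste() after a prefix,
# then converts the result to upper case
shout <- function(..., prefix = "NOTE:") {
  toupper(paste(prefix, ...))
}

shout("check", "the", "data")                    # "NOTE: CHECK THE DATA"
shout("check", "the", "data", prefix = "WARN:")  # "WARN: CHECK THE DATA"

# Without the name, "WARN:" ends up in `...` and the default prefix is used
shout("check", "WARN:")                          # "NOTE: CHECK WARN:"
```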
Let's explore how to "unpack" arguments from an ellipsis when you use the ellipsis as an
argument in a function. Below is an example function that is supposed to add two explicitly
named arguments called alpha and beta.
add_alpha_and_beta <- function(...){
  # First we must capture the ellipsis inside of a list
  # and then assign the list to a variable. Let's name this
  # variable `args`.
  args <- list(...)

  # We're now going to assume that there are two named arguments within args
  # with the names `alpha` and `beta`. We can extract named arguments from
  # the args list by using the name of the argument and double brackets. The
  # `args` variable is just a regular list after all!
  alpha <- args[["alpha"]]
  beta <- args[["beta"]]

  # Then we return the sum of alpha and beta.
  alpha + beta
}
add_alpha_and_beta(alpha = 2, beta = 3)   # returns 5
You're familiar with adding, subtracting, multiplying, and dividing numbers in R. To do this you
use the +, -, *, and / symbols. These symbols are called binary operators because they take two
inputs, an input from the left and an input from the right. In R you can define your own binary
operators.
The syntax for creating new binary operators in R is unlike anything else in R, but it allows you
to define a new syntax for your function. I would only recommend making your own binary
operator if you plan on using it often!
User-defined binary operators have the following syntax:
%[whatever]%
where [whatever] represents any valid variable name.
Let's say I wanted to define a binary operator that multiplied two numbers and then added one to
the product. An implementation of that operator is below:
"%mult_add_one%" <- function(left, right){ # Notice the quotation marks!
left * right + 1
}
I could then use this binary operator like `4 %mult_add_one% 5` which would evaluate to 21.
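A short sketch of the same operator defined with backticks, which is an equivalent and common way to name a function whose name contains special characters. Because `*` and `+` are vectorized, the operator also works on whole vectors, recycling the shorter one.

```r
# Same operator as above, named with backticks instead of quotation marks
`%mult_add_one%` <- function(left, right) {
  left * right + 1
}

4 %mult_add_one% 5            # 21
c(1, 2, 3) %mult_add_one% 2   # 3 5 7
```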

Assignments

a) What is a vector in R? Create a numeric vector z, take the square root of z - 1, assign it
to a new variable called my_sqrt, and show the contents of my_sqrt.

b) Try adding c(1, 2, 3, 4) and c(0, 10), explain the output.

c) What happens if the length of the shorter vector does not divide evenly into the
length of the longer vector? Explain with an example.

d) Which functions are used in R for creating directories and files?

e) Which function is used in R for setting working directory?

f) What is the purpose of file.exists() function in R?

g) Create a file named test.R and store its size, mode and uname in a vector.

h) Create a directory in the current working directory called "testdir2" and a subdirectory for
it called "testdir3", all in one command by using dir.create() and file.path().

i) Create a vector of real numbers starting with pi (3.142...) ending with 10 and increasing
in increments of 1.

j) What is the output of seq(0,2,by=0.5)?

k) Create a vector to contain 10 repetitions of the vector (0, 1, 2).

l) Create a vector to contain 10 zeros, then 10 ones, then 10 twos.

m) Join two vectors, each of length 3. Use paste() to join the integer vector 1:3 with the
character vector c("X", "Y", "Z"). This time, use sep = "" to leave no space between
the joined elements.

n) Let x be the vector c(8, 14, 20, 1, NA, 13, 4, 7, 11, 15). What is the output of x[x > 10]? Explain.

o) Subset the 3rd, 5th, and 7th elements of the vector x given above.

p) Subset all elements of vector x given above EXCEPT the 2nd and 10th.

q) Justify: R uses 'one-based indexing'.

r) Create a 4 x 5 matrix containing the values 1 to 20 arranged row-wise.

s) Create a data frame as shown below.

Name Gender Age AI IP KD CC MAD
A    0      21  16 19 24 18 23
B    1      22  19 22 15 16 20
C    0      21  21 17 18 17 19

t) Write a function named my_mean that calculates the mean of a given numeric vector using the
functions sum() and length(). Show the execution of the function with a suitable input.

u) Write a function called "remainder." remainder() will take two arguments: "num" and
"divisor" where "num" is divided by "divisor" and the remainder is returned. Imagine that
you usually want to know the remainder when you divide by 2, so set the default value of
"divisor" to 2. Please be sure that "num" is the first argument and "divisor" is the second
argument. Hint: you can use the modulus operator %% to find the remainder.

v) In R, when does the ordering of the arguments become unimportant?

w) Telegrams used to be peppered with the words START and STOP in order to demarcate
the beginning and end of sentences. Write a function called telegram that formats
sentences for telegrams. For example the expression `telegram("Good", "morning")`
should evaluate to: "START Good morning STOP"

x) The function below will construct a sentence from parts of speech that you provide as
arguments. Most of the function is written, but you'll need to unpack the appropriate
arguments from the ellipses.

mad_libs <- function(...){
  # Do your argument unpacking here!
  ???
  # Don't modify any code below this comment.
  # Notice the variables you'll need to create in order for the code below to
  # be functional!
  paste("News from", place, "today where", adjective,
        "students took to the streets in protest of the new", noun,
        "being installed on campus.")
}
Test your code with
> mad_libs(adjective = "", place = "", noun = "")
[1] "News from  today where  students took to the streets in protest of the new  being installed on campus."
y) Write your own binary operator from absolute scratch! Your binary operator must be
called %p% so that the expression: "Good" %p% "job!" will evaluate to: "Good job!"
References

1. The swirl package for R.
