Вы находитесь на странице: 1из 5

ADVANCED R

Data structures
typeof() [e.g., is.character() / is.double() / is.integer() / is.logical() / is.atomic()]
attributes()
is.atomic() / is.list() to test if an object is a vector
list() is used to create a list; they are sometimes called recursive vectors, because a list can contain
other lists. This makes them fundamentally different from atomic vectors.
unlist() to turn a list into an atomic vector.
names(), class(), and dim() to access vector
setNames() – sets the names on an object and returns the object
c() generalizes to cbind() and rbind() for matrices; abind() for arrays; t() transposes a matrix;
aperm() is the generalized equivalent for arrays.
data.frame()’s default behaviour turns strings into factors; use stringAsFactors = FALSE to
suppress this behaviour
as.data.frame() is used to coerce an object to a data frame
cbind() and rbind() are used to combine data frames

Atomic vectors: logical, integer, double (numeric), and character. Two rare types are complex and
raw.
Atomic vectors are created with c(). They are always flat, even if you nest c()’s/
All elements of an atomic vector must be of the same type, whilst the elements of a list can be of
different types.
Lists are different from atomic vectors because their elements can be of any time, including lists.
Lists are used to build up many more complicated data structures, e.g. data frames and linear model
objects are lists.

Attributes are used to store metadata about the object – they can be thought of as a named list (with
unique names). They can be accessed individually with attr() or all at once (as a list) with
attributes().
By default, most attributes are lost when modifying a vector.

The only attributes not lost are the most important – names, dimensions, and class
Factor – a vector that can contain only predefined values, and is used to store categorical variables.
Factors are built on top of integer vectors using two attributes: the class (factor) + the levels (which
defines the set of allowed values).
Factors are useful when you know the possible values a variable may take, even if you don’t see
all the values in a given dataset.

!!!! When a data frame is read directly from a file, a column you’d thought would produce a
numeric vector instead produces a factor :
 This is cased by a non-numeric value in the column, often a missing value encoded in
a special way
 Remedy this situation by coercing the vector from a factor to a character vector , and
then from a character to a double vector

While factors look like character vectors, they are actually integers. Be careful when treating them
like strings.

Array – adding a dim() attribute to an atomic vector allows is to behave like a multi-dimensional
array.

A data frame is a list of equal-length vectors. This makes it a two-dimensional structure, so it


shares properties of both the matrix and the list.
A data frame has: names(), colnames(), rownames(); the length() of a data frame is the length of
the underlying list and so is the same as ncol(); nrow() gives the number of rows.
A data frame is an S3 class - its type reflects the underlying vector used to build it – the list

To check if an object is a data frame, use class() or test with is.data.frame()

 A vector will create a one-column data frame

 A list will create one column for each element; it’s an error if they are not all the same
length

 A matrix will create a data frame with the same number of columns and rows

When combining data frames row-wise, both the number and the names of columns must match –
use plyr::rbind.fill() to combine data frames that don’t have the same columns

When combining data frames column-wise, the number of rows must match, but row names are
ignored.
Subsetting

Three subsetting operators


Six types of subsetting
Important differences in behaviour for different objects (e.g., vectors, lists, factors,
matrices, and data frames)
The use of subsetting in conjunction with assignment

[ is the most commonly used operator


[[ and $ are the two other main subsetting operators
setNames()
vector[order(vector)]
order()

Positive integers return elements at the specified positions


Negative integers omit elements at the specified positions
Logical vectors select elements where the corresponding logical value is TRUE.
If the logical vector is shorter than the vector being subsetted, it will be recycled to be the same
length
A missing value in the index always yields a missing value in the output
Nothing returns the original vector
Zero returns a zero-length vector; this is not useful for vectors, but is very useful for matrices, data
frames, and arrays; it can also be very useful in conjunction with assignment.
Zero returns a zero-length vector; this can be useful for generating test data
If the vector is named, you can use character vectors to return elements with matching names

Lists – subsetting a list works in the same way as subsetting an atomic vector: using [ will always
return a list; using [[ and $ let you pull out the components of the list.

Subsetting higher-dimensional structures: a) with multiple vectors; b) with a single vector; c) with
a matrix.

The most common way of subsetting matrices (2d) and arrays (>2d) is a simple generalization of
the 1d setting: you supply a 1d index for each dimension, separated by a comma. Blank subsetting
is now useful because it lets you keep all rows and all columns.

Vocabulary
Style guide
Functions

Вам также может понравиться