Data structures
typeof() to determine an object's type; is.character() / is.double() / is.integer() / is.logical() / is.atomic() to test for a specific type
attributes()
is.atomic() / is.list() to test if an object is a vector
list() is used to create a list; they are sometimes called recursive vectors, because a list can contain
other lists. This makes them fundamentally different from atomic vectors.
unlist() to turn a list into an atomic vector.
names(), class(), and dim() to access the three most important vector attributes
setNames() – sets the names on an object and returns the object
c() generalizes to cbind() and rbind() for matrices; abind() for arrays; t() transposes a matrix;
aperm() is the generalized equivalent for arrays.
In R versions before 4.0.0, data.frame()’s default behaviour turns strings into factors; use
stringsAsFactors = FALSE to suppress this behaviour (from R 4.0.0 onward this is already the default)
as.data.frame() is used to coerce an object to a data frame
cbind() and rbind() are used to combine data frames
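The functions listed above can be tried out on a few small objects. This is a minimal sketch using base R only; the object names (x, y, flat, m, arr) are made up for illustration.

```r
# Inspect the type and attributes of a named double vector
x <- c(a = 1, b = 2, c = 3)
typeof(x)        # "double"
is.double(x)     # TRUE
attributes(x)    # a list holding the names attribute

# setNames() sets the names and returns the object in one step
y <- setNames(1:3, c("a", "b", "c"))

# unlist() flattens a (possibly nested) list into an atomic vector
flat <- unlist(list(1, 2, list(3, 4)))

# rbind()/cbind() build matrices from vectors; t() transposes a matrix
m  <- rbind(1:3, 4:6)    # 2 x 3 matrix
tm <- t(m)               # 3 x 2 matrix

# aperm() permutes the dimensions of an array
arr  <- array(1:24, dim = c(2, 3, 4))
parr <- aperm(arr, c(3, 1, 2))   # dimensions become 4 x 2 x 3
```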
Atomic vectors: logical, integer, double (numeric), and character. Two rare types are complex and
raw.
Atomic vectors are created with c(). They are always flat, even if you nest calls to c().
All elements of an atomic vector must be of the same type, whilst the elements of a list can be of
different types.
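A short sketch of the two points above: nested c() calls still give a flat vector, and mixing types in c() coerces everything to one type, whereas list() keeps each element's type. The coercion behaviour is standard R but is not spelled out in these notes; the variable names are illustrative.

```r
# Nesting c() does not create nested structure
nested <- c(1, c(2, c(3, 4)))
identical(nested, c(1, 2, 3, 4))   # TRUE

# Mixed inputs to c() are coerced to a single common type
mixed <- c(1L, 2.5, "a")
typeof(mixed)                      # "character"

# A list keeps each element's own type
l <- list(1L, 2.5, "a")
vapply(l, typeof, character(1))    # "integer" "double" "character"
```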
Lists are different from atomic vectors because their elements can be of any type, including lists.
Lists are used to build up many more complicated data structures, e.g. data frames and linear model
objects are lists.
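To see that lists can nest and that data frames and model objects are lists underneath, something like the following can be run. It assumes the built-in mtcars dataset; the model formula is only an example.

```r
# A list can hold elements of different types, including another list
x <- list(1:3, "a", list(TRUE, 2.5))
str(x)
is.list(x[[3]])    # TRUE: the third element is itself a list

# A data frame is a list of equal-length columns
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
typeof(df)         # "list"

# A fitted linear model is also a list with a class attribute
fit <- lm(mpg ~ wt, data = mtcars)
is.list(fit)       # TRUE
```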
Attributes are used to store metadata about the object – they can be thought of as a named list (with
unique names). They can be accessed individually with attr() or all at once (as a list) with
attributes().
By default, most attributes are lost when modifying a vector.
The only attributes not lost are the most important – names, dimensions, and class
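A small sketch of setting and reading attributes, and of a custom attribute being dropped when the vector is modified. The attribute name "my_attribute" is arbitrary.

```r
y <- 1:10

# Set and read a single attribute with attr()
attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")

# Read all attributes at once, as a list
attributes(y)

# Custom attributes are dropped by subsetting and by summaries
attributes(y[1])      # NULL
attributes(sum(y))    # NULL

# Names, by contrast, survive subsetting
z <- c(a = 1, b = 2)
names(z[1])           # "a"
```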
Factor – a vector that can contain only predefined values, and is used to store categorical variables.
Factors are built on top of integer vectors using two attributes: the class (factor) + the levels (which
define the set of allowed values).
Factors are useful when you know the possible values a variable may take, even if you don’t see
all the values in a given dataset.
Caution: when a data frame is read directly from a file, a column you expected to produce a
numeric vector may instead produce a factor:
This is caused by a non-numeric value in the column, often a missing value encoded in
a special way.
Remedy this by coercing the vector from a factor to a character vector, and
then from a character vector to a double vector.
While factors look like character vectors, they are actually integers. Be careful when treating them
like strings.
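The factor behaviour described above can be sketched as follows. The "." used as a missing-value code is a hypothetical example of a special encoding, not something taken from a real file.

```r
# A factor knows its allowed levels even if some never occur in the data
sex <- factor(c("m", "m", "m"), levels = c("m", "f"))
table(sex)      # "f" is reported with a count of 0
typeof(sex)     # "integer"
class(sex)      # "factor"

# A would-be numeric column that became a factor because of "."
z <- factor(c("12", "1", "9", "."))

# as.numeric() on a factor returns the integer codes, not the values
codes <- as.numeric(z)

# Go via character to recover the numbers; "." becomes NA
# (suppressWarnings() hides the expected coercion warning)
nums <- suppressWarnings(as.numeric(as.character(z)))
nums            # 12 1 9 NA
```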
Array – adding a dim attribute to an atomic vector allows it to behave like a multi-dimensional
array.
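For instance, assigning to dim() turns a plain vector into a matrix, and array() does the same in one call. A minimal sketch:

```r
# Start with a plain integer vector
v <- 1:6

# Adding a dim attribute makes it behave as a 2 x 3 matrix
dim(v) <- c(2, 3)
is.matrix(v)    # TRUE
v[2, 3]         # 6 (values are filled column by column)

# array() sets the dim attribute directly, here for three dimensions
a <- array(1:12, dim = c(2, 3, 2))
dim(a)          # 2 3 2
```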
When coercing with as.data.frame():
A list will create one column for each element; it’s an error if they are not all the same
length.
A matrix will create a data frame with the same number of columns and rows as the matrix.
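A quick sketch of these coercion rules with as.data.frame(); the column names and values are illustrative, and try() is used only so the failing call does not stop the script.

```r
# A list gives one column per element
from_list <- as.data.frame(list(x = 1:3, y = c("a", "b", "c")))
dim(from_list)      # 3 2

# A matrix keeps its shape: 2 rows and 3 columns
from_matrix <- as.data.frame(matrix(1:6, nrow = 2))
dim(from_matrix)    # 2 3

# Elements of length 3 and 2 cannot be combined into columns
res <- try(as.data.frame(list(x = 1:3, y = 1:2)), silent = TRUE)
inherits(res, "try-error")   # TRUE
```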
When combining data frames row-wise, both the number and the names of columns must match –
use plyr::rbind.fill() to combine data frames that don’t have the same columns
When combining data frames column-wise, the number of rows must match, but row names are
ignored.
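The row-wise and column-wise rules can be illustrated roughly as below. The data frames are invented, and the plyr call only runs if that package happens to be installed.

```r
df1 <- data.frame(x = 1:2, y = c("a", "b"))
df2 <- data.frame(x = 3L, y = "c")

# Row-wise: column names and count match, so this works
rows <- rbind(df1, df2)

# Column-wise: the new column has one value per existing row
cols <- cbind(df1, z = c(TRUE, FALSE))

# Row-wise with different column names fails in base R
df3 <- data.frame(x = 4L, w = "d")
mismatch <- try(rbind(df1, df3), silent = TRUE)
inherits(mismatch, "try-error")   # TRUE

# plyr::rbind.fill() fills the missing columns with NA instead
if (requireNamespace("plyr", quietly = TRUE)) {
  filled <- plyr::rbind.fill(df1, df3)
}
```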
Subsetting
Lists – subsetting a list works in the same way as subsetting an atomic vector: [ will always
return a list, while [[ and $ let you pull out the components of the list.
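The difference between [ and [[ / $ shows up clearly on a small named list (the list contents are illustrative):

```r
lst <- list(a = 1:3, b = "text", c = list(TRUE))

# [ returns a list, here a list of length one
sub <- lst["a"]
is.list(sub)          # TRUE

# [[ and $ return the component itself
lst[["a"]]            # the integer vector 1 2 3
lst$b                 # "text"
```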
Subsetting higher-dimensional structures: a) with multiple vectors; b) with a single vector; c) with
a matrix.
The most common way of subsetting matrices (2d) and arrays (>2d) is a simple generalization of
the 1d setting: you supply a 1d index for each dimension, separated by a comma. Blank subsetting
is now useful because it lets you keep all rows or all columns.
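The three ways of subsetting a higher-dimensional structure can be sketched on a 3 x 3 matrix; the matrix and its column names are made up for the example.

```r
a <- matrix(1:9, nrow = 3)
colnames(a) <- c("A", "B", "C")

# a) Multiple vectors: one index per dimension; blank keeps everything
first_rows <- a[1:2, ]          # rows 1 and 2, all columns
some_cols  <- a[, c("A", "C")]  # all rows, columns A and C

# b) A single vector: the matrix is treated as a 1d vector (column-major)
diag_vals <- a[c(1, 5, 9)]      # 1 5 9

# c) A matrix: each row of the index matrix gives one (row, column) position
idx <- rbind(c(1, 1), c(3, 2))
picked <- a[idx]                # a[1, 1] and a[3, 2], i.e. 1 and 6
```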
Vocabulary
Style guide
Functions