print(textToPrint)
nchar(textToPrint)
# we can check the data type of a variable using the function str() (like "structure")
str(anExampleOfCharacters)
hoursPerDay <- 24
daysPerWeek <- 7
class(hoursPerDay)
class(daysPerWeek)
hoursPerWeek <- hoursPerDay * daysPerWeek
hoursPerWeek
# Important! Just because something is a *number* doesn't mean R thinks it's numeric!
a <- 5
b <- "6"
# this will get you the error "non-numeric argument to binary operator", because b isn't numeric
a*b
# You can change character data to numeric data using the as.numeric() function.
a * as.numeric(b)
# check out the structure: note that b changes from "chr" to "num"
str(b)
str(as.numeric(b))
# b <- as.numeric(b)
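A minimal sketch of the conversion described above. The names `product` and `notANumber` are illustrative and not part of the original notes. It shows that as.numeric() turns a string of digits into a number, while text that cannot be read as a number becomes NA (with a warning, suppressed here).

```r
a <- 5
b <- "6"

# a * b would stop with "non-numeric argument to binary operator",
# so convert b first
product <- a * as.numeric(b)
print(product)

# text that does not look like a number becomes NA (R also emits a warning)
notANumber <- suppressWarnings(as.numeric("six"))
print(is.na(notANumber))
```

Note that as.numeric(b) returns a new value and leaves b itself unchanged; to keep the converted value you would assign it back, as in the commented-out line above.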
listOfNumbers
# because this is a numeric vector, we can do math on it! When you do math on a vector,
# it happens to every number in the vector. (If you're familiar with matrix
# multiplication, it's the same thing as multiplying a 1x1 matrix by a 1xN matrix.)
5 * listOfNumbers
# add one to every number in the vector
listOfNumbers + 1
listOfNumbers[1]
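The vector operations above can be tried out with a small stand-in vector. The values below are hypothetical, since the original contents of listOfNumbers are not shown in these notes, and the names `timesFive`, `plusOne` and `firstItem` are illustrative.

```r
# hypothetical values; the original vector's contents are not shown here
listOfNumbers <- c(1, 5, 91, 42.8)

timesFive <- 5 * listOfNumbers   # every element is multiplied by 5
plusOne <- listOfNumbers + 1     # every element is increased by 1
firstItem <- listOfNumbers[1]    # R indexes from 1, so this is the first element

print(timesFive)
print(plusOne)
print(firstItem)
```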
\ REPLACED WITH /
# some of our column names have spaces in them. This line changes the column names to
# versions without spaces, which lets us talk about the columns by their names.
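The line that does the renaming is not shown in these notes. One possible way to do it, sketched on a hypothetical data frame (`toyData` and its values are made up), is to replace each run of whitespace with an underscore using gsub(). Underscores are an assumption here, chosen because later snippets refer to a column called Review_Date.

```r
# hypothetical data frame whose column names contain spaces;
# check.names = FALSE stops data.frame() from altering the names itself
toyData <- data.frame("Review Date" = c(2016, 2015),
                      "Cocoa Percent" = c("63%", "70%"),
                      Rating = c(3.75, 2.75),
                      check.names = FALSE)

# replace every run of whitespace in the column names with an underscore
names(toyData) <- gsub("[[:space:]]+", "_", names(toyData))
print(names(toyData))

# the columns can now be reached with the $ operator
print(toyData$Review_Date)
```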
# the head() function shows just the first few rows of a data frame.
head(chocolateData)
# the tail() function shows just the last few rows of a data frame (here, the last 3).
tail(chocolateData, 3)
# get the contents of the cell in the sixth row and the fourth column
chocolateData[6,4]
# general form: dataframe[row, column]
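A small sketch of the [row, column] indexing pattern, using a hypothetical data frame (`toyData` and its contents are made up for illustration). Leaving the row or the column position empty selects everything along that dimension.

```r
# hypothetical data frame with three rows and two columns
toyData <- data.frame(name = c("a", "b", "c"),
                      score = c(10, 20, 30))

oneCell <- toyData[2, 2]       # second row, second column
wholeRow <- toyData[3, ]       # third row, all columns
wholeColumn <- toyData[, 2]    # all rows, second column

print(oneCell)
print(wholeRow)
print(wholeColumn)
```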
# Before we get going, let's get rid of the white spaces in the column names of this
# dataset. This will make it possible for us to refer to columns by their names.
str(chocolateData)
sapply(my.data, typeof)
        y        x1        x2        X3
 "double" "integer" "logical" "integer"
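The contents of my.data are not shown in these notes, so the data frame below is a hypothetical reconstruction that happens to give the same types. It also illustrates why a column can report "integer" without holding integers: typeof() reports how R stores the values internally, and a factor is stored as integer codes, whereas class() reports how R treats the column.

```r
# hypothetical data frame; X3 is a factor, which R stores as integer codes
my.data <- data.frame(y = c(0.5, 1.2, -0.3),
                      x1 = 1:3,
                      x2 = c(TRUE, FALSE, TRUE),
                      X3 = factor(c("a", "b", "c")))

# internal storage type of each column
columnTypes <- sapply(my.data, typeof)
print(columnTypes)

# class of each column, which is usually the more informative view
columnClasses <- sapply(my.data, class)
print(columnClasses)
```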
# print the first few values from the column named "Rating" in the dataframe "chocolateData"
head(chocolateData$Rating)
One of them is type_convert(), which will look at the first 1000 rows of each character column, guess what
the data type of that column should be, and then convert that column into that data type.
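A minimal sketch of type_convert() from the readr package, assuming readr (or the tidyverse) is installed. The data frame `toyData` and its values are hypothetical. Every column starts out as character, and type_convert() re-guesses a type for each one.

```r
library(readr)

# hypothetical data: every column starts out as character
toyData <- data.frame(Rating = c("3.75", "2.75"),
                      Review_Date = c("2016", "2015"),
                      Company = c("Maker A", "Maker B"),
                      stringsAsFactors = FALSE)

# guess a better type for each character column and convert it
converted <- type_convert(toyData)

# Rating and Review_Date should now be numeric; Company stays character
str(converted)
```

A column such as "63%" would stay character, because the percent sign stops it from being read as a number. That is why such characters need to be removed before converting.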
# remove all the percent signs in the fifth column. You don't really need to worry about
# all the different things that are happening in this line right now.
str(chocolateData)
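The line that removes the percent signs is not shown in these notes. One possible approach, sketched on a hypothetical character vector (`cocoaPercent` is a made-up name with made-up values), is to strip the sign with gsub() and then convert the result with as.numeric().

```r
# hypothetical percentages stored as text
cocoaPercent <- c("63%", "70%", "75%")

# fixed = TRUE treats "%" as a literal character rather than a regular expression
withoutSign <- gsub("%", "", cocoaPercent, fixed = TRUE)

# with the sign gone, the text can be converted to numbers
cocoaNumeric <- as.numeric(withoutSign)
print(cocoaNumeric)
```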
https://www.kaggle.com/ericson61/getting-started-in-r-summarize-data/edit
splitting a variable
strsplit(LOAN$RATE,"%")
# return a data frame with the mean and sd of the Rating column from the chocolate
# dataset
chocolateData %>%
    summarise(averageRating = mean(Rating),
              sdRating = sd(Rating))
The functions we used above give you an overview of the entire dataset, but often you're only
interested in one or two variables. We can look at specific variables really easily using the summarise()
function and pipes. Pipes (%>%) come with the tidyverse package we loaded at the beginning: if you try to
use them without loading the package first, you'll get an error.
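A small sketch of summarise() with and without a pipe, assuming the dplyr package (part of the tidyverse) is installed. The data frame `toyRatings` and its values are hypothetical. The pipe passes whatever is on its left in as the first argument of the function on its right, so both forms give the same result.

```r
library(dplyr)

# hypothetical ratings data
toyRatings <- data.frame(Review_Date = c(2015, 2015, 2016, 2016),
                         Rating = c(3.0, 3.5, 2.5, 3.5))

# with a pipe: toyRatings becomes the first argument of summarise()
ratingSummary <- toyRatings %>%
    summarise(averageRating = mean(Rating),
              sdRating = sd(Rating))
print(ratingSummary)

# the same call written without a pipe
sameSummary <- summarise(toyRatings,
                         averageRating = mean(Rating),
                         sdRating = sd(Rating))
print(sameSummary)
```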
GROUP
chocolateData %>%
    group_by(Review_Date) %>%
    summarise(averageRating = mean(Rating),
              sdRating = sd(Rating))
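To see what group_by() changes, here is a sketch on a hypothetical data frame (`toyRatings` is made up; dplyr is assumed to be installed). Once the data is grouped, summarise() returns one row per group instead of a single row for the whole dataset.

```r
library(dplyr)

# hypothetical ratings data with two review years
toyRatings <- data.frame(Review_Date = c(2015, 2015, 2016, 2016),
                         Rating = c(3.0, 3.5, 2.5, 3.5))

# one summary row per value of Review_Date
ratingByYear <- toyRatings %>%
    group_by(Review_Date) %>%
    summarise(averageRating = mean(Rating),
              sdRating = sd(Rating))
print(ratingByYear)
```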
# tell R that we're going to use the tidyverse library
library(tidyverse)
print(roulette_winning_days)
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
colnames(star_wars_matrix) <- region
rownames(star_wars_matrix) <- titles
star_wars_matrix
worldwide_vector
all_wars_matrix
# Create speed_vector
# Print factor_speed_vector
factor_speed_vector
summary(factor_speed_vector)
planets_df[positions,]
# draw a blank plot with "Review_Date" as the x axis and "Rating" as the y axis.
http://ggplot2.tidyverse.org/reference/#section-layer-geoms
# draw a plot with "Review_Date" as the x axis and "Rating" as the y axis, and add a point for each
# data point
# draw a plot with "Review_Date" as the x axis and "Rating" as the y axis, add a point for each data
# point, move each point slightly so they don't overlap and add a smoothed line (lm = linear model)
ggplot(chocolateData, aes(x = Review_Date, y = Rating)) +
    geom_point() +
    geom_jitter() +
    geom_smooth(method = 'lm')
geom_point() +
geom_jitter() +
geom_smooth(method = 'lm')
height=6, width=10, units="in") # the size of the plot & units of the size
# notice that this cell doesn't have any output in place! That's because in the first section we're
# giving the plot a name rather than printing it, and in the second we're saving our plot rather
# than printing it. We've never actually said to print our plot at any point.
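The name-save-print distinction described above can be sketched as follows, assuming the ggplot2 package is installed. The data frame `toyRatings`, the plot name `toyPlot` and the temporary file are all hypothetical stand-ins, not the names used in the original code.

```r
library(ggplot2)

# hypothetical ratings data
toyRatings <- data.frame(Review_Date = c(2014, 2014, 2015, 2015, 2016, 2016),
                         Rating = c(3.0, 3.25, 2.75, 3.5, 3.0, 3.75))

# giving the plot a name stores it; nothing is drawn yet
toyPlot <- ggplot(toyRatings, aes(x = Review_Date, y = Rating)) +
    geom_point() +
    geom_jitter() +
    geom_smooth(method = 'lm')

# saving writes the plot to a file; it is still not shown on screen
plotFile <- tempfile(fileext = ".png")
ggsave(plotFile, plot = toyPlot,
       height = 6, width = 10, units = "in")

# the plot only appears once we explicitly print it
print(toyPlot)
```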
# Return the average rating by the year a rating was given
chocolateData %>%
    group_by(Review_Date) %>%
    summarise(averageRating = mean(Rating))