Belajar RS

textToPrint <- "this is some text to print"
# our old friend print()
print(textToPrint)
# the nchar() function tells you the number of characters in a variable
nchar(textToPrint)
# the c() function combines all its arguments into a single vector (here, a character vector with three elements)
c(textToPrint, textToPrint, textToPrint)
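Because c() gives you a vector of separate strings rather than one joined-up string, it helps to compare it with paste(). A minimal sketch; the strings are made up for illustration:

```r
# made-up strings, just for illustration
# c() keeps each argument as its own element of a character vector
threeStrings <- c("this", "is", "text")
length(threeStrings)   # 3 separate elements

# paste() joins its arguments into ONE string, separated by `sep`
oneString <- paste("this", "is", "text", sep = " ")
oneString              # "this is text"

# collapse = joins the elements of a single vector into one string
joined <- paste(threeStrings, collapse = "-")
joined                 # "this-is-text"
```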
# we can check the data type of a variable using the function str() (like "structure")
str(textToPrint)
# we can tell this is a character because its structure is "chr"
# let's create some numeric variables
hoursPerDay <- 24
daysPerWeek <- 7
# we can check to make sure that these actually are numeric
class(hoursPerDay)
class(daysPerWeek)
# since this is numeric data, we can do math with it!
# "*" is the symbol for multiplication
hoursPerWeek <- hoursPerDay * daysPerWeek
hoursPerWeek
# Important! Just because something is a *number* doesn't mean R thinks it's numeric!
a <- 5
b <- "6"
# this will give you the error "non-numeric argument to binary operator", because b isn't
# numeric, even though it looks like a number!
a*b
# You can change character data to numeric data using the as.numeric() function.
# This will let you do math with it again. :)
a * as.numeric(b)
# check out the structure: b itself is still "chr", but as.numeric(b) is "num"
str(b)
str(as.numeric(b))
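A minimal sketch of what as.numeric() does with text that does and doesn't look like a number. The value "six" is made up to show the failure case:

```r
b <- "6"
is.numeric(b)              # FALSE: b is still text
bNumber <- as.numeric(b)
is.numeric(bNumber)        # TRUE
5 * bNumber                # 30

# made-up example: text that can't be read as a number turns into NA
# (R also gives a warning, which suppressWarnings() hides here)
notANumber <- suppressWarnings(as.numeric("six"))
is.na(notANumber)          # TRUE
```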
# to make b a number permanently:
# b <- as.numeric(b)
# let's make a vector!
listOfNumbers <- c(1,5,91,42.8,100008.41)
listOfNumbers
# because this is a numeric vector, we can do math on it! When you do math on a vector,
# it happens to every number in the vector. (If you're familiar with matrix
# multiplication, multiplying by 5 is the same thing as multiplying a 1x1 matrix by a 1xN matrix.)
# multiply every number in the vector by 5
5 * listOfNumbers
# add one to every number in the vector
listOfNumbers + 1
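Math between two vectors works the same way: R pairs up the elements position by position. A minimal sketch with made-up hours and rates (if the vectors have different lengths, R recycles the shorter one, with a warning when the longer length isn't a multiple of the shorter):

```r
# made-up numbers, just for illustration
hoursWorked <- c(8, 7, 9)
hourlyRate  <- c(10, 12, 11)
# math between two vectors of the same length pairs up the elements
dailyPay <- hoursWorked * hourlyRate
dailyPay            # 80, 84, 99
# summary functions work on the whole vector at once
sum(dailyPay)       # 263
mean(hoursWorked)   # 8
```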
# get the first item from "listOfNumbers"
listOfNumbers[1]
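Square brackets can also take several positions, a negative position, or a TRUE/FALSE condition. A short sketch using the same numbers as listOfNumbers; the negative index is the same trick used later to drop the first row of a data frame:

```r
listOfNumbers <- c(1, 5, 91, 42.8, 100008.41)
# several positions at once
listOfNumbers[c(1, 3)]                # 1 and 91
# a negative index drops that position instead of selecting it
listOfNumbers[-1]                     # 5, 91, 42.8 and 100008.41
# a logical condition keeps only the elements where it is TRUE
listOfNumbers[listOfNumbers > 50]     # 91 and 100008.41
# R counts from 1, so the last element is at position length()
listOfNumbers[length(listOfNumbers)]  # 100008.41
```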
# read_csv() comes from the tidyverse, so load it first
library(tidyverse)
chocolateData <- read_csv("../input/chocolate-bar-ratings/flavors_of_cacao.csv")
# REPLACED WITH:
# some of our column names have spaces in them. This line changes the column names to
# versions without spaces, which lets us talk about the columns by their names.
names(chocolateData) <- make.names(names(chocolateData), unique=TRUE)
# the head() function shows just the first few rows of a data frame.
head(chocolateData)
# the tail() function shows just the last few rows.
# we can also give both functions a specific number of rows to show.
# This line will show the last three rows of "chocolateData".
tail(chocolateData, 3)
# get the contents of the cell in the sixth row and the fourth column
chocolateData[6,4]
# general form: dataframe[row, column]
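A minimal sketch of the dataframe[row, column] pattern on a small made-up data frame (the names toyData, bar and rating are hypothetical). One caveat: read_csv() returns a tibble, and for a tibble [6, 4] gives back a 1x1 tibble rather than a bare value; the base data.frame used here returns the value itself.

```r
# a small made-up data frame standing in for chocolateData
toyData <- data.frame(
  bar    = c("Bar A", "Bar B", "Bar C"),
  rating = c(3.5, 2.75, 4)
)
toyData[2, 2]          # one cell: 2.75
toyData[2, ]           # leave the column blank to get the whole second row
toyData[, "rating"]    # leave the row blank to get the whole rating column
toyData$rating         # the same column, selected by name with $
```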
# Before we get going, let's get rid of the white spaces in the column names of this
# dataset. This will make it possible for us to refer to columns by their names, since
# a name containing white space can't be typed after $ without wrapping it in backticks.
names(chocolateData) <- gsub("[[:space:]+]", "_", names(chocolateData))
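To see what the two name-cleaning approaches in these notes do, here is a sketch on made-up column names that contain a line break and spaces. It also shows why the make.names() version produces names like Cocoa.Percent while the gsub() version produces Cocoa_Percent.

```r
# hypothetical column names containing a line break and spaces
messyNames <- c("Review\nDate", "Cocoa\nPercent", "Broad Bean Origin", "Rating")

# [[:space:]+] matches any white space character (or a literal "+"); each one becomes "_"
underscoreNames <- gsub("[[:space:]+]", "_", messyNames)
underscoreNames   # "Review_Date" "Cocoa_Percent" "Broad_Bean_Origin" "Rating"

# make.names() turns every character that isn't allowed in an R name into "."
dotNames <- make.names(messyNames, unique = TRUE)
dotNames          # "Review.Date" "Cocoa.Percent" "Broad.Bean.Origin" "Rating"
```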
str(chocolateData)
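# sapply() applies typeof() to every column and returns the type of each one
sapply(my.data, typeof)
# example output for a data frame called my.data:
#         y        x1        x2        X3
#  "double" "integer" "logical" "integer"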
#print the first few values from the column named "Rating" in the dataframe "chocolateData"
head(chocolateData$Rating)
The tidyverse has functions that can fix column types for us. One of them is type_convert(), which looks at each
column that is stored as text, guesses what data type that column should be, and then converts it into that data type.
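A minimal sketch of type_convert() on a small made-up data frame where every column was read in as text. It assumes the readr package (part of the tidyverse) is installed; the column values are invented.

```r
library(readr)

# made-up data where every column is stored as text (character)
rawData <- data.frame(
  Rating        = c("3.75", "2.5", "4"),
  Cocoa_Percent = c("70", "63", "72"),
  Company       = c("Maker A", "Maker B", "Maker C"),
  stringsAsFactors = FALSE
)
str(rawData)                         # all three columns are chr

# type_convert() guesses a type for each character column and prints what it chose
cleanData <- type_convert(rawData)
str(cleanData)                       # Rating and Cocoa_Percent are now num; Company stays chr
```

Columns that really are text, like Company here, are left as characters.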
# remove all the percent signs in the fifth column. You don't really need to worry about
# all the different things that are happening in this line right now.
chocolateData$Cocoa_Percent <- sapply(chocolateData$Cocoa_Percent, function(x) gsub("%", "", x))
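gsub() already works on every element of a vector, so the sapply() wrapper isn't strictly needed. A sketch on made-up values in the same "70%" format; note that sapply() over a character vector also names each result after its input:

```r
cocoa <- c("70%", "63%", "72.5%")    # made-up values in the same format

# fixed = TRUE treats "%" as plain text rather than as a regular expression
withoutPercent <- gsub("%", "", cocoa, fixed = TRUE)
withoutPercent                       # "70" "63" "72.5"
as.numeric(withoutPercent)           # 70, 63 and 72.5

# sapply() gives the same values, but names each result after the original string
viaSapply <- sapply(cocoa, function(x) gsub("%", "", x))
names(viaSapply)                     # "70%" "63%" "72.5%"
```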

# remove the first row of the dataset using a negative index
chocolateData <- chocolateData[-1,]
# make sure we removed the row we didn't want

head(chocolateData)
# try the type_convert() function again
chocolateData <- type_convert(chocolateData)
# check the structure to make sure Cocoa_Percent is now a number ("num") rather than text ("chr")
str(chocolateData)
# https://www.kaggle.com/ericson61/getting-started-in-r-summarize-data/edit
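# splitting a variable: strsplit() breaks each string at the "%" sign
# (example from another dataset: the RATE column of a data frame called LOAN)
strsplit(LOAN$RATE,"%")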
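strsplit() always returns a list, with one character vector per input string. A sketch with made-up rate strings (LOAN itself isn't defined in these notes):

```r
rates <- c("12.5%", "7%")                      # made-up example values
pieces <- strsplit(rates, "%", fixed = TRUE)
pieces                                         # a list: one character vector per input string

# a match at the very end doesn't leave an empty piece, so each list element
# holds just the number; unlist() flattens the list into one vector
numbers <- as.numeric(unlist(pieces))
numbers                                        # 12.5 and 7
```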
coklat = type_convert(coklat)
# Parsed with column specification:
# cols(
#   Cocoa.Percent = col_character()
# )
coklat$Cocoa.Percent = sapply(coklat$Cocoa.Percent, function(x) gsub("%", "", x))
coklat = type_convert(coklat)
# Parsed with column specification:
# cols(
#   Cocoa.Percent = col_double()
# )
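# apply mean() to every column (funs() is deprecated in current dplyr, so pass mean directly;
# text columns give NA with a warning)
summarise_all(chocolateData, mean)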
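A minimal sketch of averaging only the numeric columns, so text columns don't turn into NA. It assumes dplyr 1.0.0 or later (for across() and where()); the data frame is made up.

```r
library(dplyr)

# made-up data with two numeric columns and one text column
toyData <- data.frame(
  Rating        = c(3, 4, 3.5),
  Cocoa_Percent = c(70, 80, 75),
  Company       = c("Maker A", "Maker B", "Maker C")
)
# take the mean of only the numeric columns
columnMeans <- toyData %>% summarise(across(where(is.numeric), mean))
columnMeans   # Rating 3.5, Cocoa_Percent 75
```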
## Summarizing a specific variable
# return a data_frame with the mean and sd of the Rating column, from the chocolate
# dataset in it
chocolateData %>%
summarise(averageRating = mean(Rating),
sdRating = sd(Rating))
The functions we used above give you an overview of the entire dataset, but often you're only
interested in one or two variables. We can look at specific variables really easily using the summarise()
function and pipes (%>%). Pipes are part of the Tidyverse package we loaded at the beginning: if you try to use
them without loading the package, you'll get an error.
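The pipe passes whatever is on its left in as the first argument of the function on its right. A minimal sketch with made-up ratings, assuming dplyr is installed:

```r
library(dplyr)

ratings <- data.frame(Rating = c(3, 4, 3.5))   # made-up ratings

# with a pipe: ratings becomes the first argument of summarise()
piped  <- ratings %>% summarise(averageRating = mean(Rating))
# the same call written without a pipe
nested <- summarise(ratings, averageRating = mean(Rating))
identical(piped, nested)                       # TRUE
```

Pipes mostly pay off when several steps are chained, as in the group_by() example that follows.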
## Summarizing by group
chocolateData %>%
group_by(Review_Date) %>%
summarise(averageRating = mean(Rating),
sdRating = sd(Rating))
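A minimal sketch of the same group_by() + summarise() pattern on made-up data, with n() added to count how many ratings fall in each year. Keep in mind that a year with only one rating gets NA for its standard deviation.

```r
library(dplyr)

# made-up ratings for two review years
toyRatings <- data.frame(
  Review_Date = c(2006, 2006, 2007, 2007, 2007),
  Rating      = c(3, 4, 2.5, 3.5, 3)
)
byYear <- toyRatings %>%
  group_by(Review_Date) %>%
  summarise(averageRating = mean(Rating),
            sdRating      = sd(Rating),
            nRatings      = n())   # n() counts the rows in each group
byYear   # 2006: mean 3.5, 2 ratings; 2007: mean 3, 3 ratings
```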
# tell R that we're going to use the tidyverse library
library(tidyverse)
# read in our dataset as a data_frame
chocolateData <- read_csv("…")
# remove the first line of our dataset using a negative index
chocolateData <- chocolateData[-1,]
# remove the white spaces in the column names
names(chocolateData) <- gsub("[[:space:]+]", "_", names(chocolateData))
# remove percentage signs in the Cocoa_Percent column
chocolateData$Cocoa_Percent <- sapply(chocolateData$Cocoa_Percent, function(x) gsub("%", "", x))
# convert the columns that now look like numbers (such as Cocoa_Percent) into numbers
chocolateData <- type_convert(chocolateData)
# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Which days did you make money on roulette?
selection_vector <- roulette_vector > 0
# Select from roulette_vector these days
roulette_winning_days <- roulette_vector[selection_vector]
print(roulette_winning_days)
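A logical vector like selection_vector can also be counted and summed directly, because TRUE counts as 1 and FALSE as 0. A short sketch that rebuilds roulette_vector so it runs on its own:

```r
roulette_vector <- c(-24, -50, 100, -350, 10)
names(roulette_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
selection_vector <- roulette_vector > 0

sum(selection_vector)                   # 2 winning days
which(selection_vector)                 # positions of the TRUEs: Wednesday 3, Friday 5
sum(roulette_vector[selection_vector])  # 110 won on the winning days
sum(roulette_vector)                    # -314 for the whole week
```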
# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

# Construct matrix
star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), nrow = 3, byrow = TRUE)
# Vectors region and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
# Name the columns with region
colnames(star_wars_matrix) = region
# Name the rows with titles
rownames(star_wars_matrix) = titles
# Print out star_wars_matrix
star_wars_matrix
# The worldwide box office figures
worldwide_vector <- rowSums(star_wars_matrix)
worldwide_vector
# Bind the new variable worldwide_vector as a column to star_wars_matrix
all_wars_matrix <- cbind(star_wars_matrix,worldwide_vector)
all_wars_matrix
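rowSums() has a partner, colSums(), which adds up each column instead; here that gives the total per region. A sketch that rebuilds the matrix (setting the names through dimnames =, which does the same job as rownames() and colnames()) and also picks one cell by name:

```r
new_hope       <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi    <- c(309.306, 165.8)
star_wars_matrix <- matrix(
  c(new_hope, empire_strikes, return_jedi), nrow = 3, byrow = TRUE,
  dimnames = list(c("A New Hope", "The Empire Strikes Back", "Return of the Jedi"),
                  c("US", "non-US"))
)
# colSums() adds up each column: total box office per region (in millions)
colSums(star_wars_matrix)               # US 1060.779, non-US 728.1
# rows and columns can be selected by name as well as by number
star_wars_matrix["A New Hope", "US"]    # 460.998
```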
# Create speed_vector
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
# Convert speed_vector to an ordered factor (slow < medium < fast)
factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "medium", "fast"))
# Print factor_speed_vector
factor_speed_vector
summary(factor_speed_vector)
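Because the factor is ordered, comparisons follow the level order instead of the alphabet. A short sketch that rebuilds factor_speed_vector so it runs on its own:

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

# comparisons use the level order slow < medium < fast
factor_speed_vector[2] < factor_speed_vector[5]   # TRUE: slow < fast
factor_speed_vector > "slow"      # FALSE only for the two "slow" entries
table(factor_speed_vector)        # slow 2, medium 2, fast 1
```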
# Create a data frame from the vectors name, type, diameter, rotation and rings
# (these vectors must already exist in your workspace)
planets_df <- data.frame(name,type,diameter,rotation,rings)
# Use order() to create positions
positions <- order(planets_df$diameter)
# Use positions to sort planets_df
planets_df[positions,]
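order() returns row numbers, not sorted values, and those row numbers are then used as the row index. A sketch on a small made-up stand-in for planets_df (the names and diameters are hypothetical):

```r
# a made-up stand-in for planets_df
toy_planets <- data.frame(name     = c("Planet A", "Planet B", "Planet C"),
                          diameter = c(3.2, 0.5, 11.2),
                          stringsAsFactors = FALSE)
positions <- order(toy_planets$diameter)
positions                   # 2 1 3: row numbers from smallest to largest diameter
toy_planets[positions, ]    # rows sorted by diameter, smallest first

# decreasing = TRUE sorts largest first
largestFirst <- toy_planets[order(toy_planets$diameter, decreasing = TRUE), "name"]
largestFirst                # "Planet C" "Planet A" "Planet B"
```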
# draw a blank plot with "Review_Date" as the x axis and "Rating" as the y axis.
ggplot(chocolateData, aes(x= Review_Date, y = Rating))
# list of the available geoms: http://ggplot2.tidyverse.org/reference/#section-layer-geoms
# draw a plot with "Review_Date" as the x axis and "Rating" as the y axis, and add a point for each data point
ggplot(chocolateData, aes(x= Review_Date, y = Rating)) + geom_point()
# draw a plot with "Review_Date" as the x axis and "Rating" as the y axis, add a point for each data
# point, move each point slightly so they don't overlap and add a smoothed line (lm = linear model)
# (geom_jitter() already draws the points, so adding geom_point() too would draw every point twice)
ggplot(chocolateData, aes(x= Review_Date, y = Rating)) +
  geom_jitter() +
  geom_smooth(method = 'lm')
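By default geom_jitter() moves points both sideways and up or down (by up to 40% of the data's resolution), which slightly changes the plotted ratings themselves. A minimal sketch on made-up data that limits the movement to the x direction, assuming ggplot2 is installed:

```r
library(ggplot2)

# made-up ratings with some exact overlaps
toyRatings <- data.frame(Review_Date = c(2006, 2006, 2007, 2007, 2008),
                         Rating      = c(3, 3, 3.5, 3.5, 4))
jitterPlot <- ggplot(toyRatings, aes(x = Review_Date, y = Rating)) +
  # width/height set the maximum shift in data units; height = 0 keeps ratings exact
  geom_jitter(width = 0.2, height = 0) +
  geom_smooth(method = "lm", formula = y ~ x)
jitterPlot   # printing the object draws the plot
```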
# save our plot to a variable with an informative name
chocolateRatingByReviewDate <- ggplot(chocolateData, aes(x= Review_Date, y = Rating, color = Cocoa_Percent)) +
  geom_jitter() +
  geom_smooth(method = 'lm')
# save our plot
ggsave("chocolateRatingByReviewDate.png", # the name of the file where it will be saved
plot = chocolateRatingByReviewDate, # what plot to save
height=6, width=10, units="in") # the size of the plot & units of the size
# notice that this cell doesn't have any output in place! That's because in the first section we're
# giving the plot a name rather than printing it, and in the second we're saving our plot rather
# than printing it. We've never actually said to print our plot at any point.
# Return the average rating for each year a rating was given
averageRatingByYear <- chocolateData %>%
group_by(Review_Date) %>%
summarise(averageRating = mean(Rating))
# plot only the average rating by year
ggplot(averageRatingByYear, aes(y= averageRating, x = Review_Date )) +
geom_point() + # plot individual points
geom_line() # plot line
