Name:_________________________ Year/Program:______________________
Instructions
Answer each problem to the best of your ability. Read the instructions for every section in full. Justify or explain
your answers when appropriate. Partial credit will be given for answers that are partially correct. Points will
be deducted for incorrect statements even if all other parts of your answer are correct.
All data tables can be found at the end of the exam. You may remove that page from the exam for your
convenience when referencing them.
No notes are permitted for this exam. Use of computers, smartphones, or other unauthorized aids while
taking the exam is strictly prohibited.
You may ask the instructor or a TA to display the help/documentation page of any function from base R or
tidyverse packages.
Honor statement
I promise I will not cheat on this exam. I will neither give nor receive any unauthorized assistance. I will not
share information about the exam with anyone who may be taking it at a different time. I have not been told
anything about the exam by someone who has taken it earlier.
Signature:________________________ Date:___________________________
Give the names of 3 to 5 people in this class (including yourself) who will be your project group members.
Part    Points Possible    Points Received
A       68
B       36
C       96
Total   200
Part A
This section is multiple choice. For each question, circle the best answer.
4. (4 pts) What kind of plot can be used to visualize the relationship between two continuous
variables?
a. geom = “bar”, stat = “count”
b. geom = “bar”, stat = “bin”
c. geom = “point”, stat = “identity”
d. geom = “point”, stat = “count”
e. None of the above
5. (4 pts) What kind of plot can be used to visualize the distribution of a single categorical
variable?
a. geom = “bar”, stat = “count”
b. geom = “bar”, stat = “bin”
c. geom = “point”, stat = “identity”
d. geom = “point”, stat = “count”
e. None of the above
6. (4 pts) What kind of plot can be used to visualize the distribution of a single continuous
variable?
a. geom = “bar”, stat = “count”
b. geom = “bar”, stat = “bin”
c. geom = “point”, stat = “identity”
d. geom = “point”, stat = “count”
e. None of the above
9. (4 pts) Consider two continuous variables A and B and a categorical variable C. How can
we visualize the relationship between all three?
a. A scatter plot of A and B, mapping C to “color”
b. A scatter plot of A and C, mapping B to “size”
c. A bar plot of A and B, mapping C to “color”
d. A bar plot of A and C, mapping B to “size”
e. All of the above
10. (4 pts) Consider two categorical variables A and B and a continuous variable C. How can
we visualize the relationship between all three?
a. ggplot(data) + geom_boxplot(aes(x=A, y=C)) + facet_wrap(~B)
b. ggplot(data) + geom_boxplot(aes(x=B, y=C)) + facet_wrap(~A)
c. ggplot(data) + geom_histogram(aes(x=C)) + facet_grid(A~B)
d. ggplot(data) + geom_freqpoly(aes(x=C, color=A)) + facet_wrap(~B)
e. All of the above
12. (4 pts) What is true of each aesthetic mapped to a variable in a plot?
a. It must be a continuous variable
b. It must be a categorical variable
c. It has a corresponding scale
d. It has a corresponding coordinate system
e. It has a corresponding facet specification
17. (4 pts) What is NOT true of working with relational database management systems such
as MySQL or SQLite from R?
a. All computations take place in R
b. The data does not need to fit into memory
c. Multiple tables of data can be accessed from disk
d. All of the above
e. None of the above
Part B
In this section, provide the primary key and any foreign keys for each table in the set of relational data tables
(found at the end of this exam).
If a table does not have a given key, put “none”. For any foreign keys you list, also give the name of the table
for which it is a primary key.
titles:
• Primary key =
• Foreign key(s) =
sales:
• Primary key =
• Foreign key(s) =
inventory:
• Primary key =
• Foreign key(s) =
orders:
• Primary key =
• Foreign key(s) =
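A common way to test whether a column, or a set of columns, can serve as a primary key is to count how often each value occurs and look for any count above 1. The sketch below applies that check with dplyr's `count()` and `filter()`. The `titles` and `orders` tables were transcribed by hand from the appendices, keeping only the columns the check needs for `titles`. The variable names (`dup_titles`, `dup_order_items`, `orders_rows_unique`) are illustrative and not part of the exam. The sketch assumes the dplyr and tibble packages are installed.

```r
library(dplyr)
library(tibble)

# title and author columns transcribed from the titles table (Appendix A)
titles <- tribble(
  ~title,                  ~author,
  "The House on the Hill", "jsmith",
  "The Blue Diary",        "jsmith",
  "Dragon Eaters",         "adory",
  "Silent Wizards",        "adory",
  "Forbidden Alchemy",     "adory",
  "My Father's Piano",     "shu",
  "Blueberry Pastures",    "jsmith2",
  "Sudden Confinement",    "jsmith2",
  "Rubi Saves the World",  "lortiz"
)

# orders table transcribed from Appendix B
orders <- tribble(
  ~order_id, ~customer_id, ~item_id, ~date,
  1001,      2,            7,        "10/3/18",
  1001,      2,            7,        "10/3/18",
  1001,      2,            11,       "10/3/18",
  1004,      4,            8,        "10/3/18",
  1022,      5,            113,      "10/6/18",
  1022,      5,            8,        "10/6/18",
  1103,      1,            226,      "10/6/18",
  1103,      1,            213,      "10/6/18",
  1268,      5,            226,      "10/8/19",
  1299,      4,            7,        "10/8/18"
)

# A candidate key is unique when no value (or combination) occurs twice
dup_titles <- titles %>%
  count(title) %>%
  filter(n > 1)

dup_order_items <- orders %>%
  count(order_id, item_id) %>%
  filter(n > 1)

# distinct() drops exact duplicate rows; if it removes any, then no
# combination of columns can uniquely identify every row
orders_rows_unique <- nrow(distinct(orders)) == nrow(orders)

print(dup_titles)
print(dup_order_items)
print(orders_rows_unique)
```

The same `count()` check applies to any candidate key in the other tables.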
Part C
In this section, provide a pseudocode strategy (using relational data concepts such as group_by(),
summarise(), joins such as left_join() and right_join(), etc.) for solving each problem.
All datasets referenced can be found at the end of the exam. You do not need to account for missing data or
other special cases. You do not need to calculate anything.
20. (12 pts) Ms. Nelson and Ms. Paige are the two agents for a certain literary agency described
in Appendix A. How many books of each genre do they each, separately, represent?
21. (12 pts) What is the average word count for books of each genre?
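One possible translation of the pseudocode strategy for these two problems into dplyr is sketched below. It assumes the dplyr and tibble packages are installed, and that `titles.author` matches `clients.cid` as the appendix data suggests. The tables were transcribed by hand from Appendix A, and the result names are hypothetical.

```r
library(dplyr)
library(tibble)

# clients and titles from Appendix A, transcribed by hand
clients <- tribble(
  ~cid,      ~first_name, ~last_name, ~sign_date,   ~agent,
  "jsmith",  "Jane",      "Smith",    "2001-03-04", "Nelson",
  "adory",   "April",     "Dory",     "2001-03-04", "Paige",
  "shu",     "Simon",     "Hu",       "2003-01-29", "Paige",
  "jsmith2", "Jane",      "Smith",    "2006-11-09", "Nelson",
  "lortiz",  "Lorena",    "Ortiz",    "2010-09-26", "Nelson"
)

titles <- tribble(
  ~title,                  ~author,   ~genre,         ~word_count,
  "The House on the Hill", "jsmith",  "contemporary", 106789,
  "The Blue Diary",        "jsmith",  "contemporary", 95019,
  "Dragon Eaters",         "adory",   "fantasy",      135501,
  "Silent Wizards",        "adory",   "fantasy",      126038,
  "Forbidden Alchemy",     "adory",   "fantasy",      111666,
  "My Father's Piano",     "shu",     "memoir",       101365,
  "Blueberry Pastures",    "jsmith2", "contemporary", 95019,
  "Sudden Confinement",    "jsmith2", "horror",       95134,
  "Rubi Saves the World",  "lortiz",  "young adult",  76045
)

# Problem 20: join each title to its author's client record to pick up
# the agent, then tally books per (agent, genre) pair
books_per_agent_genre <- titles %>%
  left_join(clients, by = c("author" = "cid")) %>%
  count(agent, genre)

# Problem 21: only the titles table is needed; group by genre and
# average the word counts within each group
avg_words_by_genre <- titles %>%
  group_by(genre) %>%
  summarise(mean_word_count = mean(word_count))

print(books_per_agent_genre)
print(avg_words_by_genre)
```

Here `count(agent, genre)` is shorthand for `group_by(agent, genre)` followed by `summarise(n = n())`.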
22. (12 pts) What is the total sum of money in advances (as given by the advance column in
the sales table) for deals made by each agent?
23. (12 pts) Rank the literary genres by the total amount of money made in advances on sales
of domestic first print rights.
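A sketch of one way to express these two strategies in dplyr follows. It chains two joins: sales to titles on `title`, then titles to clients on `author`/`cid`. The tables were transcribed from Appendix A, keeping only the columns used here. The packages (dplyr, tibble) and the result names are assumptions of this sketch.

```r
library(dplyr)
library(tibble)

# Appendix A tables, transcribed by hand; only the columns used are kept
clients <- tribble(
  ~cid,      ~agent,
  "jsmith",  "Nelson",
  "adory",   "Paige",
  "shu",     "Paige",
  "jsmith2", "Nelson",
  "lortiz",  "Nelson"
)

titles <- tribble(
  ~title,                  ~author,   ~genre,
  "The House on the Hill", "jsmith",  "contemporary",
  "The Blue Diary",        "jsmith",  "contemporary",
  "Dragon Eaters",         "adory",   "fantasy",
  "Silent Wizards",        "adory",   "fantasy",
  "Forbidden Alchemy",     "adory",   "fantasy",
  "My Father's Piano",     "shu",     "memoir",
  "Blueberry Pastures",    "jsmith2", "contemporary",
  "Sudden Confinement",    "jsmith2", "horror",
  "Rubi Saves the World",  "lortiz",  "young adult"
)

sales <- tribble(
  ~title,                  ~rights,                ~advance, ~royalty,
  "The House on the Hill", "domestic first print", 15000,    0.125,
  "Dragon Eaters",         "domestic first print", 12000,    0.1,
  "Dragon Eaters",         "foreign markets",      5000,     0.05,
  "Dragon Eaters",         "audio",                4000,     0.075,
  "Blueberry Pastures",    "domestic first print", 15000,    0.125,
  "My Father's Piano",     "domestic first print", 14500,    0.1,
  "My Father's Piano",     "foreign markets",      14500,    0.1,
  "Rubi Saves the World",  "domestic first print", 13500,    0.11,
  "Rubi Saves the World",  "audio",                6000,     0.06
)

# Problem 22: sales -> titles (by title) -> clients (author = cid),
# then total the advances for each agent
advances_by_agent <- sales %>%
  left_join(titles, by = "title") %>%
  left_join(clients, by = c("author" = "cid")) %>%
  group_by(agent) %>%
  summarise(total_advance = sum(advance))

# Problem 23: keep only domestic first print deals, attach each genre,
# total per genre, and sort from largest to smallest
genre_ranking <- sales %>%
  filter(rights == "domestic first print") %>%
  left_join(titles, by = "title") %>%
  group_by(genre) %>%
  summarise(total_advance = sum(advance)) %>%
  arrange(desc(total_advance))

print(advances_by_agent)
print(genre_ranking)
```

Genres with no domestic first print sale (horror, in this data) do not appear in the ranking.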
24. (12 pts) Customer and transaction information for a certain online vendor is given in
Appendix B. Calculate the total price of each order in the orders table.
25. (12 pts) Calculate the average price of items from each department.
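Below is a minimal dplyr sketch of these two strategies. It makes two assumptions. First, each row of `orders` is one unit of an item, so the duplicated row in order 1001 counts as two chairs. Second, the prices are the rounded values tibble printed in Appendix B (for example, `2200.` is entered as 2200), so the totals are approximate. The tables were transcribed by hand with only the needed columns kept. The result names are hypothetical.

```r
library(dplyr)
library(tibble)

# Appendix B tables, transcribed by hand; prices are the printed
# (rounded) values, so results are approximate
inventory <- tribble(
  ~item_id, ~department,      ~price,
  7,        "Furniture",      60.9,
  8,        "Electronics",    2200,
  11,       "Furniture",      111,
  13,       "Toys and Games", 1000,
  113,      "Tools",          5.76,
  213,      "Furniture",      161,
  226,      "Toys and Games", 5.76
)

orders <- tribble(
  ~order_id, ~item_id,
  1001,      7,
  1001,      7,
  1001,      11,
  1004,      8,
  1022,      113,
  1022,      8,
  1103,      226,
  1103,      213,
  1268,      226,
  1299,      7
)

# Problem 24: look up each ordered item's price, then sum per order
order_totals <- orders %>%
  left_join(inventory, by = "item_id") %>%
  group_by(order_id) %>%
  summarise(total_price = sum(price))

# Problem 25: inventory alone suffices; average price per department
avg_price_by_dept <- inventory %>%
  group_by(department) %>%
  summarise(mean_price = mean(price))

print(order_totals)
print(avg_price_by_dept)
```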
26. (12 pts) Calculate the total amount of revenue for each department across all orders.
27. (12 pts) Create a new table with all items that have not yet been ordered by anyone.
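One possible dplyr sketch for these two problems follows. Revenue per department joins orders to inventory and sums prices. Finding never-ordered items is a filtering join: `anti_join()` keeps the inventory rows with no matching `item_id` in `orders`. As in the other sketches, the following are assumptions: each order row is one unit, prices are the rounded printed values, and the transcribed tables keep only the columns used. The names are illustrative.

```r
library(dplyr)
library(tibble)

# Appendix B tables, transcribed by hand (rounded prices, subset of columns)
inventory <- tribble(
  ~item_id, ~description,         ~department,      ~price,
  7,        "Black wood chair",   "Furniture",      60.9,
  8,        "XE Laptop computer", "Electronics",    2200,
  11,       "Sandalwood desk",    "Furniture",      111,
  13,       "Shiny thing",        "Toys and Games", 1000,
  113,      "Mini screwdriver",   "Tools",          5.76,
  213,      "Black wood chair",   "Furniture",      161,
  226,      "Deck playing cards", "Toys and Games", 5.76
)

orders <- tribble(
  ~order_id, ~item_id,
  1001,      7,
  1001,      7,
  1001,      11,
  1004,      8,
  1022,      113,
  1022,      8,
  1103,      226,
  1103,      213,
  1268,      226,
  1299,      7
)

# Problem 26: price every ordered unit, then total by department
revenue_by_dept <- orders %>%
  left_join(inventory, by = "item_id") %>%
  group_by(department) %>%
  summarise(revenue = sum(price))

# Problem 27: inventory rows whose item_id never appears in orders
never_ordered <- inventory %>%
  anti_join(orders, by = "item_id")

print(revenue_by_dept)
print(never_ordered)
```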
Appendix A
The following three data tables describe the authors, book titles, and book sales managed by a certain literary
agency:
clients
## # A tibble: 5 x 5
##   cid     first_name last_name sign_date  agent
##   <chr>   <chr>      <chr>     <chr>      <chr>
## 1 jsmith  Jane       Smith     2001-03-04 Nelson
## 2 adory   April      Dory      2001-03-04 Paige
## 3 shu     Simon      Hu        2003-01-29 Paige
## 4 jsmith2 Jane       Smith     2006-11-09 Nelson
## 5 lortiz  Lorena     Ortiz     2010-09-26 Nelson
titles
## # A tibble: 9 x 4
##   title                 author  genre        word_count
##   <chr>                 <chr>   <chr>             <dbl>
## 1 The House on the Hill jsmith  contemporary     106789
## 2 The Blue Diary        jsmith  contemporary      95019
## 3 Dragon Eaters         adory   fantasy          135501
## 4 Silent Wizards        adory   fantasy          126038
## 5 Forbidden Alchemy     adory   fantasy          111666
## 6 My Father's Piano     shu     memoir           101365
## 7 Blueberry Pastures    jsmith2 contemporary      95019
## 8 Sudden Confinement    jsmith2 horror            95134
## 9 Rubi Saves the World  lortiz  young adult       76045
sales
## # A tibble: 9 x 4
##   title                 rights               advance royalty
##   <chr>                 <chr>                  <dbl>   <dbl>
## 1 The House on the Hill domestic first print   15000   0.125
## 2 Dragon Eaters         domestic first print   12000   0.1
## 3 Dragon Eaters         foreign markets         5000   0.05
## 4 Dragon Eaters         audio                   4000   0.075
## 5 Blueberry Pastures    domestic first print   15000   0.125
## 6 My Father's Piano     domestic first print   14500   0.1
## 7 My Father's Piano     foreign markets        14500   0.1
## 8 Rubi Saves the World  domestic first print   13500   0.11
## 9 Rubi Saves the World  audio                   6000   0.06
Appendix B
The following three data tables describe the customer information, inventory items, and online orders for a
certain vendor:
customers
## # A tibble: 7 x 3
##   customer_id name           email
##         <dbl> <chr>          <chr>
## 1           1 John Smith     john@thedude.com
## 2           2 Kelly Shay     kt598@h0tmail.com
## 3           3 Simone Arnold  coolchick99@geemail.com
## 4           4 Denise Sanchez dsanchez@outlooook.org
## 5           5 Shirley Grace  noreply@somedomain.edu
## 6           6 John Smith     jsmith@harvrad.org
## 7           7 Aiden Shu      noreply@somedomain.edu
inventory
## # A tibble: 7 x 4
##   item_id description        department       price
##     <dbl> <chr>              <chr>            <dbl>
## 1       7 Black wood chair   Furniture        60.9
## 2       8 XE Laptop computer Electronics    2200.
## 3      11 Sandalwood desk    Furniture       111.
## 4      13 Shiny thing        Toys and Games 1000.
## 5     113 Mini screwdriver   Tools             5.76
## 6     213 Black wood chair   Furniture       161.
## 7     226 Deck playing cards Toys and Games    5.76
orders
## # A tibble: 10 x 4
##    order_id customer_id item_id date
##       <dbl>       <dbl>   <dbl> <chr>
##  1     1001           2       7 10/3/18
##  2     1001           2       7 10/3/18
##  3     1001           2      11 10/3/18
##  4     1004           4       8 10/3/18
##  5     1022           5     113 10/6/18
##  6     1022           5       8 10/6/18
##  7     1103           1     226 10/6/18
##  8     1103           1     213 10/6/18
##  9     1268           5     226 10/8/19
## 10     1299           4       7 10/8/18