
SQL Practice Questions ANSWER KEY

Module 1 Practice Questions:
a. Using the cdc hive month-ASIN level table
vn5018r.P1M_Amz_Top_100K_All_Exclusions_consolidated, identify the
reporting_level_4 with the highest number of ASINs over all the months combined. (An ASIN
is an Amazon item, and reporting_level_4 is the subcategory that contains the ASINs.
The full hierarchy is: reporting_level_4 is subcategory, reporting_level_3 is category,
reporting_level_2 is department, reporting_level_1 is super department and
reporting_level_0 is division.)

ANSWER:
select reporting_level_4,
count(distinct asin) as no_of_asins
from vn5018r.P1M_Amz_Top_100K_All_Exclusions_consolidated
where reporting_level_4 is not null
and reporting_level_4 != ''
and upper(trim(reporting_level_4)) != 'NULL'
group by reporting_level_4
order by count(distinct asin) desc
limit 1
b. Identify the month for which the reporting_level_4 identified above has the highest
number of ASINs

ANSWER:
select wm_month_name,
reporting_level_4,
count(distinct asin) as no_of_asins
from vn5018r.P1M_Amz_Top_100K_All_Exclusions_consolidated
where reporting_level_4 in
(
select reporting_level_4
from vn5018r.P1M_Amz_Top_100K_All_Exclusions_consolidated
where reporting_level_4 is not null
and reporting_level_4 != ''
and upper(trim(reporting_level_4)) != 'NULL'
group by reporting_level_4
order by count(distinct asin) desc
limit 1
)
group by
wm_month_name,
reporting_level_4
order by count(distinct asin) desc
limit 1

c. Using the same table, for each month, identify the reporting_level_0 that has the highest
number of reporting_level_4s under it (refer to the Module 2 topics, CTEs and window
functions, to solve this question). The output table should be at month-reporting_level_0
level and contain, for each month, the reporting_level_0 with the highest number of
reporting_level_4s.

ANSWER:
with month_l0_level_counts as
(
select wm_month_name,
reporting_level_0,
count(distinct reporting_level_4) as no_of_L4s
from vn5018r.P1M_Amz_Top_100K_All_Exclusions_consolidated
group by
wm_month_name,
reporting_level_0
),
ranking_l0s_within_month as
(
select wm_month_name,
reporting_level_0,
no_of_L4s,
row_number() over(partition by wm_month_name order by no_of_L4s desc) as l0_rank
from month_l0_level_counts
)
select
wm_month_name,
reporting_level_0,
no_of_L4s,
l0_rank
from ranking_l0s_within_month
where l0_rank = 1
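
One point worth checking in this answer is what happens when two divisions tie for the highest count in a month: row_number() keeps exactly one of them (which one is not defined), while rank() keeps both. The sketch below illustrates this with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The three data rows and the helper name top_l0s are made up for illustration and do not come from the CDC table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table month_l0_level_counts "
    "(wm_month_name text, reporting_level_0 text, no_of_L4s integer)"
)
# Hypothetical counts: Home and Apparel tie for first place in Jan.
conn.executemany(
    "insert into month_l0_level_counts values (?, ?, ?)",
    [("Jan", "Home", 12), ("Jan", "Apparel", 12), ("Jan", "Food", 7)],
)


def top_l0s(window_fn):
    """Return the reporting_level_0 values ranked first by the given window function."""
    sql = f"""
        select reporting_level_0
        from (
            select reporting_level_0,
                   {window_fn}() over (partition by wm_month_name
                                       order by no_of_L4s desc) as l0_rank
            from month_l0_level_counts
        )
        where l0_rank = 1
        order by reporting_level_0
    """
    return [row[0] for row in conn.execute(sql)]


with_row_number = top_l0s("row_number")  # one row, chosen arbitrarily among the ties
with_rank = top_l0s("rank")              # both tied divisions
print(with_row_number, with_rank)
```

If the tied divisions should all be reported, rank() (or dense_rank()) is the safer choice; if exactly one row per month is required, row_number() is appropriate.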

d. Below is the student-topic level table with students and their grades in different topics.
Convert this 1NF table to 3NF

UnitID  StudentID  Date      TutorID  Topic  Room  Grade  Book       TutEmail
U1      St1        23.02.03  Tut1     GMT    629   4.7    Deumlich   Tut1@fhbb.ch
U2      St1        18.11.02  Tut3     Gin    631   5.1    Zehnder    Tut3@fhbb.ch
U1      St4        23.02.03  Tut1     GMT    629   4.3    Deumlich   Tut1@fhbb.ch
U5      St2        05.05.03  Tut3     PhF    632   4.9    Dummlers   Tut3@fhbb.ch
U4      St2        04.07.03  Tut5     AVQ    621   5.0    SwissTopo  Tut5@fhbb.ch

ANSWER:
2NF TABLES

In the student-topic level table above, the columns Room, Book and Date depend only
on Topic, so they are only partially dependent on the primary key (StudentID, Topic).
To remove this partial dependency, these columns can be moved to a separate table
keyed by Topic, which reduces the original table to 2NF.

Topic Date Room Book
GMT 23.02.03 629 Deumlich
Gin 18.11.02 631 Zehnder
PhF 05.05.03 632 Dummlers
AVQ 04.07.03 621 SwissTopo

UnitID  StudentID  TutorID  Topic  Grade  TutEmail
U1      St1        Tut1     GMT    4.7    Tut1@fhbb.ch
U2      St1        Tut3     Gin    5.1    Tut3@fhbb.ch
U1      St4        Tut1     GMT    4.3    Tut1@fhbb.ch
U5      St2        Tut3     PhF    4.9    Tut3@fhbb.ch
U4      St2        Tut5     AVQ    5.0    Tut5@fhbb.ch

3NF TABLES

The 2NF table above has a column TutEmail that does not depend directly on the
primary key (StudentID, Topic); it depends on TutorID, which is a non-key column.
To remove this transitive dependency (a non-key column that depends on another
non-key column rather than on the primary key), the TutEmail column can be moved
to another table keyed by TutorID.
Below are the three 3NF tables:

Topic  Date      Room  Book
GMT    23.02.03  629   Deumlich
Gin    18.11.02  631   Zehnder
PhF    05.05.03  632   Dummlers
AVQ    04.07.03  621   SwissTopo

UnitID  StudentID  TutorID  Topic  Grade
U1      St1        Tut1     GMT    4.7
U2      St1        Tut3     Gin    5.1
U1      St4        Tut1     GMT    4.3
U5      St2        Tut3     PhF    4.9
U4      St2        Tut5     AVQ    5.0

TutorID  TutEmail
Tut1     Tut1@fhbb.ch
Tut3     Tut3@fhbb.ch
Tut5     Tut5@fhbb.ch
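
A useful sanity check on a decomposition like this is that joining the 3NF tables back together reproduces the original 1NF rows exactly (a lossless decomposition). The sketch below does that with Python's built-in sqlite3 module. The table names topics, tutors and grades and the lower-case column names are assumptions chosen for the example; the data rows are the ones from the question.

```python
import sqlite3

# Original 1NF rows: UnitID, StudentID, Date, TutorID, Topic, Room, Grade, Book, TutEmail
original = [
    ("U1", "St1", "23.02.03", "Tut1", "GMT", 629, 4.7, "Deumlich", "Tut1@fhbb.ch"),
    ("U2", "St1", "18.11.02", "Tut3", "Gin", 631, 5.1, "Zehnder", "Tut3@fhbb.ch"),
    ("U1", "St4", "23.02.03", "Tut1", "GMT", 629, 4.3, "Deumlich", "Tut1@fhbb.ch"),
    ("U5", "St2", "05.05.03", "Tut3", "PhF", 632, 4.9, "Dummlers", "Tut3@fhbb.ch"),
    ("U4", "St2", "04.07.03", "Tut5", "AVQ", 621, 5.0, "SwissTopo", "Tut5@fhbb.ch"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table topics (topic text primary key, date text, room integer, book text);
    create table tutors (tutor_id text primary key, tut_email text);
    create table grades (
        unit_id text,
        student_id text,
        tutor_id text references tutors(tutor_id),
        topic text references topics(topic),
        grade real,
        primary key (student_id, topic)
    );
""")

# Split every 1NF row across the three 3NF tables; repeated topics and tutors are stored once.
for unit, student, date, tutor, topic, room, grade, book, email in original:
    conn.execute("insert or ignore into topics values (?, ?, ?, ?)", (topic, date, room, book))
    conn.execute("insert or ignore into tutors values (?, ?)", (tutor, email))
    conn.execute("insert into grades values (?, ?, ?, ?, ?)", (unit, student, tutor, topic, grade))

# Join the three tables back into the original column order.
rebuilt = conn.execute("""
    select g.unit_id, g.student_id, t.date, g.tutor_id, g.topic,
           t.room, g.grade, t.book, tu.tut_email
    from grades g
    join topics t on g.topic = t.topic
    join tutors tu on g.tutor_id = tu.tutor_id
""").fetchall()

lossless = sorted(rebuilt) == sorted(original)
topic_count = conn.execute("select count(*) from topics").fetchone()[0]
tutor_count = conn.execute("select count(*) from tutors").fetchone()[0]
print(lossless, topic_count, tutor_count)
```

The topic and tutor details are now stored once each (four topics, three tutors) instead of being repeated on every grade row.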

e. Write a SQL statement to list the order number, purchase amount, customer name and city
for those orders whose purchase amount is between 500 and 2000. Use the two tables below and
illustrate the output table.
orders: tableA
Order_no Purch_amt Ord_date Customer_id Salesman_id
70001 150.5 2012-10-05 3005 5002
70009 270.65 2012-09-10 3001 5005
70002 65.26 2012-10-05 3002 5001
70004 110.5 2012-08-17 3009 5003
70007 948.5 2012-09-10 3005 5002
70005 2400.6 2012-07-27 3007 5001
70008 5760 2012-09-10 3002 5001
70010 1983.43 2012-10-10 3004 5006
70003 2480.4 2012-10-10 3009 5003
70012 250.45 2012-06-27 3008 5002
70011 75.29 2012-08-17 3003 5007
70013 3045.6 2012-04-25 3006 5001

Customers: tableB
Customer_id Cust_name City Grade Salesman_id
3002 Nick Rimando New York 100 5001
3005 Graham Zusi California 200 5002
3001 Brad Guzan London 300 5005
3004 Fabian Johns Paris 300 5006
3007 Brad Davis New York 200 5001
3009 Geoff Camero Berlin 100 5003
3008 Julian Green London 300 5002
3003 Jozy Altidor Moscow 200 5007

ANSWER:
select orders.order_no,
orders.purch_amt,
cust.cust_name,
cust.city
from tableA orders
left join tableB cust
on orders.customer_id = cust.customer_id
where orders.purch_amt between 500 and 2000;

Output Table
Order_no Purch_amt cust_name city
70007 948.5 Graham Zusi California
70010 1983.43 Fabian Johns Paris
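
The output table can be reproduced by loading the two sample tables into an in-memory database. The sketch below uses Python's built-in sqlite3 module rather than Hive, keeps only the columns the query needs, and adds an order by so the result order is deterministic; those simplifications are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tableA (order_no integer, purch_amt real, customer_id integer);
    create table tableB (customer_id integer, cust_name text, city text);
""")
# Orders (tableA), trimmed to the columns used by the query.
conn.executemany("insert into tableA values (?, ?, ?)", [
    (70001, 150.5, 3005), (70009, 270.65, 3001), (70002, 65.26, 3002),
    (70004, 110.5, 3009), (70007, 948.5, 3005), (70005, 2400.6, 3007),
    (70008, 5760, 3002), (70010, 1983.43, 3004), (70003, 2480.4, 3009),
    (70012, 250.45, 3008), (70011, 75.29, 3003), (70013, 3045.6, 3006),
])
# Customers (tableB), trimmed to the columns used by the query.
conn.executemany("insert into tableB values (?, ?, ?)", [
    (3002, "Nick Rimando", "New York"), (3005, "Graham Zusi", "California"),
    (3001, "Brad Guzan", "London"), (3004, "Fabian Johns", "Paris"),
    (3007, "Brad Davis", "New York"), (3009, "Geoff Camero", "Berlin"),
    (3008, "Julian Green", "London"), (3003, "Jozy Altidor", "Moscow"),
])

rows = conn.execute("""
    select orders.order_no, orders.purch_amt, cust.cust_name, cust.city
    from tableA orders
    left join tableB cust
        on orders.customer_id = cust.customer_id
    where orders.purch_amt between 500 and 2000
    order by orders.order_no
""").fetchall()

for row in rows:
    print(row)
```

Note that between is inclusive on both ends, so an order of exactly 500 or 2000 would also be returned.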

f. Using the table vn5018r.deliver_date_item_output_format_dec18, calculate the deliver it
percentage as sum(total_deliver_it_units)/sum(total_units). Calculate the percentage for each
reporting_level_1 category at month-reporting_level_0-reporting_level_1 level.

ANSWER:
select wm_month_name,
reporting_level_0,
reporting_level_1,
sum(total_deliver_it_units)/sum(total_units) as deliver_perc
from vn5018r.deliver_date_item_output_format_dec18
group by
wm_month_name,
reporting_level_0,
reporting_level_1
order by
wm_month_name,
reporting_level_0,
reporting_level_1

g. Write a query in SQL to display those employees whose first name contains the letter 'z',
along with their department and city, using the tables below. Also illustrate the output table.

Departments: tableA
Department_ID Department_Name Location_ID
10 Administration 1700
20 Marketing 1800
30 Purchasing 1700
40 Human Resources 2400
50 Shipping 1500
60 IT 1400
70 Public Relations 2700

Employees: tableB
Employee_ID First_Name Department_ID
100 Zack 10
101 Zohan 10
102 Jim 20
103 Jill 30
104 Jejo 30
105 Zaakir 40
106 Yacob 50

Locations: tableC
Location_ID City
1700 Venice
1800 Rome
1900 Tokyo
2000 London
2100 New York
2200 Paris
2300 Beijing

ANSWER:
Select emp.employee_id,
emp.first_name,
dept.department_name,
location.city
from tableB emp
left join tableA dept
on emp.department_id = dept.department_id
left join tableC location
on dept.location_id = location.location_id
-- lower() makes the match case-insensitive, so names starting with 'Z' also qualify
where lower(emp.first_name) like '%z%'

Output Table

Employee_ID  First_Name  Department_name  city
100          Zack        Administration   Venice
101          Zohan       Administration   Venice
105          Zaakir      Human Resources  NULL
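
The third output row has no city because Human Resources points to location 2400, which is not present in the Locations table; the left join keeps the employee anyway. The sketch below reproduces this with Python's built-in sqlite3 module and contrasts it with an inner join on the last table. The {join_type} placeholder is an illustration device, not part of the answer, and lower() is used so the match on 'z' does not depend on the engine's case rules for like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tableA (department_id integer, department_name text, location_id integer);
    create table tableB (employee_id integer, first_name text, department_id integer);
    create table tableC (location_id integer, city text);
""")
conn.executemany("insert into tableA values (?, ?, ?)", [
    (10, "Administration", 1700), (20, "Marketing", 1800), (30, "Purchasing", 1700),
    (40, "Human Resources", 2400), (50, "Shipping", 1500), (60, "IT", 1400),
    (70, "Public Relations", 2700),
])
conn.executemany("insert into tableB values (?, ?, ?)", [
    (100, "Zack", 10), (101, "Zohan", 10), (102, "Jim", 20), (103, "Jill", 30),
    (104, "Jejo", 30), (105, "Zaakir", 40), (106, "Yacob", 50),
])
conn.executemany("insert into tableC values (?, ?)", [
    (1700, "Venice"), (1800, "Rome"), (1900, "Tokyo"), (2000, "London"),
    (2100, "New York"), (2200, "Paris"), (2300, "Beijing"),
])

query = """
    select emp.employee_id, emp.first_name, dept.department_name, location.city
    from tableB emp
    left join tableA dept
        on emp.department_id = dept.department_id
    {join_type} join tableC location
        on dept.location_id = location.location_id
    where lower(emp.first_name) like '%z%'
    order by emp.employee_id
"""

with_left_join = conn.execute(query.format(join_type="left")).fetchall()
with_inner_join = conn.execute(query.format(join_type="inner")).fetchall()
print(with_left_join)
print(with_inner_join)
```

With the inner join, Zaakir disappears from the result, which is why the answer uses left joins throughout.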

h. Convert the item level (ASIN level) table vn5018r.p1m_100k_final_dec18 to date-asin level by
using the dates present in the table vn5018r.evaluation_dates_dec18 such that every asin is
present across every date. Save the result into another table.

ANSWER:
Create table date_asin_level_table stored as ORC as
select item_level.*,
dates.calendar_date
from vn5018r.p1m_100k_final_dec18 item_level
cross join vn5018r.evaluation_dates_dec18 dates

i. For all the date – asin combinations (date-asin level table) created in the above table obtain the
instock and publish flags and have_it flag from the date-item level flags table
vn5018r.top_100K_instock_published_dec18 and create a 1/0 flag column called
not_in_catelogue to tag all the rows that do not obtain any flag from the flags table. The join key
would be catlg_item_id & calendar_date.

ANSWER:
Create table date_asin_level_table_w_flags stored as ORC as
select date_asin.*,
flags.instock,
flags.published,
flags.have_it_sku,
case when flags.have_it_sku is null then 1 else 0 end as not_in_catelogue
from date_asin_level_table date_asin
left join vn5018r.top_100K_instock_published_dec18 flags
on date_asin.catlg_item_id = flags.catlg_item_id
and date_asin.calendar_date = flags.calendar_date
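
The cross join followed by the left join can be traced on a tiny example: two items and two dates give four date-asin rows, and only the rows with a match in the flags table get not_in_catelogue = 0. The sketch below uses Python's built-in sqlite3 module; the table names items, dates and flags and all data values are hypothetical stand-ins for the CDC tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table items (catlg_item_id integer, asin text);
    create table dates (calendar_date text);
    create table flags (catlg_item_id integer, calendar_date text,
                        instock integer, published integer, have_it_sku integer);
""")
conn.executemany("insert into items values (?, ?)", [(1, "A1"), (2, "A2")])
conn.executemany("insert into dates values (?)", [("2018-12-01",), ("2018-12-02",)])
# Only two of the four item-date combinations have flags.
conn.executemany("insert into flags values (?, ?, ?, ?, ?)", [
    (1, "2018-12-01", 1, 1, 1),
    (2, "2018-12-02", 0, 1, 0),
])

rows = conn.execute("""
    with date_asin as (
        -- every item paired with every date
        select items.catlg_item_id, items.asin, dates.calendar_date
        from items
        cross join dates
    )
    select date_asin.asin,
           date_asin.calendar_date,
           flags.have_it_sku,
           case when flags.have_it_sku is null then 1 else 0 end as not_in_catelogue
    from date_asin
    left join flags
        on date_asin.catlg_item_id = flags.catlg_item_id
        and date_asin.calendar_date = flags.calendar_date
    order by date_asin.asin, date_asin.calendar_date
""").fetchall()

for row in rows:
    print(row)
```

A have_it_sku value of 0 is a real flag and is not tagged; only rows with no match at all come back as NULL and are tagged 1. If the flags table could itself contain NULL in have_it_sku, testing a join key such as flags.catlg_item_id for NULL would be a stricter check.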

j. Create the same date-asin level table with flags as in question i. for a new list,
using these three new-list tables: vn5018r.p1m_100k_final_dec18_unadj (asin level table),
vn5018r.evaluation_dates_dec18_unadj (dates table) and
vn5018r.top_100K_instock_published_dec18_unadj (item-date level flags), with the same join
key. Once created, union this date-asin level flags table with the one from question i.,
adding a flag that denotes whether each row is part of the original list or the new list.

ANSWER:

i. ### Date-asin level table for the new list
Create table date_asin_level_table_new_list stored as ORC as
select asin_level.*,
dates.calendar_date
from vn5018r.p1m_100k_final_dec18_unadj asin_level
cross join vn5018r.evaluation_dates_dec18_unadj dates

ii. ### Date-asin level table with flags for the new list
Create table date_asin_level_table_w_flags_new_list stored as ORC as
select date_asin.*,
flags.instock,
flags.published,
flags.have_it_sku,
case when flags.have_it_sku is null then 1 else 0 end as not_in_catelogue
from date_asin_level_table_new_list date_asin
left join vn5018r.top_100K_instock_published_dec18_unadj flags
on date_asin.catlg_item_id = flags.catlg_item_id
and date_asin.calendar_date = flags.calendar_date

iii. ### Union of the original list and new list tables
Create table date_asin_level_table_w_flags_new_list_original_list stored as ORC as
select *,
'original_list' as list_name
from date_asin_level_table_w_flags

union all

select *,
'new_list' as list_name
from date_asin_level_table_w_flags_new_list

END OF Module 1 Practice Questions

Module 2 Practice Questions:


a. Rank the salaries of all the employees within each department using the table below and
then pick the top ranked (highest salaried) employee within each department. Write a single
query for the same and show the output table
employee_id full_name department salary
100 Mary Johns Sales 1000
101 Sean Moldy IT 1500
102 Peter Dugan Sales 2000
103 Lilian Penn Sales 1700
104 Milton Kowarsky IT 1800
105 Mareen Bisset Accounts 1200
106 Airton Graue Accounts 1100
107 John Joe Sales 1100
108 Cherry Quir IT 1600
109 Jijo James Sales 2100
110 Jean Justin Sales 1800
111 Paul Chris IT 1900
112 Samuel Jackson Accounts 1300
113 Jovin Jolly Accounts 1200

ANSWER:
with ranking_within_dept as
(
select employee_id,
full_name,
department,
salary,
rank() over(partition by department order by salary desc) as salary_rank
from employee_table -- assumed name for the table shown above
)
select *
from ranking_within_dept
where salary_rank = 1

Output Table:
employee_id full_name department salary Salary_rank
109 Jijo James Sales 2100 1
111 Paul Chris IT 1900 1
112 Samuel Jackson Accounts 1300 1
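
The sort direction inside the window decides what rank 1 means: ordering by salary descending makes rank 1 the highest-paid employee of each department, while ascending order would return the lowest-paid. The sketch below checks both on the sample data with Python's built-in sqlite3 module (SQLite 3.25 or later). The table name employees and the helper rank_one are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employees (employee_id integer, full_name text, department text, salary integer)"
)
conn.executemany("insert into employees values (?, ?, ?, ?)", [
    (100, "Mary Johns", "Sales", 1000), (101, "Sean Moldy", "IT", 1500),
    (102, "Peter Dugan", "Sales", 2000), (103, "Lilian Penn", "Sales", 1700),
    (104, "Milton Kowarsky", "IT", 1800), (105, "Mareen Bisset", "Accounts", 1200),
    (106, "Airton Graue", "Accounts", 1100), (107, "John Joe", "Sales", 1100),
    (108, "Cherry Quir", "IT", 1600), (109, "Jijo James", "Sales", 2100),
    (110, "Jean Justin", "Sales", 1800), (111, "Paul Chris", "IT", 1900),
    (112, "Samuel Jackson", "Accounts", 1300), (113, "Jovin Jolly", "Accounts", 1200),
])


def rank_one(direction):
    """Return the rank-1 employee per department for the given sort direction."""
    sql = f"""
        with ranking_within_dept as (
            select employee_id, full_name, department, salary,
                   rank() over (partition by department
                                order by salary {direction}) as salary_rank
            from employees
        )
        select full_name, department, salary
        from ranking_within_dept
        where salary_rank = 1
        order by department
    """
    return conn.execute(sql).fetchall()


highest = rank_one("desc")
lowest = rank_one("asc")
print(highest)
print(lowest)
```

Because rank() assigns the same rank to ties, a department with two equally top-paid employees would return both of them.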

b. Calculate the Deliver IT % as the ratio of the sum of total_deliver_it_units to the sum of
total_units using the cdc table vn5018r.deliver_date_item_output_format_dec18. Calculate the
percentage at all levels of the product hierarchy (from the overall level down to the
reporting_level_4 level) and produce a single resultant table. Obtain the resultant table using
the Union All method as well as the grouping sets method, and identify the difference between
the two methods.
ANSWER:
i. ### Grouping Sets method
select reporting_level_0,
reporting_level_1,
reporting_level_2,
reporting_level_3,
reporting_level_4,
sum(total_deliver_it_units)/sum(total_units) as deliver_it_perc
from vn5018r.deliver_date_item_output_format_dec18
group by
reporting_level_0,
reporting_level_1,
reporting_level_2,
reporting_level_3,
reporting_level_4
grouping sets
(
(),
(reporting_level_0),
(reporting_level_0,reporting_level_1),
(reporting_level_0,reporting_level_1,reporting_level_2),
(reporting_level_0,reporting_level_1,reporting_level_2,reporting_level_3),
(reporting_level_0,reporting_level_1,reporting_level_2,reporting_level_3,reporting_level_4)
)

ii. ### Union All method


select reporting_level_0,
reporting_level_1,
reporting_level_2,
reporting_level_3,
reporting_level_4,
sum(total_deliver_it_units)/sum(total_units) as deliver_it_perc
from vn5018r.deliver_date_item_output_format_dec18
group by
reporting_level_0,
reporting_level_1,
reporting_level_2,
reporting_level_3,
reporting_level_4

union all

select reporting_level_0,
reporting_level_1,
reporting_level_2,
reporting_level_3,
'' as reporting_level_4,
sum(total_deliver_it_units)/sum(total_units) as deliver_it_perc
from vn5018r.deliver_date_item_output_format_dec18
group by
reporting_level_0,
reporting_level_1,
reporting_level_2,
reporting_level_3

union all

select reporting_level_0,
reporting_level_1,
reporting_level_2,
'' as reporting_level_3,
'' as reporting_level_4,
sum(total_deliver_it_units)/sum(total_units) as deliver_it_perc
from vn5018r.deliver_date_item_output_format_dec18
group by
reporting_level_0,
reporting_level_1,
reporting_level_2

union all

select reporting_level_0,
reporting_level_1,
'' as reporting_level_2,
'' as reporting_level_3,
'' as reporting_level_4,
sum(total_deliver_it_units)/sum(total_units) as deliver_it_perc
from vn5018r.deliver_date_item_output_format_dec18
group by
reporting_level_0,
reporting_level_1

union all

select reporting_level_0,
'' as reporting_level_1,
'' as reporting_level_2,
'' as reporting_level_3,
'' as reporting_level_4,
sum(total_deliver_it_units)/sum(total_units) as deliver_it_perc
from vn5018r.deliver_date_item_output_format_dec18
group by
reporting_level_0

union all

select
'' as reporting_level_0,
'' as reporting_level_1,
'' as reporting_level_2,
'' as reporting_level_3,
'' as reporting_level_4,
sum(total_deliver_it_units)/sum(total_units) as deliver_it_perc
from vn5018r.deliver_date_item_output_format_dec18

The difference between the two methods is how the columns that are not part of a row's
aggregation level are filled. With the grouping sets method, such columns are set to NULL by
default, and this NULL can then be changed to any other required value. For example, a row
aggregated to the reporting_level_1 level will have NULL under reporting_level_2,
reporting_level_3 and reporting_level_4. With the Union All method, the user must set the
value explicitly in every branch. For example, in the Union All query above, all the columns
that are not part of the aggregation are manually set to blank ('') in the code. Moreover,
the Grouping Sets query is much shorter and simpler, and it gives the engine the chance to
compute all levels from one read of the table rather than one read per branch.

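
The placeholder difference can be seen on a small example. SQLite, used here through Python's built-in sqlite3 module, has no grouping sets clause, so the sketch below generates the Union All branches programmatically and runs them twice: once with NULL placeholders (the rows a grouping sets query would produce by default) and once with blank strings as in the answer above. The table deliver, its two-level hierarchy l0/l1 and the helper rollup_query are hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table deliver (l0 text, l1 text, deliver_units real, total_units real)")
conn.executemany("insert into deliver values (?, ?, ?, ?)", [
    ("D1", "S1", 30, 100),
    ("D1", "S2", 10, 100),
    ("D2", "S3", 50, 100),
])

levels = ["l0", "l1"]  # hierarchy columns, coarsest first


def rollup_query(placeholder):
    """Build one UNION ALL branch per hierarchy prefix: (l0, l1), (l0), ()."""
    branches = []
    for depth in range(len(levels), -1, -1):
        kept = levels[:depth]
        # Columns below the aggregation level are replaced by the placeholder.
        cols = [c if c in kept else f"{placeholder} as {c}" for c in levels]
        group_by = f" group by {', '.join(kept)}" if kept else ""
        branches.append(
            f"select {', '.join(cols)}, "
            f"sum(deliver_units) / sum(total_units) as perc "
            f"from deliver{group_by}"
        )
    return " union all ".join(branches)


with_nulls = conn.execute(rollup_query("null")).fetchall()   # grouping-sets style output
with_blanks = conn.execute(rollup_query("''")).fetchall()    # Union All answer style output
print(with_nulls)
print(with_blanks)
```

Both runs return six rows (three at l1 level, two at l0 level, one overall); only the placeholder in the rolled-up columns differs.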
c. Convert the cdc table vn5018r.have_date_item_level_sku_corr_output_format_dec18
which is at asin-date level to asin level by picking only the latest instance of each asin
based on the date. Implement in a single query through CTE and window function. Window
Function can be used to rank all the date instances of each ASIN. Check the level of the
original source table and resultant ASIN level table and verify.
ANSWER:
### Reducing to ASIN level
with asin_ranking as
(
select *,
row_number() over(partition by asin order by calendar_date desc) as asin_no
from vn5018r.have_date_item_level_sku_corr_output_format_dec18
)
select * from asin_ranking where asin_no = 1
### Level check of initial table
select
count(*),
count(distinct asin,calendar_date)
from vn5018r.have_date_item_level_sku_corr_output_format_dec18
### Level check of ASIN level table

with asin_ranking as
(
select *,
row_number() over(partition by asin order by calendar_date desc) as asin_no
from vn5018r.have_date_item_level_sku_corr_output_format_dec18
),
asin_level as
(
select * from asin_ranking where asin_no = 1
)
select count(*),
count(distinct asin)
from asin_level
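
The same reduction and level checks can be tried on a toy asin-date table. The sketch below uses Python's built-in sqlite3 module; the table name have_date_item, the have_it column and the three rows are made up for illustration. SQLite does not accept count(distinct asin, calendar_date), so the source-level check counts the rows of a select distinct subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table have_date_item (asin text, calendar_date text, have_it integer)")
conn.executemany("insert into have_date_item values (?, ?, ?)", [
    ("A1", "2018-12-01", 0),
    ("A1", "2018-12-03", 1),
    ("A2", "2018-12-02", 1),
])

# Keep only the latest date instance of each ASIN.
latest = conn.execute("""
    with asin_ranking as (
        select *,
               row_number() over (partition by asin order by calendar_date desc) as asin_no
        from have_date_item
    )
    select asin, calendar_date, have_it
    from asin_ranking
    where asin_no = 1
    order by asin
""").fetchall()

# Level check of the source: row count must equal the number of distinct (asin, date) pairs.
source_rows = conn.execute("select count(*) from have_date_item").fetchone()[0]
source_keys = conn.execute(
    "select count(*) from (select distinct asin, calendar_date from have_date_item)"
).fetchone()[0]

# Level check of the result: one row per distinct ASIN.
distinct_asins = conn.execute("select count(distinct asin) from have_date_item").fetchone()[0]

print(latest, source_rows, source_keys, distinct_asins)
```

A table is at the stated level when its row count equals the count of distinct key combinations, which is what both checks compare.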

d. From the below yearly employee sales table, obtain the total sales for each year without
rolling up the table to year level and by using window functions. Write down the resultant
output table.
year sales_employee sale_amount
2016 John 350
2016 David 425
2017 Melwin 225
2017 George 570
2017 Jack 325
2018 James 260
2018 Jill 780

ANSWER:
Select year,
sales_employee,
sale_amount,
sum(sale_amount) over(partition by year) as total_yearly_sales_amt
from employee_sales_table
Output Table:
year sales_employee sale_amount total_yearly_sales_amt
2016 John 350 775
2016 David 425 775
2017 Melwin 225 1120
2017 George 570 1120
2017 Jack 325 1120
2018 James 260 1040
2018 Jill 780 1040
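
The output table can be verified by running the query on the sample rows. The sketch below uses Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) and adds an order by only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee_sales_table (year integer, sales_employee text, sale_amount integer)"
)
conn.executemany("insert into employee_sales_table values (?, ?, ?)", [
    (2016, "John", 350), (2016, "David", 425),
    (2017, "Melwin", 225), (2017, "George", 570), (2017, "Jack", 325),
    (2018, "James", 260), (2018, "Jill", 780),
])

rows = conn.execute("""
    select year,
           sales_employee,
           sale_amount,
           sum(sale_amount) over (partition by year) as total_yearly_sales_amt
    from employee_sales_table
    order by year, sales_employee
""").fetchall()

# Every input row is kept; the yearly total is repeated on each row of that year.
yearly_totals = {year: total for year, _, _, total in rows}
print(rows)
print(yearly_totals)
```

Unlike group by year, the window version keeps all seven employee rows, which is what "without rolling up" asks for.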

e. From the month-ASIN level cdc table
vn5018r.P1M_Amz_Top_100K_All_Exclusions_consolidated, obtain the
monthly reporting_level_1 (L1) hierarchy level % distribution of ASINs by
writing a single query using CTEs. The % distribution of an L1 is the number
of ASINs mapped under an L1 for a given month divided by the total number
of ASINs present in that month.

ANSWER:
with month_l1_level_count as
(
select wm_month_name,
reporting_level_0,
reporting_level_1,
count(distinct asin) as l1_count
from vn5018r.P1M_Amz_Top_100K_All_Exclusions_consolidated
group by
wm_month_name,
reporting_level_0,
reporting_level_1
),
month_level_total_count as
(
select wm_month_name,
count(distinct asin) as total_count
from vn5018r.P1M_Amz_Top_100K_All_Exclusions_consolidated
group by
wm_month_name
)
select
l1_level.wm_month_name,
l1_level.reporting_level_0,
l1_level.reporting_level_1,
(l1_level.l1_count/month_level.total_count) as l1_distribution
from month_l1_level_count l1_level
left join month_level_total_count month_level
on l1_level.wm_month_name = month_level.wm_month_name
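
One portability note on the division: in Hive the / operator returns a fractional result even for two integer counts, so the query above works as written there. In engines that perform integer division on integer operands, such as SQLite or PostgreSQL, the same expression truncates to 0. The sketch below shows the effect with Python's built-in sqlite3 module; the table month_asin and its four rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table month_asin (wm_month_name text, reporting_level_1 text, asin text)")
conn.executemany("insert into month_asin values (?, ?, ?)", [
    ("Jan", "Home", "a1"), ("Jan", "Home", "a2"), ("Jan", "Home", "a3"),
    ("Jan", "Food", "a4"),
])

query = """
    with l1_level as (
        select wm_month_name, reporting_level_1, count(distinct asin) as l1_count
        from month_asin
        group by wm_month_name, reporting_level_1
    ),
    month_level as (
        select wm_month_name, count(distinct asin) as total_count
        from month_asin
        group by wm_month_name
    )
    select l1_level.reporting_level_1,
           {ratio} as l1_distribution
    from l1_level
    left join month_level
        on l1_level.wm_month_name = month_level.wm_month_name
    order by l1_level.reporting_level_1
"""

# Integer / integer truncates in SQLite; multiplying by 1.0 forces a fractional result.
truncated = conn.execute(query.format(ratio="l1_count / total_count")).fetchall()
exact = conn.execute(query.format(ratio="1.0 * l1_count / total_count")).fetchall()
print(truncated)
print(exact)
```

Multiplying by 1.0 (or casting one operand to a floating type) is harmless in Hive and keeps the query correct if it is ever run on another engine.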
f. From the below table obtain the total units sold at month-year-company level, at year-
company level and at company level using grouping sets. On the rolled-up table, rank the
year-company combinations and the companies based on the units sold, with a flag
indicating the level which is ranked. Write the resultant output table.
company_name year month day units_sold
X 2016 January Sunday 4200
X 2016 January Friday 3250
X 2016 January Saturday 2425
X 2016 February Tuesday 1450
X 2016 February Saturday 6300
X 2017 January Thursday 4300
X 2017 January Monday 1350
X 2017 February Wednesday 1000
X 2017 February Sunday 4700
X 2017 February Tuesday 1800
Y 2016 January Sunday 4230
Y 2016 January Friday 3251
Y 2016 January Saturday 2426
Y 2016 February Tuesday 1451
Y 2016 February Saturday 6301
Y 2017 January Thursday 4301
Y 2017 January Monday 1351
Y 2017 February Wednesday 1001
Y 2017 February Sunday 4701
Y 2017 February Tuesday 1801

ANSWER:
with roll_up as
(
select company_name,
year,
month,
sum(units_sold) as total_units
from units_table
group by
company_name,
year,
month
grouping sets
(
(company_name),
(company_name, year),
(company_name, year, month)
)
)
select *,
rank() over(partition by company_name, year order by total_units desc) as total_units_rank,
'company_year_level_month_rank' as ranking_level
from roll_up
where month is not null and year is not null

union all

select *,
rank() over(partition by company_name order by total_units desc) as total_units_rank,
'company_level_year_rank' as ranking_level
from roll_up
where year is not null and month is null

union all

select *,
rank() over(order by total_units desc) as total_units_rank,
'company_rank' as ranking_level
from roll_up
where year is null and month is null

Output Table:

company_name  year  month     total_units  total_units_rank  ranking_level
X             2016  January   9875         1                 company_year_level_month_rank
X             2016  February  7750         2                 company_year_level_month_rank
X             2017  January   5650         2                 company_year_level_month_rank
X             2017  February  7500         1                 company_year_level_month_rank
Y             2016  January   9907         1                 company_year_level_month_rank
Y             2016  February  7752         2                 company_year_level_month_rank
Y             2017  January   5652         2                 company_year_level_month_rank
Y             2017  February  7503         1                 company_year_level_month_rank
X             2016            17625        1                 company_level_year_rank
X             2017            13150        2                 company_level_year_rank
Y             2016            17659        1                 company_level_year_rank
Y             2017            13155        2                 company_level_year_rank
X                             30775        2                 company_rank
Y                             30814        1                 company_rank
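
The rolled-up totals and ranks in the output table can be re-derived from the day-level rows. SQLite, used below through Python's built-in sqlite3 module, has no grouping sets clause, so the sketch emulates the roll-up with Union All and NULL placeholders; that substitution, the ROLL_UP string constant and the omission of the day column are assumptions of this example, not the answer's method. Only the year-level and company-level rankings are checked here.

```python
import sqlite3

# (company_name, year, month, units_sold) copied from the question; day is not needed.
units = [
    ("X", 2016, "January", 4200), ("X", 2016, "January", 3250), ("X", 2016, "January", 2425),
    ("X", 2016, "February", 1450), ("X", 2016, "February", 6300),
    ("X", 2017, "January", 4300), ("X", 2017, "January", 1350),
    ("X", 2017, "February", 1000), ("X", 2017, "February", 4700), ("X", 2017, "February", 1800),
    ("Y", 2016, "January", 4230), ("Y", 2016, "January", 3251), ("Y", 2016, "January", 2426),
    ("Y", 2016, "February", 1451), ("Y", 2016, "February", 6301),
    ("Y", 2017, "January", 4301), ("Y", 2017, "January", 1351),
    ("Y", 2017, "February", 1001), ("Y", 2017, "February", 4701), ("Y", 2017, "February", 1801),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table units_table (company_name text, year integer, month text, units_sold integer)"
)
conn.executemany("insert into units_table values (?, ?, ?, ?)", units)

# Union All emulation of grouping sets ((company, year, month), (company, year), (company)).
ROLL_UP = """
    with roll_up as (
        select company_name, year, month, sum(units_sold) as total_units
        from units_table group by company_name, year, month
        union all
        select company_name, year, null, sum(units_sold)
        from units_table group by company_name, year
        union all
        select company_name, null, null, sum(units_sold)
        from units_table group by company_name
    )
"""

year_ranks = conn.execute(ROLL_UP + """
    select company_name, year, total_units,
           rank() over (partition by company_name order by total_units desc) as total_units_rank
    from roll_up
    where year is not null and month is null
    order by company_name, year
""").fetchall()

company_ranks = conn.execute(ROLL_UP + """
    select company_name, total_units,
           rank() over (order by total_units desc) as total_units_rank
    from roll_up
    where year is null and month is null
    order by total_units_rank
""").fetchall()

print(year_ranks)
print(company_ranks)
```

The NULL filters in the where clauses are what separate the three levels of the rolled-up table, exactly as in the answer's three branches.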

END OF Module 2 Practice Questions


Refer to the attached Notebook to view the execution of the codes and their output tables for
the questions that are based on CDC table.

sql_practice_questions_ANSWER_KEY.html
sql_practice_questions_ANSWER_KEY.ipynb
