Data warehouse fundamentals and fact table design questions

Assessment 1: Data Warehouse fundamentals, fact table design
Short answer questions
Business Intelligence (BCO6676)
by
01-May-2016
Questions & Answers: Data warehouse fundamentals and fact table design
Assignment 1: Data Warehouse Design
QUESTION 1.
a. Explain why decision support systems have become so important.

The rapid expansion of the Internet and of information system technologies has made it
possible to observe data from different perspectives and in real time. Companies can now
collect thousands of data records in a very short time, but the question is what to do with
all this data. That is why decision support systems (DSS) have taken on an important role in
every business that wants to keep expanding and, more importantly, to anticipate market trends
and better understand customer needs. The importance of a DSS lies not in the sheer amount of
information a company holds, but in the quality of that information and the knowledge that
can be drawn from it (Sauter, 2002). DSS therefore help executives, managers and other types
of users in their decision-making processes by presenting information (sourced from
transactional systems) in “what if” analysis scenarios, quickly and in summarised form,
through dynamic reports, graphics and KPI measurements organised under a data structure based
on the strategic objectives of the organisation. In this way, companies can embrace new
business opportunities, anticipate competitors and gain market share.
- Why are they different to transactional database systems?

Because DSS are subject oriented while transactional database systems are process oriented.
A DSS bases its analysis on the data warehouse, which is sourced from the transactional
database system; the transactional system is in charge of running the organisation's
transactions by adding, modifying and deleting current records. DSS are more complex: they
summarise historical data and convert it into information about a particular subject,
supporting decision-making processes to create best practices and strategies within a company.
b. One quoted advantage of implementing a Business Intelligence system is the concept of a
‘single version of the truth’. Explain what this term refers to.
Single version of the truth (SVOT) means that a company bases its whole knowledge on a
centralised database (Below, 2011). Organisations need to make clear to employees where the
data source is located so that they can trust it and work from it. Aligning the full variety
of business processes in this way, combining internal and external sources, makes the data
non-redundant and coherent, so that employees maintain the same understanding of the business.
For example, information about customers, services, products, vendors, assets and financial
records has just one version of the truth, and that information is unique, consistent and the
most reliable within the enterprise. For that reason, Business Intelligence (BI) bases its
analysis on the information stored in a data repository, which is then visualised through
presentation tools such as multidimensional reports, graphics, dashboards and KPIs to support
the decision-making process.
c. What are the properties that a data warehouse star schema must contain? Explain each
property in detail.
The data warehouse star schema is designed mainly to answer business questions and anticipate
market behaviour. It has the following elements: a fact table, dimension tables, and the joins
that relate the dimension tables to the fact table to form the logical structure of a data
warehouse. Each of these elements is explained in detail below.
1. Fact table: This table is specifically designed to answer questions about individual
business measures, and only one is permitted per star schema. It contains a series of
foreign keys (FK) that reference the primary keys of the dimension tables (e.g. Customer ID,
Product No). It must include at least one time dimension (e.g. Order date) and some KPIs,
which measure the variables you want to analyse, aggregate and quantify. Each combination of
dimension values (e.g. Customer ID, Product No, Order date) represents a different record
within the fact table. The level of detail held in an individual record is called
granularity, which can be defined by the “by” words, such as sales by customer, sales by
region, etc.
2. Dimension tables: These are a series of tables that surround the fact table and describe
the objects/entities linked to it (e.g. Customer ID, Product ID, Date). Each has one primary
key (PK), which identifies an individual record, and stores a number of attributes from which
specific data can be extracted (e.g. Customer ID (PK) with the attributes name, suburb,
phone, post code).
3. Joins: These link the dimension tables to their fact table. Their function is to maintain
the structure and integrity between the two tables by matching each FK value in the fact
table to a primary key in the dimension table.
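The three elements described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names, column names and sample rows (dim_customer, fact_sales, the suburbs "North" and "South", and so on) are assumptions made for the example and are not part of the assignment.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, three dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, suburb TEXT);
        CREATE TABLE dim_product  (product_no  INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        -- The fact table holds one foreign key per dimension plus the measure (KPI).
        CREATE TABLE fact_sales (
            customer_id  INTEGER REFERENCES dim_customer(customer_id),
            product_no   INTEGER REFERENCES dim_product(product_no),
            date_id      INTEGER REFERENCES dim_date(date_id),
            sales_amount REAL
        );
    """)
    # Hypothetical sample rows.
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                     [(1, "Customer A", "North"), (2, "Customer B", "South")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(10, "Widget"), (20, "Gadget")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2016, 4), (2, 2016, 5)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 10, 1, 100.0), (1, 20, 1, 50.0), (2, 10, 2, 70.0)])
    return conn


def sales_by_suburb(conn):
    """Join the fact table to the customer dimension and aggregate the KPI 'by' suburb."""
    query = """
        SELECT c.suburb, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.suburb
        ORDER BY c.suburb
    """
    return conn.execute(query).fetchall()
```

Calling sales_by_suburb(build_star_schema()) groups the fact rows by a customer attribute, which is the "sales by ..." pattern that defines the reporting granularity.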
QUESTION 2.
a. In terms of Analytics, what is meant by the terms drill down and slicing a cube?
A cube is built from a series of dimensions, each holding a specific type of data. These
dimensions can be analysed in detail with two techniques, drill down/up analysis and slicing
analysis; which one to use depends on the information you want to explore in depth.
Drill down: refers to the level of granularity for a particular dimension. In other words, this
technique moves from the most summarised data (up) to the most detailed data (down) in a
dimension (Wikipedia, 2016). For example, in sales by customer, sales by region and sales by
time period, the “by” word summarises the level to which the report drills down.
Slicing analysis: brings back the information related to a single value within a dimension.
This creates a new cube with fewer dimensions and allows the data for that particular value to
be analysed. For instance, sales by time can be sliced on the year 2014: the other years are
taken out of the cube, reducing the number of dimensions.
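As a rough illustration of the two techniques, the sketch below treats a cube as a plain list of tuples. The sample figures and the helper names aggregate and slice_cube are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical cube rows: (year, month, region, sales). The last field is the measure.
SALES = [
    (2013, 1, "N", 100),
    (2013, 2, "S", 150),
    (2014, 1, "N", 200),
    (2014, 1, "S", 50),
    (2014, 2, "N", 300),
]


def aggregate(rows, key_indexes):
    """Sum the measure grouped by the chosen dimension columns (the 'by' words)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[-1]
    return dict(totals)


def slice_cube(rows, index, value):
    """Keep the rows for one value of a dimension and drop that dimension."""
    return [row[:index] + row[index + 1:] for row in rows if row[index] == value]


# Drill down: from sales by year to sales by year and month.
by_year = aggregate(SALES, [0])
by_year_month = aggregate(SALES, [0, 1])

# Slice: fix year = 2014, leaving a smaller cube of (month, region, sales).
sales_2014 = slice_cube(SALES, 0, 2014)
by_region_2014 = aggregate(sales_2014, [1])
```

Drilling down adds a "by" column to the grouping key, while slicing removes the year column altogether once a single year has been selected.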
b. What is the difference between a Star Schema fact table and SAP’s InfoCube?
The main difference lies in the way the fact table or InfoCube and its dimension tables are
structured. The traditional star schema is based on a series of dimension tables that surround
the fact table. This fact table contains the foreign keys that reference the dimension primary
keys, at least one time dimension and the KPIs. Compared with SAP’s InfoCube, this fact table
structure is much bigger and is not limited to a maximum number of dimensions (Guru99.com,
2016). In contrast, SAP’s InfoCube is limited to 16 dimension tables, of which three are
predefined (Time, Unit, Info Package) and the remaining 13 are user defined. This makes the
InfoCube much smaller than the traditional fact table, but the dimensions are much bigger
because they are connected through Surrogate ID (SID) tables to the master data (divided into
attributes, texts and hierarchies). It is worth mentioning that the traditional star schema
holds its master data inside the schema, while SAP’s InfoCube does not: it holds only SID
tables that link to master data stored outside the cube.
c. In relation to a fact table, what does the term granularity refer to?
Granularity refers to the level of detail at which the key figures (KPIs) are described
within a fact table. In other words, it is how far you want to drill down into your data: the
most summarised level (low granularity) or the most detailed level possible (high
granularity). For example, sales by customer, sales by region, sales by product or just the
total sales amount.
- What are the implications of implementing either high or low granularity?

Implementing high granularity within a fact table means holding a very large volume of
detailed data, much of it not needed to answer a given business question. When the fact table
has too many dimensions, the amount of data can slow the system down with reporting
information that is of no use for the business question at hand. Analysing the data for a
specific business variable then takes longer, which shifts attention away from the real value
of the information gathered. On the other hand, low granularity means that the information
available is insufficient to drill down through the data, so it brings little knowledge to
the company. A good example is a time dimension held by year: it prevents any analysis by
month to identify trends. The best option is a medium granularity that gives good performance
while keeping a detailed enough source of data. In this way, the decision-making process has
multiple possibilities for comparing the data against other variables.
d. What are the perceived limitations of the traditional star schema model?
The limitations can be summarised in these four main issues:
1. It does not support multiple languages.
2. System performance is reduced because of the use of alphanumeric primary keys.
3. It does not support time-dependent changes.
4. Duplication of dimensional data is common.
- How did SAP’s extended star schema model address these issues?
SAP resolves most of the traditional star schema issues by using Surrogate ID (SID) tables.
These tables link to additional information that describes the data in a variety of languages,
depending on the location or country where the information is taken from. The SID tables also
make the system faster and more efficient because of their non-alphanumeric structure: by
giving the different dimensions a numerical format, the system can find records easily. In the
same way, duplication of dimensional data is almost eliminated because of the unique number
the SID table gives to each dimension value. With regard to time-dependent changes, SAP
creates a new record to preserve historical changes. These new records are incorporated into
the master data with Date From and Date To fields (Easy-learn-bw.blogspot.com.au, 2013). In
this way, a time-dependent attribute is kept, so that records can be found before and after
the effective date of the change.
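The idea of replacing alphanumeric keys with numeric surrogate IDs, and of hanging language-dependent texts off those IDs, can be sketched as follows. This is a simplified illustration and not SAP's actual implementation: the class SidTable, the function add_product, the product keys and the descriptions are all hypothetical.

```python
class SidTable:
    """Maps alphanumeric business keys to integer surrogate IDs (illustrative only)."""

    def __init__(self):
        self._sid_by_key = {}
        self._key_by_sid = []

    def sid_for(self, key):
        """Return the SID for a key, assigning the next number on first use."""
        if key not in self._sid_by_key:
            self._key_by_sid.append(key)
            self._sid_by_key[key] = len(self._key_by_sid)  # SIDs start at 1
        return self._sid_by_key[key]

    def key_for(self, sid):
        """Translate a SID back to the original business key."""
        return self._key_by_sid[sid - 1]


product_sids = SidTable()
TEXTS = {}  # (sid, language) -> description, standing in for a text table


def add_product(key, descriptions):
    """Register a product once and store its descriptions per language."""
    sid = product_sids.sid_for(key)
    for language, text in descriptions.items():
        TEXTS[(sid, language)] = text
    return sid
```

For example, add_product("P-100A", {"EN": "Red bicycle", "ES": "Bicicleta roja"}) stores two descriptions under a single numeric SID, so the same key is never duplicated and lookups use integers rather than alphanumeric strings.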
QUESTION 3.
a. Describe the three methods that can be used to cater for slowly changing
dimensions.
There are three ways a change can be recorded in a system (1keydata.com, 2016):
1. Overwrite the existing record. With this type of change no history is kept in the system,
so all the old information is lost. For example, the location of a current customer is
changed when they move to another city.
2. Create a new record. This type of change preserves the history of the old data alongside
the new information. Following the previous example, both locations are kept in the customer
dimension by adding a new row to the table. The only problem with this method is that it
reduces the performance of the ETL process because of the growing size of the table.
3. Add a new field to the record. With this method part of the history is preserved by
creating two new columns or fields: one with the current information, and the other with the
effective date of the change.
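The three methods can be sketched on a customer dimension held as a list of dictionaries. This is a minimal sketch under stated assumptions: the field names (city, current_city, date_from, date_to, effective_date) and the function names are hypothetical, and the date_from/date_to pair used in the second method is one common way of marking which row is current.

```python
from datetime import date


def overwrite_record(rows, customer_id, new_city):
    """Method 1: overwrite the value in place; the old value is lost."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def add_new_record(rows, customer_id, new_city, effective):
    """Method 2: close the current row and append a new row for the new value."""
    for row in rows:
        if row["customer_id"] == customer_id and row["date_to"] is None:
            row["date_to"] = effective
            rows.append(dict(row, city=new_city, date_from=effective, date_to=None))
            break


def add_new_fields(rows, customer_id, new_city, effective):
    """Method 3: keep the original column and add current-value and effective-date columns."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["current_city"] = new_city
            row["effective_date"] = effective
```

Only the second method grows the table, which is the trade-off noted in the list above; the third keeps a single row but can remember just one previous value.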
b. Employees within an organisation could potentially be transferred to several departments
over their working history. How are strategies implemented to ensure an employee is reported
to be at the department they are assigned to at a specific time?
Once an organisation is notified that an employee has been promoted and moved to another
department, or has changed location, the IT department is in charge of modifying the employee
dimension within the database. By creating a new record that includes two fields in this
dimension, Date From and Date To (bounding the effective date of the change), companies can
handle and preserve changes throughout their employees’ history.
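A lookup over Date From and Date To fields might look like the sketch below. The sample history, the convention that Date To is exclusive and left empty (None) for the current assignment, and the function name department_on are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical employee dimension rows: (employee_id, department, date_from, date_to).
# date_to is treated as exclusive; None marks the current assignment.
HISTORY = [
    ("E1", "Sales", date(2012, 1, 1), date(2014, 7, 1)),
    ("E1", "Marketing", date(2014, 7, 1), None),
]


def department_on(history, employee_id, on_date):
    """Return the department an employee was assigned to on a given date, or None."""
    for emp, department, start, end in history:
        if emp == employee_id and start <= on_date and (end is None or on_date < end):
            return department
    return None
```

With this history, a report dated in 2013 places the employee in Sales, while any report from 1 July 2014 onwards places them in Marketing.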
- What are the consequences of ignoring this issue?

The consequences can range from payroll problems (salary adjustments), redundant data and
duplicated information to difficulties in measuring key performance indicators (KPIs) and tax
deduction discrepancies, among others.
QUESTION 4.
In your own words, define each of the following SAP terms:
(i) Characteristics: help to define the master data through specific information related
to a dimension table. Within a dimension you can find characteristics such as product No,
product description, product category and unit/size, which provide the context for
measuring KPIs.
(ii) InfoAreas: are folders where all the information related to characteristics and KPIs
is stored.
(iii) InfoCatalog: is a subcategory of the InfoAreas in which characteristics and key figures
(KPIs) have independent catalogues, with information grouped by specific criteria.
(iv) Key Figures: refer to the KPIs used to measure a specific business question. These
indicators can be classified as amount (currency), quantity (unit), number and time/date.
(v) Master Data: is a dimension table with related data that describes a particular record
surrounding an InfoCube (fact table). Master data is made up of three components:
attributes, texts and hierarchies.
(vi) Surrogate keys: also known as Surrogate ID (SID) tables, make the system faster and
more efficient because of the numeric structure of these tables. By giving the different
dimensions a numerical format, the system can find records easily instead of searching
alphanumeric data.
(vii) Dimensions: although they can be related to master data, dimensions are not
considered master data. Instead, dimension tables are based on SID tables that link to the
information held in the master data. These dimensions build the InfoCube in the SAP
extended star schema model.
QUESTION 5
The following table represents transactional data that needs to be stored in the fact tables
below.
Note: SalesRevenue represents a key performance measure and has been set to aggregate
data.
Date       SalesRep  Region  Product  SalesRevenue
21.2.2012  S1        N       P1       300
21.2.2012  S1        N       P2       200
22.2.2012  S1        N       P1       150
23.2.2012  S2        E       P1       300
24.2.2012  S1        W       P2       250
25.2.2012  S2        E       P1       100
26.2.2012  S3        W       P1       80
26.2.2012  S1        S       P2       150
27.2.2012  S2        E       P1       50
28.2.2012  S1        N       P1       60
29.2.2012  S1        E       P2       30
29.2.2012  S2        E       P2       60
29.2.2012  S2        S       P1       400
a. For each of the fact tables below, show the data that would be transferred and stored
in it after the transfer of the transactional data above.
Fact table 1: Sales Revenue measured by Year, Month, SalesRep, Region and Product
Year  Month  SalesRep  Region  Product  SalesRevenue
2012  2      S1        N       P1       510
2012  2      S1        N       P2       200
2012  2      S1        S       P2       150
2012  2      S1        W       P2       250
2012  2      S1        E       P2       30
2012  2      S2        S       P1       400
2012  2      S2        E       P1       450
2012  2      S2        E       P2       60
2012  2      S3        W       P1       80
Total Sales Revenue                     2130
Fact table 2: Sales Revenue measured by Year, Month, Region
Year  Month  Region  SalesRevenue
2012  2      N       710
2012  2      S       550
2012  2      E       540
2012  2      W       330
Total Sales Revenue  2130
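The aggregation behind the two fact tables can be reproduced with a short Python sketch. The transactions are copied from the table in this question; the function name load_fact_table and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# Transactions from the question: (date as d.m.yyyy, SalesRep, Region, Product, SalesRevenue)
TRANSACTIONS = [
    ("21.2.2012", "S1", "N", "P1", 300),
    ("21.2.2012", "S1", "N", "P2", 200),
    ("22.2.2012", "S1", "N", "P1", 150),
    ("23.2.2012", "S2", "E", "P1", 300),
    ("24.2.2012", "S1", "W", "P2", 250),
    ("25.2.2012", "S2", "E", "P1", 100),
    ("26.2.2012", "S3", "W", "P1", 80),
    ("26.2.2012", "S1", "S", "P2", 150),
    ("27.2.2012", "S2", "E", "P1", 50),
    ("28.2.2012", "S1", "N", "P1", 60),
    ("29.2.2012", "S1", "E", "P2", 30),
    ("29.2.2012", "S2", "E", "P2", 60),
    ("29.2.2012", "S2", "S", "P1", 400),
]


def load_fact_table(transactions, dimensions):
    """Aggregate SalesRevenue at the grain given by the list of dimension names."""
    totals = defaultdict(int)
    for date_str, rep, region, product, revenue in transactions:
        _day, month, year = (int(part) for part in date_str.split("."))
        record = {"Year": year, "Month": month, "SalesRep": rep,
                  "Region": region, "Product": product}
        key = tuple(record[name] for name in dimensions)
        totals[key] += revenue
    return dict(totals)


fact_table_1 = load_fact_table(TRANSACTIONS, ["Year", "Month", "SalesRep", "Region", "Product"])
fact_table_2 = load_fact_table(TRANSACTIONS, ["Year", "Month", "Region"])
```

The 13 transactions collapse to 9 rows in fact table 1 and 4 rows in fact table 2, and both tables sum to the same total of 2130.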
b. Explain the term granularity. Which of the two fact tables above has the greater
granularity?
As explained in question 2c, granularity refers to the level of detail at which the key
figures (KPIs) are described within a fact table. In other words, it is how far you want to
drill down into your data: the most summarised level (low granularity) or the most detailed
level possible (high granularity). Of the two fact tables above, the one with the greater
granularity is Fact table 1, since it records revenue by SalesRep and Product as well as by
Year, Month and Region.
QUESTION 6.
a. Students are required to design a standard star schema to meet the above
requirements.
Sales revenue = QtyPurchased x UnitSalesPrice
Fact table Sales (Revenue): CustomerNO, PartNO, SalesPersonNO, OrdDate, QtyPurchased,
    UnitSalesPrice
Dimension Part: PartNO (pk), PartDescription, QtyOnHand, UnitPrice, CategoryNO (fk),
    CategoryName
Dimension Customer: CustomerNO (pk), Name, Street, Suburb, Postcode, IndustryNO (fk),
    IndustryName, Balance
Dimension Order Date: Year, Month, Day
Dimension Sales Person: SalesPersonNO (pk), SalesPersonName, DepartmentNO (fk),
    DepartmentName, RegionNO (fk), RegionName
b. Students are required to transform their design in part (a) to match SAP’s extended
star schema model.
Sales revenue = QtyPurchased x UnitSalesPrice
InfoCube Sales (Revenue), fact table: DIM-ID Customer, DIM-ID Part, DIM-ID SalesPerson,
    DIM-ID Order Date, QtyPurchased, UnitSalesPrice

Dimension tables (DIM-ID and SID):
    Customer: DIM-ID, Customer_SID
    Sales Person: DIM-ID, SalesPerson_SID
    Part: DIM-ID, Part_SID
    Order Date: DIM-ID, OrderDate_SID

SID tables (SID and master data key):
    Customer_SID, CustomerNO
    SalesPerson_SID, SalesPersonNO
    Part_SID, PartNO
    OrderDate_SID, OrderDate

Master data (Attribute, Text, Hierarchies):
    Customer
        Attribute: CustomerNO, IndustryNO (fk)
        Text: Name, Street
        Hierarchies: CustomerNO, IndustryNO
        Other fields: Suburb, Postcode, Balance
    Sales Person
        Attribute: SalesPersonNO, DepartmentNO (fk), RegionNO (fk)
        Text: SalesPersonName, DepartmentName, RegionName
        Hierarchies: SalesPersonNO, SalesPersongroup
    Part
        Attribute: PartNO, CategoryNO (fk)
        Text: PartDescription, CategoryName
        Hierarchies: PartNO, PartGroup
        Other fields: QtyOnHand, UnitPrice
    Order Date
        Fields: Year, Month, Day, TimePeriod
c. A sales person over time can move to different regions and the company would like
to record this fact. Indicate two ways this situation can be modelled in your design.
You may need to redesign your model.
1. The first method is to add a new record to the Sales Person master data. The sales person
number changes as well as the region. The previous record is preserved in the database.
[Diagram: the extended star schema from part (b), with a new row added to the Sales Person
master data and the example tables below]

Sales Person master data:
SalesPersonNO  SalesPersonName  RegionNO
S1             David Sanabria   R1
S2             David Sanabria   R2

Example master data:
ID  Name        Region
S1  Bill Smith  R1
S2  Anne Jones  R2

Example SID table:
ID  SID
S1  1
S2  2

Example dimension table:
DIMID  Sale SID R
1      1
2      2
2. The second method is by adding two fields to the same record in the master data: New
region and the effective date of the transfer.
[Diagram: the extended star schema from part (b); the Sales Person master data record carries
the old region, the new region and the effective date]

SalesPersonNO  SalesPersonName  Old RegionNO  New RegionNO  Effective Date
S1             David Sanabria   R1            R2            1/05/2016
QUESTION 7
Create a star schema diagram that will enable FIT-WORLD GYM INC. to analyse their revenue.
The fact table will include, for every instance of revenue taken, attribute(s) useful for
analysing revenue.
- The star schema will include all dimensions that can be useful for analysing revenue.
- The only data sources available are shown below.
Answer:
Fact table Revenue: MshpID, MrchID, PassID, CorpCustID, SalesTranDate, QtyMshpSold,
    MshpUnitPrice, QtyMrchPurchased, MrchUnitPrice, QtyCorpCust, CorpAmountCharge
Dimension Membership: MshpID (pk), MshpName, MshpPrice
Dimension Merchandise: MrchID (pk), MrchName, MrchPrice
Dimension One Day Guest Pass: PassID (pk), PassDate, PassCatID (fk), CatName, Price,
    MembID (fk)
Dimension Sales Transaction: STrID (pk), SalesTranDate, MembID (fk)
Dimension Special Events: CorpCustID (pk), CorpCustName/Location, Event Type Code,
    Event Date, Amount Charged
Note: The revenue is calculated from the following formula:
Total Revenue = (Qty Membership sold x MembershipUnitPrice)
    + (Qty Merchandise x MerchandiseUnitPrice)
    + (Qty CorporateCustomers x CorporateAmountCharge)
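As a quick check of the formula, it can be written as a small Python function. The function name, parameter names and sample figures are hypothetical; only the three terms come from the formula above.

```python
def total_revenue(qty_membership, membership_unit_price,
                  qty_merchandise, merchandise_unit_price,
                  qty_corporate, corporate_amount_charge):
    """Sum the membership, merchandise and corporate-customer revenue terms."""
    return (qty_membership * membership_unit_price
            + qty_merchandise * merchandise_unit_price
            + qty_corporate * corporate_amount_charge)


# Hypothetical figures: 2 memberships at 50, 3 merchandise items at 20, 1 corporate event at 500.
example_total = total_revenue(2, 50, 3, 20, 1, 500)
```

With these sample figures the three terms are 100, 60 and 500, giving a total of 660.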
QUESTION 8
1. This first fact table analyses the number of transactions per day for each plant, sales
channel (Internet and Warehouse) and product.
Fact table by Plant/Channel: PlantNO, ChannelNO, ProductNO, SalesOrderNO, Day, QtySold,
    UnitSalesPrice
Dimension Plant: PlantNO (pk), PlantName, CountryNO (fk), RegionNO (fk)
Dimension Sale Channel: ChannelNO (pk), ChannelName
Dimension Category: CategoryNO (pk), CategoryName
Dimension Product: ProductNO (pk), ProdDescription, UnitPrice, CategoryNO (fk), QtyOnHand
Dimension Order Date: Day, Month, Year
Dimension Sales Order: SalesOrderNO (pk), CustomerNO (fk), ProductNO (fk), OrderDate,
    SalesQty, UnitSalesPrice
2. This second fact table answers the business questions of customer sales by country, sales
by region and products sold in those locations.
Fact table by Country/Region: CustomerNO, ProductNO, CountryNO, RegionNO, OrderDate,
    QtySold, UnitSalesPrice
Dimension Customer: CustomerNO (pk), CustomerName, Street, Postcode, RegionNO (fk),
    CountryNO (fk)
Dimension Product: ProductNO (pk), ProdDescription, UnitPrice, CategoryNO (fk), QtyOnHand
Dimension Category: CategoryNO (pk), CategoryName
Dimension Country: CountryNO (pk), CountryName
Dimension Order Date: Day, Month, Year
Dimension Region: RegionNO (pk), RegionName
REFERENCES
1. Below, P. (2011). QSM. [online] The myth of the single version of the truth. Available at:
http://www.qsm.com/Blog/The%20Myth%20of%20the%20Single%20Version%20of%20the%20Truth_BELOW012310.pdf
[Accessed 17 Apr. 2016].
2. Easy-learn-bw.blogspot.com.au. (2013). SAP BI, SAP BW: Time dependent attributes.
[online] Available at:
http://easy-learn-bw.blogspot.com.au/2013/06/time-dependent-attributes.html
[Accessed 25 Apr. 2016].
3. Guru99.com. (2016). All about classical extended star schema. [online] Available at:
http://www.guru99.com [Accessed 24 Apr. 2016].
4. 1keydata.com. (2016). Type 3 Slowly Changing Dimension. [online] Available at:
https://www.1keydata.com/datawarehousing/slowly-changing-dimensions-type-3.html
[Accessed 25 Apr. 2016].
5. Oracle, (2003). Understanding Star Schemas. [online] Gkmc.utah.edu. Available at:
http://gkmc.utah.edu/ebis_class/2003s/Oracle/DMB26/A73318/schemas.htm
[Accessed 17 Apr. 2016].
6. Sauter, V. (2002). Decision Support Systems (DSS). [online] Umsl.edu. Available at:
http://www.umsl.edu/~sauterv/analysis/488_f02_papers/dss.html [Accessed 16 Apr.
2016]. [Accessed 17 Aug. 2015].
7. Wikipedia. (2016). OLAP cube. [online] Available at:
https://en.wikipedia.org/wiki/OLAP_cube [Accessed 24 Apr. 2016].
