
Data Warehouse Testing

Neveen ElGamal
Faculty of Computers and Information, Cairo University, Giza, Egypt

Supervised by:
Ali ElBastawissy and Galal Galal-Edeen
Faculty of Computers and Information, Cairo University, Giza, Egypt


Thesis State: Middle

During the development of a data warehouse (DW), a great deal of data is transformed, integrated, structured, cleansed, and grouped into a single structure, the DW. These many kinds of changes can lead to data corruption or unintended data manipulation. Therefore, DW testing is a critical stage of the DW development process.
A number of attempts have been made to describe how the testing process should take place in the DW environment. In this paper, I briefly present these testing approaches and then use a proposed matrix to evaluate and compare them. Afterwards, I highlight the weak points of the available DW testing approaches. Finally, I describe how my PhD will fill the gap in DW testing by developing a DW testing framework, briefly presenting its architecture. I then state the scope of work I am planning to address and the limitations I expect to face in this area. In the end, I conclude my work and state possible future work in the field of DW testing.


Comparing the DW system outputs with the data in the data sources is not, by itself, the best way to assess the quality of the data. This type of test is informative and will take place at a certain point in the testing process, but the most important part of testing should happen during DW development. Every stage and every component the data passes through should be tested to guarantee its efficiency and the preservation, or even improvement, of data quality.
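Such an end-of-pipeline check can be sketched as a simple source-to-output reconciliation. The tables and column names below are hypothetical, and a real DW would of course compare far larger volumes:

```python
import sqlite3

def reconcile(src_conn, tgt_conn, src_sql, tgt_sql):
    """Informative end-of-pipeline check: compare row count and a value
    checksum between a source query and a DW output query."""
    src_rows = src_conn.execute(src_sql).fetchall()
    tgt_rows = tgt_conn.execute(tgt_sql).fetchall()
    count_ok = len(src_rows) == len(tgt_rows)
    sum_ok = sum(r[1] for r in src_rows) == sum(r[1] for r in tgt_rows)
    return count_ok and sum_ok

# Hypothetical source table and the DW table loaded from it.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
src.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (2, 20.5)])

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dw_sales (id INTEGER, amount REAL)")
dw.executemany("INSERT INTO dw_sales VALUES (?, ?)", [(1, 10.0), (2, 20.5)])

ok = reconcile(src, dw, "SELECT id, amount FROM sales",
               "SELECT id, amount FROM dw_sales")
```

Passing such a check says the output agrees with the sources, but, as argued above, it cannot locate where in the pipeline a discrepancy was introduced.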



- The DW mostly answers ad-hoc queries, which makes it impossible to test them all prior to system delivery. In contrast, all functions of a conventional computer application are predefined.
- DW testing is data centric, while application testing is code centric.
- The DW always deals with huge data volumes.
- In other systems the testing process ends with the development life cycle, while in DWs it continues after system delivery.
- Software projects are self-contained, but a DW project continues because the decision-making process requires ongoing changes [8].
- Most available testing scenarios are driven by user inputs, while in the DW most tests are system-triggered scenarios.
- The volume of test data in a DW is considerably larger than in any other testing process.


In the data warehousing process, data passes through several stages, each of which applies a different kind of change to the data, which finally reaches the user in the form of a chart or a report. There should be a way of guaranteeing that the data in the sources is the same data that reaches the user, and that data quality is improved, not lost.
© 2013 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the national government of Egypt. As such, the government of Egypt retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
EDBT/ICDT '13, March 18-22 2013, Genoa, Italy
Copyright 2013 ACM 978-1-4503-1599-9/13/03 $15.00

The Challenges of DW testing

It is widely agreed that the DW is totally different from other systems, such as computer applications or transactional database systems. Consequently, the testing techniques used for these other systems are inadequate for DW testing. Here are some of the differences, as discussed in [6-8]:

- In other systems the test cases can reach hundreds, but the valid combinations of these test cases are never unlimited. In the DW, by contrast, the test cases are unlimited, because the core objective of the DW is to allow all possible views of the data [7].


- DW testing consists of different types of tests depending on when the test takes place and which component is being tested. For example, the initial data load test is different from the incremental data load test.

One of the core challenges of testing DWs, or of providing techniques for testing them, is their flexible architecture. DW systems can have different architectures according to business requirements, the required DW functionality, and/or budget and time constraints.
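The difference between the two load tests mentioned above can be sketched with two toy checks (counts only; a real test would also compare row contents):

```python
def initial_load_ok(source_count, dw_count):
    # Initial load: every source row must arrive exactly once in the empty DW.
    return dw_count == source_count

def incremental_load_ok(dw_before, delta_count, dw_after):
    # Incremental load: only the new delta may be appended to existing data,
    # so the pass condition is different from the initial load's.
    return dw_after == dw_before + delta_count
```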


Data Warehouse Architecture

As shown in figure 1, a global DW system consists of a number of inter-related components:
- Data Sources (DS)
- Operational Data Store (ODS) / Data Staging Area
- Data Warehouse (DW)
- Data Marts (DM)
- User Interface (UI) applications, e.g. OLAP reports, decision support tools, and analysis tools

Figure 1. DW System Architecture

Each component needs to be tested independently to verify its efficiency. The connections between the DW components are groups of transformations that take place on the data. These transformation processes should be tested as well, to ensure that data quality is preserved. The outputs of the DW system should be compared with the original data in the DSs. Finally, from the operational point of view, the DW system should be tested for performance, reliability, robustness, recovery, etc.
The remainder of this paper is organized as follows: Section 2 briefly surveys the existing DW testing approaches and introduces the DW testing matrices that we used to compare and evaluate them. Section 3 analyzes the comparison matrix to highlight the drawbacks and weaknesses that exist in the area of DW testing, and the requirements for a DW testing framework. Section 4 presents my expected contribution in the PhD to fill the gap in DW testing by introducing a new DW testing framework. Section 5 presents the architecture of the proposed framework, and Section 6 states the scope of work and the limitations we expect to face during our work. Finally, I conclude in Section 7.




DW Testing Approaches

A number of attempts have been made to address the DW testing process. Some were made by companies offering consultancy services for DW testing, like [5, 13, 14, 17, 22]. Others, to fill the gap left by the absence of a generic DW testing technique, proposed one as a research attempt, like [1-4, 8, 11, 15, 18, 23]. A different direction was taken by authors and organizations that presented automated tools for the DW testing process, like [12, 16, 22], and, from yet another perspective, some authors presented DW testing methodologies, like [2, 14, 21]. The rest of this section briefly introduces these approaches in groups according to their similarities; a comparison between them is presented in the following sections.
The approaches presented in [1-3, 14, 15] adapted some of the software testing types, namely:
- Unit testing,
- Integration testing,
- System testing,
- User acceptance testing,
- Security testing,
- Regression testing,
- Performance testing,
and extended them to support the special needs of DW testing. A great advantage of these approaches is their solid background, the well defined and formulated discipline of software testing, which helped them present DW testing approaches that are not too far from what system testers already understand and use.
Other attempts, like [4, 13, 23], focused on testing the most important part of the data warehousing process, the ETL process. [13] addressed DW testing from two high-level aspects:
- the underlying data, which focuses on data coverage and data compliance with the transformation logic in accordance with the business rules;
- the DW components, which focuses on orchestration and regression testing.
What was unique in the approach presented in [4] is that it concentrated on the data validation of ETL and presented two alternatives for the testing process: either white box testing, where the data is tracked through the ETL process itself, or black box testing, where only the input and output data of the ETL process are validated.
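A black-box ETL validation in the sense of [4] can be sketched as follows. The transformation here is a toy stand-in, since in black-box testing only its inputs and outputs are inspected:

```python
def etl_step(rows):
    """Stand-in for the ETL process under test: trims names and drops
    rows with missing amounts."""
    return [{"name": r["name"].strip(), "amount": r["amount"]}
            for r in rows if r["amount"] is not None]

def black_box_check(transform, input_rows, expected_rows):
    # Black-box testing: feed known input, compare only the output
    # against the expected output, ignoring the internals.
    return transform(input_rows) == expected_rows

passed = black_box_check(
    etl_step,
    [{"name": " Ann ", "amount": 10.0}, {"name": "Bob", "amount": None}],
    [{"name": "Ann", "amount": 10.0}],
)
```

White box testing would instead instrument the intermediate steps inside `etl_step` and track individual records through them.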
The research attempts presented by the two research groups from DISI, Bologna University (Matteo Golfarelli and Stefano Rizzi) and from the Slovak University of Technology (Pavol Tanuška, Oliver Moravčík, Pavol Važan, František Miksa, Peter Schreiber, Jaroslav Zeman, Werner Verschelde, and Michal Kopček) were the richest attempts, addressing DW testing from various perspectives. In [18], the authors suggested a proposal for basic DW testing activities (routines) as the final part of a DW testing methodology; other parts of the methodology were published in [19-21]. The testing activities can be split into four logical units:
- multidimensional database testing,
- data pump (ETL) testing,
- metadata testing, and
- OLAP testing.
The authors then showed how these activities split into smaller, more distinctive activities to be performed during the DW testing process.
In [8], the authors introduced DW testing activities (routines) framed within the DW development methodology introduced in [9]. They stated that the components that need to be tested are the conceptual schema, logical schema, ETL procedures, database, and front-end. To test these components, they listed the test types that best fit the characteristics of DW systems:
- Functional test
- Usability test
- Performance test
- Stress test
- Recovery test
- Security test
- Regression test
A comprehensive explanation of how the DW components are tested by the above routines is then given, showing which type(s) of test is suitable for which component, as shown in table 1.
The authors then customized their DW testing technique to present a prototype-based methodological framework [11]. Its main features are (1) earliness with respect to the life cycle, (2) modularity, (3) tight coupling with design, (4) scalability, and (5) measurability through proper metrics. The latest piece of work this research group presented, in [10], was a number of data-mart-specific testing activities, classified in terms of what is tested and how it is tested. The only drawback of this research group's work is that its DW architecture does not include a DW component: the data is loaded from the data sources directly into the data marts, so their DW architecture consists of a number of data marts addressing different business areas.


Table 1: DW components vs. testing types [8]

The approaches presented in [12, 16, 22] are CASE tools specially designed for DW testing. TAVANT's product, named One-Click Automation Framework for Data Warehouse Testing, supports the DW testing process and works concurrently with the DW development life cycle [22]. It embeds the test cases in the ETL process and ensures that all stages of ETL are tested and verified before subsequent data loads are triggered. Its DW testing methodology is incremental and is designed to accommodate DW schema changes arising from evolving business needs.
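The gate-then-load idea, each ETL stage verified before the next load is triggered, can be sketched like this (the stages and checks are toy stand-ins, not TAVANT's actual framework):

```python
def run_gated_pipeline(stages):
    """Run (stage, check) pairs in order; a failing check halts the
    pipeline before the next load is triggered."""
    data = None
    for stage, check in stages:
        data = stage(data)
        if not check(data):
            raise RuntimeError(f"check failed after {stage.__name__}")
    return data

def extract(_):
    return [" a ", " b "]

def transform(rows):
    return [r.strip() for r in rows]

result = run_gated_pipeline([
    (extract, lambda rows: len(rows) > 0),                         # rows arrived
    (transform, lambda rows: all(r == r.strip() for r in rows)),   # all trimmed
])
```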
The QuerySurge CASE tool developed by RTTS [16] assists DW testers in preparing and scheduling query pairs that compare data transformed from a source to a destination: for example, a query pair of which one query runs on a DS and the other on the ODS, to verify the completeness, correctness, and consistency of the structure of the data and of the data transformed from the DS to the ODS.
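The query-pair idea can be sketched as below; the schema is hypothetical and QuerySurge itself works differently (scheduled pairs, reporting, etc.), but the core comparison looks like:

```python
import sqlite3

def query_pair_diff(src_conn, tgt_conn, src_sql, tgt_sql):
    """Compare two keyed result sets and report keys that are missing on
    the target side (completeness) or whose rows differ (correctness)."""
    src = {row[0]: row for row in src_conn.execute(src_sql)}
    tgt = {row[0]: row for row in tgt_conn.execute(tgt_sql)}
    missing = sorted(set(src) - set(tgt))
    differing = sorted(k for k in set(src) & set(tgt) if src[k] != tgt[k])
    return missing, differing

ds = sqlite3.connect(":memory:")
ds.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
ds.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "Ann"), (2, "Bob"), (3, "Eve")])

ods = sqlite3.connect(":memory:")
ods.execute("CREATE TABLE ods_customers (id INTEGER, name TEXT)")
ods.executemany("INSERT INTO ods_customers VALUES (?, ?)",
                [(1, "Ann"), (2, "BOB")])  # row 3 missing, row 2 mangled

missing, differing = query_pair_diff(
    ds, ods,
    "SELECT id, name FROM customers",
    "SELECT id, name FROM ods_customers")
```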
Inergy is a company specialized in designing, developing and managing DWs and BI solutions [12]. It developed a tool to automate the ETL testing process. The tool was not built from scratch: existing tools such as DbUnit (a database testing utility) and an Ant task extension (a Java-based build tool) were used, and DbUnit was extended with some DW-specific logic:
- which ETL process to run, and at what time;
- which dataset should be used as a reference;
- what logic to repeat, and for which range.
They also developed a PowerDesigner script that extracts the core of the test script from the data model of the DW.
Unfortunately, not enough information was available about the approaches [5, 17], which were developed by companies offering consultancy services; revealing their DW testing techniques could negatively affect their business.
Having introduced the previous work on DW testing, it is now time to study, compare and evaluate it exhaustively. In order to conduct such a comprehensive study, we have defined a group of DW testing matrices, presented in the following subsection.


DW Testing Matrices

As we presented previously in [6], the DW testing matrices classify tests, or test routines, according to where, what, and when they take place:
WHERE: the component of the DW that the test targets. This divides the DW architecture shown in figure 1 into the following layers:
o Data Sources to Operational Data Store: the testing routines targeting data sources, wrappers, extractors, transformations, and the operational data store itself.
o Operational Data Store to DW: the testing routines targeting the loading process and the DW itself.
o DW to Data Marts: the testing routines targeting the transformations that take place on the data used by the data marts, and the data marts themselves.
o Data Marts to User Interface: the testing routines targeting the transformation of data to the interface applications, and the interface applications themselves.
WHAT: what these routines test in the targeted component.
o Schema: focuses on testing DW design issues.
o Data: all data-related tests, such as data quality, data transformation, data selection, data presentation, etc.
o Operational: tests concerned with the process of putting the DW into operation.
WHEN: when the test takes place.
o Before system delivery: a one-time test that takes place before the system is delivered to the user, or whenever the design of the system changes.
o After system delivery: a recurring test that takes place several times during system operation.
The what, where and when testing categories result in a three-dimensional matrix. As shown in table 2, the rows represent the where dimension and the columns represent the what dimension; the when dimension will be represented by colour in the following section. When this matrix is used to compare the existing DW testing approaches, each cell of the table holds the group of test routines that addresses that combination of dimension members.

Table 2: DW Testing Matrices (rows: the four WHERE layers, from DS to ODS through to the front-end DM to UI layer; columns: Schema, Data, Operational)
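The classification can also be represented directly as data. The structure below mirrors the three dimensions; the routine names are drawn from the comparison in table 3, and the sketch is illustrative rather than part of the published matrix:

```python
# The three dimensions of the DW testing matrices.
WHERE = ["DS-ODS", "ODS-DW", "DW-DM", "DM-UI"]
WHAT = ["Schema", "Data", "Operational"]
WHEN = ["before delivery", "after delivery"]

# A few routines classified along all three dimensions.
routines = [
    ("Field mapping", "DS-ODS", "Schema", "before delivery"),
    ("Record counts", "ODS-DW", "Data", "before delivery"),
    ("Incremental load test", "ODS-DW", "Operational", "after delivery"),
    ("Audit user accesses", "DM-UI", "Operational", "after delivery"),
]

def cell(where, what):
    """One cell of table 2: the WHEN value stays attached to each routine
    (rendered as colour in the paper)."""
    return [name for name, w, h, _ in routines if w == where and h == what]
```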


Comparison and Evaluation of Surveyed Approaches

After studying how each proposed DW testing approach addresses DW testing, and according to the DW testing matrices defined in the previous section, a comparison matrix is presented in table 3, showing the test routines that each approach covers. The DW testing approaches are represented on the columns, while the what and where dimensions classify the test routines on the rows. The intersection of a row and a column indicates the coverage of the test routine in that approach, with one mark for full coverage and another for partial coverage. Finally, the when dimension, which indicates whether a test takes place before or after system delivery, is represented by colour: tests which take place after system delivery are highlighted, while tests that take place during system development, or when the system is subject to change, are left without colour.
We were able to compare only 10 approaches, as not enough data was available for the rest.
As table 3 makes obvious, none of the proposed approaches addresses the entire DW testing matrices. This is simply because each approach addresses the DW testing process from its own point of view, without leaning on any standard or general framework. Some of the attempts considered only parts of the DW framework shown in figure 1; others used their own framework for the DW environment according to the case they were addressing. For example, [8] used a DW architecture that includes neither an ODS nor a DW layer: the data is loaded from the data sources directly into the data marts, which makes the data marts layer act as both the DW and the data marts interchangeably. Other approaches, like [1, 14, 15], did not include the ODS layer.
From another perspective, some test routines are not addressed by any approach, such as data quality factors like accuracy, precision, and continuity. Some major parts of the DW are not tested by any of the proposed approaches, namely the DM schema and the additivity of measures in the DMs.
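Additivity of measures, one of the untested aspects, is straightforward to check mechanically: an additive measure must give the same total whether it is summed from detail rows or read from a pre-aggregated figure. A minimal sketch with hypothetical fact rows:

```python
def additive_measure_ok(detail_rows, stored_total, measure="amount", tol=1e-9):
    """Check that summing the measure over detail rows reproduces the
    total stored at a coarser aggregation level."""
    return abs(sum(r[measure] for r in detail_rows) - stored_total) <= tol

facts = [{"region": "north", "amount": 10.0},
         {"region": "south", "amount": 15.5}]
ok = additive_measure_ok(facts, 25.5)
```

Non-additive or semi-additive measures (averages, snapshot balances) would need a different guard, which is precisely why the routine deserves explicit treatment.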

Based on a careful study of the available DW testing approaches, using the DW testing matrices presented previously to analyze, compare and evaluate them, it is evident that the DW environment lacks the following:
1. A generic, well defined DW testing approach that could be used in any project. Each approach presented its testing techniques based on its own DW architecture, which limits the reusability of the approach in other DW projects with different DW architectures.
2. None of the existing approaches includes all the test routines needed to guarantee a high-quality DW after delivery.
Table 3: DW Approaches Comparison (columns: the ten surveyed approaches; rows: the test routines below, grouped by layer; the coverage and colour marks are described in the text)

DS to ODS. Schema: requirement testing, user requirements coverage, ODS logical model, field mapping. Data: data type constraints, aspects of transformation rules, correct data selection, integrity constraints, parent-child relationships, record counts, duplicate detection, threshold test, data boundaries, data profiling, random record comparison, surrogate keys. Operation: review of job procedures, error messaging, processing time, integration testing, rejected records, data access.

ODS to DW. Schema: DW conceptual schema, DW logical model. Data: integrity constraints, threshold test, data type constraints, hierarchy level integrity, derived attributes checking, record counts, no constants loaded, null records, field-to-field test, data relationships, data transformation, duplicate detection, value totals, data boundaries, quality factors, comparison of transformed data with expected transformations, data aggregation, reversibility of data from DW to DS, confirmation that all fields are loaded, simulated data loading. Operation: documenting the ETL process, ETL test, scalability test, initial load test, incremental load test, regression test, data access.

DW to DM. Schema: DM schema design. Data: calculated members, irregular hierarchies, correct data filters, additivity guards.

DM to UI. Schema: report structure, report cosmetic checks (font, colour and format), graph cosmetic checks (type, colour, labels, and legend), column headings. Data: drilling-across query reports, correct data displayed, tracing report fields to data, field data verification. Operation: performance test (response time, number of queries, number of users), stress test, audit of user accesses, refresh time for standard and complex reports.

Cross-layer routines: dependencies between levels, object definitions, data providers, data type compatibility, error logging, HW configuration, SW setup in the test environment, system connections setup, security test, final source-to-target test, testing throughout development.

3. None of the existing approaches takes into consideration the dependencies between test routines. It is not always mandatory to perform all test routines: if some tests pass successfully, others can be skipped.
4. The approaches proposed in [18, 19, 21] were the only ones focusing on both the DW testing routines and the life cycle of the testing process. The life cycle of each test routine includes a test plan, test cases, test data, termination criteria, and test results. Nevertheless, the two aspects were presented independently, without showing how the testing routines fit into a complete DW testing life cycle.
5. In several projects, testing is neglected or diminished due to time or resource problems. None of the existing approaches includes any differentiation or prioritization of the test routines according to their impact on overall DW quality, which would help testers select the important test routines that most affect the quality of the DW when resource or time limitations force them to shorten the testing process.
6. Some of the above test routines could be automated, but none of the proposed approaches shows how these routines could be automated, or given automated assistance, using custom-developed or existing tools.
These drawbacks urged us to think of a new DW testing approach that fills the gaps in this area while benefiting from the efforts made by others. The following section describes our proposed DW testing framework.

In my PhD I am planning to develop a DW testing framework that is generic enough to be used in several DW projects with different DW architectures. The framework's primary goal is to guide testers through the testing process by recommending the group of test routines that are required given the project's customized DW architecture.
The proposed framework is supposed to include definitions for the test routines. A main target of our research is to benefit from existing DW testing approaches by adopting their test routine definitions, and to define the test routines that have not been addressed, or not comprehensively defined, by any previous approach.
In our study we will prioritize the test routines according to their importance and impact on the output product, so that testers can select the tests that most affect the quality of the delivered system when scheduling or budget limitations are faced.
Part of each test routine definition should state how the routine can be automated, or obtain automatic support when full automation is not applicable. It should also include suggestions for using existing automated test tools, to minimize the work needed to obtain automated support in the DW testing process.
One of the core features that we intend to include in our proposed testing framework is testing along the DW development life cycle. This can be done by stating the prerequisites of each testing routine with respect to the DW development life cycle stages, in order to detect errors as early as possible.
The framework's infrastructure is planned to take the form of a directed graph. Nodes of the graph will represent test routines, and links between nodes will represent relationships between test routines. We expect several types of relationships: dependencies between test routines, tests that are guaranteed to succeed if other tests pass, and tests that should never be performed unless other tests pass.
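For the plain dependency relationship, such a graph already yields a sensible execution order via a topological sort. The routine names and edges below are illustrative assumptions, not the framework's actual graph:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# routine -> set of routines that must pass first (hypothetical edges).
dependencies = {
    "initial load test": {"field mapping"},
    "incremental load test": {"initial load test"},
    "regression test": {"initial load test", "incremental load test"},
}

# static_order() emits prerequisites before the routines that need them.
execution_order = list(TopologicalSorter(dependencies).static_order())
```

The other relationship types (guaranteed success, never-run-unless) would be extra edge labels consulted when pruning the recommended routine set.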
After defining the DW testing framework and fully documenting it, I am planning in my PhD to materialize it as a web service that makes the framework reachable for testers.
Finally, a case study will be used to exercise the testing framework in a real-world DW project, and the results of the experiment will be used to evaluate the proposed framework.

This section presents the architecture of the proposed framework, to show its work flow when it is put into operation. As shown in figure 2, the key player in the DW testing process is the Test Manager, who feeds the system with the DW architecture under test and the current state of the DW, in other words, which components of the DW have been developed so far. This step is needed because our proposed framework supports testing throughout system development.
The DW Architecture Analyzer component then studies the received data, compares it with the dependencies between test routines in the Test Dependency Graph, with the assistance of the Test Dependency Manager component, and passes the data to the Test Recommender to generate an abstract test plan.
The process of preparing the detailed test plan then splits into two directions according to the type of test routine: validation tests or verification tests. For validation test types, which are the more complicated testing routines, the Validation Manager involves the business expert(s) and system user(s), and accesses the relevant data from the system repository, to prepare the part of the test plan that concerns the validation test routines. For the verification test routines, the Verification Manager, together with the Test Case Generator and Test Data Generator modules, helps prepare the detailed test plan of the verification test routines.

Figure 2: Proposed DW Testing Framework Architecture

The Verification Manager involves the system tester(s) and database administrator for assistance in test case preparation and test data generation (in case the DW is still in the development phase and no real data is available).
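Test data generation for verification tests, needed when no real data exists yet, can be as simple as a seeded generator, so that failures are reproducible. The columns here are hypothetical:

```python
import random

def generate_test_rows(n, seed=42):
    """Produce deterministic synthetic fact rows; the same seed always
    yields the same data, so a failing test case can be replayed."""
    rng = random.Random(seed)
    return [{"id": i,
             "amount": round(rng.uniform(1.0, 1000.0), 2),
             "region": rng.choice(["north", "south", "east", "west"])}
            for i in range(n)]

rows = generate_test_rows(5)
```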


To benefit from existing test automation tools for DWs, the Verification Manager will include in the detailed test plan the possible use of existing test automation tools for those routines that can be automated.




In my PhD I will not take into consideration:
1. Testing routines that target unconventional DWs, such as temporal DWs, active DWs, spatial DWs, DW 2.0, etc.
2. Testing routines that target the last layer of the DW architecture (DM to UI), since it covers a broad range of application types with different architectures and implementation techniques, for which it would be very hard to define generic testing routines.
3. Test routines checking the data quality of the data sources; instead, we will consider all the sources to be of low quality.
In the process of assimilating the existing DW testing approaches into our proposed DW testing framework, we expect to face difficulties regarding the availability of data and the know-how of the way testing routines are defined and implemented. This is because a considerable number of the existing approaches come from industrial organizations, and revealing such information could negatively affect their position in the field of DW testing.

Some trials have been carried out to address DW testing; most of them were oriented to a specific problem, and none of them were generic enough to be used in other data warehousing projects.
By the end of my PhD, I will present a generic DW testing framework that integrates and benefits from all existing DW testing trials. It will guarantee to the system user that the data quality of the data sources is preserved, or even improved, thanks to the comprehensive testing process that the DW has passed through.
Future work in this field could extend the testing framework to support non-conventional DWs such as active DWs, temporal DWs, and spatial DWs. This could be done by defining specializations of the test routines to address the special needs of these DWs.




References

[1] Bateman, C. Where are the Articles on Data Warehouse Testing and Validation Strategy? Information Management, 2002.
[2] Bhat, S. Data Warehouse Testing - Practical. StickyMinds, 2007.
[3] Brahmkshatriya, K. Data Warehouse Testing. StickyMinds, 2007.
[4] Cooper, R. and Arbuckle, S. How to Thoroughly Test a Data Warehouse. In Software Testing Analysis and Review (STAREAST), Orlando, Florida, 2002.
[5] CTG. CTG Data Warehouse Testing, 2002.
[6] ElGamal, N., ElBastawissy, A. and Galal-Edeen, G. Towards a Data Warehouse Testing Framework. In IEEE 9th International Conference on ICT and Knowledge Engineering (IEEE ICT&KE), Bangkok, Thailand, 2011, 67-71.
[7] Executive-MiH. Data Warehouse Testing is Different.
[8] Golfarelli, M. and Rizzi, S. A Comprehensive Approach to Data Warehouse Testing. In ACM 12th International Workshop on Data Warehousing and OLAP (DOLAP '09), Hong Kong, China, 2009.
[9] Golfarelli, M. and Rizzi, S. Data Warehouse Design: Modern Principles and Methodologies. McGraw Hill.
[10] Golfarelli, M. and Rizzi, S. Data Warehouse Testing. International Journal of Data Warehousing and Mining, 7 (2), 26-43.
[11] Golfarelli, M. and Rizzi, S. Data Warehouse Testing: A prototype-based methodology. Information and Software Technology, 53 (11), 1183-1198.
[12] Inergy. Automated ETL Testing in Data Warehouse Environment, 2007.
[13] Mathen, M.P. Data Warehouse Testing. InfoSys, 2010.
[14] Munshi, A. Testing a Data Warehouse Application. Wipro Technologies, 2003.
[15] Rainardi, V. Testing your Data Warehouse. In Building a Data Warehouse with Examples in SQL Server, Apress, 2008.
[16] RTTS. QuerySurge, 2011.
[17] SSNSolutions. SSN Solutions, 2006.
[18] Tanuška, P., Moravčík, O., Važan, P. and Miksa, F. The Proposal of Data Warehouse Testing Activities. In 20th Central European Conference on Information and Intelligent Systems, Varaždin, Croatia, 2009, 7-11.
[19] Tanuška, P., Moravčík, O., Važan, P. and Miksa, F. The Proposal of the Essential Strategies of Data Warehouse Testing. In 19th Central European Conference on Information and Intelligent Systems (CECIIS), 2008, 63-67.
[20] Tanuška, P., Schreiber, P. and Zeman, J. The Realization of Data Warehouse Testing Scenario. In Infokit-3, Part II, Stavropol, Russia, 2008.
[21] Tanuška, P., Verschelde, W. and Kopček, M. The Proposal of Data Warehouse Test Scenario. In European Conference on the Use of Modern Information and Communication Technologies (ECUMICT), Gent, Belgium, 2008.
[22] TAVANT. Data Warehouse Testing. Date accessed: Jan 2013.
[23] Theobald, J. Strategies for Testing Data Warehouse Applications. Information Management, 2007.