Summary: Projects that involve data (CRM projects, MDM initiatives, ERP implementations, business intelligence and data warehouse projects, data governance programs, migrations, consolidations, and harmonizations) all offer the opportunity to improve data quality. This paper is a phase-by-phase guide that identifies, for both business team members and IT resources, which data quality tasks should be incorporated into a project plan for each team function. It provides a roadmap for optimal effectiveness and coordination. These data quality essentials are based on best practices collected from thousands of data management projects and successes over the past 30 years.
TRILLIUM SOFTWARE
Usage Notice
Permission to use this document is granted, provided that: (1) the copyright notice, 2008 by Harte-Hanks Trillium Software, appears in all copies, along with this permission notice; (2) use of this document is only for informational and noncommercial or personal use and does not include copying or posting the document on any network computer or broadcasting the document through any medium; (3) the document is not modified from the original version.
It is illegal to reproduce, distribute, or broadcast this document in any context without express written permission from Trillium Software. Use for any other purpose is expressly prohibited by law and may result in severe civil and criminal penalties. Violators will be prosecuted to the maximum extent possible.
The importance and ways to best involve business users in the project to ensure
their needs are met
Each type of team member has clearly defined roles for making the initiative a success and must be accountable for his or her part. Here is how roles and responsibilities are typically defined with regard to the data quality elements of a project.
The roles typically include Executive Leaders (CIO, CFO, VP), Line-of-Business Managers, Data Stewards, and Information Professionals. Information Professionals, for example, implement business rules for cleansing, standardizing, and deduplicating the data; support data stewards; and run day-to-day operations.
The right technology will help keep team members engaged and communicating. A data
analysis environment with a central repository provides the right architecture for multi-role,
multi-member projects where resources need to interact and communicate about source
data and target environment designs.
This architecture provides an infrastructure that enables a common understanding of data quality issues, recommendations for the use of data, and the transformations that may need to take place as data is migrated.
Scope
Scoping draws clear parameters around the data you are capturing, moving, cleansing,
standardizing, linking, and enriching, and its use. Each requirement must be assessed to
determine whether or not the data involved in this project can or will meet the requirement
to the satisfaction of the business. There are several basic questions to answer:
1. …
2. …
3. What is the level of quality within each source for this information?
4. …
5. …
In a data migration, for example, you might be looking for certain key elements to appear
in the target data model. You may first need to confirm that the anticipated target data
physically exists within source systems and may next need to determine the best source
for the data, or most trusted source. If taking data from multiple sources, you may have to
establish a set of standards that all source systems conform to in order to produce a
consistent representation of that data in the new target system.
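As an illustration of the "most trusted source" idea above, the following Python sketch picks the source system with the highest fill rate for a required element. The source names, the field, and the fill-rate criterion are all assumptions for illustration, not from this paper:

```python
# Illustrative extracts from two candidate source systems.
sources = {
    "crm": [{"phone": "555-0100"}, {"phone": ""}, {"phone": "555-0102"}],
    "erp": [{"phone": "555-0100"}, {"phone": "555-0101"}, {"phone": "555-0102"}],
}

def fill_rate(rows, field):
    """Fraction of rows where the field is present and non-empty."""
    return sum(bool(r.get(field)) for r in rows) / len(rows)

def most_trusted(sources, field):
    """Return the source name with the highest fill rate for the field."""
    return max(sources, key=lambda name: fill_rate(sources[name], field))

print(most_trusted(sources, "phone"))  # erp: 3/3 populated vs 2/3 in crm
```

A real trust ranking would weigh more than completeness (recency, validity, known authority of a system), but the same shape applies: score each source against agreed criteria, then select.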
Understanding the scope of the project early is key to its successful and timely delivery.
Be sure to categorize the need-to-have data and the nice-to-have data. Be prepared to
drop off the nice-to-haves if time becomes short or if the effort of moving, cleansing,
standardizing, etc. outweighs the anticipated business benefit.
There are ways to limit scope. For example, if you're integrating multiple data sources, will it be one large movement of data or several smaller movements? Does the entire database need to move, or is six months of history enough? Working through these issues with the business team and IT will keep the project on time and on target, and will help manage expectations during the project lifecycle so there are no surprises as the project nears a close.
[Table: data quality metrics and their corresponding business impact]
Technology can play a significant role in uncovering data conditions such as those listed above and in establishing a recorded baseline of those conditions. Not only will technology help you organize and document results, it can also be used to manage conditions going forward. Automated data profiling, exception reporting, and drill-down functionality give you both the results and the tools to involve non-technical users in analyzing those results. You can set conditions, such as those listed above, and understand immediately to what degree the metrics are met.
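A minimal sketch of such automated checks in Python follows. The records, field names, and conditions here are invented for illustration; a packaged profiling tool would provide far richer analysis:

```python
import re

# Illustrative records; in practice these would be read from a source extract.
records = [
    {"name": "John Smith", "email": "jsmith@example.com", "postcode": "02101"},
    {"name": "Smith/John", "email": "", "postcode": "2101"},
    {"name": "", "email": "not-an-email", "postcode": "02101"},
]

# Each metric pairs a name with a predicate a single record must satisfy.
metrics = {
    "name populated": lambda r: bool(r["name"].strip()),
    "email well-formed": lambda r: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"]) is not None,
    "postcode is 5 digits": lambda r: re.fullmatch(r"\d{5}", r["postcode"]) is not None,
}

def profile(records, metrics):
    """Report, per metric, the percentage of records that conform."""
    return {
        name: 100.0 * sum(check(r) for r in records) / len(records)
        for name, check in metrics.items()
    }

for metric, pct in profile(records, metrics).items():
    print(f"{metric}: {pct:.0f}% conform")
```

Running the predicates over every record yields exactly the kind of recorded baseline described above: a conformance percentage per metric that can be stored and re-measured later.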
Define Standards
Project team members representing the business play a key role in standards definition.
The team members involved in this step should be a fair representation of the ultimate
user audience. For example, if the end user audience will include sales and marketing
and potentially shipping, someone from each of the named departments should be
involved in defining system standards. Also, a representative from each of the company's departments should act as a data steward to make sure data adheres to the defined standards in the new system, if not also in the source systems.
With every business, there are certain standards that can be applied to every piece of
data. For example, a name and address should almost always conform to postal
standards. E-mail addresses conform to a certain shape with a user name, internet
domain and an @ sign in the middle. However, there may be data for which your team
needs to define a new standard. This is typically a part number, item description, supply
chain data, and other non-address data. For this, you need to set the definition with the
business team. As part of the process, explore the current data, decide what special data
exists in your required fields, and establish system standards that can then be automated
and monitored for compliance.
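Once the business team has agreed on standards, compliance checks can be automated, as the paragraph above suggests. This Python sketch uses an illustrative e-mail shape and a purely hypothetical part-number pattern (two uppercase letters, a dash, four digits); your team's standards would differ:

```python
import re

# Hypothetical standards agreed with the business team; the part-number
# pattern is purely illustrative.
STANDARDS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "part_number": re.compile(r"[A-Z]{2}-\d{4}"),
}

def conforms(field, value):
    """Return True if the value matches the agreed standard for the field."""
    return STANDARDS[field].fullmatch(value) is not None

print(conforms("email", "jane.doe@example.com"))  # True
print(conforms("part_number", "ab-12"))           # False: wrong case, too short
```

Encoding each agreed standard as a testable rule is what makes the compliance monitoring described above possible.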
business. Most project managers find it helpful at this point to seek out the endorsement of a ranking executive. Using the data quality metrics and business impact generated in the previous step, keep executives in the loop about your initiative; this will help you maintain endorsement of the data quality effort, foster support, and secure funding for future projects or additional resources. If there are any internal political challenges, executives can help resolve issues and remove roadblocks. If they are already well informed of your efforts, status, and potential positive impact, it will be much easier to invoke their support.
Access Data
At this point in design, it is necessary to take a deep dive into data extracts, representative
of the actual data that will be used as part of the production system. The purpose here is
to understand what mappings, transformations, processing, cleansing, etc. must be
established to create and maintain data that meets the needs and standards of the new
system or solution.
IT resources are generally responsible for defining data extracts and gaining appropriate
access to source systems. This data can then be shared with other team members to
support detailed design tasks.
Capture a Baseline
Business team members have defined the data quality metrics and business impact in a
previous step. Now is the time to take a baseline measurement. As part of the source system analysis, capture and store a baseline for each source system, as well as a measure of how well multiple systems conform to expected metrics or business rules. In some cases, it will make sense to look not only at each source system in isolation, but across systems.
John Smith
Smith/John
It's up to you and your business team members to decide how to standardize each of these name formats for optimal efficiency in the target systems. Should John and Jan Smith be linked but separate records in your master file, or remain as a single entry?
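A standardization rule for the two layouts shown above might be sketched as follows. The rule itself is illustrative; your business team would define the real one, covering many more variants:

```python
def standardize_name(raw):
    """Normalize common name layouts to a single 'First Last' form.

    Covers the two illustrative layouts above ('First Last' and
    'Last/First'); real rules, agreed with the business team, would
    handle titles, suffixes, multi-part surnames, and more.
    """
    if "/" in raw:
        last, first = (part.strip() for part in raw.split("/", 1))
        return f"{first} {last}"
    # Collapse any stray whitespace in the already-ordered form.
    return " ".join(raw.split())

print(standardize_name("John Smith"))  # John Smith
print(standardize_name("Smith/John"))  # John Smith
```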
Set up a test file or database of records that present these common data situations for QA purposes during this stage of the project. A quality assurance (QA) task will be completed prior to going live with new data. This test-case definition effectively begins to build a list of data quality anomalies, which you can leverage to build and test business rules and quality processes. Some of the business rules and test cases will come standard with the cleansing process of packaged data quality solutions; these should be highly tunable to meet your organization's specific needs. Others you can begin to build based on your needs.
Phase 3: Implement
When all the planning is done, it's time to begin putting the technology in place to improve data, using automation wherever possible. For the technology resources' implementation tasks, we recommend the Trillium Software Data Quality Methodology, a white paper detailing how to standardize, enrich, and match data, and how to fine-tune the business rules to optimize data. Although this is the most technical of the plan's phases, business users still play an important role.
QA Initial Results
The most important measure of the data quality process is that business users are happy with the results. As you begin to implement new data quality process designs, project managers should have business users run sample data through the data quality processes to ensure results meet their expectations. Business users can compare results before and after processing with the same data discovery tool they have been using all along. Coarse-tune processes using sample data, then switch over to a complete data set for formal QA.
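The before-and-after comparison can be as simple as a field-level diff of the sample extract. In this Python sketch, the records and field names are invented for illustration:

```python
# Sample extract before and after an illustrative cleansing pass.
before = [
    {"id": 1, "name": "Smith/John", "city": "boston"},
    {"id": 2, "name": "Jan Smith", "city": "Boston"},
]
after = [
    {"id": 1, "name": "John Smith", "city": "Boston"},
    {"id": 2, "name": "Jan Smith", "city": "Boston"},
]

def diff_records(before, after):
    """Yield (record id, field, old value, new value) for every change."""
    for b, a in zip(before, after):
        for field in b:
            if b[field] != a[field]:
                yield b["id"], field, b[field], a[field]

for rec_id, field, old, new in diff_records(before, after):
    print(f"record {rec_id}: {field}: {old!r} -> {new!r}")
```

Presenting results this way lets business users confirm that every change the process made was an intended one before formal QA begins.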
Once results have been verified, it's time to load sample data into the target applications and begin testing more thoroughly. By taking this extra step with the business during the QA cycle, you're much more likely to be successful the first time you load data and will avoid loading and reloading data repeatedly.
Validate Rules
In phase one and two, youve both determined what you have and what you need. Rules
are developed in an iterative analytic process. This requires access to knowledge about
intended meaning of the data. Business users and data analysts should work together on
this process, applying the same technology and process described for analyzing source
data, if additional questions come up. Give business users an opportunity to set up test
data scenarios and allow them to review the results after the cleansing process.
This is also your opportunity to review and add your specific terminology (e.g., industry-specific terms, company-specific definitions, and regional colloquialisms) not initially part of the standardization terminology. It is also a chance to determine whether you will require geography-specific standardization.
Data discovery tools can again help the project during the UAT process by giving both business users and technical users a view into the data. Teams can collaborate and view the results of any data quality process, before and after the process is run.
It's valuable to test inside the target application, too. Things to test include:
- All forms: particularly important when using a real-time interface into the data quality tool
- All reports: ensures the results from the reports are as expected
- Test scenarios: test the results of the data quality process's impact on systems and applications that interface with the ones included in your project
Throughout your UAT, make sure your business users have easy access to the data, whether through the tools and technologies used throughout the project or otherwise, so they can quickly address any questions that arise. Walk users through:
- Any new required fields or formats as they enter data into the system
- Any new screens or pop-ups requesting validation of automated cleansing and matching of data
- The positive impact and business benefits of new, cleaner data
The involvement of both business users and IT users in the process of creating
high quality data.
If you have executed the tasks described so far, you have significantly reduced the likelihood of any of the above-mentioned issues occurring on your project. By taking the time up front to thoroughly investigate source system data, incorporate necessary processing into your designs, and perform UAT that includes anticipated problematic data conditions, you have proactively addressed the issues that cause most project teams severe headaches late in the game.
Should something unexpected occur and require attention, you already have the
resources and infrastructure in place to quickly react: your team of both IT and business
users is already familiar with the project, the data, and any technology you have been
using (i.e., your data discovery tool) and can swiftly look at the data and assess the
problem for a quick resolution.
Phase 5: Go Live
Congratulations, you are going live! During this phase your team will flip the switch, and your new data quality processes will begin to provide immediate benefits to your organization. The fruits of your labor will begin to be realized.
SWOT Team
At this stage, it's a good idea to have in place a cross-functional SWOT (Strengths, Weaknesses, Opportunities, Threats) team, including business analysts or departmental resources familiar with business processes, performance engineers, data architects, field technicians, and contacts from any vendors, available on an emergency basis to provide rapid problem resolution.
Teams may adopt different processes to help them understand the problem presented and to design a response. Practitioners using problem-solving processes believe that it is important to analyze a problem thoroughly to understand it and to design interventions that have a high probability of working. The intent is to intervene early after a problem is identified and to provide ways by which that problem may be alleviated so the corporation can achieve success.
Teams should meet to complete a post mortem, discussing how the project went and how
to further improve on data quality during the next round.
Problem Resolution
All support organizations have some form of processes and procedures in place for
helping to resolve user and system-generated queries, issues, or problems in a consistent
manner. In some organizations these processes are very structured; in others they are
more informal.
In addition to efficient processes, it is important that the support team have well-defined roles and responsibilities to reduce response time to customer needs. Here is an example of an escalation hierarchy, along with the individuals who perform these tasks, for a fairly large implementation. In this example, when a problem is identified, it is escalated as follows.
Tier 1: Help Desk - Help Desk technicians provide first-line support to the user community
and perform any additional training and remote operations to resolve issues. If the help
desk is unable to resolve the issue, it is escalated to Tier 2.
Tier 2: Information Professionals - Information Professionals are typically more aware of the data aspects of the operation than the Help Desk. With the aid of a data discovery tool and access to the end-user application, they troubleshoot the issue. If the Information Professionals are unable to resolve the issue, they escalate it to the Data Stewards at Tier 3.
Post Mortem
Re-run your baseline processes and collect updated results for a quantified measurement
of your impact. Gather up your metrics, your support log, your exceptions processing log,
and other relevant documentation. Call a meeting to:
- List the lessons learned during the project; use them as input to improve future project delivery
Phase 6: Maintain
In most religions of the world, there is a day to reflect on the good work you've done, admit your shortcomings, and set a plan in place to improve. Phase six is that day for those who believe in data quality. It is also a time of joy, however: in this phase, the fruits of your labor will be realized, and you should not be shy about telling the world what you have accomplished.
Announce Successes
One of the keys to maintaining funding for your project is to internally publicize the successes you've had. In reality, a data quality initiative should be constantly re-sold at every opportunity, to continue to reinforce in people's minds the value you are introducing to your organization.
Ways to communicate your success include:
This is also a very good time to remind the company that data quality is everyone's problem, and to suggest ways they can help solve data quality issues.
Monitor
You can keep track of data quality in a number of ways. A full analysis with your data discovery tool is one. Each time you compare results against the original or the most recently measured baseline, you get a detailed picture of how your data quality initiative is progressing.
Some tools can also track data quality automatically. For example, they may include an e-mail notification feature to inform key personnel when business rules are violated, such as when data values do not meet predefined requirements, threshold values are exceeded, or nulls are present where they are unacceptable. These features prevent errors from impacting your business, which is especially valuable when your enterprise uses data sources that are prone to change.
Data stewards, system owners, and/or key business contacts can receive alerts on critical
changes and errors. The tools can then allow users to call up the violation(s) and drill
down on the error(s).
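A rule-based monitor of this kind might be sketched as follows. The rules, thresholds, and record layout are illustrative assumptions, and a real tool would also send the e-mail notifications described above:

```python
# Illustrative incoming records to be checked against business rules.
records = [
    {"order_id": "A1", "amount": 250.0, "customer": "ACME"},
    {"order_id": "A2", "amount": 99999.0, "customer": None},
]

# Each rule pairs a name with a predicate a record must satisfy.
rules = {
    "amount within threshold": lambda r: r["amount"] <= 10000,
    "customer not null": lambda r: r["customer"] is not None,
}

def check_rules(records, rules):
    """Return a list of (order_id, rule name) pairs for every violation."""
    return [
        (r["order_id"], name)
        for r in records
        for name, passes in rules.items()
        if not passes(r)
    ]

for order_id, rule in check_rules(records, rules):
    print(f"ALERT: record {order_id} violates rule: {rule}")
```

Each alert identifies the offending record and the violated rule, which is exactly what a data steward needs in order to drill down and resolve the issue.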
Organizations with action-oriented governance programs use such features to alert key
stakeholders and responsible parties of data anomalies. Each day, stewards can address
the issues at hand and create prioritized tasks to resolve the issues identified.
Principle: Comprehensive
Description: Deliver fit-for-purpose data for all types of data, everywhere, anytime. This includes the ability to support global business: not just information from the US and UK, but from China, Japan, Germany, Mexico, etc.; not just single-byte but double-byte data; not just name and address data, but all types of data. Important for consolidation/migration projects with international or cross-functional reach.
Principle: Intelligent
Description: Contains intelligence to identify and address problems in context, so you do not have to apply heavy human resources to fix the data. Important for lowering IT costs and saving money on human resources.
Principle: Seamless
Description: Does your solution have the capability to expand and grow over time, extending to any and all applications, even those that may come to your company through mergers and acquisitions? Important when you want to apply data quality to key enterprise applications.
Principle: Dynamic
Description: Can you quickly and precisely change the rules when you need to, to adapt to changing business needs? Important because business models and business processes can change quickly as technology advances.
Principle: Measurable
Description: Can you measure that your solution is working, both immediately and over time? Does it produce quantifiable results that can impact the business? Important as internal self-justification for your team, a way to continue improvement, and a way to justify expenditures.
The Trillium Software System answers these challenges with a scalable, flexible
framework that supports the integration of data quality processes into any system, at any
time, anywhere in the world. From tactical projects to strategic practices, the Trillium
Software System increases integration efficiency, lowers development costs, and provides
faster return on investment (ROI) from data quality initiatives through: