
Data quality management best practices

In this file, you can find useful information about data quality management best practices, such as
forms, tools and strategies for data quality management. If you need more assistance with data
quality management best practices, please leave a comment at the end of the file.

I. Contents of data quality management best practices

Data governance is everybody's business, which is why several best practices involve getting
business users involved with data quality initiatives.
By Priya Singh, Information Builders
All companies struggle to manage the cyclical data quality process. A majority of organizations
use only a fraction of their enterprise information to gain the kind of actionable insight needed to
facilitate superior business performance. Additionally, they fail to realize the substantial cost
associated with the presence of subpar, inaccurate and inconsistent data.
The significant amount of revenue that is lost to bad information compels a shift in data quality
strategies from occasional data cleansing to an ongoing cycle of data quality created by
incorporating governance plans. Data governance is a continuous quality improvement process,
embraced at all levels of the organization, to filter bad information by defining and enforcing
policies and approval procedures for achieving and maintaining data quality.
Below are five best practices for data governance and quality management. These best practices
are being leveraged by companies that have successfully achieved -- and benefited from -- peak
data quality in their enterprise.
Conduct a Data Quality Assessment

Start tackling your data quality management problems by performing a complete analysis of the
current state of your data. Information with errors, inconsistencies, duplicates or missing fields
can often be difficult to identify and correct. That's because bad data can be buried deep within
legacy systems, or can arrive from external sources such as third-party data providers, external
applications and social media channels like Facebook and Twitter.
An independent analysis will provide the organization with an in-depth report that includes
accurate and detailed statistics about the quality of the organization's data. The business can then
formulate or refine a data quality management strategy tailored to its unique organizational
needs, and develop governance policies that address specific data management requirements.
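As a rough illustration of what such an assessment measures, the sketch below profiles a small set of records for missing fields and duplicates. The function name, the field names and the sample records are all hypothetical, and a real assessment would cover many more checks.

```python
from collections import Counter


def _clean(value):
    """Normalize a raw field value to a trimmed, lowercase string."""
    return "" if value is None else str(value).strip().lower()


def profile_records(records, required_fields):
    """Return simple quality statistics for a list of dict records."""
    # Count records where each required field is absent or blank.
    missing = {
        field: sum(1 for r in records if not _clean(r.get(field)))
        for field in required_fields
    }
    # Records whose normalized required fields all match count as duplicates.
    keys = Counter(
        tuple(_clean(r.get(f)) for f in required_fields) for r in records
    )
    duplicates = sum(count - 1 for count in keys.values() if count > 1)
    return {"total": len(records), "missing": missing, "duplicates": duplicates}


# Hypothetical sample data, for illustration only.
customers = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Ann Lee", "email": "ANN@example.com"},
    {"name": "Bo Chan", "email": ""},
    {"name": "", "email": "cy@example.com"},
]
report = profile_records(customers, ["name", "email"])
print(report)
```

Statistics like these give the business a baseline against which later governance efforts can be measured.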
Build a Data Quality Firewall

Data is a strategic information asset, and the organization should treat it as such. Like any other
corporate asset, the data contained within the organization's information systems has financial
value. The value of the data increases with the number of people who are able to
make use of it. Feeding inaccurate data into your data warehouse or mastering systems will not
only make it difficult to obtain clear business insights and gather actionable information, it will
also damage good data.
A virtual data quality firewall detects and blocks bad data at the point it enters the environment,
acting to proactively prevent bad data from polluting enterprise information sources. A
comprehensive data quality management solution that includes a data quality firewall will
dynamically identify invalid or corrupt data as it is generated or as it flows in from external
sources, based on pre-defined business rules.
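The idea of a rule-based firewall can be sketched in a few lines. The rules, field names and sample records below are assumptions made for illustration; they do not describe any particular product.

```python
import re


def valid_email(value):
    """Hypothetical rule: value looks like name@domain.tld."""
    return isinstance(value, str) and re.fullmatch(
        r"[^@\s]+@[^@\s]+\.[^@\s]+", value
    ) is not None


def valid_age(value):
    """Hypothetical rule: age is a whole number between 0 and 120."""
    return isinstance(value, int) and 0 <= value <= 120


# Pre-defined business rules: field name -> check function.
RULES = {"email": valid_email, "age": valid_age}


def firewall(records, rules=RULES):
    """Split incoming records into accepted and rejected lists."""
    accepted, rejected = [], []
    for record in records:
        # Collect the names of all fields that fail their rule.
        failed = [f for f, check in rules.items() if not check(record.get(f))]
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected


# Hypothetical incoming records, for illustration only.
incoming = [
    {"email": "a@b.com", "age": 30},
    {"email": "bad", "age": 30},
    {"email": "c@d.org", "age": -1},
]
accepted, rejected = firewall(incoming)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Keeping the failed field names alongside each rejected record makes it easier to route the record back to its source for correction.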
Unify Data Management and Business Intelligence
Even the best data governance policies are not enough on their own to protect data. The
sheer volume of data that flows through enterprise systems can make it particularly challenging
to maintain peak data quality at all times. It simply isn't possible to manage quality
record-by-record, or to attempt to govern every piece of data that is collected by an organization. The key
to success is to identify and prioritize the type and volume of data that requires data governance.
Business intelligence (BI) solutions allow organizations to determine which data sets are most
likely to be utilized and should be targeted for quality management and governance. Astute data
management processes can then be used to collect that data -- for example, customer preferences
or purchasing information -- and move it to a repository for cleansing and analysis as a high
Make Business Users Data Stewards
Advanced organizations realize business professionals need to take ownership of the data they
are helping to create and feed into IT systems. This has prompted many companies to create a
data governance role to manage data quality from end to end.
The data governance director is typically chosen from a business group, and is the primary focal
point for all data-related needs within that group. Some organizations have multiple roles for data
governance to represent different areas of the business. These data overseers take a leadership
role in resolving data integrity issues, and act as liaisons with the IT group that manages the
underlying information management infrastructure.
Create a Data Governance Board
The primary objective for instituting a data governance board is to mitigate business risks that
arise from highly data-driven decision-making processes and systems in the current business
environment. These boards include business and IT users and are responsible for setting data
policies and standards, ensuring that there is a mechanism for resolving data-related issues,
facilitating and enforcing data quality improvement efforts, and taking proactive measures to
stop data-related problems before they occur.
Wrapping up
Successful data governance starts with a solid, well-defined data management strategy, and relies
upon the selection and implementation of a cutting-edge data quality management solution. The
key to effective data quality management is to create data integrity teams, composed of a
combination of IT staff and business users, with business users taking the lead and maintaining
primary ownership for preserving the quality of any incoming data.
While data integrity teams will drive the data quality management plan forward, it is also
important to have a comprehensive data quality management solution in place. This will make
the strategy more effective by enabling data governance professionals to profile, transform and
standardize information.
To best support data quality goals, the quality management solution should be Web-enabled and
must be intuitive to use so operational business users can play a vital role in data governance
activities. When data strategy and governance are led from a business perspective and enabled by
a complete solution, true data integrity can be ensured across the organization.

II. Quality management tools

1. Check sheet

The check sheet is a form (document) used to collect data
in real time at the location where the data is generated.
The data it captures can be quantitative or qualitative.
When the information is quantitative, the check sheet is
sometimes called a tally sheet.
The defining characteristic of a check sheet is that data
are recorded by making marks ("checks") on it. A typical
check sheet is divided into regions, and marks made in
different regions have different significance. Data are
read by observing the location and number of marks on
the sheet.
Check sheets typically employ a heading that answers the
Five Ws:

Who filled out the check sheet
What was collected (what each check represents,
an identifying batch or lot number)
Where the collection took place (facility, room)
When the collection took place (hour, shift, day of
the week)
Why the data were collected
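A check sheet can be mimicked in software as a heading plus a tally per region. The sketch below is only an illustration; the class name, its fields and the sample values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CheckSheet:
    # Heading that answers the Five Ws.
    who: str
    what: str
    where: str
    when: str
    why: str
    # One tally per region of the sheet.
    marks: Counter = field(default_factory=Counter)

    def check(self, region):
        """Record one mark in the given region."""
        self.marks[region] += 1

    def render(self):
        """Return the tallies as text, one region per line."""
        return "\n".join(
            f"{region:<14}{'|' * n} ({n})"
            for region, n in sorted(self.marks.items())
        )


# Hypothetical heading and observations, for illustration only.
sheet = CheckSheet(
    who="Inspector A",
    what="Surface defects, lot 17",
    where="Plant 2, line 3",
    when="Day shift, Monday",
    why="Weekly defect review",
)
for defect in ["scratch", "dent", "scratch", "scratch", "dent"]:
    sheet.check(defect)
print(sheet.render())
```

The data are read the same way as on paper: by the location (region) and the number of marks.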

2. Control chart
Control charts, also known as Shewhart charts
(after Walter A. Shewhart) or process-behavior
charts, in statistical process control are tools used
to determine if a manufacturing or business
process is in a state of statistical control.
If analysis of the control chart indicates that the
process is currently under control (i.e., is stable,
with variation only coming from sources common
to the process), then no corrections or changes to
process control parameters are needed or desired.

In addition, data from the process can be used to
predict the future performance of the process. If
the chart indicates that the monitored process is
not in control, analysis of the chart can help
determine the sources of variation, as this will
result in degraded process performance.[1] A
process that is stable but operating outside of
desired (specification) limits (e.g., scrap rates
may be in statistical control but above desired
limits) needs to be improved through a deliberate
effort to understand the causes of current
performance and fundamentally improve the
process.
The control chart is one of the seven basic tools of
quality control.[3] Typically, control charts are
used for time-series data. They can also be used
for data that have logical comparability (i.e., you
want to compare samples that were taken all at
the same time, or the performance of different
individuals); however, the type of chart used to do
this requires consideration.
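As a minimal sketch of how control limits might be computed, the example below uses one common variant, the individuals (XmR) chart, where limits are set at the mean plus or minus 2.66 times the average moving range. The text above does not prescribe this variant, and the function names and measurements are hypothetical. Only points beyond the limits are flagged; additional run rules are omitted.

```python
from statistics import mean


def individuals_chart_limits(values):
    """Return (lower limit, center line, upper limit) for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 2.66 is 3 / d2, with d2 = 1.128 for moving ranges of two points.
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def out_of_control(values, limits):
    """Return the indexes of points that fall outside the limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical baseline measurements from a stable period.
baseline = [10, 12, 11, 13, 12, 11, 10, 12]
limits = individuals_chart_limits(baseline)

# Hypothetical new measurements checked against the baseline limits.
new_points = [11, 16, 12]
print(limits, out_of_control(new_points, limits))
```

Computing the limits from a baseline period and then checking new points against them keeps a later shift in the process from widening its own limits.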

3. Pareto chart

A Pareto chart, named after Vilfredo Pareto, is a type
of chart that contains both bars and a line graph, where
individual values are represented in descending order
by bars, and the cumulative total is represented by the
line.
The left vertical axis is the frequency of occurrence,
but it can alternatively represent cost or another
important unit of measure. The right vertical axis is
the cumulative percentage of the total number of
occurrences, total cost, or total of the particular unit of
measure. Because the reasons are in decreasing order,
the cumulative function is a concave function. For
example, on a chart of the reasons for late arrivals
where the first three reasons account for 78% of the
total, solving those three issues is enough to lower
the number of late arrivals by 78%.
The purpose of the Pareto chart is to highlight the
most important among a (typically large) set of
factors. In quality control, it often represents the most
common sources of defects, the highest occurring type
of defect, or the most frequent reasons for customer
complaints, and so on. Wilkinson (2006) devised an
algorithm for producing statistically based acceptance
limits (similar to confidence intervals) for each bar in
the Pareto chart.
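The numbers behind a Pareto chart (the sorted bars and the cumulative line) can be computed as in the sketch below. The function name and the defect counts are hypothetical and serve only to illustrate the calculation.

```python
def pareto_table(counts):
    """Return (cause, count, cumulative percent) rows, largest count first."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    rows, running = [], 0
    # Sort causes by count, in descending order (the bars).
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        # Cumulative percentage of the total (the line).
        rows.append((cause, n, round(100 * running / total, 1)))
    return rows


# Hypothetical defect counts, for illustration only.
defects = {
    "dent": 30,
    "scratch": 45,
    "other": 3,
    "misalignment": 15,
    "discoloration": 7,
}
table = pareto_table(defects)
for cause, n, cumulative in table:
    print(f"{cause:<15}{n:>4}{cumulative:>8}%")
```

Reading down the cumulative column shows how few causes need to be addressed to cover most of the total.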

4. Scatter plot method

A scatter plot, scatterplot, or scattergraph is a type of
mathematical diagram using Cartesian coordinates to
display values for two variables for a set of data.
The data is displayed as a collection of points, each
having the value of one variable determining the position
on the horizontal axis and the value of the other variable
determining the position on the vertical axis.[2] This kind
of plot is also called a scatter chart, scattergram, scatter
diagram,[3] or scatter graph.
A scatter plot is used when a variable exists that is under
the control of the experimenter. If a parameter exists that
is systematically incremented and/or decremented by the
other, it is called the control parameter or independent
variable and is customarily plotted along the horizontal
axis. The measured or dependent variable is customarily
plotted along the vertical axis. If no dependent variable
exists, either type of variable can be plotted on either axis
and a scatter plot will illustrate only the degree of
correlation (not causation) between two variables.
A scatter plot can suggest various kinds of correlations
between variables with a certain confidence interval. For
example, for weight and height, weight would be on the x axis
and height would be on the y axis. Correlations may be
positive (rising), negative (falling), or null (uncorrelated).
If the pattern of dots slopes from lower left to upper right,
it suggests a positive correlation between the variables
being studied. If the pattern of dots slopes from upper left
to lower right, it suggests a negative correlation. A line of
best fit (alternatively called 'trendline') can be drawn in
order to study the correlation between the variables. An
equation for the correlation between the variables can be
determined by established best-fit procedures. For a linear
correlation, the best-fit procedure is known as linear
regression and is guaranteed to generate a correct solution
in a finite time. No universal best-fit procedure is
guaranteed to generate a correct solution for arbitrary
relationships. A scatter plot is also very useful when we
wish to see how two comparable data sets agree with each
other. In this case, an identity line, i.e., a y=x line, or a
1:1 line, is often drawn as a reference. The more the two
data sets agree, the more the scatters tend to concentrate in
the vicinity of the identity line; if the two data sets are
numerically identical, the scatters fall on the identity line.
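The line of best fit and the degree of linear correlation mentioned above can be computed directly. The sketch below assumes ordinary least squares and the Pearson correlation coefficient; the function name and the weight and height values are hypothetical.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Return (slope, intercept, r) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences of at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Pearson correlation: positive (rising), negative (falling), near 0 (none).
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical measurements: weight (kg) on the x axis, height (cm) on the y axis.
weights = [55, 60, 68, 72, 80, 90]
heights = [160, 165, 170, 174, 180, 185]
slope, intercept, r = linear_fit(weights, heights)
print(f"height = {slope:.2f} * weight + {intercept:.1f}, r = {r:.3f}")
```

A value of r close to 1 or -1 indicates a strong linear relationship, but, as noted above, it does not establish causation.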

5. Ishikawa diagram
Ishikawa diagrams (also called fishbone diagrams,
herringbone diagrams, cause-and-effect diagrams, or
Fishikawa) are causal diagrams created by Kaoru
Ishikawa (1968) that show the causes of a specific event.
[1][2] Common uses of the Ishikawa diagram are product
design and quality defect prevention, to identify potential
factors causing an overall effect. Each cause or reason for
imperfection is a source of variation. Causes are usually
grouped into major categories to identify these sources of
variation. The categories typically include:
People: Anyone involved with the process
Methods: How the process is performed and the
specific requirements for doing it, such as policies,
procedures, rules, regulations and laws
Machines: Any equipment, computers, tools, etc.
required to accomplish the job
Materials: Raw materials, parts, pens, paper, etc.
used to produce the final product
Measurements: Data generated from the process
that are used to evaluate its quality
Environment: The conditions, such as location,
time, temperature, and culture in which the process
operates

6. Histogram method

A histogram is a graphical representation of the
distribution of data. It is an estimate of the probability
distribution of a continuous variable (quantitative
variable) and was first introduced by Karl Pearson.[1] To
construct a histogram, the first step is to "bin" the range of
values -- that is, divide the entire range of values into a
series of small intervals -- and then count how many
values fall into each interval. A rectangle is drawn with
height proportional to the count and width equal to the bin
size, so that rectangles abut each other. A histogram may
also be normalized to display relative frequencies. It then
shows the proportion of cases that fall into each of several
categories, with the sum of the heights equaling 1. The
bins are usually specified as consecutive, non-overlapping
intervals of a variable. The bins (intervals) must be
adjacent, and are usually of equal size.[2] The rectangles of a
histogram are drawn so that they touch each other to
indicate that the original variable is continuous.[3]
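The binning and normalization steps described above can be sketched as follows. This assumes equal-width bins spanning the observed range, with the maximum value placed in the last bin; the function names and sample values are hypothetical.

```python
def histogram(values, bin_count):
    """Return (bin edges, counts) using equal-width, adjacent bins."""
    if bin_count < 1:
        raise ValueError("bin_count must be at least 1")
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("values must span a non-zero range")
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for v in values:
        # The maximum value belongs to the last bin rather than a new one.
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts


def normalize(counts):
    """Convert counts to relative frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical measurements, for illustration only.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
edges, counts = histogram(data, 4)
print(edges, counts, normalize(counts))
```

The choice of bin count changes the apparent shape of the distribution, so it is worth trying more than one value.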
