Data Integration:
Total Cost of Ownership
Really Matters
Dr David Waddington
Sponsored by expressor software corporation
www.expressor-software.com
Tyson Consulting
July, 2008
www.tyson-consulting.com
Data Integration: TCO Really Matters 2
Copyright © 2008 Tyson Consulting. All Rights Reserved.
Data Integration (DI) is expensive and hard – that’s how business views it today. But does it need to
be? This paper takes a look back and examines the promise of the current data integration tools and
assesses how well that promise has been fulfilled. Against this background the gaps, opportunities
and challenges are identified. The focus then shifts to the key requirements for the future and to the new data integration tools now becoming available, illustrating how these are better able to deliver the lower total cost of ownership (TCO) that today's more agile, faster-moving businesses demand. The paper concludes by offering ten tips for selecting the next DI tool.
BACKGROUND
Today businesses are struggling to become more agile in a rapidly changing marketplace, one that is more competitive, faster moving and with fewer geographical and time-zone boundaries than a decade ago.
The demand for consistent, timely and integrated business performance information is increasing
and such information is vital to enable businesses to understand and support this rapidly changing
environment. Business managers are frustrated1 by the slow progress in this area and pressure is
increasing from senior business managers (41% reported in a recent study2) for better information
management.
To deliver this essential business performance information it is increasingly necessary to integrate
data from a wide variety of disparate sources. Most modern businesses have a complex
heterogeneous applications environment, since companies have over time built and purchased multiple applications (e.g., ERP, CRM, data warehousing, distribution, etc.). Recent studies3 indicate that companies run anywhere from six to more than 100 applications, depending on the size and type of business.
Data volumes are growing exponentially as new business ventures and increased regulation mean
more data has to be collected, stored, transformed and processed. Batch windows for transforming,
loading and reporting are decreasing. For example, several companies polled recently indicated that
they currently couldn’t extract, transform and load all their sales data from the previous week into
their sales reporting system in time for this to be available to sales staff by 8 a.m. Monday morning,
in spite of using the latest DI tools and top‐tier hardware! Further, increasing use of EPOS, RFID and
global data synchronization with partners means that near real‐time data integration is coming of
age.
However, business leaders perceive data integration as costly (both in terms of software and labor
costs) and difficult. Despite 15 years of DI software most companies (>75%) still do it using in‐house,
hand‐coded routines rather than using a tool (most often these in‐house systems have no built‐in error reporting or handling). They argue that tools typically address only part of the problem and are too expensive. Some companies even use more than one tool, suggesting that no single tool has met all the key requirements.
1 BI: The Inconvenient Truth, D. Waddington, January 2008
2 Kenny MacIver in Information Age, A Strategic Resource, May 2008
3 The Information Difference, MDM Adoption Survey Report, June 2008
In the important area of data migration (Bloor4 projects the market to reach $8 billion by 2012), and despite more than a decade of DI tools, 80% of all data migration projects (e.g., from legacy applications to new ERP implementations or upgrades) still fail today, fall short of expectations or suffer serious cost overruns.
The need to integrate data is clear. A recent study2 revealed that when business leaders were asked
“what technologies play a key part in your organization’s information/data management strategy”
data integration was ranked second in importance only to BI. There is, however, a gradual and
growing realization that this in‐house approach is slow and cannot meet the growing need to change
fast. Organizations are gradually coming to terms with the fact that they will have to look to tools to
meet their future data integration needs. In part this is fueled by the need to embrace new
application areas such as master data management and real‐time data integration. These in turn are
imposing new demands on DI software.
But business leaders are cautious and if they are to be convinced to embrace DI tools they will need
to see lower total cost of ownership and greater ease of use.
According to Gartner5 the DI market in 2007 was $1.4 billion and they predict that by 2011 this will
reach $2.6 billion. These are staggering figures implying dramatic growth potential for the vendors.
So what has been the promise of DI tools and to what extent has this been fulfilled?
THE PROMISE OF DATA INTEGRATION
DI is the collective term for generally separate technologies that support BI applications and data
warehousing or operational systems. The area focused on BI and data warehousing is termed ETL
(extract, transform and load) while that for operational systems is EAI (enterprise application
integration). Historically ETL has been focused on bulk transfer of data while EAI has tended to
address near real‐time requirements for data transfer. This distinction is now less sharp and the two
flavors are coming together to meet the demands of modern applications. Although ETL has its roots
in the application data transfer and data warehousing areas, it has rapidly spread to application
interlinking and migration of data. The introduction of these tools over the past 15 years has brought
the promised benefits of:
• Development and Implementation Faster and More Efficient. A key benefit claimed by DI vendors is that with their tools, design, development and implementation are faster and more efficient; in particular, much is made of the claim that this is far quicker than in-house coding. Companies moving to tool-based approaches have often reported resource and time savings of between 30% and 70%.5
• Re‐use of code. The ability to code a transformation or transfer link once and re‐use this leads to
increased efficiency with increased readability and transparency. Other developers can easily
understand the code and re‐use it.
• Increased standardization. The use of a single common tool with standard coding will increase
the use of common data standards across the enterprise.
4 Bloor Research, Data Migration, Philip Howard, June 2008
5 Gartner, The Benefits (and Challenges) of Deploying Data Integration Tools, Ted Friedman, August 2007
• Improving data quality. The overall quality of data will be improved as a consequence of the
introduction of standards and the identification and elimination of errors in data. The use of
data profiling will improve the understanding of data in an automated fashion.
• Reduce need for manual mapping. The use of a tool will reduce the need to map data entities
manually from one application system to another. Once mappings have been defined they can
be re‐used.
• Provides auditability and transparency. The use of a DI tool will provide automated auditability, track-and-trace and source identification functionality that will help businesses cope with increasing levels of regulation.
• Control of the management of changes. The DI tools will provide functionality to manage and track changes, making it easier to adapt quickly to changing business requirements.
• Ease of data migration. The DI tools will make the complex and time consuming process of
migrating data from one application (e.g., a legacy app) to another (e.g., a new ERP system)
much easier. In upgrading many applications, upload of historical and current data often
involves complex transformations, and it is usually not a one‐off process.
• Flexibility to interlink packaged applications. DI tools in the EAI space support interlinking of two or more proprietary application packages (such as two or more, often customized, SAP R/3 systems), enabling them to access each other’s data and exchange data in near real time.
• Linking to Application Packages. DI with a wide range of third‐party proprietary vendor
application packages (CRM, ERP, etc.) is easier since vendor‐built, standard connectors provide
much simplified access to the data stored within the packages.
• Ease of use. Fully integrated application stacks and common user interfaces will make DI tools
easy to use. The introduction of graphical design interfaces offers a step forward in making the
process easier and more productive.
HAS THE PROMISE BEEN REALIZED?
So have these promises been realized in practice? The answer must be no. The two most frequent
criticisms leveled at DI tools and vendors are that DI is expensive and difficult and current tools have
by no means fully addressed these issues.
Data Integration is Expensive
Surveys6 indicate that DI does involve high investment and maintenance costs. These are typically
upwards of $300,000 to $500,000 license cost and 17‐25% annual maintenance usually based upon a
per‐CPU per machine pricing policy. Business models and established practices make it difficult for
existing vendors to change course. Since dedicated hardware is usually necessary, costs for
additional hardware tend to be high, too. It is often necessary to purchase a separate data quality/cleansing tool as well, although nowadays DI vendors are increasingly incorporating these into their
offerings (at a cost). Furthermore, when you start data integration projects it’s very difficult, if not
impossible, to scope them out and estimate costs and timescales. This often results in significant
time and cost overruns.4 For example, in a study7 by Standish Group it was revealed that 80% of data
migration projects ran over estimated time and budgets.
The current DI tools are still ones designed to be used by “developers.” To ensure success, an in‐
house expertise center to use tools and build integration scenarios is desirable. Additionally, they
often need to have extensive knowledge of underlying application systems, such as ERP (e.g., SAP
6 Gartner, FAQ: Data Integration Tools Market Prices and Licensing Trends, February 2008
7 Standish Group Survey Report, 1999
ERP) and CRM. These expert DI resources are frequently scarce and very expensive, and often have to be brought in on a time-and-materials basis.
Re‐use of integration scenarios and business rules also frequently turns out to be impractical. It sounds impressive in the advertising blurb, but in practice it is often easier to rebuild an integration routine than to re‐use an existing one, and there is nothing in the DI tool to prevent developers from doing just that. As a result, this source of potential savings is usually not realized.
Much of this DI cost tends to be a major investment up front. CIOs frequently comment that they
have to be prepared to make an initial high outlay for DI infrastructure in order to tackle the first
project. This usually results in the internal IT department deciding to build in‐house using stored
procedures and the like.
Many companies have already built DI applications with existing in‐house developers. Most tools do
not support conversion of these and generally this means the organization has to start again –
resulting in yet further costs.
Data Integration is Hard
There is no doubt that DI is “hard.” Design and development often consume 60% to 80% of a BI
project.8 Usually project and data‐mapping specifications are created up front in Microsoft Project,
Word and Excel. The developer then translates this into the DI tool, but there is no closed‐loop
process because the information is not stored in a common repository. Generally the process of
designing, building and implementing new integration links and transformations is too “slow” to be
able to meet the needs of the current fast changing business environment.
In particular data mapping is usually a very labor‐intensive, manual process, and despite the current
sophistication of many tools they fail to provide effective help with the mapping of data entities and
definition of objects. This remains a largely manual process.
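One way a tool can reduce this manual burden is to make the mapping specification itself declarative, so that an analyst can review it and the engine can execute it unchanged. A minimal sketch, with all field names invented:

```python
# Declarative source-to-target mapping: the spec is data, not hand-written code,
# so it can be reviewed by an analyst and executed by the engine as-is.
# Field names here are purely illustrative.

MAPPING = {
    "customer_name": ("cust_nm", str.strip),  # target: (source field, cleanup fn)
    "account_balance": ("fin_bal", float),
}

def apply_mapping(source_record, mapping=MAPPING):
    """Build a target record by applying each (source field, function) pair."""
    return {target: fn(source_record[src]) for target, (src, fn) in mapping.items()}

record = {"cust_nm": "  Acme Corp ", "fin_bal": "1234.50"}
print(apply_mapping(record))  # {'customer_name': 'Acme Corp', 'account_balance': 1234.5}
```

Because the mapping is plain data, it could also be generated from (or exported back to) the Word/Excel specifications mentioned above, closing part of the loop.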
Although many of the tools for DI are now based around the use of graphical interfaces they are still
focused on use by “developers” rather than “business users.” All too often these UIs look different
and behave very differently from one module to another, compounding the complexity of learning
and using the tool.
Additionally, the representations of transformations and “business rules” in most DI tools are very
hard to read, being mostly represented in complex mathematical notation rather than a format
meaningful to a business user such as a data steward. This means that these people cannot use the
tools alone and are totally reliant on the developer.
Most tools try to get the developer to do everything, whereas they should focus on getting the appropriate user – project manager, data steward, business analyst or developer – to do the appropriate tasks, rather than trying to fit a square peg into a round hole. Leveraging each user group’s skill set is key to reducing overall costs and ensuring that business needs are met.
A key issue for organizations is that because highly specialized experts are required to develop the
transformations and extracts from source systems these “specialists” become the repository of all
knowledge and “documentation” of systems. If they leave or move on to new roles it is difficult for
others to understand what the constructs (business rules, etc.) actually do and how they work.
8 TDWI, Evaluating ETL and Data Integration Platforms, W. Eckerson & C. White, 2003
Because of this dependence on specialist expertise, the risks associated with using these DI tools are still high, even if reduced compared with in‐house, hand‐coded developments.
Although most tool vendors would claim to have developed implementation methodologies, these
are loose in practical implementation – and often not used.
Lack of Support for Governance
Too often projects fail through lack of governance. While this is not directly attributable to the data
integration tools per se, the fact that many of these tools offer little or no auditing, tracking or
workflow functionality means that they provide no data to support the governance initiative. They
also generally offer no incentives and support encouraging teams to comply with standards, to
collaborate and to leverage existing work.
THE CHALLENGES
What are the gaps, opportunities and challenges remaining?
Aging Architectures
The majority of DI vendors first architected their tools more than a decade ago; all the existing packages were built with technology focused on solving the ETL problems of ten years ago.
Vendors have generally had to resort to embracing developments in storage technology, processing
power and the like to meet the demands of today’s complex data integration requirements. To keep
pace with market demands, products have frequently been extended over time by “bolting on”
newly acquired technology into a single platform stack to deliver new functionality, but the additions
are not really integrated “under the covers.” This manifests itself in differently structured UIs and
inconsistent ways of working – or sometimes the need to reenter definitions and data.
Outdated Pricing
The current pricing models are outdated and just plain wrong for the future of data integration, if it is ever to become a ubiquitous “infrastructure.” The licensing models are, furthermore, machine‐bound rather than usage- or throughput-bound, making it difficult to plan hardware and software investment clearly. Many vendors “nickel and dime” customers for each and every new connector and tool,
adding to the already difficult task of understanding the true cost / benefit of ownership. There
should be no barrier, as is currently often the case, to starting with a smaller project (one which may
offer low hanging fruit), and then growing.
Similarly the business should be able to invest in a tool with the confidence that it will be able to use
it for a wide range of purposes (providing data to the data warehouses, migrating data to new
applications, linking two or more applications, linking to SOA backbones, etc.) both at the present
time and for the future.
Old Data Integration Paradigm
The current DI integration paradigm, which is to map physical data items (i.e., data type mappings
between source and target fields), hasn’t changed for the past 15 years. It is fundamentally weak
because it isolates developers from business users.
Figure 1 ‐ Disconnect between Specification and Execution
When using most tools the connection between the specification and the execution is lost. This is
illustrated in the Figure 1 above. The business analyst can’t maintain and test source / target linkage
rules. Lack of visibility results in developers introducing errors that go undetected by the business
analyst and the rest of the team. The entire design is closely associated with the physical
environment.
Design taking too long
Design consumes much of the time in delivering a DI project, and this needs to be reduced by:
• Abstracting the design effort to a higher level where commonly understood “standard” business
terms could be used to describe transformations and business rules. This will then enable closer
direct involvement from the business, ensuring the design accurately reflects the business
requirement or rule.
• Making transformations and business rules more clearly “readable” by business and technical analysts, so it is immediately obvious what a particular transformation does and why.
• Improving the overall level of business “friendliness.” Businesses work with transactions and
reporting – they should be able to design data integration using these concepts rather than
values such as “cust_acc” or “fin_bal.”
• Ensuring that adequate data profiling is undertaken early. Lack of knowledge of the source data cripples many projects.
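The early data profiling called for above can start very simply: per-column counts of nulls, distinct values and sample values flag problems before any mapping work begins. A minimal sketch, with the column data invented:

```python
# Minimal column profiling: null counts, distinct counts and sample values
# give early warning about source data quality before mapping begins.

def profile(rows, column):
    """Return simple quality statistics for one column of a record set."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "sample": sorted(set(non_null))[:3],
    }

rows = [{"country": "DE"}, {"country": ""}, {"country": "de"}, {"country": "DE"}]
print(profile(rows, "country"))
# {'rows': 4, 'nulls': 1, 'distinct': 2, 'sample': ['DE', 'de']}
```

Even this crude profile immediately surfaces the mixed-case country codes and the empty value, exactly the kind of surprise that derails mapping work later.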
Many aspects of data integration, such as the type of data (EBCDIC, short integer, packed decimal,
string), are not relevant for the business user and should be hidden so that the focus can be on the
real‐world business issues. While this clearly cannot be eliminated, adoption of role‐based design
can ensure that such tasks are done once by the technical architect who understands them, and
then subsequent users/developers are freed from understanding these complexities.
Software as a Service
DI tools need to more effectively meet the challenge of software as a service (SaaS), where
organizations want to outsource some data processing but need to be able to retain security of the
data. In making data available for SaaS applications, DI will be needed to transform it to conform to a
“standard” model used by the service provider. DI tools need to take this on board.
Distributed Computing
The availability of “grid” and distributed computing – either distributed across the enterprise or even
worldwide – is rapidly becoming a reality. DI tools are currently not designed to operate in this
highly distributed environment, but new applications (SOA, MDM, SaaS, RFID, EPOS data, etc.) are
increasing the need for DI tools to be able to accommodate this new requirement.
Data Silo Support
Despite the notion that all data should be stored once, many large organizations must, for very
practical purposes, store local or regional data sets (or warehouses). This is not necessarily a bad
thing and is in any event a fact of life in most big companies, where local groups “own and jealously
protect” their data. Most current data integration tools have difficulty supporting this concept, but it
will need to be accommodated. Similarly, although there is a growing need across an enterprise to
have a common set of master data and to manage it, local business units will want to see data the
way with which they are familiar, and data integration in the future will have to accommodate this.
Implementation Methodologies
Most current tools claim (usually in “slideware”) to have implementation methodologies, but many of these are difficult or impossible to enforce, leading to sloppy implementation (e.g., developers who tweak transformations in the production system). This becomes an even more serious issue when organizations want to roll out best-practice DI across their enterprise in a
consistent manner. A new approach is required here with managed implementation and reporting at
each stage to ensure the methodology is followed.
FUTURE REQUIREMENTS OF DATA INTEGRATION
Having identified some of the challenges facing DI, let’s take a look at what the next generation of
data integration tools needs to deliver.
Abstraction and Semantic Rationalization
Although DI tools are currently very useful, in general they do not provide the rules for
transformation and movement of data. Most of this work has to be done manually because the rules
and semantics are rarely found automatically in the data. To meet the growing business
requirements for agility, usability and maintainability, new tools will have to be much more
metadata‐driven and harness metadata discovery and rationalization technologies. This, in turn,
implies organizations putting greater emphasis on extending master and metadata definition and
maintenance. The new tools will have to deliver built‐in ability to automatically recognize and
identify common data items. For example, Net Proceeds of Sales in one system may be represented
as NPS, in another as Net_PS, in yet another as net_sales.
One company I worked with had as many as 13 different variations on this item – and they were
supposed to be all the same! And there are many thousands of these types of metadata objects in
company business systems. Much of this semantic mapping and rationalization must be automated
since there will be insufficient time and resources to do it by hand. Such systems need to highlight to
business experts the exceptions or areas that cannot be automatically rationalized such that scarce
resources deal only with exceptions. After all, why pay contractors to do your data mappings – your
own business staff understands this area best of all. The new tools must help them to do it. In this
way, the new tools need to take as much manual work as possible out of DI. This will deliver benefits
in terms of supporting governance and conformance to agreed standards and productivity.
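A simple illustration of the rationalization step described above: candidate field names are normalized and matched against a canonical dictionary, with anything unmatched escalated to a business expert. The synonym table is invented for the example; a real tool would curate or learn it rather than hard-code it.

```python
# Semantic rationalization sketch: map variant physical names onto one
# canonical business term, flagging anything that cannot be matched.
# The synonym table would in practice be curated/learned, not hard-coded.

CANONICAL = {
    "net proceeds of sales": {"nps", "net_ps", "net_sales", "netproceeds"},
}

def rationalize(field_name):
    """Return the canonical business term for a field name, or None."""
    key = field_name.lower().replace("-", "_").strip()
    for term, variants in CANONICAL.items():
        if key in variants:
            return term
    return None  # exception: route to a business expert for review

matched = [f for f in ["NPS", "Net_PS", "net_sales", "rev_cd"] if rationalize(f)]
print(matched)                # ['NPS', 'Net_PS', 'net_sales']
print(rationalize("rev_cd"))  # None -> needs human review
```

The point is the division of labor: the automated pass handles the bulk of the thousands of metadata objects, and scarce business experts see only the exceptions.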
Current DI tools focus very much on the physical implementation dealing with mapping database
field names and the like. This approach gives the “developer” a pivotal role but alienates the
business user, such as the data steward. It also makes using such integrated data for BI difficult
because data descriptions are aligned to physical names. The notion of abstraction is the separation
of the actual meaning of objects from their physical implementation. We can talk about a ”product
name” much more easily than the underlying physical representations of this in databases such as
“prod_nm,” “pname,” “pro.name,” etc. For master data purposes, for example, we can maintain a
consistent definition of “product_name,” and this remains constant in spite of underlying changes in
the source systems. This is illustrated9 in Figure 2 below. The effect of this abstraction is that it is
much easier to involve business analysts in the design phase ensuring much greater transparency
and closing the disconnect referred to above (Figure 1). This then leads to reductions in design and
development times, especially in the area of facilitating construction and testing of business rules
and transformations, and so lowers TCO.
Figure 2 ‐ Abstraction and Semantic Rationalization
9 See also Neil Raden, Data Integration: What’s Next?, Smart (enough) Systems, May 2008
A further benefit of DI tools embracing abstraction and semantic rationalization is that the
abstraction approach will make it much more straightforward to re‐use rules and transformations
already created. This will also help lower TCO.
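The kind of re-use that abstraction enables can be pictured as one logical rule bound to several physical schemas. A minimal sketch, with all system and column names invented:

```python
# Abstraction sketch: one logical rule for the abstract term "product name",
# defined once, then bound to different physical column names per system.

def product_name_rule(value):
    """Logical business rule, written once against the abstract term."""
    return value.strip().upper()

# Per-system bindings from the abstract term to physical columns (illustrative).
BINDINGS = {"erp": "prod_nm", "crm": "pname", "legacy": "pro_name"}

def apply_rule(system, record):
    """Resolve the physical column for a system, then apply the logical rule."""
    column = BINDINGS[system]
    return product_name_rule(record[column])

print(apply_rule("erp", {"prod_nm": " widget "}))  # 'WIDGET'
print(apply_rule("crm", {"pname": "gadget"}))      # 'GADGET'
```

If a source system renames its column, only the binding table changes; the rule itself, and anything built on it, is untouched.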
Scalability and Speed is of the Essence
Coupled with the high cost of current DI tools, system resource requirements cannot be estimated with confidence. Indeed, scalability is frequently “hit and miss”, and the trend is to throw as much
computing power as possible at the problem. Applications are often run on dedicated hardware
since many current tools are designed in such a way that in order to achieve maximum throughput
all system resources are committed to the processing. This is both costly and wasteful of system
resources, and tools need to be re‐architected to take advantage of the growing popularity of grid
and distributed computing. Only by adopting this route will DI tools be capable of meeting the
demands for speed of throughput and scalability that new applications such as RFID, MDM, SOA and
EPOS demand. Real‐time data collection and analysis (such as analyzing EPOS or RFID data) is
growing fast and this data needs to be transformed, migrated and analyzed quickly if it is to offer
business value. New DI tools need to deliver a single processing engine for batch and perpetual
operations. Reducing the need to invest in dedicated hardware (usually expensive high end
machines) and enabling the use of distributed resources will lead to a lower TCO.
As the business world is moving faster, so too is the demand for timely, accurate and frequently
changing key performance information. A frequent issue is that with BI projects the ETL window is
longer than the available time for delivering the reports. Such things as monthly reports of sales
from retailers are being replaced by RFID inventory tracking in near real time. New applications for
streaming data are now available and the current DI tools can barely hold the fort.
Coping with Growing Volumes of Data
Companies are increasingly doing more business over the Internet and this has brought many new
data sources, data types and data classifications to be transformed and processed. Also business is
no longer a one‐to‐one but a one‐to‐many operation resulting in an explosion of data to be
exchanged. Added to this, increased regulation and standards mean that businesses have to maintain and process much more information to enable auditability and tracking. Businesses
nowadays need to store, transform, migrate and analyze increasing volumes of data (faster) to
comply with new regulations. Real time data is becoming much more important (RFID, EPOS,
product catalogs, internet shops…). This rapidly growing volume of data9, the advent of new
applications and the speed at which BI needs to deliver performance information to organizations
means that a new phase of innovation is desperately needed in DI tools.
One specific area where data volumes are getting larger is data migration projects (e.g., moving from a legacy PeopleSoft implementation to SAP ERP). This is a significant and growing market for DI
($5 billion in 2007) yet many of these projects fail to meet expectations with significant cost
overruns. New tools must facilitate the design and development process to reduce delivery time and
lower costs.
New Pricing Models
The majority of current pricing models for DI tools are based upon CPU/machine licensing. Most
models combine a core server component (the run‐time engine used by the tool) plus add‐ons such
as additional modules, developer seats, and connectors to other (packaged business) applications.
Usually, the more CPUs, the higher the price. A lack of clarity of definition of such concepts by
vendors often leads to much confusion over licensing terms. Moreover, this CPU‐based approach to
pricing in no way lends itself to coping with distributed computing resources: In that sense, this
model is outdated.
Making the required processing power available as and when it is needed for today’s high‐volume ETL processes demands a model based upon charging for throughput rather than for the machine it runs on. If I only have a small amount of data which I load once a month into my data
warehouse, I would expect to pay much less than if I’m doing near real‐time data transformation
with high volumes of RFID sales data drawn from all my retail outlets. This form of model is also
much more suited to beginning with a small startup project without burdening it with all the initial DI infrastructure costs – something that current models make really hard. New pricing models will also
help lower the cost of ownership.
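The difference between the two pricing models can be made concrete with a toy cost calculation; every rate and volume below is invented purely for illustration.

```python
# Toy comparison of CPU-based vs throughput-based licensing (invented rates).

def cpu_based_cost(cpus, per_cpu_license=75_000, maintenance_rate=0.20):
    """Pay per CPU up front, plus annual maintenance, regardless of usage."""
    license_fee = cpus * per_cpu_license
    return license_fee + license_fee * maintenance_rate

def throughput_cost(gb_processed_per_year, per_gb_rate=50):
    """Pay only for the data actually moved through the engine."""
    return gb_processed_per_year * per_gb_rate

# A small monthly batch load vs a heavy near-real-time RFID feed:
print(cpu_based_cost(8))          # 720000.0 -- same price for both workloads
print(throughput_cost(12 * 10))   # 6000     -- 10 GB per month, batch
print(throughput_cost(365 * 500)) # 9125000  -- 500 GB per day, streaming
```

Under the CPU model the small monthly loader and the heavy streaming feed pay the same; under the throughput model the price tracks the value actually extracted, which is what lets small projects start cheaply.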
Open‐source providers10 have recently entered the DI space (mostly focused on simple ETL); however, this area is still at a very early stage of development. Gartner10 cautions that it will be at least 2011 before open‐source tools reach parity in functionality and performance with current tools, so for most organizations this is not yet a realistic alternative.
Security and Off‐shoring
In many areas of IT nowadays, high investment projects are seeking to reduce costs by using third‐
party offshore resources. This is increasingly the case with DI projects. The problem is that with
current DI tools this generally means either giving the third‐party developers access to your sensitive data or spending considerable resources on creating sets of “sanitized” test data. This is equally
applicable when making use of SaaS suppliers. You need to transform data yourself to a suitable
format for them to use, or ask the supplier to do it. Either way you are exposed to risk of your
sensitive data (for example financial records) getting into the wrong hands. The reason behind this is
that most current tools, because they do design at the physical level, need to operate with the actual
data. There is no way to test the business rules and transformations other than on the real data.
New DI tools, with their design based around abstraction and semantic rationalization can help.
Much of the testing can be undertaken at the abstraction level without the need to process large
batches of sensitive data just to verify that a given rule works correctly.
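For example, a business rule expressed at the abstract level can be verified against a handful of synthetic records, so that neither offshore developers nor SaaS suppliers ever need to see real financial data. A minimal sketch, with the rule and the records invented:

```python
# Testing a transformation at the abstract level with synthetic data only:
# no production records leave the organization.

def high_value_flag(balance, threshold=10_000):
    """Abstract business rule: flag accounts above a threshold."""
    return balance > threshold

# Synthetic records exercise the rule's edge cases; no real data is needed.
synthetic = [(9_999, False), (10_000, False), (10_001, True)]

for balance, expected in synthetic:
    assert high_value_flag(balance) is expected
print("rule verified on synthetic data")
```

Because the rule is defined against the abstract term rather than a physical column, the same test remains valid however the underlying source systems represent the balance.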
Full Lifecycle Management
Current tools lack the understanding and management of the full lifecycle of a DI project, from initial
ideas phase to production environment and maintenance. This process, when managed effectively,
ensures the continued success of data integration projects. However, all too often a group of people
is brought together (often virtually) for the duration of the project, after which they disband taking
10 Gartner, Open Source in Data Integration Tools 2008, March 2008
with them all the experience and understanding. When changes have to be made or a new team
comes along, the only option may be to start over.
Full lifecycle management (see Figure 3 below) requires that the DI tool provide a managed, active
metadata repository that can hold all the data related to managing and monitoring the project.
Equally essential is the ability to allocate work on the basis of roles, so that project tasks are matched
to the skill levels and costs of the available resources. The new tools must encompass design,
development, optimization and deployment at run time, and use intelligent automation of these
steps wherever possible. They must also be able to track who took a particular action, who is
responsible, whether it has been completed, and any delays in the path to delivery. In this way a
single (possibly federated) metadata repository covers every aspect of the process, from project
management to design, collaboration and even re-use. It needs to use abstraction because this
presents each role with a common view that is independent of the physical implementation.
Abstraction supports and promotes re-use too: once a transformation or business rule is formulated
and parameterized at the abstracted level, it can be applied in a variety of circumstances.
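The kind of re-use described here might look like the following sketch (all names are illustrative, not from any specific product): a rule parameterized at the abstract level is applied unchanged to records from two differently shaped physical sources.

```python
# Illustrative sketch: a business rule parameterized at the abstract level
# ("round a monetary amount to a given precision") and re-used across two
# physically different sources. All names here are hypothetical.

def make_rounding_rule(field, places):
    """Return a rule bound to an abstract field name and a precision."""
    def rule(record):
        out = dict(record)
        out[field] = round(float(record[field]), places)
        return out
    return rule

# Formulated once at the abstract level...
round_amount = make_rounding_rule("amount", 2)

# ...then applied to records mapped from different physical layouts.
erp_row = {"amount": "12.3456", "currency": "EUR"}
crm_row = {"amount": "7.891", "currency": "USD"}

print(round_amount(erp_row)["amount"])   # 12.35
print(round_amount(crm_row)["amount"])   # 7.89
```

The physical mapping (column names, types, encodings) is handled separately, so the rule itself never needs to be rewritten for a new source.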
Figure 3 ‐ Full Data Integration Lifecycle Management
Lowering TCO
How will the requirements outlined above contribute to ensuring lower TCO? We can conclude that
there are four broad areas in which the new tools (some of which are already coming to market) will
directly or indirectly lower the overall cost of ownership of DI tools and projects.
Lower Software Costs. By moving away from the outdated CPU/machine-based pricing models to
ones based on throughput and usage, providing the ability to buy just enough DI software to solve
the problem at hand and to extend it at a later stage when appropriate.
Reduced Hardware Costs. By moving away from a model which throws high-end hardware at the
problem of performance and throughput to one which is focused on making optimal use of existing
distributed (grid) computing power, using re-architected engines that can cope with both batch and
perpetual processing, and which have a smaller footprint and use fewer machine resources.
Lower Overall Cost of Development. Embracing concepts such as abstraction and semantic
rationalization to re‐connect the business analysts and DI developers to ensure design time is
radically reduced. Banishing the current myth of re‐use and making this a real and practical
proposition for reducing development time.
Lower Lifecycle Management Costs. By providing tools that monitor and help manage the entire
project lifecycle, from initial design through to production, focused on a central shared metadata
repository (see Figure 3 above). Valuable time is lost and extra costs incurred through problems at
the interfaces and hand-offs between project roles. Tools that monitor this process and alert
management to possible bottlenecks, hand-over failures, failure to make use of existing code, and so
on will help shorten design and development time. Similarly, removing the disconnect between
business analysts and developers – providing visibility of business rules and transformations – will
speed progress and reduce errors.
Taken together these factors will result in lowering TCO and ensuring DI becomes the ubiquitous
backbone for building future application architectures. The future of DI is upon us. You can’t afford
to miss it!
TEN TIPS FOR SELECTING A DATA INTEGRATION TOOL
What are the key aspects end users should focus on in selecting the next data integration tool to
ensure reduced TCO and success in implementation?
1. Implement and enforce strong data governance across your organization so that the definitions,
standards, ownership, location and history (audit trail) of key enterprise data are understood.
Extend your data governance scope to any newly acquired business before you undertake
migration of its data. Recognize that data governance is an ongoing program, not a once-off
project. This is not for the faint-hearted!
2. Implement a data quality program as an integral part of your data governance. This is not a
once‐off initiative.
3. Select data integration tools that fully support the governance organization and provide
tracking, audit trail, alerts and other similar key performance parameters to allow effective
management of the process.
4. Start small and grow. Don’t be tempted to undertake a data integration or migration project
using a “big bang” approach. Ensure that you select a data integration tool that will allow you to
add more “technology” as you need it, rather than having an initial very high startup cost. Be
critical of high startup costs, mega once‐off deals and high maintenance costs (typically 17‐25%)
and challenge your vendors’ pricing models. They must allow you to start small and grow.
5. Choose a data integration tool that is fit for purpose, and avoid tools composed of multiple
bolted-together and not necessarily well-integrated components or modules. Most
tools developed over the past 10 years fall into this category. Don’t take at face value the claims
of the vendors that their tools – despite having incorporated modules and technology from
many vendors over the years – are “fully integrated.”
6. Select tools that support a strong business-oriented focus for data integration. Many of today's
tools are complex, sophisticated and difficult to learn, requiring specialists. You should choose a
tool that supports building integration links based on business terms – letting the tool take care
of translating these into the underlying code – and that abstracts and rationalizes the particular
data store entity, representing it as a universal semantic entity. As mentioned earlier, most data
integration tools developed in the past 10 years require that much of the work of building the
transformations be done by hand. Tools now on the market significantly reduce this – and
reduce the cost of implementation and ownership.
7. Select a tool that will offer the scalability needed to encompass your future business needs.
Few organizations can predict these needs precisely, so seek to identify vendors that can clearly
demonstrate scalability without the need for high-performance hardware. This is especially true
when considering transformation of real-time data: data such as RFID feeds must be analyzed
rapidly if it is to offer value to the business, so data integration tools must scale to allow this real-time
transformation. Seek to identify tools that allow distribution of the processing rather than
concentration on a single array of top‐tier hardware.
8. Seek vendors that can offer role-based implementation where, as described above, tasks
currently undertaken by developers can be shared more effectively among your experienced
business staff. Ensure that the people who understand the business requirements
are fully and effectively involved.
9. Steer clear of products that require you to hard code or embed business logic into the internals
of the tool. Select a tool that presents these rules in an easily readable and understandable form
and is effectively self‐documenting. Most current tools are not self‐documenting, and most
developers don’t maintain adequate documentation.
10. Remember that total cost of ownership is what really matters: not just the negotiated license
costs, but all cost aspects of your integration strategy, both now and in the future.
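Tip 9's notion of readable, self-documenting business rules can be sketched as rules declared as data alongside human-readable descriptions (all names below are hypothetical), so the rule set doubles as its own documentation rather than being buried in the tool's internals:

```python
# Hypothetical sketch of "self-documenting" business rules: each rule
# carries a human-readable description alongside its logic, so the rule
# set can be printed as documentation without reading any internal code.

RULES = [
    {
        "name": "non_negative_amount",
        "description": "Order amounts must be zero or positive.",
        "check": lambda rec: float(rec["amount"]) >= 0,
    },
    {
        "name": "country_code_length",
        "description": "Country codes are two-letter ISO codes.",
        "check": lambda rec: len(rec["country"]) == 2,
    },
]

def validate(record):
    """Return the descriptions of any rules the record violates."""
    return [r["description"] for r in RULES if not r["check"](record)]

record = {"amount": "-5.00", "country": "NLD"}
for problem in validate(record):
    print(problem)
```

Because each rule names itself and explains itself, an analyst can review or audit the logic without a developer translating it back from code.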
ABOUT THE AUTHOR
The former Chief IT Systems Architect for Unilever’s two Food Groups in Europe, Dr David
Waddington, is one of a number of IT directors who joined the field from a business background. He
joined Unilever in 1974, where he led an international team working on the development of low
calorie spreads and was involved in process automation. He also served as Section Head of the
Linear Programming Group and was a founding member of the IT team that supported the move to
open systems standards.
Since the mid-1990s, he has worked on establishing common data standards, data warehousing and
data integration and was the first to introduce KALIDO® software into Unilever. More recently, he
has led a team to develop a Master Reference Data Repository, which is now being implemented in
Unilever.
He now leads his own consultancy, Tyson Consulting (www.tyson‐consulting.com) based in The
Netherlands, which offers advisory services in the areas of business intelligence, data warehousing
and information management strategies. Besides his consultancy work, David was also VP and
Research Director at Ventana Research from March 2005 to October 2006, and has recently
co-founded The Information Difference (www.informationdifference.com), a strategic advice and
research firm dedicated entirely to Master Data Management (MDM).