Ariba Spend Visibility Best Practice Guide: Data Extraction Services
Table of Contents
Introduction
Overview
Purpose
Best Practice Recommendations
Phase 1: Kick-off
Phase 2: Data Collection - Mapping
Phase 3: Data Collection - Extraction
Phase 4: Data Collection - File Transmission
Phase 5: Data Validation
Conclusion
Appendix 1: Common Data Collection Problems
Appendix 2: Summary of Bad Practices and Potential Impact to the Project
Task: Extracting data from company ERP systems to load into the Ariba Spend
Visibility solution.
Audience: The intended audience of this document is a project team with the task of
extracting data for a spend analysis utility.
Introduction
Overview: The first and likely most difficult task in implementing a spend analysis tool,
such as Ariba Spend Visibility, is extracting data from the existing ERP systems. In a
perfect world, one button is pushed and the data is migrated between systems.
In practice, the process can be daunting and time-consuming. This can delay
results, or even scare companies away from implementing such a solution, which preserves
a status quo in which a single corporate-level view of spend is never obtained. In today’s
global economy, this can mean millions of dollars in savings opportunities lost.
This document walks you step by step through a typical extraction process, up to and
including the validation step. We will use the example of a customer who has just
purchased Ariba Spend Visibility and will do the extract work themselves (that is, with no
consulting or Data Transformation services). Each phase includes tasks that should
be completed and tips for best practices.
Purpose: Based on Ariba’s experience with more than 100 current Ariba Spend
Visibility customers, this guide will share best practices that we recommend to complete
this task quickly and accurately.
Phase 1: Kick-off
Day one, and the Ariba Spend Visibility contract is signed, sealed, and delivered. The
project champion has assigned an internal project manager (PM) to lead the initiative.
At this stage, the source systems that will be included should have been identified. The
PM should confirm the list and work to have at least one IT resource and one
business user per source (the same person may cover more than one source).
This team should be in place and be required to attend the project kick-off meeting,
where the full project overview will be presented by the Ariba project manager (APM).
In most cases, the majority of this team did not take part in any of the sales discussions,
so the meeting gives the internal project team its first overview of the project. This is a key
step so that everyone involved knows the scope of the project and which steps require
their participation.
The Ariba PM will plan the data schema overview soon after the project kick-off. This
session will provide guidelines and formatting requirements for the Ariba Analysis tool.
This presentation should be attended by the PM, all IT resources, and core business
users. We will cover more details in the next section, Data Collection - Mapping.
Customer is deploying more than 10 source systems:
Identify an IT project manager who is able to lead the IT effort, track task completion,
coordinate between all source system owners and drive consistency.
Copyright © 2010 Ariba, Inc. All rights reserved.
• Do you have the ability to load specific invoice lines such as taxes, freight, and
other indirect costs associated with purchases?
• Is it necessary to flag direct vs. indirect spend?
This is commonly overlooked, and ends up being a valuable filter in reporting. In
some cases, this will affect enrichment, but being able to quickly filter direct vs.
indirect spend is beneficial.
• Is it necessary to flag suppliers as preferred, internal, or in some other way?
For example, if you are not able to link contract details to invoices, you may be able
to flag preferred suppliers quite easily. This will allow quick and easy compliance
reviews to see where quick-hit savings can be achieved.
• Are there other internal requirements, such as property-level reporting?
A use case for this would be state-level tax reporting guidelines that require
companies to report all spend done within a specific state or country.
You have now defined the scope of the extract and overall requirements. Next up is the
mapping session. This can be difficult across different ERPs (for example, indirect spend
may not have part-level detail, while direct spend does).
The APM will have provided the data acquisition schema after the training session. You
will need to work with each source system team (IT and business user) to map their fields.
The diagram on the next page provides a sample overview of the available fields and
how the tables are related.
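A mapping can also be captured in a machine-readable form so that the extract and the mapping guide stay in step. The following is a minimal sketch only: every source column name and every target field name in it is hypothetical and does not come from the Ariba data acquisition schema, which remains the authority for actual field names.

```python
import csv
import io

# Hypothetical mapping from one source system's column names to target
# field names. None of these names are taken from the Ariba schema document.
FIELD_MAP = {
    "VENDOR_NO": "SupplierId",
    "VENDOR_SITE": "SupplierLocationId",
    "INV_NO": "InvoiceId",
    "INV_DATE": "AccountingDate",
    "AMT": "Amount",
    "LINE_DESC": "Description",
}


def map_rows(source_rows, field_map):
    """Rename source columns to target names and report unmapped columns."""
    mapped, unmapped = [], set()
    for row in source_rows:
        out = {}
        for column, value in row.items():
            if column in field_map:
                out[field_map[column]] = value
            else:
                # Surface columns nobody has mapped yet instead of dropping
                # them silently.
                unmapped.add(column)
        mapped.append(out)
    return mapped, sorted(unmapped)


# Illustrative source data; BUYER has no mapping and is reported back.
sample = io.StringIO("VENDOR_NO,INV_NO,AMT,BUYER\nV001,INV-1,125.50,jsmith\n")
rows, unmapped_columns = map_rows(csv.DictReader(sample), FIELD_MAP)
```

Keeping one such mapping per source system gives the IT resource and the business user a single artifact to review together.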
Your project manager will provide you with the data acquisition schema and document.
They will also provide the team with training on the extract requirements.
Are there required fields?
The data schema document will denote what is required vs. recommended. It is common that a
customer will not have all data elements that Ariba can accept. In addition to the ID key
fields that link the fact tables (Invoice and PO) to the dimension tables (for example,
Supplier and Account), additional fields are recommended to support enrichment.
For commodity enrichment, the following fields are sent to the enrichment team. Anything
that the customer believes will assist in assigning a commodity should be provided in one
of these fields:
- Invoice Description
- PO Description
- Part Description
- Supplier
- ERP Commodity
- GL Account
- Flex Fields (the first three of six are used as enrichment key fields)
Similar to the commodity enrichment key fields mentioned
above, the more details that are provided about the
supplier, the more accurate the supplier enrichment
process can be. The following fields are recommended to
be populated as much as possible:
- Supplier Name
- Street Address
- City
- State
- Postal Code
- Country
I have a field that I need to provide, but I cannot find its equivalent in your data schema.
Do not get stuck on field names. They can be renamed in the utility if they need to fit the
requirements of the data you would like to provide. A common reason for this is to
provide hierarchies on appropriate fields.
• Document the extracts very well. It should be extremely easy to understand what
was done, what data was pulled, etc. If proper documentation is not done, this
could lead to delays in the long run.
• Create the extracts to be as automated as possible. There are two reasons for this:
- Manual steps can lead to errors and are not repeatable.
- Most spend analytic projects include refreshes, so it is not a single extract, but
will typically include four per year.
• Update the extracts if there are changes required on the data. It is common that
small changes are just made in the raw files because it is easier, but the extracts
are never updated. Then the same problems are encountered during a refresh,
delaying the project.
• Have validation reports created as part of the extract. Further details can be found
in the Validation section of this document, but it is helpful to have these built with
the extracts.
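The recommendation to build validation reports into the extract can be sketched as follows. This is an illustration under stated assumptions, not an Ariba-provided script: the function name, the field names, and the two-line summary layout are all hypothetical. The idea is simply that the same automated run that writes the extract also writes the control figures (row count and total amount) that will later be compared with what was loaded.

```python
import csv
import io
from decimal import Decimal


def write_extract_with_summary(rows, amount_field, extract_file, summary_file):
    """Write the extract and its control figures in the same automated run."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to extract")
    writer = csv.DictWriter(
        extract_file, fieldnames=list(rows[0]), quoting=csv.QUOTE_ALL
    )
    writer.writeheader()
    total = Decimal("0")
    for row in rows:
        writer.writerow(row)
        # Decimal avoids floating-point drift when summing currency amounts.
        total += Decimal(row[amount_field])
    summary = {"row_count": len(rows), "total_amount": str(total)}
    csv.writer(summary_file).writerows(summary.items())
    return summary


# Hypothetical invoice lines; in-memory files stand in for real output files.
invoice_lines = [
    {"InvoiceId": "INV-1", "Amount": "125.50"},
    {"InvoiceId": "INV-2", "Amount": "74.50"},
]
extract_out, summary_out = io.StringIO(), io.StringIO()
summary = write_extract_with_summary(
    invoice_lines, "Amount", extract_out, summary_out
)
```

Because the summary is produced by the extract itself, it is regenerated on every refresh without any manual step.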
Ariba has extraction guides for PeopleSoft, SAP, and Oracle. These guides provide
some basic extract details, common field mappings and starter scripts, but they will need
to be adjusted for the specific configuration of each system. Ask your APM for these guides
if you have one of these systems.
The following screenshots show the staging area and a sample validation report.
NOTE: These examples show the validation results when the extended validations option is
not checked. Ariba recommends that you always check that option.
Customer is deploying more than 20 source systems:
Customers with greater than 20 source systems may consider creating a central collection
location. The transmission utility could be used from here to send the files to Ariba. Some
customers have also built a validation step at this location, which also performs
customer-specific validations.
This is an extremely important step, as incorrect data at rollout can derail user adoption.
Keeping this in mind, while completing the task efficiently, Ariba recommends the following:
• Review the total spend of all data as a whole, and per source system. This may be
difficult to get a figure down to the penny, but you should be close—allowing for a
small margin. This review should be done from the ERP system itself.
• Review spend in the top 20 items for each dimension that was provided, per source
system: for example, the top 20 suppliers by spend, the top 20 GL accounts, and so on. This
will confirm that the data loaded and links accurately from the master table to the
supporting dimensions. Ninety-nine percent of the time, any potential data issue will
be found following this process.
• Review unclassified spend for each provided dimension, for example unclassified
supplier names or account names. Was the spend unclassified because the dimension table
listed the ID without a name, or because the ID on the Invoice was not listed in the
dimension table at all?
• As mentioned in the last section, summary reports created with the extracts will
speed up the process and be the validation that the data extracted matches the
data loaded.
• Keep in mind any filters that were built on the extract. It is common to forget that
you excluded certain expenses, and time is wasted attempting to find the
discrepancy between Ariba and the ERP system.
• If you provide hierarchies for items such as Company Site, Account, Cost Center
Management Files or Flex Dimension 6, check for accurate roll-ups. Making sure
the spend matches is the first priority, but you also want to ensure your hierarchies are
named correctly and all fields are populated.
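The first two checks in the list above (reconciling total spend within a small margin, and reviewing the top 20 items per dimension) can be sketched as follows. The field names, the sample figures, and the 0.1% default tolerance are assumptions chosen for illustration; the acceptable margin is a decision for the project team.

```python
from collections import defaultdict
from decimal import Decimal


def total_spend(rows, amount_field="Amount"):
    """Sum the amount field across all rows using exact decimal arithmetic."""
    return sum((Decimal(r[amount_field]) for r in rows), Decimal("0"))


def within_tolerance(erp_total, loaded_total, tolerance=Decimal("0.001")):
    """True when the loaded total is within a relative margin of the ERP total."""
    if erp_total == 0:
        return loaded_total == 0
    return abs(erp_total - loaded_total) / abs(erp_total) <= tolerance


def top_n_by(rows, dimension_field, n=20, amount_field="Amount"):
    """Return the n largest (dimension value, spend) pairs, largest first."""
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at zero
    for r in rows:
        totals[r[dimension_field]] += Decimal(r[amount_field])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical loaded data for one source system.
loaded = [
    {"SupplierId": "V001", "Amount": "100.00"},
    {"SupplierId": "V002", "Amount": "250.00"},
    {"SupplierId": "V001", "Amount": "50.00"},
]
erp_reported_total = Decimal("400.10")  # figure taken from the ERP itself
totals_match = within_tolerance(erp_reported_total, total_spend(loaded))
top_suppliers = top_n_by(loaded, "SupplierId", n=2)
```

When the totals do not match, any filters applied in the extract are the first thing to rule out before looking for a load problem.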
Conclusion
The recent economic downturn has created a need for increased visibility to track risk and
compliance, and to make sure that companies are doing everything they can to spend
wisely. While spend analytics tools require an investment in time and money, you can
no longer afford to use complexity as an excuse.
In addition to using this document, contact your assigned Ariba project manager with
any questions you may have. We have completed many successful large deployments, and the
experience from each of those projects is shared among the project manager team.
Those customers started with the same concerns about up-front effort, but once those
hurdles were crossed, the benefits were ultimately achieved. Hopefully this document is
helpful in getting you started on the project.
Appendix 1: Common Data Collection Problems
Invalid joins between supporting tables
Links between fact tables (Invoice and PO) and the dimensions (for example, Supplier, GL
Account, ERP Commodity) are extremely important. If the links fail, then reporting on
the data is not possible. Keeping with the supplier dimension example above, the
Supplier ID and Supplier Location ID on the Invoice table must be in the same format as
those on the Supplier Dimension. This will link the invoice to the supplier, and allow
reporting on that record. If the link fails, data will appear as "Unclassified" in the
reporting tool, which is the default value for data elements that are NULL.
When the file is uploaded to the Ariba site, you can select an option to have dimension
reference checks performed. The results will be provided on the error summary report. As
mentioned in this document, the Ariba PM will also monitor the uploaded files and will
provide feedback based on the information, though you can review the file yourself.
If you want to check the files prior to sending to Ariba, a simple query can be done in
most databases (SQL, Oracle, etc.). Depending on the file size (less than 1GB), MS Access
has a relatively simple way to check for duplicates using the "Find Duplicates Query
Wizard". Simply select the key ID fields for the given table. If the joins are not valid,
additional research must be done to correct the extract files.
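For teams that prefer a script to a database query, the pre-upload check described above can be sketched as follows. The field names and sample values are hypothetical. The sketch looks for two things: duplicate key combinations in a dimension file, and key combinations on the fact file that have no match in the dimension (which would surface as "Unclassified" after loading).

```python
from collections import Counter


def duplicate_keys(dimension_rows, key_fields):
    """Return key combinations that occur more than once in a dimension."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in dimension_rows)
    return sorted(key for key, count in counts.items() if count > 1)


def unmatched_keys(fact_rows, dimension_rows, key_fields):
    """Return fact-table key combinations that are missing from the dimension."""
    known = {tuple(r[f] for f in key_fields) for r in dimension_rows}
    used = {tuple(r[f] for f in key_fields) for r in fact_rows}
    return sorted(used - known)


keys = ("SupplierId", "SupplierLocationId")
suppliers = [
    {"SupplierId": "000123", "SupplierLocationId": "01"},
    {"SupplierId": "000456", "SupplierLocationId": "01"},
    {"SupplierId": "000456", "SupplierLocationId": "01"},  # duplicate row
]
invoices = [
    {"SupplierId": "000123", "SupplierLocationId": "01"},
    # Same supplier, but leading zeros were lost, so the formats differ.
    {"SupplierId": "123", "SupplierLocationId": "01"},
]
duplicates = duplicate_keys(suppliers, keys)
missing = unmatched_keys(invoices, suppliers, keys)
```

Keys are compared as strings on purpose, so that format differences such as dropped leading zeros show up as mismatches rather than being hidden by numeric conversion.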
Commas/Double Quotes in data
The Ariba Analysis database uses comma-separated format files. Many customer ERP systems
allow commas to be included in fields such as descriptions. If this data is included, the
fields must be enclosed in double quotes or they will run together, causing problems
with links and field content.
This is one of the more difficult issues to review. The upload validation will catch the
problem if the delimiter issue causes data to jump into a field that fails validation. The
Ariba PM will also work with you to review validation reports after the data is loaded to
make sure each of the fields loaded is correct.
You can also use Access or another database to search for unparsable separators, which
typically appear as an upload error when importing the files.
An example of how the data should appear in the file is as follows:
ERP Data: hex ½"x4", cap screw
Extract View: "10","100","hex ½""x4"", cap screw"
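A minimal sketch of producing such a line with Python's standard csv module, assuming that embedded double quotes are escaped by doubling them (as the Extract View example suggests) and that every field is quoted. The values "10" and "100" simply stand in for the other fields on the row.

```python
import csv
import io

# One row whose description contains both a comma and double quotes.
row = ["10", "100", 'hex ½"x4", cap screw']

buffer = io.StringIO()
# QUOTE_ALL encloses every field in double quotes; the csv module doubles
# any embedded double quote by default.
csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(row)
line = buffer.getvalue()

# Reading the line back should return the original three fields unchanged.
parsed = next(csv.reader(io.StringIO(line)))
```

Round-tripping a few awkward rows like this before the first upload is a cheap way to confirm that fields will not run together.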
Incorrect date format (YYYY-MM-DD)
This is one of the more common issues seen in the initial extracts. The Ariba required
format is yyyy-mm-dd.
The standard upload validation on the Ariba site checks for this and provides feedback on
the validation report. It is also recommended that the extract script is reviewed, and a
sanity check can be done on the raw files prior to uploading.
Extracts are not automated/changes not documented
Best practice is to automate the extracts as fully as possible. In doing so, any changes
in formats or mappings must be reflected in the extract automation. Risks of not doing
this include:
• Any manual work increases the risk for errors
• Refreshes are delayed due to the re-work, and potentially the same errors being made
in the initial deployment
• Increased transition time if new team members are brought on to do the work
Ariba will provide a mapping template that can be used to track the mapping and format
elements.
Customer PM and IT Lead (if applicable) should be diligent with the IT team members that
the extracts are automated and updated throughout the process.
Appendix 2: Summary of Bad Practices and Potential Impact to the Project
Data Collection: Mapping guides are not created and updated with changes.
This is quite common, and ultimately leads to the same errors and delays being encountered
during refreshes. Again, this increases the internal cost of creating the extracts and
delays deployment of new data.
Data Collection: Data file changes are done manually to correct data issues instead of
incorporating the changes in the extracts themselves.
Even if the updates are tracked well in a mapping guide, any manual work can create
problems, especially if there is a resource change. It also increases turnaround time and
resource costs to pull an extract. The extracts should be as automated as possible.
Data Validation: No plan is developed with business users and IT to do a proper validation.
Failure to validate the data and provide proof that it is correct will ultimately cause
user deployment issues. End users will constantly question the data, whereas having good
details on the validation plan will alleviate concern that the data is not accurate.