In the consulting world, project estimation is a critical component required for the delivery of a successful project. If you estimate correctly, you will deliver a project on time and within budget; get it wrong and you could end up over budget, with an unhappy client and a burned-out team. Project estimation for business intelligence and data integration projects is especially difficult, given the number of stakeholders involved across the organization as well as the unknowns of data complexity and quality. Add to this mix a firm fixed-price RFP (request for proposal) response for a client your organization has not done work for, and you have the perfect climate for a poor estimate. In this article, I share my thoughts about the best way to approach a project estimate for an extract, transform, and load (ETL) project.

For those of you not familiar with ETL, it is a common technique used in data warehousing to move data from one database (the source) to another (the target). To accomplish this data movement, the data first must be extracted from the source system: the "E." Once the extract is complete, data transformation may need to occur; for example, it may be necessary to transform a state name to a two-letter state code (Virginia to VA): the "T." After the data have been extracted from the source and transformed to meet the target system requirements, they can then be loaded into the target database: the "L."

Before starting your ETL estimation, you need to understand what type of estimate you are trying to produce. How precise does the estimate need to be? Will you be estimating effort, schedule, or both? Will you build your estimate top down or bottom up? Is the result being used for inclusion in an RFP response, or will it be used in an unofficial capacity? By answering these questions, you can assess risk and produce an estimate that best mitigates that risk.

In many cases, the information you have to base your estimate on is high level, with only a few key data points to go on, and you do not have either the time or the ability to ask for more details. In these situations, the response I hear most often is that an estimate cannot be produced. I disagree! As long as the precision of the estimate produced is understood by the customer, there is value in the estimate and it should be done. The alternative to a high-level estimate is none at all, and as someone who has to deliver on the estimate, I would rather have a bad estimate with clear assumptions than no baseline at all. The key is being clear about how the estimate should be used and what the limitations are. I have found that one of the best ways to frame the accuracy of the estimate with the customer and project team is through the use of assumptions. Every estimate is built with many assumptions in mind, and having them clearly laid out almost always generates good discussion and, eventually, a more refined and accurate estimate.
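The three ETL steps described above can be sketched as a toy pipeline. Everything in this example is a stand-in: the source rows, the state-code lookup, and the in-memory "target" list take the place of real database connections.

```python
# Toy illustration of the E, T, and L steps. The source rows, the
# state-code mapping, and the list used as a "target" are all
# hypothetical; a real pipeline would read from and write to databases.

STATE_CODES = {"Virginia": "VA", "Maryland": "MD"}  # assumed lookup table

def extract():
    """E: pull raw rows from the source system (stubbed as a list)."""
    return [{"name": "Acme Corp", "state": "Virginia"},
            {"name": "Widget Co", "state": "Maryland"}]

def transform(rows):
    """T: convert state names to two-letter state codes."""
    return [{**row, "state": STATE_CODES[row["state"]]} for row in rows]

def load(rows, target):
    """L: write the transformed rows to the target (stubbed as a list)."""
    target.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse[0]["state"])  # VA
```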
A common question that comes up during the estimation process is effort versus schedule; in other words, how many hours will the work take versus the duration it will take to complete the effort. To simplify the estimating process, I start with a model that delivers the effort and completely ignores the schedule. Once the effort has been refined, it can be taken to the delivery team for a secondary discussion on overlaying the estimated effort across time.

Once you know what type of estimate you are trying to deliver and who your audience is, you can begin the process of effectively estimating the work. All too often, this up-front thinking is ignored and the resulting estimate does not meet expectations.

I've reviewed a number of the different ETL estimating techniques available and have found some to be extremely complex and others more straightforward. Then there are the theories of estimating and the tried-and-true models of Wideband Delphi and COCOMO. All of these theories are interesting and have value, but they don't easily produce the data to support the questions I am always asked in the consulting world: How much will this project cost? How many people will you need to deliver it? What does the delivery schedule look like? I have discovered that most models focus on one part of the effort (generally development) but neglect to include requirements, design, testing, data stewardship, production deployment, warranty support, and so forth. When estimating a project in the consulting world, we care about the total cost, not just how long it will take to develop the ETL code.

Estimating an ETL Project
In the ETL space I use two models (top down and bottom up) for my estimation, if I have been provided enough data to support both; this helps better ground the estimate and confirms that there are no major gaps in the model.

The verticals in my model are the project phases, each sized as a percentage of the development effort:

Phase              Percentage of Development
Requirements       50% of Development
Design             25% of Development
Development        (baseline)
System Test        25% of Development
Integration Test   25% of Development

Once I have my verticals established, I break my estimate horizontally into low, medium, and high complexity, using the percentages below:

Complexity   Percent of Medium
Low          50% of Medium
Medium       N/A
High         150% of Medium

Generally, when doing a high-level ETL estimate, I know the number of sources I am dealing with and, if I'm lucky, I also have some broad-stroke complexity information. Once I have my model built out, as described above, I work with my development team to understand the effort involved for a single source. I then take the number of sources and plug them into my model, as shown below (Figure 1, in yellow). If I don't have complexity information, I simply record the same number of sources in the low, medium, and high columns to give me an estimate range of +/−50%.

I now have a framework I can share with my team to shape my estimate. After my initial cut, I meet with key team members to review the estimate, and I inevitably end up with a revised estimate and, more importantly, a comprehensive set of assumptions. There is no substitute for socializing your estimate with your team or with a group of subject matter experts; they are closest to the work and have the best insight into the effort involved.
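A minimal sketch of this model in Python, using the phase and complexity percentages from the tables above. The 40 development hours per medium-complexity source and the source counts in the example are invented for illustration; development is treated as the 100% baseline that the other phases are scaled against.

```python
# Sketch of the phase/complexity estimating model. The phase and
# complexity percentages come from the article; the per-source
# development hours and source counts below are invented examples.

PHASES = {"Requirements": 0.50, "Design": 0.25, "Development": 1.00,
          "System Test": 0.25, "Integration Test": 0.25}
COMPLEXITY = {"Low": 0.50, "Medium": 1.00, "High": 1.50}

def estimate_hours(sources_by_complexity, dev_hours_per_medium_source):
    """Total effort across all phases and complexity bands."""
    total = 0.0
    for band, count in sources_by_complexity.items():
        dev = count * dev_hours_per_medium_source * COMPLEXITY[band]
        total += sum(dev * pct for pct in PHASES.values())
    return total

# Assumed example: 3 low, 4 medium, and 2 high sources at
# 40 development hours per medium-complexity source.
print(estimate_hours({"Low": 3, "Medium": 4, "High": 2}, 40))  # 765.0
```

If complexity is unknown, running the same source count through the Low and High bands gives the +/−50% range the article describes.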
Role                     Effort (Hours)  Effort (Days)  Target Days  # Resources Needed  Rate     Cost
Business System Analyst  1833.0          229.1          71           3.2                 $10.00   $18,330.00
Developer                2420.0          302.5          71           4.2                 $15.00   $36,300.00
Tester                   1756.0          219.5          71           3.1                 $12.00   $21,072.00
Tech Lead                600.9           75.1           86           0.9                 $14.00   $8,412.60
Project Manager          300.5           37.6           86           0.4                 $18.00   $5,408.10
Subject Matter Expert    600.9           75.1           86           0.9                 $20.00   $12,018.00
SubTotal                 7511.3          938.9                       12.7                         $101,540.70
Contingency              751.1           93.9           86           1.1                 $14.83   $11,141.69
Grand Total              8262.4          1032.8                      13.8                         $112,682.39
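The arithmetic behind each row of the resource table can be reproduced in a few lines. The eight-hour workday used to convert hours to days is an assumption (the table itself does not state the conversion, though its numbers are consistent with it): days spread across the target duration size the team, and hours times rate gives cost.

```python
# Arithmetic behind one row of the resource table: effort in hours is
# converted to days (assuming 8-hour days), spread across the target
# duration to size the team, and multiplied by an hourly rate for cost.

HOURS_PER_DAY = 8  # assumed conversion; not stated explicitly in the table

def resource_row(effort_hours, target_days, rate):
    effort_days = effort_hours / HOURS_PER_DAY
    resources_needed = effort_days / target_days
    cost = effort_hours * rate
    return effort_days, resources_needed, cost

# Business System Analyst row: 1833 hours over a 71-day target at $10/hour
days, needed, cost = resource_row(1833.0, 71, 10.00)
print(f"{days:.2f} days, {needed:.2f} resources, ${cost:,.2f}")
```

A flat 10% contingency line, as in the table, would simply scale the subtotal row by 0.10.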
Glossary
Data Integration – The process of combining data from multiple sources to provide end users with a unified view of the data.

Data Steward – The person responsible for maintaining the metadata repository that describes the data within the data warehouse.

Data Warehouse – A repository of data designed to facilitate reporting and business intelligence analysis.

RFP – A request for proposal (RFP) is an early stage in the procurement process, issuing an invitation for suppliers, often through a bidding process, to submit a proposal on a specific commodity or service.

Subject Area – A term used in data warehousing that describes a set of data with a common theme or set of related measurements (e.g., customer, account, or claim).

Target – An ETL term used to describe the database that receives the transformed data.

About the Author
Ben Harden, PMP, is a manager in the Data Management and Business Intelligence practice of the Richmond, Virginia–based consulting firm CapTech. He specializes in the project management and delivery of data integration and business intelligence projects for Fortune 500 organizations. Mr. Harden has successfully managed data-related projects in the health care, financial services, telecommunications, and governmental sectors and can be reached via e-mail at bharden@captechconsulting.com.