Why do we need Staging Area during ETL Load
Written by DWBIConcepts Team Last Updated: 31 December 2014
"We have a simple data warehouse that takes data from a few RDBMS source systems and load the data in
dimension and fact tables of the warehouse. I wonder why we have a staging layer in between. Why can’t
we process everything on the fly and push them in the data warehouse?"
Last night, I received this question by email from a member of the DWBIConcepts community, and
decided to discuss the pros and cons of having a staging layer in this article.
Strictly speaking, a staging area is not a necessity if we can handle everything on the fly. But can we?
Here are a few reasons why you often can’t avoid a staging area:
1. Source systems are available for extraction only during a specific time slot, which is generally shorter
than your overall data loading time. It’s a good idea to extract the data and keep it at your end before
you lose the connection to the source systems.
2. You want to extract data based on conditions that require you to join data from two or more different
systems. E.g. you want to extract only those customers who also exist in some other
system. Unless the systems are connected through something like a database link or a federation layer,
you will not be able to run a single SQL query joining two tables from two physically different
databases.
3. Different source systems have different allotted time windows for data extraction.
4. The data warehouse’s loading frequency does not match the refresh frequencies of the source
systems.
5. Data extracted from the same set of source systems is going to be used in multiple places (data
warehouse loading, ODS loading, third-party applications etc.).
6. The ETL process involves complex data transformations that require extra space to temporarily stage the
data.
7. There is a specific data reconciliation / debugging requirement that warrants the use of a staging area
for pre-load, in-flight or post-load data validations.
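Reasons 1 and 2 can be illustrated with a minimal sketch. It assumes two in-memory SQLite databases standing in for physically separate source systems and a third one acting as the staging area; the table names (customers, accounts, stg_crm_customers, stg_billing_accounts), the sample rows and the helper functions are all hypothetical and only serve the illustration. Each source is extracted once, the source connections are released, and the cross-system join then runs entirely inside staging.

```python
import sqlite3


def make_source(table, columns, rows):
    """Create a stand-in source system as an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({columns})")
    placeholders = ", ".join("?" for _ in rows[0])
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return conn


def extract_to_staging(source, staging, query, stg_table, columns):
    """Copy the result of `query` from a source into a fresh staging table."""
    staging.execute(f"DROP TABLE IF EXISTS {stg_table}")
    staging.execute(f"CREATE TABLE {stg_table} ({columns})")
    rows = source.execute(query).fetchall()
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        staging.executemany(
            f"INSERT INTO {stg_table} VALUES ({placeholders})", rows
        )
    staging.commit()
    return len(rows)


# Two hypothetical, physically separate source systems.
crm = make_source(
    "customers", "customer_id INTEGER, name TEXT",
    [(1, "Asha"), (2, "Ben"), (3, "Chen")],
)
billing = make_source("accounts", "customer_id INTEGER", [(2,), (3,), (4,)])

# The staging area: a database on "our" side of the connection.
staging = sqlite3.connect(":memory:")
extract_to_staging(
    crm, staging, "SELECT customer_id, name FROM customers",
    "stg_crm_customers", "customer_id INTEGER, name TEXT",
)
extract_to_staging(
    billing, staging, "SELECT customer_id FROM accounts",
    "stg_billing_accounts", "customer_id INTEGER",
)

# The extraction window can close now; the sources are no longer needed.
crm.close()
billing.close()

# The cross-system condition becomes an ordinary join inside staging.
common_customers = staging.execute(
    """
    SELECT c.customer_id, c.name
    FROM stg_crm_customers AS c
    JOIN stg_billing_accounts AS b ON b.customer_id = c.customer_id
    ORDER BY c.customer_id
    """
).fetchall()
print(common_customers)  # [(2, 'Ben'), (3, 'Chen')]
```

In a real project the sources would be separate database servers and the staging area would typically be a schema in or near the warehouse, but the shape of the flow stays the same: extract first, then join and transform locally.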
Clearly, a staging area gives a lot of flexibility during data loading. Shouldn't we always have a separate
staging area, then? Is there any downside to having a staging area? Yes, there are a few.
1. A staging area increases latency, that is, the time required for a change in the source system to take
effect in the data warehouse. In many real-time / near-real-time applications, a staging area is therefore
avoided.
2. Data in the staging area occupies extra space.
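One common way to limit the space cost, offered here as an assumption rather than something the points above prescribe, is to clear a staging table once its rows have been loaded and the row counts reconciled. The sketch below uses an in-memory SQLite database; the table names stg_sales and fact_sales and the function load_and_purge are hypothetical.

```python
import sqlite3


def load_and_purge(conn, stg_table, target_table):
    """Load staged rows into the target, reconcile counts, then clear staging."""
    staged = conn.execute(f"SELECT COUNT(*) FROM {stg_table}").fetchone()[0]
    before = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]

    conn.execute(f"INSERT INTO {target_table} SELECT * FROM {stg_table}")
    after = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]

    # Post-load reconciliation: every staged row must have reached the target.
    if after - before != staged:
        conn.rollback()
        raise RuntimeError(
            f"expected {staged} new rows in {target_table}, got {after - before}"
        )

    # Only after validation is the staging data removed to reclaim space.
    conn.execute(f"DELETE FROM {stg_table}")
    conn.commit()
    return staged


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (sale_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE fact_sales (sale_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?)",
    [(1, 10.0), (2, 25.5), (3, 7.25)],
)
conn.commit()

loaded = load_and_purge(conn, "stg_sales", "fact_sales")
remaining = conn.execute("SELECT COUNT(*) FROM stg_sales").fetchone()[0]
print(loaded, remaining)  # 3 0
```

Whether to purge immediately or keep a few load cycles in staging is a trade-off: retaining the data helps debugging and reruns, while purging keeps the extra space bounded.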
To me, in all practical senses, the benefits of having a staging area outweigh its problems. Hence, in
general I suggest designating a specific staging area in data warehousing projects.
Copyright
(https://creativecommons.org/licenses/by-nc-sa/4.0/)
Except where otherwise noted, contents of DWBI.ORG by Intellip LLP (http://intellip.com) are licensed under
a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.