
Change Data Capture

Srikant Jahagirdar
Development Lead
MSIT - Microsoft
Session Objectives And Takeaways
Session Objective(s):

Describe Change Data Capture architecture

Build a data tracking solution using Change Data Capture
Key Takeaways

Best practices and recommendations for Change Data Capture solutions
Tracking Methods
The traditional tracking methods provide different levels of change information and can be intrusive:
Trigger-based tracking
Timestamp columns
Join queries
An additional table to track deletes

Scenario requirements vary, and a one-size-fits-all approach may not be the best choice when building tracking solutions
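To see why timestamp columns need an additional table to track deletes, here is a minimal Python simulation (not SQL Server code). The Row type, the "last modified" column, and the watermark value are hypothetical. An update bumps the timestamp and is found by the incremental query, but a deleted row leaves nothing behind to query.

```python
from dataclasses import dataclass

@dataclass
class Row:
    key: int
    status: str
    modified: int  # hypothetical "last modified" timestamp column

def changed_since(table, watermark):
    """Timestamp-column tracking: rows whose timestamp is newer than the watermark."""
    return [row for row in table.values() if row.modified > watermark]

# State of the source table as of the last extract (watermark = 2)
table = {1001: Row(1001, "H", 1), 1002: Row(1002, "D", 2)}
watermark = 2

table[1001] = Row(1001, "A", 3)  # an update bumps the timestamp, so it is detected
del table[1002]                  # a delete leaves no row behind, so it is missed

print([(row.key, row.status) for row in changed_since(table, watermark)])
```

The query reports only key 1001; the removal of 1002 is invisible unless something else (a trigger-maintained delete table, or a log-based mechanism) records it.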
Typical Scenario
“Right time” BI is becoming more critical for businesses

Data volumes are increasing and batch windows are decreasing

Efficient ETL through incremental data loads is key to reducing the overall ETL time

Upfront information about “what changed” helps build efficient ETL solutions
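With “what changed” known up front, an incremental load touches only the changed rows instead of reloading the whole table. A minimal Python sketch, assuming each change record is a hypothetical (operation, key, values) tuple and the warehouse table is an in-memory dict; a real ETL package would issue the equivalent INSERT/UPDATE/DELETE (or MERGE) statements against the warehouse.

```python
def apply_changes(target, changes):
    """Apply (operation, key, values) change records to an in-memory target table."""
    for op, key, values in changes:
        if op == "delete":
            target.pop(key, None)      # deleting a missing key is a no-op
        elif op in ("insert", "update"):
            target[key] = values       # upsert: keep the latest values
        else:
            raise ValueError(f"unknown operation: {op}")
    return target

# Hypothetical warehouse table and the change set extracted since the last load
warehouse = {1001: {"status": "H"}, 1002: {"status": "D"}, 1003: {"status": "X"}}
delta = [
    ("update", 1001, {"status": "A"}),
    ("delete", 1003, None),
    ("insert", 1004, {"status": "N"}),
]
apply_changes(warehouse, delta)
print(warehouse)
```

Row 1002 is never read or rewritten; the cost of the load is proportional to the size of the change set, not the size of the table.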
SQL Server 2008 Tracking Features
Change Data Capture
Provides rich change information harvested from the database log

Change Tracking
Lightweight tracking providing real-time change detection

SQL Auditing
Low-impact tracking providing auditing information
Value Proposition
Change Data Capture provides valuable change information about DML changes on a table, efficiently and close to real time
Eliminates expensive techniques:
user triggers, timestamp columns, expensive join queries

Can provide answers to a variety of key questions:
What are ALL the changes that happened between 12:00 AM and 12:00 PM?
I want a transactionally consistent set of all changes that happened in the Order and Order_Detail tables since 12:00 PM
Which column changed?
Was the change an insert or an update?
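“Was the change an insert or an update?” is answered by the __$operation column that every change row carries. A minimal Python sketch of that lookup: the code values (1 = delete, 2 = insert, 3 and 4 = update before and after image) are the ones SQL Server change tables use, while the sample rows and the describe_changes helper are hypothetical. Before images (code 3) are returned only when the all-changes TVF is called with the 'all update old' row filter.

```python
# __$operation values used in SQL Server CDC change tables
OPERATIONS = {
    1: "delete",
    2: "insert",
    3: "update (before image)",
    4: "update (after image)",
}

def describe_changes(rows):
    """Label hypothetical (key, __$operation) change rows with their operation name."""
    return [(key, OPERATIONS[op]) for key, op in rows]

rows = [(1001, 2), (1002, 2), (1001, 3), (1001, 4), (1002, 1)]
for key, label in describe_changes(rows):
    print(key, label)
```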
Functional Overview
Configuration

Tracking Mechanism

Querying for changes


Configuration
Change Data Capture is enabled at two levels:
Database - sp_cdc_enable_db
Individual tables - sp_cdc_enable_table

sysadmin privilege is required to enable CDC at the database level
db_owner membership is required to enable CDC at the table level

System metadata is available to track the configuration

Supported only on Enterprise Edition
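The two-level configuration can be illustrated by generating the T-SQL script. This is a minimal sketch: the parameter names (@source_schema, @source_name, @role_name, @captured_column_list, @filegroup_name, @supports_net_changes) belong to sys.sp_cdc_enable_table, while the database "Sales", table "Store", role "cdc_reader", and filegroup "CDC_FG" are hypothetical. It assumes the filegroup already exists and that Store has a primary key, which net change support requires.

```python
def quote_literal(value):
    """Render a Python value as a T-SQL literal (N'...' for strings, NULL for None)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    return "N'" + str(value).replace("'", "''") + "'"

def enable_cdc_script(database, schema, table, columns=None, role=None,
                      filegroup=None, net_changes=True):
    """Return a T-SQL script that enables CDC on a database and one of its tables."""
    params = {
        "@source_schema": schema,
        "@source_name": table,
        "@role_name": role,  # required parameter; NULL means no gating role
        "@supports_net_changes": net_changes,
    }
    if columns:
        params["@captured_column_list"] = ", ".join(columns)  # track only needed columns
    if filegroup:
        params["@filegroup_name"] = filegroup  # keep change tables on their own filegroup
    args = ",\n    ".join(f"{name} = {quote_literal(v)}" for name, v in params.items())
    return (
        f"USE [{database}];\n"
        "EXEC sys.sp_cdc_enable_db;  -- requires sysadmin\n"
        f"EXEC sys.sp_cdc_enable_table\n    {args};  -- requires db_owner\n"
    )

script = enable_cdc_script("Sales", "dbo", "Store",
                           columns=["Store_Num", "Order_Status"],
                           role="cdc_reader", filegroup="CDC_FG")
print(script)
```

Generating text rather than connecting to a server keeps the sketch self-contained; the output can be reviewed and then run in SQL Server Management Studio or through any client library.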


Tracking Mechanism
Asynchronous log-reading technology populates the change tables
Provides the flexibility to stop capture to minimize impact on the source
Built on top of the replication log reader technology

Provides guarantees of transactional consistency in case of failures

Tracking infrastructure allows time-based range queries
All changes are tracked
Captures the before and after images of updates from the log
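The following is a deliberately simplified Python model of why asynchronous log reading yields transactionally consistent change tables; it is not how the SQL Server log reader is implemented. The record tuples and the capture function are hypothetical, and the list position stands in for a log sequence number. Work reaches the change table only once its transaction commits, and all rows of a transaction carry the same commit position.

```python
def capture(log):
    """Simulated capture job: turn an ordered list of log records into change rows.

    Records are hypothetical tuples: ("dml", xact_id, change), ("commit", xact_id)
    or ("rollback", xact_id). Each captured row is stamped with its transaction's
    commit position, standing in for the commit LSN.
    """
    pending = {}       # xact_id -> DML records not yet committed
    change_table = []
    for lsn, record in enumerate(log, start=1):
        kind, xact = record[0], record[1]
        if kind == "dml":
            pending.setdefault(xact, []).append(record[2])
        elif kind == "commit":
            for change in pending.pop(xact, []):
                change_table.append((lsn, xact, change))
        elif kind == "rollback":
            pending.pop(xact, None)  # rolled-back work never reaches the change table
    return change_table

log = [
    ("dml", "T1", "insert 1001"),
    ("dml", "T2", "insert 1002"),
    ("dml", "T1", "update 1001"),
    ("commit", "T1"),
    ("rollback", "T2"),
    ("dml", "T3", "update 1002"),  # T3 is still open, so it is not captured yet
]
for row in capture(log):
    print(row)
```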
Capturing Changes
Diagram: DML against the STORE table is recorded in the database log. A log-based capture job reads the log and populates the STORE change table, and a cleanup job removes aged change rows. A CDC consumer (an SSIS package) reads the changes through the CDC APIs and loads them into the data warehouse (DW).

DML applied to STORE, in order: Insert (1001, H); Insert (1002, D); Update 1001 to A; Update 1002 to B

STORE table after the DML:
Store_Num  Order_Status  Store_desc
1001       A             Mystore1
1002       B             Mystore2

STORE change table:
Store_Num  Order_Status  Store_desc  Operation  Column Info  Transaction Timestamp
1001       H             Mystore1    Insert     0x00         1
1002       D             Mystore2    Insert     0x00         2
1001       A             Mystore1    Update     0x10         3
1002       B             Mystore2    Update     0x10         4

CDC consumer query:
Select * from fn_cdc_get_net_changes_*(starttime, endtime)
Querying for Changes
Two table-valued functions (TVFs) for each change table
Allow bounded, range-based queries
A row filter option filters the result set
A consistent result set is guaranteed when using time-based ranges

All changes TVF
Extracts all changes very efficiently
Optionally provides the before and after images of updates

Net changes TVF
Accurately identifies the net operation on a row within a range
Provides information about which column(s) changed for the net change row
Can be a more expensive query, depending on the filter options
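A simplified Python model of what the net changes TVF computes, using the four change rows from the STORE example on the Capturing Changes slide. It assumes rows arrive in commit order, keys are the primary key (net change support requires a primary key or unique index), and only __$operation codes 1, 2, and 4 occur. It is not SQL Server's implementation and ignores row filter options such as 'all with mask'.

```python
def net_changes(all_changes):
    """Collapse ordered change rows (key, op, values) into one net row per key.

    op mirrors __$operation: 1 = delete, 2 = insert, 4 = update (after image).
    """
    first_op, last = {}, {}
    for key, op, values in all_changes:
        first_op.setdefault(key, op)
        last[key] = (op, values)
    net = []
    for key, (last_op, values) in last.items():
        existed_before = first_op[key] != 2  # the first change was not an insert
        exists_after = last_op != 1          # the last change was not a delete
        if existed_before and exists_after:
            net.append((key, 4, values))     # net update with the final values
        elif existed_before:
            net.append((key, 1, values))     # net delete
        elif exists_after:
            net.append((key, 2, values))     # net insert with the final values
        # inserted and deleted inside the range: no net row at all
    return net

# The STORE change table rows: (Store_Num, operation, (Order_Status, Store_desc))
all_changes = [
    (1001, 2, ("H", "Mystore1")),
    (1002, 2, ("D", "Mystore2")),
    (1001, 4, ("A", "Mystore1")),
    (1002, 4, ("B", "Mystore2")),
]
print(net_changes(all_changes))
```

Four all-changes rows collapse to two net inserts carrying the final values, which is exactly what a warehouse load needs to apply.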
Querying for Changes
The TVFs use LSNs to specify the range
-- Assumes a capture instance named Customer
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('Customer'),
        @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_all_changes_Customer(@from_lsn, @to_lsn, 'all')

A log sequence number (LSN) is of type binary(10)

Yuck! Why can’t I use a datetime to specify the range?

A time-to-LSN mapping function enables the range to be specified using time

A wrapper function generator provides an easy way to deploy time-based enumeration functions
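The mapping works because each committed transaction is recorded with its commit (end) time and LSN in the cdc.lsn_time_mapping table, so a time boundary can be translated with a sorted search. A Python sketch with hypothetical timestamps and small integers standing in for binary(10) LSNs; the two operator strings are real options of sys.fn_cdc_map_time_to_lsn, but map_time_to_lsn here only models them.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (commit time, LSN) pairs sorted by time, standing in for cdc.lsn_time_mapping
mapping = [
    ("2008-01-01T09:00", 0x10),
    ("2008-01-01T10:30", 0x20),
    ("2008-01-01T11:45", 0x30),
    ("2008-01-01T13:15", 0x40),
]
times = [t for t, _ in mapping]  # ISO-8601 strings sort chronologically

def map_time_to_lsn(relation, when):
    """Simplified model of two relational operators of sys.fn_cdc_map_time_to_lsn."""
    if relation == "smallest greater than or equal":
        i = bisect_left(times, when)
        return mapping[i][1] if i < len(mapping) else None
    if relation == "largest less than or equal":
        i = bisect_right(times, when) - 1
        return mapping[i][1] if i >= 0 else None
    raise ValueError(f"unsupported relation: {relation}")

# "All changes between 10:00 and 12:00" becomes an LSN range
from_lsn = map_time_to_lsn("smallest greater than or equal", "2008-01-01T10:00")
to_lsn = map_time_to_lsn("largest less than or equal", "2008-01-01T12:00")
print(hex(from_lsn), hex(to_lsn))
```

Returning None when no transaction falls on the requested side mirrors the NULL that the real function returns in that case; a caller should check for it before querying.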
Querying for Changes
Advanced system functions provide mapping and other functionality
Easily determine the lowest and highest watermarks for changes available in the change tables
Determine which column changed, given a bitmask
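In T-SQL, the column question is normally answered with sys.fn_cdc_get_column_ordinal together with sys.fn_cdc_is_bit_set (or sys.fn_cdc_has_column_changed) applied to __$update_mask. The Python sketch below models that decoding under an assumption worth verifying against the documentation: the mask is read as a big-endian integer, so column ordinal 1 is the lowest-order bit of the last byte. The captured column list is hypothetical, taken from the STORE example.

```python
def is_bit_set(ordinal, update_mask):
    """Model of sys.fn_cdc_is_bit_set: ordinal 1 maps to the lowest-order bit
    of the last byte (the mask is read as a big-endian integer)."""
    return ((int.from_bytes(update_mask, "big") >> (ordinal - 1)) & 1) == 1

def changed_columns(columns, update_mask):
    """Return the captured columns whose bit is set; columns are in ordinal order."""
    return [name for i, name in enumerate(columns, start=1) if is_bit_set(i, update_mask)]

captured = ["Store_Num", "Order_Status", "Store_desc"]  # ordinals 1, 2, 3
print(changed_columns(captured, bytes([0x02])))
print(changed_columns(captured, bytes([0x07])))
```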

Security Model
Access to change data is controlled through the TVFs
At a minimum, callers require SELECT access to all tracked columns in the base table
The CDC (gating) role provides an additional layer of security
Querying Changes
Best Practices and Recommendations
Change tracking solution
Use the tracking feature that suits your application’s requirements

Performance
Use a separate filegroup for the change tables
Track only the required columns
Writes to the change tables are logged

Querying change data
Make sure that fromLSN and toLSN fall within the change tracking window: fromLSN at or above its low end, toLSN at or below its high end
Use sys.sp_cdc_generate_wrapper_function to generate time-based wrapper functions
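The window check can be sketched in Python, assuming the low and high watermarks have already been read with sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn. The lsn helper and the sample values are hypothetical. Equal-length bytes compare lexicographically in Python, which matches the ordering of binary(10) LSNs.

```python
def lsn(n):
    """Hypothetical helper: build a binary(10)-style LSN from an integer."""
    return n.to_bytes(10, "big")

def clamp_lsn_range(from_lsn, to_lsn, min_lsn, max_lsn):
    """Fit a requested [from_lsn, to_lsn] range inside the window [min_lsn, max_lsn].

    Returns (low, high, changes_lost), or None when the ranges do not overlap.
    changes_lost is True when cleanup already purged part of the requested range.
    """
    low, high = max(from_lsn, min_lsn), min(to_lsn, max_lsn)
    if low > high:
        return None
    return low, high, from_lsn < min_lsn

window = (lsn(100), lsn(400))  # low and high watermarks of the change table
print(clamp_lsn_range(lsn(150), lsn(900), *window))
```

Clamping the high end is harmless, but clamping the low end silently skips changes that cleanup has already removed; an ETL job would usually treat changes_lost as an error and fall back to a full reload rather than load an incomplete delta.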
Resources
• http://www.databasejournal.com/features/mssql/article.php/3720361

• http://www.databasejournal.com/features/mssql/article.php/3725476
Contact

Srikant.Jahagirdar@microsoft.com
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
