EIM126 Replication Server Overview

JEFF TALLMAN
SENIOR SOFTWARE ENGINEER/ARCHITECT,
EIM PRODUCT SPECIALIST, SYBASE

OCT 18-22, 2010

September 27, 2010 1

EIM126 Replication Server Overview


Data Replication & Data Movement
Differences between replication & other DM technologies
Trends we are seeing

Common Architectures
Beyond simplistic standby systems

How It Works
Pieces & Parts

Implementation Tips

DATA MOVEMENT COMPARISON
One Solution Doesn't Fit All... Complementary Technologies

                        Replication               EAI/Messaging             ETL                       Synchronization
Change capture          Transaction log           Application initiated     Date scan (common)        Flagged rows
                                                                            or transaction log
Common application      DBMS to DBMS              External feeds,           OLTP to DW                Mobile applications
                        (HA & reporting)          packaged apps
Problem areas           Latency with batch        Implementation time       Latency; impact of        Use of atomic row
                        processes                 and effort                scanning for changes      statements vs. bulk
                                                                                                      methods
Claim to fame           Serialized transactions   Non-serialized            Bulk extraction and       1000s of mobile users
                        with very low latency     transactions with very    loading
                                                  low latency
Implementation effort   Easy to Medium            Medium to Considerable    Easy to Medium            Easy to Medium
App or schema changes?  None                      Application               Schema (date change       Schema (row capture
                                                  (implementation)          column)                   for subscribers)
Data flexibility        Row-wise                  Transaction               Row-wise                  Row-wise


DATA MOVEMENT REALITIES (1)


Some Common Implementation Lessons Learned

All DM Technologies have a considerable learning curve


Product/technology introduction and comfort level
Staff need to get their minds around asynchronous data movement and impact on
overall system
DBAs who get it are key candidates for architecture role
After ~1 year of use, most users will admit it is easy

Developers DO need to be aware, as app changes may happen


Fastest way to find application logic problems is to implement replication
Batch jobs: monolithic single threaded vs. multiple threads with recovery points
Planning is Key
Know what needs to go where, by when - and what the business drivers are
Monitoring is critical
If the goal is very low latency due to business requirements, any problem that results
in a backlog has a business impact... and they will call.

DATA MOVEMENT REALITIES (2)
Typical Reasons for Failure

Unreasonable expectations
Asynchronous data movement implies some latency
Realities of log based replication vs. bulk statement execution
Failing to monitor the processing
Both performance as well as exception
Operations staff not trained
Implementation staff sticking to old methods vs. exploiting
product features
May only be comfortable with <10% of product features
Rather than using one or more advanced features, they tend to re-implement
outside the product using technology they are comfortable with


TRENDS WE ARE SEEING (1)


Based on Sybase Replication Server Customer Base

Overall resurgence of replication implementations


Warm Standby is a commodity/checkbox feature
Long distance BC vs. local DR as well as application/DBMS upgrades proving value of
database replication
Complemented by disk replication for zero data loss, fast materialization, lower risk
systems (Primary to Bunker to Continuity Site)

Significant use for offloading reporting or web isolation


70% of customers have implemented this in some form
One of the major drivers is transaction volumes on OLTP or adoption of specialty analytics
software
Significant interest in use for application upgrades
Mitigate the risks of moving from application version X to X+1
Both packaged application as well as DBMS upgrades
Allow migration to be staged/phased vs. all-or-nothing hard cut over

TRENDS WE ARE SEEING (2)
Based on Sybase Replication Server Customer Base

Large numbers of bi-directional sites with global customers


E.g. London <-> NY <-> Singapore for FSI
Local autonomy/performance but global operations/management
Web Facing Security Enclaves
Completely obfuscated data, with asynchronous requests (e.g. change of
address request) to avoid security issues and unpredictable performance
In-Memory Database Real-Time Synchronization

Most true heterogeneous implementations are used as a lightweight
form of application integration
Many confuse homogeneous standbys with true heterogeneous
Replication perceived as a faster, less intrusive means vs. EAI


COMMON ARCHITECTURES

BEYOND THE SIMPLE STANDBY

LEVERAGING REMOTE BUSINESS CENTERS
Bi-Directional Replication

Chicago
>700 miles:
Beyond reasonable disk replication distances
Split processing/shared primary (bi-directional)
Business Continuity vs. Disaster Recovery

New York (primary)
New Jersey (bunker)


DISTRIBUTED GLOBAL CLUSTERS/SPLIT PROCESSING


Trend: Bi-directional/Multi-directional is more mainstream
Global business and global reporting key drivers
Mergers & Acquisitions as well as COTS package integration driving high requirements for heterogeneous
solutions.
Current State
Predominately peer-to-peer although a good number of split-primary with segmented data ownership
Acceptance of application partitioning as a key entrance for cluster scaling has removed the mental roadblock
with many customers
Application partitioning is a fundamental science to bi-directional systems
Deployments are predominantly in FSI linking APO, EMEA, and NAO
Other industries such as Telco and ISPs finding out that WAN latencies are causing centralized systems to hit
their limits

[Diagram sites: London, New York, Singapore]

GOING TO WHERE THE MARKETS ARE.
Follow-The-Sun Trading, Corporate Visibility
[Map labels: FTSE; NYSE; Metals; Tokyo Grain; Energy; SIMEX; Joburg & SAFEX; Sydney Futures & Bank HQ]


LOW LATENCY/LIGHT WEIGHT INTEGRATION


Typical FSI Implementation

Front Office: Trades; Research; Retail Sales; Institutional Sales; Investment Banking; Asset Management; Mergers & Acquisitions
Middle Office: Trade, Order, Coding, Figuration, Verification, Control
Back Office: Client Account Servicing; Books & Records (Stock & GL); Security Control; Comparison; Confirmation; Margin; Settlement; Financing

WEB FACING SECURITY ENCLAVE
Zero Sensitive Data Storage - Zero Impact on Internal Systems

Request Responses:
Update address
Insert CreditCards(1,Visa,xxxx xxxx 6789)

Replicated Requests:
changeAddress(123 Main Street)
addCreditCard (1, Visa, 1232 2345 6789)
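The split between the two paths above can be sketched as a masking step: the full value travels inward as a replicated request, while only an obfuscated value is stored in the enclave. This is a minimal Python sketch; the function name `mask_card_number` and the rule of keeping only the last group of digits are assumptions taken from the example values on this slide, not Replication Server behavior.

```python
def mask_card_number(card_number: str, mask_char: str = "x") -> str:
    """Mask every space-separated group of the number except the last one."""
    groups = card_number.split()
    if not groups:
        return ""
    masked = [mask_char * len(group) for group in groups[:-1]]
    return " ".join(masked + [groups[-1]])


# Value sent inward as a replicated request vs. value stored in the enclave.
request_value = "1232 2345 6789"
enclave_value = mask_card_number(request_value)
print(enclave_value)
```

In this sketch the masking happens once, on the way back out, so the web-facing database never holds the full number.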


REAL TIME LOADING: OLTP TO DSS


Trend
Strong push for direct replication into analytics optimized DBMSs (such as IQ)
Customers have adopted specialty software DBMSs (e.g. Sybase IQ) for analytics due to faster
performance and decreased storage
ETL perceived as too slow (not real-time)
Corporate rollups (reporting) or consolidated systems such as ODS (DSS)
Current State
RS to IQ in production since ~1997 using an intermediate staging DB
RS 15.5 direct replication into IQ (RTL) and high volume for ASE (HVAR)

IMDB MASH-UPS & KPI DATABASES
For Low Latency Queries or Federated Data Mash-ups

IMDB Mash-up (federated multi-source analysis)
IMDB KPI (NRT business performance)
Source Systems (disk-based)


HOW IT WORKS

PIECES & PARTS

REPLICATION SERVER
Since 1992

Store & Forward Messaging Based


Uses Standard Publish/Subscribe Model
Publications at individual object (table or procedure) or database level
Subscriptions at object or table level using where clauses
Collections of publications can be created and subscribed to at once
Customizable Delivery
Key Components
Replication Agent
Replication Server
Replication Manager + Sybase Control Center/Sybase Central
RS 15.5 Current Release
SMP Support, 64 bit memory addressing
Linux x86/64, MS Windows x86, Solaris SPARC, AIX, HPUX (IA64)


SUPPORTED DATABASES
Heterogeneous Capabilities

Source DBMSs
Sybase ASE, SQL Anywhere
Oracle 9i (legacy), 10g, 11g, RAC
IBM DB2 OS/390 & DB2 UDB
Microsoft SQL Server 2000 (legacy), 2005, 2008 (2005 feature level)
Custom (using published API to build your own)
3rd Party DB2 OS/400, etc.

Target DBMSs
Sybase ASE, SQL Anywhere, Sybase IQ
Oracle 10g, 11g, RAC
IBM DB2 OS/390 & DB2 UDB
Microsoft SQL Server 2000, 2005, 2008
Any ODBC (as long as you have an ODBC driver)
Most common message buses (via RepConnector)

RS CAPABILITIES
Swiss Army Knife for Data Replication

What RS can replicate


Table data modifications (including encrypted data from ASE)
Applied & Request Stored Procedure executions
DDL & system procedures (Warm Standby - ASE & Oracle)
DML Statements (ASE only) - update, delete, insert/select, select..into..
Data Movement Capabilities
Bi-Directional Replication w/ SSL wire encryption
Direct & Indirect Routes (hierarchical or topological) (100s of nodes)
Read Directly from Transaction Log or from Disk Mirrored Copy (via Mirror Activator)
Delivery Capabilities
Customizable Delivery via Function Strings
Parallel Threads
High Volume Adaptive Replication (use bulk modes - ASE and Oracle only)
Design/Architecture & Systems Management
Automated object creation, modeling, impact analysis, etc. via PowerDesigner
Sybase Control Center Monitoring plus 3rd Party (Bradmark, Precise, et al.)


REP SERVER COMPONENTS

[Diagram]
Primary: Sybase ASE, Oracle, MS SQL Server, IBM DB2/OS390, IBM DB2/UDB
Rep Agent: Change Detection
Replication Server: Rules Engine, Queues, Stable Devices (Message Repository)
RSSD: Catalog/Control DB
Route
Replicate: Sybase ASE, Sybase IQ, Oracle
ECDA Gateway: MS SQL, IBM DB2, Any ODBC
RepConnector: MQ, JDBC, MSMQ, etc.
Sybase Control Center: Monitoring
REP SERVER (INBOUND)
[Diagram: Primary DB (Order Entry) -> Rep Agent -> Log Transfer Language (LTL) -> Inbound Queue (Stable Device)]
RSSD: Replication Definition(s) (repdefs)


THE FIRST STEP: MARKING THE TABLES/PROCS


RepAgent only processes marked tables/procs/databases

Marking the tables/procs


Designates whether the table or proc is to be replicated or not
Settings are true, false, never
ASE: You can also designate whether to replicate DML statements, and the row-count
threshold at which RA replicates the DML statement in addition to the rows
Syntax:
ASE sp_setreptable or sp_setrepproc
non-ASE pdb_mark (in RepAgent)

Marking the database (logical standby)


No need to mark individual tables (unless you want to exclude some)
Enables DDL replication
Stored procedures to be replicated still need to be marked

STEP 2: CREATE A REPDEF FOR THE TABLE
REPDEF = Replication Definition (most basic form of publication)

Very similar to a create table command


Table name, list of columns with datatypes, primary key
Can be autogenerated (zero typing)
PowerDesigner modeling
Sybase Central drag/drop
Numerous free scripts/utilities, or write your own (system tables)
Capabilities
Any table can have more than 1 repdef
Each repdef can publish different sets of columns for same table
Repdefs can easily handle table name, column name and minor
datatype differences
Identifies the primary key and which columns are used for subscriptions


WHAT DOES A REPDEF LOOK LIKE


CREATE REPLICATION DEFINITION MYSAMPLE_RD
WITH PRIMARY AT MYSERVER.MYDATABASE
WITH PRIMARY TABLE NAMED CUST_ORDERS
WITH REPLICATE TABLE NAMED ORDER_HEADER
(
    ORDER_NUM INT MAP TO NUMERIC,
    CUSTOMER AS CUST_ID NUMERIC,
    ORDER_DATE DATE,
    NUM_ITEMS INT,
    ORDER_STATUS VARCHAR(10),
    STATUS_DATE DATE
)
PRIMARY KEY (ORDER_NUM)
SEARCHABLE COLUMNS (ORDER_DATE, ORDER_STATUS, CUSTOMER)

Callouts:
Different table names: PRIMARY TABLE NAMED / REPLICATE TABLE NAMED
Different datatypes (especially useful in heterogeneous systems): ORDER_NUM INT MAP TO NUMERIC
Different column names: CUSTOMER AS CUST_ID
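To make the renaming behavior of this repdef concrete, here is a minimal Python sketch of how a primary row could be reshaped for the replicate table. The `REPDEF` dictionary layout and the `to_replicate_row` helper are hypothetical illustrations, not Replication Server structures; datatype mapping is deliberately left out.

```python
from typing import Any, Dict

# Hypothetical, simplified view of the repdef on this slide.
REPDEF = {
    "primary_table": "CUST_ORDERS",
    "replicate_table": "ORDER_HEADER",
    # primary column name -> replicate column name (only renamed columns listed)
    "column_renames": {"CUSTOMER": "CUST_ID"},
    "columns": ["ORDER_NUM", "CUSTOMER", "ORDER_DATE", "NUM_ITEMS",
                "ORDER_STATUS", "STATUS_DATE"],
    "primary_key": ["ORDER_NUM"],
}


def to_replicate_row(repdef: Dict[str, Any], primary_row: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only published columns and apply replicate-side column names."""
    renames = repdef["column_renames"]
    return {renames.get(col, col): primary_row[col]
            for col in repdef["columns"] if col in primary_row}


row = {"ORDER_NUM": 42, "CUSTOMER": 7, "ORDER_DATE": "2010-09-27",
       "NUM_ITEMS": 3, "ORDER_STATUS": "SHIPPED", "STATUS_DATE": "2010-09-28",
       "INTERNAL_NOTE": "not published"}
replicate_row = to_replicate_row(REPDEF, row)
print(REPDEF["replicate_table"], replicate_row)
```

The `INTERNAL_NOTE` column is dropped because it is not listed in the repdef, which mirrors the idea that a repdef can publish a subset of a table's columns.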

REP SERVER (OUTBOUND)

Subscription(s)
Routes (Direct & Indirect)

[Diagram labels: Rep Agent; Order Entry; Stable Device; Outbound Queue; Route Queue; Replicate DB]


STEP 3: CREATE A SUBSCRIPTION


CREATE SUBSCRIPTION FINANCE_ORDERS_SUBS
FOR MYSAMPLE_RD WITH PRIMARY AT MYSERVER.MYDATABASE
WITH REPLICATE AT FINANCE.BILLING_DB
WHERE ORDER_STATUS = 'SHIPPED'
WITHOUT MATERIALIZATION
SUBSCRIBE TO TRUNCATE TABLE

Callouts:
WITH PRIMARY AT: data source
WITH REPLICATE AT: subscribing site
WHERE: what data to replicate
WITHOUT MATERIALIZATION: whether RS should copy the data or not from the source system when the subscription is first created
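The effect of the subscription's where clause can be sketched as a simple row filter. This is an illustrative Python model only; the `subscribe` helper and the sample rows are hypothetical, and it ignores details such as rows that move in or out of the subscription when ORDER_STATUS is updated.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def subscribe(rows: Iterable[Row], where: Callable[[Row], bool]) -> List[Row]:
    """Return only the rows that satisfy the subscription predicate."""
    return [r for r in rows if where(r)]


changes = [
    {"ORDER_NUM": 1, "ORDER_STATUS": "SHIPPED"},
    {"ORDER_NUM": 2, "ORDER_STATUS": "OPEN"},
    {"ORDER_NUM": 3, "ORDER_STATUS": "SHIPPED"},
]
# Same predicate as the where clause on this slide.
finance_rows = subscribe(changes, lambda r: r["ORDER_STATUS"] == "SHIPPED")
print([r["ORDER_NUM"] for r in finance_rows])
```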

STORED PROCEDURE REPLICATION
aka Function Replication

Similar to table replication (repdef + subscriptions)


Caveat is we are replicating the procedure call - not all the DML contained
within the procedure
Consequently the procedure operations can be different at primary and
replicate
Customizable delivery including
Deliver As clause allows different proc to be called at replicate
Function string support
Two Modes: Applied & Request
Applied - Typical normal replication of the procedure call
Request - Procedure call is replicated and impact at replicate is allowed to
replicate back (or to wherever necessary)
Remember the Web Security example
Normally, the only time RS applied data is allowed to re-replicate


WHAT IS A FUNCTION STRING?


Customizing Delivery

All delivery is accomplished by function strings


By default RS generates standard SQL for the replicated DML operation
Insert, update, delete, exec proc
Grouped into function classes according to type of connection
E.g. heterogeneous connections use different function classes due to differences
in SQL dialects
Each connection specifies which function class to use

A user-defined function string


Is written in the SQL dialect for the connection class
Looks similar to stored procedure body code
Can contain flow of control statements (if/else; while; etc.)
Embedded DML uses variables or literals (much like DML in a proc)
Variable syntax supports modifiers for before/after values of replicated DML
Can use ~30 system variables for items such as source commit time, source
database/server, source transaction name, source username, etc.

EXAMPLE PROC VS. FSTRING BODY
Example TSQL Stored Procedure Code Fragment

if (@new_balance > 0)
    update bank_account
    set balance = balance + (@new_balance - @old_balance)
    where account_id = @account_id
else
    insert into overdrawn_accounts values (@account_id, @new_balance)

Equivalent RS Function String Body

if (?balance!new? > 0)
    update bank_account
    set balance = balance + (?balance!new? - ?balance!old?)
    where account_id = ?account_id!old?
else
    insert into overdrawn_accounts values (?account_id!old?, ?balance!new?)
;
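The `?column!new?` and `?column!old?` placeholders in the function string body can be thought of as template variables filled from the after and before images of the replicated row. The Python sketch below illustrates that idea only; the `expand` function and its regular expression are assumptions for illustration, and the sketch handles just the `new` and `old` modifiers with no quoting of string values.

```python
import re
from typing import Any, Dict

# Matches ?column!modifier? where modifier is "new" or "old".
_VAR = re.compile(r"\?(\w+)!(new|old)\?")


def expand(template: str, before: Dict[str, Any], after: Dict[str, Any]) -> str:
    """Substitute before/after column values into a function-string-like template."""
    def replace(match: "re.Match[str]") -> str:
        column, modifier = match.group(1), match.group(2)
        image = after if modifier == "new" else before
        return str(image[column])
    return _VAR.sub(replace, template)


template = ("update bank_account "
            "set balance = balance + (?balance!new? - ?balance!old?) "
            "where account_id = ?account_id!old?")
sql = expand(template,
             before={"account_id": 17, "balance": 100},
             after={"account_id": 17, "balance": 250})
print(sql)
```

Applying the delta (new minus old) rather than overwriting the balance is what makes this style of function string useful for cell-level conflict avoidance.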


EXAMPLE USES OF FUNCTION STRINGS


Data change auditing
Record who, when and previous/new values for every data change
without impacting the production application
Data warehousing
Denormalizations, updates to inserts/upsert, small schema mods
Bi-Directional Replication Conflict Avoidance
Cell level data change support (vs. typical row overwrite)
Handle unique constraint and other exceptions that typical conflict
resolution implementations can not
Application Upgrades/Migrations
Run parallel systems by mapping old to new schema and using fstrings
to implement schema differences

SCC REPLICATION TROUBLESHOOTING
Trace replication from the target to the source


SCC REPLICATION TROUBLESHOOTING


Navigate to the path dashboard for NYEx.trade-LondonEx.trade

SCC REPLICATION TROUBLESHOOTING
What's happening at the replicate database connection?


SCC REPLICATION TROUBLESHOOTING


Increase the sqt_max_cache_size configuration parameter

IMPLEMENTATION
CHALLENGES
DUCK & DODGE


KEY TIP #1: DIAGRAM IT


A Picture is Worth 100000 Words

Start from a business viewpoint


What business data needs to be replicated to which other
systems/locations.
Data volumes, peak times, latency thresholds
Transaction profile (conceptual)
Data flow dependencies
Understand what business processes are dependent on the data getting there

Add in the technical architecture viewpoint


Where are the systems of record
What is role of batch processing on data movement
Wait until batch processing is done to move data or apply batch process at
both locations???
Understand and document what risks you are trying to mitigate

KEY TIP #2: UNDERSTAND IT
Is "Ease of Use" a euphemism for "Don't know what I am doing"?

Ease of Use is important, but not push-button


Reduce workload by making repetitive tasks easy
Make system outputs usable
Minimizing intervention
Understand the basics of the internals
Critical for understanding troubleshooting and performance
Understand the capabilities and limitations
Don't try pounding a square peg into a round hole
Create a sandbox
We now have 64-bit laptops with 4GB of memory..to run browsers??
Cheap Linux desktops for a destructive lab
Most labs take too much red tape to use (justification writeups) - result is avoidance
Make casual Friday a play day
Alternate between brown bag lunch & learns and hands-on play time


KEY TIP #3: SIMULATE IT


The norm shouldn't be the unexpected

Simulate key transaction profiles & volumes


Quick & dirty java or SQL drivers
Same order of magnitude
Use real system performance metrics to derive volume information

Run in isolation as well as mixed with typical workloads


Capture simulated performance metrics
Understand the system behavior... even if there is nothing you can do
about it
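One way to capture simulated performance metrics is to record the primary commit time and the replicate apply time for each test transaction and summarize the difference. The Python sketch below assumes both timestamps are available as seconds on a common clock; the function names and the nearest-rank percentile choice are illustrative, not part of any Sybase tool.

```python
import math
from typing import Dict, List, Sequence


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_latency(commit_times: List[float], apply_times: List[float]) -> Dict[str, float]:
    """Latency = replicate apply time minus primary commit time, per transaction."""
    latencies = sorted(a - c for c, a in zip(commit_times, apply_times))
    return {"p50": percentile(latencies, 50),
            "p95": percentile(latencies, 95),
            "max": latencies[-1]}


commits = [0.0, 1.0, 2.0, 3.0, 4.0]
applies = [0.5, 1.5, 3.0, 3.5, 9.0]
summary = summarize_latency(commits, applies)
print(summary)
```

Comparing the summary from an isolated run with one from a mixed workload shows how much of the latency comes from contention rather than from the transaction profile itself.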

KEY TIP #4: EXPLOIT IT
Make Business Continuity a Regular Event

Use BC/DR site during normal maintenance windows


Keeps staff familiar with BC/DR procedures
Mitigates risk of errant maintenance causing an outage

Leverage BC/DR site for system changes & upgrades


Provides ability to back out a system change several days after in
production with zero loss of business data
Run parallel environments
Mitigate the risks of application, hardware or system changes
Ease the migration pressure by allowing phasing of changes
Split processing to improve response times


KEY TIP #5: DE-SERIALIZE IT


Reduce Latency & Improve Throughput

Product-automated parallelism at its best cannot achieve
what you can do by understanding the application
Technique: Multiple DSI
Requires some very slight modification to system function strings
Requires definite knowledge of transaction profiles, volumes
Current limitation is single Replication Agent per source
Use separate connections to same replicate for:
Batch processes (via stored procedure replication)
Different groups of tables with different transaction profiles
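The idea of using separate connections for different table groups can be sketched as routing each transaction to a per-connection queue. This Python model is illustrative only: the connection names and table groups are hypothetical, and it assumes each transaction touches tables from a single group. Commit order is kept within a connection but is no longer guaranteed across connections, which is why knowledge of the transaction profiles matters.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping of tables to named connections for the same replicate.
TABLE_TO_CONNECTION = {
    "trades": "conn_oltp",
    "positions": "conn_oltp",
    "audit_log": "conn_audit",
    "eod_batch": "conn_batch",
}

Txn = Tuple[int, str]  # (commit sequence number, table touched)


def route(transactions: List[Txn]) -> Dict[str, List[int]]:
    """Assign each transaction to a connection, keeping commit order per connection."""
    queues: Dict[str, List[int]] = defaultdict(list)
    for seq, table in sorted(transactions):
        queues[TABLE_TO_CONNECTION[table]].append(seq)
    return dict(queues)


stream = [(1, "trades"), (2, "audit_log"), (3, "eod_batch"),
          (4, "positions"), (5, "audit_log")]
routed = route(stream)
print(routed)
```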


THANK YOU
JEFF.TALLMAN@SYBASE.COM
