Using Oracle RAC and Microsoft Windows 64-bit as the Foundation for a Database Grid
Thursday, June 5, 2008 at 9:00 a.m. Pacific
Philip Newlan, RAC Pack, Oracle
Abstract
Many Oracle customers choose to run their databases on Microsoft Windows, and Oracle RAC on 64-bit Windows continues to be a popular combination. This session discusses real-world experiences from two customers who selected Oracle RAC on 64-bit Microsoft Windows as the platform of choice for a database RAC grid. You will hear why they selected this platform to fulfill their high-availability and scalability requirements. The session also covers why 64-bit Windows is a much better choice than 32-bit, and the migrations from 32-bit to 64-bit, from single instance to Oracle RAC, and from a conventional file system to Automatic Storage Management.
Program Agenda
Introductions
Why 64-bit Windows?
Customer case study
  TALX
  Intel as an Oracle customer
For anyone looking for general information on Implementing Oracle RAC on Microsoft Windows
Why 64 Bit?
4 GB per process limitation was constraining our growth
Used /3GB switch to maximize available memory
Large number of connections was number one issue
Running multiple concurrent instances on a 32-bit platform would have been very challenging
Performance improvements
General memory limits | 32-bit | 64-bit
Virtual address space per 64-bit process | Not applicable | 8 TB
Paged pool | 470 MB | 128 GB
Non-paged pool | 256 MB | 128 GB
System cache | 1 GB | 1 TB
Physical memory and CPU limits (1) | 32-bit | 64-bit
Windows XP Professional | 4 GB / 1-2 CPUs | 32 GB / 1-2 CPUs
Windows Server 2003 Standard Edition | 4 GB / 1-4 CPUs | 32 GB / 1-4 CPUs
Windows Server 2003 Enterprise Edition | 64 GB / 1-8 CPUs | 1 TB / 1-8 CPUs
Windows Server 2003 Datacenter Edition | 64 GB / 1-32 CPUs | 1 TB / 1-64 CPUs
(1) Product names listed are for general reference only and do not reflect actual product names. 64-bit Windows Server Standard Edition will be available for x64 only.
[Figure: two process address spaces, one labeled 2GB and one labeled 8TB, each containing code and background and foreground threads]
TALX
Who is TALX?
TALX history with Oracle
System architecture
Implementation considerations
Migration considerations
Lessons learned
Summary
Who is TALX?
Business Process Outsourcer
Human Resources and Payroll related services
  The Work Number (Employment and Income Verification)
  ePayroll (Electronic Pay Statements, Direct Deposit, State and Federal W-4 maintenance, ...)
  W-2 eXpress (Electronic issue and re-issue of W-2 forms, W-2 correction processing, and paper W-2 printing and mailing)
  I-9 eXpress: Electronic I-9 processing and compliance management
  HireXpress: Automated on-boarding services
  Unemployment Cost Management Services
  Employment/Hiring Tax Credits and Incentives
  Talent Management and testing services
Why Windows?
TALX runs 99%+ of our IT on Windows technology and the Intel x86 platform
Licensing costs on Intel platform
  1 processor license covers 2 cores
Strong belief in the Scale Out architectural principles embodied in Oracle RAC and most Windows technologies
Deep technical understanding of Windows technology
  Architecture
  Development
  Operations
100s of man-years of internal experience with Windows
Easy to hire talent for the Microsoft platform
Our primary development platform is Microsoft .NET and C#
Long-term relationship with Microsoft
Excellent overall support from Microsoft
Why RAC?
Continued Scale Up would have been VERY expensive
Commodity hardware stops at 8 Processors
Recovery time
  Failed node would imply at least a 10 minute outage
  We believe we can tune our RAC environment to less than 3 minutes
Failsafe Configuration
Storage: HP EVA8000 (SAN)
  Windows NTFS file system
  LUNs are exposed as Windows mount points
Hardware configuration, each node
  HP DL740
  (8) 2.8 GHz Xeon processors (single core)
  8 GB memory
  Dual HBAs for SAN
  Dual 1000Base FX NICs for LAN
  Dual 100BaseT NICs for heartbeat LAN
Software configuration, each node
  Windows Server 2003 Enterprise Edition 32-bit
  Oracle 10gR1
  Microsoft Cluster Services for failover cluster support
  Oracle Failsafe for database failover cluster support
  Database stored on NTFS file system presented as mount points under a single logical drive letter
SLDC1
[Figure: diagram with elements labeled ASM 1 through ASM 6]
Implementation considerations
Fully evaluate SAN / LAN environment for single points of failure and eliminate them
Single points of failure greatly reduce the value of RAC
Implementation considerations
Storage design
Ensure the storage platform has an atomic snapshot capability
Optimize storage design for your storage array
  HP EVA platform has virtualization at the array level
  Maximize number of physical disks in a disk group
Ensure the data striping implementations of ASM and the storage platform are compatible
Optimize LUN sizing
  Make sure the number of LUNs is compatible with the storage atomic snapshot capabilities
Ensure you are fully utilizing multipath capabilities
  Avoid controller / SAN path hot spots
Implementation Considerations
Failure handling
Transparent Application Failover
Fast Application Notification
  Requires application changes to exploit
  TALX was not able to use this at this time due to using the Microsoft Oracle Provider for .NET
Application Considerations
Sequence handling
Need to pay particular attention to applications which count on sequences for temporal ordering of rows
Sequence caching is extremely important for RAC scalability
TALX implemented a cache of 250 with ordering, as follows:
  Record currval of the sequence
  Drop the sequence
  create sequence sample.unique_id_seq minvalue 1 maxvalue 999999999999999999999999999 start with (captured currval value) increment by 1 cache 250 order;
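Why caching and temporal ordering pull against each other can be seen in a small simulation. The sketch below is a simplified model, not Oracle's implementation: it assumes each RAC instance reserves its own range of 250 values from a shared sequence that is not ordered, and that two instances alternate requests. The class and function names are hypothetical.

```python
from itertools import count


class CachedSequenceInstance:
    """One instance drawing from a shared sequence in cached ranges (no ordering)."""

    def __init__(self, allocator, cache_size):
        self._allocator = allocator      # shared counter handing out range starts
        self._cache_size = cache_size
        self._next = 0
        self._limit = 0                  # exclusive upper bound of the cached range

    def nextval(self):
        if self._next >= self._limit:    # cache exhausted: reserve a new range
            start = next(self._allocator)
            self._next = start
            self._limit = start + self._cache_size
        value = self._next
        self._next += 1
        return value


def simulate(cache_size, calls):
    """Alternate nextval calls between two instances, in request order."""
    allocator = count(start=1, step=cache_size)
    instances = [CachedSequenceInstance(allocator, cache_size) for _ in range(2)]
    return [instances[i % 2].nextval() for i in range(calls)]


values = simulate(cache_size=250, calls=6)
print(values)                      # values in the order they were requested
print(values == sorted(values))    # False: request order and value order differ
```

In this model the values stay unique but no longer reflect the order in which rows were created, which is the problem for applications that read a sequence as a timeline. The `order` keyword in the create sequence statement above asks the database to hand out values in request order across instances.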
Migration Recipe
Implemented RAC on a completely new server farm
Created a parallel environment
  Supports testing ALL applications
  Sufficient infrastructure to create stress that is 3-4 times expected capacity
Migration tests
Perform migrations until they had a repeatable recipe
Benchmark migration times to ensure they could work within the allotted timeframe
It is extremely critical that your final test run exactly mirror your production migration process
Migration Recipe
LDAP integration for Service Name resolution is very useful to point applications and users to new environment
If you are not currently using LDAP, a recommended precursor to your migration is to migrate all Oracle Clients to use LDAP prior to migration
Schema comparison
  Prior to migration, capture a clean snapshot of all schema structures
  Post migration, capture a snapshot of the schema structure of the new environment
  Compare
Make sure you have a well thought-out back-out strategy. You may need it.
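The schema comparison step can be automated once both snapshots are captured. The following is a minimal sketch, assuming each snapshot has already been extracted into a dictionary keyed by object type and name; the object definitions shown are hypothetical and stand in for whatever the data dictionary returns.

```python
def compare_schemas(before, after):
    """Compare two schema snapshots keyed by (object_type, object_name)."""
    missing = sorted(set(before) - set(after))    # present before, absent after
    added = sorted(set(after) - set(before))      # absent before, present after
    changed = sorted(k for k in set(before) & set(after) if before[k] != after[k])
    return {"missing": missing, "added": added, "changed": changed}


# Hypothetical snapshots, for illustration only.
pre_migration = {
    ("TABLE", "SSS_EMPLOYEE"): "EMPLOYEE_ID NUMBER, LAST_SIGNON_DATE_TIME DATE",
    ("INDEX", "SSS_EMPLOYEE_PK"): "UNIQUE (EMPLOYEE_ID)",
    ("SEQUENCE", "UNIQUE_ID_SEQ"): "CACHE 20 NOORDER",
}
post_migration = {
    ("TABLE", "SSS_EMPLOYEE"): "EMPLOYEE_ID NUMBER, LAST_SIGNON_DATE_TIME DATE",
    ("SEQUENCE", "UNIQUE_ID_SEQ"): "CACHE 250 ORDER",
}

report = compare_schemas(pre_migration, post_migration)
for kind, objects in report.items():
    print(kind, objects)
```

Intended differences, such as a sequence recreated with new cache settings, show up under changed and can be signed off, while anything under missing deserves attention before the back-out window closes.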
Lessons Learned
Do NOT modify the migration recipe after you have completed testing
Make sure you understand how sequences are used in your applications and the caching options available
Determining the root cause of a RAC node failure can be very challenging if the Oracle Fence driver evicts the node
  The Fence driver will cause a Windows Blue Screen failure if it needs to evict a node
  Debugging can require use of Windows crash dumps
Check and double check device driver versions and parameter settings are correct and consistent
Automated comparison against gold setup is the ideal solution
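One way such an automated comparison against a gold setup might look is sketched below. It assumes the driver versions and parameter settings of each node have already been collected into plain dictionaries; the driver names, versions and parameters are hypothetical.

```python
def find_drift(gold, nodes):
    """Return {node: {setting: (expected, actual)}} for every mismatch with gold."""
    drift = {}
    for node, settings in nodes.items():
        diffs = {
            key: (expected, settings.get(key))   # actual is None when the setting is absent
            for key, expected in gold.items()
            if settings.get(key) != expected
        }
        if diffs:
            drift[node] = diffs
    return drift


# Hypothetical driver names, versions and parameters, for illustration only.
gold = {"hba_driver": "9.1.4.6", "nic_teaming_driver": "8.60", "hba_queue_depth": "32"}
nodes = {
    "node1": {"hba_driver": "9.1.4.6", "nic_teaming_driver": "8.60", "hba_queue_depth": "32"},
    "node2": {"hba_driver": "9.1.2.1", "nic_teaming_driver": "8.60"},
}

for node, diffs in find_drift(gold, nodes).items():
    for key, (expected, actual) in diffs.items():
        print(f"{node}: {key} expected {expected}, found {actual}")
```

Running a check like this on every node after each OS or driver update keeps the cluster consistent without relying on manual inspection.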
Lessons Learned
All updates to OS and device drivers must be well understood to avoid unintended consequences
Ensure you have stress tested all high-volume areas of your application, as CBO changes can have a big impact
  If at all possible, review all execution plans
Make sure your team receives adequate training on RAC and Grid Control
Tape backup licensing
  Our vendor requires licenses for all RAC nodes even if you are only performing backups / restores using a subset of nodes
Lessons Learned
Ensure proper setup of parallel operations
  By default, any database operation that can use parallel operations will use all instances of the database in a cluster
  Improper setup can result in overloading the Cache Fusion network and all nodes in the RAC environment failing
If you are using parallel operations, make absolutely sure you have tested your scenarios
Diagnosing this problem is EXTREMELY DIFFICULT with current tools
Our solution
Set the parallel_max_servers parameter
  Recommended value is 2 * # processors (cores) per server
  Our case: 2 * 4 processors * 2 cores/processor = 16
Use the parallel_instance_group parameter to define instance groupings that can be used for parallel operations
  In init.ora specify:
    sss1.instance_groups = node1, 2-nodes
    sss2.instance_groups = node2, 2-nodes
  For the query
    select /*+ full(sss_employee) */ max(last_signon_date_time) from sss_employee
  If NO alter session set parallel_instance_group is performed, all nodes supporting the SSS database are used
  If alter session set parallel_instance_group = node1 is performed, only the node with the sss1 instance is used
  If alter session set parallel_instance_group = 2-nodes is performed, only the nodes with the sss1 and sss2 instances are used
To prevent the default of using all instances, specify a default parallel_instance_group session parameter for each node in the spfile for each database
  For sss1 we have a default session parameter node1.parallel_instance_group = node1
  For sss2 we have a default session parameter node2.parallel_instance_group = node2
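The sizing rule and the group selection described above can be modelled in a few lines. This is an illustrative sketch of the rules as stated on the slide, not of how the database resolves them internally; the function names are hypothetical, while the instance and group names come from the init.ora example.

```python
def parallel_max_servers(processors, cores_per_processor):
    """Rule of thumb from the slide: 2 * total cores per server."""
    return 2 * processors * cores_per_processor


def instances_for_group(instance_groups, parallel_instance_group=None):
    """Instances eligible to run parallel operations for a given session setting.

    With no parallel_instance_group set, every instance is eligible.
    """
    if parallel_instance_group is None:
        return sorted(instance_groups)
    return sorted(
        inst for inst, groups in instance_groups.items()
        if parallel_instance_group in groups
    )


# instance_groups as given in the init.ora example
instance_groups = {
    "sss1": {"node1", "2-nodes"},
    "sss2": {"node2", "2-nodes"},
}

print(parallel_max_servers(processors=4, cores_per_processor=2))  # 16
print(instances_for_group(instance_groups))             # all instances
print(instances_for_group(instance_groups, "node1"))    # only sss1
print(instances_for_group(instance_groups, "2-nodes"))  # sss1 and sss2
```

Setting a per-instance default group, as in the spfile entries, corresponds to always passing a group name here, so a session has to opt in explicitly before a query can spread across nodes.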
Future direction
Application enhancements to support Fast Application Notification
Requires switching from Microsoft .NET provider to Oracle ODP.NET
Summary
RAC fault tolerance has been a significant improvement compared to Failsafe
When implementing RAC, make sure you have allocated appropriate time for testing and tuning
64-bit has removed our memory constraints
  We are planning to increase our memory from 20GB to 48GB
Relevant resources
Parallel operations in RAC
http://www.oracle.com/technology/pub/articles/conlon_rac.html
http://christianbilien.wordpress.com/2007/09/12/strategies-forrac-inter-instance-parallelized-queries-part-12/
http://www.dba-oracle.com/oracle_tips_cpu_count.htm
Scaling Very Large Databases with Oracle RAC using mainstream Intel EM64T Servers
Srinagesh Battula, Deepen Chakraborty, Gayathri Seetharaman
Sr. Database Architects
Intel Technology Manufacturing Group
Intel Corporation
Agenda
Intel Factory Automation DSS systems
Illustration of the challenges faced
Option analysis of solutions
Scalability using Intel Architecture based mainstream servers and RAC
  Consolidated DSS Clustered Database
  DSS RAC Cluster: End to End Architecture
  Workload Management
  Application Integration
  High Availability at all Layers
Results: some examples
Key Learnings
Summary
Who are we?
Database Architects of the Intel Technology Manufacturing Group (TMG).
We deliver and proliferate integrated cost effective Automation solutions for Intel manufacturing.
Provide high reliability, availability, manageability, performance and scalability of the systems
Intel Factory Automation DSS Systems
Intel's Factory Automation DSS systems are used for making critical manufacturing decisions
  Operational and planning
  Engineering analysis
  Process control
A huge data explosion in Intel Factory Automation DSS was projected due to advanced manufacturing processes supporting the ever-expanding Intel product pipeline
Rapid data retrieval and complex analysis are important for timely manufacturing decisions
Challenges Faced
Existing single instance implementations lacked sufficient headroom to address the scalability requirements.
Limitations of 32-bit architecture (8 CPUs / 8 GB)
  Limits the max users, size, query performance
Manageability challenge with distributed databases
Cross-DB query performance issues
Capacity growth requires re-platformization
Scaling Option Analysis
Criteria for Selection:
Lowest TCO Linear incremental Scalability Compatibility with existing Windows based apps
RAC/Linux
  - Integration with existing Windows apps
  + Industry acceptance
  + Incremental scalability
32-cpu/HP-UX
  - Integration with
RAC/Windows
32-cpu/Windows
  + Few app changes
  - Re-platformization
  - Industry acceptance
  - Hardware costs
Performed internal benchmarking/Scalability study using RAC on Intel Commodity servers for a Multi-TB DSS db
Observed Near-Linear Scalability
[Figure axes: varying OS; scale-out vs. scale-up]
Solution was to consolidate the 3 DSS databases on a single RAC cluster in each factory
Stack is built on Intel EM64T servers running 64-bit Windows 2003 Advanced Server and 64-bit Oracle 10gR2 RAC
RAC cluster Hardware/Software stack
Item
  Server Config
  OS
  Cluster Software
  Database
  File System
Scalability using Intel Architecture based mainstream servers and RAC
Architecture has been validated against a projected 20TB database on a 10-node RAC cluster
Moving to 64-bit Intel servers enabled each individual node to be scaled up
  Providing more processing power and a larger memory footprint, essential for resource-intensive DSS queries
Consolidated DSS Clustered Database
[Figure: 10-node RAC cluster database with instances inst1 through inst10; Nodes 1 to 4 are marked as the nodes for database A]
10-node RAC cluster
Consolidated 3 DSS databases into a single cluster database
Used workload management to confine functionality of various users to separate nodes
  Database A: Node 1, 2, 3, 4
  Database B: Node 5, 6, 7
  Database C: Node 8, 9, 10
DSS RAC cluster End to End Architecture
RAC cluster ASM DG layout
OCR/VD layout (RAW)
  2 OCR
  3 voting disks
Diskgroup layout
  ASM with external redundancy
  +EADATA diskgroup and +EAFRA diskgroup (i.e., Flash Recovery Area)
  All the disks are striped and mirrored (SAME concept)
EADATA
  1 copy of control file
  1 member of the REDO log groups of all the 10 threads
  Duplexed archive log destination
  SYSAUX, SYSTEM, TEMPGROUPS, UNDO, app tablespaces
  SPfile, block change tracking file
EAFRA
  1 copy of control file
  1 member of the REDO log groups of all the 10 threads
  Archive logs
  Level 0, Level 1 image copies of the database
Workload Management
[Figure: services and instance groups mapped onto the nodes and instances inst1 through inst10. Services: A's query nodes; B's query nodes and loader node (Nodes 5 to 7); C's loader node (Node 8) and query nodes (Nodes 9 and 10). Instance groups: a query instance group and a loader instance group for each of A, B and C.]
Dedicated one instance to the loader component of each of the three database domains
Server-side and client-side connect-time load balancing enabled for effective resource utilization
Transparent Application Failover (TAF) enabled for all query services
Enabled multi-node parallelism via instance groups to match the services setup
Application Integration
[Figure: Pre-RAC, db and ETL on Node 1 and Node 2; with RAC, instances inst1 through inst10 on Node 1 through Node 10, with ETL on separate nodes ETL Node1 and ETL Node2]
Moved all ETL apps that were hosted on the db nodes to other failover-clustering (MSCS) systems.
Thus avoided multiple clustering software components (Clusterware and MSCS) on the RAC database nodes.
High Availability at all Layers
Level | HA solution used | Solution resilient to failures
Db level | Oracle RAC | Db node failures, db instance failures
Application level | MSCS cluster | Node failures, application instance failures
App connectivity to db | Services set to preferred/available, TAF | Db node failures, db instance failures
Network public/private | Teamed NIC, going to two separate switches | NIC failures, network switch failures
SAN host level | 2 HBA -> 2 switches -> 2 controllers per SAN | HBA on host failures, SAN switch failures
SAN level | RAID 10, 2 controllers per SAN | SAN failure, controller failures, disk failures
Backup level | 4 media servers, 4 db server nodes, 2 connections from each server node | Channel failures, tape failures, tape software failures
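The "services set to preferred/available" entry can be pictured with a small placement model. This is a deliberately simplified sketch, not Oracle Clusterware's algorithm: it assumes a service runs on every preferred instance that is up, and that each preferred instance that is down is replaced by one spare available instance. The service layout is hypothetical, since the slides do not name the services or their instances.

```python
def place_service(preferred, available, up):
    """Return the instances that run a service, given the set of instances that are up.

    Simplified model: the service runs on every preferred instance that is up;
    each preferred instance that is down is replaced by one available instance,
    if any is up.
    """
    running = [inst for inst in preferred if inst in up]
    spares = [inst for inst in available if inst in up]
    failed = len(preferred) - len(running)
    return running + spares[:failed]


# Hypothetical service layout, for illustration only.
preferred = ["inst1", "inst2", "inst3"]
available = ["inst4"]

print(place_service(preferred, available, up={"inst1", "inst2", "inst3", "inst4"}))
print(place_service(preferred, available, up={"inst1", "inst3", "inst4"}))
```

In the second call the service moves onto inst4 after inst2 is lost. TAF is the client-side counterpart, letting sessions of the query services reconnect to a surviving instance.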
Results: some examples
Category | Observations | Contributors
Data loading | 2x improvement | 64-bit stack on single node
Queries | Allowed to scale to projected concurrent users | Combination of powerful single node (32GB RAM, 16 logical CPUs), 10g 64-bit and RAC; new architecture allowed queries to take advantage of multi-node parallelism
Backup | Disk-to-disk Level 0 backups run at 1TB/hr; disk-to-tape Level 0 backups run at 1.4TB/hr | Infrastructure design to utilize multiple nodes in parallel to drive backup; end-to-end support for throughput (tape library, media servers, RAC db nodes)
Key Learnings
Integrated multi-domain teamwork is essential for delivering the system (app owners, DBAs, SysAdmin)
  Solutions for HA, RAP evaluation, performance tuning and backup/recovery were all facilitated by multi-domain teamwork
Key Learnings (contd.)
Use Oracle RAC out of the box components for smoother integration and supportability
RDBMS, Clusterware, ASM, RMAN for e2e Backups, Enterprise Manager/Grid Control for Metrics and Monitoring.
Summary
Intel Architecture on RAC stack provided Infrastructural capability/capacity to push more workload
  Faster processing power via Intel mainstream servers
  Efficient I/O via ASM
  Larger global virtual shared cache with larger SGAs
  Efficient workload management and multi-instance parallelism
  Incremental scalability
Summary
Key Items to take away
Why 64-bit?
Why RAC?
Teamwork is essential
  Within an organisation
  Amongst hardware and software partners
Test, test, test ... and test again!
Q&A
search.oracle.com
RAC