
Introduction to the choice of implementing Oracle RAC

Oracle's Real Application Clusters (RAC) product is a great temptation for DBAs and businesses alike. What could be better than 24x7 availability, true scalability, rock-bottom hardware prices due to commodity servers, high performance, and maximum user concurrency? It all sounds like a miracle piece of software, well worth the extra cost and implementation effort. However, there are things you need to know about RAC before you commit your business to using it. We will divide these concerns into three main sections:

- 100% Uptime
- Performance
- Scalability

For each of these sections, there are definite pros and cons. This presentation aims to provide an unbiased view, based on years of experience with Oracle RAC, to help you choose the best path for your business.

"We need RAC to stay competitive! Find out what RAC is and report back immediately."

What is Oracle RAC?

Perhaps you heard about RAC from an Oracle sales sheet and were captivated by the wide range of benefits covered. Or maybe you heard about it from a colleague or an article in InfoWorld, where you saw that another business implemented it and cut their downtime right down to the 0.00001% mark. Wherever you hear about RAC, you usually receive only a single viewpoint. Some people really love it because of the benefits they've gotten, though at a cost, while others absolutely hate it because of the trouble it has caused them. I have worked with companies that have felt both ways. Before we delve into all these pros and cons to gain a whole view of a RAC implementation, let's take a look at what RAC is from a high-level perspective.

The Oracle RAC System

Though we all refer to our implementations of Oracle as "the database," a complete Oracle system is actually made up of two parts: the database and the instance.

Component 1: The Database

The database is simply our files on disk. An Oracle database consists of three specific required file types:

- Datafiles
- Control Files
- Redo Logs

Datafiles in RAC

Oracle datafiles are the final storage location of our data. All data that is inserted, updated, or deleted will make its way to the datafiles (eventually) once it is committed. These files are physically stored on disk resources. These files make up what are known as tablespaces. A tablespace is a logical disk area in which Oracle objects such as tables and indexes can be stored. When a DBA or developer creates an object, the object is placed logically in the tablespace and physically in a datafile. Objects are further broken down into extents and blocks, but this is beyond the scope of this explanation. In Oracle 10g and beyond, there are two tablespaces that are absolutely required: SYSTEM and SYSAUX.

Control Files in RAC

The control file is the record keeper of the Oracle database. It keeps track of the current state of the datafiles, redo logs, archive logs, and the database itself. In a standard Oracle system (one database, one instance) you may have multiple control files, but they are all copies of each other. This is known as multiplexing. Oracle's control file is a required file. If you lose a control file, the instance will crash, and it will stay down until a recovery of some sort is performed.

Redo Logs in RAC clusters

Think of redo logs as a tape recorder that records every change in the Oracle database. As changes occur, they are regularly recorded in the online redo logs, just like you might record a movie on your VCR. Also like VCRs, Oracle can replay the saved transactions in the redo logs and re-apply lost transactions back into the database. Many times, this means that Oracle can recover from a crash without the DBA having to do anything other than tell the database to start up. At a minimum, Oracle requires that you have two redo logs assigned to the database. Oracle will write redo to the first log, and when the first log is full, Oracle will switch to the second log and continue writing redo there. Each of these individual online redo logs is known as an online redo log group. The reason we call them groups is that there can be mirrored copies of the online redo log files in each group. Like control files, it's a good idea to have multiplexed copies of the redo logs. Each copy of a redo log file within a log group is called a redo log member. Each redo log group can have one or more members. (A short SQL sketch of these file structures appears just after the instance overview below.)

Component 2: The Instance

The Oracle instance is the actual runtime aspect of Oracle. The instance is made up of:

- Binary Processes
- RAM Memory
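As noted above, here is a minimal SQL sketch of those file-side structures: checking the multiplexed control file copies, listing redo log groups and members, adding a mirrored group, and creating a tablespace backed by a datafile. The paths, names, and sizes are made-up examples, not recommendations.

    -- Where are the control file copies? (multiplexing check)
    SELECT name FROM v$controlfile;

    -- List redo log groups and their members
    SELECT group#, member FROM v$logfile ORDER BY group#;

    -- Add a new redo log group with two members (mirrored copies)
    ALTER DATABASE ADD LOGFILE GROUP 3
      ('/u01/oradata/orcl/redo03a.log',
       '/u02/oradata/orcl/redo03b.log') SIZE 100M;

    -- Create a tablespace backed by a single datafile
    CREATE TABLESPACE app_data
      DATAFILE '/u01/oradata/orcl/app_data01.dbf' SIZE 500M;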

Binary Processes in RAC

Oracle actually runs as five critical and required binary processes that are activated when you start your instance:

SMON - The System Monitor. SMON is primarily used to recover a crashed instance.
PMON - The Process Monitor. PMON cleans up dead processes and registers network services for the instance.
DBWR - The Database Writer. DBWR writes blocks to the datafiles (the transition from instance to database).
LGWR - The Log Writer. LGWR writes redo information to the redo log files.
CKPT - The Checkpoint process. CKPT assists in keeping all files in sync.
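One quick way to see these background processes on a running instance is to query the V$BGPROCESS view. This is only a sketch; the exact list returned varies by Oracle version and configuration.

    -- Background processes that are actually running (non-zero process address)
    SELECT name, description
      FROM v$bgprocess
     WHERE paddr <> '00'
     ORDER BY name;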

Please note that on Windows, these five separate processes are threaded under a single process called ORACLE.EXE. If any of these processes fail, the entire instance of Oracle crashes. In a single instance environment, this results in downtime.

RAM Memory and RAC

Oracle stores data in RAM in an area called the System Global Area (SGA). The SGA is broken down into pools where data can be temporarily stored before being discarded, overwritten, or flushed to disk. These pools, or memory areas, are:

Buffer Cache - Stores cached blocks of data from Oracle datafiles when queried. Also stores data written with inserts, updates, and deletes (called Data Manipulation Language, or DML). Data is flushed from this pool via DBWR to the datafiles.
Shared Pool - Caches the means by which SQL can be executed, called an execution plan. When SQL is run, it must be parsed; if the execution plan is cached in the shared pool, the parse phase is sped up considerably.
Log Buffer - Stores change data to be flushed to the current redo log file. Flushing occurs on every commit, every three seconds, when the buffer is one-third full, when it reaches 1MB, on checkpoint, or when required by DBWR.

Note that the Buffer Cache is very important for RAC. I will explain this in a moment.

Cache Fusion for RAC

RAC provides us with a multiple-instance, single-database system. In a RAC environment, there is one shared set of datafiles. Each instance in the cluster has its own SGA (RAM areas) and binary processes. Each instance also maintains its own thread of redo log files, while the control files are shared by the whole database; both must be viewable by all nodes, or systems, in the cluster. A RAC environment uses something called Cache Fusion to bring all the instances in the cluster together. Each instance has its own Buffer Cache, as we saw in the previous section; however, Oracle fuses these caches together into a single global buffer cache. This occurs over a dedicated private network called the cluster interconnect. The cluster interconnect allows each node of the RAC cluster to share cached data located in the buffer cache with any other node in the cluster.
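Once the cluster is running, the effect of Cache Fusion can be observed through the RAC-aware GV$ views. The statistic names below follow the 10g-era "gc ..." naming and differ slightly across releases, so treat this as illustrative rather than definitive.

    -- Blocks each instance has received from other instances' buffer caches
    SELECT inst_id, name, value
      FROM gv$sysstat
     WHERE name IN ('gc cr blocks received', 'gc current blocks received')
     ORDER BY inst_id, name;

    -- A quick look at how the SGA is carved up on each instance
    SELECT inst_id, name, bytes
      FROM gv$sgainfo
     ORDER BY inst_id, bytes DESC;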

Figure 1. A simple view of Cache Fusion at work

Notice in the image above that Instance 1 (server 1) queries the centralized storage to find all employees between 1 and 10. Once this query has been executed and fetched, the data will be cached in Instance 1's Buffer Cache. If Instance 1 were to require any of this data again, it would have to look no further than local RAM. RAM is much faster than disk, and so the query would return much more quickly. Now notice that Instance 2 runs a query that wants a row that Instance 1 already has cached. In this case, Instance 2 would receive the data over the high-speed network interconnect using Cache Fusion. This RAM-to-RAM transfer over the network isn't as fast as local RAM, but it definitely beats going to disk for it!

High Availability and RAC

RAC also gives us the benefit of high availability. If Instance 2 above crashes, for instance due to a power plug being kicked loose or a fatal error of some sort on the system, Instance 1 will take over the user load. All connections that would have pointed to Instance 2 (and in some cases connections that were already pointing at Instance 2) will fail over to the surviving instance.

Scalability and RAC

There are two ways to scale your hardware: horizontally and vertically. We all know about vertical scaling; we build up. We add CPUs, RAM, and so on until the system we are on is full. To visualize scaling vertically, think of Manhattan. There is no more room on the horizontal plane; they cannot build new buildings outward. However, they can build up, taller, and therefore have the skyscrapers we all know, love, and sometimes avoid! Scaling horizontally is the practice of adding new systems to the cluster. For example, think Oklahoma. There is a lot of land available, acres and acres of spare room. When new developments are needed, they do not need to build taller buildings. Instead, they build out, scaling upon the horizontal plane (or plains, as the case may be).

Is That It?

No, of course RAC does much more internally. There is software called Clusterware that must form the bridge between the nodes, or servers, of the cluster. Disks must be set up properly in order to allow shared storage. Networks must be set up just so to allow data to transfer freely from node to node. Complex locking mechanisms must be in place to make sure data is reliable and secure. There is much more than this diagram to a fully functional RAC system. However, it provides us enough meat to start talking about the pros and cons of using Oracle 10g RAC.

What Does RAC Do For My Business?

The primary goal of RAC can be summed up in a single word: uptime.

Uptime and Oracle RAC

Data drives business. Applications, DSS, expert systems, reporting, analytics: they all require a steady stream of data to keep them alive, and thus your business requires data to stay above ground. If a bank loses its core transaction database for even a single hour, it can and will cause massive amounts of error, possible data corruption, and millions of dollars lost. And though this seems like a horrible loss, others can be more horrible still. Imagine if the data powering the FAA's air traffic control systems was suddenly lost, with the hundreds of planes in the air at all times. Or if the database powering a just-in-time provider of organs for transplant were to suddenly crash because the janitor pulled the plug. It sounds incredibly dramatic, but a crashed database could end lives.

Oracle RAC is a High Availability (HA) system. It makes downtime more bearable by providing multiple nodes to connect to. If you have a four-node RAC cluster and a single node crashes, the other three nodes will take over immediately, without a single second of downtime, and allow your business to continue. Not all downtime is bad. Downtime comes in two categories: planned and unplanned.

Unplanned Downtime and RAC databases

Unplanned downtime was mentioned above, and is generally regarded as the worst type. It can last from seconds to hours in extreme situations, and can happen because of some of the most simple or unexpected issues. Some examples of events causing unplanned downtime are listed below; a sketch of the client-side failover configuration that helps ride through them follows the list.

- Power failure
- Overheating server room
- Kernel panic
- Fat-fingered mistake (for instance, a systems administrator kills a required process such as SMON)
- Oracle internal errors
- Hackers
- Localized disasters (coffee spill on the new Sun server)
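For unplanned node failures like those listed above, the connection failover described earlier is typically driven from the client side. Here is a sketch of a tnsnames.ora entry using Transparent Application Failover (TAF); the host names, service name, and retry settings are invented for illustration, while the overall shape is standard Oracle Net syntax.

    RACDB =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
          (LOAD_BALANCE = yes)
          (FAILOVER = yes)
        )
        (CONNECT_DATA =
          (SERVICE_NAME = racdb)
          (FAILOVER_MODE =
            (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5)
          )
        )
      )

With TYPE = SELECT, in-flight queries can resume their fetches on a surviving instance after a node failure; uncommitted transactions are still rolled back, which is one reason already-connected sessions do not always ride through transparently.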

Planned Downtime and RAC

Planned downtime is more graceful than unplanned, of course, but in some ways it can be worse than unplanned downtime. Depending on the software on the server, it could require frequent restarts in order to keep things updated. Some developers and administrators want daily maintenance periods, which can cause planned downtime to be the bulk of your total downtime. RAC alleviates these issues by allowing you to have a single server down at a time. Work can progress in a rolling fashion, where one server at a time comes down, thereby allowing your operation to remain online.

Scalability and RAC

Oracle, other vendors, and consultants may mention that RAC is good because of the price. Though at first glance it seems expensive, an added cost per CPU on top of what you are already paying for Oracle, it can actually decrease costs by decreasing hardware requirements. We've all seen spreadsheets for new project implementations where we list all the new hardware we will need to purchase. We have all seen the requests for huge multi-processor systems that are upgradeable to somewhere around 128 gigabytes of RAM and over 90 CPUs. They usually end up costing hundreds of thousands of dollars, and can even run into the range of millions. RAC allows us to connect multiple low-cost machines together in order to provide the same capability as a single large system, with the added benefit of high availability. For instance, we can use four 16-CPU systems instead of a single 64-CPU server. We will probably save money using the lower-cost hardware, and we can now add new servers if we run out of capacity, whereas our 64-CPU system may be maxed out. In addition, a single system may have underutilized resources. If the system is waiting on a RAM resource, but the CPUs are at only 50% capacity, you are wasting half your CPUs. In a RAC environment, we can utilize every server to the max. The concurrent processes will be balanced across all the nodes of your cluster, and will therefore have a better chance to use otherwise unclaimed resources.

What Could Possibly Go Wrong?

After the previous section, you may be thinking "Where do I sign up?" It's not that simple. RAC has its drawbacks as well, from implementation up to usage.

Implementation of Oracle RAC

RAC is a complex system to implement. Most companies I have worked with require a consultant to come in to help plan their move to RAC and for the actual installation itself. There are many different pieces to the RAC environment, from networking to disk drives to Clusterware to Oracle itself. On top of that, there are some costly disk requirements. In order to implement a RAC system, you must use some sort of shared storage device. Whereas a single-instance database can use Direct Attached Storage (DAS), which is an array of inexpensive disks connected to a single server, you must now use what is known as a Storage Area Network (SAN). A SAN is much more expensive, capable of connecting to many servers, usually through fibre-channel connections. This requires a unique set of hardware, ranging from Host Bus Adapters (HBAs) to the SAN itself, and it can get very costly. Redundancy can also be costly. Even though you have multiple servers to fail over to, most administrators require redundancy within each server as well. This means doubling up on hardware, and double the hardware equals double the cost. For each server, you will want multiple Host Bus Adapters, multiple network cards, multiple power sources, and so on. The multiple HBA cards are used in case a single one fails, but this usually requires expensive software to manage. Yet another cost is the network connection.
Earlier we learned that the RAC system requires a cluster interconnect in order to accommodate RAM-to-RAM transfers of data blocks. This interconnect must be very fast: high bandwidth with low latency. Interconnects such as InfiniBand and Myrinet can accommodate this, but are very expensive. Though RAC does provide horizontal scalability, if your cluster interconnect cannot handle the traffic, extra servers will actually degrade your performance instead of helping it. The only way around this issue is to change your entire application to accommodate RAC, or to purchase other means of disk storage such as solid state disk.

RAC learning for DBAs & System Administrators

There is a definite learning curve when it comes to RAC. Because of all the different components that make up a RAC environment, multiple levels of training may be required. System administrators will have to learn how to work with the disk resources. Complex SAN environments such as EMC and NetApp can require training of their own. In addition, Oracle RAC will only function when using specific disk setups (ASM, OCFS, raw devices, or a third-party cluster filesystem), and the administrator will have to assist in setup. Setting up and administering the hardware mentioned in the previous section on implementation is no small task! Network administrators will have to learn how to work with the new interconnect. If you use a specialized interconnect such as InfiniBand, training and consulting may be required. Of all the staff, DBAs will have the greatest learning curve. They will have to understand how to set up and administer Clusterware, your volume manager or filesystem of choice, the RAC-specific features of Oracle, and troubleshooting for clusters. While this does not sound like much, it adds up to many days of training, lots of trial and error, and even a little bit of miracle work at times. Heck, by the time you're done, you the manager may require some training in setting up training sessions, engaging consultants, and dealing with employees who have some great new marks on their resumes!

Usage for RAC

Thankfully, once a RAC system has been implemented, it behaves much like a normal database. Oracle's goal is to provide transparency for all users, so no one ever knows they're even touching a complex RAC environment. However, this does not apply to the DBA. The DBA must keep everything in the RAC environment monitored, up to date, and running perfectly. With so many components, it is possible for more things to go wrong. The DBA must monitor the cluster, the shared disk setup, ASM or OCFS if they're in use, the database, all instances, listeners, and more in-depth metrics such as cache coherency, interconnect latency, disk times from multiple systems, and many other things. While tools such as Grid Control help perform this monitoring, they cost more money, require more implementation, and possibly even training and consulting. Remember also that humans are fast becoming the most expensive part of the IT environment. With hardware costs falling on a daily basis while manpower costs remain the same, you may pay a hefty fee for the administration of this complex environment. DBAs who are RAC-proficient are usually better paid. In addition, you may need more DBAs than you previously did to keep everything in top-notch shape. Another note on usage comes from the architecture of RAC as a whole. Remember the Cache Fusion component we learned about in the last section? Well, it's nice, but it's not always a surefire winner. While RAM-to-RAM transfers over the network are indeed faster than reading from disk, they're still not as fast as a local RAM read. You may notice key queries slowing down where they used to be lightning fast, due to the application pointing at varying nodes of the cluster. In addition, we learned in the previous section on implementation that the interconnect MUST be very fast with low latency in order to sustain your RAC cluster. If you bog down the interconnect with too many nodes, your performance could hit rock bottom, and that time may come sooner than you think. RAC is scalable, and it performs well, but it is not the end-all, be-all of performance. In fact, most database professionals find it easier to tune a single-instance system than a RAC environment, due to the lower level of complexity and fewer resources required for management.

High Availability, Yes. Disaster Recovery, No.

We have learned about instance failure, which is roughly the same as server failure. RAC protects us against this issue by providing multiple servers to which we may connect. However, remember that all data will be in centralized storage. There is still a possibility of data failure or data center loss. Data failure is the worst of the three we have seen thus far (the others being instance and system failure), resulting in the loss or corruption of data. Some disk failures are non-disastrous; for instance, if a disk is mirrored with hardware or software RAID. Even then, if too many disks are lost it is possible that production data could be lost as well, requiring some form of recovery. User error can also cause data loss, as when an operating system user removes database files with a command such as rm. In this case, the file will be removed, and the disk mirror will provide no protection. Lastly, corruption can occur if hardware or software bugs result in inappropriate data being written to the datafiles. Data center loss occurs when a site is completely lost, usually as the result of some sort of natural disaster. A hurricane, flood, or tornado may destroy or seriously disable an entire data center, resulting in a combined loss of servers and disk. This is by far the worst unplanned-downtime scenario, and can only be protected against with extensive (and usually expensive) disaster recovery methods. Oracle provides many options for preventing downtime and data loss, all of which make up the Maximum Availability Architecture (MAA). The MAA provides us with redundancy on all components and employs different Oracle tools. RAC makes up only one piece of the MAA; it does not account for all possible problems. As we have seen in the previous section, these tools must protect us from planned and unplanned downtime. In addition, they must protect us from varying levels of unplanned downtime, ranging from single server outages (which RAC covers) to entire data center loss (which RAC does not cover). Some businesses choose not to follow all the guidelines for maximum availability. When considering a high availability strategy, the DBA must consider:

- Recovery Time Objective (RTO)
- Recovery Point Objective (RPO)
- Downtime cost per minute
- Available resources

The RTO defines the allowable downtime for the database. An advertising company may allow hours of downtime; a bank, however, will usually allow no downtime whatsoever. The RPO defines the allowable data loss if a failure occurs. If batch processes load our data, it may be that hours or even days of data could simply be reloaded. However, for a system that allows direct access by the end user, such as an online store or an ATM, zero data loss is allowed. Downtime can be expensive. Depending on the system, costs can range from dollars per minute to tens of thousands of dollars lost for every minute the database is unavailable.

However, as we have seen here, uptime is expensive as well. In the previous sections we've talked about how costly RAC can be for your business; now we see that even more may be required for a fully bulletproof system.

Figure 2: Example of an HA configuration using MAA best practices (Oracle.com, property of Oracle Corporation)

Conclusion

RAC provides businesses with some outstanding benefits. Not only can you be much closer to 100% uptime, but you can also enjoy scalability with lower-priced hardware, and possibly even a higher user load. But do not forget that these things come with a cost. The cost will not only be in licensing; it will be in the form of employees, training, consultants, software, hardware, and other little-mentioned components of a RAC system. In addition, RAC only covers part of the availability spectrum. Other costs will have to be endured to become fully bulletproof. It is important for managers to understand these concepts before embarking on the RAC quest; remember that while your employees are hopefully top notch and know what they are doing, it is your credibility on the line if you jump into a project without a full view of its possible repercussions.

Question: I have to rebuild several very large indexes and I need to know the fastest way to rebuild the index. What are the performance options for a fast index rebuild?

Answer: When Oracle rebuilds an index, you have several tuning options to make it faster:

1. PARALLEL scan: An index rebuild starts with a full scan, and this full scan can be parallelized according to your cpu_count.
2. NOLOGGING: You can also use the NOLOGGING option for super-fast index rebuilding. The only danger with using nologging is that you must re-run the create index syntax if you perform a roll-forward database recovery. Using nologging with create index can make index rebuilding up to 30% faster.
3. Partition the index: If you have purchased the partitioning option, you can rebuild a local partitioned index faster than a single large index.
4. SSD: Some shops will use temporary solid-state disk to speed up the initial index writes and move the index to platter disk storage at a later time.
5. Parallel index rebuild jobs: Also, remember that if you have the spare CPU and the indexes are on different disks, you can submit many index rebuild jobs simultaneously. On a large server you can simultaneously rebuild dozens of indexes, each using parallel query: a sort of "parallel parallelism" for fast index rebuilds. A sketch combining the first two options appears below.
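As promised above, here is a minimal sketch combining the PARALLEL and NOLOGGING options. The index name and degree of parallelism are placeholders; resetting the attributes afterward is a common follow-up so the index does not retain an unintended parallel degree or NOLOGGING setting.

    -- Rebuild with parallelism and minimal redo generation
    ALTER INDEX big_orders_idx REBUILD PARALLEL 8 NOLOGGING;

    -- Put the index attributes back to their usual settings afterward
    ALTER INDEX big_orders_idx NOPARALLEL;
    ALTER INDEX big_orders_idx LOGGING;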
