
With end users getting accustomed to instantaneous response times, Oracle, more than
ever, is challenged to provide continuous availability to its database products. An
important tool the folks at Redwood Shores have to help them accomplish that is Oracle
Real Application Clusters (RAC).

What is RAC? In a nutshell, it is clustering software that allows a single database to be
accessed by many Oracle instances, each running on its own server. If one server fails,
transactions can be redirected to another live server with a minimum of downtime.

Oracle advertises RAC as a cure for many ailments. IT shops can misunderstand such
marketing hype, however, and not recognize the cost and benefits of using RAC in a high
availability (HA) environment.

Let’s explore some Oracle RAC best practices and in the process shed some light on
common mistakes users make when using this cluster-based technology. In this Oracle
RAC guide, we’ll take a look at:

* RAC planning best practices
* RAC implementation best practices
* RAC infrastructure considerations
* Hardware architecture and RAC performance
* RAC backup and recovery best practices
* Performance and tuning best practices

One of the most common mistakes with Oracle RAC is misunderstanding its functions
and limitations. Oracle Real Application Clusters is used as part of a comprehensive
capacity planning strategy, but the technology’s strengths and limitations are not always
understood. Here is a list of the most common misperceptions about the technology.

Oracle RAC is ideal for scalability

Even though Oracle Corporation wants you to buy tiny “blade servers” and use their grid
computing solution for “horizontal scaling,” that’s not how most shops use RAC. Keep in
mind that RAC is only a legitimate scalability option for very large IT shops that need
more horsepower than a single server can deliver.

Instead, it’s an Oracle best practice to scale up first, building up within a single
server through “vertical scaling,” and only then to scale out. Only after you have
saturated a large server do you need to use RAC to “scale out” the application across
multiple servers. Today, a single server’s memory and CPU horsepower can be expanded
far beyond what was possible just several years ago, making it easier to add resources
than to plunk a new server into the RAC environment. In real-world environments, a
single server can handle thousands of transactions per second. Only the world’s largest
Oracle databases need to scale out using RAC nodes.

Oracle RAC is a standalone high-availability solution

Remember that RAC only protects you against instance failure, and that’s only one of
many components that can cause an unplanned outage. For true continuous availability,
we must deploy triple-mirrored disks (with a mean-time-to-failure rate expressed in
centuries) and redundant network components.

For complete availability on each RAC server node, you will want multiple host bus
adapters, multiple network cards and multiple power sources. Just as you have failover at
the instance layer, you need to purchase software that allows the multiple host bus
adapter cards to fail over automatically and issue a notification when one has failed.

As we have noted, RAC systems require a cluster interconnect in order to accommodate
RAM-to-RAM transfers of data blocks in the RAC cache fusion layer. This interconnect
must be very fast, with high bandwidth and low latency. Interconnects include:

* Dark fiber: Dense Wavelength Division Multiplexing (DWDM) technology
* InfiniBand
* Myrinet

This cache fusion bottleneck is another reason why RAC scale-out, or horizontal
scalability, is problematic. If your cluster interconnect cannot handle the traffic, extra
servers will actually degrade your performance instead of helping it. The only way
around this problem is to change your entire application to accommodate RAC, or to
purchase faster storage such as Solid State Disk.

Oracle RAC ensures fast response time

Response time for transactions is always important, but it’s especially important for RAC
databases. This is because of the connection wait-time that is used to detect whether a
RAC node, or server, has failed. Consequently, you must plan to ensure that new
transactions are serviced in less than one second wall-clock time so that you can set a
failover time of two seconds.

Oracle RAC does not need a disaster recovery component

Except in the rare cases where you can deploy Dense Wavelength Division Multiplexing
(DWDM) technology, known as dark fiber, you still need to create a disaster recovery
solution. Because RAC nodes are normally located within a few miles of each other, a
natural disaster like a hurricane would still cause a global outage. So it has become a
RAC best practice to also deploy a fast-failover geographical solution like Data Guard --
or better still, n-way Streams.

Now that we understand the planning aspect of best practices, let’s take a closer look at
RAC best practices issues after we have implemented our new database.

Oracle RAC implementation best practices

Operational RAC databases follow many of the same best practices as any Oracle
database, but there are some that are unique to Oracle RAC systems. First, it’s an
important best practice to plan RAC servers in a way that minimizes the geographical
distance between the RAC nodes while still keeping them separate, in order to avoid a
failure of all nodes.

As a reference, you can take a look at what I wrote on RAC implementation guidelines.

In a busy RAC database, the speed of the server interconnect is critical for fast response
times. It’s a commonly accepted best practice to use the fastest possible interconnect,
typically a fiber optics solution like dark fiber.

Some shops will place RAC nodes in separate buildings in the same neighborhood, but
with the advent of the superfast dark fiber interconnect, you can use “Extended RAC”
and place RAC nodes up to 100 miles apart. This allows you to combine high availability
with disaster recovery.
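
To get a feel for what distance does to the interconnect, here is a rough back-of-the-envelope sketch of the propagation delay over a fiber link. It assumes a refractive index of about 1.47 for silica fiber and ignores switching, amplification and protocol overhead, so treat the result as a lower bound rather than a measured figure. The function name is illustrative only.

```python
# Rough propagation-delay estimate for a fiber interconnect.
# Assumption: refractive index of ~1.47 for silica fiber; real links add
# equipment and protocol latency on top of this physical minimum.

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.47      # assumed typical value
KM_PER_MILE = 1.609344


def fiber_round_trip_ms(distance_miles: float) -> float:
    """Return the minimum round-trip time in milliseconds over fiber."""
    distance_km = distance_miles * KM_PER_MILE
    speed_in_fiber_km_s = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    one_way_seconds = distance_km / speed_in_fiber_km_s
    return 2 * one_way_seconds * 1000.0


if __name__ == "__main__":
    print(f"{fiber_round_trip_ms(100):.2f} ms round trip at 100 miles")
```

At 100 miles this works out to roughly 1.6 milliseconds per round trip before any other overhead, which every block transfer between the distant nodes has to pay.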

Dark fiber is rather expensive, however. To reduce costs, most shops adopt a best
practice where they combine RAC with disaster recovery solutions like n-way Streams
replication.

The whole point of RAC is to make end users automatically reconnect to a surviving
server when one server fails. This is done either at the Web-server level or with the
Oracle Transparent Application Failover (TAF) option. Whichever tool you choose, you
should wait about three seconds before assuming that the server is dead and re-trying a
new RAC server.
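
The wait-then-retry idea can be sketched at the plain TCP level. This is not how TAF is implemented; it is a minimal illustration, assuming a hypothetical list of node addresses and the three-second wait mentioned above. The `connect` parameter exists only so the logic can be exercised without a real network.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Node = Tuple[str, int]  # (host, port); the host names used below are hypothetical


def connect_with_failover(
    nodes: Sequence[Node],
    timeout_s: float = 3.0,
    connect: Callable[..., object] = socket.create_connection,
) -> Tuple[Node, object]:
    """Try each node in order, treating it as dead after timeout_s seconds."""
    last_error: Optional[Exception] = None
    for node in nodes:
        try:
            # socket.create_connection raises OSError (timeouts included)
            # when the node cannot be reached within timeout_s.
            return node, connect(node, timeout=timeout_s)
        except OSError as exc:
            last_error = exc  # remember why this node was skipped
    raise ConnectionError(f"no RAC node reachable: {last_error}")


if __name__ == "__main__":
    # Hypothetical listener addresses; 1521 is the default Oracle listener port.
    candidates = [("rac1.example.com", 1521), ("rac2.example.com", 1521)]
    try:
        node, conn = connect_with_failover(candidates)
        print("connected to", node)
        conn.close()
    except ConnectionError as err:
        print(err)
```

The node order matters: a client that always starts with the same node will keep paying the timeout while that node is down, so a real deployment would usually rotate or reorder the list.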

Next, let’s take a closer look at specific RAC technical best practices.

Oracle RAC interconnect best practices

Since RAC is a method in which many instances share the same database, shared data
blocks are transferred between the servers over a high-speed interconnect, using a
mechanism called “cache fusion.” In order to keep performance fast, it’s critical that you
pay close attention to the interconnect layer and remember these points:

* RAC likes small block sizes
* The interconnect must have extremely fast network hardware
* RAC load balancing is critical to performance

Oracle RAC node load balancing best practices

I disagree with Oracle’s practice of load balancing using a least-loaded approach because
of the overhead it lays on top of the cache fusion layer. In the real world, like-minded end
users are directed to the same RAC server. If we have a RAC system with different types
of end users, we would want to load balance according to their data needs. For example,
customer processing might be on node one, order processing on node two, and product
processing on node three. Grouping RAC end users by data needs ensures that cache
fusion overhead is minimized.
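
The grouping described above amounts to a simple routing table. The sketch below uses the three example workloads from the text; the node names and the fallback choice are assumptions for illustration, and in a real cluster this mapping would more likely be expressed through the database's own connection configuration than in application code.

```python
# Route sessions to RAC nodes by the data they work on, rather than by load.
# Node names and the default are hypothetical.

WORKLOAD_TO_NODE = {
    "customer": "node1",
    "order": "node2",
    "product": "node3",
}


def route_session(workload: str, default_node: str = "node1") -> str:
    """Return the node that should serve sessions of the given workload type."""
    return WORKLOAD_TO_NODE.get(workload, default_node)


if __name__ == "__main__":
    for workload in ("customer", "order", "product", "reporting"):
        print(workload, "->", route_session(workload))
```

Because every order-processing session lands on the same node, the order blocks tend to stay in that node's cache and rarely need to be shipped to another server.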

Oracle RAC disk storage management best practices

In order to implement a RAC system, you must use a shared storage device because
many servers need concurrent access to the disks. A single-instance database can use
Direct Attached Storage (DAS), which is an array of inexpensive disks connected to a
single server, but with RAC you must use what is known as a Storage Area Network
(SAN). A SAN, which is more expensive and complex, is a disk array capable of
connecting to many servers, usually through Fibre Channel. This requires a unique set
of hardware, ranging from host bus adapters to the SAN itself. It’s important that your
DBA have complete knowledge of the internals of the data storage layer.

Oracle RAC block size best practices

It has become a best practice in RAC to use a small 2 kilobyte block size in order to
minimize the “baggage” shipped across the cache fusion layer. Because the block size is
the unit of work, the smaller the block size, the higher the granularity of data being
transferred, with less overhead. If you have long rows (greater than 2 kilobytes), then you
will want to move to a 4 kilobyte block size.
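
The “baggage” argument can be made concrete with a little arithmetic. This sketch applies the 2 KB and 4 KB rule from the paragraph above and compares the unwanted bytes shipped per row; it ignores block header overhead, and the 8 KB comparison point (a common default block size) is included only for contrast.

```python
def suggest_block_size(avg_row_bytes: int) -> int:
    """Apply the rule of thumb above: 2 KB blocks unless rows exceed 2 KB."""
    return 2048 if avg_row_bytes <= 2048 else 4096


def baggage_bytes(block_size: int, row_bytes: int) -> int:
    """Bytes shipped across the interconnect beyond the one row that was wanted.

    Simplification: ignores block header overhead and assumes the row fits
    in a single block.
    """
    return max(block_size - row_bytes, 0)


if __name__ == "__main__":
    row = 200  # hypothetical average row length in bytes
    for size in (2048, 4096, 8192):
        print(f"{size}-byte block: {baggage_bytes(size, row)} bytes of baggage")
    print("suggested block size:", suggest_block_size(row))
```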

The implementation of a RAC cluster is only the beginning, and it’s critical to constantly
monitor the health of your RAC clusters so that you can spot and fix impending problems
before you inconvenience your end users.

Oracle RAC monitoring best practices

To ensure that a RAC node never experiences a global problem, a proper monitoring
infrastructure is an absolute requirement. RAC databases rarely fail without warning. If
the DBA understands the proper metrics to watch, he can create an alert system that
notifies him of a looming problem so that he can fix it before the instance crashes.

The DBA must monitor the cluster, the shared disk setup, ASM (or OCFS), the database
instance, listeners, and more in-depth metrics such as cache coherency, interconnect
latency, disk times from multiple systems, and a range of other things.

While higher-cost performance monitoring tools such as Oracle Grid Control can help
perform rudimentary RAC monitoring for beginners, a RAC DBA should have the
coding skills to build his own RAC monitoring infrastructure using dictionary queries,
dbms_scheduler and email alert mechanisms.
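
As a minimal sketch of the alerting logic such a home-grown monitor might contain, the code below turns two cumulative counters into an average block receive time per instance and flags instances above a threshold. It assumes the counters have already been fetched from the data dictionary (for example, global cache receive time in centiseconds and blocks received); the exact statistic names vary by release and should be verified, and the 5 ms threshold is an arbitrary placeholder, not a recommendation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class InterconnectSample:
    instance: str
    receive_time_cs: int  # cumulative block receive time, in centiseconds (assumed unit)
    blocks_received: int  # cumulative number of blocks received


def avg_receive_ms(sample: InterconnectSample) -> float:
    """Average time to receive one block, in milliseconds."""
    if sample.blocks_received == 0:
        return 0.0
    # 1 centisecond = 10 milliseconds
    return 10.0 * sample.receive_time_cs / sample.blocks_received


def find_alerts(samples: Iterable[InterconnectSample], threshold_ms: float = 5.0) -> List[str]:
    """Return one alert message per instance whose average exceeds the threshold."""
    alerts = []
    for sample in samples:
        latency = avg_receive_ms(sample)
        if latency > threshold_ms:
            alerts.append(f"{sample.instance}: avg block receive time {latency:.1f} ms")
    return alerts


if __name__ == "__main__":
    samples = [
        InterconnectSample("rac1", receive_time_cs=100, blocks_received=1000),
        InterconnectSample("rac2", receive_time_cs=900, blocks_received=1000),
    ]
    for message in find_alerts(samples):
        print(message)  # a real monitor would send this by email instead
```

A scheduled job could run a check like this every few minutes and mail the resulting messages, which is the kind of early warning the paragraph above has in mind.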

Wrapping up the discussion of Oracle RAC best practices, let’s focus on the best way to
define job roles for a RAC database.

Oracle RAC staffing best practices

One best practice for RAC databases is to always hire an experienced RAC DBA to
manage your cluster, avoiding people who have had the RAC training but have no job
experience.

It’s important to recognize that human resource costs are the most expensive part of an
Oracle shop. Over the decades, hardware costs have steadily fallen while manpower costs
have remained the same.

It’s important to note that Oracle professionals with RAC skills command a hefty
premium over an ordinary DBA. A recent Oracle salary survey notes that an average
DBA earns about $97,000 per year, whereas RAC experts commonly earn $140,000 a
year. Those who manage multi-billion-dollar RAC databases typically command upwards
of $250,000 per year.

Sadly, there is no easy way to “grow your own” RAC DBA. The training courses are very
expensive, and there is no substitute for real-world experience. And training your own
DBA in RAC may make him more marketable. It’s not uncommon to spend tens of
thousands of dollars teaching RAC to your DBA only to lose him to a better job offer.

Oracle RAC job role best practices

There is a perpetual conflict between systems administrators (SAs), who traditionally
manage servers and disks, and the RAC DBAs who are responsible for managing the
RAC database. There are also clearly defined job roles for network administrators, who
are especially challenged in a RAC database environment to manage the cluster
interconnect and packet shipping between servers.

If your DBA is going to be held responsible for the performance of the RAC database,
then it’s only fair that he be given root access to the servers and disk storage subsystem.
However, not every DBA will have the required computer science skills to manage a
complex server and SAN environment, so each shop makes this decision on a case-by-
case basis.

Oracle RAC training best practices


One of the sure ways to set your company up for an unplanned outage is to fail to train
your SA, DBA and network administrator properly. SAN environments like EMC,
TagmaStore and NetApp have complex architectures, and they frequently require training
classes.

Disk configuration is also challenging, and RAC will function only when using specific
disk setups such as ASM, OCFS, RAW, or a third-party cluster file system. These tools
require training classes.

Network administrators must also receive training on how to work with the cluster
interconnect, as well as specialized interconnects such as Infiniband and DWDM.

Of all those on a RAC staff, DBAs will have the greatest learning curve. They will have
to understand how to set up and administer all of the complex RAC components,
including the clusterware and file system storage.
