
RAN EMS HA/DR/B&R Solution for SUN Series Server

ZTE University
Content

Chapter 1 HA/DR/B&R Overview
Chapter 2 HA Architecture and Function Description
Chapter 3 DR Architecture and Function Description
Chapter 4 B&R Architecture and Function Description
The importance of HA/DR/B&R
On September 11, 2001, the World Trade Center Twin Towers suffered an unforeseen terrorist attack. According to official statistics, of the 350 enterprises that had worked in the buildings, nearly 200 had gone bankrupt and vanished one year after the disaster, because their important information systems were destroyed and critical data was lost.

By contrast, just a few hours after the 9/11 disaster, the global financial giant Morgan Stanley decided that all services would resume the next day. This was possible because the company had established data backup and remote disaster recovery systems that protected its important data.
Disaster recovery ability overview
The main requirements of disaster recovery ability
Ensure continuity of application system service
Protect the security and integrity of system data
The main factors that influence system stability and data security
Operator misuse and data deleted by mistake
Server hardware failure
Operating system failure
Application system failure
Natural disasters, such as flood, fire, earthquake

The Indicators of Disaster Recovery

Recovery Point Objective (RPO)
RPO is an indicator of how completely data is recovered. The smaller the RPO value, the less data is lost when a disaster occurs, and the higher the corresponding investment.
Recovery Time Objective (RTO)
RTO is an indicator of service recovery time: the time required from service disruption to service recovery. The smaller the RTO value, the stronger the disaster recovery ability, and the higher the corresponding investment.
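The two indicators can be illustrated with a small calculation. The following Python sketch measures the data-loss window and the service outage of one incident and compares them with target values; the function names, the timestamps, and the targets are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recovery_point: datetime, disaster: datetime) -> timedelta:
    """Data written in this window is lost; it is compared against the RPO."""
    return disaster - last_recovery_point


def service_outage(disaster: datetime, service_restored: datetime) -> timedelta:
    """Time from service disruption to service recovery; compared against the RTO."""
    return service_restored - disaster


def meets_objectives(loss: timedelta, outage: timedelta,
                     rpo: timedelta, rto: timedelta) -> bool:
    """True when both the data loss and the outage stay within their targets."""
    return loss <= rpo and outage <= rto


# Hypothetical incident: backup at 00:00, failure at 09:30, service back at 15:30.
last_backup = datetime(2024, 1, 10, 0, 0)
failure = datetime(2024, 1, 10, 9, 30)
restored = datetime(2024, 1, 10, 15, 30)

loss = data_loss_window(last_backup, failure)
outage = service_outage(failure, restored)
print(loss, outage)
print(meets_objectives(loss, outage, rpo=timedelta(hours=24), rto=timedelta(hours=8)))
```

With an hours-level RPO and RTO, as in the B&R column of the comparison table, this incident would be within target; a minutes-level target would not be met.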
International Standard of Disaster Recovery

SHARE 78 international standard of disaster recovery
SHARE 78 defines levels of data backup and disaster recovery, classifying them into 8 levels. Level 0 is the lowest level, at which no backup technology is adopted, and Level 7 is the highest level, at which no data is lost. The higher the level, the less data is lost, and the higher the corresponding investment.
ZTE HA/DR/B&R solution
ZTE provides HA/DR/B&R solutions to ensure service continuity and data security
High Availability solution
Disaster Recovery solution
Backup & Restoration solution
Comparison of Single Server, HA, DR, and B&R
ZTE provides HA/DR/B&R solutions to ensure system stability and data security. The following table compares the functions and application scenarios of Single Server, HA, DR, and B&R.

No. | Application Scenarios | Single Server | HA | DR | B&R
1 | SHARE 78 level | Tier 0 | Tier 0 | Tier 6 | Tier 3
2 | Recovery Point Objective (RPO) | Wks+ | Wks+ | Mins | Hrs
3 | Recovery Time Objective (RTO) | Days | Mins | Mins | Hrs
4 | Application system redundancy and failover support | No | Yes | Yes | No
5 | Environment disaster | No | No | Yes | No
6 | Application data on-line auto backup | No | No | Yes | Yes
7 | Oracle database on-line auto backup | No | No | Yes | Yes
8 | System on-line auto backup | No | No | No | Yes
9 | Restore application data from an earlier recovery point | No | No | No | Yes
10 | Restore Oracle DB data from an earlier recovery point | No | No | No | Yes
HA Solution Application Scenarios
Software failure scenarios for HA
Application software breakdown caused by misuse
Operating system errors
Hardware failure scenarios for HA
LAN failure
Server failure
DR Solution Application Scenarios
Software failure scenarios for DR
Application system breakdown caused by misuse
Operating system errors
Hardware failure scenarios for DR
LAN failure
Server failure
Storage device damage
Environment disaster scenarios for DR
Fire disaster
Earthquake disaster
Flood disaster
B&R Solution Application Scenarios
NetNumen U31 System Collapse Scenario
A collapse of the whole system is generally unusual. System collapse is classified into three cases: OS collapse, U31 system collapse, and database (Oracle) collapse.
OS Collapse Scenario
The OS collapses or some OS files are lost.
NetNumen U31 Application Collapse Scenario
The NetNumen U31 system collapses or some U31 system files are lost.
DB Collapse Scenario
The DB collapses or some DB files are lost.
HA System Architecture

The ZTE HA solution networking includes two servers: one active server and one standby server. Both servers use dual fiber links to connect to the same disk array, and dual FE ports to connect to dual 1000M switches, forming a redundant transmission network.
Solaris and Symantec local VCS software are installed on both the active server and the standby server.
All Oracle data and application data are saved on the disk array.
Heartbeat line for HA system

Dual heartbeat lines run between the active server and the standby server.
The dual heartbeat lines use FE.
The dual heartbeat lines provide a detection function covering the operating system software and hardware, network communications, running applications, etc.
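The value of running two heartbeat lines can be sketched in a few lines of Python. This is an illustrative model only, not the behaviour of Symantec VCS: it assumes a link counts as down after a configurable number of consecutive missed heartbeats, and that the peer is declared failed only when both links are down. The class name, link names, and threshold are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Counts consecutive missed heartbeats per link (hypothetical model)."""
    max_missed: int = 3
    missed: dict = field(default_factory=lambda: {"hb1": 0, "hb2": 0})

    def record(self, link: str, received: bool) -> None:
        # A received heartbeat resets the counter; a miss increments it.
        self.missed[link] = 0 if received else self.missed[link] + 1

    def link_down(self, link: str) -> bool:
        return self.missed[link] >= self.max_missed

    def peer_failed(self) -> bool:
        # With dual lines, a single broken cable does not look like a dead server.
        return all(self.link_down(link) for link in self.missed)


monitor = HeartbeatMonitor(max_missed=2)
for _ in range(2):
    monitor.record("hb1", False)   # first line loses heartbeats
    monitor.record("hb2", True)    # second line still healthy
print(monitor.link_down("hb1"), monitor.peer_failed())
```

In this model a failover would be triggered only once the second line also stops delivering heartbeats.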
HA Hardware & Software Configuration
HA Hardware Configuration
No. | Item | Model | Quantity
1 | EMS Server | SUN T5220/M4000 | 2
2 | Disk Array | EMC CX4-120 | 1
3 | Switch | Cisco 2960 (1000M) | 2

HA Software Configuration
No. | Item | Model | Quantity
1 | Symantec VCS | Symantec local VCS | 2

Network redundancy
[Figure: active server and standby server, each with redundant NIC and HBA ports]

When a network port fails, the system automatically switches to the standby device and resumes work.
When an HBA port fails, the system automatically switches to the standby device and resumes work.
Switching between network devices takes only a few seconds, so it does not affect applications on the active server.
Server redundancy
[Figure: failover from the active server to the standby server]

When the active server fails, the system automatically switches to the standby server, which continues to provide application service.
Server failover has to switch all applications, including the Solaris system, the Oracle DB application, and the ZTE U31 application, so the whole procedure usually takes 15 to 20 minutes.
HA switch mode and key indicator
The ZTE HA solution supports a manual switch mode and an auto switch mode.
Manual switch mode: the user can manually switch the active server's applications to the standby server, for example to test the HA function.
Auto switch mode: when the active server fails, the system automatically switches to the standby server, which continues to provide application service.
Switch time
Network switch time: a few seconds
Server switch time: 15 to 20 minutes
HA Solution Advantages and Disadvantages
Advantages of ZTE HA solution
Improves system availability and sustainability, and eliminates single points of failure
Implements automatic application-level takeover
Keeps critical applications away from the impact of failures

Disadvantages of ZTE HA solution
There is a single disk array, so the solution cannot provide protection for U31 data.
The active server and standby server are in the same LAN, so the solution cannot withstand a disaster.
OS backup is not supported; all system and application software must be installed manually.
On-line auto backup of the system, application data, or Oracle database is not supported.
Restoring application data or the Oracle database from an earlier recovery point is not supported.
DR System Architecture

The ZTE DR solution adopts dual OMC systems for 1+1 backup, including an active server, an active disk array, a standby server, and a standby disk array. The active server uses dual fiber links to connect to the active disk array, and the standby server likewise uses dual fiber links to connect to the standby disk array.
The transmission network also adopts dual lines: when a network card or the network fails, the system automatically switches to the standby device and resumes work.
DR System Data Replication

All relevant data saved on the active disk array, such as the Oracle database instance, application data, and other relevant data, is replicated to the site B system in real time through VVR. VVR is a module of Symantec local VCS that implements volume copy and snapshot.
Symantec local VCS monitors the site A and site B systems in real time, including the two servers, the database services, the application software, and the VVR replication services.
Heartbeat line for DR system

Dual heartbeat lines are used, and the transmission network can use IP or fiber.
The dual heartbeat lines provide a detection function covering the operating system software and hardware, network communications, running applications, etc.
DR Hardware & Software Configuration
DR Hardware Configuration
No. | Item | Model | Quantity
1 | EMS Server | SUN T5220/M4000 | 2
2 | Disk Array | EMC CX4-120 | 2
3 | Switch | Cisco 2960 (1000M) | 2
4 | Router | Cisco 2811 | 2

DR Software Configuration
No. | Item | Model | Quantity
1 | Symantec VCS | Symantec local VCS | 2


Network redundancy
[Figure: active server at site A and standby server at site B, each with redundant NIC and HBA ports, connected over IP/fiber]

When a network port fails, the system automatically switches to the standby device and resumes work.
When an HBA port fails, the system automatically switches to the standby device and resumes work.
Switching between network devices takes only a few seconds, so it does not affect applications on the active server.
Server redundancy
[Figure: failover from the active server at site A to the standby server at site B over IP/fiber]

When a disaster takes place and the active system at site A fails, the system automatically switches to the standby system at site B, which continues to provide application service.
Because all applications have to be switched, including the Solaris system, the Oracle DB application, and the ZTE EMS application, the switch usually takes 20 to 30 minutes.
Data Reverse Replication
[Figure: after failover, data is reverse-replicated from the standby server at site B to the active server at site A over IP/fiber as a cumulative incremental copy]

After site A recovers from the disaster, the whole system's data needs to be rolled back to site A. Data reverse replication copies only the data that changed during the disaster; unless all of the data at site A was lost, it does not need to replicate all of the data.
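The idea of copying only the changed data can be sketched as follows. This Python model is an assumption made for illustration and does not describe how VVR tracks changes internally: each volume is a list of blocks, writes at site B during the outage are recorded in a set of changed block indices, and reverse replication copies only those blocks unless site A lost everything. The `Volume` class and `reverse_replicate` function are hypothetical names.

```python
class Volume:
    """A toy volume: a list of blocks plus the indices written since the last sync."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.dirty = set()

    def write(self, index, data):
        self.blocks[index] = data
        self.dirty.add(index)      # remember that this block changed


def reverse_replicate(source, target, target_lost=False):
    """Copy changed blocks from source to target; return the number of blocks copied."""
    if target_lost:
        # All data at the target was lost, so everything must be copied.
        target.blocks = list(source.blocks)
        copied = len(source.blocks)
    else:
        for index in sorted(source.dirty):
            target.blocks[index] = source.blocks[index]
        copied = len(source.dirty)
    source.dirty.clear()
    return copied


site_a = Volume([b"a", b"b", b"c", b"d"])
site_b = Volume(site_a.blocks)            # site B starts as an up-to-date replica
site_b.write(1, b"X")                     # changes made at site B during the disaster
site_b.write(3, b"Y")
print(reverse_replicate(site_b, site_a))  # only the two changed blocks are copied
```

The fewer blocks change while site B is active, the less data has to cross the link between the sites when site A comes back.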
DR Switch Mode
[Figure: manual switch between the active server at site A and the standby server at site B over IP/fiber, for example during an update]

The ZTE DR solution supports a manual switch mode and an auto switch mode.
Manual switch mode: the user can manually switch the active system's applications at site A to the standby system at site B, for example to update the active system or to test the disaster recovery function.
Auto switch mode: when a disaster takes place at site A and the active system fails, the system automatically switches to the standby system at site B, which continues to provide application service.
DR Key Indicator
Switch time
Network switch time: a few seconds
Server switch time: 20 to 30 minutes
Disaster recovery time
Disaster recovery time mainly depends on the system installation time and the data reverse replication time.
System installation includes installing the Solaris system, the ZTE U31 software, and the Oracle database software; it usually takes 2.5 hours (without the backup and restoration solution).
Data reverse replication time mainly depends on the network bandwidth between site A and site B and on the data size.
Assuming data reverse replication must finish within 3 hours, the network bandwidth required for each data size is listed in the following table.
The network bandwidth can be dimensioned according to the required recovery time: the higher the bandwidth, the shorter the recovery time.

UMTS indicator | 1000 Cell | 5000 Cell | 10000 Cell | 15000 Cell
Data size | 362GB | 1370GB | 2631GB | 3892GB
Recovery time | 3 hours | 3 hours | 3 hours | 3 hours
Network bandwidth | 34Mb/s or 5MB/s | 127Mb/s or 16MB/s | 244Mb/s or 30MB/s | 360Mb/s or 45MB/s

GSM indicator | 2500 TRX | 12500 TRX | 25000 TRX | 37500 TRX
Data size | 194GB | 728GB | 1395GB | 2063GB
Recovery time | 3 hours | 3 hours | 3 hours | 3 hours
Network bandwidth | 18Mb/s or 3MB/s | 68Mb/s or 9MB/s | 130Mb/s or 17MB/s | 191Mb/s or 24MB/s
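The relationship between data size, recovery window, and bandwidth can be checked with a short calculation. The Python sketch below assumes decimal units (1 GB = 1000 MB), that the full data size is transferred, and no protocol overhead or compression; the function name is illustrative. Under these assumptions 362 GB in 3 hours works out to roughly 33.5 MB/s, or about 268 Mb/s, so the numerals in the Mb/s row of the table appear to correspond to megabytes per second and are worth verifying against the original planning data before a link is dimensioned.

```python
def required_bandwidth(data_gb: float, window_hours: float) -> tuple[float, float]:
    """Return (megabytes per second, megabits per second) needed to move
    data_gb within window_hours, assuming 1 GB = 1000 MB and no overhead."""
    seconds = window_hours * 3600
    mbyte_per_s = data_gb * 1000 / seconds
    return mbyte_per_s, mbyte_per_s * 8


for cells, size_gb in [(1000, 362), (5000, 1370), (10000, 2631), (15000, 3892)]:
    mbyte, mbit = required_bandwidth(size_gb, window_hours=3)
    print(f"UMTS {cells} cells: {mbyte:.1f} MB/s = {mbit:.0f} Mb/s")
```

Because reverse replication normally copies only the changed data, the bandwidth actually needed may be lower than this full-copy estimate.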
DR Advantages and Disadvantages
Advantages of ZTE DR solution
Provides disaster recovery ability, improves system availability and sustainability, and eliminates single points of failure
Implements automatic application-level takeover
Keeps critical applications away from the impact of failures and disasters
Disadvantages of ZTE DR solution
OS backup is not supported; all system and application software must be installed manually.
Restoring application data or the Oracle database from an earlier recovery point is not supported.
B&R Architecture and function

Section 1 B&R System Architecture


Section 2 Backup Mode and Policy
Section 3 Restoration Policy
B&R System Architecture
[Figure: OMC server and backup server connected through a 1000M LAN switch; the OMC server connects to the disk array by fiber, and the backup server connects to the tape library by fiber]

The U31 B&R system includes the U31 server, a backup server, a disk array, and a tape library. The network uses a dual 1000M switch LAN.
The U31 server runs the OS, the U31 system, and the Oracle software; it is usually a SUN Solaris server.

The backup server runs Symantec NetBackup software; it is also usually a SUN Solaris server.
The disk array stores U31 and Oracle data and is connected to the U31 server by dual fiber links.
The tape library stores all backup data and is connected to the backup server by dual fiber links.
Backup Mode and Policy
Data is backed up via Symantec NetBackup software in any one of the following three modes:
Full Backup
In a full backup, all of the data stored on a target system is backed up.
Cumulative Incremental Backup
A cumulative incremental backup backs up the files that have been modified or created since the most recent lower-level backup (level n-1 or lower).
Differential Backup
A special type of cumulative incremental backup that copies the files that have been modified or created since the previous full backup.
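The three modes differ only in the reference point used to decide which files to copy. The following Python sketch applies the definitions above to a set of file modification times; it is an illustration of the selection logic, not NetBackup code, and the function name, file names, and timestamps (hours on an arbitrary clock) are hypothetical.

```python
def select_files(mtimes, mode, last_full, last_lower_level):
    """Return the files a backup run would copy.

    mtimes           -- mapping of file name to last-modified time
    last_full        -- time of the previous full backup
    last_lower_level -- time of the most recent lower-level backup
    """
    if mode == "full":
        return sorted(mtimes)                      # everything is copied
    if mode == "cumulative":
        since = last_lower_level                   # most recent lower-level backup
    elif mode == "differential":
        since = last_full                          # previous full backup
    else:
        raise ValueError(f"unknown backup mode: {mode}")
    return sorted(name for name, t in mtimes.items() if t > since)


files = {"a.cfg": 5, "b.log": 30, "c.dat": 50}
print(select_files(files, "full", last_full=10, last_lower_level=40))
print(select_files(files, "differential", last_full=10, last_lower_level=40))
print(select_files(files, "cumulative", last_full=10, last_lower_level=40))
```

The further back the reference point lies, the more files a run copies, and the fewer backup sets are needed at restoration time.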
Backup Mode and Policy
The ZTE U31 System B&R Solution provides the following backup policies:
Manual immediate on-line backup policy
After a large amount of data has been modified, the operator can manually trigger an immediate backup.
Scheduled on-line auto backup policy
The operator can set a schedule so that backups run automatically.
The ZTE B&R Solution supports on-line auto backup of the following kinds of data:
OS and application data
OMC system and application data
Oracle database
OS and Application Data Backup Policy
OS and application data refers to the SUN Solaris system and its application data.
OS and application data backup policy recommended by ZTE

Task | Backup Mode | Time Interval | Retention Period | Backup Size
Backup OS and APP data | Full backup | Immediately after installation | 1 month | 10GB
Backup OS and APP data | Full backup | Weekly | 1 month | 10GB per week
Total data tracked | (4*10+10) GB = 50GB

OS and application data backup time schedule recommended by ZTE

Day | SUN | MON | TUE | WED | THU | FRI | SAT
Time | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00
Task | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Full backup
U31 System and Application Data Backup Policy
U31 system and application data backup policy recommended by ZTE

Task | Backup Mode | Time Interval | Retention Period | Backup Size
Backup U31 system | Full backup | Immediately after installation | 1 month | 15GB
Backup U31 system | Full backup | Weekly | 1 month | 15GB
Backup APP data | Full backup | Weekly | 1 month | 100GB per week
Backup APP data | Cumulative Incremental Backup | Daily | 1 week | 10GB per day
Total data tracked | 15+4*15+4*100+6*10 = 535GB

OMC system and application data backup time schedule recommended by ZTE

Day | SUN | MON | TUE | WED | THU | FRI | SAT
Time | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00
Task | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Full backup
Database on-line Backup Policy
The database backup policy is implemented with Symantec NBU software, the Oracle RMAN module, and archive logs.
Database on-line backup policy recommended by ZTE

Task | Backup Mode | Time Interval | Retention Period | Backup Size per week (UMTS) | Backup Size per week (GSM)
Database online backup | Full week data backup | Weekly | 1 month | 5000Cell: 773GB; 10000Cell: 1437GB; 15000Cell: 2100GB | 5000Cell: 388GB; 10000Cell: 715GB; 15000Cell: 1042GB
Backup performance raw data online | Cumulative Incremental Backup | Daily | 1 week | 5000Cell: 404GB; 10000Cell: 699GB; 15000Cell: 994GB | 5000Cell: 148GB; 10000Cell: 235GB; 15000Cell: 322GB
Total data tracked | | | | 5000Cell: 5516GB; 10000Cell: 9942GB; 15000Cell: 14364GB | 5000Cell: 2440GB; 10000Cell: 4270GB; 15000Cell: 6100GB
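The totals in the table are consistent with keeping four weekly full backups (one month of retention) and six daily incremental backups (one week of retention) on tape at the same time. That reading is inferred from the figures rather than stated explicitly, so the Python sketch below should be taken as a check of the arithmetic under that assumption; the function and parameter names are illustrative.

```python
def database_tape_gb(full_gb, incremental_gb, fulls_retained=4, incrementals_retained=6):
    """Tape space needed when several full and incremental backups are retained together."""
    return fulls_retained * full_gb + incrementals_retained * incremental_gb


# (full backup size, incremental backup size) in GB, taken from the table above.
umts = {5000: (773, 404), 10000: (1437, 699), 15000: (2100, 994)}
gsm = {5000: (388, 148), 10000: (715, 235), 15000: (1042, 322)}

for name, sizes in (("UMTS", umts), ("GSM", gsm)):
    for cells, (full_gb, incremental_gb) in sizes.items():
        print(name, cells, database_tape_gb(full_gb, incremental_gb), "GB")
```

Changing the retention periods changes the two multipliers, which is the main lever for adjusting tape capacity.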
Database on-line Backup Policy
According to the size of the database, ZTE recommends backing up data between 0:00 and 6:00, performing a cumulative incremental backup from Monday to Saturday and a full week data backup on Sunday.
Database on-line backup time schedule recommended by ZTE

Day | SUN | MON | TUE | WED | THU | FRI | SAT
Time | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00 | 0:00~6:00
Task | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Cumulative Incremental Backup | Full week data backup
Total Size of Backup Tape
The following table gives the total backup size for OS and application data, U31 system and application data, and the database.
The data sizes are not mandatory; they depend mainly on the data backup policy, and the operator can adjust them.

Task | Backup Size (UMTS) | Backup Size (GSM)
OS and application data (A) | 50GB | 50GB
U31 system and application data (B) | 535GB | 535GB
Database hot backup (C) | 5000Cell: 5516GB; 10000Cell: 9942GB; 15000Cell: 14364GB | 5000Cell: 2440GB; 10000Cell: 4270GB; 15000Cell: 6100GB
Total data tracked (A+B+C) | 5000Cell: 6101GB; 10000Cell: 10527GB; 15000Cell: 14949GB | 5000Cell: 3025GB; 10000Cell: 4855GB; 15000Cell: 6685GB
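The A+B+C totals can be reproduced from the three policies. The Python sketch below takes the A and B expressions from the policy tables and the database figures (C) from the table above; the constant and function names are illustrative.

```python
# (A) OS and application data: four weekly fulls plus the post-installation full, 10 GB each.
OS_APP_GB = 4 * 10 + 10

# (B) U31 system and application data, as in the U31 policy table.
U31_GB = 15 + 4 * 15 + 4 * 100 + 6 * 10

# (C) Database hot backup totals in GB, keyed by network type and network size.
DATABASE_GB = {
    "UMTS": {5000: 5516, 10000: 9942, 15000: 14364},
    "GSM": {5000: 2440, 10000: 4270, 15000: 6100},
}


def total_tape_gb(network: str, size: int) -> int:
    """Total backup tape capacity (A + B + C) for one network configuration."""
    return OS_APP_GB + U31_GB + DATABASE_GB[network][size]


for network, sizes in DATABASE_GB.items():
    for size in sizes:
        print(network, size, total_tape_gb(network, size), "GB")
```

The database share dominates in every configuration, so tape capacity scales almost directly with network size.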
Transfer rate of backup data
The data transfer rate for backup depends on the network bandwidth and the tape library I/O. In LTO4 mode the tape library I/O is 300GB/H; with dual tape drives, the rate is up to 600GB/H. In a 1000M LAN, the network bandwidth is 1000M, which gives a transfer rate of 315GB/H. The network is therefore the bottleneck, and the typical data transfer rate of the backup and restoration system is 315GB/H.
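The effective rate is simply the slower of the two stages. A minimal Python sketch of that bottleneck rule, using the figures quoted above; the function and parameter names are illustrative.

```python
def effective_rate_gb_per_hour(drive_rate: float, drives: int, network_rate: float) -> float:
    """The backup path runs at the speed of its slowest stage:
    either the combined tape drive I/O or the network."""
    return min(drive_rate * drives, network_rate)


print(effective_rate_gb_per_hour(300, 2, 315))   # dual drives: the network limits the rate
print(effective_rate_gb_per_hour(300, 1, 315))   # single drive: the tape drive limits the rate
```

With dual drives, adding further tape drives would not speed up backups unless the network path were also upgraded.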
Whole System Restoration Policy
When the U31 server and disk array break down, the whole system needs to be restored. Whole-system restoration supports:
Restoring the OS and application data
Restoring the U31 software and application data
Restoring the Oracle software and application data
Restoring the Oracle database
Because the performance raw data and performance hourly data are very large, restoring them at a 315GB/H data transfer rate takes a very long time. ZTE therefore recommends restoring first the OS and application data, the U31 software and application data, the Oracle software and application data, and the main database, excluding performance raw data and performance hourly data.
Total data size of restoration

Task | Backup Data Size (UMTS) | Backup Data Size (GSM)
Total data tracked | 5000Cell: 1370GB; 10000Cell: 2631GB; 15000Cell: 3892GB | 5000Cell: 726GB; 10000Cell: 1391GB; 15000Cell: 2056GB

For example, for a UMTS network of up to 5000 cells, 1370GB of data needs to be restored and the data transfer rate is 315GB/H, so the restore time can be calculated as:
RTO = 1370GB / 315GB/H = 4.35H
Taking into account the restoration of the OS, the U31 software, and the Oracle software, restoring the whole system takes about 6 hours.
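The same calculation applies to any data size in the restoration tables. A small Python sketch, assuming the 315GB/H transfer rate quoted for this system; the function name is illustrative.

```python
def restore_hours(data_gb: float, rate_gb_per_hour: float = 315.0) -> float:
    """Time to transfer data_gb from tape at the given rate, in hours."""
    return data_gb / rate_gb_per_hour


print(f"{restore_hours(1370):.2f} H")        # UMTS 5000 cells, whole-system data
print(f"{restore_hours(1437):.2f} H")        # UMTS 10000 cells, full database backup
print(f"{restore_hours(115) * 60:.0f} min")  # U31 system and application data
```

These figures cover data transfer only; software installation and verification come on top, which is why the whole-system estimate is higher than the transfer time alone.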
OS and Application Restoration Policy
System restoration includes full system restoration and partial data restoration.
Data size of OS and application for restoration

Task | Backup Data Size (UMTS) | Backup Data Size (GSM)
OS and application data backup policy | 10GB | 10GB

When some OS data is lost, the partial data restoration policy can be used.
When the whole OS breaks down, Symantec NetBackup software can be used to restore the full system. The whole OS data size is nearly 10GB, and the restoration time is about 15~30 minutes.
U31 System and Application Data Restoration Policy
Application file restoration includes full application file restoration and partial application file restoration.
Partial application file restoration
When some application files are lost, the user can use Symantec NetBackup software to restore the lost application files.
Full application file restoration
When all application files are lost, the user can use Symantec NetBackup software to restore the full set of application files.
Data size of U31 system and application

Task | Backup Data Size (UMTS) | Backup Data Size (GSM)
U31 system and application data backup policy | 115GB | 115GB

The whole OMC system and application data size is nearly 115GB, and the restoration time is about 15~30 minutes.
Oracle Database Restoration
Oracle database restoration includes full database restoration and partial database restoration.
Partial database restoration
When some database files are lost, the user can use Symantec NetBackup software to restore the lost files.
Full database restoration
When the database collapses, the user can use Symantec NetBackup software to recover the full database.

Task | Backup Data Size (UMTS) | Backup Data Size (GSM)
Database hot backup policy | 5000Cell: 773GB; 10000Cell: 1437GB; 15000Cell: 2100GB | 5000Cell: 388GB; 10000Cell: 715GB; 15000Cell: 1042GB

For example, in the UMTS 10000 Cell case, 1437GB of data needs to be restored and the data transfer rate is 315GB/H, so the restore time can be calculated as:
RTO = 1437GB / 315GB/H = 4.56H
B&R Solution Advantages and Disadvantages
Advantages of ZTE B&R solution:
OS backup is supported, so all system and application software can be restored quickly.
On-line auto backup of the system, application data, and Oracle database is supported.
Restoring application data or the Oracle database from an earlier recovery point is supported.
Disadvantages of ZTE B&R solution:
The HA function is not supported.
The DR function is not supported.
