[Figure: Compute, Storage, and Network layers hosting Web, App, and DB tiers, connected over WAN / Internet. Panel labels: "Deployment/Recovery: Manual, Complex, Error Prone" (once) and "Deployment/Recovery: Automated and Reliable" (twice).]
[Figure: Protected and Recovery sites, each reached over WAN / Internet and each with its own Network Fabric. Both sites carry the same three VM subnets (10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24, three VMs per subnet) and are linked by a Data Center Interconnect (DCI).]
[Figure: Recovery management split between two roles. Network Admin: Networking and Security Policies, Recovery Plan, Scripts/APIs/Tools. Compute/Virtualization Admin: Compute and Storage Recovery Policies, Recovery Plan, Scripts/APIs/Tools for Web, App, DB, Compute, and Storage. Callouts: "Recovery Management ?" and "Huh!"]
[Figure: Two layouts. Stretched Storage: vCenter, vSphere Cluster, Site A (Active), Site B (Active). SRM with Replicated Storage: vCenter, SRM, vSphere Cluster, Site A (Active), Site B (Passive).]
Planned Maintenance
Planned maintenance of one site without any service downtime
Ability to migrate
Disaster Avoidance
Prevent service outages before an impending disaster (e.g. hurricane, rising flood levels)
Automated Recovery
Automated initiation of VM restart or recovery
Stretched Volumes
[Figure: Stretched Volumes Across Sites. Storage Controllers at each site sit in front of Backend Arrays at Site 1 and Site 2, joined by a Stretched Network.]
Examples: EMC VPLEX, IBM SVC, NetApp MetroCluster, etc.
[Figure: Two sites, each with its own N-S Connectivity, running Web, App, and DB VMs on Stretched Storage. Inter-site link labeled "<10ms".]
Unplanned Failover
Recover from unexpected site failure (full or partial)
Workflow driven
High degree of confidence if regular test failovers have been performed
Preventative Failover
Anticipate potential datacenter outages
Initiate preventative failover for smooth migration of services
Graceful shutdown of services to be migrated, zero data loss
Planned Migration
Planned datacenter maintenance
[Figure: SRM and vCenter instances at Site A and Site B.]
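[Figure: Protected Site and Recovery Site forming an SRM Pair (VC+SRM at each site), with Replicated/Sync storage between them. Web, App, and DB VMs at the Protected site and their Recovery counterparts form a Replicated Active-Standby Application Pair.]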
[Figure: DR between a Primary Site and a Recovery Site over the Physical Network Infrastructure (subnets 10.0.10/24 and 10.0.20/24, a SAN at each site). Step 1&2 (e.g. VMware SRM): Snapshot VM; Replicate VM & Storage. Step 3: Recover the VM. Step 4: Change IP Address (10.0.10.21, 10.0.20.21) and Reconfig Security. Callout: Major RTO Impact.]
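The address change in step 4 can be sketched in a few lines. This is only an illustration, not how SRM performs IP customization: the helper name `re_ip` is hypothetical, the slide's shorthand subnets are written out as 10.0.10.0/24 and 10.0.20.0/24, and the sketch assumes that 10.0.10.0/24 is the primary-site network and that the VM keeps its host offset when it moves.

```python
import ipaddress


def re_ip(address: str, source_net: str, target_net: str) -> str:
    """Map an address from the source subnet to the same host offset
    in the target subnet (hypothetical helper, for illustration only)."""
    src = ipaddress.ip_network(source_net)
    dst = ipaddress.ip_network(target_net)
    addr = ipaddress.ip_address(address)

    if addr not in src:
        raise ValueError(f"{address} is not in {source_net}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and target subnets must be the same size")

    # Keep the host part, swap the network part.
    offset = int(addr) - int(src.network_address)
    return str(dst.network_address + offset)


# Assumption: 10.0.10.0/24 is the primary site, 10.0.20.0/24 the recovery site.
recovered_ip = re_ip("10.0.10.21", "10.0.10.0/24", "10.0.20.0/24")
print(recovered_ip)
```

Every recovered VM needs this kind of remapping, plus matching changes to security rules that reference the old address, which is why the slide flags the step as a major contributor to RTO.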
[Figure: DR with virtual networking between a Primary Site and a Recovery Site. An NSX Controller at each site presents the same Virtual Network (10.0.30/24) on top of the Physical Network Infrastructure (10.0.10/24 and 10.0.20/24, a SAN at each site). The VM address 10.0.30.21 appears at both sites. Labels: Step 1&2 (e.g. VMware SRM); 2a; 2b; Snapshot VM; Snapshot Network & Security; Replicate VM & Storage; 3 Recover the VM. Callout: 80% RTO.]
[Figure: Two layouts. First layout: vCenter and a vSphere Cluster at Site A (Active) and at Site B (Active). SRM layout: vCenter, SRM, and a vSphere Cluster at Site A (Active) and at Site B (Active), with Replicated Storage.]
Dedicated Sites for Prod & DR
Bi-directional Failover
Active-Active data centers
[Figure: Production and Recovery roles at Site 1 and Site 2 for each topology.]
Legacy DR scenario; expensive dedicated resources
Leverage recovery infrastructure for test, development, training
Production applications at both sites
Multi-site topologies with three or more sites are not shown here
Operational Watchdogs
Availability-specific alarms, alerts, and events
Configuration validation on the fly
No Orchestration or Testability
Stretched clusters lack a repeatable, testable procedure to handle unplanned failures
HA restarts VMs according to the VM restart order but doesn't give you granular control over VM dependencies or customization
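To make the point about VM dependencies concrete, here is a minimal sketch of dependency-driven ordering. It is not how HA or SRM is implemented: the tier names and the `recovery_order` helper are hypothetical, and the sketch simply assumes that each VM lists the VMs that must be running before it starts.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def recovery_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a start order in which every VM comes after the VMs
    it depends on (hypothetical helper, for illustration only)."""
    return list(TopologicalSorter(dependencies).static_order())


# Assumed three-tier application: web needs app, app needs db.
deps = {
    "web": {"app"},
    "app": {"db"},
    "db": set(),
}

order = recovery_order(deps)
print(order)
```

A flat restart order can encode one such sequence, but an explicit dependency graph also detects cycles (`graphlib.CycleError`) and lets independent VMs be started in parallel.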
Description
Live Migration (vMotion) of workloads across sites and vCenter instances for planned failover
Full orchestration of VM movement (vCenter and solution configuration, storage, and live state)
Combined with DR orchestration to enable recovery of failed VMs in the event of site failure
Benefits
Reuse the same orchestrated Recovery Plan for unplanned failures and Continuous Availability
Integration with Stretched Storage enables very low RTO for unplanned failures and easier load balancing across sites
Non-disruptive test for unplanned failures
[Figure: SRM and vCenter at each site over Stretched Networks. The Site 1 vSphere cluster and its peer cluster run ESXi hosts on Stretched Storage: Volume A at Site 1 (Full R/W access) and Volume A at Site 2 (Full R/W access).]
[Figure: The same two-site layout (SRM, vCenter, ESXi hosts, Stretched Networks, Stretched Storage with Volume A at Site 1 and Volume A at Site 2, both Full R/W access). Callout: HA handles local host failures.]
[Figure: The same two-site layout (SRM, vCenter, Site 1 vSphere cluster, ESXi hosts, Stretched Networks, Stretched Storage with Volume A at Site 1 and Volume A at Site 2, both Full R/W access).]
Execute SRM Planned Migration with vMotion BEFORE disaster
SRM invokes vMotion as per VM priority and dependencies in Recovery Plan
[Figure: The same two-site layout (SRM, ESXi hosts, Stretched Networks, Stretched Storage with Volume A at Site 1 and Volume A at Site 2, both Full R/W access). Callout: Can be automatically triggered by an external system.]
Key Takeaways
Roadmap
You don't have to trade off testability and repeatability when you choose an Active-Active model
SRM + Live Migration is a game changer in IT operations
Backup Slides
Deep Dive on SRM 6.1 & NSX 6.2
Storage
[Figure: Site A and Site B, each with a vCenter Server, an NSX Edge GW, and a Control VM w/ Local Egress. The Edge GWs attach to Uplink Net A and Uplink Net B. A Universal Distributed Logical Router and Universal Logical Switch A span both sites and connect VM1, VM2, and VM3. Labels: Locale ID: NSX-A and Locale ID: NSX-B.]
NSX 6.2 introduces the concept of a Locale ID for routes sent to the NSX Controller. By default, this value is set to the NSX Manager UUID.
If Local Egress is not enabled on the UDLR, the Locale ID value is ignored.
When Local Egress is enabled, the NSX Controller sends routes only to ESXi hosts with a matching Locale ID.
Using a site-specific uplink, each site can have a local routing configuration. This allows NSX 6.2 to support up to 8 sites with local egress.
The Locale ID can also be set at the per-UDLR, per-cluster, or per-host level if the same NSX Manager is used across multiple sites.
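The Locale ID rule above amounts to a filter on route distribution. The sketch below models only that rule as stated on the slide. It is not NSX code or an NSX API: the `Route` and `Host` types, the `routes_for_host` function, the host names, and the next-hop addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    locale_id: str  # Locale ID attached to the route


@dataclass(frozen=True)
class Host:
    name: str
    locale_id: str  # Locale ID of the ESXi host


def routes_for_host(routes: list[Route], host: Host, local_egress: bool) -> list[Route]:
    """Routes a host would receive under the rule described above."""
    if not local_egress:
        # Local Egress disabled: the Locale ID is ignored.
        return list(routes)
    # Local Egress enabled: only routes with a matching Locale ID.
    return [r for r in routes if r.locale_id == host.locale_id]


# Hypothetical default routes, one per site-specific uplink.
routes = [
    Route("0.0.0.0/0", "192.0.2.1", "NSX-A"),
    Route("0.0.0.0/0", "192.0.2.129", "NSX-B"),
]
host_a = Host("esxi-a1", "NSX-A")

print(routes_for_host(routes, host_a, local_egress=True))
```

With Local Egress enabled, a host at Site A learns only the Site A default route, so its north-south traffic leaves through the local uplink.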
[Figure: SRM with cross-site NSX. Site A and Site B form a vC+SRM Pair (vCenter and SRM at each site), with an Internal API link to NSX. Labels: Master NSX Manager; Universal Controller Cluster; Recovered on failure; ESG with Local Egress at Site A and Site B; Universal DFW; Universal Logical Switch; security group SG-Prod-01 at both sites. Primary VMs at the Protected site are paired by Replication with Placeholder VMs (SRM Protected); Non-SRM Protected VMs appear as Active-Active Pairs.]
[Figure: Protected and Recovery sites, each with SRM and a Site Local Router. Active N-S traffic passes through an NSX ESG with ECMP (advertising 10.1.1.0 reachability). A Universal DLR with a U-DLR LIF and a Universal Control VM at each site serves 10.1.1.0/24 on a Universal Logical Switch (the logical switch for protected workloads). The Application Tier (Web, App, DB) at each site is covered by U-DFW.]
[Figure: A Client reaches Datacenter A and Datacenter B over the Physical Network, with the Active Route marked. Each datacenter has a VLAN uplink (10.114.212.198/28 and 10.114.208.86/28), an ESG running OSPF (10.114.220.25), and a Transit Switch (10.114.220.26) leading to the DB, App, and Web Switches. Protected Site: Protected VMs; Logical Switches, Distributed Logical Routers, and DFW Rules; addresses 10.114.220.10/29 and 10.114.220.18/29. Recovery Site: Placeholder VMs; addresses 10.114.220.2/29, 10.114.220.10/29, and 10.114.220.18/29. Storage label: VMFS.]