
EMC VPLEX Metro Witness

Technology and High Availability

Version 2.0

• EMC VPLEX Witness


• VPLEX Metro High Availability
• Metro HA Deployment Scenarios

Jennifer Aspesi
Oliver Shorey
Copyright © 2010, 2011 EMC Corporation. All rights reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is
subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable
software license.

For the most up-to-date regulatory document for your product line, go to the Technical Documentation and
Advisories section on EMC Powerlink.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All other trademarks used herein are the property of their respective owners.

Part number H7113.2



Contents

Preface

Chapter 1 VPLEX Family and Use Case Overview


Introduction ....................................................................................... 16
VPLEX value overview .................................................................... 17
VPLEX product offerings ................................................................ 21
VPLEX Local, VPLEX Metro, VPLEX Geo ..............................21
Architecture highlights ..............................................................23
Metro High Availability design considerations............................ 27
Planned application mobility compared with disaster
restart ...........................................................................................28

Chapter 2 Hardware and Software


Introduction ....................................................................................... 32
VPLEX I/O ..................................................................................32
High-level VPLEX I/O discussion ...........................................32
Distributed coherent cache........................................................33
VPLEX family clustering architecture ....................................33
VPLEX single, dual, quad..........................................................35
VPLEX sizing tool.......................................................................36
Upgrade paths.............................................................................36
Hardware upgrades ...................................................................36
Software upgrades......................................................................37
VPLEX management interfaces ...................................................... 38
Web-based GUI ...........................................................................38
VPLEX CLI...................................................................................38
SNMP support for performance statistics...............................39
LDAP /AD support ...................................................................39


VPLEX Element Manager API.................................................. 39


Simplified storage management..................................................... 41
Management server user accounts................................................. 42
Management server software.......................................................... 43
Management console ................................................................. 43
Command line interface ............................................................ 45
System reporting......................................................................... 46
Director software .............................................................................. 47
Configuration overview................................................................... 48
Small configurations .................................................................. 48
Medium configurations ............................................................. 49
Large configurations .................................................................. 50
I/O implementation ......................................................................... 52
Cache coherence ......................................................................... 52
Meta-directory ............................................................................ 52
How a read is handled............................................................... 52
How a write is handled ............................................................. 53

Chapter 3 System and Component Integrity


Overview............................................................................................ 56
Cluster ................................................................................................ 57
Path redundancy through different ports ..................................... 58
Path redundancy through different directors............................... 59
Path redundancy through different engines................................. 60
Path redundancy through site distribution .................................. 61
Safety check ....................................................................................... 62

Chapter 4 Foundations of VPLEX High Availability


Foundations of VPLEX High Availability .................................... 64
Failure handling without VPLEX Witness (Static bias)............... 72

Chapter 5 Introduction to VPLEX Witness


VPLEX Witness overview and architecture .................................. 84
VPLEX Witness target solution, rules, and best practice ............ 87
VPLEX Witness failure semantics................................................... 89
CLI example outputs........................................................................ 95
VPLEX Witness – The importance of the third failure
domain ....................................................................................... 102


Chapter 6 Combining VPLEX High Availability and VPLEX Witness


Metro HA overview ........................................................................ 106
VPLEX Metro HA with Cross-Cluster Connect.......................... 107
VPLEX Metro HA without Cross-Cluster Connect.................... 116

Chapter 7 Conclusion
Conclusion........................................................................................ 124
Better protection from storage-related failures ....................125
Protection from a larger array of possible failures...............125
Greater overall resource utilization........................................126

Glossary



Figures

Title Page
1 Application and data mobility example ..................................................... 18
2 HA infrastructure example ........................................................................... 19
3 Distributed data collaboration example ..................................................... 20
4 VPLEX offerings ............................................................................................. 22
5 Architecture highlights.................................................................................. 24
6 VPLEX cluster example ................................................................................. 34
7 VPLEX Management Console ...................................................................... 44
8 Management Console welcome screen ....................................................... 45
9 VPLEX small configuration .......................................................................... 49
10 VPLEX medium configuration ..................................................................... 50
11 VPLEX large configuration ........................................................................... 51
12 Port redundancy............................................................................................. 58
13 Director redundancy...................................................................................... 59
14 Engine redundancy ........................................................................................ 60
15 Site redundancy.............................................................................................. 61
16 High level functional sites in communication............................................. 64
17 High level Site A failure ................................................................................ 65
18 High level Inter-site link failure ................................................................... 65
19 VPLEX active and functional between two sites ....................................... 66
20 VPLEX concept diagram with failure at Site A.......................................... 67
21 Correct resolution after volume failure at Site A....................................... 68
22 VPLEX active and functional between two sites ....................................... 69
23 Inter-site link failure and cluster partition ................................................. 70
24 Correct handling of cluster partition........................................................... 71
25 VPLEX static detach rule............................................................................... 73
26 Typical detach rule setup .............................................................................. 74
27 Non-preferred site failure ............................................................................. 75
28 Volume remains active at Cluster 1............................................................. 76
29 Typical detach rule setup before link failure ............................................. 77
30 Inter-site link failure and cluster partition ................................................. 78


31 Suspension after inter-site link failure and cluster partition ................... 79


32 Cluster 2 has bias............................................................................................ 80
33 Preferred site failure causes full Data Unavailability ............................... 81
34 High Level VPLEX Witness architecture.................................................... 85
35 High Level VPLEX Witness deployment .................................................. 86
36 Supported VPLEX versions for VPLEX Witness ....................................... 88
37 VPLEX Witness volume types and rule support....................................... 88
38 Typical VPLEX Witness configuration ....................................................... 89
39 VPLEX Witness and an inter-cluster link failure....................................... 90
40 VPLEX Witness and static bias after cluster partition .............................. 91
41 VPLEX Witness typical configuration for Cluster 2 detaches ................. 92
42 VPLEX Witness diagram showing Cluster 2 failure ................................. 93
43 VPLEX Witness with static bias override ................................................... 94
44 Possible dual failure cluster isolation scenarios ...................................... 101
45 Highly unlikely dual failure scenarios that require manual
intervention..................................................................................................... 102
46 Two further dual failure scenarios that would require manual
intervention..................................................................................................... 103
47 High-level diagram of a Metro HA Cross-Cluster Connect solution
for VMware ..................................................................................................... 107
48 Metro HA Cross-Cluster Connect diagram with failure domains ....... 109
49 Metro HA Cross-Cluster Connect diagram with disaster in zone A1.. 110
50 Metro HA Cross-Cluster Connect diagram with failure in zone A2.... 111
51 Metro HA Cross-Cluster Connect diagram with failure in zone A3
or B3 ................................................................................................................. 112
52 Metro HA Cross-Cluster Connect diagram with failure in zone C1 .... 113
53 Metro HA Cross-Cluster Connect diagram with intersite link
failure .............................................................................................................. 115
54 Metro HA Standard High-level diagram ................................................. 116
55 Metro HA high-level diagram with fault domains ................................. 117
56 Metro HA high-level diagram with failure in domain A2..................... 119
57 Metro HA high-level diagram with intersite failure.............................. 121



Tables

Title Page
1 Overview of VPLEX features and benefits .................................................. 25
2 Configurations at a glance ............................................................................. 35
3 Management server user accounts ............................................................... 42
4 Output from ls for brief VPLEX Witness status.......................................... 97
5 Output from ll command for brief VPLEX Witness component
status ..................................................................................................................98



Preface

This EMC Engineering TechBook describes and discusses how an implementation of VPLEX leads to a higher level of availability.
As part of an effort to improve and enhance the performance and capabilities
of its product lines, EMC periodically releases revisions of its hardware and
software. Therefore, some functions described in this document may not be
supported by all versions of the software or hardware currently in use. For
the most up-to-date information on product features, refer to your product
release notes. If a product does not function properly or does not function as
described in this document, please contact your EMC representative.

Audience This document is part of the EMC VPLEX family documentation set,
and is intended for use by storage and system administrators.
Readers of this document are expected to be familiar with the
following topics:
◆ Storage Area Networks
◆ Storage Virtualization Technologies
◆ EMC Symmetrix and CLARiiON Products

Related documentation Related documents include:
◆ EMC VPLEX Architecture Guide
◆ EMC VPLEX Installation and Setup Guide
◆ EMC VPLEX Site Preparation Guide
◆ Implementation and Planning Best Practices for EMC VPLEX
Technical Notes


◆ Using VMware Virtualization Platforms with EMC VPLEX - Best


Practices Planning
◆ VMware KB: Using VPLEX Metro with VMware HA
This document is divided into the following chapters:
◆ Chapter 1, “VPLEX Family and Use Case Overview,”
summarizes the VPLEX family. It also covers some of the key
features of the VPLEX family system, architecture and use cases.
◆ Chapter 2, “Hardware and Software,” summarizes hardware,
software, and network components of the VPLEX system. It also
highlights the software interfaces that can be used by an
administrator to manage all aspects of a VPLEX system.
◆ Chapter 3, “System and Component Integrity,” summarizes how
VPLEX clusters are able to handle hardware failures in any
subsystem within the storage cluster.
◆ Chapter 4, “Foundations of VPLEX High Availability,”
summarizes the industry-wide dilemma of building absolute HA
environments and how VPLEX Metro functionality addresses this
historical challenge.
◆ Chapter 5, “Introduction to VPLEX Witness,” explains how
VPLEX functionality can provide the absolute HA capability, by
introducing a “Witness” to the inter-cluster environment.
◆ Chapter 6, “Combining VPLEX High Availability and VPLEX
Witness,” takes a practical approach, using VMware combined with
VPLEX and VPLEX Witness as an example, to show how these
features together create “absolute” HA integrity.
◆ Chapter 7, “Conclusion,” provides a summary of benefits using
VPLEX technology as related to VPLEX Witness and High
Availability.

Authors This TechBook was authored by the following individuals from the
Enterprise Storage Division, VPLEX Business Unit based at EMC
Headquarters, Hopkinton, MA.
Jennifer Aspesi has over 10 years of work experience with EMC in
Storage Area Networks (SAN), Wide Area Networks (WAN), and
Network and Storage Security technologies. Jen currently manages
the Corporate Systems Engineer team for the VPLEX Business Unit.
She earned her M.S. in Marketing and Technological Innovation from
Worcester Polytechnic Institute, Massachusetts.


Oliver Shorey has over 11 years of working within the Business


Continuity arena, seven of which have been with EMC engineering,
designing and documenting high-end replication and
geographically-dispersed clustering technologies. He is currently a
Principal Corporate Systems Engineer in the VPLEX Business Unit.

Additional contributors to this book include:
Colin Durocher has 8 years of experience developing software for
the EMC VPLEX product and its predecessor, testing
it, and helping customers implement it. He is currently working on
the product management team for the VPLEX business unit. He has a
B.S. in Computer Engineering from the University of Alberta and is
currently pursuing an MBA from the John Molson School of Business.
Gene Ortenberg has more than 15 years of experience in building
fault-tolerant distributed systems and applications. For the past 8
years he has been designing and developing highly-available storage
virtualization solutions at EMC. He currently holds the position of
Software Architect for the VPLEX Business Unit under the EMC
Enterprise Storage Division.
Fernanda Torres has over 10 years of Marketing experience in the
Consumer Products industry, most recently in consumer electronics.
Fernanda is the Product Marketing Manager for VPLEX under the
EMC Enterprise Storage Division. She has an undergraduate degree
from the University of Notre Dame and a bilingual degree
(English/Spanish) from IESE in Barcelona, Spain.

Typographical conventions EMC uses the following type style conventions in this document:
Normal Used in running (nonprocedural) text for:
• Names of interface elements (such as names of windows, dialog
boxes, buttons, fields, and menus)
• Names of resources, attributes, pools, Boolean expressions,
buttons, DQL statements, keywords, clauses, environment
variables, functions, utilities
• URLs, pathnames, filenames, directory names, computer
names, filenames, links, groups, service keys, file systems,
notifications
Bold Used in running (nonprocedural) text for:
• Names of commands, daemons, options, programs, processes,
services, applications, utilities, kernels, notifications, system
calls, man pages


Bold (cont.) Used in procedures for:


• Names of interface elements (such as names of windows, dialog
boxes, buttons, fields, and menus)
• What user specifically selects, clicks, presses, or types
Italic Used in all text (including procedures) for:
• Full titles of publications referenced in text
• Emphasis (for example a new term)
• Variables
Courier Used for:
• System output, such as an error message or script
• URLs, complete paths, filenames, prompts, and syntax when
shown outside of running text
Courier bold Used for:
• Specific user input (such as commands)
Courier italic Used in procedures for:
• Variables on command line
• User input variables
<> Angle brackets enclose parameter or variable values supplied by
the user
[] Square brackets enclose optional values

| Vertical bar indicates alternate selections - the bar means “or”

{} Braces indicate content that you must specify (that is, x or y or z)

... Ellipses indicate nonessential information omitted from the example

We'd like to hear from you!


Your feedback on our TechBooks is important to us! We want our
books to be as helpful and relevant as possible, so please feel free to
send us your comments, opinions and thoughts on this or any other
TechBook:
TechBooks@emc.com



1
VPLEX Family and Use
Case Overview

This chapter provides a brief summary of the main use cases for the
EMC VPLEX family and design considerations for High Availability.
It also covers some of the key features of the VPLEX family system.
Topics include:
◆ Introduction ........................................................................................ 16
◆ VPLEX value overview ..................................................................... 17
◆ VPLEX product offerings ................................................................. 21
◆ Metro High Availability design considerations............................. 27


Introduction
The purpose of this TechBook is to introduce EMC® VPLEX™ High
Availability and VPLEX Witness as they are typically architected by
customer storage administrators and EMC Solutions Architects. When
properly designed into a VPLEX Metro environment, VPLEX Witness
provides customers with “absolute” physical and logical fabric and
cache-coherent redundancy.
This guide is designed to provide an overview of the features and
functionality associated with the VPLEX Metro configuration and the
importance of Active/Active data resiliency for today’s advanced
host applications.


VPLEX value overview


At the highest level, VPLEX has unique capabilities that storage
administrators value and seek in order to enhance their existing data
centers. It delivers distributed, dynamic, and smart functionality into
existing or new data centers to provide storage virtualization across
geographical boundaries.
◆ VPLEX is distributed, because it provides a single interface for
multi-vendor storage and delivers dynamic data mobility: the ability
to move applications and data in real time, with no outage required.
◆ VPLEX is dynamic, because it provides data availability and
flexibility, maintaining business operations through failures that
traditionally required outages or manual restore procedures.
◆ VPLEX is smart, because its unique AccessAnywhere technology
can present and keep the same data consistent within and
between sites and enable distributed data collaboration.
Because of these capabilities, VPLEX delivers unique and
differentiated value to address three distinct requirements within our
target customers’ IT environments:
◆ The ability to dynamically move applications and data across
different compute and storage installations, be they within the
same data center, across a campus, within a geographical region
– and now, with VPLEX Geo, across even greater distances.
◆ The ability to create a high-availability storage and compute
infrastructure across these same varied geographies with
unmatched resiliency.
◆ The ability to provide efficient real-time data collaboration over
distance for such “big data” applications as video,
geographic/oceanographic research, and more.
EMC VPLEX technology is a scalable, distributed-storage federation
solution that provides non-disruptive, heterogeneous data movement
and volume management functionality.
Insert VPLEX technology between hosts and storage in a storage area
network (SAN), and data can be extended over distance within,
between, and across data centers.


The VPLEX architecture provides a highly available solution suitable


for many deployment strategies including:
◆ Application and Data Mobility — The movement of virtual
machines (VM) without downtime. An example is shown in
Figure 1.
• Storage administrators have the ability to automatically
balance loads through VPLEX, using storage and compute
resources from either cluster’s location. When combined with
server virtualization, VPLEX allows users to transparently
move and relocate Virtual Machines and their corresponding
applications and data over distance. This provides a unique
capability allowing users to relocate, share and balance
infrastructure resources between sites, which can be within a
campus or between data centers, up to 10 ms apart with
VPLEX Metro, or further apart (50ms RTT) across
asynchronous distances with VPLEX Geo.

Figure 1 Application and data mobility example

◆ HA Infrastructure — Reduces recovery time objective (RTO). An


example is shown in Figure 2.
• High Availability is a term that several products will claim
they can deliver. Ultimately, a High Availability solution is
supposed to protect against a failure and keep an application
online. Storage administrators plan around HA to provide
near continuous uptime for their critical applications, and


automate the restart of an application once a failure has


occurred, with as little human intervention as possible. With
conventional solutions, customers typically have to choose a
Recovery Point Objective and a Recovery Time Objective. But
even while some solutions offer small RTOs and RPOs, there
can still be downtime, and for most customers, any downtime
at all can be costly.

Figure 2 HA infrastructure example

◆ Distributed Data Collaboration — Increases utilization of


passive disaster recovery (DR) assets and provides simultaneous
access to data. An example is shown in Figure 3 on page 20.
• This is when a workforce has multiple users at different sites
that need to work on the same data, and maintain consistency
in the dataset when changes are made. Use cases include
co-development of software where the development happens
across different teams from separate locations, and
collaborative workflows such as engineering, graphic arts,
videos, educational programs, designs, research reports, and
so forth.
• When customers have tried to build collaboration across
distance with the traditional solutions, they normally have to
save the entire file at one location and then send it to another
site using FTP. This is slow, can incur heavy bandwidth costs


for large files, or even small files that move regularly, and
negatively impacts productivity because the other sites can sit
idle while they wait to receive the latest data from another site.
If teams decide to do their own work independent of each
other, then the dataset quickly becomes inconsistent, as
multiple people are working on it at the same time and are
unaware of each other’s most recent changes. Bringing all of
the changes together in the end is time-consuming, costly, and
grows more complicated as the data-set gets larger.

Figure 3 Distributed data collaboration example


VPLEX product offerings


VPLEX first meets high-availability and data mobility requirements
and then scales up to the I/O throughput required for the front-end
applications and back-end storage.
High-availability and data mobility features are characteristics of
VPLEX Local, VPLEX Metro, and VPLEX Geo.
A VPLEX cluster consists of one, two, or four engines (each
containing two directors), and a management server. A dual-engine
or quad-engine cluster also contains a pair of Fibre Channel switches
for communication between directors.
Each engine is protected by a standby power supply (SPS), and each
Fibre Channel switch gets its power through an uninterruptible
power supply (UPS). (In a dual-engine or quad-engine cluster, the
management server also gets power from a UPS.)
The management server has a public Ethernet port, which provides
cluster management services when connected to the customer
network.

VPLEX Local, VPLEX Metro, VPLEX Geo


EMC offers VPLEX in three configurations to address customer needs
for high-availability and data mobility:
◆ VPLEX Local
◆ VPLEX Metro
◆ VPLEX Geo
Figure 4 on page 22 provides an example of each.


Figure 4 VPLEX offerings

VPLEX Local
VPLEX Local provides seamless, non-disruptive data mobility and the
ability to manage multiple heterogeneous arrays from a single
interface within a data center.
VPLEX Local allows increased availability, simplified
management, and improved utilization across multiple arrays.

VPLEX Metro with AccessAnywhere


VPLEX Metro with AccessAnywhere enables active-active, block-level
access to data between two sites within synchronous distances.
The distance is limited by what synchronous behavior can
withstand, as well as by host application stability and MAN traffic.
Depending on the application, it is recommended that Metro latency
be less than or equal to 5 ms¹ RTT.
The combination of virtual storage with VPLEX Metro and virtual
servers enables the transparent movement of virtual machines and
storage across distance. This technology provides improved
utilization across heterogeneous arrays and multiple sites.

1. Refer to VPLEX and vendor-specific White Papers for confirmation of


latency limitations.


VPLEX Geo with AccessAnywhere


VPLEX Geo with AccessAnywhere enables active-active, block-level
access to data between two sites at asynchronous distances.
VPLEX Geo enables more cost-effective use of resources and power.
Geo provides the same distributed device flexibility as Metro but
extends the distance up to 50 ms RTT. As with any asynchronous
transport medium, bandwidth and application sharing on the link are
also important considerations for optimal behavior.
For the purposes of this TechBook, the focus is on the Metro
configuration only. VPLEX Witness is supported with VPLEX Geo,
but that is beyond the scope of this TechBook.

Architecture highlights
VPLEX support is open and heterogeneous, supporting both EMC
storage and common arrays from other storage vendors, such as
HDS, HP, and IBM. VPLEX conforms to established World Wide
Name (WWN) guidelines that can be used for zoning.
VPLEX supports operating systems including both physical and
virtual server environments with VMware ESX and Microsoft
Hyper-V. VPLEX supports network fabrics from Brocade and Cisco
including legacy McData SANs.
An example of the architecture is shown in Figure 5 on page 24.


Figure 5 Architecture highlights


Table 1 lists an overview of VPLEX features along with the benefits.

Table 1 Overview of VPLEX features and benefits

Features Benefits

Mobility Move data and applications without impact on


users.

Resiliency Mirror across arrays without host impact, and


increase high availability for critical applications.

Distributed cache coherency Automate sharing, balancing, and failover of I/O


across the cluster and between clusters.

Advanced data caching Improve I/O performance and reduce storage array
contention.

Virtual Storage federation Achieve transparent mobility and access in a data


center and between data centers.

Scale-out cluster architecture Start small and grow larger with predictable service
levels.

For all VPLEX products, the appliance-based VPLEX technology:


◆ Presents storage area network (SAN) volumes from back-end
arrays to VPLEX engines
◆ Packages the SAN volumes into sets of VPLEX virtual volumes
with user-defined configuration and protection levels
◆ Presents virtual volumes to production hosts in the SAN via the
VPLEX front-end
◆ For VPLEX Metro and VPLEX Geo products, presents a global,
block-level directory for distributed cache and I/O between
VPLEX clusters.
Location and distance determine high-availability and data mobility
requirements. For example, if all storage arrays are in a single data
center, a VPLEX Local product federates back-end storage arrays
within the data center.
When back-end storage arrays span two data centers, the
AccessAnywhere feature in a VPLEX Metro or a VPLEX Geo product
federates storage in an active-active configuration between VPLEX
clusters. Choosing between VPLEX Metro or VPLEX Geo depends on
distance and data synchronicity requirements.


Application and back-end storage I/O throughput determine the


number of engines in each VPLEX cluster. High-availability features
within the VPLEX cluster allow for non-disruptive software upgrades
and expansion as I/O throughput increases.


Metro High Availability design considerations


VPLEX Metro 5.0 introduces High Availability concepts beyond what
is traditionally known as physical high availability. In the design of
the high availability environment, the introduction of the “Witness”
guards against failure scenarios by arbitrating the activity between
clusters in a multi-site architecture. EMC VPLEX is the first product
to bring to market the features and functionality provided by VPLEX
Witness.
Through this TechBook, Storage Administrators and customers gain
an easy to understand overview on the high availability solution that
provides them:
◆ Automatic load balancing between their data centers
◆ Active/Active use of both of their data centers
◆ High availability for their applications (no single points of storage
failure, auto-restart)
◆ Fully automatic failure handling
◆ Better resource utilization
◆ Lower CapEx and lower OpEx as a result
Broadly speaking, when one considers legacy environments, we
typically see “highly available” designs implemented within a data
center, and Disaster Recovery type functionality deployed between
data centers.
One of the main reasons for this is that within data centers
components generally operate in an Active/Active manner (or
Active/Passive with automatic failover), whereas between data
centers legacy replication technologies use Active/Passive techniques
that require manual failover to use the passive component.
When using VPLEX Metro Active/Active replication technology in
conjunction with new features such as the Witness server (as described in
“Introduction to VPLEX Witness” on page 83), the lines between local
High Availability and long-distance Disaster Recovery become somewhat
blurred, since HA can be stretched beyond the data center walls. Since
“replication” is a by-product of federated and distributed storage,
disaster avoidance is also achievable within these geographically
dispersed HA environments.


Planned application mobility compared with disaster restart


This section compares planned application mobility and disaster
restart.

Planned application mobility Conceptually, this is a planned event wherein an application can be
moved fully online (without disruption) from one location to another
(be it the same or a remote data center). Critically, this can only be
performed when all components that participate in the movement
are available and the running state of the application exists in volatile
memory.
An example of this online application mobility would be VMware
vMotion, where a virtual machine must be fully operational
before it can be moved. It may sound obvious, but if the VM were
offline then the movement could not be performed online (this is
important to understand and is the key difference from application
restart).
When vMotion is executed, all live components that are required to
make the VM function are copied elsewhere in the background before
cutting the VM over.
Since these types of mobility tasks are totally seamless to the user,
some of the associated use cases are disaster avoidance, where an
application or VM can be moved ahead of a disaster (such as a
hurricane or tsunami) while the running state is still available to be
copied, and load balancing across multiple systems or even data
centers.
Because the running state must be available for these types of
relocations, these movements are always deemed planned activities.
Disaster restart Disaster restart is where an application or service is re-started in


another location after a failure (be it on a different server or data
center) and will typically interrupt the service/application during the
failover.
A good example of this technology would be a VMware HA Cluster
configured over two geographically dispersed sites using VPLEX
Metro where a cluster will be formed over a number of ESX servers
and either single or multiple virtual machines can run on any of the
ESX servers within the cluster.


If for some reason an active ESX server were to fail (perhaps due to
a site failure), then the VM can be restarted on a remaining ESX server
within the cluster at the remote site, because the datastore where it
was running spans the two locations, being configured on a VPLEX
Metro distributed volume. This would be deemed an unplanned
failover, which incurs a small outage of the application: the running
state of the VM was lost when the ESX server failed, so the service is
unavailable until the VM has restarted elsewhere.
Although a planned application mobility event and an unplanned
disaster restart result in the same outcome (a service relocating
elsewhere), we can now see that there is a big difference: the planned
mobility job keeps the application online during the relocation,
whereas the disaster restart leaves the application offline during the
relocation while the restart is conducted.
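The distinction can be captured in a short conceptual sketch, shown below in Python. It is not VPLEX or VMware code; the class and function names are invented, and the only point it illustrates is that mobility copies live state while restart loses it.

# Conceptual model only: planned mobility needs the running (volatile) state
# to be available, while disaster restart loses that state and reboots the VM.

class VM:
    def __init__(self, name, site):
        self.name = name
        self.site = site
        self.running_state = None   # contents of volatile memory; None when off

    def power_on(self):
        self.running_state = "memory image of " + self.name

def planned_mobility(vm, target_site):
    """vMotion-style move: only possible while the VM and its state are live."""
    if vm.running_state is None:
        raise RuntimeError("VM is offline; an online move is not possible")
    state_copy = vm.running_state   # live state is copied in the background
    vm.site = target_site           # then the VM is cut over
    vm.running_state = state_copy
    return "no outage: the application stayed online during the move"

def disaster_restart(vm, surviving_site):
    """HA-style restart after a failure: the running state has been lost."""
    vm.running_state = None         # volatile memory died with the failed server
    vm.site = surviving_site        # the datastore spans both sites (distributed volume)
    vm.power_on()                   # the VM boots again, so there is a short outage
    return "small outage: service unavailable until the VM restarts"

vm = VM("app01", site="A")
vm.power_on()
print(planned_mobility(vm, target_site="B"))
print(disaster_restart(vm, surviving_site="A"))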
A prerequisite for a geographical cluster to perform disaster restart
is an Active/Active underlying replication solution (VPLEX Metro
only at the time of this publication). Using legacy Active/Passive
type solutions in these scenarios would typically require an extra
step over and above standard application failover, since a storage
failover would also be required. This is where VPLEX can assist
greatly: because it is Active/Active, in most cases no manual
intervention at the storage layer is required. The value of VPLEX
Witness, combined with following physically highly available and
redundant hardware connectivity best practices, will truly provide
customers with “absolute” availability!



2

Hardware and Software

This chapter provides insight into the hardware and software


interfaces that can be used by an administrator to manage all aspects
of a VPLEX system. In addition, a brief overview of the internal
system software is included. Topics include:
◆ Introduction ........................................................................................ 32
◆ VPLEX management interfaces........................................................ 38
◆ Simplified storage management ...................................................... 41
◆ Management server user accounts .................................................. 42
◆ Management server software ........................................................... 43
◆ Director software................................................................................ 47
◆ Configuration overview.................................................................... 48
◆ I/O implementation .......................................................................... 52


Introduction
This section provides basic information on the following:
◆ “VPLEX I/O” on page 32
◆ “High-level VPLEX I/O discussion” on page 32
◆ “Distributed coherent cache” on page 33
◆ “VPLEX family clustering architecture ” on page 33

VPLEX I/O
VPLEX is built on a lightweight protocol that maintains cache
coherency for storage I/O and the VPLEX cluster provides highly
available memory cache, processing power, front-end, and back-end
Fibre Channel interfaces.
EMC hardware powers the VPLEX cluster design so that all devices
are always available and I/O that enters the cluster from anywhere
can be serviced by any node within the cluster.
The AccessAnywhere feature in the VPLEX Metro and VPLEX Geo
products extends the cache coherency between data centers at a
distance.

High-level VPLEX I/O discussion


VPLEX abstracts a block-level ownership model into a high level
directory that is updated for every I/O and shared across all engines.
The directory uses a small amount of metadata and tells all other
engines in the cluster, in 4k block transmissions, which block of data
is owned by which engine and at what time.
After a write completes and ownership is reflected in the directory,
VPLEX dynamically manages read requests for the completed write
in the most efficient way possible.
When a read request arrives, VPLEX checks the directory for an
owner. After VPLEX locates the owner, the read request goes directly
to that engine.
On reads from other engines, VPLEX checks the directory and tries to
pull the read I/O directly from the engine cache to avoid going to the
physical arrays to satisfy the read.
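As a rough mental model of the lookup just described, the Python sketch below routes a read based on a per-block ownership table. This is a conceptual illustration only; the dictionary layout, names, and the fallback to back-end storage are simplifying assumptions, not VPLEX internals.

# Conceptual sketch of the ownership-directory read path described above.
# Not VPLEX code: data structures and names are invented for illustration.

# The directory maps a (volume, block) key to the engine that owns the block.
directory = {
    ("vol_A", 0x0010): "engine-1-1",
    ("vol_A", 0x0020): "engine-2-1",
}

def read_block(volume, block, local_engine):
    """Route a read to the owning engine's cache, or to the back-end array."""
    owner = directory.get((volume, block))
    if owner is None:
        # No engine owns the block in cache; satisfy the read from the array.
        return f"read {volume}:{block:#06x} from back-end storage"
    if owner == local_engine:
        return f"read {volume}:{block:#06x} from local cache on {owner}"
    # Another engine owns the block: pull it from that engine's cache
    # instead of going to the physical array.
    return f"read {volume}:{block:#06x} from remote cache on {owner}"

print(read_block("vol_A", 0x0010, local_engine="engine-1-1"))  # local cache hit
print(read_block("vol_A", 0x0020, local_engine="engine-1-1"))  # remote cache hit
print(read_block("vol_A", 0x0030, local_engine="engine-1-1"))  # back-end read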


This model enables VPLEX to stretch the cluster as VPLEX distributes


the directory between clusters and sites. VPLEX is efficient with
minimal overhead and enables I/O communication over distance.

Distributed coherent cache


The VPLEX engine includes two directors that have a total of 26 GB of
local cache. Cache pages are keyed by volume and go through a
lifecycle from staging, to visible, to draining.
The global cache is a combination of all director caches that spans all
clusters. The cache page holder information is maintained in an
in-memory data structure called a directory.
The directory is divided into chunks and distributed among the
VPLEX directors and locality controls where ownership is
maintained.
A meta-directory identifies which director owns which directory
chunks within the global directory.
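A minimal sketch of that two-level lookup follows, as an assumption-laden illustration rather than the actual GeoSynchrony design: the hash-based chunk placement and the director names are invented for the example.

# Illustrative model only: the global directory is split into chunks, each
# chunk owned by one director, and a meta-directory records that ownership.

NUM_CHUNKS = 8
directors = ["director-1-1-A", "director-1-1-B", "director-2-1-A", "director-2-1-B"]

# Meta-directory: which director owns which directory chunk.
meta_directory = {chunk: directors[chunk % len(directors)] for chunk in range(NUM_CHUNKS)}

def chunk_for(volume, block):
    """Pick the directory chunk responsible for a given cache page (assumed hashing)."""
    return hash((volume, block)) % NUM_CHUNKS

def directory_owner(volume, block):
    """Consult the meta-directory to find who holds the directory entry for this page."""
    return meta_directory[chunk_for(volume, block)]

# A director first consults the meta-directory, then asks the owning director
# for the actual page-holder (directory) entry.
print(directory_owner("vol_A", 0x0010))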

VPLEX family clustering architecture


The VPLEX family uses a unique clustering architecture to help
customers break the boundaries of the data center and allow servers
at multiple data centers to have read/write access to shared block
storage devices. A VPLEX cluster, as shown in Figure 6 on page 34,
can scale up through the addition of more engines, and scale out by
connecting clusters into an EMC VPLEX Metro-Plex™ (two VPLEX
Metro clusters connected within Metro distances).


Figure 6 VPLEX cluster example

VPLEX Metro transparently moves and shares workloads for a


variety of applications, VMs, databases and cluster file systems.
VPLEX Metro consolidates data centers, and optimizes resource
utilization across data centers. In addition, it provides non-disruptive
data mobility, heterogeneous storage management, and improved
application availability. VPLEX Metro supports up to two clusters,
which can be in the same data center, or at two different sites within
synchronous environments. Also, introduced with these solutions
architected by this TechBook, Geo cluster across distances achieves
the asynchronous partner to Metro. It is out of the scope of this
document to analyze Geo capabilities with VPLEX Witness.


VPLEX single, dual, quad


The VPLEX cluster supports 16000 storage volumes and 16000 virtual
volumes with UI responsiveness under 10 seconds for common
operations. The VPLEX cluster also supports 2000 initiators per
cluster.
The VPLEX engine provides cache and processing power with
redundant directors that each include two I/O modules per director
and one optional WAN COM I/O module for use in VPLEX Metro
and VPLEX Geo configurations.
The rackable hardware components are shipped in NEMA standard
racks or provided, as an option, as a field rackable product. Table 2
provides a list of configurations.

Table 2 Configurations at a glance

Single engine Dual engine Quad engine

Directors 2 4 8

Redundant Engine SPSs Yes Yes Yes

FE Fibre Channel ports 8 16 32

BE Fibre Channel ports 8 16 32

Cache size 72 GB 144 GB 288 GB

Management Servers 1 1 1

Internal Fibre Channel switches (Local Comm) None 2 2

Uninterruptible Power Supplies (UPSs) None 2 2

Single-engine VPLEX
◆ Two directors
◆ 32 Fibre Channel ports
◆ 64 GB cache
◆ I/O throughput characteristics


Dual-engine VPLEX
◆ Four directors
◆ 64 Fibre Channel ports
◆ 128 GB cache
◆ I/O throughput characteristics

Quad-engine VPLEX
◆ Eight directors
◆ 128 Fibre Channel ports
◆ 256 GB cache
◆ I/O throughput characteristics

VPLEX sizing tool


Use the EMC VPLEX sizing tool provided by EMC Global Services
Software Development to determine the right VPLEX cluster
configuration.
The sizing tool concentrates on the I/O throughput requirements of
installed applications (mail exchange, OLTP, data warehouse, video
streaming, etc.) and on back-end configuration such as virtual volumes,
the size and quantity of storage volumes, and initiators.

Upgrade paths
VPLEX facilitates application and storage upgrades without a service
window through its flexibility to shift production workloads
throughout the VPLEX technology.
In addition, high-availability features of the VPLEX cluster allow for
non-disruptive VPLEX hardware and software upgrades.
This flexibility means that VPLEX is always servicing I/O and never
has to be completely shut down.

Hardware upgrades
Upgrades are supported for single-engine VPLEX systems to dual- or
quad-engine systems.
Two VPLEX Local systems can be reconfigured to work as a VPLEX
Metro or VPLEX Geo.


Information for VPLEX hardware upgrades is in the Procedure


Generator that is available through EMC PowerLink.

Software upgrades
VPLEX features a robust non-disruptive upgrade (NDU) technology
to upgrade the software on VPLEX engines. Management server
software must be upgraded before running the NDU.
Due to the VPLEX distributed coherent cache, directors elsewhere in
the VPLEX installation service I/Os while the upgrade is taking
place. This alleviates the need for service windows and reduces RTO.
The NDU includes the following steps:
◆ Preparing the VPLEX system for the NDU
◆ Starting the NDU
◆ Transferring the I/O to an upgraded director
◆ Completing the NDU


VPLEX management interfaces


Within the VPLEX cluster, TCP/IP-based management traffic travels
through a private network subnet to the components in one or more
clusters. In VPLEX Metro and VPLEX Geo, VPLEX establishes a VPN
tunnel between the management servers of both clusters. The VPLEX
management station also extends to the VPLEX Witness via VPN
tunnel (3-ways) once it is implemented into an environment.

Web-based GUI
VPLEX includes a Web-based graphical user interface (GUI) for
management. The EMC VPLEX Management Console Help provides
more information on using this interface.
To perform other VPLEX operations that are not available in the GUI,
refer to the CLI, which supports full functionality. The EMC VPLEX
CLI Guide provides a comprehensive list of VPLEX commands and
detailed instructions on using those commands.
The EMC VPLEX Management Console provides the following
functions, among others:
◆ Supports storage array discovery and provisioning
◆ Local provisioning
◆ Distributed provisioning
◆ Mobility Central
◆ Online help

VPLEX CLI
VPlexcli is a command line interface (CLI) used to configure and operate
VPLEX systems. It also provides the EZ-Setup Wizard to make
installation of VPLEX easier and quicker.
The CLI is divided into command contexts. Some commands are
accessible from all contexts, and are referred to as ‘global commands’.
The remaining commands are arranged in a hierarchical context tree
that can only be executed from the appropriate location in the context
tree.


The VPlexcli encompasses all capabilities needed to manage the
system if the management console (GUI) is unavailable. It is fully
functional and comprehensive, supporting full configuration,
provisioning, and advanced systems management capabilities.

SNMP support for performance statistics


The VPLEX snmpv2c SNMP agent:
◆ Supports retrieval of performance-related statistics as published
in the VPLEX-MIB.mib.
◆ Runs on the management server and fetches performance related
data from individual directors using a firmware specific
interface.
◆ Provides SNMP MIB data for directors for the local cluster only.
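To give a sense of how the statistics listed above might be pulled, the sketch below shells out to the standard net-snmp snmpwalk utility against the agent on the management server. The host name, community string, and OID subtree are placeholder assumptions; the authoritative object identifiers are defined in the VPLEX-MIB.mib file mentioned above.

# Hedged example: walk performance statistics from the VPLEX SNMP agent.
# The host, community string, and OID below are placeholders for illustration.
import subprocess

MGMT_SERVER = "vplex-mgmt.example.com"   # public address of the management server
COMMUNITY = "public"                     # replace with the configured community string
OID_SUBTREE = "1.3.6.1.4.1.1139"         # EMC enterprise subtree; see VPLEX-MIB.mib

result = subprocess.run(
    ["snmpwalk", "-v2c", "-c", COMMUNITY, MGMT_SERVER, OID_SUBTREE],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.splitlines():
    print(line)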

LDAP/AD support


VPLEX offers Lightweight Directory Access Protocol (LDAP) or
Active Directory (AD) as an authentication directory service.

VPLEX Element Manager API


VPLEX Element Manager API uses the Representational State
Transfer (REST) software architecture for distributed systems such as
the World Wide Web. It allows software developers and other users to
use the API to create scripts to run VPLEX CLI commands.
The VPLEX Element Manager API supports all VPLEX CLI
commands that can be executed from the root context on a director.
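As an illustration, a monitoring script might call the API over HTTPS roughly as sketched below. The endpoint path, credential handling, and payload shape shown here are assumptions made for the example; the actual request format is defined in the VPLEX Element Manager API documentation.

# Hedged sketch: invoke a VPLEX CLI command through the REST-based Element
# Manager API. The URL path, auth style, and payload are illustrative
# assumptions, not the documented interface.
import requests

MGMT_SERVER = "https://vplex-mgmt.example.com"   # management server public IP
USERNAME, PASSWORD = "service", "changeme"       # placeholder credentials

response = requests.post(
    MGMT_SERVER + "/vplex/cluster-status",       # hypothetical endpoint for a CLI command
    auth=(USERNAME, PASSWORD),
    json={"args": ""},                           # hypothetical request body
    verify=False,                                # appliance typically uses a self-signed certificate
)
response.raise_for_status()
print(response.json())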
The system management software for VPLEX family systems consists
of the following high-level components:
◆ Command line utility
◆ Management console (web interface)
◆ Business layer
◆ Firmware layer
Each cluster in a VPLEX deployment requires one management
server, which is embedded in the VPLEX cabinet along with other
essential components, such as the directors and internal Fibre


Channel switches. The management server communicates through


private, redundant IP networks with each director. The management
server is the only VPLEX component that is configured with a public
IP address on the customer network.
The management server is accessed through a Secure Shell™ (SSH®).
Additionally, the administrator may run a VNC client to connect to the
management server. Within the SSH session the administrator can
run a CLI utility called VPlexcli to manage the system. Alternatively,
the VPLEX management console web interface (GUI) can be started
by pointing a browser at the management server’s public IP address.
The following processes run on the management server:
◆ System Management Server — Communicates with the
directors, retrieves logs by querying system state, supports
multiple concurrent CLI and HTTP sessions, listens to the system
events and determines which events are of interest for call home,
and interprets the call home list and initiates the call home.
◆ EmaAdapter — Collects events from VPLEX components and
sends them to ConnectEMC.
◆ ConnectEMC — Receives the formatted events and sends them
to EMC.com


Simplified storage management


VPLEX supports a variety of arrays from various vendors covering
both active/active and active/passive type arrays. VPLEX simplifies
storage management by allowing simple LUNs, provisioned from the
various arrays, to be managed through a centralized management
interface that is simple to use and very intuitive. In addition, a
Metro-Plex or Geo-Plex environment that spans data centers allows
the storage administrator to manage both locations through the one
interface from either location by logging in at the local site.


Management server user accounts


The management server requires the setup of user accounts for access
to certain tasks. Table 3 describes the types of user accounts on the
management server.

Table 3 Management server user accounts

Account type Purpose

admin (customer) • Performs administrative actions, such as user


management
• Creates and deletes Linux CLI accounts
• Resets passwords for all Linux CLI users
• Modifies the public Ethernet settings

service (EMC service) • Starts and stops necessary OS and VPLEX services
• Cannot modify user accounts
• (Customers do have access to this account)

Linux CLI accounts • Uses VPlexcli to manage federated storage

All account types • Uses VPlexcli


• Modifies their own password
• Can SSH or VNC into the management server
• Can SCP files off the management server from directories
to which they have access

Some service and administrator tasks require OS commands that


require root privileges. The management server has been configured
to use the sudo program to provide these root privileges just for the
duration of the command. Sudo is a secure and well-established
UNIX program for allowing users to run commands with root
privileges.
VPLEX documentation will indicate which commands must be
prefixed with "sudo" in order to acquire the necessary privileges. The
sudo command will ask for the user's password when it runs for the
first time, to ensure that the user knows the password for his account.
This prevents unauthorized users from executing these privileged
commands when they find an authenticated SSH login that was left
open.


Management server software


The management server software is installed during manufacturing
and is fully field upgradeable. The software includes:
◆ VPLEX Management Console
◆ VPlexcli
◆ Server Base Image Updates (when necessary)
◆ Call-home software

Management console
The VPLEX Management Console provides a graphical user interface
(GUI) to manage the VPLEX cluster. The GUI can be used to
provision storage, as well as manage and monitor system
performance.
Figure 7 on page 44 shows the VPLEX Management Console window
with the cluster tree expanded to show the objects that are
manageable from the front-end, back-end, and the federated storage.


Figure 7 VPLEX Management Console

The VPLEX Management Console provides online help for all of its
available functions. You can access online help in the following ways:
◆ Click the Help icon in the upper right corner on the main screen
to open the online help system, or in a specific screen to open a
topic specific to the current task.
◆ Click the Help button on the task bar to display a list of links to
additional VPLEX documentation and other sources of
information.


Figure 8 is the welcome screen of the VPLEX Management Console


GUI, which utilizes a secure HTTP (HTTPS) connection via a browser. The
interface uses Flash technology for rapid response and unique look
and feel.

Figure 8 Management Console welcome screen

Command line interface


The VPlexcli is a command line interface (CLI) for configuring and
running the VPLEX system, for setting up and monitoring the
system’s hardware and intersite links (including com/tcp), and for
configuring global inter-site I/O cost and link-failure recovery. The
CLI runs as a service on the VPLEX management server and is
accessible using Secure Shell (SSH).


For information about the VPlexcli, refer to the EMC VPLEX CLI
Guide.

System reporting
VPLEX system reporting software collects various configuration
information from each cluster and each engine. The resulting
configuration file (XML) is zipped and stored locally on the
management server or presented to the SYR system at EMC via call
home.
You can schedule a weekly job to automatically collect SYR data
(VPlexcli command scheduleSYR), or manually collect it whenever
needed (VPlexcli command syrcollect).
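For example, an on-demand collection can be started directly from the CLI prompt as shown below. Scheduling the recurring weekly job with scheduleSYR takes additional arguments that are described in the EMC VPLEX CLI Guide, so they are not shown here:

VPlexcli:/> syrcollect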


Director software
The director software provides:
◆ Basic Input/Output System (BIOS) — Provides low-level
hardware support to the operating system, and maintains boot
configuration.
◆ Power-On Self Test (POST) — Provides automated testing of
system hardware during power on.
◆ Linux — Provides basic operating system services to the VPLEX
software stack running on the directors.
◆ VPLEX Power and Environmental Monitoring (ZPEM) —
Provides monitoring and reporting of system hardware status.
◆ EMC Common Object Model (ECOM) — Provides management
logic and interfaces to the internal components of the system.
◆ Log server — Collates log messages from director processes and
sends them to the SMS.
◆ GeoSynchrony (I/O Stack) — Processes I/O from hosts, performs
all cache processing, replication, and virtualization logic,
interfaces with arrays for claiming and I/O.


Configuration overview
The VPLEX configurations are based on how many engines are in the cabinet. The basic configurations are small, medium, and large, as shown in the figures in this section.
The configuration sizes refer to the number of engines in the VPLEX
cabinet. The remainder of this section describes each configuration
size.

Small configurations
The VPLEX-02 (small) configuration includes the following:
◆ Two directors
◆ One engine
◆ Redundant engine SPSs
◆ 8 front-end Fibre Channel ports
◆ 8 back-end Fibre Channel ports
◆ One management server
The unused space between engine 1 and the management server in
Figure 9 on page 49 is intentional.


(Rack elevation showing the management server, Engine 1, and SPS 1.)

Figure 9 VPLEX small configuration

Medium configurations
The VPLEX-04 (medium) configuration includes the following:
◆ Four directors
◆ Two engines
◆ Redundant engine SPSs
◆ 16 front-end Fibre Channel ports
◆ 16 back-end Fibre Channel ports
◆ One management server
◆ Redundant Fibre Channel COM switches for local COM; UPS for
each Fibre Channel switch


Figure 10 shows an example of a medium configuration.

(Rack elevation showing Fibre Channel switches A and B with their UPS units, the management server, Engine 2 with SPS 2, and Engine 1 with SPS 1.)

Figure 10 VPLEX medium configuration

Large configurations
The VPLEX-08 (large) configuration includes the following:
◆ Eight directors
◆ Four engines
◆ Redundant engine SPSs
◆ 32 front-end Fibre Channel ports
◆ 32 back-end Fibre Channel ports
◆ One management server


◆ Redundant Fibre Channel COM switches for local COM; UPS for
each Fibre Channel switch
Figure 11 shows an example of a large configuration.

(Rack elevation showing Engines 4, 3, 2, and 1 with their SPS units, Fibre Channel switches A and B with their UPS units, and the management server.)

Figure 11 VPLEX large configuration


I/O implementation
The VPLEX cluster utilizes a write-through mode whereby all writes
are written through the cache to the back-end storage. Writes are
completed to the host only after they have been completed to the
back-end arrays, maintaining data integrity.
This section describes the VPLEX cluster caching layers, roles, and
interactions. It gives an overview of how reads and writes are
handled within the VPLEX cluster and how distributed cache
coherency works. This is important to the introduction of high
availability concepts.

Cache coherence
Cache coherence creates a consistent global view of a volume.
Distributed cache coherence is maintained using a directory. There is
one directory per user volume and each directory is split into chunks
(4096 directory entries within each). These chunks exist only if they
are populated. There is one directory entry per global cache page,
with responsibility for:
◆ Tracking page owner(s) and remembering the last writer
◆ Locking and queuing

Meta-directory
Directory chunks are managed by the meta-directory, which assigns
and remembers chunk ownership. These chunks can migrate using
Locality-Conscious Directory Migration (LCDM). This
meta-directory knowledge is cached across the share group for
efficiency.

How a read is handled


When a host makes a read request, VPLEX first searches its local
cache. If the data is found there, it is returned to the host.


If the data is not found in local cache, VPLEX searches global cache.
Global cache includes all directors that are connected to one another
within the VPLEX cluster. When the read is serviced from global
cache, a copy is also stored in the local cache of the director from
where the request originated.
If a read cannot be serviced from either local cache or global cache, it
is read directly from the back-end storage. In this case both the global
and local cache are updated to maintain cache coherency.

I/O flow of a read miss


1. Read request issued to virtual volume from host.
2. Look up in local cache of ingress director.
3. On miss, look up in global cache.
4. On miss, data read from storage volume into local cache.
5. Data returned from local cache to host.

I/O flow of a local read hit


1. Read request issued to virtual volume from host.
2. Look up in local cache of ingress director.
3. On hit, data returned from local cache to host.

I/O flow of a global read hit


1. Read request issued to virtual volume from host.
2. Look up in local cache of ingress director.
3. On miss, look up in global cache.
4. On hit, data read from owner director into local cache.
5. Data returned from local cache to host.

How a write is handled


All writes are written through cache to the back-end storage. Writes
are completed to the host only after they have been completed to the
back-end arrays.
When performing writes, the VPLEX system Data Management (DM)
component includes a per-volume caching subsystem that utilizes a
subset of the caching capabilities:


◆ Local Node Cache: cache data management and back-end I/O interaction.
◆ Distributed Cache (DMG – Directory Manager): Cache
coherence, dirty data protection, and failure recovery mechanics
(fault-tolerance).

I/O flow of a write miss


1. Write request issued to virtual volume from host.
2. Look for prior data in local cache.
3. Look for prior data in global cache.
4. Transfer data to local cache.
5. Data is written through to back-end storage.
6. Write is acknowledged to host.

I/O flow of a write hit


1. Write request issued to virtual volume from host.
2. Look for prior data in local cache.
3. Look for prior data in global cache.
4. Invalidate prior data.
5. Transfer data to local cache.
6. Data is written through to back-end storage.
7. Write is acknowledged to host.



3
System and Component Integrity

This chapter explains how VPLEX clusters are able to handle hardware failures in any subsystem within the storage cluster. Topics include:
◆ Overview ............................................................................................. 56
◆ Cluster.................................................................................................. 57
◆ Path redundancy through different ports ...................................... 58
◆ Path redundancy through different directors ................................ 59
◆ Path redundancy through different engines .................................. 60
◆ Path redundancy through site distribution.................................... 61
◆ Safety check......................................................................................... 62


Overview
VPLEX clusters are capable of surviving any single hardware failure
in any subsystem within the overall storage cluster. These include
host connectivity subsystem, memory subsystem, etc. A single failure
in any subsystem will not affect the availability or integrity of the
data. Multiple failures in a single subsystem and certain
combinations of single failures in multiple subsystems may affect the
availability or integrity of data.
This availability requires that host connections be redundant and that
hosts are supplied with multipath drivers. In the event of a front-end
port failure or a director failure, hosts without redundant physical
connectivity to a VPLEX cluster and without multipathing software
installed may be susceptible to data unavailability.


Cluster
A cluster is a collection of one, two, or four engines in a physical
cabinet. A cluster serves I/O for one storage domain and is managed
as one storage cluster.
All hardware resources (CPU cycles, I/O ports, and cache memory)
are pooled:
◆ The front-end ports on all directors provide active/active access
to the virtual volumes exported by the cluster.
◆ For maximum availability, virtual volumes must be presented
through each director so that all directors but one can fail without
causing data loss or unavailability. All directors must be
connected to all storage.


Path redundancy through different ports


Because all paths are duplicated, when a director port goes down for any reason, I/O seamlessly continues through a port on the other director, as shown in Figure 12.

Figure 12 Port redundancy

Multipathing software plus redundant volume presentation yields continuous data availability in the presence of port failures.


Path redundancy through different directors


If a director were to go down, the other director can completely take over the I/O processing from the host, as shown in Figure 13.

Figure 13 Director redundancy

Multipathing software plus volume presentation on different directors yields continuous data availability in the presence of director failures.


Path redundancy through different engines


In a clustered environment, if one engine goes down, another engine
completes the host I/O processing, as shown in Figure 14.

Figure 14 Engine redundancy

Multipathing software plus volume presentation on different engines yields continuous data availability in the presence of engine failures.


Path redundancy through site distribution


Distributed site redundancy, enabled through Metro HA, ensures that if a site goes down, or even if the link to that site goes down, the other site can continue seamlessly processing the host I/O, as shown in Figure 15. On a failure of Site B, I/O continues unhindered on Site A.

Figure 15 Site redundancy


Safety check
In addition to the redundancy fail-safe features, the VPLEX cluster
provides event logs and call home capability.



4
Foundations of VPLEX High Availability

This chapter explains the foundations of VPLEX high availability and how failures are handled without VPLEX Witness:


◆ Foundations of VPLEX High Availability ..................................... 64
◆ Failure handling without VPLEX Witness (Static bias) ................ 72


Foundations of VPLEX High Availability


The following section discusses, at a high level, several disruptive scenarios that can affect a multi-site VPLEX configuration. The purpose of this section is to give the customer or solutions architect an understanding of site failure semantics prior to the implementation of VPLEX Witness and the related solutions outlined in this book. This section is not designed to highlight flaws in the current high availability architecture as implemented in basic VPLEX best practices: all solutions deployed in a Metro HA Active/Active state, whether VPLEX or not, run into the same issues when a "witness" is not deployed. The decision for an architect to apply the VPLEX Witness capabilities, or to enhance connectivity paths across data centers using the Metro HA Cross-Cluster Connect solution, depends on their basic fail-over needs.

Note: To keep the explanation of this subject at a high level, the graphics in the following section are broken down into major objects (for example, Site A, Site B, and Link). Assume that a VPLEX cluster resides within each site, so when a site failure is shown it also causes a full VPLEX cluster failure within that site. Assume also that the link object between sites represents the main inter-cluster data network connected to the VPLEX cluster at each site, and that each site constitutes a single failure domain: a site failure affects all components within that failure domain, including the VPLEX cluster.

Figure 16 shows normal operation, where all three components are fully operational. (In these diagrams, green symbolizes normal operation and red symbolizes failure.)

Figure 16 High level functional sites in communication

Figure 17 on page 65 shows that Site A has failed. If a service, application, or VM was running only in Site A at the time of the incident, it would now need to be restarted at the remaining Site B. We know this because we have an external perspective and can see the entire diagram; however, if we were looking at this purely from Site B's perspective, all the VPLEX would know is that communication to Site A has been lost, and it would be impossible to distinguish whether this was a full failure at Site A or simply a link failure.

Figure 17 High level Site A failure

A link failure as depicted by the red arrow in Figure 18 is representative of an inter-cluster link failure.

Figure 18 High level Inter-site link failure

Similar to the previous example, if we look at this from an overall perspective we can see that it is the link that is at fault; however, from Site A's or Site B's perspective, all that the VPLEX knows is that communication with the remote site has been lost (exactly like the previous example), and it cannot distinguish whether it is the link or the remote site that is at fault.
Now take the basic Site A or Site B failure scenario as a simple disaster recovery scenario and apply the concepts of an Active/Active philosophy. The next section shows how different failures affect a VPLEX distributed volume and highlights the different resolution required in each case, starting with the site failure scenario. The high-level Figure 19 on page 66 shows a VPLEX distributed volume spanning two sites:


Figure 19 VPLEX active and functional between two sites

As shown, the distributed volume is made up of a mirror at each site (M1 and M2), and using the distributed cache coherency semantics provided by VPLEX GeoSynchrony, a consistent presentation of a single logical volume is achieved across both clusters. Furthermore, because of the cache coherency, the ability to perform Active/Active data access (both read and write) from two sites is enabled. Also shown in the example is a distributed network where users are able to access either site, which would be the case in a fully active/active environment.
In Figure 20 on page 67, if there was a failure at one of the sites (in
this case site A has failed) then the distributed volume would become
degraded since the hardware required at site A to support this
particular mirror leg is no longer available. For a resolution to this
example we would want to simply keep the volume active at site B so
the application can resume there.


Figure 20 VPLEX concept diagram with failure at Site A

Figure 21 on page 68 shows the desired resolution if a failure at Site A were to occur. As discussed previously, the outcome is to keep the volume online in Site B.


Figure 21 Correct resolution after volume failure at Site A

The next section discusses the outcome after an inter-cluster link partition/failure. Figure 22 on page 69 shows the configuration before the failure.


Figure 22 VPLEX active and functional between two sites

Recall from the simple Site A / Site B failure scenarios that when the link failed, neither site knew the exact nature of the failure. With an Active/Active distributed volume, a link failure also degrades the distributed volume, since write I/O at either site can no longer be propagated to the remote site.
Figure 23 on page 70 shows what would happen if there were no "mechanism" to suspend I/O at one of the sites in this scenario.


Figure 23 Inter-site link failure and cluster partition

As shown, this would lead to a conflicting detach, or split brain, since writes could be accepted at both sites, giving the potential to end up with two different copies of the data. To protect against data corruption this situation has to be avoided, therefore VPLEX must act and suspend access to the distributed volume on one of the clusters.
Figure 24 on page 71 displays a valid and acceptable state in the event of a link partition, as Site A is now suspended. This is the default and automatic behavior of VPLEX distributed volumes and protects against data corruption and split brain scenarios. The following section explains in more detail how this functions.


Figure 24 Correct handling of cluster partition


Failure handling without VPLEX Witness (Static bias)


As previously demonstrated, in the presence of failures, VPLEX Active/Active distributed solutions require different resolutions depending on the type of failure. However, VPLEX version 4.0 had no means to perform external arbitration, so no "mechanism" existed to distinguish between a site failure and a link failure. To overcome this, a feature called "static bias" is used to guard against split brain scenarios.
The premise of static bias is to set a rule, ahead of any failure, for each distributed volume that spans two VPLEX clusters. The rule defines which cluster is declared the preferred cluster and maintains access to the volume, and which cluster is declared the alternative and therefore suspends access, should the two VPLEX clusters lose communication with each other (this concept covers both site and link failures). This is known as a "detach rule" and means that one cluster can unilaterally detach the other cluster and assume that the detached cluster is either dead or that it will stay suspended if it is alive.
Figure 25 on page 73 shows how static bias can be set for each distributed volume (referred to here as a DR1).


Figure 25 VPLEX static detach rule

This detach rule can either be set within the VPLEX GUI or via
VPLEX CLI.
Each volume can be set to either Cluster 1 detaches or Cluster 2 detaches.
If the DR1 is set to Cluster 1 detaches, then in any failure scenario the
preferred cluster for that volume would be declared as Cluster 1, but
if the DR1 detach rule is set to Cluster 2 detaches, then in any failure
scenario the preferred cluster for that volume would be declared as
Cluster 2.

Note: Some people prefer to substitute the word detaches with the word preferred, which is perfectly acceptable and can make the rule easier to understand.
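The sketch below shows one way the rule might be applied from the VPlexcli; the distributed-device name is a placeholder, and the command form (ds dd set-rule-set with the built-in cluster-1-detaches and cluster-2-detaches rule sets) is an assumption, so confirm the exact syntax in the EMC VPLEX CLI Guide for your GeoSynchrony release:

VPlexcli:/> ds dd set-rule-set --devices <distributed-device-name> --rule-set cluster-1-detaches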


Setting the rule set on a volume to Cluster 1 detaches would mean that Cluster 1 is the preferred site for the given volume. (The terminology that Cluster 1 has the bias for the given volume is also appropriate.)
Once this rule is set, then regardless of the failure (be it link or site) the rule will always be invoked.
The examples below show the rule set in action for different failures. The next example shows a site loss at Site B with a single DR1 set to Cluster 1 detaches. Figure 26 shows the initial running setup of the configuration. We can see that the volume is set to Cluster 1 detaches.

Figure 26 Typical detach rule setup


If there were a problem at Site B, the DR1 would become degraded, as shown in Figure 27.

Figure 27 Non-preferred site failure

Because the bias rule was set to Cluster 1 detaches, the distributed volume remains active at Site A. This is shown in Figure 28 on page 76.


Figure 28 Volume remains active at Cluster 1

Therefore, in this scenario, if the service, application, or VM was running only at Site A (the preferred site), it would continue uninterrupted without needing to restart. However, if the application was running only at Site B on the given distributed volume, it would need to be restarted at Site A; since VPLEX is an active/active solution, no manual intervention at the storage layer is required in this case.
The next example shows static bias working under link failure conditions.
Figure 29 on page 77 shows a configuration with a distributed
volume set to Cluster 1 detaches as per the previous configuration.


Figure 29 Typical detach rule setup before link failure

If the link were now lost, the distributed volume would again become degraded, as shown in Figure 30 on page 78.


Figure 30 Inter-site link failure and cluster partition

To ensure that split brain does not occur after this type of failure, the static bias rule is applied and I/O is suspended at Cluster 2, in this case because the rule is set to Cluster 1 detaches. This can be observed in Figure 31 on page 79.


Figure 31 Suspension after inter-site link failure and cluster partition

Therefore, in this scenario, if the service, application, or VM was running only at Site A, it would continue uninterrupted without needing to restart. However, if the application was running only at Site B, it would need to be restarted at Site A, since the bias rule set suspends access to the given distributed volumes on Cluster 2. Again, no manual intervention is required at the storage level, as the volume at Cluster 1 automatically remained available.
In summary, static bias is a very effective method of preventing split brain; however, there is a particular scenario that results in manual intervention if the static bias feature is used alone. This can happen if there is a VPLEX cluster or site failure at the "preferred cluster" (that is, the pre-defined preferred cluster for the given distributed volume). This is shown in the example below, where we begin with the configuration shown in Figure 32 on page 80, in which there is a distributed volume that has Cluster 2 detaches set on the DR1.


Figure 32 Cluster 2 has bias

If Site B had a total failure in this example, disruption would now also occur at Site A, as shown in Figure 33 on page 81.


Figure 33 Preferred site failure causes full Data Unavailability

As we can see, the preferred site has now failed and the bias rule has been applied; but because the rule is "static" and cannot distinguish between a link failure and a remote site failure, the remaining site becomes suspended. In this case, manual intervention is required to bring the volume online at Site A.
Static bias is a very powerful rule. It provides zero-RPO and zero-RTO resolution for non-preferred cluster failures and inter-cluster partition scenarios, and it completely avoids split brain. In the presence of a preferred cluster failure, however, it provides a non-zero RTO because manual intervention is needed. It is worth noting that this feature is available without any automation and is a valuable fallback when the Witness is unavailable or the customer infrastructure cannot accommodate it.
However, what if there were a "mechanism", other than standard CLI intervention, that could provide a global view of failures in the Metro environment in the previous example? VPLEX Witness has been designed to overcome these scenarios, since it can override the static bias and leave what was the non-preferred site ACTIVE.



5
Introduction to VPLEX Witness

This chapter introduces VPLEX Witness and explains its architecture and operation:


◆ VPLEX Witness overview and architecture ................................... 84
◆ VPLEX Witness target solution, rules, and best practice ............. 87
◆ VPLEX Witness failure semantics.................................................... 89
◆ CLI example outputs ......................................................................... 95


VPLEX Witness overview and architecture


VPLEX Metro 5.0 systems can now rely on a new component called VPLEX Witness. VPLEX Witness is an optional component designed to be deployed in customer environments where the regular bias rule sets are insufficient to provide seamless zero or near-zero RTO fail-over in the presence of site disasters and VPLEX cluster failures.
As described in the previous section, without VPLEX Witness all distributed volumes rely on the configured rule set to identify the preferred cluster in the presence of a cluster partition or a cluster/site failure. However, if the preferred cluster happens to fail (as the result of a disaster event, for example), VPLEX is unable to automatically allow the alternative cluster to continue I/O to the affected distributed volumes. VPLEX Witness has been designed specifically to overcome this case.
An external VPLEX Witness Server is installed as a virtual machine
running on a customer supplied VMware ESX host deployed in a
failure domain separate from either of the VPLEX clusters (to
eliminate the possibility of a single fault affecting both the cluster and
the VPLEX Witness). VPLEX Witness connects to both VPLEX
clusters over the management IP network. By reconciling its own
observations with the information reported periodically by the
clusters, the VPLEX Witness enables the cluster(s) to distinguish
between inter-cluster network partition failures and cluster failures
and automatically resume I/O in these situations.
Figure 34 on page 85 shows a high-level architecture of VPLEX Witness and how it can augment an existing static bias solution. The VPLEX Witness server resides in a fault domain separate from VPLEX Cluster 1 and Cluster 2.


Figure 34 High Level VPLEX Witness architecture

Since the VPLEX Witness server is external to both of the production locations, more perspective can be gained as to the nature of a particular failure, and the correct action can be taken. As mentioned previously, it is this perspective that is vital in order to distinguish between a site outage and a link outage, as each of these scenarios requires a different action to be taken.
Figure 35 on page 86 shows a high-level circuit diagram of how the VPLEX Witness Server should be connected.


Figure 35 High Level VPLEX Witness deployment

As you can see, the VPLEX Witness server is connected via the VPLEX management IP network in a third failure (fault) domain. Depending on the scenarios to be protected against, this third fault domain could reside on a different floor within the same building as VPLEX Cluster 1 and Cluster 2, or it could be located in a completely geographically dispersed data center, which could even be in a different country.

Note: VPLEX Witness Server supports up to 1 second of network latency over the management IP network.
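As a simple sanity check of reachability and latency from a management server, the Witness server's public address can be pinged; the address shown is the example public IP used later in this chapter and is illustrative only:

ping 10.31.25.45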

Clearly, with the example of the third floor in the same building, you would not be protected from a total building failure, so depending on the requirement, careful consideration should be given when choosing this third failure domain.


VPLEX Witness target solution, rules, and best practice


VPLEX Witness is architecturally designed for inter-site VPLEX clusters. Customers who use VPLEX Local do not require VPLEX Witness functionality.
Furthermore, VPLEX Witness is only suitable for customers who have a third failure domain connected via two physical networks from each of the data centers where the VPLEX clusters reside into each VPLEX management station Ethernet port.
VPLEX Witness failure handling semantics only apply to Distributed
volumes in all consistency groups on a pair of VPLEX v5.x clusters if
VPLEX Witness is enabled.
VPLEX Witness failure handling semantics do not apply to:
◆ Local volumes
◆ Distributed volumes outside of a consistency group
◆ Distributed volumes within a consistency group if the VPLEX
Witness is disabled
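Because the failure semantics above apply only to distributed volumes placed in consistency groups, one quick check is to list the consistency groups configured on a cluster. The context path below assumes the standard VPlexcli layout and the cluster name cluster-1, both of which are illustrative:

VPlexcli:/> ls /clusters/cluster-1/consistency-groups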
At the time of writing, only one VPLEX Witness Server can be configured for a given Metro system, and when it is configured and enabled, its failure semantics apply to all configured consistency groups.
Additionally, a single VPLEX Witness Server (virtual machine) can only support a single VPLEX Metro system (however, more than one VPLEX Witness Server can be configured on a single physical ESX host).
Figure 36 on page 88 shows the supported versions (at the time of
writing) for VPLEX Witness.


Figure 36 Supported VPLEX versions for VPLEX Witness

As shown in Figure 36, depending on the solution, VPLEX static bias alone without VPLEX Witness may still be relevant in some cases. Figure 37 shows the volume types and rules that are supported with VPLEX Witness.

Figure 37 VPLEX Witness volume types and rule support

Check the latest VPLEX ESSM (EMC Simple Support Matrix) for the latest information, including VPLEX Witness server physical host requirements and site qualification.


VPLEX Witness failure semantics


As seen in the previous section, VPLEX Witness operates at the consistency group level for a group of distributed devices and functions in conjunction with the detach rule set within the consistency group.
Starting with the inter-cluster link partition, the next few pages discuss the failure scenarios (both site and link) that were raised in previous sections and show how the failure semantics differ when using VPLEX Witness compared with using static bias alone.
Figure 38 shows a typical setup for VPLEX 5.x with a single distributed volume configured in a consistency group that has a rule set configured for Cluster 2 detaches (that is, Cluster 2 is preferred). Additionally, it shows the VPLEX Witness server connected via the management network in a third failure domain.

Figure 38 Typical VPLEX Witness configuration


If the inter-cluster link were to fail in this scenario, VPLEX Witness would still be able to communicate with both VPLEX clusters, since the management network that connects the VPLEX Witness server to both of the VPLEX clusters is still operational. By communicating with both VPLEX clusters, the VPLEX Witness feature deduces that the inter-cluster link has failed, since both VPLEX clusters report to the VPLEX Witness server that connectivity with the remote VPLEX cluster has been lost (that is, Cluster 1 reports that Cluster 2 is unavailable, and vice versa). This is shown in Figure 39.

Figure 39 VPLEX Witness and an inter-cluster link failure

In this case the clusters adhere to the pre-configured static bias rules
and volume access at Cluster 1 will be suspended since the rule set
was configured as Cluster 2 detaches. Figure 40 on page 91 shows the
final state after this failure.


Figure 40 VPLEX Witness and static bias after cluster partition

The next example shows how VPLEX Witness can assist if there is a site failure at the preferred site. As discussed above, this type of failure without VPLEX Witness would cause the volumes at the remaining site to go offline. This is where VPLEX Witness greatly improves the outcome of the event and removes the need for manual intervention.
Figure 41 on page 92 shows a typical setup for VPLEX v5.x with a distributed volume configured in a consistency group that has a rule set configured for Cluster 2 detaches (that is, Cluster 2 wins).


Figure 41 VPLEX Witness typical configuration for Cluster 2 detaches

Figure 42 on page 93 shows that site B has now failed.


Figure 42 VPLEX Witness diagram showing Cluster 2 failure

As we know from the previous section, when a site has failed the distributed volumes become degraded. However, unlike the earlier example where there was a site failure at the preferred site and the static bias rule forced volumes into a suspended state at Cluster 1, VPLEX Witness will now observe that communication is still possible with Cluster 1 (but not with Cluster 2). Additionally, since Cluster 1 cannot contact Cluster 2, VPLEX Witness can make an informed decision and instruct Cluster 1 to override the static rule set and proceed with I/O.


Figure 43 shows the outcome:

Figure 43 VPLEX Witness with static bias override

Clearly this is a big improvement on the scenario where the same failure occurred with just the static bias rule set and without VPLEX Witness: previously, volumes had to be suspended at Cluster 1 because there was no way to tell the difference between a site failure and a link failure.
Refer to the VPLEX Witness product documentation to fully understand all other rules and states of the feature, such as cluster isolation.


CLI example outputs


On systems where VPLEX Witness is deployed and configured, the VPLEX Witness CLI context appears under the root context as "cluster-witness." By default, this context is hidden and is not visible until VPLEX Witness has been deployed by running the "cluster-witness configure" command. Once the user has deployed VPLEX Witness, the VPLEX Witness CLI context becomes visible.
The CLI context typically displays the following information:
VPlexcli:/> cd cluster-witness/

VPlexcli:/cluster-witness> ls

Attributes:
Name Value
------------- -------------
admin-state enabled
private-ip-address 128.221.254.3
public-ip-address 10.31.25.45

Contexts:
components

VPlexcli:/cluster-witness> ll components/
/cluster-witness/components:

Name ID Admin State Operational State Mgmt Connectivity
---------- -- ----------- ------------------- -----------------
cluster-1 1 enabled in-contact ok
cluster-2 2 enabled in-contact ok
server - enabled clusters-in-contact ok

VPlexcli:/cluster-witness> ll components/*
/cluster-witness/components/cluster-1:
Name Value
----------------------- ------------------------------------------------------
admin-state enabled
diagnostic INFO: Current state of cluster-1 is in-contact (last
state change: 0 days, 13056 secs ago; last message
from server: 0 days, 0 secs ago.)
id 1
management-connectivity ok
operational-state in-contact

/cluster-witness/components/cluster-2:
Name Value
----------------------- ------------------------------------------------------
admin-state enabled


diagnostic INFO: Current state of cluster-2 is in-contact (last
state change: 0 days, 13056 secs ago; last message
from server: 0 days, 0 secs ago.)
id 2
management-connectivity ok
operational-state in-contact

/cluster-witness/components/server:
Name Value
----------------------- ------------------------------------------------------
admin-state enabled
diagnostic INFO: Current state is clusters-in-contact (last state
change: 0 days, 13056 secs ago.) (last time of
communication with cluster-2: 0 days, 0 secs ago.)
(last time of communication with cluster-1: 0 days, 0
secs ago.)
id -
management-connectivity ok
operational-state clusters-in-contact

Details of cluster-witness CLI context attributes

On systems where Cluster Witness is deployed, the Cluster Witness CLI context appears as cluster-witness under the root context. By default, the Cluster Witness context is an optional hidden context and must be created with the cluster-witness configure command after Cluster Witness deployment.
See the VPLEX CLI Guide for more information on the cluster-witness configure command.
The cluster-witness context includes the following sub-contexts:
/cluster-witness/components
/cluster-witness/components/cluster-1
/cluster-witness/components/cluster-2
/cluster-witness/components/server

Use the ls and ll commands to display VPLEX Witness status information. Use the ll command to display status related to the VPLEX Witness components on Cluster 1, Cluster 2, and the VPLEX Witness Server.
VPlexcli:/cluster-witness> ls


Attributes:
Name Value
------------- -------------
admin-state enabled
private-ip-address 128.221.254.3
public-ip-address 10.31.25.45
Contexts:
components

Table 4 Output from ls for brief VPLEX Witness status

Field name Description

admin-state This attribute identifies whether VPLEX Witness functionality (as a whole) is enabled or disabled.
If VPLEX Witness functionality is enabled, the clusters send health observations to the VPLEX Witness
Server and the VPLEX Witness Server provides guidance to the clusters when the VPLEX Witness Server
observes inter-cluster partition and cluster failure/isolation scenarios.
If VPLEX Witness functionality is disabled, the clusters follow configured detach rule sets to allow or suspend
I/O to the distributed volumes in all consistency groups when inter-cluster partition or cluster failure/isolation
scenarios occur. When VPLEX Witness functionality is disabled, the communication of health observations
and guidance stops between the clusters and the VPLEX Witness Server. In this case, all distributed volumes
in all consistency groups leverage their pre-configured detach rule sets regardless of VPLEX Witness.
To determine the administrative state of individual components, refer to the admin-state attribute associated
with the individual component context.
This admin-state value at the top-level cluster-witness context is one of the following:
unknown: There is partial management network connectivity between this Management Server and VPLEX
Witness components that are supposed to report their administrative state. To identify the component that is
unreachable over the management network, refer to the output of the individual component contexts.
enabled: All VPLEX Witness components are reachable over the management network and report their
administrative state as enabled.
disabled: All VPLEX Witness components are reachable over the management network and report their
administrative state as disabled.
inconsistent: All VPLEX Witness components are reachable over the management network but some
components report their administrative state as disabled while others report it as enabled. This should be an
extremely rare state, which may result from a potential but highly unlikely failure during enabling or disabling.
Please call EMC Customer Service if you see this state.

private-ip-address This read-only attribute identifies the private IP address of the VPLEX Witness Server VM (128.221.254.3)
that is used for VPLEX Witness-specific traffic.

public-ip-address This read-only attribute identifies the public IP address of the VPLEX Witness Server VM that is used as an
endpoint of the IPsec tunnel.

components This sub-context displays all the individual components of VPLEX Witness that include both VPLEX clusters
configured with VPLEX Witness functionality and the VPLEX Witness Server. Each sub-context displays
details for the corresponding individual component.


Use the ll command to display status related to the VPLEX Witness components on Cluster 1, Cluster 2, and the VPLEX Witness Server. From the VPlexcli:/cluster-witness> context, issue:

VPlexcli:/cluster-witness> ll components/
/cluster-witness/components:
Name ID Admin State Operational State Mgmt Connectivity
---------- -- ----------- ------------------- -----------------
cluster-1 1 enabled in-contact ok
cluster-2 2 enabled in-contact ok
server - enabled clusters-in-contact ok

Table 5 Output from ll command for brief VPLEX Witness component status

Field name Description

admin-state This field identifies whether the corresponding component is enabled or not. The supported values are:
enabled: VPLEX Witness functionality is enabled on this component
disabled: VPLEX Witness functionality is disabled on this component.
unknown: This component is not reachable and its administrative state cannot be determined.

diagnostic This is a diagnostic string generated by the CLI based on the analysis of the data and state information reported by the corresponding component.

id The cluster-id for the cluster components. The VPLEX CLI ignores this field for the VPLEX Witness Server
and reports the value as a dash “-”.

management-connectivity This field displays the communication status to the VPLEX Witness component from the local CLI session over the management network.
The possible values are:
ok: The component is reachable
failed: The component is not reachable


operational-state (server component) This field represents the operational state of the corresponding server component. The clusters-in-contact state is the only healthy state. All other states indicate a problem.
clusters-in-contact: According to the latest data reported by each of clusters, both clusters are in contact
with each other over the inter-cluster network.
cluster-partition: According to VPLEX Witness Server observations, the clusters partitioned from each
other over the inter-cluster network, while the VPLEX Witness Server could still talk to each of them.
cluster-unreachable: According to VPLEX Witness Server observations, one cluster has either failed or
become isolated (that is partitioned from its peer cluster and disconnected from the VPLEX Witness
Server).
unknown: VPLEX Witness Server does not know the states of one or both of the clusters and needs to
learn them before it can start making decisions. VPLEX Witness Server assumes this state upon startup.
When the server operational state is set to "cluster-partition" or "cluster-unreachable", this operational state
may not necessarily reflect the current observation of the VPLEX Witness Server. After VPLEX Witness
Server transitions to this state and provides guidance to both clusters, it stays in this state regardless of
more recent observations until it observes complete recovery of the clusters and their inter-cluster
connectivity. (This prevents split brain.)
The VPLEX Witness Server state and the guidance that it provides to the clusters based on its state is
sticky in a sense that if VPLEX Witness Server observes a failure (changes its state and provides guidance
to the clusters), the VPLEX Witness Server will maintain this state even if current observations change.
VPLEX Witness Server will maintain its failure state and guidance until both cluster and their connectivity
fully recover. This policy is implemented in order to avoid potential Data Corruption scenarios due to
possible split brain.

operational-state (cluster component) This field represents the operational state of the corresponding cluster component.
in-contact: This cluster is in contact with its peer over the inter-cluster network. Rebuilds may be in
progress. Subject to other system-wide restrictions, I/O to all distributed volumes in all consistency groups
is allowed from VPLEX Witness’ perspective.
cluster-partition: This cluster is not in contact with its peer and VPLEX Witness Server declared that two
clusters partitioned. Subject to other system-wide restrictions, I/O to all distributed volumes in all
consistency groups is allowed from VPLEX Witness’ perspective.
remote-cluster-isolated-or-dead: This cluster is not in contact with its peer and the VPLEX Witness
Server declared that the remote cluster (i.e. the peer) was isolated or dead. Subject to other system-wide
restrictions, I/O to all distributed volumes in all consistency groups is allowed from VPLEX Witness’
perspective.
local-cluster-isolated: This cluster is not in contact with its peer and the VPLEX Witness Server declared
that the remote cluster (i.e. the peer) as the only proceeding cluster. This cluster must suspend I/O to all
distributed volumes in all consistency groups regardless of bias.
unknown: This cluster is not in contact with its peer over the inter-cluster network and is awaiting guidance
from the VPLEX Witness Server. I/O to all distributed volumes in all consistency groups is suspended
regardless of bias.


VPLEX Witness cluster isolation semantics and dual failures

As discussed in the previous section, deploying a VPLEX solution with VPLEX Witness gives continuous availability to the storage volumes regardless of whether there is a site failure or an inter-cluster link failure. These types of failure are deemed single component failures, and we have shown that no single point of failure can induce data unavailability when the VPLEX Witness is used.
It should be noted, however, that in rare situations more than one fault or component outage can occur, especially when considering the inter-cluster communication links; if two links failed at once this could lead to a VPLEX cluster isolation at a given site.
For instance, a typical VPLEX setup with VPLEX Witness automatically has three failure domains (call them A, B, and C, where VPLEX Cluster 1 resides at A, VPLEX Cluster 2 at B, and the VPLEX Witness server at C). In this case there is an inter-cluster link between A and B (Cluster 1 and Cluster 2), plus a management IP link between A and C as well as a management IP link between B and C, effectively giving a triangulated topology.
In rare situations, if any two of these three links fail, one of the sites will be isolated (cut off).
Due to the nature of VPLEX Witness, these types of isolation can also be dealt with effectively without manual intervention. This is because a site isolation is very similar, in terms of technical behavior, to a full site outage; the main difference is that the isolated site is still fully operational and powered up (but needs to be forced into I/O suspension), unlike a site failure where the failed site is not operational.
In these cases the failure semantics and VPLEX Witness behavior are effectively the same; however, two further actions are taken at the site that becomes isolated:
◆ I/O is shut off/suspended at the isolated site.
◆ The VPLEX cluster will attempt to call home.


Figure 44 shows the three scenarios that are described above:

Figure 44 Possible dual failure cluster isolation scenarios

As discussed previously, it is extremely rare to experience a double failure, and Figure 44 shows how VPLEX can automatically ride through isolation scenarios. However, there are also some other possible situations where a dual failure could occur and require manual intervention at one of the VPLEX clusters, because VPLEX Witness would not be able to distinguish the actual failure.

Note: If best practices are followed, these scenarios are significantly less likely than even the rare isolation incidents discussed above, mainly because the faults would have to disrupt components in totally different fault domains that may be spread over many miles.


Figure 45 shows three scenarios where a double failure would require manual intervention to bring the remaining component online since VPLEX Witness would not be able to determine the gravity of the failure.

Figure 45 Highly unlikely dual failure scenarios that require manual intervention

VPLEX Witness – The importance of the third failure domain


As discussed in the previous section, we now understand that dual failures can occur but are highly unlikely. As also mentioned many times within this TechBook, it is imperative that, if VPLEX Witness is to be deployed, the VPLEX Witness server component be installed into a different failure domain than either of the two VPLEX clusters.


Figure 46 shows two further dual failure scenarios where a VPLEX cluster has failed as well as the VPLEX Witness server.

Figure 46 Two further dual failure scenarios that would require manual intervention

Again, as before, if best practice is followed and each component resides within its own fault domain, then these two situations are just as unlikely as the previous three scenarios that required manual intervention. However, now consider what could happen if the VPLEX Witness server was not deployed within a third failure domain, but rather in the same domain as one of the VPLEX clusters. This would mean that a single domain failure could induce a dual failure, since two components may be residing in the same fault domain. This effectively turns a highly unlikely scenario into a more probable single failure scenario and should be avoided.
By deploying the VPLEX Witness server into a third failure domain, the dual failure risk is substantially lowered, and manual intervention would only be required if a fault disabled more than one dissimilar component, potentially hundreds of miles apart and spread over different fault domains.



6
Combining VPLEX High Availability and VPLEX Witness

This chapter explains how VPLEX Metro HA solutions combine VPLEX high availability with VPLEX Witness:


◆ Metro HA overview......................................................................... 106
◆ VPLEX Metro HA with Cross-Cluster Connect........................... 107
◆ VPLEX Metro HA without Cross-Cluster Connect..................... 116


Metro HA overview
From a technical perspective, VPLEX Metro HA solutions are effectively new flavors of reference architecture that utilize the new VPLEX Witness feature in VPLEX v5.0 and therefore greatly enhance an overall solution's ability to tolerate component failure, causing less or no disruption compared with legacy solutions and requiring little or no human intervention, over either Cross-Cluster or Metro distances.
The two main architecture types enabled by the VPLEX Witness feature are:
◆ Metro HA with Cross-Cluster Connect, defined as those clusters that are within the limitations of host ISL cross connectivity.
◆ Metro HA over distances greater than the limitations of ISL cross connectivity.
This section looks at each of these solutions in turn and shows how value can be derived by stepping through the different failure scenarios.


VPLEX Metro HA with Cross-Cluster Connect


VPLEX Metro HA Cross-Cluster Connect can be deployed when two sites are within campus distance of each other (up to 1 ms round-trip latency). A VPLEX Metro distributed volume can then be deployed across the two sites using a cross-connect front-end configuration, with a VPLEX Witness server installed in a different fault domain.
Figure 47 shows a high-level schematic of a Metro HA Cross-Cluster Connect solution for VMware.

Figure 47 High-level diagram of a Metro HA Cross-Cluster Connect solution for VMware

The key benefit of this solution is that it can, in most cases, eliminate RTO altogether if objects or components were to fail.


Failure scenarios
Although the following VPLEX Metro HA environments are compatible with multiple cluster technologies, including Hyper-V and Microsoft Cluster Services, we will assume for these failure scenarios that vSphere 4.1 or higher is configured in a stretched HA topology with DRS, so that all of the physical hosts (ESX servers) are within the same HA cluster.
As discussed previously, this type of configuration brings the ability to move virtual machines over distance, which is extremely useful in disaster avoidance, load balancing, and cloud infrastructure use cases, all using out-of-the-box features and functions. However, additional value can be derived from deploying the VPLEX Metro HA Cross-Cluster Connect solution to ensure total availability.
Figure 48 on page 109 shows the topology of a Metro HA Cross-Cluster Connect environment divided up into logical fault domains. The following sections demonstrate the recovery automation for a single failure within any of these domains and show how no single fault in any domain can take down the system as a whole, in most cases without even an interruption of service.


Figure 48 Metro HA Cross-Cluster Connect diagram with failure domains


If a physical host failure were to occur in either domain A1 or B1, the VMware HA cluster would restart the affected virtual machines on the remaining ESX servers.
Figure 49 shows all physical ESX hosts failing in domain A1. Since all of the physical hosts in domain B1 are connected to the same datastores via the VPLEX Metro distributed device, VMware HA can restart the virtual machines on any of the physical ESX hosts in domain B1.

Figure 49 Metro HA Cross-Cluster Connect diagram with disaster in zone A1

The next example describes what will happen in the unlikely event that a VPLEX cluster were to fail in either domain A2 or B2. Examples of how this could happen include power outage, flood, or fire. In this instance there would be no interruption of service to any of the virtual machines.


Figure 50 shows a full VPLEX cluster outage in domain A2. Since the ESX servers are cross connected to both VPLEX clusters, VMware simply re-routes the I/O to the alternate path, which remains available because the distributed volume is protected by VPLEX Witness. The VPLEX Witness server observes that it cannot communicate with the VPLEX cluster in A2 and, because the cluster in B2 also cannot communicate with A2 (meaning that A2 is either isolated or has failed), guides the VPLEX cluster in B2 to keep the distributed volume online.

Note: Similarly, in the event of a full isolation of A2, the distributed volumes at A2 would simply suspend, since communication would not be possible to either the VPLEX Witness server or the VPLEX cluster in domain B2. In this case the outcome is identical from a VMware perspective and there will be no interruption.

Figure 50 Metro HA Cross-Cluster Connect diagram with failure in zone A2
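
The arbitration semantics described above, including the isolation case in the note, can be summarized as a small decision function. The following Python sketch is an illustration of the behavior discussed in this TechBook only; it is not VPLEX code, and the function name, parameters, and return strings are assumptions made for the example.

    # Illustrative sketch of the per-cluster decision for a VPLEX Witness-protected
    # distributed volume, as described in the text. Not VPLEX code; all names are
    # hypothetical.

    def io_decision(peer_reachable, witness_reachable, witness_says_peer_alive, has_static_bias):
        """Return what this cluster does with a distributed volume.

        peer_reachable          -- this cluster can talk to the remote VPLEX cluster
        witness_reachable       -- this cluster can talk to the VPLEX Witness server
        witness_says_peer_alive -- the Witness still sees the remote cluster (only
                                   meaningful when witness_reachable is True)
        has_static_bias         -- the static bias (detach) rule favors this cluster
        """
        if peer_reachable:
            # Normal operation: cache coherency is maintained across the inter-cluster link.
            return "continue I/O"
        if witness_reachable:
            if not witness_says_peer_alive:
                # Remote cluster failed or is isolated: the Witness guides this cluster online.
                return "continue I/O (guided by Witness)"
            # Inter-cluster partition: both clusters are alive, so the static bias decides.
            return "continue I/O" if has_static_bias else "suspend I/O"
        # Neither the peer nor the Witness can be reached: this cluster assumes it is
        # isolated and suspends to avoid split brain.
        return "suspend I/O"

    # Scenario from Figure 50: cluster B2 after a full outage of cluster A2.
    print(io_decision(peer_reachable=False, witness_reachable=True,
                      witness_says_peer_alive=False, has_static_bias=False))
    # -> continue I/O (guided by Witness)

    # Note scenario: cluster A2 fully isolated (cannot reach B2 or the Witness).
    print(io_decision(peer_reachable=False, witness_reachable=False,
                      witness_says_peer_alive=False, has_static_bias=True))
    # -> suspend I/O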


The next example describes what will happen in the event of a failure of one (or all) of the back-end storage arrays in either domain A3 or B3.
Again, in this instance there would be no interruption to any of the virtual machines.
Figure 51 shows the failure of all storage arrays that reside in domain A3. Since a cache-coherent VPLEX Metro distributed volume is configured between domains A2 and B2, I/O can continue to be actively serviced from the VPLEX cluster in A2 even though the local back-end storage has failed. This is due to the embedded VPLEX cache coherency, which efficiently caches reads in the A2 domain while also propagating writes to the back-end storage in domain B3 via the remote VPLEX cluster in B2.

Figure 51 Metro HA Cross-Cluster Connect diagram with failure in zone A3 or B3
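
The read and write behavior just described can be modelled with a short sketch of a two-leg distributed mirror. This is illustrative Python only, not VPLEX code; the class, method, and leg names are assumptions for the example.

    # Illustrative model of how a distributed RAID 1 (DR1) volume keeps servicing
    # I/O at site A when the local back-end array (domain A3) has failed. Not
    # VPLEX code; all names are hypothetical.

    class Dr1Volume:
        def __init__(self):
            self.leg_health = {"A3": True, "B3": True}  # back-end mirror legs
            self.read_cache = {}                        # site A read cache (simplified)

        def write(self, block, data):
            # Writes are propagated to every healthy mirror leg; with A3 failed,
            # the write reaches B3 via the remote VPLEX cluster (B2).
            targets = [leg for leg, ok in self.leg_health.items() if ok]
            if not targets:
                raise IOError("no healthy mirror legs")
            self.read_cache[block] = data
            return targets

        def read(self, block):
            # Reads are satisfied from cache where possible; otherwise from any
            # healthy leg, which may be the remote one.
            if block in self.read_cache:
                return "cache hit at site A"
            source = next(leg for leg, ok in self.leg_health.items() if ok)
            return "read from leg %s" % source

    vol = Dr1Volume()
    vol.leg_health["A3"] = False          # failure of all arrays in domain A3
    print(vol.write(100, b"data"))        # -> ['B3'] : write propagated to site B only
    print(vol.read(100))                  # -> cache hit at site A
    print(vol.read(200))                  # -> read from leg B3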


The next example describes what will happen in the event of a VPLEX Witness server failure in domain C1.
Again, in this instance there would be no interruption to any of the virtual machines or VPLEX clusters.
Figure 52 shows a complete failure of domain C1, where the VPLEX Witness server resides. Since the VPLEX Witness is not within the I/O path and is only an optional component, I/O actively continues for any distributed volume in domains A2 and B2: the inter-cluster link is still available, so cache coherency can be maintained between the VPLEX cluster domains.
Although service is uninterrupted, both VPLEX clusters will now dial home and report that they have lost communication with the VPLEX Witness server. In this state the system is in jeopardy of a DU (data unavailability) should a cluster failure or an inter-cluster network partition occur while the Witness is down. If the Witness will be unavailable long-term, it can be disabled manually so that the static bias rules are invoked instead.

Figure 52 Metro HA Cross-Cluster Connect diagram with failure in zone C1


The next example describes what will happen in the event of a failure of the inter-cluster link between domains A2 and B2.
Again, in this instance there would be no interruption to any of the virtual machines or VPLEX clusters.
Figure 53 on page 115 shows the inter-cluster link failed between domains A2 and B2. In this instance the static bias rule set defined previously is invoked, since neither VPLEX cluster can communicate with the other (although the VPLEX Witness server can communicate with both VPLEX clusters). Access to the given distributed volume is therefore suspended in one of the domains, A2 or B2. Since in this example alternate paths are still available to the remote VPLEX cluster where the volume remains online, VMware simply re-routes the traffic to that VPLEX cluster, so the virtual machine remains online and unaffected whichever site it was running on.

Note: It is plausible in this example that the alternate path physically routes across the same ISL that has failed. In that case there could be a small interruption if a virtual machine was running in A1, as it will be restarted in B1 since the alternate path is also dead.


Figure 53 Metro HA Cross-Cluster Connect diagram with intersite link failure
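
From the host's perspective the behavior above is ordinary multipath failover across the cross-connected front-end paths. The Python sketch below models only the path-selection decision; it is not PowerPath or ESX NMP code, and the path names, cluster labels, and volume states are assumptions for the example.

    # Illustrative model of cross-connected front-end path selection during the
    # inter-cluster link failure shown in Figure 53. All structures are hypothetical.

    def usable_paths(paths, volume_state):
        """Return the paths an ESX host can still use for a distributed volume.

        paths        -- mapping of path name to the VPLEX cluster it reaches
        volume_state -- per-cluster state of the distributed volume
                        ("online" or "suspended") after the bias rule is applied
        """
        return [p for p, cluster in paths.items() if volume_state[cluster] == "online"]

    paths = {
        "vmhba1:C0:T0:L0": "cluster-1",   # local VPLEX cluster (domain A2)
        "vmhba1:C0:T1:L0": "cluster-2",   # cross-connected remote cluster (domain B2)
    }

    # After the partition, assume the bias rule keeps the volume online at cluster-2.
    volume_state = {"cluster-1": "suspended", "cluster-2": "online"}

    print(usable_paths(paths, volume_state))
    # -> ['vmhba1:C0:T1:L0'] : I/O is re-routed to the surviving cluster's paths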


VPLEX Metro HA without Cross-Cluster Connect


VPLEX Metro HA without Cross-Cluster Connect deployment is very similar to the Metro HA Cross-Cluster Connect deployment described in the previous section; however, this solution is designed to cover distances beyond the campus range (campus being used for latencies of up to 1 ms round trip) and into the metropolitan range, where round-trip latency would be around 5 ms but does not exceed 10 ms (assuming the application is tolerant of this). A VPLEX Metro distributed volume is deployed across the two sites, and a VPLEX Witness server is deployed within a different, third fault domain.
Figure 54 shows a high-level schematic of a Metro HA solution for VMware without the cross-cluster connect deployment.

Figure 54 Metro HA Standard High-level diagram


The key benefit of this solution is a significant reduction, and in some cases the elimination, of RTO altogether if objects or components were to fail.

Failure scenarios
Again, for this section we will assume that vSphere 4.1 or higher is configured in a stretched HA topology so that all of the physical hosts (ESX servers) at either site are within the same HA cluster. Also, as with the previous section, deploying a stretched VMware configuration with Metro HA makes it possible to enable long-distance virtual machine teleportation, since the virtual machine datastores still reside on a VPLEX Metro distributed volume.
Figure 55 shows the topology of a Metro HA environment divided into logical fault domains. The next section demonstrates the recovery automation for a single failure within any of these domains.

Figure 55 Metro HA high-level diagram with fault domains


The following example describes what will happen in the unlikely event that a VPLEX cluster were to fail in either domain A2 or B2. In this instance there would be no interruption of service to any virtual machines running in domain B1; however, any virtual machines that were running in domain A1 would see a minor interruption as they are restarted at B1.
Figure 56 on page 119 shows a full VPLEX cluster outage in domain A2. Since the ESX servers are not cross zoned/presented to both VPLEX clusters, VMware has to perform an HA restart for the virtual machines that were running on the ESX servers in domain A1. It can do this because the distributed volumes remain active at B2: the volume is protected by VPLEX Witness, which deduces that domain A2 is unavailable (neither the VPLEX Witness server nor the VPLEX cluster in B2 can communicate with the VPLEX cluster in A2) and therefore guides the VPLEX cluster in B2 to remain online.


Figure 56 Metro HA high-level diagram with failure in domain A2

The next example describes what will happen in the event of a failure of the inter-cluster link between domains A2 and B2.
One of two outcomes of this scenario will happen, as illustrated in the sketch that follows the list:
◆ If the static bias for a given distributed volume was set to Cluster 1 detaches (assuming Cluster 1 resides in domain A2) and the virtual machine was running at the same site where the volume remains online (that is, the preferred site), then there is no interruption to service.
◆ If the static bias for a given distributed volume was set to Cluster 1 detaches (assuming Cluster 1 resides in domain A2) and the virtual machine was running at the remote site (domain B1), then the virtual machine's storage will be in the suspended state. Most guest operating systems will fail in this case, allowing the virtual machine to be restarted in domain A1 after a small amount of disruption. However, it is possible with vSphere 4.0/4.1 that the guest OS will simply hang and VMware HA will not be prompted to restart it.
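
The two outcomes in the list above reduce to a single rule: the result depends on whether the virtual machine is running at the site whose cluster wins the detach (bias) rule. The following Python sketch only illustrates that logic; the site labels follow this example and are not taken from any product interface.

    # Illustrative logic for an inter-cluster link failure without cross-connect:
    # the outcome depends on whether the VM runs at the site favored by the
    # static bias (detach) rule. Not VPLEX or VMware code; labels follow the text.

    def outcome(vm_site, bias_site):
        if vm_site == bias_site:
            return "no interruption (volume stays online at the VM's site)"
        return ("storage suspends at the VM's site; VMware HA restarts the VM "
                "at site %s after a short disruption" % bias_site)

    # Bias rule "Cluster 1 detaches", with Cluster 1 at site A (domains A1/A2).
    print(outcome(vm_site="A", bias_site="A"))   # first bullet
    print(outcome(vm_site="B", bias_site="A"))   # second bullet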

Note: Though it is beyond the scope of this TechBook, to avoid any disruption, VMware DRS host affinity rules can be used to ensure that virtual machines are always running in their preferred location, that is, the location that the storage they rely on is biased towards.
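
Although full DRS configuration is beyond the scope of this TechBook, the intent of such host affinity rules can be sketched briefly. The Python below simply derives the site each virtual machine should be pinned to from the bias of its datastore; the datastore names, bias mapping, and VM names are assumptions for the example, and no vSphere API calls are made.

    # Illustrative sketch of aligning VMs with the site their storage is biased
    # towards, which is what DRS "should run on hosts in group" rules achieve.
    # The datastore bias mapping and names are assumptions for this example.

    datastore_bias = {
        "DR1-datastore-01": "site-A",   # detach rule favors cluster 1 (site A)
        "DR1-datastore-02": "site-B",   # detach rule favors cluster 2 (site B)
    }

    vms = {
        "app-vm-01": "DR1-datastore-01",
        "app-vm-02": "DR1-datastore-02",
    }

    # Print the host-affinity placement each VM should receive so that an
    # inter-cluster partition never suspends the storage under a running VM.
    for vm, datastore in vms.items():
        preferred_site = datastore_bias[datastore]
        print("%s -> DRS host group for %s (should run on hosts at %s)"
              % (vm, preferred_site, preferred_site))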

Figure 57 shows the inter-cluster link failed between domains A2 and B2. In this instance the static bias rule set, previously defined as Cluster 1 detaches, is invoked, since neither VPLEX cluster can communicate with the other (although the VPLEX Witness server can communicate with both VPLEX clusters). Access to the given distributed volume is therefore suspended in domain B2 while remaining active at A2.
Therefore, virtual machines that were running at A1 will be uninterrupted, and virtual machines that were running at B1 will be restarted at A1.


Figure 57 Metro HA high-level diagram with intersite failure

The remaining failure scenarios for this solution are identical to those of the previously discussed VPLEX Metro HA Cross-Cluster Connect solution. For failure handling in domains A1, B1, A3, B3, or C1, see “VPLEX Metro HA with Cross-Cluster Connect” on page 107.



7

Conclusion

This chapter provides the conclusion of this TechBook:


◆ Conclusion ........................................................................................ 124


Conclusion
As outlined in this book, using VPLEX AccessAnywhere™
technology in combination with High Availability and VPLEX
Witness, storage administrators and data center managers will be
able to provide absolute physical and logical high availability for
their organizations’ mission critical applications with less resource
overhead and dependency on manual intervention. Increasingly,
those mission critical applications are virtualized and in most cases
using VMware vSphere or Microsoft Hyper-V “virtual machine”
technologies. It is expected that VPLEX customers use the HA /
VPLEX Witness solution to incorporate several application-specific
clustering and virtualization technologies to provide HA benefits for
targeted mission critical applications.
As described, the storage administrator is provided with two specific VPLEX Metro-based high availability solutions, outlined here for VMware ESX 4.1 or higher as integrated into the VPLEX Metro HA Cross-Cluster Connect and standard Metro environments. VPLEX Metro HA Cross-Cluster Connect provides a slightly higher level of HA than the VPLEX Metro HA deployment without cross-cluster connectivity; however, it is limited to in-data-center use or cases where the network latency between data centers is negligible.
Both solutions are ideal for customers who are either already highly virtualized or planning to become so, and who are looking for the following:
◆ Elimination of the “night shift” storage and server administrator
positions. To accomplish this, they must be comfortable that their
applications will ride through any failures that happen during
the night.
◆ Reduction of capital expenditures by moving from an
active/passive data center replication model to a fully active
highly available data center model.
◆ Increased application availability by protecting against flood and fire disasters that could affect their entire data center.
Taking a holistic view of both types of solution and what they provide the storage administrator, the following benefits are common to both, with some variance. EMC VPLEX technology with VPLEX Witness provides consumers with the following:


Better protection from storage-related failures


Within a data center, applications are typically protected against storage-related failures through the use of multipathing software such as EMC PowerPath™. This allows applications to ride through HBA failures, switch failures, cable failures, or storage array controller failures by routing I/O around the location of the failure. The VPLEX Metro HA Cross-Cluster Connect solution extends this protection to the rack and/or data center level by multipathing between VPLEX clusters in independent failure domains. The VPLEX Metro HA solution adds to this the ability to restart the application in the other data center in case no alternative route for the I/O exists in its current data center. As an example, if a fire were to affect an entire VPLEX rack, the application could be restarted in the backup data center automatically. This provides customers a much higher level of availability and a lower level of risk.

Protection from a larger array of possible failures


To highlight the advantages of VPLEX Witness functionality, let's recall how VMware HA operates.
VMware HA and other offerings provide automatic restart of virtual machines (applications) in the event of virtual machine failure for any reason (server failure, failed connection to storage, and so on). This restart involves a complete boot-up of the virtual machine's guest operating system and applications. While VM failure leads to an outage, the recovery from that failure is usually automatic.
VMware FT (Fault Tolerance) provides an additional level of protection by maintaining a "shadow VM" that matches the precise state of the primary VM. If the primary VM should fail, the shadow VM can take over immediately and without any significant disruption to the application. VMware HA, on its own, provides protection from server failures within a data center.
When combined with VPLEX in the Metro HA configuration, it provides the same level of protection for data center scale disaster scenarios.


Greater overall resource utilization


Looking at server virtualization products from the same point of view, but turning from recovery capabilities to utilization, VMware DRS (Distributed Resource Scheduler) can automatically move applications between servers in order to balance their computational and memory load over all the available servers. Within a data center, this has increased server utilization because administrators no longer need to size individual servers to the applications that will run on them. Instead, they can size the entire data center to the suite of applications that will run within it.
By adding an HA configuration (Metro or Campus), the available pool of server resources now covers both the primary and backup data centers. Both can be actively used, and excess compute capacity in one data center can be used to satisfy new demands in the other.
Alternative Vendor Solutions:
◆ Microsoft Hyper-V Server 2008 R2 with Performance and
Resource Optimization (PRO)
Overall, as data centers continue their expected growth patterns and storage administrators struggle to expand capacity and consolidate at the same time, introducing EMC VPLEX can reduce several areas of concern. To recap, these areas are:
◆ Hardware and component failures impacting data consistency
◆ System integrity
◆ High availability without manual intervention
◆ Witness to protect the entire highly available system
In reality, by reducing inter-site overhead and dependencies on disaster recovery, administrators can depend on VPLEX to guarantee that their data is available at any time while the beepers and cell phones stay silent.



Glossary

This glossary contains terms related to VPLEX federated storage systems. Many of these terms are used in this manual.

A
AccessAnywhere The breakthrough technology that enables VPLEX clusters to provide
access to information between clusters that are separated by distance.

active/active A cluster with no primary or standby servers, because all servers can
run applications and interchangeably act as backup for one another.

active/passive A powered component that is ready to operate upon the failure of a primary component.

array A collection of disk drives where user data and parity data may be
stored. Devices can consist of some or all of the drives within an
array.

asynchronous Describes objects or events that are not coordinated in time. A process
operates independently of other processes, being initiated and left for
another task before being acknowledged.
For example, a host writes data to the blades and then begins other
work while the data is transferred to a local disk and across the WAN
asynchronously. See also ”synchronous.”


B
bandwidth The range of transmission frequencies a network can accommodate,
expressed as the difference between the highest and lowest
frequencies of a transmission cycle. High bandwidth allows fast or
high-volume transmissions.

bias When a cluster has the bias for a given DR1, it will remain online if connectivity is lost to the remote cluster (in some cases this may be overruled by VPLEX Cluster Witness).

bit A unit of information that has a binary digit value of either 0 or 1.

block The smallest amount of data that can be transferred following SCSI
standards, which is traditionally 512 bytes. Virtual volumes are
presented to users as a contiguous lists of blocks.

block size The actual size of a block on a device.

byte Memory space used to store eight bits of data.

C
cache Temporary storage for recent writes and recently accessed data. Disk
data is read through the cache so that subsequent read references are
found in the cache.

cache coherency Managing the cache so data is not lost, corrupted, or overwritten.
With multiple processors, data blocks may have several copies, one in
the main memory and one in each of the cache memories. Cache
coherency propagates the blocks of multiple users throughout the
system in a timely fashion, ensuring the data blocks do not have inconsistent versions in the different processors' caches.

cluster Two or more VPLEX directors forming a single fault-tolerant cluster, deployed as one to four engines.

cluster ID The identifier for each cluster in a multi-cluster deployment. The ID is assigned during installation.

cluster deployment ID A numerical cluster identifier, unique within a VPLEX cluster. By default, VPLEX clusters have a cluster deployment ID of 1. For multi-cluster deployments, all but one cluster must be reconfigured to have different cluster deployment IDs.


clustering Using two or more computers to function together as a single entity. Benefits include fault tolerance and load balancing, which increases reliability and up time.

COM The intra-cluster communication (Fibre Channel). The communication used for cache coherency and replication traffic.

command line interface (CLI) A way to interact with a computer operating system or software by typing commands to perform specific tasks.

continuity of operations (COOP) The goal of establishing policies and procedures to be used during an emergency, including the ability to process, store, and transmit data before and after.

controller A device that controls the transfer of data to and from a computer and
a peripheral device.

D
data sharing The ability to share access to the same data with multiple servers
regardless of time and location.

detach rule A rule set applied to a DR1 to declare a winning and a losing cluster
in the event of a failure.

device A combination of one or more extents to which you add specific RAID properties. Devices use storage from one cluster only; distributed devices use storage from both clusters in a multi-cluster plex. See also ”distributed device.”

director A CPU module that runs GeoSynchrony, the core VPLEX software.
There are two directors in each engine, and each has dedicated
resources and is capable of functioning independently.

dirty data The write-specific data stored in the cache memory that has yet to be
written to disk.

disaster recovery (DR) The ability to restart system operations after an error, preventing data
loss.

disk cache A section of RAM that provides cache between the disk and the CPU. RAM's access time is significantly faster than disk access time; therefore, a disk-caching program enables the computer to operate faster by placing recently accessed data in the disk cache.


distributed device A RAID 1 device whose mirrors are in geographically separate locations.

distributed file system (DFS) Supports the sharing of files and resources in the form of persistent storage over a network.

Distributed RAID 1 device (DR1) A cache-coherent VPLEX Metro or Geo volume that is distributed between two VPLEX clusters.

E
engine Enclosure that contains two directors, management modules, and
redundant power.

Ethernet A Local Area Network (LAN) protocol. Ethernet uses a bus topology,
meaning all devices are connected to a central cable, and supports
data transfer rates of between 10 megabits per second and 10 gigabits
per second. For example, 100 Base-T supports data transfer rates of
100 Mb/s.

event A log message that results from a significant action initiated by a user
or the system.

extent A slice (range of blocks) of a storage volume.

F
failover Automatically switching to a redundant or standby device, system,
or data path upon the failure or abnormal termination of the
currently active device, system, or data path.

fault domain A concept where each component of an HA solution is separated by a logical or physical boundary, so that if a fault happens in one domain it will not transfer to the other. The boundary can represent any item that could fail (for example, with separate power domains, power would remain in the second domain if it failed in the first domain).

fault tolerance Ability of a system to keep working in the event of hardware or software failure, usually achieved by duplicating key system components.


Fibre Channel (FC) A protocol for transmitting data between computer devices. Longer
distance requires the use of optical fiber; however, FC also works
using coaxial cable and ordinary telephone twisted pair media. Fibre
channel offers point-to-point, switched, and loop interfaces. Used
within a SAN to carry SCSI traffic.

field replaceable unit (FRU) A unit or component of a system that can be replaced on site as opposed to returning the system to the manufacturer for repair.

firmware Software that is loaded on and runs from the flash ROM on the
VPLEX directors.

G
Geographically distributed system A system physically distributed across two or more geographically separated sites. The degree of distribution can vary widely, from different locations on a campus or in a city to different continents.

Geoplex A DR1 device configured for VPLEX Geo.

gigabit (Gb or Gbit) 1,073,741,824 (2^30) bits. Often rounded to 10^9.

gigabit Ethernet The version of Ethernet that supports data transfer rates of 1 Gigabit
per second.

gigabyte (GB) 1,073,741,824 (2^30) bytes. Often rounded to 10^9.

global file system (GFS) A shared-storage cluster or distributed file system.
H
host bus adapter (HBA) An I/O adapter that manages the transfer of information between the host computer's bus and memory system. The adapter performs many low-level interface functions automatically or with minimal processor involvement to minimize the impact on the host processor's performance.

I
input/output (I/O) Any operation, program, or device that transfers data to or from a
computer.

internet Fibre Channel protocol (iFCP) Connects Fibre Channel storage devices to SANs or the Internet in geographically distributed systems using TCP.


intranet A network operating like the World Wide Web but with access
restricted to a limited group of authorized users.

internet small computer system interface (iSCSI) A protocol that allows commands to travel through IP networks, which carries data from storage units to servers anywhere in a computer network.

I/O (input/output) The transfer of data to or from a computer.

K
kilobit (Kb) 1,024 (2^10) bits. Often rounded to 10^3.

kilobyte (K or KB) 1,024 (2^10) bytes. Often rounded to 10^3.

L
latency The amount of time required to fulfill an I/O request.

load balancing Distributing the processing and communications activity evenly across a system or network so no single device is overwhelmed. Load balancing is especially important when the number of I/O requests issued is unpredictable.

local area network (LAN) A group of computers and associated devices that share a common communications line and typically share the resources of a single processor or server within a small geographic area.

logical unit number (LUN) Used to identify SCSI devices, such as external hard drives, connected to a computer. Each device is assigned a LUN number which serves as the device's unique address.

M
megabit (Mb) 1,048,576 (2^20) bits. Often rounded to 10^6.

megabyte (MB) 1,048,576 (2^20) bytes. Often rounded to 10^6.

metadata Data about data, such as data quality, content, and condition.

metavolume A storage volume used by the system that contains the metadata for
all the virtual volumes managed by the system. There is one
metadata storage volume per cluster.


Metro-Plex Two VPLEX Metro clusters connected within metro (synchronous) distances, approximately 60 miles or 100 kilometers.

metroplex A DR1 device configured for VPLEX Metro.

mirroring The writing of data to two or more disks simultaneously. If one of the
disk drives fails, the system can instantly switch to one of the other
disks without losing data or service. RAID 1 provides mirroring.

miss An operation where the cache is searched but does not contain the
data, so the data instead must be accessed from disk.

N
namespace A set of names recognized by a file system in which all names are
unique.

network System of computers, terminals, and databases connected by communication lines.

network architecture Design of a network, including hardware, software, method of connection, and the protocol used.

network-attached storage (NAS) Storage elements connected directly to a network.

network partition When one site loses contact or communication with another site.

P
parity The even or odd number of 0s and 1s in binary code.

parity checking Checking for errors in binary data. Depending on whether the byte
has an even or odd number of bits, an extra 0 or 1 bit, called a parity
bit, is added to each byte in a transmission. The sender and receiver
agree on odd parity, even parity, or no parity. If they agree on even
parity, a parity bit is added that makes each byte even. If they agree
on odd parity, a parity bit is added that makes each byte odd. If the
data is transmitted incorrectly, the change in parity will reveal the
error.

partition A subdivision of a physical or virtual disk, which is a logical entity only visible to the end user, not any of the devices.


plex A VPLEX single cluster.

R
RAID The use of two or more storage volumes to provide better
performance, error recovery, and fault tolerance.

RAID 0 A performance-orientated striped or dispersed data mapping technique. Uniformly sized blocks of storage are assigned in regular sequence to all of the array's disks. Provides high I/O performance at low inherent cost. No additional disks are required. The advantages of RAID 0 are a very simple design and an ease of implementation.

RAID 1 Also called mirroring, this has been used longer than any other form
of RAID. It remains popular because of simplicity and a high level of
data availability. A mirrored array consists of two or more disks. Each
disk in a mirrored array holds an identical image of the user data.
RAID 1 has no striping. Read performance is improved since either
disk can be read at the same time. Write performance is lower than
single disk storage. Writes must be performed on all disks, or mirrors,
in the RAID 1. RAID 1 provides very good data reliability for
read-intensive applications.

RAID leg A copy of data, called a mirror, that is located at a user's current
location.

rebuild The process of reconstructing data onto a spare or replacement drive after a drive failure. Data is reconstructed from the data on the surviving disks, assuming mirroring has been employed.

redundancy The duplication of hardware and software components. In a redundant system, if a component fails then a redundant component takes over, allowing operations to continue without interruption.

reliability The ability of a system to recover lost data.

remote direct memory access (RDMA) Allows computers within a network to exchange data using their main memories and without using the processor, cache, or operating system of either computer.

Recovery Point Objective (RPO) The amount of data that can be lost before a given failure event.


Recovery Time Objective (RTO) The amount of time the service takes to fully recover after a failure event.

S
scalability Ability to easily change a system in size or configuration to suit
changing conditions, to grow with your needs.

simple network management protocol (SNMP) Monitors systems and devices in a network.

site ID The identifier for each cluster in a multi-cluster plex. By default, in a non-geographically distributed system the ID is 0. In a geographically distributed system, one cluster's ID is 1, the next is 2, and so on, each number identifying a physically separate cluster. These identifiers are assigned during installation.

small computer system interface (SCSI) A set of evolving ANSI standard electronic interfaces that allow personal computers to communicate faster and more flexibly than previous interfaces with peripheral hardware such as disk drives, tape drives, CD-ROM drives, printers, and scanners.

split brain Condition when a partitioned DR1 accepts writes from both clusters.

storage RTO The amount of time taken for the storage to be available after a failure
event (In all cases this will be a smaller time interval than the RTO
since the storage is a pre-requisite).

stripe depth The number of blocks of data stored contiguously on each storage
volume in a RAID 0 device.

striping A technique for spreading data over multiple disk drives. Disk
striping can speed up operations that retrieve data from disk storage.
Data is divided into units and distributed across the available disks.
RAID 0 provides disk striping.

storage area network (SAN) A high-speed special purpose network or subnetwork that interconnects different kinds of data storage devices with associated data servers on behalf of a larger network of users.

storage view A combination of registered initiators (hosts), front-end ports, and virtual volumes, used to control a host's access to storage.


storage volume A LUN exported from an array.

synchronous Describes objects or events that are coordinated in time. A process is initiated and must be completed before another task is allowed to begin.
For example, in banking, two withdrawals from a checking account that are started at the same time must not overlap; therefore, they are processed synchronously. See also ”asynchronous.”

T
throughput 1. The number of bits, characters, or blocks passing through a data
communication system or portion of that system.
2. The maximum capacity of a communications channel or system.
3. A measure of the amount of work performed by a system over a
period of time. For example, the number of I/Os per day.

tool command language (TCL) A scripting language often used for rapid prototypes and scripted applications.

transmission control protocol/Internet protocol (TCP/IP) The basic communication language or protocol used for traffic on a private network and the Internet.
U
uninterruptible power supply (UPS) A power supply that includes a battery to maintain power in the event of a power failure.

universal unique identifier (UUID) A 64-bit number used to uniquely identify each VPLEX director. This number is based on the hardware serial number assigned to each director.

V
virtualization A layer of abstraction implemented in software that servers use to
divide available physical storage into storage volumes or virtual
volumes.

virtual volume A virtual volume looks like a contiguous volume, but can be
distributed over two or more storage volumes. Virtual volumes are
presented to hosts.


VPLEX Cluster Witness A new feature in VPLEX V5.x that can augment and improve upon
the failure handling semantics of Static Bias.

W
wide area network (WAN) A geographically dispersed telecommunications network. This term distinguishes a broader telecommunication structure from a local area network (LAN).

world wide name (WWN) A specific Fibre Channel Name Identifier that is unique worldwide and represented by a 64-bit unsigned binary value.

write-through mode A caching technique in which the completion of a write request is communicated only after data is written to disk. This is almost equivalent to non-cached systems, but with data protection.
