
 

Nutanix Tech Note


Data Protection and Disaster Recovery

Nutanix Virtual Computing Platform is engineered from the ground up to
provide enterprise-grade availability for critical virtual machines and data.
This Tech Note covers the data protection and disaster recovery functionality
of the Virtual Computing Platform.
 

  ©2014 All Rights Reserved, Nutanix Corporation  


 

Nutanix Virtual Computing Platform

This Tech Note discusses the data protection and disaster recovery functionality in the
Nutanix Virtual Computing Platform. We also recommend reading the Nutanix Tech Note on
system reliability to learn more about the resiliency features of the Virtual Computing
Platform, including how hardware and disk failures are handled.

The Nutanix distributed software architecture runs a virtual storage controller (Controller VM
or CVM) on each Nutanix node or host on the Virtual Computing Platform, forming a
distributed system. All nodes actively work together to aggregate storage resources into a
single global pool that can be leveraged by all. The storage resources are managed by the
Nutanix Distributed File System (NDFS), which preserves data and system integrity
in the event of a node, disk, application or hypervisor software failure. NDFS also delivers
data protection and high availability functionality that keeps critical data and VMs protected
and applications running.

Starting with Snapshots

The foundation of the Nutanix data protection and disaster recovery functionality is the
concept of a VM-centric snapshot. To understand the advantage of Nutanix snapshot
functionality, it is important to understand the different types of snapshots available today.

A snapshot is an evolution of the traditional backup process. It is created when the storage
system creates a full or virtual copy of the metadata or the index of the stored data. This is
different from traditional backup solutions, which create separate copies of the stored data.
Because snapshots only need to copy the metadata or index at the time they are taken, they
can be near instantaneous, have little performance impact and require little incremental
space. IT organizations can take snapshot-based backups more frequently and improve
their recovery point objectives. Vendors such as CommVault and analysts such as ESG have
acknowledged the shift to snapshots as a viable option for backup and recovery.

However, it is important to note that not all snapshot implementations are created equal.
Each implementation has different storage requirements and imposes different
restrictions on its use. The preferred snapshot implementation is redirect-on-write
(ROW). In this method, any updates to existing protected data are redirected to a new
location. None of the existing data in snapshots needs to be copied or moved. As a result,
ROW snapshots do not suffer the performance impact of the alternative copy-on-write
snapshot implementations. The performance impact for copy-on-write snapshots limits their
applicability for primary data.
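
The difference between the two schemes can be made concrete with a small Python sketch that counts the I/O operations needed to overwrite one snapshotted block. This is a toy model for illustration only, not Nutanix code; the class names `CopyOnWriteVolume` and `RedirectOnWriteVolume` and the one-operation-per-block cost model are assumptions.

```python
class CopyOnWriteVolume:
    """Toy model: the first overwrite of a snapshotted block copies the old data aside."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # live data, updated in place
        self.snapshot_area = {}      # preserved copies of overwritten blocks
        self.protected = set()       # blocks covered by the latest snapshot
        self.io_ops = 0

    def snapshot(self):
        self.protected = set(self.blocks)

    def write(self, block_id, data):
        if block_id in self.protected:
            # Read the old block and write it to the snapshot area first.
            self.snapshot_area[block_id] = self.blocks[block_id]
            self.protected.discard(block_id)
            self.io_ops += 2         # one read plus one write for the copy
        self.blocks[block_id] = data
        self.io_ops += 1             # the write the application asked for


class RedirectOnWriteVolume:
    """Toy model: new writes go to a new location; snapshot data is never moved."""

    def __init__(self, blocks):
        self.store = {}              # location -> data
        self.live_map = {}           # block id -> location (active metadata)
        self.snapshot_map = {}       # block id -> location (snapshot metadata)
        self.io_ops = 0
        for block_id, data in blocks.items():
            self._place(block_id, data)

    def _place(self, block_id, data):
        location = len(self.store)
        self.store[location] = data
        self.live_map[block_id] = location

    def snapshot(self):
        self.snapshot_map = dict(self.live_map)   # copies metadata only

    def write(self, block_id, data):
        self._place(block_id, data)  # redirect the update to a fresh location
        self.io_ops += 1

    def read_snapshot(self, block_id):
        return self.store[self.snapshot_map[block_id]]


cow = CopyOnWriteVolume({0: "a", 1: "b"})
row = RedirectOnWriteVolume({0: "a", 1: "b"})
for volume in (cow, row):
    volume.snapshot()
    volume.write(0, "A")
print(cow.io_ops, row.io_ops)  # prints: 3 1
```

In this model the copy-on-write volume pays two extra operations on the first overwrite of each protected block, while the redirect-on-write volume pays only for the new write and still serves the old data from the snapshot.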

Another consideration when implementing snapshots is the granularity of data that can be
protected. This determines the space overhead of the snapshots taken. Smaller block sizes
result in increased sharing of data between snapshots and greater space efficiency. With
large blocks, a change to a small portion of a block would create a full new block with mostly
duplicate data, causing the snapshot size to be much larger than the amount of data changed.
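
The effect of block size on snapshot growth can be worked through with simple arithmetic. The Python sketch below assumes a workload of 100 scattered 4 KiB updates and compares two block sizes; the function name `snapshot_growth` and the workload are hypothetical and chosen only to illustrate the point.

```python
def snapshot_growth(changed_offsets, change_size, block_size):
    """Bytes of new snapshot space when every change dirties whole blocks."""
    dirty_blocks = set()
    for offset in changed_offsets:
        first = offset // block_size
        last = (offset + change_size - 1) // block_size
        dirty_blocks.update(range(first, last + 1))
    return len(dirty_blocks) * block_size


KIB = 1024
MIB = 1024 * KIB

# 100 updates of 4 KiB each, spaced 1 MiB apart (an assumed workload).
offsets = [i * MIB for i in range(100)]

small = snapshot_growth(offsets, 4 * KIB, 4 * KIB)
large = snapshot_growth(offsets, 4 * KIB, 1 * MIB)
print(small, large, large // small)  # prints: 409600 104857600 256
```

With 4 KiB blocks the snapshot grows by exactly the 400 KiB that changed. With 1 MiB blocks the same changes consume 100 MiB, 256 times more, because each small update forces a full new block.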

Figure 1: Snapshot granularity of traditional infrastructure can result in a significant amount of redundant data (figure callout: snapshot granularity can impact storage efficiency over time)

The last aspect that needs to be considered for snapshot design is the unit of data that can
be protected and restored by the storage system. Traditional storage deployments typically
operate at the storage object or volume/LUN level with little to no understanding of what is
stored in those containers. In a virtualized environment, this results in a simultaneous snapshot
of tens to hundreds of VMs, each with a different change rate. Consequently, it puts the burden
on administrators to map the different VMs to storage objects such as LUNs or
volumes. This results in additional steps and greater system complexity, especially when
recovering individual VMs. In the traditional approach, snapshot schedules can only be set at
a LUN or volume level, leading to practices such as creating one LUN per VM as a
workaround in order to create individualized VM snapshot schedules.

An alternative to this method is taking a VM-centric approach to storage and data protection.
In this scenario, storage understands and operates at the virtual disk or VM level, so
snapshots are taken at the VM level and administrators can set schedules and retention
periods per VM to meet service levels. Recovery is simple because administrators can
restore individual VMs without dealing with the underlying storage objects.



 

Snapshots on Nutanix

This brings us to the snapshot implementation on the Virtual Computing Platform. Nutanix
OS implements redirect-on-write, VM-granular snapshots. When a snapshot of a VM is initially
taken on the Nutanix Virtual Computing Platform, the system creates a read-only, zero-space
clone of the metadata (that is, the index to the data) and makes the underlying VM data immutable, or
read-only; no VM data or virtual disks are actually copied or moved. The result is a
read-only copy of the VM that can be accessed just like its active counterpart. Nutanix
snapshots take only a few seconds to create, eliminating application and VM backup
windows.

After a snapshot is taken and as the VM continues to run, any updates to existing data and
new writes are redirected. The original data in the snapshot remains unchanged and the
unchanged data is shared across the snapshots and active VM. The Virtual Computing
Platform handles this transparently, so there is no change to how applications and the
virtualization stack access the VM.

From an efficiency standpoint, Nutanix snapshots can be taken with byte-level resolution. This
byte-incremental implementation means that only the changed data is captured between
successive snapshots. For even greater efficiency, all the data stored on the Virtual
Computing Platform, including snapshots, can be compressed and deduplicated. Although
savings vary with the specific workload, typical deployments have seen anywhere
from a 25% to a 75% reduction in the amount of space needed.
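
As an illustration of byte-level resolution, the following Python sketch finds the exact byte ranges that differ between two versions of a small image. It is a conceptual example only: the function name `changed_ranges` is hypothetical, and comparing data byte by byte is a simplification used here to show what "only the changed data" means, not a description of how NDFS tracks changes.

```python
def changed_ranges(old: bytes, new: bytes):
    """Return half-open (start, end) byte ranges where two equal-length images differ."""
    if len(old) != len(new):
        raise ValueError("this sketch only compares images of equal length")
    ranges = []
    start = None
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b and start is None:
            start = i                      # a run of changed bytes begins
        elif a == b and start is not None:
            ranges.append((start, i))      # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, len(new)))   # run extends to the end of the image
    return ranges


old_image = bytes(16)
new_image = bytearray(old_image)
new_image[3:5] = b"\x01\x02"
new_image[10] = 0xFF

delta = changed_ranges(old_image, bytes(new_image))
print(delta)                               # prints: [(3, 5), (10, 11)]
print(sum(end - start for start, end in delta))  # prints: 3
```

Only three bytes need to be captured for the second snapshot in this example, regardless of how large the surrounding image is.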

 
 

Figure 2: Nutanix snapshots are more efficient with byte-level granularity (figure callout: Nutanix snapshots have byte-level resolution)

The VM-granular snapshots can be set to be either crash-consistent or VM-consistent and can
be scheduled on an hourly, daily, weekly or monthly basis depending on the Recovery Point
Objectives (RPO) and retention needs. The choice between taking crash-consistent or
VM-consistent snapshots should be based on recovery needs. Crash-consistent snapshots are
instantaneous and are sufficient for workloads able to recover from operating system (OS) or
VM crashes. Stateless applications such as web servers are best protected through
crash-consistent snapshots. The alternative, VM-consistent snapshots, take advantage of host
frameworks and services such as Microsoft Volume Shadow Copy Service (VSS) to quiesce the
VM and supported applications, rendering them into a known, consistent state. In the case
of VMware running Microsoft Windows guests, VSS support is provided with VMware Tools
running in the guest OS. Using deep integration between Nutanix Virtual Computing
Platform and VMware vSphere, VMware Tools is called to quiesce the OS and supported
applications such as Microsoft Exchange and SQL Server before the Virtual Computing
Platform takes a VM-consistent snapshot of the VM.

Additionally, multiple VMs can be grouped together in a Nutanix protection domain enabling
them to be operated upon as a single entity with the same RPO. This is useful when trying to
protect complex applications such as Microsoft SQL Server-based applications or Microsoft
Exchange. The main advantage of grouping VMs in a protection domain, rather than
consolidating them onto a single LUN as in the traditional SAN approach, is VM
portability. VMs can be moved between different protection domains on a Nutanix Virtual
Computing Platform without the need for any data to be moved or copied. For traditional
SANs, changing a VM’s SLA will most likely require migrating the VM to another LUN or
volume.
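
The portability argument can be sketched as a data structure. In the Python example below, a protection domain is modeled as nothing more than a named policy plus a set of VM names, so moving a VM changes membership metadata only. The `ProtectionDomain` class, its fields and the `move_vm` helper are hypothetical names used for illustration and are not part of any Nutanix interface.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectionDomain:
    """Hypothetical model: a policy (RPO, consistency) applied to a group of VMs."""
    name: str
    rpo_hours: int
    consistency: str = "crash"             # "crash" or "vm"
    vms: set = field(default_factory=set)


def move_vm(vm, source, target):
    """Reassign a VM to another domain; no VM data is touched in this model."""
    source.vms.remove(vm)                  # raises KeyError if the VM is not a member
    target.vms.add(vm)


gold = ProtectionDomain("gold", rpo_hours=1, consistency="vm",
                        vms={"sql-01", "exch-01"})
bronze = ProtectionDomain("bronze", rpo_hours=24, vms={"web-01"})

# Tighten the SLA for web-01 by moving it to the domain with the 1-hour RPO.
move_vm("web-01", bronze, gold)
print(sorted(gold.vms))                    # prints: ['exch-01', 'sql-01', 'web-01']
```

Contrast this with a LUN-based grouping, where the equivalent change would typically mean migrating the VM's data to a different LUN or volume.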

Because of the unique NDFS design leveraging a shared-nothing distributed approach to metadata,
there is no upper limit to the number of snapshots that can be taken with the Nutanix Virtual
Computing Platform. This scalable approach eliminates the need for separate storage systems for
backup and long-term archiving, as the VM snapshots are stored across the entire cluster that
makes up Nutanix Virtual Computing Platform.

Keeping Data Optimized: Nutanix Virtual Computing Platform runs a distributed data
management service in the background. The MapReduce-based service is responsible for
executing tasks such as metadata optimization, garbage collection of deleted VMs, data
reduction, tiering, consistency checking, and rebalancing to optimize data across nodes and
flash/disks with minimal impact to performance.

Nutanix snapshot technology forms the basis of a unique set of functionality and ecosystem for
high availability and disaster recovery. The first feature that builds on the Nutanix snapshot
capability is VM-granular cloning.



 

Cloning can be used for a variety of reasons including deployment and recovery. Integration
with the virtualization stack with functionality such as VMware vStorage APIs for Array
Integration (VAAI) and VMware View Composer API for Array Integration (VCAI) enables
administrators to simplify VM deployment using integrated cloning. For the purpose of this
document, the discussion will focus on recovering VMs.

The Virtual Computing Platform enables user-driven recovery of individual VMs from
snapshots. This is done by either replacing the existing active VM with the snapshot copy or
by creating a separate clone of the snapshot, which preserves the active VM. Depending on the
snapshot settings, the recovered VM will be either crash-consistent or VM-consistent upon recovery.

If needed, administrators can create a clone of a Nutanix VM-granular snapshot for the
purpose of recovering a single file without taking up additional space. Compared to a
traditional LUN/volume based approach, a VM-granular snapshot approach eliminates the
need for first recovering the storage object (LUN/volume) and then identifying and mounting
the VM, and recovering the file.

Extending Snapshots over the Network: Backup and Disaster Recovery

Nutanix VM-granular snapshots also make it possible to efficiently replicate individual virtual
machines from a primary Virtual Computing Platform to one or more secondary Nutanix
clusters. By supporting a fan-out and fan-in, or multi-way, model for replication, the Virtual
Computing Platform can create a flexible multi-master virtualization environment for backup
and disaster recovery. Deployments supporting numerous remote and branch offices can
benefit from this flexible deployment model.

Figure 3: Multi-way protection domains make DR flexible



 

Since the software-defined replication functionality builds on VM-granular snapshots, policies
for replication are also set at the individual protection domain level rather than at the
LUN/volume level. Only byte-level changes between snapshots of individual VMs are sent
over the network to the remote cluster. NDFS also allows a host other than the one
serving I/O on the active virtual disk to do the work of calculating the changed
blocks, eliminating performance bottlenecks for critical VMs and their corresponding hosts.
As a result, all nodes in the cluster participate in replication.
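
One way to picture this offloading is a placement routine that hands each virtual disk's change calculation to a node other than the one serving its I/O. The Python sketch below uses a simple round-robin policy, which is an assumption made for illustration; the source does not describe how NDFS actually selects nodes, and the function name `assign_diff_workers` is hypothetical.

```python
from itertools import cycle


def assign_diff_workers(vdisk_io_host, nodes):
    """Map each vdisk to a node that computes its changes, avoiding its I/O host.

    Assumes node names are unique; the round-robin order is an illustrative choice.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to offload the work")
    rotation = cycle(nodes)
    assignment = {}
    for vdisk, io_host in vdisk_io_host.items():
        worker = next(rotation)
        if worker == io_host:
            worker = next(rotation)        # skip the host serving this vdisk's I/O
        assignment[vdisk] = worker
    return assignment


nodes = ["n1", "n2", "n3"]
io_hosts = {"vm-a": "n1", "vm-b": "n1", "vm-c": "n2"}
print(assign_diff_workers(io_hosts, nodes))
# prints: {'vm-a': 'n2', 'vm-b': 'n3', 'vm-c': 'n1'}
```

In this example every node ends up with replication work, and no virtual disk has its changes calculated on the host that is busy serving its I/O.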

Figure 4: Eliminate bottlenecks by using all cluster resources for replication (panels: host-based or storage-based replication; VM-granular replication with Nutanix)

To make the most of WAN connectivity, the data can be deduplicated and compressed
before it is sent over the wire. First, the fingerprints of changed blocks for individual VMs are
sent from the primary system to the different destinations. Each destination system
reports back which unique blocks it still needs, and the primary system then sends only
those blocks. Deduplicating data over the wire can cut the bandwidth
required by as much as 90% versus host-based full-copy backup solutions.
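
The fingerprint exchange described above can be sketched in a few lines of Python. This is a minimal model of the three steps (send fingerprints, receive the list of missing blocks, send only those blocks), not the Nutanix wire protocol. The use of SHA-1 as the fingerprint function, the 4 KiB block size and the names `Destination` and `replicate` are assumptions made for the example.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Fingerprint a block; SHA-1 is an assumed choice for this sketch."""
    return hashlib.sha1(block).hexdigest()


class Destination:
    """Toy remote site that stores blocks keyed by fingerprint."""

    def __init__(self):
        self.blocks = {}                   # fingerprint -> block data

    def missing(self, fingerprints):
        """Report which fingerprints this site does not already hold."""
        return [fp for fp in fingerprints if fp not in self.blocks]

    def receive(self, blocks):
        for block in blocks:
            self.blocks[fingerprint(block)] = block


def replicate(changed_blocks, destination):
    """Send fingerprints first, then only the blocks the destination lacks.

    Returns the number of payload bytes actually sent.
    """
    by_fp = {fingerprint(b): b for b in changed_blocks}   # dedupes within the batch
    needed = destination.missing(list(by_fp))
    destination.receive([by_fp[fp] for fp in needed])
    return sum(len(by_fp[fp]) for fp in needed)


dest = Destination()
dest.receive([b"A" * 4096])                # the remote site already holds block A

changed = [b"A" * 4096, b"B" * 4096, b"B" * 4096]
sent = replicate(changed, dest)
print(sent, sum(len(b) for b in changed))  # prints: 4096 12288
```

Of the 12 KiB of changed blocks in this example, only 4 KiB crosses the wire: block A is already at the destination and the duplicate copy of block B is sent once.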

Nutanix VM-granular replication also makes it possible to create an affordable disaster
recovery solution. The converged compute and storage approach used by Virtual Computing
Platform, along with the VM-centric approach to replication, makes creating a disaster
recovery solution very simple. Using the protection domain concept, groups of related
VMs can be replicated together, and those VMs can be brought up on the secondary site with
a single command if the primary site is down. Because the workloads are virtualized and
replication is not hardware dependent, the secondary site can differ from the primary
cluster. This is especially useful for remote-site deployments with centralized backup and
disaster recovery.

The Future is RESTful

Nutanix Virtual Computing Platform provides a comprehensive set of REST APIs, accessed
through the Nutanix Prism management framework, for various functions including
data protection and disaster recovery. These APIs can be explored through the Nutanix Prism
API explorer. The REST APIs are the foundation for the Nutanix Prism management interface
and for failover run book automation through the Nutanix Storage Replication Adapter (SRA) for
VMware Site Recovery Manager (SRM).

Figure 5: Nutanix Prism APIs enable integration with VMware SRM (diagram: the primary site and the secondary site each show protected VMs, vCenter, SRM with the Nutanix SRA, REST APIs and a controller; Nutanix replication links the two sites)

In addition to delivering the functionality needed by VMware SRM for replication and VM
management, the Nutanix SRA uses the REST APIs to tie directly into the user profile in
Nutanix cluster administration for authentication and authorization. Nutanix Prism APIs can
also be used to automate workflows that use snapshots and replication for backup and disaster
recovery through scripting languages or workflow engines. The Prism APIs are also used to create an
automated run book for failover, automatically registering the VMs at the DR site in VMware
vCenter and powering them on. For example, a custom script using the Prism
APIs can trigger a Virtual Computing Platform to take and replicate a snapshot of the group
of critical VMs making up an order-entry system, based on the number of transactions being
executed.
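
A script of the kind described in the order-entry example might look like the Python sketch below. It only decides whether a snapshot is due and builds the HTTP request; it does not send it. The URL path, the JSON field `replicate_to`, the threshold and all function names are placeholders invented for illustration and are not the real Prism API. The actual endpoints should be taken from the Nutanix Prism API explorer.

```python
import json
import urllib.request


def should_snapshot(transactions_since_last, threshold):
    """Trigger once enough transactions have run since the last snapshot."""
    return transactions_since_last >= threshold


def build_snapshot_request(base_url, protection_domain, remote_site):
    """Build (but do not send) a POST asking for a snapshot plus replication.

    The path and body below are HYPOTHETICAL placeholders, not the Prism API.
    """
    url = f"{base_url}/HYPOTHETICAL/protection_domains/{protection_domain}/snapshot"
    body = json.dumps({"replicate_to": [remote_site]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if should_snapshot(transactions_since_last=12000, threshold=10000):
    request = build_snapshot_request(
        "https://cluster.example", "order-entry", "dr-site"
    )
    print(request.get_method(), request.full_url)
```

A real script would add authentication and submit the request, for example with `urllib.request.urlopen`, and would read the transaction count from the order-entry application itself.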

Summary

With the increasing use of virtualization for critical workloads, deploying data protection
and disaster recovery is no longer optional. With VM-granular snapshots and recovery,
policy-based protection domains, VM-granular replication, integration with the
virtualization layer (e.g., VMware SRM), and REST APIs, Nutanix Virtual Computing
Platform provides the functionality to back up critical data, protect applications, and survive
disasters efficiently and cost-effectively.
