
Datacore Test Lab

1. Basic Test Lab:


The entire Datacore Test Lab runs inside a virtualization environment on VMware
Workstation 10, with details as below.
This is the network diagram:

The Datacore system includes 2 Datacore servers running Windows Server 2012 R2 and
SANsymphony-V (latest version, trial license). The Datacore system is built using the High-Availability Pairs template, as below:

The 2 Datacore servers synchronize with each other and use mirrored virtual
disks to keep 2 identical copies of the data. Each Datacore server has 1 public NIC (connected to
external networks), 2 frontend NICs (connected to the ESX hosts), 3 teamed mirror NICs
(connected to the other Datacore server), and 3 teamed iSCSI NICs (connected to the iSCSI
storage device).

The vSphere system includes 2 ESXi 5.5 hosts and 1 instance of vCenter 5.5. Each ESX
host has 2 iSCSI vmkernel ports (belonging to 2 separate IP networks), each
mapped to 1 physical NIC, to connect to the Datacore servers. Since the ESX vmkernel
iSCSI ports and the Datacore frontend iSCSI ports belong to different subnets, I don't
configure port binding on the ESX hosts.
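As a side note, VMware's rule of thumb is that iSCSI port binding only applies when the vmkernel ports and the iSCSI targets sit in the same subnet. Below is a minimal sketch of that check using Python's standard ipaddress module; the addresses are made-up placeholders for this lab's two separate iSCSI networks, not the real configuration.

```python
import ipaddress

# Hypothetical addresses mirroring this lab's layout: two vmkernel ports on
# two separate IP networks, and the Datacore frontend iSCSI target addresses.
vmk_ports = {
    "vmk1": "10.0.1.11/24",
    "vmk2": "10.0.2.11/24",
}
targets = ["10.0.1.21", "10.0.2.21"]

def same_subnet(interface_cidr: str, target_ip: str) -> bool:
    """True if the target address falls inside the vmkernel port's subnet."""
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(target_ip) in network

for vmk, cidr in vmk_ports.items():
    reachable = [t for t in targets if same_subnet(cidr, t)]
    print(f"{vmk} ({cidr}) -> same-subnet targets: {reachable or 'none'}")

# Port binding is only appropriate when every bound vmkernel port can reach
# every target inside the same subnet; with a multi-subnet layout like this
# lab, VMware's guidance is to leave port binding off.
```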

The backend storage devices (Openfiler) use iSCSI; each has 3 NICs connecting to a
Datacore server.

Each Datacore server combines local storage (SAS) and remote storage (iSCSI from Openfiler)
into a Disk Pool, as below:

On each Datacore server, 2 virtual disks in Mirrored mode are created as below:

The virtual disk is up to date, connected (by the ESX hosts), and its mirror path is available, as
below:

From the vSphere Client we can see that the ESX hosts recognize the 2 virtual disks
presented by Datacore:

From the above screenshot, we see that multipathing is controlled by NMP (the VMware
Native Multi-Pathing plugin). "Hardware Acceleration: Supported" indicates that the device
reports support for the VAAI (vStorage APIs for Array Integration) offload primitives; it seems
Datacore adds these advanced functions, which Openfiler does not offer.
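To check this from the API instead of from screenshots, here is a minimal pyVmomi sketch. The vCenter address and credentials are placeholders, and the assumption that the per-device VAAI status is exposed through the ScsiLun vStorageSupport property is mine; the sketch also prints each LUN's multipathing policy and path states.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder vCenter address and credentials for this lab.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        print(f"== {host.name} ==")
        storage = host.config.storageDevice
        # Assumed: vStorageSupport reports the VAAI ("Hardware Acceleration")
        # status as vStorageSupported / vStorageUnsupported / vStorageUnknown.
        vaai = {lun.key: getattr(lun, "vStorageSupport", "n/a")
                for lun in storage.scsiLun}
        for lun in storage.multipathInfo.lun:
            policy = lun.policy.policy if lun.policy else "n/a"
            print(f"LUN {lun.id}: policy={policy}, "
                  f"hw-accel={vaai.get(lun.lun, 'n/a')}")
            for path in lun.path:
                # pathState is "active", "standby", "dead", ...
                print(f"  {path.name}: {path.pathState}")
    view.Destroy()
finally:
    Disconnect(si)
```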
Below are the active paths and the path policy as seen from the vSphere Client:

Round Robin is used to load-balance traffic over all active paths. LUN 0 can be
accessed (from this ESX host) via 4 active paths, corresponding to the 2 Datacore
servers dc1 and dc2, with each Datacore server listening on 2 iSCSI frontend NICs. Because
LUN access is enabled through both Datacore servers at the same time, Datacore supports
failover and even an Active/Active layout in this case. As for the difference between the
Active and Active (I/O) statuses in the screenshot above: Active (I/O) marks the paths the host
is actually sending I/O over, i.e. the paths to the Datacore server (dc1) that is the
preferred/Active Optimized server for LUN 0, while the plain Active paths are available but not
currently used for I/O. This matches the ALUA explanation quoted later in this document.
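A small, purely conceptual sketch of that behaviour (the path identifiers are made up to mirror the screenshot): by default the Round Robin policy rotates I/O only over the Active Optimized paths, while the Non-Optimized paths stay available for failover.

```python
from dataclasses import dataclass
from itertools import cycle

@dataclass
class Path:
    name: str          # e.g. "vmhba33:C0:T0:L0" (made-up identifiers)
    target: str        # which Datacore server the path leads to
    optimized: bool    # ALUA Active Optimized, shown as "Active (I/O)"

paths = [
    Path("vmhba33:C0:T0:L0", "dc1", True),
    Path("vmhba33:C1:T0:L0", "dc1", True),
    Path("vmhba33:C0:T1:L0", "dc2", False),
    Path("vmhba33:C1:T1:L0", "dc2", False),
]

# Round Robin rotates I/O only over the Active Optimized ("Active (I/O)")
# paths; the Non-Optimized ("Active") paths are kept as failover candidates.
rr = cycle(p for p in paths if p.optimized)
for io_number in range(4):
    chosen = next(rr)
    print(f"I/O {io_number} -> {chosen.target} via {chosen.name}")
```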
Now I will shut down Datacore server 1 (dc1, the one whose paths have Active (I/O)
status in the screenshot above) to see whether there is automatic failover. DC1 is now
shut down. While dc1 is shutting down, nothing abnormal is seen from the user's
perspective, and I can run applications inside the VM normally. Now we check the status
of the paths in the screenshot below:

As we expected, only Datacore server 2 (dc2) is currently online and presents the LUN
with 2 active paths. Note that the status is Active (I/O), not just Active, which indicates
that dc2 now owns and serves the LUN.
Now I turn Datacore server 1 (dc1) back on. One thing to note here is that when dc1 has just
come back up, its data (stored on the virtual disks) is not up to date, so it needs to
synchronize with dc2. DC1 reports that it needs Log Recovery and temporarily
blocks access from hosts, as below:

The Log Recovery process takes some time, depending on the amount of changed
data. Until synchronization completes successfully, hosts cannot access the affected
virtual disk(s) on DC1, and here lies the risk: what happens if DC2 suddenly becomes
unavailable while DC1 has not yet finished the Log Recovery process? Datacore would
then raise a critical alert, the LUN may stay unavailable until an administrator intervenes
correctly, and some data may be lost. Also, if the mirror path is unavailable at this time
(due to a network connection issue or an unrecognized iSCSI target), the Log Recovery
process stays pending until the mirror path comes back, which adds more downtime.
One more thing I see in the lab is that the speed of Full Recovery is quite low,
even though very little data has changed:

As can be seen in the above screenshot, virtual disk 1 on DC1 is still synchronizing with
DC2. The mirror link uses a NIC team (a combination of 3 member NICs) at both ends,
but the recovery rate is quite slow, only about 10 MB/s. This means it takes a long
time before virtual disk 1 becomes up to date and available again, and during this time
host access is blocked.
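To put that rate into perspective, here is a back-of-the-envelope calculation; only the ~10 MB/s rate and the 3-member team come from the lab, while the 50 GB of out-of-sync data is an assumption purely for illustration.

```python
# Rough recovery-time estimate. The ~10 MB/s rate and the 3 x 1 Gbit/s team
# come from the lab; the 50 GB changed-data figure is an illustrative guess.
changed_data_gb = 50
observed_rate_mb_s = 10
team_members = 3
link_gbit_s = 1

observed_hours = changed_data_gb * 1024 / observed_rate_mb_s / 3600
# Theoretical ceiling if the mirror traffic could really use the whole team
# (in practice a single iSCSI session tends to stick to one team member).
theoretical_mb_s = team_members * link_gbit_s * 1000 / 8
theoretical_hours = changed_data_gb * 1024 / theoretical_mb_s / 3600

print(f"At {observed_rate_mb_s} MB/s: {observed_hours:.1f} h to recover "
      f"{changed_data_gb} GB")
print(f"At the teamed line rate (~{theoretical_mb_s:.0f} MB/s): "
      f"{theoretical_hours:.2f} h")
```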
Regarding NIC aggregation, we have at least 2 choices: Windows NIC Teaming and
Windows MPIO (Multipath I/O). In this lab I chose NIC Teaming for the mirror link
and don't see any performance improvement (the recovery rate is slow, as seen above). I
tried using MPIO but ran into a problem: the Windows iSCSI Initiator did not see the iSCSI
targets (hosted by Datacore), so the mirror path was unavailable. Next time I may try
configuring iSCSI from the Datacore management console.
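One likely reason the team does not help the mirror link is that hash-based teaming pins each TCP flow to a single member NIC, so one iSCSI session can never exceed one NIC's bandwidth, whereas MPIO establishes one session per path and spreads I/O across them. Below is a toy illustration of that idea; the addresses, ports and the CRC32 "hash policy" are simplified assumptions, not the real teaming algorithm.

```python
import zlib

# Toy model of hash-based NIC teaming: every flow (src IP/port, dst IP/port)
# hashes to exactly one team member, so a single iSCSI session uses one NIC
# no matter how many members the team has. Addresses and ports are made up.
TEAM_MEMBERS = ["nic1", "nic2", "nic3"]

def team_member_for(src: str, sport: int, dst: str, dport: int) -> str:
    """Pick a team member from a 4-tuple hash (simplified stand-in for the
    OS/switch hashing policy)."""
    key = f"{src}:{sport}-{dst}:{dport}".encode()
    return TEAM_MEMBERS[zlib.crc32(key) % len(TEAM_MEMBERS)]

# One mirror iSCSI session = one flow = one NIC, regardless of team size.
print("single session:", team_member_for("10.0.3.1", 51000, "10.0.3.2", 3260))

# MPIO instead opens one session per path, so the flows can land on different
# NICs and aggregate bandwidth across them.
for sport in (51001, 51002, 51003):
    print("mpio session", sport, "->",
          team_member_for("10.0.3.1", sport, "10.0.3.2", 3260))
```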

2. Advanced Test Lab:

The aim of the advanced lab is to modify, upgrade and fine-tune the existing basic lab
in order to increase functionality, availability, manageability and performance. For
example, the following functions can be considered or configured: NIC load
balancing, LUN access, hardware acceleration, Continuous Data Protection,
Replication, Snapshot, backup integration, storage auto-tiering.
I also want to make the advanced lab similar to the Arkema system and to reproduce
Arkema's Datacore settings and scenarios/events as closely as possible. To do this I
need more detailed information about Arkema's implementation of Datacore.

Regarding the number of VMs per host, you already mentioned the VM/host ratio in
your capacity report (the average is about 31 VMs per host). Datacore also makes a
related recommendation (VMs per virtual disk), as it impacts datastore performance; please
read this paragraph from Datacore (a quick sizing sketch follows the quoted text):
Large numbers of Virtual Machines running within a single Virtual Disk can potentially
result in excessive SCSI reservation requests which, in turn, may lead to reservation
conflicts between VMware ESX Hosts sharing that same Virtual Disk. Too many reservation
conflicts can then cause overall performance degradation while the VMware ESX Hosts
resolve the conflicts between themselves. Excessive SCSI conflicts can be easily fixed by
reducing the number of running Virtual Machines on a single Virtual Disk by moving them
to other/new Virtual Disks.
Analyzing SCSI Reservation conflicts on VMware Infrastructure 3.x, vSphere 4.x and
vSphere 5.x (1005009)
http://kb.vmware.com/kb/1005009
For this reason, DataCore Software recommends no more than fifteen Virtual Machines per
Virtual Disk. Note that this is only a recommendation - as each user's requirements will be
different - and should not be taken as a limitation of SANsymphony-V.
These steps below also help resolve potential sources of the reservation:

- Increase the number of LUNs and try to limit the number of ESX hosts accessing the same LUN.
- Reduce the number of snapshots as they cause a lot of SCSI reservations.
- Do not schedule backups (VCB or console based) in parallel from the same LUN.
- Schedule antivirus or operating system updates outside normal business hours so that it does not interfere with daily operations.
- Try to reduce the number of virtual machines per LUN.
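As a quick sizing check against that guideline: only the ~31 VMs/host average and the 15-VM-per-virtual-disk recommendation come from the text above, while the host count below is a made-up example.

```python
import math

# Quick check of the DataCore guideline against the capacity-report figure.
avg_vms_per_host = 31          # from the capacity report
hosts = 10                     # hypothetical cluster size for illustration
max_vms_per_virtual_disk = 15  # DataCore recommendation (not a hard limit)

total_vms = avg_vms_per_host * hosts
min_virtual_disks = math.ceil(total_vms / max_vms_per_virtual_disk)
print(f"{total_vms} VMs need at least {min_virtual_disks} virtual disks "
      f"(datastores) to stay within the recommendation")
```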

Of course, we would only consider the above workarounds if we actually have a datastore
performance issue. Datacore's report says that Datacore can provide up to 500% faster
performance, so I expect that improved performance is the biggest benefit we would get
from Arkema's implementation of Datacore.
In my test lab I find that Datacore has a weakness regarding high availability: after
a successful failover the Datacore system needs time to synchronize data, and during
this synchronization period (Log Recovery) the Datacore system is no longer highly
available and host access to the recovering virtual disk is blocked. This is a post-failover
problem, and it means continuous high availability/failover of the Datacore system is not
guaranteed. I'm not sure how this plays out in Arkema's system.
In Arkema's system, replication/mirroring is also implemented at the backend storage
level (VNX), not just at the Datacore level. In my test lab, however, I use Openfiler 2.3, which
does not have advanced functions like cross-array replication; I would like to find
another virtual storage solution that provides this function, but I don't know of one so far.
My question is this: the Datacore servers have mirrored virtual disks which provide 2
identical copies of the same data and which support failover without depending on
the backend storage devices' failover/replication functions, so why do we need to
implement replication at the storage level (VNX)?
In the test lab I can apply NIC Teaming and MPIO to multiple iSCSI ports (for example the
iSCSI mirror ports which connect the 2 Datacore servers), and I wonder what we can do
to achieve something similar with the multiple Fibre Channel HBAs in Arkema's system.
An enterprise-level storage system should support Active/Active mode (not just
Active/Standby failover) to maximize resource utilization and improve performance,
but the background for this topic is quite complex. Datacore can control whether one or
all Datacore servers act as Preferred Server(s), or it can let the hosts decide; this also
depends on whether ALUA is enabled. I am not sure what is deployed in Arkema's system,
but in general I think we should enable ALUA. Below is Datacore's explanation of this topic;
it is quite long, so you can skip it if you want (a short sketch of the decision logic follows
the quoted text):

Without ALUA enabled:

If Hosts are registered without ALUA support, the Preferred Server and Preferred Path
settings will serve no function. All DataCore Servers and their respective Front End (FE)
paths are considered equal. It is up to the Host's own Operating System or Failover Software
to determine which DataCore Server is its preferred server.

With ALUA enabled:

Setting the Preferred Server to Auto (or an explicit DataCore Server) determines the
DataCore Server that is designated Active Optimized for Host IO. The other DataCore
Server is designated Active Non-Optimized.
If for any reason the Storage Source on the preferred DataCore Server becomes unavailable,
and the Host Access for the Virtual Disk is set to Offline or Disable, then the other DataCore
Server will be designated the Active Optimized side. The Host will be notified by both
DataCore Servers that there has been an ALUA state change, forcing the Host to re-check
the ALUA state of both DataCore Servers and act accordingly.
If the Storage Source on the preferred DataCore Server becomes unavailable but the Host
Access for the Virtual Disk remains Read/Write, for example if only the Storage behind the
DataCore Server is unavailable but the FE and MR paths are all connected, or if the Host
physically becomes disconnected from the preferred DataCore Server (e.g. Fibre Channel or
iSCSI cable failure), then the ALUA state will not change for the remaining, Active
Non-Optimized side. However, in this case, the DataCore Server will not prevent access to the
Host nor will it change the way READ or WRITE IO is handled compared to the Active
Optimized side, but the Host will still register this DataCore Server's Paths as
Active Non-Optimized, which may (or may not) affect how the Host behaves generally.

In the case where the Preferred Server is set to All, then both DataCore Servers are
designated Active Optimized for Host IO.
All IO requests from a Host will use all Paths to all DataCore Servers equally, regardless of
the distance that the IO has to travel to the DataCore Server. For this reason, the All setting
is not normally recommended. If a Host has to send a WRITE IO to a remote DataCore
Server (where the IO Path is significantly distant compared to the other, local DataCore
Server), then the WAIT times accrued by having to send the IO not only across the SAN to
the remote DataCore Server, but for the remote DataCore Server to mirror back to the local
DataCore Server, and then for the mirror write to be acknowledged from the local DataCore
Server to the remote DataCore Server, and finally for the acknowledgement to be sent to the
Host back across the SAN, can be significant.
The benefits of being able to use all Paths to all DataCore Servers for all Virtual Disks are
not always clear cut. Testing is advised.
For Preferred Path settings it is stated in the SANsymphony-V Help:
A preferred front-end path setting can also be set manually for a particular virtual disk. In
this case, the manual setting for a virtual disk overrides the preferred path created by the
Preferred Server setting for the host.
So for example, if the Preferred Server is designated as DataCore Server A and the
Preferred Paths are designated as DataCore Server B, then DataCore Server B will be the
Active Optimized side, not DataCore Server A.
In a two-node Server group there is nothing to be gained by making the Preferred Path
setting different to the Preferred Server setting, and it can cause potential confusion when
trying to diagnose problems, or redesigning appropriate settings for your DataCore SAN
with regard to Host IO Paths. DataCore recommend leaving the Preferred Path setting alone
for two-node Server groups.
Where there are three or more DataCore Servers in a Server Group, and where one or more
of those DataCore Servers shares Mirror Paths between the others in that same Server
Group, then explicitly setting the Preferred Path makes more sense. So for example, if
DataCore Server A has two mirrored Virtual Disks, one with DataCore Server B and one
with DataCore Server C, and DataCore Server B also has a mirrored Virtual Disk with
DataCore Server C, then using just the Preferred Server setting to designate the Active
Optimized side for the Host's Virtual Disks becomes more complicated. In this case the
Preferred Path setting can be used to override the Preferred Server setting for a much more
granular level of control.
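To condense that explanation: the Active Optimized side for a given virtual disk is effectively decided by the Preferred Path setting if one is set, otherwise by the Preferred Server setting, with All marking every server optimized. The sketch below is my paraphrase of the quoted text, not DataCore's actual algorithm, and the server names are placeholders.

```python
from typing import List, Optional

DATACORE_SERVERS = ["DataCore Server A", "DataCore Server B"]

def active_optimized(preferred_server: str,
                     preferred_path: Optional[str] = None) -> List[str]:
    """Which DataCore servers a host should see as ALUA Active Optimized for
    one virtual disk, per my reading of the text above: an explicit Preferred
    Path overrides the Preferred Server; 'All' makes every server optimized;
    otherwise only the preferred server is optimized and the partner is
    Active Non-Optimized."""
    if preferred_path is not None:
        return [preferred_path]
    if preferred_server == "All":
        return list(DATACORE_SERVERS)
    return [preferred_server]

# Example from the quoted text: Preferred Server = A but Preferred Path = B
# means B becomes the Active Optimized side, not A.
print(active_optimized("DataCore Server A", preferred_path="DataCore Server B"))
print(active_optimized("All"))
print(active_optimized("DataCore Server B"))
```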
