
Acropolis Advanced Administration Guide

Acropolis 4.7
06-Oct-2016
Notice

Copyright
Copyright 2016 Nutanix, Inc.
Nutanix, Inc.
1740 Technology Drive, Suite 150
San Jose, CA 95110
All rights reserved. This product is protected by U.S. and international copyright and intellectual property
laws. Nutanix is a trademark of Nutanix, Inc. in the United States and/or other jurisdictions. All other marks
and names mentioned herein may be trademarks of their respective companies.

License
The provision of this software to you does not grant any licenses or other rights under any Microsoft
patents with respect to anything other than the file server implementation portion of the binaries for this
software, including no licenses or any other rights in any hardware or any devices or software that are used
to communicate with or in connection with this software.

Conventions
Convention Description

variable_value The action depends on a value that is unique to your environment.

ncli> command The commands are executed in the Nutanix nCLI.

user@host$ command The commands are executed as a non-privileged user (such as nutanix)
in the system shell.

root@host# command The commands are executed as the root user in the vSphere or Acropolis
host shell.

> command The commands are executed in the Hyper-V host shell.

output The information is displayed as output from a command or in a log file.

Default Cluster Credentials


Interface                  Target                              Username        Password

Nutanix web console        Nutanix Controller VM               admin           admin
vSphere Web Client         ESXi host                           root            nutanix/4u
vSphere client             ESXi host                           root            nutanix/4u
SSH client or console      ESXi host                           root            nutanix/4u
SSH client or console      AHV host                            root            nutanix/4u
SSH client or console      Hyper-V host                        Administrator   nutanix/4u
SSH client                 Nutanix Controller VM               nutanix         nutanix/4u
SSH client or console      Acropolis OpenStack Services VM     root            admin
                           (Nutanix OVM)

Version
Last modified: October 6, 2016 (2016-10-06 3:54:57 GMT-7)



Contents

1: Cluster Management............................................................................... 6
Controller VM Access.......................................................................................................................... 6
Port Requirements.................................................................................................................... 6
Starting a Nutanix Cluster................................................................................................................... 6
Stopping a Cluster............................................................................................................................... 7
Destroying a Cluster............................................................................................................................ 8
Creating Clusters from a Multiblock Cluster........................................................................................9
Cluster IP Address Configuration............................................................................................. 9
Configuring the Cluster........................................................................................................... 10
Verifying IPv6 Link-Local Connectivity....................................................................................13
Failing from one Site to Another....................................................................................................... 15
Disaster failover.......................................................................................................................15
Planned failover...................................................................................................................... 15
Fingerprinting Existing vDisks............................................................................................................15

2: Changing Passwords............................................................................ 17
Changing User Passwords................................................................................................................ 17
Changing the SMI-S Provider Password (Hyper-V)............................................................... 17
Changing the Controller VM Password............................................................................................. 18

3: Cluster IP Address Configuration........................................................20


Network Configuration (Virtual Interfaces, Virtual Switches, and IP Addresses)...............................20
Changing Controller VM IP Addresses............................................................................................. 22
Preparing to Set IP Addresses............................................................................................... 22
Changing Controller VM IP Addresses...................................................................................24
Changing a Controller VM IP Address (manual).................................................................... 25
Completing Controller VM IP Address Change...................................................................... 28

4: Creating a Windows Guest VM Failover Cluster................................ 30

5: Virtual Machine Pinning........................................................................33

6: Self-Service Restore..............................................................................35
Requirements and Limitations of Self-Service Restore.....................................................................35
Enabling Self-Service Restore...........................................................................................................36
Restoring a File as a Guest VM Administrator................................................................................. 37
Restoring a File as a Nutanix Administrator..................................................................................... 38

7: Logs........................................................................................................ 40
Sending Logs to a Remote Syslog Server........................................................................................40
Configuring the Remote Syslog Server Settings.................................................................... 41
Common Log Files.............................................................................................................................42
Nutanix Logs Root.................................................................................................................. 42

Self-Monitoring (sysstats) Logs...............................................................................................43
/home/nutanix/data/logs/cassandra...................................................................................... 43
Controller VM Log Files.......................................................................................................... 43
Correlating the FATAL log to the INFO file....................................................................................... 46
Stargate Logs.....................................................................................................................................47
Cassandra Logs................................................................................................................................. 48
Prism Gateway Log........................................................................................................................... 49
Zookeeper Logs................................................................................................................................. 50
Genesis Logs..................................................................................................................................... 50
Diagnosing a Genesis Failure................................................................................................ 51
ESXi Log Files................................................................................................................................... 52

8: Troubleshooting Tools.......................................................................... 53
Nutanix Cluster Check (NCC)........................................................................................................... 53
Installing NCC from an Installer File.......................................................................................54
Upgrading NCC Software....................................................................................................... 56
NCC Usage............................................................................................................................. 57
Diagnostics VMs................................................................................................................................ 58
Running a Test Using the Diagnostics VMs........................................................................... 58
Diagnostics Output.................................................................................................................. 59
Syscheck Utility.................................................................................................................................. 60
Using Syscheck Utility.............................................................................................................60

9: Controller VM Memory Configurations............................................... 61


Controller VM Memory and vCPU Configurations.............................................................................61
Controller VM Memory and vCPU Configurations (Broadwell/G5)....................................................62
Platform Workload Translation (Broadwell/G5)................................................................................. 63

1
Cluster Management
Although each host in a Nutanix cluster runs a hypervisor independent of other hosts in the cluster, some
operations affect the entire cluster.

Controller VM Access
Most administrative functions of a Nutanix cluster can be performed through the web console or nCLI.
Nutanix recommends using these interfaces whenever possible and disabling Controller VM SSH access
with password or key authentication. Some functions, however, require logging on to a Controller VM
with SSH. Exercise caution whenever connecting directly to a Controller VM as the risk of causing cluster
issues is increased.
Warning: When you connect to a Controller VM with SSH, ensure that the SSH client does
not import or change any locale settings. The Nutanix software is not localized, and executing
commands with any locale other than en_US.UTF-8 can cause severe cluster issues.
To check the locale used in an SSH session, run /usr/bin/locale. If any environment variables
are set to anything other than en_US.UTF-8, reconnect with an SSH configuration that does not
import or change any locale settings.

Port Requirements
Nutanix uses a number of ports for internal communication. The following unique ports are required for
external access to Controller VMs in a Nutanix cluster.

Purpose                                                                Port Numbers

Remote site replication                                                2009 and 2020
Cluster and IP address configuration                                   2100
Remote support tunnel (outgoing connection to service centers          80 or 8443
nsc01.nutanix.net and nsc02.nutanix.net)
Management interface (web console, nCLI)                               9440
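
For example, you can confirm from an administrative workstation that the management interface of a Controller VM is reachable on port 9440. This is an illustrative check only; cvm_ip_addr is a placeholder, and it assumes a standard netcat client is installed on the workstation (any TCP connectivity tool works):

user@host$ nc -zv cvm_ip_addr 9440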

Starting a Nutanix Cluster

1. Log on to any Controller VM in the cluster with SSH.



2. Start the Nutanix cluster.
nutanix@cvm$ cluster start

If the cluster starts properly, output similar to the following is displayed for each node in the cluster:

CVM: 10.1.64.60 Up
Zeus UP [3704, 3727, 3728, 3729, 3807, 3821]
Scavenger UP [4937, 4960, 4961, 4990]
SSLTerminator UP [5034, 5056, 5057, 5139]
Hyperint UP [5059, 5082, 5083, 5086, 5099, 5108]
Medusa UP [5534, 5559, 5560, 5563, 5752]
DynamicRingChanger UP [5852, 5874, 5875, 5954]
Pithos UP [5877, 5899, 5900, 5962]
Stargate UP [5902, 5927, 5928, 6103, 6108]
Cerebro UP [5930, 5952, 5953, 6106]
Chronos UP [5960, 6004, 6006, 6075]
Curator UP [5987, 6017, 6018, 6261]
Prism UP [6020, 6042, 6043, 6111, 6818]
CIM UP [6045, 6067, 6068, 6101]
AlertManager UP [6070, 6099, 6100, 6296]
Arithmos UP [6107, 6175, 6176, 6344]
SysStatCollector UP [6196, 6259, 6260, 6497]
Tunnel UP [6263, 6312, 6313]
ClusterHealth UP [6317, 6342, 6343, 6446, 6468, 6469, 6604, 6605,
6606, 6607]
Janus UP [6365, 6444, 6445, 6584]
NutanixGuestTools UP [6377, 6403, 6404]

What to do next: After you have verified that the cluster is running, you can start guest VMs.
(Hyper-V only) If the Hyper-V failover cluster was stopped, start it by logging on to a Hyper-V host and
running the Start-Cluster PowerShell command.
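
For example, from an elevated PowerShell session on a Hyper-V host (a minimal illustration; cluster_name is a placeholder for your failover cluster name, and Start-Cluster with no parameters starts the cluster that the host belongs to):

> Start-Cluster -Name cluster_name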

Warning: By default, Nutanix clusters have redundancy factor 2, which means they can tolerate
the failure of a single node or drive. Nutanix clusters with a configured option of redundancy factor
3 allow the Nutanix cluster to withstand the failure of two nodes or drives in different blocks.
• Never shut down or restart multiple Controller VMs or hosts simultaneously.
• Always run the cluster status command to verify that all Controller VMs are up before
performing a Controller VM or host shutdown or restart.

Stopping a Cluster
Before you begin: Shut down all guest virtual machines, including vCenter if it is running on the cluster.
Do not shut down Nutanix Controller VMs.
(Hyper-V only) Stop the Hyper-V failover cluster by logging on to a Hyper-V host and running the Stop-Cluster PowerShell command.

Note: This procedure stops all services provided by guest virtual machines, the Nutanix cluster,
and the hypervisor host.

1. Log on to a running Controller VM in the cluster with SSH.

2. Stop the Nutanix cluster.


nutanix@cvm$ cluster stop

Wait to proceed until output similar to the following is displayed for every Controller VM in the cluster.



CVM: 172.16.8.191 Up, ZeusLeader
Zeus UP [3167, 3180, 3181, 3182, 3191, 3201]
Scavenger UP [3334, 3351, 3352, 3353]
ConnectionSplicer DOWN []
Hyperint DOWN []
Medusa DOWN []
DynamicRingChanger DOWN []
Pithos DOWN []
Stargate DOWN []
Cerebro DOWN []
Chronos DOWN []
Curator DOWN []
Prism DOWN []
AlertManager DOWN []
StatsAggregator DOWN []
SysStatCollector DOWN []

Destroying a Cluster
Before you begin: Reclaim licenses from the cluster to be destroyed by following Reclaiming Licenses
When Destroying a Cluster in the Web Console Guide.
Note: If you have destroyed the cluster and did not reclaim the existing licenses, contact Nutanix
Support to reclaim the licenses.

Note: If you have a cluster running the Cloud Platform System (CPS), the procedure to destroy it
is different. See the "Destroying the Cluster" section in the CPS Standard on Nutanix Guide.

Destroying a cluster resets all nodes in the cluster to the factory configuration. All cluster configuration and
guest VM data is unrecoverable after destroying the cluster.
Note: If the cluster is registered with Prism Central (the multiple cluster manager VM), unregister
the cluster before destroying it. See Registering with Prism Central in the Web Console Guide for
more information.

1. Log on to any Controller VM in the cluster with SSH.

2. Stop the Nutanix cluster.


nutanix@cvm$ cluster stop

Wait to proceed until output similar to the following is displayed for every Controller VM in the cluster.

CVM: 172.16.8.191 Up, ZeusLeader


Zeus UP [3167, 3180, 3181, 3182, 3191, 3201]
Scavenger UP [3334, 3351, 3352, 3353]
ConnectionSplicer DOWN []
Hyperint DOWN []
Medusa DOWN []
DynamicRingChanger DOWN []
Pithos DOWN []
Stargate DOWN []
Cerebro DOWN []
Chronos DOWN []
Curator DOWN []
Prism DOWN []
AlertManager DOWN []
StatsAggregator DOWN []



SysStatCollector DOWN []

3. Destroy the cluster.

Caution: Performing this operation deletes all cluster and guest VM data in the cluster.

nutanix@cvm$ cluster destroy

Follow the prompts to confirm destruction of the cluster.

Creating Clusters from a Multiblock Cluster


The minimum size for a cluster is three nodes.

1. Remove nodes from the existing cluster.


→ If you want to preserve data on the existing cluster, remove nodes from the cluster using the
Hardware > Table > Host screen of the web console.
→ If you want multiple new clusters, destroy the existing cluster by following Destroying a Cluster on
page 8.

2. Create one or more new clusters by following Configuring the Cluster on page 10.

Cluster IP Address Configuration

AOS includes a web-based configuration tool that automates assigning IP addresses to cluster
components and creates the cluster.

Requirements

The web-based configuration tool requires that IPv6 link-local be enabled on the subnet. If IPv6 link-local is
not available, you must configure the Controller VM IP addresses and the cluster manually. The web-based
configuration tool also requires that the Controller VMs be able to communicate with each other.
All Controller VMs and hypervisor hosts must be on the same subnet. The hypervisor can be multihomed
provided that one interface is on the same subnet as the Controller VM.
Guest VMs can be on a different subnet.



Configuring the Cluster
Before you begin: Check that the cluster is ready to be configured by following Preparing to Set IP
Addresses on page 22.
Note: This procedure has been deprecated (superseded) in AOS 4.5 and later releases. Instead,
use the Foundation tool to configure a cluster. See the "Creating a Cluster" topics in the Field
Installation Guide for more information.

Video: Click here to see a video (MP4 format) demonstration of this procedure. (The video may
not reflect the latest features described in this section.)

Figure: Cluster IP Address Configuration Page

1. Open a web browser.


Nutanix recommends using Internet Explorer 9 for Windows and Safari for Mac OS.

Note: Internet Explorer requires protected mode to be disabled. Go to Tools > Internet
Options > Security, clear the Enable Protected Mode check box, and restart the browser.

2. In the browser, go to http://[cvm_ipv6_addr]:2100/cluster_init.html.


Replace [cvm_ipv6_addr] with the IPv6 address of any Controller VM that should be added to the
cluster.
Following is an example URL to access the cluster creation page on a Controller VM:
http://fe80::5054:ff:fea8:8aae:2100/cluster_init.html



If the cluster_init.html page is blank, then the Controller VM is already part of a cluster. Connect to a
Controller VM that is not part of a cluster.
You can obtain the IPv6 address of the Controller VM by using the ifconfig command.
Example
nutanix@cvm$ ifconfig
eth0 Link encap:Ethernet HWaddr 52:54:00:A8:8A:AE
inet addr:10.1.65.240 Bcast:10.1.67.255 Mask:255.255.252.0
inet6 addr: fe80::5054:ff:fea8:8aae/64 Scope:Link
...etc...

The value of the inet6 addr field up to the / character is the IPv6 address of the Controller VM.

3. Type a meaningful value in the Cluster Name field.


This value is appended to all automated communication between the cluster and Nutanix support. It
should include the customer's name and, if necessary, a modifier that differentiates this cluster from any
other clusters that the customer might have.
Note: This entity has the following naming restrictions:
• The maximum length is 75 characters (for vSphere and AHV) and 15 characters (for Hyper-V).
• Allowed characters are uppercase and lowercase standard Latin letters (A-Z and a-z),
decimal digits (0-9), dots (.), hyphens (-), and underscores (_).

4. Type a virtual IP address for the cluster in the Cluster External IP field.
This parameter is required for Hyper-V clusters and is optional for vSphere and AHV clusters.
You can connect to the external cluster IP address with both the web console and nCLI. In the event
that a Controller VM is restarted or fails, the external cluster IP address is relocated to another
Controller VM in the cluster.

5. (Optional) If you want to enable redundancy factor 3, set Cluster Max Redundancy Factor to 3.
Redundancy factor 3 has the following requirements:
• Redundancy factor 3 can be enabled only when the cluster is created.
• A cluster must have at least five nodes for redundancy factor 3 to be enabled.
• For guest VMs to tolerate the simultaneous failure of two nodes or drives in different blocks, the data
must be stored on containers with replication factor 3.
• Controller VMs must be configured with 24 GB of memory.

6. Type the appropriate DNS and NTP addresses in the respective fields.

Note: You must enter NTP servers that the Controller VMs can reach in the CVM NTP Servers
field. If reachable NTP servers are not entered or if the time on the Controller VMs is ahead of
the current time, cluster services may fail to start.

For Hyper-V clusters, the CVM NTP Servers parameter must be set to the IP addresses of one or more
Active Directory domain controllers.
The Hypervisor NTP Servers parameter is not used in Hyper-V clusters.

7. Type the appropriate subnet masks in the Subnet Mask row.

8. Type the appropriate default gateway IP addresses in the Default Gateway row.

9. Select the check box next to each node that you want to add to the cluster.



All unconfigured nodes on the current network are presented on this web page. If you are going to
configure multiple clusters, be sure that you only select the nodes that should be part of the current
cluster.

10. Provide an IP address for all components in the cluster.


Ensure that all components satisfy the cluster subnet requirements. See Cluster IP Address
Configuration on page 9. DHCP is not supported for Controller VMs, so do not assign Controller VM
IP addresses through DHCP.

Note: The unconfigured nodes are not listed according to their position in the block. Ensure
that you assign the intended IP address to each node.

11. Click Create.


Wait until the Log Messages section of the page reports that the cluster has been successfully
configured.
Output similar to the following indicates successful cluster configuration.
Configuring IP addresses on node 13SM71450003/A...
Configuring IP addresses on node 13SM71450003/A...
Configuring IP addresses on node 13SM71450003/A...
Configuring IP addresses on node 13SM71450003/A...
Configuring the Hypervisor DNS settings on node 13SM71450003/A...
Configuring the Hypervisor DNS settings on node 13SM71450003/A...
Configuring the Hypervisor DNS settings on node 13SM71450003/A...
Configuring the Hypervisor DNS settings on node 13SM71450003/A...
Configuring the Hypervisor NTP settings on node 13SM71450003/A...
Configuring the Hypervisor NTP settings on node 13SM71450003/A...
Configuring the Hypervisor NTP settings on node 13SM71450003/A...
Configuring the Hypervisor NTP settings on node 13SM71450003/A...
Configuring Zeus on node 13SM71450003/A...
Configuring Zeus on node 13SM71450003/A...
Configuring Zeus on node 13SM71450003/A...
Configuring Zeus on node 13SM71450003/A...
Initializing cluster...
Cluster successfully initialized!
Initializing the CVM DNS and NTP servers...
Successfully updated the CVM NTP and DNS server list

The cluster is started automatically after creation.

12. Log on to any Controller VM in the cluster with SSH.

13. Verify that all services are up on all Controller VMs.


nutanix@cvm$ cluster status

If the cluster is running properly, output similar to the following is displayed for each node in the cluster:

CVM: 10.1.64.60 Up
Zeus UP [3704, 3727, 3728, 3729, 3807, 3821]
Scavenger UP [4937, 4960, 4961, 4990]
SSLTerminator UP [5034, 5056, 5057, 5139]
Hyperint UP [5059, 5082, 5083, 5086, 5099, 5108]
Medusa UP [5534, 5559, 5560, 5563, 5752]
DynamicRingChanger UP [5852, 5874, 5875, 5954]
Pithos UP [5877, 5899, 5900, 5962]
Stargate UP [5902, 5927, 5928, 6103, 6108]
Cerebro UP [5930, 5952, 5953, 6106]
Chronos UP [5960, 6004, 6006, 6075]
Curator UP [5987, 6017, 6018, 6261]
Prism UP [6020, 6042, 6043, 6111, 6818]
CIM UP [6045, 6067, 6068, 6101]



AlertManager UP [6070, 6099, 6100, 6296]
Arithmos UP [6107, 6175, 6176, 6344]
SysStatCollector UP [6196, 6259, 6260, 6497]
Tunnel UP [6263, 6312, 6313]
ClusterHealth UP [6317, 6342, 6343, 6446, 6468, 6469, 6604, 6605,
6606, 6607]
Janus UP [6365, 6444, 6445, 6584]
NutanixGuestTools UP [6377, 6403, 6404]

Verifying IPv6 Link-Local Connectivity


The automated IP address and cluster configuration utilities depend on IPv6 link-local addresses, which
are enabled on most networks. Use this procedure to verify that IPv6 link-local is enabled.

1. Connect two Windows, Linux, or Apple laptops to the switch to be used.

2. Disable any firewalls on the laptops.

3. Verify that each laptop has an IPv6 link-local address.


→ Windows (Control Panel)
Start > Control Panel > View network status and tasks > Change adapter settings > Local
Area Connection > Details

→ Windows (command-line interface)


> ipconfig

Ethernet adapter Local Area Connection:

Connection-specific DNS Suffix . : corp.example.com


Link-local IPv6 Address . . . . . : fe80::ed67:9a32:7fc4:3be1%12
IPv4 Address. . . . . . . . . . . : 172.16.21.11
Subnet Mask . . . . . . . . . . . : 255.240.0.0
Default Gateway . . . . . . . . . : 172.16.0.1



→ Linux
$ ifconfig eth0

eth0 Link encap:Ethernet HWaddr 00:0c:29:dd:e3:0b


inet addr:10.2.100.180 Bcast:10.2.103.255 Mask:255.255.252.0
inet6 addr: fe80::20c:29ff:fedd:e30b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2895385616 errors:0 dropped:0 overruns:0 frame:0
TX packets:3063794864 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2569454555254 (2.5 TB) TX bytes:2795005996728 (2.7 TB)

→ Mac OS
$ ifconfig en0

en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500


ether 70:56:81:ae:a7:47
inet6 fe80::7256:81ff:feae:a747 en0 prefixlen 64 scopeid 0x4
inet 172.16.21.208 netmask 0xfff00000 broadcast 172.31.255.255
media: autoselect
status: active

Note the IPv6 link-local addresses, which always begin with fe80. Omit the / character and anything
following.

4. From one of the laptops, ping the other laptop.


→ Windows
> ping -6 ipv6_linklocal_addr%interface

→ Linux/Mac OS
$ ping6 ipv6_linklocal_addr%interface

• Replace ipv6_linklocal_addr with the IPv6 link-local address of the other laptop.
• Replace interface with the interface identifier of the laptop from which you are pinging (for example,
12 for Windows, eth0 for Linux, or en0 for Mac OS).
If the ping packets are answered by the remote host, IPv6 link-local is enabled on the subnet. If the
ping packets are not answered, ensure that firewalls are disabled on both laptops and try again before
concluding that IPv6 link-local is not enabled.
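
For example, pinging the Linux laptop from the Windows laptop by using the sample outputs above (the target address and the local interface index are illustrative and will differ in your environment):

> ping -6 fe80::20c:29ff:fedd:e30b%12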

5. Reenable the firewalls on the laptops and disconnect them from the network.

Results:
• If IPv6 link-local is enabled on the subnet, you can use the automated IP address and cluster configuration
utilities.
• If IPv6 link-local is not enabled on the subnet, you must set IP addresses and create the cluster
manually.

Note: IPv6 connectivity issues might occur if there is a VLAN tag mismatch. ESXi hosts shipped
from the factory have no VLAN tagging configured (VLAN tag 0), while the workstation (laptop)
might be connected to an access port that uses a different VLAN tag. To avoid this mismatch,
ensure that the port connected to the ESXi host is in trunking mode.



Failing from one Site to Another

Disaster failover

Connect to the backup site and activate it.


ncli> pd activate name="pd_name"

This operation does the following:


1. Restores all VM files from the last fully replicated snapshot.
2. Registers the VMs on the recovery site.
a. All the VMs are registered on a single host in the cluster.
b. The VMs are not powered on automatically. You need to start them manually.
c. It is recommended to enable DRS in the cluster so that the hypervisor migrates the VMs
once they are powered on.
Caution: The VM registration might fail if the container is not mounted on the selected
host.

3. Marks the failover site protection domain as active.

Planned failover

Connect to the primary site and specify the failover site to migrate to.
ncli> pd migrate name="pd_name" remote-site="remote_site_name2"

This operation does the following:


1. Creates and replicates a snapshot of the protection domain.
2. Shuts down VMs on the local site.
3. Creates and replicates another snapshot of the protection domain.
4. Unregisters all VMs and removes their associated files.
5. Marks the local site protection domain as inactive.
6. Restores all VM files from the last snapshot and registers them on the remote site.
7. Marks the remote site protection domain as active.
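
After either failover operation completes, you can confirm which protection domains are active from the nCLI (a quick illustrative check; the exact fields shown depend on your AOS version):

ncli> pd ls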

Fingerprinting Existing vDisks


The vDisk manipulator utility fingerprints vDisks that existed in the cluster before deduplication was
enabled.
Before you begin: The container must have fingerprint-on-write enabled.

Run the vDisk manipulator utility from any Controller VM in the cluster.
nutanix@cvm$ vdisk_manipulator --operation="add_fingerprints" \
--stats_only="false" --nfs_container_name="ctr_name" \
--nfs_relative_file_path="vdisk_path"

• Replace ctr_name with the name of the container where the vDisk to fingerprint resides.



• Replace vdisk_path with the path of the vDisk to fingerprint relative to the container path (for
example, Win7-desktop11/Win7-desktop11-flat.vmdk). You cannot specify multiple vDisks in this
parameter.
Note: You can run vdisk_manipulator in a loop to fingerprint multiple vDisks, but run only one
instance of vdisk_manipulator on each Controller VM at a time. Executing multiple instances
on a Controller VM concurrently would generate significant load on the cluster.
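
For example, the following loop (an illustrative sketch; the container name and vDisk paths are hypothetical) fingerprints several vDisks sequentially from a single Controller VM, so only one instance runs at a time:

nutanix@cvm$ for vdisk_path in Win7-desktop11/Win7-desktop11-flat.vmdk \
    Win7-desktop12/Win7-desktop12-flat.vmdk; do
  vdisk_manipulator --operation="add_fingerprints" --stats_only="false" \
    --nfs_container_name="ctr_name" --nfs_relative_file_path="${vdisk_path}"
done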



2
Changing Passwords

Changing User Passwords


You can change user passwords, including for the default admin user, in the web console or nCLI.
Changing the password through either interface changes it for both.
To change a user password, do one of the following:

• (Web console) Log on to the web console as the user whose password is to be changed and select
Change Password from the user icon pull-down list of the main menu.
For more information about changing properties of the current users, see the Web Console Guide.

• (nCLI) Specify the username and passwords.


$ ncli -u 'username' -p 'old_pw' user change-password current-password="curr_pw" \
new-password="new_pw"

• Replace username with the name of the user whose password is to be changed.
• Replace curr_pw with the current password.
• Replace new_pw with the new password.
Note: If you change the password of the admin user from the default, you must specify the
password every time you start an nCLI session from a remote system. A password is not
required if you are starting an nCLI session from a Controller VM where you are already logged
on.
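
For example, to open an nCLI session against the cluster from a remote system where nCLI is installed (an illustrative invocation; the address and credentials are placeholders):

user@host$ ncli -s cluster_ip_addr -u 'admin' -p 'admin_password'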

Changing the SMI-S Provider Password (Hyper-V)


If you change the password of the Prism admin user, you have to update the Prism run-as account in
SCVMM.

1. Log on to the system where the SCVMM console is installed and start the console.

2. Go to Settings > Security > Run As Account.

3. Right-click the account named cluster_name-Prism and select Properties.



Figure: Prism Run As Account in SCVMM

4. Update the username and password to include the new credentials and ensure that Validate domain
credentials is not checked.

5. Go to Fabric > Storage > Providers.

6. Right-click the provider with Name cluster_name and select Refresh.

Figure: Storage Provider

Changing the Controller VM Password


Perform these steps on every Controller VM in the cluster.
Note: Nutanix recommends that you set the same password for the nutanix user on all the
Controller VMs.

1. Log on to the Controller VM with SSH.

2. Change the nutanix user password.


nutanix@cvm$ passwd

3. Respond to the prompts, providing the current and new nutanix user password.
Changing password for nutanix.
Old Password:
New password:
Retype new password:
Password changed.



Note: The password must meet the following complexity requirements:
• At least 9 characters long
• At least 2 lowercase characters
• At least 2 uppercase characters
• At least 2 numbers
• At least 2 special characters



3
Cluster IP Address Configuration

AOS includes a web-based configuration tool that automates the modification of Controller VM IP
addresses and configures the cluster to use these new IP addresses. Other cluster components must be
modified manually.

Network Configuration (Virtual Interfaces, Virtual Switches, and IP Addresses)


By default, Nutanix hosts have the following virtual switches:

Internal Virtual Switch

The internal virtual switch manages network communications between the Controller VM and the
hypervisor host. This switch is associated with a private network on the default VLAN and uses the
192.168.5.0/24 address space. The traffic on this subnet is typically restricted to the internal virtual switch,
but might be sent over the physical wire, through a host route, to implement storage high availability on
ESXi and Hyper-V clusters. This traffic is on the same VLAN as the Nutanix storage backplane.
Note: For guest VMs and other devices on the network, do not use a subnet that overlaps with
the 192.168.5.0/24 subnet on the default VLAN. If you want to use an overlapping subnet for such
devices, make sure that you use a different VLAN.
The following tables list the interfaces and IP addresses on the internal virtual switch on different
hypervisors:

Interfaces and IP Addresses on the Internal Virtual Switch virbr0 on an AHV Host

Device           Interface Name    IP Address

AHV Host         virbr0            192.168.5.1
Controller VM    eth1              192.168.5.2
                 eth1:1            192.168.5.254

Interfaces and IP Addresses on the Internal Virtual Switch vSwitchNutanix on an ESXi Host

Device Interface Name IP Address

ESXi Host vmk1 192.168.5.1

Controller VM eth1 192.168.5.2

eth1:1 192.168.5.254

Interfaces and IP Addresses on the Internal Virtual Switch InternalSwitch on a Hyper-V Host

Device Interface Name IP Address

Hyper-V Host vEthernet (InternalSwitch) 192.168.5.1

Controller VM eth1 192.168.5.2

eth1:1 192.168.5.254

External Virtual Switch

The external virtual switch manages communication between the virtual machines, between the virtual
machines and the host, and between the hosts in the cluster. The traffic on this virtual switch also includes
Controller VM–driven replication traffic for the purposes of maintaining the specified replication factor (RF),
as well as any ADSF traffic that cannot be processed locally. The external switch is assigned a NIC team or
bond as the means to provide connectivity outside of the host.

Note: Make sure that the hypervisor and Controller VM interfaces on the external virtual switch
are not assigned IP addresses from the 192.168.5.0/24 subnet.
The following tables list the interfaces and IP addresses on the external virtual switch on different
hypervisors:

Interfaces and IP Addresses on the External Virtual Switch br0 on an AHV Host

Device Interface Name IP Address

AHV Host br0 User-defined

Controller VM eth0 User-defined

Guest VM    br0 or user-defined Open vSwitch bridge    User-defined



Interfaces and IP Addresses on the External Virtual Switch vSwitch0 on an ESXi Host

Device Interface Name IP Address

ESXi Host vmk0 User-defined

Controller VM eth0 User-defined

Guest VM vSwitch0 or user-defined switch User-defined

Interfaces and IP Addresses on the External Virtual Switch ExternalSwitch on a Hyper-V Host

Device Interface Name IP Address

Hyper-V Host vEthernet (ExternalSwitch) User-defined


Controller VM eth0 User-defined

Guest VM    vEthernet (ExternalSwitch) or user-defined switch    User-defined

Changing Controller VM IP Addresses


Warning: If you are reassigning a Controller VM IP address to another Controller VM, you must
perform this complete procedure twice: once to assign intermediate IP addresses and again to
assign the desired IP addresses.
For example, if Controller VM A has IP address 172.16.0.11 and Controller VM B has IP address
172.16.0.10 and you want to swap them, you would need to reconfigure them with different IP
addresses (such as 172.16.0.100 and 172.16.0.101) before changing them to the IP addresses in
use initially.

1. Place the cluster in reconfiguration mode. See the "Before you begin" section in Changing a Controller
VM IP Address (manual) on page 25.

2. Configure the Controller VM IP addresses.


→ If IPv6 is enabled on the subnet, follow Changing Controller VM IP Addresses on page 24.
→ If IPv6 is not enabled on the subnet, follow Changing a Controller VM IP Address (manual) on
page 25 for each Controller VM in the cluster.

3. Complete cluster reconfiguration by following Completing Controller VM IP Address Change on


page 28.

Preparing to Set IP Addresses


Before you configure the cluster, check that these requirements are met.

• Confirm that IPv6 link-local is enabled on the subnet.


IPv6 link-local is required only for discovery of nodes. It is not required after cluster creation except to
add nodes to an existing cluster.

• Confirm that the system you are using to configure the cluster meets the following requirements:



• IPv6 link-local enabled.
• Windows 7, Vista, or MacOS.
• (Windows only) Bonjour installed (included with iTunes or downloadable from http://support.apple.com/kb/DL999).

• (Hyper-V only) Confirm that the hosts have only one type of NIC (10 GbE or 1 GbE) connected during
cluster creation. If the nodes have multiple types of network interfaces connected, disconnect them until
after you join the hosts to the domain.

• Determine the IPv6 service name of any Controller VM in the cluster.


The service name depends on a unique identifier for the system.

Nutanix Serial Number
    IPv6 service names are uniquely generated at the factory and have the following form (note the
    final period): NTNX-block_serial_number-node_location-CVM.local.
    On the right side of the block toward the front is a label that has the block_serial_number (for
    example, 12AM3K520060). The node_location is A for one-node blocks, A-B for two-node blocks,
    and A-D for four-node blocks.
    If you do not have access to get the block serial number, see the Nutanix support knowledge base
    for alternative methods.

Dell Service Tag
    IPv6 service names are uniquely generated at the factory and have the following form (note the
    final period): NTNX-system_service_tag-node_location-CVM.local.
    On the front left side of the system is a slide-out label that contains the system_service_tag (for
    example, B57PW12). The node_location is A for one-node blocks.

Changing Controller VM IP Addresses


Before you begin: Check that the cluster is ready to be configured by following Preparing to Set IP
Addresses on page 22.

Warning: If you are reassigning a Controller VM IP address to another Controller VM, you must
perform this complete procedure twice: once to assign intermediate IP addresses and again to
assign the desired IP addresses.
For example, if Controller VM A has IP address 172.16.0.11 and Controller VM B has IP address
172.16.0.10 and you want to swap them, you would need to reconfigure them with different IP
addresses (such as 172.16.0.100 and 172.16.0.101) before changing them to the IP addresses in
use initially.
The cluster must be stopped and in reconfiguration mode before changing the Controller VM IP addresses.

1. Open a web browser.


Nutanix recommends using Internet Explorer 9 for Windows and Safari for Mac OS.

Note: Internet Explorer requires protected mode to be disabled. Go to Tools > Internet
Options > Security, clear the Enable Protected Mode check box, and restart the browser.

2. In the browser, go to http://[cvm_ipv6_addr]:2100/ip_reconfig.html.


Replace [cvm_ipv6_addr] with the IPv6 address of any Controller VM in the cluster.
You can obtain the IPv6 address of the Controller VM by using the ifconfig command.



Example
nutanix@cvm$ ifconfig
eth0 Link encap:Ethernet HWaddr 52:54:00:A8:8A:AE
inet addr:10.1.65.240 Bcast:10.1.67.255 Mask:255.255.252.0
inet6 addr: fe80::5054:ff:fea8:8aae/64 Scope:Link
...etc...

The value of the inet6 addr field up to the / character is the IPv6 address of the Controller VM.

3. Update one or more cells on the IP Reconfiguration page.


Ensure that all components satisfy the cluster subnet requirements. See Cluster IP Address
Configuration on page 20.

4. Click Reconfigure.

5. Wait until the Log Messages section of the page reports that the cluster has been successfully
reconfigured, as shown in the following example.
Configuring IP addresses on node S10264822116570/A...
Success!
Configuring IP addresses on node S10264822116570/C...
Success!
Configuring IP addresses on node S10264822116570/B...
Success!
Configuring IP addresses on node S10264822116570/D...
Success!
Configuring Zeus on node S10264822116570/A...
Configuring Zeus on node S10264822116570/C...
Configuring Zeus on node S10264822116570/B...
Configuring Zeus on node S10264822116570/D...
Reconfiguration successful!

The IP address reconfiguration disconnects any SSH sessions to cluster components. The cluster is
taken out of reconfiguration mode.

Changing a Controller VM IP Address (manual)


Before you begin:
• Ensure that the cluster NTP and DNS servers are reachable from the new Controller VM addresses.
If different NTP and DNS servers are to be used, remove the existing NTP and DNS servers from the
cluster configuration and add the new ones. If the new addresses are not known, remove the existing
NTP and DNS servers before cluster reconfiguration and add the new ones afterwards.

Web Console    > Name Servers
               > NTP Servers

nCLI           ncli> cluster remove-from-name-servers servers="name_servers"
               ncli> cluster add-to-name-servers servers="name_servers"
               ncli> cluster remove-from-ntp-servers servers="ntp_servers"
               ncli> cluster add-to-ntp-servers servers="ntp_servers"

• Log on to a Controller VM in the cluster and check that all hosts are part of the metadata store.
nutanix@cvm$ ncli host ls | grep "Metadata store status"

For every host in the cluster, Metadata store enabled on the node should be shown.



Warning: If Node marked to be removed from metadata store is displayed, do not proceed
with the IP address reconfiguration, and contact Nutanix support to resolve the issue.

1. Log on to any Controller VM in the cluster with SSH.


2. Stop the Nutanix cluster.
nutanix@cvm$ cluster stop

Wait to proceed until output similar to the following is displayed for every Controller VM in the cluster.

CVM: 172.16.8.191 Up, ZeusLeader


Zeus UP [3167, 3180, 3181, 3182, 3191, 3201]
Scavenger UP [3334, 3351, 3352, 3353]
ConnectionSplicer DOWN []
Hyperint DOWN []
Medusa DOWN []
DynamicRingChanger DOWN []
Pithos DOWN []
Stargate DOWN []
Cerebro DOWN []
Chronos DOWN []
Curator DOWN []
Prism DOWN []
AlertManager DOWN []
StatsAggregator DOWN []
SysStatCollector DOWN []

3. Put the cluster in reconfiguration mode.
nutanix@cvm$ cluster reconfig

Type y to confirm the reconfiguration.


Wait until the cluster successfully enters reconfiguration mode, as shown in the following example.

INFO cluster:185 Restarted Genesis on 172.16.8.189.


INFO cluster:185 Restarted Genesis on 172.16.8.188.
INFO cluster:185 Restarted Genesis on 172.16.8.191.
INFO cluster:185 Restarted Genesis on 172.16.8.190.
INFO cluster:864 Success!

To manually change the Controller VM IP address, do the following.

1. Log on to the hypervisor host with SSH or the IPMI remote console (vSphere or AHV) or remote
desktop connection (Hyper-V).

2. Log on to the Controller VM.


→ vSphere or AHV:
  root@host# ssh nutanix@192.168.5.254
→ Hyper-V:
  > ssh nutanix@192.168.5.254
Accept the host authenticity warning if prompted, and enter the Controller VM nutanix password.

3. Restart genesis.
nutanix@cvm$ genesis restart

If the restart is successful, output similar to the following is displayed:

Stopping Genesis pids [1933, 30217, 30218, 30219, 30241]


Genesis started on pids [30378, 30379, 30380, 30381, 30403]



4. Change the network interface configuration.

Caution: Do not place a backup copy of the ifcfg-ethX script files in the /etc/sysconfig/
network-scripts/ directory. Ensure that there are no backup files (.bkp or similar file extensions)
in this location.

a. Open the network interface configuration file.


nutanix@cvm$ sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0

Enter the nutanix password.

b. Press A to edit values in the file.

c. Update entries for netmask, gateway, and address.


The block should look like this:
ONBOOT="yes"
NM_CONTROLLED="no"
NETMASK="subnet_mask"
IPADDR="cvm_ip_addr"
DEVICE="eth0"
TYPE="ethernet"
GATEWAY="gateway_ip_addr"
BOOTPROTO="none"

• Replace cvm_ip_addr with the IP address for the Controller VM.


• Replace subnet_mask with the subnet mask for cvm_ip_addr.
• Replace gateway_ip_addr with the gateway address for cvm_ip_addr.
Warning: Carefully check the file to ensure there are no syntax errors, whitespace at the
end of lines, or blank lines in the file.

d. Press Esc.

e. Type :wq and press Enter to save your changes.
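
For example, a completed ifcfg-eth0 file might look like the following (the addresses are illustrative only; they loosely reuse the sample subnet from the ifconfig output shown earlier in this guide, and the gateway is hypothetical):
ONBOOT="yes"
NM_CONTROLLED="no"
NETMASK="255.255.252.0"
IPADDR="10.1.65.240"
DEVICE="eth0"
TYPE="ethernet"
GATEWAY="10.1.64.1"
BOOTPROTO="none"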

5. Update the IP addresses of the Zookeeper hosts.

Caution: Failure to update these entries according to the provided steps might result in
services failing to start or other network-related anomalies that might prevent cluster use.

a. Verify that the Zookeeper mappings can be updated.


nutanix@cvm$ edit-zkmapping check_can_modify_zk_map

• For each node in the cluster, a message containing the current Zookeeper mapping is displayed.
Found mapping {'10.4.100.70': 1, '10.4.100.243': 3, '10.4.100.251': 2}

The second numbers in the pairs are the Zookeeper IDs. Each cluster has either three or five
Zookeeper nodes.

• A message Safe to proceed with modification of Zookeeper mapping indicates that you can
change the Zookeeper IP addresses.

b. If the check passes, update the mapping.


nutanix@cvm$ edit-zkmapping modify_zk_server_config \
cvm_ip_addr1:1,cvm_ip_addr2:2,cvm_ip_addr3:3[,cvm_ip_addr4:4,cvm_ip_addr5:5]

Replace cvm_ip_addr1, cvm_ip_addr2, and cvm_ip_addr3 with the new IP addresses that you want
to assign to the Controller VMs.



Include cvm_ip_addr4 and cvm_ip_addr5 only if the check_can_modify_zk_map showed five
Zookeeper nodes.
Caution: While changing the IP addresses, do not change the mapping between CVMs
and Zookeeper nodes. For example, if the IP address for the CVM that is Zookeeper node 2
changes to 192.168.1.3, specify the mapping as 192.168.1.3:2, not :1 or :3.
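
For example, given the mapping shown in the earlier check output, new Controller VM addresses in a hypothetical 10.4.200.0/24 subnet would be assigned while preserving each node's Zookeeper ID:

nutanix@cvm$ edit-zkmapping modify_zk_server_config \
    10.4.200.70:1,10.4.200.251:2,10.4.200.243:3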

6. Restart the Controller VM.


nutanix@cvm$ sudo reboot

Enter the nutanix password if prompted.

Completing Controller VM IP Address Change

1. If you changed the IP addresses by modifying the Controller VM configuration files directly rather than
using the Nutanix utility, take the cluster out of reconfiguration mode.
Perform these steps for every Controller VM in the cluster.

a. Log on to the Controller VM with SSH.

b. Take the Controller VM out of reconfiguration mode.


nutanix@cvm$ rm ~/.node_reconfigure

c. Restart genesis.
nutanix@cvm$ genesis restart

If the restart is successful, output similar to the following is displayed:

Stopping Genesis pids [1933, 30217, 30218, 30219, 30241]


Genesis started on pids [30378, 30379, 30380, 30381, 30403]

2. Log on to any Controller VM in the cluster with SSH.

3. Start the Nutanix cluster.


nutanix@cvm$ cluster start

If the cluster starts properly, output similar to the following is displayed for each node in the cluster:

CVM: 10.1.64.60 Up
Zeus UP [3704, 3727, 3728, 3729, 3807, 3821]
Scavenger UP [4937, 4960, 4961, 4990]
SSLTerminator UP [5034, 5056, 5057, 5139]
Hyperint UP [5059, 5082, 5083, 5086, 5099, 5108]
Medusa UP [5534, 5559, 5560, 5563, 5752]
DynamicRingChanger UP [5852, 5874, 5875, 5954]
Pithos UP [5877, 5899, 5900, 5962]
Stargate UP [5902, 5927, 5928, 6103, 6108]
Cerebro UP [5930, 5952, 5953, 6106]
Chronos UP [5960, 6004, 6006, 6075]
Curator UP [5987, 6017, 6018, 6261]
Prism UP [6020, 6042, 6043, 6111, 6818]
CIM UP [6045, 6067, 6068, 6101]
AlertManager UP [6070, 6099, 6100, 6296]
Arithmos UP [6107, 6175, 6176, 6344]
SysStatCollector UP [6196, 6259, 6260, 6497]



Tunnel UP [6263, 6312, 6313]
ClusterHealth UP [6317, 6342, 6343, 6446, 6468, 6469, 6604, 6605,
6606, 6607]
Janus UP [6365, 6444, 6445, 6584]
NutanixGuestTools UP [6377, 6403, 6404]

What to do next: Run the following NCC checks to verify the health of the Zeus configuration:
• ncc health_checks system_checks zkalias_check_plugin

• ncc health_checks system_checks zkinfo_check_plugin



4
Creating a Windows Guest VM Failover Cluster
A guest VM failover cluster provides high availability for cluster-aware workloads within the VM and enables
an application to fail over seamlessly to another VM on the same host or on a different host. This feature is
supported on Hyper-V and ESXi hypervisor environments. Windows guest failover clustering is supported
for Windows Server 2008 and Windows Server 2012 R2.
Note: Windows Guest VM Failover Clustering is for guest VMs only.

1. From the Server Manager, add and enable the Multipath I/O feature in Tools > MPIO.

a. Add support for iSCSI devices by checking the box in the Discovered Multipaths tab.

b. Enable multipath for the targets by checking the box in the Microsoft iSCSI Initiator and selecting
the IP addresses for the Target Portal IP.
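
If you prefer PowerShell to the Server Manager UI, the feature installation and the iSCSI claim can typically be done as follows on Windows Server 2012 R2 (a hedged alternative, not the procedure above; run it from an elevated PowerShell session in the guest VM, and note that a restart may be required after the feature is installed):

> Install-WindowsFeature -Name Multipath-IO
> Enable-MSDSMAutomaticClaim -BusType iSCSI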

2. Set the default load balancing policy for all LUNS to Fail Over Only by running the following PowerShell
cmdlet on each Windows Server 2012 VM that is being used for Windows Failover Clustering:
> Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy FOO

3. Log on to any Controller VM in your cluster through an SSH session and access the Acropolis
command line.
nutanix@cvm$ acli
<acropolis>

4. Create a volume group, then add a disk to the newly created volume group. Verify that the new volume
group and disk were created successfully.

a. Create a volume group, where vg_name is the name of the volume group.
<acropolis> vg.create vg_name shared=true

b. Add a disk to the newly created volume group, where container_name is the name of the container
and disk_size is the disk size (for example, create_size=1000G creates a disk with a capacity of
1000 GiB). (Optional) Use index_number to specify the disk index; otherwise, the system assigns
index numbers automatically.

Note: When you specify the disk size, a lowercase g (as in size=20g) indicates gigabytes and an
uppercase G (as in size=1000G) indicates gibibytes.
<acropolis> vg.disk_create vg_name container=container_name \
create_size=disk_size index=index_number

Note: For best results, Nutanix recommends that you configure 1 vDisk per volume group.

c. Verify the volume group.

Note: If you have more than one disk inside the target, the vg.get command displays all
disks within the target.
<acropolis> vg.get vg_name
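
For example, the full sequence for creating and inspecting a shared volume group might look like the following (the volume group and container names are hypothetical):

<acropolis> vg.create wsfc-vg shared=true
<acropolis> vg.disk_create wsfc-vg container=default-container create_size=200G
<acropolis> vg.get wsfc-vg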

Next, log on to your Windows Server VM to perform the following steps.

5. From Windows Server, get the iSCSI initiator name. Then, from the Acropolis CLI, attach the external
initiator to the volume group and verify the connection.

a. From Server Manager on the Windows Server VM, click Tools > iSCSI Initiator, then click the
Configuration tab and copy the iSCSI initiator name from the text box.

b. From the Acropolis CLI, attach the external initiators, where initiator_name is the copied initiator
name.
<acropolis> vg.attach_external vg_name initiator_name

c. Repeat this step for any remaining external initiators. Verify that the external initiators are connected.
<acropolis> vg.get vg_name

Note: You can also create a volume group and enable multiple initiators to access it by using
the Prism web console. For more information, see the Creating a Volume Group section of the
Prism Web Console Guide.

6. Allow the Controller VM to be discoverable by the external initiators.

a. From the Windows Server Manager VM, in the iSCSI Initiator Properties window, click Discovery
and add the Controller VM IP address, then click OK.

b. Repeat this step for the remaining Controller VM IP addresses by doing the same to the next target.

c. To verify that the IP addresses are connected, go to the Targets tab and click Refresh.

d. Click OK to exit.

7. From the Server Manager, place the target disks online and create a New Simple Volume.

a. In the Disk Management window, right-click each disk and choose the Online option.
Repeat for any remaining disks.

b. Start the New Simple Volume Wizard and verify the information in the following windows until you
reach the Format Partition window.

c. Enter a name for the volume in Volume label and complete the remaining wizard steps.

Note: (Optional) If a formatting window appears, you can format the Simple Volume.

8. From the Server Manager, create a Windows Guest VM Failover Cluster and add the disks.

a. In the Server Manager, click Tools > Failover Cluster Manager and click Create Cluster.

b. Click Browse and in the Select Computer window enter the names of the VMs you want to add, then
click OK. Click Yes to validate configuration tests.

c. Verify the information, then enter a name and IP address for the Windows Failover Cluster. Click OK.

d. In the Failover Cluster Manager, click the volume group and click Storage > Disks. Choose Add
disk.

e. Select the disks you want to add to the cluster and click OK.

The new cluster and disks have been created and configured.

5
Virtual Machine Pinning
VM pinning enables the administrator to select the storage tier preference for a particular virtual disk (vDisk) of a virtual machine that runs latency-sensitive, mission-critical applications. For example, a mission-critical workload (such as a SQL database) with a large working set may be too large to fit into the SSD tier (hot tier) alongside other workloads and could be migrated to the HDD tier (cold tier). For extremely latency-sensitive workloads, this migration to the HDD tier can seriously degrade read and write performance.
By using VM pinning, you can pin a particular virtual disk to the SSD tier so that all data of that virtual disk resides only in the SSD tier and is never down-migrated to the HDD tier. This ensures that critical VMs get consistent performance and higher throughput after their virtual disks are pinned to the SSD tier. A maximum of 25 percent of the overall SSD capacity across the cluster is available for virtual disk pinning.
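As a rough sizing illustration (the node and SSD sizes here are hypothetical, not taken from this guide), a four-node cluster with two 800 GB SSDs per node has 6,400 GB of raw SSD capacity, so about 1,600 GB is available for pinning. You can confirm the arithmetic from any shell:
nutanix@cvm$ echo "Raw SSD: $((4 * 2 * 800)) GB; pinning budget (25%): $((4 * 2 * 800 / 4)) GB"
Raw SSD: 6400 GB; pinning budget (25%): 1600 GB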
You can configure VM pinning in the following two ways.
• Full pinning
• Partial pinning

Full Pinning

If you configure pinning for a virtual disk equal to the total size of the virtual disk, the virtual disk is considered to be fully pinned to the SSD tier. This policy applies only to new data that is written to or read from the storage tier. When data is written to or read from the storage system, the data is first placed on the SSD tier (subject to SSD tier space availability). For a fully pinned virtual disk, once data is placed in the SSD tier, it is never down-migrated to the HDD tier as long as the pinning configuration is preserved. All sequential and random read and write requests are served from the SSD tier.

Caution: If the SSD tier is 100 percent full and you have fully pinned a virtual disk to the SSD tier, subsequent writes to that virtual disk may fail.

Partial Pinning

If you configure pinning for a virtual disk that is less than the total size of the virtual disk, the virtual disk is considered to be partially pinned to the SSD tier. If you do not want to place the complete virtual disk in the SSD tier, partial pinning gives you the flexibility to select how much of the virtual disk is pinned to the SSD tier. This ensures that only the working set of the virtual disk resides on the SSD tier at all times. If the virtual disk uses more SSD space than configured, the excess data may be down-migrated to the HDD tier. Partial pinning keeps only the hot data of a particular virtual disk in the SSD tier while still providing higher IOPS and throughput.

VM Pinning Rules and Guidelines

• If the virtual disk is partially pinned to the SSD tier, data usage of the virtual disk above the pinned space on the SSD tier may be down-migrated to the HDD tier.
• If the virtual disk is partially pinned to the SSD tier, preference is given to the hot working set of the virtual disk when data is down-migrated to the HDD tier.
• If the virtual disk is fully pinned to the SSD tier and the SSD tier is at 100 percent capacity, further writes to the virtual disk fail because no space is available on the SSD tier for new writes.
• If a pinned virtual disk is cloned, the pinning policies are not automatically applied to the cloned virtual disks.
• Pinning policies are not automatically applied if a VM is restored from a DR snapshot on the local or the remote site.
• Pinning policies control down-migration from the hot tier to the cold tier. The policies do not restrict the usage of the virtual disk on the hot tier; if space is available on the hot tier, more data from the virtual disk can reside on the hot tier.
• If a virtual machine is protected by Nutanix DR and one or more virtual disks are pinned to the SSD tier, the pinning policies for the virtual disks are not stored as part of the DR snapshots.
For more information about configuring VM pinning, see the Acropolis Command Reference.



6: Self-Service Restore
The self-service restore (also known as file-level restore) feature allows virtual machine administrators to perform self-service recovery from Nutanix data protection snapshots with minimal administrator intervention.
The Nutanix administrator must deploy NGT on the VM and then enable this feature. For more information on enabling and mounting NGT, see the Enabling and Mounting Nutanix Guest Tools section of the Prism Web Console Guide. After the feature is enabled and a disk is attached, the guest VM administrator can recover files within the guest operating system. If the guest VM administrator does not detach the disk, it is automatically detached from the VM after 24 hours.
Note:
• The Nutanix administrator can enable this feature for a VM only through nCLI, and in-guest actions can be performed only by using NGT.
• Only the Async DR workflow is supported for the self-service restore feature.

Self-Service Enabled Disks Impact on Disaster Recovery

Disks with this feature enabled that are running on ESXi are not backed up by the disaster recovery workflows; however, the original disks of the VMs are backed up. If you replicate VMs with self-service enabled disks, the following scenarios occur.
• If you attempt to attach a self-service disk to a VM that is part of a protection domain and is being replicated to a remote site running a release earlier than AOS 4.5, an error message is displayed during the disk attach operation.
• If a snapshot with an attached self-service disk is replicated to a remote site running a release earlier than AOS 4.5, an alert message is displayed during the replication process.

Requirements and Limitations of Self-Service Restore


See the Nutanix Guest Tools Requirements and Limitations section of the Prism Web Console Guide for more information on general NGT requirements (Windows operating system requirements, port requirements, hypervisor requirements, and so on). Requirements specific to this feature are as follows.

Requirements

Privilege: Guest VM Administrator
Requirements and limitations:
• JRE 1.8 or later releases.
• Linux VMs are not supported.

Privilege: Nutanix Administrator
Requirements and limitations:
• The disk.enableUUID parameter must be present in the .vmx file for VMs on ESXi.
• The guest VM must have Nutanix snapshots configured by adding the VM to a protection domain. Only snapshots that are created in AOS 4.5 or later releases are supported.
• Ensure that sufficient logical drive letters are available to bring the disk online.
• Volume groups are not supported.
• File systems: only NTFS on simple volumes is supported. Spanned volumes, striped volumes, mirrored volumes, and RAID-5 volumes are not supported.
• Only IDE and SCSI disks are supported. SATA, PCI, and delta disks are not supported.

Enabling Self-Service Restore


After enabling NGT for a VM, the Nutanix administrator can enable self-service restore for that VM.

1. Log in to the Controller VM.

2. Verify whether NGT is enabled.


ncli> ngt get vm-id=virtual_machine_id

Replace virtual_machine_id with the ID of the VM. You can retrieve the ID of a VM by using the ncli> vm ls
command. If NGT is not enabled, output similar to the following is displayed.

VM Id: 00051a34-066f-72ed-0000-000000005400::38dc7bf2-a345-4e52-9af6-c1601e759987
Nutanix Guest Tools Enabled: false
File Level Restore: false
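For example, you can list the VMs from the Controller VM shell and note the Id field of the VM you want to enable; the exact output fields can vary between AOS releases:
nutanix@cvm$ ncli vm ls | egrep "Id|Name"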

3. Enable the self-service restore feature.


ncli> ngt enable application-names=application_names vm-id=virtual_machine_id

Replace virtual_machine_id with the ID of the VM. Replace application_names with file-
level-restore. For example, to enable NGT and self-service restore for the VM with ID
00051a34-066f-72ed-0000-000000005400::38dc7bf2-a345-4e52-9af6-c1601e759987, use the following
command.
ncli> ngt enable application-names=file-level-restore \
vm-id=00051a34-066f-72ed-0000-000000005400::38dc7bf2-a345-4e52-9af6-c1601e759987

4. (Optional) To disable self-service restore for a VM, use the following command.
ncli> ngt disable-applications application-names=file-level-restore vm-id=virtual_machine_id



Restoring a File as a Guest VM Administrator
After the administrator installs the NGT software inside the guest VM, the guest VM administrator can
restore the desired file or files from the VM.
Before you begin:
• Mount NGT for the VM. For more information about enabling NGT, see the Enabling and Mounting Nutanix Guest Tools topic of the Prism Web Console Guide.
• Verify that you have configured your Windows VM to use NGT. For more information, see the Installing NGT on Windows Machines topic of the Prism Web Console Guide.

1. Log in to the guest VM.

2. Open the command prompt as an administrator.

3. Go to the ngtcli directory in Program Files > Nutanix.


cd c:\Program Files\Nutanix\ngtcli

4. Run the ngtcli.cmd command.


If JRE is not installed on the VM, an error message is displayed. Otherwise, a command-line interface for running ngtcli is displayed.

5. List the snapshots and virtual disks that are present for the guest VM by using the following command.
ngtcli> flr ls-snaps

The snapshot ID, disk labels, logical drives, and creation time of each snapshot are displayed. The guest VM administrator can use this information to decide which snapshot to restore files from. For example, if the files are present in logical drive C: for snapshot 41 and disk label scsi0:0 (see the figure below), the guest VM administrator can use this snapshot ID and disk label to attach the disk.

Figure: List Snapshots

6. Attach the disk from the snapshots.


ngtcli> flr attach-disk disk-label=disk_label snapshot-id=snap_id

For example, to attach a disk with snapshot ID 16353 and disk label scsi0:1, type the following command.
ngtcli> flr attach-disk snapshot-id=16353 disk-label=scsi0:1

Output similar to the following is displayed.



Figure: Attached Disks

After the command runs successfully, a new disk with label "G" is attached to the guest VM.
If sufficient logical drive letters are not available, the disk cannot be brought online. In this case, detach the current disk, free up drive letters by detaching other self-service disks, and then attach the disk again.

7. Go to the attached disk label drive and restore the desired files.

8. To detach a disk, use the following command.


ngtcli> flr detach-disk attached-disk-label=attached_disk_label

For example, to remove the disk with disk label scsi0:3, type the command.
ngtcli> flr detach-disk attached-disk-label=scsi0:3

If the disk is not removed by the guest VM administrator, the disk is automatically removed after 24
hours.

9. To view all the attached disks to the VM, use the following command.
ngtcli> flr list-attached-disks

Restoring a File as a Nutanix Administrator


The Nutanix administrator can also attach disks to or detach disks from the VM. However, the attached disk does not come online automatically in the VM. The administrator must use the disk management utility to bring the disk online.

1. Log in to the Controller VM.

2. Retrieve the self-service restore capable snapshots of a VM.


ncli> vm list-flr-snapshots vm-id=virtual_machine_id

Replace virtual_machine_id with the ID of the VM.

3. Attach a disk from a self-service restore capable snapshot.


ncli> vm attach-flr-disk vm-id=virtual_machine_id snap-id=snapshot_id \
disk-label=disk_label

For example, to attach a disk with VM ID 00051a34-066f-72ed-0000-000000005400::5030468c-32db-c0cc-3e36-515502787dec, snapshot ID 4455 and disk label scsi0:0, type the command.
ncli> vm attach-flr-disk vm-id=00051a34-066f-72ed-0000-000000005400::5030468c-32db-c0cc-3e36-515502787dec snap-id=4455 disk-label=scsi0:0

The attached disk does not come online automatically. Administrators must bring the disk online by using the Windows disk management utility.
Once the disk is attached, the guest VM administrator can restore files from the attached disk.



4. To remove a self-service restore disk from a VM, use the following command.
ncli> vm detach-flr-disk vm-id=virtual_machine_id attached-disk-label=attached_disk_label

For example, to remove a disk with VM ID 00051a34-066f-72ed-0000-000000005400::5030468c-32db-c0cc-3e36-515502787dec and disk label scsi0:0, type the following command.
ncli> vm detach-flr-disk vm-id=00051a34-066f-72ed-0000-000000005400::5030468c-32db-c0cc-3e36-515502787dec attached-disk-label=scsi0:0

5. View all the self-service restore capable snapshots attached to a VM.


ncli> vm list-attached-flr-snapshots vm-id=virtual_machine_id

Replace virtual_machine_id with the ID of the VM.



7: Logs

Sending Logs to a Remote Syslog Server


The Nutanix command-line interface (nCLI) command rsyslog-config enables you to send logs from your
Nutanix cluster to a remote syslog server.
• The Command Reference contains more information about rsyslog-config command syntax.
• The Acropolis Advanced Administration Guide troubleshooting topics have more detailed information
about common and AOS logs (such as Stargate and Cassandra logs).

Recommendations and considerations

• As the logs are forwarded from a Controller VM, the logs display the IP address of the Controller VM.
• You can only configure one rsyslog server; you cannot specify multiple servers.
• The remote syslog server is enabled by default.
• Supported transport protocols are TCP and UDP.
• rsyslog-config supports and can report messages from the following Nutanix modules:

AOS Module Names for rsyslog-config

Logs are located in /home/nutanix/data/logs.

Module name | Logs forwarded with monitor logs disabled | Logs also forwarded with monitor logs enabled
cassandra | cassandra/system.log, dynamic_ring_changer.out | cassandra_monitor.loglevel, dynamic_ring_changer.loglevel
cerebro | cerebro.loglevel | cerebro.out
curator | curator.loglevel | curator.out
genesis | genesis.out | genesis.out
prism | prism_gateway.log | prism_monitor.loglevel, prism.out
stargate | stargate.loglevel | stargate.out
zookeeper | zookeeper.out | zookeeper_monitor.loglevel



AOS Log Level Mapping to syslog Log Levels

AOS log levels Contain information from these syslog log levels

INFO DEBUG, INFO

WARNING NOTICE, WARNING

ERROR ERROR

FATAL CRITICAL, ALERT, EMERGENCY


• rsyslog-config also supports the system module SYSLOG_MODULE, which logs operating system
messages in /var/log/messages. As of AOS 4.7, adding the SYSLOG_MODULE module to the rsyslog
configuration will also configure rsyslog on compatible AHV hosts. A compatible host must be running
an AHV release later than AHV-20160217.2.
• Enable module logs at the ERROR level, unless you require more information. If you enable more
levels, the rsyslogd daemon sends more messages. For example, if you set the SYSLOG_MODULE
level to INFO, your remote syslog server might receive a large quantity of operating system messages.
• CPU usage might reach 10 percent when the rsyslogd daemon is initially enabled and starts
processing existing logs. This is an expected condition on first use of an rsyslog implementation.

Configuring the Remote Syslog Server Settings


Before you begin: Install the Nutanix command-line interface (nCLI) and connect to a Controller VM in
your cluster. See the Command Reference for details.
Note: As the logs are forwarded from a Controller VM, the logs display the IP address of the
Controller VM.

1. As the remote syslog server is enabled by default, disable it while you configure settings.
ncli> rsyslog-config set-status enable=false

2. Create a syslog server (which adds it to the cluster) and confirm it has been created.
ncli> rsyslog-config add-server name=remote_server_name ip-address=remote_ip_address
port=port_num network-protocol={tcp | udp}
ncli> rsyslog-config ls-servers

Name : remote_server_name
IP Address : remote_ip_address
Port : port_num
Protocol : TCP or UDP

remote_server_name A descriptive name for the remote server receiving the specified
messages
remote_ip_address The remote server's IP address
port_num Destination port number on the remote server.
tcp | udp Choose tcp or udp as the transport protocol

3. Choose a module to forward log information from and specify the level of information to collect.
ncli> rsyslog-config add-module server-name=remote_server_name module-name=module
level=loglevel include-monitor-logs={ false | true }

• Replace module with one of the following:



• cassandra
• cerebro
• curator
• genesis
• prism
• stargate
• zookeeper
• Replace loglevel with one of the following:
• INFO
• WARNING
• ERROR
• FATAL
Enable module logs at the ERROR level unless you require more information.

• (Optional) Set include-monitor-logs to specify whether the monitor logs are sent. It is enabled (true) by default. If you disable it (false), only the logs listed for the module in the AOS Module Names for rsyslog-config table are sent.

Note: If enabled, the include-monitor-logs option sends all monitor logs, regardless of the
level set by the level= parameter.

4. Configure additional modules if desired with rsyslog-config add-module.

5. Enable the server.


ncli> rsyslog-config set-status enable=true

Logs are now forwarded to the remote syslog server.
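For example, the following sequence forwards Stargate logs at the ERROR level, without monitor logs, to a server named central-syslog at 10.1.1.100 over TCP port 514 (the server name, IP address, and port are hypothetical values):
ncli> rsyslog-config set-status enable=false
ncli> rsyslog-config add-server name=central-syslog ip-address=10.1.1.100 port=514 network-protocol=tcp
ncli> rsyslog-config add-module server-name=central-syslog module-name=stargate level=ERROR include-monitor-logs=false
ncli> rsyslog-config set-status enable=true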

Common Log Files


Nutanix nodes store log files in different directories, depending on the type of information they contain.

Nutanix Logs Root


The location for Nutanix logs is /home/nutanix/data/logs.
This directory contains all the Nutanix process logs at the INFO, WARNING, ERROR, and FATAL levels. It also contains the directories for the system stats (sysstats) and the Cassandra system logs (cassandra).
The most recent FATAL log contains only the reason for the process failure. More information can be found in the other log types by analyzing the entries leading up to the failure.

Note: The symbolic link component_name.[INFO|WARNING|ERROR|FATAL] points to the most recent component log. For example:
stargate.FATAL -> stargate.NTNX-12AM3K490006-2-CVM.nutanix.log.FATAL.20130712-141913.30286
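Because the symbolic link always points to the current log file, one convenient way to follow the latest entries for a component (a general illustration, not a required procedure) is to tail the link itself:
nutanix@cvm$ tail -F /home/nutanix/data/logs/stargate.INFO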

.FATAL Logs

If a component fails, it creates a log file named according to the following convention:
component-name.cvm-name.log.FATAL.date-timestamp



• component-name identifies the component that created the file, such as Curator or Stargate.
• cvm-name identifies the Controller VM that created the file.
• date-timestamp identifies the date and time when the first failure within that file occurred.
Each failure creates a new .FATAL log file.
Log entries use the following format:
[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg

The first character indicates whether the log entry is an Info, Warning, Error, or Fatal. The next four
characters indicate the day on which the entry was made. For example, if an entry starts with F0820, it
means that at some time on August 20th, the component had a failure.

Tip: The cluster also creates .INFO and .WARNING log files for each component. Sometimes, the
information you need is stored in one of these files.

Self-Monitoring (sysstats) Logs


Self-monitoring logs are in /home/nutanix/data/logs/sysstats.
The node monitors itself by running several Linux tools every few minutes, including ping, iostat, sar, and df.
This directory contains the output for each of these commands, along with the corresponding timestamp.
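For example, to spot-check recent network connectivity between Controller VMs from these logs (one illustrative use of the sysstats output, using the file names listed later in this chapter):
nutanix@cvm$ tail -n 20 /home/nutanix/data/logs/sysstats/ping_hosts.INFO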

/home/nutanix/data/logs/cassandra

This is the directory where the Cassandra metadata database stores its logs. The Nutanix process that starts the Cassandra database (cassandra_monitor) logs to the /home/nutanix/data/logs directory. However, the most useful information relating to Cassandra is found in the system.log* files located in the /home/nutanix/data/logs/cassandra directory.

Controller VM Log Files


These log files are present on Controller VMs.

Location: /home/nutanix/data/logs

Log | Contents | Frequency

alert_manager.[out, ERROR, FATAL, INFO, WARNING] | Alert manager process output
cassandra_monitor.[out, ERROR, FATAL, INFO] | Cassandra database monitor process output
catalina.out | Catalina/Tomcat for Prism process output
cerebro.[out, ERROR, FATAL] | DR and replication activity
check-cores.log | Core file processing | every 1 min
check-fio | fio-status output | every 1 hour
check-hardware.log | Power supply, fan speed, and DIMM temperature status | every 1 min
check_intel.log | Intel PCIe-SSD status | every 1 min
check-ip-connectivity.log | Network connectivity status to IPMI, hypervisor, and Controller VM of all nodes in the cluster | every 1 min
chronos_node_main.[INFO, ERROR, FATAL, WARNING] | Write-ahead log (WAL) status
connection_splicer.[out, ERROR, FATAL, INFO, WARNING] | Internal process connection status
cron_avahi_monitor.log | Avahi process status
cron_time_check.log | Check time difference across Controller VMs | every 1 min
curator.[out, ERROR, FATAL, INFO, WARNING] | Metadata health and ILM activity
disk_usage.log | Disk and inode usage of all partitions on the Controller VM | every 1 min
dynamic_ring_changer.[out, ERROR, FATAL] | Metadata migration across nodes activity
genesis.out | Nutanix software start process output
hyperint_monitor.[out, ERROR, FATAL, INFO, WARNING] | Hypervisor integration activity
pithos.[out, ERROR, FATAL, INFO, WARNING] | vDisk configuration activity
prism_gateway.[out, ERROR, FATAL, INFO] | Prism leader activity
prism_monitor.[out, ERROR, FATAL, INFO] | Prism (Web console, nCLI, REST API) monitor process output
scavenger.out | Log and core file clean-up status
send-email.log | E-mail alerts sent from the Controller VM | every 1 min
snmp_manager.out | SNMP service logs
ssh_tunnel.log | Connect status to nsc.nutanix.com for the remote support tunnel
stargate.[out, ERROR, FATAL, INFO, WARNING] | NFS interface activity
stats_aggregator.[out, ERROR, FATAL, INFO] | Statistics aggregator process output
support-info.log | Daily automated support (ASUP) alerts
using-gflags.log | gflags status
zeus_config_printer.INFO | Contents of cluster configuration database
zookeeper_monitor.[out, ERROR, INFO] | Cluster configuration and cluster state activity

Location: /home/nutanix/data/logs/cassandra

Log Contents

system.log Cassandra system activity

Location: /home/nutanix/data/logs/sysstats

Log | Contents | Frequency | Command

df.info | Mounted filesystems | every 5 sec | df -h
disk_usage.INFO | Disk usage across disks | every 1 hour | du
interrupts.INFO | CPU interrupts | every 5 sec
iostat.INFO | I/O activity for each physical disk | every 5 sec | sudo iostat
iotop.INFO | Current I/O in realtime | every 5 sec | sudo iotop
lsof.INFO | List of open files and processes that open them | every 1 min | sudo lsof
meminfo.INFO | Memory usage | every 5 sec | cat /proc/meminfo
metadata_disk_usage.INFO | Disk usage for metadata drives | every 5 sec
mpstat.INFO | CPU activities per CPU | every 5 sec | mpstat
ntpq.INFO | NTP information | every 1 min | ntpq -pn
ping_gateway.INFO | Pings to the default gateway | every 5 sec | ping
ping_hosts.INFO | Pings to all other Controller VMs | every 1 min | ping
sar.INFO | Network bandwidth | every 5 sec | sar -n DEV, -n EDEV
top.INFO | Real-time CPU and memory activity | every 5 sec | top

Location: /home/nutanix/data/serviceability/alerts

Log Contents
num.processed Alerts that have been processed



Location: /var/log

Log Contents

dmesg OS start messages

kernel OS kernel messages

messages OS messages after starting

Correlating the FATAL log to the INFO file

When a process fails, the reason for the failure is recorded in the corresponding FATAL log. There are two
ways to correlate this log with the INFO file to get more information:

1. Search for the timestamp of the FATAL event in the corresponding INFO files.

a. Determine the timestamp of the FATAL event.

b. Search for the timestamp in the corresponding INFO files.

c. Open the INFO file with vi and go to the bottom of the file (Shift+G).

d. Analyze the log entries immediately before the FATAL event, especially any errors or warnings.

In the following example, the latest stargate.FATAL determines the exact timestamp:
nutanix@cvm$ cat stargate.FATAL

Log file created at: 2013/09/07 01:22:23


Running on machine: NTNX-12AM3K490006-2-CVM
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F0907 01:22:23.124495 10559 zeus.cc:1779] Timed out waiting for Zookeeper session
establishment

In the above example, the timestamp is F0907 01:22:23, or September 7 at 1:22:23 AM.
Next, grep for this timestamp in the stargate*INFO* files:
nutanix@cvm$ grep "^F0907 01:22:23" stargate*INFO* | cut -f1 -d:
stargate.NTNX-12AM3K490006-2-CVM.nutanix.log.INFO.20130904-220129.7363

This tells us that the relevant file to look at is stargate.NTNX-12AM3K490006-2-CVM.nutanix.log.INFO.20130904-220129.7363.

2. If a process is repeatedly failing, it might be faster to do a long listing of the INFO files and select
the one immediately preceding the current one. The current one would be the one referenced by the
symbolic link.
For example, in the output below, the last failure would be recorded in the file
stargate.NTNX-12AM3K490006-2-CVM.nutanix.log.INFO.20130904-220129.7363.
nutanix@cvm$ ls -ltr stargate*INFO*

-rw-------. 1 nutanix nutanix 104857622 Sep 3 11:22 stargate.NTNX-12AM3K490006-2-CVM.nutanix.log.INFO.20130902-004519.7363
-rw-------. 1 nutanix nutanix 104857624 Sep 4 22:01 stargate.NTNX-12AM3K490006-2-CVM.nutanix.log.INFO.20130903-112250.7363
-rw-------. 1 nutanix nutanix 56791366 Sep 5 15:12 stargate.NTNX-12AM3K490006-2-CVM.nutanix.log.INFO.20130904-220129.7363
lrwxrwxrwx. 1 nutanix nutanix 71 Sep 7 01:22 stargate.INFO -> stargate.NTNX-12AM3K490006-2-CVM.nutanix.log.INFO.20130907-012223.11357
-rw-------. 1 nutanix nutanix 68761 Sep 7 01:33 stargate.NTNX-12AM3K490006-2-CVM.nutanix.log.INFO.20130907-012223.11357

Tip: You can use the procedure above for the other types of files as well (WARNING and
ERROR) in order to narrow the window of information. The INFO file provides all messages,
WARNING provides only warning, error, and fatal-level messages, ERROR provides only error
and fatal-level messages, and so on.

Stargate Logs

This section discusses common entries found in Stargate logs and what they mean.
The Stargate logs are located at /home/nutanix/data/logs/stargate.[INFO|WARNING|ERROR|FATAL].

Log Entry: Watch dog fired

F1001 16:20:49.306397 6630 stargate.cc:507] Watch dog fired

This message is generic and can happen for a variety of reasons. While Stargate is initializing, a watch dog
process monitors it to ensure a successful startup process. If it has trouble connecting to other components
(such as Zeus or Pithos) the watch dog process stops Stargate.
If Stargate is running, this indicates that the alarm handler thread is stuck for longer than 30 seconds. The
stoppage could be due to a variety of reasons, such as problems connecting to Zeus or accessing the
Cassandra database.
To analyze why the watch dog fired, first locate the relevant INFO file, and review the entries leading up to
the failure.

Log Entry: HTTP request timed out

E0820 09:14:05.998002 15406 rpc_client.cc:559] Http request timed out

This message indicates that Stargate is unable to communicate with Medusa. This may be due to a
network issue.
Analyze the ping logs and the Cassandra logs.

Log Entry: CAS failure seen while updating metadata for egroup egroupid or Backend returns error
'CAS Error' for extent group id: egroupid

W1001 16:22:34.496806 6938 vdisk_micro_egroup_fixer_op.cc:352] CAS failure seen while updating metadata for egroup 1917333

This is a benign message and usually does not indicate a problem. This warning message means that
another Cassandra node has already updated the database for the same key.

Log Entry: Fail-fast after detecting hung stargate ops: Operation with id <opid> hung for 60secs

F0712 14:19:13.088392 30295 stargate.cc:912] Fail-fast after detecting hung stargate ops:
Operation with id 3859757 hung for 60secs



This message indicates that Stargate restarted because an I/O operation took more than 60 seconds to
complete.
To analyze why the I/O operation took more than 60 seconds, locate the relevant INFO file and review the
entries leading up to the failure.

Log Entry: Timed out waiting for Zookeeper session establishment

F0907 01:22:23.124495 10559 zeus.cc:1779] Timed out waiting for Zookeeper session
establishment

This message indicates that Stargate was unable to connect to Zookeeper.


Review the sysstats/ping_hosts.INFO log to determine if there were any network issues around that time.
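One way to do this (a general illustration, not a prescribed step) is to open the ping log on the affected Controller VM and search for the timestamp of the failure:
nutanix@cvm$ less /home/nutanix/data/logs/sysstats/ping_hosts.INFO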

Log Entry: Too many attempts trying to access Medusa

F0601 10:14:47.101438 2888 medusa_write_op.cc:85] Check failed: num_retries_ < 5 (5 vs. 5) Too many attempts trying to access Medusa

This message indicates that Stargate had 5 failed attempts to connect to Medusa/Cassandra.
Review the Cassandra log (cassandra/system.log) to see why Cassandra was unavailable.

Log Entry:multiget_slice() failed with error: error_code while reading n rows from
cassandra_keyspace

E1002 18:51:43.223825 24634 basic_medusa_op.cc:1461] multiget_slice() failed with error: 4 while reading 1 rows from 'medusa_nfsmap'. Retrying...

This message indicates that Stargate cannot connect to Medusa/Cassandra.


Review the Cassandra log (cassandra/system.log) to see why Cassandra was unavailable.

Log Entry: Forwarding of request to NFS master ip:2009 failed with error kTimeout.

W1002 18:50:59.248074 26086 base_op.cc:752] Forwarding of request to NFS master 172.17.141.32:2009 failed with error kTimeout

This message indicates that Stargate cannot connect to the NFS master on the node specified.
Review the Stargate logs on the node specified in the error.

Cassandra Logs
After analyzing Stargate logs, if you suspect an issue with Cassandra/Medusa, analyze the Cassandra
logs. This topic discusses common entries found in system.log and what they mean.
The Cassandra logs are located at /home/nutanix/data/logs/cassandra. The most recent file is named
system.log. When the file reaches a certain size, it rolls over to a sequentially numbered file (for example, system.log.1, system.log.2, and so on).
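For example, to see which system.log files exist and which is the most recent (an illustrative check, not a required step):
nutanix@cvm$ ls -ltr /home/nutanix/data/logs/cassandra/system.log*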



Log Entry: batch_mutate 0 writes succeeded and 1 column writes failed for
keyspace:medusa_extentgroupidmap

INFO [RequestResponseStage:3] 2013-09-10 11:51:15,780 CassandraServer.java (line 1290) batch_mutate 0 writes succeeded and 1 column writes failed for keyspace:medusa_extentgroupidmap cf:extentgroupidmap row:lr280000:1917645 Failure Details: Failure reason:AcceptSucceededForAReplicaReturnedValue : 1

This is a common log entry and can be ignored. It is equivalent to the CAS errors in the stargate.ERROR
log. It simply means that another Cassandra node updated the keyspace first.

Log Entry: InetAddress /x.x.x.x is now dead.

INFO [ScheduledTasks:1] 2013-06-01 10:14:29,767 Gossiper.java (line 258) InetAddress /x.x.x.x is now dead.

This message indicates that the node could not communicate with the Cassandra instance at the specified
IP address.
Either the Cassandra process is down (or failing) on that node or there are network connectivity issues.
Check the node for connectivity issues and Cassandra process restarts.

Log Entry: Caught Timeout exception while waiting for paxos read response from leader: x.x.x.x

ERROR [EXPIRING-MAP-TIMER-1] 2013-08-08 07:33:25,407 PaxosReadDoneHandler.java (line 64) Caught Timeout exception while waiting for paxos read reponse from leader: 172.16.73.85. Request Id: 116. Proto Rpc Id : 2119656292896210944. Row no:1. Request start time: Thu Aug 08 07:33:18 PDT 2013. Message sent to leader at: Thu Aug 08 07:33:18 PDT 2013 # commands:1 requestsSent: 1

This message indicates that the node encountered a timeout while waiting for the Paxos leader.
Either the Cassandra process is down (or failing) on that node or there are network connectivity issues.
Check the node for connectivity issues or for the Cassandra process restarts.

Prism Gateway Log

This section discusses common entries found in prism_gateway.log and what they mean. This log is
located on the Prism leader. The Prism leader is the node which is running the web server for the Nutanix
UI. This is the log to analyze if there are problems with the UI such as long loading times.
The Prism log is located at /home/nutanix/data/logs/prism_gateway.log on the Prism leader.
To identify the Prism leader, you can run cluster status | egrep "CVM|Prism" and determine which node
has the most processes. In the output below, 10.3.176.242 is the Prism leader.
nutanix@cvm$ cluster status | egrep "CVM|Prism"

2013-09-10 16:06:42 INFO cluster:946 Executing action status on CVMs 10.3.176.240,10.3.176.241,10.3.176.242
2013-09-10 16:06:45 INFO cluster:987 Success!
CVM: 10.3.176.240 Up
Prism UP[32655, 32682, 32683, 32687]
CVM: 10.3.176.241 Up
Prism UP[11371, 25913, 25925, 25926]
CVM: 10.3.176.242 Up, ZeusLeader
Prism UP[4291, 4303, 4304, 19468, 20072, 20074, 20075, 20078, 20113]

Log Entry: Error sending request: java.net.NoRouteToHostException: Cannot assign requested address

The stats_aggregator component periodically issues an RPC request for all Nutanix vdisks in the cluster. It
is possible that all the ephemeral ports are exhausted.

The ss -s command shows you the number of open ports.


nutanix@cvm$ ss -s
Total: 277 (kernel 360)
TCP: 218 (estab 89, closed 82, orphaned 0, synrecv 0, timewait 78/0), ports 207

Transport Total IP IPv6
* 360 - -
RAW 1 1 0
UDP 23 13 10
TCP 136 84 52
INET 160 98 62
FRAG 0 0 0

If there are issues with connecting to the Nutanix UI, escalate the case and provide the
output of the ss -s command as well as the contents of prism_gateway.log.

Zookeeper Logs

The Zookeeper logs are located at /home/nutanix/data/logs/zookeeper.out.


This log contains the status of the Zookeeper service. More often than not, there is no need to look at this
log. However, if one of the other logs specifies that it is unable to contact Zookeeper and it is affecting
cluster operations, you may want to look at this log to find the error Zookeeper is reporting.

Genesis Logs

When checking the status of the cluster services, if any of the services are down, or the Controller VM
is reporting Down with no process listing, review the log at /home/nutanix/data/logs/genesis.out to
determine why the service did not start, or why Genesis is not properly running.
Check the contents of genesis.out if a Controller VM reports multiple services as DOWN, or if the entire
Controller VM status is DOWN.
Like other component logs, genesis.out is a symbolic link to the latest genesis.out instance and has the
format genesis.out.date-timestamp.
An example of healthy output:
nutanix@cvm$ tail -F genesis.out

2013-09-09 12:30:06 INFO node_manager.py:910 Starting 12th service:

2013-09-09 12:30:06 INFO stats_aggregator_service.py:142 Initialized StatsAggregator
2013-09-09 12:30:06 INFO service_utils.py:136 Starting stats_aggregator with rlimits {5:
268435456}
2013-09-09 12:30:06 INFO service_utils.py:153 Starting stats_aggregator with cmd /home/
nutanix/bin/stats_aggregator_monitor --enable_self_monitoring=true |& /home/nutanix/cluster/
bin/logpipe -o /home/nutanix/data/logs/stats_aggregator.out
2013-09-09 12:30:06 INFO node_manager.py:910 Starting 13th service:
2013-09-09 12:30:06 INFO service_utils.py:136 Starting sys_stat_collector with rlimits {9:
1572864000,5: 134217728}
2013-09-09 12:30:06 INFO service_utils.py:153 Starting sys_stat_collector with cmd /
home/nutanix/bin/sys_stats_collector.py --out_dir=/home/nutanix/data/logs/sysstats --
max_log_size_MB=100 --data_collect_period_secs=5 --enable_self_monitoring --logtostderr |& /
home/nutanix/cluster/bin/logpipe -o /home/nutanix/data/logs/sys_stat_collector.out
2013-09-09 12:30:09 INFO avahi.py:165 Unpublishing service Nutanix Controller
QTF3ME521900350
2013-09-09 12:30:09 INFO avahi.py:141 Publishing service Nutanix Controller QTF3ME521900350
of type _nutanix._tcp on port 2100
2013-09-09 12:30:13 INFO zookeeper_service.py:424 Zookeeper is running as follower

Under normal conditions, the genesis.out file logs the following messages periodically:
Unpublishing service Nutanix Controller
Publishing service Nutanix Controller
Zookeeper is running as [leader|follower]

Prior to these occasional messages, you should see Starting [n]th service. This is an indicator that all
services were successfully started. As of 4.1.3, there are 20 services.

Tip: You can ignore any INFO messages logged by Genesis by running the command:
grep -v -w INFO /home/nutanix/data/logs/genesis.out

Possible Errors

2013-09-09 16:20:01 ERROR rpc.py:303 Json Rpc request for unknown Rpc object NodeManager

2013-09-09 16:20:18 WARNING command.py:264 Timeout executing scp -q -o CheckHostIp=no -o ConnectTimeout=15 -o StrictHostKeyChecking=no -o TCPKeepAlive=yes -o UserKnownHostsFile=/dev/null -o PreferredAuthentications=keyboard-interactive,password -o BindAddress=192.168.5.254 'root@[192.168.5.1]:/etc/resolv.conf' /tmp/resolv.conf.esx: 30 secs elapsed

2013-09-09 16:20:18 ERROR node_dns_ntp_config.py:287 Unable to download ESX DNS configuration file, ret -1, stdout , stderr

2013-09-09 16:20:18 WARNING node_manager.py:2038 Could not load the local ESX configuration

2013-09-09 16:19:48 ERROR node_dns_ntp_config.py:492 Unable to download the ESX NTP configuration file, ret -1, stdout , stderr

Any of the above messages means that Genesis was unable to log on to the ESXi host using the
configured password.

Diagnosing a Genesis Failure


Determine the cause of a Genesis failure based on the information available in the log files.

1. Examine the contents of the genesis.out file and locate the stack trace (indicated by the CRITICAL
message type).



2. Analyze the ERROR messages immediately preceding the stack trace.
...
2015-06-26 00:14:12 INFO node_manager.py:4170 No cached Zeus configuration found.
2015-06-26 00:14:12 INFO hyperv.py:142 Using RemoteShell ...
2015-06-26 00:14:12 INFO hyperv.py:282 Updating NutanixUtils path
2015-06-26 00:14:12 ERROR hyperv.py:290 Failed to update the NutanixUtils path: [Errno
104] Connection reset by peer
2015-06-26 00:14:12 CRITICAL node_manager.py:3559 File "/home/nutanix/cluster/bin/
genesis", line 207, in <module>
main(args)
File "/home/nutanix/cluster/bin/genesis", line 149, in main
Genesis().run()
File "/home/nutanix/jita/main/28102/builds/build-danube-4.1.3-stable-release/python-
tree/bdist.linux-x86_64/egg/util/misc/decorators.py", line 40, in wrapper
File "/home/nutanix/jita/main/28102/builds/build-danube-4.1.3-stable-release/python-
tree/bdist.linux-x86_64/egg/cluster/genesis/server.py", line 132, in run
File "/home/nutanix/jita/main/28102/builds/build-danube-4.1.3-stable-release/python-
tree/bdist.linux-x86_64/egg/cluster/genesis/node_manager.py", line 502, in initialize
File "/home/nutanix/jita/main/28102/builds/build-danube-4.1.3-stable-release/python-
tree/bdist.linux-x86_64/egg/cluster/genesis/node_manager.py", line 3559, in discover
...

In the example above, the certificates in AuthorizedCerts.txt were not updated, which means that the Controller VM could not connect to the NutanixHostAgent service on the host.

ESXi Log Files


These log files are present on ESXi hosts.

Location: /var/log

Log Contents

hostd.log hostd (daemon to communicate with vmkernel) process output

vmkernel.log vmkernel activity

vpxa.log vpxa (daemon to communicate with vCenter) process output

Location: /vmfs/volumes/

Log Contents

datastore/vm_name/vmware.log Virtual machine activity and health



8: Troubleshooting Tools

Nutanix Cluster Check (NCC)


Nutanix Cluster Check (NCC) is a framework of scripts that can help diagnose cluster health. NCC can
be run provided that the individual nodes are up, regardless of cluster state. The scripts run standard
commands against the cluster or the nodes, depending on the type of information being retrieved.
When run from the Controller VM command line, NCC generates a log file with the output of the diagnostic
commands selected by the user.
NCC actions are grouped into plugins and modules.
• Plugins are objects that run the diagnostic commands.
• Modules are logical groups of plugins that can be run as a set.
Note: Some plugins run nCLI commands and might require the user to input the nCLI password.
The password is logged in plain text. If you change the password of the admin user from
the default, you must specify the password every time you start an nCLI session from a remote
system. A password is not required if you are starting an nCLI session from a Controller VM where
you are already logged on.

Comprehensive documentation of NCC is available in the Nutanix Command Reference.

NCC Output

Each NCC plugin is a test that completes independently of other plugins. Each test completes with one of
these status types.
PASS
The tested aspect of the cluster is healthy and no further action is required.
FAIL
The tested aspect of the cluster is not healthy and must be addressed.
WARN
The plugin returned an unexpected value and must be investigated.
INFO
The plugin returned an expected value that however cannot be evaluated as PASS/FAIL.

Running Health Checks

In addition to running all health checks, you can run checks as follows:



Run two or more individual checks at a time
• You can specify two or more individual checks from the command line, with each check separated
by a comma. Ensure you do not use any spaces between checks, only a comma character. For
example:
ncc health_checks system_checks \
--plugin_list=cluster_version_check,cvm_reboot_check

Re-run failing checks


• You can re-run any NCC checks or plug-ins that reported a FAIL status.
ncc --rerun_failing_plugins=True

Run NCC health checks in parallel


• You can specify the number of NCC health checks to run in parallel to reduce the amount of time it takes for all checks to complete. For example, the command ncc health_checks run_all --parallel=4 runs four health checks in parallel (which is the maximum).

Use npyscreen to display NCC status


• You can specify npyscreen as part of the ncc command to display status to the terminal window.
Specify --use_npyscreen as part of the ncc health_checks command.
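For example (an illustrative invocation that combines the options described above):
nutanix@cvm$ ncc health_checks run_all --use_npyscreen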

Installing NCC from an Installer File


Before you begin:
Note:
• If you are adding one or more nodes to expand your cluster, the latest version of NCC might not
be installed on each newly-added node. In this case, re-install NCC in the cluster after you have
finished adding the one or more nodes.
• This topic describes how to install NCC from the command line. To install NCC software from
the web console, see Upgrading NCC Software on page 56.

Note: Ensure that:


• Each node in your cluster is running the same NCC version.
• Prism Central and each cluster managed by Prism Central are all running the same NCC
version.
To check the NCC version, open the Prism Web Console, click the user icon in the main menu and
then select About Nutanix. An About Nutanix window appears that includes the AOS and NCC
version numbers. (The Prism Central version is the same as the AOS version.)
You can download the NCC installation file from the Nutanix support portal under Downloads > Tools &
Firmware. The file type to download depends on the NCC version:

Tip: Note the MD5 value of the file as published on the support portal.

• Some NCC versions include a single installer file (ncc_installer_filename.sh) that you can download and
run from any Controller VM.
• Some NCC versions include an installer file inside a compressed tar file (ncc_installer_filename.tar.gz)
that you must first extract, then run from any Controller VM.



• The directory to which you copy the installation package should exist on all nodes in the cluster (/home/nutanix is suggested). Additionally, the folder should be owned by any account that uses NCC.

1. Download the installation file to any Controller VM in the cluster and copy it to the /home/nutanix directory.

2. Check the MD5 value of the file. It must match the MD5 value published on the support portal. If the
value does not match, delete the file and download it again from the support portal.
nutanix@cvm$ md5sum ./ncc_installer_filename.sh

3. Perform these steps for NCC versions that include a single installer file (ncc_installer_filename.sh)

a. Make the installation file executable.


nutanix@cvm$ chmod u+x ./ncc_installer_filename.sh

b. Install NCC.
nutanix@cvm$ ./ncc_installer_filename.sh

The installation script installs NCC on each node in the cluster.

NCC installation file logic tests the NCC tar file checksum and prevents installation if it detects file
corruption.
• If it verifies the file, the installation script installs NCC on each node in the cluster.
• If it detects file corruption, it prevents installation and deletes any extracted files. In this case,
download the file again from the Nutanix support portal.

4. Perform these steps for NCC versions that include an installer file inside a compressed tar file
(ncc_installer_filename.tar.gz).

a. Extract the installation package.


nutanix@cvm$ tar xvmf ncc_installer_filename.tar.gz --recursive-unlink

Replace ncc_installer_filename.tar.gz with the name of the compressed installation tar file.
The --recursive-unlink option is needed to ensure old installs are completely removed.

b. Run the install script. Provide the installation tar file name if it has been moved or renamed.
nutanix@cvm$ ./ncc/bin/install.sh [-f install_file.tar]

The installation script copies the install_file.tar file to each node and installs NCC on each node in the cluster.

5. Check the output of the installation command for any error messages.
• If installation is successful, a Finished Installation message is displayed. You can check any
NCC-related messages in /home/nutanix/data/logs/ncc-output-latest.log.
• In some cases, output similar to the following is displayed. Depending on the NCC version installed,
the installation file might log the output to /home/nutanix/data/logs/ or /home/nutanix/data/
serviceability/ncc.
Copying file to all nodes [ DONE ]
-------------------------------------------------------------------------------+
+---------------+
| State | Count |
+---------------+
| Total | 1 |
+---------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log

[ info ] Installing ncc globally.
[ info ] Installing ncc on 10.130.45.72, 10.130.45.73
[ info ] Installation of ncc succeeded on nodes 10.130.45.72, 10.130.45.73.

What to do next:
• As part of installation or upgrade, NCC automatically restarts the cluster health service on each node
in the cluster, so you might observe notifications or other slight anomalies as the service is being
restarted.

Upgrading NCC Software


Before you begin:

Note:
• If you are adding one or more nodes to expand your cluster, the latest version of NCC might not
be installed on each newly-added node. In this case, re-install NCC in the cluster after you have
finished adding the one or more nodes.
• This topic describes how to install NCC software from the Prism web console. To install NCC
from the command line, see Installing NCC from an Installer File on page 54.

Note: Ensure that:


• Each node in your cluster is running the same NCC version.
• Prism Central and each cluster managed by Prism Central are all running the same NCC
version.
To check the NCC version, open the Prism Web Console, click the user icon in the main menu and
then select About Nutanix. An About Nutanix window appears that includes the AOS and NCC
version numbers. (The Prism Central version is the same as the AOS version.)

1. Run the Nutanix Cluster Check (NCC) from a Controller VM.


nutanix@cvm$ ncc health_checks run_all

If the check reports a status other than PASS, resolve the reported issues before proceeding. If you are
unable to resolve the issues, contact Nutanix support for assistance.

2. Do this step to download and automatically install the NCC upgrade files. To manually upgrade, see the
next step.

a. Log on to the Prism web console for any node in the cluster.

b. Click Upgrade Software from the gear icon in the Prism web console, then click NCC in the dialog
box.

c. If an update is available, click Upgrade Available and then click Download.

d. When the download is complete, do one of the following:


→ To run only the pre-upgrade installation checks on the controller VM where you are logged on,
click Upgrade > Pre-upgrade. These checks also run as part of the upgrade procedure.
→ Click Upgrade > Upgrade Now, then click Yes to confirm.

3. Do this step to download and manually install the NCC upgrade files.

a. Log on to the Nutanix support portal and select Downloads > Tools & Firmware.



b. Click the download link to save the binary gzipped TAR (.tar.gz) and metadata (.json) files on your
local media.

c. Log on to the Prism web console for any node in the cluster.

d. Click Upgrade Software from the gear icon in the Prism web console, then click NCC in the dialog
box.

e. Click the upload a binary link.

f. Click Choose File for the NCC metadata and binary files, respectively, browse to the file locations,
and click Upload Now.

g. When the upload process is completed, click Upgrade > Upgrade Now, then click Yes to confirm.

The Upgrade Software dialog box shows the progress of your selection, including pre-installation
checks.
As part of installation or upgrade, NCC automatically restarts the cluster health service on each node
in the cluster, so you might observe notifications or other slight anomalies as the service is being
restarted.

NCC Usage
The general usage of NCC is as follows:
nutanix@cvm$ ncc ncc-flags module sub-module [...] plugin plugin-flags

Typing ncc with no arguments yields a table listing the next modules that can be run. The Type column
distinguishes between modules (M) and plugins (P). The Impact tag identifies a plugin as intrusive or non-
intrusive. By default, only non-intrusive checks are used if a module is run with the run_all plugin.
nutanix@cvm$ ncc

+-----------------------------------------------------------------------------+
| Type | Name | Impact | Short help |
+-----------------------------------------------------------------------------+
| M | cassandra_tools | N/A | Plugins to help with Cassandra ring analysis |
| M | file_utils | N/A | Utilities for manipulating files on the |
| | | | cluster |
| M | health_checks | N/A | All health checks |
| M | info | N/A | Contains all info modules (legacy |
| | | | health_check.py) |
+-----------------------------------------------------------------------------+

The usage table is displayed for any module specified on the command line. Specifying a plugin runs its
associated checks.
nutanix@cvm$ ncc info

+------------------------------------------------------------------------------+
| Type | Name | Impact | Short help |
+------------------------------------------------------------------------------+
| M | cluster_info | N/A | Displays summary of info about this cluster. |
| M | cvm_info | N/A | Displays summary of info about this CVM. |
| M | esx_info | N/A | Displays summary of info about the local esx |
| | | | host. |
| M | ipmi_info | N/A | Displays summary of info about the local IPMI.|
| P | run_all | N/A | Run all the plugins in this module |
+------------------------------------------------------------------------------+



The file_utils module does not run any checks. It exists to help you manage files in the cluster.
nutanix@cvm$ ncc file_utils

+------------------------------------------------------------------------------+
| Type | Name | Impact | Short help |
+------------------------------------------------------------------------------+
| P | file_copy | Non-Intrusive | Copies a local file to all CVMs. |
| P | remove_old_cores | Non-Intrusive | Removing cores older than 30 days |
| P | remove_old_fatals | Non-Intrusive | Removing fatals older than 90 days|
| P | run_all | N/A | Run all the plugins in this module|
+------------------------------------------------------------------------------+

Usage Examples

• Run all health checks.


nutanix@cvm$ ncc health_checks run_all

• Display default command flags.


nutanix@cvm$ ncc --ncc_interactive=false module sub-module [...] plugin \
--helpshort

• Run the NCC with a named output file and a non-standard path for ipmitool.
nutanix@cvm$ ncc --ncc_plugin_output_history_file=ncc.out health_checks \
hardware_checks ipmi_checks run_all --ipmitool_path /usr/bin/ipmitool

Note: The flags override the default configurations of the NCC modules and plugins. Do not
run with these flags unless your cluster configuration requires these modifications.

Diagnostics VMs
Nutanix provides a diagnostics capability to allow partners and customers to run performance tests on the
cluster. This is a useful tool in pre-sales demonstrations of the cluster and while identifying the source of
performance issues in a production cluster. Diagnostics should also be run as part of setup to ensure that
the cluster is running properly before the customer takes ownership of the cluster.
The diagnostic utility deploys a VM on each node in the cluster. The Controller VMs control the diagnostic
VM on their hosts and report back the results to a single system.
The diagnostics test provides the following data:
• Sequential write bandwidth
• Sequential read bandwidth
• Random read IOPS
• Random write IOPS
Because the test creates new cluster entities, it is necessary to run a cleanup script when you are finished.

Running a Test Using the Diagnostics VMs


Before you begin:



• Ensure that 10 GbE ports are active on the ESXi hosts by using esxtop or vCenter. The tests run very slowly if the nodes are not using the 10 GbE ports. For more information about this known issue with ESXi 5.0 update 1, see VMware KB article 2030006.

1. Log on to any Controller VM in the cluster with SSH.

2. Set up the diagnostics test.


nutanix@cvm$ ~/diagnostics/diagnostics.py cleanup

(vSphere only) In vCenter, right-click any diagnostic VMs labeled as "orphaned", select Remove from
Inventory, and click Yes to confirm removal.

3. Start the diagnostics test.


nutanix@cvm$ ~/diagnostics/diagnostics.py run

Include the parameter --default_ncli_password='admin_password' if the Nutanix admin user password has been changed from the default (see the example after this step).
If the command fails with ERROR:root:Zookeeper host port list is not set, refresh the environment by running source /etc/profile or bash -l and run the command again.
The diagnostic may take up to 15 minutes to complete for a four-node cluster. Larger clusters take longer.
The script performs the following tasks:
1. Installs a diagnostic VM on each node.
2. Creates cluster entities to support the test, if necessary.
3. Runs four performance tests, using the Linux fio utility.
4. Reports the results.
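For example, if the admin password has been changed from the default (the password shown here is a placeholder):
nutanix@cvm$ ~/diagnostics/diagnostics.py run --default_ncli_password='admin_password'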

4. Review the results.

5. Remove the entities from this diagnostic.


nutanix@cvm$ ~/diagnostics/diagnostics.py cleanup

(vSphere only) In vCenter, right-click any diagnostic VMs labeled as "orphaned", select Remove from
Inventory, and click Yes to confirm removal.

Diagnostics Output
System output similar to the following indicates a successful test.

Checking if an existing storage pool can be used ...


Using storage pool sp1 for the tests.
Checking if the diagnostics container exists ... does not exist.
Creating a new container NTNX-diagnostics-ctr for the runs ... done.
Mounting NFS datastore 'NTNX-diagnostics-ctr' on each host ... done.
Deploying the diagnostics UVM on host 172.16.8.170 ... done.
Preparing the UVM on host 172.16.8.170 ... done.
Deploying the diagnostics UVM on host 172.16.8.171 ... done.
Preparing the UVM on host 172.16.8.171 ... done.
Deploying the diagnostics UVM on host 172.16.8.172 ... done.
Preparing the UVM on host 172.16.8.172 ... done.
Deploying the diagnostics UVM on host 172.16.8.173 ... done.
Preparing the UVM on host 172.16.8.173 ... done.
VM on host 172.16.8.170 has booted. 3 remaining.

VM on host 172.16.8.171 has booted. 2 remaining.
VM on host 172.16.8.172 has booted. 1 remaining.
VM on host 172.16.8.173 has booted. 0 remaining.
Waiting for the hot cache to flush ... done.
Running test 'Prepare disks' ... done.
Waiting for the hot cache to flush ... done.
Running test 'Sequential write bandwidth (using fio)' ... bandwidth MBps
Waiting for the hot cache to flush ... done.
Running test 'Sequential read bandwidth (using fio)' ... bandwidth MBps
Waiting for the hot cache to flush ... done.
Running test 'Random read IOPS (using fio)' ... operations IOPS
Waiting for the hot cache to flush ... done.
Running test 'Random write IOPS (using fio)' ... operations IOPS
Tests done.

Note:
• Expected results vary based on the specific AOS version and hardware model used.
• The IOPS values reported by the diagnostics script are higher than the values reported by the
Nutanix management interfaces. This difference occurs because the diagnostics script reports
physical disk I/O, while the management interfaces show IOPS reported by the hypervisor.
• If the reported values are lower than expected, the 10 GbE ports may not be active. For more
information about this known issue with ESXi 5.0 update 1, see VMware KB article 2030006.

Syscheck Utility
Syscheck is a tool that runs a load on the cluster and evaluates its performance characteristics. The tool
provides pass or fail feedback for each check. The current checks are network throughput and direct
disk random write performance. Syscheck tracks the tests on a per-node basis and prints the results at the
conclusion of the test.

Using Syscheck Utility


Perform the following procedure to run the syscheck utility on AOS clusters.
Note:
• Run this test on a newly created cluster or a cluster that is idle or has minimal load.
• Do not run this test if other systems are sharing the network, because it may interfere with their operation.
• Do not run this test if the guest VMs have already been deployed.

1. Log on to any Controller VM in the cluster with SSH.

2. Run the syscheck utility.


nutanix@cvm$ /usr/local/nutanix/syscheck/bin/syscheck

After you run the command, a message describing the considerations for running this test is displayed.
When prompted, type yes to run the checks.
The test returns either a pass or a fail result. The latest result is placed in the /home/nutanix/data/syscheck
directory. An output tar file is also placed in the /home/nutanix/data/ directory each time you run the
utility.
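For example, to review the results afterward (the exact file names inside these directories vary by AOS
version):

nutanix@cvm$ ls -l /home/nutanix/data/syscheck
nutanix@cvm$ ls -lt /home/nutanix/data/ | head

The first listing shows the latest per-node results; the most recent output tar file appears near the top of
the second listing.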

9: Controller VM Memory Configurations

Controller VM Memory and vCPU Configurations


This topic lists the recommended Controller VM memory allocations for models and features.

Controller VM Memory Configurations for Base Models

Platform Default

Platform                                                          Recommended Memory (GB)    Default Memory (GB)    vCPUs
Default configuration for all platforms unless otherwise noted    16                         16                     8

The following tables show the minimum memory and vCPU requirements and recommendations for the
Controller VM on each node for platforms that do not follow the default.

Nutanix Platforms

Platform       Recommended Memory (GB)    Default Memory (GB)    vCPUs
NX-1020        12                         12                     4
NX-6035C       24                         24                     8
NX-6035-G4     24                         16                     8
NX-8150        32                         32                     8
NX-8150-G4     32                         32                     8
NX-9040        32                         16                     8
NX-9060-G4     32                         16                     8

Dell Platforms

Platform       Recommended Memory (GB)    Default Memory (GB)    vCPUs
XC730xd-24     32                         16                     8
XC6320-6AF     32                         16                     8
XC630-10AF     32                         16                     8

Lenovo Platforms

Platform       Default Memory (GB)    vCPUs
HX-5500        24                     8
HX-7500        24                     8

Controller VM Memory Configurations for Features

The following table lists the minimum amount of memory required when enabling features. The memory
requirements are in addition to the default or recommended memory available for your platform
(Nutanix, Dell, Lenovo) as described in Controller VM Memory Configurations for Base Models. The total
additional memory for enabled features does not exceed 16 GB.

Note: Default or recommended platform memory + memory required for each enabled feature =
total Controller VM Memory required

Feature(s)                                                               Memory (GB)
Capacity Tier Deduplication (includes Performance Tier Deduplication)    16
Redundancy Factor 3                                                       8
Performance Tier Deduplication                                            8
Cold Tier nodes (6035-C) + Capacity Tier Deduplication                    4
Performance Tier Deduplication + Redundancy Factor 3                     16
Capacity Tier Deduplication + Redundancy Factor 3                        16
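For example, using values from the tables above, an NX-8150 with the recommended 32 GB of Controller VM
memory that enables Capacity Tier Deduplication together with Redundancy Factor 3 (16 GB) requires
32 + 16 = 48 GB of Controller VM memory in total.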

Controller VM Memory and vCPU Configurations (Broadwell/G5)


This topic lists the recommended Controller VM memory allocations for workload categories.

Controller VM Memory Configurations for Base Models

Platform Default

Platform                                                          Recommended Memory (GB)    Default Memory (GB)    vCPUs
Default configuration for all platforms unless otherwise noted    16                         16                     8



The following table shows the minimum amount of memory required for the Controller VM on each node for
platforms that do not follow the default. For the workload translation into models, see Platform Workload
Translation (Broadwell/G5) on page 63.
Note: To calculate the number of vCPUs for your model, use the number of physical cores per
socket in your model. The minimum number of vCPUs your Controller VM can have is eight and
the maximum number is 12.
If your CPU has fewer than eight logical cores, allocate a maximum of 75 percent of the cores of a
single CPU to the Controller VM. For example, if your CPU has 6 cores, allocate 4 vCPUs.
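Applying the same rule to larger CPUs: a model with 10 physical cores per socket is allocated 10 vCPUs,
and a model with 14 or more physical cores per socket is allocated the maximum of 12 vCPUs.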

Nutanix Broadwell Models

The following table lists the default Controller VM memory for each workload category.

Workload category                              Default Memory (GB)
VDI, server virtualization                     16
Storage only                                   24
Light Compute                                  24
Large server, high-performance, all-flash      32

Platform Workload Translation (Broadwell/G5)


The following table maps workload types to the corresponding Nutanix and Lenovo models.

Workload                          Nutanix NX Model    Lenovo HX Model
VDI                               NX-1065S-G5         HX3310
                                  SX-1065-G5          HX3310-F
                                  NX-1065-G5          HX2310-E
                                  NX-3060-G5          HX3510-G
                                  NX-3155G-G5         HX3710
                                  NX-3175-G5          HX3710-F
                                  -                   HX2710-E
Storage Heavy                     NX-6155-G5          HX5510
                                  NX-8035-G5          -
                                  NX-6035-G5          -
Light Compute nodes               NX-6035C-G5         HX5510-C
High Performance and All-Flash    NX-8150-G5          HX7510
                                  NX-9060-G5          -

