Welcome to VxRail Troubleshooting and Remote Support Training.

Copyright © 2018 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its
subsidiaries. Other trademarks may be the property of their respective owners. Published in the USA.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE
INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any DELL EMC software described in this publication requires an applicable software license. The trademarks, logos, and service marks (collectively
"Trademarks") appearing in this publication are the property of DELL EMC Corporation and other parties. Nothing contained in this publication should be construed as granting any license
or right to use any Trademark without the prior written permission of the party that owns the Trademark.

AccessAnywhere Access Logix, AdvantEdge, AlphaStor, AppSync ApplicationXtender, ArchiveXtender, Atmos, Authentica, Authentic Problems, Automated Resource Manager, AutoStart,
AutoSwap, AVALONidm, Avamar, Aveksa, Bus-Tech, Captiva, Catalog Solution, C-Clip, Celerra, Celerra Replicator, Centera, CenterStage, CentraStar, EMC CertTracker. CIO Connect,
ClaimPack, ClaimsEditor, Claralert ,CLARiiON, ClientPak, CloudArray, Codebook Correlation Technology, Common Information Model, Compuset, Compute Anywhere, Configuration
Intelligence, Configuresoft, Connectrix, Constellation Computing, CoprHD, EMC ControlCenter, CopyCross, CopyPoint, CX, DataBridge , Data Protection Suite. Data Protection Advisor,
DBClassify, DD Boost, Dantz, DatabaseXtender, Data Domain, Direct Matrix Architecture, DiskXtender, DiskXtender 2000, DLS ECO, Document Sciences, Documentum, DR Anywhere,
DSSD, ECS, elnput, E-Lab, Elastic Cloud Storage, EmailXaminer, EmailXtender , EMC Centera, EMC ControlCenter, EMC LifeLine, EMCTV, Enginuity, EPFM. eRoom, Event Explorer,
FAST, FarPoint, FirstPass, FLARE, FormWare, Geosynchrony, Global File Virtualization, Graphic Visualization, Greenplum, HighRoad, HomeBase, Illuminator , InfoArchive, InfoMover,
Infoscape, Infra, InputAccel, InputAccel Express, Invista, Ionix, Isilon, ISIS,Kazeon, EMC LifeLine, Mainframe Appliance for Storage, Mainframe Data Library, Max Retriever, MCx,
MediaStor , Metro, MetroPoint, MirrorView, Mozy, Multi-Band Deduplication,Navisphere, Netstorage, NetWitness, NetWorker, EMC OnCourse, OnRack, OpenScale, Petrocloud, PixTools,
Powerlink, PowerPath, PowerSnap, ProSphere, ProtectEverywhere, ProtectPoint, EMC Proven, EMC Proven Professional, QuickScan, RAPIDPath, EMC RecoverPoint, Rainfinity,
RepliCare, RepliStor, ResourcePak, Retrospect, RSA, the RSA logo, SafeLine, SAN Advisor, SAN Copy, SAN Manager, ScaleIO Smarts, Silver Trail, EMC Snap, SnapImage, SnapSure,
SnapView, SourceOne, SRDF, EMC Storage Administrator, StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix, Symmetrix DMX, Symmetrix VMAX, TimeFinder,
TwinStrata, UltraFlex, UltraPoint, UltraScale, Unisphere, Universal Data Consistency, Vblock, VCE. Velocity, Viewlets, ViPR, Virtual Matrix, Virtual Matrix Architecture, Virtual Provisioning,
Virtualize Everything, Compromise Nothing, Virtuent, VMAX, VMAXe, VNX, VNXe, Voyence, VPLEX, VSAM-Assist, VSAM I/O PLUS, VSET, VSPEX, Watch4net, WebXtender, xPression,
xPresso, Xtrem, XtremCache, XtremSF, XtremSW, XtremIO, YottaYotta, Zero-Friction Enterprise Storage.

Revision Date: January 2018

Revision Number: VCE-7WNVXRAILAPPRST

Copyright © 2018 Dell Inc. VxRail Appliance Remote Support and Troubleshooting 1
This course covers how to remotely monitor, diagnose, and troubleshoot VxRail Appliance.

This module covers common VxRail Appliance architecture components, including the cross-component
integration points.

This lesson covers the hardware and software components involved in a VxRail, including nodes,
customer-supplied switches, vCenter, vSAN, and VxRail Manager.

In this class, we classify the VxRail architecture into hardware and software components. Major software
components include VMware vSphere, VMware vSAN, and VxRail Manager. Major hardware components include
physical disks and nodes based on PowerEdge servers.

Dell EMC combines hardware and software to create a hyper-converged architecture that is designed to be agile,
simple to manage, highly reliable, and cost effective while offering predictable performance. VxRail Appliance
achieves this by using Dell EMC branded nodes, a Dell EMC or customer-provided network switch, and VxRail
Manager automated provisioning software. Nodes can be clustered together to scale up to 64 nodes. Top-of-rack
network switches are a critical component of the VxRail Appliance. The customer provides the network switches.
Major switch vendors have similar commands and syntax for generating log files. The Dell EMC Network Switch
S4048T-ON with BaseT or S4048-ON with SFP+ can be used with a VxRail Appliance.

There are three groups of software components included on a VxRail: the software-defined storage (vSAN),
deployment and management tools, and data protection that is included at no additional cost. Within the vSAN
software-defined storage component, there is vSAN Enterprise, a vCenter Server instance for VM management,
and vRealize Log Insight for activity logging. VxRail Manager is the primary deployment tool and acts as an
element manager interface. VxRail Manager is also where you add nodes to a cluster. There you can find key
support information and tools, such as the knowledge base and ESRS, which also includes the dial home
functionality that supports troubleshooting. Finally, one of the major advantages of VxRail is that it comes with
data protection options. Data protection licenses are included in the price of the appliance and can be
activated in the VxRail Manager user interface. RecoverPoint for Virtual Machines, CloudArray, vSphere
Replication, and vSphere Data Protection are covered later in this presentation.

vSAN is a clustered datastore that is designed for vSphere environments. Because vSAN is used exclusively with
vSphere and is integrated with the ESXi kernel, it is easy to configure and optimized for performance. A
vSAN cluster can use all or some of the nodes in a vSphere cluster. In the VxRail, the vSAN cluster uses all the
nodes in the vSphere VxRail cluster. One difference between vSAN and traditional datastores is that vSAN is not
a file system; it is an object store. It works better with the large data structures that datastores usually hold.

VxRail Manager streamlines deployment, configuration, and management for easier initial setup and ongoing
operations. VxRail Manager also provides integration for Dell EMC services and support to help get the most out
of the VxRail Appliance. You can use VxRail Manager to:
• Monitor system health with deep hardware intelligence and graphical representation.
• View appliance software versions and updates.
• Access online support and community resources such as the user forum and knowledge base.
• Use the VxRail Market to access qualified software products.
• Perform maintenance operations such as replacing hardware, adding drives, and cycling power to the cluster
or nodes.
• Perform system software upgrades, and expand the cluster by adding nodes.

Dell EMC or Dell EMC Partners execute the initial setup of the VxRail Appliance. VxRail Manager is accessed via a
supported web browser at https://<VxRail Manager hostname or IP address>. Log in to VxRail Manager with the
administrator or management user names that were used during the VxRail initial setup. The VxRail Manager
software stack runs on a VM hosted on the VxRail vSAN cluster.

VxRail virtual infrastructure is managed through the VMware vCenter Server interface. It provides a
familiar vSphere experience that enables streamlined deployment and the ability to extend the use of
existing IT tools and processes. Within a VxRail solution, vCenter provides several services and
interfaces, including:

• Core VM and resource services such as an inventory service, task scheduling, statistics logging,
alarm and event management, and VM provisioning and configuration.

• Distributed services such as vSphere vMotion, vSphere DRS, and vSphere HA.

Your VxRail Appliance can join an existing external vCenter Server during its initial configuration. This allows
you to use a remote central vCenter Server to manage multiple VxRail Appliances from a single pane of glass.
The external vCenter Server can be:
• Physical or virtual
• Embedded PSC or external PSC

The VxRail bundled vCenter license cannot be used for the External vCenter. To join an existing External vCenter
server, provide an existing datacenter and a nonconflicting cluster name during the initial configuration of the
appliance. Once customers have completed initial installation, they cannot change the configured vCenter without
a reset of the system. A reset results in data loss of all VxRail Appliance data not transferred and saved before
the reconfiguration. Check the release notes for the minimum supported external vCenter version for the
version of VxRail that the customer is running.

This lesson covers the VxRail integration components that move information from hardware to software. It also
addresses some of the support tools including the master KB for VxRail and SolVe.

The personality module is used to change firmware and BIOS settings. Partners can use the personality
module to show that the PowerEdge server is part of their solution. This way, they can use the latest
Dell EMC firmware with the appropriate branding without having to redevelop it. The personality module
also enables the appropriate BIOS and firmware settings to be applied. For example, on a VxRail,
processor virtualization should be enabled. This is done during manufacturing with a personality module.
Here is a list of some of the BIOS configuration done with the personality module, for your reference:

• Boot Settings->Boot mode: BIOS
• Boot Settings->BIOS Boot Settings->Boot Sequence: Hard Drive C: first
• Boot Settings->BIOS Boot Settings->Hard-Disk Drive Sequence: SATADOM first
• Processor Settings->Virtualization Technology: Enabled
• Processor Settings->X2Apic Mode: Enabled
• Integrated Devices->SR-IOV Global Enable: Enabled
• Integrated Devices->I/OAT DMA Engine: Enabled
• System BIOS Settings->System Profile Setting: Performance

PTAgent and the iDRAC Service Module support the API that moves information between the virtualized
environment and iDRAC. If that information is not propagating, the symptom is a lack of hardware
information in vCenter and VxRail Manager.

To fix this issue, use the following commands to restart the daemons on the affected VxRail nodes:
• /etc/init.d/DellPTagent restart
• /etc/init.d/dcism-netmon-watchdog restart

Then check that the daemons are running correctly with the following commands:
• /etc/init.d/DellPTagent status
• /etc/init.d/dcism-netmon-watchdog status
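
The restart-then-verify sequence above can be wrapped in a small script. The following is a minimal sketch, assuming it runs in the ESXi shell of the affected node and that both init scripts return a non-zero exit code on failure (an assumption that this course does not confirm). The `run` helper and the `DRY_RUN` variable are illustrative names introduced here, not part of PTAgent or iSM.

```shell
#!/bin/sh
# Restart the two daemons named in this lesson, then query their status.
# Set DRY_RUN=1 to print the commands instead of executing them.

SERVICES="/etc/init.d/DellPTagent /etc/init.d/dcism-netmon-watchdog"

# Illustrative helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_and_check() {
    # Restart every daemon first.
    for svc in $SERVICES; do
        run "$svc" restart || echo "restart failed: $svc" >&2
    done
    # Then confirm that each one reports a healthy status.
    # Assumption: a non-zero exit code from "status" means the daemon is not running.
    for svc in $SERVICES; do
        run "$svc" status || echo "status check failed: $svc" >&2
    done
}
```

Calling `DRY_RUN=1 restart_and_check` prints the four commands without touching the daemons, which is a safe way to review the sequence before running it on a node.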

Remote management is also available for all generations of VxRail hardware. Remote management can provide,
but is not limited to, remote console access, power controls, virtual media, and BIOS access. Dell iDRAC
(Integrated Dell Remote Access Controller) is used for the Dell PowerEdge-based VxRail nodes. IPMI/BMC
(Intelligent Platform Management Interface/Baseboard Management Controller) is used for remote management of
the older generation VxRail hardware. When using the remote management interface, you must adhere to the
power guidelines in the Dell EMC VxRail Appliance Guide unless otherwise instructed. Use VxRail Manager to
handle VxRail cluster shutdown operations. This enforces a proper cluster shutdown. The exception is work on a
single node, such as a node replacement or node maintenance. The iDRAC Service Module is a lightweight,
optional software application that is installed on PowerEdge-based VxRail nodes. The iDRAC Service Module
complements iDRAC interfaces. Its architecture uses IP socket communication, provides server management data
to iDRAC, and presents one-to-many consoles with access to systems management data through OS standard
interfaces.

PTAgent and the iDRAC Service Module are both installed on each node in the cluster. VxRail PTAgent
configuration information is stored in /scratch/dell/DellPTAgent/bin/. The iDRAC Service Module
complements iDRAC interfaces. You can choose to configure the features installed and supported by the
operating system. Its architecture uses IP socket communication, provides server management data to iDRAC,
and presents one-to-many consoles with access to systems management data through OS standard interfaces.

Online iSM Resources:


• iSM v2.5 Technical Guide
– Topics-cdn.dell.com/pdf/idrac-service-module-v2.5_Install%20Guide5_en-us.pdf
• Dell TechCenter iDRAC Service Module:
– http://en.community.dell.com/techcenter/systems-management/w/wiki/11434.idrac-service-module

Dell EMC SolVe Desktop is a utility for procedure generation for an extensive range of Dell EMC products. Dell
EMC SolVe Desktop can be downloaded from the Dell EMC online support site. To access VxRail Appliance
procedures, download and install the SolVe Desktop utility on your laptop or desktop. Authenticate and download
the content for VxRail Appliance. The list of available procedures depends on your access level. Customers have
the Customer access level. The graphic displays the SolVe Desktop Customer view for VxRail Appliance. SolVe
Desktop and the procedures therein are constantly updated. Always use the latest version of SolVe Desktop, and
generate the specific procedure just before performing a task.

Dell EMC Secure Remote Services Virtual Edition (ESRS VE) is a proactive and predictive customer service
capability that is included with the VxRail Appliance warranty or maintenance agreement. It allows customer
service to monitor and access a VxRail Appliance in a secure, high-speed manner that operates 24/7.

ESRS VE receives all the connect home files transferred through the VxRail Appliance. However, the CE on site
still needs to configure connect home to use the ESRS VE. The ESRS VE performs the same basic functions as the
ESRS Gateway. ESRS Gateway is ESRS version 2, and ESRS VE is ESRS version 3.

The process of determining what action to take may take many forms. Some of them are systematic. Others may
enhance the existing troubleshooting procedures. Monitoring the log files after altering the system helps
validate whether the system is performing as expected. VxRail Appliance includes knowledge base (KB) articles
that provide both structured and unstructured information used to size, deploy, and support the system. VxRail
Appliance has a master KB article that references all VxRail Appliance KBs created for the product line.
Hardware issues are reported in iDRAC and VxRail Manager; vCenter may report them as well. Hardware issues
arise from hardware or a connection wearing out. Software issues are reported through vCenter. They can come
from misaligned software levels or from misconfigurations in the software itself.

This module covered the VxRail architecture, cross-component integration points, and VxRail support tools.

This module covers the key points for monitoring VxRail Appliance health and the process for collecting various
log files for troubleshooting.

This lesson covers monitoring VxRail Appliance. It also covers the health monitoring of vSAN and VxRail
Manager from the perspective of troubleshooting.

VxRail Manager and the VMware vSphere Web Client can be used to monitor the health of the VxRail appliance.
The steps to run the VxRail Manager system diagnostic and vSAN health are shown in this lesson.

In the vSphere Web Client, you can inventory, monitor, and manage the entire virtualized environment of
the VxRail Appliance. This includes the cluster, network, virtual distributed switch, datastore, virtual SAN,
ESXi hosts, and virtual machines. You can also get information on the hardware environment when needed.
VxRail currently supports versions 6.5 and 6.0 of vCenter. In addition to the vCenter virtual machine, there is
also a Platform Services Controller (PSC) VM that provides license management and single sign-on, including
integration with the customer’s directory services. For external vCenter deployments, the PSC can be either an
internal or external configuration.

The vSphere Web Client enables you to monitor the status of all manageable services and nodes across vCenter
Server systems. The Summary page of the vCenter Server Appliance shows the basic health status information of
the appliance. Any health-related messages are shown in this section of the Summary screen.

To view the hardware status of individual VxRail nodes, select the VxRail node in the navigation panel, select the
Monitor tab, and then select Hardware Status. The graphic shows the sensor data for one of the VxRail
nodes.

The vSAN health service is turned on by default, and a periodic health check is run every hour. The periodic
health check can be turned off or on, and the interval can be changed. The vSAN health service includes
preconfigured health check tests to monitor, troubleshoot, diagnose the cause of cluster component problems,
and identify any potential risk. The vSAN performance service includes statistical charts used to monitor IOPS,
throughput, latency, and congestion. The performance service is disabled by default. Turn on the vSAN
performance service to monitor the performance of a vSAN cluster, host, disk group, disk, and VMs. The vSAN
performance service stores the statistical data in a Stats database object in the vSAN datastore. The Stats
database requires a storage policy. To manage vSAN health and performance services, select the VxRail cluster,
select the Configure tab, select vSAN, and then select Health and Performance. To change the health check time
interval or to turn off/on the periodic health check, click the health service edit settings button. To turn on the
performance service, click the performance service edit settings button. The vSAN default storage policy is
adequate for the Stats database. Make sure that the vSAN cluster is properly configured and has no unresolved
health problems before the performance service is turned on.

To monitor the health of a vSAN cluster, select the VxRail cluster in the navigation panel, select the Monitor tab,
select vSAN, and then select Health. vSAN health check runs periodically and can also be run on demand by
clicking the Retest button. You can use vSAN health checks to monitor the status of cluster components,
diagnose issues, and troubleshoot problems. The health checks cover hardware compatibility, network
configuration and operation, advanced vSAN configuration options, storage device health, and virtual machine
objects. The vSAN health checks are divided into categories. Each category contains individual health checks.
Drill into each category to see the individual tests. In the screenshot above, the Hardware compatibility category
is expanded revealing subitems related to this category. Selecting an item from this expanded list displays details
below. In this example, there is a warning related to Host issues retrieving hardware info. The details seem to
indicate a timeout when querying HCL info. Click the Ask VMware button to open the relevant knowledge base
article. The KB article describes the health check and provides information on issue resolution.

To monitor the capacity of a vSAN cluster, select Capacity in the Monitor vSAN page. One can monitor the
capacity of the vSAN datastore, deduplication and compression efficiency, and a breakdown of capacity usage.
The Capacity Overview displays the storage capacity of the vSAN datastore, including used space, free space,
and vSAN overhead. The Used Capacity Breakdown displays the percentage of capacity used by different object
types or data types. Object types lists information about various objects, such as virtual disks, VM home
objects, and swap objects. Object types also include file system overhead and checksum overhead. Data types
displays the percentage of capacity used by primary VM data, vSAN overhead, and temporary overhead. On
all-flash systems with deduplication and compression enabled, the Deduplication and Compression overview
displays the space savings data.

A trigger generates an alarm. Alarms are a good starting point in troubleshooting because they help identify
the trigger. You can also customize alarms by defining actions that the system performs when the alarm is
triggered. There are at least 56 vSAN alarms predefined in vCenter Server 6.0u2. Some are shown here, and the
majority relate to vSAN health issues.

To run a VxRail Manager system diagnostic, go to the Config General tab and click the diagnostic button. The
diagnostic highlights any errors and points to relevant knowledge base articles. The example on the right is a
failed diagnostic, with the failure due to missing power supplies. The example on the left shows a healthy
VxRail cluster.

Click the Events tab, and view the list of current VxRail system events. If there are critical events detected, the
events icon displays the number of unread events in the navigation bar. The events list can be sorted by clicking
the column heading – ID, Severity, Component, or Time. The search box can be used to filter the list of events by
ID, severity, or component. Events can be exported as a .CSV file. All events can be marked as read. Select a
specific event, and view its details. Clicking the Component ID in the event details view brings you to the physical
view of the specific component in the VxRail Manager Health tab.
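
Once events have been exported, the .CSV file can be filtered outside VxRail Manager. The following is a minimal sketch; the exact layout of the exported file is not shown in this course, so the header row, the column order ID,Severity,Component,Time, and the file name in the usage note are all assumptions made for illustration.

```shell
#!/bin/sh
# Print the header row plus every event whose Severity column matches.
# Assumptions: the first line is a header, Severity is the second column,
# and no field contains an embedded comma.
filter_events() {
    severity=$1
    file=$2
    awk -F',' -v sev="$severity" 'NR == 1 || $2 == sev' "$file"
}
```

For example, `filter_events Critical events.csv` would list only the critical events from a hypothetical export named events.csv. If the real export uses a different column order, adjust the `$2` field reference accordingly.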

The Physical health view of an appliance shows the status of components if there are any critical, error, or
warning events. Clicking the status icon on the component takes you to the Events view and highlights the
specific event. In this example, we see a warning status on the nodes of a G Series appliance. Clicking the
warning icon on Node 1 takes us to the specific event. In this example, we see that the message is a host
health warning. One would have to use the vSphere Web Client to explore further.

The graphic shows the VxRail Manager Dashboard viewed from the customer perspective. The customer uses this
screen to reach out to Dell EMC Support. VxRail Manager has tabs for Dashboard, Support, Events, Health,
and Config. Click a specific tab to navigate and use the functionality of that tab. VxRail Manager also has online
help and a link to the vSphere Web Client for the vCenter Server managing the VxRail cluster. The Support tab
shows the last heartbeat status for ESRS and links to start a chat session with support and to open a service
request.

This lesson covers the process of collecting log information for VxRail Manager, vCenter Server, and ESXi.

The VxRail has logging enabled on multiple layers of the software stack from both VMware and Dell EMC. The
VxRail Manager interface has a simple button to automatically collect and download all the logs. Sometimes the
ESXi logs and vCenter Server logs may also be needed. The VMware-related support log bundle can be created
via the vSphere Web Client. The relevant Dell EMC knowledge base articles related to VxRail logs are listed on
the slide.

To generate a new log bundle in VxRail Manager, go to the Config General tab and click the Generate New Log
Bundle button. Save the file to a known location. The file is a zip archive. The whole bundle can be sent to the
Dell EMC support team for troubleshooting and diagnosis.

Dell EMC support or VMware support may request diagnostic information related to vCenter Server or the ESXi
nodes to diagnose issues. The vSphere Web Client can be used to export the system logs for the ESXi hosts,
vCenter Server, and vSphere Web Client. Performance data from the ESXi nodes can also be optionally
included. vSAN support logs are contained in a normal ESXi support bundle in the form of vSAN traces. The
vSAN support logs are gathered automatically by gathering the ESXi support bundle for the hosts. As vSAN is
distributed across multiple ESXi hosts, one should gather the ESXi support logs for all hosts configured for vSAN.
vSphere PowerCLI can also be used to collect the relevant logs. To export system logs from the vSphere Web
Client, right-click the vCenter Server in the inventory list and select Export System Logs. Refer to the VMware
KB articles for more information.
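
Besides the vSphere Web Client and PowerCLI, an ESXi support bundle can also be generated from the ESXi shell with the `vm-support` command. This is a different method from the ones this lesson walks through. The following is a minimal sketch that collects a bundle from every vSAN host in turn, assuming SSH is enabled on each host and root login is permitted. The host names, the output directory, the `run` helper, and the `DRY_RUN` variable are hypothetical names used for illustration.

```shell
#!/bin/sh
# Generate an ESXi support bundle on each listed host over SSH.
# Set DRY_RUN=1 to print the commands instead of executing them.

# Illustrative helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: collect_bundles <working-dir-on-host> <host> [<host> ...]
collect_bundles() {
    dir=$1
    shift
    for host in "$@"; do
        # vm-support -w sets the working directory where the bundle is written.
        run ssh "root@$host" vm-support -w "$dir"
    done
}
```

Because vSAN is distributed across hosts, pass every host configured for vSAN, for example `collect_bundles /vmfs/volumes/datastore1 esxi01 esxi02 esxi03`. Bundles can be large, so choose a working directory with enough free space.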

In step 1 of the Export System Logs dialog, select all the relevant ESXi nodes. Optionally check the Include
vCenter Server and vSphere Web Client logs box. Click Next.

In step 2 of the Export System Logs dialog, select the specific system logs and optionally include performance
data in the log bundle. Typically the customer support resource specifies the logs that need to be collected. Click
Finish to generate the log bundle. Specify a name and location for the log bundle. The log bundle can be large
and may take some time to download. You can follow the progress of the download in the Recent Tasks panel.

Gather the TSR/SAC report using the SupportAssist tab in iDRAC.

vRealize Log Insight is included in the VxRail Appliance. It delivers highly scalable log management with
intuitive, actionable dashboards, sophisticated analytics, and extensibility, providing deep operational
visibility and faster troubleshooting for issues across the components within the appliance. For the most
up-to-date information on capturing Log Insight logs, refer to:
• Collecting diagnostic information for VMware vRealize Log Insight (2056760).

Copyright © 2018 Dell Inc. VxRail 4.0 Deployment and Implementation - Module 7 42
This module presented VxRail maintenance and troubleshooting. The topics presented were VxRail health check,
log collection, maintenance procedures, and troubleshooting.

Upon completion of this module, you should be able to describe how to troubleshoot common VxRail Appliance
issues related to network, software, and hardware.

This lesson covers the Dell server physical disk replacement issue, the node replacement issue, and the
damaged node replacement issue.

Many VxRail issues come from mismatched network configurations. Multiple groups may be involved in managing
the network and the nodes, both of which have physical and virtual components that may be managed
differently. It is common for the configurations not to match, leading to various issues that may appear as
storage issues or VM issues.

Two severe network issues cause slightly different and potentially unexpected outcomes: a partitioned cluster
and an isolated node. A partitioned cluster has two or more groups that are not able to communicate with each
other, because of physical hardware failures or because of misconfigurations of the software. In a partitioned
cluster, the VMs that still have access to their storage stay powered on. VMs that no longer have access to
their storage are powered off so that they can be powered on by a partition that has sufficient resources. An
isolated node is a node that is not able to communicate with any other nodes. The node powers off all the VMs
that are running. It can also happen that no partition is viable or that all nodes are in an isolated state.
In that case, the VMs are shut down.

To check network connectivity, use the vmkping command. The command functions similarly to ping on bare
metal systems.
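
A typical vmkping invocation names the vmkernel interface to send from and the peer address to reach. The following is a minimal sketch that prints such a command, with an optional check that a given MTU passes without fragmentation. The payload size is the MTU minus 28 bytes (20-byte IPv4 header plus 8-byte ICMP header). The interface name vmk3 and the peer address used in the usage note are hypothetical; substitute the values from the node being tested.

```shell
#!/bin/sh
# Print a vmkping command for a vmkernel interface and peer address.
#   -I  source vmkernel interface
#   -d  set the don't-fragment bit
#   -s  ICMP payload size in bytes
#   -c  number of echo requests
# Usage: vmkping_cmd <vmk-interface> <peer-ip> [<mtu>]   (MTU defaults to 1500)
vmkping_cmd() {
    iface=$1
    peer=$2
    mtu=${3:-1500}
    # Payload = MTU - 20 (IPv4 header) - 8 (ICMP header).
    echo "vmkping -I $iface -d -s $((mtu - 28)) -c 3 $peer"
}
```

For example, `vmkping_cmd vmk3 192.168.10.12` prints a command with a 1472-byte payload, and passing 9000 as the third argument yields an 8972-byte payload for a jumbo-frame path. Run the printed command in the ESXi shell of the node under test.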

The network neighbor list shows which nodes have recently had communication.

The topology view in vCenter is useful for comparing virtual component configurations.

On the VxRail Appliance, the management network requires IPv6 with multicast support on the management port
group's VLAN to allow Loudmouth communication. There is also a Loudmouth client on the ESXi hosts at
/usr/lib/vmware/loudmouth/loudmouthc.

Network partitions can result from logical configuration on the hosts or from physical configuration on the
physical network, especially Multicast and IGMP settings. Most network partitions do not impact vSAN health
long-term, as long as the misconfiguration is identified and corrected. Standard networking tests should be used
for basic connectivity. When troubleshooting a vSAN network, verify that all vSAN hosts can ping each other over
the vSAN-enabled vmkernel ports. You should also verify the logical configuration of addresses, subnets, and
VLANs, as well as physical connectivity.
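
Checking that every vSAN host can ping every other host means testing each pair in both directions. The sketch
below only plans the tests and summarizes results that you record yourself; it does not run anything on the hosts.
The host names and addresses are hypothetical.

```python
def ping_plan(vsan_ips):
    """For each host, list the peer vSAN addresses it should be able to ping."""
    return {
        source: [ip for name, ip in vsan_ips.items() if name != source]
        for source in vsan_ips
    }


def unreachable_pairs(results):
    """Return the (source, destination) pairs whose recorded ping failed.

    'results' maps (source_host, destination_ip) to True (reply) or False.
    """
    return sorted(pair for pair, ok in results.items() if not ok)


if __name__ == "__main__":
    hosts = {"esx01": "172.16.0.11", "esx02": "172.16.0.12", "esx03": "172.16.0.13"}
    for source, targets in ping_plan(hosts).items():
        print(source, "->", ", ".join(targets))
```

A one-way failure (host A reaches B but B cannot reach A) is worth recording separately, which is why each
direction appears in the plan.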

The network is a critical component of the VxRail Appliance. It is provided by the customer, who is also
responsible for its configuration. When troubleshooting the system, it is important to know how to use basic
networking tools to validate the appliance configuration.

VxRail installation, configuration, and management issues are covered in this lesson.

When investigating a problem believed to be software-related, first check the knowledge base for articles related
to the issue. If the issue appeared after a recent component replacement or expansion, validate that the systems
are operating at the correct software and firmware levels. Also validate that the systems are at the same
maintenance or patch level, as there can be compatibility issues across maintenance releases.
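
Once the version of each node has been collected, spotting the odd one out is a simple comparison. The sketch
below assumes the versions were gathered by hand into a dictionary; the host names and version strings are made
up for illustration and do not correspond to real releases.

```python
from collections import Counter


def find_version_outliers(versions):
    """Return (majority_version, {host: version}) for hosts that differ.

    'versions' maps a host name to its reported software or firmware level.
    """
    if not versions:
        return None, {}
    majority, _ = Counter(versions.values()).most_common(1)[0]
    outliers = {host: v for host, v in versions.items() if v != majority}
    return majority, outliers


if __name__ == "__main__":
    reported = {  # hypothetical values
        "node1": "4.5.101",
        "node2": "4.5.101",
        "node3": "4.5.070",
        "node4": "4.5.101",
    }
    majority, outliers = find_version_outliers(reported)
    print("majority:", majority, "outliers:", outliers)
```

The majority version is not necessarily the correct one; it only highlights which nodes to compare against the
supported levels.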

The marvin.log file can provide more details about anything that has been done in the VxRail system, including
installation issues. The log file is located in the vCenter Server at the path described here. During
installation, it is useful to connect via SSH and tail the log file. As the installation progresses, the
command-line output is automatically updated.
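
When a copy of the log has been collected for offline review, a small filter can stand in for tail and grep. This
is a minimal sketch: the file path is passed in by the caller, and the keyword "ERROR" in the usage example is an
assumption about what you want to search for, not a statement about the marvin.log format.

```python
from collections import deque


def last_matching_lines(path, keyword, limit=20):
    """Return up to 'limit' most recent lines in the file that contain 'keyword'."""
    matches = deque(maxlen=limit)  # older matches fall off the front
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if keyword in line:
                matches.append(line.rstrip("\n"))
    return list(matches)


if __name__ == "__main__":
    import sys

    # Usage: python script.py <path-to-log-copy> [keyword]
    if len(sys.argv) >= 2:
        word = sys.argv[2] if len(sys.argv) > 2 else "ERROR"
        for entry in last_matching_lines(sys.argv[1], word):
            print(entry)
```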

If powering on a virtual machine fails with errors about the swap file, check cluster connectivity. A
connectivity problem can result in a situation where the VM is unable to create a file. Also confirm that all the
hosts and drives are in the cluster and that there is enough capacity.

If the vSAN health check reports “component metadata health” errors, this may be related to “Ops backlog”; run
the health check again in a few minutes. If it reports an error related to an “invalid state”, this is a known
issue that is believed to be resolved in the release running on the VxRail Appliance. If the health check reports
that physical disks have failed, it may be a logical problem, which a host reboot typically resolves, or it could
be an actual drive failure. If the health check reports errors about “stats.db”, it may be related to the
performance service being enabled.

An ESXi host that has been instructed to enter maintenance mode may show "in progress" for an indefinite amount
of time under Tasks in vCenter. When this happens, not all virtual machines are migrated and the task cannot be
canceled. Before a host can enter maintenance mode, all virtual machines need to be powered off or migrated to
another host. If a host cannot be placed in maintenance mode in VxRail, there may be a problem with
inaccessible or unhealthy vSAN objects, or with resource availability.

The top issues preventing successful deployment of ESRS on VxRail are:
Name resolution: DNS is unable to resolve hostnames on either the ESRS VE virtual machine or the VxRail Manager.
Customer EMC support account: deployment can fail if the support account being used to deploy ESRS is not present
in the site ID where the VxRail serial number resides, or is not web support enabled.
External vCenter: ESRS deployment may fail with “Invalid target datastore specified” or a similar error during
the OVF deployment phase. VxRail Manager may not properly identify the external vCenter components needed to
deploy the ESRS OVF. Manual ESRS deployment is required in this instance.
Blocked ports: the customer firewall may be blocking ports required for successful ESRS deployment.
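
Name resolution and blocked ports can both be checked with generic tools before deployment is retried. The sketch
below uses only the Python standard library and can be run from any workstation on the same network. The host
name and port in the usage example are placeholders; the ports that ESRS actually requires are not listed here
and should be taken from the ESRS documentation.

```python
import socket


def can_resolve(hostname):
    """Return True if the local resolver can translate 'hostname' to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within 'timeout' seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    name = "vxrail-manager.example.local"  # hypothetical host name
    print("resolves:", can_resolve(name))
    print("tcp/443 reachable:", port_open(name, 443))  # 443 chosen only as an example
```

A False result from port_open does not prove that a firewall is at fault; the service may simply not be
listening. It does narrow down where to look next.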

The first indication that there is an installation issue is in the VxRail GUI. The GUI displays an error if
there is a problem. The percentage shown helps determine at what point in the installation it failed. The lower
right-hand side of the screen shows a description of the potential problem.

One of the common issues with PTAgent is that the appliance shows as missing in the physical health view. The
error message may read “This appliance is missing. The following is from cached information”. In such a
case, the issue is due to VxRail Manager being unable to get a response about hardware information from
PTAgent on the appliance. To resolve this, open an SSH session as root on the affected node and ping
169.254.0.1. If you cannot get a response, check whether there is a route defined to network 169.254.0.1.
You can use the delete command to delete the route. Ping 169.254.0.1 again to confirm success. In the
physical health view, click the refresh button to see whether the appliance is back.

vSAN includes several proactive tests to validate the cluster. One of these is the VM creation test. It creates a
VM on each host and then deletes it. The test only takes a few seconds and can help diagnose isolated nodes,
insufficient available resources, or an invalid command entered by the operator.

This test can be used to help diagnose performance issues.

Rapid Appliance Self Recovery (RASR) provides a method to return the system’s operating system and VxRail
software to the state it was in when it left the Dell factory. This bare-metal recovery tool is beneficial after
a hardware or software failure that requires the reinstallation of the system’s software. Restoration to the
initial factory state can also be beneficial in cases where the system is being used for demonstration, training,
or evaluation purposes and must be reset before placing the system into production. The factory software is
retained on a Dell Internal Dual SD Module (IDSDM) on all VxRail models. This allows RASR to restore the
system to factory state even if the primary operating system disk suffers a catastrophic failure.

RASR is installed to the IDSDM during the factory process and is available for use on all VxRail platforms,
provided the IDSDM and SD card are in a healthy state. RASR may also be installed to a USB disk; Create RASRUSB
is menu item 2 on the slide. To run RASR from the USB disk, reboot the system gracefully or power it on, and
during the Power On Self Test (POST), press F11 to enter the BIOS Boot Manager. Select "Disk connected to
USB" as the boot device. RASR will boot to the RASR Main Menu. In both procedures you boot the Dell node to
the SD card after you finish the factory install. The SD card has the required software to perform the factory
reset. This procedure is ideal to use if a Dell node arrives at a customer's site DOA. RASR is a total data
destruction operation.

Let us now look at an example of a troubleshooting scenario. This is a useful problem to trace because it can
happen in a number of different areas, such as VM creation, storage vMotion, and taking a snapshot. It can also
be caused by a number of different issues. The basic problem is that the requested operation needs to create a
file and is unable to. This causes the entire operation to fail. Since we are using vSAN software-defined storage
running on top of a network, there are a multitude of issues that this could be traced back to. One of the vSAN
daemons could have failed. This would prevent vSAN from properly handling the write command, leading to the
file not being created. There could be an issue with the physical storage that prevents it from accepting the
write. There could be an issue with the network that prevents the write request from getting through, or that
handles the data to be written in such a way that the write is unsuccessful. There could also be an issue with
the various policies and limits involved in creating this file that causes it to fail. We look at three different
possible causes and trace how to determine whether each is the real issue and how to resolve it. This also lets
us look at checking daemon status and restarting daemons.

The Cluster Level Object Manager Daemon (CLOMD) runs on every ESXi host in a VxRail and is responsible for
creating new objects, repairing existing objects after failures, moving data due to evacuations and policy
changes, and both automated and manual rebalancing. It is not in the data path, but it initiates data path
operations. Any operation that requires the creation of a new object needs CLOMD. This includes some unexpected
operations, such as powering on a VM (where a swap object is created). CLOMD is monitored in the vSAN health pane
under cluster. You can also log in to the affected server and manually check for the daemon. The command
/etc/init.d/clomd has options to start, stop, status, and restart the daemon.
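
The four options named above can be captured in a small helper that refuses anything else, which is useful when
the command is assembled by a script rather than typed by hand. This is a sketch only: it builds the command line
and does not run it, and it makes no assumption about the text the daemon script prints.

```python
CLOMD_SCRIPT = "/etc/init.d/clomd"
CLOMD_ACTIONS = ("start", "stop", "status", "restart")


def clomd_command(action):
    """Return the argument list for a supported clomd action."""
    if action not in CLOMD_ACTIONS:
        raise ValueError(
            "unsupported action %r; expected one of %s" % (action, ", ".join(CLOMD_ACTIONS))
        )
    return [CLOMD_SCRIPT, action]


if __name__ == "__main__":
    # Check the daemon first; restart only if the status output shows it is not running.
    print(" ".join(clomd_command("status")))
    print(" ".join(clomd_command("restart")))
```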

VMware does not make any recommendation around the use of jumbo frames. Testing to date has revealed no
noticeable improvement in vSAN performance by using jumbo frames. However, jumbo frames are supported for
use with vSAN should there be a requirement to use them. MTU is set in multiple places: your switch, IP
interfaces, and DVS. The most important thing is that they all match.
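
A common way to confirm that the MTU really matches end to end is to send a ping that is exactly as large as the
MTU allows with fragmentation disabled. The arithmetic is sketched below: an IPv4 header takes 20 bytes and an
ICMP header takes 8 bytes, so the largest payload is the MTU minus 28. The dictionary keys in the usage example
are hypothetical labels for the places where MTU is configured.

```python
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet of size 'mtu'."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def mtus_match(configured):
    """Return True if every configured MTU value is identical."""
    return len(set(configured.values())) <= 1


if __name__ == "__main__":
    settings = {"physical switch": 9000, "vmkernel interface": 9000, "DVS": 1500}
    print("all match:", mtus_match(settings))
    print("payload for MTU 9000:", max_ping_payload(9000))
    print("payload for MTU 1500:", max_ping_payload(1500))
```

With vmkping, the -s option sets the payload size and -d disables fragmentation, so a jumbo-frame path can be
tested with a payload of 8972 bytes. If that ping fails while a 1472-byte ping succeeds, some device in the path
is still at the default MTU.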

Remember that VMware vSAN is an object store. Each VM deployed is a set of objects. These include the VM home
namespace, VMDK, and VM swap. In addition, each snapshot creates a delta VMDK and a memory snapshot. Each object
is made up of one or more components. In a cluster supporting a large number of small VMs with many snapshots, it
is possible to hit the limit of 9,000 components per host. This can be checked under limits in the vSAN Health
pane. The health pane also monitors what the situation would be if one more node became unavailable. The fix is
to remove enough objects so that sufficient components are available to complete the operation. For a more
complete discussion of components, check the VMware vSAN Design and Sizing Guide 6.5.
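
The "one more node unavailable" check can be approximated with simple arithmetic. The sketch below is a rough
model, not the algorithm used by the vSAN health service: it assumes that everything counted on a failed host
would have to fit on the remaining hosts and ignores placement rules. The per-host counts are invented, and the
limit of 9000 is the figure quoted above.

```python
LIMIT_PER_HOST = 9000


def headroom(counts, limit=LIMIT_PER_HOST):
    """Remaining capacity per host before the limit is reached."""
    return {host: limit - used for host, used in counts.items()}


def survives_one_host_loss(counts, limit=LIMIT_PER_HOST):
    """Return True if the total still fits after any single host is lost.

    With n hosts, the survivors offer (n - 1) * limit in total, so the
    cluster-wide count must not exceed that figure.
    """
    hosts = len(counts)
    if hosts < 2:
        return False
    return sum(counts.values()) <= (hosts - 1) * limit


if __name__ == "__main__":
    usage = {"node1": 8200, "node2": 7900, "node3": 8400, "node4": 8100}
    print("headroom:", headroom(usage))
    print("fits after one host loss:", survives_one_host_loss(usage))
```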

This lesson covers diagnosing and troubleshooting hardware components.

The first step in troubleshooting hardware issues is to check for notifications and errors. You can follow the
diagnostic procedures in the VxRail knowledge base to identify the error and the cause of the issue. When
replacing hardware components, you can use SolVe procedures. Ensure that you download the latest SolVe, as
procedures are periodically updated.

In the VxRail G Series systems, you can only replace the power supply, capacity HDD, SSD, compute node, and fan
module.

Compared to the VxRail G Series, the VxRail E/P/V/S Series offer more replaceable hardware components. For
instance, you can change system memory, system battery, control panel assembly, NICs, processors, and more.

When a physical disk fails, the VxRail node sets the vSAN disk group to offline. This node will not participate
in the vSAN cluster until the failed disk has been replaced. For the standard steps to replace a failed HDD or
SSD, refer to VxRail KB 462945. The knowledge base is periodically updated.

The first step in troubleshooting a failed compute node is to collect system information. It is important to note
that full node replacement is only done on the G Series. On the E/P/V/S Series, specific components are replaced
instead. Faulted compute nodes are identified by an amber LED indication. Once you confirm that the node has to
be replaced, remove the node by unplugging the cables attached to the faulted compute node. Once
you complete this, you can unpack the replacement part and place it on a static-free surface. Follow the standard
SolVe procedures for hardware node replacement by selecting the appropriate VxRail system. For more
information on the SolVe procedure, refer to Hardware component replacement for VxRail Appliance
Model: VxRail G410. Refer to VxRail: How to create dispatch for Dell components - KB 512908 to
learn about the standard procedures to follow.

This module covered common troubleshooting issues and procedures for network, software, and hardware issues.
Under network issues, we covered network connectivity and vSAN-related issues. Under software issues, the
module covered the common issues related to VxRail installation, virtual machines, vSAN, and ESRS. It also
provided an example of a troubleshooting scenario and steps for resolution. Under hardware issues, this module
covered diagnosing hardware issues and provided the procedures for disk and node replacement.

During this course you learnt the major hardware, software, and integration components of the VxRail Appliance.
You also learnt to monitor the VxRail Appliance and perform health checks to keep the system trouble-free. In the
event of an issue, it is important to collect log files. During this course you also learnt the process to
collect incident logs from VxRail components. The course also covered the most common troubleshooting scenarios
and various tools and procedures to resolve issues relating to network, hardware, and software within the VxRail
Appliance.

Thank you.
