Академический Документы
Профессиональный Документы
Культура Документы
Before jumping into the marvelous world of command lines, as a starter, well check the
state of the environment from the vSphere Web Client UI standpoint.
Authenticate to your web client, and click on Network & Security > Installation >
Management .
Each compute node should have a Virtual Tunnel Endpoint (VTEP) vmkernel interface
(vmk3 here) with an IP Address assigned to it.
Dont worry if you get any errors, this article is meant to help you troubleshoot the root
cause.
Transport Network
If VXLAN Connectivity isnt operational, I mean if a VM on a VXLAN cannot ping another
one on the same logical switch the most common reason is a misconfiguration on the
transport network. To check that, SSH to a Compute node and type :
ping ++netstack=vxlan -d -s 1572 -I vmk3 1.2.3.4
++netstack=vxlan instruct the ESXi host to use the VXLAN TCP/IP stack.
-d set Dont Fragment bit on IPv4 packet
-s 1572 set packet size to 1572 to check if MTU is correctly setup up to 1600
-I VXLAN vmkernel interface name
1.2.3.4 Destination ESXi host IP Address
If the ping fails, launch another one without the dont fragment/size argument set
ping ++netstack=vxlan -I vmk3 1.2.3.4
If this one succeed, it means your MTU isnt correctly set to at least 1600 on your transport
network.
If both fails its a VLAN ID or Uplink misconfiguration. Before going any further you have to
make sure that these pings works.
If both succeed, but you still dont have connectivity on the virtual wire, Ill show you, in the
Compute node controller connectivity section, how to investigate that using net-vdl2 -l .
Note: If you dont know the name of your VXLAN vmkernel you can easily check it, by
looking at the configuration of your VDS.
But youve also seen that information in the Logical Network Preparation UI above.
Controller
You can get the IP Address of your Controller by clicking on the VM
named NSX_Controller_<ID> in the vSphere Web Client.
To investigate controller issues, SSH to one of your controller VM to use the CLI (login:
admin, password: the one set at deployment time).
status
# show control-cluster status
Type
Status
Since
-------------------------------------------------------------------------------Join status:
Join complete
09/14 14:08:46
Majority status: Connected to cluster majority
09/18 08:45:16
Restart status:
This controller can be safely restarted
09/18 08:45:06
Cluster ID:
b20ddc88-cd62-49ad-b120-572c23108520
Node UUID:
b20ddc88-cd62-49ad-b120-572c23108520
Role
Configured status Active status
-------------------------------------------------------------------------------api_provider
enabled
activated
persistence_server enabled
activated
switch_manager
enabled
activated
logical_manager
enabled
activated
directory_server enabled
activated
List the implemented role on your controller, the Not Configured for api_provider is
normal, its the NSX-Manager whos published the NSX-v API.
# show control-cluster roles
Listen-IP Master? Last-Changed Count
api_provider
Not configured
Yes 09/18 08:45:17
6
persistence_server
N/A
Yes 09/18 08:45:17
5
switch_manager
127.0.0.1
Yes 09/18 08:45:17
6
logical_manager
N/A
Yes 09/18 08:45:17
6
directory_server
N/A
Yes 09/18 08:45:17
6
Controller networking
# show network interface
Interface
breth0
eth0
Address/Netmask
MTU
192.168.110.201/24 1500
1500 UP
Admin-Status Link-Status
UP
UP
UP
#
#
#
#
L2 networking troubleshooting
First make sure to connect on the master controller of the virtual network you want to
troubleshoot. You can then use the following commands
# show control-cluster logical-switches vni 5001
VNI
Controller
BUM-Replication ARP-Proxy Connections VTEPs
5001
192.168.110.201 Enabled
Enabled 0
0
As you can see above, you should now connect to 192.168.110.201 to troubleshoot VNI
5001.
Let see what we can use on that controller node to get more information on this virtual wire
(VNI = 5001).
If you want to check the managemement TCP connection between Controller and ESXi
# show control-cluster logical-switches connection-table 5001
Host-IP
Port ID
192.168.110.51 17528 2
192.168.110.52 46026 3
192.168.210.56 42257 4
192.168.210.51 30969 5
192.168.210.57 12127 6
192.168.210.52 30280 7
List all VMs Mac addresses on each VNIs of a specific ESXi with their associated VTEP.
# show control-cluster logical-switches mac-records <ESXi_MGT_IP>
L3 networking troubleshooting
First you can list all of your logical routers
# show control-cluster logical-routers instance all
LR-Id
LR-Name
Hosts[]
Edge-Connection Service-Controller
1460487509 default+edge-1
192.168.110.51
192.168.110.201
192.168.110.52
192.168.210.52
192.168.210.51
192.168.210.57
192.168.210.56
You can then use the LR-Id above to get interface details on one instance
# show control-cluster logical-routers interface-summary 1460487509
Interface
Type Id
IP[]
570d45550000000b
vlan 100
570d45550000000c
vxlan 5004
10.10.10.1/24
570d45550000000a
vxlan 5000
Use the Interface name to get even more details on VXLAN 5004 LIF for example
# show control-cluster logical-routers interface 1460487509
570d45550000000c
Interface-Name: 570d45550000000c
Logical-Router-Id:1460487509
Id:
5004
Type:
vxlan
IP:
10.10.10.1/24
DVS-UUID:
1cec0e50-029c-a921-b6d8-d0fc73e57969
ee660e50-e861-6d04-b4d8-1d462df952bc
Mac:
02:50:56:8e:21:35
Mtu:
1500
Multicast-IP:
0.0.0.1
Designated-IP:
Is-Sedimented: false
Bridge-Id:
Bridge-Name:
Bridging
To get more information on a all bridge instance hosted on your logical router
# show control-cluster logical-routers bridges <lr-id> all
LR-Id
Bridge-Id Host
Active
1460487509 1
192.168.110.52 true
Compute Nodes
In the introduction weve seen how to send ping on the transport network. But from a SSH
connection to an ESXi node, we can use many more troubleshooting commands. This
section will details most of them.
VIBs
When you prepare a compute host for NSX-v, vSphere Installation Bundle (VIBs) are
automatically installed by the NSX Manager via ESX Agency Manager (EAM). You can
check they were correctly installed (output abridged):
# esxcli software vib get --vibname esx-vxlan
...
Summary: Vxlan and host tool
Description: This package loads module and configures firewall for vxlan
networking.
...
Provides: vxlan = 2.0.0.0-nsx, vdr = 1.0.0.0
Maintenance Mode Required: False
...
# esxcli software vib get --vibname esx-vsip
...
Summary: vsip module
Description: This package contains DFW and NetX data and control plane
components.
...
Provides: vsip = 1.0.0-0
Maintenance Mode Required: False
...
# esxcli software vib get --vibname esx-dvfilter-switch-security
...
Summary: dvfilter-switch-security module
Description: This package contains dvfilter-switch-security module.
...
Provides: switchSecurity = 0.1.0.0
Maintenance Mode Required: False
...
When you remove a host from an NSX Prepared cluster the VIBs will be automatically
removed, but you can remove them from the command line :
# esxcli software vib remove -n esx-vsip
# esxcli software vib remove -n esx-vxlan
# esxcli software vib remove -n esx-dvfilter-switch-security
MTU
1600
1600 Intel
1600
1600
To check TCP segmentation offload (TSO) and checksum offload (CSO) settings for a Nic
# esxcli network nic tso get -n vmnic0
# esxcli network nic cso get -n vmnic0
If you see some strange behavior while using LACP, you can use the following commands
to leave only one interface up to verify if LACP negotiation is reponsible for your issues:
# esxcli network nic down -n vmnic1
# esxcli network nic down -n vmnic2
# esxcli network nic down -n vmnic3
To revert it:
# esxcli network nic up -n vmnic1
# esxcli network nic up -n vmnic2
# esxcli network nic up -n vmnic3
VMs
Get a list of all VMs on the compute node
# esxcfg-vswitch -l
Switch Name
Num Ports Used Ports Configured Ports MTU
vSwitch0
1536
1
128
1500
PortGroup Name
Uplinks
VM Network
DVS Name
Num Ports Used Ports Configured Ports MTU
Uplinks
Mgmt_Edge_VDS 1536
13
512
1600 vmnic1,vmnic0
DVPort ID
In Use
Client
897
1
vmnic0
767
1
vmk2
639
1
vmk1
510
1
vmk0
895
0
905
1
vmk3
907
1
vmnic1
506
1
NSX_Controller_c6aea614-0dc7-40fd-b6460230608d4709.eth0
497
1
dr-4-bridging-0.eth0
127
1
br-sv-01a.eth0
VMkernel Port
# esxcfg-vmknic -l
Interface Port Group/DVPort IP Family IP Address
Netmask
Broadcast
MAC Address
MTU
TSO MSS Enabled Type
vmk0
510
IPv4
192.168.110.52
255.255.255.0 192.168. 110.255 00:50:56:09:08:3c 1500 65535
true
STATIC
vmk1
639
IPv4
10.10.20.52
255.255.255.0
10.10.20 .255 00:50:56:64:f0:9b 1500 65535
true STATIC
vmk2
767
IPv4
10.10.30.52
255.255.255.0
10.10.30 .255 00:50:56:65:67:8e 1500 65535
true STATIC
vmk3
905
IPv4
192.168.150.52
255.255.255.0 192.168. 150.255 00:50:56:6e:5e:e3 1600 65535
true
STATIC
ARP Table
# esxcli network ip neighbor list
Neighbor
Mac Address
Vmknic Expiry State Type
--------------- ----------------- ------ -------- ----- ------192.168.110.202 00:50:56:8e:52:25 vmk0
674 sec
Unknown
192.168.110.42 00:50:56:09:45:60 vmk0 1196 sec
Unknown
192.168.110.10 00:50:56:03:00:2a vmk0 1199 sec
Unknown
192.168.110.203 00:50:56:8e:7a:a4 vmk0
457 sec
Unknown
192.168.110.201 00:50:56:8e:ea:bd vmk0
792 sec
Unknown
192.168.110.22 00:50:56:09:11:07 vmk0 1146 sec
Unknown
10.10.20.60
00:50:56:27:49:6b vmk1
506 sec
Unknown
VXLAN
Dump VXLAN configuration
List VXLAN networks, great to see whos the master controller for each VXLAN.
# esxcli network vswitch dvs vmware vxlan network list --vdsname=<VDS_NAME>
VXLAN ID Multicast IP
Control Plane
Controller
Connection Port Count MAC Entry Count ARP Entry Count MTEP Count
-------- ------------------------- ----------------------------------- --------------------- ------------------------ --------------- ---------5000 N/A (headend replication) Enabled (multicast proxy,ARP proxy)
192.168.110.202 (up)
1
1
0
0
5004 N/A (headend replication) Enabled (multicast proxy,ARP proxy)
192.168.110.203 (up)
1
0
0
0
But you can also get specific information on the VXLAN like mac, arp, port, mtep and stats
like this
For example to list all the remote Mac addresses, pushed by the controller for a specific
VNI:
# esxcli network vswitch dvs vmware vxlan network mac list -vdsname=<VDS_NAME> --vxlan-id=<VXLAN_ID>
You can get also the remote IP Addresses/Mac Addresses that are still in the local ESXi
cache. They timeout after 5 if no traffic.
# esxcli network vswitch dvs vmware vxlan network arp list --vdsname=<VDS_NAME> --vxlan-id=<VXLAN_ID>
To get a list of remote known ESXi for a VNI. Youll also see whos the MTEP on each
Transport Network subnet.
# esxcli network vswitch dvs vmware vxlan network vtep list --vdsname=<VDS_NAME> --vxlan-id=<VXLAN_ID>
Note: if you dont get the vtep argument but he mtep one intead, just
run /etc/init.d/hostd restart
# esxcli network vswitch dvs vmware vxlan network port list --vdsname=<VDS_NAME> --vxlan-id=<VXLAN_ID>
# esxcli network vswitch dvs vmware vxlan network stats list --vdsname=<VDS_NAME> --vxlan-id=<VXLAN_ID>
Controller connectivity
To check Controller connectivity from ESXi (VDL= Virtual Distributed Layer 2)
# net-vdl2 -l
VXLAN Global States:
Control plane Out-Of-Sync:
No
UDP port:
8472
VXLAN VDS:
Mgmt_Edge_VDS
VDS ID: 1c ec 0e 50 02 9c a9 21-b6 d8 d0 fc 73 e5 79 69
MTU: 1600
Segment ID:
192.168.150.0
Gateway IP:
192.168.150.2
Gateway MAC: 00:50:56:27:48:7d
Vmknic count: 1
VXLAN vmknic: vmk3
VDS port ID: 905
Switch port ID: 50331656
Endpoint ID: 0
VLAN ID:
0
IP:
192.168.150.52
Netmask:
255.255.255.0
Segment ID:
192.168.150.0
IP acquire timeout:
0
Multicast group count: 0
Network count: 2
VXLAN network: 5000
Multicast IP: N/A (headend replication)
Control plane: Enabled (multicast proxy,ARP proxy)
Controller:
192.168.110.202 (up)
MAC entry count:
1
ARP entry count:
0
Port count:
1
VXLAN network: 5004
Multicast IP: N/A (headend replication)
Control plane: Enabled (multicast proxy,ARP proxy)
Controller:
192.168.110.203 (up)
MAC entry count:
0
ARP entry count:
0
Port count:
1
Or
# esxcli network vswitch dvs vmware vxlan network list -vds-name <vds
name>
VXLAN ID Multicast IP
Control Plane
Controller
Connection Port Count MAC Entry Count ARP Entry Count MTEP Count
-------- ------------------------- ----------------------------------- --------------------- ------------------------ --------------- ---------5000 N/A (headend replication) Enabled (multicast proxy,ARP proxy)
192.168.110.202 (up)
1
1
0
0
5004 N/A (headend replication) Enabled (multicast proxy,ARP proxy)
192.168.110.203 (up)
1
0
0
0
If you see a controller down message above, you can fix it by restarting netcpa like this
# /etc/init.d/netcpad restart
Note: netcpa is a user world agent that communicate thru SSL with the NSX Controller.
To check ESXi controller connections.
# esxcli network ip connection list| grep tcp | grep 1234
tcp
0
0 192.168.110.52:43925 192.168.110.203:1234
ESTABLISHED
44923 newreno netcpa-worker
tcp
0
0 192.168.110.52:46026 192.168.110.202:1234
ESTABLISHED
46232 newreno netcpa-worker
tcp
0
0 192.168.110.52:39244 192.168.110.201:1234
ESTABLISHED
44923 newreno netcpa-worker
Gateway
-----
Flags
---------
10.10.10.0
255.255.255.0
570d45550000000c
0.0.0.0
UCI
1 MANUAL 410777
ARP information
# net-vdr --nbr -l default+edge-1
VDR default+edge-1:1460487509 ARP Information :
Legend: [S: Static], [V: Valid], [P: Proxy], [I: Interface]
Legend: [N: Nascent], [L: Local], [D: Deleted]
Network
Mac
Flags
Expiry
SrcPort Interface Refcnt
------------------- --------- --------- -----10.10.10.1
02:50:56:56:44:52 VI
permanent 0
570d45550000000c 1
Bridging
Dump bridge info
# net-vdr --bridge -l <vdrName>
VDR default+edge-1:1460487509 Bridge Information :
Bridge config:
Name:id
mybridge:1
Portset name:
DVS name:
Mgmt_Edge_VDS
Ref count:
2
Number of networks: 2
Number of uplinks: 0
Network 'vlan-100-type-bridging' config:
Ref count:
2
Network type:
1
VLAN ID:
100
VXLAN ID:
0
Ageing time:
300
Fdb entry hold time:1
FRP filter enable: 1
Network port '50331655' config:
Ref count:
2
Port ID:
0x3000007
VLAN ID:
4095
IOChains installed: 0
Network 'vxlan-5000-type-bridging' config:
Ref count:
2
Network type:
1
VLAN ID:
0
VXLAN ID:
5000
Ageing time:
300
Fdb entry hold time:1
FRP filter enable: 1
Network port '50331655' config:
Ref count:
2
Port ID:
0x3000007
VLAN ID:
4095
IOChains installed: 0
Distributed Firewall
To investigate the Distributed Firewall Rules applied to a Virtual Machine vNic, first get the
VM UUID (vcUuid) for it using (output abridged)
# summarize-dvfilter
..
world 230869 vmm0:br-sv-01a vcUuid:'50 11 97 6c fa 73 a7 5b-a7 0e 36 1f a2
f5 84 38'
..
: nic-230869-eth0-vmware-sfw.2
: 50 11 97 6c fa 73 a7 5b-a7 0e 36 1f a2 f5 84 38
:0
: --NOT SET--
Use the filter name above to list the associated Distributed Firewall Rules
Packet Capture
vSphere 5 offers a new command, pktcap-uw to capture packet at different level of the
processing.
As you can see on the diagram above, we can now capture traffic at the vmnic , vmknic ,
vnic level.
Let see how it works from the outside world to the VM. Im not going to include the ouput of
the command here, I advice you to try on your hosts instead. By the way I also advice to
save the output to a file in pcap format with -o ./save.pcap , youll then be able to open it
from Wireshark.
uplink/vmnic
You can open up the DVUplinks section of your VDS to get the name of your uplink
interface. Here well be using vmnic0 . So to capture packets received on this uplink, use
pktcap-uw --uplink vmnic0
By default it will only capture received traffic (RX), to capture packets sent on the uplink to
the outside world use the --capture argument like this
pktcap-uw --uplink vmnic0 --capture UplinkSnd
Well details all the filtering options at the end of this section, but in the meantime you can
for example filter out only ICMP packet received on a specific destination by using -proto 0x01 and --destip <ip>
Or to capture ICMP Packets that are sent on vmnic0 from an IP Address 192.168.25.113
pktcap-uw --uplink vmnic0 --capture UplinkSnd proto 0x01 --srcip
192.168.25.113
Switchport
To capture on a specific switchport, you first have to get the ID of the port. Launch
# esxtop
Type n , to get a list of all the ports with the corresponding attachment. Take note of Port ID
of the port youre interested in and use a --switchport argument like this
pktcap-uw switchport <port-id> -proto 0x01
Traffic direction
For switchport , vmk , uplink , dvfilter , direction of traffic is specified using -dir 0 for inbound and --dir 1 for outbound but inbound is assumed.
0- Rx (Default)
1- Tx
So dont be surprised, pktcap-uw doesnt work like tcpdump and by default only capture the
received (RX) traffic. Dont forget to change that if necessary by specifying --dir 1 , it will
switch the capture to the Transmit (Tx) direction.
Argument and Filtering Options
-o save.pcap to save capture to a file in pcap format
-c 25 capture only 25 packets
-vxlan <segment id> to specify VXLAN VNI of flow
--vlan <VLANID> filter for VLAN ID
-ip <x.x.x.x> filter for SRC or DST
--srcmac <xx:xx:xx:xx:xx> filter for source mac address
--dstmac <xx:xx:xx:xx:xx> filter for source mac address
--srcip <x.x.x.x[/<range>]> filter for source IP
Decoding capture
Weve shown you how to save the captured packets to a file, to get a quick overview of the
kind of traffic passing by, you can decode the pcap using tcpdump like this
tcpdump-uw -r save.pcap
But using Wireshark will give you a better vision of the traffic, with all the details.
Tracing
If you are interested in seeing even more details on the processing of the packet through
the ESXi TCP/IP stack, just add --trace argument to see packet traversing the ESXi
network stack. Looks for Drop message that indicate something went wrong in the
processing.
Drops
When things dont work as you expect, one really usefull command is
pktcap-uw --capture Drop
You should see here some errors like VLAN Mismatch or something else that will give you
a hint about why traffic isnt flowing as you would expect.
DVFilter
This command captures packets as seen by the dvfilter (before the filtering happens)
pktcap-uw --capture PreDVFilter --dvfilterName <filter name>
Capture point
You can get a list of all possible capture point with -A
pktcap-uw -A
Supported capture points:
1: Dynamic -- The dynamic inserted runtime capture point.
2: UplinkRcv -- The function that receives packets from uplink dev
3: UplinkSnd -- Function to Tx packets on uplink
4: Vmxnet3Tx -- Function in vnic backend to Tx packets from guest
5: Vmxnet3Rx -- Function in vnic backend to Rx packets to guest
Let me share with you a little bit more details about some of them.
PortOutput show traffic delivered from the vSwitch to the Guest when used with switch
VPN
SSL-VPN
LB
FW
DHCP Relay
Lets now details the command line interface available from a SSH connection to an NSX
Edge
show ip route
show ip ospf neighbor
show ip ospf database
show
show
show
show
show
show
show
show
show
show
show
configuration {ospf|bgp|isis|static-routing}
configuration {firewall|nat|dhcp|dns}
configuration {loadbalancer|ipec|sslvpn-plus}
interface [IFNAME]
firewall
ip {route|ospf|bgp|forwarding}
arp
system {cpu|memory|network-stats|storage|uptime}
service {dhcp|dns|highavailability|ipsec|loadbalancer|sslvpn-plus}
log {follow|reverse}
floatable
But one of the most convenient one is the following which enable you to dump all
ingress/egress traffic on a specific edge interface
debug packet display interface vNic_0
Youll then get a live display of whats flowing on that virtual network interface:
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vNic_0, link-type EN10MB (Ethernet), capture size 65535 bytes
22:43:44.190770 IP 172.16.16.3 > 224.0.0.5: OSPFv2, Hello, length 48
22:43:45.914868 IP 172.16.16.1.17773 > 192.168.2.20.8080: Flags [S], seq
1790616868, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 3],
length 0
22:43:45.915713 IP 192.168.2.20.8080 > 172.16.16.1.17773: Flags [S.], seq
841084052, ack 1790616869, win 29200, options [mss
1460,nop,nop,sackOK,nop,wscale 6], length 0
22:43:45.915750 IP 172.16.16.1.17773 > 192.168.2.20.8080: Flags [.], ack 1,
win 1825, length 0
22:43:45.915990 IP 172.16.16.1.17773 > 192.168.2.20.8080: Flags [P.], seq
1:101, ack 1, win 1825, length 100
22:43:45.916497 IP 192.168.2.20.8080 > 172.16.16.1.17773: Flags [.], ack
101, win 457, length 0
22:43:45.922654 IP 192.168.2.20.8080 > 172.16.16.1.17773: Flags [P.], seq
4381:4434, ack 101, win 457, length 53
22:43:45.922674 IP 192.168.2.20.8080 > 172.16.16.1.17773: Flags [F.], seq
4434, ack 101, win 457, length 0
22:43:45.922842 IP 172.16.16.1.17773 > 192.168.2.20.8080: Flags [.], ack 1,
win 1825, options [nop,nop,sack 1 {4381:4434}], length 0
You can use the same filtering syntax as the one used by tcpdump, for example :
debug packet display interface vNic_0 icmp
If you have multiple words for your filter, just add underscore between them
debug packet display interface vNic_0 dst_port_443
Logs
Controller Logs
Check ESXi connectivity issues from the Controller
show log cloudnet/cloudnet_java-vnet-controller.<start-time-stamp>.log
NSX Manager
SSH (l: admin) to the NSX Manager and use the following command to access logs
nsxmgr-l-01a> show manager log follow
ESXi Logs
/var/log/esxupdate.log check this file if you have VIB installation issues
/var/log/vmkernel.log Distributed Firewall logs are sent to this file
/var/log/netcpa.log User World Agent logs
EAM Logs
EAM logs should be checked when installation of the VIBs module fails.
/storage/log/vmware/vpx/eam.log on Linux vCenter
ProgramData/VMware/VMware VirtualCenter/Logs/ on Windows vCenter
Change the XMLs /config/log/level value to verbose, save and restart netcpad
# /etc/init.d/netcpad start