NSX-T Troubleshooting

30 Apr 2020 by Simon Greaves


Check L2 before L3.

Check (L2)

  • MTU
  • VLAN
  • TEP
    • IP
    • MTU
  • CCP

N-VDS settings (L3)

  • MTU (L2)
  • Routing table (L4)
  • TEP
  • vTEP tables
  • MAC tables

Manager Troubleshooting

CorfuDB

  • 3 nodes
  • Quorum must be up: at least 2 Corfu servers are required for quorum

The Group Member Leader Election Server (GMLE) helps detect an NSX Manager node failure. It also helps elect a new leader per group.

Day 2 Operations

Use st en to enter engineering mode (root privileged mode).
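
The "2 out of 3" rule is simple majority arithmetic. The sketch below works it out for a few cluster sizes. It assumes a plain majority quorum and does not model anything Corfu-specific; quorum_size and tolerated_failures are hypothetical helper names.

```python
# Hypothetical helpers: plain majority-quorum arithmetic, not Corfu internals.


def quorum_size(nodes: int) -> int:
    """Smallest strict majority of a cluster with `nodes` members."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many members can fail while a majority is still up."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} node(s): quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

With 3 managers, quorum is 2, so the cluster survives losing one node. A fourth node would not help: quorum becomes 3 and it still tolerates only one failure.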

Logs

   
Component log files and locations

  • NSX Policy Manager
    • /var/log/policy/policy.log
  • NSX Manager, NSX API logs, CorfuDB logs, Cluster Bootstrap Manager (CBM)
    • /var/log/syslog
    • /var/log/manager.log
    • /var/log/proton/nsxapi.log
    • /var/log/nsx-audit.log
    • /var/log/corfu
    • /var/log/cbm
  • NSX Controller
    • /var/log/cloudnet/nsx-ccp.log
  • ESXi host (including DFW)
    • /var/log/cfgAgent.log
    • esxupdate.log
    • nsxa-opsagent.log
    • nsx-syslog
    • /var/log/dfwpktlogs.log (only fills if logging is enabled on the rule)
  • KVM host (including DFW)
    • /var/log/vmware/nsx-syslog
    • /var/log/syslog
    • /var/log/openvswitch/ovs-vswitchd.log
    • /var/log/dpkg.log
    • /var/log/dfwpktlogs.log (only fills if logging is enabled on the rule)
  • Edge nodes (load balancer errors)
    • Syslog (get log-file syslog)
    • Access log [follow]
    • Error log [follow]


Set the logging level on NSX Manager with:

set service manager logging-level debug

Log Message IDs

Infrastructure Preparation Logs

Policy Manager logs

View with:

  • get log-file policy.log
  • get log-file syslog

Controller Log

CFG Agent Log (ESXi)

KVM

Syslog

Configure Syslog Exporter

You get vRealize Log Insight (vRLI) with NSX.

Protocols Supported

  • TCP
  • UDP
  • TLS

Severity Level (most to least severe; syslog codes run from 0 for Emergency to 7 for Debug)

  1. Emergency
  2. Alert
  3. Critical
  4. Error
  5. Warning
  6. Notice
  7. Informational
  8. Debug
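
The sketch below encodes those levels with their RFC 5424 codes and shows two things that follow from them: a threshold forwards its own level and everything more severe (numerically lower), and the PRI value at the start of each syslog message is facility × 8 + severity. It assumes NSX's level setting behaves like a standard syslog threshold; Severity, is_forwarded and pri are hypothetical names.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Syslog severities as numbered in RFC 5424 (0 is the most severe)."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


def is_forwarded(message: Severity, threshold: Severity) -> bool:
    """True if a message passes a filter set to `threshold`.

    Assumption: the threshold forwards that level and everything more
    severe, i.e. every numerically lower-or-equal code.
    """
    return message <= threshold


def pri(facility: int, severity: Severity) -> int:
    """PRI value at the start of a syslog header: facility * 8 + severity."""
    if not 0 <= facility <= 23:
        raise ValueError("RFC 5424 facilities run from 0 to 23")
    return facility * 8 + int(severity)


if __name__ == "__main__":
    print(is_forwarded(Severity.ERROR, Severity.WARNING))  # True
    print(is_forwarded(Severity.DEBUG, Severity.WARNING))  # False
    print(pri(1, Severity.NOTICE))  # facility 1 (user-level): 13
```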

Management and Edge Node configuration

set logging-server <hostname-or-ip-address[:port]> proto <proto> level <level>

ESXi Configuration

  • esxcli network firewall ruleset set -r syslog -e true
  • esxcli system syslog config set --loghost=<hostname-or-ip-address[:port]>
  • esxcli system syslog reload

KVM Configuration

  • Login as root
  • Create this file

/etc/rsyslog.d/40-vmware-remote-logging.conf

  • Add this line to the file

*.* @<hostname-or-ip-address>:514;RFC5424fmt

  • Restart syslog

systemctl restart rsyslog
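
The line added to the file is a legacy-syntax rsyslog forwarding rule: *.* selects every facility and priority, a single @ forwards over UDP (@@ would use TCP), and the part after ; names the output template. The sketch below builds that line. It assumes the RFC5424fmt template is defined elsewhere in the host's rsyslog configuration, and leaves out TLS, which needs extra rsyslog setup; rsyslog_forward_rule is a hypothetical helper.

```python
# Hypothetical helper that builds a legacy-syntax rsyslog forwarding rule.
_PREFIXES = {"udp": "@", "tcp": "@@"}  # rsyslog: @ = UDP, @@ = TCP


def rsyslog_forward_rule(host: str, port: int = 514, protocol: str = "udp",
                         template: str = "RFC5424fmt") -> str:
    """Return a '*.* @host:port;template' line that forwards every message.

    Assumes `template` (RFC5424fmt by default) is defined elsewhere in
    the rsyslog configuration. Plain hostnames and IPv4 addresses only.
    """
    if protocol not in _PREFIXES:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"*.* {_PREFIXES[protocol]}{host}:{port};{template}"


if __name__ == "__main__":
    print(rsyslog_forward_rule("192.0.2.10"))  # *.* @192.0.2.10:514;RFC5424fmt
```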

Granular tech support bundles were added in NSX-T 2.4.

Monitoring Dashboards

Packet Capture

If you need detailed traffic information, use port mirroring. You can also use the CLI to set up packet capture on:

  • NSX Manager

start capture interface <interface-name> [file <filename>] [count <packet-count>] [expression <expression>]

  • NSX Edges

set capture session <session-number> interface <interface> direction <direction>

  • ESXi
    • Collect packets

pktcap-uw

    • View packets

tcpdump-uw

  • KVM

tcpdump
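
pktcap-uw and tcpdump can both write captures to a file for viewing elsewhere. As a sketch of what is inside such a file, the Python below counts the packets in a capture. It assumes the classic libpcap format (for example, a file written with tcpdump -w); pcapng files are rejected rather than parsed, and count_pcap_packets is a hypothetical helper, not part of any NSX or ESXi tooling.

```python
import struct
from typing import BinaryIO

# Classic libpcap magic numbers: microsecond and nanosecond timestamp variants.
_MAGICS = {0xA1B2C3D4, 0xA1B23C4D}


def count_pcap_packets(stream: BinaryIO) -> int:
    """Count packet records in a classic libpcap stream (pcapng is not handled)."""
    header = stream.read(24)  # global header is always 24 bytes
    if len(header) < 24:
        raise ValueError("file too short for a pcap global header")
    # The magic number tells us which byte order the writer used.
    for order in ("<", ">"):
        if struct.unpack(order + "I", header[:4])[0] in _MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file")

    count = 0
    while True:
        record = stream.read(16)  # per-packet record header
        if not record:
            return count
        if len(record) < 16:
            raise ValueError("truncated record header")
        # Record header fields: ts_sec, ts_frac, incl_len (bytes saved), orig_len.
        _, _, incl_len, _ = struct.unpack(order + "IIII", record)
        if len(stream.read(incl_len)) < incl_len:
            raise ValueError("truncated packet data")
        count += 1
```

To use it, open the capture in binary mode, for example with open(path, "rb"), and pass the file object in.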

Troubleshooting scenarios

NSX Manager

  • If a file is corrupt, check the OVA or QCOW2 install files
  • Passwords need a minimum of 12 characters
  • Check the logs

Installation problems

NSX CLI

  • get services
  • get service
  • get cluster status
  • get configuration
  • get managers

nsxcli

The output shows that ESXi is connected to 46 and KVM to 47, showing that the shards are working correctly.

Logical Switching

Common switching problems

  • N-VDS is incorrectly configured on a host
  • Overlay tunnel (GENEVE) is misconfigured
  • TEPs are unable to reach each other

Validate switch

esxcfg-vswitch -l

Check GENEVE VMkernel

esxcli network ip interface ipv4 get

vmk10 is the TEP for NSX. vmk50 is for intra-tier networking/routing and containers.

Verifying overlay tunnel reachability

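vmkping ++netstack=vxlan -d -s 1572 <remote-TEP-IP>

The host uses the vxlan netstack rather than one named after GENEVE; on ESXi it's the same stack. The -s option sets the ICMP payload size, and the 8-byte ICMP header plus the 20-byte IPv4 header add 28 bytes to that. A payload of 1572 therefore sends a 1600-byte packet, the minimum MTU NSX-T needs on the TEP network to carry a 1500-byte frame plus the GENEVE overhead. The -d option sets the don't-fragment bit, so an undersized MTU makes the ping fail instead of being fragmented. If 1572 fails, try 1472 (a 1500-byte packet). If that works, the MTU hasn't been raised to allow for the overlay overhead.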
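
The payload sizes come from subtracting the IPv4 and ICMP headers from the MTU you want to test. The sketch below does that arithmetic for a few MTUs. It assumes an IPv4 header without options; vmkping_payload and packet_size are hypothetical helpers, and <remote-TEP-IP> in the printed command is a placeholder.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo header


def vmkping_payload(mtu: int) -> int:
    """Largest -s value whose echo request fits in `mtu` bytes unfragmented."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError("MTU smaller than the IPv4 + ICMP headers")
    return payload


def packet_size(payload: int) -> int:
    """Size of the IPv4 packet vmkping sends for a given -s value."""
    return payload + IPV4_HEADER + ICMP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1600, 1700):
        # <remote-TEP-IP> is a placeholder for the far-end TEP address.
        print(f"MTU {mtu}: vmkping ++netstack=vxlan -d -s "
              f"{vmkping_payload(mtu)} <remote-TEP-IP>")
```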

N-VDS Not Initialised on a Host

If a VM cannot communicate on a specific host, check that the segment is present on that host. If it isn't showing on the host, check in the GUI that the N-VDS segment is present. If it is, check the host's virtual switch advanced settings and look for errors, such as the Partial Success status.

If this happens, check that the agents are running on the host.

  • /etc/init.d/nsx-mpa status
  • esxcli network ip connection list | grep 5671
  • /etc/init.d/nsx-proxy status
  • esxcli network ip connection list | grep 1235
  • /etc/init.d/nsx-opsagent status
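
The two grep checks are looking for an established TCP connection on ports 5671 and 1235. The sketch below does the same check over saved output from esxcli network ip connection list. It assumes the usual column order (Proto, Recv Q, Send Q, Local Address, Foreign Address, State, ...), which may differ between ESXi builds; established_agent_ports is a hypothetical helper.

```python
from typing import Dict, Iterable

# The ports grepped for in the agent checks.
AGENT_PORTS = (5671, 1235)


def established_agent_ports(lines: Iterable[str]) -> Dict[int, bool]:
    """Report whether each agent port has an ESTABLISHED TCP connection.

    Assumed column layout: Proto, Recv Q, Send Q, Local Address,
    Foreign Address, State, followed by other columns.
    """
    status = {port: False for port in AGENT_PORTS}
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # header, separator or non-TCP row
        foreign, state = fields[4], fields[5]
        port_text = foreign.rsplit(":", 1)[-1]
        if port_text.isdigit() and int(port_text) in status and state == "ESTABLISHED":
            status[int(port_text)] = True
    return status
```

Feed it the output line by line, for example from a file saved with esxcli network ip connection list > connections.txt (the file name is only an example).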

Routing Problems

  • BGP neighbours are misconfigured, so the neighbour relationship is not established
  • The internal route advertisement on the Tier-1 router is misconfigured
  • Route redistribution on the Tier-0 router is misconfigured

Especially those check boxes!

Check Routing Table

get logical-router

Check the SR for routing. Validate the routing table for the Tier-0 SR VRF, using the VRF ID from the output above:

vrf <vrf-id>

get route

In the output, b = BGP.

For the DR, check the forwarding table for similar information:

get forwarding

BGP neighbour

get bgp neighbor summary

Check that the state is Established. Active means the session is not up yet: the router is still trying to open the TCP connection to the neighbour.
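
BGP has six session states (RFC 4271) and only Established means routes are being exchanged. The sketch below flags every neighbour that isn't Established. It doesn't parse the CLI output, whose exact layout isn't shown here; it assumes you've read each neighbour's state into a dictionary. BgpState and neighbours_not_up are hypothetical names.

```python
from enum import Enum
from typing import Dict, List


class BgpState(Enum):
    """The six states of the BGP finite state machine (RFC 4271)."""
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


def neighbours_not_up(states: Dict[str, str]) -> List[str]:
    """Return neighbours whose session is not Established.

    `states` maps neighbour address -> full state name as read from
    `get bgp neighbor summary` (assumed; matching is case-insensitive).
    """
    lookup = {s.value.lower(): s for s in BgpState}
    down = []
    for neighbour, text in sorted(states.items()):
        state = lookup.get(text.strip().lower())
        if state is None:
            raise ValueError(f"unknown BGP state {text!r} for {neighbour}")
        if state is not BgpState.ESTABLISHED:
            down.append(neighbour)
    return down
```

For example, neighbours_not_up({"192.0.2.1": "Established", "192.0.2.2": "Active"}) returns ["192.0.2.2"].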

BGP route table

The T0 SR can show BGP route information:

get bgp ipv4

Firewall

The most common firewall issues are:

  • Firewall policy rules are configured but not enabled or published
  • Firewall policy rules are not applied to the intended entity
  • The sequence of rules is incorrect, remember it’s top to bottom
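
The first and third issues are easy to see in a toy model of top-to-bottom, first-match evaluation. The sketch below is not NSX's data model: Rule and evaluate are hypothetical, matching is limited to source/destination CIDR and destination port, and the default action (ALLOW here) is an assumption passed as a parameter.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional, Tuple


@dataclass
class Rule:
    """Hypothetical, simplified firewall rule (not the NSX schema)."""
    name: str
    source: str              # CIDR such as "10.0.2.0/24", or "any"
    destination: str         # CIDR, or "any"
    port: Optional[int]      # destination port; None matches any port
    action: str              # e.g. "ALLOW" or "DROP"
    enabled: bool = True


def _in(cidr: str, address: str) -> bool:
    """True if `address` falls inside `cidr` ("any" matches everything)."""
    return cidr == "any" or ip_address(address) in ip_network(cidr)


def evaluate(rules: List[Rule], src: str, dst: str, port: int,
             default_action: str = "ALLOW") -> Tuple[str, str]:
    """Walk the rules top to bottom; return (action, rule name) of the first hit."""
    for rule in rules:
        if not rule.enabled:
            continue  # configured but not enabled: never matches
        if (_in(rule.source, src) and _in(rule.destination, dst)
                and rule.port in (None, port)):
            return rule.action, rule.name
    return default_action, "default"


if __name__ == "__main__":
    allow_https = Rule("allow-https", "any", "10.0.1.0/24", 443, "ALLOW")
    block_app = Rule("block-app", "10.0.2.0/24", "10.0.1.0/24", None, "DROP")
    # The broad allow sits above the specific drop, so the drop never hits.
    print(evaluate([allow_https, block_app], "10.0.2.5", "10.0.1.10", 443))
    # Moving the drop above the allow changes the result.
    print(evaluate([block_app, allow_https], "10.0.2.5", "10.0.1.10", 443))
```

The first call returns ('ALLOW', 'allow-https') and the second ('DROP', 'block-app'): same rules, different order, different outcome.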

get firewall status

Summary

KVM

ovs-appctl is used for firewall configuration. Validate with:

ovs-appctl -t /var/run/openvswitch/nsxa-ctl dfw/vif

Get the VIF, then type:

ovs-appctl -t /var/run/openvswitch/nsxa-ctl dfw/rules

Rules are defined with addrsets (address sets). These have GUIDs on them as well.

ESXi

Use vsipioctl and summarize-dvfilter.

summarize-dvfilter | grep <vm-name>

Look for the filter name, then use:

vsipioctl getrules -f <filter-name>

The example adds the -A16 option, which tells grep to print the 16 lines that follow each matching line. These are the outputs without -A16 and with it.
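
To make the -A behaviour concrete, the sketch below mimics grep -A<N> on a list of lines: each match plus the N lines after it, with overlapping context printed once. It is a simplification: it does fixed-string matching (like grep -F) rather than regular expressions, and it omits the "--" separators grep prints between non-adjacent groups. grep_after is a hypothetical helper.

```python
from typing import List


def grep_after(lines: List[str], pattern: str, after: int) -> List[str]:
    """Mimic `grep -A<after> <pattern>`: each match plus `after` following lines.

    Fixed-string matching only (like grep -F); overlapping context is kept
    once and the '--' group separators are not emitted.
    """
    keep = set()
    for i, line in enumerate(lines):
        if pattern in line:
            keep.update(range(i, min(i + after + 1, len(lines))))
    return [lines[i] for i in sorted(keep)]
```

With lines ["world 1 vm-a", "  port 1", "  filter 1"], grep_after(lines, "vm-a", 1) keeps the match and the port line; with after=0 it keeps only the match, which is why dropping -A16 shrinks the output to the matching lines.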

You can also use the filter's addrsets instead of its name: run the same commands again, but with -f followed by the addrset number.

On the Edges, you can see what is in the rule sets with:

get firewall <interface_id> ruleset rules

You get the interface_id by running get firewall interfaces.

nsxdp-cli

You can get deeper analysis with nsxdp-cli.

Again, this command shows only one line after each match, because -A1 is used in the egrep.

Edge Validation

nsxcli

  • get configuration
  • get node-uuid
  • get interfaces
  • get managers
  • get host-switches
  • get tunnel-ports
  • get vteps
