Nexus 1000v ESXi Uplink Port Greyed Out

If you look in vCenter at the uplinks for a Nexus 1000v distributed switch and see that they are greyed out, or you see the error message "port blocked by admin", this can be a sign that the host is not communicating correctly with the Nexus switch.
Greyed Out Ports

 

Ports blocked by admin.  (Names blanked out to protect the innocent.)
Port Blocked By Admin

There are two parts to this fix: the first applies if you can't get virtual machines online, and the second if you can.  Scroll down to resolution part two to read more.

Resolution part one

Each ESXi host runs a VEM (Virtual Ethernet Module), and the VEM talks to the VSM (Virtual Supervisor Module) over the control and packet VLAN networks to discover the configuration of the ports and which port groups, VLANs and so on are assigned to the uplinks of the host.
It is worth checking that the correct VEM is installed, as the Nexus can communicate only with a VEM that is on the same or an earlier version than the VSM.  Don't install a later version, even if it contains patches and hotfixes: the Nexus won't see it, and although the host will appear to be connected to the switch, albeit with greyed out uplinks, its virtual machines won't be able to communicate with any VMs external to that host.
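If the Cisco VEM bundle is installed it normally provides a vem command in the ESXi shell, which gives a quick way to confirm the module version and state on the host.  A minimal check, assuming the command is present on your build:

# vem version

# vem status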

You may find when logging into the Nexus that the host is missing and that the error logs contain the following message:

%VEM_MGR-2-VEM_MGR_NOT_BC: Module cannot be inserted because it is not backward compatible

Now the error states backward compatible, but in fact the problem is that the VEM is not forward compatible with the VSM.  Great, huh?!

 

If you have an environment with existing hosts connected to the Nexus switch, you can run the following command on the Nexus switch to see the VEM version currently installed on the working hosts and compare it with the version installed on the host with connectivity issues.

# show module

The output will be similar to the following:

Mod  Ports  Module-Type              Model  Status
---  -----  -----------------------  -----  ------
1    248    Virtual Ethernet Module  NA     ok

Mod  Sw              Hw
---  --------------  --------------------------------------------
1    4.2(1)SV1(5.2)  VMware ESXi 5.0.0 Releasebuild-721882 (3.0)

The information in the Sw column shows the VEM software version installed on the module.

I suggest you then go to one of the working hosts and type the following.

# esxcli software vib list

This will list all of the installed VIB modules.  Look for the one made by Cisco with the same revision number, note down the particular build number, then compare it with the one on the host that has the greyed out uplinks and check they are the same.  If they are not, you can download the correct version either from the web interface of the Nexus switch (its management IP address, if the HTTP service is configured/enabled) or from the VMware downloads site or the Cisco website.
Please note that both of these sites require an active support subscription before you can download it.
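To quickly pick out the Cisco VEM package from the vib list output on each host, you can filter it in the ESXi shell.  The VIB name usually contains "vem", but check the full list if nothing matches:

# esxcli software vib list | grep -i vem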
Once downloaded, remove the host from the Nexus dvSwitch in vCenter, SSH to the host, then remove the existing Cisco module and install the new one using these commands.

# esxcli software vib remove -n name:version

# esxcli software vib install -d /PathToVIBModule.zip

Finally list out the installed VIB modules to make sure you now have the correct one installed.

# esxcli software vib list

Assuming all the above is done, you should now find that the error messages on the Nexus have gone away and that the host is no longer missing on the switch, leaving you free to connect the host back to the switch and test VM network connectivity.

Resolution part two

Assuming that you are now able to get online with VMs attached to the switch port groups, you may still find that the icon is greyed out.  The fix for this is a simple one.

Open vCenter > Inventory > Networking, select the uplink that is greyed out and select the Ports tab.  Now select the host and click Start Monitoring Port State.  If monitoring is already running, click Stop Monitoring Port State and then Start Monitoring Port State again.  Voila.  Hopefully your problem has now gone away.
Start Monitoring Port State

vMotion CPU Compatibility

vMotion has quite a few requirements that need to be in place before it will work correctly. Here are the key ones.

  • Each host must be correctly licensed
  • Each host must meet shared storage requirements
  • Each host must meet the networking requirements
  • The CPUs in each host must be compatible and from the same family

 

When configuring vMotion between hosts I would recommend keeping to one brand of server per cluster, e.g. Dell, HP or IBM. Also, always ensure that these servers are compatible with each other; you can confirm this by speaking to the server manufacturer.
It is also very important to ensure you are using the latest BIOS version on each of your hosts.

Ensuring that the CPUs are compatible with each other is essential for vMotion to work successfully, because the host that the virtual machine migrates to has to be capable of carrying on any instructions that the first host was running.
If a virtual machine is running an application on one host and you migrate it to another host without these capabilities, the application would most likely crash, and possibly the whole server would crash. This is why vMotion compatibility is required between hosts before you can migrate a running virtual machine.

It is user-level instructions that bypass the virtualisation layer, such as Streaming SIMD Extensions (SSE), SSE2, SSSE3, SSE4.1 and the Advanced Encryption Standard (AES) instruction set, that can differ greatly between CPU models and processor families, and so can cause application instability after the migration.
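A quick way to compare processors across hosts is to check the CPU details from the ESXi shell on each one.  A minimal sketch; the exact fields reported vary by build, so treat this as a starting point rather than a definitive compatibility check:

# esxcli hardware cpu list

Compare the family, model and stepping values reported on each host in the cluster; any differences are worth investigating before relying on vMotion.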

Always ensure that all hardware is on the VMware compatibility guide.
To confirm compatibility between CPU models in the same family, check the charts below.

This is a chart from Dell showing which Intel CPUs support vMotion.



This second chart, also from Dell, illustrates which AMD processors support vMotion.

Further information on vMotion requirements between hosts can be found in the vSphere Datacenter Administration Guide.

VMware NIC Trunking Design

Having read various books, articles, white papers and best practice guides, I have found it difficult to find consistently good advice on vNetwork and physical switch teaming design, so I thought I would write my own based on what I have tested and configured myself.

To begin with I must say I am no networking expert and may not cover some of the advanced features of switches, but I will provide links for further reference where appropriate.

 

The basics

Each physical ESX(i) host has at least one physical NIC (pNIC) which is called an uplink.

Each uplink is known to the ESX(i) host as a vmnic.

Each vmnic is connected to a virtual switch (vSwitch).

Each virtual machine on the ESX(i) host has at least one virtual NIC (vNIC) which is connected to the vSwitch.

The virtual machine is aware only of the vNIC; only the vSwitch is aware of the uplink-to-vNIC relationship.

This setup offers a one-to-one relationship between the virtual machine (VM) connected to the vNIC and the pNIC connected to the physical switch port, as illustrated below.

When another virtual machine is added, a second vNIC is added; this in turn is connected to the vSwitch, and the two VMs share the same pNIC and the physical port it is connected to on the physical switch (pSwitch).

When we add more physical NICs we then have additional options with network teaming.
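If you are adding an extra pNIC to a standard vSwitch from the command line rather than through the vSphere Client, a minimal sketch looks like this (vSwitch0 and vmnic1 are example names; substitute your own):

# esxcli network vswitch standard uplink add --uplink-name=vmnic1 --vswitch-name=vSwitch0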

 

NIC Teaming

NIC teaming offers us the option of connection-based load balancing, which balances by the number of connections rather than by the amount of traffic flowing over the network.

This load balancing can provide resilience on our connections by monitoring the links: if a link goes down, whether it's the physical NIC or the physical port on the switch, the traffic is resent over the remaining uplinks so that no traffic is lost.  It is also possible to use multiple physical switches provided they are all in the same broadcast domain.  What it will not do is allow traffic to be sent over multiple uplinks at once, unless you configure the physical switches correctly.
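To review the teaming and failover policy currently applied to a standard vSwitch you can query it from the ESXi shell.  A sketch, assuming ESXi 5.x and a vSwitch named vSwitch0:

# esxcli network vswitch standard policy failover get --vswitch-name=vSwitch0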

There are four options with NIC teaming, although the fourth is not really a teaming option; a sketch of the matching esxcli settings follows the list.

  1. Port-based NIC teaming
  2. MAC address-based NIC teaming
  3. IP hash-based NIC teaming
  4. Explicit failover
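On a standard vSwitch these policies correspond to the --load-balancing values accepted by esxcli.  A minimal sketch, assuming ESXi 5.x and a vSwitch named vSwitch0 (substitute your own switch name):

# esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --load-balancing=portid

Replace portid with mac for MAC address-based teaming, iphash for IP hash-based teaming, or explicit for explicit failover.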

Port-based NIC teaming

Route based on the originating virtual port ID, or port-based NIC teaming as it is commonly known, does what it says and routes the network traffic based on the virtual port on the vSwitch that it came from.   This type of teaming doesn't allow traffic to be spread across multiple uplinks: it keeps a one-to-one relationship between the virtual machine and the uplink port when sending and receiving to all network devices.  This can lead to a problem where the number of physical ports exceeds the number of virtual ports, as you would then end up with uplinks that don't do anything.  As such, the only time I would recommend this type of teaming is when the number of virtual NICs exceeds the number of physical uplinks.

MAC address-based NIC teaming

Route based on MAC hash, or MAC address-based NIC teaming, chooses the uplink based on the originating vNIC's MAC address.  This works in a similar way to port-based NIC teaming in that each virtual machine sends its network traffic over only one uplink.  Again, the only time I would recommend this type of teaming is when the number of virtual NICs exceeds the number of physical uplinks.

IP hash-based NIC teaming

Route based on IP hash, or IP hash-based NIC teaming, works differently from the other types of teaming.  It takes the source and destination IP addresses and creates a hash.  A virtual machine can work across multiple uplinks, spreading its traffic over them when sending data to multiple network destinations.

Although IP hash-based teaming can utilise multiple uplinks, it will only use one uplink per session.  This means that if you are sending a lot of data between one virtual machine and another server, that traffic will travel over only one uplink.  With IP hash-based teaming we can then use the teaming or trunking options on the physical switches (depending on the switch type).  IP hash requires EtherChannel (again depending on switch type), which should be disabled for all the other teaming policies.

Explicit failover

This allows you to override the default failover order of the uplinks.  The only time I can see this being useful is if the uplinks are connected to multiple physical switches and you want to use them in a particular order, or if you think a pNIC in the ESX(i) host is not working correctly.  If you use this setting it is best to configure those vmnics or adapters as standby adapters, as active adapters are used from the highest in the order downwards.
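If you do use explicit failover, the active and standby order can also be set from the ESXi shell.  A sketch; the vmnic and vSwitch names are examples only:

# esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --load-balancing=explicit --active-uplinks=vmnic0 --standby-uplinks=vmnic1,vmnic2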

 

 

The other options

Network failover detection

There are two options for failover detection: link status only and beacon probing.  Link status only monitors the status of the link to ensure that a connection is available at both ends of the network cable; if it becomes disconnected the uplink is marked as unusable and the traffic is sent over the remaining NICs.  Beacon probing sends a beacon out on all uplinks in the team, which also checks that the port on the pSwitch is available and is not being blocked by configuration or switch issues.  Further information is available on page 44 of the ESXi configuration guide.  Do not set beacon probing if using route based on IP hash.
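The detection method can be switched between the two from the ESXi shell as well.  A sketch; vSwitch0 is an example name:

# esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --failure-detection=link

# esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --failure-detection=beacon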

 

Notify switches

This should be left set to yes (the default) to minimise the time the pSwitches take to update their lookup tables after a failover.  Do not use this when configuring Microsoft NLB in unicast mode.

Failback

Failback will re-enable the failed uplink once it is working correctly and move back the traffic that was being sent over the standby uplink.  Best practice is to leave this set to yes unless you are using IP-based storage, because if the link were to go up and down quickly it could have a negative impact on iSCSI traffic performance.
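Both of these settings live on the same failover policy and can be adjusted together from the ESXi shell, for example keeping notify switches enabled while disabling failback on a vSwitch that carries iSCSI traffic.  A sketch; vSwitch0 is an example name:

# esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --notify-switches=true --failback=false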

Incoming traffic is controlled by the pSwitch routing the traffic to the ESX(i) host, so the ESX(i) host has no control over which physical NIC incoming traffic arrives on. As multiple NICs will be accepting traffic, the pSwitch will use whichever one it chooses.

Load balancing on incoming traffic can be achieved by using and configuring a suitable pSwitch.

pSwitch configuration

The topics covered so far describe egress NIC teaming; with physical switches we have the added benefit of ingress NIC teaming.

Various vendors support teaming on their physical switches; however, quite a few call teaming trunking and vice versa.

From the switches I have configured I would recommend the following.

All Switches

A lot of people recommend disabling Spanning Tree Protocol (STP), as vSwitches don't require it because they know the MAC address of every vNIC connected to them.  I have found that the best practice is to leave STP enabled and set the ESX(i)-facing ports to Portfast.  Without Portfast enabled there can be a delay, typically 30-50 seconds, while the pSwitch relearns MAC addresses during convergence.  Without STP enabled there is a chance of loops not being detected on the pSwitch.

802.3ad & LACP

Link Aggregation Control Protocol (LACP) is a dynamic protocol for building a link aggregation group (LAG); it can dynamically make other switches aware of the multiple links and combine them into one single logical unit.  It also monitors those links, and if a failure is detected it removes that link from the logical unit.

VMware doesn't support LACP.  However, VMware does support IEEE 802.3ad, which can be achieved by configuring a static LACP trunk group or a static trunk.  The disadvantage of this is that if one of those links goes down, a static 802.3ad configuration will continue to send traffic down that link.

 

Dell switches

Set Portfast using

spanning-tree portfast
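In context on a Dell PowerConnect-style CLI this is applied per interface.  A sketch only; the interface name 1/g1 is an example, so use the ports your ESX(i) uplinks actually connect to:

console(config)# interface ethernet 1/g1
console(config-if)# spanning-tree portfast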

To configure link aggregation, follow my Dell switch aggregation guide.

Further information on Dell switches is available through the product manuals.

Cisco switches

Set Portfast using

spanning-tree portfast (for an access port)

spanning-tree portfast trunk (for a trunk port)

Set a static EtherChannel on the uplink ports
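IP hash-based teaming needs a static EtherChannel on the Cisco side (mode on, with no LACP/PAgP negotiation).  A sketch for a two-port channel; the interface range and channel-group number are examples only:

interface range GigabitEthernet1/0/1 - 2
 channel-group 1 mode on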

Further information is available through the Sample configuration of EtherChannel / Link aggregation with ESX and Cisco/HP switches

HP switches

Set Portfast using

spanning-tree portfast (for an access port)

spanning-tree portfast trunk (for a trunk port)

Set static LACP trunk using

trunk <port-list> <trk1 ... trk60> <trunk | lacp>
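As a concrete example of the syntax above, a static two-port trunk suitable for IP hash-based teaming might look like this; the port numbers and trunk group are examples only:

trunk 21-22 trk1 trunk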

Further information is available through the Sample configuration of EtherChannel / Link aggregation with ESX and Cisco/HP switches