VMware LogoVMware software provides a completely virtualized set of hardware to the guest operating system. VMware software virtualizes the hardware for a video adapter, a network adapter, and hard disk adapters. The host provides pass-through drivers for guest USB, serial, and parallel devices. In this way, VMware virtual machines become highly portable between computers, because every host looks nearly identical to the guest. In practice, a system administrator can pause operations on a virtual machine guest, move or copy that guest to another physical computer, and there resume execution exactly at the point of suspension. Alternatively, for enterprise servers, a feature called vMotion allows the migration of operational guest virtual machines between similar but separate hardware hosts sharing the same storage (or, with vMotion Storage, separate storage can be used, too). Each of these transitions is completely transparent to any users on the virtual machine at the time it is being migrated.

Main website: VMware

ESX Server NIC Teaming and VLAN Trunking

There’s a bit of confusion regarding NIC teaming in ESX Server and when switch support is required. You can most certainly create NIC teams (or bonds) in ESX Server without any switch support whatsoever. Once those NIC teams have been created, you can configure load balancing and failover policies. However, those policies will affect outbound traffic only. In order to control inbound traffic, we have to get the physical switches involved. This article is written from the perspective of using Cisco Catalyst IOS-based physical switches. (In my testing I used a Catalyst 3560.)

To create a NIC team that will work for both inbound and outbound traffic, we’ll create a port channel using the following commands.

Switch(config)#int port-channel1
Switch(config-if)#description NIC team for ESX server
Switch(config-if)#int gi0/23
Switch(config-if)#channel-group 1 mode on
Switch(config-if)#int gi0/24
Switch(config-if)#channel-group 1 mode on

This will report the current load balancing algorithm in use by the switch. On my Catalyst 3560 running IOS 12.2(25), the default load balancing algorithm was set to “Source MAC Address”. On my ESX Server 3.0.1 server, the default load balancing mechanism was set to “Route based on the originating virtual port ID”. The result? The NIC team didn’t work at all I couldn’t ping any of the VMs on the host, and the VMs couldn’t reach the rest of the physical network. It wasn’t until I matched up the switch/server load balancing algorithms that things started working.

To set the switch load-balancing algorithm, use one of the following commands in global configuration mode.

port-channel load-balance src-dst-ip (to enable IP-based load balancing)
port-channel load-balance src-mac (to enable MAC-based load balancing)

There are other options available, but these are the two that seem to match most closely to the ESX Server options. I was unable to make this work at all without switching the configuration to “src-dst-ip” on the switch side and “Route based on ip hash” on the ESX Server side. From what I’ve been able to gather, the “src-dst-ip” option gives you better utilization across the members of the NIC team than some of the other options. (Anyone care to contribute a URL that provides some definitive information on that statement?)

Creating the NIC team on the ESX Server side is as simple as adding physical NICs to the vSwitch and setting the load balancing policy appropriately. At this point, the NIC team should be working.

Configuring VLAN Trunking

In my testing, I set up the NIC team and the VLAN trunk at the same time. When I ran into connectivity issues as a result of the mismatched load balancing policies, I thought they were VLAN-related issues, so I spent a fair amount of time troubleshooting the VLAN side of things. It turns out, of course, that it wasn’t the VLAN configuration at all. (In addition, one of the VMs that I was testing had some issues as well, and that contributed to my initial difficulties.)

To configure the VLAN trunking, use the following commands on the physical switch.

s3(config)#int port-channel1
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094

This configures the NIC team (port-channel1, as created earlier) as a 802.1q VLAN trunk. You then need to repeat this process for the member ports in the NIC team: (With other IOS version this could be done automatically)

s3(config)#int gi0/23
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094
s3(config-if)#int gi0/24
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094

If you haven’t already created VLAN 4094, you’ll need to do that as well.

s3(config)#int vlan 4094
s3(config-if)#no ip address

The “switchport trunk native vlan 4094″ command is what fixes the problem I had last time I worked with ESX Server and VLAN trunks; namely, that most switches don’t tag traffic from the native VLAN across a VLAN trunk. By setting the native VLAN for the trunk to something other than VLAN 1 (the default native VLAN), we essentially force the switch to tag all traffic across the trunk. This allows ESX Server to handle VMs that are assigned to the native VLAN as well as other VLANs.

On the ESX Server side, we just need to edit the vSwitch and create a new port group. In the port group, specify the VLAN ID that matches the VLAN ID from the physical switch. After the new port group has been assigned, you can place your VMs on that new port group (VLAN) and assuming you have a router somewhere to route between the VLANs you should have full connectivity to your newly segregated virtual machines.

Final Notes

I did encounter a couple of weird things during the setup of this configuration (I plan to leave the configuration in place for a while to uncover any other problems).

  • First, during troubleshooting, I deleted a port group on one vSwitch and then re-created it on another vSwitch. However, the virtual machine didn’t recognize the connection. There was no indication inside the VM that the connection wasn’t live; it just didn’t work. It wasn’t until I edited the VM, set the virtual NIC to a different port group, and then set it back again that it started working as expected. Lesson learned: don’t delete port groups.
  • Second, after creating a port group on a vSwitch with no VLAN ID, one of the other port groups on the same vSwitch appeared to “lose” its VLAN ID, at least as far as VirtualCenter was concerned. In other words, the VLAN ID was listed as “*” in VirtualCenter, even though a VLAN ID was indeed configured for that port group. The “esxcfg-vswitch -l” command (that’s a lowercase L) on the host still showed the assigned VLAN ID for that port group, however.
  • It was also the “esxcfg-vswitch” command that helped me troubleshoot the problem with the deleted/recreated port group described above. Even after recreating the port group, esxcfg-vswitch still showed 0 used ports for that port group on that vswitch, which told me that the virtual machine’s network connection was still somehow askew.

Hopefully this information will prove useful to those of you out there trying to set up NIC teaming and/or VLAN trunking in your environment. I would recommend taking this one step at a time, this will make it easier to troubleshoot problems as you progress through the configuration.