Wednesday, September 24, 2008

VMware ESX VM Network Aggregation / EtherChannel / LACP

I recently went through a series of tests in our VMware ESX 3.5 environment to evaluate link aggregation and failover. We tested a variety of link aggregation methods, including LACP, PAgP, and standard EtherChannel, while experimenting with different settings on the vSwitch.

Here's what the lab consisted of: a Dell R900 with eight physical NICs (three used for the test), a Cisco 4507 switch with two Gigabit line-card blades, two laptops, and three virtual machines. This would also work on two Cisco 3750s connected with a StackWise cable. We set up a virtual switch on the ESX server, dedicated to virtual machine networking, with three pNICs. The switch ports are set to trunk mode so we can carry multiple VLANs to our VMs. We started a continuous ping from each virtual machine to one of the laptops with a command like "ping -t -w 500", and set up the same from the two laptops back to the virtual machines. This way we could see how many packets we lost, on each settings change, when unplugging a cable from the switch or taking a blade offline.
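If you prefer the service console to the VI Client, the vSwitch side of this lab can be built with esxcfg-vswitch. The vSwitch name, port group name, VLAN number, and vmnic numbers below are just examples, so adjust them to your own hardware:

```
# Create a VM-only vSwitch and attach three physical NICs
esxcfg-vswitch -a vSwitch1
esxcfg-vswitch -L vmnic1 vSwitch1
esxcfg-vswitch -L vmnic2 vSwitch1
esxcfg-vswitch -L vmnic3 vSwitch1

# Add a VM port group and tag it with a VLAN (the physical ports are trunked)
esxcfg-vswitch -A "VM Network 10" vSwitch1
esxcfg-vswitch -v 10 -p "VM Network 10" vSwitch1
```

Note that the NIC teaming policy itself (IP hash load balancing, failback, etc.) is still set through the VI Client as described below.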

After quite a bit of testing we found standard EtherChannel to work the best. With a standard EtherChannel setup we would lose between 1 and 3 packets (at the faster retry time) if a network cable or switch blade was taken offline. In my opinion this is acceptable behavior, although I would like to see an LACP aggregation running; as far as I can tell the ESX 3.5 vSwitch does not negotiate LACP or PAgP, which is why the static "mode on" EtherChannel below is the only aggregation the physical switch can form with it.

Here's what our final configuration looked like:

VMware ESX vSwitch Configuration: (under vSwitch Properties -> NIC Teaming)
Load Balancing: Route based on IP Hash
Network Failover Detection: Link Status Only
(this could be beacon probing depending on your network)

Notify Switches: Yes
Failback: Yes

Switch Config (Cisco 4507 or 3750)
# Set switch load balancing to IP hash
port-channel load-balance src-dst-ip

# Add port 1/2 to the group
interface GigabitEthernet1/2
switchport mode trunk
channel-group 1 mode on

# Add port 2/1 to the group
interface GigabitEthernet2/1
switchport mode trunk
channel-group 1 mode on

# Add port 2/2 to the group
interface GigabitEthernet2/2
switchport mode trunk
channel-group 1 mode on

# Set up the port-channel interface
interface Port-channel1
switchport mode trunk
spanning-tree portfast trunk

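Once the ports are in the channel group, the bundle can be verified from the switch. These are standard IOS show commands, though the exact output varies by platform and IOS version:

```
show etherchannel summary          ! member ports should carry the (P) bundled flag
show etherchannel 1 port-channel   ! details for channel group 1
show interfaces port-channel 1     ! the aggregate as a single interface
```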

  1. I've read that link aggregation at the switch is not necessary, since ESX will internally alternate NICs for load balancing. Is this true?

  2. Kind of. It will work, but in my tests we lost approximately four times as many ping packets without the switch-level aggregation in place. This is because you are relying on the server to fail the links over inside the virtual switch, rather than the physical switch simply re-routing the traffic.

    My tests included two VMs pinging out with the "-t -w 500" options and two physical systems pinging in to those VMs with the same options.

    I then went through a series of unplugging NICs from the switch, up to two at a time, and plugging them back in, against my vSwitches that were aggregated and contained three pNICs.
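For anyone repeating this test, a small script can tally the losses instead of eyeballing several ping windows. This is just an illustrative sketch; the function name and the Windows-style ping phrases it matches are my own assumptions, not part of any VMware or Cisco tooling:

```python
def count_lost_replies(ping_output: str) -> int:
    """Count lost pings in captured Windows-style "ping -t" output.

    Treats "Request timed out." and "Destination host unreachable."
    lines as losses; all other lines are ignored.
    """
    lost = 0
    for line in ping_output.splitlines():
        line = line.strip().lower()
        if line.startswith("request timed out") or "destination host unreachable" in line:
            lost += 1
    return lost
```

Save the ping output from each laptop and VM, run it through this after each cable pull, and compare the counts between the aggregated and non-aggregated runs.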