Hi Guys,
I have a small problem regarding VMware and the load balancing of physical NICs on my ESX servers. I have attached a diagram of how I have set up the networking for my ESX hosts and SAN (VMWare Physical Setup.pdf). I have 2 switches and have followed guidelines from VMware white papers and forum users for the configuration, but I seem to have a problem with the load balancing of physical NICs on my ESX hosts. All network cards are gigabit. I have 1 dual-port card used solely for iSCSI traffic on a separate port-based VLAN, to keep that traffic off the default VLAN our computers reside on. I also have another dual-port card used solely for VMotion traffic, again on its own VLAN. I then have a quad-port card which is used for all virtual machine traffic and is plugged into the default VLAN that all of our computers sit on. I have attached a diagram (networking.png) which shows how I have the networking configured on each ESX host.
The problem I have is with the load balancing of physical network cards on the server. I first noticed it when I was on an ESX host and realised that each host could only see 2 targets on my SAN (not 4, as per the 4 NICs in the SAN). I remember that in my test environment I could see all 4 targets, but there I was only using one switch with everything plugged into it. So I logged into the service console and tried pinging the IPs of all 4 NICs on the SAN. I only got replies from NIC 1 and NIC 3. At first I thought it was a problem with the second switch (tracing the path back from the 2 iSCSI NICs on the ESX host, one routes through one switch and one through the other), but after pulling the network lead that joined the ESX host to the first switch, I discovered that I could now ping NIC 3 and NIC 4 on the SAN! So it seems to me that the 2 NICs used for iSCSI traffic on the ESX host are actually configured for failover and not load balancing.
I had a quick look in the performance monitor for the NICs and, sure enough, of the 2 NICs assigned to iSCSI traffic only one seems to be in use (no load balancing). Similarly, and more worryingly, I have 4 NICs assigned to virtual machine traffic, and according to the performance chart (attached as performance.png) only one of the four is being used for ALL virtual machine traffic.
I can't believe this is the case, because I have read on the VMware website that NIC teaming provides load balancing as well as failover!
The link is here (NIC Teaming heading):
Creating a Virtual Networks (VLAN) in a Virtual Infrastructure - VMware
I have had a look at the virtual switch configuration (attached as Switch Config.png) that connects the physical NICs to virtual services like the service console and VMkernel, and load balancing is set to 'Route based on the originating virtual port ID'. I assumed that, as the default setting, this would be fine for load balancing. However, NIC load balancing just doesn't seem to be working (but NIC failover does).
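Here is a minimal Python sketch of why 'Route based on the originating virtual port ID' can look like plain failover for iSCSI: each virtual port is pinned to one uplink, whatever it is talking to. The mapping `port_id % number_of_uplinks`, the NIC names and the port IDs are assumptions for illustration, not necessarily how ESX assigns ports.

```python
# Toy model of "Route based on the originating virtual port ID".
# ASSUMPTION: a virtual port is pinned to uplinks[port_id % len(uplinks)];
# ESX's real assignment may differ, but the property shown holds:
# one virtual port -> one uplink, whatever the destination.
from collections import Counter


def uplink_for_port(port_id: int, uplinks: list[str]) -> str:
    """Return the single uplink this virtual port is pinned to."""
    return uplinks[port_id % len(uplinks)]


def spread(flows: list[tuple[int, str]], uplinks: list[str]) -> Counter:
    """Count flows per uplink; flows are (virtual_port_id, destination_ip) pairs.

    The destination is ignored by this policy, which is the whole point.
    """
    return Counter(uplink_for_port(port, uplinks) for port, _dst in flows)


if __name__ == "__main__":
    iscsi_uplinks = ["vmnic2", "vmnic3"]  # hypothetical NIC names
    # One VMkernel iSCSI port (hypothetical port ID 7) talking to 4 SAN targets.
    iscsi_flows = [(7, f"192.168.2.{n}") for n in (21, 22, 23, 24)]
    print(spread(iscsi_flows, iscsi_uplinks))  # every flow lands on one uplink

    vm_uplinks = ["vmnic4", "vmnic5", "vmnic6", "vmnic7"]  # hypothetical
    # Eight VMs (hypothetical port IDs 0-7), each talking to the same server.
    vm_flows = [(port, "192.168.1.50") for port in range(8)]
    print(spread(vm_flows, vm_uplinks))  # two VMs per uplink
```

With many VMs (many virtual ports) this policy does spread load across the team, but each individual VM, and each VMkernel port such as the one used for iSCSI, is still limited to a single physical NIC.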
As a side note, I have read that load balancing in ESX only applies to outbound traffic. To set up inbound load balancing, I need to enable VLAN trunking. I've googled this, but it doesn't strike me as the solution for my problem.
Has anyone had a similar experience, or is there some simple configuration setting that I'm missing here? Because of the load balancing problem I believe we are experiencing performance problems, especially at peak times when the VMs are under load. I'm sure that load balancing the NICs would greatly improve performance, especially between the ESX hosts running the virtual machines and the SAN, where iSCSI connectivity is currently limited to a single gigabit link.
Thanks guys,
James
Hi JamesC,
Could you tell us which switch brand you are using, and could you post the config of those switch ports?
bio..
Hi bio. I'm using Dell PowerConnect 5224 switches. I've been looking into this, and it seems the problem is that I am using 'Route based on the originating virtual port ID' as the load balancing option on the virtual switch. The way I understand it, I have to use 'Route based on IP hash' instead. Apparently this can cause problems on the switches unless they support 802.3ad link aggregation, because packets from the same source would arrive on different switch ports depending on their destination. The switches I have appear to support 802.3ad.
Here's the config for the switch:
Code:
no spanning-tree
vlan database
 vlan 2-3
exit
interface range ethernet g(1-5)
 switchport access vlan 2
exit
interface range ethernet g(6-8)
 switchport access vlan 3
exit
interface vlan 2
 name iSCSI
exit
interface vlan 3
 name VMotion
exit
voice vlan oui-table add 0001e3 Siemens_AG_phone________
voice vlan oui-table add 00036b Cisco_phone_____________
voice vlan oui-table add 00096e Avaya___________________
voice vlan oui-table add 000fe2 H3C_Aolynk______________
voice vlan oui-table add 0060b9 Philips_and_NEC_AG_phone
voice vlan oui-table add 00d01e Pingtel_phone___________
voice vlan oui-table add 00e075 Polycom/Veritel_phone___
voice vlan oui-table add 00e0bb 3Com_phone______________
iscsi target port 860 address 0.0.0.0
iscsi target port 3260 address 0.0.0.0
interface vlan 1
 ip address 192.1.1.24 255.255.255.0
exit
username admin password xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx level 15 encrypted
snmp-server community Dell_Network_Manager rw view DefaultSuper
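To see why 'Route based on IP hash' can spread a single VMkernel port over both iSCSI NICs, here is a minimal Python sketch. It assumes the uplink index is the XOR of the source and destination IPv4 addresses modulo the number of uplinks (the exact hash ESX uses may differ), and the addresses and NIC names are hypothetical.

```python
# Toy model of "Route based on IP hash".
# ASSUMPTION: uplink index = (src XOR dst) % number of uplinks, using the full
# 32-bit IPv4 addresses. ESX's exact hash may differ; the point is that the
# choice depends on the source/destination *pair*, not on the virtual port.
import ipaddress
from collections import Counter


def ip_hash_uplink(src: str, dst: str, uplinks: list[str]) -> str:
    """Pick an uplink from a hash of the source and destination addresses."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return uplinks[(s ^ d) % len(uplinks)]


if __name__ == "__main__":
    uplinks = ["vmnic2", "vmnic3"]  # hypothetical iSCSI NICs
    vmk_ip = "192.168.2.10"         # hypothetical VMkernel iSCSI address
    targets = [f"192.168.2.{n}" for n in (21, 22, 23, 24)]  # hypothetical SAN IPs
    for target in targets:
        print(target, "->", ip_hash_uplink(vmk_ip, target, uplinks))
    # Different targets hash to different uplinks, so both NICs carry traffic.
    print(Counter(ip_hash_uplink(vmk_ip, t, uplinks) for t in targets))
```

Because each source/destination pair always hashes to the same uplink, a single pair (one VM talking to one file server, or one VMkernel port talking to one SAN target) still tops out at one gigabit; the benefit only appears with several pairs. The physical switch ports also have to be grouped into one static link aggregation group so the switch treats them as a single logical link.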
Hello JamesC,
You are on the right track indeed. When I read your thread I remembered the same problem that we had in our school network.
Your ESX settings should be:
Load Balancing: Route based on IP hash
Network Failover : Link Status Only
Notify switches : Yes
Failback : Yes
We use 3Com switches, but I think we can get it to work on your Dell switches as well.
First, a small piece from our switch:
Code:
interface GigabitEthernet2/0/28
 stp edged-port enable
 duplex full
 speed 1000
 port link-type trunk
 port trunk permit vlan all
 flow-control
 broadcast-suppression pps 3000
 undo enable snmp trap updown
 port link-aggregation group 3
#
interface GigabitEthernet2/0/29
 stp edged-port enable
 duplex full
 speed 1000
 port link-type trunk
 port trunk permit vlan all
 flow-control
 broadcast-suppression pps 3000
 undo enable snmp trap updown
 port link-aggregation group 3
#
When I look at your Dell config, it seems that you are NOT grouping ports 9+10, 11+12 and 13+14.
* You must create a link-aggregation group (3Com term) or channel-group (Dell term) for these ports, or you will never get load balancing.
* As you stated, your Dell switches support 802.3ad link aggregation, so no worries there.
* Do not use LACP; configure it manually.
* Also force your switch ports to 1000 Mbps full duplex (no autonegotiation).
* You must enable jumbo frames on your switches.
* You must enable STP edge ports on your switch ports.
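To put a rough number on the jumbo frames advice, here is a back-of-the-envelope Python calculation comparing a 1500-byte and a 9000-byte MTU. The header and framing sizes are assumptions (IPv4 and TCP without options, standard Ethernet preamble and inter-frame gap), and iSCSI PDU headers are ignored.

```python
# Back-of-the-envelope payload efficiency for standard vs jumbo frames.
# ASSUMPTIONS: 40 bytes of IPv4+TCP headers (no options) inside each frame, and
# 38 bytes of per-frame Ethernet cost on the wire
# (14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap).
import math

IP_TCP_HEADERS = 40
ETHERNET_OVERHEAD = 38


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire time that carries application payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


def frames_needed(num_bytes: int, mtu: int) -> int:
    """Frames required to move num_bytes of payload at a given MTU."""
    return math.ceil(num_bytes / (mtu - IP_TCP_HEADERS))


if __name__ == "__main__":
    one_gib = 1024 ** 3
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} efficient, "
              f"{frames_needed(one_gib, mtu):,} frames per GiB")
```

Under these assumptions payload efficiency only rises from about 94.9% to about 99.1%, but roughly six times fewer frames are needed for the same data. Every device on the path has to accept the larger MTU for this to be safe.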
Your config should look something like this on both switches:
Code:
no spanning-tree
jumbo frame
vlan database
 vlan 2-4
exit
interface port-channel 1
exit
interface port-channel 2
exit
interface port-channel 3
exit
interface vlan 2
 name iSCSI
exit
interface vlan 3
 name VMotion
exit
interface vlan 4
 name VMnetwork
exit
interface ethernet g 1/3
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 2
 flowcontrol
exit
interface ethernet g 1/4
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 2
 flowcontrol
exit
interface ethernet g 1/5
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 2
 flowcontrol
exit
interface ethernet g 1/6
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 3
 flowcontrol
exit
interface ethernet g 1/7
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 3
 flowcontrol
exit
interface ethernet g 1/8
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 3
 flowcontrol
exit
interface ethernet g 1/9
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 4
 flowcontrol
 channel-group 1
exit
interface ethernet g 1/10
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 4
 flowcontrol
 channel-group 1
exit
interface ethernet g 1/11
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 4
 flowcontrol
 channel-group 2
exit
interface ethernet g 1/12
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 4
 flowcontrol
 channel-group 2
exit
interface ethernet g 1/13
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 4
 flowcontrol
 channel-group 3
exit
interface ethernet g 1/14
 spanning-tree edge-port
 speed-duplex 1000full
 no negotiation
 switchport access vlan 4
 flowcontrol
 channel-group 3
exit
On the SAN side (ports 1, 2 and 15) you'll have to check for yourself, since I do not know your SAN that well.
Let me know how it works for you.
bio..
Last edited by bio; 28th May 2008 at 09:14 AM.
I wasn't aware of ESX supporting jumbo frames yet, although it might have been added in 3.5.
What if not all the devices on that same VLAN support jumbo frames; could that cause problems?
When you say don't use LACP, do you mean don't use dynamic LACP but configure it manually instead?
Bio,
Question for you: we are using 3Com 5500Gs in our environment, and you are one of the only other people I have found who are using 3Com. The settings you posted from your config, are those for the ports to your iSCSI SAN, for your VM network, or do you use those settings for both?
Trunking is OK for the LAN-only side; however, you shouldn't use it for your iSCSI traffic.
Have a look at this article for setting up iSCSI in VMware:
A Multivendor Post on using iSCSI with VMware vSphere - Virtual Geek
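The approach described in articles like the one above uses storage multipathing rather than NIC teaming for iSCSI: one VMkernel port per physical NIC, with the storage stack rotating I/O across the resulting paths. Below is a minimal Python model of round-robin path selection. The path labels are hypothetical, and the "switch after N I/Os" threshold is just a parameter here; whatever default VMware's round-robin policy uses in your version is an assumption to check, not something this sketch establishes.

```python
# Toy model of round-robin multipathing for iSCSI.
# ASSUMPTIONS: path labels are hypothetical; the policy moves to the next path
# after a fixed number of I/Os (ios_per_switch), which is a parameter here.
from collections import Counter
from itertools import cycle


def round_robin(paths: list[str], num_ios: int, ios_per_switch: int) -> Counter:
    """Issue num_ios I/Os, moving to the next path every ios_per_switch I/Os."""
    path_iter = cycle(paths)
    current = next(path_iter)
    used = Counter()
    for i in range(num_ios):
        if i and i % ios_per_switch == 0:
            current = next(path_iter)  # threshold reached: rotate to next path
        used[current] += 1
    return used


if __name__ == "__main__":
    # One VMkernel port per physical NIC, each giving a path to the SAN.
    paths = ["vmk1/vmnic2", "vmk2/vmnic3"]  # hypothetical labels
    print(round_robin(paths, num_ios=10_000, ios_per_switch=1000))
```

Unlike IP hash, this needs no link aggregation on the physical switch, and I/O to a single LUN can use both NICs over time; in the run above the 10,000 I/Os split evenly, 5,000 per path.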