Microsoft Network Load Balancing on VMware ESX Server

I have been working on implementing the new “Terminal Services Gateway” service that will be released with Server 2008.  In order to cluster TS Gateways, we need to have a network load balancing solution in place.  Hardware solutions are supported, but getting access to those would be a pain.  Thus, I am back to looking at Microsoft Network Load Balancing services.

My last attempt at using NLB ran into some troubles… NLB-fronted Server 2003 terminal servers were pokey at best.  This time, I thought I should look at the implications of running Microsoft NLB on ESX server, which is the platform I am using for the TS Gateways.

Unsurprisingly, I learned some new things…

  • It is suggested that IGMP multicast is not necessary in a virtual network environment.  Thus, we will forgo the “IGMP Multicast mode” NLB config… which is just as well, as IGMP appears to break NLB on our ESX cluster!
  • Another advantage of multicast is that it does not break inter-node communication, so we do not need to add a second virtual network adapter to each ESX guest just to allow the NLB manager to be run on an NLB node!  This is not specifically referenced in any Server 2008 documentation that I can find… it is just something that I noticed… When I had the cluster configured in Unicast mode, the NLB manager complained mightily about its inability to talk to its sibling node.  In Multicast mode, I get the same warning, but parameter reconfiguration succeeds.
  • Connections to the cluster are still a bit slow on initial lookup… why, why, why?
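
The unicast/multicast distinction above comes down to the cluster MAC address that NLB fabricates from the cluster IP: in unicast mode every node answers to the same 02-bf-… MAC (which is why the nodes cannot reach each other), while multicast mode adds a shared 03-bf-… multicast MAC alongside each NIC's own address. Here is a rough sketch of that mapping, the same one `wlbs ip2mac` reports; the function name and the 192.168.1.100 cluster IP are just illustrative:

```python
def nlb_cluster_mac(cluster_ip: str, mode: str = "multicast") -> str:
    """Derive the cluster MAC that Microsoft NLB generates for a cluster IP.

    Hypothetical helper for illustration; mirrors the documented
    NLB conventions (cf. `wlbs ip2mac`).
    """
    octets = [int(o) for o in cluster_ip.split(".")]
    if mode == "unicast":
        # All nodes share this MAC, masking each NIC's real address.
        parts = [0x02, 0xBF] + octets
    elif mode == "multicast":
        # Shared multicast MAC; each node also keeps its own NIC MAC.
        parts = [0x03, 0xBF] + octets
    elif mode == "igmp":
        # IGMP multicast uses the 01-00-5e range plus the last two IP octets.
        parts = [0x01, 0x00, 0x5E, 0x7F] + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode}")
    return "-".join(f"{b:02x}" for b in parts)

print(nlb_cluster_mac("192.168.1.100", "multicast"))  # 03-bf-c0-a8-01-64
print(nlb_cluster_mac("192.168.1.100", "unicast"))    # 02-bf-c0-a8-01-64
```

That 03-bf-… multicast MAC paired with a unicast IP is also the root of the ARP headaches with routers later on.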
2 thoughts on “Microsoft Network Load Balancing on VMware ESX Server”

    1. J. Greg

      It turns out the initial cluster connection delay is limited to Internet Explorer TLS/SSL connections and RDP sessions. The problem does not occur when using Firefox.

      Some Wireshark analysis shows that my client workstation is doing some funky DHCP Inform requests, and also attempting to resolve a host named “WPAD” using WINS. Some Google-based analysis hints that this behavior is caused by automatic proxy detection in IE (WPAD = “Web Proxy Auto Discovery”). Indeed, I had applied some creative proxy settings in IE when troubleshooting SharePoint problems last month. After disabling proxy discovery, we find that we are able to access web pages on the cluster without delays. Interestingly, RDP sessions to the cluster load faster now, too… the RDC client must use the “autoproxy” settings from the “Internet Options” control panel, too. Fun!

      I still can’t ping the cluster, but that is predictable behavior based on information in the online help for NLB. I have a request into our network services team to add static routes to the cluster multicast address, and/or to enable “proxy ARP”.
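
The WPAD lookups the comment describes come from the client walking its DNS suffix: it prepends “wpad” to successively shorter forms of the suffix (alongside DHCP option 252 and, as seen in the capture, a NetBIOS/WINS query for the bare “WPAD” name). A simplified sketch of the DNS candidate list, assuming a hypothetical suffix; the exact stopping rule varies by client implementation:

```python
def wpad_dns_candidates(dns_suffix: str) -> list[str]:
    """Candidate WPAD hostnames a client tries via DNS devolution.

    Simplified sketch: real clients also try DHCP option 252 and
    NetBIOS/WINS, and implementations differ on where devolution stops.
    """
    labels = dns_suffix.strip(".").split(".")
    candidates = []
    # Drop the leftmost label each round; stop before the bare TLD.
    while len(labels) >= 2:
        candidates.append("wpad." + ".".join(labels))
        labels = labels[1:]
    return candidates

print(wpad_dns_candidates("corp.example.com"))
# ['wpad.corp.example.com', 'wpad.example.com']
```

Every one of those lookups has to fail (or time out) before the browser gives up on autodiscovery, which is where the multi-second delay on the first connection comes from.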
