2016-11-01
Simon Sabin

Are you running VMware ESX 6 Update 1 and connecting to SQL? Be warned

If you are running any of your applications on Windows 2012 or later, on VMware ESX 6 with Update 1, and connecting to SQL, you must read on.

We’ve been working at a client recently on an issue where requests to SQL have a 500ms latency between requests. What’s really odd is that the time seems to be lost between the client code making a request and the request coming to SQL. Profiler shows a gap of 500ms between the end time of one request and the start time of another request.

The issue would only affect some applications, and it was intermittent: things would work for a period, the latency would appear, then it would go away again. All very odd and very expensive to diagnose.

We’ve seen similar delays before with name resolution, whether it be DNS or something in SQL like the AG listener. However, this all occurred on an already open connection, where name resolution should not come into play.

After much digging we came across this VMware KB article:

https://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2129176

Reading the article, it seems there is an optimisation in the network stack in Windows 2012 and higher that reduces the overhead of processing network packets by not having to process the header every time. Something in the VMware stack was stopping that optimisation from working and causing the 500ms latency. The symptom was exactly the issue we had found.

To test if you have the same issue you can run the following repro. Change the server name in the connection string to that of your SQL Server, and run it from a Windows 2012 machine against that SQL Server. If you only ever get values back that are < 20ms then you are good. If you get any that are > 500ms you have the problem.
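As a rough illustration of that kind of test, here is a minimal Python sketch. It is not the original repro script. It assumes the third-party pyodbc package and an installed SQL Server ODBC driver; the function names, the driver name and YOUR_SERVER in the example connection string are all placeholders. It times a trivial query repeatedly on a single open connection and applies the 20ms and 500ms thresholds described above.

```python
import time


def time_queries(run_query, count=200):
    """Call run_query() `count` times and return each call's duration in ms."""
    timings_ms = []
    for _ in range(count):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def summarise(timings_ms, good_ms=20.0, bad_ms=500.0):
    """Apply the thresholds from the text to a list of timings."""
    if any(t > bad_ms for t in timings_ms):
        return "affected"      # at least one request took longer than 500 ms
    if all(t < good_ms for t in timings_ms):
        return "ok"            # every request came back in under 20 ms
    return "inconclusive"      # somewhere in between; worth repeating the run


def make_sql_query(connection_string):
    """Return a function that runs a trivial query on one open connection.

    Assumption: the third-party pyodbc package is installed. It is imported
    here so that the timing helpers above can be used without it.
    """
    import pyodbc

    connection = pyodbc.connect(connection_string)
    cursor = connection.cursor()

    def run_query():
        cursor.execute("SELECT 1").fetchall()

    return run_query


def run_test(connection_string, count=200):
    """Time `count` queries against the server and print a one-line verdict."""
    timings = time_queries(make_sql_query(connection_string), count)
    verdict = summarise(timings)
    print("max %.1f ms -> %s" % (max(timings), verdict))
    return verdict


# Example call (placeholder driver and server name, adjust to your setup):
# run_test("DRIVER={ODBC Driver 17 for SQL Server};SERVER=YOUR_SERVER;Trusted_Connection=yes;")
```

Because the issue comes and goes, a single clean run does not prove you are unaffected; it is worth repeating the test at different times.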

If you want to know how we got to the repro then you can read about it here http://sabin.io/blog/vmware-network-performance-buggetting-a-repro

The Fix

The fix is to apply Update 2 for VMware. However, you might not want to do that until you’ve tested it in your environment. That’s your call. Thankfully, there is a workaround.

The Workaround

The workaround is to disable Receive Segment Coalescing (RSC); that is the setting rsc refers to in the commands below, not Receive Side Scaling (RSS). Oddly, it appears that if you go into the adapter settings and disable it there, it doesn’t do anything, because there is a global override.

What you need to do is disable the TCP-level setting.

This shows you the state of Receive Segment Coalescing:

netsh int tcp show rsc

And this disables it

netsh int tcp set global rsc=disabled
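If you need to check this across several machines, the state can also be read from a script. The sketch below is an assumption-laden example rather than part of the original workaround: it runs netsh int tcp show global, assumes English-language output, and looks for the line labelled "Receive Segment Coalescing State", which is how netsh reports the rsc setting. The function names are made up for illustration.

```python
import subprocess


def parse_rsc_state(netsh_output):
    """Return the RSC state (e.g. 'enabled') from netsh output, or None if absent.

    Assumes English output containing a line such as
    'Receive Segment Coalescing State    : enabled'.
    """
    for line in netsh_output.splitlines():
        label, _, value = line.partition(":")
        if "receive segment coalescing" in label.lower():
            return value.strip().lower()
    return None


def current_rsc_state():
    """Run netsh on the local Windows machine and parse the global TCP settings."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True, text=True, check=True,
    )
    return parse_rsc_state(result.stdout)
```

On an affected machine you would call current_rsc_state() before and after applying the workaround to confirm that the global setting actually changed.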
