Okay so we’ve had VMware since version 3.5, and have upgraded through 4, 4.1 and then a big leap to 5.5 just recently. We started getting some really big performance issues on some of the VMs, but the cause was not what I first expected!
We had a web and SQL application server (a virtual machine) that was getting slower and slower, so we added more RAM: from 4GB to 8GB, then to 16GB, then finally to 24GB. Each time, after a short while, performance would drop off and memory usage would balloon (hint for later).
So when checking Task Manager I’d see that the server had 24GB of physical memory in total, and all the processes listed only really added up to about 4GB tops. This included SQL, IIS and all the other odds and ends. But, and this was the key thing, the server said it was using 22GB of the 24GB, i.e. only about 4GB of that usage could be accounted for in Task Manager.
So where the hell had the other 18GB-odd gone? It had been used somewhere but didn’t show up. By now the server was dog slow: logging on was nearly impossible, SQL was unusable and the web app crawled, taking 20 to 30 seconds to load a page.
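To put a number on that gap, here is a minimal Python sketch of the arithmetic. The function name is made up for illustration and the figures are the rough ones from this server; it doesn’t read any real counters.

```python
def unaccounted_gb(in_use_gb, process_total_gb):
    """Memory the OS reports as in use that no listed process owns."""
    return in_use_gb - process_total_gb


# Rough figures from this server: 22GB in use, about 4GB across all processes.
missing = unaccounted_gb(22, 4)
print(f"Unaccounted for: {missing}GB")
```

Anything more than a GB or two here probably means the memory is held by something Task Manager doesn’t list per process, such as a driver.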
There is a neat tool called RAMMap from Sysinternals, which is useful for unpicking that missing chunk of memory and working out what is using it: http://technet.microsoft.com/en-gb/sysinternals/ff700229.aspx
To start with I thought this was a memory leak in the web application, but nothing seemed to show it; the memory usage for IIS was small, because it was being squeezed so badly.
When running RAMMap, it was obvious. I’ve attached a screenshot here; unfortunately I didn’t take one when the server was screwed, but you can see the “Driver Locked” memory space is only using 1MB of memory. On the starving server it was using about 18GB. Ah, there is the problem! The question is, what is using 18GB of memory?
So now, what to do about the 18GB of driver locked memory and, more pressing, what is it?
It turns out it was the balloon driver in VMware Tools. Upon uninstalling VMware Tools the issue went away: the driver locked memory was freed up and all was well. On installing VMware Tools again, the issue came back.
Why was VMware Tools ballooning memory out? The host had 128GB RAM, the server was allocated 24GB, and with all the other VMs the total host memory usage barely pushed 50%. What’s up?
Well, it turned out that although the VM was allocated 24GB, just as I thought, there is a little-known problem with VMware and adding RAM to VMs that sometimes bites, especially if you’ve moved up through the versions of VMware and upgraded VMs along the way. The full detail was here: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003470
What happens is that if the VM was created with 4GB (like mine was), the Resource Memory Limit is set to 4GB too, and it’s not set to be unlimited. See the screenshot below of the offending tick box.
So when the VM has its memory increased to, say, 16GB, VMware still only allocates it 4GB while the OS thinks it’s got 16GB, and the balloon driver tries to force the OS to page out the rest, meaning that extra 12GB of RAM you added doesn’t mean anything. So you add more RAM, in my case up to 24GB, and now the balloon driver gobbles up 20GB, and the server is still dog slow.
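The sums behind that can be sketched in a few lines of Python. This is a simplified model that assumes the host reclaims everything above the limit; the function name is hypothetical, and the real balloon size will vary with what the hypervisor decides to reclaim by ballooning versus swapping.

```python
from typing import Optional


def balloon_pressure_gb(configured_gb: float, limit_gb: Optional[float]) -> float:
    """Memory the guest OS thinks it has but the host will not back.

    limit_gb=None stands for an unlimited memory limit (assumption of this sketch).
    """
    if limit_gb is None or limit_gb >= configured_gb:
        return 0
    return configured_gb - limit_gb


# The VM was created with 4GB, so the limit stayed at 4GB through every upgrade.
for configured in (4, 8, 16, 24):
    print(f"{configured}GB configured -> {balloon_pressure_gb(configured, 4)}GB reclaimed")
```

In this model every GB you add above the limit goes straight back to the host, which is why adding RAM never helped.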
To fix, set the Limit to “Unlimited” under Virtual Machine Properties -> Resources -> Memory by ticking the box. The VMware article above gives some more tips too.
Then just wait, or vMotion the VM to another host. In my case it moved by itself, then all of a sudden the balloon was gone and the server was back to normal, using all the RAM it needed.
Re-running the RAMMap tool showed the “Driver Locked” memory to be only 1MB.
Great stuff, all done.
So I checked some other VMs, and loads of them had this same problem. The RAM given to them before was enough, though, so only a couple were ballooning, and nowhere near as much as this particular VM!
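If you have a lot of VMs to check, the test itself is simple enough to script. Below is a rough Python sketch that works on plain records rather than talking to vCenter; the VM names, the record layout and the -1 sentinel for “unlimited” are all assumptions for illustration, so you’d need to feed it from whatever export or API you actually use.

```python
from dataclasses import dataclass

UNLIMITED = -1  # assumption: sentinel this sketch uses for "no limit"


@dataclass
class VmMemory:
    name: str           # hypothetical VM name
    configured_mb: int  # RAM the guest OS sees
    limit_mb: int       # memory limit, or UNLIMITED when the box is ticked


def vms_with_stale_limit(vms):
    """Return the VMs whose memory limit is lower than their configured RAM."""
    return [
        vm for vm in vms
        if vm.limit_mb != UNLIMITED and vm.limit_mb < vm.configured_mb
    ]


# Made-up inventory: one VM grown past its original limit, two that are fine.
inventory = [
    VmMemory("web-sql-01", 24576, 4096),
    VmMemory("file-01", 4096, 4096),
    VmMemory("app-02", 8192, UNLIMITED),
]

for vm in vms_with_stale_limit(inventory):
    print(f"{vm.name}: {vm.configured_mb}MB configured, limit {vm.limit_mb}MB")
```

A VM that shows up here isn’t necessarily ballooning yet, it just will once the guest needs more than its old limit.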