Posts relating to VMware ESX, ESXi and other virtualised infrastructure, and the strange and interesting journeys it takes me on!
VMware
Can't unmount an NFS datastore from an ESXi host
PROBLEM

As part of a storage replacement programme, we had migrated all our VMs off the NFS datastore. When attempting to unmount the datastore from the host(s) we were given this error:

Call "HostDatastoreSystem.RemoveDatastore" for object "datastoreSystem-122544" on vCenter Server "vcenterFQDN" failed. An unknown error has occurred.

Checking /var/log/vmkernel.log on the host we saw the following (where <datastorename> is the name of the NFS datastore we were trying to remove):

2017-09-29T10:38:05.498Z cpu22:59182 opID=cf1be666)WARNING: NFS: 1921: <datastorename> has open files, cannot be unmounted

Of course the first thing was to check that all the VMs really had been moved; yes, they had. Next I checked HA to make sure it wasn't holding a handle for its datastore heartbeating; again no, this had all been pointed at the new storage, but I temporarily disabled HA on the cluster anyway, to no avail. Perhaps some of the VMs had snapshots open when they were migrated? Again no, and that doesn't leave file handles open after a migration anyway.

SOLUTION

Finally I came across some settings on the host that were still referencing the old storage. Once they had been updated to point at a folder on the new storage, hey presto, I could unmount the datastore as expected.
Note that I could not find any open handles from any of the hosts to the storage using any of the usual tricks (such as those in the link below); it just seemed that having that value on the host pointing at the storage was enough to stop it dismounting as expected. http://www.torkwrench.com/2013/02/12/esxi-unable-to-complete-sysinfo-operation-when-unmounting-nfs-datastore/
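If you would rather do the final check and unmount from PowerCLI instead of the vSphere client, a minimal sketch along these lines works (the vCenter and datastore names are placeholders, not taken from the post above):

# Connect and confirm nothing is left on the datastore before unmounting it
Connect-VIServer vcenterFQDN
$ds = Get-Datastore -Name "<datastorename>"
Get-VM -Datastore $ds    # should return nothing if every VM really has been migrated off
# Unmount the datastore from every host that still has it mounted
Get-VMHost -Datastore $ds | ForEach-Object { Remove-Datastore -Datastore $ds -VMHost $_ -Confirm:$false }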
vMotion (including Storage vMotion) Fails Migrating from ESXi 5.5 to ESXi 6.0 U1
When attempting to migrate a VM from VMware ESXi 5.5 to ESXi 6.0 U1 you may experience problems. This does not affect all VMs, only some, and they are typically VMs that have been backed up using an external backup tool and make use of CBT (Changed Block Tracking). A bug is present in ESXi 5.5 and/or ESXi 6.0 U1, and attempting to Storage vMotion from a 5.5 host to a 6.0 U1 host tickles this bug and causes the vMotion to fail with a CBT-related error. Looking within the vmware.log stored with the VMDKs for the VM on your virtualisation storage you'll see a corresponding error logged. To resolve it you can run the script from this thread: https://communities.vmware.com/thread/507191 - all credit to AFIARoth who came up with it. The process that allows you to migrate the problem VM is as follows: when run in disable mode the script sets the CBT disable flag, then creates a snapshot and removes it immediately afterwards to force an update of the VMX file; when run in enable mode it does the same but switches CBT back on.
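This is not the original script from the thread, but a rough PowerCLI sketch of the same technique (the VM name is a placeholder): disable CBT via a reconfigure, then create and delete a snapshot so the change is flushed into the VM's configuration; set the flag back to $true the same way after the migration if your backup tool needs CBT.

# Disable CBT on the VM (change $false to $true afterwards to re-enable it)
$vm = Get-VM -Name "ProblemVM"
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.ChangeTrackingEnabled = $false
$vm.ExtensionData.ReconfigVM($spec)
# Create and immediately remove a snapshot to force the config update
$snap = New-Snapshot -VM $vm -Name "cbt-flush"
Remove-Snapshot -Snapshot $snap -Confirm:$false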
Ping from a specific VMKernel Port to test Connectivity
If you need to test connectivity between two VMkernel ports (vmk interfaces) you can use the following command from the ESXi shell:

# vmkping -I vmk2 <Destination IP>
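If you don't have SSH access to the host, something similar can be done from PowerCLI via the esxcli ping diagnostic. This is a hedged sketch: the host name, destination IP and the exact argument keys (host/interface) are assumptions based on the esxcli long option names rather than anything in the post above.

# Run "esxcli network diag ping" against the host, sourcing the ping from vmk2
$esxcli = Get-EsxCli -VMHost (Get-VMHost "esxi01.domain.com") -V2
$pingArgs = $esxcli.network.diag.ping.CreateArgs()
$pingArgs.host = "192.168.10.12"
$pingArgs.interface = "vmk2"
$esxcli.network.diag.ping.Invoke($pingArgs)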
HP BL490c G7 Blade within a c7000 BladeSystem Chassis with Flex-10 Networking Fails after ESXi 5.5 Upgrade
After an upgrade from ESXi 5.1 to ESXi 5.5, all the networking on the host stopped working. I had already upgraded other BL490c G7 blades with (seemingly) identical hardware without issue, but on this one the NICs were detected and yet there was no connectivity through the chassis Flex-10 to the host. All checks on the BladeSystem controllers and within the ESXi OS seemed to show everything was working and connected, and there were no errors on the ports within the BladeSystem. So I attempted the following, based on https://communities.vmware.com/thread/463557?tstart=0 - disabling the native elxnet driver and falling back to the legacy be2net driver:

esxcli system module set --enabled=false --module=elxnet
esxcli system module set --enabled=true --module=be2net

Reboot the host; after the reboot the networking was operational, so it appears this is a driver issue within VMware. To revert the change:

esxcli system module set --enabled=false --module=be2net
esxcli system module set --enabled=true --module=elxnet

Reboot the host; the networking went back to the failed state again. However, this same driver was working fine on (seemingly) identical hardware, so I checked the firmware version on a working and a non-working host. On Host2 (an identical machine to the faulty Host1):

# esxcli network nic get -n vmnic0
   Advertised Auto Negotiation: true
   Advertised Link Modes: 10000baseT/Full
   Auto Negotiation: false
   Cable Type:
   Current Message Level: -1
   Driver Info:
      Bus Info: 0000:02:00:0
      Driver: elxnet
      Firmware Version: 4.2.401.6
      Version: 10.5.121.7
   Link Detected: true
   Link Status: Up
   Name: vmnic0
   PHYAddress: 0
   Pause Autonegotiate: false
   Pause RX: true
   Pause TX: true
   Supported Ports:
   Supports Auto Negotiation: false
   Supports Pause: true
   Supports Wakeon: true
   Transceiver:
   Wakeon: MagicPacket(tm)
On the faulty Host1 we get:

# esxcli network nic get -n vmnic0
   Advertised Auto Negotiation: true
   Advertised Link Modes: 10000baseT/Full
   Auto Negotiation: false
   Cable Type:
   Current Message Level: -1
   Driver Info:
      Bus Info: 0000:02:00:0
      Driver: elxnet
      Firmware Version: 3.102.517.703
      Version: 10.5.121.7
   Link Detected: true
   Link Status: Up
   Name: vmnic0
   PHYAddress: 0
   Pause Autonegotiate: false
   Pause RX: true
   Pause TX: true
   Supported Ports:
   Supports Auto Negotiation: false
   Supports Pause: true
   Supports Wakeon: true
   Transceiver:
   Wakeon: MagicPacket(tm)

Notice the firmware version on the card is different; it should be upgraded to the same version as the working host, which resolves the issue. I upgraded the NIC firmware and rebooted the host, and afterwards the networking operated as expected. http://h20564.www2.hpe.com/hpsc/swd/public/readIndex?sp4ts.oid=4194638&swLangOid=8&swEnvOid=54
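To spot this sort of mismatch more quickly next time, a hedged PowerCLI sketch along these lines can pull the driver and firmware version of vmnic0 from every host in a cluster (the cluster name, the esxcli argument key and the property names on the returned object are assumptions based on the output above):

foreach ($vmhost in Get-Cluster 'Cluster1' | Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $vmhost -V2
    $nic = $esxcli.network.nic.get.Invoke(@{nicname = 'vmnic0'})
    # Report host, driver and firmware so mismatched blades stand out
    [pscustomobject]@{
        Host            = $vmhost.Name
        Driver          = $nic.DriverInfo.Driver
        FirmwareVersion = $nic.DriverInfo.FirmwareVersion
        DriverVersion   = $nic.DriverInfo.Version
    }
}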
When trying to deploy an OVF using the web interface you get the error: Client Integration Plug-in must be installed to enable OVF functionality
Problem: When trying to deploy an OVF using the web interface, you get the error message below; even when you have installed the Client Integration Plug-in it still doesn't work.

"Client Integration Plug-in must be installed to enable OVF functionality"

Solution:

1. Uninstall any/all existing VMware Client Integration Plug-ins from: Control Panel -> Add/Remove Programs
2. Restart your computer.
3. Open Internet Explorer or Chrome and install the plugin from the web client, either from the login page or from the OVF menu when prompted.
4. Once installed, close all browser windows and restart your computer.
5. The plugin within Internet Explorer should work fine now when you try it. If you are prompted to allow the plugin, click accept and ensure it doesn't prompt again.
6. To get the plugin to work in Google Chrome (version 45+), start Chrome and get to the following menu: Settings Menu (Three Lines, top right) -> Privacy -> Content settings... -> Plugins -> Manage Exceptions

Add the URLs of the vCenter Servers and set them to "allow":

https://hostname.domain.com:9443
https://hostname.domain.com:9443
https://hostname.domain.com
https://hostname.domain.com

Click the "Done" button to save the settings, then close and re-open Chrome and attempt to deploy an OVF; you should find it works now. If you are prompted to allow the plugin to run, click "Allow" and set it to remember the setting. Currently I've not found a fix for making the Client Integration Plug-in work in Firefox.
VM won't vMotion on Nutanix Cluster - Error: msg.vmk.status.VMK_BUSY
Had an issue where a VM would not migrate to another node within the Nutanix cluster, getting this error message after about 21%.
To resolve this I used: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1031636 - even though this is written for version 4.x, it worked for VMware 5.5. On the host, under Advanced Settings, find this setting (default value 1):

Migrate.VMotionResolveswaptype = 1

Change the value to 0 on both the problem host and the host you are migrating to; you can set it back to 1 again once the migration has completed.
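If you prefer to flip the setting from PowerCLI, a hedged sketch like this can change it on the source and destination hosts (the host names are placeholders, and the setting name is simply the one quoted above); re-run it with -Value 1 to revert once the migration has finished:

# Set the advanced setting to 0 on the source and destination hosts
foreach ($hostName in 'esxi-source.domain.com', 'esxi-destination.domain.com') {
    Get-AdvancedSetting -Entity (Get-VMHost -Name $hostName) -Name 'Migrate.VMotionResolveswaptype' |
        Set-AdvancedSetting -Value 0 -Confirm:$false
}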
Can't join VMware 5.5 VCSA to Active Directory - Error: Invalid Active Directory Domain
When attempting to add a VMware vCenter Server Appliance to our Active Directory I encountered this error message: "Error: Invalid Active Directory Domain". Within the /var/log/vmware/vpx/vpxd_cfg.log log file I was seeing the following:

START locking... /usr/sbin/vpxd_servicecfg ad write
2016-01-21 09:31:32 18457: [18454]BEGIN execution of: /usr/sbin/vpxd_servicecfg 'ad' 'write' 'username@domain.com' CENSORED 'DOMAIN.COM'
2016-01-21 09:31:32 18457: Testing domain (DOMAIN.COM)
2016-01-21 09:31:32 18457: ERROR: Failed to ping: 'DOMAIN.COM'
2016-01-21 09:31:32 18457: VC_CFG_RESULT=301
2016-01-21 09:31:32 18457: END execution

It would seem that the root record for the domain was missing; this should resolve to a domain controller so the appliance has something to bind to. So within /etc/hosts I added a record for domain.com that pointed to the IP address of one of our domain controllers:

172.17.5.10 domain.com domain

Then tried again, this time with success:

2016-01-21 09:45:05 23284: START locking... /usr/sbin/vpxd_servicecfg ad write
2016-01-21 09:45:05 23287: [23284]BEGIN execution of: /usr/sbin/vpxd_servicecfg 'ad' 'write' 'username' CENSORED 'DOMAIN.COM'
2016-01-21 09:45:05 23287: Testing domain (DOMAIN.COM)
2016-01-21 09:45:05 23287: Enabling active directory: 'DOMAIN.COM' 'username'
2016-01-21 09:45:11 23287: VC_CFG_RESULT=0
2016-01-21 09:45:11 23287: END execution

Reboot the VCSA, then log in as the administrator@vsphere.local user account and configure the VCSA with the Active Directory identity source: Administration -> Single Sign-On -> Configuration -> Identity Sources, then add the Active Directory identity source. You should then be able to remove the line from the hosts file and the AD authentication should continue to work OK. Finally, add permissions for the various AD groups or users you want to have access.
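The underlying issue was that the bare domain name didn't resolve. Before touching /etc/hosts it's worth confirming that with a quick lookup; this is a hedged example run from any Windows machine with the DnsClient module (the domain name is the same placeholder used above):

# The domain root record should return the A records of your domain controllers
Resolve-DnsName -Name domain.com -Type A
# If nothing comes back, the VCSA's "Testing domain" ping check will fail in the same way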
Find the file and folder location of a VMware template
Seems on the face of it to be something that you can find in the GUI, but it's not quite that easy! Based on this link: https://psvmware.wordpress.com/2013/07/04/where-are-virtual-machine-templates-located/ you can find it out by running the following, which will give you the name of the template, the host it is currently registered on and where the vmtx file (i.e. the template) is located. From a VMware PowerCLI prompt run the following to find all the templates in the cluster called "Cluster1":

# Connect-VIServer <FQDN of vCenter>
# foreach($vmhost in get-cluster 'Cluster1' | get-vmhost){get-template -Location $vmhost | select name,@{n='VMHOST';e={$vmhost.name}},@{n='VMTX';e={$_.extensiondata.config.files.VmPathName}} | Format-Table -wrap }

If you need to migrate the template to a new host (but don't want to use the deploy-then-recreate-template approach, or put a host into maintenance mode) then you can find the location, remove the template from the inventory and then re-add it from its file location on the new host.
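If you only need the .vmtx paths and don't care which host each template is registered on, a shorter query using the same ExtensionData property as above does the job (a sketch run against whichever vCenter you are connected to):

# List every template in the connected vCenter with its vmtx path
Get-Template | Select-Object Name, @{N='VMTX'; E={$_.ExtensionData.Config.Files.VmPathName}} | Format-Table -Wrap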
VMware Transparent Page Sharing - Is it on? How much RAM is it saving?
VMware TPS (Transparent Page Sharing) is a mechanism within the hypervisor to save RAM by allowing virtual machines whose memory pages match to store only a single copy of those pages. I.e. if you had 5 Windows 2012 R2 virtual machines running, they are going to be almost completely identical, so if each VM was assigned 4GB of RAM, rather than using 20GB of RAM the host would transparently page share, so you'd be using 4GB of RAM plus some small amount of additional memory (assuming the VMs are similar enough). Obviously in a large environment this saving in RAM could be quite large, meaning a saving in the physical RAM that needs to be purchased.

It's worth noting, however, that inter-VM TPS has been turned off from 5.5 Update 2 onwards. This is explained in this link, but you can re-enable it if you so wish. Instead, VMs will only share memory pages if they have a matching salt. But how do you tell if TPS is on, and how much RAM is it saving you?

To tell if a host has TPS salting enabled (so in the real sense TPS is disabled, unless you've configured the same salt for each VM), first click on the host, then click "Configuration", then "Advanced Settings", then click on the "Mem" node. On the right hand pane look for the setting mem.shareForceSalting (see the VMware KB for more details): if it is set to 0, or it doesn't exist, TPS will be enabled. If it is set to 1 or 2, you'll find that inter-VM TPS is not enabled unless you have configured a per-VM salt option.
Based on the table of settings in the VMware KB article (link above): if you don't change this setting after applying the update, the default is mem.shareForceSalting = 2, which uses the vc.uuid salt value; where that is blank for a VM a random salt is generated for it, so each VM ends up with its own salt, thus turning off inter-VM TPS.

To see how much RAM you are saving, from an SSH console run this command:

# esxtop

Now press "m" to view memory. Notice at the top the line called PSHARE/MB; this line shows TPS in action, so if it said:

PSHARE/MB: 32030 shared, 3899 common: 28131 saving

this would mean that the VMs are using 32030MB of RAM that is essentially the same across all the VMs, and to store that 32030MB ESXi only actually needs to store 3899MB (i.e. the common memory), meaning that 28131MB of RAM is being saved by TPS.
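To check where salting is set across an estate without clicking through every host, a hedged PowerCLI sketch along these lines works (the cluster name is a placeholder):

# Report the Mem.ShareForceSalting value for each host in the cluster
foreach ($vmhost in Get-Cluster 'Cluster1' | Get-VMHost) {
    Get-AdvancedSetting -Entity $vmhost -Name 'Mem.ShareForceSalting' |
        Select-Object @{N='Host'; E={$vmhost.Name}}, Name, Value
}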
Automating VMware Memory Changes
I had to make memory changes on a batch of VMs. I didn't want to have to manually shut down each VM, alter the memory setting and then restart it, as this would be very time consuming, so instead I prepared a script. You first need to create a CSV file, e.g. vmalterlist.csv, and place it somewhere on your machine. The format of the file should be as follows, where VMName is the name of the VM (as shown in vCenter) and TargetMemoryGB is the number of GB of RAM you want the VM to be altered to. When run, the script will stop each VM (ideally gracefully, using VMware Tools), change the memory and start the VMs back up again. Note: if you don't explicitly specify that the VMs should be started, they won't start.
The script takes three arguments at the command line when run from PowerCLI: the vCenter FQDN, the location of the CSV file, and whether you want the VMs to be powered back up again at the end. It allows about 5 minutes for the VMs to shut down.
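The original script isn't reproduced here; what follows is a minimal PowerCLI sketch of the same idea, where the parameter names, the script file name, the 5 minute wait and the sample CSV values are assumptions, and the CSV columns VMName and TargetMemoryGB are as described above. An example vmalterlist.csv might look like:

VMName,TargetMemoryGB
VM001,8
VM002,16

# Usage (sketch): .\Set-VMMemory.ps1 -VCenter vcenter.domain.com -CsvPath C:\temp\vmalterlist.csv -PowerOn
param(
    [string]$VCenter,
    [string]$CsvPath,
    [switch]$PowerOn
)
Connect-VIServer -Server $VCenter
$rows = Import-Csv -Path $CsvPath
# Gracefully shut down each VM via VMware Tools
foreach ($row in $rows) {
    Get-VM -Name $row.VMName | Shutdown-VMGuest -Confirm:$false
}
# Allow roughly 5 minutes for the guests to shut down
Start-Sleep -Seconds 300
# Set the new memory size and optionally power the VMs back on
foreach ($row in $rows) {
    $vm = Get-VM -Name $row.VMName
    Set-VM -VM $vm -MemoryGB ([int]$row.TargetMemoryGB) -Confirm:$false
    if ($PowerOn) { Start-VM -VM $vm }
}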