VMware

Posts relating to VMware ESX, ESXi and other virtualised infrastructure, and the strange and interesting journeys it takes me on!

Can't unmount an NFS datastore from an ESXi host

posted 29 Sep 2017, 03:53 by Tristan Self

PROBLEM

As part of a storage replacement programme, we had migrated all our VMs off the NFS datastore. When attempting to unmount the datastore from the host(s) we were given this error:

Call "HostDatastoreSystem.RemoveDatastore" for object "datastoreSystem-122544" on vCenter Server "vcenterFQDN" failed.
An unknown error has occurred.


Checking the /var/log/vmkernel.log on the host we saw the following (where <datastorename> is the name of the NFS datastore we were trying to remove):

2017-09-29T10:38:05.498Z cpu22:59182 opID=cf1be666)WARNING: NFS: 1921: <datastorename> has open files, cannot be unmounted

Of course, the first thing was to check that all the VMs had actually been moved; yes, they had. Next I checked HA, to make sure it wasn't holding a file handle for its datastore heartbeating; again no, this had all been pointed at the new storage, but I temporarily disabled HA on the cluster anyway, to no avail. Perhaps some of the VMs had snapshots open when they were migrated? Again no, and that doesn't leave file handles open on migration anyway.

SOLUTION

Finally I came across these settings on the host, which were still referencing the old storage. They needed to be updated to point at a folder on the new storage, and hey presto, I could unmount the storage as expected:
  • Advanced Settings -> ScratchConfig.ConfiguredScratchLocation
(Reboot of host required after change)
  • Advanced Settings -> Syslog.global.logDir
All you need to do is create new log folders on the new storage for these hosts and make the settings changes on each host; after a reboot (for the scratch location in particular) you'll be able to unmount the datastore.
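
If you'd rather make the changes from the ESXi shell than the vSphere Client, something like the following should work (the datastore and folder names are placeholders for your own):

# Check the current scratch and syslog locations:
esxcli system settings advanced list -o /ScratchConfig/ConfiguredScratchLocation
esxcli system syslog config get

# Point both at folders pre-created on the new storage, then reload syslog:
esxcli system settings advanced set -o /ScratchConfig/ConfiguredScratchLocation -s "/vmfs/volumes/newdatastore/.locker-hostname"
esxcli system syslog config set --logdir="/vmfs/volumes/newdatastore/logs-hostname"
esxcli system syslog reload

(Then reboot the host for the scratch location change to take effect.)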

Note that I could not find any open handles from any of the hosts to the storage using any of the usual tricks (see below); it just seemed that having those values pointing at the old storage was enough to stop it unmounting as expected.
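
For reference, these are the sort of checks I mean; from the ESXi shell, where <datastorename> is the NFS datastore (the exact commands here are my own, not taken from the link):

# List the NFS mounts the host knows about:
esxcli storage nfs list

# Look for anything holding files open on the datastore:
lsof | grep <datastorename>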

http://www.torkwrench.com/2013/02/12/esxi-unable-to-complete-sysinfo-operation-when-unmounting-nfs-datastore/

vMotion (including Storage vMotion) Fails Migrating from ESXi 5.5 to ESXi 6.0 U1

posted 25 May 2017, 06:42 by Tristan Self   [ updated 25 May 2017, 06:44 ]

When attempting to migrate a VM from VMware ESXi 5.5 to ESXi 6.0 U1 you may experience problems. This does not affect all VMs, only some, typically VMs that have been backed up using an external backup tool that makes use of CBT (Changed Block Tracking). A bug is present within ESXi 5.5 or ESXi 6.0 U1, and attempting a storage vMotion from an ESXi 5.5 host to a 6.0 U1 host tickles this bug, causing the vMotion to fail with an error such as the following:

Cannot migrate to <hostname>
vMotion migration [GUID of Task] failed to read stream keepalive: Connection closed by remote host, possibly due to timeout
Failed waiting for data. Error 195887167. Connection closed by remote host, possibly due to timeout.

Looking within the vmware.log stored with the VMDKs for the VM on your virtualisation storage you'll see something like:

2017-05-25T11:02:27.345Z| vmx| I120: [msg.svmotion.mirror.thread.remote.disk.setup.fail] Failed to set up disks on the destination host.
2017-05-25T11:02:27.345Z| vmx| I120: [msg.svmotion.fail.platform] Failed to copy one or more of the virtual machine's disks. See the virtual machine's log for more details.

To resolve this you can run the following script. The script has been taken from this thread: https://communities.vmware.com/thread/507191; all credit to AFIARoth who came up with it.

cls
# Import the VMware PowerCLI snap-in if it isn't already loaded
if ( (Get-PSSnapin -Name VMware.VimAutomation.Core -ErrorAction SilentlyContinue) -eq $null ) {
    Add-PSSnapin VMware.VimAutomation.Core
}
$vCenterServer = "Vcenter-server-name"
$VMName = "VMName"
$mute = Connect-VIServer $vCenterServer -WarningAction SilentlyContinue
$VMs = Get-VM -Name "$VMName"
# Create a VM specification to apply with the desired setting
# (set this to $true when re-running the script to re-enable CBT):
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.ChangeTrackingEnabled = $false
# Apply the specification to each VM, then create and remove a snapshot
# to force the change to be written to the VMX file:
foreach ($vm in $VMs) {
    $vm.ExtensionData.ReconfigVM($spec)
    $snap = $vm | New-Snapshot -Name 'Disable CBT'
    $snap | Remove-Snapshot -Confirm:$false
}
Disconnect-VIServer * -Confirm:$false

The process for migrating the problem VM is shown below. In disable mode the script clears the CBT flag, then creates a snapshot and removes it immediately afterwards to force an update of the VMX file; enable mode (setting $spec.ChangeTrackingEnabled to $true) does exactly the same but sets the flag back on.

  1. Run the script above in disable mode to turn off CBT.
  2. vMotion the VM; you should find this works as expected now.
  3. Run the script above in enable mode to turn on CBT.
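
If you want to confirm the CBT state before and after each step, a quick check from PowerCLI (not part of the original thread; "VMName" is a placeholder) is:

# Returns True when CBT is enabled on the VM:
(Get-VM -Name "VMName").ExtensionData.Config.ChangeTrackingEnabled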

Ping from a specific VMkernel Port to test Connectivity

posted 28 Nov 2016, 01:37 by Tristan Self

If you need to test connectivity between two VMkernel (vmk) ports you can use the following command:

# vmkping -I vmk2 <Destination IP>

If you have separate TCP/IP stacks for different types of VMkernel ports you need to add the stack name to the command too:

# vmkping -I vmk2 -S <IP Stack Name> <Destination IP>
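
For example, to test a vMotion VMkernel port over the dedicated vMotion TCP/IP stack (the stack name and destination IP below are placeholders; you can list your stack names with "esxcli network ip netstack list"):

# vmkping -I vmk2 -S vmotion 192.168.10.20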

HP BL490c G7 Blade within a c7000 BladeSystem Chassis with Flex-10 Networking Fails after ESXi 5.5 Upgrade

posted 22 Aug 2016, 12:05 by Tristan Self

After an upgrade from ESXi 5.1 to ESXi 5.5, all the networking on the host stopped working. I had already upgraded other BL490c G7 blades with (seemingly) identical hardware, but upon upgrading this one the NICs were detected yet there was no connectivity through the chassis (the Flex-10) to the host. All checks on the BladeSystem controllers and within the ESXi OS seemed to show that everything was working fine and connected, and there were no errors on the ports within the BladeSystem.

So I attempted the following:

https://communities.vmware.com/thread/463557?tstart=0

esxcli system module set --enabled=false --module=elxnet

esxcli system module set --enabled=true --module=be2net

Reboot the host; after the reboot the networking was operational. It appears this is a driver issue within ESXi.
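
You can confirm which of the two modules is enabled after the reboot with a quick check from the ESXi shell (not in the original post):

esxcli system module list | grep -E "elxnet|be2net"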

To revert the changes:

esxcli system module set --enabled=false --module=be2net

esxcli system module set --enabled=true --module=elxnet

Reboot the host; networking went back to the failed state again. However, this driver was working fine on (seemingly) identical hardware, so I checked the firmware version on a working host and on the faulty host.

On Host2 (an identical, working machine to the faulty Host1):

# esxcli network nic get -n vmnic0
   Advertised Auto Negotiation: true
   Advertised Link Modes: 10000baseT/Full
   Auto Negotiation: false
   Cable Type:
   Current Message Level: -1
   Driver Info:
         Bus Info: 0000:02:00:0
         Driver: elxnet
         Firmware Version: 4.2.401.6
         Version: 10.5.121.7
   Link Detected: true
   Link Status: Up
   Name: vmnic0
   PHYAddress: 0
   Pause Autonegotiate: false
   Pause RX: true
   Pause TX: true
   Supported Ports:
   Supports Auto Negotiation: false
   Supports Pause: true
   Supports Wakeon: true
   Transceiver:
   Wakeon: MagicPacket(tm)

On the faulty Host1 we get:

# esxcli network nic get -n vmnic0
   Advertised Auto Negotiation: true
   Advertised Link Modes: 10000baseT/Full
   Auto Negotiation: false
   Cable Type:
   Current Message Level: -1
   Driver Info:
         Bus Info: 0000:02:00:0
         Driver: elxnet
         Firmware Version: 3.102.517.703
         Version: 10.5.121.7
   Link Detected: true
   Link Status: Up
   Name: vmnic0
   PHYAddress: 0
   Pause Autonegotiate: false
   Pause RX: true
   Pause TX: true
   Supported Ports:
   Supports Auto Negotiation: false
   Supports Pause: true
   Supports Wakeon: true
   Transceiver:
   Wakeon: MagicPacket(tm)

Notice that the firmware version on the card is different; it should be upgraded to the same version as the working host, which resolves the issue.

I upgraded the NIC firmware and rebooted the host; afterwards the networking operated as expected.
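
To compare just the firmware string across hosts without wading through the full output, something like this should do (grep is available in the ESXi shell):

# esxcli network nic get -n vmnic0 | grep Firmware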

http://h20564.www2.hpe.com/hpsc/swd/public/readIndex?sp4ts.oid=4194638&swLangOid=8&swEnvOid=54

When trying to deploy an OVF using the web interface you get the error: Client Integration Plug-in must be installed to enable OVF functionality

posted 19 May 2016, 01:53 by Tristan Self

Problem: When trying to deploy an OVF using the web interface, you get the error message below. Even when you have installed the Client Integration Plug-in it still doesn't work.

"Client Integration Plug-in must be installed to enable OVF functionality"

Solution:

1. Uninstall any/all existing VMware Client Integration Plug-ins from: Control Panel -> Add/Remove Programs

2. Restart your computer.

3. Open Internet Explorer or Chrome and install the plugin from the web client, either from the login page or from the OVF menu when prompted.

4. Once installed close all browser windows and restart your computer.

5. The plugin within Internet Explorer should now work when you try it. If you are prompted to allow the plugin, click Accept and ensure it doesn't prompt again.

6. To get the plugin to work in Google Chrome (version 45+), start Chrome and navigate to the following menu:

Settings Menu (Three Lines, top right) -> Privacy -> Content settings... -> Plugins -> Manage Exceptions

Add the URLs of the vCenter Servers and set them to "allow":

https://hostname.domain.com:9443
https://hostname.domain.com

Click the "Done" button to save the settings, and then close and re-open Chome and attempt to deploy a plugin, you should find it works now, if you are prompted to allow the plugin to run click "Allow" and set it to remember the setting.

I've not yet found a fix for making the Client Integration Plug-in work in Firefox.

VM won't vMotion on Nutanix Cluster - Error: msg.vmk.status.VMK_BUSY

posted 19 Apr 2016, 11:05 by Tristan Self


I had an issue where a VM would not migrate to another node within the Nutanix cluster, failing with this error message at about 21%:

Migration to host <172.16.0.10> failed with error msg.vmk.status.VMK_BUSY (195887108).
vMotion migration [-1407287005:1461064336904335] remotecall returned Busy
vMotion migration [-1407287005:1461064336904335] failed: remote host <172.16.0.10> failed with status Busy.
File is being locked by a consumer on host NTNX-14SM6111111-A with exclusive lock.
vMotion migration [-1407287005:1461064336904335] failed to initialize a valid destination swapfile: Busy.
Failed waiting for data. Error 195887108. Busy.


To resolve this I used: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1031636; even though this is written for version 4.x, it worked for VMware 5.5.

On the hosts, under Advanced Settings, change this setting (its default value is 1):

Migrate.VMotionResolveSwapType = 1

Change the value to 0 on both the problem host and the host you are migrating to; you can set it back to 1 again once the migration has completed.
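
If you have a number of hosts to change, this can also be done from PowerCLI; a sketch, where the host names are placeholders:

# Set Migrate.VMotionResolveSwapType to 0 on both hosts involved in the migration:
Get-VMHost 'esx01','esx02' | ForEach-Object {
    Get-AdvancedSetting -Entity $_ -Name 'Migrate.VMotionResolveSwapType' |
        Set-AdvancedSetting -Value 0 -Confirm:$false
}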


Can't join VMware 5.5 VCSA to Active Directory - Error: Invalid Active Directory Domain

posted 21 Jan 2016, 04:11 by Tristan Self

When attempting to add a VMware vCenter Server Appliance to our Active Directory I encountered this error message: "Error: Invalid Active Directory Domain".

Within the /var/log/vmware/vpx/vpxd_cfg.log log file I was seeing the following:

START locking... /usr/sbin/vpxd_servicecfg ad write
2016-01-21 09:31:32 18457: [18454]BEGIN execution of: /usr/sbin/vpxd_servicecfg 'ad' 'write' 'username@domain.com' CENSORED 'DOMAIN.COM'
2016-01-21 09:31:32 18457: Testing domain (DOMAIN.COM)
2016-01-21 09:31:32 18457: ERROR: Failed to ping: 'DOMAIN.COM'
2016-01-21 09:31:32 18457: VC_CFG_RESULT=301
2016-01-21 09:31:32 18457: END execution

It would seem that the root record for the domain was missing; this should resolve to a domain controller so that the appliance has something to bind to.
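
You can check this from the appliance's shell before touching the hosts file (assuming nslookup is available on the VCSA; domain.com is a placeholder for your AD domain):

# This should return the IP address of at least one domain controller;
# if it fails, the root record is missing:
nslookup domain.com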

So within /etc/hosts I added a record for domain.com that pointed to the IP address of one of our domain controllers:

172.17.5.10    domain.com    domain

Then I tried again, this time with success:

2016-01-21 09:45:05 23284: START locking... /usr/sbin/vpxd_servicecfg ad write
2016-01-21 09:45:05 23287: [23284]BEGIN execution of: /usr/sbin/vpxd_servicecfg 'ad' 'write' 'username' CENSORED 'DOMAIN.COM'
2016-01-21 09:45:05 23287: Testing domain (DOMAIN.COM)
2016-01-21 09:45:05 23287: Enabling active directory: 'DOMAIN.COM' 'username'
2016-01-21 09:45:11 23287: VC_CFG_RESULT=0
2016-01-21 09:45:11 23287: END execution

Reboot the VCSA.

Then log in as the administrator@vsphere.local user account and navigate to:

Administration->Single Sign-On->Configuration->Identity Sources

Then add the Active Directory identity source.

You should then be able to remove the line from the hosts file and AD authentication should continue to work OK, once you have added permissions for the various AD groups or users you want to have access.



Find the file and folder location of a VMware template

posted 5 Jan 2016, 08:15 by Tristan Self   [ updated 5 Jan 2016, 08:16 ]

On the face of it this seems like something you can find in the GUI, but it's not quite that easy!


You can find it by running the following; this will provide you with the name of the template, the host it is currently on, and where the vmtx file (i.e. the template) is located.

From a VMware PowerCLI prompt run the following to find all the templates in the cluster called "Cluster1":

# Connect-VIServer <FQDN of vCenter>

# foreach($vmhost in get-cluster 'Cluster1' | get-vmhost){get-template -Location $vmhost | select name,@{n='VMHOST';e={$vmhost.name}},@{n='VMTX';e={$_.extensiondata.config.files.VmPathName}} | Format-Table -wrap }

If you need to migrate the template to a new host (but don't want to use the deploy-then-recreate-template approach, or put a host into maintenance mode), you can find its location, remove it from the inventory, and then re-add the template from its file location on a new host.
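
The removal step can be done from PowerCLI too; a sketch with a hypothetical template name (without -DeletePermanently, Remove-Template leaves the files on the datastore):

# Remove the template from the inventory only; the .vmtx and disks stay on disk,
# ready to be re-registered from the datastore browser on the new host:
Remove-Template -Template 'Win2012R2-Template' -Confirm:$false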


VMware Transparent Page Sharing - Is it on? How much RAM is it saving?

posted 2 Jan 2016, 11:30 by Tristan Self

VMware TPS (Transparent Page Sharing) is a mechanism within the hypervisor that saves RAM by storing only one copy of memory pages that match across virtual machines. For example, if you had 5 Windows 2012 R2 virtual machines running, their memory contents are going to be almost completely identical; if each VM was assigned 4GB of RAM then, rather than using 20GB, with transparent page sharing you'd be using roughly 4GB plus some small amount of additional memory (assuming the VMs are similar enough). Obviously in a large environment this saving could be quite large, meaning less physical RAM needs to be purchased.

It's worth noting, however, that inter-VM TPS has been turned off from 5.5 update 2 onwards; instead, VMs will only share memory pages if they have a matching salt. This is explained in the VMware KB article on TPS salting, and you can re-enable it if you so wish.

But how do you tell if TPS is on, and how much RAM is it saving you?

To tell if a host has TPS salting enabled (meaning TPS is, in a real sense, disabled unless you've configured the same salt for each VM), first click on the host, then "Configuration", then "Advanced Settings", then click on the "Mem" node. On the right-hand pane look for the setting mem.shareForceSalting (see the VMware KB article for more details): if it is set to 0, or it doesn't exist, TPS will be enabled. If it is set to 1 or 2, TPS is not enabled unless you have configured a per-VM salt.

  • To allow inter-VM TPS for all virtual machines after installing 5.5 update 2 or above, you need to set mem.shareForceSalting to 0 for all the ESXi hosts.

Based on what the VMware KB article says within its table of settings:

  • To allow inter-VM TPS for select virtual machines (that are safe to share memory), set mem.shareForceSalting to 1 on all the ESXi hosts, then set the sched.mem.pshare.salt setting on each of those VMs to a common value.
If you don't do this after applying the update, the default is mem.shareForceSalting set to 2, which uses the vc.uuid salt value; this will likely be blank for each VM, so a random salt is generated for each VM, thus turning off inter-VM TPS.
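
As a quick way to check the salting setting across every host in a cluster from PowerCLI (a sketch; 'Cluster1' is a placeholder for your cluster name):

Get-Cluster 'Cluster1' | Get-VMHost | ForEach-Object {
    # Read the TPS salting option from each host:
    $setting = Get-AdvancedSetting -Entity $_ -Name 'Mem.ShareForceSalting'
    "{0}: Mem.ShareForceSalting = {1}" -f $_.Name, $setting.Value
}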

To see how much RAM you are saving from an SSH console run this command:

# esxtop

Now press "m" to view the memory.

Notice at the top the line called PSHARE/MB; this line shows TPS in action. So if it said:

PSHARE/MB:    32030 shared, 3899 common: 28131 saving

This would mean that the VMs are together addressing 32030MB of RAM that is essentially identical between them; to store that 32030MB, ESXi only actually needs to keep 3899MB (i.e. the common memory), meaning that 28131MB of RAM is being saved by TPS.



Automating VMware Memory Changes

posted 2 Dec 2015, 08:45 by Tristan Self

I had to make some memory changes on a batch of VMs. I didn't want to have to manually shut down each VM, alter the memory setting and then restart it, as this would be very time consuming; instead I prepared this script.

You first need to create a CSV file, e.g. vmalterlist.csv, and place it somewhere on your machine. The format of the file should be as follows, where VMName is the name of the VM (as shown in vCenter) and TargetMemoryGB is the amount of RAM in GB you want the VM altered to. When run, the script will stop each VM (gracefully via VMware Tools where possible), change the memory and start the VMs back up again. Note: if you don't explicitly specify that the VMs should be started, they won't start.

VMName,TargetMemoryGB
wintest1,2
ubuntutest1,1

The script takes three arguments at the command line when run from PowerCLI: the vCenter FQDN, the location of the CSV file, and whether you want the VMs to be powered back up again at the end. It allows about 5 minutes for the VMs to shut down.

<#

.SYNOPSIS
This script will connect to vCenter, using the list specified shutdown each VM, alter the memory settings as per the CSV file and then start the VM again.

.DESCRIPTION
The script makes use of a connection to the vCenter server to shut down each VM in turn, alter its memory and restart it.
If a VM does not have VMware Tools installed or running, the VM will be force powered off.

.EXAMPLE
./shutVMaltermem.ps1 -VIserver <serverFQDN>  -FileNamePath <filenameandpath> -PowerOnAtEnd [Yes|No]
./shutVMaltermem.ps1 -VIserver vcenter.domain.com -FileNamePath c:\vmlist.csv -PowerOnAtEnd Yes

.NOTES
If you run the script without any arguments it will connect with the default parameters. VMs will not be started unless specified.
The CSV file should have the first line as: VMName,TargetMemoryGB; each subsequent line is a VM name followed
by the amount of memory in GB the VM should be altered to.

#>

param (
    [string]$VIserver = "vcenter.domain.com",
    [string]$filenamepath = "c:\vmalterlist.csv",
    [string]$poweronatend = "No"
)

Add-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue

# Connect to vcenter server
Write-Host "Connecting to vCenter......"  
connect-viserver $VIserver

Write-Host
Write-Host "Script is processing these VMs:"
Write-Host

# Import vm name from csv file  
Import-Csv $filenamepath |  
    foreach {
        $strVMName = $_.VMName
        $strTargetMemoryGB = $_.TargetMemoryGB
        
        # Get a view for the current VM
        $vm = Get-View -ViewType VirtualMachine -Filter @{"Name" = $strVMName}
        
        # If VM is powered on check VMTools status, otherwise move on.
        if ($vm.Runtime.PowerState -eq "PoweredOff") {
                # Write to the screen the current power state
                Write-Host $strVMName "=" $vm.Runtime.PowerState " - VM powered off, nothing to do!"
        }
        
        if ($vm.Runtime.PowerState -ne "PoweredOff") {  
            if ($vm.config.Tools.ToolsVersion -ne 0) {  
                # Write to the screen the current power state
                Write-Host $strVMName "=" $vm.Runtime.PowerState " - VMware tools installed, graceful shutdown attempted."
                
                # Graceful shutdown the VM
                Shutdown-VMGuest $strVMName -Confirm:$false  
                          
            } else {  
                # Write to the screen the current power state
                Write-Host $strVMName "=" $vm.Runtime.PowerState " - VMware tools not installed, forced shutdown attempted."
            
                # Force shutdown the VM 
                Stop-VM $strVMName -Confirm:$false
                
            }  
        }   
    }
    
Write-Host
Write-Host
Write-Host -NoNewLine "Waiting 5 minutes for all VMs to shutdown, before altering memory: "

for ($a=0; $a -le 4; $a++) {
  # Print one # per minute elapsed as a simple progress indicator.
  Write-Host -NoNewLine "#"
  Sleep 60
}
Write-Host -NoNewLine " Done!"

Write-Host
Write-Host "Altering memory on the following VMs:"
Write-Host

# Import vm name and new memory size from CSV file to perform the changes to the memory configuration on each VM.
Import-Csv $filenamepath | 
    foreach {
        $strVMName = $_.VMName
        $strTargetMemoryGB = $_.TargetMemoryGB   
              
        Write-Host $strVMName " - Memory set to:" $strTargetMemoryGB
        
        # Set the memory to the new value, and force to confirm its change.
        Set-VM -VM $strVMName -MemoryGB $strTargetMemoryGB -Confirm:$false
    }
    
# Import vm name from CSV file to restart all the VMs again, if the user requested it at the command line.
Import-Csv $filenamepath | 
    foreach {
        $strVMName = $_.VMName             
               
        # Attempt to start the VMs, if the user has requested that.
        if ($poweronatend -eq "Yes") {
            Write-Host "Starting VM: "$strVMName
            Start-VM -VM $strVMName -Confirm:$false -RunAsync
            
        }
        
    }
    
if ($poweronatend -ne "Yes") {
    # Write to the screen that no VMs will be started.
    Write-Host "Not starting any VMs at request of the user."
}

# Disconnect from vCenter.
disconnect-viserver $VIserver -Confirm:$false 

Write-Host "Script complete, disconnected from vCenter."


