Posts Tagged ‘vmware’

vSphere 5.1 – Train Wreck in Slow Motion

vSphere 5.1 arrived this summer to no great fanfare. We waited a few weeks, heard no sounds of howling pain (we did not listen very hard, I guess), and decided to proceed with upgrading vCenter.  I have been digging out of the wreckage ever since.

How do you know if upgrading to vSphere 5.1 is right for you?  Here are a few bullet points to help you decide:

  • Do you have CA-signed (externally trusted, or in-house Enterprise CA server) certificates in use in your current vSphere environment?
  • Are you using an external MS SQL Server to host your vCenter database?  Are you using mirrored SQL databases?
  • Is your environment currently stable and reliable?

If you answered “yes” to any of these questions, do not upgrade to vSphere 5.1.  At least, not yet. Do not deceive yourself that the vSphere 5.1.0a release will be any help, either.

What is the big problem, you ask?  The major source of pain in this release is the new “Single Sign-On Service” that handles authentication and authorization for all of the other vSphere components.  This component of vSphere has twitchy SSL certificate requirements that are poorly documented by VMware.  The SSL requirements are so touchy that in our case, even the self-signed certs generated by the installer did not work.  Unlike all of the other current vSphere components, it does not support mirrored SQL databases.  It has new permissions requirements in AD that are not documented at all, and at the time of our installation, did not even have a KB entry.  The installer is very buggy, most notably in that it requests that you set an admin password for the SSO Service, and demands password complexity, but it does not inform you when your password is unacceptably long (i.e. longer than 32 characters) or when your password contains illegal characters (i.e. most regular expression special characters).

So, if you do upgrade, be prepared for an extended service outage.  Give yourself a long service window.  Have your VMware support contract numbers handy.  Familiarize yourself with the myriad of locations that are used to log vCenter data.  Learn to use PowerShell (get-childitem -recurse | select-string -pattern “configSettingThatThevCenterInstallerBorkedUp”) and keep this page bookmarked:

http://derek858.blogspot.com/2012/09/vmware-vcenter-51-installation-part-1.html

Here at UVM, we are indebted to Derek Seaman for his thorough documentation of the vSphere 5.1 installation process and detailed SSL certificate generation instructions.

Following are some installation quirks that we encountered, presented mainly for my own reference, but maybe you will find them useful as well:

  1. “Performance Charts Experienced an Internal Error” seen in the vSphere client after the upgrade:
    This happened because vCenter Web Services did not read the database mirroring configuration from our defined ODBC data sources… it grabbed the primary database only, and not the mirror data.  The fix?  Edit:
    “%ProgramData%\VMware\VMware VirtualCenter\vcdb.properties”
    Find the “url=” line, and append:
    ;failoverPartner\=[mirrorServer]
    (Where [mirrorServer] is the actual DB mirror host name.  Don’t forget the “\” before the “=”.)
  2. Some users with permissions to vCenter 5.0 cannot log in after the upgrade.  In the vSphere web client, these users are marked as “disabled”:
    This occurred for us for two reasons:

    1. The SSO Service installer prompts us for a service account to use during install.  Following installation, the service is seen to be running as “SYSTEM”, and not the specified service account.  Change the Service to run with your planned service account using services.msc after the installation.  As an alternative, you could specify those credentials  in the vSphere Web Client -> Administration ->Sign-On and Discovery -> Configuration -> Identity Sources.  Edit your identity source, and under “Authentication Source” select “password”, then enter your service account credentials.
    2. The SSO Service needs to read account attributes that cannot be read by a standard user account (at least, not in an AD forest at a Server 2008 R2 functional level).  When we asked VMware support to define the required permissions, they replied: “an account has to have at least read-only permissions over the user and group Organization Units furthermore read permissions also on the properties of the users, such as UserAccessControl.”  After some experimentation, I just gave the SSO Service account “read all properties” rights to the account OU, and login abilities were restored.
  3. Our SSO Service broke when the mirrored database servers that we currently use for vCenter services had a failover event.  During install, I used the standard “failoverPartner=” JDBC connection string property to specify our failover database server.  Unfortunately, the SSO service ignores this property.  I could not identify an acceptable workaround for this problem. Ultimately, I installed a SQL Express instance on our vCenter server to house just the SSO database.  Before settling on that, I tried:
    1. Using SQL Aliases, but this failed because the JDBC driver is not aware of SQL Aliases.
    2. Using a script that edits the local “hosts” file on a database failover event.  I then used this host name alias for the database connections.  This almost worked.  I edited the following files to use the host alias, instead of the actual database server host name:
      %ProgramFiles%\VMware\Infrastructure\SSOServer\webapps\ims\WEB-INF\classes\jndi.properties
      and:
      %ProgramFiles%\VMware\Infrastructure\SSOServer\webapps\lookupservice\WEB-INF\classes\config.properties
      Upon restart, the SSO Service was able to connect to the database, but it did not survive a failover.  Apparently the old database connection information was still in use somewhere, and VMware support was not helpful in identifying all of the database configuration locations for SSO.
    3. While VMware does have command line configuration tools that could have been used to script reconfiguration of the database connection strings, I have deemed that they are too fragile for production use.
  4. The option to authenticate using Windows session credentials in the vSphere Client (traditional version) stopped working after the 5.1 upgrade.  This is a bug that is fixed with the 5.1.0a release.  Unfortunately, the SSO installer for 5.1.0a does not work in upgrade mode.  Aargh!  I had to uninstall the SSO service to get the updated files into place.  Guess what the uninstaller does?  That’s right… it erases the SSO Service database (drops all tables!  Gah!), and deletes all configuration files for the service.  Before you upgrade, make sure that you have an SSO Service backup bundle.  I did, but it was outdated.  I had to re-register all of the vCenter components with SSO manually, which was a pain in the butt.
  5. vSphere Update Manager registered with vCenter using the wrong DNS name.  We could not scan ESXi hosts for updates, because vCenter was telling them to connect to an invalid URL.  To fix, I needed to search the registry for the incorrect host name, and replace with the correct one:
    “HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\VMware, Inc.\VMware Update Manager\VUMServer”
    For good measure I also edited:
    %ProgramFiles(x86)%\VMware\Infrastructure\Update Manager\extension.xml
    To contain the correct host name.  Then I restarted the Update Manager services, and we were back in business.
  6. Other fun related to VMware Update Manager… the SQL Account used by Update Manager cannot have a password that exceeds 24 characters in length. Special characters in the SQL Account password also may cause problems.
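Since neither installer warns about passwords it cannot handle, a pre-flight check can save a failed install.  The following is only a rough sketch: the function name is made up for illustration, the 32-character default comes from the SSO limit mentioned above (pass 24 for the Update Manager SQL account), and using [regex]::Escape to spot “regular expression special characters” is my own assumption, since VMware does not publish the actual list of rejected characters.

```powershell
# Hypothetical helper: flags passwords that the installers may silently choke on.
function Test-InstallerPassword {
    param(
        [Parameter(Mandatory = $true)][string]$Password,
        [int]$MaxLength = 32    # 32 for the SSO admin password, 24 for the Update Manager SQL account
    )
    $problems = @()
    if ($Password.Length -gt $MaxLength) {
        $problems += "longer than $MaxLength characters"
    }
    # Assumption: anything that .NET would escape in a regex (metacharacters, whitespace, '#') is risky.
    if ([regex]::Escape($Password) -cne $Password) {
        $problems += 'contains regex metacharacters or whitespace'
    }
    New-Object PSObject -Property @{
        Ok       = ($problems.Count -eq 0)
        Problems = $problems
    }
}

# Example: check a candidate Update Manager SQL password against the 24-character limit.
# (Test-InstallerPassword -Password 'S0meCandidatePassw0rd' -MaxLength 24).Ok
```

A password that passes this check may still be rejected for reasons not covered here, so treat a clean result as a hint, not a guarantee.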

So, VMware is not my favorite company this month.  On to solve more problems.  We still cannot add new permissions to vCenter, and Performance Charts are loading like a slug in winter.

VMware Performance Charts broken again – fix your connection string.

Following upgrade of our Virtual Center server to the vSphere 5 version, we have been struggling with crashing services and memory exhaustion.  Well, the server got an upgrade from 8 GB to 16 GB of RAM this morning, so it is now swimming in excess memory.  Despite this, the Perf Charts in vCenter have gone dead again.

We were seeing errors similar to those described here:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1012812

We got the “perf charts service experienced an internal error” message when looking at the Performance tab for any object in the vSphere Client.  A look at the latest “vctomcat-stderr[...].log” file in “C:\Program Files\VMware\Infrastructure\tomcat\logs” reveals JDBC connection errors.  The damn service is trying to connect to the standby partner of our SQL database mirrored pair!  So where are the connection strings stored for this service?  Well, no thanks to VMware documentation, I discovered the connection string stored here:
C:\ProgramData\VMware\VMware VirtualCenter\vcdb.properties

All I had to do was append “;failoverPartner=[hostName]” to the line starting with “url=”, then restart the tomcat service (the “VMware VirtualCenter Management WebServices” service).  Voilà… performance reports are back.
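If you need to repeat this edit after every upgrade, it may be worth scripting.  Here is a minimal sketch of the append step, not a tested production tool: the function name and the -EscapeEquals switch are my own inventions, and the mirror host name in the usage comments is a placeholder.  The switch exists because the vSphere 5.1 version of this fix, described elsewhere on this page, writes the separator as “\=” instead of “=”.

```powershell
# Hypothetical helper: appends the failoverPartner property to the "url=" line of vcdb.properties.
function Add-FailoverPartner {
    param(
        [Parameter(Mandatory = $true)][AllowEmptyString()][string[]]$Lines,
        [Parameter(Mandatory = $true)][string]$MirrorServer,
        [switch]$EscapeEquals    # emit "\=" instead of "="
    )
    $separator = if ($EscapeEquals) { '\=' } else { '=' }
    foreach ($line in $Lines) {
        # Only touch the url= line, and only if it has no failover partner yet.
        if ($line.StartsWith('url=') -and $line -notmatch 'failoverPartner') {
            $line + ';failoverPartner' + $separator + $MirrorServer
        } else {
            $line
        }
    }
}

# Example usage (back up the file first; the host name is a placeholder):
# $path = Join-Path $env:ProgramData 'VMware\VMware VirtualCenter\vcdb.properties'
# Copy-Item $path "$path.bak"
# $updated = Add-FailoverPartner -Lines (Get-Content $path) -MirrorServer 'sqlmirror.mydomain.org'
# $updated | Set-Content $path
```

The function leaves a line alone if it already mentions failoverPartner, so running it twice should not stack up duplicate entries.  You still need to restart the tomcat service afterwards.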

Now back to fixing everything else that is broken… also known as “everything”.

VMware Video Problems on Server 2008 R2

Jumpy mouse and slow console performance on your Server 2008 R2 ESX guests? It is likely that the VMware Tools installer did not activate the VMware video driver. See the following KB:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1011709

The key is to upgrade your current video driver using the one located in:
C:\Program Files\Common Files\VMware\Drivers

Discovering orphaned vmdk files in vSphere

On occasion we have found abandoned vmdk files in our vSphere infrastructure. I often have thought we needed to take some time to hunt down and exterminate these orphans. As is often the case, someone else already did the initial research required to make automation of this task possible, but I found I needed to do some updating of the source scripts for improved accuracy, improved formatting, and compatibility with vSphere 4.1:

# getOrphanVMDK.ps1
# Purpose : List all orphaned vmdk on all datastores in all VC's
# Version : v2.0
# Author  : J. Greg Mackinnon, from original by HJA van Bokhoven
# Change  : v1.1  2009.02.14  DE  adapted for ESX 3.5, send email and output file size
# Change  : v1.2  2011.07.12 EN  Updated for ESX 4, collapsed if loops into single conditional
# Change  : v2.0  2011.07.22 EN: 
	# Changed vmdk search to use the VMware.Vim.VmDiskFileQuery object to improve search accuracy
	# Change vmdk matching logic as a result of VmDiskFileQuery usage
	# Pushed discovered orphans into an array of custom PS objects
	# Simplified logging and email output
			
Set-PSDebug -Strict

#Initialize the VIToolkit:
add-pssnapin VMware.VimAutomation.Core
[Reflection.Assembly]::LoadWithPartialName("VMware.Vim")

#Main

[string]$strVC = "myViServer.mydomain.org"								# Virtual Center Server name
[string]$logfile = "c:\local\temp\getOrphanVMDK.log"
[string]$SMTPServer = "mysmtp.mydomain.org"							# Change to a SMTP server in your environment
[string]$mailfrom = "GetOrphanVMDK@myViServer.mydomain.org"	# Change to email address you want emails to be coming from
[string]$mailto = "vmware@mydomain.org"							# Change to email address you would like to receive emails
[string]$mailreplyto = "vmware@mydomain.org"						# Change to email address you would like to reply emails

[int]$countOrphaned = 0
[int64]$orphanSize = 0

# vmWare Datastore Browser query parameters
# See http://pubs.vmware.com/vi3/sdk/ReferenceGuide/vim.host.DatastoreBrowser.SearchSpec.html
$fileQueryFlags = New-Object VMware.Vim.FileQueryFlags
$fileQueryFlags.FileSize = $true
$fileQueryFlags.FileType = $true
$fileQueryFlags.Modification = $true
$searchSpec = New-Object VMware.Vim.HostDatastoreBrowserSearchSpec
$searchSpec.details = $fileQueryFlags
#The .query property is used to scope the query to only active vmdk files (excluding snaps and change block tracking).
$searchSpec.Query = (New-Object VMware.Vim.VmDiskFileQuery)
#$searchSpec.matchPattern = "*.vmdk" # Alternative VMDK match method.
$searchSpec.sortFoldersFirst = $true

if ([System.IO.File]::Exists($logfile)) {
    Remove-Item $logfile
}

#Time stamp the log file
(Get-Date -Format "yyyy-MM-dd HH:mm:ss") + "  Searching Orphaned VMDKs..." | Tee-Object -Variable logdata
$logdata | Out-File -FilePath $logfile -Append
#Connect to vCenter Server
Connect-VIServer $strVC

#Collect array of all VMDK hard disk files in use:
[array]$UsedDisks = Get-View -ViewType VirtualMachine | % {$_.Layout} | % {$_.Disk} | % {$_.DiskFile}
#The following three lines were used before adding the $searchSpec.query property.  We now want to exclude template and snapshot disks from the in-use-disks array.
# [array]$UsedDisks = Get-VM | Get-HardDisk | %{$_.filename}
# $UsedDisks += Get-VM | Get-Snapshot | Get-HardDisk | %{$_.filename}
# $UsedDisks += Get-Template | Get-HardDisk | %{$_.filename}

#Collect array of all Datastores:
#$allDS is a list of datastores, filtered to exclude ESX local datastores (all of which end with "-local1" in our environment), and our ISO storage datastore.
[array]$allDS = Get-Datastore | select -property name,Id | ? {$_.name -notmatch "-local1"} | ? {$_.name -notmatch "-iso$"} | Sort-Object -Property Name

[array]$orphans = @()
Foreach ($ds in $allDS) {
	"Searching datastore: " + [string]$ds.Name | Tee-Object -Variable logdata
	$logdata | Out-File -FilePath $logfile -Append
	$dsView = Get-View $ds.Id
	$dsBrowser = Get-View $dsView.browser
	$rootPath = "["+$dsView.summary.Name+"]"
	$searchResult = $dsBrowser.SearchDatastoreSubFolders($rootPath, $searchSpec)
	foreach ($folder in $searchResult) {
	    foreach ($fileResult in $folder.File) {
			if ($UsedDisks -notcontains ($folder.FolderPath + $fileResult.Path) -and ($fileResult.Path.length -gt 0)) {
				$countOrphaned++
				IF ($countOrphaned -eq 1) {
					("Orphaned VMDKs Found: ") | Tee-Object -Variable logdata
					$logdata | Out-File -FilePath $logfile -Append
				}
				$orphan = New-Object System.Object
				$orphan | Add-Member -type NoteProperty -name Name -value ($folder.FolderPath + $fileResult.Path)
				$orphan | Add-Member -type NoteProperty -name SizeInGB -value ([Math]::Round($fileResult.FileSize/1gb,2))
				$orphan | Add-Member -type NoteProperty -name LastModified -value ([string]$fileResult.Modification.year + "-" + [string]$fileResult.Modification.month + "-" + [string]$fileResult.Modification.day)
				$orphans += $orphan
				$orphanSize += $fileResult.FileSize
				$orphan | ft -autosize | out-string | Tee-Object -Variable logdata
				$logdata | Out-File -FilePath $logfile -Append
				[string]("Total Size of orphaned files: " + ([Math]::Round($orphanSize/1gb,2)) + " GB") | Tee-Object -Variable logdata
				$logdata | Out-File -FilePath $logfile -Append
				Remove-Variable orphan
			}
		}
	}
}
(Get-Date -Format "yyyy-MM-dd HH:mm:ss") + "  Finished (" + $countOrphaned + " Orphaned VMDKs Found.)" | Tee-Object -Variable logdata
$logdata | Out-File -FilePath $logfile -Append

if ($countOrphaned -gt 0) {
	[string]$body = "Orphaned VMDKs Found: `n"
	$body += $orphans | Sort-Object -Property LastModified| ft -AutoSize | out-string
	$body += [string]("Total Size of orphaned files: " + ([Math]::Round($orphanSize/1gb,2)) + " GB")
    $SmtpClient = New-Object system.net.mail.smtpClient
    $SmtpClient.host = $SMTPServer
    $MailMessage = New-Object system.net.mail.mailmessage
    $MailMessage.from = $mailfrom
    $MailMessage.To.add($mailto)
    $MailMessage.replyto = $mailreplyto
    $MailMessage.IsBodyHtml = 0
    $MailMessage.Subject = "Info: VMware orphaned VMDKs"
    $MailMessage.Body = $body
	"Mailing report... " | Tee-Object -Variable logdata
	$logdata | Out-File -FilePath $logfile -Append
    $SmtpClient.Send($MailMessage)
}
Disconnect-VIServer -Confirm:$False

Adding Drivers to the built-in Windows Recovery Environment

Windows 7 and Windows 2008 R2 feature an out-of-box installation of the very useful Windows Recovery Environment (WinRE).  WinRE can save your buttocks… but what if your system is using storage drivers that are not available in the out-of-box WinRE environment?  Such as the VMware Paravirtual SCSI driver (PVSCSI)?

Fortunately, WinRE is just a modified WinPE image, so you can add drivers using DISM.exe, right?  Sure… if you can find the WinPE image that is used by WinRE!  Fortunately, there is a tool for this.

Open a command prompt on your Server 2008 R2 system, and run “REAgentC.exe /info”… the output will tell you where to find the image:

Recovery Environment: \\?\GLOBALROOT\device\harddisk0\partition2\Recovery\bb338b68-0d2c-11df-be64-84e1223bd0bb
BCD Id: bb338b68-0d2c-11df-be64-84e1223bd0bb

So, on the second partition of the first disk (also known as the “C:” drive, according to “Diskpart”), you will find a hidden “recovery” directory, with subdirectory “bb338b68-0d2c-11df-be64-84e1223bd0bb”.  Within here is “winre.wim”.
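If you have to do this on more than one server, reading the disk, partition, and GUID out of the REAgentC output by eye gets old fast.  The sketch below pulls them out with a regular expression.  The function name is hypothetical, and the pattern assumes the output looks like the sample above; other Windows versions may format it differently.

```powershell
# Hypothetical helper: extracts the disk number, partition number, and relative
# winre.wim path from the text printed by "REAgentC.exe /info".
function Get-WinRELocation {
    param([Parameter(Mandatory = $true)][string]$InfoText)
    $pattern = 'harddisk(?<disk>\d+)\\partition(?<part>\d+)\\(?<path>Recovery\\[0-9a-fA-F-]+)'
    if ($InfoText -match $pattern) {
        New-Object PSObject -Property @{
            Disk        = [int]$Matches['disk']
            Partition   = [int]$Matches['part']
            RelativeWim = $Matches['path'] + '\winre.wim'
        }
    }
}

# Example usage on a live system:
# Get-WinRELocation -InfoText ((REAgentC.exe /info) -join "`n")
```

The function returns nothing when the pattern does not match, so check for an empty result before building a DISM command line from it.  You still need to work out which drive letter the reported partition maps to.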

Now there is simply the matter of injecting the drivers. First, place all the drivers you wish to inject into an easily accessible directory (such as c:\local\temp, in our example), and then run the following commands:

mkdir c:\wimtemp
dism /mount-wim /WimFile:c:\recovery\bb338b68-0d2c-11df-be64-84e1223bd0bb\winre.wim /index:1 /mountdir:c:\wimtemp
dism /image:c:\wimtemp /add-driver /driver:C:\local\temp /recurse
dism /unmount-wim /mountdir:c:\wimtemp /commit
rmdir /q c:\wimtemp

Et voilà! We press “F8” on next reboot, select the recovery environment, and suddenly we have full access to the local disk.

Fixing The Ever-Crashing VMware Virtual Center Service

Here is a post on fixing the irritating unreliability of our VMware VirtualCenter Service (VPXD).  Here we are running the latest-and-greatest from VMware (vSphere 4.0 Update 1), and our Virtual Center still cannot ride out a SQL database failover.  A tiny loss in connectivity between the VPXD process and its remote SQL database, and the service faults.  It does not outright stop, sadly, since we could configure it to auto-restart… it just stops working and never reconnects to the database.

You would think that there would be gripes about this all over the Internet, but it is not so commonly complained about as you might think.  Fortunately I found a lead today:

http://communities.vmware.com/message/1332356

The solution proposed by “embo500” is to trigger a PowerShell script when an “EventID 1000” gets registered by the VPXD service.  This is more or less what I thought we were going to have to do.  I was hoping there were some data source or virtual center service settings we could throw that would mitigate the problem, but apparently not.

FWIW, here is the code snippet provided by embo in the thread above:

$logentry = Get-EventLog -LogName Application | Where {$_.EventId -eq "1000"} | Where {$_.Source -eq "VMWare VirtualCenter Server"} | Select -First 1
if ($logentry.Message -match "ODBC error")
{
    if ($logentry.Message -match "SHUTDOWN is in progress")
    {
        Start-Sleep -s 30
        Start-Service vpxd
    }
}

The code provides a good starting place.  However, I think a better approach might be to run this command:

get-viserver

If you get a successful connection result, all is well.  If not, then you need to cycle VPXD.  You have to have the VMware PowerShell modules loaded for this to work.  However, it turns out that none of this scripting will likely be needed as work on our SQL infrastructure has changed the game.
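For the record, the connection-probe idea could be sketched roughly as follows.  This is untested against a real failover.  The function name is made up, and the example wiring uses Connect-VIServer with -ErrorAction Stop as the probe, which is my assumption about how best to force a failed connection to register as an error.  The server name is a placeholder.

```powershell
# Hypothetical watchdog: run a probe, and run a recovery action only if the probe throws.
function Invoke-ServiceWatchdog {
    param(
        [Parameter(Mandatory = $true)][scriptblock]$Probe,
        [Parameter(Mandatory = $true)][scriptblock]$Recover
    )
    try {
        & $Probe | Out-Null
        return 'healthy'
    }
    catch {
        # try/catch only sees terminating errors, so the probe must throw on failure.
        & $Recover | Out-Null
        return 'recovered'
    }
}

# Example wiring (requires the VMware PowerShell snap-in to be loaded):
# Invoke-ServiceWatchdog `
#     -Probe   { Connect-VIServer 'myViServer.mydomain.org' -ErrorAction Stop } `
#     -Recover { Restart-Service vpxd }
```

Run from a scheduled task, this would cycle VPXD whenever a connection attempt fails, whatever message lands in the event log.  The trade-off is that any connection failure, including a network blip or expired credentials, would also trigger a restart.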

We had an additional problem as well… I just converted from a SQL Server 2005 failover cluster based on MSCS to a SQL Server 2008 mirrored database model.  Unfortunately, I could not get the Virtual Center service to respect the failover node specified in the data source selector.  Aargh!  I tried enabling the SQL Server Browser service on the database servers to see if it would help the VC Server make connectivity (and also the firewall ports required for VC to reach the browser service).  This was ineffective, as was disabling the named pipe SQL Client transport, as suggested in other forums.

In the end the problem was resolved by using the SQL Native Client Data Source setup tool to test the datasource during a mirror failover.  The connectivity test failed with a permissions error!  Why?  Well, the VMware Virtual Center requires the use of SQL authentication.  When we set this up on our original SQL 2005 failover cluster, the database account used by the VC Service was mapped to the local database “dbo” user (this is the default config for vCenter).  Guess what?  That does not work in a mirror config.  The SQL logon account to database account mapping is per server.  Once I set up a separate account in the database for the virtual center account (and assigned it the “dbo” role), the data source started working.

Even better news… we now find that Virtual Center rides out mirrored database failover events.  I was able to swing primary/mirror roles between our data centers several times without Virtual Center even noticing.  So much for the old problems of Microsoft failover clusters.