Archive for the ‘Sharepoint’ Category

Coping with Renamed User Accounts in SharePoint

Yesterday I received a strange error report from a person trying to create a new SharePoint site collection.  Our front line guy went to investigate and found that she was getting a “User cannot be found” error out of SharePoint when attempting to complete the self-service site creation process.  This person reported that her last name changed recently, along with her user ID, yet SharePoint was still showing her as logged in under her old name.

Linking the “Correlation ID” up to the diagnostic logs was of no great help.  The diagnostic logs simply reported “User cannot be found” when executing the method “Microsoft.SharePoint.SPSite.SelfServiceCreateSite”.  We could see that “ownerLogin”, “ownerEmail”, and “ownerName” strings were being passed to this function, but not what the values of those strings were.  I guessed that the web form was passing the person’s old account login name to the function, and that since this data was no longer valid, an error was getting displayed.  But how to fix this?

SharePoint 2010 (and WSS 3.0 before it) keeps a list of Site Users that can be accessed using the SharePoint Web “SiteUsers” property. This list is updated every time a new user logs in to the site.  The list entries contain username, login identity, email address, and security ID (SID) data.  It also appears that Site User data is not updated when user data changes in Active Directory (as long as the SID stays the same, that is).  Additional user account data is stored as XML in the SharePoint databases, and can be accessed using the SharePoint Web “SiteUserInfoList” property.  All of this data needs to be purged from the root web site so that our hapless user can once again pass valid data to the SelfServiceCreateSite method.

Presumably the Site Management tools could be forced to get the job done, but the default views under SharePoint 2010 are hiding all site users from me, even when I log in as a site administrator.  Let’s try PowerShell instead:

add-pssnapin microsoft.sharepoint.powershell 
$root = get-spweb -identity "https://sharepoint.uvm.edu/" 

# "Old ID" below should be all or part of the user's original login name: 
$oldAcc = $root.SiteUsers | ? {$_.userLogin -match "oldID"} 
#Let's see if we found something: 
$oldAcc.LoginName 

#Remove the user from the web's SiteUsers list: 
$root.SiteUsers.Remove($oldAcc.LoginName) 
$root.Update() 
#Let's see if it worked: 
$id = $oldAcc.ID 
$root = get-spweb -identity "https://sharepoint.uvm.edu/" 
$root.SiteUsers.GetByID($id) 
# (This should return a "User cannot be found" error.) 

#Now to see what is in SiteUserInfoList: 
$root.SiteUserInfoList.GetItemById($id) 
# (This data can be cleaned up in the browser by visiting:
# "/_catalogs/users/simple.aspx"
# from your site collection page.)

SharePoint 2010 – Email Alerts to Site Administrators

We are in the final stages of preparation for the long-overdue upgrade to SharePoint 2010.  I have set up a preview site with a copy of the production SharePoint content database, and I want to notify all site owners that they should check out their sites for major problems.  How to do it?  PowerShell?  Absolutely!


Set-PSDebug -Strict
Add-PSSnapin -Name microsoft.SharePoint.PowerShell

[string] $waUrl = "https://sharepoint2010.uvm.edu"
[string] $SmtpServer = "smtp.uvm.edu"
[string] $From = "saa-ad@uvm.edu"

$allAdmins = @()

[string] $subjTemplate = 'Pending Upgrade for your site -siteURL-'
[string] $bodyTemplate = @"
Message Body Goes Here.
Use the string -siteURL- in the body where you want the user's site address to appear.
"@

$wa = Get-SPWebApplication -Identity $waUrl

foreach ($site in $wa.sites) {
	#Write-Host "Working with site: " + $site.url
	$siteAdmins = @()
	$siteAdmins = $site.RootWeb.SiteAdministrators
	ForEach ($admin in $siteAdmins) {
		#Write-Host "Adding Admin: " + $admin.UserLogin
		[string]$a = $($admin.UserLogin).Replace("CAMPUS\","")
		[string]$a = $a.replace(".adm","")
		[string]$a = $a.replace("-admin","")
		[string]$a = $a.replace("admin-","")
		if ($a -notmatch "sa_|\\system") { $allAdmins += , @($a; [string]$site.Url) }
	}
	$site.Dispose()
}

$allAdmins = $allAdmins | Sort-Object -Unique
#$allAdmins = $allAdmins | ? {$_[0] -match "jgm"} | Select-Object -Last 4

foreach ($admin in $allAdmins) {
	[string] $to = $admin[0] + "@uvm.edu"
	[string] $siteUrl = $admin[1]
	[string] $subj = $subjTemplate.Replace("-siteURL-",$siteUrl)
	[string] $body = $bodyTemplate.Replace("-siteURL-",$siteUrl)
	Send-MailMessage -To $to -From $From -SmtpServer $SmtpServer -Subject $subj -Body $body -BodyAsHtml
}

SharePoint 2010 – Authentication and Browser Support Planning

We have gone back to the drawing board with SharePoint 2010 planning, and now are challenging some ideas about how authentication must be configured for SharePoint to work with our clients.  Previously, we felt the need to provide multiple supported authentication types (Windows, Basic, and Forms) hosted on different IIS web sites, with unique URLs (sharepoint, sharepointlite, partnerpoint).  We also felt the need to have a version of SharePoint with “Client Integration” features enabled, and one with these features disabled (sharepointlite).  With changes in SharePoint 2010, is this really necessary?

First, let’s look at the Windows vs. Basic assumption.  Are there any browsers out there that do not support Windows/NTLM authentication?  In fact, there are… the Android mobile browser and, um…, well, there do not appear to be any others.  However, it is not necessary to make a separate web site available to allow basic authentication.  If we enable basic auth on an existing Windows-auth web site, browsers that do not support Windows auth suddenly start working. (Update, 2013-02:  Chrome for Android now supports NTLM authentication, so there appears to be no need to support Basic authentication at all at this point in time.)
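For the record, enabling Basic auth alongside Windows auth on an existing IIS 7 site is a one-property change.  Here is a rough sketch using the WebAdministration PowerShell module; the site name "SharePoint - 80" is a placeholder, not our actual site name:

```powershell
# Sketch: allow Basic authentication alongside Windows (NTLM) auth on an
# existing IIS 7 site. "SharePoint - 80" is a placeholder site name.
# Basic auth sends credentials essentially in the clear, so require SSL.
Import-Module WebAdministration

Set-WebConfigurationProperty -PSPath 'IIS:\' -Location 'SharePoint - 80' `
    -Filter 'system.webServer/security/authentication/basicAuthentication' `
    -Name enabled -Value $true
```

Windows authentication stays enabled as-is; browsers that can do NTLM keep using it, and the rest fall back to Basic.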

Now, let’s look at the client integrated vs. dumbed-down assumption.  Previously, we wanted to ensure the clients handled links to MS Office documents stored in SharePoint in a predictable fashion.  Users of Firefox and Safari frequently complained about “strange” document handling behavior in SharePoint links.  For people experiencing confusion caused by failed attempts to launch Office applications from SharePoint browser links, we exposed a version of SharePoint that had the client integration features disabled.  HOWEVER, under SharePoint 2010, the client integration features are much more reliable.  On Windows, I am able to make use of Office 2010 client integration in both IE9 and Firefox.  On the Mac, client integration works with Office 2011 Mac and Safari 5 or FireFox 8.  Additionally, SharePoint will detect if a browser cannot support client integration, and will disable Office integration links automatically.  For example, the “Open in Word” link is greyed-out in my Chrome browser, while the “Download file” link is active.

Add to this the new mobile web version of SharePoint 2010.  If you connect from a browser listed in the “compat.browser” file on the SharePoint web server, you get directed to a light-weight mobile version of SharePoint instead.  See: http://blogs.technet.com/b/office2010/archive/2010/03/09/configure-sharepoint-server-2010-for-mobile-device-access.aspx for more details on how “compat.browser” works.  This version will use Office Web Apps to render Office documents, but editing will not be possible.  This grants mobile users a functional (if somewhat limited) access method for SharePoint content, while at the same time sidestepping the issue of client integration.  It also means that we do not have to deal with the hassle of attempting to ensure that Office Web Apps will work in a myriad of underpowered mobile browsers.
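For reference, the redirection decision is driven by the “isMobileDevice” capability in the browser definition entries of compat.browser.  A representative entry looks roughly like this (the id and userAgent values are illustrative, not copied from our server); setting the capability to “false” keeps that device on the full site:

```xml
<!-- Illustrative compat.browser fragment; id/userAgent values are examples. -->
<browser id="iPhoneSafari" parentID="AppleSafari">
  <identification>
    <userAgent match="iPhone" />
  </identification>
  <capabilities>
    <capability name="isMobileDevice" value="true" />
  </capabilities>
</browser>
```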

All things considered, things are looking bright for SharePoint 2010.  It seems we no longer will need a “lite” version of SharePoint, and we will not need two URLs to support the varying authentication needs of legacy browsers.

NTLMv2 – Troubleshooting Notes – NTLM Negotiation is a lie!

We are now in the process, several years late, of trying to disallow use of LM and NTLM authentication on our campus. First attempt? Train Wreck!

For stage 1 of enforcement we chose to restrict our domain controllers to accept only NTLMv2 authentication, and reject LM/NTLM. Our thinking was that clients are supposed to negotiate for the best auth security available, so by forcing NTLMv2 at the DC, we would be able to quickly flush out non-compliant clients. Further, we assumed that there would not be a lot of non-compliant clients. Guess what? We were WRONG, WRONG, WRONG.

You know which clients are non-compliant? Windows 7, IE8, IE9, FireFox 4b10, Chrome 9. What? Really? It’s true… when accessing IIS web servers configured to use Windows authentication, with NTLMv2 enforced on the authenticating server, even Windows 7 will negotiate down to the weakest possible NTLM level.   So… access to SharePoint 2007 from the IE9 beta on Windows 7 falls back to welcome-to-the-80s LM authentication.  This is very shocking, and annoying. When inspecting authentication negotiation traffic to the web server, we see that both client and server have indicated that NTLMv2 is supported, and that LM should not be used. However, when the client sends its authentication challenge response, it sends NTLM and LM responses only, and not NTLMv2. Only by setting local security policy on the Windows client to “NTLMv2 only” are we able to prevent the sending of LM/NTLM. Negotiation is a lie!
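That client-side policy corresponds to the LmCompatibilityLevel registry value (the “Network security: LAN Manager authentication level” Group Policy setting).  A sketch of setting it directly on a client, assuming no GPO is already managing the value:

```powershell
# Sketch: force a Windows client to send NTLMv2 responses only (never LM/NTLM).
# Level 3 = "Send NTLMv2 response only"; level 5 (what we set on the DCs)
# additionally refuses inbound LM and NTLM. Assumes no Group Policy object
# is already managing this value.
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Lsa' `
    -Name LmCompatibilityLevel -Value 3 -Type DWord
```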

So, we are going to take this up with MS Support. Some resources that may be helpful going forward:

http://technet.microsoft.com/en-us/magazine/2006.08.securitywatch.aspx
A good explanation of how the “LMCompatibilityLevel” security setting works.  Contains one major factual error… it claims that clients and servers will always negotiate the highest mutually-supported LM auth level.  This may be true for SMB/CIFS.  It certainly is not true for web auth.

http://kb.iu.edu/data/bamu.html
Over at Indiana University, LM/NTLM has been disabled for some time.  They have thorough documentation on configuring workstations to function properly in their more restrictive environment.  This page concerns the use of SharePoint at IU, and the required configuration settings for various browsers and operating systems for their network.  There is nothing specific to our problem here, but there is the implication that they experienced problems similar to our own.  All of their docs state that you must configure your client to perform NTLMv2 only, or authentication will fail.  Also included are notes on FireFox and Mac platforms, where client-side configuration apparently is required.
http://kb.iu.edu/data/atpq.html
An additional page that suggests that network clients will not be able to authenticate to IU resources unless they enable NTLMv2 on the client.

Concerning FireFox on Windows:
http://forums.mozillazine.org/viewtopic.php?f=23&t=1648755
A developer notes that as of FF 3.6, FF uses Microsoft APIs for NTLM authentication.  So, if FF is not performing NTLM auth as you expect, you need to look to your OS for the source of the problem.  It’s true… we confirmed via packet capture that FF 4.0b10 and IE9RC both perform NTLM authentication in exactly the same way.

Concerning Macintosh client settings:
http://discussions.apple.com/thread.jspa?threadID=2369451
Recommended smb.conf file settings to enforce use of NTLMv2 on the Mac.  References are made here to an “nsmb.conf” file, which is not a typographic error.  More on this file here:
http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man5/nsmb.conf.5.html

Office Web Apps and SharePoint 2010

Office web apps are here!

Where do I get them? From Microsoft Volume Licensing of course:
http://licensing.microsoft.com
You need to look in an unexpected place, though. Under the “Office” section, select “Office Professional Plus 2010”, then select the 32-bit edition (not 64-bit… weird!). Take note of the Web Apps product key on the product selection page.

Installation instructions can be found here:
http://technet.microsoft.com/en-us/library/ff431687.aspx#
And they are pretty good, with the exception of one PowerShell syntax error…

When you get to the stage of installing Service Application Proxies, if you choose to use the PowerShell method, you are given a small script that starts with the following command:

$appPool = Get-SPServiceApplicationPool -Name "SharePoint Web Services Default"

This command will not work as posted. “-Name” should be “-Identity”. A more reliable command might read:

$appPool = Get-SPServiceApplicationPool | where-object {$_.Name -match "SharePoint Web Services Default"}

Preparing the SharePoint 2007/WSS3 Database for Upgrade

  1. Run pre-upgrade check:
    stsadm -o preupgradecheck
    (to be run on the WSS 3 server… requires WSS 3.0 SP2 and October 2008 CU).
    Now, repair any problems that were reported…
  2. Delete orphaned sites:
    1. Run: stsadm -o enumallwebs
      – find sites with “InSiteMap” set to “false”.
    2. Take note of the SiteID GUID, then run:
      stsadm -o deletesite -force -siteid [siteid] -databaseserver [dbserver] -databasename [dbname]
  3. Delete references to missing web parts. The upgrade check report tells us:
    “The following web part(s) are referenced by the content, but they are not installed on the web server Id = dcdbbbd0-8dd6-1ecb-a3b2-12d30061d482, Type = Unknown, Reference = 7, Status = Missing …”
    but there is no particular advice on how to fix the problem. I wasted a few hours trying to enumerate web parts in use by a site using PowerShell, and ended up digging around in the content database using Transact-SQL… here is what works:

    1. Run the following against the content database:
      use [contentDatabaseName]
      select DirName,LeafName from dbo.AllDocs where id in
      (select tp_PageUrlID from dbo.WebParts where
      (tp_WebPartTypeID='dcdbbbd0-8dd6-1ecb-a3b2-12d30061d482')
      OR (tp_WebPartTypeID='d5101cfe-e315-c578-cd06-1966f283e3ed')
      OR (tp_WebPartTypeID='602e7431-ac3e-75b9-c8e0-57533bdab161'))
    2. Access the page returned by the above query, appending “?Contents=1”.  Delete any web parts reporting errors.
  4. We are informed that various features are referenced by content, but that the features are not available on the server.  We are given only feature IDs.  Back to SQL:
    • use STSContentDBPrime
      select FullUrl from [Webs] where Id in (
      select WebId from [Features] where (FeatureId = 'bbe9def7-2fe9-a0b1-d712-aa128c837ebe')
      OR (FeatureId='bbe9def7-2fe9-a0b1-d7bb-aa128c837ebe')
      )
    • This returns the sites that are using these “bad features”.  But what to do about it?
    • http://www.gilham.org/Blog/Lists/Posts/Post.aspx?ID=229 suggests the use of “WSSAnalyzeFeatures” and “WSSRemoveFeatureFromSite”, both of which work to purge the evil old CKS:Enhanced Blog Edition feature from the one site that was using it… sheesh!
  5. We are warned that “The following site definition(s) are referenced by the content, but they are not installed on the web server”. The name of the missing site definition is “Unknown”. We are given only a “template id” (11003) and a Language Code (1033). Finding where this definition is being used can be accomplished with a very small SQL query:
    use SharePoint_Content_1 
    SELECT Title, MasterURL, WebTemplate FROM dbo.Webs where WebTemplate='11003'
    
  6. We are warned that “setup files” are referenced in the database that are not present on the server. We are given the names and paths of the files that are missing, but not the web sites that reference them. The offending site can be tracked down quickly using “stsadm”. Run this command:
    stsadm -o enumAllWebs -includesetupfiles > allWebs.txt
    Then search the resultant file for one or more of the filenames listed in the pre-upgrade check report. In this case, I was able to locate a site that used “SharePoint Learning Kit” components. This site was completely broken, since we removed the Learning Kit over a year ago. I got approval from the site owner, and deleted the offending site.
  7. We are warned that the feature “f374a3ca-f4a7-11db-827c-8dd056d89593” is referenced by over 500 sites, but that the feature is no longer present on the server.  This feature is the “RadEditor for MOSS for IE”, which in fact has been phased out.  Let’s use a handy “for” loop.
    1. First we export the SQL query we used above as a CSV file (this time searching for the “f374a…” feature instead).
    2. We use that csv as fuel for our “for” loop:
      for /f %i in (H:\BadFeatureSites_2010.csv) do WssRemoveFeatureFromSite.exe -scope site -url https://sharepoint.uvm.edu/%i -featureid f374a3ca-f4a7-11db-827c-8dd056d89593 -force
    3. Oh but look… WssRemoveFeatureFromSite can’t deal with sites that have spaces in the URL (why would you have a space in your site name, right?).  Dang!  Let’s try the “SharePoint Feature Administration and Clean Up Tool”… it works!
  8. We are told that we are missing some additional template files such as:
    Path = [C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\1033\MPS\..\sts\lists\wplib\dwp\MSContentEditor.dwp], Reference = [2], Status = [Missing]

    • Some searching discovered KB839877, which informs us that an earlier version of the “Meeting Workspace” template contained an accidental double period where a single period was intended.  But how do we find the site that is using this bad template?
    • The excellent tool “SharePoint Manager” from Carsten Keutman allows us to explore the SharePoint object model to more quickly find Site Template data that we need in a site.
    • The really really excellent tool “PowerGUI Script Editor” was used to quickly develop this script:
      # Discovers Meeting Workspaces from all "Webs" in the SharePoint web application defined in the "webAppUrl" variable.
      # Outputs discovered data to file "sitelist.csv".
      # Greg Mackinnon, 2010-02-05
      [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint") | Out-Null
      $webAppUrl = "https://sharepoint.uvm.edu"
      $wa = [Microsoft.SharePoint.Administration.SPWebApplication]::Lookup($webAppUrl)
      
      # Start with a fresh output file:
      Remove-Item -Path .\sitelist.csv -Force -ErrorAction SilentlyContinue
      
      foreach ($site in $wa.Sites) {
          foreach ($web in $site.AllWebs) {
              Write-Output "Checking web: $($web.Url)"
              if ($web.WebTemplate -match "MPS") {
                  Out-File .\sitelist.csv -InputObject $web.ID.Guid -NoClobber -Append -Encoding Unicode
              }
              # Dispose each SPWeb/SPSite as we finish with it, to keep memory use in check:
              $web.Dispose()
          }
          $site.Dispose()
      }
    • The script returns a list of “web” object GUIDs.  We feed this into SQL to discover where in the content database the sites that are using these web parts are implemented:
      use STSContentDBPrime
      --select DirName,LeafName,SetupPath from dbo.AllDocs where WebId='abee2623-56d2-43d6-8a9c-7b362fbd323b'
      select DirName,LeafName,SetupPath
      from dbo.AllDocs
      where ((WebId in ('siteGuid1','siteGuid2','...')) AND (SetupPath LIKE '%MSContentEditor.dwp%'))
    • Finally, the SQL query finds the “Meeting Workspace” sites that are using this corrupted web part.  We probably could just delete the web parts in question from the web part gallery, but I chose instead to use the stsadm.exe “export”, “deletesite”, “createsite”, and “import” commands to re-create the sites.  The export utility filtered out the corrupt content for us.  Once restored, the discovered sites no longer contained corrupted web parts.   Phew!
  9. The WSS3 instance has been prepped… Clone existing content database, and proceed with upgrade testing!
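The export/re-create cycle used in step 8 can be sketched with stsadm; the site URL, backup filename, and owner values below are placeholders:

```powershell
# Sketch: re-create a site collection to shed corrupted web parts.
# URL, filename, and owner values are placeholders; run from the 12-hive bin directory.
stsadm -o export -url https://sharepoint.uvm.edu/sites/badsite -filename badsite.cmp
stsadm -o deletesite -url https://sharepoint.uvm.edu/sites/badsite
stsadm -o createsite -url https://sharepoint.uvm.edu/sites/badsite -ownerlogin CAMPUS\owner -owneremail owner@uvm.edu
stsadm -o import -url https://sharepoint.uvm.edu/sites/badsite -filename badsite.cmp
```

The export step is what does the real work here, since it filters the corrupt web part content out of the package.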

ADFS 2 and SharePoint 2010 Integration

Here is a quick entry on ADFS2 and SharePoint 2010 integration. It is not an implementation guide or end-to-end walkthrough… that comes later, if we decide to implement this thing.

At present, I am most interested in the model of SharePoint->ADFS2->Shibboleth, where the SP-STS trusts tokens from ADFS2.  ADFS2 is part of a chained federation with our Shib service.  ADFS will consume Shib tokens, then transform them for the benefit of SharePoint.  However, I have no idea how to implement this solution at this time.

There are a few-too-many blog entries out there detailing how to configure ADFS2 and SharePoint 2010 for integration.  Trouble is, many of the step-by-step guides present contradictory configuration steps.  I guess there is no substitute for a deep, working knowledge of ADFS 2, SAML, and other Federation topics.

Here are some of the claims setup guides I have been working with:

Here are additional configuration posts on the process of upgrading an existing SharePoint Web Application from “Windows” authentication to “Claims” authentication.  The common denominators?

  1. You must add a valid new user to your claims-aware web app before migrating existing users, or the web application will be inaccessible after migration (or indeed, even before migration!)
  2. To trigger migration of users, you must invoke the “Migrate Users” method on your web app, E.g.:
    $wa = get-SpWebApplication "https://webappurl"
    $wa.MigrateUsers($true)

The things here that seem very unclear to me are:  What exactly is being done when you invoke the “MigrateUsers” method on the Web Application object?  How does SharePoint map legacy “Windows” users to new “Claims” users?  Anyway, here are the links:

Pages containing information that I have found useful while contemplating how to pull off SharePoint 2010:

Many of these links, as it turns out, were already discovered by members at dirteam.com.  Doh…

http://blogs.dirteam.com/blogs/jorge/archive/2010/07/06/configuring-sharepoint-2010-to-use-adfs-v2-as-an-authentication-provider.aspx

ADFS 2 Setup and Configuration

The following is a work in progress… Everything done so far works, but this is not yet a complete end-to-end deployment guide…

ADFS 2: new… improved?

There were lots of setup and configuration guides for ADFS 1.0, even though the product was nauseatingly difficult to understand.  Along comes ADFS 2… new, more powerful, better, more standards compliant… documentation reads like scrawl on a napkin.

Some setup quirks:

  • Don’t forget that no current version of Windows Server includes ADFS 2.0… not even Server 2008 R2.  If you want to install ADFS 2.0 you must download it from MS and install it.  DO NOT add the out-of-box ADFS role on Server 2008 R2.  It will confuse you, because it does not disclose which version of ADFS it is.
  • Since we are planning a farm, generate a farm SSL certificate on one of the ADFS servers, then export the certificate in PFX format (that is, with the private key), and restore the cert to the second ADFS server.
  • It is considered desirable to put the ADFS database on a reliable external database server, but documentation on this option is limited to command-line help.  Here is what I did:
    1. On one of the ADFS servers, run:
      Fsconfig.exe GenerateSQLScripts /ServiceAccount [account] /ScriptDestinationFolder [destination folder]
    2. Install the appropriate version of the SQL Native Client on the ADFS servers.
    3. On the SQL server, run the two scripts that were generated by the above command.  This will create two databases… AdfsConfiguration and AdfsArtifactStore, and set permissions for the service account specified.
    4. Make sure that any firewalls on the SQL server will allow the ADFS servers to connect to it.
    5. Since we use SQL mirroring, set up mirrors for these databases.  Add the service account to the list of accepted logins on the mirror server, since this will not be done automatically.
    6. Generate certificates that will be used for the ADFS web site, token signing, and token decryption.  Put the certificates in the Windows certificate store of the local computer.
    7. Preconfigure binding of the SSL certificate to the default IIS web site.
    8. Configure the first server in the farm using the following syntax:
      Fsconfig.exe CreateSQLFarm /ServiceAccount [Service Account]
      /ServiceAccountPassword [password] /SQLConnectionString "Initial Catalog=AdfsConfiguration; Data Source=spellbound; failover partner=rearwindow; integrated security=SSPI; Network Library=DBMSSOCN" /FederationServiceName login2.uvm.edu /CleanConfig /CertThumbprint "[thumbprint]" /SigningCertThumbprint "[thumbprint]" /DecryptCertThumbprint "[thumbprint]"
      Note that the thumbprint can be obtained by viewing the properties of the certificate in Windows explorer.
    9. Note that the ADFS “Artifact Store” database should get configured automatically, but you can check on this by doing the following:
      1. Launch PowerShell with admin privs
      2. Run the command “Add-PSSnapin microsoft.adfs.powershell”
      3. Run “get-adfsproperties | select artifactdbconnection”
      4. Use “set-adfsproperties -artifactdbconnection” to change the string if necessary.
      5. See this resource for more details:
        http://social.technet.microsoft.com/wiki/contents/articles/ad-fs-2-0-migrate-your-ad-fs-configuration-database-to-sql-server.aspx

Of course, an ADFS server with no Federation partners does not do us much good.  Unfortunately, I don’t have any other ADFS servers that I want to integrate with, either.  What interests me more are Shibboleth services (such as our own “login.uvm.edu”), and federation with other “InCommon” partners.  I also am interested in “user centric” identity providers such as OpenID and “Windows Live ID”.  Here are some links to get us started down this path.  Unfortunately, I cannot find anything that details exactly what we want:

Improving the Stability of ECTS

ECTS, the External Collaboration Toolkit for SharePoint, has been a bit of a thorn in my side.  While fulfilling a vital need at our institution, it is difficult to support, and contains many bugs that apparently are not going to get fixed.

However, I recently managed to pull off a few minor tweaks for ECTS that should improve stability in our environment.

Fix 1:  ECTSEx

First off, I installed the ECTS Extensions available on CodePlex:

http://ectsex.codeplex.com/

My thanks to David McWee for providing these accelerators to the ECTS management interfaces.  We are now able to pull up info on ECTS users considerably more easily.  These extensions also allow password reset information to go directly to the user for whom they are intended.  Good stuff.  Unfortunately, the interface is not behaving on all nodes of our SharePoint web farm, which brings us to our next fix…

Fix 2:  Load Balancer reconfiguration

We have been experiencing intermittent problems that suggest duplication of actions by the ECTS processes.  For example, a user may receive two account activation notices, each with different passwords.  A user may change a password, only to have the new password expire immediately, followed by a prompt that the password has been reset by an administrator.

My assumption is that somehow requests to either the web front-end servers or the backing AD LDS instance are being replicated, resulting in replicated account management operations.  To combat this, I have reconfigured our F5 load balancers as follows:

  1. For the Big-IP Virtual Server which handles traffic to “PartnerPoint.uvm.edu” (our ECTS front end), I set the Persistence Profile to “none”.  This prevents users from continuing to use a less-preferred LB node when a more-preferred node is online.
  2. For the Big-IP pool connected to PartnerPoint, we activate “Priority Group Activation” with the “Less than 1” attribute.  This will cause clients to be routed to a different pool member priority group if fewer than one server remains in the highest priority group.
  3. We then set the “sharepoint3” node to priority group 2, and “sharepoint2” to priority group 1.  Now when users connect to the virtual server, they are routed to priority group 2 (sharepoint3), and only go to sharepoint2 if sharepoint3 is down.
  4. Repeat these three steps for the AD LDS virtual server.  This limits LDAP traffic to sharepoint3, unless sharepoint3 is down.
  5. Now we need to limit traffic to the ECTS admin interfaces to connect preferentially against sharepoint3 as well.  We do this with iRules.  Create a new iRule which will intercept traffic to the ECTS management pages, and route it to a different pool:
    when HTTP_REQUEST {
     if {[HTTP::uri] starts_with "/ECTS/"} {
     pool WSS_partner_admin
     }
    }
  6. We then create a Pool with priority group activation, as outlined in steps 1-3, above, and assign this iRule to the main SharePoint virtual server.

I am able to verify through packet captures that all requests to the ECTS admin pages, and all traffic to the ECTS front ends, are being routed preferentially to “sharepoint3”.  So far, this seems to have helped.  If nothing else, the admin pages are now loading consistently, which was not true for us in the past.

SharePoint and Mirrored Databases

An un-fun thing about Windows SharePoint Services 3.0: its lack of native support for mirrored databases.  I did my testing of SharePoint against our mirrored SQL 2008 servers using our SharePoint 2010 beta instance.  That worked great!  Unfortunately, WSS 3.0 does not have the same native support for mirroring.  Instead, we have to go through a bunch of rigmarole using SQL database aliases, some ugly SQL jobs to run the failovers (all T-SQL is ugly, if you are more used to object-oriented scripting languages), poorly documented SQL alerts to monitor mirroring, and scheduled tasks to change the local aliases on the SharePoint web front ends.

There is no single reference for this topic, either… just a bunch of incomplete pointers. Each one assumes either that you are an expert on SQL who knows nothing about SharePoint, or that you are a SQL dunce who will have to ask the SQL DBA for help and can comprehend only SharePoint. What about the SharePoint admin who has to be the DBA because the company does not have one of those?

The most complete reference from Microsoft on SharePoint and SQL mirroring can be found here:
http://technet.microsoft.com/en-us/library/dd207312.aspx
This is a much updated version of the doc originally published in the SharePoint team blog, and later updated and added to the TechNet Library:
http://technet.microsoft.com/en-us/library/cc262910.aspx
This is a really poor document, though, since it spends a lot of time telling you how to set up mirroring, but only hints at how to make SharePoint deal with the new configuration.
There also is a published case study on SharePoint mirroring here:
http://technet.microsoft.com/en-us/library/cc531332.aspx
This study is more fleshed out, but lacks details on how to create and configure SQL alerts.
Fortunately, there is a good ref on SQL alerts and database mirroring here:
http://www.mssqltips.com/tip.asp?tip=1564

While the MS guidance is good at telling you what needs to be done, it is short on the details of how to do it.  We know that we need to accomplish the following tasks to enable SharePoint database failover:

  1. On the SQL Servers, detect when any one SharePoint database has lost “Primary Mirror” status, and failover the remaining databases so that they are all running on the same host.
  2. On the SharePoint web front end servers, we need a script that will test the database server specified in the SQL Server aliases entry, and perform the following actions if it is not the current “Primary” mirror role holder:
    1. Change the alias to the current “Primary” mirror
    2. Cycle SharePoint services

Ideally, we would have the script created in step 1 also send a signal to the web front ends to perform the SQL Alias change and SharePoint services cycle.

Microsoft gives us a sample script for the database monitor and failover task (see bullet 1, above), but they only hint at the remaining scripts that will be required, and they do not tell you the best way to tie all the scripts together.

Here is what I have come up with so far…

T-SQL Script for Database Mirror Status Monitoring and Failover:

The following is adapted from the MS sample script.  It has been modified to force failover of all mirrored databases on the current server if any mirrored DB has failed over.

/*
conditionalFailover.sql - J. Greg Mackinnon, 2010-04-09
SQL Script to detect condition where not all /synchronized/ mirrored databases
are in the primary role.
Script will force failover of all mirrored databases if above conditions are met.
*/

use master;
--Declare and initialize variables
DECLARE @SyncDBCount int;
DECLARE @PrimaryDBCount int;
SET @SyncDBCount = 0
SET @PrimaryDBCount = 0

--Collect total number of mirrored databases
SET @SyncDBCount = (
  SELECT COUNT(ALL database_id)
  FROM sys.database_mirroring
  WHERE mirroring_state=4
  );
PRINT N'Total Synchronized Mirrored Database Count: '
	+ (CAST(@SyncDBCount AS varchar(4)));

--Collect total number of databases holding the "primary" database role.
SET @PrimaryDBCount = (
  SELECT COUNT(ALL database_id) FROM sys.database_mirroring WHERE
 (mirroring_state=4 and mirroring_role=1)
 );
PRINT N'Total Mirrored Databases holding the "Primary" state: '
	+ (CAST(@PrimaryDBCount AS varchar(4)));

-- Begin IF condition when not all synced mirror DBs are Primary.
IF ((@PrimaryDBCount > 0) and (@PrimaryDBCount < @SyncDBCount)) BEGIN
	Print N'Beginning Failover of mirrored databases...'
	DECLARE i CURSOR
	READ_ONLY
	FOR SELECT name FROM sys.databases WHERE database_id IN
		(SELECT database_id
			FROM sys.database_mirroring
			WHERE (mirroring_state=4 and mirroring_role=1))

		DECLARE @name varchar(255)
		DECLARE @cmd varchar(1000)
		OPEN i

		FETCH NEXT FROM i INTO @name
		WHILE (@@fetch_status <> -1) BEGIN
			IF (@@fetch_status <> -2) BEGIN
				set @cmd = 'ALTER Database [' + @name + '] SET PARTNER FAILOVER;'
				PRINT N'Running command: ' + @cmd
				exec (@cmd)
			END --end if
		FETCH NEXT FROM i INTO @name
		END --end while
	CLOSE i
	DEALLOCATE i
END -- End IF conditional
ELSE
	PRINT N'No failovers will be performed at this time.'
GO

T-SQL Script to determine availability of a primary mirror for a given SQL Server and Database:

The following script can be run from a client system to determine whether the named SQL Server database is online and is the current “Primary” mirror holder.  It can be launched using the “SQLCMD.exe” utility, and requires the SQL Native Client to be present on the client system.  The syntax for launching it is:

sqlcmd.exe -S [SQLHost] -i [scriptFile.sql] -v dbname=[dbName]

The script will return a text output stating that the database is in one of three possible states: “Primary”, “Secondary”, or “unknown”.  If the database or database server is not available, a SQLCMD error will be returned instead.  Here is the script:

/*
Detect Mirroring status of a single database
Returns char with value Primary|Secondary|unknown
(detection of "noSync" has been removed as a primary with a downed partner
will also be out of sync, but that does not mean that we don't want to use
the primary)
Database to be queried can be set by changing the @DBname variable
*/

use master;

-- Declare and initialize variables
Declare @DBname AS varchar(128);
Declare @mirState AS char(9);
Set @DBname = '$(dbname)';
-- End Declarations

-- Conditional logic to determine if @DBname is Primary or Secondary
IF (
	select mirroring_role from sys.database_mirroring where (
		database_id IN (select database_id from sys.databases where name = @DBname)
		)
) = 1 set @mirState = 'Primary'
ELSE
	BEGIN --NOTE:  THIS WILL NOT RETURN THE EXPECTED RESULT for a user in the "Public" role on the DB server.
		-- That's okay, because the "primary" switch does work as expected.
	IF (
	select mirroring_role from sys.database_mirroring where (
		database_id IN (select database_id from sys.databases where name = @DBname)
		)
	) = 2 set @mirState = 'Secondary'
	ELSE set @mirState = 'unknown'
	END;
--End Primary/Secondary selection

-- Conditional logic to determine if database is Synchronized
/*IF (
	select mirroring_state from sys.database_mirroring where (
		database_id IN (
			select database_id from sys.databases where name = 'VMware_VCDB'
			)
		)
	) != 4 set @mirState = 'noSync'
*/
--End Sync determination

-- Generate Output
Print N'Database ' + @DBname + N' has status: ' + @mirState;
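To sanity-check the script by hand, an invocation might look like the following (the server and database names here are the ones used in the PowerShell script later in this post; substitute your own):

```
sqlcmd.exe -S RearWindow -i determineMirDBState.sql -v dbname=SharePoint_FarmConfig_Test
```

If RearWindow currently holds the primary role, the output should be the “has status: Primary” line printed at the end of the script above.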

PowerShell Script to monitor validity of the current DB alias, and update it if no longer valid:

This script calls on the T-SQL script above to check the status of the DB alias, update it if necessary, and cycle SharePoint services on the web front end if the alias is updated. Be sure to update the script with your own SQL host, alias, and database names before using it in your environment.

#Invokes "sqlcmd" script to determine holder of current primary mirror server role for the SharePoint databases.
#Sets SQL aliases accordingly

Set-PSDebug -Strict

##### Start script variables: ###
[string]$db1="RearWindow"
[string]$db2="Spellbound"
[string]$dbAlias="WinDB"
[string]$dbName="SharePoint_FarmConfig_Test"
##### End script variables: #####

##### Start script functions: ###

function mirState {
	#Determines the mirroring state of a specified database
	#REQUIRES: SQLCMD.exe in the path specified in the function,
	#	the "determinMirDBState" sql script, in the path specified.
	#	input strings "$server" (for the SQL server or instance), and "$database" (the database to test)
	#PROVIDES: returns a string with possible values Primary|notThere|stateUnknown|other
	param ([string]$server,[string]$database)

	#Build SQLCMD command:
	[string]$sqlCom = 'c:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLCMD.EXE'
	[string]$sqlScript = "c:\local\scripts\sql\determineMirDBState.sql"
	#Invoke the command and capture to string $comOut
	[string]$comOut = & $sqlCom -S $server -i $sqlScript -v dbname=$database 2>&1

	if ($comOut.contains("Primary") -eq $true) { #if the queried server is the primary role holder then...
		[string]$rc="Primary"
	# Removed "Secondary" selection as this string will not be returned in a low-privilege environment.
	#} elseif ($comOut | Select-String "Secondary") {#if RearWindow is the secondary role holder then...
	#	[string]$rc="Secondary"
	# Removed "noSync" detection as a primary with a downed partner will be out of sync, but it is not actionable.
	#} elseif ($comOut | Select-String "noSync"){ #if the mirroed database is not in sync then...
	#	[string]$rc="noSync"

	} elseif ($comOut.contains("not accessible") -eq $true) { #SQL error occured, meaning server is in some way unavailable.
		[string]$rc="notThere"
	} elseif (($comOut.contains("unknown") -or ($comOut.contains("Secondary"))) -eq $true) { #db state is anything other than primary
		[string]$rc="notPrime"
	} else { #something else unexpected.
		[string]$rc="other"
	}
	return $rc
} #end mirState

function getCurServer {
	#Determines current database alias setting for the string $DBAlias.
	#Requires: An input string $DBAlias.
	#	Assumes the existance of the alias in the registry,
	#	and that the alias is using the "DBMSSOCN" (TCP/IP) connection method.
	#Returns: A string containing the host name of the current database server for the alias
	param ([string]$dbAlias)

	[string]$curAliasSetting = (get-itemproperty 'HKLM:\SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo').$dbAlias
	[string]$curServer = $curAliasSetting.Replace("DBMSSOCN,","")
	return $curServer
}

function setAlias {
	#Sets in the registry the host name for the provided SQL server alias
	#REQUIRES the names of the SQL host and its alias
	#PROVIDES nothing
	param ([string]$dbHost,[string]$dbAlias)
	[string]$aliasSetting = "DBMSSOCN," + $dbHost
	Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo' -Name $dbAlias -Value $aliasSetting
}

function cycleServices {
	#Restarts system services associated with SharePoint
	# This will expedite reconnection to the SharePoint databases in the event of a SQL mirror failover.
	& iisreset
	# We use the "if" conditional below to determine if the service is already running before attempting to restart it.
	if ((Get-Service -Name "spadmin").status -match "Running") {Restart-Service -Name "spadmin"}
	if ((Get-Service -Name "sptimerv3").status -match "Running") {Restart-Service -Name "sptimerv3"}
	if ((Get-Service -Name "spsearch").status -match "Running") {Restart-Service -Name "spsearch"}
	if ((Get-Service -Name "osearch").status -match "Running") {Restart-Service -Name "osearch"}
	Write-Output "SharePoint services cycled."
}
##### End script functions: #####

##### Start script body: ###
[string]$curServer = getCurServer -dbAlias $dbAlias
Write-Output "The database alias $dbAlias currently is set to: $curServer"

[string]$mirStateVar = mirState -server $curServer -database $dbName
Write-Host "Database $dbName on server $curServer is in state $mirStateVar."

if ($mirStateVar.contains("Primary")) {
	Write-Output "$curServer is already defined as the SQL Alias, and the server is functioning as expected.  Exiting."
	return
} else {
	switch -regex ($curServer) {
		$db1 {[string]$newServer = $db2}
		$db2 {[string]$newServer = $db1}
	}
	setAlias -dbHost $newServer -dbAlias $dbAlias
	Write-Output "Set alias $dbAlias to value $newServer."
	## Removed the "cycleServices function as the databases will get reconnected on their own.
	## You still can run "cycleServices if you want to expedite DB reconnection,
	## but keep in mind that this will cause all of the ASP.NET code to recompile.  Boo!
	#Write-Output "Cycling SharePoint services..."
	#cycleServices
	Write-Output "Done."
}
##### End script body: #####

Tying it all together

So now to make it all work together…
The fastest thing to do is to implement the first T-SQL script as a SQL Server Agent job running on both SQL mirror servers. The jobs are configured to run once per minute, and have been staggered to run 30 seconds apart from each other. Since the script will not be triggered by databases in a “suspended” state, we should be able to avoid job collision between the two servers. (I hope.)
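For reference, here is a sketch of how such a job could be registered against msdb using the standard SQL Server Agent stored procedures. The job, step, and schedule names are hypothetical; the step body is the conditionalFailover.sql script from earlier in this post, and on the partner server you would shift @active_start_time by 30 seconds to stagger the jobs:

```sql
-- Hypothetical job registration; names and start times are illustrative.
USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'SP Conditional Mirror Failover';
EXEC dbo.sp_add_jobstep
    @job_name      = N'SP Conditional Mirror Failover',
    @step_name     = N'Run conditionalFailover',
    @subsystem     = N'TSQL',
    @database_name = N'master',
    @command       = N'/* paste the body of conditionalFailover.sql here */';
EXEC dbo.sp_add_schedule
    @schedule_name        = N'Every minute',
    @freq_type            = 4,      -- daily
    @freq_interval        = 1,
    @freq_subday_type     = 4,      -- repeat on a minutes interval
    @freq_subday_interval = 1,
    @active_start_time    = 000000; -- use 000030 on the partner server to stagger
EXEC dbo.sp_attach_schedule
    @job_name = N'SP Conditional Mirror Failover',
    @schedule_name = N'Every minute';
EXEC dbo.sp_add_jobserver @job_name = N'SP Conditional Mirror Failover';
GO
```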

We then could implement the PowerShell script (and its accompanying T-SQL script) as a scheduled task on the web front end servers. The task needs to run as an account that is in the PUBLIC role on the SQL servers and that can run as a batch job on the WFEs.
Easy, right? We shall see.
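A minimal sketch of that scheduled task registration on a WFE, assuming the PowerShell script has been saved as C:\local\scripts\checkDBAlias.ps1 and that DOMAIN\spSqlCheck is the low-privilege batch account just described (both names are hypothetical; the interval is a guess, too):

```
schtasks.exe /Create /TN "Check SharePoint DB Alias" ^
  /TR "powershell.exe -NoProfile -File C:\local\scripts\checkDBAlias.ps1" ^
  /SC MINUTE /MO 1 /RU DOMAIN\spSqlCheck /RP *
```

The /RP * switch prompts for the account password at creation time. Note that the account will also need rights to write the alias key under HKLM on the WFE.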