We are retiring our NetApp filer this year. It was nice knowing you, NetApp. Thank you for the no-hassle performance, agile volume management, and excellent customer support. We will not miss your insane pricing or your subtle incompatibilities with modern Windows clients.
In this multi-part series, I will be sharing PowerShell code developed to assist with our migration. In part one, we will look at bulk copy operations with RoboCopy. In part two, we will look at a situation where RoboCopy fails to get the job done. In future parts, we will look at automated share and quota management and migration.
Migrating large amounts of data off a NetApp is not particularly straightforward. The only real option we have is to copy data from the filer CIFS shares to their Windows counterparts. Fortunately, with the multi-threaded power utility “robocopy” we can move data between shares pretty quickly. Unfortunately, robocopy only multi-threads file copy operations, not directory search operations. So, while initial data transfers with robocopy take place really quickly, subsequent sync operations are slower than expected. MS also released a utility called “RichCopy” which supports multi-threaded directory searching, but this utility is not supported by MS, and has some significant bugs (i.e. it crashes all the time). What to do?
PowerShell to the rescue! Using PowerShell jobs, we can spawn off a separate robocopy job for each subdirectory of a source share, and run an arbitrary number of parallel directory copies. With some experimentation, I determined that I could run ten simultaneous robocopy operations without overwhelming CPU or disk channels on the filer. Under this arrangement, our file sync window has been reduced from almost 48 hours to a mere 2.5 hours.
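Stripped of our paths and logging, the throttling idea looks roughly like the sketch below. This is a simplified illustration, not the production script: the names $maxJobs, $work, $running and $results are made up for the example, the throttle is set to 3 instead of 10, and the scriptblock returns a string where the real job would call robocopy for one directory.

```powershell
# Minimal sketch of a throttled job queue (illustrative names only).
$maxJobs = 3        # assumed throttle for the demo; the real script uses 10
$work    = 1..6     # stand-ins for the subdirectories to be copied
$running = @()      # every job started so far

foreach ($item in $work) {
    # Wait for a free slot: count jobs that have not reached a final state.
    while (@($running | Where-Object { $_.State -notmatch 'Completed|Failed|Stopped' }).Count -ge $maxJobs) {
        Start-Sleep -Milliseconds 200
    }
    # The real script would run robocopy for one directory in this scriptblock.
    $running += Start-Job -ArgumentList $item -ScriptBlock {
        param($n)
        "copied item $n"
    }
}

# Wait for the stragglers, then collect output and clean up.
$running | Wait-Job | Out-Null
$results = @($running | Receive-Job)
$running | Remove-Job
$results
```

The full script below differs mainly in bookkeeping: it builds the work queue up front, logs each step, and collects finished jobs as it goes rather than all at the end.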
Some tricky bits in the development of this script were:
Below is the script I developed for this job. It contains paths specific to our infrastructure, but could easily be modified. Change the 10 in the “$jobs.count -lt 10” loop test to set the number of simultaneous robocopy processes used by the script…
# FilerSync_jobQueue.ps1
# JGM, 2011-09-29
# Copies all content of the paths specified in the $srcShares array to
# corresponding paths on the local server.
# Keeps data on all copy jobs in an array "$q".
# We will use up to 10 simultaneous robocopy operations.
set-psdebug -strict
# Initialize the log file:
[string] $logfile = "s:\files_to_local.log"
remove-item $logfile -Force
[datetime] $startTime = Get-Date
[string] "Start Time: " + $startTime | Out-File $logfile -Append
# Initialize the Source file server root directories:
[String[]] $srcShares1 = "adfs$","JMP$","tsFlexConfig","software","mca","sis","shared"`
#,"R25"
#R25 removed from this sync process as the "text_comments" directory kills
#robocopy. We will sync this structure separately.
[String[]] $srcShares2 = "uvol_t1_1$\q-home","uvol_t1_2$\q-home","uvol_t1_3$\q-home",`
"uvol_t1_4$\q-home","uvol_t1_5$\q-home","uvol_t2_1$\q-home",`
"vol1$\qtree-home"
[Array[]] $q = @() #queue array; each entry is @(source, destination, copy option, depth option)
function collectJobs {
#Detects jobs with status of Completed, Failed, or Stopped.
#Collects jobs output to log file, increments the "done jobs" count,
#Then rebuilds the $jobs array to contain only running jobs.
#Modifies variables in the script scope.
$djs = @(); #Completed jobs array
$djs += $script:jobs | ? {$_.State -match "Completed|Failed|Stopped"} ;
[string]$('$djs.count = ' + $djs.count + ' ; Possible number of jobs completed in this collection cycle.') | Out-File $logfile -Append;
if ($djs[0] -ne $null) { #First item in done jobs array should not be null.
$script:dc += $djs.count; #increment job count
[string]$('$script:dc = ' + $script:dc + ' ; Total number of completed jobs.') | Out-File $logfile -Append;
$djs | Receive-Job | Out-File $logfile -Append; #log job output to file
$djs | Remove-Job -Force;
Remove-Variable djs;
$script:jobs = @($script:jobs | ? {$_.State -match "Running|NotStarted"}) ; #rebuild jobs arr, keeping jobs that have not finished
[string]$('$script:jobs.count = ' + $script:jobs.Count + ' ; Exiting function...') | Out-File $logfile -Append
} else {
[string]$('$djs[0] is null. No jobs completed in this cycle.') | Out-File $logfile -Append
}
}
# Loop through the source directories:
foreach ($rootPath in $srcShares1) {
[string] $srcPath = "\\files\" + $rootPath # Full Source Directory path.
#Switch maps the source directory to a destination volume stored in $target
switch ($rootPath) {
shared {[string] $target = "S:\shared"}
software {[string] $target = "S:\software"}
mca {[string] $target = "S:\mca"}
sis {[string] $target = "S:\sis"}
adfs$ {[string] $target = "S:\adfs"}
tsFlexConfig {[string] $target = "s:\tsFlexConfig"}
JMP$ {[string] $target = "s:\JMP"}
R25 {[string] $target = "S:\R25"}
}
#Enumerate directories to copy:
$dirs1 = @()
$dirs1 += gci $srcPath | sort-object -Property Name `
| ? {$_.Attributes.tostring() -match "Directory"} `
| ? {$_.Name -notmatch "~snapshot"}
#Copy files in the root directory:
[string] $sd = '"' + $srcPath + '"';
[string] $dd = '"' + $target + '"';
[Array[]] $q += ,@($sd,$dd,'"/COPY:DATSO"','"/LEV:1"' )
# Add to queue:
if ($dirs1[0] -ne $null) {
foreach ($d in $dirs1) {
[string] $sd = '"' + $d.FullName + '"';
[string] $dd = '"' + $target + "\" + $d.Name + '"';
$q += ,@($sd,$dd,'"/COPY:DATSO"','"/e"')
}
}
}
foreach ($rootPath in $srcShares2) {
[string] $srcPath = "\\files\" + $rootPath # Full Source Directory path.
#Switch maps the source directory to a destination volume stored in $target
switch ($rootPath) {
uvol_t1_1$\q-home {[string] $target = "H:\homes1"}
uvol_t1_2$\q-home {[string] $target = "I:\homes1"}
uvol_t1_3$\q-home {[string] $target = "J:\homes1"}
uvol_t1_4$\q-home {[string] $target = "K:\homes1"}
uvol_t1_5$\q-home {[string] $target = "L:\homes1"}
uvol_t2_1$\q-home {[string] $target = "M:\homes1"}
vol1$\qtree-home {[string] $target = "J:\homes2"}
}
#Enumerate directories to copy:
[array]$dirs1 = gci -Force $srcPath | sort-object -Property Name `
| ? {$_.Attributes.tostring() -match "Directory"}
if ($dirs1[0] -ne $null) {
foreach ($d in $dirs1) {
[string] $sd = '"' + $d.FullName + '"'
[string] $dd = '"' + $target + "\" + $d.Name + '"'
$q += ,@($sd,$dd,'"/COPY:DAT"','"/e"')
}
}
}
[string] $queueFile = "s:\files_to_local_queue.csv"
Remove-Item -Force $queueFile
foreach ($i in $q) {[string]$($i[0]+", "+$i[1]+", "+$i[2]+", "+$i[3]) >> $queueFile }
New-Variable -Name dc -Option AllScope -Value 0
[int] $dc = 0 #Count of completed (done) jobs.
[int] $qc = $q.Count #Initial count of jobs in the queue
[int] $qi = 0 #Queue Index - current location in queue
[int] $jc = 0 #Job count - number of running jobs
$jobs = @()
while ($qc -gt $qi) { # Problem here as some "done jobs" are not getting captured.
while (($jobs.count -lt 10) -and ($qi -lt $qc)) { #Stop filling slots once the queue is exhausted.
[string] $('In ($jobs.count -lt 10) loop...') | out-file -Append $logFile
[string] $('$jobs.count is now: ' + $jobs.count) | out-file -Append $logFile
[string] $jobName = 'qJob_' + $qi + '_';
[string] $sd = $q[$qi][0]; [string]$dd = $q[$qi][1];
[string] $cpo = $q[$qi][2]; [string] $lev = $q[$qi][3];
[string]$cmd = "& robocopy.exe $lev,$cpo,`"/dcopy:t`",`"/purge`",`"/nfl`",`"/ndl`",`"/np`",`"/r:0`",`"/mt:4`",`"/b`",$sd,$dd";
[string] $('Starting job with source: ' + $sd +' and destination: ' + $dd) | out-file -Append $logFile
$jobs += Start-Job -Name $jobName -ScriptBlock ([scriptblock]::create($cmd))
[string] $('Job started. Incrementing $qi to: ' + [string]$($qi + 1)) | out-file -Append $logFile
$qi++
}
[string] $("About to run collectJobs function...") | out-file -Append $logFile
collectJobs
[string] $('Function done. $jobs.count is now: ' + $jobs.count)| out-file -Append $logFile
[string] $('$jobs.count = '+$jobs.Count+' ; Sleeping for three seconds...') | out-file -Append $logFile
Start-Sleep -Seconds 3
}
#Wait up to two hours for remaining jobs to complete:
[string] $('Started last job in queue. Waiting up to two hours for completion...') | out-file -Append $logFile
$jobs | Wait-Job -Timeout 7200 | Out-Null
$jobs | ? {$_.State -eq "Running"} | Stop-Job #Stop anything still running after the timeout.
collectJobs
# Complete logging:
[datetime] $endTime = Get-Date
[string] "End Time: " + $endTime | Out-File $logfile -Append
$elapsedTime = $endTime - $startTime
[string] $out = "Elapsed Time: " + [math]::floor($elapsedTime.TotalHours)`
+ " hours, " + $elapsedTime.minutes + " minutes, " + $elapsedTime.seconds`
+ " seconds."
$out | out-file -Append $logfile
#Create an error log from the session log. Convert error codes to descriptions:
[string] $errFile = 's:\files_to_local.err'
remove-item $errFile -force
[string] $out = "Failed jobs:"; $out | out-file -Append $errFile
$jobs | out-file -Append $errFile
$jobs | % {$_.Command} | out-file -Append $errFile
[string] $out = "Failed files/directories:"; $out | out-file -Append $errFile
Get-Content $logfile | Select-String -Pattern "\\\\files"`
| select-string -NotMatch -pattern "^ Source" `
| % {
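$a = $_.toString();
[string]$e = 'otherError: '; #Default label, reset for every line so a stale value is never reused.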
if ($a -match "ERROR 32 ") {[string]$e = 'fileInUse: '};
if ($a -match "ERROR 267 ") {[string]$e = 'directoryInvalid: '};
if ($a -match "ERROR 112 ") {[string]$e = 'notEnoughSpace: '};
if ($a -match "ERROR 5 ") {[string]$e = 'accessDenied: '};
if ($a -match "ERROR 3 ") {[string]$e = 'cannotFindPath: '};
$i = $a.IndexOf("\\f");
$f = $a.substring($i);
Write-Output "$e$f" | Out-File $errFile -Force -Append
}
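A possible refinement, not part of the script above: the chain of "if" tests that converts error codes to descriptions could be replaced with a hashtable lookup, which makes it easier to add codes later. The sketch below is only an illustration. The function name Get-ErrorLabel, the $errorLabels table, and the sample log line are all made up for the example, and the sample line only approximates the shape of a robocopy error entry.

```powershell
# Error numbers and labels taken from the if-chain in the script above.
$errorLabels = @{
    3   = 'cannotFindPath'
    5   = 'accessDenied'
    32  = 'fileInUse'
    112 = 'notEnoughSpace'
    267 = 'directoryInvalid'
}

function Get-ErrorLabel {
    param([string] $line)
    # Pull the numeric code out of text such as "ERROR 32 (0x00000020) ...".
    if ($line -match 'ERROR (\d+) ') {
        $code = [int] $Matches[1]
        if ($errorLabels.ContainsKey($code)) { return $errorLabels[$code] }
        return "error$code"      # Code not in the table: keep the number visible.
    }
    return 'unknown'             # Line carries no error code at all.
}

# Hypothetical sample line, for illustration only:
$sample = '2011/09/29 10:15:02 ERROR 32 (0x00000020) Copying File \\files\shared\a.doc'
Get-ErrorLabel $sample           # returns fileInUse
```

Unlike the if-chain, a code that is missing from the table still produces a label ("error" plus the number), so unexpected failures remain visible in the error log.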