
Frank's Activity Log

20 August 2014: Wednesday

Posted: August 20th, 2014 by fcs

LDAP:

  • Issues:
    • New “Faculty” (aka Banner Course Assignment) with an SSN that does not quite match the PeopleSoft faculty entry I think this person is. Email off to the Registrar’s office…
    • New Graduate Student employee.  Found and merged.
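The log doesn't say how the SSN was off. Here is a minimal Python sketch of one plausible check. It assumes “does not quite match” means one mistyped digit or one swap of two adjacent digits. is_near_match is a hypothetical helper, not part of the actual LDAP feed, and the sample values are made up.

```python
def is_near_match(a: str, b: str) -> bool:
    """Hypothetical helper: True if two equal-length digit strings differ by
    exactly one substituted digit or by one swap of two adjacent digits.
    Identical strings are an exact match, not a near match, so they return False."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diffs) == 1:
        return True  # a single mistyped digit
    if len(diffs) == 2:
        i, j = diffs
        # adjacent transposition, e.g. "67" keyed as "76"
        return j == i + 1 and a[i] == b[j] and a[j] == b[i]
    return False
```

For example, "123456789" vs "123457689" (adjacent swap) is flagged, while "123456789" vs "987654321" is not.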

BACKUPS:

  • CoM/IS clone finished overnight; the Disk clone is close.
  • Oh joy!  Account Services deleted a departmental account per request and suddenly the customer determined that they needed that account after all.  Recovery time.

Hardware:

  • Failed drives in MANNSAN1 and STORNODE10 replaced…
  • 7914 (veeam-repos0) fibre card replaced with M5120/1GB/battery SAS RAID card.
    • RAID6 license obtained and installed.

19 August 2014: Tuesday

Posted: August 19th, 2014 by fcs

LDAP:

  • Issues:
    • Multiple CatCards: Notified CatCard office of the issue.
    • Two new PeopleSoft employees without SSNs:
      • found and matched them both.
  • VPN groups: Three people added to a VPN by request of the owner.
  • Special run for CatCard office – to give them some new data.

BACKUPS:

  • stornode5 and spogiprod were both “not configured properly”. Deleted their aliases and let NetWorker create them again… bingo, they work. I hate whatever that corruption issue is.
  • Tape clones finished.  Disk and CoM/IS clones running.

Other:

  • Account Services system has been up for a year… Oops… rectified.
  • Multiple rounds of cycling the Maestro GUI service.

18 August 2014: Monday

Posted: August 18th, 2014 by fcs

LDAP:

  • Issues:
    • Grrr. PeopleSoft again: an entry created with no SSN later gets an SSN that matches an existing record, so I have to merge things. Merged the records and disabled the duplicate account.
    • Three Student/Graduate Fellowships without SSNs.  Found them all and merged them in.
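The recurring problem in these entries is an entry that arrives without an SSN and later receives one that another record already holds. That collision can be detected mechanically after each feed update. Here is a minimal Python sketch, assuming each entry is a dict with hypothetical "uid" and "ssn" keys. How the real LDAP loader stores and merges records is not described here.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def find_ssn_collisions(entries: List[Dict[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Hypothetical helper: group entry uids by SSN and return only the SSNs
    shared by more than one entry. Entries with no SSN yet are skipped."""
    by_ssn: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        ssn = entry.get("ssn")
        if ssn:  # skip None or empty SSNs
            by_ssn[ssn].append(entry["uid"])
    return {ssn: uids for ssn, uids in by_ssn.items() if len(uids) > 1}
```

Each collision is then a candidate for the manual merge-and-disable step described above.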

BACKUPS:

  • Disk failure on storage node 10, except the Nagios check failed to alert. Well, the Nagios (nrpe, actually) check script was flawed. Repaired it and updated it on the rest of the NetWorker systems so this won’t happen again…
    • Stole a disk out of veeam-bs0 (powered off, not in use) to get storage node 10 protected again, and reported the failed disk to our maintenance company – should have a new one on Tuesday.
  • Disk clones from last week still running – sigh – aborted them.
  • storage node 1 (stornode1) has been removed as a storage node.  Left the machine up, just in case.
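The log doesn't say what was wrong with the nrpe check script. One common way such a script fails to alert is by reporting OK when it cannot parse the controller output at all. Here is a minimal Python sketch of a check that guards against that, using the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). The "<slot> <state>" input format and the "Online" state name are assumptions, not the real controller's output.

```python
from typing import List, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_drives(status_lines: List[str]) -> Tuple[int, str]:
    """Classify drive status lines of an assumed '<slot> <state>' form.

    Any state other than 'Online' is CRITICAL. If nothing parses, the result
    is UNKNOWN, so a broken status command cannot masquerade as OK.
    """
    drives = []
    for line in status_lines:
        parts = line.split()
        if len(parts) == 2:
            drives.append((parts[0], parts[1]))
    if not drives:
        return UNKNOWN, "UNKNOWN - no drive status parsed"
    bad = [slot for slot, state in drives if state != "Online"]
    if bad:
        return CRITICAL, "CRITICAL - drives not online: " + ", ".join(bad)
    return OK, f"OK - {len(drives)} drives online"
```

Under nrpe, a small wrapper would print the message and exit with the returned code, since nrpe relays the plugin's exit status to Nagios.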

SecurID:

  • Update to add new VPN end point.

15 August 2014: Friday

Posted: August 15th, 2014 by fcs

LDAP:

  • Registrar rolled semester forward from 201406 to 201409 yesterday.  Caused 18,515 updates (all good!)
  • Issues:
    • Multiple Barcodes: Notified CatCard office they were delivering a duplicate entry to LDAP.
    • Singleton Match with different last name:  OSP had notified me yesterday that this was going to happen.  Fixed.
  • Meeting with the CatCard office about their feed, how it works, and changes that can be made.
  • LDAP Account purge ran – 14 ERRORS – trying to add bounce rules for accounts that already had them.
  • LDAP Management password updated.

BACKUPS:

  • The new email storage node is running with a higher load than expected, but nothing terrible. Adjusting the monitoring thresholds so it will stop sending the “WARNING” reports that cause others to become agitated.
  • Clones: CoM/IS clones should finish up today (4TB in process/19GB left), Disk clones probably not (30TB left).
  • Sharding update for the email backups (changed directive of safety net client from “Penguin backup directives” to “Penguin directives” and disabled the shard client to move back to not using shards at all)
  • Parallel stream choices adjusted for file server backups that will do full saves this weekend.
  • Defined clients for two new file servers that will be grown in the coming week(s).
  • Putting plan (checklist) together for Monday’s work.
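Adjusting the monitoring for the email storage node presumably means raising the load WARNING threshold. Here is a minimal Python sketch of a comparison in the style of check_load, with separate 1/5/15-minute warning and critical thresholds. The threshold numbers in the example and the use of strict "greater than" are illustrative assumptions, not the values actually configured.

```python
from typing import Sequence

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_load(loads: Sequence[float], warn: Sequence[float],
                  crit: Sequence[float]) -> int:
    """Compare (1, 5, 15)-minute load averages against matching thresholds.

    Any average above its critical threshold gives CRITICAL; otherwise any
    average above its warning threshold gives WARNING; otherwise OK.
    """
    if any(load > c for load, c in zip(loads, crit)):
        return CRITICAL
    if any(load > w for load, w in zip(loads, warn)):
        return WARNING
    return OK
```

With hypothetical loads of (9.0, 8.5, 8.0), a warning threshold of 8 on every interval returns WARNING, while raising it to 12 (critical at 20) returns OK. On a Unix host the live values could come from os.getloadavg(), which returns the same three averages.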

Code:

  • Attempting to grok what Ben was trying to do in user_trickle with the $SIG{INT} definition. Since system is used to call the move-user process, and Perl's system ignores SIGINT in the parent while it waits for the child, there is really no way for a SIGINT to reach that handler while the child is off doing its thing…
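The perlfunc entry for system notes that SIGINT and SIGQUIT are ignored during its execution, and that a program which should stop on those signals must decide so from the child's return value. Python's subprocess does not ignore SIGINT in the parent the way C and Perl system do, but the status-checking half of the idea translates directly: a child killed by a signal shows up as a negative return code. Here is a minimal sketch. run_step is hypothetical and is not user_trickle's code.

```python
import signal
import subprocess
from typing import Sequence


def run_step(cmd: Sequence[str]) -> bool:
    """Hypothetical helper: run one child step and report whether the caller
    should keep going.

    subprocess reports death-by-signal as a negative return code, so a child
    killed by SIGINT yields -SIGINT. In that case return False, so the loop
    can stop even though no handler in the parent ever ran.
    """
    result = subprocess.run(list(cmd))
    return result.returncode != -signal.SIGINT
```

A driver loop would call run_step for each user move and break out when it returns False.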

13 August 2014: Wednesday

Posted: August 13th, 2014 by fcs

LDAP:

  • Issues!
    • Two former students: dancing done.
    • A singleton match with a different last name.  Off to the provider to get them to check it out.
  • More work with the new TWS/Maestro Java stuff that really doesn’t like our LDAP server.

BACKUPS:

  • Tape clones completed yesterday. Disk clones have 41TB left to go (egad…). CoM/IS clones have not started yet because their weekend full has not completed.
  • Data shredding of the disk array used in the Veeam testing has completed; the array and the system it was attached to have been powered off.
  • CoM/IS backup finally completed.  CoM/IS clones started at noon.
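None of these entries give a clone throughput, so the rate used below is a made-up placeholder. The point is only the arithmetic for turning "41TB left" into an hours estimate. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB/s = 10^6 bytes/s).

```python
def hours_remaining(tb_left: float, mb_per_s: float) -> float:
    """Hours needed to copy tb_left terabytes at a sustained mb_per_s.
    Decimal units are assumed throughout."""
    bytes_left = tb_left * 1e12
    seconds = bytes_left / (mb_per_s * 1e6)
    return seconds / 3600
```

At a hypothetical sustained 200 MB/s, hours_remaining(41, 200) comes to about 57 hours, well over two days.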


12 August 2014: Tuesday

Posted: August 12th, 2014 by fcs

LDAP:

  • Another SSN addition identifies an account duplication.  Fortunately, this time, it was with an account that was purged three years ago.  Deleted the purged LDAP entry and moved on.
  • VPN removed from eight accounts at the request of the DBAs.
  • Working with the DBAs to make the new Maestro work on the new system.

BACKUPS:

  • Still working on labeling the 256 tapes that came back from the vaults yesterday.
  • Re-installed NetWorker on the test system (the 45-day license ran out).
  • Tape and Disk clones started.  CoM/IS backup from weekend continues to run.

11 August 2014: Monday

Posted: August 11th, 2014 by fcs

LDAP:

  • PeopleSoft added an SSN to an employee entry that now matches a student. Merged the two and locked the now-defunct employee entry so the purge process will remove it in three months.
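The exact purge rule isn't given here. Assuming the purge process removes a locked entry once three calendar months have passed since it was locked, eligibility can be computed as below. add_months and purge_eligible are hypothetical helpers. Adding calendar months needs the day clamped to the end of shorter months.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def purge_eligible(locked_on: date, today: date, grace_months: int = 3) -> bool:
    """Hypothetical rule: eligible once grace_months have passed since locking."""
    return today >= add_months(locked_on, grace_months)
```

Under that assumption, an entry locked on 11 August 2014 becomes eligible on 11 November 2014.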

BACKUPS:

  • The email snapshot script seems to have broken again. Two of the snapshots failed to mount on the system that does the backups. Re-ran the mount step to force it through, and restarted the backup.
    • Ah, then found that the entire snapshot process last night had failed because one of the saves was still running on /gfs63. That save was still running because the snapshot process had also failed on Saturday night and nobody was paged: the code refuses to page on Saturday nights.
  • Taking the full tapes from Mann to Vaults (bringing back some empties to take their place): 355 went to the vaults, 256 came back.

8 August 2014: Friday

Posted: August 8th, 2014 by fcs

LDAP:

  • Yet another first-time former student. Yes, you know the drill as well as I do by now.
  • Reviewed the trace.log from yesterday's TWS debugging. Found one possible cause of the failure and suggested a change. Not sure that will be sufficient to make it work: it is still not clear what access level the application will require on the LDAP servers, because the expert IBM provided cannot seem to understand the question.

BACKUPS:

  • The email backups are running, and they are performing an incremental, like they are supposed to be doing.  Damn, that was a convoluted way to get this to work.  What did I actually do that made it work?  Let me tell you:
    • Set up pathownerignore on stornode5. Touch a file named pathownerignore in the same directory where the NetWorker save binary lives (/usr/sbin in this case), and NetWorker will stop failing backups that it thinks are not owned by the system. This prevents NetWorker from saving the indexes for the penguin-backup backups under the stornode5 client name.
    • Set up /etc/hosts or DNS so that the NetWorker server (ozzie in this case) resolves penguin-backup to the same IP address that stornode5 uses.
    • Add stornode5 to the remote access list for the penguin-backup client.
    • Put the penguin-backup client into the save groups that you want it in.
    • The end result of the four steps above is that the saves run on stornode5 with -c and -m parameters that specify the penguin-backup name. The email about the save group(s) will say that the files were saved as being on stornode5, but checking the actual indexes (in NMC: Media -> Indexes -> penguin-backup) shows that they really are being filed under the penguin-backup name (and that their paths are not even in the stornode5 indexes). Victory (until version 8.2 or something else changes).
  • Clones: CoM/IS and Tapes have finished.  Disk starts the day with 36TB left to go.  No way that is going to finish before the weekend full saves start up.
  • New Fileserver backups:  Removed the PSS flag and reduced the Parallelism value on the clients that will be doing their full saves tonight.
  • Dealing with the email snapshot process.  I think I have it sorted out and have activated the cronjob (since I’m on call tonight) and will see if it works.  The problem was that the “/sbin/multipath -F” command was failing (intermittently, of course).
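The first two penguin-backup steps (the pathownerignore file and the name aliasing) are easy to break silently, so a pre-flight check may be worth having. Here is a minimal Python sketch. preflight is hypothetical, and the resolver is injectable so the check can be tested without real DNS. In practice the file check belongs on stornode5 and the name check should be run as seen from ozzie. The remote access list and save group steps live in NetWorker itself and are not covered.

```python
import os
import socket
from typing import Callable, List


def preflight(save_dir: str, alias: str, host: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Hypothetical check; returns a list of problems (empty means OK).

    save_dir: directory holding the NetWorker save binary (/usr/sbin above).
    alias:    client name the saves should be indexed under (penguin-backup).
    host:     machine that actually runs the saves (stornode5).
    """
    problems = []
    if not os.path.exists(os.path.join(save_dir, "pathownerignore")):
        problems.append(f"{save_dir}/pathownerignore is missing")
    try:
        if resolve(alias) != resolve(host):
            problems.append(f"{alias} does not resolve to the address of {host}")
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        problems.append(f"name lookup failed: {exc}")
    return problems
```

Run on the storage node, preflight("/usr/sbin", "penguin-backup", "stornode5") would list whatever is missing before the save groups start.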

7 August 2014: Thursday

Posted: August 7th, 2014 by fcs

LDAP:

  • No issues
  • Meeting to deal with config for Maestro/TWS

BACKUPS:

  • Two of the four fsXX systems completed their forced full backups in less than a day.  The last two are in a single group and they appear to be well on their way to completion today.
  • CoM/IS and Tape clones are finished – Disk clones… they have a long way left to go.
  • Email backups are a pain!!!
    • For some unknown reason – the removal/replacement of the flashcopy always fails on the first pass and works on the second.  Duh Wha???
  • zoo’s recover script had to be updated.
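Both the flashcopy removal/replacement here and the intermittent “/sbin/multipath -F” failure in the 8 August entry behave like commands that succeed on a later try. Here is a minimal Python sketch of a bounded retry around such a command. It assumes a non-zero exit status signals failure. run_with_retry is hypothetical, and retrying only papers over the still-unknown root cause.

```python
import subprocess
import time
from typing import Sequence


def run_with_retry(cmd: Sequence[str], attempts: int = 2,
                   delay_s: float = 5.0) -> subprocess.CompletedProcess:
    """Hypothetical helper: run cmd, retrying on non-zero exit status.

    Returns the first successful result. Raises CalledProcessError carrying
    the last result if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            time.sleep(delay_s)  # give the device layer a moment before retrying
    raise subprocess.CalledProcessError(result.returncode, list(cmd),
                                        result.stdout, result.stderr)
```

For example, run_with_retry(["/sbin/multipath", "-F"], attempts=3) would try the flush up to three times before giving up and letting the caller page someone.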

6 August 2014: Wednesday

Posted: August 6th, 2014 by fcs

LDAP:

  • We have issues!  Dealt with their thorny hides!
  • Doh!  Stupid ACL blunder removes functionality that was promised.  Now to repair that.

BACKUPS:

  • WTF!!!!  Things that should be incremental backups are performing full saves!
    • Stupid programming choice – the number of parallel streams for the save set changed so NetWorker (as per documentation) escalated the incremental backup to a full.
    • So… never ever ever ever ever ever ever use Parallel Save Streams!  Changing either the number of streams for a client or the number of save sets will result in a full backup when you are not expecting it to.
  • SR 64026734: Collecting the new information EMC has requested to help them debug why this is happening: two executions of the dvdetect command. EMC also informed me that escalation NW161624 has been raised to deal with this problem (which means they agree this is a real bug).
  • Ubuntu client with 8.1.1.3 fails to start: glibc munmap_chunk invalid pointer error. Tried upgrading it to 8.1.1.7; not fixed. Off to EMC… Live Chat, SR 64935414. Oh. The system upgraded glibc, but only partway, because the application was manually installed and the entire system is broken.
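The rule of thumb from this entry is that changing a client's parallel stream count or its number of save sets escalates the next incremental to a full. That could be checked before a client change is saved. Here is a minimal Python sketch. ClientConfig and change_forces_full are hypothetical, and nothing here reads real NetWorker resources.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical snapshot of the two settings the log says trigger escalation."""
    parallel_streams: int
    save_sets: Tuple[str, ...]


def change_forces_full(old: ClientConfig, new: ClientConfig) -> List[str]:
    """List reasons a config change would turn the next incremental into a
    full save, per the behavior described above (empty list means none)."""
    reasons = []
    if old.parallel_streams != new.parallel_streams:
        reasons.append(f"parallel streams {old.parallel_streams} -> {new.parallel_streams}")
    if len(old.save_sets) != len(new.save_sets):
        reasons.append(f"save set count {len(old.save_sets)} -> {len(new.save_sets)}")
    return reasons
```

An empty result means the change should not, by this rule, trigger a surprise full save. The save set paths in any real use would come from the client definition.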

Footprints:

  • Purge all projects of a certain former agent.
©2010 The University of Vermont – Burlington, VT 05405 – (802) 656-3131