Frank's Activity Log

28 May 2015: Thursday

Posted: May 28th, 2015 by fcs

LDAP:

  • LDAP Update: one (1) issue – single item match:
    • It was an old purged ou=people (non-banner) entry.  Removed it, and reset the OSP feed to see this as new tonight.
  • ACCOUNT Update: no issues.
  • Development Server SSL certificate woes were caused by SELinux.  The certificate files needed relabeling: “fixfiles” would do it (I actually ran chcon directly), and that should have fixed them.  For future reference:
    • main: TLS init def ctx failed: -1
      • That error message means that slapd cannot read the TLS certificate files.
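
A minimal sketch of how that label check could be scripted. It assumes GNU stat (for the `%C` format) and assumes that `cert_t` is the type slapd may read, which should be confirmed against the local policy. The function names and the example path are hypothetical.

```shell
#!/bin/sh
# Extract the type (third colon-separated field) from an SELinux context string.
context_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Compare a file's SELinux type with the expected one.
# The default of cert_t is an assumption, not taken from the log entry.
check_cert_label() {
    file=$1
    expected=${2:-cert_t}
    ctx=$(stat -c %C "$file" 2>/dev/null) || {
        echo "cannot read SELinux context of $file"
        return 2
    }
    actual=$(context_type "$ctx")
    if [ "$actual" = "$expected" ]; then
        echo "ok: $file is $actual"
    else
        echo "mismatch: $file is $actual, expected $expected"
        return 1
    fi
}
```

Usage might look like `check_cert_label /etc/openldap/certs/server.pem` (path made up). On a mismatch, `restorecon -v` on the file resets it to the policy default, which is gentler than a hand-typed chcon.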

Backups:

  • CoM/IS and Disk clones continue.
  • Tape backups from the weekend finished overnight, so the clones could start; those have since finished too.
  • bujbod2’s disks are working as expected.
  • Testing my theory about the problem with save.
    • I can’t build a directory tree that is deep enough to cause trouble.  I keep getting filesystem full errors (although there is space left) trying to build the thing.
  • SR71622710 – EMC never assigned it.  A SEV1, and they never touched it until I called back 24 hours after it was queued.  Yeesh!
    • EMC’s response was that I need to increase the stack space (ulimit -s) for all processes on the system in 10K steps until save stops getting segmentation faults.  That was the solution for 7.6, but this is 8.2. Why is save still doing such a stupid thing?
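
Both halves of this (the deep-tree test and the stack stepping) can be sketched in shell. This is an illustration under assumptions, not what was actually run: the function names are hypothetical, "10K" is read as 10 KB because `ulimit -s` counts kilobytes, and the limit is raised for a single command in a subshell rather than for every process on the system.

```shell
#!/bin/sh
# Build a chain of nested directories under ROOT, one level at a time.
# Using a relative cd per level keeps each path argument short.
# Prints the number of levels actually created.
make_deep_tree() {
    root=$1
    depth=$2
    name=${3:-d}
    (
        cd "$root" || exit 1
        i=0
        while [ "$i" -lt "$depth" ]; do
            mkdir "$name" 2>/dev/null || break
            cd "$name" || break
            i=$((i + 1))
        done
        echo "$i"
    )
}

# Run a command with the soft stack limit raised from START to MAX (in KB)
# in STEP increments. Prints the first limit at which the command no longer
# dies with SIGSEGV; returns 1 if it segfaults at every size tried.
find_stack_size() {
    size=$1
    step=$2
    max=$3
    shift 3
    while [ "$size" -le "$max" ]; do
        status=0
        # Subshell, so the changed limit never leaks into the calling shell.
        ( ulimit -S -s "$size" || exit 125; "$@" >/dev/null 2>&1 ) || status=$?
        if [ "$status" -eq 125 ]; then
            echo "could not set stack limit to $size KB" >&2
            return 2
        fi
        # 139 = 128 + 11 (SIGSEGV): still segfaulting, try a larger stack.
        if [ "$status" -ne 139 ]; then
            echo "$size"
            return 0
        fi
        size=$((size + step))
    done
    return 1
}
```

Something like `make_deep_tree /scratch/deeptest 5000` followed by `find_stack_size 8192 10 16384 save /scratch/deeptest` would tie the two together (paths and save arguments made up). If mkdir keeps reporting a full filesystem while blocks are free, `df -i` is worth a look, since running out of inodes produces the same error.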

Exchange:

  • Figuring out how to import PINE addressbook.
  • Trying to figure out which procmail filters were not translated and why.

27 May 2015: Wednesday

Posted: May 27th, 2015 by fcs

LDAP:

  • No issues in the LDAP update.
  • NFS Stale file handle issue in the ACCOUNT update.
  • New development SSL cert does not work… what’d I do wrong???

Backups:

  • CoM/IS clones started.
  • CoM/IS backups seem to have had a major problem last night.
  • A couple of the fileservers are having trouble too.
  • Tracked down the SegFault in save on fsa1.  Apparently… a directory that was too deep…

26 May 2015: Tuesday

Posted: May 26th, 2015 by fcs

LDAP:

  • Nightly LDAP Update: One issue over the long weekend.  A one item match with the wrong last name.
    • Asked OSP to make contact and verify what I found. OSP verified. LDAP updated, account created, former student kerberos updated.
  • Daily Account Update: No issues.

Backups:

  • CoM/IS and Tape backups continue with their long running saves that hit this past weekend.
  • Disk clones started.
  • Trying to figure out the quota issue with the bujbod2 mount on stornode3.
    • Harumph!  This is what happens when you put critical XFS information into /etc/project instead of the correct file (/etc/projects).  It is fixed.
  • Two (2) CoM/IS groups and one (1) SAA group failed over the long weekend.
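
The /etc/project versus /etc/projects slip noted above is easy to check for mechanically. A minimal sketch, assuming the standard XFS project quota layout (`/etc/projects` maps project ID to directory, `/etc/projid` maps project name to ID); the function name and the sample values in the comments are hypothetical.

```shell
#!/bin/sh
# Expected file formats (sample values are made up):
#   /etc/projects   10:/export/bujbod2/incr     (project ID : directory)
#   /etc/projid     incr:10                     (project name : project ID)

# Warn when the project map was saved under the wrong file name.
# The directory argument defaults to /etc; pass another one for testing.
check_project_files() {
    etc=${1:-/etc}
    if [ -e "$etc/project" ] && [ ! -e "$etc/projects" ]; then
        echo "warning: found $etc/project but xfs_quota reads $etc/projects"
        return 1
    fi
    if [ ! -e "$etc/projects" ]; then
        echo "missing: $etc/projects"
        return 1
    fi
    echo "ok"
}
```

Once the files are in place, `xfs_quota -x -c 'project -s NAME' MOUNTPOINT` initialises the project tree and `xfs_quota -x -c 'report -p' MOUNTPOINT` shows whether the limits took, provided the filesystem is mounted with project quotas enabled.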

22 May 2015: Friday

Posted: May 22nd, 2015 by fcs

LDAP:

  • No issues in the nightly LDAP update.
  • No issues in the morning ACCOUNT update.

Backups:

  • The usual three failures for this week.
  • bujbod2:
    • The trial run was a success.  3.2TB written there overnight and no apparent slowdown (nothing like the first time bujbod1 went into production).
    • Added the bujbod2 incremental disk to stornode2’s staging process.
    • Thought I would add one of the disks to stornode3 – but the quota did not propagate correctly.  Not sure why.
  • Defined two new clients for the new Luminis portal that is being released this summer.

21 May 2015: Thursday

Posted: May 21st, 2015 by fcs

LDAP:

  • Nightly LDAP update issues: one – a new student employee, found and merged.
  • No daily ACCOUNT update issues.

Backups:

  • CoM/IS and disk clones continue.
  • Special processing for old AIX box data.
  • bujbod2:
    • Investigating quota processing. It works!
    • CFEngine update sent out for approval to run upgrades at Backup server time.
    • One disk added to the Incr pool.

20 May 2015: Wednesday

Posted: May 20th, 2015 by fcs

LDAP:

  • Nightly LDAP update: 1 issue – a student employee – found, merged.
  • Daily ACCOUNT update: no issues

Backups:

  • CoM/IS and Disk clones continue.
  • SB backups still having a failure.  Pinging the Admin again – to see what the issue is.
  • New fileserver failed to back up.  Informing the admin that it was not ready. Ah – the firewall was configured but not activated, so no traffic.
  • CoM/IS system having another SQL backup failure.  Pinging the admins.
  • bujbod2:
    • selinux: Put it back in enforcing mode.  The audit log had shown no failures in the past two weeks, yet enforcing mode immediately produced failures.  Eventually got the right set of greps (5 passes) to get everything permitted.
    • nfs server: for some reason – you have to “systemctl enable nfs-server” as well as “systemctl enable nfs” to get the nfs server to start at boot, but you only have to “systemctl start nfs” to get it all running after the system is up. Eh?
    • More perf testing… darn this stuff takes a long time!
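
A small sketch for catching the nfs boot-time surprise before the next reboot: list which of the given units are not enabled. The function name and the `SYSTEMCTL` override (there only so the check can be exercised without systemd) are hypothetical.

```shell
#!/bin/sh
# Print every unit from the argument list that is not enabled at boot.
# SYSTEMCTL can point at a stand-in command for testing; default is systemctl.
units_not_enabled() {
    for unit in "$@"; do
        if ! "${SYSTEMCTL:-systemctl}" is-enabled "$unit" >/dev/null 2>&1; then
            echo "$unit"
        fi
    done
}
```

Running `units_not_enabled nfs nfs-server` prints nothing when both are set to start at boot; anything it does print still needs a `systemctl enable`.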

19 May 2015: Tuesday

Posted: May 19th, 2015 by fcs

LDAP:

  • Two issues in the LDAP update:  Two new student employees.   Found their student record and merged the employee info with it.
  • No issues in the morning ACCOUNT update.

Backups:

  • That CoM/IS system is failing again.
  • Tape clones completed.
  • Continuing to work on the check_raid for bujbod2 – got the checks to actually work and not false positive!
  • bujbod2 – nfs service does not successfully start on reboot.

18 May 2015: Monday

Posted: May 18th, 2015 by fcs

LDAP:

  • One issue over the weekend: Found the student and merged the new employee info in.
  • No problems in the daily ACCOUNT updates.

Backups:

  • Unexpected backup failures – admins notified.
  • bujbod2 check_raid script updated to properly catch consistency checks and not send false positive reports.
  • bujbod2 – power supply loss of power is not reflected in the check_raid process at all.  Seems it is deficient.  I will need to talk with Kent about the design of this darned thing.
  • bujbod2 – 9:18 (V1:PD19) is again showing a red failure light, and 8:22 (GHSP) is again blinking.  Contacting AMS for suggestions.
    • AMS suggests reboot.
    • I powered down (shutdown -h) the bujbod2 system
    • powered off (held power button) the jbod
    • powered on the jbod
    • waited (multiple minutes, 5?)
    • booted bujbod2 – XFS says the filesystem on the jbod is corrupt.
    • Ouch.
    • xfs_repair did not find anything that needed fixing.
    • Turns out one of my options for mounting the XFS system was not good.

15 May 2015: Friday

Posted: May 15th, 2015 by fcs

LDAP:

  • Patched up three issues from Wednesday night's LDAP update, plus Thursday night's modifications to those same entries, in the LDAP update issues field.
  • No issues in the ACCOUNT update the past two mornings.
  • Updated the mail alias for Account Services.

Backups:

  • Notified an admin that his system is not backing up.  Asking if he has shut it down or if it is just on holiday.
  • The ZFS issue from the disk that failed and was not discovered until I returned from vacation has finally resolved itself (the new disk has been re-silvered, taking its place in the pool and the spare is back as a spare).
  • Reclaimed two (2) tapes for the vaults.
  • Replaced the failed drive in JBOD2.  Instead of needing to set it as the HSP – it went into “CopyBack” mode.  Now, I need to wait for the rebuild to complete.  Interesting how MegaCLI hides that info from you.

13 May 2015: Wednesday

Posted: May 13th, 2015 by fcs

LDAP:

  • An “Updating the WRONG entry” issue in the nightly LDAP update:
    • Banner adds an SSN to an entry and we find it matches an employee.
    • Student account not activated.
    • Entries merged; Account Services notified.
  • No problems in the daily ACCOUNT update.

Backups:

  • Scanning for warnings and notifying admins.
  • Replaced the failed drive in bujbod1 (32:19).  Had some trouble creating the new Virtual Disk (due to preserved cache for the failed drive).  Got around it and updated the documentation in the wiki.
  • Reported an apparent failed drive in 9:18 on bujbod2 to AMSStorage (it may or may not have failed at the last reboot?).  Waiting to see if they want to replace it or not.

General:

  • Fileservers (ipv6 only) wandered their clocks into oblivion because they can’t talk to the ipv4 only ntp servers.  Set up their leader (straddles ipv4 and ipv6) to be an ntp server and the rest of them to talk to her.
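
The client side of that arrangement comes down to one `server` line per fileserver. A minimal sketch that prints such an ntp.conf; the function name and host name are hypothetical, and the driftfile path is an assumed common default rather than the one actually used.

```shell
#!/bin/sh
# Print a minimal ntp.conf for a client that syncs only from the leader.
# The driftfile path is an assumption; adjust to the local layout.
ntp_client_conf() {
    leader=$1
    cat <<EOF
driftfile /var/lib/ntp/drift
server $leader iburst
EOF
}
```

For example, `ntp_client_conf fs-leader.example > ntp.conf.new` gives a file to review before installing. The leader keeps its existing upstream `server` lines to the ipv4 ntp servers and needs `restrict` rules that let the ipv6-only clients query it.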