[Beowulf] delayed savings time crashes
    David Mathog 
    mathog at caltech.edu
       
    Wed Apr 12 09:05:27 PDT 2006
    
    
  
This is an odd one.  I just realized that 9 of 20 nodes
rebooted on Apr 4.  (Since they all rebooted successfully everything
was working and there was no reason to think that this had
taken place.)  This appears to be related to the daylights
savings time change two days before.  The reason I think that is
that the nodes that rebooted have /var/log/messages files like:
Apr   4 08:01:00 nodename CROND ... /cron/hourly
Apr   4 09:01:00 nodename CROND ... /cron/hourly
Apr   4 08:24:33 nodename syslogd 1.4.1; restart
Notice the time shift backwards between the last normal 
record and the first reboot record.
As if it finally caught on that the clock had changed and that
somehow triggered a reboot.  Unfortunately none of the log files
contain a message that indicated exactly what it was that ordered
the reboot.
Unclear to me what piece of software could have triggered this. 
Presumably something that had it's own clock stuck one hour off
on the previous time standard and also has the ability to restart
the system.  ntpd?  Ganglia? They were both running.
Regards,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
    
    
More information about the Beowulf
mailing list