[scyld-users] Re: Scyld system mysteriously locks up

Tim Whitcomb twhitcomb at apl.washington.edu
Thu Mar 18 12:20:01 PST 2004


 > I purchased a 4 node, 8 processor Scyld (version 28) cluster
 > approximately 6 months ago.  About 5 days ago, it started mysteriously
 > locking up on me.  Once it is locked up, I can't do anything except
 > physically reboot the machine.
 > Unfortunately, I am rather new to Linux clusters and, since it worked
 > "right out of the box", I have had no experience in troubleshooting.
 > Can someone give me an idea of where I should start?
 > I have the BIOS on all machines set to do a full memory check on startup
 > and the /var/log/messages file shows nothing.
 > Thanks,
 > Eric

This sounds suspiciously like a problem we've been fighting for the past 
year at least.  Are the machines actively running a job when they lock 
up or are they sitting idle?  I've done some tests that seem to suggest 
that our system has trouble when the same job is run on both processors
of the same machine.  Where did you purchase your equipment, what
kind of processors are in it, what kind of interconnect are you using, 
and what is the motherboard in the machines?

TRW

Timothy R. Whitcomb
===================
Meteorologist
Applied Physics Lab
University of Washington
mail: 	twhitcomb at apl dot washington dot edu
voice:	(206) 543-2663


