Mysterious kernel hangs

Alan Ward award at
Thu Mar 15 06:35:35 PST 2001

It may seem simplistic, but have you any reason to think 
your machines aren't simply overheating?

There can be a lot of Joules going 'round in a dual box. Try
running them at, say, 800 MHz and see if there's a difference.
Do the same with the case open.
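
One way to check the overheating theory is to log temperatures to disk
until a node freezes, then read the last entry. The following is only a
sketch under assumptions: it presumes a Linux kernel that exposes
sensors under /sys/class/thermal (kernels of the 2.2/2.4 era do not, so
a different sensor source would be needed there), and the function
names, the log file name and the 10 second interval are all
hypothetical.

```python
import glob
import os
import time


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone under base."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                # Assumption: the file holds an integer in millidegrees Celsius.
                temps[zone] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return temps


def format_line(temps, now):
    """Build one timestamped log line from a dict of zone temperatures."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    readings = " ".join(f"{zone}={deg:.1f}C" for zone, deg in sorted(temps.items()))
    return f"{stamp} {readings}"


def log_forever(logfile="temps.log", interval=10):
    """Append a reading every `interval` seconds until the process dies."""
    with open(logfile, "a") as out:
        while True:
            out.write(format_line(read_zone_temps(), time.time()) + "\n")
            out.flush()
            os.fsync(out.fileno())  # push to disk so the last line survives a hard freeze
            time.sleep(interval)
```

Calling log_forever() on each node before starting a network-intensive
run, and comparing the final readings of frozen and healthy nodes,
would show whether the freezes follow a rise in temperature.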

Best regards,
Alan Ward

Felix Rauch wrote:
> We recently bought a new 16 node cluster with dual 1 GHz PentiumIII
> nodes, but machines mysteriously freeze :-(
> The nodes have STL2 boards (Version A28808-301), onboard Adaptec SCSI
> controllers (7899P), onboard Intel Fast Ethernet adapters (82557
> [Ethernet Pro 100]) and additional Packet Engines Hamachi GNIC-II
> Gigabit Ethernet cards.
> We tried kernels 2.2.x, 2.4.1 and now even 2.4.2-ac20, but it seems to
> be the same problem with all kernels: When we run experiments which
> use the network intensively, any of the machines will just freeze
> after a few hours. The frozen machine does not respond to anything and
> up to now we were not able to see any log-entries related to the
> freeze on virtual console 10 :-(   We have now switched on all the
> "Kernel Hacking" options in the kernel configuration (especially the
> logging) and will try again; hopefully we will at least see some log output.
> The freezes also happen if we let non-network-intensive jobs run on
> the machines (e.g. SETI@home), but clearly they happen less often.
> Do any of you have an idea what could be going wrong, or what we could
> try in order to find the cause of the problems?
> Regards,
> Felix
> --
> Felix Rauch                      | Email: rauch at
> Institute for Computer Systems   | Homepage:
> ETH Zentrum / RZ H18             | Phone: ++41 1 632 7489
> CH - 8092 Zuerich / Switzerland  | Fax:   ++41 1 632 1307
