[Beowulf] Node Drop-Off

Mark Hahn hahn at physics.mcmaster.ca
Sun Nov 12 21:15:19 PST 2006


> I have a compute node that has started dropping off.  When I say drop off, I 
> mean the node (while running a job) will lose all connectivity and the 
> machine does not respond.  I have viewed the logs and can find no reason for 
> the node to cease functioning.

if you connect a console to such a node, has it simply panicked?

> Has anyone ever seen such behavior?

I have the occasional node which turns itself off under load.
the IPMI reports power being off, so it's distinct from panics.
the IPMI system event log (SEL) doesn't show any reason.

we (and the vendor) regard this as grounds for repair (usually
the power supply).
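
as a rough sketch of the check described above (not a script from this thread), the following separates "powered off" from "powered on but hung or panicked" by combining the BMC's power state with whether the node still answers on the network. it assumes ipmitool is installed and the BMC is reachable over the LAN; the host name, user name, and function names are hypothetical, and the password is taken from the IPMI_PASSWORD environment variable via ipmitool's -E option.

```python
import subprocess


def power_status_cmd(bmc_host, user, interface="lanplus"):
    """Build the ipmitool command that asks a node's BMC for chassis power state."""
    # -E makes ipmitool read the password from the IPMI_PASSWORD env variable.
    return ["ipmitool", "-I", interface, "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", "status"]


def classify(power_output, responds_to_ping):
    """Classify a node from ipmitool's power report and its network reachability."""
    text = power_output.strip().lower()
    if text.endswith("is off"):
        # Power is off: a hardware-style failure, distinct from a kernel panic.
        return "powered off"
    if text.endswith("is on"):
        # Power is on but the node is silent: likely a panic or hang.
        return "up" if responds_to_ping else "powered but unresponsive"
    return "unknown"


def query_power(bmc_host, user):
    """Run ipmitool against the BMC and return its raw output (needs a real BMC)."""
    result = subprocess.run(power_status_cmd(bmc_host, user),
                            capture_output=True, text=True, timeout=30)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical BMC address and user, shown for illustration only.
    print(classify(query_power("node42-bmc", "admin"), responds_to_ping=False))
```

a node reported as "powered off" matches the case above; "powered but unresponsive" is the one where a console is worth attaching. running `ipmitool sel list` with the same connection options prints the event log, though as noted it may show nothing useful.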

regards, mark hahn.


