ch_p4 Error -> System Hangs


Donald Becker becker at scyld.com
Tue Nov 6 07:55:35 PST 2001


On Tue, 6 Nov 2001, Chadalavada Kalyana Krishna wrote:

> I am working on a 7-node Linux cluster (6 compute
> nodes, 1 FS).

What system?  (Kernel version, etc.)
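
A minimal sketch of gathering those details on each node, assuming Python
is available there. The helper name describe_system is hypothetical; it
only wraps the standard library's platform.uname().

```python
import platform


def describe_system():
    """Collect the basic facts worth quoting in a crash report."""
    u = platform.uname()
    return {
        "node": u.node,               # host name, e.g. one of the n09..n15 nodes
        "system": u.system,           # OS name, "Linux" on a Linux cluster
        "kernel_release": u.release,  # kernel release string
        "kernel_version": u.version,  # kernel build/version string
        "machine": u.machine,         # hardware architecture
    }


if __name__ == "__main__":
    for key, value in describe_system().items():
        print(f"{key}: {value}")
```

Running this on both a node that hangs and one that does not (n10 here)
makes it easy to spot a kernel or hardware difference between them.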

> system from which the program was started, hung. I
> could not trace out the source to any s/w problem or
> installation, though I am not sure about it.
> 
> Repeated attempts to run the same program resulted in n09, n11,
> n13, n14, and n15 hanging. I was not able to ping the systems.
> I also do not understand why n10 did not hang, though I ran the
> program there too.
> 
> The display is:
> 
> Code: some numbers.
> 
> Alicee: Killed Interrupt handler

You have a kernel crash.  Given that it didn't occur on all systems, you
should look first for a hardware problem, especially memory corruption.
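
As a rough illustration of what a memory check does, here is a user-space
sketch that writes fixed bit patterns into a buffer and reads them back.
The function name pattern_test and its parameters are made up for this
example. It is a crude stand-in only: it touches just the memory the OS
hands to one process and may be served from cache, so a dedicated
boot-time tester such as memtest86 is the better tool for a suspect node.

```python
def pattern_test(size_bytes=16 * 1024 * 1024, passes=2):
    """Fill a buffer with known bit patterns and verify each fill.

    Returns a list of (pass_number, pattern, mismatched_byte_count)
    tuples, one per fill that did not read back correctly. An empty
    list means no mismatch was observed.
    """
    patterns = (0x00, 0xFF, 0x55, 0xAA)  # all zeros, all ones, alternating bits
    buf = bytearray(size_bytes)
    errors = []
    for pass_number in range(1, passes + 1):
        for pattern in patterns:
            # Overwrite the whole buffer with the current pattern.
            buf[:] = bytes([pattern]) * size_bytes
            # Count bytes that still hold the pattern; any shortfall is a mismatch.
            mismatched = size_bytes - buf.count(pattern)
            if mismatched:
                errors.append((pass_number, pattern, mismatched))
    return errors


if __name__ == "__main__":
    failures = pattern_test()
    if failures:
        for pass_number, pattern, mismatched in failures:
            print(f"pass {pass_number}: pattern 0x{pattern:02X}: "
                  f"{mismatched} bytes differ")
    else:
        print("no mismatches observed")
```

A clean run does not prove the memory is good; a reported mismatch on one
of the hanging nodes would be a strong hint to swap or reseat its DIMMs.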

> One important point is that we have configured mpich
> to use ssh instead of rsh for communication.

This is likely not related to a kernel crash.

Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993



