ch_p4 Error -> System Hangs
Donald Becker becker at scyld.com
Tue Nov 6 07:55:35 PST 2001
- Previous message: ch_p4 Error -> System Hangs
- Next message: problems with scyld - slave nodes
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, 6 Nov 2001, Chadalavada Kalyana Krishna wrote:
> I am working on a 7 node Linux Cluster (6 compute
> nodes, 1 FS).

What system? (Kernel version, etc.)

> system from which the program was started, hung. I
> could not trace out the source to any s/w problem or
> installation, though I am not sure about it.
>
> Repeated attempts to run the same resulted in hanging
> of n09, n11, n13, n14, n15. I was not able to ping
> the systems. I also do not understand why n10 did
> not hang, though I ran the program there too.
>
> The display is:
>
> Code: some numbers.
>
> Alicee: Killed Interrupt handler

You have a kernel crash. Given that it didn't occur on all systems, you should look first for a hardware problem, especially memory corruption.

> One important point is that we have configured mpich
> to use ssh instead of rsh for communication.

This is likely not related to a kernel crash.

Donald Becker                           becker at scyld.com
Scyld Computing Corporation             http://www.scyld.com
410 Severn Ave. Suite 210               Second Generation Beowulf Clusters
Annapolis MD 21403                      410-990-9993
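The usual follow-up to "look first for memory corruption" is a boot-time tester such as memtest86, which can reach nearly all physical RAM. As a rough first pass on a node that still boots, a crude user-space pattern test can be sketched as below. The script, its `pattern_test` helper, the pattern list and the 64 MB default are hypothetical illustrations, not anything from this thread; it only exercises the pages the OS hands this process, so a clean result does not rule out bad RAM.

```python
import array
import sys

# Hypothetical helper: a crude user-space memory pattern test.
# It only touches memory allocated to this process, so it complements,
# and does not replace, a boot-time tester such as memtest86.

# Fixed 64-bit patterns: all zeros, all ones, and alternating bits.
PATTERNS = (
    0x0000000000000000,
    0xFFFFFFFFFFFFFFFF,
    0x5555555555555555,
    0xAAAAAAAAAAAAAAAA,
)


def pattern_test(n_words: int) -> int:
    """Return the number of 64-bit words that did not read back correctly."""
    errors = 0
    for pattern in PATTERNS:
        # Write: fill a buffer of n_words unsigned 64-bit words with the pattern.
        buf = array.array("Q", [pattern]) * n_words
        # Read back: every word that no longer matches counts as an error.
        errors += n_words - buf.count(pattern)
    # Address-in-address: each word holds its own index, which can expose
    # address-line faults that fixed patterns miss.
    buf = array.array("Q", range(n_words))
    expected = array.array("Q", range(n_words))
    if buf != expected:
        errors += sum(a != b for a, b in zip(buf, expected))
    return errors


def main() -> None:
    # Size of the test buffer in megabytes (hypothetical default: 64).
    megabytes = int(sys.argv[1]) if len(sys.argv) > 1 else 64
    n_words = megabytes * 1024 * 1024 // 8
    for rnd in range(1, 4):
        bad = pattern_test(n_words)
        print(f"pass {rnd}: {bad} bad words in {megabytes} MB")
        if bad:
            sys.exit(1)


if __name__ == "__main__":
    main()
```

Running it on each node (for example `python3 memcheck.py 256`, with a hypothetical file name) and comparing the crashing nodes n09, n11, n13, n14 and n15 against n10 is one cheap way to look for the hardware difference Becker suspects; any nonzero count points at the hardware, while a clean run still calls for a full memtest86 pass.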
More information about the Beowulf mailing list
