vkaryo at hotmail.com
Fri Sep 15 16:17:07 PDT 2000
Is there a technique to handle node failure? Shortly, I'll be working on an
algorithm that is naturally parallel and divided into coarse-grain "blocks".
I want to use a master/worker scheme. The master would reissue a block if
its result doesn't come back from the worker fast enough, on the assumption
that the node has failed. I know I can't rejoin a node after it fails, but
if a node fails, will the whole app die?
Also, is there a way to detect the number of nodes other than at
initialization, so I can tell if a node has died?
(I plan on using MPI-Pro on a RH6.2 8-way single-proc Intel cluster with
100 Mbps switched Ethernet.)
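
To make the reissue idea above concrete, here is a minimal sketch of the master's bookkeeping only. It is a plain-Python simulation on a logical clock, not MPI code, and it says nothing about whether the MPI job itself survives a node failure. The names run_master, latencies, and timeout are hypothetical, made up for illustration. A worker whose latency is None stands in for a dead node, and a worker that misses the timeout is simply never used again.

```python
from collections import deque


def run_master(num_blocks, latencies, timeout, max_ticks=10_000):
    """Simulate a master that reissues blocks whose results are overdue.

    latencies maps a worker name to the number of ticks it needs per block,
    or None for a worker that never answers (a stand-in for a dead node).
    Returns (results, suspected): results maps block -> worker that
    finished it, suspected is the set of workers presumed failed.
    """
    pending = deque(range(num_blocks))  # blocks waiting to be issued
    in_flight = {}                      # worker -> (block, tick it was issued)
    idle = list(latencies)              # workers believed alive and free
    suspected = set()                   # workers that missed the timeout
    results = {}                        # block -> worker that completed it

    for now in range(max_ticks):
        for worker, (block, issued) in list(in_flight.items()):
            latency = latencies[worker]
            if latency is not None and now - issued >= latency:
                # The result came back in time: record it, free the worker.
                del in_flight[worker]
                results[block] = worker
                idle.append(worker)
            elif now - issued >= timeout:
                # Overdue: assume the node failed and requeue its block.
                del in_flight[worker]
                suspected.add(worker)
                pending.append(block)

        if len(results) == num_blocks:
            return results, suspected
        if not idle and not in_flight:
            raise RuntimeError("no live workers left")

        # Hand pending blocks to whichever workers are free.
        while idle and pending:
            in_flight[idle.pop(0)] = (pending.popleft(), now)

    raise RuntimeError("blocks still unfinished after max_ticks")


# Example: w1 never answers, so its block is reissued to another worker.
results, suspected = run_master(5, {"w0": 2, "w1": None, "w2": 3}, timeout=5)
print(sorted(results.items()), suspected)
```

The timeout has to be comfortably longer than the slowest healthy worker, since this sketch treats a slow node and a dead node the same way.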
There are some efforts to build fault-tolerant MPIs, but standard
MPI-1.x is supposed to kill the parallel application if a node dies,
or else the underlying system must transparently solve the fault.
Anthony Skjellum, PhD, President (tony at mpi-softtech.com)
MPI Software Technology, Inc., Ste. 33, 101 S. Lafayette, Starkville, MS
+1-(662)320-4300 x15; FAX: +1-(662)320-4301; http://www.mpi-softtech.com
"Best-of-breed Software for Beowulf and Easy-to-Own Commercial Clusters."
On Thu, 14 Sep 2000, Horatio B. Bogbindero wrote:
> what happens if a node in MPI dies? is the entire computation lost?
> william.s.yu at ieee.org
> I bought some used paint. It was in the shape of a house.
> -- Steven Wright
> Beowulf mailing list
> Beowulf at beowulf.org