MPI dies

Victor Karyo vkaryo at hotmail.com
Fri Sep 15 16:17:07 PDT 2000


Is there a technique to handle node failure?  Shortly, I'll be working on an 
algorithm that is naturally parallel and divided into coarse-grain "blocks". 
  I want to use a master/worker scheme.  The master will reissue a block 
if it doesn't come back from the worker fast enough, on the 
assumption that the node has failed.  I know I can't rejoin a node after it 
fails, but if a node fails will the whole app die?
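
To make the reissue idea concrete, here is a rough sketch of the bookkeeping 
the master would need.  It is only an illustration: it contains no MPI calls, 
and the names (BlockTracker, next_block, complete, timeout_s) are hypothetical 
and not part of MPI-Pro or any MPI library.  It assumes a block is safe to 
compute twice, and it treats "timed out" as "node presumed dead".

```python
import time


class BlockTracker:
    """Master-side bookkeeping: which blocks are outstanding and since when.

    Hypothetical helper for illustration; no MPI is involved here.
    """

    def __init__(self, block_ids, timeout_s, clock=time.monotonic):
        self.pending = list(block_ids)  # blocks not yet handed to any worker
        self.in_flight = {}             # block id -> time it was last issued
        self.done = {}                  # block id -> result
        self.timeout_s = timeout_s
        self.clock = clock              # injectable so the logic can be tested

    def next_block(self):
        """Return a block to hand out, or None if nothing is available.

        Blocks that have been out longer than timeout_s are reissued first,
        on the assumption that their worker has failed.
        """
        now = self.clock()
        for block, issued in self.in_flight.items():
            if now - issued > self.timeout_s:
                self.in_flight[block] = now  # reissue and restart its timer
                return block
        if self.pending:
            block = self.pending.pop(0)
            self.in_flight[block] = now
            return block
        return None

    def complete(self, block, result):
        """Record a result; return False for a duplicate of a finished block."""
        if block in self.done:
            return False  # a slow-but-alive worker answered after a reissue
        self.in_flight.pop(block, None)
        self.done[block] = result
        return True

    def finished(self):
        """True once every block has a recorded result."""
        return not self.pending and not self.in_flight
```

The master would call next_block() whenever a worker is idle and complete() 
whenever a result arrives.  This only covers the master's side of the 
scheme; whether the job as a whole survives the loss of a node is up to the 
MPI implementation, which is what I'm asking about.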

Also, is there a way to detect the number of nodes other than at 
initialization, so I can tell if a node has died?

(I plan on using MPI-Pro on a RH6.2 8-way single-proc Intel cluster with 
100 Mbps switched Ethernet.)

Thanks
Victor Karyo.




There are some efforts to build fault-tolerant MPIs, but standard
MPI-1.x is supposed to kill the parallel application if a node dies,
unless the underlying system transparently handles the fault.


Anthony Skjellum, PhD, President (tony at mpi-softtech.com)
MPI Software Technology, Inc., Ste. 33, 101 S. Lafayette, Starkville, MS 
39759
+1-(662)320-4300 x15; FAX: +1-(662)320-4301; http://www.mpi-softtech.com
"Best-of-breed Software for Beowulf and Easy-to-Own Commercial Clusters."

On Thu, 14 Sep 2000, Horatio B. Bogbindero wrote:

 >
 > what happens if a node in MPI dies? is the entire computation lost?
 >
 >
 > ---------------------
 > william.s.yu at ieee.org
 >
 > I bought some used paint. It was in the shape of a house.
 > 		-- Steven Wright
 >
 >
 >
 > _______________________________________________
 > Beowulf mailing list
 > Beowulf at beowulf.org
 > http://www.beowulf.org/mailman/listinfo/beowulf
 >


