Randall Jouett rules at bellsouth.net
Sat Jan 4 04:07:10 PST 2003


Hello again, Donald.

Donald Becker wrote:
> 
> 
> Our cluster philosophy is that the end user should not be required to
> do anything new or special to run a cluster application.


Great. End users and turn-key solutions are always a nice
thing to have in a business-level environment. Rock on, dewd!
:^)


> That means
>    Applications should work even if there is only a single machine in
>    the cluster.  Many beginner MPI applications don't handle this case
>    correctly.

Wow. I would have thought that people would have made plans
to deal with this, especially since something along these lines
can happen, although I'm pretty sure it's rather infrequent.
Go figure.
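
For illustration only, here is a minimal sketch of the single-machine case Donald describes. It is not taken from any real application: the function name `rendering_ranks` is hypothetical, and it assumes the convention quoted later in this thread, where MPI rank 0 is the front end and the other ranks are compute nodes.

```python
def rendering_ranks(world_size):
    """Return the ranks that should do rendering work (hypothetical helper).

    Assumes rank 0 is the front end and ranks 1..world_size-1 are
    compute nodes, as in the setup described in this thread.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if world_size == 1:
        # Only the front end exists, so it has to do the rendering itself.
        return [0]
    # Otherwise rank 0 keeps the serial setup and I/O; the rest render.
    return list(range(1, world_size))
```

A beginner program that always hands work to ranks 1 and up would have nothing to hand it to when `world_size` is 1, which is presumably the kind of failure being described.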

>    Cluster applications should not require a helper program such as
>    'mpirun' or 'mpiexec'.


In a commercial system, where end users shouldn't and wouldn't
know about such things, I totally agree. OTOH, in a production
environment where the vast majority of users are geekoids, I don't
have a problem with a helper program, especially if mpirun or
mpiexec is hidden by a GUI or something. Since you are doing this
as a commercial endeavor, though, I agree with the way you guys are
handling this, Donald. This lets me and others know that your
systems are well thought out and end-user friendly, and that is
something we all expect when shelling out serious cash for a good
number-cruncher setup.


> The application code should interact with the scheduler to set any
> special scheduling requirements or suggestions.

True. Also, this shouldn't be any big deal, and I'd imagine
this is easily done via shell scripts or a quick C hack,
especially if you feel that this type of code should be
proprietary or something. Personally, I'd want to see something
like this done at the script level, though, so that a geek could
come along and change a few things for tweaks. That's just
me, though. (Shrug.)

> A sophisticated user should still be able to optimize and do clever
> things, but the basic operation shouldn't require any new knowledge.

Agreed.

> 
>>>    It does all of the serial setup and run-time I/O on the front end
>>>      machine (technically, the MPI rank 0 node).  This minimizes
>>>      overall work and keeps the POV-Ray call-out semantics unchanged
>>>    It does the rendering only on compute nodes (except for the N=0 case). 
>>>    It completes the rendering even with crashed or slow nodes.
>>
>>Ah. So it redistributes the work, huh? Kewl.
> 
> 
> Here we use knowledge about the application semantics to implement
> failure tolerance.  When we have idle workers and the rendering isn't
> finished, we send some of the remaining work to the idle machine.


Well, I hate to sound like a knothead here, Donald, and I don't
mean to be rude, but isn't this a de facto setup and standard in
a Beowulf environment? If not, what the hell are people thinking
about? :^) :^). To me, this just seems like the logical way to
write code, but what the heck do I know? :^)


> If a machine fails we still finish the rendering and do the final
> call-outs, but don't cleanly terminate.

Ah. Ok. Kewl. Sounds logical to me.
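
As a rough sketch of the scheme discussed above (hand remaining work to idle workers, and finish the job even if a node dies), here is a small single-process simulation. It is not the actual POV-Ray/MPI code: `render_all`, the tile and worker names, and the `fails_after` failure model are all assumptions made up for the example.

```python
from collections import deque


def render_all(tiles, workers, fails_after=None):
    """Simulate giving tiles to idle workers and requeueing lost work.

    tiles:       list of tile ids still to render
    workers:     list of worker names
    fails_after: hypothetical failure model, mapping a worker name to
                 the number of tiles it completes before it crashes
    Returns a dict mapping each tile to the worker that finished it.
    """
    fails_after = fails_after or {}
    pending = deque(tiles)
    alive = list(workers)
    completed = {w: 0 for w in workers}
    done = {}
    while pending:
        if not alive:
            raise RuntimeError("no workers left to finish the rendering")
        # Each pass hands one tile to every worker that is still alive.
        for w in list(alive):
            if not pending:
                break
            tile = pending.popleft()
            if w in fails_after and completed[w] >= fails_after[w]:
                # The worker crashed on this tile: put the tile back
                # in the queue and stop sending this worker anything.
                pending.append(tile)
                alive.remove(w)
                continue
            done[tile] = w
            completed[w] += 1
    return done
```

For example, `render_all(list(range(6)), ["n1", "n2"], {"n2": 1})` still finishes all six tiles: `n2` completes one tile and then crashes, and the tile it was holding ends up rendered by `n1`.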


Type at ya' later,

Randall
--
Randall Jouett
Amateur Radio: AB5NI

I eat spaghetti code out of a bit bucket while sitting at a hash table!



