
[Beowulf] cluster softwares supporting parallel CFD computing


Eric W. Biederman ebiederm at xmission.com
Fri Sep 8 20:25:16 PDT 2006


Greg Lindahl <greg.lindahl at qlogic.com> writes:

> On Thu, Sep 07, 2006 at 01:15:01PM -0600, Eric W. Biederman wrote:
>
>> I agree.  Taking an interrupt per message is clearly a loss.
>
> Ah. So we're mostly in violent agreement!

That is always nice :)

>> Polling is a reasonable approach for short durations, say
>> <= 1 millisecond, but it is really weird to explain that you can tell an
>> MPI application has failed to receive a message because its CPU
>> utilization goes up.  Polling for seconds on end is a very rude thing
>> to do on a multitasking OS.
>
> This is very true. You'll find that many MPI implementations now get
> this right, for example I've seen OpenMPI has a policy where you can
> tell it to poll for a short time and then call yield(). Our MPI has
> this as the default. It's a compromise which doesn't hurt performance
> that often.

Nice.  I guess I just haven't had a chance to see this in action yet.
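
For anyone following along, the poll-briefly-then-yield policy Greg describes can be sketched roughly as below. This is only an illustration, not how Open MPI or any other MPI library implements it: `wait_for_message`, `has_arrived`, and the 1 millisecond default are hypothetical names and values chosen for the example.

```python
import os
import threading
import time

# Fall back to a zero-length sleep where os.sched_yield is unavailable.
yield_cpu = getattr(os, "sched_yield", lambda: time.sleep(0))

def wait_for_message(has_arrived, spin_seconds=0.001):
    """Busy-poll for up to spin_seconds, then yield the CPU between polls.

    has_arrived is a hypothetical zero-argument callable that returns True
    once the message is in. Returns the phase in which arrival was seen.
    """
    deadline = time.perf_counter() + spin_seconds
    # Phase 1: tight polling, lowest latency, burns a full core.
    while time.perf_counter() < deadline:
        if has_arrived():
            return "spin"
    # Phase 2: keep polling, but let other runnable tasks go first.
    while not has_arrived():
        yield_cpu()
    return "yield"

# Simulate a message that shows up well after the spin budget is spent.
arrived = threading.Event()
threading.Timer(0.05, arrived.set).start()
phase = wait_for_message(arrived.is_set)
```

A short spin budget keeps the fast path fast, while the yield phase means a stalled receive no longer starves everything else on the node.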

>> The problem from what I can tell is that latency is fundamental, and mostly
>> an artifact of the card implementation.  We are quickly reaching the
>> point we won't be able to improve latency any more.
>
> This is also very true. That's why we've moved on to attacking message
> rate and short-message bandwidth. Good message rate at high core counts
> is going to be even more important when we get 4 cores / socket.

Sounds right.

>> On the other hand it is my distinct impression that the reason there is no
>> opportunity cost from polling is that the applications have not been
>> tuned as well as they could be.  In all other domains of programming
>> synchronous receives are seriously looked down upon.  I don't know why
>> that should not apply to MPI codes as well.
>
> It does apply, however, many parallel algorithms used today are
> naturally blocking. Why?  Well, complicating your algorithm to overlap
> communication and computation rarely gives a benefit in practice. So
> anyone who's tried has likely become discouraged, and most people
> haven't even tried.

Could be.  I have not managed to climb high enough up into the stack
to get a look at a lot of applications yet.
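
To make the overlap idea concrete, here is a minimal sketch of the pattern being discussed: post the receive, do work that does not depend on it, and block only when the data is needed. In MPI terms this is roughly MPI_Irecv, compute, MPI_Wait. The sketch simulates the network with a thread and a sleep, so `fake_receive` and `interior_work` are hypothetical stand-ins, and it says nothing about whether the overlap pays off in a real application.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_receive(delay=0.05):
    """Stand-in for a network receive: waits, then returns boundary data."""
    time.sleep(delay)
    return [1.0, 2.0, 3.0]

def interior_work(n=10000):
    """Stand-in for computation that does not need the incoming data."""
    return sum(i * i for i in range(n))

with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(fake_receive)  # post the receive without blocking
    local = interior_work()              # compute while the message is in flight
    halo = pending.result()              # block only when the data is needed

total = local + sum(halo)
```

The extra bookkeeping is small here, but in a real solver the work has to be split into a part that needs the incoming data and a part that does not, which is the complication Greg is pointing at.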

Eric


