[Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

Ashley Pittman apittman at concurrent-thinking.com
Wed Aug 13 03:29:05 PDT 2008


On Tue, 2008-08-12 at 12:09 -0600, Craig Tierney wrote:
> Chris Samuel wrote:
> > ----- "I Kozin (Igor)" <i.kozin at dl.ac.uk> wrote:

> > But that assumes you're not sharing a node with other
> > jobs that may well be doing I/O.
> > 
> I am wondering, who shares nodes in cluster systems with
> MPI codes?

In my experience, almost everyone.  In practice, though, most jobs ask for
even numbers of CPUs, so larger jobs rarely get scheduled this way.

>  We never have shared nodes for codes that need
> multiple cores since we built our first SMP cluster
> in 2001.  The contention for shared resources (like memory
> bandwidth and disk IO) would lead to unpredictable code performance.

Unpredictable, maybe, but if the alternative is to not run at all then
it's still a win.  What you wouldn't want is for a small number of
processes in a big job to share a node with a resource-hogging job and
slow down the entire big job; however, I've never seen this happen in
the wild.

> Also, a poorly behaved program can cause the other codes on
> that node to crash (which we don't want).

It goes without saying that this shouldn't be able to happen.

Ashley.



