[Beowulf] Can one Infiniband net support MPI and a parallel file system?

Jason Clinton jclinton at advancedclustering.com
Wed Aug 6 11:31:09 PDT 2008


On Tue, Aug 5, 2008 at 4:25 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
> Is anybody using Infiniband to provide both
> MPI connection and parallel file system services on a Beowulf cluster?
>
> I thought to have a storage node that would
> serve a parallel file system to the beowulf nodes over IB
> (something like a NFS on steroids).
> The same IB net would also work as the MPI interconnect.
>
> Is this design possible?

We have customers running Lustre and MPI over the same IB fabric
successfully. They still keep a good old gigabit management network to
fall back on: gigabit is so low-cost by comparison, and so rock-solid,
that it makes sense to keep it around. But you should know that you
need more than a single node providing disk I/O before you start to see
the performance benefit. I/O from a single node can, generally, barely
fill a gigabit link (roughly 125 MB/s). To exceed that gigabit level of
performance, you'd need more than one storage node serving the Lustre
network.
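
As a rough back-of-the-envelope sketch of that arithmetic (the 100 MB/s
per-node figure and the function name are made-up assumptions for
illustration, not measurements from any real system):

```python
import math

# 1 Gbit/s is 125 MB/s before any protocol overhead.
GIGABIT_MB_S = 1000 / 8

def storage_nodes_needed(target_mb_s, per_node_mb_s):
    """Smallest number of storage nodes whose combined throughput
    strictly exceeds target_mb_s, assuming ideal linear scaling."""
    if per_node_mb_s <= 0:
        raise ValueError("per_node_mb_s must be positive")
    return math.floor(target_mb_s / per_node_mb_s) + 1

# Hypothetical sustained disk throughput of one storage node.
PER_NODE_MB_S = 100.0

# Nodes needed to beat a single gigabit link.
print(storage_nodes_needed(GIGABIT_MB_S, PER_NODE_MB_S))
# Nodes needed to beat an arbitrary 1000 MB/s target.
print(storage_nodes_needed(1000.0, PER_NODE_MB_S))
```

Real systems will not scale perfectly linearly, so treat the result as
a lower bound on the node count.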


> On a small cluster, does it require two separate IB physical networks (cards
> and switch),
> or can it be done with a single IB card per node and one switch?

It can be done with a single IB network.


> Is this design efficient?

Generally speaking, MPI programs will not be fetching/writing data
from/to storage at the same time they are doing MPI calls, so there
tends not to be much contention to worry about at the node level.


> Are there other practical and  cost effective alternatives to this idea?

If the cluster is small enough, using gigabit with a shared filesystem
is preferred, since IB's low latency has relatively little effect on
the big source of latency in any storage system: the physical disks.
It's not until you cross the gigabit bandwidth barrier that IB really
starts to make sense, and that's a barrier that's not crossed very
often in a small cluster.
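
To put rough numbers on the latency point, here is a small sketch. All
three latency figures are ballpark assumptions chosen for illustration,
not measurements:

```python
# Ballpark, assumed latencies in microseconds (illustrative only).
DISK_US = 8000.0   # one seek on a spinning disk, order of milliseconds
GIGE_US = 50.0     # gigabit Ethernet, order of tens of microseconds
IB_US = 2.0        # InfiniBand, order of a few microseconds

def request_latency_us(network_us, disk_us=DISK_US):
    """Total latency of one storage request: network hop plus disk access."""
    return network_us + disk_us

def fraction_saved(slow_net_us, fast_net_us, disk_us=DISK_US):
    """Fraction of total request latency removed by the faster network."""
    slow_total = request_latency_us(slow_net_us, disk_us)
    fast_total = request_latency_us(fast_net_us, disk_us)
    return (slow_total - fast_total) / slow_total

saved = fraction_saved(GIGE_US, IB_US)
print(f"IB trims about {saved:.1%} off each disk-bound request")
```

Under these assumptions the disk dominates so heavily that the
interconnect's latency barely registers; bandwidth is what ends up
mattering for storage.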


> Would this type of design work with GigE instead of IB?

Yes, but you'd still want IB for low-latency MPI traffic.


> I confess I know nothing about parallel file systems and IB.
> So, please forgive me if my questions are nonsense.

Lustre and Panasas are certainly both stable options in this area.

--
Jason D. Clinton
Advanced Clustering Technologies, Inc.
