[Beowulf] Multiple IB networks in one cluster

Jeff Becker jeffrey.c.becker at nasa.gov
Mon Feb 3 09:39:06 PST 2014


Hi Prentice.

On 01/30/2014 08:33 AM, Prentice Bisbal wrote:
> Beowulfers,
>
> I was talking to a colleague the other day about cluster architecture
> and big data, and this colleague was thinking that it would be good to
> have two separate FDR IB clusters within a single cluster: one for
> message-passing, and the other purely for data movement. I'm a bit
> skeptical of this myself. I was always under the impression that IB
> has more than enough bandwidth for message-passing and I/O. I have
> some questions about this idea:
>
> 1. Does this make sense?
>
> 2. Does anyone have first-hand experience with doing this, or can
> point me to someone who does (articles online, papers on the topic
> will suffice)?

We use two fabrics on our Pleiades cluster at NASA. They are typically used
as you propose: message passing on one fabric, I/O (NFS, Lustre) on the
other. However, jobs can request that both rails be used for message
passing; in that case, message-passing traffic could contend with I/O.

>
> 3. Would this present any issues for managing the fabric? I know IB is
> designed to detect loops automatically, but what about making sure
> certain traffic stays on certain IB interfaces.

Each fabric is disjoint from the other and has its own subnet and subnet
manager.

>
> 4. Since IB uses cross-bar switches (please correct me if I'm wrong),
> we wouldn't need to duplicate switchgear, just double IB connections
> on each host, correct?
>

If you have separate subnets, you probably need separate switches for 
each fabric.

Hope this helps,

-jeff



