[Beowulf] Multiple IB networks in one cluster
prentice.bisbal at rutgers.edu
Fri Jan 31 08:35:16 PST 2014
On 01/31/2014 04:42 AM, Bogdan Costescu wrote:
> On Thu, Jan 30, 2014 at 5:33 PM, Prentice Bisbal
> <prentice.bisbal at rutgers.edu> wrote:
>> IB has more than enough bandwidth for
>> message-passing and I/O.
> Bandwidth is not the only measure for performance. Latency and number
> of messages per unit of time that can be handled are others.
> Especially the latter was the subject of a heated debate a few years
> ago on this list.
I know. The last time I checked (and, to be honest, I don't track every
performance metric daily), IB latency was roughly 1/7 to 1/8 that of 10
GbE, and bandwidth was roughly comparable. I don't know anything about
messaging rates. Since my question is specific to IB, discussing other
networking technologies isn't relevant to my situation.
> I would suggest first finding out whether there is a real need of such
> a setup and especially quantifying the need. F.e. do the applications
> perform simultaneously message-passing and I/O ? This could be the
> case with non-blocking MPI calls overlapping with classical or even
> MPI I/O.
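(For illustration only: the overlap Bogdan describes would use non-blocking
MPI calls such as MPI_Isend/MPI_Wait, which need an MPI runtime. Here is a
rough, self-contained Python analogy of the same pattern, with a background
thread standing in for communication in flight while the main thread does
file I/O. The function and file names are hypothetical.)

```python
import os
import tempfile
import threading

def overlap_comm_and_io(payload: bytes, path: str) -> bytes:
    """Analogy for overlapping non-blocking communication with I/O.

    The thread plays the role of a posted MPI_Isend/MPI_Irecv pair:
    the "message" is delivered in the background while the caller
    performs file I/O, then we wait for completion (cf. MPI_Wait).
    """
    received = {}

    def comm_in_flight():
        # In real MPI this would be the non-blocking transfer making
        # progress; here the thread simply hands the payload over.
        received["msg"] = payload

    comm = threading.Thread(target=comm_in_flight)
    comm.start()                 # "post" the non-blocking operation

    with open(path, "wb") as f:  # I/O proceeds while comm is in flight
        f.write(payload)

    comm.join()                  # analogous to MPI_Wait
    return received["msg"]

msg = overlap_comm_and_io(b"halo exchange",
                          os.path.join(tempfile.gettempdir(), "demo.dat"))
print(msg)  # prints b'halo exchange'
```

Whether such overlap actually stresses communication and I/O at the same
instant (and so would benefit from separate fabrics) is exactly the
open question in this thread.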
Finding out whether there is justification for this setup is exactly why
I'm asking everyone here for their opinions. Personally, I don't think
there's a need or justification for it, but I want hard facts, studies,
or whitepapers to confirm or disprove my opinion. I don't have any
historical usage data to go on.
There's no point in discussing the characteristics of a specific
application - this cluster will be available to many different
researchers in a large university, so studying or optimizing for a small
handful of applications will have no real value.
> I could also see some effects for the (many) cases where the existing
> IB uses a fat-tree topology and the parallel job uses some overloaded
> IB links. But, as this was a theoretical question and mine a
> theoretical answer, I'll stop here :)