[Beowulf] Lowered latency with multi-rail IB?

Puthanveettil Prabhakaran Prajeev prajeev at tuxcentrix.com
Thu Mar 26 23:35:47 PDT 2009


http://www.penguincomputing.com/cluster_computing

Can the above be of any help to you?

Regards
Prajeev

On Fri, Mar 27, 2009 at 11:16 AM, Dow Hurst DPHURST <DPHURST at uncg.edu> wrote:

> To: beowulf at beowulf.org
> From: Greg Lindahl <lindahl at pbm.com>
> Sent by: beowulf-bounces at beowulf.org
> Date: 03/27/2009 12:03AM
> Subject: Re: [Beowulf] Lowered latency with multi-rail IB?
>
> On Thu, Mar 26, 2009 at 11:32:23PM -0400, Dow Hurst DPHURST wrote:
>
> > We've got a couple of weeks max to finalize spec'ing a new cluster.  Has
> > anyone knowledge of lowering latency for NAMD by implementing a
> > multi-rail IB solution using MVAPICH or Intel's MPI?
>
> Multi-rail is likely to increase latency.
>
> BTW, Intel MPI usually has higher latency than other MPI
> implementations.
>
> If you look around for benchmarks you'll find that QLogic InfiniPath
> does quite well on NAMD and friends, compared to that other brand of
> InfiniBand adaptor. For example, at
>
> http://www.ks.uiuc.edu/Research/namd/performance.html
>
> the lowest line == best performance is InfiniPath. Those results
> aren't the most recent, but I'd bet that the current generation of
> adaptors has the same situation.
>
> -- Greg
> (yeah, I used to work for QLogic.)
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
>
> I'm very familiar with that benchmark page.  ;-)
>
> One motivation for designing an MPI layer to lower latency with multi-rail
> is when making use of accelerator cards or GPUs.  There is so much more work
> being done that the interconnect quickly becomes the limiting factor.  One
> Tesla GPU is equal to 12 cores for the current implementation of NAMD/CUDA
> so the scaling efficiency really suffers.  I'd like to see how someone could
> scale efficiently beyond 16 IB connections with only two GPUs per IB
> connection when running NAMD/CUDA.
>
> Some codes are sped up far beyond 12x and reach 100x, such as VMD's cionize
> utility.  I don't think that particular code requires parallelization (not
> sure).  However, as NAMD/CUDA is tuned, GPU efficiency increases, and
> bottlenecks in previously ignored sections of code are found and fixed,
> the speedup will grow beyond 12x.  So, a solution to the interconnect
> bottleneck needs to be developed, and I wondered if multi-rail would be
> the answer.  Thanks so much for your thoughts!
> Best wishes,
> Dow
>