[Beowulf] Q: IB message rate & large core counts (per node)?

Joe Landman landman at scalableinformatics.com
Fri Feb 19 10:47:07 PST 2010


Brian Dobbins wrote:
> 
> Hi guys,
> 
>   I'm beginning to look into configurations for a new cluster and with 
> the AMD 12-core and Intel 8-core chips 'here' (or coming soonish), I'm 
> curious if anyone has any data on the effects of the messaging rate of 
> the IB cards.  With a 4-socket node having between 32 and 48 cores, lots 
> of computing can get done fast, possibly stressing the network. 

The big issue will be contention for the resource.  As you scale up the 
number of requesters, if the number of resources doesn't also scale up 
(even virtualized non-blocking HCA/NICs are good here), you could hit a 
problem at some point.

>   I know Qlogic has made a big deal about the InfiniPath adapter's 
> extremely good message rate in the past... is this still an important 
> issue?  How do the latest Mellanox adapters compare?  (Qlogic documents 
> a ~30M messages processed per second rate on its QLE7342, but I didn't 
> see a number on the Mellanox ConnectX-2... and more to the point, do 
> people see this affecting them?)

We see this on the storage side.  Massive oversubscription of resources 
leads to contention for links and to IB packet requeue failures, among 
other things.
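
As a rough illustration of the contention point, here is a minimal Python 
sketch. It assumes the ~30M messages/s figure quoted above is an aggregate 
per-adapter ceiling that splits evenly across cores; neither assumption is 
a measurement, and the function name is hypothetical.

```python
# Back-of-the-envelope share of HCA message rate per core.
# Assumptions (not measurements): the ~30M msgs/s vendor figure quoted in the
# thread is an aggregate per-adapter ceiling, and it divides evenly across
# all cores in the node.

def per_core_message_rate(adapter_rate, cores, adapters=1):
    """Return the even-split message-rate ceiling (msgs/s) for each core."""
    if cores <= 0 or adapters <= 0:
        raise ValueError("cores and adapters must be positive")
    return adapter_rate * adapters / cores

ADAPTER_RATE = 30e6  # msgs/s, vendor figure quoted in the thread

if __name__ == "__main__":
    for cores in (32, 48):
        for adapters in (1, 2):
            rate = per_core_message_rate(ADAPTER_RATE, cores, adapters)
            print(f"{cores} cores, {adapters} HCA(s): "
                  f"{rate:,.0f} msgs/s per core")
```

Under these assumptions, 48 cores sharing one adapter get at most about 
625,000 messages/s each, and a second adapter doubles that ceiling.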

> 
>   On a similar note, does a dual-port card provide an increase in 
> on-card processing, or 'just' another link?  (The increased bandwidth is 
> certainly nice, even in a flat switched network, I'm sure!)

Depends.  If the card can talk to the PCIe bus at full speed, you might 
be able to saturate the link with a single QDR port.  If your card is 
throttled for some reason (we have seen this), then adding the extra port 
might or might not help.  If you are at the design stage, I'd suggest 
going as wide as you can: as many IB HCAs as you can get, to keep the 
number of ports per core as high as reasonable.

Of course I'd have to argue the same thing on the storage side :)
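
To put rough numbers on the single-port versus dual-port point above, 
here is a minimal Python sketch. It assumes a 4x QDR port (10 Gb/s 
signaling per lane) and a PCIe 2.0 x8 slot (5 GT/s per lane), both with 
8b/10b encoding; the slot generation and width are assumptions, not 
something stated in this thread, and protocol overhead is ignored.

```python
# Compare the raw data rate of IB ports against the host slot feeding them.
# Assumptions: 4x QDR InfiniBand (10 Gb/s signaling per lane) and a
# PCIe 2.0 x8 slot (5 GT/s per lane), both using 8b/10b encoding.
# Packet headers and flow control are ignored, so achievable throughput
# is lower than these figures on both sides.

def data_rate_gbps(lanes, signal_gbps_per_lane, payload_bits=8, line_bits=10):
    """Data rate in Gb/s after line encoding (8b/10b by default)."""
    return lanes * signal_gbps_per_lane * payload_bits / line_bits

QDR_PORT_GBPS = data_rate_gbps(lanes=4, signal_gbps_per_lane=10)
PCIE2_X8_GBPS = data_rate_gbps(lanes=8, signal_gbps_per_lane=5)

def host_limited(ports):
    """True if the ports together can carry more than the slot can feed."""
    return ports * QDR_PORT_GBPS > PCIE2_X8_GBPS

if __name__ == "__main__":
    print(f"One QDR port: {QDR_PORT_GBPS} Gb/s data rate")
    print(f"PCIe 2.0 x8 : {PCIE2_X8_GBPS} Gb/s data rate")
    for ports in (1, 2):
        print(f"{ports} port(s) host-limited: {host_limited(ports)}")
```

With these assumed figures one port already matches the slot, so a second 
port on the same card adds a link but no extra host bandwidth, while a 
second HCA in its own slot adds both.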

>   I'm primarily concerned with weather and climate models here - WRF, 
> CAM, CCSM, etc., and clearly the communication rate will depend to a 
> large degree on the resolutions used, but any information, even 'gut 
> instincts' people have, is welcome.  The more info the merrier.
> 
>   Thanks very much,
>   - Brian
> 
-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
