[Beowulf] Scaling issues on Xeon E5-2680
tegner at renget.se
Tue Mar 1 01:02:31 PST 2016
As I wrote before, it turned out that only 15 of the 16 slots were populated with DIMMs. After filling all the slots, the job that previously took 724 s finished in 425 s (using all 24 cores on the node in both cases). I knew beforehand that it is important to place the DIMMs in pairs and in a "balanced" way, but I didn't realize that the performance penalty was that big!
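To put the two timings in perspective, here is a small sketch of the arithmetic. The function and variable names are made up for illustration; only the 724 s and 425 s figures come from the runs described above. It works out to a speedup of roughly 1.7x, or a bit over 40% less wall-clock time.

```python
def speedup(t_before, t_after):
    """Ratio of the old runtime to the new runtime."""
    return t_before / t_after


def time_saved_fraction(t_before, t_after):
    """Fraction of the old runtime that was saved."""
    return (t_before - t_after) / t_before


# Timings reported for the same job on all 24 cores of one node.
t_15_dimms = 724.0  # seconds, 15 of 16 slots populated
t_16_dimms = 425.0  # seconds, all 16 slots populated

print(f"speedup: {speedup(t_15_dimms, t_16_dimms):.2f}x")
print(f"runtime reduction: {time_saved_fraction(t_15_dimms, t_16_dimms):.1%}")
```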
Also, I'm a bit uncertain about the InfiniBand stack. At the moment we are using the RPMs that come with CentOS-7.2, and I tested performance with our "home built" openmpi-1.10.2. Using mpitests-osu_bw and mpitests-osu_latency I recorded a bandwidth of about 6400 MB/s and a latency of just over 1 us. This was over an FDR switch.
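As a rough sanity check on the bandwidth figure, the sketch below compares it with the nominal data rate of a 4x FDR link. The link parameters (4 lanes, 14.0625 Gbit/s signalling per lane, 64b/66b encoding) are the standard FDR numbers and are not taken from this thread; the calculation also assumes the benchmark reports MB as 10^6 bytes and ignores protocol and PCIe overhead, so treat the result as an upper bound rather than an expected value.

```python
# Nominal 4x FDR InfiniBand link parameters (assumed, not measured here).
LANES = 4
LANE_SIGNAL_GBPS = 14.0625  # signalling rate per lane, Gbit/s
ENCODING_EFFICIENCY = 64 / 66  # 64b/66b line encoding


def fdr_peak_mb_per_s():
    """Nominal payload rate of a 4x FDR link in MB/s (1 MB = 10**6 bytes)."""
    data_gbps = LANES * LANE_SIGNAL_GBPS * ENCODING_EFFICIENCY
    return data_gbps * 1000 / 8  # Gbit/s -> MB/s


measured_mb_per_s = 6400.0  # approximate osu_bw result quoted above
peak = fdr_peak_mb_per_s()

print(f"nominal peak: {peak:.0f} MB/s")
print(f"measured / peak: {measured_mb_per_s / peak:.1%}")
```

On these assumptions the measured bandwidth is within a few percent of what the link can nominally carry, which suggests the distro-provided stack is not leaving much point-to-point bandwidth on the table.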
Again, thanks for taking the time to answer, much appreciated!
On 29 February 2016 16:55:47 +01:00, Prentice Bisbal <pbisbal at oit.rutgers.edu> wrote:
> As others have said, there's a million things that could be going on here.
> What InfiniBand software stack are you using? Are you using the RPMs that come with CentOS 7, or are you using the latest version of OFED downloaded directly from Mellanox? For the past year or so, I've been hearing that the distro-provided RPMs perform much worse than the Mellanox-provided packages, but I haven't had the opportunity to test that myself.
> When you set up the InfiniBand stack, there are usually ulimits you need to tune, and I think some kernel parameters as well. Have you done that on the new system? I think OpenMPI will report an error if these changes aren't made, but I'm not 100% sure. If you use the distro RPMs, I don't think these changes are made automatically.
> Also, did you configure OpenMPI so that it uses IB for the BTL instead of TCP? That would be an easy step to overlook when setting up a new system. I just checked the OpenMPI FAQ, and it says OpenMPI should now detect IB automatically and use that instead of TCP, but I would try explicitly telling OpenMPI not to use TCP as a BTL and see if that changes anything. If it's not automatically detecting IB correctly, that should cause it to throw an error.
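One limit that commonly matters for InfiniBand is the locked-memory limit (RLIMIT_MEMLOCK), since the verbs layer needs to pin memory. That this is the limit meant here is an assumption; the message above does not name it. A minimal sketch for checking it from Python on a compute node (helper names are made up for illustration):

```python
import resource


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def describe(limit):
    """Render a limit as 'unlimited' or as a size in KiB."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KiB"


soft, hard = memlock_limits()
print(f"memlock soft limit: {describe(soft)}")
print(f"memlock hard limit: {describe(hard)}")
```

The check is only meaningful when run the same way the MPI processes are started (for example through the batch system), because limits set in an interactive login shell do not necessarily apply there.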
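A sketch of how such a test run could be launched, wrapped in Python only so that the command line is built in one place. The `--mca btl ^tcp` form excludes the TCP BTL, and `openib,self,sm` is the explicit list usually given for the 1.x series. The executable name `./my_solver` is a hypothetical placeholder, and 96 ranks is simply 24 cores on each of 4 nodes.

```python
import shlex


def mpirun_command(nprocs, app_args, exclude_tcp=True):
    """Build an mpirun argument list that keeps Open MPI off the TCP BTL.

    exclude_tcp=True  -> "--mca btl ^tcp" (every BTL except tcp)
    exclude_tcp=False -> "--mca btl openib,self,sm" (explicit list)
    """
    btl = "^tcp" if exclude_tcp else "openib,self,sm"
    return ["mpirun", "-np", str(nprocs), "--mca", "btl", btl, *app_args]


# Hypothetical application name; replace with the real solver invocation.
cmd = mpirun_command(96, ["./my_solver"])
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a job script
```

If the InfiniBand BTL cannot be initialised, a run started this way should fail with an error instead of quietly falling back to TCP, which is the point of the test.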
> On 02/28/2016 10:27 AM, Jon Tegner wrote:
> > Hi,
> > we have issues with performance on E5-2680. Each of the nodes has 2 of these 12-core CPUs in a SuperMicro SuperServer 1028R-WMR (i.e., 24 cores per node).
> > For one of our applications (CFD/OpenFOAM) we have noticed that the calculation runs faster using 12 cores on 4 nodes compared to when using 24 cores on 4 nodes.
> > In our environment we also have older AMD hardware (nodes with 4 CPUs with 12 cores each), and here we don't see these strange scaling issues.
> > The system is CentOS-7, and communication is over FDR InfiniBand. The BIOS was recently updated, and hyperthreading is disabled.
> > Feel a bit lost here, and any hints on how to proceed with this are greatly appreciated!
> > Thanks,
> > /jon
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit <http://www.beowulf.org/mailman/listinfo/beowulf>