[Beowulf] Re: Problems scaling performance to more than one node, GbE
Peter Kjellstrom cap at nsc.liu.se
Mon Feb 23 10:14:55 PST 2009
- Previous message: [Beowulf] Re: Problems scaling performance to more than one node, GbE
- Next message: [Beowulf] Tracing down 250ms open/chdir calls
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tuesday 17 February 2009, Bogdan Costescu wrote:
> On Mon, 16 Feb 2009, Tiago Marques wrote:
> > I must ask, doesn't anybody on this list run like 16 cores on two
> > nodes well, for a code and job that completes like in a week?
>
> For GROMACS and other MD programs, the way a job runs depends on a lot
> of factors that define the simulation: the size of the molecular
> system, the force field in use, the cutoff distances, etc.
...
> I have found several MD codes to scale rather poorly when used on
> clusters composed of 8-core nodes, especially when those 8 cores are
> coming from 2 quad-core Intel CPUs;

A data point from some testing I've done with gromacs-4.0.2 on dual-quad
Clovertown nodes using the 4 classical test cases from gmxbench:

 villin and poly-ch: 7.1x speedup on one node (compared to using one core)
 dppc and lzm: superlinear at 8.9x and 8.2x

Gromacs-4.0.2 seems to be able to almost fully use the four extra cores even
on a memory-bandwidth-choked node.

> the poor scaling was also with
> InfiniBand (Mellanox ConnectX), so IB will not magically solve your
> problems.

I've also done some scaling tests on our IB (ConnectX). Gromacs scales (for
the quite small lzm case) as follows:

 1 node   8 ranks: 8.2x
 2 nodes 16 ranks: 15x
 4 nodes 32 ranks: 25x
 8 nodes 64 ranks: 38x

Using ethernet (disclaimer: our ethernet isn't really super-tuned since we
mostly run MPI on IB) I got a 10x speedup on two nodes using 16 ranks (I
didn't try using more nodes).

Hopefully someone found this late post worth reading,
 Peter

> The setup that seemed to me like a good compromise was with
> 4-core nodes, when these 4 cores come from 2 dual-core CPUs,
> associated with Myrinet or IB.
>
> You have to understand that, the way most MD programs are done these
> days, the MD simulations of small molecular systems are simply not
> going to scale, the communication dominates the total runtime.
> Communication through shared memory is still the best way to scale
> such a job, so having a node with as many cores as possible and running
> a job to use all of them is probably going to give you the best
> performance.
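
For anyone who wants to compare the numbers above on equal footing, here is a minimal Python sketch that turns the reported lzm speedups into parallel efficiency (speedup divided by rank count). The speedup figures are the ones quoted in this post; the names `ib_speedup`, `parallel_efficiency`, `ib_efficiency` and `gbe_efficiency_16` are made up for illustration, and the calculation assumes every speedup is relative to a single core, as stated for the one-node runs.

```python
# Speedups for the lzm case as reported in the post (assumed relative to one core).
# Keys are MPI rank counts, values are speedups measured over IB (ConnectX).
ib_speedup = {8: 8.2, 16: 15.0, 32: 25.0, 64: 38.0}

def parallel_efficiency(speedup, ranks):
    """Return speedup per rank; 1.0 means ideal linear scaling."""
    return speedup / ranks

# Efficiency for each IB run, plus the single ethernet data point (10x on 16 ranks).
ib_efficiency = {r: parallel_efficiency(s, r) for r, s in ib_speedup.items()}
gbe_efficiency_16 = parallel_efficiency(10.0, 16)

for ranks, eff in ib_efficiency.items():
    print(f"IB  {ranks:2d} ranks: {eff:.0%}")
print(f"GbE 16 ranks: {gbe_efficiency_16:.0%}")
```

Read this way, the IB runs stay above 90% efficiency at 16 ranks and fall to a bit under 60% at 64 ranks, while the untuned ethernet run at 16 ranks sits just over 60%.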
