[Beowulf] More cores/More processors/More nodes?

Michael Creel michael.creel at uab.es
Mon Oct 2 04:21:45 PDT 2006



Greg Lindahl wrote:
> On Thu, Sep 28, 2006 at 07:06:14PM +0100, Peter Wainwright wrote:
> 
> 
>> What (in your opinion) is the right tradeoff between more cores,
>> more processors and more individual compute nodes?
> 
> $/performance.
> 
> Once you have your code written into pure MPI form, then you can run
> on any of the above alternatives. Then you can simply work out
> the price for various things, and make a guess at the performance.
> Run a few benchmarks to check your guesses.
> 
> The general rules work like this:
> 
> * The more cores per node, the less performance per core, due to
>   imperfect scaling and because you generally have only 1 interconnect
>   card/node.
> * Note that most interconnects don't scale very well to more
>   cores per node. For example, the "latency" number everyone
>   quotes for interconnects is for just 1 core/node. At 4 cores/node
>   this number is much worse for most interconnects.
> * The more cores per node, the higher the price per core tends to be,
>   although this varies. You buy less interconnect, but you pay
>   more for fancier processors and motherboards.
> 
> We talk about a "sweet spot"; that's still (in my opinion) 2 dual-core
> cpus per node.
> 
>> However, I do not understand what happens when you have
>> multi-processor/multi-core nodes in a cluster.  Do you just use MPI
>> (with each thread using its own non-shared memory) or is there any
>> way to do "mixed-mode" programming which takes advantage of shared
>> memory within a node (like, an MPI/OpenMP hybrid?).
> 
> The first is the easiest. MPI takes advantage of shared memory within
> the node.
> 
> The hybrid model is a lot more work for the programmer, and often is
> slower than pure MPI. And it hurts interconnect performance because you
> usually end up with just 1 core driving the interconnect.
> 
> -- greg
> 

A claimed record for gflops per dollar, as of a few months ago, was
set using overclocked dual-core Pentium D processors:
http://www.createphpbb.com/parallelknoppix/viewtopic.php?t=104&mforum=parallelknoppix
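
For anyone who wants to do the "work out the price, guess the performance" comparison described above, here is a rough Python sketch. Everything in it is an assumption for illustration: the NodeConfig class, the field names, and above all the prices, per-core rates and scaling efficiencies, which are made-up placeholders and not measurements of any real hardware. Replace them with real quotes and your own benchmark results.

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    """One candidate node design (hypothetical fields, for illustration)."""
    name: str
    cores_per_node: int
    node_price: float          # dollars per node, excluding interconnect
    interconnect_price: float  # dollars per node for card + switch port
    gflops_per_core: float     # rate of one core running alone on a node
    scaling_efficiency: float  # fraction of that rate kept with all cores busy

    def sustained_gflops(self) -> float:
        # Per-node rate after the per-core loss from sharing memory
        # bandwidth and a single interconnect card.
        return self.cores_per_node * self.gflops_per_core * self.scaling_efficiency

    def dollars_per_gflops(self) -> float:
        # Lower is better.
        return (self.node_price + self.interconnect_price) / self.sustained_gflops()


def rank_by_price_performance(configs):
    """Return the configurations sorted from cheapest to most expensive per gflops."""
    return sorted(configs, key=lambda c: c.dollars_per_gflops())


if __name__ == "__main__":
    # Placeholder numbers only; none of these are real prices or benchmarks.
    candidates = [
        NodeConfig("1 x dual-core", 2, 1200.0, 600.0, 2.0, 0.95),
        NodeConfig("2 x dual-core", 4, 2000.0, 600.0, 2.0, 0.85),
        NodeConfig("4 x dual-core", 8, 6000.0, 600.0, 2.0, 0.70),
    ]
    for c in rank_by_price_performance(candidates):
        print(f"{c.name}: {c.sustained_gflops():.1f} gflops/node, "
              f"${c.dollars_per_gflops():.0f} per gflops")
```

The scaling_efficiency figure is the part you can only get by benchmarking your own code at each cores/node count, which is why the guesses need to be checked before buying.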


