dual AMD clusters

Martin Siegert siegert at sfu.ca
Fri Jun 22 16:28:30 PDT 2001


On Fri, Jun 22, 2001 at 03:25:44PM -0600, Art Edwards wrote:
<snip>
> I am still considering whether to go with single or dual CPU motherboards in 
> an upgrade to the current Beowulf. Memory per CPU is very important for us.
> Have you done benchmarks of single and dual CPU nodes? Are two really twice 
> as fast as one? I will be running MPI-based parallel codes.

I have not benchmarked single-CPU nodes. We are going to have 70-96
dual-processor nodes; with single-processor nodes the switch problem (twice
as many ports for the same number of CPUs) becomes a total nightmare.
However, I have run tests using just one CPU on each node and compared them
with runs using both CPUs. The answer to your question depends entirely on
the type of code you run (and, to make this a fair comparison, you should
buy twice as much memory per node for duals as you would for singles).
Also, the criterion is not whether two CPUs run your code twice as fast
as one, but whether the two CPUs of a dual run your code faster than two singles.
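
The criterion above can be written down as a tiny helper. This is a hypothetical sketch: the function names and any timings you feed it are placeholders, not measurements from this post.

```python
# Hypothetical helpers: names and inputs are illustrative, not from the post.

def speedup(t_one_proc: float, t_parallel: float) -> float:
    """Classic speedup S(p) = T(1) / T(p)."""
    return t_one_proc / t_parallel


def dual_beats_two_singles(t_dual_node: float, t_two_single_nodes: float) -> bool:
    """The question that matters: does a 2-process job finish sooner on
    both CPUs of one dual node than on two single-CPU nodes?"""
    return t_dual_node < t_two_single_nodes


if __name__ == "__main__":
    # Placeholder wall-clock times in seconds; substitute your own measurements.
    t1, t_dual, t_singles = 100.0, 55.0, 60.0
    print(f"dual speedup: {speedup(t1, t_dual):.2f}")
    print(f"dual beats two singles: {dual_beats_two_singles(t_dual, t_singles)}")
```

A speedup below 2 on a dual is not by itself a reason to buy singles; only the comparison with the two-single-node timing decides that.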

To give an example:
One of the tests I always run is an FFT benchmark. Parallel FFTs need
massive communication between processors (in the matrix transpose step).
The communication is of the worst kind (all-to-all), as it scales with the
square of the number of processors. These FFTs show the whole spectrum of
performance you can expect, depending on system size:
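
To see why the transpose is so expensive, here is a back-of-the-envelope count for a 2D FFT whose n x n array is split into row slabs over p processes. The slab layout is an assumption for illustration; the post does not say how the benchmark distributes its data.

```python
def alltoall_transpose_cost(n: int, p: int) -> tuple[int, int]:
    """Row-slab layout (assumed): every process sends one (n/p) x (n/p)
    block to every other process during the transpose.
    Returns (total number of messages, elements per message)."""
    if n % p != 0:
        raise ValueError("this sketch assumes p divides n")
    messages = p * (p - 1)       # one message per ordered pair of distinct processes
    block_elems = (n // p) ** 2  # each message shrinks as p grows
    return messages, block_elems


if __name__ == "__main__":
    n = 1024
    for p in (2, 4, 8):
        msgs, elems = alltoall_transpose_cost(n, p)
        print(f"p={p}: {msgs} messages of {elems} elements each")
```

The message count grows roughly as p^2 while each message shrinks as 1/p^2, so for small n the fixed per-message latency quickly takes over.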
- If your system size is very large, so that latency becomes less
  important, the maximum of the speedup vs. number-of-processors curve is
  actually at 4 processors (or more; I can't measure that yet).
- If you use only two processors, you always get the best speedup when both
  processors are on the same node, as long as everything still fits into
  memory: MPI over shared memory is much, much faster than over TCP.
  As soon as the code no longer fits into memory, you are better off
  using CPUs on different nodes.
- For small system sizes, performance is dominated by latency and the
  largest speedup is always obtained by using two processors on the same
  node.
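
Two of the effects in the list (shared memory beating TCP, and latency dominating small problems) can be reasoned about with the simple latency/bandwidth (Hockney) model, t = latency + bytes / bandwidth, applied to the transpose step. The parameter values below are illustrative placeholders for a TCP link and for MPI over shared memory, not values measured on this cluster; the row-slab layout is assumed, and the model ignores memory capacity, contention and overlap of communication with computation.

```python
# Illustrative placeholder parameters only; measure your own (e.g. with a ping-pong test).
TCP = {"latency_s": 100e-6, "bandwidth_Bps": 10e6}
SHMEM = {"latency_s": 5e-6, "bandwidth_Bps": 200e6}


def message_time(nbytes: float, latency_s: float, bandwidth_Bps: float) -> float:
    """Hockney model: fixed start-up latency plus size / bandwidth."""
    return latency_s + nbytes / bandwidth_Bps


def transpose_time(n: int, p: int, latency_s: float, bandwidth_Bps: float,
                   bytes_per_elem: int = 16) -> float:
    """Per-process time for the all-to-all transpose of an n x n complex
    double array in row slabs: p - 1 sends of one (n/p)^2 block each,
    assumed to be serialized (no overlap)."""
    block_bytes = (n // p) ** 2 * bytes_per_elem
    return (p - 1) * message_time(block_bytes, latency_s, bandwidth_Bps)


def latency_share(n: int, p: int, latency_s: float, bandwidth_Bps: float) -> float:
    """Fraction of the transpose time spent on per-message latency."""
    if p < 2:
        return 0.0  # a single process does not communicate
    return (p - 1) * latency_s / transpose_time(n, p, latency_s, bandwidth_Bps)


if __name__ == "__main__":
    for n in (8, 4096):
        for name, link in (("TCP", TCP), ("shared memory", SHMEM)):
            t = transpose_time(n, 2, **link)
            share = latency_share(n, 2, **link)
            print(f"n={n:5d} {name:13s}: {t * 1e3:10.4f} ms, latency share {share:.0%}")
```

With these placeholder numbers the shared-memory transpose is far cheaper than the TCP one, and latency makes up most of the cost only for tiny arrays. The model says nothing about what happens once the data no longer fit in memory.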

A different example:
Partial differential equations (PDEs) solved using domain decomposition:
In this case communication scales only with the number of nodes, and the
best configuration depends exclusively on the system size. As long as the
system fits into memory, using both CPUs in each node gives slightly worse
performance than using just one CPU on each node (same total number of
processors). This is not due to memory capacity but to competition for
bandwidth.
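
For contrast with the all-to-all FFT transpose, here is a minimal sketch of the halo (ghost-cell) exchange in a 1D strip decomposition of an n x n grid: each process talks only to its neighbours, so the number of messages grows linearly with p and the message size does not shrink. The strip layout, non-periodic boundaries and a halo width of one row are assumptions; the post does not describe the PDE code's decomposition.

```python
def halo_exchange_cost(n: int, p: int) -> tuple[int, int]:
    """1D strip decomposition of an n x n grid over p processes,
    non-periodic boundaries, halo width of one row (all assumed).
    Returns (total number of messages per exchange, values per message)."""
    if p < 1:
        raise ValueError("need at least one process")
    messages = 2 * (p - 1)  # one message in each direction across each of the p - 1 internal boundaries
    return messages, n      # each message carries one grid row, independent of p


def neighbours(rank: int, p: int) -> list[int]:
    """Ranks a given process exchanges halos with (at most two)."""
    return [r for r in (rank - 1, rank + 1) if 0 <= r < p]


if __name__ == "__main__":
    n = 1024
    for p in (2, 4, 8):
        msgs, size = halo_exchange_cost(n, p)
        print(f"p={p}: {msgs} messages of {size} values; rank 0 talks to {neighbours(0, p)}")
```

Each process has at most two partners no matter how large p gets, unlike the all-to-all pattern of the FFT transpose.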

Besides the switch problem with single-CPU nodes, I decided that duals give
me the most flexibility. (This cluster is the research computing facility
for the whole university, so it must support the whole spectrum of
applications, not just one application of a particular research project;
thus YMMV.) Duals give very good performance when a job uses just 2 CPUs.
When a job uses just one CPU on each node, each process can actually use
almost twice as much memory (assuming you install twice as much memory in
duals as you would in singles). That leaves the second CPU idle, or
available for Monte-Carlo applications that don't need much memory; we
have lots of applications of that kind.

Cheers,
Martin

========================================================================
Martin Siegert
Academic Computing Services                        phone: (604) 291-4691
Simon Fraser University                            fax:   (604) 291-4242
Burnaby, British Columbia                          email: siegert at sfu.ca
Canada  V5A 1S6
========================================================================



