[Beowulf] Naive question: mpi-parallel program in multicore CPUs

Wed Oct 3 07:31:38 PDT 2007

Nathan Moore wrote:
> My understanding is that on a multi-core machine, mpi communication
> routines (MPI_SEND, etc) are implemented as memory copy instructions. 
> Accordingly, message passing within a multi-core node should be very
> fast compared to your present cluster. 
> 
> That said, it seems like all the performance benchmarks suggest that
> dual-core chips have the performance of 1.5-1.7 single-core chips, so
> for the same number of nodes (defined as a CPU core) you wouldn't see
> the same output.
> 
> All of this of course depends on the structure of the code, memory usage,
> etc - these are just scaling estimates on my part.
> 
> regards,
> 
> Nathan
<snip>

Aren't there two ways it's done? IIRC, MPICH2 has
(1) standard ch3 (the sock channel)
TCP/IP for all comms, with local comms bounced off the loopback interface
(2) Nemesis
TCP/IP for inter-node comms, but shared memory for intra-node comms.

And dual-core chip performance is, again, totally dependent on the code
in question. If the code is highly pipelined and can run in cache, an
embarrassingly parallel algorithm will scale linearly with cores. Code
limited by memory bandwidth, I/O, or (in a cluster) network performance
will not scale as well, because the cores have to share those resources.
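
To put rough numbers on that, here is a minimal sketch of a
roofline-style bound. It is only a toy model, not a measurement: it
assumes the work splits evenly across cores, that compute and memory
traffic overlap perfectly, and that all cores on a chip share one
memory bus. The function names and every parameter value below are
made up for illustration.

```python
def run_time(cores, flops, bytes_moved, core_flops, core_bw, socket_bw):
    """Lower bound on run time (seconds) under the toy model.

    flops       -- total floating-point operations in the job
    bytes_moved -- total bytes that must come from main memory
    core_flops  -- peak flop/s of one core (hypothetical value)
    core_bw     -- bytes/s one core can pull on its own (hypothetical)
    socket_bw   -- bytes/s the shared memory bus can deliver (hypothetical)
    """
    # Compute work divides evenly across the cores.
    compute = flops / (cores * core_flops)
    # Memory bandwidth grows with cores only until the shared bus saturates.
    bandwidth = min(cores * core_bw, socket_bw)
    memory = bytes_moved / bandwidth
    # Assume perfect overlap: the slower of the two sets the run time.
    return max(compute, memory)


def speedup(cores, **job):
    """Speedup of `cores` cores over a single core for the same job."""
    return run_time(1, **job) / run_time(cores, **job)


# Illustrative placeholder numbers only.
cache_bound = dict(flops=1e9, bytes_moved=0.0,
                   core_flops=1e9, core_bw=4e9, socket_bw=4e9)
memory_bound = dict(flops=1e8, bytes_moved=8e9,
                    core_flops=1e9, core_bw=4e9, socket_bw=4e9)

print(speedup(2, **cache_bound))   # 2.0: second core doubles throughput
print(speedup(2, **memory_bound))  # 1.0: the shared bus is already saturated
```

Real codes land somewhere between those two extremes, which would be
consistent with the 1.5-1.7 figure quoted above.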

-- 
Geoffrey D. Jacobs

To have no errors
  would be life without meaning
  No struggle, no joy