questions on SMP/duals/parallel

Mark Hahn hahn at physics.mcmaster.ca
Sun Jan 27 13:19:27 PST 2002


> If I understood correctly, the SMP support in the kernel allows you to run
> kernel processes in parallel (i.e. between your two processors in a dual
> Xeon/MP AMD) but not user jobs, is that correct? (please, could you suggest
> didactic references/www on SMP features :-))

any college-level text on computers.

SMP is not restricted to kernel mode.  if you run two copies of any program,
even a user-mode one, they will probably wind up on different processors.
naturally, a single program can also have multiple "threads", which let
it run on both processors at once.
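
for instance, a minimal sketch of that using plain pthreads (not from the
original mail; the worker function and the "work" it prints are just
illustrative):

    /* two user-mode threads in one program; on an SMP kernel the
       scheduler is free to run them on different processors. */
    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        long id = (long) arg;
        printf("thread %ld doing its share of the work\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, (void *) 1L);
        pthread_create(&t2, NULL, worker, (void *) 2L);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

compile with something like "gcc -pthread" and the two threads can execute
concurrently, one per processor.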

> To run a user job (e.g. my own coded app) in parallel on a dual-processor,
> one-node system, I still need some kind of message-passing code, right?
> This would

no.  the central property of an SMP machine is shared memory:
threads or even unrelated processes can share memory without message passing.
in general, people use OpenMP for SMP programming: it permits automatic
loop-parallelization using threads.  (threads are sibling processes that
share all memory.)  but it's also possible to use separate processes
(such as those created by fork) that communicate via other forms of
explicit shared memory (such as SysV shm); Gaussian is an example of this.
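
as a rough illustration of the OpenMP route (a sketch only, assuming a
compiler with OpenMP support; the array size and the loop body are
placeholders, not anything from a real application):

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N];
        int i;

        /* the compiler turns this loop into threads; all of them read
           and write a[] and b[] directly through shared memory,
           with no message passing involved. */
        #pragma omp parallel for
        for (i = 0; i < N; i++)
            a[i] = 2.0 * b[i] + 1.0;

        printf("a[0] = %g\n", a[0]);
        return 0;
    }

and the fork + SysV shm route looks roughly like this (again a sketch;
error checking omitted):

    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void)
    {
        /* create a shared segment; it stays attached across fork */
        int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
        int *shared = (int *) shmat(shmid, NULL, 0);

        *shared = 0;
        if (fork() == 0) {      /* child: write a result into shared memory */
            *shared = 42;
            shmdt(shared);
            _exit(0);
        }
        wait(NULL);             /* parent: read it back directly */
        printf("child wrote %d\n", *shared);
        shmdt(shared);
        shmctl(shmid, IPC_RMID, NULL);
        return 0;
    }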

> be different from running it in parallel on a cluster of several
> (single-processor) nodes, since in the dual it's shared-memory computing
> and in the latter it is distributed memory?

yes, "cluster" generally means non-shared/distributed memory.
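
for contrast, on such a cluster the same kind of data exchange has to be an
explicit message, e.g. with MPI (a sketch, assuming an MPI implementation
such as MPICH or LAM is installed; run with two processes):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;     /* rank 0's memory is invisible to rank 1... */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* ...so the value has to arrive as a message. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }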

> Ok, so why are you guys running dual clusters? One direct benefit seems to
> be the ability to put a user job on one proc. and kernel processes on the
> other (which SMP can do, can't it?).

there's not much concurrency between user and kernel space, since in general
the kernel does little actual work.

> But are you splitting a user job between a dual
> (shared mem) and several nodes (dist. mem)? What are the benefits? Am I
> missing something?

just that the performance of SMP is generally higher (lower latency,
higher bandwidth) than even the fastest cluster interconnect
(message passing).  naturally, how well these hardware properties can be
exploited depends on the structure of the program, but generally, shared
memory tends to be limited to relatively low concurrency (it's hard
to scale SMP with uniform memory access beyond 4 or so CPUs).

regards, mark hahn.




