[Beowulf] Re: Cluster newbie, power recommendations

Tue Mar 21 10:13:12 PST 2006

On Tue, 21 Mar 2006, David Mathog wrote:

> I wonder if one couldn't set up a single modern computer,
> with a fast CPU and tons of memory, as N virtual machines, for
> instance using VMware, and then run a sort of virtual cluster?
> Obviously there wouldn't be any performance advantage to doing
> this but it might allow the subject to be taught when a real
> cluster isn't available.
...
> Seems doable.  Maybe even worth doing so that this subject could
> be more easily taught.

I agree, sort of, except that vmware is expensive and introduces
nonlinearities of its own into the simulation that are likely to be
"different" from the nonlinearities of a real parallel system.

I did something a BIT like this in one of my CWM articles.  I used perl
and threads (yes, perl now has threads) to write a very, very poor man's
task distribution shell that farmed out N copies of a "task" to N
machines, then collected the results.  The task was "generating random
numbers".  On its own this does NOT scale well, because the ratio of
computation to communication is too low, so I put in an adjustable
(sleep) delay per rand returned.  That makes each rand cost "more cpu"
on the nodes relative to the cost of sending it via ssh back to the
controlling perl script.  Fiddle with the delay a bit, and you can
generate curves that demonstrate scaling very nicely.
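
A minimal sketch of the idea, in Python rather than perl, and NOT the
script from the article.  It assumes the workers are threads on one host
instead of ssh sessions to nodes, and the names worker(), farm() and the
delay parameter are made up for illustration.  The sleep stands in for
the per-rand compute cost; no communication cost is modelled here.

```python
import random
import threading
import time


def worker(n_rands, delay, out, idx):
    """Generate n_rands random numbers, sleeping `delay` seconds per number."""
    rng = random.Random(idx)  # per-worker generator, seeded by worker index
    vals = []
    for _ in range(n_rands):
        time.sleep(delay)  # stand-in for the "more cpu" cost of each rand
        vals.append(rng.random())
    out[idx] = vals  # each worker writes only its own slot


def farm(total_rands, n_workers, delay):
    """Split total_rands across n_workers threads.

    Returns (all values, elapsed wall-clock seconds).
    """
    share, extra = divmod(total_rands, n_workers)
    out = [None] * n_workers
    threads = []
    start = time.perf_counter()
    for i in range(n_workers):
        n = share + (1 if i < extra else 0)  # spread the remainder evenly
        t = threading.Thread(target=worker, args=(n, delay, out, i))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # "collect the results"
    elapsed = time.perf_counter() - start
    values = [v for chunk in out for v in chunk]
    return values, elapsed


if __name__ == "__main__":
    _, t1 = farm(64, 1, 0.005)
    for n in (1, 2, 4, 8):
        _, tn = farm(64, n, 0.005)
        print(f"N={n:2d}  time={tn:.3f}s  speedup={t1 / tn:.2f}")
```

Because sleeping threads do not hold the interpreter lock, the workers
advance independently, and raising or lowering the delay changes how
much "work" each rand represents.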

I discovered that it really didn't matter whether the tasks were
distributed across more real nodes or ssh'd many times to the master
node.  The sleep
introduced "parallel delays" by permitting the threads to advance
independently with ms resolution or thereabouts.  Alas, I couldn't
exactly control the network speed as easily, but...

...this is a good way to think about doing this in general.  A hack of
my original script would fork off N threads on a SINGLE host, and would
introduce a set of parameters that basically permit one to adjust the
computational granularity and the compute:communicate time ratios.  This
still wouldn't reproduce the nonlinearities of a real parallel process
perfectly -- no cache effects, no latency vs bandwidth (at least not
without further hacking and a rescaling of time), no memory I/O binding
-- but it certainly suffices to show the general properties of scaling
curves and how they depend on e.g. serial vs parallel fractions and the
compute:communication ratios.  Makes decent graphs of same, with a bit
of work.
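
For the dependence on serial vs parallel fractions and on the
compute:communication ratio, a rough sketch of the kind of curve one
might plot.  The model is an assumption made for illustration, not
something taken from the script: runtime on one node is normalized to 1,
the parallel part divides evenly over N nodes (Amdahl's law), and each
extra node adds a fixed communication cost.  The function name and the
parameters serial_frac and comm_per_node are hypothetical.

```python
def speedup(n, serial_frac, comm_per_node=0.0):
    """Speedup on n nodes under an assumed simple cost model.

    Time on one node is normalized to 1.  On n nodes:
        T(n) = serial_frac + (1 - serial_frac) / n + comm_per_node * (n - 1)
    With comm_per_node == 0 this reduces to Amdahl's law.
    """
    parallel_frac = 1.0 - serial_frac
    t_n = serial_frac + parallel_frac / n + comm_per_node * (n - 1)
    return 1.0 / t_n


if __name__ == "__main__":
    print(" N   amdahl   with_comm")
    for n in (1, 2, 4, 8, 16, 32, 64):
        a = speedup(n, 0.05)          # 5% serial, free communication
        c = speedup(n, 0.05, 0.005)   # same, plus a cost per extra node
        print(f"{n:2d}  {a:7.2f}  {c:10.2f}")
```

With a nonzero communication cost the curve peaks and then turns over
as N grows, which is the qualitative behavior the sleep-based script is
meant to demonstrate.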

I've still got the code around if anybody wants it.  It might be up on
the Monkey website as well; the article is there, I'm pretty sure, but
Doug was going to work out a way of posting the supporting scriptware
and I don't know if he ever did that.  Doug?

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu