[Beowulf] How to configure a cluster network

Mark Hahn hahn at mcmaster.ca
Fri Jul 25 07:04:57 PDT 2008


>>  virtualization is a throughput thing.
>
> Mark, Please can you clarify what you mean by 'throughput'

sorry, I don't know whether the use of that term is widespread or not.
what I mean is that with some patterns of use, the goal is just
to jam through as many serial jobs per day, or to transfer 
as many GBps over a link as possible.  these are operations that 
can be overlapped, and which are not, individually, latency-sensitive.
to me, throughput computing is a lot like handling fungible commodities:
jobs by the ton.

being latency-tolerant is nice, since it means the system can schedule
differently.  for instance, if the serial jobs spend any time with 
the cpu idle (blocked on IO for instance), you can profitably overcommit
your cpus (run slightly more processes than cpus).  you can gain by
overlapping.
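
to put rough numbers on the overcommit gain, here is a minimal sketch in
python.  it assumes a toy model that is mine, not a measurement: each job
independently wants the cpu a fixed fraction of the time (busy_fraction)
and is blocked on IO otherwise, so a cpu is idle only when every job on
it is blocked at once.  the function name and the 0.8 figure are purely
illustrative, and the model ignores context-switch cost and memory
pressure.

```python
def cpu_utilization(busy_fraction, jobs_per_cpu):
    """Expected fraction of time a cpu has at least one runnable job.

    Toy model (an assumption): jobs block independently, each wanting
    the cpu busy_fraction of the time.
    """
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    if jobs_per_cpu < 0:
        raise ValueError("jobs_per_cpu must be non-negative")
    # the cpu idles only when all jobs are blocked simultaneously
    return 1.0 - (1.0 - busy_fraction) ** jobs_per_cpu


if __name__ == "__main__":
    # hypothetical workload: jobs that keep the cpu busy 80% of the time
    for n in (1, 2, 3):
        print(n, "jobs/cpu ->", round(cpu_utilization(0.8, n), 3))
```

under these assumptions most of the gain comes from the first extra job
per cpu, which fits "slightly more processes than cpus".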

similarly, virtualization is all about overlapping low duty-cycle jobs.
it does bring something new to the table: being able to provision a node
with a completely new environment without dealing with the time overhead
of booting on bare metal.  it's unclear to me whether that's a big deal - 
I cringe at the thought of offering our users their own choice of OS 
and distro.  using VMs would isolate jobs better, so that they couldn't
see that they were, for instance, sharing a node, but I don't think it
would better insulate against performance intrusions (for instance, if
someone is consuming all the memory bandwidth, it'll still be noticed.)

virtualization is a pretty basic part of "cloud computing" grids, though,
where you specifically want to hide users from each other, and where,
by virtue of being internet apps, processes do a lot of waiting.

regards, mark hahn.


