[Beowulf] VMC - Virtual Machine Console
Jeffrey B. Layton
laytonjb at charter.net
Wed Jan 16 07:31:11 PST 2008
Douglas Eadline wrote:
> I get the desire for fault tolerance etc. and I like the idea
> of migration. It is just that many HPC people have spent
> careers getting applications/middleware as close to the bare
> metal as possible. The whole VM concept seems orthogonal to
> this goal. I'm curious how people are approaching this
Like many things, the devil is in the details. While I don't want to be as
prodigious as rgb, I want to mention a few things and ask some questions:
- With multi-core processors, to get the best performance you want to
assign a process to a core. But this can cause problems when moving
a process or creating a checkpoint. For example, VMware explicitly
tells you not to do this. While I can't speak for their position, the
idea is that restarting a checkpointed VM may have problems when
a process is pinned to a core (even more so if the CPU is different).
Also, moving a pinned process to another node may cause problems
if the node is different in pretty much any way (it may also be affected
by what else is running on the new node).
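To make the pinning problem concrete, here is a minimal sketch of
assigning a process to a core using Linux's affinity interface (Python's
os.sched_setaffinity; the choice of core 0 is arbitrary, and this is my
illustration, not anything VMware documents):

```python
import os

# Pin the calling process (pid 0 = "self") to a single core, core 0.
# This is the explicit placement that helps cache locality on
# multi-core nodes -- and that complicates checkpoint/restart,
# since the saved state now assumes that particular core exists.
os.sched_setaffinity(0, {0})

# Confirm the pinning took effect.
print(os.sched_getaffinity(0))
```

A checkpoint restored on a node with a different core count, or a
migrated VM, would have to remap that affinity somehow, which is
presumably why the hypervisor vendors tell you not to pin.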
- As Ashley pointed out, the network aspect is still very problematic.
Getting good performance out of a NIC in a VM is not easy, and from
what I understand it is difficult or impossible with multi-core nodes.
(I would love to hear if someone has gotten very good performance out
of a NIC in a VM while other VMs are also using the same NIC. Please
give as many details as possible.)
- As Meng mentioned, I/O is still problematic (I think for the same reasons
that interconnects are).
- I haven't seen any benchmarks run in VM's using several nodes with
an interconnect. Does anyone know of any?
- Has anyone tried moving processes around to different nodes for an
MPI job? I'm curious what they found.
I would like to see virtualization take off in HPC, but I have to see a few
demos of things working and I need to see reasons why I should adopt
it. Right now I don't relish taking my "High" Performance Computing
system and turning it into "Kind-of-High" Performance Computing just because
it would allow application-agnostic checkpointing or movement of processes.
Losing 10% in performance, for example, is a big deal in HPC, and I haven't
yet seen benefits of virtualization worth giving up that 10% (I'm happy
to be shown to be wrong, though).
The only aspect of virtualization that could make some sense in HPC is
what rgb mentioned - allowing the user to select an OS as part of their
job and installing or tearing down the OS as part of the job. I can see this
being very useful if the details could be worked out (I know there are
people working on it, but I haven't seen any large demonstrations of it yet
and I would really like to see such a beastie).
Anyway, my 2 cents (and probably my last, since this topic falls under
Landman's Rule of flammability).