[Beowulf] VMC - Virtual Machine Console


Jeffrey B. Layton laytonjb at charter.net
Wed Jan 16 07:31:11 PST 2008


Douglas Eadline wrote:
> I get the desire for fault tolerance  etc. and I like the idea
> of migration. It is just that many HPC people have spent
> careers getting applications/middleware as close to the bare
> metal as possible. The whole VM concept seems orthogonal to
> this goal. I'm curious how people are approaching this
> problem.
>   
Like many things, the devil is in the details. While I don't want to be as
prodigious as rgb, I want to mention a few things and ask some questions:

- With multi-core processors, to get the best performance you want to
   assign a process to a core. But this can cause problems when moving
   a process or creating a checkpoint. For example, VMware explicitly
   tells you not to do this. While I can't state their position, in
   general the idea is that restarting a checkpointed VM may have
   problems when a process is pinned to a core (even more so if the CPU
   is different). Also, moving a pinned process to another node may
   cause problems if the nodes are different in pretty much any way (it
   may also be affected by what's on the new node).

- As Ashley pointed out, the network aspect is still very problematic.
   Getting good performance out of a NIC in a VM is not easy and, from
   what I understand, difficult or impossible to do with multi-core nodes.
   (I would love to hear if someone has gotten very good performance out
   of a NIC in a VM when other VMs are also using the same NIC. Please
   give as many details as possible.)

- As Meng mentioned, IO is still problematic (I think for the same reasons
   that interconnects are).

- I haven't seen any benchmarks run in VMs using several nodes with
   an interconnect. Does anyone know of any?

- Has anyone tried moving processes around to different nodes for an
   MPI job? I'm curious what they found.
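
To make the pinning point in the first item concrete, here is a minimal
sketch of pinning a process to a single core on Linux, using Python's
os.sched_getaffinity and os.sched_setaffinity. The helper name pin_to_core
is hypothetical, and this is only an illustration of the idea, not a
statement of how VMware or any MPI runtime handles affinity. The check
against the allowed set shows the kind of failure to expect when a core
number saved on one node is reused on a node with a different core layout.

```python
import os


def pin_to_core(core, pid=0):
    """Pin process `pid` (0 means the calling process) to a single core.

    Hypothetical helper for illustration; returns the resulting affinity set.
    """
    if not hasattr(os, "sched_setaffinity"):
        # These calls are only available on Linux.
        raise NotImplementedError("CPU affinity calls are not available here")

    allowed = os.sched_getaffinity(pid)
    if core not in allowed:
        # A core id recorded on another node may simply not exist here.
        raise ValueError(
            "core %d is not in the allowed set %s" % (core, sorted(allowed))
        )

    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    target = min(original)  # any core this process may already use
    print("before:", sorted(original))
    print("after: ", sorted(pin_to_core(target)))
    os.sched_setaffinity(0, original)  # restore the original mask
```

A restart or migration that replays a saved core number without a check
like the one above is where I would expect trouble.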


I would like to see virtualization take off in HPC, but I have to see a few
demos of things working and I need to see reasons why I should adopt
it. Right now I don't relish taking my "High" Performance Computing
system and turning it into "Kind-of-High" Performance Computing because
it would allow non-code-specific checkpointing or movement of processes.
Losing 10% in performance, for example, in HPC is a big deal, and I haven't
yet seen the benefits of virtualization that justify giving up the 10%
(I'm dying to be shown to be wrong though).

The only aspect of virtualization that could make some sense in HPC is
what rgb mentioned - allowing the user to select an OS as part of their
job and installing or tearing down the OS as part of the job. I can see this
being very useful if the details could be worked out (I know there are
people working on it but I haven't seen any large demonstrations of it yet
and I would really like to see such a beastie).

Anyway, my 2 cents (and probably my last since this topic falls under
Landman's Rule of flammability).

Jeff



