[Beowulf] cloudy HPC?
hahn at mcmaster.ca
Fri Jan 31 07:30:05 PST 2014
>> by "HPC services", I mean a very heterogeneous mixture of serial, bigdata,
>> fatnode/threaded, tight-coupled-MPI, perhaps
>> even GP-GPU stuff from hundreds of different groups, etc.
> What would running in VMs bring as an advantage to these jobs relative
> to running on bare-metal ?
I had hoped to make that clear in other sections of my message:
it would split the responsibility into one organization concerned
only with hardware capital and operating costs, and another group
that does purely os/software/user support. Compute Canada's current
funding catastrophe is based on distrust of the HPC organizations
by the funders - they seem to think we're greedy, turf-driven wastrels.
>> but VM infrastructure
>> like KVM can give device ownership to the guest, so IB access *could* be
> So you start a VM and you assign an IB card to it. What do you assign
> to another VM running at the same time on the same node?
we wouldn't, obviously. owning an IB card is only relevant for an
MPI program, and one that is pretty interconnect-intensive. such jobs
could simply be constrained to operate in multiples of whole nodes.
(this is common in existing HPC centers, though not universal. because
my organization, Sharcnet, is grotesquely in need of HW refresh,
we try to squeeze every cycle out of our hardware, and always share
nodes. many of our nodes have been in production since 2006, and a few since 2003.)
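as a rough illustration of the whole-node constraint (not anything Sharcnet
actually runs), a scheduler could round each core request up to a multiple
of the node size, so that a job owning the IB card never shares its node.
the function name and the 16-core node size are assumptions made up for
this sketch:

```python
def whole_node_allocation(cores_requested, cores_per_node):
    """Round a core request up to whole nodes.

    Returns (nodes, cores_allocated). Hypothetical helper for illustration;
    a real scheduler would enforce this through its own policy settings.
    """
    if cores_requested <= 0 or cores_per_node <= 0:
        raise ValueError("core counts must be positive")
    # Integer ceiling division, avoiding floating point.
    nodes = (cores_requested + cores_per_node - 1) // cores_per_node
    return nodes, nodes * cores_per_node


if __name__ == "__main__":
    # Assumed node size of 16 cores, chosen only for the example.
    nodes, cores = whole_node_allocation(20, 16)
    print(nodes, cores)
```

with those assumed numbers, a 20-core request gets 2 nodes (32 cores): 12
cores sit idle, but the job has the nodes and their IB cards to itself.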
> If there can
> only be one job accessing the IB card bare-metal, is it worth using
> VMs at all ?
I don't know why you ask that. I'm suggesting VMs as a convenient way
of drawing a line between HW and SW responsibilities, for governance
reasons. it's true that this could all be done bare-metal (though booting
over PXE is a little clumsier than starting a VM or even a container),
and that many jobs don't do anything that would stress the interconnect
(so they could survive with just IP networking provided by the hypervisor).
thanks, mark hahn.