[Beowulf] BMW Shifts Supercomputing To Iceland To Save Emissions
hahn at mcmaster.ca
Mon Oct 15 11:07:47 PDT 2012
> Mind you, I'm a huge fan of small clusters under a single person's control,
> where nobody is watching to see if you are making 'effective utilization'
> and you can do whatever you want. A personal supercomputer, as it were.
> But I recognize that for much of the HPC world, clusters are managed in the
> same way as big iron mainframes were in the 70s,
I think you're being a bit disingenuous here. dedicated/personal
clusters are perfectly sensible when the workload is non-bursty
or somehow otherwise high-duty-cycle. or perhaps when you're
talking about resources cheap enough to hand out like pencils.
(that is, let's be honest: cheap enough to waste.)
a larger, shared resource pool is ideal for bursty/low-duty-cycle environments.
as far as I can see, there are really only a couple problems with this:
- many people and most environments have a mixture of burstiness.
- schedulers are not awesome at managing latency of either flavor
  when both are mixed, especially in the presence of poorly specified
  resource requirements (bad runtime estimates, bad memory estimates, etc.)
- resource granularity becomes even more of a problem: serial jobs
  "contaminate" nodes for parallel use or high vs low mem, etc.
- very short runtime limits permit more rebalancing of resources,
  but are incredibly harmful to most people's productivity.
- preemption (SIGSTOP/SIGCONT) seems to be a relatively little-used
  way to optimize for latency - enough so that it simply does not work
  right on major non-free schedulers.
- it's hard to get people to treat storage as ephemeral :(
- big resources are also big budget targets :(
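
for anyone who hasn't played with the SIGSTOP/SIGCONT preemption mentioned in the list above, here is a minimal sketch of the mechanism on a single node. it is not how any particular scheduler does it: the preempt() helper, the low_priority child and the run_urgent callable are all hypothetical names made up for illustration, and it assumes a POSIX system where the job is a direct child of the calling process.

```python
import os
import signal
import subprocess
import sys


def preempt(low_priority, run_urgent):
    """Suspend a running child, run urgent work, then resume the child.

    low_priority: a subprocess.Popen started by this process (hypothetical job).
    run_urgent:   a zero-argument callable standing in for the urgent job.
    Returns (stopped, result): whether the kernel reported the child as
    stopped, and whatever run_urgent returned.
    """
    # SIGSTOP cannot be caught or ignored, so the child needs no cooperation.
    os.kill(low_priority.pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid report the stop without reaping the child.
    _, status = os.waitpid(low_priority.pid, os.WUNTRACED)
    stopped = os.WIFSTOPPED(status)
    try:
        result = run_urgent()
    finally:
        # always resume the suspended job, even if the urgent work raised
        os.kill(low_priority.pid, signal.SIGCONT)
    return stopped, result


if __name__ == "__main__":
    # stand-in for a long-running low-priority job
    background = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(2)"]
    )
    stopped, answer = preempt(background, lambda: sum(range(1000)))
    print("stopped:", stopped, "urgent result:", answer)
    background.wait()  # the resumed job runs to completion
```

note that a stopped process gives up its CPU but keeps its memory, so this only helps latency when the urgent job fits alongside the suspended one (or the suspended one can be swapped out).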