[Beowulf] scheduler recommendations for a HPC cluster

Wed Oct 7 16:52:27 PDT 2009

> How does one compare different schedulers, anyways? Is it mostly "word
> of mouth" and reputation. Feature sets are good to look at but it's
> not really a quantitative metric. Are there any third party
> comparisons of various schedulers? Do they have a niche that one
> scheduler outperforms another?

I've never seen a useful comparison or even discussion of schedulers.
as far as I can tell, part of the problem is that the conceptual domain
is not developed enough to permit real, general tool-ness.

I don't mean that schedulers aren't useful.  it's dead simple to throw
together a package that lets users submit/control jobs, arbitrates a queue,
matches jobs to resources, fires them up, etc.  they all do it, and you
can write such a system from scratch in literally a few programmer-days
if you know what you're doing.  it's the details that matter, and that's 
where existing schedulers, even though they are functional, are not 
good tools.
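
to make the "few programmer-days" claim concrete, here is a minimal sketch
of that core loop in Python.  it is not taken from any real scheduler: the
names Job, ToyScheduler, submit, cancel, dispatch and finish are all made up
for illustration, and it assumes a single pool of interchangeable cores and
strict FIFO ordering.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cores: int              # cores the job asks for
    state: str = "queued"


@dataclass
class ToyScheduler:
    free_cores: int                              # idle cores in the pool
    queue: deque = field(default_factory=deque)  # waiting jobs, FIFO order
    running: list = field(default_factory=list)  # jobs currently holding cores

    def submit(self, job):
        """User-facing: put a job at the back of the queue."""
        self.queue.append(job)

    def cancel(self, name):
        """User-facing: drop a queued (not yet running) job by name."""
        self.queue = deque(j for j in self.queue if j.name != name)

    def dispatch(self):
        """Strict FIFO: start jobs from the head of the queue while they fit."""
        started = []
        while self.queue and self.queue[0].cores <= self.free_cores:
            job = self.queue.popleft()
            self.free_cores -= job.cores
            job.state = "running"
            self.running.append(job)
            started.append(job.name)
        return started

    def finish(self, name):
        """Called when a job exits: give its cores back to the pool."""
        for job in self.running:
            if job.name == name:
                self.running.remove(job)
                self.free_cores += job.cores
                job.state = "done"
                return


# hypothetical 8-core cluster with three jobs
sched = ToyScheduler(free_cores=8)
for job in (Job("a", 4), Job("b", 8), Job("c", 2)):
    sched.submit(job)
print(sched.dispatch())
```

in this run only "a" starts: the 8-core job "b" sits at the head of the
queue and blocks the 2-core job "c", even though 4 cores are idle.  whether
that is right (big jobs never starve) or wrong (wasted cycles, so you want
backfill) is exactly the kind of detail that matters.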

to me, a good tool has a certain patina of conceptual integrity about it. 
for instance, what it takes to make a good compiler is well known: we 
are all familiar with the two interfaces: source language and machine code.
there are differences in quality, but for the most part, compilers all 
work alike.  we all are at least somewhat aware of the long history of 
compilers, littered with the kind of mistakes that bring learning.
you know what to expect from a screwdriver or drillpress,
though they may differ in size or power.  your programming language may 
be more torx than phillips, or you may prefer a multibit screwdriver.
but we all know it needs a comfortable handle, a certain size and rigidity,
blade for fitting the screw, etc.

schedulers are more like an insanely ramified swiss army knife: being
feature-complete is sometimes detrimental, and extreme featurefulness sometimes means
it's guaranteed to not do what you want, only something vaguely in that
direction.  I think there's a physics-of-software principle here, that 
features always lead to less flexibility.  (that doesn't deny that techniques
like refactoring help, but they _only_ introduce a discontinuity between
regions of featuritis.  if nothing else, the dragging weight of
back-compatibility is piecewise-monotonic...)

> Perhaps my quest for a quantitative metric is stupid. Maybe this is
> one of the many areas of technology where things are more qualitative
> than quantitative anyways. Price/ performance is always hard to define
> but for schedulers this seems impossible.

nah.  it's a domain problem: what a scheduler should do and how is simply
not well-defined, so scheduler companies just go for quantity to win you over.

> The other issue seems to be per core licensing. To me it seems as an

well, or licensing at all.  it would be different if you were paying 
SchedulerCo to "make everything work the way I want", AND they could 
actually do it.  instead, you pay for the thing they want to sell,
and then spend huge amounts of your time fighting it and ultimately 
erecting shims and scaffolds around it to get it closer to "right".

> admin the amount of time and effort one puts in configuring a
> scheduler for a 50 core system and a 2000 core system is not grossly
> too different (maybe I am wrong?).

depends on what you want.  if you're undemanding and merely want to 
hit the "users can submit jobs which eventually run somehow" milestone,
then there's certainly no reason to pay for anything, and you can expect it
to take a competent sysadmin a few hours to set up.  ie, the goal is 
"keep the machineroom warm".

maybe my organization is uniquely picky, but I would say that after 5 years
elapsed, and probably > 2 staff-years manhandling it, our (commercial)
scheduler does maybe 65% of what it should.  we're in the process of
switching to another (mostly commercial) scheduler, and I expect that with
1-2 staff-years of investment, we'll have it close to 65 or 70%.

> The license cost on the other hand
> scales with cores. That makes the justification even harder.

per-core licensing is just asinine: vendors seem to think that since 
the government taxes everyone, that's a fine revenue model for them too.
they don't consider it from the other direction: that the amount of 
development and support effort is pretty close to constant per installation 
(ie, specifically _not_ a function of core count).
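
a back-of-envelope sketch of that mismatch, using the 50-core and 2000-core
systems from the question.  every number here is a made-up placeholder
(PRICE_PER_CORE and VENDOR_EFFORT are assumptions, not real vendor figures);
only the shape of the result matters.

```python
# All prices are hypothetical placeholders, not real vendor quotes.
PRICE_PER_CORE = 100      # assumed per-core license fee
VENDOR_EFFORT = 20_000    # assumed vendor cost per installation, any size


def per_core_license(cores, price_per_core=PRICE_PER_CORE):
    """License fee under a per-core pricing model."""
    return cores * price_per_core


def fee_to_effort_ratio(cores):
    """How many times over the fee covers the (constant) vendor effort."""
    return per_core_license(cores) / VENDOR_EFFORT


for cores in (50, 2000):
    print(cores, per_core_license(cores), fee_to_effort_ratio(cores))
```

whatever prices you plug in, the 2000-core site pays 40 times what the
50-core site pays, for roughly the same amount of vendor effort.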

we can argue about tax/politics/society over beer sometime,
but "soak the rich" is not optimal in the marketplace.