[Beowulf] scheduler recommendations for a HPC cluster
Mark Hahn hahn at mcmaster.ca
Wed Oct 7 16:52:27 PDT 2009
> How does one compare different schedulers, anyways? Is it mostly "word
> of mouth" and reputation. Feature sets are good to look at but it's
> not really a quantitative metric. Are there any third party
> comparisons of various schedulers? Do they have a niche that one
> scheduler outperforms another?

I've never seen a useful comparison or even discussion of schedulers. as far as I can tell, part of the problem is that the conceptual domain is not developed enough to permit real, general tool-ness.

I don't mean that schedulers aren't useful. it's dead simple to throw together a package that lets users submit/control jobs, arbitrates a queue, matches jobs to resources, fires them up, etc. they all do it, and you can write such a system from scratch in literally a few programmer-days if you know what you're doing. it's the details that matter, and that's where existing schedulers, even though they are functional, are not good tools.

to me, a good tool has a certain patina of conceptual integrity about it. for instance, what it takes to make a good compiler is well known: we are all familiar with the two interfaces: source language and machine code. there are differences in quality, but for the most part, compilers all work alike. we all are at least somewhat aware of the long history of compilers, littered with the kind of mistakes that bring learning.

you know what to expect from a screwdriver or drill press, though they may differ in size or power. your programming language may be more torx than phillips, or you may prefer a multibit screwdriver. but we all know it needs a comfortable handle, a certain size and rigidity, a blade for fitting the screw, etc. schedulers are more like an insanely ramified swiss army knife: being feature-complete is sometimes detrimental, and extreme featurefulness sometimes means it's guaranteed to not do what you want, only something vaguely in that direction.

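To make the "dead simple" core mentioned above concrete, here is a minimal sketch of a submit/queue/match/launch loop. It is an illustration only, not how any real scheduler is implemented: the names `Job`, `ToyScheduler`, and the node names are hypothetical, the policy is assumed to be strict FIFO with first-fit placement on a single node, and "launching" a job is reduced to bookkeeping.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int  # cores requested on a single node (assumption: no multi-node jobs)


class ToyScheduler:
    """Hypothetical FIFO scheduler with first-fit node matching."""

    def __init__(self, nodes):
        # nodes: mapping of node name -> number of free cores
        self.free = dict(nodes)
        self.queue = deque()
        self.running = {}  # job name -> (node, cores)

    def submit(self, job):
        # Users submit jobs; they wait in arrival order.
        self.queue.append(job)

    def cancel(self, name):
        # Minimal "control": drop a job that has not started yet.
        self.queue = deque(j for j in self.queue if j.name != name)

    def schedule(self):
        """Start queued jobs in strict FIFO order.

        Stops at the first job that does not fit anywhere, so a large job
        at the head of the queue blocks everything behind it.
        """
        started = []
        while self.queue:
            job = self.queue[0]
            # First fit: the first node with enough free cores.
            node = next((n for n, c in self.free.items() if c >= job.cores), None)
            if node is None:
                break
            self.queue.popleft()
            self.free[node] -= job.cores
            self.running[job.name] = (node, job.cores)
            started.append((job.name, node))
        return started

    def finish(self, name):
        # Return a finished job's cores to its node.
        node, cores = self.running.pop(name)
        self.free[node] += cores


sched = ToyScheduler({"n1": 8, "n2": 8})
for job in (Job("a", 8), Job("b", 4), Job("c", 8)):
    sched.submit(job)
first_pass = sched.schedule()   # "c" has to wait: no node has 8 free cores
sched.finish("a")
second_pass = sched.schedule()  # "c" can start once "a" releases n1
```

Everything contentious is missing here: this sketch has exactly one policy, and changing that policy is where the time goes, which is the point the post makes about details.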
I think there's a physics-of-software principle here, that features always lead to less flexibility. (that doesn't deny that techniques like refactoring help, but they _only_ introduce a discontinuity between regions of featuritis. if nothing else, the dragging weight of back-compatibility is piecewise-monotonic...)

> Perhaps my quest for a quantitative metric is stupid. Maybe this is
> one of the many areas of technology where things are more qualitative
> than quantitative anyways. Price/ performance is always hard to define
> but for schedulers this seems impossible.

nah. it's a domain problem: what a scheduler should do and how is simply not well-defined, so scheduler companies just go for quantity to win you over.

> The other issue seems to be per core licensing. To me it seems as an

well, or licensing at all. it would be different if you were paying SchedulerCo to "make everything work the way I want", AND they could actually do it. instead, you pay for the thing they want to sell, and then spend huge amounts of your time fighting it and ultimately erecting shims and scaffolds around it to get it closer to "right".

> admin the amount of time and effort one puts in configuring a
> scheduler for a 50 core system and a 2000 core system is not grossly
> too different (maybe I am wrong?).

depends on what you want. if you're undemanding and merely want to hit the "users can submit jobs which eventually run somehow" milestone, then there's certainly no reason to pay for anything, and you can expect it to take a competent sysadmin a few hours to set up. ie, the goal is "keep the machineroom warm".

maybe my organization is uniquely picky, but I would say that after 5 years elapsed, and probably > 2 staff-years manhandling it, our (commercial) scheduler does maybe 65% of what it should. we're in the process of switching to another (mostly commercial) scheduler, and I expect that with 1-2 staff-years of investment, we'll have it close to 65 or 70%.

> The license cost on the other hand
> scales with cores. That makes the justification even harder.

per-core licensing is just asinine: vendors seem to think that since the government taxes everyone, that's a fine revenue model for them too. they don't consider it from the other direction: that the amount of development and support effort is pretty close to constant per installation (ie, specifically _not_ a function of core count). we can argue about tax/politics/society over beer sometime, but "soak the rich" is not optimal in the marketplace.
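The mismatch described above can be put in numbers with a small sketch. The 50-core and 2000-core sizes come from the quoted question; the unit price is a made-up placeholder (only the ratio matters), and treating vendor effort as exactly constant per installation is a simplification of "pretty close to constant".

```python
# Hypothetical unit price: the absolute value is irrelevant, only ratios matter.
PRICE_PER_CORE = 1.0
# Hypothetical vendor effort per installation, assumed independent of core count.
EFFORT_PER_INSTALL = 1.0


def per_core_license(cores, price_per_core=PRICE_PER_CORE):
    """License cost under per-core pricing: grows linearly with cores."""
    return cores * price_per_core


def vendor_effort(cores, effort=EFFORT_PER_INSTALL):
    """Development/support effort: assumed flat regardless of cores."""
    return effort


small, large = 50, 2000
cost_ratio = per_core_license(large) / per_core_license(small)
effort_ratio = vendor_effort(large) / vendor_effort(small)
print(f"license cost grows {cost_ratio:.0f}x, vendor effort grows {effort_ratio:.0f}x")
```

Under these assumptions the larger site pays 40 times as much for roughly the same amount of vendor work.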
