[Beowulf] HPC workflows
griznog at gmail.com
Mon Dec 3 10:12:10 PST 2018
On Fri, Nov 30, 2018 at 9:44 PM John Hearns via Beowulf <beowulf at beowulf.org> wrote:
> John, your reply makes so many points which could start a whole series of
I would not deny partaking of the occasional round of trolling.
> > Best use of our time now may well be to 'rm -rf SLURM' and figure out
> how to install kubernetes.
> My own thoughts on HPC for a tightly coupled, on premise setup is that we
> need a lightweight OS on the nodes, which does the bare minimum. No general
> purpose utilities, no GUIS, nothing but network and storage. And container
> The cluster will have the normal login nodes of course but will present
> itself as a 'black box' to run containers.
> But - given my herd analogy above - will we see that? Or will we see
> private Openstack setups?
Ten years ago, maybe even five, I would have agreed with you wholeheartedly. I
was never impressed much by early LXC, but for my first year of exposure to
Docker hype I was thinking exactly what you are saying here. And then I
tried CoreOS and started missing having a real OS. And then I started
trying to do things with containers. And then I realized that I was seeing
software which was "easier to containerize" and that "easier to
containerize" really meant "written by people who can't figure out
'./configure; make; make install' and who build on a sand-like foundation
of fragile dependencies to the extent that it only runs on their Ubuntu
laptop so you have to put their Ubuntu laptop in a container." Then I
started asking myself "do I want to trust software of that quality?" And
after that, "do I want to trust the tools written to support that type of
poor-quality software?" And then I started to notice how much containers
actually *increased* the amount of time/complexity it took to manage
software. And then I started enjoying all the container engine bugs... At
that point, reality squished the hype for me because I had other stuff I
needed to get done and didn't have budget to hire a devops person to sit
around mulling these things over.
From the perspective of the software being containerized, I'm even more
skeptical. In my world (bioinformatics) I install a lot of crappy software.
We're talking stuff resulting from "I read the first three days of 'learn
python in 21 days' and now I'm an expert, just run this after installing
these 17 things from PyPI... and trust the output." I'm good friends with
crappy software, we hang out together a lot. To me it just doesn't feel
like making crappy software more portable is the *right* thing to do. When
I walk my dog, I follow him with a bag and "containerize" what drops out.
It makes it easier to carry around, but doesn't change what it is. As of
today I see the biggest benefit of containers as that they force a
developer to actually document the install procedure somewhere in a way
that actually has to work so we can see firsthand how ridiculous it is
(*cough* tensorflow *cough*).
I got sidetracked on a rant again. Your proposed solution works fine in an
IT-style computing world. It needs exactly the staff IT wants to grow these
days, and instead of just a self-directed sysadmin it has the potential to
need a project manager. I don't see it showing up on many lab/office
clusters anytime soon, though, because it's a model that embraces hype first,
and in an environment not focused on publishing or press releases around
hype, it's a lot of extra work/cost/complexity for very little real
benefit. While you (and many on this list) might be interested in
exploring the technical merits of the approach, its actual utility really
hits home for people who require that extra complexity and layered
abstraction to justify themselves. The understaffed/overworked among us
will just write a shell/job script and move along to the next raging fire
to put out.
> On Fri, 30 Nov 2018 at 23:04, John Hanks <griznog at gmail.com> wrote:
>> On Thu, Nov 29, 2018 at 4:46 AM Jon Forrest <nobozo at gmail.com> wrote:
>>> I agree completely. There is and always will be a need for what I call
>>> "pretty high performance computing", which is the highest performance
>>> computing you can achieve, given practical limits like funding, space,
>>> time, ... Sure, there will always be people who can figure out how to go
>>> faster, but PHPC is pretty good.
>> What a great term, PHPC. That probably describes the bulk of all "HPC"
>> oriented computing being done today, if you consider all cores in use down
>> to the lab/workbench level of clustering. Certainly for my userbase
>> (bioinformatics) the computational part of a project often is a small
>> subset of the total time spent on it and time to total solution is the most
>> important metric for them. It's rare for us to try to get that last 10% or
>> 20% of performance gain.
>> <rant>This has been a great thread overall, but I think no one is
>> considering the elephant in the room. Technical arguments are not winning
>> out in any of these technologies: CI/CD, containers, "devops", etc. All
>> these things are stacking on arbitrary layers of abstraction in an attempt
>> to cover up for the underlying, really really crappy software development
>> practices/models and resulting code. They aren't successful because they
>> are *good*, they are successful because they are *popular*.
>> As HPC admins, we tend to report to research-oriented groups. Not always,
>> but more often than "normal" IT folks, who are often insulated from
>> negative user feedback by ticket systems, metrics, etc. Think about the
>> difference in that reporting chain:
>> A PI/researcher gets her next grant, tenured position, brilliant new
>> post-doc, etc., based on her research. Approach them about expanding the
>> sysadmin staff by 10x people and they'll laugh you out of the room. Ask for
>> an extra 100% budget to buy Vendor B storage rather than whitebox and
>> they'll laugh you out of the room. They want as much raw
>> computation/storage as cheaply as possible and would rather pay a grad
>> student than a sysadmin to run it because a grad student is more likely to
>> stumble over a publication and boost the PI's status. Sysadmins are dead
>> weight in this world, only tolerated.
>> A CIO or CTO gets his next job based on the headcount and budget under
>> his control. There is no incentive to be efficient in anything they do. Of
>> course, there is the *appearance* of efficiency to maintain, but the CIO
>> 101 class's first lecture is on creative accounting and metrics. Pay more
>> for Vendor B? Of course, they pay for golf and lunch, great people. Think
>> about all those "migrate/outsource to the cloud" projects you've seen that
>> were going to save so much money. More often than not, staff *expands* with
>> "cloud engineers", extra training is required, sysadmin work gets
>> inefficiently distributed to end users, err, I mean developers. Developers
>> now need to fork into new FTEs who need training...and so it goes. More
>> head count, more budget, more power: happy CIO. Time to apply to a larger
>> institution/company, rinse and repeat.
>> Think about it from the perspective of your favorite phone app, whatever
>> it may be:
>> - app is released, wow this is useful!
>> - app is updated, wow this is still useful and does 2 more things
>> - app is updated, ummm..., it's still useful but these 4 new things
>> really make what I need hard to get to
>> - app is updated, dammit, my feature has been split and replaced with 8
>> new menus, none of which do what I want?!?!?
>> No one goes to the yearly performance review and says "I removed X
>> features, Y lines of code and simplified the interface down to just the
>> useful functions, there's nothing else to be done" and gets a raise. People
>> get raises for *adding* stuff, for *increasing* complexity. You can't tie
>> your name to a simplification, but an addition goes on the CV quite nicely.
>> It doesn't matter if in the end any benefit is dwarfed by the extra
>> complexity and inefficiency.
>> Ultimately I blame us, the sysadmins.
>> We could have installed business oriented software and worked with
>> schools of business, but we laughed at them because they didn't use MPI.
>> Now we have the Hadoop and SPARK abominations to deal with.
>> We could have handed out a little sudo here and there to give people
>> *measured* control, but we coveted root and drove them to a more expensive
>> instance in the cloud where they could have full control.
>> We could have rounded out node images with a useful set of packages, but
>> we prided ourselves on optimizing node images to the point that users had
>> to pretty much rebuild the OS in $HOME to get anything to run, and so now:
>> We could have been in a position to say "hey, that's a stupid idea"
>> (*cough* systemd *cough*) but we squandered our reputation on neckbeard
>> BOFH pursuits and the enemies of simplicity stormed the gates.
>> Disclaimer: I'm confessing here. I recognize I played a role in this so
>> don't think I didn't throw the first stone at myself. Guilty as charged.
>> Enjoy the technical arguments, but devops and cloud and containers and
>> whatever next abstraction layers arise don't care. They have crept up on us
>> under a fog of popularity and fanbois-ism and overwhelmed HPC with sheer
>> numbers of "developers". Not because any of it is better or more
>> efficient, but because no one really cares about efficiency. They want to
>> work and eat and if adding and supporting a half-dozen more layers of
>> abstraction and APIs keeps the paychecks coming, no one is simplifying
>> anything. I call it "devops masturbation". The fact that pretty much all of
>> it could be replaced with a small shell script is irrelevant. devops needs
>> CI/CD, containers, and cloud to justify existence, and they will not go
>> quietly into that good night when offered a simpler, more efficient and
>> cheaper solution which puts them out of a job. Best use of our time now may
>> well be to 'rm -rf SLURM' and figure out how to install kubernetes. Console
>> yourself with the realization that people are willing to happily pay more
>> for less if the abstraction is appealing enough, and start counting the fat
>> stacks of cash.
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit