[Beowulf] HPC workflows
griznog at gmail.com
Fri Nov 30 14:02:55 PST 2018
On Thu, Nov 29, 2018 at 4:46 AM Jon Forrest <nobozo at gmail.com> wrote:
> I agree completely. There is and always will be a need for what I call
> "pretty high performance computing", which is the highest performance
> computing you can achieve, given practical limits like funding, space,
> time, ... Sure there will always be people who can figure out how to go
> faster, but PHPC is pretty good.
What a great term, PHPC. That probably describes the bulk of all "HPC"
oriented computing being done today, if you consider all cores in use down
to the lab/workbench level of clustering. Certainly for my userbase
(bioinformatics) the computational part of a project often is a small
subset of the total time spent on it and time to total solution is the most
important metric for them. It's rare for us to try to get that last 10% or
20% of performance gain.
<rant>This has been a great thread overall, but I think no one is
considering the elephant in the room. Technical arguments are not winning
out in any of these technologies: CI/CD, containers, "devops", etc. All
these things are stacking on arbitrary layers of abstraction in an attempt
to cover up for the underlying, really really crappy software development
practices/models and resulting code. They aren't successful because they
are *good*, they are successful because they are *popular*.
As HPC admins, we tend to report to research oriented groups. Not always,
but more often than "normal" IT folks, who are often insulated from
negative user feedback by ticket systems, metrics, etc. Think about the
difference in that reporting chain:
A PI/researcher gets her next grant, tenured position, brilliant new
post-doc, etc., based on her research. Approach them about expanding the
sysadmin staff by 10x people and they'll laugh you out of the room. Ask for
an extra 100% budget to buy Vendor B storage rather than whitebox and
they'll laugh you out of the room. They want as much raw
computation/storage as cheaply as possible and would rather pay a grad
student than a sysadmin to run it because a grad student is more likely to
stumble over a publication and boost the PI's status. Sysadmins are dead
weight in this world, only tolerated.
A CIO or CTO gets his next job based on the headcount and budget under his
control. There is no incentive to be efficient in anything they do. Of
course, there is the *appearance* of efficiency to maintain, but the CIO
101 class's first lecture is on creative accounting and metrics. Pay more
for Vendor B? Of course, they pay for golf and lunch, great people. Think
about all those "migrate/outsource to the cloud" projects you've seen that
were going to save so much money. More often than not, staff *expands* with
"cloud engineers", extra training is required, sysadmin work gets
inefficiently distributed to end users, err, I mean developers. Developers
now need to fork into new FTEs who need training...and so it goes. More
head count, more budget, more power: happy CIO. Time to apply to a larger
institution/company, rinse and repeat.
Think about it from the perspective of your favorite phone app, whatever it
is:
- app is released, wow this is useful!
- app is updated, wow this is still useful and does 2 more things
- app is updated, ummm..., it's still useful but these 4 new things really
make what I need hard to get to
- app is updated, dammit, my feature has been split and replaced with 8
new menus, none of which do what I want?!?!?
No one goes to the yearly performance review and says "I removed X
features, Y lines of code and simplified the interface down to just the
useful functions, there's nothing else to be done" and gets a raise. People
get raises for *adding* stuff, for *increasing* complexity. You can't tie
your name to a simplification, but an addition goes on the CV quite nicely.
It doesn't matter if in the end any benefit is dwarfed by the extra
complexity and inefficiency.
Ultimately I blame us, the sysadmins.
We could have installed business oriented software and worked with schools
of business, but we laughed at them because they didn't use MPI. Now we
have the Hadoop and Spark abominations to deal with.
We could have handed out a little sudo here and there to give people
*measured* control, but we coveted root and drove them to a more expensive
instance in the cloud where they could have full control.
We could have rounded out node images with a useful set of packages, but we
prided ourselves on optimizing node images to the point that users had to
pretty much rebuild the OS in $HOME to get anything to run, and so now:
We could have been in a position to say "hey, that's a stupid idea"
(*cough* systemd *cough*) but we squandered our reputation on neckbeard
BOFH pursuits and the enemies of simplicity stormed the gates.
Disclaimer: I'm confessing here. I recognize I played a role in this so
don't think I didn't throw the first stone at myself. Guilty as charged.
Enjoy the technical arguments, but devops and cloud and containers and
whatever next abstraction layers arise don't care. They have crept up on us
under a fog of popularity and fanbois-ism and overwhelmed HPC with sheer
numbers of "developers". Not because any of it is better or more
efficient, but because no one really cares about efficiency. They want to
work and eat and if adding and supporting a half-dozen more layers of
abstraction and APIs keeps the paychecks coming, no one is simplifying
anything. I call it "devops masturbation". The fact that pretty much all of
it could be replaced with a small shell script is irrelevant. Devops needs
CI/CD, containers, and cloud to justify existence, and they will not go
quietly into that good night when offered a simpler, more efficient and
cheaper solution which puts them out of a job. Best use of our time now may
well be to 'rm -rf SLURM' and figure out how to install Kubernetes. Console
yourself with the realization that people are willing to happily pay more
for less if the abstraction is appealing enough, and start counting the fat
stacks of cash.
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit