[Beowulf] HPC workflows

John Hearns hearnsj at googlemail.com
Fri Nov 30 21:43:05 PST 2018


John, your reply raises so many points, any of which could start a whole
series of debates.

 > Best use of our time now may well be to 'rm -rf SLURM' and figure out
how to install kubernetes.
Well, that is something I have given a lot of thought to recently.
The folks over at Sylabs have already released a development version of
Singularity with a Kubernetes Container Runtime Interface (CRI).
So yes, maybe the future of HPC at the small industrial/departmental level
will be Kubernetes.
Maybe it will be so for the national lab scale systems too.
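For anyone wanting to experiment, the rough shape of it, as far as I can
tell from the development release, is to run the Singularity CRI server and
point the kubelet at its socket. A sketch only - the binary name and socket
path below are assumptions, so check your singularity-cri install:

    # Start the Singularity CRI server (binary name and socket path
    # depend on the singularity-cri release you have).
    sycri &

    # Point the kubelet at the remote CRI runtime instead of Docker.
    kubelet --container-runtime=remote \
            --container-runtime-endpoint=unix:///var/run/singularity.sock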

Might be worth bringing in one of my adages here: in IT you run with the
herd, or you get trampled. Meaning that you may identify a good technology,
find that it fits well with your needs, and like working with it.
But if the herd is thundering off after another, even inferior, technology,
then you will be left behind.

My own view on HPC for a tightly coupled, on-premise setup is that we need
a lightweight OS on the nodes which does the bare minimum: no
general-purpose utilities, no GUIs, nothing but network and storage. And
container support.
The cluster will have the normal login nodes, of course, but will present
itself as a 'black box' for running containers.
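To the user, submitting work to such a black box might look something like
this (the image name and registry below are made up for illustration):

    # Submit a containerized job; no knowledge of the node OS needed,
    # just an image and a command.
    kubectl create job blast-run --image=registry.example.com/lab/blast:latest

    # Collect the output once it has run.
    kubectl logs job/blast-run
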
But - given my herd analogy above - will we see that? Or will we see
private OpenStack setups?


On Fri, 30 Nov 2018 at 23:04, John Hanks <griznog at gmail.com> wrote:

>
>
> On Thu, Nov 29, 2018 at 4:46 AM Jon Forrest <nobozo at gmail.com> wrote:
>
>
>> I agree completely. There is, and always will be, a need for what I call
>> "pretty high performance computing", which is the highest performance
>> computing you can achieve, given practical limits like funding, space,
>> time, ... Sure there will always be people who can figure out how to go
>> faster, but PHPC is pretty good.
>>
>>
> What a great term, PHPC. That probably describes the bulk of all "HPC"
> oriented computing being done today, if you consider all cores in use down
> to the lab/workbench level of clustering. Certainly for my userbase
> (bioinformatics) the computational part of a project is often a small
> subset of the total time spent on it, and time to total solution is the
> most important metric for them. It's rare for us to chase that last 10% or
> 20% of performance gain.
>
> <rant>This has been a great thread overall, but I think no one is
> considering the elephant in the room. Technical arguments are not winning
> out in any of these technologies: CI/CD, containers, "devops", etc. All
> these things are stacking on arbitrary layers of abstraction in an attempt
> to cover up for the underlying, really really crappy software development
> practices/models and resulting code. They aren't successful because they
> are *good*, they are successful because they are *popular*.
>
> As HPC admins, we tend to report to research-oriented groups. Not always,
> but more often than "normal" IT folks do, who are often insulated from
> negative user feedback by ticket systems, metrics, etc. Think about the
> difference in that reporting chain:
>
> A PI/researcher gets her next grant, tenured position, brilliant new
> post-doc, etc., based on her research. Approach them about expanding the
> sysadmin staff by 10x and they'll laugh you out of the room. Ask for
> an extra 100% budget to buy Vendor B storage rather than whitebox and
> they'll laugh you out of the room. They want as much raw
> computation/storage as cheaply as possible and would rather pay a grad
> student than a sysadmin to run it because a grad student is more likely to
> stumble over a publication and boost the PI's status. Sysadmins are dead
> weight in this world, only tolerated.
>
> A CIO or CTO gets his next job based on the headcount and budget under his
> control. There is no incentive to be efficient in anything he does. Of
> course, there is the *appearance* of efficiency to maintain, but the CIO
> 101 class's first lecture is on creative accounting and metrics. Pay more
> for Vendor B? Of course, they pay for golf and lunch, great people. Think
> about all those "migrate/outsource to the cloud" projects you've seen that
> were going to save so much money. More often than not, staff *expands* with
> "cloud engineers", extra training is required, sysadmin work gets
> inefficiently distributed to end users, err, I mean developers. Developers
> now need to fork into new FTEs who need training...and so it goes. More
> head count, more budget, more power: happy CIO. Time to apply to a larger
> institution/company, rinse and repeat.
>
> Think about it from the perspective of your favorite phone app, whatever
> it may be:
>  - app is released, wow this is useful!
>  - app is updated, wow this is still useful and does 2 more things
>  - app is updated, ummm..., it's still useful but these 4 new things
> really make what I need hard to get to
>  - app is updated, dammit, my feature has been split and replaced with 8
> new menus, none of which do what I want?!?!?
>
> No one goes to the yearly performance review and says "I removed X
> features, Y lines of code and simplified the interface down to just the
> useful functions, there's nothing else to be done" and gets a raise. People
> get raises for *adding* stuff, for *increasing* complexity. You can't tie
> your name to a simplification, but an addition goes on the CV quite nicely.
> It doesn't matter if in the end any benefit is dwarfed by the extra
> complexity and inefficiency.
>
> Ultimately I blame us, the sysadmins.
>
> We could have installed business oriented software and worked with schools
> of business, but we laughed at them because they didn't use MPI. Now we
> have the Hadoop and Spark abominations to deal with.
>
> We could have handed out a little sudo here and there to give people
> *measured* control, but we coveted root and drove them to a more expensive
> instance in the cloud where they could have full control.
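> A single sudoers entry would often have been enough. A hypothetical
> sketch - the group name, package manager and service are all made up:
>
>     # /etc/sudoers.d/labusers -- measured control: this lab group may
>     # install packages and restart its own service, nothing more.
>     %labusers ALL=(root) NOPASSWD: /usr/bin/yum install *, \
>         /usr/bin/systemctl restart myapp.service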
>
> We could have rounded out node images with a useful set of packages, but
> we prided ourselves on optimizing node images to the point that users had
> to pretty much rebuild the OS in $HOME to get anything to run, and so now:
> containers.
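> To be fair, the containers do solve that self-inflicted problem: the
> whole rebuild-the-OS-in-$HOME dance collapses into something like the
> following (the script name is illustrative):
>
>     # Run against a stock image pulled from Docker Hub instead of a
>     # hand-built userland under $HOME.
>     singularity exec docker://ubuntu:18.04 ./my_analysis.sh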
>
> We could have been in a position to say "hey, that's a stupid idea"
> (*cough* systemd *cough*) but we squandered our reputation on neckbeard
> BOFH pursuits and the enemies of simplicity stormed the gates.
>
> Disclaimer: I'm confessing here. I recognize I played a role in this so
> don't think I didn't throw the first stone at myself. Guilty as charged.
>
> Enjoy the technical arguments, but devops and cloud and containers and
> whatever next abstraction layers arise don't care. They have crept up on us
> under a fog of popularity and fanbois-ism and overwhelmed HPC with sheer
> numbers of "developers".  Not because any of it is better or more
> efficient, but because no one really cares about efficiency. They want to
> work and eat and if adding and supporting a half-dozen more layers of
> abstraction and APIs keeps the paychecks coming, no one is simplifying
> anything. I call it "devops masturbation". The fact that pretty much all of
> it could be replaced with a small shell script is irrelevant. devops needs
> CI/CD, containers, and cloud to justify existence, and they will not go
> quietly into that good night when offered a simpler, more efficient and
> cheaper solution which puts them out of a job. Best use of our time now may
> well be to 'rm -rf SLURM' and figure out how to install kubernetes. Console
> yourself with the realization that people are willing to happily pay more
> for less if the abstraction is appealing enough, and start counting the fat
> stacks of cash.
> </rant>
>
> griznog