[Beowulf] Bright Cluster Manager

John Hearns hearnsj at googlemail.com
Thu May 3 06:53:14 PDT 2018


I agree with Doug. The way forward is a lightweight OS with containers for
the applications.
I think we need to learn from the new kids on the block - the webscale
generation.
They did not go out and look at how massive supercomputer clusters are put
together.
No, they went out and built scale-out applications on public clouds.
We see 'applications designed to fail' and 'serverless'.

Yes, I KNOW that scale-out applications like these are web-type
applications, and all the application examples you
see are based on the load balancer/web server/database (or whatever style)
paradigm.

The art of this will be deploying the more tightly coupled applications
which HPC has,
which depend upon MPI communications over a reliable fabric, which depend
upon GPUs etc.

The other hat I will toss into the ring is separating parallel tasks which
require computation on several
servers and MPI communication between them from 'embarrassingly parallel'
operations which may run on many, many cores
but do not particularly need communication between them.
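
To make that distinction concrete, here is a minimal sketch in plain Python.
It is only an illustration under my own assumptions: the function names and
the toy workloads are made up, and a real tightly coupled code would use MPI
across nodes rather than a list in one process. The first function can be
farmed out to any number of cores with no communication. The second needs
its neighbours' values at every step, which is what forces a fast, reliable
fabric once the cells live on different servers.

```python
from concurrent.futures import ProcessPoolExecutor


def independent_task(seed: int) -> int:
    """Embarrassingly parallel: uses only its own input (hypothetical workload)."""
    return sum((seed * i) % 7 for i in range(1000))


def coupled_step(cells: list[float]) -> list[float]:
    """Tightly coupled: each cell needs both neighbours' values from the
    previous step. Spread over several servers, every step would need a
    neighbour exchange (the job MPI does). Periodic boundary, toy stencil.
    """
    n = len(cells)
    return [
        (cells[(i - 1) % n] + cells[i] + cells[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


if __name__ == "__main__":
    # Independent tasks: hand them to a pool of workers, no ordering needed.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(independent_task, range(8)))
    print(results)

    # Coupled steps: step k+1 cannot start until step k has finished everywhere.
    state = [0.0, 3.0, 0.0, 0.0]
    for _ in range(3):
        state = coupled_step(state)
    print(state)
```

The first kind of job is happy on cloud-style machines. The second kind is
the one that wants exclusive nodes and a proper interconnect.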

The best successes I have seen on clusters are where the heavy parallel
applications get exclusive compute nodes.
It is cleaner: you get all the memory and storage bandwidth, and it is easy
to clean up.
Hell, reboot the things after each job. You got an exclusive node.
I think many designs of HPC clusters still try to cater for all workloads
- 'Oh yes, we can run an MPI weather forecasting/ocean simulation,
but at the same time we have this really fast IO system and we can run your
Hadoop jobs.'

I wonder if we are going to see a fork in HPC, with the massively parallel
applications being deployed, as Doug says, on specialised
lightweight OSes which have dedicated high-speed, reliable fabrics, and with
containers.
You won't really be able to manage those systems like individual Linux
servers. Will you be able to ssh in for instance?
ssh assumes there is an ssh daemon running. Does a lightweight OS have ssh?
Authentication Services? The kitchen sink?

The less parallel applications, meanwhile, will be run more and more on
cloud-type installations, either on-premise clouds or public clouds.
I confound myself here, as I can't say what the actual difference between
those two types of machines is, as you always need
an interconnect fabric and storage, so why not have the same for both types
of tasks?
Maybe one further quip to stimulate some conversation. Silicon is cheap.
No, really it is.
Your friendly Intel salesman may wince when you say that. After all, those
lovely Xeon CPUs cost north of 1000 dollars each.
But again I throw in some talking points:

power and cooling cost the same as, if not more than, your purchase cost
over several years

are we exploiting all the capabilities of those Xeon CPUs?
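
For the first talking point, a back-of-envelope sketch of the arithmetic.
Every number here is an assumption picked for illustration, not a
measurement: a 150 W average draw per CPU, a five year lifetime, 0.15
dollars per kWh, and a PUE of 1.5 to account for cooling overhead. Plug in
your own figures.

```python
def lifetime_energy_cost(watts: float, years: float,
                         price_per_kwh: float, pue: float) -> float:
    """Electricity cost of running a device continuously, including cooling.

    pue (power usage effectiveness) scales the IT load up to total
    facility load, so pue=1.5 means 0.5 W of overhead per 1 W of compute.
    """
    hours = years * 365 * 24
    kwh = watts / 1000.0 * hours * pue
    return kwh * price_per_kwh


if __name__ == "__main__":
    # All four inputs are illustrative assumptions, see the text above.
    cost = lifetime_energy_cost(watts=150, years=5,
                                price_per_kwh=0.15, pue=1.5)
    print(f"power and cooling over 5 years: about {cost:.0f} dollars")
```

With those assumed inputs the answer comes out near 1500 dollars, which is
in the same range as the price of the CPU itself.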

On 3 May 2018 at 15:04, Douglas Eadline <deadline at eadline.org> wrote:

>
>
> Here is where I see it going
>
> 1. Compute nodes with a base minimal generic Linux OS
>    (with PR_SET_NO_NEW_PRIVS in kernel, added in 3.5)
>
> 2. A Scheduler (that supports containers)
>
> 3. Containers (Singularity mostly)
>
> All "provisioning" is moved to the container. There will be edge cases of
> course, but applications will be pulled down from
> a container repo and "just run"
>
> --
> Doug
>
>
> > I never used Bright.  Touched it and talked to a salesperson at a
> > conference but I wasn't impressed.
> >
> > Unpopular opinion: I don't see a point in using "cluster managers"
> > unless you have a very tiny cluster and zero Linux experience.  These
> > are just Linux boxes with a couple applications (e.g. Slurm) running on
> > them.  Nothing special. xcat/Warewulf/Scyld/Rocks just get in the way
> > more than they help IMO.  They are mostly crappy wrappers around free
> > software (e.g. ISC's dhcpd) anyway.  When they aren't it's proprietary
> > trash.
> >
> > I install CentOS nodes and use
> > Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and
> > software.  This also means I'm not stuck with "node images" and can
> > instead build everything as plain old text files (read: write SaltStack
> > states), update them at will, and push changes any time.  My "base
> > image" is CentOS and I need no "baby's first cluster" HPC software to
> > install/PXEboot it.  YMMV
> >
> >
> > Jeff White
> >
> > On 05/01/2018 01:57 PM, Robert Taylor wrote:
> >> Hi Beowulfers.
> >> Does anyone have any experience with Bright Cluster Manager?
> >> My boss has been looking into it, so I wanted to tap into the
> >> collective HPC consciousness and see
> >> what people think about it.
> >> It appears to do node management, monitoring, and provisioning, so we
> >> would still need a job scheduler like lsf, slurm,etc, as well. Is that
> >> correct?
> >>
> >> If you have experience with Bright, let me know. Feel free to contact
> >> me off list or on.
> >>
> >>
> >>
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >> To change your subscription (digest mode or unsubscribe) visit
> >> http://www.beowulf.org/mailman/listinfo/beowulf
>
>
> --
> Doug