[Beowulf] Gentoo in the HPC environment

Wed Jun 25 08:30:52 PDT 2014

On 06/25/2014 09:51 AM, Gavin W. Burris wrote:
> Hi, Jonathan.
>
> I would make a strong argument against Gentoo.  I would recommend that
> you choose Red Hat or its binary-compatible derivatives, like CentOS.
>
> At the end of the day, you need to ask yourself what you are trying to
> accomplish.  In an HPC environment, that should be a customer-facing
> service that is reliable and well supported.  You do not want to be
> constantly patching, compiling or changing APIs out from underneath of
> research code.  Enterprise Linux provides a stable base for HPC
> application development with a very solid lifecycle.
> https://access.redhat.com/site/support/policy/updates/errata/

Unfortunately, the reality of the HPC code market is that the OS an
application requires for support is often at odds with what you
describe above.  More often than not, commercial and closed source
applications are built and qualified (for support and guarantee of
functionality) against several very specific OS and library versions.
It is rare, in my experience, that any of these are up-to-date
versions of Red Hat or Red Hat derived distributions.

This is not to say that one should or should not go with another
distribution; there are valid engineering and support choices either way.

That said, given the often incompatible, and quite often unsolvable,
problems of providing a common supported platform for everything to run
on, one unsupported platform is as good as another, with the caveat
that one needs to pay attention to ease of management as well as other
things.

This is why stateless machines, booting an instance with a particular OS
for a particular job, are a *far* more reasonable and workable approach
than laying down one OS and demanding that applications conform to it.
There is a bit more freedom when you have source, and can rebuild, but 
for commercial and closed source apps, if you want support from the 
vendors, you need to adhere to their requirements.

Which means one of a few options:

1) Stateless, as I had mentioned:  an OS on NFS, iSCSI, or the like,
PXE booted on bare metal.  This is the best of the lot, as it becomes
very easy to create a dedicated stateless load for a particular
application, providing everything the application needs to run at bare
metal speed, without requiring a hard lock of a system to an OS.

2) VM based:  Just like stateless, but running as a VM atop a JEOS.
This is the cloud model.  You get almost bare metal speed, at the cost
of losing access to things like IB and other fast PCIe cut-through
technologies.  But this is what Amazon et al. provide.  You can build
this yourself with OpenStack.  It works great for less latency
sensitive apps, and you can have each VM be whatever the application
needs to run supported.

3) Docker/Container based:  Sort of a cross between stateless and VM 
based, it provides direct hardware access, and you can set up 
effectively independent and completely supportable containers to run on 
each system, independent of the OS requirements of the job.  See 
http://www.docker.com/whatisdocker/
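
As a rough illustration of option 1, here is a minimal Python sketch that
renders a PXELINUX config with one NFS-root entry per application load.
The labels, image paths, export paths, and server address are all
hypothetical, and it assumes each initrd is built with NFS root support;
treat it as a sketch of the idea, not a description of any particular
site's setup.

```python
# Hypothetical per-application stateless loads:
# label -> (kernel path, initrd path, NFS export holding the root fs)
LOADS = {
    "vendor-app-a": ("images/el5/vmlinuz", "images/el5/initrd.img",
                     "/exports/el5-app-a"),
    "in-house-solver": ("images/deb7/vmlinuz", "images/deb7/initrd.img",
                        "/exports/deb7-solver"),
}


def pxelinux_entry(label, kernel, initrd, nfs_server, nfs_root):
    """Render one PXELINUX entry that boots a node with an NFS root."""
    # root=/dev/nfs, nfsroot= and ip=dhcp are standard kernel parameters
    # for mounting the root filesystem over NFS.
    append = (
        f"initrd={initrd} root=/dev/nfs "
        f"nfsroot={nfs_server}:{nfs_root} ip=dhcp ro"
    )
    return "\n".join([
        f"LABEL {label}",
        f"  KERNEL {kernel}",
        f"  APPEND {append}",
    ])


def pxelinux_config(default, nfs_server, loads=LOADS):
    """Render a full config; `default` picks the load the node boots."""
    if default not in loads:
        raise KeyError(f"no stateless load defined for {default!r}")
    lines = [f"DEFAULT {default}"]
    for label, (kernel, initrd, nfs_root) in sorted(loads.items()):
        lines.append(pxelinux_entry(label, kernel, initrd,
                                    nfs_server, nfs_root))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 10.0.0.1 is a placeholder for the boot/NFS server address.
    print(pxelinux_config("vendor-app-a", "10.0.0.1"))
```

Switching a node to a different application then amounts to changing
which entry is the default and rebooting; nothing is installed on the
node itself.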

> Being part of a larger community, running the same builds, has its
> advantages.  You won't be the only person encountering a weird stability
> or performance bug.  You also get vendor hardware support, which is
> huge.

You won't get vendor support for CentOS.  And CentOS cannot be "shipped" 
by a commercial entity anymore in a for-profit manner, for any reason.
See  http://www.centos.org/legal/trademarks/#unacceptable-uses .  They 
aimed this (obviously) at Oracle and OEL, but it presents, shall we say, 
some interesting collateral damage.

If you want a Red Hat based distro in a commercial sense, you are 
currently limited to, not surprisingly, Red Hat.  Sure, you might use 
https://www.scientificlinux.org/ which is a rebuild + added bits - 
copyrighted bits.

We switched our systems to Debian after we saw this.  We've been quite 
unhappy with some of the horrible broken-ness we've seen in the init 
system with {Red Hat|CentOS} 6.x for a while, and that legal change plus 
some nasty unfixable dracut stuff pushed us to better vistas.

> I know the counter arguments here.  There is always going to be the
> coder that wants Ubuntu and this-month's release of $LANGUAGE, like on
> their vagrant box.  I have found the Software Collections to be a

Err ... no.  The center of mass of the market has moved on to the faster 
changing distributions.  We opted for Debian over Ubuntu due to 
silliness in the Ubuntu kernel bits that made adding our patches hard. 
Much easier with a sane system.  It's very ... very ... hard to fix all
the breakage when we make changes to CentOS/Red Hat.  You might say 
"don't change", but since part of our value is inherent in the changes, 
well ...

As I've been saying for more than a decade, the application OS 
requirements are a detail of the job.  Tools like kvm and docker let us 
get away from having a massive impedance mismatch between application 
requirements and node software environment requirements.
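
To make "the OS is a detail of the job" concrete, here is a minimal
Python sketch that looks up the container image a job needs and builds
the corresponding `docker run` command line.  The job names, the
job-to-image table, and the /scratch working directory are hypothetical;
a real scheduler integration would look different.

```python
import shlex

# Hypothetical mapping from job name to the OS image its vendor supports.
JOB_IMAGES = {
    "vendor-app-a": "centos:6",
    "in-house-solver": "debian:7",
}


def docker_argv(job_name, command, workdir="/scratch", images=JOB_IMAGES):
    """Build the argv that runs `command` in the job's required OS image."""
    image = images[job_name]  # KeyError if the job has no known image
    return [
        "docker", "run", "--rm",       # remove the container on exit
        "-v", f"{workdir}:{workdir}",  # bind-mount the host work area
        "-w", workdir,                 # start in that directory
        image,
    ] + list(command)


if __name__ == "__main__":
    argv = docker_argv("vendor-app-a", ["./run.sh", "--np", "16"])
    # Print rather than execute, so the sketch runs without Docker.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The node's own distribution never appears in this lookup; only the job's
requirement does.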

Having fought the supported OS battles for decades (jeez ... since the 
80s!), and having the scars to prove it, I personally prefer the 
simpler/better/lower friction route.  No more square pegs in round holes.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
twtr : @scalableinfo
phone: +1 734 786 8423 x121
cell : +1 734 612 4615