[Beowulf] OS for 64 bit AMD

Joe Landman landman at scalableinformatics.com
Sun Apr 3 15:12:47 PDT 2005



Mark Hahn wrote:
>>>fully usable in a production environment.
>>I disagree with this, rather strongly.  The Fedora series has had a 
>>number of surprises for admins, for driver makers, for users, and so 
>>forth.  SE-Linux, 4k-stacks, glibc changes, etc.  All of these wound up 
>>in the supported release (e.g. the one for production environments). 
>>Sure you can use it on your systems.  Of course you can.  If something 
>>breaks on some commercial code that you might run, are you SOL?  If you 
>>don't run any commercial code, and have no liability issues associated 
>>with using supported platforms, this is a moot point.
> 
> you seem to be conflating "changelessness" with productionworthiness
> (or even "stability").

Uh... no.  The changes were introduced quite quickly with little 
preparation.  Given the focus of Fedora, this makes perfect sense.  For 
a production class system, you do not make changes quickly (generalization).

> if you have a single-purpose cluster dedicated to some specific package,
> then by all means, lock it to whatever release/config/color the 
> package's vendor likes the best.

Most of our customers' clusters are devoted to 3-6 packages, with some 
subset being larger numbers.  If your system is used for in-house codes 
with no need of guaranteed feature sets (including specific levels of 
libraries, supporting packages, etc), by all means, use the distro or 
packaging of your choice.  If you have a dependency of any sort upon a 
package that you do not have source to, you have an effective constraint 
on your freedom.  Most of our customers are using one or more commercial 
codes to which they have no source code.

> but don't pretend that change across releases means that something 
> is somehow not production-worthy, or that its defensible for an app
> to depend on the distro, rather than the actual platform (ABI).

<sigh>  Apps do depend on distros if you want support from the 
commercial vendor, or if you need to defend your results in a legal 
forum.   The latter is rarely an issue for academic focused machines, 
and is very much an issue for industrial research and development folks. 
Don't pretend that because it may not apply to you, it doesn't apply 
to anyone.

The rate of change of the distro, the focus of the distro, in that it is 
a moving target, specifically indicated by the folks who make it, render 
it an experimental platform (paraphrase of their words).  Experimental 
platforms do not a production system make.  You can go argue the point 
with Redhat if you like, they freely admit that it is an experimental 
platform.  This system is designed to be the platform where Redhat tests 
things (e.g. proving ground).  Test systems are not production systems.

>>>only means that FC is on a shorter release cycle, and might contain
>>>the new puce-and-teal color scheme, which turns out to be a bad idea.
>>On the contrary, I don't think SE-Linux is "puce-and-teal color scheme".
>>Nor are 4k stacks (that broke many many drivers).  Yes, FC introduced 
> 
> they were all trivially disable-able.  also, what commercial applications
> depend on the size of the kernel stack?

You made the insinuation that the only real release to release changes 
were "puce-and-teal color scheme", which I pointed out to be obviously 
false.  If you did not mean to insinuate it, maybe you can indicate what 
you perceive the maximal-impact release-to-release changes to be.

As for code that depends upon the size of the kernel stack, read the 
various forums on the drivers.  Short version of this is that there were 
quite a few broken drivers as a result of this (is a driver not 
important in your view?  It is to a commercial entity).  The ones that 
affected me directly were the Linuxant and nVidia drivers.  Before you 
go off and bang on their non-open source nature, remember that they are 
applications people will use, and before you go deploy that nice 
workstation sporting the nVidia FX3000 unit for visualization using 
ProStar or other engineering codes, you really need the display driver 
to work.   I have had a few customers that have insisted upon running 
FC-x with their nice graphics cards to do their visualization work. 
Were they ever surprised.  Made lots of frantic calls to us to help them 
resolve this.

Here is a simple definition that I think will help frame the discussion 
properly.  A production class OS should have very few surprises, and 
support for the surprises that arise.  Is FC-x production class?

>>those.  No, it was a significant shock when stuff stopped working.  Is 
>>that really production ready?  (e.g. thorough testing and bug fixes so 
>>that there will be no surprises)
> 
> all you're saying, again and again, is that "production-worthy" to you
> means that the machine is configured exactly as your single app-vendor 
> wants it.  with this logic, nothing can ever change.  actually, this 
> approach is much of the reason that windows sucks so much.

<sigh> Wrong.  Production worthy means what I indicated above, though I am 
quite sure other reasonable definitions are possible or even more 
accepted.  Whether you like this or not (and I know I do not like it), 
most commercial application vendors qualify their programs on very few 
linux distributions.  Most folks in the commercial software world have 
been burned in the past by "compatibility" and "ABI"s that were supposed 
to work.  If they are going to be held accountable for the quality (or 
lack thereof), they are going to try it.  Each additional distribution 
adds costs/time (ask Greg, he just indicated as much in his note on 
PathScale compiler platform support).  Each additional distribution adds 
complexity, as LAM 7.0.x may not work with 7.1.x (remember the MPI ABI 
discussion?  I sure as heck would like this, so I don't need to have 6-7 
different MPI implementations on each cluster), or return slightly 
different results to their function calls ...

> 
>>  Bottom line is (apart from Greg's company) I know of very few 
>>commercial software vendors targetting FC-x as a supported platform.  As 
> 
> this begs the question of whether commercial apps depend on behavior
> or configuration which is not standard on the platform.  in the compiler
> world, for instance, dependence on undefined behavior is a bug.

Commercial app vendors tend to aim for the most widely accepted 
platforms, and build for these.  So if these platforms have oddities, or 
bad libraries/compilers (gcc 2.96), this is going to be carried over 
into the application.  If they really require some special feature of a 
new library (LSTC with LAM, etc), then they will likely build their own 
and distribute it.  That actually helps, in that if the app has 
dependencies that it cannot count on the distro providing, then it 
should carry those dependencies forward on its own... though this leads 
quickly to 7+ MPI implementations on the cluster.

> FC is not a platform, Linux is.  I'd be most curious to hear the explanation
> of how an app gets to be dependent on RHEL and will not work on other 
> distributions which conform to the same API.  or are you claiming that 
> there is no ABI?

<sigh>  What has this got to do with FC being production grade?  The ABI 
for FC has shifted.  The ABI for RHEL-x has shifted, though in a defined 
manner, and this ABI will remain constant for a 5 year interval after 
RHEL-x release.  FC-x will shift when needed.  These shifts of FC ABI, 
the functionality changes, the kernel changes that fundamentally alter 
the way drivers work, the defined purpose of the environment... all these 
contribute to the overall view of whether FC is production ready or 
not.  If you don't need commercial apps, or better still, your 
commercial apps are supported on "Linux" and not on "RHEL", then it 
doesn't matter what the OS underlying it is.  More to the point, if the 
OS does not break drivers with the upgrades, does not break major 
functionality at each upgrade, then it is probably a production class 
OS.  FC-x isn't that.  One can easily make the same argument about a 
certain OS from the northwest US (I always kick myself after an upgrade, 
as they introduce something new that almost, but not quite, works the 
same as it did before, and usually manages to break compatibility with 
other bits).




-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615


