[Beowulf] OS for 64 bit AMD

Robert G. Brown rgb at phy.duke.edu
Mon Apr 4 08:22:13 PDT 2005


On Sun, 3 Apr 2005, Joe Landman wrote:

> > FC is not a platform, Linux is.  I'd be most curious to hear the explanation
> > of how an app gets to be dependent on RHEL and will not work on other 
> > distributions which conform to the same API.  or are you claiming that 
> > there is no ABI?
> 
> <sigh>  What has this got to do with FC being production grade?  The ABI 
> for FC has shifted.  The ABI for RHEL-x has shifted, though in a defined 
> manner, and this ABI will remain constant for a 5 year interval after 
> RHEL-x release.  FC-x will shift when needed.  These shifts of FC ABI, 
> the functionality changes, the kernel changes that fundamentally alter 
> the way drivers work define the purpose of the environment... all these 
> contribute to the overall view of whether FC is production ready or
> not.  If you don't need commercial apps, or better still, your
> commercial apps are supported on "Linux" and not on "RHEL", then it 
> doesn't matter what the OS underlying it is.  More to the point, if the 
> OS does not break drivers with the upgrades, does not break major 
> functionality at each upgrade, then it is probably a production class 
> OS.  FC-x isn't that.   One can easily make the same argument about a
> certain OS from the northwest US (I always kick myself after an upgrade,
> as they introduce something new that almost, but not quite, works the 
> same as it did before, and usually manages to break compatibility with 
> other bits).

I've been following your discussion with great interest, as you both
make excellent points.  Let me add a few comments.

  a) Picky point, but an "upgrade" vs an "update" has always meant that
binary compatibility is broken for at least some things.  Sometimes
major things, like libc.  Sometimes minor things like libwhatever.  In
addition there tends to be fairly significant motion in application-land
and GUI-land.

  b) 90% of this discussion is occurring because vendors of commercial
linux software don't understand the concept of linux packaging and
participating in a dynamic process.  This is neither here nor there --
it doesn't get Joe off his hook -- but it is an apropos observation
because if companies that sold software (open source or not) actually
learned how to build for linux and participate in the various
distribution forums a lot of this problem would go away.

  c) There is always Centos -- logo-free RHEL, basically.  IIRC it
follows RHEL by what, a few hours?  I think it is perfectly reasonable
for companies that sell commercial software for linux to build it for a
commercial distribution like RHEL, and if you are in an industry where
you MUST have both somebody to call and a line of responsibility should
things fail in certain ways (e.g. banking, certain parts of the health
care industry, etc.) it is almost certainly both wise and legally
necessary to buy RH.  If you are setting up a research cluster or
departmental LAN at a University, though, and don't want to pay RH
(without getting into whether what you pay per node is "reasonable")
there is always Centos -- no direct phone support but all the stability,
and we never use any direct support anyway.  This is usually fair --
just because it is "enterprise linux" doesn't mean that it is bug free,
and Universities STILL are primary debugging entities for RH, SuSE,
Debian, FC, and everything else.

  d) I think that one place where you (Joe) and you (Mark) most
fundamentally disagree -- TECHNICALLY beta testing refers to a very
specific pre-release phase in a commercial software development cycle.
As in:

  Alpha:  Testing during active development by the development team to
ensure that the product "works".  There can be a first and a second
stage (the latter known as "black box" testing)

  Beta:  Sampled testing in the community where the software is to be
used, usually on a "pre-release" basis.

  Pilot:  Like a beta but usually after or in parallel with the beta
phase and intended to see if the product has commercial potential or if
it is still missing desirable features that would enhance its commercial
potential.

  Gamma:  This is a joke term for software that is released but is still
full of bugs so that the hapless public becomes de facto "beta testers"
after the formal beta phase.  Every major release of Windows has, for a
while after its appearance, very much looked like a gamma release.

(From one of several sources, e.g.

http://en.wikipedia.org/wiki/Beta_testing#Alpha_testing

)

Note that this cycle formally refers to commercial code development,
where there is a well-defined "team" and an organization capable of
supporting the various levels of testing, feedback, and ultimately
commercial exploitation.  

One "feature" of open source software is that in a very deep sense it is
all, always, beta/gamma software and all users are, in a very deep
sense, beta/gamma testers.  This is true in the commercial software
world as well; it's just that THEY don't always acknowledge it and are
sometimes about as responsive as molasses when you call them and point
out that their platform doesn't work on this graphics adapter or crashes
randomly every time a certain subtask is initiated, hence the "gamma"
joke.  Other companies (like pathscale) are very responsive and really
"get" points b) and d) and support things as broadly as possible in
order to keep their market as broad as possible.  As a consumer, I'm a
lot more likely to buy pathscale's compilers if I don't "have" to buy
RHEL for my entire enterprise first.  The same is true for all sorts of
WinXX software -- I might well buy some of it for my linux boxes if I
didn't have to buy WinXX (and all that implies) first.

To be very clear, FC is not a "beta" linux distribution any more than RH
itself, SuSE, Debian, or Mandrake are beta distributions, but since ALL
linux distributions are composed of hundreds of packages in varying
states of ongoing development, ALL linux distributions involve feature
changes and bugs that are revealed as they are implemented in ever
richer and more complex environments.  Just like the rest of the
software universe.  The primary difference is that linux FIXES those
bugs VERY RAPIDLY so that >>any<< linux distribution with an update
mechanism such as yum rapidly becomes stable in production, just as ALL
of them are somewhat unstable and break things with new features when
they are first released.
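That update mechanism is trivial to automate.  A minimal sketch,
assuming a 2005-era Fedora/Red Hat box with yum installed (the log
path is my own choice, not a standard one):

```shell
#!/bin/sh
# Hypothetical /etc/cron.daily/yum-update: pull down and apply all
# pending package updates once a day, with no human in the loop.
# -y auto-answers "yes"; output is appended to a local log file.
/usr/bin/yum -y update >> /var/log/yum-nightly.log 2>&1
```

Dropped into /etc/cron.daily/ and made executable, this is essentially
all the "further human effort" required to keep a box tracking its
distribution's bug fixes.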

So what you are arguing about is a degree, not an absolute.  It is silly
to call Red Hat "stable" and Fedora "unstable", just as it is silly to
argue that there aren't real differences in their rates of change and
the longevity of their support cycles.  As a consumer, you can choose a
level of both that suits your needs.

  e) On this same line, let's be very careful to differentiate between
the terms "commercial distribution", "stable", and "rapidly changing".
Commercial linux distributions such as Red Hat are neither necessarily
stable nor slowly varying, though much of the discussion so far has
implied that they are.  To use RH as a historical example, the 7.3->8->9
sequence is one of the most striking in recent years.  Major libc
changes, lots of stuff broken, very rapid release cycle, lots of pissed
off humans.  Hence commercial release cycles can be rapid or not,
depending on what reasons there are either way.  Note also that just
because RH promises to support stuff for N years doesn't mean that their
consumers will actually be satisfied with that cycle, or that other
(commercial) linux distros will pick the same cycle.  In fact, the
REASON to get long-running support for a particular release is because
things are NOT either stable or bug free -- bugs appear in libraries for
years after a release first comes out, some of them serious or
security-related.  At some point, though, the world needs to just move
on.

  f) To amplify, "slowly varying" is very much an ambivalent "advantage"
for an operating system and distribution.  It is an open invitation to
stagnation and laziness on the part of commercial developers and
administrators alike.  How often on this list do we hear of people who
are STILL running RH 7.3 based clusters and wonder aloud at the lack of
driver support or the ability to run on Opterons?  What's the knee-jerk
response?  Upgrade to something modern.  Things improve, they get
better, more secure, faster, more powerful.  So what one is really
arguing about (I hope) isn't that we should all be running RH 5.2 just
because there exist vendors somewhere who never bothered to port their
application(s) to 6.x, 7.x, 8, 9, RHEL (or any of the other distros).
You laugh, but I've corresponded with people repeatedly over the years
who are locked into one or another of those numbers by some silly
application.  These individuals are to be viewed with sympathy and
unwilling tolerance, not praised for helping to hold the world back...



To conclude, once one separates commercial, stable, rapidly/slowly
varying from some deep correlation as in "Debian is not commercial and
hence is both unstable and not rapidly varying enough", "Fedora is Red
Hat's Beta testing distribution and horribly unstable as it changes too
rapidly", or "Red Hat is commercial and hence is stable and each release
will still be around, supported, when my youngest kid goes off to
college" THEN one can assess a particular situation instead of throwing
sweeping generalizations out there that are obviously false for some
important set of potential applications.

  * Some Fedora releases have changed things enough to break some
customers' systems.  Fine, so have some Red Hat releases, some Mandrake
releases, some Windows releases.  It is too early to determine if this
is a trend, and in any event if RH >>does<< adopt things piloted in
Fedora -- ever -- it is just a matter of time before those same things
"break" their clients' systems.  The evolution and development of new
features and tools is as much a reason FOR using Fedora as it is AGAINST
it, in most environments but with some clear exceptions.

  * There is a considerable "cost" to a major upgrade in any
organization (and for any distribution, commercial or non-commercial).
Things have to be rebuilt, that final gamma stage of testing occurs,
deep bugs and incompatibilities are revealed, you have to deal with the
commercial package problem.  This cost is balanced against the cost of
doing nothing and staying with a single snapshot of a single
distribution forever.  Millions of WinXX users can attest, with every
bug, crash, or virus, to the sterility of choosing to do nothing (or
having no real choice but to do nothing).  Most sites choose a sane
middle ground
here that is comfortable for their level of expertise, application set,
and other resources.  In our case we use FC on most desktops as users
LIKE getting new desktop features and FC (under active development)
tends to stabilize very rapidly so that early teething problems are
rapidly resolved and yum-updated across an organization without further
human effort.  We avoid the 6 month cycle by only upgrading the
Linux@Duke distribution every other release, which gives us a lot of
time to get used to the new features.  We do use Centos on
fault-intolerant servers, and have it available for people with a
commercial "requirement" for RH's libraries.  We ALSO have SuSE
available (for a price) or RHEL itself (also for a price) for people who
want either the support or who have particular commercial packages with
library dependencies.  Not everything is built for RHEL.
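Whether a given binary's library dependencies are actually satisfied on
a given distribution is easy to check directly.  A sketch, using
/bin/ls as a stand-in for a hypothetical vendor binary (substitute the
real application):

```shell
#!/bin/sh
# Check whether a binary's shared-library (ABI) dependencies are
# satisfied on the running distribution.
APP=/bin/ls   # stand-in; point this at the vendor's binary

# Any "not found" line from ldd means a required library is missing.
if ldd "$APP" | grep -q "not found"; then
    echo "$APP: missing shared libraries"
else
    echo "$APP: all shared libraries resolved"
fi

# List the glibc symbol versions the binary requires; a distribution
# whose libc predates these versions cannot run it.
objdump -T "$APP" | grep -o 'GLIBC_[0-9.]*' | sort -u
```

The objdump check is often the decisive one: it tells you whether an
app built against a newer libc can possibly run on an older release.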

  * This latter mix clearly indicates that THERE IS NO "RIGHT" ANSWER to
this debate.  FC is clearly perfectly acceptable as an operating system
for a cluster, a LAN of workstations, my laptop ;-), a server farm.
RHEL or Centos are also perfectly acceptable.  At the cluster level,
they are nearly indistinguishably acceptable; for a LAN server they
might have a small edge not because FC cannot be made stable enough for
servers but because it is a PITA to upgrade servers and so it is a good
place to stick with a release as long as it is supported and security
patched and has a functional span of the (usually small) set of e.g.
server programs such as nfsd or httpd required by the server and
applications.  And Debian is equally reasonable, as are Scientific
Linux, Caosity, etc.

  * So it would be great if arguments in absolute terms were softened
just a bit.  There are some circumstances where it makes sense to use
RHEL or Centos or SuSE.  The burning need to use a software package that
will only run on one or the other of them is a great reason to use it.
In other circumstances it makes more sense to use FC*, or Debian, or
Scientific Linux, because it is free, >>BECAUSE<< it is rapidly evolving
and you need to use one of its rapidly evolving features, because you
like using distributions under rapid development so bugs are fixed
quickly and community scrutiny is strong.  It is no more fair to say
"Fedora Core sucks" and that it should never be installed on a cluster
(not true; I use it all the time, and it works great) than it is to say
"Fedora Core is perfect" and that there are never good reasons to use
RHEL or Centos.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




