[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?

Jan Heichler jan.heichler at gmx.net
Mon Apr 6 11:55:40 PDT 2009


Hello Rahul,

On Monday, 6 April 2009, you wrote:

RN> Just making the point that CentOS (IMHO) is as good or bad a choice
RN> and I wish the vendors would let me peacefully live with it! :) To
RN> each his own distro (within reasonable bounds).

Well - the point is that a vendor can't debug every distro. You are
using CentOS and have a reasonable point in saying "it is the same as
RedHat which (according to Dell) should work". But the next guy uses
Debian, Gentoo, Slackware, whatever.

And it has happened more than once that a defective driver or
something similar caused trouble that looked like a hardware problem.
So the vendors have to protect themselves. Smaller vendors are
normally more flexible when it comes to support; big organisations
are not. Their support staff are paid by the number of cases they
solve, so they eliminate everything that is not "their problem".

RN> A minor point (very subjective, of course) that makes me prefer one
RN> distro over another is the user-base size. How big is the usage of
RN> ScientificLinux vs CentOS vs RHEL vs ComputeNodeLinux? I tried to get
RN> figures but I don't have good numbers.

I think you are right - and you can count Scientific/CentOS/RedHat as
a single user base.

RN>  {note I left Fedora out since
RN> many users reported that it may not be the most stable one and typical
RN> Fedora users might be of a different profile than what matches a
RN> Beowulf application}

Most importantly, Fedora is updated far too often. After 12 to 18
months you don't get patches/updates for a release anymore. So if you
don't want to reinstall your cluster every once in a while, you should
use CentOS/ScientificLinux/etc.


Jan



