AMD [IBM] press release

Bob Drzyzgula bob at
Wed Nov 20 15:35:18 PST 2002

On Wed, Nov 20, 2002 at 05:13:00PM -0500, Robert G. Brown wrote:
> I like to build my own systems as well, and tend to spend MORE time on
> systems built by vendors, even very friendly and cooperative local
> vendors, than I do on systems I build myself, although it has taken me a
> year plus of buying lots of systems and letting the vendor build them
> (and then having to mess with them later) to really figure that out.
> Unless you have a linux-expert vendor that can do EXACTLY what you have
> to do to install the system for you, you end up messing with it more
> getting it to where you can install it from an often unknown and
> slightly broken initial state OR communicating with the vendor about
> what they did wrong than it takes you to just do it, at least with
> modest numbers of systems.

What you said. In addition, in a situation where you are
relying on a local vendor to implement your own specifications,
you raise the question of responsibility for problems.
If the vendor designs and builds the system, misconfiguration
is clearly the vendor's fault. But if you told the vendor
exactly what to do, who is responsible? The last thing you
want is to get into a finger-pointing situation with your
vendor. If you handle it all yourself, there's no question --
it's your problem from top to bottom. That might be scary,
but it can be less expensive and time consuming to simply
accept that responsibility than to spend weeks bickering
over who is going to do what. Been there, done that. Of course,
you'd better have an understanding employer before you go it alone.

> With that said, there are some components and circumstances where Dell's
> fancy hardware makes sense -- department LAN servers, for example, where
> one can minimize the consumption of the scarcest of our resources --
> primary sysadmin time -- by buying the highest quality servers, keeping
> them under the expensive same-day service contracts, and upgrading them
> pretty steadily as they age out.  This is 2x or more expensive at the
> hardware side, but can prevent expensive downtime on a major shared
> resource.

In this situation, I will throw in a bit of a pitch for
Intel's integrator platforms. They are very well designed,
obsessively documented, and reasonably priced. And where
I work, for most systems, "same-day" service is simply
not an option. When a system goes down, someone needs
to be working on it within minutes. For years we carried
on-site service contracts, but we found that in almost
all situations we just wound up handing the repairman the
dead part and telling him the serial number of the system
we pulled it from. Maintenance contractors make extremely
expensive spare-parts couriers.

Maintenance contracts are like health insurance -- an
ill-defined amalgam of pre-paid services and outsourced
risk. It really pays to take a hard look at what you are
getting out of it.

> I think many people confuse HA with HPC.  HA often demands brand name
> stuff and all the expensive service deals, because of the
> nonlinear/magnified costs associated with downtime.  HPC is USUALLY
> fairly insensitive to downtime of single nodes -- it costs you a 1/N
> fraction of the total resource, it might cost you one chunk of work at
> whatever you've established your checkpoint/task granularity scaled with
> N, but it rarely costs you an extended N-scaled loss of resource.

Sure enough.

> Besides, as Bob points out so ably, a sensible buying/building pattern
> can actually significantly reduce downtime even compared to the best of
> service contracts.

Too many organizations, I think, approach computing
purchases one at a time. It really does pay to plan,
if your workload allows it.


More information about the Beowulf mailing list