COTS was Re: [Beowulf] 96 Processors Under Your Desktop

Robert G. Brown rgb at phy.duke.edu
Thu Sep 2 07:35:38 PDT 2004


On Wed, 1 Sep 2004, Jim Lux wrote:

> Interesting point..
> At what point does "turnkey" turn into COTS?
> Maybe if COTS were really Consumer Commercial Off The Shelf or Consumer Off
> The Shelf?
> I would think that the intent of COTS is a sort of non-customized,
> non-unique, catalog item.
> 
> But RGB makes a valid point that one can buy a complete turnkey cluster,
> software all installed, etc.   However, these are not really COTS, in that,
> I doubt any of the vendors has a warehouse full of them all sitting on the
> shelf ready to be shipped.

Nowadays, a lot of vendors (e.g. Dell) don't have a warehouse full of
>>PCs<< sitting on the shelf ready to be shipped.  Just in time,
semi-custom assembly has become a commodity itself; even vanilla PC
vendors often have a web configurator.  But this is all really
nit-picking about whether there needs to be a real shelf, and a
commodity market with brokers and everything, for something to be COTS
as it stands in the original beowulf definition.  Obviously there doesn't.

Let's instead go with the fairly clear purpose for including COTS in the
original definition of beowulf.  The idea was, and still is, to exploit
the fact that computers built out of components that are sold by the
tens and hundreds of millions of units, ideally components that are
available from several competing manufacturers, have profit margins
determined by the scale benefits of large scale manufacturing AND
suppressed from above by competition in the marketplace.  This was in
direct contrast to the big iron supercomputers of the day, which were
generally hand engineered and used many parts that were custom
engineered for just the one system and manufactured in limited runs at
consequent high cost.

Even six or seven years ago there was nothing contradictory about
building and selling a turnkey beowulf.  Rackmount or not, they were
built out of readily available COTS PARTS (not necessarily readily
available COTS SYSTEMS, as cluster people have nearly always
microspecified the configurations of their nodes in such a way that you
would be very unlikely to find them on the actual shelf of an actual
store).  We add memory, alter disk, select a faster CPU, dump the high
end video card, add a better NIC or even a cluster-specific custom NIC.

So I'd have to disagree that beowulf cluster nodes have EVER been
"catalog items" in practice.  They have ALWAYS been more or less custom
assembled according to specification, but they have been built out of
COTS >>parts<<.  No fair using a fancy motherboard with exotic
communications or memory pathways of use only to cluster builders.  No
fair doing a custom ASIC and designing your own motherboard or card just
for your one cluster.  Just an off the shelf disk (however nice or cheap
a shelf), an off the shelf motherboard (sold by the hundreds of
thousands in nice identical boxes) equipped with memory and network etc
ditto, and packed up in a standard case.

Now, in recent years, the COTS concept has been bent a little in both
beowulf and generic cluster engineering in two respects.  One is the
case.  Custom cases have become commonplace among vendors selling
turnkey clusters or selling vanilla "cluster nodes" (as always, built to
your specification within reason).  This is partly because server class
motherboards run hotter than 1U packaging permits.  I'd argue that
they are still, barely, COTS in the sense that matters because there are
lots of competing manufacturers, lots of units sold, and multiple
markets that use them (HPC clusters and server clusters, very different
markets at that).  

The other is the network, where as we know the cluster market HAS
sustained the development of a handful of "speciality" networks, e.g.
myrinet, sci.  These "cluster networks" are really the one place where I
see the definition of COTS bent to the breaking point (if one wants to
be picky -- I personally don't think there is any reason to be religious
about it especially for this particular component).  Yes, the network
interfaces have to plug into a standard bus.  Yes, they are "mass
marketed" (to all the cluster builders in the universe).  Yes, there is
even competition -- between the few, totally incompatible and
proprietary alternatives.

Where gigabit ethernet is clearly COTS, gigabit myrinet is clearly not.
Yet who amongst us would argue against implementing the latter (or its
more recent faster descendants and cousins) in a cluster design that
required it?  Not me, that's for sure.  One day this may change.
Perhaps a new network will emerge that is used (as is ethernet today) in
a wide range of systems and not just clusters, that has high bandwidth
and low latency, and that is built to "open" standards at least in the
sense that (like ethernet) anybody can pay a modest fee and design an
interoperable interface on the basis of published specifications.

In the meantime, COTS is an important element of beowulf or cluster
design, and it is the overwhelming cost-benefit of COTS vs non-COTS that
has caused the explosive growth in the number of cluster nodes in recent
years.  But it shouldn't be carried to a fault -- if non-COTS components
end up being part of the most cost-effective solution to a particular
problem, obviously a sane buyer will use them.
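
As a rough illustration of that cost-benefit arithmetic, here is a minimal
sketch in Python.  Every number in it (budget, node price, NIC prices,
parallel efficiencies) is a made-up placeholder, not a measurement or a
vendor quote, and the function name is hypothetical.  It only shows the
shape of the tradeoff: for a fixed budget a pricier network buys fewer
nodes, and whether it wins depends on how much parallel efficiency the
application gains from it.

```python
def delivered_work(budget, node_cost, nic_cost, efficiency):
    """Return (nodes, effective_nodes) for a fixed budget.

    effective_nodes = nodes * efficiency, where efficiency in (0, 1] is the
    ASSUMED parallel efficiency of the application on that network.
    """
    nodes = int(budget // (node_cost + nic_cost))
    return nodes, nodes * efficiency


# All figures below are hypothetical placeholders.
BUDGET = 100_000
NODE_COST = 2_000
COTS_NIC = 50        # commodity ethernet NIC
CLUSTER_NIC = 1_000  # non-COTS low latency NIC

# A communication-bound job: the fast network is assumed to help a lot.
cots_tight = delivered_work(BUDGET, NODE_COST, COTS_NIC, efficiency=0.40)
fast_tight = delivered_work(BUDGET, NODE_COST, CLUSTER_NIC, efficiency=0.90)

# An embarrassingly parallel job: the network is assumed not to matter.
cots_ep = delivered_work(BUDGET, NODE_COST, COTS_NIC, efficiency=1.0)
fast_ep = delivered_work(BUDGET, NODE_COST, CLUSTER_NIC, efficiency=1.0)

print("communication-bound:", cots_tight, fast_tight)
print("embarrassingly parallel:", cots_ep, fast_ep)
```

With these placeholder inputs the non-COTS network comes out ahead for the
communication-bound job and behind for the embarrassingly parallel one,
which is the whole point: the answer depends on the problem, not on the
label.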

Now, regarding turnkey clusters -- what the cluster buyer gets from the
deal is BOTH a pile of COTS hardware (possibly "corrupted" with a
COTS-grey custom 1U case and openly non-COTS network) AND the human
expertise required to install it to the requirements of the customer.
This latter step has nothing whatsoever to do with the beowulf or
general cluster definition.  Most cluster users hire somebody to install
their cluster these days, I'll bet.  Folks like myself who both build
the cluster and use it are an archaic holdover from the early days of
the beowulf list, although I have no doubt that the list itself
overrepresents this part of the cluster-operating population for obvious
reasons.  At Duke, MOST of the clusters on campus are built and
maintained by systems people and actually used by research faculty that
never touch a screwdriver or wire and that never have root privileges on
a node.

With that, how could it matter who does the assembly and installation?
It is "hired out" (relative to the actual user of the hardware) either
way.  Obviously one should (again) go with the most cost-effective
solution.  In some cases this will be a turnkey solution, for example if
a cluster is needed by a group with little local expertise or little
low-opportunity-cost sysadmin labor available.  In others, it will be a
locally engineered, installed, maintained cluster, typically where there
is a lot of local expertise or a surplus of low-opportunity-cost sysadmin
labor so that economy of scale can be exploited.  Universities and
certain government labs fit the latter pattern; corporations and other
government labs more often fit the former.  Both are using COTS clusters
and seeking to maximize utility and minimize cost.

> It would also be interesting to know how many of those turnkey clusters are
> being delivered to total cluster newbies who will use them with minimal
> cluster specific training (obviously, they need to know where the power
> switch is, etc.).  I'd guess that most of the turnkey clusters are going to
> either someone who has used a cluster before, possibly having built one
> themselves and recognizing they have better things to do with their time, or
> to someone who will take a class or specific training on cluster use.

I'm not at all sure about this, but I'm sure that some of the turnkey
vendors will respond.  From what Joe has told me, for example, I think
that many turnkey clusters are custom engineered for specific (software)
applications and sold along with training and support to a group that
has minimal local skill or experience with clustering.

Somebody that really knows what they are doing knows that the marginal
cost of turnkey is far greater than the cost of the week or so (tops)
that it takes to set up and install most cluster configurations, less in
an environment already running multiple clusters.  The additional cost
per node means fewer nodes, and cluster users tend to be node-hungry.  I
think that they usually spend nodes for turnkey systems when they have
little choice (or when they have very deep pockets behind them).
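
To put the "fewer nodes" point in numbers, here is a small Python sketch.
The budget, node price, per-node turnkey premium, and the cost of a week
of sysadmin time are all hypothetical values chosen only for illustration,
and the helper name is mine.

```python
def nodes_affordable(budget, node_cost, per_node_premium=0, fixed_labor_cost=0):
    """Whole nodes purchasable after setting aside any one-time labor cost."""
    remaining = budget - fixed_labor_cost
    if remaining <= 0:
        return 0
    return int(remaining // (node_cost + per_node_premium))


# Hypothetical inputs, not real prices.
BUDGET = 200_000
NODE_COST = 2_500
TURNKEY_PREMIUM = 500     # assumed extra cost per node for turnkey delivery
WEEK_OF_SYSADMIN = 2_000  # assumed one-time cost of local setup labor

diy = nodes_affordable(BUDGET, NODE_COST, fixed_labor_cost=WEEK_OF_SYSADMIN)
turnkey = nodes_affordable(BUDGET, NODE_COST, per_node_premium=TURNKEY_PREMIUM)

print(diy, turnkey, diy - turnkey)  # prints: 79 66 13
```

The structural point is that the local labor cost is paid once while the
turnkey premium is paid per node, so the gap in node count widens as the
cluster grows.  Change the assumed inputs and the gap changes with them.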

> I see the COTS model is more consumerish, in that the seller doesn't expect
> to have to provide much customization and support. Not many people take a
> class on operating their TV or VCR or cellphone. Some people take classes on
> PCs, but most sort of get on the job training from someone else who knows
> more about what to do.  I don't think there's enough cluster folk about to
> go for that model, though.

There are now numerous vendors selling "cluster nodes" as more or less
commodity items.  Penguin, for example.  Penguin owns Scyld, and will
sell you a turnkey cluster with Scyld preinstalled on it, I'm certain.
It will sell you a cluster with SuSE preinstalled on it (their default,
I believe) where with PVM or MPI you are left needing to install
accounts, set up NFS, and go "poof, you're a cluster".  It will sell you
a cluster with SuSE preinstalled on it (for free, why not) that you can
subsequently PXE-start into your own cluster configuration.  They charge
according to what you get, they use OTC parts in custom cases (important
as noted to get a reliable dual anything in a 1U form factor), they have
plenty of competition e.g. Appro as noted by Gerry, IBM, Dell, etc...
who ALSO use OTC parts in custom cases, will in some cases preinstall an
OS for you, and so on.  Then there are groups that JUST assemble nodes
or buy them from these vendors and put together a serious cluster for
you and deliver it all racked up to your very door, and will provide
training or custom software installs or even site management -- for a
fee.

I think these examples show that the marketplace has clearly
differentiated between the (mostly) COTS systems themselves that make up
"cluster nodes" in all sorts of clusters and the installation,
maintenance, and operation of those nodes.  A beowulf need not be a DIY
enterprise, and if it isn't, it is just a matter of how, and by whom, the
work you don't do yourself gets done, INDEPENDENT of the COTS issue.

> And, thinking of things where the complexity is between toaster oven and PC,
> there was some sort of self-instructional video built into my new HD-DVR
> cable box, and given the "rev zero" ness of the operating software, maybe I
> should have watched it. (totally off the subject, but it's supposedly a
> Linux based system running on a 733 MHz x86, with an integrated cable modem
> and ethernet interface, etc..... There's a vehicle for "grid computing"....
> The cable company can sell spare cycles on my box, and I'm paying for the
> electricity AND the box too.  Maybe that's why they gripe so much when I
> unplug it all the time (it draws about 100W, 24/7, so it costs more for the
> electricity to run it than I pay for the HD cable service))
> 
> And, I have one of those $200 WalMart cluster nodes at home..it's OK, but I
> wouldn't buy another one, for a variety of reasons.

Precisely.  Cluster builders have >>always<< specified node
configuration and only >>rarely<< actually used systems purchased off of
some shelf the way they come out of the box.  That doesn't make their
nodes any less of a COTS item; it only recognizes that the COTS part
refers to the components, not the particular assembly.

So I reiterate -- it will be interesting indeed to see if the cluster
market is finally large enough to support "cluster specific" CPUs and
supporting chipsets that are not used at all in non-cluster applications
the way it supports cluster specific NICs.  COTS or not, if they are
price/performance winners they are likely to succeed, if not they are
likely to fail (or at least fail to grow beyond a specialty
market-within-a-market scale).

   rgb

> 
> ----- Original Message -----
> From: "Robert G. Brown" <rgb at phy.duke.edu>
> To: "Greg Lindahl" <lindahl at pathscale.com>
> Cc: <beowulf at beowulf.org>
> Sent: Wednesday, September 01, 2004 2:16 PM
> Subject: Re: COTS was Re: [Beowulf] 96 Processors Under Your Desktop
> 
> 
> > On Wed, 1 Sep 2004, Greg Lindahl wrote:
> >
> > > On Wed, Sep 01, 2004 at 12:41:24PM -0700, Jim Lux wrote:
> > >
> > > > > "requires time and expertise to set up" is of course what makes
> > > > > clusters (as a completed system) not COTS, even though the
> > > > > components or subassemblies may be COTS.
> > >
> > > I learn a new definition of COTS every day. I hadn't seen this one
> > > before.  I suppose all the parents struggling to assemble toys on Xmas
> > > eve can console themselves that the mass-market item they bought at
> > > Wal-Mart isn't COTS...
> >
> > And turnkey beowulf systems (built of COTS components) have been around
> > for many years now.  In fact, some list members (ahem;-) have built and
> > sold them.
> >
> > So the new computer cluster (orien?), with a new CPU, is actually
> > FARTHER from COTS -- especially if the new CPU is designed only for use
> > in the cluster market.  Hopefully it isn't -- it's questionable as to
> > whether the cluster market can sustain a specialty CPU with so many COTS
> > alternatives that stay cheap because they are mass marketed.
> >
> > Wal Mart sells compute nodes, too, if you want to use their cheap
> > systems for that purpose.
> 
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





