HD cloning

Tue Dec 5 11:59:33 PST 2000

On Wed, 6 Dec 2000, Bruce Janson wrote:

> And to you Robert Brown: speak for yourself please when you say
> (in your message of Sun, 3 Dec 2000 16:03:39 -0500 (EST)):
>
> 	A beowulf is a high performance computing
> 	cluster, not a data or web server cluster.
>
> This kind of supercomputer elitism, this fascination with fine-
> grained parallelism and linear speed-up has held back the progress
> of single system image multicomputing for long enough.  I disagree
> with your claim, so much so that I wouldn't even fight for your
> right to make it (well, not with much conviction).

Whew!  Harsh words!  I don't know how you read "fine-grained
parallelism" into the phrase "high performance computing cluster" -- I'm
actually an embarrassingly parallel Monte Carlo kind of guy (and linear
speedup is what you get when you're NOT fine-grained, so I make out just
fine there;-).  I'd therefore be deeply hurt if I weren't wearing
teflon-coated asbestos over a kevlar vest;-)

However, I'm not speaking for myself and it's not what I say that
matters.  It is what e.g. Sterling and Becker say:

  http://www.beowulf.org/intro.html

(first paragraph).  As the constructors of the original beowulf and
coiners of the very term, their definition and usage are the ones that
matter, although the consensus of the list participants also plays a
part in it.

You clearly need to read a bit while properly sedated. I recommend a
nice cold beer, or even two or three, so go grab a few and have a seat
(and so will I;-).  Now, here's a reading list.  Let's see, sitting
right next to me I have:

In Search of Clusters, by Greg Pfister

(http://www.phptr.com/ptrbooks/ptr_0138997098.html)

This book will teach you that "cluster computing" is a venerable and
generic term that includes high availability and failover clusters
(suitable for use as webservers and distributed databases) as well as at
least certain kinds of (parallel as opposed to vector) supercomputers or
generic SMP systems themselves, which Pfister views correctly as being a
cluster of processors united by some sort of distributed/common IPC and
memory system.

Beowulfs are a specific subclass of cluster computers and (IIRC) aren't
even discussed (at least in any detail -- they didn't make the index) in
Pfister's book, which was probably largely (being) written at the time
the original beowulf was being built and winning its builders Bell
prizes for the most cost-beneficial high performance computer design.
Of course all the beowulf "glue" -- PVM, MPI, and all that -- did exist
and was in use by me among many many others years before the beowulf
project, so it isn't surprising that beowulfs are described in all but
name.  It does (amusingly to me, at least:-) mention Microsoft's Cluster
Services "Wolfpack".

Consequently we find that pre-empting the word "cluster" in favor of
"beowulf" for all linux clusters seems a bit presumptuous.  Clusters were
around long before PC's, linux, and even ethernet.  As I said, all
beowulfs are clusters, but not all clusters are beowulfs.  Not even all
linux clusters.  Not even all rackmount or shelfmount dedicated linux
clusters.

Frankly, not even the mostly shelfmounted linux cluster on a private
network I'm sitting at right now at home is technically a "real
beowulf", although I usually do speak of it as my "home beowulf" because
it "almost" qualifies.  However, it contains 2-3 workstations that
double as compute nodes and on the net I find:

The Beowulf FAQ, compiled and maintained by Kragen Sitaker:

(http://www.dnaco.net/~kragen/beowulf-faq.txt)

I quote:

  1. What's a Beowulf? [1999-05-13]

  It's a kind of high-performance massively parallel computer built
  primarily out of commodity hardware components, running a
  free-software operating system like Linux or FreeBSD, interconnected
  by a private high-speed network.  It consists of a cluster of PCs or
  workstations dedicated to running high-performance computing tasks.
  The nodes in the cluster don't sit on people's desks; they are
  dedicated to running cluster jobs.  It is usually connected to the
  outside world through only a single node.

  Some Linux clusters are built for reliability instead of speed.  These
  are not Beowulfs.

This is a summary of a fairly extended list discussion -- it is a fair
representation of the consensual view of the list at that time.

This is of course a fairly particular (although accurate enough)
definition and there are those on the list with a broader view.  In fact
(amusingly enough) I'm one of them and have had some interesting
discussions with the more passionate defenders of the original "tight"
definition in the past.  From the Beowulf HowTo, for example, we get:

  There are probably as many Beowulf definitions as there are people who
  build or use Beowulf Supercomputer facilities. Some claim that one can
  call their system Beowulf only if it is built in the same way as the
  NASA's original machine. Others go to the other extreme and call
  Beowulf any system of workstations running parallel code. My
  definition of Beowulf fits somewhere between the two views described
  above, and is based on many postings to the Beowulf mailing list...

By no great coincidence, another book I happen to have at hand is "How
to Build a Beowulf", by Sterling, Salmon, Becker and Savarese (SSBS),
which has to be viewed as a sort of "horse's mouth" view of beowulfery.
In it, they take a surprisingly inclusionary view of beowulfery at the
application level, while being much stricter on the hardware
architecture side (where they clearly differentiate a "true beowulf"
from an e.g. NOW or COW or POPCs like the one I run at home and mislabel
a "beowulf":-).  For example, in section 10.2, "New Opportunities" the
authors acknowledge that while historically beowulfs have been primarily
used for scientific and technological applications (traditional
"supercomputing chores") the hardware architecture itself is amenable to
new domains of application including databases and web servers and
hyperrealistic simulational online gaming and virtual realities and
process control and AI and genetic programming.

Who could argue?  A rackmount/shelfmount linux cluster is an (undeniably
useful in all of these venues) rackmount/shelfmount linux cluster, and
the architectural glue that they view as being a core element of the
beowulf can be used to stick together many kinds of parallel
applications.  I even do some AI and work on parallel genetic
optimization code on my home 'wulf, and my gateway node has a
(non-parallelized:-( webserver running on it (oops, I did it again).

One still has to ask if it is fair to call any old rack of linux boxes
in an ISP a "beowulf", or a rack of linux boxes in a webfarm, or a rack
of linux boxes running any sort of distributed database, or office full
of linux workstations running a background computation at the same time
they provide console access and word processing to foreground users.

In the past, this has been a point where list opinions have diverged
somewhat, partly because not everybody uses the same glue to the same
extent.  A beowulf >>can<< be viewed as nearly any dedicated rack or
stack of linux boxes on a private network because there is no sine qua
non of beowulf on the kernel/glue level.  With Scyld as a sort of
unifying glue, that may slowly become less of an issue, although I doubt
it.  It isn't clear that a unified process id space is desirable in all
circumstances, for example, and one does give up one sort of the power
and flexibility of a node in exchange for another sort of power when one
configures nodes so that they can no longer support a login process (for
example).  There is also no real advantage in being >>too<< narrow in a
definition.

I personally think (and this is now MY opinion, not accepted definitions
or even necessarily in agreement with SSBS), from being on the list for
years and reading all of these books and more fairly carefully, that it
isn't really fair.  ISP's discovered stacks of linux boxes independently
and have written their own glue.  So did a lot of the webserver folks.
They use a largely independent software base and overlap remarkably
little with the message-passing sort of software/networking technology
that seems to be an essential element to beowulfery.  Databases I'm more
open minded about, but again the problems being solved often transcend
just parallel computation and communication on COTS hardware.

I'm just not comfortable with every rackmount or shelfmount cluster
known to man that happens to run linux (or freebsd, or WinNT, or DOS, or
Solaris -- where does one draw a line?)  being suddenly relabelled "a
beowulf".  It would be like calling toilet paper and paper towels and
paper napkins and even wet naps "Kleenex" just because one kind of
facial tissue was particularly successful at branding.  Diversity keeps
us from having rolls of toilet paper on hand for use as picnic napkins,
specificity keeps me from bringing home paper napkins instead of facial
tissue.

There are also practical reasons to maintain at least a teeny bit of
focus on the list, regardless of just what a beowulf "really" is.  For
one, this list has one of the best signal-to-noise ratios of any list
I've ever been on (and I'm probably singlehandedly some of the worst of
the noise;-).  Parallel database discussions alone could make this list
as bad as the linux kernel list (which is practically unusable at this
point without an attack-dog adaptive procmail filter) and would be
utterly uninteresting to, um, "many" of the list participants.  Possibly
violently uninteresting.

For better or worse a lot of the list members are

  a) Physical Scientists >>using<< beowulfs for numerical computations.
We tend to be less concerned about whether a given cluster is really a
"beowulf" in the precise sense defined by SSBS and in the FAQ and more
concerned with whether the particular cluster we are working with or
trying to design will accomplish the real work at hand that we need
done at affordable cost.  In other people's money, of course;-)

  b) Real Computer Scientists working on beowulfery as their primary
research interest -- see the Clemson group, with Rob Ross, Walter Ligon,
and others and the PVFS they are building as well as the beowulf
underground site, for example.  There are papers published on this stuff
and prizes awarded for this stuff.  Heck, I dabble in this as well
myself as it is quite fun, but it is really an avocation, not a
vocation.

  c) Professionals running (turnkey and otherwise) beowulf support
businesses -- Paralogics' Doug Eadline and HPTi's Greg Lindahl, for
example.  Scyld's Dan Ridge, Erik Hendriks, Don Becker and others.  Note
that many of these guys are Real Computer Scientists who graduated to
the "real world" to get rich on their startups.  They are NOT competing
on e.g. webserver RFPs (as far as I know, anyway).

  d) Sundry Interested Parties.  At Expos and online I've met corporate
folks from IBM, publishers, entrepreneurs, oil prospectors, and many,
many students.  Many of these folks just listen and learn and ask rare
questions.

  e) A very significant fraction of the list membership is overseas with
much of it in developing countries.  This makes sense -- the beowulf
concept is by definition and design >>the<< bleeding edge cost/benefit
winner for supercomputer design.  The USA still has deep pocket funding
agencies that will spring for a multimillion dollar big iron
supercomputer to solve a grand challenge problem in four or five years,
four or five years before the current generation of PC's can do it as a
screensaver.  Overseas, they often don't.  Students in Korea, in
Pakistan, in Malaysia can put together a real supercomputer for a few
thousand dollars or even less, using recycled or obsolete parts.

Many of these folks (outside of groups d) and e)) have been on the list almost
since the beginning.  They provide both perspective and continuity and
an instant, free consulting service to those in groups d) and e), who
come and go.  They'll tolerate a discussion of embarrassingly parallel
applications and grid computing and so forth because it is high
performance computing on a COTS open source architecture (in the sense
that many flops are being expended on a legitimate parallel
supercomputing application, at least).  They'll (mostly) tolerate my
calling my heterogeneous beowulfish cluster at home or in the Duke
physics department a "beowulf" because it is too tedious to write
"heterogeneous beowulfish cluster consisting of a mix of dedicated and
desktop nodes running an appropriate parallel task mix based loosely on
e.g. MOSIX, PVM, MPI, sockets and other IPC mechanisms that keeps it
busy" every time I want to refer to it.

BUT do they have to listen to every person who builds a web farm and
wants to call it a beowulf talk about transparent forwarding of
connections?  Do they want to delve into the wonders of (parallel) MCIF
systems, SQL statements (plus extensions) executed in parallel on a
distributed database?  Do they want to hear about failover in web-based
distributed applications, about (parallel) CRM systems, about (parallel)
B2B efforts and about (parallel) server appliances? Even if they think
some of this interesting (as I obviously do) is there time to cover all
this on this list?  Not in two lifetimes.

A moderate degree of focus is what allows me to write long answers to
newbie questions.  A moderate degree of focus allows me to read the
discussion written by others and find it useful instead of Huh?

I'd be happy to listen to a discussion of e.g. PVFS as a platform for a
database (I already have, actually).  I think integrating a webfarm with
a distributed computational base (beowulf) is a totally nifty idea for
providing certain kinds of services (and have a startup company working
on the idea).  However, high availability, failover, ISP issues per se,
webfarm issues per se, I just don't have time to learn about all of this
and get ANYTHING done.

Is this fair?  Am I being crazy here?  I'm not even a Real Computer
Scientist, so casting me as a fine-grained computing fiend who is
somehow obstructing "true, unbridled" beowulf development is just not
correct...

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu