Computer Science research done on Beowulf class systems

Robert G. Brown rgb@phy.duke.edu
Tue, 8 Jun 1999 12:06:33 -0400


On Mon, 7 Jun 1999, Walter B. Ligon III wrote:

(original thread?)

> > > A few really
> > > large sites might need to spread the load a bit, but would probably be better
> > > served by multiple servers that work together than a beowulf.

(from Greg)

> > 
> > Obviously we are back to "What's a beowulf?" Multiple servers that
> > work together is a traditional cluster. Businesses, btw, have used
> > clusters for as long as scientific programmers have used clusters. The
> > wall street firm I used to work for didn't have any machine with more
> > than 2 CPUs, nor did they do any parallel programming, but they had a
> > large cluster. They built it for availability and throughput reasons.

(Walter)

> Yeah, well, I really don't want to debate that.  What you have said here is
> exactly my point.  Beowulf isn't really the approach for these problems, but
> there are other good approaches.  "Clusters" have been around for a long time
> in many different forms, and certainly they are a technique for improving
> throughput and capacity.  Beowulf is a parallel computer architecture.  Most
> of the resources in a beowulf are for internal use.  A lot of what I said
> in that posting was actually generic to parallel computers - which HAVE
> been around longer than computers clustered for business use.

An interesting note on this subject (which I'm cc'ing to the Extreme
Linux list as well, as it is as relevant to the EL folks as it is to
just beowulfers):  

Network Computing magazine just came out with Linux as its cover story.
Its claim: Linux is a mature and rapidly developing environment well
able to compete head to head with NT or Netware, already in place in
many organizations (although typically for dedicated "speciality"
purposes more than as a general purpose solution), and with (obviously)
superior price/performance for any purpose where its raw performance is
competitive.

They place it somewhat above NT but still below "commercial Unices"
primarily because of (don't get angry at me, I didn't write this stuff,
but the author, Greg Shipley, gshipley@neohapsis.com, would undoubtedly
LOVE your input:-):

  a) A lack of "robust SMP support".
  b) An "unpolished clustering technology".
  c) A lack of a "robust 64 bit journalized file system".
  d) A lack of "advanced options for high availability" (see Greg's
     remarks above).

All of these latter elements are presumably present in "market leaders"
like Compaq's Tru64 and Sun's Solaris on the high end.  On the low(er)
end, the article complains that Linux is still weak on the commercial
database front:

  "Moving a company's financial system onto an early beta of Oracle 8
for Linux is a bad idea..."

although he praises it for FTP and Web server farms.  Similarly, the
author sings the praises of Lotus Notes, Microsoft Exchange, and Novell
Groupwise and bemoans the lack of a similar tool in Linux (while
acknowledging its superiority as an SMTP, POP or IMAP mail relay or
server platform).  He complains that certain things are still missing,
like an enterprise level backup tool, worries about the "expert
friendly" nature of Linux (as basically a Unix), and finally does a
comparison of Linux documentation and support -- the "Linux Certified
Engineer" (LCE, obnoxious as such a thing might be to you or me) gives
businesses the warm fuzzies and is most definitely in the near future.

(Parenthetically, I wonder how they will grant LCEs.  Will I have to
pay somebody hundreds of dollars and take a "course" that I could have
given as an instructor instead?  Will Don Becker or Alan Cox have an
LCE?  Or will there (more reasonably) be an open certification process
that is either dirt cheap or free and doesn't require one to pay for or
take a course at all if one can pass the exam without it?  Enquiring
minds want to know... ;-)

All of this strikes me as being a pretty fair treatment.  It is
certainly one of the best treatments Linux has received in a major
computing mag -- NWC ends up basically endorsing it as very, very nearly
"enterprise ready" (which I interpret as being ready to completely
replace WinXX products and other Unices from top to bottom in an
enterprise) and an overwhelming price/performance win wherever it is
already deployed in the enterprise.  

My biggest bitch about the article is in its treatment of "robust SMP"
and clustering, the topic of this thread.  Of course we all know that
SMP under Linux is quite robust indeed and in 2.2.x becomes both robust
and sophisticated.  At the time the article was written, 2.2.x was not
yet shipped by default in commercial Linux distributions; now it is, so
perhaps the author would remove this as an objection/obstacle.  Still, there are
quotes that annoy me:

  "Clustering is another area in which Linux lags for mainstream
corporate needs.  The Beowulf project hit the mainstream this spring by
matching world record holder, a Cray T3E-900-AC64, in the PovRay
benchmark test" ... (run by IBM on a 17-node Netfinity cluster running
Red Hat) ... "Linux clusters have been popping up in education and
aerospace research facilities for some time now.  A few production Linux
clusters even rate among the Top 500 most powerful computers in the
world (www.top500.org).  Organizations seeking raw, high-end
computational power won't find a more cost-effective solution.  >>But
Linux clustering is little more than academic.  Web services, databases
and general high-availability services that would benefit from Linux
clustering haven't matured yet.<<"

(>>emphasis mine<<).  This is a curious remark, since earlier he
describes the widespread use of Linux in FTP and Web server farms (are
not "farms" "clusters"?).  I do think that the remarks concerning
database "clusters" are apropos, but not Linux specific.  The real
problem (as I understand it) is that "database cluster" technology itself
is fairly immature on any platform.  Is this incorrect?  Does NT or
Solaris support some sort of superior "clustering"?  Also, what
"advanced options for high-availability" clustering are missing?

> > So watch out for folks who use "beowulf" interchangeably with "cluster".
> > I don't, but most of the new people asking questions on this mailing
> > list do.
> 
> Well, I feel I should work to educate them, not support their misconceptions.

This ongoing education thread (which reaches back over many iterations)
is actually a very important one that both of you have contributed
tremendously to over many years -- clearly the author of the NWC article
recognizes a key element of the distinction, that beowulfs are powerful
parallel numerical engines while business "clusters" are more
amorphously defined and (whether or not the actual component tools exist
for Linux) are not being widely >>marketed<< as turnkey solutions at
this time.

It is my own belief (based on reading this list for many years) that
there is both some truth and some error in the author's statements
concerning Linux clustering technology.  I would say that in many cases
it does in fact exist, but I would also agree that it isn't yet properly
organized and packaged and resold, although there are a few vendors
(VAR, Paralogics, others?) that are working on it.  This represents a
huge opportunity for entrepreneurs, as we worked hard to point
out at the EL booth at Linux Expo last month.  I'd say that anyone who
identifies key business "clustering" technologies and develops them
aggressively over the next six to twelve months has an excellent chance
of riding a wave as Linux surges into the Enterprise.  They'll
undoubtedly make a few bushel baskets of well-deserved moola in the
meantime.

In a lot of cases this will consist of identifying and integrating
existing tools, in others porting existing tools, and in still others
designing and building key components that are still missing.  Alas,
there isn't a lot that can be done with the database side of things
except work on a clustered version of MySQL (a possibility recently
discussed on this list) -- the commercial products are being made openly
available to non-corporate (discorporate? ;-) Linux humans, which is
good, but they are still not open source, which makes it hard to tinker
with them.  This is a key time for developing partnerships and business
alliances to speed the development process, as well.  If disparate
groups in possession of distinct pieces of the pie get together, they
can build the pie a lot faster and there is plenty of pie-starved market
to go around.

The last thing that would be very useful (that is apropos to the beowulf
list in particular) would be the integration of the capabilities of the
classic "beowulf" with those of the "business cluster", which may well
be a lot more amorphous or which may optimize completely different parts
of the information processing stream (as PVFS optimizes disk access,
or Web farms optimize availability).  The most powerful and general
purpose cluster information entity that I can imagine would be one with
a "beowulf" component optimized for very fast parallel computation on
problems with a variety of grain sizes, a (journalized, 64 bit?) PVFS
component that provides very fast distributed access to a very large
file structure, a parallel network component that provides load-balanced
high-availability access to all of this data and compute power, and
probably several other "clustered" components I haven't thought of.
Such a Linux/COTS construct could serve as the core of a true enterprise
level compute facility -- instant and balanced access to both data and
processing power.

Just musings, I know, but I thought they might be of interest on this
list.  I learned at Linux Expo that there is a rather large group of
"lurkers on the lists"; people trying to understand the technology being
developed and discussed to see how to apply it in their own niches.  It
would be really very interesting to develop some sort of list of what
clustering technologies and products already exist in Linux (at what
stages of development) and what is still "missing" that would earn Linux
the NWC seal of approval as "Enterprise Ready" as a clustering
foundation.

Oh, one last interesting note.  NWC repeated the infamous Samba/SMB
comparison done by Mindcraft a few months ago.  Their conclusion:  Linux
performs almost identically to NT as an SMB server -- either one can be
a bit faster on any particular component of their benchmark depending on
configuration details.  Their auxiliary conclusion: If one is building a
large operation, Linux (with fixed costs of maybe $100 for a
distribution CD or two) is an overwhelming cost/benefit win, saving
thousands of dollars.

NWC also had a rather scathing editorial condemning the publication of
the Mindcraft result as "independent" when in fact it was bought and
paid for by Microsoft and run by Microsoft personnel on Microsoft
systems.  Their point is that Mindcraft has now destroyed any
credibility they might have ever had.  Not that they had a great deal to
begin with.

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu