[Beowulf] List traffic

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Thu Jan 12 07:35:41 PST 2012



On 1/12/12 6:53 AM, "Ellis H. Wilson III" <ellis at runnersroll.com> wrote:
> I recently read a blog that suggested (due to similar threads following
> these trajectories) that the Wulf list wasn't what it used to be.

I think that's for a variety of reasons..

The cluster world has changed.  Fifteen or twenty years ago, clusters
were new, novel, and pretty much roll-your-own, so there was a lot of
traffic on the list about how to do that.  Remember all the mobo
comparisons, and all the carefully teased-out idiosyncrasies of various
switches and network schemes?

Back then, the idea of using a cluster for "big computing" was kind of
new, as well.  People building clusters were doing it either because the
architecture was interesting OR because they had a computing problem to
solve, and a cluster was a cheap way to do it, especially with free labor.

I think clustering has evolved, and the concept of a cluster is now
thoroughly mature.  You can buy a cluster essentially off the shelf
from a whole variety of companies (some with people who were
participating in this list back then and are still here today), and
it's interesting to see how far the basic Beowulf concept has come.

Back in the late 90s, it was still largely "commodity computers,
commodity interconnects", where the focus was on using "business class"
computers and networking hardware.  Perhaps not consumer-grade gear
bought as cheaply as possible, but certainly not fancy-schmancy
rack-mounted 1U servers.. The switches people were using were just
ordinary network switches, the same as in the wiring closet down the
hall.

Over time, though, a whole industry has developed around supplying
components specifically aimed at clusters: high-speed interconnects,
computers, etc.  Some of this just follows the IT industry in general..
There weren't as many "server farms" back in 1995 as there are now.

Maybe it's because the field has matured?


So, we're back to talking about "roll-your-own" clusters of one sort or
another.  I think anyone serious about big cluster computing (>100 nodes)
probably won't be hanging on this list looking for hints on how to
route and label their network cables.  There are too many other places
to go get that information, or, better yet, places to hire someone who
already knows.

I know that if I needed massive computational power at work, my first
thought these days isn't "hey, let's build a cluster", it's "let's call up
the HPC folks and get an account on one of the existing clusters".

But I still see the need to bring people into the cluster world in some
way.  I don't know where the cluster vendors find their people, or even
what sorts of skill sets they're looking for.  Are they beating the bushes
at CMU, MIT, and other hotbeds of CS looking for prior cluster design
experience?  I suspect not, just as most of the people JPL hires don't
come in with spacecraft experience from school or anywhere else.  You
look for bright
people who might be interested in what you're doing, and they learn the
details of cluster-wrangling on the job.


For myself, I like probing the edges of what you can do with a cluster.
Big computational problems don't excite me.  I like thinking about things
like:

1) What can I use from the body of cluster knowledge to do something
different?  A distributed cluster is topologically similar to one all
contained in a single rack, but it's different.  How is it different
(latency, error rate)?  Can I use analysis (particularly from early
cluster days) to do a better job?  (There's a rough sketch of that kind
of comparison after this list.)

2) I've always been a fan of *personal* computing (probably from many
years of negotiating for a piece of some shared resource).  It's tricky
here, because as soon as you have a decent 8- or 16-node cluster that
fits under a desk, and have figured out all the hideous complexity of
how to port some single-user application to run on it, someone comes
out with a single-processor box that's just as fast and a lot easier to
use.  Back in the 80s, I designed, but did not build, an 80286 clone
using discrete ECL logic, the idea being to make a 100 MHz IBM PC-AT
that would run standard spreadsheet software 20 times faster (a big
deal when your huge spreadsheet takes hours to recalculate).  However,
Moore's law and Intel made that idea a losing proposition.

But still, the idea of personal control over my computing resources is
appealing.  Nobody watching to see "are you effectively using those CPU
cycles?"  No arguing about the annual re-adjustment of chargeback rates,
where you take the total system budget and divide it by CPU seconds.
Oops, not enough people used it, so your CPU costs just quadrupled.

3) I'm also interested in portable computing (Yes, I have a NEC 8201,
the TRS-80 Model 100 clone, and a TI-59; I did sell the Compaq, but I
had one of those too, etc.)  This is another interesting problem
space.. No big computer room with infrastructure.  Here, the
fascinating trade is between local computing horsepower and cheap
long-distance datacomm.  At some point, it's cheaper/easier to send
your data via satellite link to a big computer elsewhere and get the
results back.  It's the classic 60s remote computing problem revisited
once again.  (There's a toy break-even calculation for that trade
below.)
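
To make point 1 a little more concrete, here's the kind of toy
back-of-the-envelope model I have in mind.  The numbers and the crude
stop-and-wait retransmission assumption are invented purely for
illustration, not measurements from any real system:

# Toy model: expected time to move one message across a link when any
# lost or corrupted message has to be resent (geometric retry model).
def expected_transfer_time(msg_bytes, latency_s, bandwidth_bytes_per_s,
                           loss_prob):
    one_attempt = latency_s + msg_bytes / bandwidth_bytes_per_s
    return one_attempt / (1.0 - loss_prob)

msg = 1_000_000  # a 1 MB message

# In-rack cluster link: microseconds of latency, 10 Gb/s, essentially
# no loss.
in_rack = expected_transfer_time(msg, 50e-6, 10e9 / 8, 1e-9)

# Geographically distributed "cluster": tens of ms of latency,
# 100 Mb/s, and a noticeable loss rate.
distributed = expected_transfer_time(msg, 40e-3, 100e6 / 8, 1e-3)

print(f"in-rack:     {in_rack * 1e3:8.2f} ms per message")
print(f"distributed: {distributed * 1e3:8.2f} ms per message")

It's the same latency/bandwidth/loss bookkeeping the early cluster
analyses used; only the constants move by a few orders of magnitude.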
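
And for point 3, the break-even between local horsepower and shipping
the job somewhere else is the same sort of arithmetic.  Again, every
number here is made up just to show the shape of the trade:

# Toy break-even: crunch the job on the machine in front of you, or
# push the data over a slow link to a big machine and pull the answer
# back?
def local_time(work_flop, local_flops):
    return work_flop / local_flops

def remote_time(data_bytes, result_bytes, link_bytes_per_s, rtt_s,
                work_flop, remote_flops):
    return ((data_bytes + result_bytes) / link_bytes_per_s
            + rtt_s + work_flop / remote_flops)

work   = 1e13      # total floating-point operations in the job
data   = 200e6     # bytes to send up
result = 5e6       # bytes coming back
sat    = 2e6 / 8   # ~2 Mbit/s satellite link, in bytes/s
rtt    = 0.6       # ~600 ms round trip

print("local (5 GFLOPS sustained):", local_time(work, 5e9), "s")
print("remote (1 TFLOPS slice):   ",
      remote_time(data, result, sat, rtt, work, 1e12), "s")

With those particular made-up numbers the remote machine wins even over
the skinny link; shrink the job or fatten the data set and it flips.
That's the whole 60s remote-computing trade in two functions.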





