cluster frustrations


Joachim Worringen joachim at lfbs.RWTH-Aachen.DE
Thu Jan 17 00:12:24 PST 2002


Patrick Geoffray wrote:
> 
> Joachim,
> 
> Joachim Worringen wrote:
> > But they don't get it to run reliably with
> > the current Linux/GM/MPICH versions which of course should run faster,
> > better, nicer. I don't blame Linux or Myrinet for these problems -
> 
> Obviously, you do. Inciting another flame war ?

No, I never intend to incite flame wars, but discussions. I can tell you
a lot of stories about malfunctioning self-made SCI clusters, but I
have no hands-on experience with such a cluster being operated in a
similar (production) environment, because such customers usually chose
Scali-made systems. And I prefer to talk about hands-on experience, not
second-hand stories. The Scali-equipped systems I know of run well now,
although this hasn't always been the case (mostly due to bugs/strange
features in the last generation hardware, LC2). But Scali systems, to
stick with these, are well-defined platforms, running qualified kernels
etc.; not using such qualified components is one source of problems.

[...]
> So if you really experienced problems with this machine, please
> contact help at myri.com, this is the first step toward happiness.

I had reproducible application aborts when running PMB with 32
processes. I informed Ulrich Detert about this, and he confirmed the
problems. Up to now, they stick with 2.2 (which runs stable, but not as
fast as it could), which does *not* mean that such a system wouldn't work
with 2.4 and current GM - it's only that these guys did try to find that
"golden configuration" during their update (or by chance did hit the one
dirty configuration) and didn't succeed.

Once again: I don't doubt that there are Myrinet systems which run
perfectly. There may just be many opportunities (with self-made clusters
in general) to make mistakes that hinder stable operation.

> You cannot compare Crays/SP2 with do-it-yourself Linux clusters. 

Exactly. Paying less money means investing more time. Which may be
equivalent to money.

  Joachim

-- 
|  _  RWTH|  Joachim Worringen
|_|_`_    |  Lehrstuhl fuer Betriebssysteme, RWTH Aachen
  | |_)(_`|  http://www.lfbs.rwth-aachen.de/~joachim
    |_)._)|  fon: ++49-241-80.27609 fax: ++49-241-80.22339



More information about the Beowulf mailing list