[Beowulf] torus versus (fat) tree topologies

Chris Sideroff cnsidero at syr.edu
Fri Nov 12 10:37:09 PST 2004


On Thu, 2004-11-11 at 18:15, Mark Hahn wrote:

> > choosing a high-speed interconnect.  The consensus was that I should 
> > first determine whether I even need one, which I have done, and my 
> > conclusion was that it will benefit our hardware/software combination 
> > greatly.  I have some results which I will post soon.
> 
> I'd be curious to know what gigabit configuration you tested, and 
> whether you considered any mesh-based gigabit.

  We did not consider a mesh-based gigabit.  At the time of purchase,
there was no one in the department even remotely familiar with
clustering, and only one person with the UNIX/Linux skills to
administer it.  This is why I've been asking all these questions.

  I guess I haven't mentioned it yet but I'm a PhD student in the
Mechanical and Aerospace Engineering department at Syracuse University
in upstate New York.  Prior to my arrival here I had only superficial
knowledge of clustering, and I have subsequently spent the last year
researching, reading, configuring, testing, etc ... all while working
on my PhD research (CFD).  So I'm essentially the administrator _and_
the major user of it.  I have to admit it's kinda nice to have almost
exclusive use of that much horsepower (64 Opteron 242's) for my work!
  
  Sorry, that was a bit off topic.  I just wanted to give some
background, as I realize the majority of readers are seasoned cluster
users/admins/researchers.

  Back to the question about mesh-based gigabit.  The boards only have
two gigabit ports, so I would need to add two more per node to test
this topology.  If Fluent's communication is truly nearest-neighbour
and there is some benefit to a torus topology in that case, it might be
an option.  BTW, we currently use a managed HP ProCurve (I forget the
model) 36-port gigabit switch, which may also be hindering performance.
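  Just to spell out the port arithmetic (a rough sketch only; the 4x8
layout for our 32 nodes is my assumption, not anything we have actually
configured): in a 2-D torus each node talks to four neighbours, so with
only two onboard gigabit ports I would need two add-in ports per node.
Something like:

    # Rough illustration (hypothetical 4x8 layout for 32 nodes):
    # each node in a 2-D torus has four neighbours, so it needs
    # four NICs, i.e. two more than the two onboard gigabit ports.
    ROWS, COLS = 4, 8

    def torus_neighbors(r, c):
        """Return the four wraparound neighbours of node (r, c)."""
        return [((r - 1) % ROWS, c),   # north
                ((r + 1) % ROWS, c),   # south
                (r, (c - 1) % COLS),   # west
                (r, (c + 1) % COLS)]   # east

    for r in range(ROWS):
        for c in range(COLS):
            print((r, c), "->", torus_neighbors(r, c))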

  Another thing I'm looking at is the hardware vendor's experience with
Fluent.  Since there is no option of tuning the code, it is quite
important that the vendor has had some experience with other customers
using it.  Even better is if their technicians are willing to provide
support for Fluent with their product.  Dolphin has been _extremely_
helpful in this respect, providing an SCI cluster for me to test Fluent
on and offering suggestions for running it (thanks Simen).

  I don't expect our cluster will grow beyond the current 32 nodes.
There may be an upgrade of CPUs (and/or memory) in the future when the
dual-core chips come out, but probably not the addition of nodes,
simply because there are not enough users of it.  As a consequence, I
am looking for the "best" interconnect solution that will allow a few
people to use most or all of the CPUs for the jobs we run.

Thanks again for your comments.  Chris Sideroff



