[Beowulf] Re: Why Do Clusters Suck?

Tue Mar 22 15:01:09 PST 2005

On Tue, 2005-03-22 at 15:12, David Mathog wrote:
> > On Tue, 2005-03-22 at 12:42, David Mathog wrote:
> > > 
> > > More to the point, while there is certainly a lot
> > > of room for improvement, an awful lot of work is
> > > getting done today using existing cluster technology
> > > and it's far from clear to me that an advance
> > > in cluster management software would result in much more
> > > productivity.  As opposed to, for instance, improving
> > > network throughput, CPU power, or component reliability by
> > > a factor of 10, any one of which would lead to an immediate
> > > and dramatic productivity increase.
> > > 
> > 
> > Would it?
> 
> Yes.  Programs tend to either be CPU limited and/or bandwidth
> limited.  If you improve the relevant components the program
> will speed up to the point that something else becomes the new
> bottleneck.  For most of our work now the CPU or memory bandwidth
> is limiting but for some operations (data distribution) the
> network bandwidth is.
> 
> > Myrinet or IB is more than enough bandwidth for
> > us
> 
> Ok.  Now imagine what would happen if you dropped back to 100baseT,
> which is what I'm still using.

I think we are talking about two different things.  I thought we
were talking 'Why do clusters suck'.  My response was to indicate
that your problems aren't necessarily related to clusters.  
They might be related to your cluster, but not clusters in general.
Of course if we don't throw in "It depends" we will start more
arguments.  

If you want to compare network bandwidth of a cluster to an
Altix or Cray then yes, the bandwidth is woefully inadequate.
As far as CPUs go, you can buy nodes with Itanium and build a cheap
Altix if you don't want shared memory.  Cray vector processors
are a bit difficult.  Even for IBM systems, you can get the POWER5
in a small form factor and build a cluster.

For the nodes themselves, you can buy systems with redundant power and
redundant disk.  You can buy from a system vendor that qualifies 
their hardware much more rigorously than another.  When you buy cheap
hardware you get cheap hardware.

> 
> > (weather and ocean models, nearest neighbor communications), we prefer
> > better latency.
> 
> > We have over a thousand nodes and hardware
> > reliability has never significantly impacted our users and their
> > productivity. 
> 
> We've lost up to 2 of our 20 nodes at a time.  Most of our
> tasks depend upon particular data set slices being distributed
> across the nodes. When one node goes down it takes several
> hours to redistribute the data appropriately among the
> remaining nodes.  If I had 1000 nodes this would become
> enough of a problem that I'd have to redo the data distribution 
> method and build in something resembling a RAID like redundancy.

Did you have to architect your system this way?  This is
an issue with your problem and your solution, not clusters.

Losing nodes is only critical if you lose the data.  You should
be able to pull the disk, plop it in another node, turn it on
and keep going.  That shouldn't take long as long as someone has access.
Is the downtime of that process the most cost effective way to
maintain your availability?

I don't know the format of your data, but you could buy one
more node and add it to the cluster.  At certain intervals, copy
the data from one node (assuming local disk) to the next.  This is
probably better than redistributing your data from 20 nodes to 18 nodes.
In case of a failure, have the extra node kick in and redistribute
the task, not the data, to the right nodes.  You will have to
balance out the number of checkpoints to the amount of processing
that gets done, but it is doable.
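
A minimal sketch of the neighbor-copy idea, in Python.  It is one
reading of the suggestion above, not a description of anyone's actual
setup.  It assumes slice i lives on node i's local disk and is copied
at intervals to node (i + 1) mod N; the names replica_holder and
assign_tasks are made up for illustration.  On a failure only the
task moves, to the node that already holds the copy.

```python
from __future__ import annotations


def replica_holder(node: int, n_nodes: int) -> int:
    """Node holding the backup copy of `node`'s slice (its ring neighbor)."""
    return (node + 1) % n_nodes


def assign_tasks(n_nodes: int, failed: set[int]) -> dict[int, int]:
    """Map each data slice to a live node that already has it on disk.

    Assumption: slice i is on node i, with a periodic copy on
    node (i + 1) % n_nodes.  No data is moved here; only the task
    for a slice is pointed at a different node.
    Raises RuntimeError if both copies of a slice are on failed nodes.
    """
    assignment = {}
    for slice_id in range(n_nodes):
        primary = slice_id
        backup = replica_holder(slice_id, n_nodes)
        if primary not in failed:
            assignment[slice_id] = primary
        elif backup not in failed:
            assignment[slice_id] = backup
        else:
            raise RuntimeError(f"slice {slice_id}: both copies lost")
    return assignment


if __name__ == "__main__":
    # 21 nodes: the original 20 plus the one extra node.
    plan = assign_tasks(21, failed={3, 17})
    print(plan[3], plan[17])
```

Anything written since the last copy is lost when a node dies, which
is the checkpoint-interval tradeoff mentioned above.  Two adjacent
nodes failing together still loses a slice in this scheme.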

Or, I am completely wrong. I don't really know your problem.  My point
is: have you considered finding a way to not have to redistribute all of
the data?

> 
> > 
> > Our biggest problem is the immaturity of development
> > tools. 
> 
> I feel your pain on that one.
> 
> > It is all too common to hear
> > developers tell me things like "does it work if you turn off bounds
> > checking?".
> 
> Egads!  I'm a big fan of building and testing programs on as many
> completely different platforms as possible, and with every
> possible warning enabled.  That does wonders for wringing latent
> bugs out of code.

What you have are scientists playing computational scientists.  The code
isn't what is important (for my users), the results are.  However,
scientists rarely are willing to give a dime of funding to hire
computational scientists even though in the long run they would probably
write more papers and get more science done.  Some get it, but
not as many as should.

Craig

> 
> Regards,
> 
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech