disadvantages of linux cluster
dtj at uberh4x0r.org
Tue Nov 5 08:11:46 PST 2002
On Tue, 2002-11-05 at 02:30, Stephane Glockner wrote:
> I work in a french university laboratory. We do massive Computational
> Fluid Dynamics at present time on ORIGIN2000, IBM Regatta... The solution of
> a Beowulf cluster seems very interesting as regards as financial cost and
> We would like to use it in the following manner :
> - multi users (around 10 or 15 users)
> - multi single-processor, light and heavy applications (<=2Go of RAM)
> - use of a parallel version of our application
> I didn't see anywhere on the web disadvantages of such a solution. What are
> they ? Is it simple to administrate day after day ? Is it stable ? Does it
> support heavy load of users and applications as defined above ? Isn't it too
> much an experimental solution ?
Whether a cluster will be the best value depends on many factors.
Here are some questions that you might ask yourself.
* How parallel is your application? If it only scales to, say, 4 processors,
then a grand and glorious Beowulf may or may not be useful.
* How big is your problem? Regardless of the horsepower you can acquire
for very little money, if the problem won't fit in memory, you are sunk.
Get a few big jobs competing for the same memory, and you are also sunk.
* Do the proposed users work and play well with other children? Apart
from going the batch route, users keenly competing for resources in a
cluster environment is a sad sight. As far as I know, handling quality
of service issues in a cluster environment is pretty much a black art.
My experience suggests that giving them each their own cluster is
generally a better solution than having them share a really big cluster.
Of course they can share, but then the lines of resource ownership are
* How much space do you have? Bunches of boxes take up bunches of room.
O2K's and such are pretty big, but 32 mid-tower cases take up a bunch of
* Service and support can be an issue. Bunches of boxes can have
bunches of problems, often subtle, that may drive you to madness. You
get a bad batch of a component and you are chasing down lots of
problems. In a large cluster (210 dual boxes) that I was involved with,
2 boxes were DOA. Then there were intermittent failures. Last I heard,
15 of the boxes had been replaced. Of course Murphy will dictate that
the failures will only happen at the most inopportune time.
* Desire is also a problem. About the time you get the 2.0 GHz boxes up
and productive, the 2.4 GHz CPUs are available and would make life much
better, faster and stronger. You get around to upgrading to the 2.4s
and the next wave is there. It's all so easy, the siren's call of GHz.
* Optimising for clusters is definitely an art, especially with all the
hardware possibilities. When you get an O2K, what you get is pretty much
what you get.
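
A rough way to put numbers on the first question (how parallel is your
application?) is Amdahl's law. It is only an idealized model: it ignores
communication, memory and I/O costs, so treat the results as an upper
bound. Below is a minimal Python sketch. The function name and the
parallel fractions are made up for illustration; you would have to
measure the real fraction for your own code.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup under Amdahl's law, ignoring communication cost."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; the parallel part is split evenly.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical parallel fractions, not measurements from any real code.
    for fraction in (0.75, 0.95, 0.99):
        for n in (4, 16, 64):
            speedup = amdahl_speedup(fraction, n)
            print(f"p={fraction:.2f} n={n:3d} speedup={speedup:6.2f}")
```

If the speedup flattens out after a handful of processors, a small
cluster (or several small ones) is probably a better buy than one big
one.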