[Beowulf] Why Do Clusters Suck?
Stuart Midgley stuart.midgley at anu.edu.au
Tue Mar 22 15:34:08 PST 2005
On 22/03/2005, at 7:36, Douglas Eadline - ClusterWorld Magazine wrote:

> So why do clusters suck?

From my position, this issue is really complex. In the Australian scene, the main reason "clusters suck" has nothing to do with distros, hardware or associated software. It is more an issue with support staff.

It is easy to buy hardware and software and download a distro. However, it is very difficult to get good support staff. Clusters, by their nature and design, are not simple beasts. When everything is running well, you can manage them with almost no staff. However, when something goes wrong, the diagnostic/resolution cycle can be long and very complex. An error in an MPI program could come from the actual user code, the MPI layer, a system software issue, the interconnect, some hardware failure or a combination of these. Getting good staff who understand and can handle all these layers is difficult.

Spending $100k will get you a reasonably sized cluster on the floor within a few weeks, which will last, say, 3 years. Yet, in the staff space, $100k doesn't even get you a good system administrator for a single year. And a system administrator is not always what is required. They may not have a good understanding of MPI, applications, etc.

How to make clusters less sucky? Well, for a large number of cluster users/system administrators, decent training would be a good start: training which takes people through the process of building, installing, breaking and fixing a cluster. Of course, then there is the MPI/application side of things, which would be another course.

Try to wrap 10 years' worth of system/computational experience up into a 5-day course ;)

Stu.

--
<--------------------------------------------------------------------->
Dr Stuart Midgley              | stuart.midgley at anu.edu.au
Supercomputer Facility         | smidgley at netspace.net.au
Leonard Huxley Building 56     | +61 (0)2 6125 5988 Work
Australian National University | +61 (0)2 6125 8199 Fax
CANBERRA ACT 0200              | +61 (0)4 1125 2488 Mob
