[Beowulf] number of admins
kewley at gps.caltech.edu
Mon Jun 6 15:23:19 PDT 2005
We expect to get a large new cluster here, and I'd like to draw on the
expertise on this list to educate management about the personnel needs.
The cluster is expected to be:
~1000 Dell PE1850 dual-CPU compute nodes
master & other auxiliary nodes on similar hardware
Nortel stacked-switches-based GigE network
many-TB SAN built on Data Direct & Ibrix
Platform LSF HPC Rocks roll
Moab added later, quite possibly
tape library backup (software TBD)
NFS service to public workstations
nine man-weeks of Dell installation support
10 man-days of Ibrix installation support
The users will be something like:
~10 local academic groups, perhaps 60 users total
several different locally-written or -customized codebases
at least one near-real-time application with public exposure
We have some experience already with a 160-node Dell cluster that has
some of the basic elements listed above, but several of the pieces will
be totally new, and some of the pieces we already have will need
My questions to you are:
* How many sysadmins should we plan to have once the cluster is stable?
* Is there indeed any such thing as a "stable" cluster of this sort, and
if so, should we get additional help during the initial phase of the
project, when things are less stable (help beyond the vendor
installation support listed above)?
* If we need more help in the initial phases, how might we go about
finding people? Contract workers? Commercial or private
* Should we look for any specific non-obvious skillset, or would skilled
sysadmins be adequate?
* If we only have one sysadmin, someone who is bright and capable, but
is learning as they go, is that too small a support staff?
* If one such sysadmin is not enough, then what would you expect the
impact on the users to be?
I have been giving my opinion to management, but I'd really like to get
(relatively unbiased) professional opinions from outside as well. I
thank you for any comments you can make!