Scyld Beowulf / Diskless nodes / Installation trouble

Senthil Kandasamy senthilk at engin.umich.edu
Wed Jan 2 17:53:04 PST 2002


Hi Guys,

Hopefully someone can help me out.
First of all, I am a chemical engineer/biophysicist who is fairly familiar
with Linux.
I am trying to install/fix Beowulf on a cluster recently purchased by our
research group.
This cluster was bought before I joined the group, and Scyld Beowulf had
been installed on it (improperly).
Since no one else in our group was interested in parallel computing, nobody
had noticed that, although one can send computational jobs to the individual
nodes, the cluster cannot handle parallel jobs across multiple nodes (the
error I get when I run mpirun is "could not connect to host").

We have 1 master + 15 diskless nodes, all dual-processor machines.
Scyld Beowulf (without support, i.e. the $2 version) has been
installed on it.
However, I suspect that the NFS mounting on the individual nodes has not
been done correctly.
Since I do not have any documentation (I could not find any on the
installation disk) on how to set up diskless nodes, I am kind of helpless.
The resources on the net and newsgroups have not been very helpful.
I tried to reinstall from the Scyld/Red Hat CD on the cluster, but the setup
process never really seems to be concerned with NFS mounting.
Once the setup is finished, the nodes are up and running and can handle
individual jobs using bpsh.
But I can never connect to the nodes when I try to run a parallel job using 
mpirun.

Is there any definitive (and up-to-date) documentation/howto on how to
install a diskless Beowulf cluster?
Any help would be greatly appreciated. It just kills me to see ~30 GFlops
just sitting there unutilized while I try to find computer time on other
supercomputers.

Thanks.

Senthil 



