couple of questions
Ted Sariyski tsariysk at craft-tech.com
Tue Feb 18 09:09:23 PST 2003
Hi,

I have a couple of questions:

1. I'm running a cluster of 100 i386 CPUs with channel-bonded 100BaseT Ethernet. I use RedHat 7.2 and an NFS-mounted file server as user storage. Occasionally some processes enter a "D" state forever, and the only way I have found to get rid of them is to reboot the node. What may cause this problem, and is there a more intelligent solution to it?

2. I'm planning to rebuild the cluster and to boot the nodes over the network. I cannot afford Scyld or PBSPro, so I am looking for a solution for a diskless cluster with mpich and OpenPBS. I believe that if I keep a local hard disk on each node I should be able to provide the 'local space' required for OpenPBS to run. Any comments?

3. I'm shopping for a parallel debugger and an accurate parallel profiler with minimal performance overhead. Jumpshot seems to be inappropriate for profiling a 100-CPU job. Any recommendations?

Thanks in advance,
--
Ted Sariyski
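For the first question, a "D" state in ps output denotes uninterruptible sleep, typically a process blocked inside the kernel waiting on I/O. As a starting point for diagnosis, the following is a minimal sketch of how one might list such processes on a node by reading /proc/<pid>/stat. The function names parse_stat_line and find_d_state are hypothetical, and the script only reports the processes; it does not attempt to clear them.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' in the line.
    left, right = line.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    state = right.split()[0]  # first field after the command name
    return int(pid_text), comm, state


def find_d_state(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError):
            continue  # process exited, or the entry was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    # Assumes a Linux node with /proc mounted.
    for pid, comm in find_d_state():
        print(pid, comm)
```

Run periodically on each node, a report like this could show whether the stuck processes coincide with file server or network trouble; that correlation is an assumption to check, not a diagnosis.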
More information about the Beowulf mailing list
