[Beowulf] mpich2 complain about nodes that i dont use
Ru-Zhen Li r.li at qmul.ac.uk
Fri Sep 30 06:58:42 PDT 2005
- Previous message: [Beowulf] Re: UPS & power supply instability
- Next message: [Beowulf] mpich2 complain about nodes that i dont use
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Dear all,

I am using mpich2 on a Linux cluster, and I keep getting errors like the following:

  rank 14 in job 2 cn128_57798 caused collective abort of all ranks
  exit status of rank 14: killed by signal 9

or

  mpdrun_cn145: cannot connect to local mpd (/tmp/mpd2.console_lrz); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)

There are 160 nodes on the cluster. I used "mpdboot -n -f" to start MPI. Because there were always errors when I tried to boot every node, I defined only 64 nodes in the mpd.hosts file. The nodes named in the errors above are not in my mpd.hosts file, nor in the mpiexec command I use to run my application.

Does anybody have any experience with this?

Thanks a lot!

Best regards,
ruzhen
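To confirm which hosts named in the errors are really absent from mpd.hosts, the two lists can be compared mechanically. The following is only a minimal sketch in Python: the function names are hypothetical, the regular expressions are guessed from the two error messages quoted in this post, and the hosts file is assumed to hold one host per line, optionally followed by ":ncpus", with "#" starting a comment.

```python
import re

# Patterns guessed from the two error messages in this post (an assumption,
# not an official mpich2 log format):
#   "mpdrun_cn145: cannot connect to local mpd ..."
#   "rank 14 in job 2 cn128_57798 caused collective abort ..."
HOST_PATTERNS = [
    re.compile(r"mpdrun_([A-Za-z0-9.-]+):"),
    re.compile(r"in job \d+\s+([A-Za-z0-9.-]+)_\d+"),
]


def parse_mpd_hosts(text):
    """Return the set of host names listed in an mpd.hosts-style file."""
    hosts = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        # Keep the first field and strip an optional ":ncpus" suffix.
        hosts.add(line.split()[0].split(":", 1)[0])
    return hosts


def hosts_in_errors(log_text):
    """Return the set of host names that appear in the error output."""
    found = set()
    for pattern in HOST_PATTERNS:
        found.update(pattern.findall(log_text))
    return found


def unexpected_hosts(hosts_text, log_text):
    """Hosts mentioned in the errors but not listed in the hosts file."""
    return sorted(hosts_in_errors(log_text) - parse_mpd_hosts(hosts_text))
```

If this reports hosts such as cn128 or cn145, one possible explanation (not confirmed here) is that processes from an earlier mpdboot attempt are still involved on those nodes; running mpdtrace to list the hosts in the current ring, and mpdallexit before booting again, may help narrow it down.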
