[Beowulf] Re: TCP connect error: ECONNREFUSED. - solved-
Jörg Saßmannshausen jorg.sassmannshausen at strath.ac.uk
Wed Apr 15 02:24:25 PDT 2009
- Previous message: [Beowulf] Moores Law is dying
- Next message: [Beowulf] Re: Beowulf Storage Node
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Dear all,

some time ago I contacted the list regarding the above problem. I would like to thank everyone who contributed towards the solution; I have finally found out what is going on.

The problem lies in the hostlist (which contains the nodes the job is going to run on, so the machinefile if you like) and in particular in its order. PBS-type schedulers (I have used TORQUE before) provide the $PBS_NODEFILE, which you only need to read out. I searched the internet for something similar, but all I could find was that SGE apparently writes the hostfile out to a file, so I used that. What I was not aware of at the time is that the vendor has its own script to generate that file, and (unfortunately for me) that script contains the command 'sort'. So the order of the nodes gets changed.

However, ddikick, the program which does the parallelisation, seems to be quite fussy about that, as the first node in the list becomes the master, which initiates all the other processes. Because the order is different from what SGE supplied, this leads to the bizarre situation that SGE starts the process on a 'slave' (from ddikick's point of view), and hence the ddikick master and the SGE master never speak to each other.

The solution was to use the $PE_HOSTFILE and read out the nodes from there, the same as I do with the $PBS_NODEFILE. It could not have been any easier _if_ I had known beforehand.

I thought I would share that with you, in case somebody is searching the list and finds my thread. :-)

All the best from Glasgow!

Jörg

--
*************************************************************
Jörg Saßmannshausen
Research Fellow
University of Strathclyde
Department of Pure and Applied Chemistry
295 Cathedral St.
Glasgow
G1 1XL

email: jorg.sassmannshausen at strath.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
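
For anyone finding this thread later, here is a minimal sketch in Python of reading the nodes out of $PE_HOSTFILE without changing their order. It assumes the usual SGE layout of one line per host with the hostname in the first column and the slot count in the second, and it assumes the parallel launcher wants one entry per slot, as in a $PBS_NODEFILE. The function name hosts_in_sge_order is made up for illustration, and how the resulting list is handed to ddikick is not covered here.

```python
import os


def hosts_in_sge_order(pe_hostfile_text):
    """Expand PE_HOSTFILE-style text into one hostname per slot.

    The order of the lines is kept exactly as the scheduler wrote it,
    so the first entry is the node on which SGE starts the job.
    No sorting is done anywhere.
    """
    hosts = []
    for line in pe_hostfile_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, slots = fields[0], int(fields[1])
        hosts.extend([name] * slots)
    return hosts


if __name__ == "__main__":
    # Inside an SGE parallel job, PE_HOSTFILE holds the path of the file.
    path = os.environ.get("PE_HOSTFILE")
    if path:
        with open(path) as fh:
            for host in hosts_in_sge_order(fh.read()):
                print(host)
```

The important point is only that nothing between SGE and the launcher reorders the list; piping the same data through 'sort' would reproduce the problem described above.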
