[Beowulf] Re: after update sgeexecd not starting correctly on reboot

Reuti reuti at staff.uni-marburg.de
Wed Nov 26 04:15:38 PST 2008


Hi David,

On 26.11.2008 at 01:08, David Mathog wrote:

>> I think maybe the NFS mounting is different, so that the remote_fs
>> prerequisite isn't really satisfied, even though the associated script
>> has run.  The sgeexecd script does include a test:
>>
>> while [ ! -d "$SGE_ROOT" -a $count -le 120 ]; do
>>    count=`expr $count + 1`
>>    sleep 1
>> done
>
> This seems to have been it.  Changing "$SGE_ROOT" to "$SGE_ROOT/bin"
> let SGE come up OK in a couple of consecutive reboots.  Not definitive
> proof that this was the issue, but at least it seems like progress.
> Apparently the SGE init script was getting to this part before
> $SGE_ROOT was actually mounted.  The -d test always passed, NFS
> mounted or not, and of course the SGE startup failed since none of that
> code from the remote system was reachable.  Just for kicks I added an
> echo line within the loop, so that if it sticks there it will show
> up on the console.
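
For the archive, here is a rough sketch of the kind of wait loop described above. It is not the actual sgeexecd code: the function name wait_for_dir, its arguments, and the echo text are made up for illustration. The idea is that the mount point directory itself exists whether or not NFS is mounted, so the test is made against a subdirectory that only appears once the remote filesystem is really there.

```shell
#!/bin/sh
# Sketch only: wait for a directory that exists solely on the mounted
# filesystem (e.g. "$SGE_ROOT/bin"), instead of the bare mount point,
# which passes `-d` even when nothing is mounted on it.
# wait_for_dir and its parameters are hypothetical names.

wait_for_dir() {
    dir=$1                # directory that must become visible
    max=${2:-120}         # maximum number of attempts
    interval=${3:-1}      # seconds to sleep between attempts
    count=0
    while [ ! -d "$dir" ] && [ "$count" -lt "$max" ]; do
        count=$((count + 1))
        # Show progress on the console so a hang here is visible at boot.
        echo "waiting for $dir ($count/$max)"
        sleep "$interval"
    done
    # Succeed only if the directory finally showed up.
    [ -d "$dir" ]
}

# Possible use inside an init script (assumes SGE_ROOT is already set):
#   wait_for_dir "$SGE_ROOT/bin" 120 || exit 1
```

Returning a non-zero status when the directory never appears lets the caller decide whether to abort the start-up rather than continue without the binaries.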

May I ask you to file an issue about this at
http://gridengine.sunsource.net/ ?

-- Reuti
