[Beowulf] Strange SGE scheduling problem

Schoenefeld, Keith schoenk at utulsa.edu
Thu Jul 24 19:07:18 PDT 2008


This definitely looked promising, but unfortunately it didn't work.  I
added the appropriate export lines to my qsub file, and when that didn't
help I checked the mvapich.conf file and confirmed that processor
affinity was disabled.  I wonder if I can turn it on and make it work,
but the cluster is full at the moment, so I can't test it.
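
For anyone wanting to check whether affinity is really off for tasks that are already running, here is a minimal sketch.  It assumes a Linux node whose /proc/<pid>/status has a Cpus_allowed_list field; the function name allowed_cpus is made up for illustration.

```shell
#!/bin/sh
# Print the list of CPUs the kernel allows a given process to run on.
# Assumption: Linux with /proc mounted and a Cpus_allowed_list field
# in /proc/<pid>/status.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/$1/status"
}

# Demonstrate on this shell itself.  For MPI tasks, pass their PIDs
# instead (e.g. taken from ps or top on the compute node).
echo "pid $$ may run on CPUs: $(allowed_cpus $$)"
```

If two jobs on the same node both report a narrow range covering the same cores, the tasks are still being pinned; a list covering every core on the node would suggest affinity is off and the problem lies elsewhere.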

-- KS

-----Original Message-----
From: Shannon V. Davidson [mailto:svdavidson at charter.net] 
Sent: Wednesday, July 23, 2008 4:02 PM
To: Schoenefeld, Keith
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] Strange SGE scheduling problem

Schoenefeld, Keith wrote:
> My cluster has 8 slots (cores) per node in the form of two quad-core
> processors.  Only recently we've started running jobs on it that
> require 12 slots.  We've noticed significant speed problems running
> multiple 12-slot jobs, and quickly discovered that the node that was
> running 4 slots on one job and 4 slots on another job was running both
> jobs on the same processor cores (i.e. both job1 and job2 were running
> on CPUs #0-#3, and CPUs #4-#7 were left idling).  The result is that
> the jobs were competing for time on half the processors that were
> available.
>
> In addition, a 4-slot job started well after the 12-slot job has
> ramped up results in the same problem (both the 12-slot job and the
> 4-slot job get assigned to the same slots on a given node).
>
> Any insight as to what is occurring here and how I could prevent it
> from happening?  We are using SGE + mvapich 1.0 and a PE that has the
> $fill_up allocation rule.
>
> I have also posted this question to the hpc_training-l at georgetown.edu
> mailing list, so my apologies to anyone who gets this email multiple
> times.
>   

This sounds like MVAPICH is assigning your MPI tasks to your CPUs
starting with CPU#0.  If you are going to run multiple MVAPICH jobs on
the same host, turn off CPU affinity by starting the MPI tasks with the
environment variables VIADEV_USE_AFFINITY=0 and VIADEV_ENABLE_AFFINITY=0.
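
As a rough sketch of what that could look like in an SGE job script: the PE name "mvapich", the program name ./my_app and the hostfile path are placeholders rather than anything from this thread, and the launch line is only a guess at a typical mpirun_rsh invocation, so it is left commented out.

```shell
#!/bin/sh
# "mvapich" below is a placeholder PE name; use your site's PE.
#$ -cwd
#$ -pe mvapich 12

# Ask MVAPICH not to pin MPI tasks to cores, so that two jobs sharing
# a node do not both bind to CPUs #0-#3.
export VIADEV_USE_AFFINITY=0
export VIADEV_ENABLE_AFFINITY=0

# Record the settings in the job's output file for later debugging.
echo "VIADEV_USE_AFFINITY=$VIADEV_USE_AFFINITY"
echo "VIADEV_ENABLE_AFFINITY=$VIADEV_ENABLE_AFFINITY"

# Site-specific launch line, left as a comment.  The hostfile path and
# ./my_app are assumptions; adjust to what your PE start script provides.
#   mpirun_rsh -np "$NSLOTS" -hostfile "$TMPDIR/machines" \
#       VIADEV_USE_AFFINITY=0 VIADEV_ENABLE_AFFINITY=0 ./my_app
```

Variables exported in the job script may not reach tasks that an rsh/ssh-based launcher starts on other nodes, which is why the commented launch line also passes them on the launcher's command line.  Whether that is needed depends on how your launcher and PE are set up.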

Cheers,
Shannon

> Any help is appreciated.
>
> -- KS
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
>   

-- 
_________________________________________

Shannon V. Davidson <sdavidson at appro.com>
Software Engineer     Appro International
636-633-0380 (office)  443-383-0331 (fax)
_________________________________________