[Beowulf] Strange SGE scheduling problem

Schoenefeld, Keith schoenk at utulsa.edu
Thu Jul 24 19:07:18 PDT 2008


This definitely looked promising, but unfortunately it didn't work.  I
added the appropriate export lines to my qsub file, and when that didn't
help I checked the mvapich.conf file and confirmed that processor
affinity was disabled.  I wonder if I can turn it on and make it work,
but the cluster is full at the moment, so I can't test it.
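
Independent of what the config file says, one way to check whether ranks are actually pinned is to ask the kernel which cores a running process is allowed to use. The sketch below is only an illustration, not part of SGE or MVAPICH: the helper names `allowed_cpus` and `looks_pinned` are hypothetical, and it assumes a Linux node with Python 3.3 or later, where `os.sched_getaffinity` is available.

```python
import os


def allowed_cpus(pid=0):
    """Return the sorted CPU ids the process may run on (pid 0 = this process)."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(pid))
    # Fallback for platforms without sched_getaffinity: assume no pinning.
    return list(range(os.cpu_count() or 1))


def looks_pinned(pid=0):
    """True if the process is restricted to fewer CPUs than the machine reports."""
    return len(allowed_cpus(pid)) < (os.cpu_count() or 1)


if __name__ == "__main__":
    print("allowed CPUs:", allowed_cpus())
    print("pinned:", looks_pinned())
```

Run against the PID of an MPI rank on a shared node, a result such as a single core id would suggest that affinity is still in effect for that rank, whatever the config file says.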

-- KS

-----Original Message-----
From: Shannon V. Davidson [mailto:svdavidson at charter.net] 
Sent: Wednesday, July 23, 2008 4:02 PM
To: Schoenefeld, Keith
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] Strange SGE scheduling problem

Schoenefeld, Keith wrote:
> My cluster has 8 slots (cores)/node in the form of two quad-core
> processors. Only recently we've started running jobs on it that require
> 12 slots.  We've noticed significant speed problems running multiple
> 12 slot jobs, and quickly discovered that a node that was running 4
> slots on one job and 4 slots on another job was running both jobs on the
> same processor cores (i.e. both job1 and job2 were running on CPUs
> #0-#3, and CPUs #4-#7 were left idling).  The result is that the jobs
> were competing for time on half the processors that were available.
>
> In addition, a 4 slot job started well after the 12 slot job has ramped
> up results in the same problem (both the 12 slot job and the four slot
> job get assigned to the same slots on a given node).
>
> Any insight as to what is occurring here and how I could prevent it from
> happening?  We are using SGE + mvapich 1.0 and a PE that has the
> $fill_up allocation rule.
>
> I have also posted this question to the hpc_training-l at georgetown.edu
> mailing list, so my apologies for people who get this email multiple
> times.
>   

This sounds like MVAPICH is assigning your MPI tasks to your CPUs 
starting with CPU#0.  If you are going to run multiple MVAPICH jobs on 
the same host, turn off CPU affinity by starting the MPI tasks with the 
environment variables VIADEV_USE_AFFINITY=0 and VIADEV_ENABLE_AFFINITY=0.
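
To make the diagnosis above concrete, here is a toy model of the collision. It is a sketch under a stated assumption, not MVAPICH's actual binding code: it assumes that with affinity enabled each job binds its local rank i to core i, starting from CPU#0, which matches the symptom described in the original post. The names `pinned_from_zero` and `core_load` are hypothetical.

```python
from collections import Counter

CORES_PER_NODE = 8  # two quad-core processors, as in the original post


def pinned_from_zero(ranks_on_node):
    """Assumed model of affinity enabled: local rank i is bound to core i."""
    return list(range(ranks_on_node))


def core_load(jobs):
    """Count ranks bound to each core, given {job name: ranks on this node}."""
    load = Counter()
    for ranks in jobs.values():
        load.update(pinned_from_zero(ranks))
    return [load.get(core, 0) for core in range(CORES_PER_NODE)]


if __name__ == "__main__":
    # Two 12 slot jobs under $fill_up each leave 4 ranks on the shared node.
    print(core_load({"job1": 4, "job2": 4}))  # prints [2, 2, 2, 2, 0, 0, 0, 0]
```

Under this model, cores 0-3 each carry two ranks while cores 4-7 sit idle, which is the pattern reported. With affinity disabled, the kernel scheduler is free to spread the eight ranks over all eight cores.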

Cheers,
Shannon

> Any help is appreciated.
>
> -- KS

-- 
_________________________________________

Shannon V. Davidson <sdavidson at appro.com>
Software Engineer     Appro International
636-633-0380 (office)  443-383-0331 (fax)
_________________________________________





