[Beowulf] picking out a job scheduler

Tue Jan 2 13:06:08 PST 2007

For what it's worth I'm a biased Grid Engine and Platform LSF user  ...

On Dec 29, 2006, at 11:40 AM, Nathan Moore wrote:

> I've presently set up a cluster of 5 AMD  dual-core linux boxes for  
> my students (at a small college).  I've got  MPICH running, shared  
> NIS/NFS home directories etc.  After reading the MPICH installation  
> guide and manual, I can't say I understand how to deploy MPICH for  
> my students to use.  So far as I can tell, there is no load balancing  
> or migration of processes in the library, and so now I'm trying to  
> figure out what piece of software to add to the  cluster to (for  
> example) prevent the starting of an MPI job when there's already  
> another job running.
>
> (1) Is openPBS or gridengine the appropriate tool to use for a  
> multi-user system where mpich is available?  Are there better  
> scheduling options?
>

Both should be fine, although if you are considering *PBS you should
look at both Torque (a fork of OpenPBS, I think) and PBSPro
(commercial, but last time I checked they had very good options for
academic sites).  I can't speak intelligently about the PBS variants
these days... it's been too long since I've been hands-on.

Lots of people use Grid Engine with MPICH using both loose and tight  
integration methods. The mailing list  
(users at gridengine.sunsource.net) has a very helpful community with an  
excellent signal to noise ratio.

Despite being an SGE zealot there are times when I can make both a  
technical and business argument for why Platform LSF is the "best"  
solution for a particular project or problem -- you may want to add  
this to your evaluation plate if you are considering (at all)  
commercial options. If not, don't sweat it.  For a small cluster in  
an academic environment LSF may be hard to justify but if you can get  
good academic pricing it is often worthwhile to crunch the numbers --  
LSF in some cases can 'win' from a features, lower-administrative-burden
and support perspective, but this is a case-by-case thing.

> (1.5) Can mortals install and configure Gridengine?  Thus far it  
> seems too wonderful for me to understand.

Grid Engine is easy to install. I've posted an article here that  
covers the stuff I wish someone had told me beforehand about SGE:

"Things to think about before installing Grid Engine"

http://gridengine.info/articles/2005/09/29/things-to-think-about-before-installing

... it boils down to the fact that during installation SGE is  
unusually sensitive to issues regarding hostnames and forward/reverse  
DNS resolution.
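
As a rough pre-install sanity check, something like the following Python
sketch can confirm that each node's name resolves forward to an address
and that the address resolves back to the same name (or one of its
aliases). The function name check_host and the injectable
resolve/reverse parameters are my own illustration, not part of Grid
Engine, and a pass here does not guarantee that the SGE installer will
be happy.

```python
import socket
import sys


def check_host(name, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return (ok, detail): forward-resolve name, then reverse-resolve the address."""
    try:
        addr = resolve(name)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        return False, f"forward lookup of {name} failed: {exc}"
    try:
        rname, aliases, _ = reverse(addr)
    except OSError as exc:
        return False, f"reverse lookup of {addr} failed: {exc}"
    # Accept the canonical name or any alias, compared case-insensitively.
    known = {rname.lower(), *(a.lower() for a in aliases)}
    if name.lower() not in known:
        return False, f"{name} -> {addr} -> {rname} (mismatch)"
    return True, f"{name} -> {addr} -> {rname}"


if __name__ == "__main__":
    # Hostnames are passed on the command line, one per argument.
    for host in sys.argv[1:]:
        ok, detail = check_host(host)
        print("OK  " if ok else "FAIL", detail)
```

Run it on every node with the full list of cluster hostnames as
arguments, using whichever form of the name (short or fully qualified)
you plan to give the installer.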

>
> (2) Also, if my cluster is made up of a mix of single and dual  
> processor machines, what's the proper way to tell mpd about that  
> topology?

Depends on which MPI implementation and which of the many available  
methods you are using to bootstrap the process.

>
> (3) It's likely that in the future I'll have part-time access to  
> another cluster of dual-boot (XP/linux) machines.  The machines  
> will default to booting to Linux, but will occasionally (5-20 hours  
> a week) be used as windows workstations by a console user (when a  
> user is finished, they'll restart the machine and it will boot back  
> to linux).  If cluster nodes are available in this sort of  
> unpredictable and intermittent way, can they be used as compute  
> nodes in some fashion? Will gridengine/PBS/??? take care of this  
> sort of process migration?
>

Grid Engine will not transparently preserve and migrate running jobs  
off of machines that get bounced suddenly.  This sort of transparent  
and automatic checkpointing and migration is actually pretty hard to  
do in practice.  If you know in advance which machines are going to  
be shut down and rebooted into windows then there are tools in all  
the common scheduling packages for "draining" a particular machine or  
queue.  You can also "kill and reschedule" jobs that are running on  
any one queue instance or cluster queue. One can even do this on a  
calendar basis when the  "need windows" schedule is predictable (does  
not seem possible in your case).  If the running cluster jobs are  
short lived so that you don't have a big runtime investment then you  
can bounce machines whenever you want - Grid Engine can be told to  
reschedule failed jobs automatically to a different available host --  
the hard case to deal with is the very long running jobs that (a)
can't be reliably checkpointed or (b) are difficult to
suspend/resume/migrate due to the parallel application itself.
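
To make the "drain" and "kill and reschedule" options concrete, here is
a small Python sketch that only builds the Grid Engine command lines for
one host and prints them for review; it does not run anything. The
helper names are hypothetical. The qmod -d / -rq / -e and qsub -r y
options are the ones I would expect on an SGE 6.x install, but treat
them as an assumption and check qmod(1) and qsub(1) for your version
(rescheduling also normally needs manager privileges and jobs or queues
that are flagged as rerunnable).

```python
import shlex


def drain_commands(host):
    """Commands to stop new work landing on host and push running jobs elsewhere."""
    queue_instances = f"*@{host}"  # every cluster queue's instance on this host
    return [
        ["qmod", "-d", queue_instances],   # disable: no new jobs are dispatched here
        ["qmod", "-rq", queue_instances],  # reschedule jobs currently running here
    ]


def enable_command(host):
    """Command to re-enable the host's queue instances once it is back in Linux."""
    return ["qmod", "-e", f"*@{host}"]


def rerunnable_submit(script):
    """Submit a job flagged as rerunnable so it can be restarted on another host."""
    return ["qsub", "-r", "y", script]


if __name__ == "__main__":
    host = "node03"  # hypothetical node name
    for cmd in drain_commands(host) + [enable_command(host)]:
        print(shlex.join(cmd))  # quoted, so the * is safe to paste into a shell
    print(shlex.join(rerunnable_submit("job.sh")))  # job.sh is a placeholder
```

This only helps when you know ahead of time that a machine is about to
be taken away; it does nothing for a node that is power-cycled without
warning.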

The answer may be application specific in your case.

Regards,
Chris

> best regards,
>
> Nathan
>
>
>
> - - - - - - - - - - - - - - - - - - - - - - -
> Nathan Moore
> Physics
> Winona State University
> nmoore at winona.edu
> AIM:nmoorewsu