[Beowulf] Why need of a scheduler??
bill at platform.com
Thu Nov 29 07:48:50 PST 2007
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]On Behalf Of amjad ali
Sent: November 29, 2007 8:14 AM
To: beowulf at beowulf.org
Subject: [Beowulf] Why need of a scheduler??
I want to develop and run my parallel (MPI-based) code on a Beowulf cluster. I do not have to worry about many users logging on to the cluster simultaneously; suppose that I am free to use the cluster exclusively for my single parallel application.
1) Do I really need a cluster scheduler installed on the cluster? Should I use a scheduler?
[Bill Bryce] If you are the only user of the cluster then you don't *really* need a scheduler. However, if you want to queue up lots of jobs on your cluster and 'just let it run', then a scheduler is needed.
2) Is there any effect on, or benefit to, running a parallel code with or without a cluster job scheduler?
[Bill Bryce] There are some benefits. Most schedulers/resource managers are integrated with various implementations of MPI so that task startup and task cleanup are controlled by the scheduler/resource manager. What this means for you is less time spent cleaning up dead processes on the nodes in your cluster. Schedulers and resource managers also have job control, which allows you to suspend and resume the entire parallel job - sometimes this is useful.
The scheduler and resource manager are really useful when you want to run many jobs on your cluster. When this happens, the scheduler decides which nodes in the cluster will be allocated to your parallel job, based on the scheduling policy (or policies) enabled in the scheduler. Overall, a good scheduler and resource manager combination will keep the whole cluster busy running your work, whereas manually running jobs relies on *you* to be the scheduler, which is less efficient than letting the software do it.
One of the distinct advantages of a scheduler/resource manager shows up when things go wrong. All scheduler/resource managers provide mechanisms to handle job failure and host failure, and to re-run the job automatically. So if your job fails at, say, 3 in the morning, the system will automatically recover and start your work over again.
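To make the 're-run automatically' idea concrete, here is a minimal Python sketch of the retry loop a scheduler effectively runs for you. It is not how any particular product implements it; the names run_with_requeue and make_flaky_job, and the limit of three attempts, are assumptions made up for illustration.

```python
from typing import Callable, Tuple


def run_with_requeue(job: Callable[[], int], max_attempts: int = 3) -> Tuple[bool, int]:
    """Run `job` until it returns exit status 0 or the attempts run out.

    `job` stands in for launching a parallel job and returning its exit
    status. Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            status = job()
        except Exception:
            # Treat a crash (for example a lost host) like a non-zero exit.
            status = 1
        if status == 0:
            return True, attempt
    return False, max_attempts


def make_flaky_job(failures_before_success: int) -> Callable[[], int]:
    """Hypothetical job that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def job() -> int:
        state["calls"] += 1
        return 1 if state["calls"] <= failures_before_success else 0

    return job
```

Without a scheduler, you are the one who notices the failure in the morning and types the launch command again.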
3) How do you differentiate between a cluster scheduler and a cluster resource manager?
[Bill Bryce] I'm sure you'll get more responses on this question, but here is my two cents' worth...
The cluster scheduler is only concerned with scheduling work onto the available resources (or onto resources that will become available in the future). FCFS, Fairshare, Backfill, pre-emption and Quality of Service are all valid scheduling policies used by a scheduler to determine when your job will run and which resources the job will use on the cluster.
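As a rough illustration of how two of these policies differ, the Python sketch below decides which queued jobs may start right now under strict FCFS and under a simplified backfill. It is a toy model and not the algorithm of any product named in this thread: the Job fields, the function names, and the rule that a backfilled job must finish before the blocked head job could start are all assumptions chosen to keep it short.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    nodes: int    # nodes requested
    runtime: int  # requested run time limit, arbitrary time units


def fcfs(queue: List[Job], free_nodes: int) -> List[str]:
    """Start jobs strictly in queue order; stop at the first that does not fit."""
    started = []
    for job in queue:
        if job.nodes > free_nodes:
            break
        free_nodes -= job.nodes
        started.append(job.name)
    return started


def backfill(queue: List[Job], free_nodes: int,
             running: List[Tuple[int, int]]) -> List[str]:
    """Like FCFS, but let later jobs run if they cannot delay the blocked head job.

    `running` holds (nodes, time_remaining) for each job already executing.
    """
    started = []
    pending = list(queue)
    running = list(running)
    # First behave exactly like FCFS.
    while pending and pending[0].nodes <= free_nodes:
        job = pending.pop(0)
        free_nodes -= job.nodes
        started.append(job.name)
        running.append((job.nodes, job.runtime))
    if not pending:
        return started
    head = pending[0]
    # Find the earliest time at which enough nodes are free for the head job.
    available = free_nodes
    head_start = None
    for nodes, remaining in sorted(running, key=lambda r: r[1]):
        available += nodes
        if available >= head.nodes:
            head_start = remaining
            break
    if head_start is None:
        return started  # head job asks for more nodes than will ever be free
    # Backfill: a later job may start now if it fits and ends before head_start.
    for job in pending[1:]:
        if job.nodes <= free_nodes and job.runtime <= head_start:
            free_nodes -= job.nodes
            started.append(job.name)
    return started
```

With 2 free nodes, one running job holding 6 nodes for 10 more time units, and a queue of A (4 nodes), B (2 nodes, 8 units), C (2 nodes, 20 units), FCFS starts nothing because A blocks the queue, while this backfill starts B in the otherwise idle nodes.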
The resource manager is responsible for discovering, monitoring and aggregating resources in your cluster, so that this information can be used by the scheduler to make decisions. Resources can be pretty much anything, but to keep it simple, resources are the hosts, CPUs, cores, memory, network interfaces, etc.
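A small Python sketch of the 'aggregating' part, under the assumption that each node reports its core count, memory and whether it is up. The Host record and the aggregate function are hypothetical names for illustration; a real resource manager gathers this from daemons running on each node.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Host:
    name: str
    cores: int
    memory_gb: int
    up: bool = True  # result of the most recent health check


def aggregate(hosts: List[Host]) -> Dict[str, int]:
    """Summarise the resources of all hosts that are currently up."""
    live = [h for h in hosts if h.up]
    return {
        "hosts": len(live),
        "cores": sum(h.cores for h in live),
        "memory_gb": sum(h.memory_gb for h in live),
    }
```

The scheduler then works from this summary (and from the per-host detail behind it) rather than probing the nodes itself.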
4) If there is any significant difference between a scheduler and a resource manager, then please tell me which of these fall into which category: OpenPBS, PBS Professional, SGE, Maui, Moab, Torque, Scyld, LSF, SLURM, etc.
[Bill Bryce] Ah, now this is where you get different answers depending on who you talk to...
OpenPBS, PBSPro, SGE, LSF, Torque - are all both resource managers and schedulers.
Maui & Moab - are schedulers and require a resource manager (so one of the above).
Scyld - is much more than just a resource manager/scheduler. It is an entire collection of software for running a Beowulf cluster - you could say Scyld is the whole 'cluster stack of software'. Resource managers and schedulers can run on a Scyld cluster.
SLURM - is a resource manager and is quite good at doing resource allocations for parallel jobs. SLURM is used in conjunction with a scheduler (or one of the scheduler/resource managers).
5) What is meant by "PBS/SGE/LSF supports integration with the Maui scheduler"?
[Bill Bryce] It simply means that Maui can sit on top of PBS/SGE/LSF and 'take control of scheduling' by turning off or ignoring the built-in scheduler in PBS/SGE/LSF.
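In code terms, 'sitting on top' looks roughly like the Python sketch below: the resource manager keeps the job queue and owns the nodes, and the ordering decision is a replaceable piece. This is only a picture of the idea and not the actual interface between Maui and PBS/SGE/LSF; the class and function names are invented for illustration.

```python
from typing import Callable, List, Optional, Tuple

PendingJob = Tuple[str, int]  # (job name, priority)
Policy = Callable[[List[PendingJob]], List[PendingJob]]


def builtin_fifo(pending: List[PendingJob]) -> List[PendingJob]:
    """Built-in scheduler: run jobs in submission order."""
    return list(pending)


def external_priority(pending: List[PendingJob]) -> List[PendingJob]:
    """Hypothetical external scheduler: highest priority first."""
    return sorted(pending, key=lambda job: job[1], reverse=True)


class ResourceManager:
    """Holds the queue; delegates the ordering decision to a scheduler."""

    def __init__(self) -> None:
        self.pending: List[PendingJob] = []
        self.external: Optional[Policy] = None

    def submit(self, name: str, priority: int) -> None:
        self.pending.append((name, priority))

    def attach_scheduler(self, policy: Policy) -> None:
        # From now on the built-in scheduler is ignored.
        self.external = policy

    def run_order(self) -> List[PendingJob]:
        policy = self.external if self.external is not None else builtin_fifo
        return policy(self.pending)
```

Submitting works the same way before and after the external scheduler is attached; only the decision about what runs next changes hands.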
Precise, easy and brief reply requested. Thanks to all.