[Beowulf] Puzzling Intel mpi behavior with slurm

Prentice Bisbal pbisbal at pppl.gov
Fri Apr 6 12:37:28 PDT 2018


See the URL below for a good overview of how Slurm works:

https://slurm.schedmd.com/quickstart.html

As I understand it, tasks are started by slurmd, the Slurm daemon that 
runs on each compute node. ssh is not involved at all.
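
A rough way to see the difference from inside a job: processes that Slurm 
launches inherit variables such as SLURM_JOB_ID and SLURM_JOB_NODELIST, and 
a launcher can use their presence to decide that it does not need ssh. The 
sketch below only illustrates that reasoning. The helper names 
(slurm_context, likely_launcher) are hypothetical and are not part of Slurm 
or of any MPI library.

```python
import os

# Environment variables that Slurm sets for the processes it launches.
SLURM_VARS = ("SLURM_JOB_ID", "SLURM_JOB_NODELIST", "SLURM_NTASKS")


def slurm_context(env=None):
    """Return the Slurm variables present in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return {name: env[name] for name in SLURM_VARS if name in env}


def likely_launcher(env=None):
    """Guess which remote-launch path applies (hypothetical helper).

    Inside a Slurm allocation the tasks can be started through slurmd;
    outside of one, a hostfile-based mpirun typically falls back to ssh.
    """
    return "slurmd" if "SLURM_JOB_ID" in slurm_context(env) else "ssh"


if __name__ == "__main__":
    print(slurm_context())
    print(likely_launcher())
```

Run on a login node this should report ssh; run from a batch script it 
should report slurmd, which matches the behavior described in the question.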

SGE does the same thing with 'tight integration'. The tasks are started 
on the compute nodes by sge_execd, which spawns an sge_shepherd process, 
which then spawns the actual task.

To really complicate things, you should look at the Process Management 
Interface (PMI). This is a middle layer between Slurm (or another 
scheduler) and the MPI tasks. It's a standardized abstraction layer that 
makes writing MPI implementations and schedulers easier. It also 
adds to the startup time of MPI jobs, which is not insignificant for 
large jobs.

www.mcs.anl.gov/papers/P1760.pdf
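
To give a feel for what PMI does at startup, here is a toy model of its 
key-value exchange: each rank publishes how it can be reached, all ranks 
synchronize, and then each rank looks up its peers. This is a simplified 
sketch, not the real PMI API. The class ToyKVS, the function 
exchange_addresses, and the addresses are made up for illustration.

```python
class ToyKVS:
    """In-memory stand-in for the key-value space a process manager hosts."""

    def __init__(self):
        self._pending = {}    # puts that are not yet visible to other ranks
        self._committed = {}  # entries visible after fence()

    def put(self, key, value):
        self._pending[key] = value

    def fence(self):
        # A real fence is a collective across all ranks; here it only
        # publishes the pending puts.
        self._committed.update(self._pending)
        self._pending.clear()

    def get(self, key):
        return self._committed[key]


def exchange_addresses(nranks):
    """Simulate nranks ranks publishing and looking up contact addresses."""
    kvs = ToyKVS()
    for rank in range(nranks):
        kvs.put(f"addr-{rank}", f"node{rank}:5000")  # made-up addresses
    kvs.fence()
    # Every rank reads every peer's entry: nranks * nranks lookups in total.
    return {
        rank: [kvs.get(f"addr-{peer}") for peer in range(nranks)]
        for rank in range(nranks)
    }


if __name__ == "__main__":
    print(exchange_addresses(3))
```

In this toy version the number of lookups grows with the square of the 
rank count, which hints at why the exchange can matter for the startup 
time of large jobs.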

Prentice

On 04/05/2018 11:10 AM, Faraz Hussain wrote:
> Here's something quite baffling. I have a cluster running slurm but 
> have not set up passwordless ssh for a user yet. So when the user runs 
> "mpirun -n 2 -hostfile hosts hostname", it will hang because of the ssh 
> issue. That is expected.
>
> Now the baffling thing is the mpirun command works inside a slurm 
> script! How can it work if passwordless ssh has not been configured? 
> Does slurm use some different authentication (munge?) to login to the 
> hosts and execute the hostname command?
>
> Or does slurm have some fancy behind-the-scenes integration with Intel 
> MPI?
