[Beowulf] Python libraries slow to load across Scyld cluster

Jeff White jaw171 at pitt.edu
Fri Jan 16 11:18:55 PST 2015


If you are using these libraries often and they exist on a remote server
(NFS or whatever), you may want to use the "libraries" or "prestage"
directives in Scyld's config to put them on compute nodes instead.
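Before changing the config, it may help to confirm that the slow modules really are being read from a network filesystem. Below is a minimal sketch for checking that on a compute node. It assumes Linux with a readable /proc/mounts and Python 2.7 or 3. It lists each loaded module's file next to the filesystem type of the mount it sits on (for example nfs versus a local type). The names parse_mounts, fs_type_for and report_module_filesystems are hypothetical helpers. They are not part of Scyld or of the Python standard library.

```python
import os
import sys


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text.

    Columns are: device, mount point, fs type, options, dump, pass.
    Escaped characters such as \\040 (space) are not decoded in this sketch.
    """
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[1], fields[2]))
    return mounts


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point that contains `path`.

    Using >= lets a later entry for the same mount point win, since the
    last mount over a directory is the one that is visible.
    """
    best_point, best_type = "", None
    for point, fs_type in mounts:
        prefix = point.rstrip("/") + "/"
        if (path == point or path.startswith(prefix)) and len(point) >= len(best_point):
            best_point, best_type = point, fs_type
    return best_type


def report_module_filesystems(mounts_file="/proc/mounts"):
    """Print each loaded module's file and the filesystem type it lives on."""
    with open(mounts_file) as f:
        mounts = parse_mounts(f.read())
    for name, module in sorted(list(sys.modules.items())):
        path = getattr(module, "__file__", None)
        if path:  # built-in modules (and namespace packages) have no file
            fs_type = fs_type_for(os.path.realpath(path), mounts)
            print("%-30s %-8s %s" % (name, fs_type, path))


if __name__ == "__main__":
    import decimal  # the slow import from the question
    report_module_filesystems()
```

To read the compute node's mounts rather than the head node's, launch it the same way as the timing scripts in the question, e.g. mpirun -host n0 python report_fs.py (report_fs.py is a hypothetical file name). This also assumes /proc/mounts is available on the node.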

Jeff White - GNU+Linux Systems Administrator
University of Pittsburgh - CSSD

On 01/15/2015 06:22 PM, Don Kirkby wrote:
> I'm new to the list, so please let me know if I'm asking in the wrong place.
>
> We're running a Scyld Beowulf cluster on CentOS 5.9, and I'm trying to
> run some Django admin commands on a compute node. The problem is that it
> can take three to five minutes to launch ten MPI processes across the
> four compute nodes and the head node. (We're using OpenMPI.)
>
> I traced the delay to smaller and smaller parts of the code until I
> created two example scripts that just import a Python library and print
> out timing information. Here's the first script that imports the
> collections module:
>
> from datetime import datetime
> t0 = datetime.now()
> print 'started at {}'.format(t0)
>
> import collections
> print 'imported at {}'.format(datetime.now() - t0)
>
> When I run that with mpirun -host n0 python
> cached_imports_collections.py, it initially takes about 10 seconds to
> import. However, repeated runs take less time, until it takes less than
> 0.01 seconds to import.
>
> Running an equivalent script to import the decimal module takes about 30
> seconds, and never speeds up like that. It may be too large to get
> cached completely. I ran beostatus while the script was running, and I
> didn't see the network traffic go over 100 kBps. For comparison, running
> wc on a 100MB file on a compute node causes the network traffic to go
> over 3000 kBps.
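One possible reading of the low beostatus numbers is that the cost comes from per-file latency rather than bandwidth. Each import searches the sys.path entries, so a module that pulls in many other modules makes many small requests. That is a hypothesis, not something established in this thread. Below is a minimal sketch for testing it. It times one import in a fresh interpreter and counts how many modules that import loaded. A fresh process is needed because a repeated import in the same interpreter is only a sys.modules lookup. The names time_import and CHILD are hypothetical. The child uses time.time() so the same code runs under Python 2.7 and 3. The sketch also assumes the interpreter at sys.executable can be started as a subprocess on the compute node, which may not hold on every Scyld setup.

```python
import subprocess
import sys

# Hypothetical child program, run with `python -c`: there sys.argv[0] is "-c",
# so sys.argv[1] is the module name passed after the program text.
CHILD = """
import sys, time
name = sys.argv[1]
before = set(sys.modules)
t0 = time.time()
__import__(name)
elapsed = time.time() - t0
print("%s %.6f %d" % (name, elapsed, len(set(sys.modules) - before)))
"""


def time_import(name, python=sys.executable):
    """Import `name` in a fresh interpreter; return (seconds, new module count).

    A fresh process is used because a second import in the same interpreter
    is just a sys.modules lookup and would always look instant.
    """
    out = subprocess.check_output([python, "-c", CHILD, name])
    _, seconds, new_modules = out.decode().split()
    return float(seconds), int(new_modules)


if __name__ == "__main__":
    for name in sys.argv[1:] or ["collections", "decimal"]:
        seconds, new_modules = time_import(name)
        print("%-12s %8.3f s  %3d modules loaded" % (name, seconds, new_modules))
```

The hypothesis gains some support if decimal loads many more modules than collections and the time roughly tracks that count rather than the size of the files. It does not prove that latency is the cause.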
>
> I looked in the Scyld admin guide (PDF) and the reference guide (PDF),
> and found the bplib command that manages which libraries are cached and
> not transmitted with the job processes. However, its list of library
> directories already includes the Python installation.
>
> Is there some way to increase the size of the bplib cache, or am I doing
> something inefficient in the way I launch my Python processes?
>
> Thanks,
> Don Kirkby
> British Columbia Centre for Excellence in HIV/AIDS
> cfenet.ubc.ca
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

