[Beowulf] Torrents for HPC

Tue Jun 12 15:59:46 PDT 2012

On 06/12/2012 03:47 PM, Skylar Thompson wrote:
> We manage this by having users run this in the same Grid Engine
> parallel environment they run their job in. This means they're
> guaranteed to run the sync job on the same nodes their actual job runs
> on. The copied files change so slowly that even on 1GbE the network
> is rarely a bottleneck, since we only transfer files that have changed.

Our problem is that we have many users, and we don't want 50,000
30-minute jobs to turn into one giant job that defeats the priority
system while it runs.  With an array job, a user can get 100% of the
cluster when it's idle, then quickly decay to their fair share when
other, higher-priority jobs arrive.

That way we can keep the cluster 100% utilized, yet new jobs (from users
using less than their fair share) can still get through the queue
quickly, even though that queue might well be months long.
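
To put rough numbers on that, here is a toy Python sketch (not our
scheduler, just arithmetic). It assumes a hypothetical 1,000-slot
cluster, no preemption, tasks that start in synchronized waves, and a
scheduler that hands the next free slot to the higher-priority job. The
function name and the slot count are made up for illustration.

```python
import math


def wait_for_new_job(num_tasks, task_minutes, slots, arrival_minute, as_array):
    """Minutes a higher-priority job arriving at arrival_minute waits for a slot.

    Toy model (all assumptions, not a real scheduler):
      - one user fills every slot starting at minute 0
      - tasks run in synchronized waves of `slots` tasks each
      - running work is never preempted
    """
    waves = math.ceil(num_tasks / slots)
    total_minutes = waves * task_minutes

    # The big workload has already finished: nothing to wait for.
    if arrival_minute >= total_minutes:
        return 0

    if as_array:
        # Array job: slots free up at the end of each wave, so the new
        # job waits only until the next wave boundary after it arrives.
        next_boundary = (arrival_minute // task_minutes + 1) * task_minutes
        return next_boundary - arrival_minute

    # One giant job: every slot is held until the whole workload is done.
    return total_minutes - arrival_minute


if __name__ == "__main__":
    # 50,000 tasks of 30 minutes each on an assumed 1,000 slots.
    for as_array in (True, False):
        wait = wait_for_new_job(50_000, 30, 1_000, 45, as_array)
        label = "array job" if as_array else "giant job"
        print(f"{label}: new job waits {wait} minutes")
```

Under those assumptions, a job arriving 45 minutes in waits 15 minutes
behind the array job but 1,455 minutes (about a day) behind the giant
job, which is the gap we care about.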