[Beowulf] Torrents for HPC

Bill Broadley bill at cse.ucdavis.edu
Tue Jun 12 15:42:47 PDT 2012


Many thanks for the online and offline feedback.

I've been reviewing the mentioned alternatives.  From what I can tell 
none of them allow nodes to join/leave at random.  Our problem is that a 
user might submit 500-50,000 jobs that depend on a particular dataset 
and have a variable number of jobs/nodes running at any given time.  So 
ideally each node that a job lands on would do something like:
   1) Is this node subscribed to this dataset?  If not start a client.
   2) Is the dataset completely downloaded?  If not wait.
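
The two steps above could be sketched as a per-job prologue along the following lines. This is only an illustration, not our actual script: the names ensure_dataset, state_dir, start_client, and is_complete are hypothetical, and it assumes state_dir lives on node-local disk so that exclusive file creation is atomic.

```python
import os
import time


def ensure_dataset(state_dir, start_client, is_complete,
                   poll_seconds=5.0, timeout=None):
    """Subscribe this node to a dataset if needed, then wait for it.

    start_client and is_complete are caller-supplied callables
    (hypothetical hooks around whatever torrent client is in use).
    Returns True once the dataset is complete, False on timeout.
    """
    os.makedirs(state_dir, exist_ok=True)
    marker = os.path.join(state_dir, "subscribed")

    # Step 1: is this node subscribed?  If not, start a client.
    try:
        # O_EXCL makes creation atomic on a local filesystem, so only
        # the first job to land on this node starts a client.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        start_client()
    except FileExistsError:
        pass  # another job on this node already subscribed

    # Step 2: is the dataset completely downloaded?  If not, wait.
    deadline = None if timeout is None else time.monotonic() + timeout
    while not is_complete():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

Each job would call this before touching the data, so nodes can join whenever a job lands on them without anyone maintaining a list of nodes.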

Because of the node churn we didn't want the "send <file/dir> <list of
nodes>" approach.

We also wanted to handle multiple file transfers of multiple directories 
for multiple users at once.  From what I can tell, most (all?) other 
approaches assume a mostly idle network and don't robustly handle cases 
where 1/3rd of the nodes have highly contended links.

Because we are using the links for MPI, NFS, and torrents we didn't want 
to use an approach that wasn't robust with highly variable per node 
bandwidth.  Any comments on how well the various alternatives work with 
a busy network?  Seems like any tree based approach would have problems.

As for the torrent creation process: my small 5-disk RAID manages
300-400MB/sec, and torrent creation runs at around 80% of that.  The
creation code looks single threaded, but parallel friendly and easy to
parallelize.  From what I can tell, though, torrent creation is I/O
limited, at least for us.  I already have some parallel checksumming
code around from another project; I could likely tweak it to create
torrents if people out there think this is a real bottleneck.  I like
the torrent behavior of guaranteed file integrity and self-healing
files.

Using MPI does make quite a bit of sense for clusters with high speed
interconnects, although I suspect that being network bound for IO is
less of a problem there.  I'd consider it though; I do have SDR/DDR/QDR
clusters around, but so far (knock on wood) they are not IO limited.
I've done a fair bit of MPI programming, but I'm not sure it's
easy/possible to have nodes dynamically join/leave.  Worst case I guess
you could launch a thread/process for each pair of peers that wanted to
trade blocks and still use TCP for swapping metadata about which peers
to connect to and which blocks to trade.



