[Beowulf] clustering using xen virtualized machines
pc7 at sanger.ac.uk
Tue Jan 26 07:24:25 PST 2010
On the AWS ec2 side, we've been performing a range of tests including
full genome sequencing pipelines across varying numbers of nodes and
storage. The biggest challenge to date has been IO, particularly if the
smaller image systems are used. Where jobs are highly CPU bound, with
little network (or heaven forbid disk) activity, things go reasonably
well and have the potential to scale. Once IO becomes a factor, the
scaling decreases.
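To put a rough shape on the scaling remark above, here is a minimal sketch
using an Amdahl-style model. It assumes that a fixed fraction of a job's
single-node runtime is IO that does not get any faster as nodes are added;
that is a simplification for illustration, not something we measured, and
the function names and IO fractions are made up for the example.

```python
def speedup(nodes, io_fraction):
    """Amdahl-style speedup on `nodes` nodes.

    Assumption (hypothetical model): `io_fraction` of the single-node
    runtime is shared IO that does not parallelise; the rest is CPU
    work that divides evenly across nodes.
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    return 1.0 / (io_fraction + (1.0 - io_fraction) / nodes)


def efficiency(nodes, io_fraction):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup(nodes, io_fraction) / nodes


if __name__ == "__main__":
    # Illustrative IO fractions only, not measured values.
    for f in (0.0, 0.05, 0.25):
        row = ", ".join(
            f"{n} nodes: {speedup(n, f):.1f}x" for n in (1, 8, 64)
        )
        print(f"IO fraction {f:.2f} -> {row}")
```

Under this model a purely CPU-bound job scales linearly, while even a
modest IO share caps the achievable speedup at 1 / io_fraction no matter
how many nodes are thrown at it.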
We've also had a run around with Xen. It requires more network
fiddling to automate rollouts (at least in our environment), but it
works OK, especially when paired with something like openQRM. It's a
ways off being as polished as VMware, and some of the interesting memory
handling doesn't appear to be all there. As a result, performance
degrades fairly severely as the number of hosts and the IO-hungry
application load increase. Regrettably, I don't have enough useful data
to present on this at the moment and, as always, YMMV.
> I've been using Amazon ec2 for clustering for months now, from a software perspective it's very similar to running real hardware. For my needs (development) it's perfectly adequate, I've not benchmarked it against running the same code on the raw hardware though.