[Beowulf] Please help to setup Beowulf

Fri Feb 20 15:16:38 PST 2009

Chris,

Thank you for your comments. I just want to clarify that the SOLiD
system from Applied Biosystems ships with Scyld ClusterWare, not Rocks.

Regards

Arend

-----Original Message-----
From: Chris Dagdigian [mailto:dag at sonsorol.org] 
Sent: Tuesday, February 17, 2009 11:52 AM
To: Michael Will
Cc: Beowulf List
Subject: Re: [Beowulf] Please help to setup Beowulf

On Feb 17, 2009, at 2:29 PM, Michael Will wrote:

> What features differentiate SGE in support of life science workflow
> from LSF/PBS/Torque/Condor?
>
> Michael

They all have their pros and cons. Heck, I'm still an LSF zealot when
cost is not an issue, as Platform has the best APIs, documentation and
layered products for the industry types who need to stand these things
up in full production mode within enterprise organizations that may
have varying levels of Linux/HPC/MPI experience.

The short list of why Grid Engine became popular in the life sciences:

LSF: great product but commercial-only and a pricing model that can  
get out of hand (I remember when having more than 4GB RAM in a Linux  
1U pushed me into an obscene license tier ...).

Condor: Did not have the fine grained policy and resource allocation  
tools that make life easier when you need to have a shared cluster  
resource supporting multiple competing users, groups, projects and  
workflows. The policy tools for LSF/SGE/PBS were more capable. When I
saw Condor out in the field, it seemed to be used mostly at academic
sites and in situations where cycles from PC systems were being
aggregated across LAN, metro and WAN-scale distances. Bio problems
tend to be more I/O or memory bound rather than CPU bound so most bio  
clusters tend to be closely situated racks of gear.

PBS/TORQUE: I'll ignore the FUD from back in the day when people were  
claiming that PBS lost jobs and data at high scale and concentrate on  
just one key differentiator. At the time when life science was
transitioning from big SGI Altix and Tru64 AlphaServer machines to
commodity compute farms, PBS did not support the concept of array
jobs. If there was one overwhelming cluster resource management
feature essential for bio work, it would be array tasks. This is
because we tend to have a very high
concentration of batch/serial workflows that involve running an  
application many many times in a row with varying input files and  
parameter options. The cliche example in bioinformatics is needing to  
run half a million blast searches. Without array task scheduling this  
would require 500,000 individual job submissions. The fact that I  
never met a serious PBS shop that had not made local custom changes to  
the source code also soured me on deploying it when I was putting such  
things into conservative IT shops who were still new and fearful of  
Linux.
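
To make the array task idea concrete, here is a minimal sketch of how a single task might pick its own input. Grid Engine sets the SGE_TASK_ID environment variable for each task of an array job (and sets it to "undefined" for a non-array job). Everything else here is an assumption for illustration only: the queries/query_NNNNNN.fa file layout, the "nt" database name, and the choice of the legacy blastall command line.

```python
import os


def current_task_id(environ=os.environ):
    """Return the 1-based array task index, defaulting to 1.

    Grid Engine exports SGE_TASK_ID per task; for a non-array job
    the value is the literal string "undefined".
    """
    raw = environ.get("SGE_TASK_ID")
    if raw is None or raw == "undefined":
        return 1
    return int(raw)


def query_path_for_task(task_id, total_tasks,
                        pattern="queries/query_{:06d}.fa"):
    """Map a task index to its input file (hypothetical naming scheme)."""
    if not 1 <= task_id <= total_tasks:
        raise ValueError(f"task id {task_id} outside 1..{total_tasks}")
    return pattern.format(task_id)


def blast_command(query_path, database="nt"):
    """Build the command for one search; the database name is an assumption."""
    return ["blastall", "-p", "blastn", "-d", database,
            "-i", query_path, "-o", query_path + ".out"]


if __name__ == "__main__":
    task_id = current_task_id()
    query = query_path_for_task(task_id, total_tasks=500000)
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(blast_command(query)))
```

With something like this wrapped in a job script (call it run_blast.sh, a made-up name), a single "qsub -t 1-500000 run_blast.sh" stands in for the 500,000 separate submissions, and the scheduler tracks the whole set as one job.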

We also don't make heavy use of Globus-style WAN-scale capital "G"
grid computing, as many of our workflows and pipelines are actually
performance bound by the speed of storage rather than CPU or memory
issues. It was always easier, cheaper and more secure to colocate  
dedicated CPU resources local to fast storage rather than distribute  
things out as far as possible.

The big news in Bio-IT these days is actually the terabyte scale wet  
lab instruments such as confocal microscopes and next-gen DNA  
sequencing systems that can produce 1-3TB of raw data per experiment.  
Some of these lab instruments ship with software pipelines developed  
to run under Grid Engine. A popular example is the Solexa/Illumina
Genome Analyzer, which alone has driven SGE uptake in our field. A
notable exception is the SOLiD system, which (I think) ships with a
Windows front end that hides a back end Rocks cluster running either
PBS or TORQUE under the hood.

And from Mark:

> how about providing some useful content - for instance, what is it  
> that you think is especially valuable about sge?

Hopefully I've done some of that with this message. It basically boils  
down to the fact that at the time our field started using compute  
farms in a serious manner, SGE offered the best overall combination of  
features, price and fine grained resource allocation & policy control.  
I think what made us a bit different from some other use cases is our
heavy use of serial/batch workflows combined with our tendency to
require that our HPC infrastructures support multiple (and potentially
competing) workflows and pipelines, which made the policy/allocation
features key selection criteria. We also do little if any true
WAN-scale "grid" computing due to workflows that tend to be more
storage/IO bound than anything else. For people starting fresh with a
cluster scheduling layer who did not have an investment in time,
expertise and/or software licensing costs, Grid Engine turned out to
be a popular
choice. With that popularity came a good set of people in the  
community who can now support and configure these systems (as well as  
evangelize them) so the cycle is fairly self perpetuating.

General life science cluster cheat sheet:

- Workloads tend to be far more serial/batch in nature than true  
parallel
- Policy and resource allocation features are very important to people  
deploying these systems
- Storage speed is often more important than network speed or latency
- Fast interconnects are often used for cluster/distributed  
filesystems rather than application message passing
- Our MPI codes are often quite horrific from an efficiency/tuning
standpoint, so GigE works just as well as Myrinet or IB
- Exceptions to the MPI rule: computational chemistry, modeling and  
structure prediction (those fields have well written commercial MPI  
codes in use)
- Huge resistance to improved algorithms as scientists want to use  
*exactly* the same code that was used to publish the journal paper

-Chris