BLAST for beowulf
pillsbury at turbogenomics.com
Mon Apr 23 06:44:20 PDT 2001
There is another commercial version of BLAST that was not mentioned:
TurboBLAST from Turbogenomics. TurboBLAST addresses many of the
objectives outlined by Christopher.
1) I want to run a lot of BLAST queries in batches.
TurboBLAST performs its own batch queuing, so there is no need for LSF
or PBS.
2) I want more speed on a single BLAST query.
TurboBLAST partitions input sequences (i.e. queries) and databases to
overcome memory limitations.
3) I have a BIG DNA database to search through.
Same as above: partitioning the database lets each piece fit in memory.
4) I want to set up a web-interface BLAST service on a cluster for
TurboBLAST comes with a web interface for BLAST, similar to the NCBI
BLAST interface at http://www.ncbi.nlm.nih.gov/blast/
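To make the idea of partitioning input sequences concrete, here is a
minimal sketch of dealing the queries in a multi-FASTA file out to a
fixed number of workers. It is an illustration only, not how TurboBLAST
does it internally; the function names read_fasta and partition_queries
are hypothetical, and round-robin assignment is an assumption chosen for
simplicity.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    seq: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def partition_queries(records: Iterable[Record], n_workers: int) -> List[List[Record]]:
    """Deal queries out round-robin so each worker gets a similar count."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    batches: List[List[Record]] = [[] for _ in range(n_workers)]
    for i, record in enumerate(records):
        batches[i % n_workers].append(record)
    return batches
```

Each batch can then be written to its own file and searched as a
separate job. Round-robin ignores sequence length, so a length-aware
assignment may balance the load better when query sizes vary a lot.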
265 Church Street
New Haven, CT 06510
----- Original Message -----
From: "Christopher Hogue" <hogue at mshri.on.ca>
To: "gregory j pryzby" <greg at pryzby.org>
Cc: <beowulf at beowulf.org>
Sent: Thursday, April 19, 2001 8:44 PM
Subject: Re: BLAST or wu-blast for beowulf?
> Hi folks
> Sorry that I haven't found time to answer this before, been very busy
> setting up our new company here in Toronto.
> The short answer is that so far there are only commercial
> implementations available (www.computefarm.com or www.sgi.com).
> Otherwise you can run BLAST under PBS and script it yourself, or set up
> the www-based CGIs for BLAST and run them behind a load balancer.
> The long "archive-quality" answer follows...
> BLAST or WU-BLAST are bioinformatics applications that compare protein
> or DNA sequences to databases with DNA or proteins to find
> similarities. The original programs are highly optimized for
> multiprocessor machines like Sun and SGI boxes upon which they were
> originally developed.
> The BLAST executables (original, non clustered versions) are at
> and WU-BLAST is at
> We call a BLAST job a "query": one sequence being compared to one
> database.
> There are several issues about running BLAST on a cluster, and different
> implementation objectives - The answer is it depends on what you want
> clustered BLAST to do! These vary quite a bit, and require different
> Here are some examples of what your objectives might be:
> 1) I want to run a lot of BLAST queries in batches.
> 2) I want more speed on a single BLAST query.
> 3) I have a BIG DNA database to search through.
> 4) I want to set up a web-interface BLAST service on a cluster for
> In all cases, the implementation also needs scripts to do the daily
> updating of databases stored on the local node hard disks. Figure on
> doing some work here; Perl helps.
> I address these situations:
> 1) I want to run a lot of BLAST queries.
> Then you want a compute farm approach. Many people use load sharing
> software like LSF or PBS to execute BLAST on compute farms. You will
> also need to make scripts to ftp download and update the databases on
> all the nodes as a regular process or a cron job.
> 2) I want more speed on a single BLAST query.
> BLAST becomes I/O bound very quickly on an SMP machine, and doesn't
> really scale that well on a cluster for a single query. It is already
> multithreaded. Amdahl's law gets you very quickly in BLAST if you try
> interprocess communication as a model for speeding it up, so forget it.
> So if you want speed, add memory, faster CPUs or more of them, or chunk
> the database into pieces (see 3 below). I suggest running multithreaded
> BLAST on dual CPU nodes with sufficient disk and memory to store the
> databases. Remember to use the processor number argument too; BLAST
> needs to be told how many CPUs to run on.
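As a sketch of the processor number argument, the helper below builds a
command line for NCBI's blastall, where -a sets the number of
processors. The helper name blastall_command and the file names in the
usage note are hypothetical, and WU-BLAST uses a different option
syntax, so check the help output of the build you actually run.

```python
import os
from typing import List, Optional


def blastall_command(program: str, database: str, query_path: str,
                     out_path: str, cpus: Optional[int] = None) -> List[str]:
    """Build an argument list for NCBI blastall with an explicit CPU count."""
    if cpus is None:
        # Assumption: use every CPU the node reports, falling back to 1.
        cpus = os.cpu_count() or 1
    return [
        "blastall",
        "-p", program,     # search program, e.g. blastn or blastp
        "-d", database,    # database name
        "-i", query_path,  # query file
        "-o", out_path,    # output file
        "-a", str(cpus),   # number of processors to use
    ]
```

For a dual CPU node, blastall_command("blastn", "nt", "query.fa",
"query.out", cpus=2) gives a list that can be passed to subprocess.run.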
> 3) I have a BIG DNA database to search through and must partition it.
> People who use BLAST on protein databases have smaller memory
> requirements than those using BLAST on DNA databases. The DNA databases
> are much larger, and in commercial companies can be up to several tens
> of gigabytes. Companies often set up SMP machines with lots of RAM as
> BLAST servers, and they are typically not Linux boxes.
> Databases that don't fit in memory often cause the computers to thrash,
> especially on multiprocessor machines. For example, a dual CPU node
> with 128Gb RAM with two processes running will thrash horribly on a
> large DNA database as each process competes to load the database chunk
> it is working on into the same block of memory.
> BLAST uses memory-mapped I/O, so that multiple instances can use the
> same data in memory, and it works best when the whole database fits in
> memory and multiple processes can have at it.
> Blackstone Computing (www.computefarm.com) makes a clustered commercial
> version of BLAST that apparently operates using a redeployment of
> memory-mapped I/O. When it needs a piece of a BLAST database, it seems
> to broadcast a request for that file to the cluster and grab any copy
> already held in memory on another node through a socket. So it does a
> memory-memory transfer rather than a disk-memory transfer. I have not
> tried this implementation. It also may require some heavy scripting to
> break up the BLAST databases to match your cluster node size and
> memory. Figure on doing this on a daily update cycle.
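The scripting needed to break up a database can be sketched as a greedy
split by total sequence length. This is a minimal illustration under
stated assumptions, not what any commercial product does: the name
chunk_database is hypothetical, and max_residues is a stand-in for
whatever budget matches the memory on your nodes.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def chunk_database(records: Iterable[Record], max_residues: int) -> List[List[Record]]:
    """Pack sequences, in order, into chunks of at most max_residues each.

    A single sequence longer than the budget still gets a chunk of its own.
    """
    if max_residues < 1:
        raise ValueError("max_residues must be at least 1")
    chunks: List[List[Record]] = []
    current: List[Record] = []
    size = 0
    for header, seq in records:
        # Start a new chunk when adding this sequence would exceed the budget.
        if current and size + len(seq) > max_residues:
            chunks.append(current)
            current, size = [], 0
        current.append((header, seq))
        size += len(seq)
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be written out as FASTA and formatted as its own
BLAST database. Keep in mind that BLAST's statistics depend on database
size, so scores from separate chunks need care when merged.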
> Anyhow, the memory-mapped I/O trick is an interesting one that could, I
> think, be implemented somewhere at the Linux kernel level as a general
> purpose cluster utility.
> 4) I want to set up a web-interface BLAST server.
> This is a common desire, but is not really a cluster issue. A good
> single CPU machine can do this nicely for a few casual users; again,
> put enough RAM in it.
> Look here for precompiled executables for the CGI versions of BLAST.
> They have executable cgi's for Linux, Tru64, SGI and Solaris.
> If you set these up with a load balancer on several nodes, you may have
> what you need to serve more users.
> Christopher Hogue, Ph.D.
> CIO MDS Proteomics
> On leave from the Samuel Lunenfeld Research Inst. Mt. Sinai, Toronto.
> gregory j pryzby wrote:
> > I am looking for information (w/o much success) to see if there is a
> > version of BLAST that will run on a beowulf cluster.
> > --
> > greg pryzby greg at pryzby dot org
> > ach tee tee pee colon slash slash pryzby dot org slash
> > fingerprint: 8A1A DB90 869F 5DD1 D6E9 EEB6 C156 6B04 849F A86F
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit