Parallel BLAST
William R. Pearson wrp at alpha0.bioch.virginia.edu
Sun Apr 14 19:32:20 PDT 2002
> Why is it that BLAST is not available for MPI/PVM? I would think
> clusters would be the perfect host for such an application.
> Is it that there is no need because BLAST is already so fast and
> no one wants to break the database out onto node-resident disks?
> Or is it that BLAST is kept running on single processor or shared memory
> machines so that the DB is always in memory ready to roll without
> loading and doing the same for a cluster is not worth it
> because the same trick is difficult to do on a node given the current
> way clusters are built? I assume the same is true for FASTA?

I suspect that BLAST is not available for MPI/PVM because (1) it is too fast, and (2) there is not much demand for it. 95% of the time, BLAST is almost an in-memory grep (the other 5% of the time it is working on the things it is looking for). Sequence comparison is embarrassingly parallel, and very easily threaded. Distributing the sequence databases and collecting results has more overhead (there probably aren't many distributed grep programs either).

FASTA is 5-10X slower than BLAST, and Smith-Waterman is another 5-20X slower than FASTA. Here, the communications overhead is low relative to the compute time, and distributed systems work OK for FASTA, and great for Smith-Waterman (where the overhead fraction is very small).

Of course, it is a lot easier to compile a threaded program, and just run it, than it is to install and configure the MPI or PVM environment and the programs to run in it. Bioinformatics software is often run by computer-savvy biologists, not high-performance computing folks, and not having to install and configure PVM/MPI is a big advantage.

The NCBI probably does not make a PVM/MPI parallel BLAST because there is very little demand for it, and it does not meet their computational needs.

Bill Pearson
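
The overhead argument in the message above can be put into rough numbers. What follows is only a back-of-the-envelope sketch in Python, not a measurement: the speed ratios are the ones quoted in the post, while the cluster size (`NODES`), the fixed distribution cost (`OVERHEAD`), and the function name `overhead_fraction` are assumptions made up for illustration. It also assumes the comparison work divides perfectly across nodes while the cost of distributing the database and collecting results does not shrink.

```python
# Relative single-CPU search times, in units of one BLAST search.
# Ranges are taken from the post: FASTA is 5-10X slower than BLAST,
# and Smith-Waterman is another 5-20X slower than FASTA.
RELATIVE_COST = {
    "BLAST": (1.0, 1.0),
    "FASTA": (5.0, 10.0),
    "Smith-Waterman": (5.0 * 5.0, 10.0 * 20.0),
}


def overhead_fraction(compute_cost, nodes, overhead):
    """Fraction of wall-clock time spent on distribution overhead.

    Assumption: compute divides evenly across nodes, while the
    overhead is a fixed cost per search (hypothetical model).
    """
    per_node_compute = compute_cost / nodes
    return overhead / (overhead + per_node_compute)


if __name__ == "__main__":
    NODES = 16      # assumed cluster size
    OVERHEAD = 1.0  # assumed: overhead costs as much as one BLAST search
    for name, (low, high) in RELATIVE_COST.items():
        worst = overhead_fraction(low, NODES, OVERHEAD)
        best = overhead_fraction(high, NODES, OVERHEAD)
        print(f"{name:15s} overhead fraction: {best:.2f} to {worst:.2f}")
```

Under these assumed numbers the overhead fraction falls steadily from BLAST to FASTA to Smith-Waterman, which is the shape of the argument in the post. The absolute values depend entirely on the assumed `OVERHEAD` and `NODES`.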
More information about the Beowulf mailing list
