BLAST and FASTA benchmarks
William R. Pearson wrp at alpha0.bioch.virginia.edu
Sat Apr 13 13:11:25 PDT 2002
There was a bit of misinformation about the differences between the BLAST and FASTA programs for protein and DNA sequence comparison. Both BLAST and FASTA search for local sequence similarity; indeed, they have exactly the same goals, though they use somewhat different algorithms and statistical approaches.

The advantage of an ES40 or other large shared-memory machine for BLAST is that BLAST has been optimized for searching databases stored as large memory-mapped files, and it runs multithreaded. PVM and MPI versions of BLAST are not available, but it is important to remember that BLAST is extremely fast and highly optimized to go through a large amount of memory very quickly; it would be difficult to provide an equally efficient distributed version. Of course, a distributed-memory machine would be much cheaper.

PVM and MPI versions of FASTA are available. FASTA is actually a package of about a dozen programs that vary more than 100-fold in speed. It is easy to make efficient PVM/MPI versions of the slower algorithms (Smith-Waterman, TFASTY, TFASTX); parallel versions of the FASTA algorithm itself are less efficient.

How to benchmark BLAST and FASTA

As Greg Lindahl pointed out, the appropriate platform for BLAST (less so for FASTA) depends on the size of the database. Very few databases are larger than 2 Gb. (I think the person who said he had an 80 Gb database was mistaken: the largest publicly available sequence database, GenBank, currently has 17 Gb of sequence data.) Protein sequence databases, in contrast, are much smaller, typically 50 to 500 Mb.

If you would like to try searching some protein or DNA sequence databases, they are available from ftp.ncbi.nih.gov/blast/db. nr.Z and swissprot.Z are two representative protein sequence databases; nt.Z and est_mouse.Z are representative DNA databases. Simply select 10 to 100 sequences at random from these databases and run them against the full-size databases.

Bill Pearson
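The benchmarking recipe in the message above (pick 10 to 100 random query sequences from a database, then search each one against the full database) can be sketched in Python as follows. This is a minimal sketch, not part of the original post: it assumes the downloaded database has already been uncompressed to a plain FASTA text file, and the helper names (`read_fasta`, `sample_queries`, `write_fasta`, `time_search`) are hypothetical. It uses reservoir sampling so that a multi-gigabyte file is read once without being held in memory. The command passed to `time_search` is whatever BLAST or FASTA invocation you want to benchmark; no particular program options are implied.

```python
import random
import subprocess
import time
from typing import Iterable, Iterator, Sequence


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text lines.

    Assumes an uncompressed FASTA file; the .Z downloads must be
    uncompressed first.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def sample_queries(lines: Iterable[str], k: int, seed: int = 0) -> list[tuple[str, str]]:
    """Pick k records uniformly at random in one pass (reservoir sampling).

    A fixed seed keeps the query set identical across machines.
    """
    rng = random.Random(seed)
    reservoir: list[tuple[str, str]] = []
    for i, record in enumerate(read_fasta(lines)):
        if i < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


def write_fasta(records: Iterable[tuple[str, str]], width: int = 60) -> str:
    """Format records as FASTA text, wrapping sequences at `width` residues."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


def time_search(cmd: Sequence[str]) -> float:
    """Run one search command supplied by the caller; return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(list(cmd), check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start
```

For example, `with open("swissprot") as fh: queries = sample_queries(fh, 50)` draws 50 queries (the filename is a placeholder). Write them out with `write_fasta`, then call `time_search` once per query with the command line for each program and database being compared. Keeping the seed fixed means every platform is timed on the same queries.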
More information about the Beowulf mailing list
