What could be the performance of my cluster
Suraj Peri suraj_peri at yahoo.com
Sat Apr 13 03:21:52 PDT 2002
BLAST (Basic Local Alignment Search Tool) takes the query sequence (either protein or DNA) and tries to match small patches of it: let's say it breaks your sequence into small pieces of 6 letters and then tries to match them against the database index file. Once the BLAST algorithm finds a small match, it tries to extend that match along your query sequence and the database sequence. If the extension finds more matching letters, it computes a score and reports it. If it doesn't, the hit gets a low score, and low-scoring hits are not considered. Thus, in my opinion, it does many small calculations and finally shows the scores (with a P-value).

Interestingly, BLAST is considered a local alignment search tool because it matches bits of your query sequence and then extends them for more matches. Another program in the same family is FASTA (short for "FAST-All"), which also starts from short word matches and then joins them into a longer local alignment. Bill Pearson (its creator) made a PVM version of FASTA, and his students at Virginia are using it on a Beowulf cluster. (You can access that at ftp://ftp.virginia.edu/pub/fasta/)

In my case my database would be ~80 GB (I hope to serve this much data over NFS). I am planning to install the program on every node and then, using MPICH, have each node access the whole database over NFS.

I am new to this area, and I wonder whether the ideas I have are practical or not. We will start configuring our cluster some time in May.

cheers
suraj.

--- Robert Depenbrock <robert at bay13.de> wrote:
> Greg Lindahl wrote:
>
> Hi Greg,
>
> > On Fri, Apr 12, 2002 at 11:15:52AM -0600, Craig Tierney wrote:
> >
> > > Is the BLAST code something that spends lots of time doing lots of
> > > little calculations, or doing one big calculation? How important is
> > > the speed of access to the database? What is the memory footprint of
> > > the code when it runs on the DS20E?
> >
> > It depends.
> >
> > What BLAST does is compare a set of sequences against a big database of
> > sequences. The databases come in small, medium, and large (bigger than
> > 2 GByte) sizes; the sequences can either be a single sequence (imagine
> > a researcher looking up a single protein using a web interface) or a
> > large set of them. If it's a large set, the problem is embarrassingly
> > parallel.
> >
> > The BLAST implementation used by most people isn't parallel. It can be
> > fairly easily parallelized by dividing the big database up into pieces.
> >
> > People build fairly different clusters to run BLAST depending on their
> > details. The guys at Celera Genomics didn't want to use a parallel
> > version, and their database is bigger than 2 GBytes, so they bought
> > Alphas. Most people have small enough databases to fit into 2 GBytes,
> > but search against 1 sequence at a time, so they can't afford to read
> > the entire database over NFS every time, and they keep it on a local
> > disk.
>
> Do you have some sample proteins and databases?
>
> I would like to test some machines I have available to mess around a
> little bit (HP PA-RISC Series, SUN Sparc Fire, Itanium, PowerPC).
>
> I would like to build a little benchmark around these datasets.
>
> regards
> Robert Depenbrock
>
> --
> nic-hdl RD-RIPE
> http://www.bay13.de/
> e-mail: robert at bay13.de
> Fingerprint: 1CEF 67DC 52D7 252A 3BCD 9BC4 2C0E AC87 6830 F5DD

=====
PIL/BMB/SDU/DK
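The seed-and-extend idea described at the top of this message (cut the query into 6-letter words, look them up in an index of the database, then extend each match and drop low scores) can be sketched roughly as follows. This is only a toy illustration, not the actual BLAST code: it uses exact matches and counts matched letters as the score, whereas real BLAST uses substitution scores and statistical significance. The function names build_index, extend_hit and search, and the min_score cutoff, are made up for this sketch.

```python
from collections import defaultdict


def build_index(db_seq, k):
    """Map every k-letter word in db_seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(db_seq) - k + 1):
        index[db_seq[i:i + k]].append(i)
    return index


def extend_hit(query, db_seq, q_pos, d_pos, k):
    """Grow an exact k-letter match left and right while letters still agree.

    Returns (score, q_start, d_start); the score is simply the matched
    length, an assumption of this sketch rather than BLAST's real scoring.
    """
    q_start, d_start = q_pos, d_pos
    while q_start > 0 and d_start > 0 and query[q_start - 1] == db_seq[d_start - 1]:
        q_start -= 1
        d_start -= 1
    q_end, d_end = q_pos + k, d_pos + k
    while q_end < len(query) and d_end < len(db_seq) and query[q_end] == db_seq[d_end]:
        q_end += 1
        d_end += 1
    return q_end - q_start, q_start, d_start


def search(query, db_seq, k=6, min_score=8):
    """Seed with exact k-letter words, extend each seed, drop low scores."""
    index = build_index(db_seq, k)
    hits = set()  # several seeds can extend to the same region
    for q_pos in range(len(query) - k + 1):
        for d_pos in index.get(query[q_pos:q_pos + k], []):
            score, q_start, d_start = extend_hit(query, db_seq, q_pos, d_pos, k)
            if score >= min_score:
                hits.add((score, q_start, d_start))
    return sorted(hits, reverse=True)  # best score first
```

For example, search("CCGATTACAGACC", "TTTTGATTACAGATTTT") returns [(9, 2, 4)]: the 9-letter stretch GATTACAGA starts at position 2 in the query and position 4 in the database sequence. The point of the sketch is that the work consists of many small index lookups and short extensions, which is why the speed of access to the database index matters so much.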
