[Beowulf] Re: OT: informatics software for linux clusters
Joe Landman landman at scalableinformatics.com
Mon May 15 13:09:18 PDT 2006
- Previous message: [Beowulf] Re: OT: informatics software for linux clusters
- Next message: [Beowulf] Re: OT: informatics software for linux clusters
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
David Mathog wrote:
>> Scalable Informatics has released Scalable HMMer, an optimized
>> version of HMMer 2.3.2 that is 1.6-2.5x faster per node on benchmark
>> tests run on Opteron systems.
>
> Did you remove the memory organization changes SE put in to make
> it run better on the Altivec Macs?  Those really made life hard when I
> was trying to optimize this code to run

Hi Dave:

We didn't start from the Altivec patch.  It is in a large "ifdef" in
fast_algorithms.c.  I didn't see memory organization changes in the
non-Altivec code (though there was a line about some issue with the
Intel compilers).  We started from the base P7Viterbi in
fast_algorithms.c, and rewrote the loops a bit.

> on our Beowulf with Athlon MP processors.  The problem was the
> P7Viterbi data structures didn't fit entirely into cache (no matter

I was worried about cache thrashing (and still am) with our changes.
The code isn't complex, but the particulars of the original
implementation weren't terribly cache friendly.

> how it was organized) and this resulted in toxic query lengths that ran
> several times slower.  That is, take a query sequence
> of length 1000, run hmmpfam, nip off the last character, run it again,
> etc.  It was anything but a smooth function of execution time vs. query

Ohhh....  I would love a test like that.  Is this something that you
found in general with the baseline code or with the Altivec'ed code?
This would be very good to include in our regression testing...

> length.  Working around the Altivec stuff helped some but didn't
> entirely eliminate the effect.  Probably the bigger cache on the
> Opteron would eliminate this effect for smaller sequences but I'm
> guessing you could still run into it with a long query.

We ran an 8000 letter query length as our longest test.  If you have
some specific test cases which exercise bugs, please let me know what
they are and I will see if we can use them.
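
The scan David describes (time a query, drop the last residue, run it again) could be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not code from HMMer or Scalable HMMer: the names `time_query_fn`, `scan_query_lengths`, `flag_slow_lengths`, and `stub_time_query` are hypothetical, the stub timer returns placeholder values rather than measurements, and the "slower than `factor` times its faster neighbour" rule is just one simple heuristic for spotting lengths that break an otherwise smooth curve.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: run one search on the first `len` residues of
 * `seq` and return the elapsed time in seconds. A real version might
 * write the prefix to a FASTA file, run hmmpfam on it, and time the run. */
typedef double (*time_query_fn)(const char *seq, size_t len);

/* Placeholder timer so the harness can be exercised without HMMer
 * installed. It returns the length itself; these are NOT measurements. */
double stub_time_query(const char *seq, size_t len)
{
    (void)seq;
    return (double)len;
}

/* Time every prefix from full_len down to min_len, one residue at a time.
 * secs[i] receives the time for length (full_len - i).
 * Returns the number of runs performed. */
size_t scan_query_lengths(const char *seq, size_t full_len, size_t min_len,
                          time_query_fn time_query, double *secs)
{
    assert(seq && time_query && secs);
    assert(min_len >= 1 && min_len <= full_len);

    size_t n = 0;
    for (size_t len = full_len; len >= min_len; len--)
        secs[n++] = time_query(seq, len);
    return n;
}

/* Heuristic (an assumption, not from the thread): mark run i as slow when
 * it took more than `factor` times as long as its faster neighbour.
 * slow[i] is set to 1 or 0; returns how many runs were marked. */
size_t flag_slow_lengths(const double *secs, size_t n, double factor, int *slow)
{
    assert(secs && slow);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (n < 2) {            /* nothing to compare against */
            slow[i] = 0;
            continue;
        }
        double ref;
        if (i == 0)
            ref = secs[1];
        else if (i == n - 1)
            ref = secs[n - 2];
        else
            ref = secs[i - 1] < secs[i + 1] ? secs[i - 1] : secs[i + 1];

        slow[i] = secs[i] > factor * ref;
        count += (size_t)slow[i];
    }
    return count;
}
```

For a regression test, one would replace `stub_time_query` with a timer that actually runs the search, call `scan_query_lengths` over the lengths of interest, and then check whether `flag_slow_lengths` marks any of them.
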
>
> This has nothing to do with the Parallel implementation though, it
> was a data size vs. cache size effect.

That is an issue with this code.  The Athlon has a 256k L2 last I
remember, and a 128k L1.  Rather hard to keep lots of stuff in cache.

Right now the big issue we are running into for another aspect of this
project is the lack of a vector max/min function in SSE*.  (If anyone
from AMD/Intel is listening, this is a *big* issue, and I even have a
rough idea how to do it "quickly" in SSE at the expense of many SSE
registers.)

Joe

>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
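
On the missing vector max/min: assuming the gap Joe means is an element-wise max over packed signed 32-bit integers (SSE2 provides `_mm_max_epi16` and `_mm_max_epu8` but no 32-bit integer form), one common workaround is a compare followed by a bitwise select. The following is a minimal sketch of that idiom, not necessarily the approach Joe has in mind, and it may not be the operation he is missing (a horizontal max across one register would be a different problem). The names `max_epi32_sse2` and `max4_i32` are hypothetical, and the scalar branch is there only so the example also builds where SSE2 is unavailable.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Element-wise signed 32-bit max built from SSE2 operations only:
 * a greater-than compare produces an all-ones mask in each lane where
 * a > b, and the mask then selects a or b per lane. */
static __m128i max_epi32_sse2(__m128i a, __m128i b)
{
    __m128i a_gt_b = _mm_cmpgt_epi32(a, b);
    return _mm_or_si128(_mm_and_si128(a_gt_b, a),     /* a where a > b  */
                        _mm_andnot_si128(a_gt_b, b)); /* b elsewhere    */
}
#endif

/* out[i] = max(a[i], b[i]) for i = 0..3. */
void max4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    assert(a && b && out);
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, max_epi32_sse2(va, vb));
#else
    /* Portable fallback with the same result. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] > b[i] ? a[i] : b[i];
#endif
}
```

The select costs one extra register for the mask plus temporaries for the two masked operands, which fits the register pressure Joe mentions. An element-wise min follows by swapping which operand each mask keeps.
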
