[Beowulf] Re: scaling
Joe Landman landman at scalableinformatics.com
Mon Jan 16 19:00:08 PST 2006
Jim Lux wrote:

> I think that unforeseen scaling problems are probably the single biggest
> challenge with HPC.

Yes. Absolutely. This is why, in the Bioinformatics benchmark system, I have purposely scaled the benchmark sizes from tiny (which shouldn't be used for much more than proof that it runs correctly) to quite large. I need to update the gargantuan mpiblast test. So far I don't know whether anyone has been able to run that test successfully.

It is very hard to shake bugs out of a system when you don't pound on it in the same way that your program will. We have in the past seen people post linpack/hpcc results and "passing" test reports for clusters that were unable to perform their primary function due to bugs/errors in the build of the system. Nothing catches problems like real users really using the system.

> If it's not the plethora of files, or crippling
> interprocessor comm needs, it's something like timing races and implied
> barriers (in a brute force master doling out identical work packets to
> all the slaves.. they all finish at the same time).

Yup. Way way back (2000-ish) in ct-blast we had these neat corner cases where we would have a nice distribution of sequence sizes, apart from the few gargantuan sequences. We used some neat techniques to try to deal with the load imbalance (various sorting/sampling methods for the input sequences), and that helped somewhat, but better techniques were needed for the monster sequences.

You could see the load imbalance in the queuing system. The long running jobs would keep running, and running, and running ... To a degree you could predict which jobs would take a long time: run time was a function of the input sequence length. With a little bit of program intelligence, you could bubble these to the start of the queue. That would make the delay before the first results arrive large, but the load balance would be better.
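A rough sketch of the "bubble the long jobs to the start of the queue" idea, for illustration only. This is not the ct-blast code. It assumes run time is simply proportional to sequence length, that a free worker always takes the next job in the queue, and the names (makespan, longest_first) and the workload numbers are made up for the example.

```python
import heapq


def makespan(job_costs, n_workers):
    """Simulate a simple queue: each job goes to the worker that frees up first.

    Returns the time at which the last worker finishes.
    """
    finish_times = [0.0] * n_workers
    heapq.heapify(finish_times)
    for cost in job_costs:
        earliest = heapq.heappop(finish_times)  # worker that is free soonest
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


def longest_first(seq_lengths):
    """Reorder the queue so the predicted-longest jobs start first."""
    return sorted(seq_lengths, reverse=True)


# Hypothetical workload: many ordinary sequences plus two monsters that
# happen to sit at the end of the input (cost assumed equal to length).
lengths = [300] * 12 + [5000, 4000]
n_workers = 4

arrival_order = makespan(lengths, n_workers)
sorted_order = makespan(longest_first(lengths), n_workers)

print("input order:  ", arrival_order)
print("longest first:", sorted_order)
```

With this toy input the monsters start late in input order and everyone waits on them, while longest-first lets the short jobs fill in around them. The total run can never be shorter than the single largest job, which is why the really huge sequences still need something better than reordering.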
Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
