[Beowulf] Re: scaling
landman at scalableinformatics.com
Mon Jan 16 19:00:08 PST 2006
Jim Lux wrote:
> I think that unforeseen scaling problems are probably the single biggest
> challenge with HPC.
Yes. Absolutely. This is why in the Bioinformatics benchmark system, I
have purposely scaled the benchmark sizes from tiny (which shouldn't be
used for much more than proof that it runs correctly) to quite large. I
need to update the gargantuan mpiblast test. So far I don't know if
anyone was able to run that test successfully.
It is very hard to shake bugs out of a system when you don't pound on it
in the same way that your program will. We have in the past seen people
post linpack/hpcc results and "passing" test reports for clusters that
were unable to perform their primary function due to bugs/errors in the
build of the system.
Nothing catches problems like real users really using the system.
> If it's not the plethora of files, or crippling
> interprocessor comm needs, it's something like timing races and implied
> barriers (in a brute force master doling out identical work packets to
> all the slaves... they all finish at the same time).
Yup. Way way back (2000-ish) in ct-blast we had these neat corner cases
where we would have a nice distribution of sequence sizes, apart from
the few gargantuan sequences. We used some neat techniques to try to
deal with the load imbalance (various sorting/sampling methods for the
input sequences), and that helped somewhat, but better techniques were
needed for the monster sequences.
You could see the load imbalance in the queuing system. The long
running jobs would keep running, and running, and running ... You could,
to a degree, predict which jobs would take a long time: it was a function
of the input sequence length. With a little bit of program
intelligence, you could bubble these to the start of the queue. That would
make the delay before the initial results large, but the load balance would
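
As a rough illustration of bubbling the long jobs to the front of the queue, here is a minimal sketch. It is not the ct-blast code. It assumes that a job's run time is simply proportional to its input sequence length and that each job goes to whichever worker frees up first; the function name `makespan` and the sample lengths are made up for the example.

```python
import heapq


def makespan(seq_lengths, n_workers, longest_first=False):
    """Return the time at which the last worker finishes.

    Assumption (hypothetical cost model): a job's run time equals its
    sequence length. Each job is handed to the worker that frees up first.
    """
    # Optionally move the longest (predicted slowest) jobs to the front.
    if longest_first:
        jobs = sorted(seq_lengths, reverse=True)
    else:
        jobs = list(seq_lengths)

    # Min-heap holding the time at which each worker next becomes free.
    free_at = [0] * n_workers
    heapq.heapify(free_at)

    for cost in jobs:
        start = heapq.heappop(free_at)         # earliest available worker
        heapq.heappush(free_at, start + cost)  # busy until start + cost

    return max(free_at)


if __name__ == "__main__":
    lengths = [3, 3, 3, 3, 12]  # a few normal sequences plus one monster
    print(makespan(lengths, 2))                      # queue in arrival order
    print(makespan(lengths, 2, longest_first=True))  # monster goes first
```

With these made-up lengths on two workers, arrival order finishes at time 18 because the monster sequence starts last, while longest-first finishes at time 12, which is the best possible for 24 units of work on two workers.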
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615