[Beowulf] Re: Why Do Clusters Suck?

Tue Mar 22 15:04:14 PST 2005

Joe Landman wrote:
> 
> 
> Craig Tierney wrote:
>
>> Our biggest problem is the immaturity of development
>> tools.  Another way to put that is "my compiler doesn't reproduce
>> the bugs in the other compilers my users are accustomed to using"
>> or "Fortran isn't a standard, it is a suggestion". It is a rare creature
>> that writes clean, portable code.  It is all too common to hear
>> developers tell me things like "does it work if you turn off bounds
>> checking?".  I spend way too much time with new users trying to explain
>> to them the difference between 'code porting' and 'bug fixing'.  
> 
> 
> <commiserate />
> 
> me:   "how do you know it works"
> them:"it compiles with no errors"
> me:   "no... how do you know it works, functions correctly?"
> them:(puzzled look) "it compiles with no errors ..."
> 

Amen to that, Joe.

My personal complaint is that there aren't enough good standard 
test/validation suites out there for cluster building.  Some libraries, 
like ATLAS, include them, but those suites are tied to that specific 
package.  It would be really great if, as a community, we could do 
something like the Linux Test Project, oriented towards cluster building 
and scientific computing.  Something that I can run when my boss wants 
"proof" that upgrading a library didn't completely rejigger the 
numerical stability of the results.  I know that the stock answer here 
is that we ought to generate our own regression tests based on our own 
particular application set, but I think it would be a boon if a more 
generic framework and solution were to evolve.  If nothing else, it would 
offer a basis for heterogeneous systems in a grid environment to trust 
each other's results without necessarily requiring full application 
cross-validation.  It might be a pipe dream, but I like it 8-)
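
To make the "proof for the boss" idea concrete, here is a minimal sketch 
of the kind of check I have in mind: record a few named reference values 
before the library upgrade, recompute them afterwards, and flag anything 
that moved by more than a tolerance.  The function name, the result 
names, and the default tolerance are all made up for illustration; a 
real suite would need per-kernel tolerances chosen by someone who knows 
the numerics.

```python
import math

def compare_to_baseline(results, baseline, rel_tol=1e-9, abs_tol=0.0):
    """Return (name, expected, actual) for every value that drifted.

    `baseline` maps a result name to the value recorded before the
    upgrade; `results` maps the same names to freshly computed values.
    A missing result or a NaN counts as drift.
    """
    drifted = []
    for name, expected in sorted(baseline.items()):
        actual = results.get(name)
        # Check for a missing value first so math.isclose never sees None.
        if actual is None or not math.isclose(
            actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            drifted.append((name, expected, actual))
    return drifted


if __name__ == "__main__":
    # Hypothetical numbers, purely to show the call pattern.
    before = {"dot_product": 1.0, "norm": 2.0}
    after = {"dot_product": 1.0, "norm": 2.0 + 1e-6}
    for name, expected, actual in compare_to_baseline(after, before):
        print(f"DRIFT {name}: expected {expected!r}, got {actual!r}")
```

The baseline could just as well live in a file that gets checked in next 
to the application, so the same comparison can be rerun on another 
machine in the grid.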

Andy