[Beowulf] Does computation threaten the scientific method?

Brian Dobbins brian.dobbins at yale.edu
Thu Mar 29 09:22:49 PDT 2012


To borrow from an old joke, I'd say the short answer is "No.", and the 
long answer?  "Nooooooooooo."

Reproducibility is an interesting issue - on the surface, it seems like 
a binary thing: something is or is not reproducible.  In reality, 
though, things are almost never duplicated exactly, and there exists 
some fuzzy threshold at which point things are considered good enough to 
be a reproduction.  I can go down to a local store and buy a print of 
the Mona Lisa and, to me, it might be a really great reproduction, yet 
even writing that sentence has some art critic screaming in agony.  
Similarly, in computing, if I run some model on two different systems 
and get two different results, that can either be indicative of a 
potential issue or it can be completely fine, because those differences 
are below a certain threshold and thus the runs were, in scientific 
terms, 'reproducible' with respect to each other.

On a small scale (meaning a lab, code or project), this is a key issue - 
I've seen grad students and faculty alike be dismayed by trivial 
differences, and when this happens, more often than not the mentality 
is, "My first results are correct - make this code give them back to 
me", without understanding that the later, different results are quite 
possibly equally valid, and possibly more so.  Back in the early Beowulf 
days, I remember switching some codes from an RS/6000 platform to an 
x86-based one, and because the internal precision of the x86 FPU was 80 bits, not 64, sequences of FP math could produce small differences unless that extended precision was specifically disabled via compiler switches.  Which a lot of people did, not because the situation had been carefully considered, but because with it enabled the code gave 'wrong' results.  Another example
would be an algorithm that was orders of magnitude faster than one 
previously in use, but wasn't adopted because ultimately the results 
were different.  The catch here?  Reordering the input data while still 
using the original algorithm gave similarly different answers - the 
nature of the code was that single runs were useless, and ensemble runs 
were a necessity.
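
(For anyone who hasn't bumped into this before, here's a tiny C sketch - my own illustration, not from any of the codes above - of why reordering data or changing the FPU's internal precision can shift results: floating-point addition simply isn't associative.)

  /* Illustration only: floating-point addition is not associative, so
   * summing the same values in a different order can change the result.
   * The same kind of shift appears when intermediates are kept at 80 bits
   * on the old x87 FPU versus rounded to 64 bits (e.g. via gcc's
   * -ffloat-store or -mfpmath=sse). */
  #include <stdio.h>

  int main(void)
  {
      double big = 1.0e100, small = 1.0;

      double forward  = (big - big) + small;    /* cancellation first -> 1.0  */
      double backward = big + (-big + small);   /* 'small' is absorbed -> 0.0 */

      printf("forward  = %g\n", forward);
      printf("backward = %g\n", backward);
      return 0;
  }

Both orderings are perfectly legitimate IEEE arithmetic; the run-to-run differences people agonize over are often just this effect accumulated over millions of operations.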

Ultimately, the issues here come down to the common perception of 
computers - "They give you THE answer!" - versus the reality of 
computers - "They give you AN answer!", with the latter requiring 
additional effort to provide some error margin or statistical analysis 
of results.  That extra analysis happens in some computational disciplines far more routinely than in others.
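
(To make that concrete, here's a toy sketch - made-up numbers, purely my own illustration - of what treating 'AN answer' properly can look like: run the model several times, then report an ensemble mean with a spread instead of a single value.)

  /* Toy example: given the results of several runs, report an ensemble
   * mean and a sample standard deviation rather than trusting one run. */
  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
      /* Hypothetical outputs of the same model run on different machines
       * or with slightly perturbed inputs. */
      double runs[] = { 3.1412, 3.1419, 3.1408, 3.1423, 3.1415 };
      int i, n = sizeof runs / sizeof runs[0];

      double sum = 0.0;
      for (i = 0; i < n; i++)
          sum += runs[i];
      double mean = sum / n;

      double var = 0.0;
      for (i = 0; i < n; i++)
          var += (runs[i] - mean) * (runs[i] - mean);
      double stddev = sqrt(var / (n - 1));

      printf("ensemble mean   = %.6f\n", mean);
      printf("ensemble spread = %.6f\n", stddev);
      return 0;
  }

(Compile with something like 'cc ensemble.c -lm'.)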

On the larger scales - whether reproducibility is an issue in scientific 
/fields/ - again, I'd say the answer is no.   The scientific method is 
resilient, but it never made any claims to be 'fast'.  Would it speed 
things up to have researchers publish their code and data?  Probably.  
Or, rather, it'd certainly speed up the verification of results, but it 
might also inhibit new approaches to doing the same thing.  Some people 
here might recall Michael Abrash's "Graphics Programming Black Book", 
which had a wonderful passage about a word-counting program.  It focused explicitly on performance tuning, with the key lesson being that nobody thought there was a better way of doing the task... until someone showed there was.  And that led to a flurry of new ideas.  Similarly, 
having software that does things in a certain way often convinces people 
that that is THE way of doing things, whereas if they knew it could be 
done but not how, newer methods might develop.  There's probably some 
happy medium here, since having so many different codes, most with a single author who isn't a software developer by training, seems less efficient and flexible than a large, well-documented code with a good community and the ability to use many of the methods previously found only in those one-off codes.

In other words, we can probably do better, but science itself isn't 
threatened by the inefficiency in verifying results, or even bad results 
- in the absolute worst case, with incorrect ideas being laid down as 
the foundation for new science and no checking done on them, progress 
will happen until it can't... at which point people will backtrack until 
they discover the underlying principle they wrongly thought was correct and will 
fix it.  The scientific method is a bit like a game of chutes and 
ladders in this respect.

Ultimately, in a lot of ways, I think computational science has it 
better than other disciplines.  There was news earlier this week [1] 
about problems reproducing some early-stage cancer research - 
specifically, Amgen tried to reproduce 53 'landmark' conclusions and was able to do so for only 11% of them.  Again, that's OK - it will 
correct itself, albeit in slow fashion, but what's interesting here is 
that these sorts of experiments, especially those involving mice (and 
often other wet-lab methods), don't have something like Moore's Law 
making them more accessible over time.  To reproduce a study involving 
the immune system of a mouse, I need mice.  And I need to wait the 
proper number of days.  Yet with computational science, what today may 
take a top-end supercomputer can probably be done in a few years on a departmental cluster.  A few years after that?  Maybe a workstation.  In our field, data doesn't really change or degrade, and the ability to analyze it in countless different ways only becomes more accessible over time.

In short (hah, nothing about this was short!), can we do better with our 
scientific approaches?  Probably.  But is the scientific method 
threatened by computation?  Nooooooooo.  :-)

That's my two cents,
   - Brian

[1] 
http://vitals.msnbc.msn.com/_news/2012/03/28/10905933-rethinking-how-we-confront-cancer-bad-science-and-risk-reduction
      Or, more directly (if you have access to Nature) : 
http://www.nature.com/nature/journal/v483/n7391/full/483531a.html

(PS.  The one thing which can threaten science is a lack of education - 
it can decrease the signal-to-noise ratio of 'good' science, amongst 
other things.  That's a whole essay in itself.)
(PPS.  This was a long answer, and yet not nearly long enough... but I 
didn't want to be de-invited from future Beowulf Bashes by writing even 
more!)


On 3/29/2012 7:58 AM, Douglas Eadline wrote:
>
> I am glad some one is talking about this. I have wondered
> about this myself, but never had a chance to look into it.
>
>
> http://www.isgtw.org/feature/does-computation-threaten-scientific-method
>
