[Beowulf] interesting article on HPC vs evolution of 'big data' analysis
landman at scalableinformatics.com
Thu Apr 9 11:27:52 PDT 2015
On 04/09/2015 01:39 PM, Douglas Eadline wrote:
> Parallel programming is hard. By extension so are MPI and
> other methods to express parallelism. Is that news?
> Should the biologist* use MPI?
> If they want to, but probably not a good idea.
> Are there alternatives?
> Of course, and it depends on what you want to do.
> Should we be developing other parallel programming languages
> and methods?
> Yes, people are.
> Does one size fit all?
> Are Exascale and MPI a big deal?
> To the extent that it is providing a platform to learn
> new things about computing and nature, Yes.
> To the extent that most people need Exascale performance, No.
> Should I mention Hadoop, Big Data, and Cloud because
> they currently have a positive slope on marketing charts?
I generally expect smart people to write reasonably well reasoned
articles. I looked over this one, and found little to agree with ...
the conclusions weren't quite supported by the evidence. The evidence,
as it were, was weak ... something I would not expect someone to give
serious consideration to (seriously? Google Trends for MPI, which mixes
in many things, Spark, which mixes in many things, and Hadoop?).
It's not simply that the analysis and the conclusion drawn from it are suspect; it's
that the entire premise of there being "the one true way" is (as Doug
noted above) fundamentally flawed. But worse than that, the entire
argument, as weak as it was, was irreparably damaged by poor data
collection and filtering.
This gets back to a problem near and dear to my heart ... metrics. If
you measure something, part of what you need to demonstrate is that,
well, your measurement has meaning. If you fail in this, if you cannot
point to a specific thing that your measurement does very well, then,
fundamentally, there is nothing useful you can extract from your
measurement.
This is a classic example of this sort of issue. But think about it in
terms of larger scale metrics and "data based" decision making, where,
quite possibly, the collected data has little meaning and even less utility
(it can't really be applied).
It is terrifying to me to see the sheer number of "data driven"
entities, quite possibly ingesting and analyzing metric truckloads of
"data" (scare quotes intentional), and "extracting" steaming piles of
... information ... out of them.
No disrespect intended for the author of this piece; I just found it
seriously wanting/flawed in some deeply fundamental ways.
FWIW: I am aware of people intermixing Hadoop and MPI. Right
tool(sets), right jobs. There is more than one way to do things.
I am of the opinion that "Big Data" (again with the scare quotes!) has a
significant overlap with HPC, and it's important for HPC practitioners to
become versed in any/all technologies that could help things go
faster/better. Chris Samuel's quote was dead on.
> *any domain expert who does not have a programming/HPC background.
>> Curious as to what the body of thought is here on this article:
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: landman at scalableinformatics.com
p: +1 734 786 8423 x121
c: +1 734 612 4615