[Beowulf] massive parallel processing application required
gerry.creager at tamu.edu
Thu Feb 1 04:52:16 PST 2007
Mitchell Wisidagamage wrote:
>> Please don't fall into the trap of thinking "e-Science" requires a tie
>> to the Globus Toolkit to be valid.
> I do not think this (anymore). Some time ago, being new to grid
> computing, I queried Matthew Haynos from IBM, who is an expert in this
> area. The silly questions are from me :o) The answers are his.
> Because at the moment distributed computing is only popular in
> academic research and highly specialized parts of industry... at least
> that's what I think. Any professional and personal comments from your
> Not true. Distributed computing is more and more mainstream. I also
> think you may be looking at distributed computing too narrowly.
> Even if you are referring to supercomputing, note that an increasing
> share of the Top 500 supercomputing sites are commercial (as opposed to
> academic or public institutions).
> Anyhow, I just read it again, and you stated that "Grid computing
> becoming more of a de facto standard for distributed computing in
> enterprises". May I ask why you think that?
> I would say because of the growing ubiquity of scale-out computing (lots
> of machines, lots of resources, etc.). What's happening here is that
> scheduling, etc. is moving from the machine into the network. With
> hundreds or thousands of blade processors, people no longer know where
> things are going to run. This is a sea change. People used to say "run
> this piece of work on this machine"; now it's just "run this work, I
> have no idea where." I've written an article series for IBM's grid site on
> Check out:
> particularly the "Next-generation distributed computing" article for a
> primer. I think you'll find the five or so articles in the series
I've read the article series, and it is interesting. Nor am I
completely given over to anti-grid sentiment. The problem remains,
however, and it is embodied by a colleague recounting his experience
running an ocean circulation model: "We only had a 13% slowdown running
this as a grid application when compared to our local cluster."
Now, there are several things to consider that go unsaid here. One is
the degree of coupling in the code. Another is the size of the datasets
that have to be moved to the various sites to facilitate operations.
Some codes will perform well when distributed broadly, while others will
die a horrid death waiting for pieces of the result to come back from
that P3 installation in Outer Geekdom. Some will suffer simply from
communications latency. Others will just continue to chug along. By
way of illustration, we benchmarked my MM5 semi-production run of 72
forecast hours for 3 domains of increasing resolution across the United
States. To complete in the same timeframe as a locally submitted job,
we found we had to double the number of processors when the job was
distributed out to the "grid". This is an extreme example, of course,
and not one I propose to repeat anytime soon... It's much easier to run
MM5 and WRF locally and not have to worry quite so much about resource
reservation and odd processors failing mid-run.
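To make the coupling and latency point concrete, here is a toy cost model, a sketch with made-up parameters (these are not the MM5 benchmark numbers). It assumes each timestep costs the compute work split evenly across processors, plus some number of boundary exchanges, each paying latency plus message size over bandwidth. The function names (`step_time`, `procs_to_match`) and every parameter value are hypothetical illustrations.

```python
"""Toy model of why tightly coupled codes suffer on a wide-area "grid".

All parameter values are hypothetical illustrations, not measurements.
"""


def step_time(work, procs, exchanges, latency, msg_bytes, bandwidth):
    """Seconds per timestep: compute split evenly across procs, plus
    `exchanges` boundary messages, each costing latency + size/bandwidth."""
    compute = work / procs
    comm = exchanges * (latency + msg_bytes / bandwidth)
    return compute + comm


def procs_to_match(target, work, exchanges, latency, msg_bytes, bandwidth,
                   max_procs=100_000):
    """Smallest processor count whose step time meets `target`, or None
    if communication alone already eats the budget (more CPUs can't help)."""
    comm = exchanges * (latency + msg_bytes / bandwidth)
    if comm >= target:
        return None
    for p in range(1, max_procs + 1):
        if step_time(work, p, exchanges, latency, msg_bytes, bandwidth) <= target:
            return p
    return None


if __name__ == "__main__":
    WORK = 64.0   # hypothetical single-CPU seconds of compute per timestep
    MSG = 1e5     # hypothetical bytes per boundary exchange
    for ex in (0, 20, 100):  # exchanges per step: loose -> tight coupling
        # Assumed local cluster: 32 CPUs, ~5 us latency, ~1 GB/s links.
        local = step_time(WORK, 32, ex, 5e-6, MSG, 1e9)
        # Assumed wide-area grid: ~20 ms latency, ~12.5 MB/s links.
        p = procs_to_match(local, WORK, ex, 0.02, MSG, 1.25e7)
        print(f"{ex:3d} exchanges/step -> {p} grid CPUs to match 32 local CPUs")
```

With these invented numbers, the uncoupled case needs the same 32 CPUs on the grid, the moderately coupled case needs about 45, and the heavily coupled case cannot match the local run at any size, because latency alone exceeds the local step time.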
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843