<div dir="ltr">Peter:<div><br></div><div>Thanks for the questions.</div><div><br></div><div>It has been proved theoretically that it is impossible to implement reliable communication in the face of [either sender or receiver] crashes. Therefore, any parallel or distributed computing API that forces the runtime system to generate fixed program-processor assignments is theoretically incorrect. This answer also relates to your second question: the impossibility result means that 100% reliable communication is impossible.</div><div><br></div><div>Ironically, 100% reliable packet transmission is theoretically and practically possible, as proved by John and Nancy for John's dissertation. These two seemingly conflicting results are in fact complementary. Together they say that distributed and parallel application programming cannot rely on reliable packet transmission in the way that all of our current distributed and parallel programming APIs assume. </div><div><br></div><div>Thus, MPI cannot be cost-ineffective in proportion to reliability, because of the impossibility. The same applies to all other APIs that allow direct program-program communication. We have found that the <key, value> APIs, such as those of Hadoop and Spark, are the only exceptions, because they allow the runtime system to generate dynamic program-device bindings. To solve the problem completely, the application programming logic must include the correct retransmission discipline. I call this Statistic Multiplexed Computing or SMC. The Hadoop and Spark implementations did not go this far. If we do complete the paradigm shift, then there will be no single point of failure regardless of how the application scales. This claim covers all computing and communication devices. This is the ultimate extreme scale computing paradigm.</div><div><br></div><div>These answers are rooted in the statistic multiplexing protocol research (packet switching). 
That research has shown, in theory and in practice, that 100% reliable and scalable communication is indeed possible. Since all HPC applications must deploy a large number of computing units via some sort of interconnect (HP's The Machine may be the only exception), the only correct APIs for extreme scale HPC are the ones that allow for complete program-processor decoupling at runtime. Even the HP machine will benefit from this research. Please note that the 100% reliability is conditional on the availability of the "minimal viable set of resources". In computing and communication, the minimal set size is 1 for every critical path. </div><div><br></div><div>My critics argued that there is no way a statistic multiplexed computing runtime can compete against bare metal programs, such as MPI. We have evidence to prove the opposite. In fact, the SMC runtime allows dynamic adjustment of processing granularity without reprogramming. We can demonstrate faster performance not only with heterogeneous processors but also with homogeneous ones. We see this capability as critical for extracting efficiency out of HPC clouds.</div><div><br></div><div>Justin</div><div>SERC 310</div><div>Temple University</div><div><a href="mailto:shi@temple.edu">shi@temple.edu</a></div><div>+1(215)204-6437</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Mar 6, 2016 at 5:02 PM, Peter St. John <span dir="ltr"><<a href="mailto:peter.st.john@gmail.com" target="_blank">peter.st.john@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Justin,<div>I'm unsure just what you mean by some of what you said.</div><span class=""><div><br></div><div>"<span style="font-size:12.8px">Any fixed program-processor binding is a single point failure"</span></div></span><div><span style="font-size:12.8px">I'm troubled by the word "any". 
What about running two copies of a program, each with its own copy of the same data, on two processors (e.g. on a Tandem machine)? Surely that is not a single point of failure; is it not a "fixed program-processor binding"?</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">"...</span><span style="font-size:12.8px"> </span><span style="font-size:12.8px">it is impossible to implement reliable communication in the face of..."</span></div><div><span style="font-size:12.8px">If by "reliable" you mean "perfectly reliable" then the thesis is trivial and does not require proof. Reliability is a metrical value with costs; the cost is space (e.g. for error-correcting codes) or time (e.g. for re-transmissions) or whatever. Do you mean that MPI is cost-ineffective in proportion to reliability? If so, why?</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Thanks,</span></div><div><span style="font-size:12.8px">Peter</span></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Mar 6, 2016 at 11:10 AM, Justin Y. Shi <span dir="ltr"><<a href="mailto:shi@temple.edu" target="_blank">shi@temple.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Actually my interest in your group is not much between "hate" and "love" of MPI or any other APIs. I am more interested in the "correctness" of parallel APIs.<div><br></div><div>Three decades ago, not doing "bare metal" computing was impossible for effective parallel processing. Today, insisting on "bare metal" computing is detrimental to extreme scale efforts.</div><div><br></div><div>Any fixed program-processor binding is a single point failure. The problem only shows when the application scales. 
And it is impossible to implement reliable communication in the face of crashes [Alan Fekete, Nancy Lynch and John Spinelli's 1993 JACM paper proved this theoretically]. Therefore, any direct program-program communication API is theoretically incorrect for extreme-scale applications.  </div><div><br></div><div>The <key, value> pair API seems to be the only theoretically correct parallel programming API that can take us out of the abyss of impossibilities. However, systems like Hadoop and Spark have only shown the great promise of program-device decoupling; they were not really designed for tackling HPC applications. And their decoupling is left incomplete by their runtime implementations.</div><div><br></div><div>I proposed a Statistic Multiplexed Computing idea leveraging the successes of <key, value> API systems and the old Tuple Space semantics. My GitHub contribution is called Synergy3.0+.  You are welcome to check it out and do a "bare metal" comparison against MPI and any other API.</div><div><br></div><div>Our latest development is AnkaCom, which was designed to tackle data-intensive HPC without scaling limits.</div><div><br></div><div>My apologies in advance for my shameless self-advertising.  I am looking for serious collaborators who are interested in breaking this decade-old barrier.</div><span><font color="#888888"><div><br></div><div>Justin Y. 
Shi</div><div><a href="mailto:shi@temple.edu" target="_blank">shi@temple.edu</a></div><div>SERC 310</div><div>Temple University</div><div><a href="tel:%2B1%28215%29204-6437" value="+12152046437" target="_blank">+1(215)204-6437</a><br><div><br></div><div><br></div></div></font></span></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 4, 2016 at 10:14 AM, C Bergström <span dir="ltr"><<a href="mailto:cbergstrom@pathscale.com" target="_blank">cbergstrom@pathscale.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">A few people have subscribed and it's great to see some interest -<br>

hopefully we can start some interesting discussions. Actually - my<br>

background is more on the "web" side of HPC. I took a big jump when I<br>

started working @pathscale - Over the past 6 years I've cringed more<br>

than once when I see design that looks ***worse*** (I didn't think<br>

possible) than hibernate with tons of outer joins and evil xml<br>

configs.. (Java references for anyone unfortunate enough to get what<br>

I'm saying)<br>

<div><div><br>

<br>

<br>

<br>

On Fri, Mar 4, 2016 at 10:05 PM, Justin Y. Shi <<a href="mailto:shi@temple.edu" target="_blank">shi@temple.edu</a>> wrote:<br>

> Thank you for creating the list. I have subscribed.<br>

><br>

> Justin<br>

><br>

> On Fri, Mar 4, 2016 at 5:43 AM, C Bergström <<a href="mailto:cbergstrom@pathscale.com" target="_blank">cbergstrom@pathscale.com</a>><br>

> wrote:<br>

>><br>

>> Sorry for the shameless self indulgence, but there seems to be a<br>

>> growing trend of love/hate around MPI. I'll leave my opinions aside,<br>

>> but at the same time I'd love to connect and host a list where others who<br>

>> are passionate about scalability can vent and openly discuss ideas.<br>

>><br>

>> Despite the comical name, I've created mpi-haters mailing list<br>

>> <a href="http://lists.pathscale.com/mailman/listinfo/mpi-haters_lists.pathscale.com" rel="noreferrer" target="_blank">http://lists.pathscale.com/mailman/listinfo/mpi-haters_lists.pathscale.com</a><br>

>><br>

>> To start things off - Some of the ideas I've been privately bouncing<br>

>> around<br>

>><br>

>> Can current directive based approaches (OMP/ACC) be extended to scale<br>

>> out. (I've seen some research out of Japan on this or similar)<br>

>><br>

>> Is Chapel's C-like syntax similar enough to easily implement in clang?<br>

>><br>

>> Can one low level library succeed at creating a clean interface across<br>

>> all popular industry interconnects (libfabrics vs UCX)<br>

>><br>

>> Real world success or failure of "exascale" runtimes? (What's your<br>

>> experience - let's not pull any punches)<br>

>><br>

>> I won't claim to see ridiculous scalability in most web applications<br>

>> I've worked on, but they had so many tools available - Why have I<br>

>> never heard of memcache being used in a supercomputer and/or why isn't<br>

>> sharding ever mentioned...<br>

>><br>

>> Everyone is welcome and let's keep it positive and fun - invite your<br>

>> friends<br>

>><br>

>><br>

>> ./C<br>

>><br>

>> ps - Apologies if you get this message more than once.<br>

>> _______________________________________________<br>

>> Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>

>> To change your subscription (digest mode or unsubscribe) visit<br>

>> <a href="http://www.beowulf.org/mailman/listinfo/beowulf" rel="noreferrer" target="_blank">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>

><br>

><br>

</div></div></blockquote></div><br></div>

</div></div><br>

<br></blockquote></div><br></div>

</div></div></blockquote></div><br></div>