[Beowulf] ***UNCHECKED*** Re: Spark, Julia, OpenMPI etc. - all in one place

Jim Cownie jcownie at gmail.com
Wed Oct 14 04:15:48 PDT 2020


As ever, good stuff from Doug, but I’ll just add a little more background.

When we standardised MPI-1 (I was in the room in Dallas for most of this :-)) we did not expect it still to be the dominant interface that users would be coding to 25 years later. Rather, we expected that MPI would form a reasonable basis for higher-level interfaces to be built upon, and we hoped that it would provide enough performance and be rich enough semantically to allow that to happen.
Therefore our aim was not to make it a perfect, high-level, end-user interface, but rather to make it something which we (as implementers) knew how to implement efficiently while providing a reasonable, portable, vendor-neutral layer which would be usable either by end-user code, or by higher-level libraries (which could certainly include runtime libraries for higher level languages).

Maybe we made it too usable, so no-one bothered with the higher-level interfaces :-) (I still have the two competing tee-shirts, one criticising MPI for being too big and having too many functions in the interface [an opinion from PVM…], the other quoting Occam as a rebuttal “praeter necessitatem” :-))

Overall MPI succeeded way beyond our expectations, and, I think, we did a pretty good job. (MPI-1 was missing some things, like support for reliability, but that, at least, was an explicit decision, since, at the time, a cluster had maybe 64 nodes and was plugged into a single wall socket, and we wanted to get the standard out on time!)

-- Jim
James Cownie <jcownie at gmail.com>
Mob: +44 780 637 7146


> On 13 Oct 2020, at 22:03, Douglas Eadline <deadline at eadline.org> wrote:
> 
> 
>> On Tue, Oct 13, 2020 at 3:54 PM Douglas Eadline <deadline at eadline.org>
>> wrote:
>> 
>>> 
>>> It really depends on what you need to do with Hadoop or Spark.
>>> IMO many organizations don't have enough data to justify
>>> standing up a 16-24 node cluster system with a PB of HDFS.
>>> 
>> 
>> Excellent. If I understand what you are saying, there is simply no demand
>> to mix technologies, esp. in the academic world. OK. In your opinion and
>> independent of Spark/HDFS discussion, why are we still only on openMPI in
>> the world of writing distributed code on HPC clusters? Why is there
>> nothing else gaining any significant traction? No innovation in exposing
>> higher level abstractions and hiding the details and making it easier to
>> write correct code that is easier to reason about and does not burden the
>> writer with too much of a low level detail. Is it just the amount of
>> investment in an existing knowledge base? Is it that there is nothing out
>> there to compel people to spend the time on it to learn it? Or is there
>> nothing there? Or maybe there is and I am just blissfully unaware? :)
>> 
> 
> 
> I have been involved in HPC and parallel computing since the 1980s.
> Prior to MPI every vendor had a message passing library. Initially
> PVM (Parallel Virtual Machine) from Oak Ridge was developed so there
> would be some standard API to create parallel codes. It worked well
> but needed more. MPI was developed so parallel hardware vendors
> (not many back then) could standardize on a messaging framework
> for HPC. Since then, not a lot has pushed the needle forward.
> 
> Of course there are things like OpenMP, but these are not distributed
> tools.
> 
> Another issue is the difference between "concurrent code" and
> parallel execution. Not everything that is concurrent needs
> to be executed in parallel and indeed, depending on
> the hardware environment you are targeting, these decisions
> may change. And, it is not something you can figure out by
> looking at the code.
> 
> Parallel computing is a hard problem and no one has
> really come up with a general purpose way to write software.
> MPI works, however I still consider it a "parallel machine code"
> that requires some careful programming.
> 
> The good news is most of the popular HPC applications
> have been ported and will run using MPI (as best as their algorithm
> allows). So from an end user perspective, most everything
> works. Of course there could be more applications ported
> to MPI but it all depends. Maybe end users can get enough
> performance with a CUDA version and some GPUs or an
> OpenMP version on a 64-core server.
> 
> Thus the incentive is not really there. There is no huge financial
> push behind HPC software tools like there is with data analytics.
> 
> Personally, I like Julia and believe it is the best new language
> to enter technical computing. One of the issues it addresses is
> the two language problem. The first cut of something is often written
> in Python; then, if it gets to production and is slow and does
> not have an easy parallel pathway (local multi-core or distributed),
> the code is rewritten in C/C++ or Fortran with MPI, CUDA, or OpenMP.
> 
> Julia is fast out of the box and provides a growth path to
> parallel execution. One version with no need to rewrite. Plus,
> it has something called "multiple dispatch" that provides
> unprecedented code flexibility and portability (too long a
> discussion for this email). Basically it keeps the end user closer
> to their "problem" and further away from the hardware minutiae.
> 
> That is enough for now. I'm sure others have opinions worth
> hearing.
> 
> 
> --
> Doug
> 
> 
> 
>> Thanks!
>> 
> 
> 
> -- 
> Doug
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf




