[Beowulf] Re: Spark, Julia, OpenMPI etc. - all in one place

Oddo Da oddodaoddo at gmail.com
Wed Oct 14 04:58:31 PDT 2020


Jim,

Thank you. I wrote distributed/parallel code back when PVM and MPI were
competing frameworks - I had the privilege of watching things transition
from "big iron" multi-CPU machines from Sun, DEC, etc. to Beowulf clusters
and commodity hardware. At the time, things like MPI seemed like real
godsends. Don't get me wrong, I am not criticizing MPI, just wondering why
nothing has come along to provide a higher level of abstraction (with MPI
underneath if necessary). Folks like Doug talk about the lack of financial
incentive, but we are in academia, and I am curious why nobody came along
and did it as a research project, for example, as opposed to being
motivated by a potential commercial payoff down the road.

I also spent time in industry starting in 2012 ("big data" - how I dislike
this phrase, but for lack of a better one...). Things like Spark evolved
alongside functional languages like Scala, so at least you see some kind
of progression toward more verifiable code, code you can reason about more
easily, lazy evaluation, and so on. Meanwhile, in traditional HPC we are
still where we were 20 years ago, and the same books on MPI apply. I
understand _why_ things like Spark evolved separately and differently (a
company generally does not have the luxury of an HPC cluster with a
pay-for parallel filesystem, but it may have some machines on the Ethernet
it can put together into a logical "cluster"), and I am not saying we need
the same thing in HPC; I am just curious about (what I perceive as) the
lack of progress on the HPC side.

In your opinion, is it just inertia? I mean, professors know MPI and they
are telling their grad students to learn it, because, well, that's how it
was done before them? Or is it lack of better tooling? Are things like
Julia even proper improvements at the higher level of abstraction?

I was in a discussion about this very same topic with a CS professor at a
prestigious US university a few years ago, over a conference dinner. He
got pretty irritated when I asked him these questions. It got almost
personal; I thought he was going to throw his plate at me by the end :-)

On Wed, Oct 14, 2020 at 7:15 AM Jim Cownie <jcownie at gmail.com> wrote:

> As ever, good stuff from Doug, but I’ll just add a little more background.
>
> When we standardised MPI-1 (I was in the room in Dallas for most of this
> :-)) we did not expect it still to be the dominant interface which users
> would be coding to 25 years later, rather we expected that MPI would form a
> reasonable basis for higher level interfaces to be built upon, and we hoped
> that it would provide enough performance and be rich enough semantically to
> allow that to happen.
> Therefore our aim was not to make it a perfect, high-level, end-user
> interface, but rather to make it something which we (as implementers) knew
> how to implement efficiently while providing a reasonable, portable,
> vendor-neutral layer which would be usable either by end-user code, or by
> higher-level libraries (which could certainly include runtime libraries for
> higher level languages).
>
> Maybe we made it too usable, so no-one bothered with the higher-level
> interfaces :-) (I still have the two competing tee-shirts, one criticising
> MPI for being too big and having too many functions in the interface [and
> opinion from PVM…], the other quoting Occam as a rebuttal “praeter
> necessitatem” :-))
>
> Overall MPI succeeded way beyond our expectations, and, I think, we did a
> pretty good job. (MPI-1 was missing some things, like support for
> reliability, but that, at least, was an explicit decision, since, at the
> time, a cluster had maybe 64 nodes and was plugged into a single wall
> socket, and we wanted to get the standard out on time!)
>
> -- Jim
> James Cownie <jcownie at gmail.com>
> Mob: +44 780 637 7146
>
>
> On 13 Oct 2020, at 22:03, Douglas Eadline <deadline at eadline.org> wrote:
>
>
> On Tue, Oct 13, 2020 at 3:54 PM Douglas Eadline <deadline at eadline.org>
> wrote:
>
>
> It really depends on what you need to do with Hadoop or Spark.
> IMO many organizations don't have enough data to justify
> standing up a 16-24 node cluster system with a PB of HDFS.
>
>
> Excellent. If I understand what you are saying, there is simply no demand
> to mix technologies, especially in the academic world. OK. In your opinion,
> and independent of the Spark/HDFS discussion, why are we still only on
> OpenMPI in the world of writing distributed code on HPC clusters? Why is
> nothing else gaining any significant traction? There is no innovation in
> exposing higher-level abstractions, hiding the details, and making it
> easier to write correct code that is easier to reason about and does not
> burden the writer with too much low-level detail. Is it just the amount of
> investment in an existing knowledge base? Is it that there is nothing out
> there compelling enough for people to spend the time to learn it? Or is
> there nothing there? Or maybe there is and I am just blissfully unaware? :)
>
>
>
> I have been involved in HPC and parallel computing since the 1980s.
> Prior to MPI, every vendor had its own message-passing library. Initially,
> PVM (Parallel Virtual Machine) from Oak Ridge was developed so there
> would be some standard API for creating parallel codes. It worked well
> but needed more. MPI was developed so parallel hardware vendors
> (not many back then) could standardize on a messaging framework
> for HPC. Since then, not a lot has moved the needle forward.
>
> Of course there are things like OpenMP, but these are not distributed
> tools.
>
> Another issue is the difference between "concurrent code" and
> parallel execution. Not everything that is concurrent needs
> to be executed in parallel, and indeed, depending on
> the hardware environment you are targeting, these decisions
> may change. And it is not something you can figure out by
> looking at the code.
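[To make Doug's distinction concrete, here is a minimal sketch in Python
(standing in for no particular HPC language; the task and names are
hypothetical). The concurrent *structure* of the code is identical either
way; whether the tasks actually execute in parallel is decided by the
executor, not visible in the task code itself.]

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def work(x):
    # A stand-in CPU-bound task; its code is the same whether it ends
    # up running concurrently-but-serialized or truly in parallel.
    return x * x

def run(executor_cls):
    # The concurrent structure below never changes; only the executor
    # class decides whether the tasks execute in parallel.
    with executor_cls(max_workers=4) as pool:
        return list(pool.map(work, range(8)))

# With ThreadPoolExecutor the tasks are concurrent but (under CPython's
# GIL) not parallel for CPU-bound work; swapping in ProcessPoolExecutor
# runs the very same structure in parallel across processes.
print(run(ThreadPoolExecutor))   # [0, 1, 4, 9, 16, 25, 36, 49]
```

The point is exactly Doug's: you cannot tell from `run()` alone which
execution model is the right one for a given machine.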
> Parallel computing is a hard problem, and no one has
> really come up with a general-purpose way to write parallel software.
> MPI works; however, I still consider it a "parallel machine code"
> that requires some careful programming.
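[An illustration of why explicit message passing reads like "parallel
machine code": every transfer between "ranks" must be spelled out by hand.
This is not real MPI - threads and a queue stand in for MPI processes and
MPI_Send/MPI_Recv, and the function names are made up - but the shape of
the code is the same as a hand-rolled MPI reduction.]

```python
import threading
import queue

def partial_sum(rank, chunk, inbox0):
    # Each "rank" computes its local result, then explicitly "sends"
    # it to rank 0 (the analogue of MPI_Send to the root).
    inbox0.put((rank, sum(chunk)))

def message_passing_sum(data, nranks=4):
    inbox0 = queue.Queue()                 # rank 0's receive buffer
    chunks = [data[i::nranks] for i in range(nranks)]
    workers = [
        threading.Thread(target=partial_sum, args=(r, chunks[r], inbox0))
        for r in range(nranks)
    ]
    for w in workers:
        w.start()
    # Rank 0 performs the reduction by hand: one "recv" per peer, then
    # combine - the bookkeeping MPI_Reduce would otherwise hide.
    total = sum(inbox0.get()[1] for _ in range(nranks))
    for w in workers:
        w.join()
    return total

print(message_passing_sum(list(range(100))))   # 4950
```

A higher-level interface would let the user write something closer to
`reduce(+, data)` and keep all of this plumbing out of sight.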
>
> The good news is that most of the popular HPC applications
> have been ported and will run using MPI (as well as their algorithms
> allow). So from an end-user perspective, most everything
> works. Of course, there could be more applications ported
> to MPI, but it all depends. Maybe end users can get enough
> performance with a CUDA version and some GPUs, or an
> OpenMP version on a 64-core server.
>
> Thus the incentive is not really there. There is no huge financial
> push behind HPC software tools like there is with data analytics.
>
> Personally, I like Julia and believe it is the best new language
> to enter technical computing. One of the issues it addresses is
> the "two-language problem": the first cut of something is often written
> in Python; then, if it gets to production and is slow and does
> not have an easy parallel pathway (local multi-core or distributed),
> the code is rewritten in C/C++ or Fortran with MPI, CUDA, or OpenMP.
>
> Julia is fast out of the box and provides a growth path to
> parallelism: one version, with no need to rewrite. Plus,
> it has something called "multiple dispatch" that provides
> unprecedented code flexibility and portability (too long a
> discussion for this email). Basically, it keeps the end user closer
> to their "problem" and further away from the hardware minutiae.
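[For readers unfamiliar with the term: multiple dispatch means the method
that runs is chosen from the runtime types of *all* arguments, not just
the first one (as in ordinary object-oriented single dispatch). Below is
a toy hand-rolled mechanism in Python that hints at the idea - it is not
Julia's actual machinery, and the `dispatch`/`combine` names are made up
for this sketch.]

```python
# A toy multiple-dispatch table: implementations are registered against
# a tuple of argument types, and looked up on the types of *all*
# arguments at the call site.
_methods = {}

def dispatch(*types):
    def register(fn):
        _methods[types] = fn
        return fn
    return register

def combine(a, b):
    # Selection depends on both argument types, not just type(a).
    fn = _methods[(type(a), type(b))]
    return fn(a, b)

@dispatch(int, int)
def _(a, b):
    return a + b            # ints: arithmetic addition

@dispatch(str, str)
def _(a, b):
    return a + " " + b      # strings: concatenation with a space

@dispatch(int, str)
def _(a, b):
    return b * a            # mixed: repetition

print(combine(2, 3))           # 5
print(combine("hello", "HPC")) # hello HPC
print(combine(3, "ab"))        # ababab
```

Julia does this natively and uses it pervasively, which is a large part
of how generic library code stays both flexible and fast.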
>
> That is enough for now. I'm sure others have opinions worth
> hearing.
>
>
> --
> Doug
>
>
>
> Thanks!
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

