[Beowulf] Re: Spark, Julia, OpenMPI etc. - all in one place

Michael Di Domenico mdidomenico4 at gmail.com
Wed Oct 14 07:10:55 PDT 2020


On Wed, Oct 14, 2020 at 9:04 AM Oddo Da <oddodaoddo at gmail.com> wrote:
>
> Perhaps but I am seeing these at the same level of abstraction - the low level where you have to spell everything out.

i'm not sure i understand what you expect.  Message passing is a
low-level function: passing bits of memory around the network.  i'm
not sure how you can abstract that away from the user.  if you're
writing something to solve an equation you have to know where things
are and where they're going.  that said, there are certainly plenty of
math libraries that use MPI to solve complex equations where the user
only has to understand the equation and not the paradigm of how it
gets solved in hardware/software.
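
to make the two levels concrete, here is a minimal sketch in plain
Python.  it is not MPI: threads and queue.Queue objects stand in for
ranks and the network, and every function name is made up for
illustration.  the first function spells out who gets which chunk and
where the partial results go; the second hides the same traffic behind
a library call, which is roughly what an MPI-backed math library does
for its users.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def square_sum(chunk):
    """The actual math: sum of squares over one chunk of the data."""
    return sum(x * x for x in chunk)


def explicit_sum_of_squares(data, nworkers=2):
    """Spell everything out: who gets which chunk, who replies to whom."""
    inboxes = [queue.Queue() for _ in range(nworkers)]  # one inbox per "rank"
    results = queue.Queue()                             # replies to the "root"

    def worker(rank):
        chunk = inboxes[rank].get()       # blocking receive
        results.put(square_sum(chunk))    # send the partial result back

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(nworkers)]
    for t in threads:
        t.start()
    for rank in range(nworkers):
        # "send" a strided slice of the data to each rank
        inboxes[rank].put(data[rank::nworkers])
    # gather the partial results and reduce them on the root
    total = sum(results.get() for _ in range(nworkers))
    for t in threads:
        t.join()
    return total


def library_sum_of_squares(data, nworkers=2):
    """Same answer, but the caller never sees a message."""
    chunks = [data[r::nworkers] for r in range(nworkers)]
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        return sum(pool.map(square_sum, chunks))
```

both functions return the same number for the same input.  the only
difference is how much of the data movement the caller has to write
by hand.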

> I see what you wrote above as contradicting yourself. First you say that Spark/hadoop did not evolve from economic motivations but then you say that it did. You say that Spark and MPI can be used interchangeably. OK if that's the case, why does the data science/ML industry just not use MPI everywhere. Why did they not start with MPI as the underlying paradigm and just build all the tooling on top of it? I wish they did... ;-)

cost was one factor that accelerated spark/hadoop, but it's not the
only or even the biggest factor.  the ML folks didn't start with MPI
because the AI frameworks were bred on workstations and then ported to
non-HPC hardware (aka cloud platforms) where MPI isn't the dominant
paradigm.  now that ML/AI is taking hold in the HPC community for
different aspects and the models are starting to expand beyond the 4-8
gpus you can stick in a single box, they are adding MPI underneath
(look at Horovod) to spread the models over multiple machines (scale
out vs scale up).
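
as a rough illustration of what "MPI underneath" buys you: as far as
i know, the core collective in Horovod is an allreduce that averages
gradients across workers, so that every worker applies the same update
to its copy of the model.  the sketch below only simulates that step
with plain Python lists.  it does not call Horovod or MPI, and the
function names are hypothetical.

```python
def allreduce_mean(per_worker_grads):
    """Average gradients element-wise across workers.

    Every worker gets back its own copy of the same averaged gradient,
    which is the effect an averaging allreduce has.
    """
    nworkers = len(per_worker_grads)
    averaged = [sum(vals) / nworkers for vals in zip(*per_worker_grads)]
    return [list(averaged) for _ in range(nworkers)]


def sgd_step(weights, grad, lr=0.1):
    """One plain gradient-descent update on a worker's copy of the model."""
    return [w - lr * g for w, g in zip(weights, grad)]


# two simulated workers, each with a gradient from its own batch of data
local_grads = [[1.0, 2.0], [3.0, 6.0]]
synced = allreduce_mean(local_grads)

# both workers start from the same weights and apply the same averaged
# gradient, so their copies of the model stay identical
weights = [[1.0, 1.0], [1.0, 1.0]]
weights = [sgd_step(w, g, lr=0.5) for w, g in zip(weights, synced)]
```

in a real multi-machine run the averaging would be done by a
collective over the network instead of a list comprehension, but the
training loop on each worker looks about the same.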

> I see MPI as a low level solution for a problem, at the abstraction level where you need to spell everything out. It is like the comparison between C and languages like Scala or Haskell or Julia. I am asking why there is no progress on the latter in this scenario - we have the message passing interface level of abstraction, why are we not interested in using this to build tooling that is at the higher level, where we can hide the how and focus on the what.

In my opinion comparing MPI/C to Scala/Haskell isn't a fair
comparison.  Haskell was designed to be easy for mathematicians, just
like R is for statisticians.  but both hide pretty much all of the
code under the covers.  That's great if all you want to do is solve
equations and haskell already has the code to implement them.  someone
could certainly add MPI solvers to haskell and do what i think you're
asking for.  but someone has to be motivated to do that.

> 20 years ago, when I was an undergrad, I took a 200-level course in data structures and algorithms and it was taught in a programming language called Eiffel. The professor started with saying - "Eiffel is gaining in popularity, there are many new books being written about Eiffel but none about C. Do you know why?". I raised my hand and said "because C is an old language and nothing has changed and everything that was to be said about it, was already said about it in many previous books". At least in that domain we could discuss these things - new languages and paradigms and tooling abound. In the world of HPC, not so much.

C has morphed into other things, julia/C++/C#/go/etc, but C isn't
going to change unless the underlying hardware changes (i.e. quantum).
it's also a matter of what has already been built: there's so much
intellectual property in C around the world and no real reason to just
rip it out.

i don't have a horse in this race either.  we can go back and forth
all day, but i'm not sure i understand what you're ultimately trying
to get at (or i missed it).  to state that HPC hasn't evolved in 20
years is, i believe, a gross misrepresentation, and to say that it
hasn't evolved into something great like Spark or Scala isn't a fair
comparison.

