[Beowulf] Re: Spark, Julia, OpenMPI etc. - all in one place

Oddo Da oddodaoddo at gmail.com
Wed Oct 14 11:06:30 PDT 2020


On Wed, Oct 14, 2020 at 1:24 PM Michael Di Domenico <mdidomenico4 at gmail.com>
wrote:

> On Wed, Oct 14, 2020 at 11:53 AM Oddo Da <oddodaoddo at gmail.com> wrote:
> > I did not use Spark or Scala as measures of greatness but they are
> > evolution, at least people are trying ;). Not all evolution is in the
> > positive direction, of course. But I do think that the world of software
> > engineering has moved/changed for better since 1990s. Yes, we built
> > software just fine in the 1990s and we built it fine in the 1960s but that
> > is like saying we drove cars just fine in the 1930s, why do we need new
> > cars.
>
> I don't see it that way.  to me things like hadoop/Spark/etc were
> designed to solve a specific problem other paradigms couldn't (or
> rather shouldn't).  it's not evolutionary, it's something new.  your
>

You stated that the Spark/Hadoop approach can code for everything that MPI
can code for, and vice versa. If that were true and that easy, nobody would
have "invented" them, since we already had MPI/C/C++ to solve all our
problems ;-). Yet I think it was the absence of technical debt (or maybe
"investment" is a better term) that explains why people went with a
particular approach. First it was Hadoop, then Spark, Akka, Kafka and the
whole surrounding ecosystem, including new languages to facilitate this
innovation.

> i'll agree that in some respects software engineering has gotten
> better in the last 20yrs, but it's subjective.  there are a lot of
> things that have gotten better and there are a lot of things that are
> much worse.  but i'm not sure you can apply that statement to HPC.
> HPC code doesn't churn like business code or even more volatile cloud
> code.  HPC code is usually written to solve something specific and
> gets incremental updates over time.  usually that something specific
> hasn't changed the last 20yrs (think physics/chemistry) the models we
> use to describe or solve the problems likely have, but the underlying
> code is probably basically the same with tweaks along the way to fit
> the new model.
>

I disagree. Yes, there is old code that does not churn, but there are
always new people/grad students coming into the field. They too are being
pointed toward the same way of doing things, which is what we are
discussing here ;-)

> an evolution to MPI.  but it goes back to technical debt.  to re-write
> something in chapel is non-trivial and may not be worth the time.
> writing something new and choosing chapel is really left up to the
> developer.  i have some chapel users here and there, but they're a
> minority.  and since chapel is largely only found on cray machines its
> exposure is low
>

It seems that in your world nothing new ever gets written? You are talking
only about re-writes ;).

> i'm not sure the philosophical debate you're looking for is one that
> can take place.  like vim vs emacs or init5 vs systemd.  everything
> exists and it usually boils down to personal choice.  i run a fairly
>

OK, this is another contribution I appreciate. So far I have "technical
investment", "lack of motivation" and "personal preference". I came here
trying to figure out what it is that makes or breaks these things - I
appreciate you taking the time!

> large hpc center and "user written" C/MPI code really only represents
> <20% of my workload.  but that's subjective.  i'd bet if the beowulf
> list did a poll you'd find heavy slants based on user base.  if you
> feel the industry hasn't moved, maybe thats just where you are
> working, what you're doing, or who you're working with rather than a
> representation of the hpc industry.
>

This is probably true. What makes up the other 80% of the load in your HPC
world?

> i still think you're trying to compare two things that shouldn't be
> compared.  MPI isn't a programming language, it's a library.  if you
> want to debate programming language evolution, that's a totally
> separate discussion from one that includes MPI/Spark/Etc
>

Programming languages are a part of it, and I have said this before:
languages like Julia can incorporate MPI as the underlying mechanism/library
(or one of several) to distribute computation. I have nothing against MPI
(as I have stated before). What I do have is curiosity about what is holding
a field in a certain state. Spark is a framework, but I think it is much more
than MPI, by the way: it is a way to distribute computation, but it also
brings lazy evaluation, resilient distributed datasets (RDDs), Scala,
functional programming, etc.
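
To make the lazy evaluation and resilient dataset points a bit more concrete,
here is a toy sketch in plain Python. It is not Spark's API and makes no claim
about how Spark is implemented; LazyDataset, square and calls are made-up names
for illustration only. The idea it shows: transformations only record how to
produce the data, nothing runs until an action (collect) is called, and
because the recipe is kept, a result can be recomputed from the source rather
than restored from a copy.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for a Spark-style dataset (hypothetical, not Spark's API)."""

    def __init__(self, source: Callable[[], Iterable],
                 lineage: Optional[List[str]] = None):
        self._source = source                  # recipe to (re)produce the data
        self.lineage = lineage or ["source"]   # recorded chain of steps

    def map(self, fn):
        # Only records the step; fn is not called here.
        return LazyDataset(lambda: (fn(x) for x in self._source()),
                           self.lineage + ["map"])

    def filter(self, pred):
        # Also just records the step.
        return LazyDataset(lambda: (x for x in self._source() if pred(x)),
                           self.lineage + ["filter"])

    def collect(self):
        # The "action": the whole recorded chain runs now, from the source.
        return list(self._source())


calls = []  # tracks when square() is actually invoked


def square(x):
    calls.append(x)
    return x * x


ds = LazyDataset(lambda: range(6)).map(square).filter(lambda v: v % 2 == 0)
calls_before = list(calls)   # snapshot taken before any action has run
result = ds.collect()        # triggers the computation
print(calls_before, result, ds.lineage)
```

Calling ds.collect() a second time re-runs the chain from the source, which is
the (very simplified) sense in which a lineage-based dataset can be rebuilt
after a partial loss.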