[Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

Tue Oct 13 05:42:07 PDT 2020

Following up on the question I posted earlier: how is it that Java performs so well on large data sets?

Regards,
Jonathan

From: Beowulf <beowulf-bounces at beowulf.org> On Behalf Of Oddo Da
Sent: 13 October 2020 14:38
To: Michael Di Domenico <mdidomenico4 at gmail.com>
Cc: Beowulf Mailing List <beowulf at beowulf.org>
Subject: Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

On Tue, Oct 13, 2020 at 8:33 AM Michael Di Domenico <mdidomenico4 at gmail.com> wrote:
i can't speak from a general industry sense, but i've had everything
run through my center over the past 11 years.  Hadoop seemed like
something that was going to take off.  it didn't with my group of
users.  we aren't counting clicks nor parsing text from huge files, so
its utility to us faded.  my understanding is the group behind hadoop
also made several industry missteps when trying to commercialize, i'm
not sure what happened after that.  i think a lot of people realized that
hadoop made things easier, but the overhead was too high given the
limited functionality most people wanted to use it for

Michael, thank you for the insight. I think Hadoop in general is mostly dying; Spark is really the derivative that took off. Basically, what you are saying is that there is no demand on your infrastructure for this kind of work. Do you have any insight into why not? Do the AI/DS/ML guys just know that they cannot use your resources to run standard loads and go straight to the cloud or to local Ethernet clusters?

In your estimate, how many of your users write code in Julia vs MPI vs Python?