[Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

Oddo Da oddodaoddo at gmail.com
Mon Oct 12 19:04:30 PDT 2020


Johann-Tobias,

Thank you for the reply.

I don't know enough detail about Julia to even be confused (I am learning
it now) :-)

It just seems to me that the tooling in the HPC space has not really
changed in 20+ years. This is why I thought, well, hoped, that something
new and more interesting would have come along - like Julia. Being able to
express parallelization or distribution tasks better and at a higher level
(higher than MPI anyway) would be nice. Spark is nice that way in the data
science space, but sadly it cannot run in the same space/hardware as
traditional HPC approaches.

On Mon, Oct 12, 2020 at 8:15 PM Jo-To Schäg <johtobsch at gmail.com> wrote:

> I have some experience with Julia and can say with certainty that
> Julia is not aiming to replace MPI.
> Julia is a programming language aiming for a place in HPC and other
> development-time-limited, computation-heavy areas. Some Julia programs
> also use MPI for internode communication. For example, CliMA [1] defines
> its own array type, MPIStateArray, which presents itself as a shared array
> over all machines in the cluster. It handles the synchronization of ghost
> elements so that local stencils can use up-to-date data from neighbouring
> machines, all while hiding behind the interface of an array data
> structure.
> However, Julia also has other means of internode communication, such as
> a channel primitive that can send arbitrary data structures. So I see
> where the confusion might come from.
> Here are instructions on how to use Julia on MIT's Satori cluster [2].
> From that and the surrounding excitement about Julia, I think Julia is
> slowly growing more popular in the supercomputing space, although the
> number of projects running on large-scale clusters can probably still
> be counted on one hand.
> I am aware of the following projects using Julia at cluster scale:
>  - Celeste [3]
>  - CliMA
>  - DSGE model of the FED [4]
>  - model informed drug development [5]
>
> If someone is interested in learning Julia, a good place to come into
> contact with the community is the Julia Slack.
>
> Sincerely,
> Johann-Tobias Schäg
>
> [1] https://github.com/CliMA/ClimateMachine.jl
> [2] https://mit-satori.github.io/satori-julia.html#getting-started
> [3] https://www.youtube.com/watch?v=uecdcADM3hY
> [4] https://github.com/FRBNY-DSGE/DSGE.jl
> [5] https://juliacomputing.com/case-studies/pfizer.html
>
>
> On Mon, 12 Oct 2020 at 21:20, Prentice Bisbal via Beowulf
> <beowulf at beowulf.org> wrote:
> >
> > I'm not an expert on Big Data at all, but I hear the phrase "Hadoop"
> > less and less these days. Where I work, most data analysts are using R,
> > Python, or Spark in the form of PySpark. For machine learning, most of the
> > researchers I support are using Python tools like TensorFlow or PyTorch.
> >
> > I don't know much about Julia replacing MPI, etc., but I wish I did. I
> > would like to know more about Julia.
> >
> > Prentice
> >
> > On 10/12/20 12:14 PM, Oddo Da wrote:
> >
> > Hello,
> >
> > I used to be in HPC back when we built Beowulf clusters by hand ;) and
> > wrote code in C/pthreads, PVM and MPI, and back when anyone could walk
> > into fields like bioinformatics: all that was needed was a pulse, some C
> > and Perl, and a desire to do ;-). Then I left for the private sector and
> > stumbled into "big data" some years later - I wrote a lot of code in
> > Spark and Scala, worked in infrastructure to support it, etc.
> >
> > Then I went back (in 2017) to HPC. I was surprised to find that not much
> > has changed - researchers and grad students still write code in MPI and
> > C/C++ and maybe some Python or R for visualization or localized data
> > analytics. I also noticed that it was not easy to "marry" things like big
> > data with HPC clusters - tools like Spark/Hadoop do not really have the
> > same underlying infrastructure assumptions as do things like
> > MPI/supercomputers. However, I find it wasteful for a university to run
> > separate clusters to support a data science/big data load vs traditional
> > HPC.
> >
> > I then stumbled upon languages like Julia - I like its approach: code is
> > data, visualization is easy, and the ML/DS tooling is decent.
> >
> > How does it fare on a traditional HPC cluster? Are people using it to
> > substitute their MPI loads? On the opposite side, has it caught up to
> > Spark in terms of DS/ML quality of offering? In other words, can it be
> > used as a unifying substitute for both opposing approaches in one fell
> > swoop?
> >
> > I realize that many people have already committed to certain
> > tech/paradigms, but this is mostly educational debt (if MPI or Spark on
> > the other side is working for me, why go to something different?) - but
> > is there anything substantial stopping new people with no debt from
> > starting out with a different approach (offerings like Julia)?
> >
> > I do not have too much experience with Julia (and hence may be barking
> > up the wrong tree) - in that case I am wondering what people are doing to
> > "marry" the loads of traditional HPC with "big data" as practiced by the
> > commercial/industry entities on a single underlying hardware offering. I
> > know there are things like Twister2, but it is unclear to me (from cursory
> > examination) what it actually offers in the context of my questions above.
> >
> > Any input, corrections, schooling me etc. are appreciated.
> >
> > Thank you!
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> >
> > --
> > Prentice Bisbal
> > Lead Software Engineer
> > Research Computing
> > Princeton Plasma Physics Laboratory
> > http://www.pppl.gov