[Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

Michael Di Domenico mdidomenico4 at gmail.com
Tue Oct 13 06:13:08 PDT 2020


On Tue, Oct 13, 2020 at 8:52 AM Guy Coates <guy.coates at gmail.com> wrote:
>
> Having just spent some time looking at parallelising some ML/AI workloads, it was enlightening to see that as you scratch beneath the various frameworks like pytorch or horovod, you find...MPI. And RDMA. And workloads that can quickly become IO bound.  Plus ca change...

Yup, we're working on that too. We've been watching these for a while
now as the frameworks mature. We have a lot of models that want to
spread across multiple GPUs on multiple hosts. Some of the
frameworks have been less than pleasant in accomplishing this task,
but the ones that took the MPI/GPUDirect route seem to be gaining the
most ground.

