[Beowulf] RoCE vs. InfiniBand

John Hearns hearnsj at gmail.com
Thu Nov 26 11:41:38 UTC 2020


Jörg, I think I might know where the Lustre storage is!
It is possible to install storage routers, so you could route between
Ethernet and InfiniBand.
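For Lustre that usually means LNet routers: a node with a leg on both
fabrics that forwards between the o2ib and tcp LNet networks. As a rough
sketch of the classic modprobe-style configuration (the interface names and
the router address below are just placeholders):

    # Router node, with both an IB and an Ethernet interface
    # (/etc/modprobe.d/lustre.conf)
    options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding="enabled"

    # IB-only compute nodes: reach the tcp0 network via the router's o2ib NID
    options lnet networks="o2ib0(ib0)" routes="tcp0 10.10.0.1@o2ib0"

The Lustre servers on the Ethernet side need the matching route back to
o2ib0, and in practice you would want at least a pair of routers for
redundancy.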
It is also worth saying that Mellanox have Metro InfiniBand switches -
though I do not think they go as far as the west of London!

Seriously though, you ask about RoCE. I will stick my neck out and say
yes: if you are planning an OpenStack cluster
with the intention of having mixed AI and 'traditional' HPC workloads, I
would go for a RoCE-style setup.
In fact, I am joining a discussion about a new project for a customer with
similar aims in an hour's time.

I could get some benchmarking time if you want to do a direct comparison of
GROMACS on IB vs. RoCE.
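If we do that, the simplest apples-to-apples test is probably to run the
same mdrun benchmark twice and only switch the network UCX binds to. A
rough sketch, assuming an Open MPI + UCX stack and an MPI build of GROMACS;
the device names, rank counts and the benchmark.tpr input are placeholders:

    # Same binary, same input; first bound to the IB HCA...
    mpirun -np 128 --map-by ppr:32:node \
        -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_TLS=rc,sm,self \
        gmx_mpi mdrun -s benchmark.tpr -nsteps 20000 -resethway -noconfout

    # ...then bound to the RoCE-capable Ethernet port
    mpirun -np 128 --map-by ppr:32:node \
        -x UCX_NET_DEVICES=mlx5_2:1 -x UCX_TLS=rc,sm,self \
        gmx_mpi mdrun -s benchmark.tpr -nsteps 20000 -resethway -noconfout

Comparing the ns/day GROMACS reports at a couple of different node counts
should show whether the RoCE fabric costs you anything where it matters.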

On Thu, 26 Nov 2020 at 11:14, Jörg Saßmannshausen <sassy-work at sassy.formativ.net> wrote:

> Dear all,
>
> as the DNS problems have been solved (many thanks for doing this!), I was
> wondering if people on the list have some experience with this question:
>
> We are currently in the process of purchasing a new cluster, and we want
> to use OpenStack for the whole management of the cluster. Part of the
> cluster will run HPC applications, like GROMACS for example; other parts
> will run typical OpenStack applications like VMs. We are also implementing
> a Data Safe Haven for the more sensitive data we are aiming to process.
> Of course, we want to have a decent-sized GPU partition as well!
>
> Now, traditionally I would say that we are going for InfiniBand. However,
> for reasons I don't want to go into right now, our existing file storage
> (Lustre) will be in a different location. Thus, we decided to go for RoCE
> for the file storage and InfiniBand for the HPC applications.
>
> The point I am struggling with is to understand whether this is really the
> best solution, or whether, given that we are not building a 100k-node
> cluster, we could use RoCE for the few nodes which are doing parallel
> (read: MPI) jobs too. I have a nagging feeling that I am missing something
> if we move to pure RoCE and ditch the InfiniBand. We have a mixed workload,
> from ML/AI to MPI applications like GROMACS to pipelines like those used
> in the bioinformatics corner. We are not planning to partition the GPUs;
> the current design model is to have only 2 GPUs in a chassis. So, is there
> something I am missing, or is the stomach feeling I have really just a
> lust for some sushi? :-)
>
> Thanks for your sentiments here, they are much welcomed!
>
> All the best from a dull London
>
> Jörg
>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>