[Beowulf] experience with HPC running on OpenStack

Lance Wilson lance.wilson at monash.edu
Wed Jul 1 00:06:27 PDT 2020


At $JOB we run multiple clusters on top of OpenStack. We are a very
interactive HPC shop and it really helps us deliver things that we couldn't
easily do any other way. The cgroups side of things is used pretty heavily,
but it doesn't always address contention the way a dedicated VM can. Our
networks use hardware passthrough RoCE and typically work without major
issues. We did have a whole assortment of issues with undocumented
"features" at the beginning, but it is all quite mature now and MPI works
without issue. It probably only really makes sense if you already have
cloud admins looking after the hardware side and HPC admins looking after
everything else; if you had to do both, I'd argue that's not an efficient
use of people's time. If you want to talk about it in more depth, just let
me know.
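
If it helps, here's a rough sketch (Python, nothing Monash-specific) of the
kind of sanity check you can run from inside a compute VM to confirm the
passthrough RDMA device is actually visible and that your process is sitting
inside a Slurm-managed cgroup. It assumes the usual Linux sysfs/procfs paths
and that Slurm's cgroup plugin puts "slurm" somewhere in the cgroup path,
which it does in the setups I've seen; treat it as illustrative, not as our
production tooling.

    #!/usr/bin/env python3
    # Quick sanity check from inside a compute VM:
    #  1. is an RDMA (RoCE/IB) device visible after PCI passthrough?
    #  2. is this process confined by a Slurm-created cgroup?
    import os

    def rdma_devices():
        """List RDMA devices the kernel has registered (empty if passthrough failed)."""
        path = "/sys/class/infiniband"
        return sorted(os.listdir(path)) if os.path.isdir(path) else []

    def in_slurm_cgroup():
        """True if any of this process's cgroup paths look Slurm-managed."""
        with open("/proc/self/cgroup") as f:
            return any("slurm" in line for line in f)

    if __name__ == "__main__":
        devs = rdma_devices()
        print("RDMA devices:", ", ".join(devs) if devs else "none found")
        print("Inside a Slurm cgroup:", in_slurm_cgroup())

Run it inside a job step (e.g. via srun) if you want the cgroup check to be
meaningful.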


Cheers,

Lance
--
Dr Lance Wilson
Characterisation Virtual Laboratory (CVL) Coordinator &
Senior HPC Consultant
Ph: 03 99055942 (+61 3 99055942)
Mobile: 0437414123 (+61 4 3741 4123)
Multi-modal Australian ScienceS Imaging and Visualisation Environment
(www.massive.org.au)
Monash University


On Wed, 1 Jul 2020 at 15:05, Chris Samuel <chris at csamuel.org> wrote:

> On 29/6/20 5:09 pm, Jörg Saßmannshausen wrote:
>
> > we are currently planning a new cluster and this time around the idea
> > was to use OpenStack for the HPC part of the cluster as well.
> >
> > I was wondering if somebody has some first hand experiences on the list
> > here.
>
> At $JOB-2 I helped a group set up a cluster on OpenStack (they were
> resource constrained: they had access to OpenStack nodes and that was
> it).  In my experience it was just another added layer of complexity for
> no added benefit, and it resulted in a number of outages due to failures
> in the OpenStack layers underneath.
>
> Given that Slurm, which was being used there, already had mature cgroups
> support, there really was no advantage for them in having a layer of
> virtualisation on top of the hardware, especially as (if I'm remembering
> properly) in the early days the virtualisation layer didn't properly
> understand the Intel CPUs we had and so didn't expose the correct
> capabilities to the VM.
>
> All that said, these days it's likely improved, and I know people were
> thinking at the time about OpenStack "Ironic", which was a way for it to
> manage bare-metal nodes.
>
> But I do know the folks in question eventually managed to move to a
> purely bare-metal solution and seemed a lot happier for it.
>
> As for IB, I suspect that depends on the capabilities of your
> virtualisation layer, but I do believe it is quite possible. This
> cluster didn't have IB (when they started getting bare-metal nodes they
> went with RoCE instead).
>
> All the best,
> Chris
> --
>   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
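
PS: on Chris's point about the hypervisor not exposing the host CPU
capabilities to the guest, a quick way to check from inside a VM is to
compare the feature flags the kernel reports against what you expect from
the host silicon. A minimal illustration in Python; the expected flag set
here is only an example, so substitute whatever your hardware should
actually provide.

    #!/usr/bin/env python3
    # Compare CPU feature flags visible in this (virtual) machine against an
    # expected set -- missing flags usually mean the hypervisor is not
    # passing the host CPU model through to the guest.
    EXPECTED = {"avx2", "fma", "avx512f"}   # example only: adjust for your CPUs

    def visible_flags():
        """Return the set of CPU feature flags reported by /proc/cpuinfo."""
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
        return set()

    missing = EXPECTED - visible_flags()
    print("Missing CPU features:",
          ", ".join(sorted(missing)) if missing else "none")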