[Beowulf] experience with HPC running on OpenStack

Jörg Saßmannshausen sassy-work at sassy.formativ.net
Wed Jul 8 02:28:25 PDT 2020


Hi Chris,

thanks for sharing your thoughts. Like most things, there are two sides of the
coin: flexibility, which is what we all want, and complexity, which is the
price you have to pay for that flexibility.

This is why I thought it best to ask the community for first-hand
experiences. One thing we also want to address is the use of GPUs, so we can
use them a bit more efficiently than we seem to do right now.
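
For example, on the GPU side what we have in mind is tighter scheduling via
Slurm's GRES mechanism, roughly along these lines (just a sketch with
placeholder node names, GPU types and device paths, not our actual
configuration):

    # gres.conf on each GPU node (device paths are placeholders)
    Name=gpu Type=v100 File=/dev/nvidia0
    Name=gpu Type=v100 File=/dev/nvidia1

    # slurm.conf
    GresTypes=gpu
    NodeName=gpu[01-04] Gres=gpu:v100:2

    # jobs then have to request GPUs explicitly, e.g.
    #   sbatch --gres=gpu:v100:1 job.sh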

Regards

Jörg


On Wednesday, 1 July 2020, 06:05:11 BST, Chris Samuel wrote:
> On 29/6/20 5:09 pm, Jörg Saßmannshausen wrote:
> > we are currently planning a new cluster and this time around the idea was
> > to use OpenStack for the HPC part of the cluster as well.
> > 
> > I was wondering if somebody has some first-hand experiences on the list
> > here.
> 
> At $JOB-2 I helped a group set up a cluster on OpenStack (they were
> resource-constrained: they had access to OpenStack nodes and that was
> it).  In my experience it was just another added layer of complexity for
> no added benefit, and it resulted in a number of outages due to failures
> in the OpenStack layers underneath.
> 
> Given that Slurm, which was being used there, already had mature cgroups
> support, there really was no advantage for them in having a layer of
> virtualisation on top of the hardware, especially as (if I'm remembering
> properly) in the early days the virtualisation layer didn't properly
> understand the Intel CPUs we had and so didn't expose the correct
> capabilities to the VMs.
> 
> All that said, these days it's likely improved, and I know that back then
> people were thinking about OpenStack "Ironic", which is a way for it to
> manage bare metal nodes.
> 
> But I do know the folks in question eventually managed to move to a purely
> bare metal solution and seemed a lot happier for it.
> 
> As for IB, I suspect that depends on the capabilities of your
> virtualisation layer, but I do believe it is quite possible. This
> cluster didn't have IB (when they started getting bare metal nodes they
> went with RoCE instead).
> 
> All the best,
> Chris
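
Regarding the cgroups point: that isolation comes from Slurm's cgroup
plugins, which on bare metal give you per-job confinement without any
virtualisation layer. A minimal sketch (not a complete configuration) looks
something like this:

    # cgroup.conf
    ConstrainCores=yes        # pin tasks to their allocated cores
    ConstrainRAMSpace=yes     # enforce the job's memory allocation
    ConstrainDevices=yes      # jobs only see the devices (e.g. GPUs) they requested

    # slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/affinity,task/cgroup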
