[Beowulf] experience with HPC running on OpenStack [EXT]

Tim Cutts tjrc at sanger.ac.uk
Wed Jul 1 04:13:05 PDT 2020


Here, we deploy some clusters on OpenStack, and some traditionally as bare metal.   Our largest cluster is actually a mixture of both, so we can dynamically expand it from the OpenStack service when needed.
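(As a rough illustration only, not our exact tooling: dynamically expanding the batch cluster from OpenStack essentially comes down to booting extra worker VMs from a pre-built compute-node image. A minimal sketch with the openstacksdk Python client follows; the image, flavour and network names are made up, and the new node still has to register itself with the batch scheduler at boot.)

# Sketch: boot one extra batch worker from a pre-built compute-node image.
# The image/flavour/network names are illustrative placeholders.
import openstack

conn = openstack.connect(cloud="hpc")  # credentials come from clouds.yaml / env vars

image = conn.compute.find_image("batch-worker-image")
flavor = conn.compute.find_flavor("m1.hpc.32core")
network = conn.network.find_network("cluster-net")

server = conn.compute.create_server(
    name="batch-worker-042",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)

# Block until the VM is ACTIVE; its boot scripts (cloud-init or similar)
# then join it to the batch scheduler.
server = conn.compute.wait_for_server(server)
print(server.name, server.status)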

Our aim eventually is to use OpenStack as a common deployment layer, even for the bare metal cluster nodes, but we’re not quite there yet.

The main motivation for this was to have a common hardware and deployment platform, with the flexibility to run both VM and batch workloads.  We have needed to change workloads dynamically; for example, in the current COVID-19 crisis our human sequencing has largely stopped and we have predominantly been doing COVID-19 sequencing, using a pipeline imported from the consortium we’re part of.  Using OpenStack we got that new pipeline running in under a week, and later moved it from the research to the production environment, reallocating the research resources back to their normal workload.

There certainly are downsides: OpenStack adds a considerable layer of complexity, and we have had occasional issues.  Those problems are usually in the services for dynamically creating and destroying resources, so they rarely have an immediate impact on established, running VMs such as batch clusters.  We also tend to use fairly static provider networks to connect the Lustre systems to virtual clusters, which removes another layer of OpenStack complexity.
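(For illustration, a static provider network of the kind I mean is just a Neutron network mapped directly onto an existing VLAN on the physical fabric carrying Lustre traffic, rather than an overlay. A minimal openstacksdk sketch, in which the VLAN ID, physnet label and CIDR are invented for the example; creating provider networks needs admin credentials.)

# Sketch: define a provider (VLAN) network that maps straight onto the
# physical segment carrying Lustre traffic, so VMs reach Lustre without
# an overlay in the way.  VLAN ID, physnet label and CIDR are examples only.
import openstack

conn = openstack.connect(cloud="hpc-admin")

lustre_net = conn.network.create_network(
    name="lustre-provider",
    provider_network_type="vlan",
    provider_physical_network="physnet-lustre",
    provider_segmentation_id=1234,
    is_shared=True,
)

conn.network.create_subnet(
    network_id=lustre_net.id,
    name="lustre-subnet",
    ip_version=4,
    cidr="10.20.0.0/16",
)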

Generally speaking it’s working pretty well, and we have uptime in excess of 99.5%.

Tim

On 1 Jul 2020, at 05:09, John Hearns <hearnsj at gmail.com> wrote:

Jörg, I would back up what Matt Wallis says. What benefits would OpenStack bring you?
Do you need to set up a flexible infrastructure where clusters can be created on demand for specific projects?

Regarding InfiniBand, the key concept is SR-IOV. This article is worth reading:
https://docs.openstack.org/neutron/pike/admin/config-sriov.html
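Once the SR-IOV agent and the Nova PCI whitelist are configured as in that guide, the instance side is mostly a matter of creating a port with vnic_type "direct" and booting against it. A rough sketch with the openstacksdk Python client (the network, image and flavour names are placeholders):

# Sketch: attach a VM to an SR-IOV virtual function by creating a Neutron
# port with vnic_type=direct and booting the instance on that port.
# Assumes the SR-IOV agent / PCI whitelist are already set up per the guide
# above; all names here are placeholders.
import openstack

conn = openstack.connect(cloud="hpc")

net = conn.network.find_network("ib-sriov-net")

port = conn.network.create_port(
    network_id=net.id,
    name="mpi-node-0-ib0",
    binding_vnic_type="direct",   # request an SR-IOV VF instead of a virtio NIC
)

server = conn.compute.create_server(
    name="mpi-node-0",
    image_id=conn.compute.find_image("hpc-node-image").id,
    flavor_id=conn.compute.find_flavor("hpc.large").id,
    networks=[{"port": port.id}],
)
conn.compute.wait_for_server(server)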

I would take a step back, look at your storage technology, and decide which is the best one to go forward with.
Also look at the proceedings of the last STFC Computing Insight UK, where Martyn Guest presented a lot of
benchmarking results on AMD Rome (page 103 onwards in this report):
http://purl.org/net/epubs/manifestation/46387165/DL-CONF-2020-001.pdf




On Tue, 30 Jun 2020 at 12:21, Jörg Saßmannshausen <sassy-work at sassy.formativ.net> wrote:
Dear all,

we are currently planning a new cluster, and this time around the idea is to
use OpenStack for the HPC part of the cluster as well.

I was wondering if somebody on the list here has some first-hand experience.
One of the things we are currently not so sure about is InfiniBand (or
another low-latency interconnect, but not Ethernet): can you run HPC jobs
on OpenStack which require more cores than a single box provides? I am
thinking of programs like CP2K, GROMACS and NWChem (if those sound familiar to
you), which make very good use of these kinds of networks.

I came across things like MagicCastle from Compute Canada, but as far as I
understand it, they are not using it for production (yet).

Is anybody on here familiar with this?

All the best from London

Jörg



_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf




-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE.

