[Beowulf] Build Recommendations - Private Cluster

Sean McGrath smcgrat at tchpc.tcd.ie
Wed Aug 21 08:26:56 PDT 2019


Hi guys,

I was on the Programme Committee for the HPC Systems Professionals
Workshop, HPCSYSPROS18, at Supercomputing last year,
http://sighpc-syspros.org/workshops/2018/index.php.html.

A couple of the submissions I reviewed may be of interest here.

(1) Rapid Deployment of Bare-Metal and In-Container HPC Clusters Using
OpenHPC playbooks.

This was presented. It is essentially a set of Ansible playbooks to get
a cluster up and running as quickly as possible.

From their GitHub, https://github.com/XSEDE/CRI_XCBC:

"This repo will get you to the point of a working slurm installation
across your cluster. It does not currently provide any scientific
software or user management options!

The basic usage is to set up the master node with the initial 3 roles
(pre_ohpc,ohpc_install,ohpc_config) and use the rest to build node
images, and deploy the actual nodes (these use Warewulf as a
provisioner by default)."
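The basic workflow, as I understand it from their README, looks something
like the following. This is only a sketch; the playbook and inventory file
names are my assumptions, not taken from their repo, though the three role
names are quoted above.

```shell
# Sketch of the CRI_XCBC quick-start described above.
# The playbook name (headnode.yml) and inventory path are illustrative
# assumptions; check the repo's README for the real ones.
git clone https://github.com/XSEDE/CRI_XCBC.git
cd CRI_XCBC

# Apply the three initial roles to the master node:
ansible-playbook -i inventory headnode.yml \
    --tags pre_ohpc,ohpc_install,ohpc_config
```

The remaining roles would then build the node images and deploy the
compute nodes via Warewulf, per their description.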

(2) clusterworks - this was not presented at HPCSYSPROS18 (it narrowly
lost out to the above) but is very similar. From their repo,
https://github.com/clusterworks/inception:

"clusterworks is a toolkit that brings together the best modern
technologies in order to create fast and flexible turn-key HPC
environments, deployable on bare-metal infrastructure or in the cloud"

Either may be of some use here: instead of starting everything from
scratch you can build on top of those foundations. I don't know how
current those projects are, or whether they are still being developed,
though.

Sean


On Wed, Aug 21, 2019 at 10:27:41AM -0400, Alexander Antoniades wrote:

> We have been building out a cluster based on commodity servers (mainly
> Gigabyte motherboards) with 8x1080ti/2080ti per server.
> 
> We are using a combination of OpenHPC compiled tools and Ansible. I would
> recommend using the OpenHPC software so you don't have to deal with
> figuring out what versions of the tools you need to get and manually
> building them, but I would not go down their prescribed way of building a
> cluster with base images and all that for a small heterogeneous cluster. I would
> just build the machines as consistently as you can and then use the
> OpenHPC versions of programs where needed and augment the management with
> something like ansible or even pdsh.
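> A sketch of what that ad-hoc pdsh management can look like (the host
> range and package name below are illustrative assumptions, not from
> this thread):

```shell
# Sketch: managing a small heterogeneous cluster with pdsh, as suggested
# above. NODES and the package name are illustrative assumptions.
NODES="node[01-08]"

# Run the same check on every node in parallel:
pdsh -w "$NODES" uptime

# Verify a consistent OpenHPC package set across nodes; dshbak -c (part
# of the pdsh suite) coalesces identical output from multiple hosts:
pdsh -w "$NODES" rpm -q slurm-ohpc | dshbak -c

# Push a common config file out to all nodes with pdcp:
pdcp -w "$NODES" slurm.conf /etc/slurm/slurm.conf
```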
> 
> Also unless you're really just doing this as an exercise to kill time on
> weekends, or you literally have no money and can get free power/cooling, I
> would really consider looking at what other more modern hardware is
> available, or at least benchmark your system against a sample cloud system
> if you really want to learn GPU computing.
> 
> Thanks,
> 
> Sander
> 
> On Wed, Aug 21, 2019 at 1:56 AM Richard Edwards <ejb at fastmail.fm> wrote:
> 
> > Hi John
> >
> > No doom and gloom.
> >
> > It's in a purpose built workshop/computer room that I have; 42U Rack,
> > cross draft cooling which is sufficient and 32 A power into the PDUs. The
> > equipment is housed in the 42U Rack along with a variety of other machines
> > such as Sun Enterprise 4000 and a 30 CPU Transputer Cluster. None of it
> > runs 24/7 and not all of it is on at the same time, mainly because of the
> > cost of power :-/
> >
> > Yeah the Tesla 1070s scream like a banshee...
> >
> > I am planning on running it as a power-on-on-demand setup, which I already
> > do through some HP iLo and APC PDU Scripts that I have for these machines.
> > Until recently I have been running some of them as a vSphere cluster and
> > others as standalone CUDA machines.
> >
> > So that's one vote for OpenHPC.
> >
> > Cheers
> >
> > Richard
> >
> > On 21 Aug 2019, at 3:45 pm, John Hearns via Beowulf <beowulf at beowulf.org>
> > wrote:
> >
> > Add up the power consumption for each of those servers. If you plan on
> > installing this in a domestic house or indeed in a normal office
> > environment you probably won't have enough amperage in the circuit you
> > intend to power it from.
> > Sorry to be all doom and gloom.
> > Also this setup will make a great deal of noise. If in a domestic setting
> > put it in the garage.
> > In an office setting the obvious place is a comms room but be careful
> > about the ventilation.
> > Office comms rooms often have a single wall mounted air conditioning unit.
> > Make SURE to run a temperature shutdown script.
> > This air con unit WILL fail over a weekend.
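> > A minimal sketch of such a shutdown check, run from cron every few
> > minutes (the threshold and the ipmitool sensor line are assumptions;
> > adapt them to your hardware):

```shell
# Sketch: temperature shutdown check for an unattended comms room.
# THRESHOLD and the sensor-reading example are assumptions for your setup.
THRESHOLD=35  # inlet temperature in degrees C at which to power off

check_temp () {
    # $1: current inlet temperature in whole degrees C
    if [ "$1" -ge "$THRESHOLD" ]; then
        echo "SHUTDOWN"
    else
        echo "OK"
    fi
}

# In production, read the sensor, e.g. via ipmitool, and act on the result:
#   temp=$(ipmitool sdr type Temperature | awk -F'|' '/Inlet/ {print $5+0; exit}')
#   [ "$(check_temp "$temp")" = "SHUTDOWN" ] && /sbin/shutdown -h now
check_temp 28
check_temp 40
```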
> >
> > Regarding the software stack I would look at OpenHPC. But that's just me.
> >
> >
> >
> >
> >
> > On Wed, 21 Aug 2019 at 06:09, Dmitri Chubarov <dmitri.chubarov at gmail.com>
> > wrote:
> >
> >> Hi,
> >> this is very old hardware and you would have to stay with a very
> >> outdated software stack, as the Tesla 1070 cards are not supported by recent
> >> versions of the NVIDIA drivers, and old versions of the NVIDIA drivers do not play
> >> well with modern kernels and modern system libraries. Unless you are doing
> >> this for digital preservation, consider dropping the 1070s out of the equation.
> >>
> >> Dmitri
> >>
> >>
> >> On Wed, 21 Aug 2019 at 06:46, Richard Edwards <ejb at fastmail.fm> wrote:
> >>
> >>> Hi Folks
> >>>
> >>> So I'm about to build a new personal GPU-enabled cluster and am looking for
> >>> people's thoughts on distribution and management tools.
> >>>
> >>> Hardware that I have available for the build
> >>> - HP Proliant DL380/360 - mix of G5/G6
> >>> - HP Proliant SL6500 with 8 GPU
> >>> - HP Proliant DL580 - G7 + 2x K20x GPU
> >>> - 3x Nvidia Tesla 1070 (4 GPUs per unit)
> >>>
> >>> I'd appreciate people's insights/thoughts.
> >>>
> >>> Regards
> >>>
> >>> Richard
> >>> _______________________________________________
> >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >>> To change your subscription (digest mode or unsubscribe) visit
> >>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> >>>



-- 
Sean McGrath M.Sc

Systems Administrator
Trinity Centre for High Performance and Research Computing
Trinity College Dublin

sean.mcgrath at tchpc.tcd.ie

https://www.tcd.ie/
https://www.tchpc.tcd.ie/

+353 (0) 1 896 3725


