[Beowulf] Build Recommendations - Private Cluster

Richard Edwards ejb at fastmail.fm
Wed Aug 21 15:00:41 PDT 2019


Hi Everyone

Thank you all for the feedback and insights.

So I am starting to see a pattern: some combination of CentOS + Ansible + OpenHPC + SLURM + old CUDA/NVIDIA drivers ;-).
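
For my own notes, the head-node end of that stack seems to boil down to something like the following (an untested sketch, assuming CentOS 7 with the OpenHPC release RPM for the chosen version already installed; the package names are the OpenHPC meta-packages from their install recipes):

  # head node: base tools plus the SLURM server stack from the OpenHPC repo
  yum -y install ohpc-base ohpc-slurm-server
  # compute nodes get the client side
  yum -y install ohpc-base-compute ohpc-slurm-client
  # the old Teslas will need a legacy NVIDIA driver branch on top of this

I will check the exact package set against the OpenHPC install guide for whichever release I end up on.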

Sean, thank you for those links; they will certainly accelerate the journey. (Note to anyone looking: you need to remove the “:” at the end of the link, else you will get a 404.)

Finally, yes, I am very aware that the hardware is long in the tooth, but it is what I have for the time being. Once my needs outstrip the capability of the hardware, I am bound to upgrade. At that point I plan to have a manageable cluster that I can add to, remove from, and upgrade at will :-).

Thanks again to everyone for the responses and insights. Will let you all know how I go over the coming weeks.

Cheers

Richard

> On 22 Aug 2019, at 1:26 am, Sean McGrath <smcgrat at tchpc.tcd.ie> wrote:
> 
> Hi guys,
> 
> I was on the Programme Committee for the HPC Systems Professionals
> Workshop, HPCSYSPROS18, at Supercomputing last year, 
> http://sighpc-syspros.org/workshops/2018/index.php.html.
> 
> A couple of the submissions I reviewed may be of interest here.
> 
> (1) Rapid Deployment of Bare-Metal and In-Container HPC Clusters Using
> OpenHPC playbooks.
> 
> This was presented. It is essentially a set of Ansible playbooks to get
> a cluster up and running as quickly as possible.
> 
> From their github, https://github.com/XSEDE/CRI_XCBC:
> 
> "This repo will get you to the point of a working slurm installation
> across your cluster. It does not currently provide any scientific
> software or user management options!
> 
> The basic usage is to set up the master node with the initial 3 roles
> (pre_ohpc,ohpc_install,ohpc_config) and use the rest to build node
> images, and deploy the actual nodes (these use Warewulf as a
> provisioner by default)."
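> 
> If the repo still has the same layout, the basic run is roughly: clone it,
> edit the inventory/group_vars to describe your head node and networks, then
> point ansible-playbook at the top-level playbook that pulls in those roles.
> Something like this, though the file names here are guesses, so check their
> README:
> 
>   git clone https://github.com/XSEDE/CRI_XCBC.git && cd CRI_XCBC
>   # edit the inventory and group_vars for your site first
>   ansible-playbook -i inventory headnode.yml   # pre_ohpc, ohpc_install, ohpc_config
>   # then run the node-image roles and let Warewulf provision the compute nodes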
> 
> (2) clusterworks - this was not presented at HPCSYSPROS18; it narrowly
> lost out to the above but is very similar. From their
> https://github.com/clusterworks/inception:
> 
> "clusterworks is a toolkit that brings together the best modern
> technologies in order to create fast and flexible turn-key HPC
> environments, deployable on bare-metal infrastructure or in the cloud"
> 
> They may be of some use here. Instead of having to start everything
> from scratch you can build on top of those foundations. I don't know
> how current those projects are or if they are still being developed
> though.
> 
> Sean
> 
> 
> On Wed, Aug 21, 2019 at 10:27:41AM -0400, Alexander Antoniades wrote:
> 
>> We have been building out a cluster based on commodity servers (mainly
>> Gigabyte motherboards) with 8x1080ti/2080ti per server.
>> 
>> We are using a combination of OpenHPC-compiled tools and Ansible. I would
>> recommend using the OpenHPC software so you don't have to deal with
>> figuring out what versions of the tools you need and building them
>> manually, but I would not go down their prescribed way of building a
>> cluster with base images and all that for a small heterogeneous cluster. I
>> would just build the machines as consistently as you can, use the OpenHPC
>> versions of programs where needed, and augment the management with
>> something like Ansible or even pdsh.
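>> 
>> For the ad-hoc side, even one-liners cover most of the day-to-day poking
>> around (hostnames and the "gpu" inventory group below are just placeholders):
>> 
>>   # run the same command on every node with pdsh
>>   pdsh -w gpu[01-04] uptime
>>   # or the ansible ad-hoc equivalent against an inventory group
>>   ansible gpu -i hosts -m command -a "nvidia-smi -L"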
>> 
>> Also, unless you're really just doing this as an exercise to kill time on
>> weekends, or you literally have no money and can get free power/cooling, I
>> would really consider looking at what other, more modern hardware is
>> available, or at least benchmark your system against a comparable cloud
>> instance if you really want to learn GPU computing.
>> 
>> Thanks,
>> 
>> Sander
>> 
>> On Wed, Aug 21, 2019 at 1:56 AM Richard Edwards <ejb at fastmail.fm> wrote:
>> 
>>> Hi John
>>> 
>>> No doom and gloom.
>>> 
>>> It's in a purpose-built workshop/computer room that I have: a 42U rack,
>>> cross-draft cooling (which is sufficient) and 32 amp power into the PDUs. The
>>> equipment is housed in the 42U rack along with a variety of other machines,
>>> such as a Sun Enterprise 4000 and a 30-CPU Transputer cluster. None of it
>>> runs 24/7 and not all of it is on at the same time, mainly because of the
>>> cost of power :-/
>>> 
>>> Yeah, the Tesla 1070s scream like a banshee…
>>> 
>>> I am planning on running it as a power-on-on-demand setup, which I already
>>> do through some HP iLO and APC PDU scripts that I have for these machines.
>>> Until recently I have been running some of them as a vSphere cluster and
>>> others as standalone CUDA machines.
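>>> 
>>> The power-on side is nothing fancy, roughly this shape (hostnames and
>>> credentials below are placeholders; the iLOs answer to IPMI over LAN):
>>> 
>>>   # wake a set of nodes via their iLOs before a session
>>>   for ilo in ilo-dl380-01 ilo-dl380-02 ilo-sl6500-01; do
>>>       ipmitool -I lanplus -H "$ilo" -U admin -P "$ILO_PASS" chassis power on
>>>   done
>>>   # the Tesla trays hang off APC PDU outlets, switched via the PDU's web/SNMP interface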
>>> 
>>> So that's one vote for OpenHPC.
>>> 
>>> Cheers
>>> 
>>> Richard
>>> 
>>> On 21 Aug 2019, at 3:45 pm, John Hearns via Beowulf <beowulf at beowulf.org>
>>> wrote:
>>> 
>>> Add up the power consumption for each of those servers. If you plan on
>>> installing this in a domestic house, or indeed in a normal office
>>> environment, you probably won't have enough amperage in the circuit you
>>> intend to power it from.
>>> Sorry to be all doom and gloom.
>>> Also this setup will make a great deal of noise. If in a domestic setting
>>> put it in the garage.
>>> In an office setting the obvious place is a comms room but be careful
>>> about the ventilation.
>>> Office comms rooms often have a single wall-mounted air conditioning unit.
>>> Make SURE to run a temperature shutdown script.
>>> This air con unit WILL fail over a weekend.
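>>> 
>>> Even a crude cron job that polls the inlet temperature and halts everything
>>> is enough. A rough sketch, untested, and the sensor name and threshold are
>>> examples (check what ipmitool actually reports on your boxes):
>>> 
>>>   #!/bin/bash
>>>   # thermal guard: halt the node if the ambient/inlet sensor goes over 35 C
>>>   TEMP=$(ipmitool sdr | awk -F'|' '/Ambient|Inlet/ {print int($2); exit}')
>>>   if [ -n "$TEMP" ] && [ "$TEMP" -gt 35 ]; then
>>>       logger "Inlet temperature ${TEMP}C over threshold, shutting down"
>>>       shutdown -h now
>>>   fi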
>>> 
>>> Regarding the software stack I would look at OpenHPC. But that's just me.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Wed, 21 Aug 2019 at 06:09, Dmitri Chubarov <dmitri.chubarov at gmail.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> this is very old hardware, and you would have to stay with a very
>>>> outdated software stack: the 1070 cards are not supported by recent
>>>> versions of the NVIDIA drivers, and old versions of the NVIDIA drivers do
>>>> not play well with modern kernels and modern system libraries. Unless you
>>>> are doing this for digital preservation, consider dropping the 1070s out
>>>> of the equation.
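>>>> 
>>>> If I remember right, the GT200-era Teslas top out at CUDA 6.5 and the
>>>> legacy 340.xx driver branch, so a quick check of what you would be
>>>> pinned to (hostname is a placeholder):
>>>> 
>>>>   # kernel and driver currently on a GPU node
>>>>   ssh gpu-node01 'uname -r; nvidia-smi --query-gpu=name,driver_version --format=csv,noheader'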
>>>> 
>>>> Dmitri
>>>> 
>>>> 
>>>> On Wed, 21 Aug 2019 at 06:46, Richard Edwards <ejb at fastmail.fm> wrote:
>>>> 
>>>>> Hi Folks
>>>>> 
>>>>> So, I am about to build a new personal GPU-enabled cluster and am looking
>>>>> for people's thoughts on distribution and management tools.
>>>>> 
>>>>> Hardware that I have available for the build
>>>>> - HP ProLiant DL380/360 - mix of G5/G6
>>>>> - HP ProLiant SL6500 with 8 GPUs
>>>>> - HP ProLiant DL580 G7 + 2x K20x GPUs
>>>>> - 3x Nvidia Tesla 1070 (4 GPUs per unit)
>>>>> 
>>>>> Appreciate people's insights/thoughts.
>>>>> 
>>>>> Regards
>>>>> 
>>>>> Richard
>>>>> _______________________________________________
>>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>>>> To change your subscription (digest mode or unsubscribe) visit
>>>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>>> 
> 
> 
> 
> -- 
> Sean McGrath M.Sc
> 
> Systems Administrator
> Trinity Centre for High Performance and Research Computing
> Trinity College Dublin
> 
> sean.mcgrath at tchpc.tcd.ie
> 
> https://www.tcd.ie/
> https://www.tchpc.tcd.ie/
> 
> +353 (0) 1 896 3725
> 


