[Beowulf] Diskless cluster provisioning/installation
olli-pekka.lehto at csc.fi
Sat Nov 7 04:26:18 PST 2015
Development Manager, Computing Platforms
CSC - IT Center for Science Ltd.
E-Mail: olli-pekka.lehto at csc.fi // Tel: +358 50 381 8604 // skype: oplehto // twitter: @ople
On 06 Nov 2015, at 19:15, Joe Landman <landman at scalableinformatics.com> wrote:
> On 11/04/2015 04:06 AM, tegner at renget.se wrote:
>> I need to set up a diskless cluster. Some years ago I did this using
>> Perceus, http://www.perceus.org/. Worked nicely, but since I know there
>> are several other systems around; e.g., Warewulf,
>> http://warewulf.lbl.gov/trac
>> and Onesis, http://onesis.org/ I just wanted to check what people are
>> using these days.
>> Is there one system in particular which is more actively developed? Or
>> do people tend to use different, more "basic" tools these days?
> Open source: Warewulf is your go-to system. It does clustering very well. xCAT is out there and in use (actually xCAT 2 from what I see/hear). Onesis was neat when I played with it a while ago, but I am not sure how up to date it is.
> Closed source: Bright's system, probably a few others
> We took the track of developing our own, as we needed much greater flexibility than cluster distros allowed. This was also due, in part, to Linux distros and their (famously) broken init/startup bits, not to mention how they handle networking, and many other things. Don't get me started on their kernels.
Yeah. At some point you tend to hit a ceiling with the cluster distros and start working around the limitations manually. Eventually you wonder what the point of going with a cluster distro was in the first place.
Warewulf has worked quite well, as it does the basic stuff pretty well and then gets out of the way. However, even that has had its limitations and idiosyncrasies.
Now we’re working on a new stack, which will likely consist of something like:
- Ansible for config management of the base system
- EasyBuild+lmod for config management of the user software stack
- collectd+Graphite+Grafana for monitoring
- ELK for log analytics and correlation
There are quite a lot of work-in-progress repos related to this in our GitHub: https://github.com/CSC-IT-Center-for-Science
Provisioning is a bit of an open issue. It would be nice to have something that has the flexibility to work not only for HPC but also for general-purpose server deployments. A simple Kickstart+PXE setup is a starting point, but now there are some promising tools coming from the scale-out world, such as Razor, Foreman, Crowbar etc. It would be interesting to hear if people have experience using these, especially at a reasonable scale (~1000 nodes).
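
For anyone unfamiliar with the Kickstart+PXE starting point, here is a rough sketch of the per-node piece: generating a PXELINUX config entry that boots an installer and points it at a Kickstart file. The kernel and initrd names, the Kickstart URL and the MAC address are made-up placeholders, and the inst.ks= option assumes a RHEL/CentOS 7 style installer (older releases used ks=). This is only an illustration, not what we actually run.

```python
# Minimal sketch: build a per-host PXELINUX config for a Kickstart install.
# All paths, URLs and the MAC address below are hypothetical placeholders.


def pxelinux_filename(mac: str) -> str:
    """Return the per-host file name PXELINUX looks for under pxelinux.cfg/.

    The name is "01-" (ARP hardware type for Ethernet) followed by the
    MAC address in lowercase with dashes instead of colons.
    """
    return "01-" + mac.strip().lower().replace(":", "-")


def pxelinux_entry(kernel: str, initrd: str, ks_url: str) -> str:
    """Return the text of a config that boots the installer with Kickstart.

    inst.ks= is assumed here (RHEL/CentOS 7 style boot option).
    """
    lines = [
        "DEFAULT install",
        "LABEL install",
        f"  KERNEL {kernel}",
        f"  APPEND initrd={initrd} inst.ks={ks_url} ip=dhcp",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical node and server details, for illustration only.
    name = pxelinux_filename("AA:BB:CC:00:11:22")
    body = pxelinux_entry(
        "vmlinuz", "initrd.img", "http://install.example.org/ks/compute.cfg"
    )
    print(f"# would be written to pxelinux.cfg/{name}")
    print(body, end="")
```

Looping this over a node list gives you one file per MAC under the TFTP root. That is about all the "provisioning system" there is in the simple case, which is also why it gets painful as the node count grows.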