[Beowulf] anyone using SALT on your clusters?

Thu Jun 27 15:53:46 PDT 2013

On 06/25/2013 04:55 PM, Paul English wrote:

[...]

>
> For shuffling data around and the equivalent of what Salt's original
> purpose in life was, we use prsync and pssh. They work very well. We can

To this day we still prefer pdsh from LLNL.  It's (IMO) the best of the 
lot, and we leverage it extensively in tiburon.
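
For anyone who hasn't used this class of tool, here is a rough Python 
sketch of the basic idea (fan one command out to a list of hosts in 
parallel, collect the results per host).  This is not how pdsh or pssh 
are implemented; the names run_on_hosts, ssh_runner and parse_host_list 
are made up for illustration, and it assumes key-based ssh is already 
working.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def parse_host_list(text):
    """Parse a canned host list: one host per line, '#' comments and blanks skipped."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def ssh_runner(host, command, timeout=30):
    """Run one command on one host with the system ssh client (assumes ssh keys)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip()


def run_on_hosts(hosts, command, runner=ssh_runner, max_workers=32):
    """Fan a command out to many hosts in parallel; return {host: (rc, output)}."""

    def task(host):
        try:
            return host, runner(host, command)
        except Exception as exc:  # timeout, missing ssh binary, etc.
            return host, (255, str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(task, hosts))


# Hypothetical usage (host names are placeholders):
#   hosts = parse_host_list(open("compute-nodes.txt").read())
#   for host, (rc, out) in sorted(run_on_hosts(hosts, "uptime").items()):
#       print(f"{host}: rc={rc} {out}")
```

The runner is passed in as a parameter so the fan-out logic can be 
exercised without touching a real cluster.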

> use them with canned lists of hosts (generated by cf-engine for this
> purpose eg: all hosts of type X), with ssh keys etc. I suspect they are
> slower and perhaps less "something" (scalable perhaps? we are a
> relatively small site in HPC terms) than Salt's ZMQ based approach. But
> they've been good enough for what we've done so far.

We'd been exploring message bus based scenarios for monitoring portions 
of tiburon.  Honestly, we are returning to baseline per-node servers 
running either nodejs or perl (Mojolicious) based code.
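
To make "per-node server" concrete, below is a minimal sketch of the 
pattern in Python (not our nodejs/Mojolicious code, and not what tiburon 
actually runs).  Each node answers an HTTP GET with a small JSON blob of 
local metrics.  The /metrics path, port 8080, and the particular metrics 
are assumptions chosen for the example; os.getloadavg() is Unix only.

```python
import json
import os
import socket
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_metrics():
    """Gather a few cheap node-local metrics (Unix only: uses os.getloadavg)."""
    load1, load5, load15 = os.getloadavg()
    return {
        "host": socket.gethostname(),
        "time": time.time(),
        "load": [load1, load5, load15],
    }


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only one endpoint in this sketch; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = json.dumps(collect_metrics()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # stay quiet on the node; a real deployment would log somewhere


def make_server(port=8080, bind="0.0.0.0"):
    """Build the per-node server; call .serve_forever() on the result to run it."""
    return HTTPServer((bind, port), MetricsHandler)


# Hypothetical usage on each node:
#   make_server().serve_forever()
```

A central poller (or a human with curl) then pulls from each node as 
needed, with no message bus in the middle.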

There are still a few folks out there using multicast and other 
technologies on the monitoring side, though these are very bad to use 
for low latency networks.  We also do see quite a few folks doing a push 
to a central engine for monitoring.

Honestly configuration management is largely a moot point for 
image/remote boot, and an annoying necessity for local boot/management.

[opinion]

After decades doing this stuff, I've come to the realization that PXE 
booting (when architected correctly) scales wonderfully well, while 
installation and configuration management are effectively a tax on 
resources.

Building stateful systems (installed OSes) makes sense in a limited 
number of contexts.  They make almost no sense for many cluster and 
cloud scenarios.

[/opinion]

> I would suggest some caution when approaching Salt - which we did when
> we were considering what to do after chef. While Salt seems to be an
> exceptional approach to "do a bunch of things on a bunch of hosts," AND
> it is in python (win!), it does seem that the configuration management

... some of us don't quite see language of implementation as a win or 
loss, with a notable exception (java)

> part is an add-on and/or afterthought. Yes - configuration management
> does involve lots of doing lots of things on lots of hosts. But
> cf-engine is now in its third iteration of "what does that _mean_ in
> real terms - with tons of different configuration 'languages', files,
> daemons, restart services etc.. even only on Linux?"

A big chunk of what you write about is best handled by a monitoring 
system rather than a configuration management system.

If you could completely eliminate the "install OS, run configure scripts 
on it" section of startup, would you?  This isn't a sales pitch, it's a 
genuine question.

Chef and Puppet are popular in some circles.  They handle some things 
nicely, though writing Ruby objects as configuration is IMO a broken 
design by default.  Ruby is a cool fad-dy language, but debugging it is 
... interesting.  Almost as much fun as debugging some other languages.

At the end of the day, all of these things are about automation, work 
reduction, scaling, maintainability.  I am no fan of things that make 
maintenance hard(er).  I am a huge fan of positive automation benefits.  
This is why the stateless OS systems scale so well, and are so easy to 
maintain.  If you ever have a problem with a unit, you reboot it.  No 
reloading required.  Just reboot it.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/siflash
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615