[Beowulf] anyone using SALT on your clusters?

Mark Hahn hahn at mcmaster.ca
Tue Jul 2 07:28:03 PDT 2013


> For me, this results in the practical difference that the pub-sub model
> means that the agent has the ability to subscribe to the messages and is
> therefore alive - and that therefore the list of live hosts is always
> current.

I don't understand why that would be the case.  any pub-sub model must
internally have some sort of membership list which it believes to be
current/live, but it cannot possibly know a host is live until it receives
some sort of response from that host.  even then, it's unknown whether that
response really means enough liveness to execute the devops command you're
pushing.

in other words, to probe for whether a node can execute an op, you really
have to try to execute the op.  which means you WILL STILL have to deal with
semi-byzantine failures through timeouts, etc.

which is why I use ssh to mass-admin.  let's be honest, handling timeouts 
is not magic.  ssh is also nicely decoupled, and has excellent ways to 
robustly express asymmetric trust, etc.
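
to make the "just try the op, with a timeout" point concrete, here is a
minimal sketch in Python.  it is only an illustration, not anyone's actual
tooling: the function names (ssh_command, try_op, mass_run), the 30 second
default and the worker count are all assumptions.  the only ssh features it
relies on are the standard BatchMode and ConnectTimeout options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_command(host, op):
    # BatchMode=yes fails instead of prompting for a password;
    # ConnectTimeout bounds only the connection setup, not the op itself.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, op]


def try_op(host, op, timeout=30.0, build=ssh_command):
    """Attempt op on one host; return (host, status, stdout)."""
    try:
        proc = subprocess.run(
            build(host, op), capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # The node may answer pings and still never finish the op.
        return host, "timeout", ""
    except OSError as exc:
        # The command could not even be started locally.
        return host, "error", str(exc)
    status = "ok" if proc.returncode == 0 else "failed"
    return host, status, proc.stdout.strip()


def mass_run(hosts, op, timeout=30.0, build=ssh_command, workers=32):
    """Run op on all hosts in parallel; return {host: (status, stdout)}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda h: try_op(h, op, timeout, build), hosts)
        return {host: (status, out) for host, status, out in results}
```

something like mass_run(["n001", "n002"], "uptime") (hypothetical host
names) would then report each node as ok, failed, timeout or error.  the
only evidence that a node can run the op is that it just did.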

also, to me, integration is the devil's playground.  it's easy to pitch
that integration will make life easier, but except in fairly specific
conditions, it also leads to tighter coupling, fragility, inflexibility.

in a sense, the issue here is a failure to tool-build.  for instance,
if it were really a big deal, we could have a standard infrastructure
for collecting node status information.  "standard" in the sense of 
IETF RFC.  all sources of node info on your system could feed into it,
and you might choose, eg, a Bayesian mechanism to make predictions about
whether a particular node will successfully perform a particular op.
for instance, interconnect fabrics often have a realtime measure of
whether a node is up, for their definition of up.  similarly, service
nodes (say, NTP) can often provide a last-seen timestamp.  nodes might
also run endogenous beacons (say, ganglia, etc).  it's a bit curious
that this hasn't (AFAIK) been done before in much generality.  anyone?
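
as a rough idea of what such a Bayesian mechanism could look like, here is
a naive-Bayes sketch in Python.  everything in it is an assumption made for
illustration: the function name, the idea of summarizing each source as a
pair of likelihoods, and especially the treatment of sources as
conditionally independent (fabric, NTP and ganglia signals clearly are not
fully independent in practice).

```python
def posterior_ok(prior, observations):
    """Probability that a node will perform the op, given evidence.

    prior:        P(ok) before looking at any source, strictly in (0, 1).
    observations: list of (p_obs_given_ok, p_obs_given_bad) pairs, one per
                  source, each strictly in (0, 1).
    Sources are combined as if independent (naive Bayes), in odds form.
    """
    probs = [prior] + [p for pair in observations for p in pair]
    if not all(0.0 < p < 1.0 for p in probs):
        raise ValueError("all probabilities must lie strictly between 0 and 1")

    odds = prior / (1.0 - prior)
    for p_given_ok, p_given_bad in observations:
        # Each source multiplies the odds by its likelihood ratio.
        odds *= p_given_ok / p_given_bad
    return odds / (1.0 + odds)


# Hypothetical numbers, purely illustrative:
evidence = [
    (0.99, 0.30),  # fabric reports the port as up
    (0.05, 0.60),  # NTP last-seen timestamp is stale
    (0.98, 0.20),  # ganglia beacon arrived recently
]
p = posterior_ok(0.95, evidence)
```

the likelihoods would have to be estimated per site from history (how often
was the fabric port up when the op then failed, and so on), and the answer
is still only a prediction; the op itself remains the real probe.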

regards, mark hahn.

