[Beowulf] General cluster management tools - Re: Southampton engineers a Raspberry Pi Supercomputer

Sun Sep 16 12:48:17 PDT 2012

On 09/16/2012 02:08 PM, Andrew Holway wrote:
>> With regards to risk perception, I am still blown away at some of the
>> conversations I have with prospective customers who, still to this day,
>> insist that "larger company == less risk".  This is demonstrably false.
>>
>> A company with open products (real open products), open software stacks,
>> ... that lowers risks.  With the closed, vendor lock-in stacks, you
>> increase risk.  And this is, perversely, what is called "lowering risk"
>> by people looking for an excuse to go with the larger company.
>
> This is also demonstrably false. Just because cluster vendor A is
> using a completely open source stack does not mean that you have any
> less risk than Cluster Vendor B with their proprietary closed source
> stack.

Risk is a function of how much control you retain over the stack when
one of your suppliers changes its business operations, whether slightly
or drastically.  If one of the critical elements of your stack is
completely closed, you have no control over that aspect and cannot swap
it out without incurring great cost/time/effort.  That is, by
definition, an increased risk versus a functionally similar part (of
similar operational level and quality) which is completely open.

You said my thesis is demonstrably false, and I provided the simple 
argument that supports my thesis.  Your argument is ... what?  You disagree?

> I have seen Rocks clusters that are an utter bag of shieße because the
> people deploying it had no clue and also seen Clusters based on Bright
> et al that were perfectly executed. And vice versa for that matter.

We understand that you are currently engaged in a Bright Cluster Manager 
deployment.  We are (see company info in .sig), for the record, a
reseller (in the past anyway) of their tools (though we haven't sold any 
for a number of reasons that I won't get into).  Do you have a business 
relationship with them?  I see no problem advocating for them if you 
disclose your interests, though Chris S/Doug E/...  are the arbiters.

Rocks clusters are great first do-it-yourself clusters.  It's a great way
to learn some of the basic things you need to worry about.  I don't 
necessarily consider Rocks to be a preferable kit, and with the 
university copyright bit, anyone who wants to deploy it commercially 
needs permission from the university to do so (and will owe some sort of 
license fees for this).

The source is open, but, as a very long time user/deployer/supporter of
their kit, I can tell you that anaconda (upon which most of their work
is built) is an insanely fragile and dangerous platform upon which to
develop.  They've worked hard to work around issues, but fundamentally,
there are things so (completely) borked in anaconda that it's better to
simply minimize your time in it at all costs and perform installs after
it completes.

Unfortunately, Rocks is so closely tied in to anaconda that defects and
design failures in the latter negatively impact the former, IMO.  This
said, there are many Rocks users out there that don't need/want anything 
more complex than what they have to offer.  Some are on this list.

More to the point, this is a straw man anecdotal argument you make. 
I've seen very crappy ... insanely crappy commercial code deployments 
and excellent open source deployments in identical situations, and vice 
versa.  Doesn't mean much the way you've stated it.

Bright offers some cool features, and we thought we would use it with 
our cluster customers.  Alas, it did not support what our customers 
requested, and Bright wasn't interested in adding it (which is fine, 
they had perfectly good business reasons not to), so we used our own 
tools to handle it.  And for the record, our tools (Tiburon's core
functionality) are completely open.  They're written in Perl, and if we were
hit by a bus in a physical or metaphorical sense, our customers could 
continue to get support by paying someone to do this.

Could customers of Bright Cluster Manager get that same support if 
Mattjis and team decided to become anglers for a living?  No?  Why not? 
  Doesn't this indicate ... risk?

This is not to diminish Bright or Mattjis and team.  The product is very
good, and if I didn't indicate agreement with your previous posts 
extolling its virtues, then that was my omission.  It is very good. 
Point and clicky.  A cli for those who want.  Many good things built in. 
  But there is an inherent dependency upon a single company for a 
critical function in a system.  This is, potentially, a single point of 
failure for the system should they decide to do something else.

FWIW: I normally recommend it to more Windows-y admin types who like 
pointy-clicky for cluster admin.  Painless setup for them.  They are 
deep in their comfort zone.

This is why there is such an active 3rd party market for some of our
(former) competitors' old storage units ... parts really.  The company
no longer makes them and no longer supplies parts, and the customers
haven't quite (yet) made the decision that their risk reduction involves
kicking the non-open units to the curb (that or, like many, many who
insist large company == lower risk, they haven't quite internalized that
it is almost exactly the opposite that is true).

> The risk is with the people doing the deployment and the choices that
> the customer makes. It just takes some bad memory or a batch of bad
> motherboards and the whole project goes to crap as trust is lost.

No ... the risk is in the long term support.  Deployment risk is fairly 
trivial to manage for reasonable size installations, and the issues you 
cite would, for a reputable vendor, never see the light of day in front 
of the customer.  The rack-n-stack shops don't do much of this 
amelioration, but the people with a clue burn stuff in hard.  They beat
the heck out of the machines before they ship, with the idea that if you 
ship something that is known to work at the outset, it becomes *much* 
easier to debug in the field.

We do this with our storage and, when we build them, clusters. 
Customers occasionally yell at us over the additional delay we tell them 
about after we encounter a failure under our load tests, but they get 
good stuff that is known to work under fairly heavy loads.

> Please save these arguments for the Richard Stallman appreciation society.

And here I thought we were having a nice discussion, and you slip in a 
little snide comment like this.  Sad.

Just remember, archives are forever, and some of your potential future 
customers/employers will be googling for you when you claim expertise in 
some area or the other.  Comments like this reveal something of you as a 
person, add nothing to your reputation, and increase risk associated 
with your "brand."  You might want to consider that carefully before you 
respond.

Let's keep the discussion reasonable and polite.  Keep the S/N high.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615