[Beowulf] Clustering VPS servers

Joe Landman landman at scalableinformatics.com
Thu Mar 21 05:42:17 PDT 2013


On 03/21/2013 08:06 AM, Chris Dagdigian wrote:
> Jonathan Aquilina wrote:
>> It's not that I need to cluster these VPSes; I was just wondering if
>> it was possible. What puts me off about Amazon is the pricing. It
>> seems a bit pricey, so to speak.
>>
> MIT StarCluster (the open source stack that builds Grid Engine clusters
> on Amazon, mentioned elsewhere in this thread) is able to leverage the
> AWS Spot Market, and the potential savings off the hourly EC2 rate are
> pretty enormous. Via Spot you can run servers for pennies an hour that
> traditionally sell for dollars per hour on the "normal" EC2 on-demand
> service. It would be very hard to beat that price on an internal
> infrastructure if one were honest about the fully loaded facility,
> energy and staffing costs.

[slight segue into the cloud side]

Caveat emptor ... the analysis comes out in favor of ephemeral machines 
and clusters if the (expected/actual) utilization of the resources is 
low.  In that case the capital costs and TCO of owned gear are more 
expensive than the public cloud (at average pricing).  Not tremendously 
so, but noticeably.  Spot pricing moves the crossover point, but it's 
still not a game changer such that remote is always "cheaper".
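
For a rough sense of where that crossover sits, here's a back-of-envelope 
sketch.  Every number in it (node price, service life, opex share, cloud 
rates) is an assumption picked purely to illustrate how utilization 
drives the comparison, not a measurement or a quote:

# Back-of-envelope crossover sketch (all numbers are assumptions).
capex_per_node = 6000.0      # purchase price, USD (assumed)
opex_per_year = 1200.0       # power, cooling, space, admin share (assumed)
service_life_years = 4.0
cores_per_node = 16

hours_per_year = 24 * 365
owned_core_hour = (capex_per_node / service_life_years + opex_per_year) \
                  / hours_per_year / cores_per_node

# Hypothetical cloud prices, normalized to a core-hour.
cloud_rates = {"on-demand": 0.072, "spot (average)": 0.025}

# Owned hardware costs the same whether you use it or not, so its
# effective price per *used* core-hour is owned_core_hour / utilization.
# The cloud wins whenever utilization sits below the break-even point.
for label, rate in cloud_rates.items():
    print("break-even utilization vs %s: %.0f%%"
          % (label, 100 * owned_core_hour / rate))

With those made-up inputs, spot roughly triples the utilization level at 
which owned gear wins, which is the "moves the crossover" effect above.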

This said, there are other, very good reasons why the public cloud is 
not appropriate for everyone.  Most of the customers we've been working 
with have tried it for their work and found it lacking in one aspect or 
another.  Granted, they are highly specialized, with specific needs for 
which a paravirtualized infrastructure makes little sense (full hardware 
virtualization makes a great deal more sense for them, but bare 
metal/silicon is optimal in performance, and their needs are driven more 
by that than by other factors).

On the costing side, there is no grand conspiracy.  There are costs to 
acquire and run machines, but in many cases, thanks to serious devops 
work and sanely scoped, extremely dense, very performant, well 
engineered systems, these costs are dropping rapidly (as are the 
revenues in this market, which is part of why the tier-1 vendors are 
showing the quarterly results they are).   The cost per processor core 
(and eventually per processor cycle) is rapidly approaching an 
asymptote, where the overall cost is no longer the major factor in 
decision making.  To wit, look at the OCP (Open Compute Project) designs 
from Facebook.  That effort is all about completely commoditizing their 
hardware buys.  Google has been doing something similar for a while.

The real questions are: to get a value X, what investment Y is required, 
and what are the constraints Z that we must work within?  For a large 
segment of the corporate world where we play, clouds are fine, as long 
as they are private, completely and securely controlled, and engineered 
to handle the workloads they need.  Queuing up with 10,000 of your 
closest friends and neighbors for a chance to bid on infrastructure that 
you need at a particular time, while also having to move a huge bolus of 
data (think fractions of a PB to multiple PB), is simply not going to 
fly until 10GbE to the demarc is a common scenario.  Even then, data 
motion is the killer in many cases, and system/network/storage latency 
and low-level performance are at least complicit in the murder of the 
external cloud concept for these folks.
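
To put a number on the data-motion part: even with a clean 10GbE link to 
the demarc, a petabyte-scale transfer is measured in days.  A quick 
sketch (link rate and sustained efficiency are assumptions):

# Rough transfer-time estimate over a single link.
# Link rate and sustained efficiency are assumptions for illustration.
def transfer_days(bytes_to_move, link_gbps=10.0, efficiency=0.8):
    """Days needed to move bytes_to_move at a sustained fraction
    `efficiency` of a link_gbps line rate."""
    bits = bytes_to_move * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400.0

PB = 1e15
for size in (0.25 * PB, 1.0 * PB, 5.0 * PB):
    print("%.2f PB over 10GbE: ~%.1f days"
          % (size / PB, transfer_days(size)))

That's a bit over a day per 100 TB at 80% of line rate; a multi-PB bolus 
is weeks of wall-clock time before the first job even starts.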

This said, I do encourage folks to try out AWS, Joyent, and many others 
for the cloudy bits, and see if the constraints can be worked around. 
Our friends at Sabalcore and Penguin are doing cool things with clusters 
on demand.  And fundamentally, if you have only a part-time need for a 
cluster, the cluster rental or cloudy versions are likely to be a better 
deal for you, if the constraints will work.


[cliché warning]

As I try to tell folks: your mileage may vary, there are no silver 
bullets, and if all you have is a hammer, every problem looks like a 
nail.  Public (or private) clouds and infrastructure are not a panacea, 
and there are cases where one or the other makes more sense.  A 
realistic view of the data around this (cost, utilization, need, etc.) 
and a correct assessment of the principal decision issues 
(performance/latency vs. TCO vs. data transport vs. availability vs. 
...) are highly recommended.


[back to clustering VPS]

Honestly, this never made a great deal of sense to me.  Rather than 
clustering VPSes, why not cluster machines built on bare-metal JEOS KVM 
hypervisors?  Not quite the VMware stuff.  We are using lots of KVM 
(obscene amounts of it) in our projects across several OSes: Linux and 
Illumos/SmartOS.  We've been tweaking tiburon to handle such KVM boots, 
so that turning on a very large cluster of virtual machines and having 
them ready takes seconds to minutes at worst, not half an hour to 
several hours of provisioning.
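
(Not tiburon itself, but as an illustration of why KVM guests can come 
up that fast: with libvirt's Python bindings, creating and starting a 
transient guest from a prebuilt image is one API call.  The domain XML, 
disk image path, and bridge name below are placeholders.)

# Minimal sketch: boot a transient KVM guest through libvirt.
# The XML, disk image path, and bridge name are hypothetical.
import libvirt

DOMAIN_XML = """
<domain type='kvm'>
  <name>node001</name>
  <memory unit='GiB'>4</memory>
  <vcpu>4</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/images/jeos-node.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <source bridge='br0'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""

conn = libvirt.open("qemu:///system")  # local KVM hypervisor
dom = conn.createXML(DOMAIN_XML, 0)    # define and start in one step
print("started guest: %s" % dom.name())
conn.close()

Since the guest boots a prebuilt JEOS image rather than running an 
installer, the time to "cluster ready" is dominated by the guest OS 
boot, not by provisioning.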

A VPS doesn't quite have the isolation of KVM, which is part of why I'd 
like to see that.  But KVM doesn't have great PCIe pass-through (yet; 
it's getting better).  A VPS might be able to make better use of the 
underlying resources.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/siflash
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615



