[Beowulf] Small Distributed Clusters
ajt at rri.sari.ac.uk
Fri Jul 4 04:10:26 PDT 2008
Ian Pascoe wrote:
> Hi all,
> Firstly, before getting into the nitty-gritty of my question, a bit of
> background.
> Myself and a friend are looking to set up initially two small clusters of 4
> boxes each, using old surplus commodity hardware. The main purpose of the
> cluster is to hold data and perform calculations upon it - the data coming
> in from external sources.
> So far we've decided on a Ubuntu Server base with NFS linking the nodes
> together, and we're looking currently at how to perform the calculations -
> ie write our own software or adapt existing.
Sharing files locally via NFS using UDP is fine. You can also do NFS via
TCP, but running it across the Internet is not recommended because NFS
is an insecure protocol. You can tunnel it, but you might as well use
"sshfs", which is what I do.
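
A minimal sketch of the sshfs approach, written in Python so that it only builds the command line rather than mounting anything. The user name, host name and directories are made-up placeholders, and it assumes sshfs is installed on the machine doing the mounting.

```python
import shlex

def sshfs_command(user, host, remote_dir, mount_point, port=22):
    """Return the argv list for mounting remote_dir over SSH with sshfs."""
    # General form: sshfs [user@]host:[dir] mountpoint [options]
    return [
        "sshfs",
        f"{user}@{host}:{remote_dir}",
        mount_point,
        "-p", str(port),      # SSH port on the remote side
        "-o", "reconnect",    # try to re-establish the link if it drops
    ]

# Hypothetical names, for illustration only.
cmd = sshfs_command("ian", "cluster-b.example.org", "/data", "/mnt/cluster-b")
print(shlex.join(cmd))
```

To actually mount, the list could be handed to subprocess.run(cmd, check=True); the mount is released again with "fusermount -u" on the mount point.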
> However, the question I have relates to linking the two clusters together.
> For the majority of the time, they will be run autonomously, but on
> occasions we believe they'll need to be run as a cohesive unit with jobs
> being passed between them, because we don't plan to duplicate the data
> across the clusters, but back up locally.
I suggest you have a look at "dsh" (Dancer's distributed shell) as a
simple way to run programs across local and geographically separate
nodes in your cluster. This is very simple, but works remarkably well,
especially if you use SSH keys for password-less authentication.
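
To show roughly what a tool like dsh does under the hood, here is a small Python sketch that runs one command on several nodes over SSH in parallel. It is not dsh itself, the node names are hypothetical, and it assumes SSH keys are already in place so that no password prompt appears.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def ssh_command(node, command):
    """argv for running `command` on `node` without any password prompt."""
    # BatchMode=yes makes ssh fail rather than prompt, so key-based
    # authentication must already be set up.
    return ["ssh", "-o", "BatchMode=yes", node, command]

def run_on_nodes(nodes, command, runner=None):
    """Run `command` on every node in parallel; return {node: result}."""
    if runner is None:
        # Default runner really invokes ssh and captures its output.
        runner = lambda argv: subprocess.run(argv, capture_output=True, text=True)
    with ThreadPoolExecutor(max_workers=len(nodes) or 1) as pool:
        results = pool.map(lambda n: runner(ssh_command(n, command)), nodes)
        return dict(zip(nodes, results))

# Dry run with made-up node names: show the commands instead of running them.
NODES = ["node1.example.org", "node2.example.org"]
dry = run_on_nodes(NODES, "uptime", runner=lambda argv: " ".join(argv))
for node, line in dry.items():
    print(node, "->", line)
```

Dropping the runner argument makes the function call ssh for real; each result then carries returncode, stdout and stderr for that node.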
> Both will be connected to the Internet using ADSL and the limitation will be
> the upload speed of a maximum of 512 kbit/s.
Another issue, apart from the 'A' (Asymmetric speed) if you're on ADSL, is
that of setting up your routers to permit incoming connections on port
22, and having static IP addresses. This is straightforward, but does
need to be done before your clusters can communicate.
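
Once the port forwarding is in place, a quick way to check it from the other site is to try opening a TCP connection to port 22. The Python sketch below does only that; it does not authenticate, and any host name passed to it is whatever address your ISP has given the remote router.

```python
import socket

def port_is_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, port_is_open("cluster-b.example.org") (a placeholder name) should return True once the router forwards port 22 to the head node, and False while it is still blocked.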
> How would people suggest linking the two clusters together using a secure
> connection? Performance at this point is not in the equation, just the
> ability to securely connect.
I suggest linking them together via "sshfs" and "ssh/dsh", because doing
things like SSI (Single System Image) or MPI (Message Passing Interface)
over ADSL will require many ports to be open/tunneled and will be slow.
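
To put a number on "slow", here is a back-of-envelope calculation in Python. It assumes the quoted upload limit means 512 kbit/s and ignores protocol overhead, so real transfers will take somewhat longer; the 100 MB payload is just an example size.

```python
def transfer_seconds(size_bytes, link_kbit_per_s=512):
    """Ideal time to push size_bytes through a link of the given speed."""
    bits = size_bytes * 8
    return bits / (link_kbit_per_s * 1000)

# Example: 100 MB (10**8 bytes) over a 512 kbit/s uplink.
secs = transfer_seconds(100 * 10**6)
print(f"{secs:.1f} s, about {secs / 60:.0f} minutes")
```

That works out to 1562.5 seconds, or roughly 26 minutes, for a single 100 MB transfer, which is why anything chatty such as MPI is a poor fit for this link.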
> BTW Any thoughts too on a SQL Server that would cope well in this scenario?
I've tunneled MySQL via SSH and it works fine. You would be unwise to
expose the ports used by SQL Server or MySQL to the Internet because
they are very insecure. Of course, you could always use OpenVPN :-(
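
A sketch of the MySQL-over-SSH tunnel, again in Python and again only building the command. The user and host are placeholders, and local port 3307 is an arbitrary choice to avoid clashing with a MySQL server already running locally on 3306.

```python
def mysql_tunnel_command(user, host, local_port=3307, remote_port=3306):
    """argv for an SSH tunnel from local_port to MySQL on the remote host."""
    return [
        "ssh",
        "-N",  # no remote command, just forward ports
        "-L", f"{local_port}:127.0.0.1:{remote_port}",  # local -> remote MySQL
        f"{user}@{host}",
    ]

# Hypothetical names, for illustration only.
tunnel = mysql_tunnel_command("ian", "cluster-b.example.org")
print(" ".join(tunnel))
```

With the tunnel running, a client on the local machine connects to 127.0.0.1 port 3307 and the traffic comes out at port 3306 on the far side, so MySQL itself never needs to be exposed to the Internet.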
Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687