[Beowulf] Transparent clustering


Mark Hahn hahn at mcmaster.ca
Thu Nov 13 14:52:20 PST 2008


> Service). I have always thought that it should be possible to build a
> cluster which works like a single system. So that when I open an SSH session
> to the cluster I get a connection as normal while in fact I am connecting to
> the clustered system.

the ssh connection is not interesting; what happens after it _is_.
so you ssh to a cluster and, through LVS or something similar, you get
put onto some node.  then what?  is it supposed to still "work like" a
single system?  it does as long as you don't need more than one node, but
that's not only boring, it also raises the question of why you'd want a
cluster at all.
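to make the point concrete, here is a toy model (not LVS itself; the class
and node names are hypothetical) of a director handing each new login to
the next node in round-robin order.  every session lands on exactly one
node, and nothing in the dispatch makes a session span several nodes:

```python
from itertools import cycle


class RoundRobinDispatcher:
    """Toy model of an LVS-style director: each new session goes to the next node."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)  # endlessly iterate over the node list
        self.assignments = {}       # session id -> the single node it runs on

    def connect(self, session_id):
        node = next(self._nodes)
        self.assignments[session_id] = node
        return node


d = RoundRobinDispatcher(["node01", "node02", "node03"])
for s in ["alice", "bob", "carol", "dave"]:
    print(s, "->", d.connect(s))
```

the load is spread across nodes, but each user still only ever sees one
ordinary machine; that's the "boring" case above.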

making a cluster really "work like" a single system means that no
thread should be aware of the fact that some other thread is on
a different node.  this means a single pid space, transparent shared
memory, etc.  and even then (an SGI Altix is such a machine), none
of this is transparent in a strong sense (i.e., a thread can still
tell when a cache line is remote, if only from the extra latency...)

> I started reading a lot, and it seems as if this can
> be done with beowulf. I just wonder if the head node would make things more
> difficult (since that can go down as well). Is this at all possible (using
> beowulf) and how would I go about configuring this?

are you confusing high availability with clustering?  avoiding single points
of failure is laudable, but you quickly start to move away from anything
that resembles high performance (and you necessarily rely on more
replication, and thus more cost...)

> I know this isn't very clear (I am just exploring), so please ask away.

well, "what do you mean?" pretty much covers it.  it's certainly possible
to avoid a single login node as a single point of failure.  it's also
possible to use HA techniques to avoid other SPOFs (such as where the
scheduler runs, or filesystems, etc).  but "working like a single system"
is much harder.
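as a rough illustration of the easy part, a client (or a wrapper script)
can avoid depending on one login node by trying several in turn.  this is
a sketch only: the hostnames are hypothetical, and ssh_port_open is just a
plain TCP check on port 22, not a real health check of the node:

```python
import socket


def ssh_port_open(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (a crude liveness probe)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_login_node(nodes, is_up=ssh_port_open):
    """Return the first node in the list that answers the probe."""
    for node in nodes:
        if is_up(node):
            return node
    raise RuntimeError("no login node reachable")


# demo with a fake probe so no network is needed; login1 is pretending to be down
down = {"login1"}
print(pick_login_node(["login1", "login2", "login3"], lambda n: n not in down))
```

note this only removes the login node as a SPOF; whatever session you get
is still on one ordinary node.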


