[Beowulf] Cluster Diagram of 500 PCs

Mark Hahn hahn at mcmaster.ca
Tue Jul 10 15:29:26 PDT 2007


>  We want to set up a cluster of 500 PCs in the following configuration:
>  Intel Dual Core, 1.88 GHz, 2 MB cache
>  4 GB DDR2 RAM
>  2 x 80 GB disks

I'm not saying this configuration is bad, but how did you arrive at it?
there are tradeoffs in each of these hardware choices, and those decisions
are the ones you can't fix later.  (in particular, I have not often seen
2x80G be a useful cluster-node config.  it's potentially too much for
a diskful install, even if you insist on diskfulness.  and yet clusters
are normally quite automated, so raid1-ing an OS install doesn't make much
sense unless you have some specific reason.  finally, if you have
disk-intensive applications that can utilize local disks, it would make
more sense to use larger disks, since these days 250-320G is pretty much
entry-level (one platter).)

>  How do we connect these computers, and how many will be defined as masters?

you don't necessarily even need a master, but to answer this you must
quantify your workload fairly precisely.  with 500 nodes, you might well
expect a significant number of users logged in at once, which might
incur a significant "support" load (compiling, etc).  or perhaps you
want to run many, many serial jobs, in which case you'll need to split
your "admin" load (queueing, monitoring, logging) across multiple
machines.

>  How do we connect them, and how many switches do we need?

again, depends entirely on your workload.  it _could_ be quite reasonable 
to have a rack of nodes going to one switch, and just one uplink from 
each rack to a top-level switch.  that would clearly optimize the cabling
at the expense of serious MPI programs.  unless the workload consisted 
solely of rack-sized MPI programs!  large switches of the size you're 
looking for tend to be expensive; if you compromise (say, a single 10G
uplink per rack), modular switches can still be used.

otoh, maybe a spindly ethernet fabric alongside a fast and flat 
512-way myri10G network?

all depends on the workload.
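
to get a rough feel for the rack-uplink compromise, here's a back-of-the-
envelope calculation.  the node count and link speeds below are just
assumptions - plug in whatever you actually buy:

    # rough oversubscription estimate; all numbers are assumptions
    nodes_per_rack = 36        # e.g. 36 1U nodes in a rack
    node_link_gbps = 1.0       # gigabit to each node
    uplink_gbps    = 10.0      # a single 10G uplink per rack

    ingress = nodes_per_rack * node_link_gbps
    print("oversubscription %.1f:1" % (ingress / uplink_gbps))
    # -> 3.6:1 here: fine for serial/embarrassingly-parallel work,
    #    painful for MPI jobs that span racks.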

>  How will power connections be provided?

you want some sort of PDU in each rack, fed by high-current, high-voltage
circuits; dual 30A 220V 3-phase is not an unusual design point.
obviously, if you can make nodes more efficient, you save money on the
power infrastructure as well as on operating costs.  for instance, the
cluster I sit next to has dual 95W sockets per node, with each node
pulling around 300W.  higher-efficiency power supplies might save 30W/node,
which would be only 1.1 kW/rack; 65W sockets would save 60W/node -
that's starting to be significant.  providing consistently cool air
saves power too (nodes here have 12 fans that consume up to 10W each!).
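
as a sanity check on those numbers (the node count and wattages below are
assumptions, not measurements - substitute your own):

    # rough per-rack power estimate; all figures are assumptions
    nodes_per_rack   = 37
    watts_per_node   = 300    # measured at the plug
    psu_saving_watts = 30     # what a better PSU might buy you

    print("rack draw : %.1f kW" % (nodes_per_rack * watts_per_node / 1000.0))
    print("PSU saving: %.1f kW" % (nodes_per_rack * psu_saving_watts / 1000.0))
    # -> about 11.1 kW/rack drawn, roughly 1.1 kW/rack saved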

>  How do we start and stop all nodes from a remote computer?

IPMI is an excellent, portable, well-scriptable interface for control and 
monitoring.  there are some vendor-specific alternatives, as well as 
cruder mechanisms (controllable PDUs).
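
as a flavour of how scriptable it is, here's a minimal sketch using
ipmitool over the LAN.  the BMC hostnames and credentials are made-up
placeholders - use whatever your vendor actually gives you:

    #!/usr/bin/env python
    # power-control all 500 nodes via their BMCs; names/credentials are placeholders
    import subprocess, sys

    USER, PASS = "admin", "secret"     # assumption: shared BMC credentials

    def ipmi(host, *cmd):
        subprocess.call(["ipmitool", "-I", "lanplus",
                         "-H", host, "-U", USER, "-P", PASS] + list(cmd))

    action = sys.argv[1] if len(sys.argv) > 1 else "status"   # on|off|cycle|status
    for n in range(1, 501):
        ipmi("node%03d-ipmi" % n, "power", action)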

>  How do we ensure fault-tolerant network connectivity?

something like LVS (Linux Virtual Server).  it's a software thing, thus easy ;)
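
just to give a flavour (a minimal sketch only - the VIP and real-server
addresses are made up, and in practice you'd pair the director with
something like keepalived so the director itself can fail over):

    # tiny LVS sketch: one virtual IP round-robining ssh across two login nodes
    import subprocess

    VIP   = "192.168.1.250:22"                  # assumed virtual service
    REALS = ["192.168.1.11", "192.168.1.12"]    # assumed real login nodes

    subprocess.call(["ipvsadm", "-A", "-t", VIP, "-s", "rr"])        # virtual service
    for r in REALS:
        subprocess.call(["ipvsadm", "-a", "-t", VIP, "-r", r, "-m"]) # real server, NAT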

>  We want to use Windows XP or Windows 2003 as the OS.  For better performance, CentOS or Red Hat Linux may be selected.

don't bother with windows unless you really are a windows guru
and also incredibly linux-averse.

>  Please advise us and help by providing a network diagram of the system.

nodes in racks, leaf switch(es) in racks, uplinking to top-level switch(es).
knowing nothing about your workload, that's reasonable.


