wake-on-LAN, thin nodes, etc.

Fri Oct 4 17:36:18 PDT 2002

A friend and I are designing a new cluster (probably 96 diskless
dual Athlon nodes, using 100 Mbit/s Ethernet, PXE boot) to replace an
old cluster of Alphas.  Because of space and budget limitations,
we plan to make this a dense cluster of *extremely thin* nodes.
As currently planned, each node will comprise ...

    one dual processor motherboard with built-in LAN

    two CPUs

    memory

    a power supply

... and NOTHING ELSE.  Each node will get 2 wires: 120V AC to the
power supply, and a CAT5 Ethernet cable to the motherboard.

We know that properly cooling a cluster of this size is a big deal,
but we believe we have an adequate solution to that problem.

We have so far been experimenting with 2 candidate motherboards (Tyan
Tiger MPX S2466N-4M and Gigabyte GA7DPXDW) and have run into a few
problems and questions.

Question 1:

We would like to be able to power the slave nodes on and off
under software control from the master node.  We have been able to
do the power-off via the poweroff command, and we were hoping to do
power-on via wake-on-LAN.  So far we have been unable to make
wake-on-LAN work with the Tyan board.  Wake-on-LAN does work on the
Gigabyte board, but with one problem:  If the power goes COMPLETELY
off (as in a power failure or if the surge protector switch is
turned off), then the standby voltage to the motherboard goes to 0.
When the power is restored, and the standby voltage returns, the
board no longer responds to wake-on-LAN.  It must be powered up
manually, after which it will respond to wake-on-LAN after the
next poweroff.

Has anyone succeeded in getting wake-on-LAN to power up a motherboard
from a fully powered-down state (unplugged and then replugged), as
just described?  If so, what hardware were you using and what kind
of BIOS configuration did you have to do to make it work?  If not,
is there some inescapable limitation in the hardware that makes this
impossible, or is there hope that changes in the BIOS code might make
this work?
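
For anyone who wants to reproduce the test, here is a minimal sketch of
the wake-on-LAN "magic packet" described above: 6 bytes of 0xFF followed
by the target MAC address repeated 16 times, sent as a UDP broadcast.
The function names, the broadcast address, and the choice of UDP port 9
are assumptions made for illustration, and the MAC address in the usage
comment is a placeholder.  This only shows what the master node sends;
it does nothing about the cold-start problem, which depends on whether
the board re-arms its NIC once standby power returns.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must contain 12 hex digits")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Synchronization stream (6 x 0xFF) followed by 16 copies of the MAC.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str,
                      broadcast: str = "255.255.255.255",  # assumed address
                      port: int = 9) -> None:              # assumed port
    """Broadcast the magic packet on the local network over UDP."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


# Example (placeholder MAC address):
#   send_magic_packet("00:11:22:33:44:55")
```

Run from the master node, one call per slave MAC address would be enough
to wake the whole cluster, provided the boards are listening.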

Question 2:

I've read in some earlier postings on this list, and heard scuttlebutt
elsewhere, that dual-AMD machines and dual-Intel machines differ
in reliability.  I'd be interested in hearing any strong feelings on
this, especially from vendors who have assembled many machines with
a wide variety of hardware.  Feel free to reply offline if you feel
more comfortable with that.  So far our favorite is the Gigabyte
GA7DPXDW dual Athlon.  I'd greatly appreciate comments from anyone
with experience with these boards before I go out and buy 96 of them.

Question 3:

Are PVM and Mosix compatible with one another?

Most of the code I plan to run on the new cluster uses PVM and is
not communication-bound.  I have heard so many good things about
Mosix, though, that I would like to install it, too, and use it
for other tasks.  My understanding of Mosix is that it creates one
unified process space for the whole cluster, and will move processes
around from node to node as necessary to keep the load balanced.
This seems at odds with the PVM way of doing things, in which each
child process is told what host it will run on at the time it is
spawned.  If a PVM-based application is running on a cluster that
has Mosix, will Mosix move the PVM tasks to different nodes as they
are running?  I imagine this would cause catastrophic problems for
such an application, since PVM tasks communicate only through their
local pvm daemons.  I'd also welcome comments from devotees of
bproc.

Question 4:

On a dual-processor machine, the CPUs are numbered 0 and 1 (for
example, in /proc/cpuinfo).  Is there any way for a process to
determine which CPU (0 or 1) it is running on?  If so, how?  I'm hoping
for some kind of system call, like a getcpuid(), analogous to getpid().
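
One possible approach on Linux, offered as a sketch rather than a
definitive answer: proc(5) documents a "processor" entry (field 39) in
/proc/<pid>/stat that reports the CPU the process last executed on.
The helper names cpu_from_stat and current_cpu below are hypothetical,
and the code assumes a kernel that exposes that field.  The value is
only a snapshot, since the scheduler may migrate the process right
after it is read.

```python
def cpu_from_stat(stat_line: str) -> int:
    """Extract field 39 (processor) from the text of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before tokenizing.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 39 sits at index 36.
    return int(after_comm[36])


def current_cpu() -> int:
    """Return the CPU number this process last ran on (Linux only)."""
    with open("/proc/self/stat") as f:
        return cpu_from_stat(f.read())


# Example:
#   print(current_cpu())
```

On a dual-processor node this should return 0 or 1, matching the
numbering in /proc/cpuinfo.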

Finally, a comment to any motherboard manufacturers who might be
following this list:

Several motherboards now on the market come close to being ideal for
use in a cluster of the kind described above.  With a little effort,
a manufacturer could easily grab a big segment of the beowulf market by
paying attention to a few details that can make or break a motherboard
as a thin beowulf slave node.

The Tyan S2466N-4M appealed to us because it had built-in LAN, but
none of the other built-ins often found on modern boards (sound,
video, SCSI controller, IDE RAID, etc.) that are useless to us and
just drive up the cost of the boards.

A properly implemented wake-on-LAN (i.e., one that can truly power up
a motherboard after a power failure) should be a strong selling feature
for the beowulf market.  It makes it possible to reboot machines from
a remote site, and to turn nodes on or off under software control as
the needs of a calculation change, in response to changes in ambient
temperature, or, to be a good citizen in California, to reduce power
consumption during peak hours on summer afternoons.  Most importantly
for us, it would eliminate the need for 96 separate power switches,
one for each board.

Several of the problems we have encountered could be fixed by changes
to the BIOS.  A motherboard that supports an open source BIOS (like
LinuxBIOS) would be very appealing to beowulfers, especially if it
got the other details right.

Please feel free to add items from your own motherboard wish lists
to this thread.  Maybe some enterprising manufacturer will take
notice.

Best wishes,

Jack Wathey