scyld on an ASUS A7V266-C

Jorge M. Pacheco pacheco at cii.fc.ul.pt
Thu Jun 27 16:04:06 PDT 2002


   Dear ALL,

Just to let you know the happy ending of a cluster upgrade story.

As I mentioned before, I have a small scyld beowulf cluster with AMD
Athlon XP 1600+ processors & PC133 SDRAM that has been running
flawlessly, 24h/day, for 6 months.
Given this fantastic performance, we decided to buy some extra nodes,
and we got a good deal on brand-new XP 2000+ machines with PC2100 DDR.
We expected the usual trivial upgrade for a scyld beowulf: all we
would need to do was floppy-boot each node (no CD-ROMs, please) and
beoboot-install the operating system on the hard disks...
WRONG.
We ran into quite a few problems. After some tweaking we could add the
new nodes, but they turned out to be quite reluctant to run whatever
program we submitted to them; needless to say, MPI programs would
simply collapse.
Furthermore, slave node behaviour depended sensitively on memory
timings & other BIOS setup parameters...
All the while, every kind of ping from the main node to the new nodes
invariably reported 0% packet loss.
Strange, huh? Add to this that the only oddity during boot was the
complaint "neighbour table overflow"... which would lead you to suspect
a network problem...

Well, in an act of desperation, I decided to install THE SAME scyld
beowulf software on one of the new machines and turn that machine into
the main node.
Installation was perfect & smooth; at the end, all the new nodes could
be added without a single complaint. Moreover, programs now ran nicely
(serial & parallel), so everything went back to normal.
And what about the old nodes?
Well, I tried them and... they worked fine too. No complaints whatsoever.

So... if you decide to expand your nice & stable scyld beowulf machine
and start getting strange complaints, try making the fastest & most
up-to-date hardware the main node, with all the rest as daughter
nodes.
If it works for you the same way it worked for us, you're bound to 
become a happy human again.

			Cheers,  J. M. Pacheco
