[Beowulf] OS for 64 bit AMD

Tony Travis ajt at rri.sari.ac.uk
Wed Apr 6 16:39:29 PDT 2005


Robert G. Brown wrote:
> [ajt...]
>>interesting and relevant to my own situation running openMosix on a 
>>64-node RH-9 based Beowulf cluster. I had a difficult time trying to 
>>upgrade our cluster from RH-8.0 to FC-2 because the Adaptec Ultra160 
>>drivers in FC-2 were broken, so I went to RH-9 instead.
> 
> 
> That probably should read "were broken in the particular kernel snapshot
> I tried to install".  Google fairly quickly turns up a few hits on this
> problem (but not many) and several suggestions on how to proceed.  This
> seems like something that would have rapidly and long since been
> resolved, given the large number of adaptec users out there.  Still, I
> have definitely seen problems (notably a broken USB subsystem) within
> some of the FC kernel snapshots.  This doesn't make FC "broken" wrt to
> EL -- I've had BIGGER problems dealing with EL's broken/out of date
> libraries in my own numerical code.  An old GSL alone is a show stopper
> for HPC applications in my personal opinion.  RH has clearly interpreted
> "stable" as meaning "not to be changed even in clearly positive ways"
> (that is, "stagnant") in my opinion.  Fine for banks, fine for servers,
> not so good for desktops or clusters expected to run a rapidly changing
> mix of applications.

Hello, Robert.

Actually, openMosix is not an application: It's a load-balancing 
extension to the Linux kernel based on patches to vanilla Linux kernel 
sources (2.4.22 in our case).

	http://openmosix.sourceforge.net/

You need a host GNU/Linux environment in which to run openMosix and, of 
course, you need to compile the kernel modules for the openMosix kernel.

Everything worked fine under Red Hat 7.3/8.0, but when I tried to 
install the 'official' FC-2-i386 release I couldn't use it because our 
servers have SCSI disks and the Adaptec Ultra160 driver in FC-2 didn't 
work. I did use Google, but I would have had to build a system with IDE 
disks, install the broken FC-2, attempt to fix the SCSI problem, create a 
bootable ISO CD-ROM and install my own version of FC-2 on the servers 
with SCSI disks. Much easier to just install RH-9 instead and leave the 
fixing of FC-2 SCSI drivers to people who know what they are doing!

I've used Red Hat Linux for a long time, and I like it. I used FC-1 on 
the desktop and considered it to be the best distribution I'd ever used, 
but I lost faith in FC-2. Not just because the Adaptec Ultra160 SCSI 
drivers were broken, but because it was generally unstable on the same 
(IDE-disk) desktop that I'd run FC-1 on without any problems. The FC-2 
Intel ethernet drivers were also unstable on our DNS server: The same 
server hardware runs reliably under RH-9. I *know* these problems are 
fixable, but at this point my perception of Fedora changed...

I don't think I'm alone on this list in being nervous about the stability 
of Fedora. We're not a bank, we're a not-for-profit research institute, 
but it does matter that users can run their jobs on the cluster without 
it crashing too often. OK, RH-9 was a 'safe' option but I realise that 
we can't run RH-9 indefinitely. Debian Sarge/Testing is also a 'safe' 
option but seems to have a more open-ended future. The new Debian Sarge 
installer RC2 makes installing Debian easier. I've also looked at Ubuntu 
Debian, and Progeny Debian as alternative ways of installing Debian.

	http://www.ubuntulinux.org/
	http://www.progeny.com/

I'm reluctant to leave the Red Hat / Fedora camp but I'm not convinced 
by anything I've read here or anywhere else about the merit of RHEL or 
its derivatives (white box or otherwise). Progeny Debian seems to be a 
reasonable compromise to me because you get the 'best' of both worlds:

The Progeny port of anaconda to Debian brings the undisputed advantages of 
Red Hat's hardware detection and ease of installation to the Debian 
world. This, of course, has already been done to some extent by Knoppix 
and its derivatives but Knoppix is a 'live' CD and is not intended to be 
installed on a hard disk (early versions of clusterKnoppix did have a 
hard disk installation script but this has now been removed). Another 
contender that I considered is Simply MEPIS which is a live CD very 
similar to Knoppix but differs in that it can be installed permanently 
onto a hard disk. However, Simply MEPIS is based on Debian 'unstable' 
which, I think, has the same disadvantages as FC-2 for cluster servers.

	http://www.mepis.org/

I don't see that a "rapidly changing mix of applications" has much 
bearing on which server OS to choose: This thread was originally about 
an "OS for 64 bit AMD". I'm running 32-bit Progeny Debian on a 64-bit 
AMD Opteron server just now, and plan to run openMosix on this and a 
small cluster of eight AMD Athlon64 compute nodes. The 64-bit version of 
openMosix is not yet released, but I picked up on this thread because 
I'm interested to know about the 64-bit OSes that people are thinking of 
using for Beowulf clusters...

Best wishes,

	Tony.
-- 
Dr. A.J.Travis,                     |  mailto:ajt at rri.sari.ac.uk
Rowett Research Institute,          |    http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn,          |   phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK.    |     fax:+44 (0)1224 716687


