[Beowulf] OS for 64 bit AMD
ajt at rri.sari.ac.uk
Wed Apr 6 16:39:29 PDT 2005
Robert G. Brown wrote:
>>interesting and relevant to my own situation running openMosix on a
>>64-node RH-9 based Beowulf cluster. I had a difficult time trying to
>>upgrade our cluster from RH-8.0 to FC-2 because the Adaptec Ultra160
>>drivers in FC-2 were broken, so I went to RH-9 instead.
> That probably should read "were broken in the particular kernel snapshot
> I tried to install". Google fairly quickly turns up a few hits on this
> problem (but not many) and several suggestions on how to proceed. This
> seems like something that would have rapidly and long since been
> resolved, given the large number of adaptec users out there. Still, I
> have definitely seen problems (notably a broken USB subsystem) within
> some of the FC kernel snapshots. This doesn't make FC "broken" wrt to
> EL -- I've had BIGGER problems dealing with EL's broken/out of date
> libraries in my own numerical code. An old GSL alone is a show stopper
> for HPC applications in my personal opinion. RH has clearly interpreted
> "stable" as meaning "not to be changed even in clearly positive ways"
> (that is, "stagnant") in my opinion. Fine for banks, fine for servers,
> not so good for desktops or clusters expected to run a rapidly changing
> mix of applications.
Actually, openMosix is not an application: It's a load-balancing
extension to the Linux kernel based on patches to vanilla Linux kernel
sources (2.4.22 in our case).
You need a host GNU/Linux environment in which to run openMosix and, of
course, you need to compile the kernel modules for the openMosix kernel.
Everything worked fine under Red Hat 7.3/8.0, but when I tried to
install the 'official' FC-2-i386 release I couldn't use it because our
servers have SCSI disks and the Adaptec Ultra160 driver in FC-2 didn't
work. I did use Google, but I would have had to build a system with IDE
disks, install the broken FC-2, attempt to fix the SCSI problem, create a
bootable ISO CD-ROM and install my own version of FC-2 on the servers
with SCSI disks. Much easier to just install RH-9 instead and leave the
fixing of FC-2 SCSI drivers to people who know what they are doing!
I've used Red Hat Linux for a long time, and I like it. I used FC-1 on
the desktop and considered it to be the best distribution I'd ever used,
but I lost faith in FC-2. Not just because the Adaptec Ultra160 SCSI
drivers were broken, but because it was generally unstable on the same
(IDE-disk) desktop that I'd run FC-1 on without any problems. The FC-2
Intel ethernet drivers were also unstable on our DNS server: The same
server hardware runs reliably under RH-9. I *know* these problems are
fixable, but at this point my perception of Fedora changed...
I don't think I'm alone on this list in being nervous about the stability
of Fedora. We're not a bank, we're a not-for-profit research institute,
but it does matter that users can run their jobs on the cluster without
it crashing too often. OK, RH-9 was a 'safe' option but I realise that
we can't run RH-9 indefinitely. Debian Sarge/Testing is also a 'safe'
option but seems to have a more open-ended future. The new Debian Sarge
installer RC2 makes installing Debian easier. I've also looked at Ubuntu
Debian and Progeny Debian as alternative ways of installing Debian.
I'm reluctant to leave the Red Hat / Fedora camp but I'm not convinced
by anything I've read here or anywhere else about the merit of RHEL or
its derivatives (white box or otherwise). Progeny Debian seems to be a
reasonable compromise to me because you get the 'best' of both worlds:
The Progeny port of anaconda to Debian brings the undisputed advantages of
Red Hat's hardware detection and ease of installation to the Debian
world. This, of course, has already been done to some extent by Knoppix
and its derivatives but Knoppix is a 'live' CD and is not intended to be
installed on a hard disk (early versions of clusterKnoppix did have a
hard disk installation script but this has now been removed). Another
contender that I considered is Simply MEPIS which is a live CD very
similar to Knoppix but differs in that it can be installed permanently
onto a hard disk. However, Simply MEPIS is based on Debian 'unstable'
which, I think, has the same disadvantages as FC-2 for cluster servers.
I don't see that a "rapidly changing mix of applications" has much
bearing on which server OS to choose: This thread was originally about
an "OS for 64 bit AMD". I'm running 32-bit Progeny Debian on a 64-bit
AMD Opteron server just now, and plan to run openMosix on this and a
small cluster of eight AMD Athlon64 compute nodes. The 64-bit version of
openMosix is not yet released, but I picked up on this thread because
I'm interested to know about the 64-bit OSes that people are thinking of
using for Beowulf clusters...
Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687