phasing out Solaris/Oracle/Netscape with Linux/PostgreSQL/Apache

Eugene.Leitl at lrz.uni-muenchen.de
Thu Feb 8 22:43:09 PST 2001


This is somewhat off-topic, so bear with me.

Um, I need some advice. The company I'm with has a Sun box running Solaris
and Netscape Enterprise Server. It runs a mix of C, Perl, Oracle
and Daylight. The latter beast currently needs 1.5 GByte of RAM,
but is still creaking at the seams. We're going to drop Daylight
sooner or later, so the memory footprint will shrink, possibly a lot.

I don't yet understand the architecture of the chemical database
we're running, so I don't know where the bottlenecks are, but I need
to build a Linux machine that can outperform the Sun at a fraction
of the price. I will set it up locally and hammer it with queries,
taking measurements, running stability tests, etc.

I would prefer to use Athlons, especially the recent DDR dual Athlons,
but they don't seem to go well with the stability we need. Does this
mean I should use a dual Pentium III motherboard? Does Linux
handle these well? I've heard these boards can take up to 8 GByte of RAM;
I think I should start with 2 GBytes. I don't think the application is
CPU-bound, but I don't know for sure yet. It certainly seems to exercise
the disks heavily, which brings me to another question:

Do I absolutely, positively need SCSI? I was thinking about putting a
second ATA/100 EIDE host adapter in and running striping plus mirroring
over 4 EIDE hard drives (the better IBM models). Or should I
use a dual-Pentium motherboard with onboard SCSI, buy several fast,
hot and noisy SCSI drives, and software-RAID them? Perhaps even hardware RAID?
I don't think the disks need to be very large, but they had better be fast.
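To make the striping-plus-mirroring idea concrete, here is a minimal Python sketch of how a RAID 1+0 layout over four drives would place data and how much space it leaves. The chunk size and the device names (hda, hdc, hde, hdg, i.e. one drive per IDE channel across the onboard controller and a second adapter) are hypothetical, and the real on-disk layout of the Linux md driver is not modeled here.

```python
# Sketch: how striping plus mirroring (RAID 1+0) spreads data over four drives.
# Assumptions (hypothetical, for illustration only): four equal-size drives,
# grouped into two mirrored pairs, with the pairs striped in fixed-size chunks.

CHUNK_KB = 64                           # hypothetical stripe chunk size
DISKS = ["hda", "hdc", "hde", "hdg"]    # hypothetical EIDE device names
MIRRORS = [(DISKS[0], DISKS[1]), (DISKS[2], DISKS[3])]


def usable_capacity_gb(disk_size_gb: float, n_disks: int = len(DISKS)) -> float:
    """Mirroring halves raw capacity; striping does not reduce it further."""
    return disk_size_gb * n_disks / 2


def disks_for_offset(offset_kb: int) -> tuple:
    """Return the mirrored pair holding the chunk that contains offset_kb.

    Consecutive chunks alternate between pairs (the striping part); both
    drives in a pair hold identical copies (the mirroring part).
    """
    chunk = offset_kb // CHUNK_KB
    return MIRRORS[chunk % len(MIRRORS)]


if __name__ == "__main__":
    print(usable_capacity_gb(40))  # four 40 GB drives -> 80.0 GB usable
    for off in (0, 64, 128, 200):
        print(off, disks_for_offset(off))
```

Either drive of a pair can serve reads, and the array survives one failed drive per pair; the price is that four drives give only two drives' worth of space.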

I would like to use ReiserFS, so that only leaves me RAID 0/1, right?
Higher stability, no fsck delay when the machine has to be rebooted,
and no 2 GByte file size limit all seem to be needed.
Since this is ReiserFS, I should obviously go with the latest stable
kernel. I've been using Mandrake on my desktop and for small-scale testing,
but for this application it's probably not the way to go. Which
distro should I choose? Debian?

We seem to be moving towards an architecture consisting of a bunch
of Perl programs (it's not settled, but the few routines we have are
in Perl) communicating via sockets. (Right now it's a mix of C and
Perl, talking via named pipes.) The queries (a chemical structure
drawn in a browser, using a Java applet or a plug-in) are
handled by a dispatcher (a Perl CGI script). Sooner or
later we will spread each query over a farm of boxes, each containing
a database, or segment a database across several cheap, redundant
boxes. (But we're not there yet.) There's no alternative to sockets,
right?
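The fan-out idea can be sketched with plain TCP sockets. Unlike named pipes, which only connect processes on one machine, sockets work the same whether the database boxes are local or spread across a network, which is why they fit the farm plan. The Python sketch below is illustrative only: the newline-terminated query protocol, the substring match standing in for a structure search, and all function names are hypothetical. In Perl the same structure could be built with IO::Socket::INET.

```python
# Hypothetical sketch: a dispatcher that fans one query out over TCP to
# several "database boxes", each searching its own static segment.
# The line-based protocol and substring match are illustrative assumptions.
import socket
import socketserver
import threading
from concurrent.futures import ThreadPoolExecutor


def make_segment_handler(records):
    """Build a request handler that searches one static database segment."""
    class SegmentHandler(socketserver.StreamRequestHandler):
        def handle(self):
            # Read one newline-terminated query from the dispatcher.
            query = self.rfile.readline().decode().strip()
            for rec in records:
                if query in rec:  # stand-in for a real structure search
                    self.wfile.write((rec + "\n").encode())
            # Returning closes the connection, which marks the end of the reply.
    return SegmentHandler


def start_box(records):
    """Start one box on an OS-chosen localhost port; return (server, port)."""
    server = socketserver.ThreadingTCPServer(
        ("127.0.0.1", 0), make_segment_handler(records))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def ask_box(port, query, timeout=60.0):
    """Send a query to one box and collect every result line it returns."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
        s.sendall((query + "\n").encode())
        s.shutdown(socket.SHUT_WR)  # tell the box we are done sending
        chunks = list(iter(lambda: s.recv(4096), b""))
    return b"".join(chunks).decode().splitlines()


def dispatch(ports, query):
    """Query all boxes in parallel and merge their answers."""
    with ThreadPoolExecutor(max_workers=len(ports)) as pool:
        parts = list(pool.map(lambda p: ask_box(p, query), ports))
    return sorted(rec for part in parts for rec in part)


if __name__ == "__main__":
    segments = [["benzene", "toluene"], ["phenol", "benzoic acid"]]
    boxes = [start_box(seg) for seg in segments]
    print(dispatch([port for _, port in boxes], "benz"))  # -> ['benzene', 'benzoic acid']
    for server, _ in boxes:
        server.shutdown()
        server.server_close()
```

Because the dispatcher queries all boxes concurrently, total latency is roughly that of the slowest box rather than the sum over all boxes.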

Right now a query can take minutes, so I don't think mod_perl
is needed: the per-request Perl startup cost it saves is negligible by
comparison. We don't get many query hits, at least not yet. Should
I nevertheless try Apache with mod_perl instead of the Netscape server?

The database question. The database is large, but entirely static.
Would PostgreSQL handle this comparably to Oracle? The
licenses for Oracle are not that expensive anymore, but if we're
moving towards a farm of boxes, the cost will start to add up. Otoh,
some Oracle products seem to be able to handle parallel databases
natively. Hmm.

Sorry for the bunch of clueless questions, but I'm not exactly a
computer person, nor is the company particularly competent in technical
matters (they're all chemists who simply have been
working with computers for a long time). The hardware budget seems
to be there, but I'm the new guy, and I can't afford to buy a bunch
of expensive crap that will just sit there gathering dust.

I would also welcome some pointers to lists where questions
such as these are handled holistically. I'd settle for a bunch
of dedicated lists, though.

<off-topic/>




More information about the Beowulf mailing list