phasing out Solaris/Oracle/Netscape with Linux/PostgreSQL/Apache

Fri Feb 9 10:31:55 PST 2001

Let me know if I can help beyond what I write below.  Additional comments
from the crowd welcome.  See below . . . 

> -----Original Message-----
> From: Eugene.Leitl at lrz.uni-muenchen.de
> [mailto:Eugene.Leitl at lrz.uni-muenchen.de]
> Sent: Thursday, February 08, 2001 10:43 PM
> To: linux-elitists at zgp.org; pigdog-l at bearfountain.com;
> beowulf at beowulf.org
> Subject: phasing out Solaris/Oracle/Netscape with
> Linux/PostgreSQL/Apache
> 
> 
> 
> This is somewhat off-topic, so bear with me.
> 
> Um, I need some advice. The company I'm with has a Solaris Sun with
> Netscape Enterprise Server. It is running a mix of C, Perl, Oracle
> and Daylight. The latter beast currently requires 1.5 GByte RAM,
> but is still creaking at all seams. We're going to kick Daylight 
> sooner or later, so the memory footprint will drop, possibly a lot.
> 
> I don't yet understand the architecture of the chemical database
> we're running, so I don't know where the bottlenecks are, but I need
> to build a Linux machine which could outperform the Sun at a fraction
> of the price. I will set it up locally, and hammer it with queries,
> doing measurements, stability tests, etc.

This could bite you.  It is critical to understand some basic things about
the existing application first.  Exporting the data is your first concern -
you need to know how to do that, and it could very well set the pace of the
conversion.

> 
> I would rather like to use Athlons, especially recent DDR Dual Athlons,
> but this doesn't seem to go too well with required stability. This
> means I should use a dual Pentium III motherboard, right? Does Linux
> handle these well? I heard these can take up to 8 GByte RAM, I think 
> I should start with 2 GBytes. I don't think the application is CPU 
> bound, but I don't know for sure yet. It certainly seems to exercise 
> disks strongly, so here is another question:

Linux does handle them well, but I'm partial to FreeBSD.  Since you're
using Sun's OS already, you could continue with the x86 version of Solaris
without making the big switch all at once - it's free and works well enough.
It would make a fine part of a beowulf, and allow you to get used to Linux
over a longer period of time.  If you can make the transition in baby steps,
it's better.

> 
> do I absolutely, positively need SCSI? I was thinking about putting a
> second 100 EIDE host adapter in, and run disk striping plus mirroring
> over 4 EIDE hard drives (the better models from IBM). Or should I
> use a Dual-Pentium mumboard with onboard SCSI, and buy several fast,
> hot & noisy scuzzys, soft-RAIDing them? Perhaps even hardware RAID?
> I don't think the disks need to be very large, but they better be fast.
> 

SCSI is GREAT, and you should set up redundant hot-swap disks so that if you
crash, you insert a new disk, type "boot", and you're back online with a
node.  I think Sun stations outperform the Intel boards on disk throughput,
but you could check.

> I would like to use Reiser FS, so this only allows me RAID 0/1, right?
> Higher stability, lack of fscking delay in case machine needs to be
> rebooted and no 2 GByte file size limit would seem to be needed.
> Since this is ReiserFS, I should obviously go with the latest stable
> kernel. I've been using Mandrake for my desktop and small time testing,
> but for this application this is probably not the way to go. Which 
> distro should I choose? Debian?

Not sure on this one.  I haven't had enough exposure to all of those
filesystems.  I know: use MFS, and load up the machines with 2GB of RAM!
Just kidding.

> 
> We seem to be moving towards an architecture consisting of a bunch 
> of Perl programs (it's not settled, but the few routines we have are
> in Perl) communicating via sockets. (Right now it's a mix of C and 
> Perl, talking via named pipes). The queries (a chemical structure 
> drawn within a browser, using a Java applet or a plug-in) are 
> handled with a dispatcher (a cgi-bin Perl thing). Sooner or 
> later we will spread the query to a farm of boxes, each containing 
> a database, or segmenting a database across several, cheap, redundant 
> boxes. (But we're not there yet). There's no alternative to sockets, 
> right?

You'll take a big performance hit running Perl too much - it's interpreted.
I know it's popular, but with a big project you should get the training
required to do it in a compiled language, like C/C++.  If you insist on an
interpreted language, use Java - it's a 'cleaner' language to manage on a
large scale.  For socket work I would definitely stick with C - and create a
library specific to your needs that your Perl programs could call.
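
For illustration only, here is a rough sketch of the dispatcher-over-sockets
layout being discussed: one dispatcher sends the same query to several
database boxes and collects one reply from each.  It is written in Python
purely for brevity (the advice above is to do the real socket work in C), and
everything in it is an assumption: the one-line text protocol, the localhost
stand-in backends, and the names start_backend, ask and dispatch are all
hypothetical.

```python
import socket
import threading


def serve_once(listener, shard_name):
    """Hypothetical backend: accept one connection, answer one query, exit."""
    with listener:
        conn, _ = listener.accept()
        with conn, conn.makefile("r") as reader:
            query = reader.readline().strip()
            conn.sendall(f"{shard_name}: results for {query}\n".encode())


def start_backend(shard_name):
    """Start a stand-in database box on a free localhost port; return the port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    threading.Thread(
        target=serve_once, args=(listener, shard_name), daemon=True
    ).start()
    return port


def ask(port, query):
    """Send one newline-terminated query to a backend and read one reply line."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall((query + "\n").encode())
        with sock.makefile("r") as reader:
            return reader.readline().strip()


def dispatch(ports, query):
    """Fan the query out to every backend in turn and collect the replies."""
    return [ask(port, query) for port in ports]


if __name__ == "__main__":
    ports = [start_backend("shard-a"), start_backend("shard-b")]
    for reply in dispatch(ports, "benzene"):
        print(reply)
```

This version asks the backends one after another; a real dispatcher would
more likely query them in parallel and handle timeouts and dead boxes.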

> 
> Right now a query can take up to minutes, so I don't think mod_perl
> is needed. We don't get a lot of query hits, at least not yet. Should 
> I try using Apache mod_perl instead of the Netscape Server 
> nevertheless? 

How much data do the longer queries access?  If you're not seeing sub-minute
response times, it's a red flag.  That's just a rule of thumb I use.  I've
seen five-second return times on several gigs of data with a Sun and Oracle.
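
To check queries against that rule of thumb while hammering a test box, a
small timing harness along these lines might help.  This is only a sketch
under assumptions: run_query is a hypothetical callable standing in for
whatever actually submits a query to your database, and the 60-second default
is simply the "sub-minute" rule of thumb above, not a measured figure.

```python
import statistics
import time


def time_queries(run_query, queries, threshold_s=60.0):
    """Time each query and flag the ones taking threshold_s seconds or more.

    run_query is a hypothetical callable taking one query; replace it with
    whatever really talks to the database under test.
    """
    if not queries:
        raise ValueError("need at least one query to time")
    timings = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        timings.append((query, time.perf_counter() - start))
    durations = [elapsed for _, elapsed in timings]
    return {
        "median_s": statistics.median(durations),
        "max_s": max(durations),
        "slow": [query for query, elapsed in timings if elapsed >= threshold_s],
    }
```

Running the same list of queries against the old Sun and the new box gives
median and worst-case numbers that can be compared directly.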