phasing out Solaris/Oracle/Netscape with Linux/PostgreSQL/Apache

Robert G. Brown rgb at phy.duke.edu
Fri Feb 9 06:13:07 PST 2001


On Fri, 9 Feb 2001 Eugene.Leitl at lrz.uni-muenchen.de wrote:

>
> I would rather like to use Athlons, especially recent DDR Dual Athlons,

Even a single Athlon will come damn near to matching the performance of
a dual PIII at similar clock in aggregate float, according to my newly
improved and vastly repaired cpu-rate tool.  I attach a PRELIMINARY
update of my performance graphs, where the systems tested are (bottom to
top) a 466 MHz Celeron, a 933 MHz PIII (PC133 SDRAM), a dual 933 MHz
PIII with RDRAM running one (filled) and two (open) benchmarks at once,
and an 800 MHz Athlon Thunderbird with PC133 SDRAM.  The top two curves
are a 667 MHz Alpha EV67 (XP1000) with ccc (filled) and gcc (open).

These curves now more or less agree with common wisdom on relative CPU
speeds -- my previous revision of cpu-rate was fooled by an Intel
hardware optimization: float division by any power of two is recognized
and handled separately (presumably by a shift operation), so e.g. x/2.0
is done about six times as fast as x/2.000001.  Infinite thanks to
Sebastien Cabaniols for figuring this out.  I now divide by PI and have
improved the timing loop accuracy as well.
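For concreteness, here is a minimal, self-contained C sketch of the
kind of timing comparison that exposes the special case.  It is NOT an
excerpt from cpu-rate; the function name, loop count, and constants are
my own invention for illustration:

    /* divtime.c -- toy timing of float division, a sketch only.
     * On CPUs (or compilers) that special-case division by a power
     * of two, the x/2.0 loop runs much faster than the other two.
     */
    #include <stdio.h>
    #include <math.h>
    #include <sys/time.h>

    static double time_divides(double divisor, long n)
    {
        struct timeval t0, t1;
        volatile double x = 1.0e6;   /* volatile keeps the loop honest */
        long i;

        gettimeofday(&t0, NULL);
        for (i = 0; i < n; i++)
            x = x / divisor + 1.0e6; /* keeps x bounded */
        gettimeofday(&t1, NULL);
        return (t1.tv_sec - t0.tv_sec)
             + 1.0e-6 * (t1.tv_usec - t0.tv_usec);
    }

    int main(void)
    {
        long n = 10000000;
        printf("x/2.0      : %g s\n", time_divides(2.0, n));
        printf("x/2.000001 : %g s\n", time_divides(2.000001, n));
        printf("x/PI       : %g s\n", time_divides(M_PI, n));
        return 0;
    }

On an affected system the first loop comes out several times faster
than the other two; dividing by PI (or any similarly inconvenient
constant) sidesteps the special case, which is exactly the fix
described above.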

I'm going to announce the update of cpu-rate in a few days -- another
list member (who discovered the division/shift optimization bug) is
testing the revised code on more alphas.  I'd love a volunteer to
provide test results from a dual Athlon as well -- we don't have any
here (yet).  Even as single processors, Athlons now top my private
price/performance list by a goodly amount.

The usual disclaimers/warnings apply to the results presented in the
attached figure, BTW -- a) they could be flat wrong (the results of the
previous version certainly were:-); b) they aren't necessarily a good
predictor of performance, especially for a general float/int code mix
(they DON'T predict well the relative performance of my own Monte Carlo
code, on which an Alpha is only 1.25x faster than a PIII at equivalent
clock); c) Your Mileage May Vary; and d) given all of this, it isn't my
fault if you use these results to make a decision and choose poorly.

> but this doesn't seem to go too well with required stability. This

I'm not certain, but I think that the stability issue is "binary".  As
in, if you have a bad motherboard/bios/cpu combination there can be
problems, but if you have a good one it is pretty much as stable as an
Intel.  We have Athlons that have been up for months, at any rate, under
a varying load.  Although I've just started using them personally, I
have one that has been up for weeks running a somewhat broken RH 7.0.
I'm not too worried about stabilizing it, and it is nice and peppy.  So
long, Celerons.  Sayonara, PIII's.

They are cheap enough (as in less than $1000 including a monitor for a
256 MB system) that you can prototype one and beat on it, and if you
can't stabilize it as a compute server you can always use it as a
desktop.  If your prototype is stable, then why worry?

> means I should use a dual Pentium III motherboard, right? Does Linux
> handle these well? I heard these can take up to 8 GByte RAM, I think
> I should start with 2 GBytes. I don't think the application is CPU
> bound, but I don't know for sure yet. It certainly seems to exercise
> disks strongly, so here is another question:

If you do get a dual PIII and plan to run memory bound stuff, there is a
visible performance advantage to having at least moderately fancy
memory.  When running memory bound (with working sets out in main
memory rather than in cache), our dual 933 MHz PIII's equipped with
RDRAM do indeed provide more aggregate FLOPS than a 750 MHz Athlon,
although the Athlon is still faster as a single CPU than any PIII.
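To make "memory bound" concrete, here is a minimal C sketch (my own
illustration in the spirit of McCalpin's STREAM triad, not part of
cpu-rate; the array size and constants are arbitrary) of the kind of
loop whose speed is set by the memory subsystem rather than the CPU
clock once the arrays outgrow cache:

    /* triad.c -- toy memory-bandwidth probe, STREAM-triad style.
     * 16 MB arrays are far larger than any 2001-era L2 cache, so
     * the RDRAM vs PC133 SDRAM difference shows up directly.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    #define N (2 * 1024 * 1024)   /* 2M doubles = 16 MB per array */

    int main(void)
    {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double));
        struct timeval t0, t1;
        double secs;
        long i;

        if (!a || !b || !c) return 1;
        for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        gettimeofday(&t0, NULL);
        for (i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];  /* 2 loads + 1 store each */
        gettimeofday(&t1, NULL);

        secs = (t1.tv_sec - t0.tv_sec)
             + 1.0e-6 * (t1.tv_usec - t0.tv_usec);
        printf("triad: %.1f MB/s\n",
               3.0 * N * sizeof(double) / secs / 1.0e6);
        free(a); free(b); free(c);
        return 0;
    }

Run one copy, then two at once on an SMP box, and you see the
aggregate-FLOPS effect described above: both processors end up
competing for the same memory bus.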

> do I absolutely, positively need SCSI? I was thinking about putting a
> second 100 EIDE host adapter in, and run disk striping plus mirroring
> over 4 EIDE hard drives (the better models from IBM). Or should I
> use a Dual-Pentium mumboard with onboard SCSI, and buy several fast,
> hot & noisy scuzzys, soft-RAIDing them? Perhaps even hardware RAID?
> I don't think the disks need to be very large, but they better be fast.

I personally don't think so, as long as you run only one disk per IDE
channel.  SCSI is still so darned expensive relative to IDE.  There do
exist some very pretty IDE-based RAID units (four IDE drives in a
cabinet with a RAID controller that hooks up to a SCSI interface in your
system) that can provide cheap, fast, reliable storage.
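For the software route asked about above (striping plus mirroring over
four EIDE drives, one per channel), a 2.4-era raidtools configuration
would look roughly like the sketch below.  The device names are
hypothetical and the layered-md layout should be double-checked against
the Software-RAID HOWTO before trusting data to it:

    # /etc/raidtab -- sketch of RAID 0+1 over four IDE drives,
    # one drive per IDE channel.  Hypothetical device names.

    raiddev /dev/md0                # first stripe set
        raid-level              0
        nr-raid-disks           2
        persistent-superblock   1
        chunk-size              32
        device                  /dev/hda1
        raid-disk               0
        device                  /dev/hdc1
        raid-disk               1

    raiddev /dev/md1                # second stripe set
        raid-level              0
        nr-raid-disks           2
        persistent-superblock   1
        chunk-size              32
        device                  /dev/hde1
        raid-disk               0
        device                  /dev/hdg1
        raid-disk               1

    raiddev /dev/md2                # mirror built over the stripes
        raid-level              1
        nr-raid-disks           2
        persistent-superblock   1
        device                  /dev/md0
        raid-disk               0
        device                  /dev/md1
        raid-disk               1

mkraid builds each md device in turn (md0 and md1 first, then md2), and
/proc/mdstat shows the state of the set.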

I will say that over the years I've found SCSI disks to be many times
more reliable as pure hardware than IDE disks.  Manufacturers seem to
reserve their higher tolerance/better disks for the SCSI market.  This
has been (anecdotally) less true over the last 1-2 years -- I haven't
had an IDE drive crash in that time (although I haven't had a SCSI drive
crash in much longer with a much larger pool).

I just get IDE drives in my own systems these days -- the newest drives
are quite fast and seem to complement a fast CPU reasonably well.  If
your server has stringent stability requirements, though, I'd at least
think about either an external IDE RAID on a SCSI channel or SCSI
drives.  You might need a SCSI interface anyway, as you've got to back
everything up on a server-type platform and any high-speed backup device
will almost certainly be SCSI based.

> I would like to use Reiser FS, so this only allows me RAID 0/1, right?
> Higher stability, lack of fscking delay in case machine needs to be
> rebooted and no 2 GByte file size limit would seem to be needed.
> Since this is ReiserFS, I should obviously go with the latest stable
> kernel. I've been using Mandrake for my desktop and small time testing,
> but for this application this is probably not the way to go. Which
> distro should I choose? Debian?

Sorry, no help here.  Reiser is way out past my envelope.  I personally
use Red Hat (to be able to use kickstart to install and now yup to
maintain via an ftp drop of common RPM's).  With yup around to finally
provide RPM functionality somewhat similar to Debian's apt, I do think
that having an RPM based distro is a very good idea.  However, I don't
want to get into a religious war over it.

For those who DO use rpm-based distros, I will say that I've now used
yup to update a number of hosts and it totally kicks butt.  It
transparently manages all the dependency issues that used to bug the
heck out of me.  I haven't tried setting up deliberately tortuous
dependencies (say, packages a and b both require c, and updating b
requires an updated c, which in turn requires that a also be updated,
so yup does both even though you only asked for b), but it should have
managed a fair number of these on the updates I've done (one of which
was of a really nasty system; see below).  To learn more, visit:

   http://devel.yellowdoglinux.com/rp_yup.shtml

and

   http://sourceforge.net/projects/yup/

We're using a version labelled 0.7.2 (modified for dulug, see
www.dulug.duke.edu).  The one problem I've encountered that I know the
developers are still working on is that yup will only work on a
relatively "clean" system.  If you start with a system that is a mess
-- because you've done a lot of --force'd rpm installs, reinstalled
from a tarball after installing an RPM without removing the RPM first,
or broken dependencies some other way (all of which I'd done on one of
my systems :-( ) -- then you have to clean it up by hand, so that the
rpm database accurately describes your actual installation, before yup
will run to update it (sensible enough).  Earlier versions would
sometimes encounter these nasty little twists and barf without giving
you enough information to identify the offending package or program,
but this latest one, with some work, finally let me purge all the evil
from the very old ex-development system in question.
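For what it's worth, the hand-cleaning step is just a matter of making
the rpm database match what is actually on disk, using stock rpm
commands.  This is my own recipe, not anything from the yup
documentation, and the package name is hypothetical:

    # Show files that no longer match what the rpm database claims:
    rpm -Va | less

    # Drop a stale database entry without touching the filesystem
    # (e.g. after overwriting an RPM install with a tarball build):
    rpm -e --justdb --nodeps some-stale-package

    # Or re-register the package cleanly over the tarball's files:
    rpm -Uvh --force some-stale-package-1.0-1.i386.rpm

With the rpm database consistent again, yup has solid ground to work
from.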

Once it is clean and updates a single time, I suspect that it will keep
your system clean and consistent forever after from a central
repository.  I think I'm in love.  (Is it sick to love a maintenance
tool?  I don't think so...:-)

> We seem to be moving towards an architecture consisting of a bunch
> of Perl programs (it's not settled, but the few routines we have are
> in Perl) communicating via sockets. (Right now it's a mix of C and
> Perl, talking via named pipes). The queries (a chemical structure
> drawn within a browser, using a Java applet or a plug-in) are
> handled with a dispatcher (a cgi-bin Perl thing). Sooner or
> later we will spread the query to a farm of boxes, each containing
> a database, or segmenting a database across several, cheap, redundant
> boxes. (But we're not there yet.) There's no alternative to sockets,
> right?
>
> Right now a query can take up to minutes, so I don't think mod_perl
> is needed. We don't get a lot of query hits, at least not yet. Should
> I try using Apache mod_perl instead of the Netscape Server nevertheless?
>
> The database question. The database is large, but entirely static.
> Would PostgreSQL be able to handle this comparably to Oracle? The
> licenses for Oracle are not so very expensive anymore, but if we're
> going towards a farm of boxes, this will start to cumulate. Otoh,
> some Oracle stuff seems to be able to handle parallel databases
> natively. Hmm.

Again, I can't help here, although a colleague or two of mine who are
modest experts on DB's (and who usually track the list when they have
time) will perhaps chime in and comment.  Jesse?  Derek?  Are you
there?

> Sorry for the bunch of clueless questions, but I'm not exactly a
> computer person, nor is the company extremely competent in technical
> questions (they're all a bunch of chemists, who just have been
> working with computers for a long time). The budget for hardware seems
> to be there, but I'm a new guy, and I can't afford buying a bunch
> of expensive crap which will just sit there gathering dust.

The great thing about life today is that you can afford to prototype
anything but an Alpha for chickenfeed (as in a few thousand dollars
TOTAL) and recycle any systems that don't make the cut.  An Alpha is
harder to prototype, but you can probably arrange for a loaner.  For
certain problems, its floating point advantages are really impressive.
It sounds like your system isn't going to do a lot of parallel work, so
having a very fast CPU and bus may be the only way available to speed
things up.

They ARE a lot more expensive, of course.

  rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu