Testing hardware

William R. Dieter dieter at dcs.uky.edu
Tue Apr 24 08:21:01 PDT 2001


Gunnar Lindholm wrote:
> 
> Hello 'wulfers.
> 
> Does anybody know about any good programs to run to test for hardware errors?
> To calculate something well known is a good one, and I've heard of a memory 
> tester that I'm currently trying to find.

http://reality.sgi.com/cbrady_denver/memtest86/

is the URL for memtest86, a fairly thorough memory tester for x86
machines.  If you have only a few bad spots in your RAM and it is no
longer under warranty, you can sometimes get away with mapping out the
bad spots (see http://rick.vanrein.org/linux/badram/).

Another good test is to repeatedly compile the Linux kernel.  Start the
following script and leave it running for a while:

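#!/bin/sh
# Rebuild the kernel over and over; intermittent failures suggest bad hardware.
cd /usr/src/linux || exit 1
date > make.out
# Record how much memory the kernel detected at boot.
dmesg | grep -E '^Memory' >> make.out
count=1000
while [ "$count" -ne 0 ]
do
        # POSIX arithmetic; "let" is a bash-ism that plain /bin/sh may lack.
        count=$((count - 1))
        make clean >> make.out 2>&1 && make bzImage >> make.out 2>&1
done
date >> make.out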

You might want to adjust the count depending on your machine's speed and
your level of patience.  If you have bad hardware (memory, motherboard,
disk), one or more of the compiles will probably fail.  Grep make.out for
"rror" (which matches both "Error" and "error") to see whether any
errors occurred.
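Grepping the log works, but the exit status of each build is a more direct signal.  The following is a minimal sketch, not part of the original test: run_repeatedly is a hypothetical helper that runs any command N times, appends its output to a log, marks failed runs in that log, and prints how many runs exited non-zero.

```shell
#!/bin/sh
# run_repeatedly N LOG CMD...  (hypothetical helper, for illustration)
# Runs CMD N times, appending all output to LOG, and prints the number
# of runs that exited with a non-zero status.
run_repeatedly() {
    n=$1
    log=$2
    shift 2
    failures=0
    i=0
    while [ "$i" -lt "$n" ]
    do
        i=$((i + 1))
        echo "=== run $i ===" >> "$log"
        if ! "$@" >> "$log" 2>&1
        then
            failures=$((failures + 1))
            # Mark the failure so it is easy to find in the log later.
            echo "=== run $i FAILED ===" >> "$log"
        fi
    done
    echo "$failures"
}
```

For the kernel test, something like
cd /usr/src/linux && run_repeatedly 1000 make.out sh -c 'make clean && make bzImage'
would print the number of failed builds, and grepping make.out for "FAILED" shows which iterations they were.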

I have seen cases where memtest86 passes, but the kernel compile test
fails.  I did not have a chance to isolate the problem to determine if
it was memory, motherboard, disk, or some combination.
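In the spirit of the original question's suggestion to "calculate something well known," a complementary check is to checksum the same file many times and compare each result with the first one; on healthy hardware they should never differ.  This is a minimal sketch under stated assumptions: check_stable is a hypothetical helper built on the standard cksum utility.  After the first pass the file is usually served from the page cache, so this mostly exercises memory and CPU; a file larger than RAM forces real disk reads.

```shell
#!/bin/sh
# check_stable N FILE  (hypothetical helper, for illustration)
# Checksums FILE N times in total and prints how many checksums
# differed from the first one.
check_stable() {
    n=$1
    file=$2
    # Reading from stdin keeps the filename out of cksum's output.
    ref=$(cksum < "$file") || return 1
    bad=0
    i=1
    while [ "$i" -lt "$n" ]
    do
        i=$((i + 1))
        sum=$(cksum < "$file")
        if [ "$sum" != "$ref" ]
        then
            bad=$((bad + 1))
        fi
    done
    echo "$bad"
}
```

Any non-zero count printed by check_stable points to unreliable hardware, although, as with the compile test, it does not tell you which component is at fault.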

Bill.
dieter at dcs.uky.edu