Diagnostic tools

Donald Becker becker at scyld.com
Mon Oct 21 08:42:53 PDT 2002


On Mon, 21 Oct 2002 alvin at Maggie.Linux-Consulting.com wrote:

> On Mon, 21 Oct 2002, Manel Soria wrote:
> 
> > We are looking for a diagnostic tool that (ideally) would
> > allow us to determine what component/s of a node fail. It should
> > test the processor, RAM, disk and network cards under heavy load
> > but in repeatable conditions.
> 
> testing those items individually is a lot of work ...
> 
> test process/procedure is more important than the actual test??
> 
> - many different cpu/disk/memory/nic tests
> 	http://www.Linux-1U.net/Diags/

The only Linux hardware tests you list are a CPU test (cpuburn) and many
entries for memtest86.  You missed several Linux "SMART"-based disk
diagnostic tools, as well as the NIC diagnostics at
  http://www.scyld.com/diag/index.html

> > -Monitor the CPU temperature.
> 
> use i2c-2.6.5 and lm_sensors to read the health monitors on the
> motherboard
> 
> also get a regular digital thermometer from your local hw store
> for sanity checking

Good advice, since lm_sensors can only guess what type of thermal sensor
is on the motherboard.  When the guessed calibration is off, it is
usually way off, but you cannot count on that.
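A minimal sketch of that sanity check, assuming a modern Linux kernel that exposes sensor readings through the hwmon sysfs interface (/sys/class/hwmon/hwmon*/temp*_input, values in millidegrees Celsius) rather than the /proc interface lm_sensors used in 2002.  The function names and the 10 degree tolerance are hypothetical; a handheld thermometer cannot see the die temperature, so the check only catches readings that are grossly wrong, which is the failure mode described above.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {sysfs path: degrees C} for every temp*_input file under root.

    Assumes the hwmon sysfs layout, where each value is an integer in
    millidegrees Celsius.
    """
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        try:
            with open(path) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric sensor: skip it
        temps[path] = millideg / 1000.0
    return temps


def flag_suspect(sensor_temps, thermometer_c, tolerance_c=10.0):
    """Return the sensors whose reading differs from the thermometer
    reading by more than tolerance_c (a hypothetical threshold)."""
    return {name: t for name, t in sensor_temps.items()
            if abs(t - thermometer_c) > tolerance_c}


if __name__ == "__main__":
    # Example: the digital thermometer near the heatsink reads 42 C.
    print(flag_suspect(read_hwmon_temps(), thermometer_c=42.0))
```

Any sensor the sketch flags is a candidate for a miscalibrated or misidentified chip, not proof of a hardware fault.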

-- 
Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993




More information about the Beowulf mailing list