Diagnostic tools

alvin at Maggie.Linux-Consulting.com
Mon Oct 21 04:38:46 PDT 2002


hi ya manel

On Mon, 21 Oct 2002, Manel Soria wrote:

> Hi,
> 
> We are looking for a diagnostic tool that (ideally) would
> allow us to determine what component/s of a node fail. It should
> test the processor, RAM, disk and network cards under heavy load
> but in repeatable conditions.

testing those items individually is a lot of work ...

the test process/procedure is more important than the actual test ??

- many different cpu/disk/memory/nic tests
	http://www.Linux-1U.net/Diags/
	( not quite finished yet... )

- many ways to tweak the system to maximize its performance
	http://www.Linux-1U.net/Tuning/
	( way-incomplete but .. maybe it's useful to ya ?? )
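
to illustrate the point about procedure mattering more than any single test, here is a minimal sketch in Python of a repeatable test pass. everything in it ( the names make_pattern, memory_pattern_pass, run_procedure, and the seed and size defaults ) is hypothetical and not taken from the pages above. it only shows the idea of fixed seeds and a fixed order, so that two runs on two nodes do exactly the same work; it does not control physical addresses and is not a substitute for a real memory tester.

```python
import random


def make_pattern(size_bytes, seed):
    """Build a pseudo-random byte pattern that is identical for a given seed."""
    rng = random.Random(seed)  # fixed seed -> same pattern on every run
    return bytes(rng.getrandbits(8) for _ in range(size_bytes))


def memory_pattern_pass(size_bytes, seed):
    """Write the pattern into a buffer, read it back, count mismatched bytes."""
    expected = make_pattern(size_bytes, seed)
    buf = bytearray(expected)  # separate copy that is then verified
    return sum(1 for got, want in zip(buf, expected) if got != want)


def run_procedure(passes=3, size_bytes=1 << 16, base_seed=2002):
    """Run the same passes in the same order; return (seed, mismatches) pairs."""
    return [
        (base_seed + i, memory_pattern_pass(size_bytes, base_seed + i))
        for i in range(passes)
    ]


if __name__ == "__main__":
    for seed, errors in run_procedure():
        print(f"seed={seed} mismatches={errors}")
```

logging the seed with each result means a failing pass can be rerun with exactly the same data on the same node or on a known-good one.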

> Other desirable features would be:
> -Run from a floppy, without an OS on the disk, in order to allow
>  good quality control of the new nodes.

running from floppy is a wee bit tricky: you have to squeeze a kernel
into 1.44MB ( 1.77MB ) that can boot and get the node on the network
	- newer motherboards might be simpler/easier for network booting too
	( diskless network booting is easier than a floppy boot )

- use a 4MB compact flash ... and booting as a diskless node
  becomes a trivial problem

> -Monitor the CPU temperature.

use i2c-2.6.5 and lm_sensors to read the health monitors on the
motherboard

also get a regular digital thermometer from your local hw store
for sanity checking
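
a rough sketch of reading the temperatures from a script, in Python. note the assumption: it reads the sysfs hwmon layout that current kernels use ( /sys/class/hwmon/hwmon*/temp*_input, values in millidegrees Celsius ), which is not necessarily how the i2c/lm_sensors versions above expose the readings. the function name read_temperatures and the key format are made up for illustration.

```python
import glob
import os


def read_temperatures(hwmon_root="/sys/class/hwmon"):
    """Return {"hwmonN:chip:tempM": degrees_C} for every readable temp input."""
    readings = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        chip_dir = os.path.dirname(path)
        hwmon = os.path.basename(chip_dir)
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = "unknown"  # some drivers may not provide a name file
        try:
            with open(path) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor not readable right now, skip it
        sensor = os.path.basename(path)[: -len("_input")]
        readings[f"{hwmon}:{chip}:{sensor}"] = millideg / 1000.0
    return readings


if __name__ == "__main__":
    for name, temp_c in read_temperatures().items():
        print(f"{name}: {temp_c:.1f} C")
```

run it in a loop while the load test is going and compare against the thermometer reading.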

have fun
alvin

> 
> We would appreciate suggestions and comments about this topic.
> 
> Thanks for your help.




More information about the Beowulf mailing list