(Stress-)Testing of nodes in a beowulf cluster

Steven Timm timm at fnal.gov
Mon Mar 12 06:05:40 PST 2001


At Fermilab, our PC Farms cluster is not a true Beowulf,
but we do run an extensive 30-day stress test.  The test
consists of running seti@home continuously on both CPUs for
the full 30 days and, every hour on the hour, using "bonnie" to
write a 1 GB test file to each disk while "nettest" simultaneously
pushes 400 MB over the network.  "Streams" (the STREAM memory-bandwidth
benchmark) could be added to this as well.
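
For anyone who wants to script something similar, here is a rough
sketch of the hourly disk step only.  It is not our actual farm script
and it does not call bonnie; write_and_verify, seconds_until_next_hour,
run_burn_in and the file name "burnin.tmp" are all made up for
illustration.  It writes random data to each disk, reads it back, and
compares checksums.  Note that the read-back may be served from the
page cache rather than the platters, so treat it as a load generator
more than a media check.

```python
import hashlib
import os
import time


def write_and_verify(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of random data to path, read it back,
    and return True if the SHA-256 digests match."""
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = os.urandom(min(chunk_bytes, remaining))
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data out of userspace buffers
    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_bytes), b""):
            read_back.update(chunk)
    os.remove(path)
    return written.hexdigest() == read_back.hexdigest()


def seconds_until_next_hour(now):
    """Seconds from epoch time `now` until the next top of the hour
    (assumes a time zone with a whole-hour UTC offset)."""
    return 3600 - (now % 3600)


def run_burn_in(mount_points, days=30, file_bytes=1 << 30):
    """Every hour on the hour, write a ~1 GB file to each mount point.
    The 30-day length and 1 GB size follow the description above."""
    deadline = time.time() + days * 86400
    while time.time() < deadline:
        time.sleep(seconds_until_next_hour(time.time()))
        for mount in mount_points:
            ok = write_and_verify(os.path.join(mount, "burnin.tmp"), file_bytes)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, mount, "OK" if ok else "MISMATCH")
```

You would start it with something like run_burn_in(["/data1", "/data2"]),
where the mount points are placeholders, and run the CPU load and the
network push as separate processes alongside it.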

In addition, utilities are starting to appear that
can read the event logs kept by the BIOS, which record
memory faults and power supply faults.  In our experience,
power supplies are the most likely component to fail in the
first 30 days, and occasionally you get a bad batch of memory too.
The stress test above makes the machine draw close to the highest current
it will ever draw, so a power supply that is going to die will do so
quickly.


------------------------------------------------------------------
Steven C. Timm (630) 840-8525  timm at fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations

On Mon, 12 Mar 2001 Kian_Chang_Low at vdgc.com.sg wrote:

> Hi,
>
> I have been playing with beowulf cluster for quite a while and have put
> together a small cluster as a test to show that it can be done.
>
> Now I am faced with a question about the reliability of the nodes (slave
> or/and master). Is there any tests (or stress-tests) that we can run to
> check the reliability of the following,
> 1) CPU
> 2) memory
> 3) network interface card
> 4) disk
> 5) motherboard
> 6) any other?!
>
> I heard of using memtest to test the memory. But what about tests for the
> other components?
>
> I thought it will be great if there is a suite of tests that the node has
> to undergo before being added to the cluster. Rather than trying to
> determine the cause of failure after putting the cluster together, we at
> least know that a node is downright faulty from the beginning.
>
> Thanks,
> Kian Chang.




