[Beowulf] Re:hardware question: building a cluster node/ student

Lombard, David N dnlombar at ichips.intel.com
Fri Jul 27 07:44:17 PDT 2007


On Thu, Jul 26, 2007 at 08:48:35AM -0700, David Mathog wrote:
> "Nathan Moore" <ntmoore at gmail.com> wrote
> 
> > Earlier this summer, the case fan on one of the machines failed, and the
> > result seems like a cooked motherboard (erratic errors with the integrated
> > NIC).
> 
> There should be an automatic shutdown script running to detect
> temperature events and shut down the machine before it is damaged. 
> This is what I use on some machines:
> 
> ftp://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/sensor_monitor.tar.gz
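
A minimal sketch of such a watchdog follows, for illustration only. It is NOT the contents of sensor_monitor.tar.gz. It assumes a newer kernel that exposes /sys/class/thermal/thermal_zone0/temp in millidegrees Celsius, and the names check_temp, TEMP_FILE, LIMIT_C, DRY_RUN and INTERVAL are hypothetical. By default it only reports, so it is safe to try.

```shell
#!/bin/sh
# Hypothetical thermal watchdog sketch (not the sensor_monitor package).
# ASSUMPTION: TEMP_FILE reports millidegrees Celsius, as the sysfs
# thermal interface on newer kernels does.

TEMP_FILE=${TEMP_FILE:-/sys/class/thermal/thermal_zone0/temp}
LIMIT_C=${LIMIT_C:-80}   # shut down at or above this temperature (C)
DRY_RUN=${DRY_RUN:-1}    # 1 = report only; 0 = really call shutdown

check_temp() {
    # Read millidegrees and convert to whole degrees Celsius.
    milli=$(cat "$TEMP_FILE") || return 2
    temp_c=$((milli / 1000))
    if [ "$temp_c" -ge "$LIMIT_C" ]; then
        echo "CRITICAL: ${temp_c}C >= ${LIMIT_C}C"
        if [ "$DRY_RUN" -eq 0 ]; then
            /sbin/shutdown -h now "thermal limit exceeded"
        fi
        return 1
    fi
    echo "OK: ${temp_c}C"
    return 0
}

# Only act when asked to; with no argument the script just defines check_temp.
case "${1:-}" in
    --once) check_temp ;;
    --loop) while :; do check_temp; sleep "${INTERVAL:-30}"; done ;;
esac
```

Running it with --once gives a single reading; --loop with DRY_RUN=0 turns it into an actual shutdown daemon. Polling from user space is a fallback: it only helps while the script is still running and the sensor reads correctly.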

Depending on the board and kernel, ACPI can also provide these services.  On
a Fedora Core 4 system running kernel 2.6.14, I had to do the following to get
that to work:

	echo 90           > /proc/acpi/thermal_zone/THRM/polling_frequency
	echo 80:0:70:65:0 > /proc/acpi/thermal_zone/THRM/trip_points

The first echo made the automatic shutdown work; the second set the values I
wanted, i.e., shutdown at 80C.  Some ACPI cognoscenti said the fact that I
had to "manually enable" the polling/shutdown was a bug in that version
of the kernel.
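
The second echo packs several thresholds into one colon-separated string. The sketch below decodes such a string into named fields. It assumes the field order critical:hot:passive:active0:active1:... (in C) that the 2.6-era /proc/acpi trip_points interface parsed, which is consistent with 80 being the shutdown temperature mentioned above; the function name decode_trip_points is hypothetical.

```shell
#!/bin/sh
# Decode an old-style /proc/acpi trip_points string (sketch).
# ASSUMPTION: field order is critical:hot:passive:active0:active1:...,
# with at least five fields, as the 2.6-era interface expected.

decode_trip_points() {
    # Split the argument on ':' into this function's positional parameters.
    old_ifs=$IFS
    IFS=:
    set -- $1
    IFS=$old_ifs
    if [ $# -lt 5 ]; then
        echo "need at least 5 fields" >&2
        return 1
    fi
    echo "critical=$1 hot=$2 passive=$3"
    shift 3
    # Remaining fields are the active (fan) trip points, numbered from 0.
    n=0
    for t in "$@"; do
        echo "active$n=$t"
        n=$((n + 1))
    done
}

if [ $# -gt 0 ]; then
    decode_trip_points "$1"
fi
```

For example, decoding 80:0:70:65:0 prints critical=80 hot=0 passive=70, then active0=65 and active1=0.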

I discovered all this when I came home to that sickening overly-hot electronics
smell, a case *very* hot to the touch, and the CPU at 104C due to a dead CPU
fan.  Happily, it took a licking and kept on ticking.

-- 
David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.



More information about the Beowulf mailing list