thermal kill switch

Steve Cousins cousins at limpet.umeoce.maine.edu
Fri Oct 25 13:51:00 PDT 2002



> From: "Timothy H. Keitt" <tkeitt at mail.utexas.edu>
> Date: 24 Oct 2002 13:34:23 -0700
> 
> Anyone have experience with the APC UPS + sensor card approach? I was
> thinking of going that route.

Yes, I was going to post this solution but found that someone already did.
It works very well for our cluster.  It has been tested many times due to
the problems that we've had with an air conditioner that was mandated to
be installed by people who didn't know anything about our needs...  (long
OT rant omitted).

Our head node runs the APC Powerchute software and it is connected to the
UPS via an APC proprietary serial cable. With the Environmental Module in
the UPS, Powerchute gives you options specific to Temperature and
Humidity.  It also gives you a number of pairs of leads that you can
connect to other types of sensors.  You could put a switch on the door
that would trigger an event to be logged or emailed to you so you could
see if someone has entered the room for instance.

So, in Powerchute I have it set up so that if the temp goes above 80
degrees F, it does three things:

	1. it emails me
	2. it runs a script that tells all of the slave nodes to shut down
	3. it shuts down the head node.
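
For step 2, a minimal sketch of what such a script could look like is
below.  It is not the script we actually run: the node names are made up,
and it assumes the head node can ssh to each slave node as root without a
password (rsh would work the same way).

```python
import subprocess

# Hypothetical node names; replace with the cluster's own host list.
NODES = ["node01", "node02", "node03"]


def shutdown_command(node):
    """Build the ssh command that halts one node (assumes root ssh keys)."""
    return ["ssh", "-o", "ConnectTimeout=5", f"root@{node}",
            "shutdown", "-h", "now"]


def shutdown_nodes(nodes, run=subprocess.call):
    """Send the shutdown command to every node; return the nodes that failed."""
    failed = []
    for node in nodes:
        try:
            status = run(shutdown_command(node))
        except OSError:
            # ssh itself could not be started on the head node.
            status = -1
        if status != 0:
            failed.append(node)
    return failed


if __name__ == "__main__":
    for node in shutdown_nodes(NODES):
        print(f"could not shut down {node}")
```

The head node shuts itself down only after this returns, so a node that
is unreachable costs at most the 5 second connect timeout.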

Since the AC is set to keep the room at 68, if it gets to 80 there is a
problem.  You could set it up to email and then wait some number of
seconds to give the temperature a chance to come down.  While a delay
like that works well for power events, the chance that the temperature is
going to come back down on its own is next to zero.
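
If you did want the wait-and-see behaviour, the logic could be sketched
roughly as follows.  This is only an illustration: read_temp_f is a
hypothetical callable standing in for however you read the sensor, and
the 120 second grace period and 10 second polling interval are arbitrary
values, not anything Powerchute prescribes.

```python
import time


def should_shut_down(read_temp_f, threshold_f=80.0, grace_seconds=120,
                     poll_seconds=10, sleep=time.sleep):
    """Return True only if the temperature stays above the threshold
    for the whole grace period.

    read_temp_f is a hypothetical callable returning degrees Fahrenheit.
    """
    waited = 0
    while waited < grace_seconds:
        if read_temp_f() <= threshold_f:
            # The room recovered during the grace period; stand down.
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    # Final check once the grace period has elapsed.
    return read_temp_f() > threshold_f
```

The alert email would go out before calling this, and the node shutdown
would run only if it returns True.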

Since it lets you run a script if an event is triggered, you can do just
about anything you want.  You could get it to page you for instance. It
has saved a lot of aggravation and it is relatively inexpensive,
especially since only one is needed for the whole cluster.

I highly recommend it.

Steve

> 
> T.
> 
> On Wed, 2002-10-23 at 02:03, Robert G. Brown wrote:
> > On Tue, 22 Oct 2002, Andre Lehovich wrote:
> > 
> > > We had the air-conditioning fail yesterday.  Caught it in
> > > time to shut down by hand, but we won't be so lucky next
> > > time.  RGB's book recommends a thermal kill switch, but
> > > doesn't give details on implementation.  One obvious idea is
> > > to have a daemon monitor lm-sensors and shutdown each node
> > > as it gets too hot.  This is easy and cheap.
> > > 
> > > But, is there anything better?  We have not yet had the
> > > electric and cooling contractors refit our server room.  Is
> > > there anything we should have them install during the
> > > rewiring?  What are the pros/cons of a room-wide kill switch
> > > vs. the lm-sensors approach?
> > 
> > We have a room-wide kill switch set to be a "last resort".  They are
> > remarkably difficult to find in e.g. a web search, but our architect and
> > electrical contractors came up with one, so they must be in electrical
> > component catalogs somewhere if you know where to look.
> > 
> > A second option is to get an electronically readable thermometer (with
> > one or more sensors) for the ambient room air.  netbotz (netbotz.com)
> > sell moderately expensive (order $1K) monitoring devices that sample
> > room air temperature, humidity, switch state (so you can get an alarm or
> > take pictures when a door is opened or a motion detector detects motion)
> > and have a built in camera and both a web and SNMP interface for remote
> > monitoring.  It generates "alarm" mail if e.g. temperature or sound
> > levels exceed a given threshold.  It is a straightforward matter to hook
> > a script into one that either polls the device and sends nodes a
> > poweroff command on an alarm or responds to alarm mail ditto.
> > 
> > If you are a DIY sort of person and don't want to pay for a netbot, you
> > can build the functional equivalent of a netbot out of component parts
> > and scripts.  A PC-TV card (bttv driver) and an X10 camera will let you
> > watch real-time video of your cluster room in an xawtv window or serve
> > you images updated every second or five on a web page -- I have the
> > scripts and html for the latter already set up, as I have one at home.
> > To do temperature, you can invest in an ibutton thermochron:
> > 
> > http://www.ibutton.com/ibuttons/thermochron.html
> > 
> > or (perhaps more reasonably) in a sensorsoft thermometer, readable from
> > an RS232 interface for around $100.  Or build your own serial port
> > readable thermometer for around $35 if you are a real DIY fanatic and
> > have a 5V power supply handy.  Again, scripts to read and act are
> > necessary, some are already posted on the web.  I imagine that one could
> > set up sound alarms with an ordinary microphone and sound card although
> > I've never tried it.  In our server room we'd be checking to make sure
> > that the sound level stays HIGH, as the AC is in the room so ambient
> > noise is like working right behind a jet engine during takeoff.  We'd
> > want an alarm to be triggered if that lovely sound ever went OFF.
> > 
> > lmsensors is the final option, but it has some flaws.  For one thing, it
> > monitors temperatures inside individual systems, not ambient room
> > temperatures.  Not all systems/chips are well supported.  The lmsensors
> > kernel module was designed by individuals who have never heard of the
> > term "API" (as in, you'll need custom code to glean results for EACH
> > CHIP AND CONFIGURATION as they don't digest raw output at all -- you
> > might as well plan to become expert in the particular chip(s) your
> > systems have to monitor them).  Some silly motherboards (the pile of
> > Tyan dual AMD's we own coming to mind) have insane BIOSn that require(d)
> > one to hand-enable onboard sensors at the beginning of EACH BOOT in
> > order to have them functioning and accessible to lmsensors.
> > 
> > In summary, lmsensors is great if it works for you, primarily to
> > protect individual systems but not so great for protecting the entire
> > room.
> > 
> > This gives you a pretty wide range of ways to protect and monitor your
> > cluster/server room, at a wide range of prices -- "free" (if it works)
> > for lmsensors, a few $100 for DIY or over-the-counter thermal sensors
> > and video, order of $1000 to get serious integrated monitors that are
> > almost plug-n-play with a minimal amount of your time and effort
> > (netbotz are network appliances so they literally plug in, snap onto
> > your network, get IP from DHCP and can be configured and monitored from
> > a serial interface or over the network -- a bit windows-centric in
> > supplied configuration tools as usual, but one CAN get by with minicom).
> > 
> > HTH
> > 
> >    rgb
> > 
> > -- 
> > Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> > Duke University Dept. of Physics, Box 90305
> > Durham, N.C. 27708-0305
> > Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> > 
> > 
> > 
> -- 
> Timothy H. Keitt
> The University of Texas at Austin
> Section of Integrative Biology
> 1 University Station C0930
> Austin, Texas 78712-0253 USA


_____________________________________________________________
 Steve Cousins                 Email: cousins at umit.maine.edu
 Research Associate            Phone: (207) 581-4302
 Ocean Modeling Group
 School of Marine Sciences     208 Libby Hall
 University of Maine           Orono, Maine 04469





