[Beowulf] Anyone having IPMI problems on Intel S3200 series motherboards?


Henning Fehrmann henning.fehrmann at aei.mpg.de
Mon Apr 20 00:24:09 PDT 2009


On Wed, Apr 15, 2009 at 04:51:57PM -0400, Perry E. Metzger wrote:
> 
> In our brand new cluster, we're using Intel S3210SH motherboards.
> 
> The boards are going to be managed by a pure hands off system I've
> built. IPMI is used for tasks like monitoring and telling the boards to
> PXE boot so they can be re-installed by a purely automated system when
> software upgrades happen.
> 
> Unfortunately, every once in a while, the IPMI BMCs on my test systems
> simply stop talking to the network. This isn't overly tragic since I can
> have a process go over to such a board when it detects that pings have
> stopped working and use a local IPMI command to cold reset the BMC, but
> it is still really Not The Right Thing. Also, I suspect every once in a
> great while I'll get a simultaneous OS and IPMI BMC failure and shoe
> leather will be needed to reset the box, which I don't like.

We also had this problem on a large scale with Supermicro boards and IPMI
cards. We finally solved it by upgrading the firmware of the NICs, which
are actually from Intel.
You might want to ask the vendor or your reseller for a beta version
that is more recent than the publicly available one, for both the IPMI
cards and the NICs.
Unfortunately, the problem occurs only occasionally, which makes testing
difficult. We took a subset of nodes, flashed the new NIC firmware onto
them, and waited a long time.
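
While waiting on a test subset like that, the ping-then-reset workaround
described in the quoted message can be sketched roughly as below. This is
only an illustration under assumptions: the host names, the failure
threshold, and the helper function names are hypothetical, the monitor is
assumed to run on a separate management host with passwordless ssh to the
node, and the ping flags are the Linux ones. The one IPMI command used is
ipmitool's "mc reset cold", run locally on the affected node.

```python
import subprocess


def bmc_responds(bmc_host, run=subprocess.run):
    """Return True if the BMC answers a single ping (Linux ping flags assumed)."""
    result = run(["ping", "-c", "1", "-W", "2", bmc_host],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def cold_reset_bmc_via_node(node_host, run=subprocess.run):
    """Log in to the node's OS and cold reset its BMC in-band.

    Assumes ssh access to the node and ipmitool installed there.
    """
    result = run(["ssh", node_host, "ipmitool", "mc", "reset", "cold"],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def check_and_reset(bmc_host, node_host, failures, threshold=3,
                    run=subprocess.run):
    """Do one monitoring pass and return the new consecutive-failure count.

    The BMC is reset only after `threshold` missed pings in a row, so a
    single dropped packet does not trigger a reset.
    """
    if bmc_responds(bmc_host, run):
        return 0
    failures += 1
    if failures >= threshold:
        cold_reset_bmc_via_node(node_host, run)
        return 0
    return failures
```

Calling check_and_reset() from a periodic job, once per node, and logging
each reset would also give a rough measure of how often the BMCs hang
before and after a firmware change.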

Good luck,
Henning


