[Beowulf] Anyone having IPMI problems on Intel S3200 series
henning.fehrmann at aei.mpg.de
Mon Apr 20 00:24:09 PDT 2009
On Wed, Apr 15, 2009 at 04:51:57PM -0400, Perry E. Metzger wrote:
> In our brand new cluster, we're using Intel S3210SH motherboards.
> The boards are going to be managed by a pure hands off system I've
> built. IPMI is used for tasks like monitoring and telling the boards to
> PXE boot so they can be re-installed by a purely automated system when
> software upgrades happen.
> Unfortunately, every once in a while, the IPMI BMCs on my test systems
> simply stop talking to the network. This isn't overly tragic since I can
> have a process go over to such a board when it detects that pings have
> stopped working and use a local IPMI command to cold reset the BMC, but
> it is still really Not The Right Thing. Also, I suspect every once in a
> great while I'll get a simultaneous OS and IPMI BMC failure and shoe
> leather will be needed to reset the box, which I don't like.
We also had this problem with Supermicro boards and IPMI cards on a large
scale. We finally solved it by upgrading the firmware of the NICs, which are
actually from Intel.
You might want to ask the vendor or your reseller for a beta version
that is more recent than the publicly available one, for both the IPMI
cards and the NICs.
Unfortunately, the problem occurs only occasionally, which makes testing
difficult. We took a subset of nodes, flashed the new NIC firmware onto
them, and waited a long time.
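
For the stopgap described in the quoted message (notice that pings to the
BMC have stopped, then go to the board and cold reset the BMC with a local
IPMI command), a rough sketch might look like the one below. It is only an
illustration under several assumptions that are not from this thread: the
host names are hypothetical, the management host can ssh to the node as a
user allowed to run ipmitool, ipmitool and the kernel IPMI driver are
present on the node, and ping accepts the Linux iputils options -c and -W.

```python
import subprocess
import time


def bmc_responds(bmc_host, run=subprocess.run):
    """Return True if the BMC answers one ping (Linux iputils flags assumed)."""
    result = run(["ping", "-c", "1", "-W", "2", bmc_host],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def reset_bmc_via_node(node_host, run=subprocess.run):
    """Ask the node's OS to cold reset its own BMC over the local interface.

    Assumes passwordless ssh to the node and ipmitool installed there.
    """
    result = run(["ssh", node_host, "ipmitool", "mc", "reset", "cold"],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def check_and_reset(bmc_host, node_host, failures_needed=3, pause=5.0,
                    run=subprocess.run):
    """Reset the BMC only after several consecutive failed pings.

    Returns "ok", "reset", or "reset-failed".
    """
    for attempt in range(failures_needed):
        if bmc_responds(bmc_host, run):
            return "ok"
        # Wait between failed attempts so one lost packet is not enough.
        if attempt < failures_needed - 1:
            time.sleep(pause)
    return "reset" if reset_bmc_via_node(node_host, run) else "reset-failed"
```

Called from cron or a monitoring loop, for example as
check_and_reset("node17-bmc", "node17") with both names made up here, it
covers the case where only the BMC has wedged. It does not help when the OS
and the BMC fail at the same time, since the reset travels through the
node's OS.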