[Beowulf] recommendation on crash cart for a cluster room: full cluster KVM is not an option I suppose?

Joe Landman landman at scalableinformatics.com
Wed Sep 30 07:09:23 PDT 2009


Rahul Nabar wrote:
> On Wed, Sep 30, 2009 at 8:19 AM, Hearns, John <john.hearns at mclaren.com> wrote:
> 
>> It depends. Supermicro use the shared-socket approach (actually it is a bridge
>> somewhere on the motherboard), or with Supermicro you can have a separate
>> socket using a little cable with a mini-USB connector onto the IPMI card.
>> Other manufacturers use (a) or (b).
>> On a blade setup the IPMI is carried over the backplane Ethernet links.
>>
>>
>> If you have a separate IPMI network (ILOM, DRAC, whatever they call it) you
>> do not need the same type of switches. What you need is some cheap 10/100 switches,
>> one in each rack. Say Netgear or D-Link. Not a central switch with a huge backbone capacity.
>> Then you just connect the switches together in a loop.
>>
> 
> 
> I like the shared socket approach. Building a separate IPMI network
> seems like a lot of extra wiring to me. Admittedly the IPMI switches can be

Allow me to point out the contrary view.

After years of configuring and helping run/manage both, we recommend 
strongly *against* the shared physical connector approach.  The extra 
cost and hassle of an additional cheap switch and its wiring is well 
worth the money.

Why do we take this view?  Many reasons, but some of the bigger ones are:

a) when the OS takes the port down, the IPMI controller (BMC) no longer 
responds to ARP requests.  Which means ping, and any other service that 
talks to it over IP, will fail without continuous updating of the ARP 
tables, or a forced hardwiring of those IPs to those MAC addresses (a 
static-ARP sketch follows below, after point b).

b) IPMI stack bugs (what ... you haven't seen any?  you must not be 
using IPMI ...).  My favorite in recent memory (over the last year) was 
one where IPMI did a DHCP request and got itself wedged into a strange 
state.  To unwedge it, we had to disconnect the IPMI network port, issue 
an mc reset cold, wait, and then plug it back in (a rough sequence is 
sketched below).  Hard to do when eth0 and IPMI share the same port.
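
To make (a) concrete: the workaround amounts to pinning the node BMC's IP 
to its MAC on whatever host needs to reach it.  A minimal sketch only; the 
address, MAC, and interface name below are placeholders, not anything from 
a real install:

    # on the management host: pin the node BMC's IP to its MAC so it
    # stays reachable after the node OS takes the shared port down
    arp -s 10.1.0.101 00:25:90:aa:bb:cc
    # or, with iproute2:
    ip neigh replace 10.1.0.101 lladdr 00:25:90:aa:bb:cc nud permanent dev eth1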
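
And the unwedge dance for (b), roughly.  This is a sketch assuming local 
access to the BMC via ipmitool; exact behavior varies by board and 
firmware:

    # with the BMC's network cable physically unplugged:
    ipmitool mc reset cold      # cold-reset the BMC from the host side
    sleep 60                    # give it time to come back up
    ipmitool mc info            # sanity check, then plug the cable back in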

Of course I could also talk about the SOL (serial over LAN) sessions which 
simply didn't work (grrrrrrrrrr).
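
In theory SOL is one command over the management net, something like the 
line below (host and credentials are placeholders); in practice, see the 
grrrrrrrrrr above:

    ipmitool -I lanplus -H 10.1.0.101 -U admin -P changeme sol activate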

Short version: we advise everyone, including some on this list, to 
always use a second, independent IPMI network.  We make sure that anyone 
insisting on the shared-port approach really, truly understands what they 
are in for.

I want to emphasize this.  Pulling out the extra switch and wires for 
IPMI is, in my opinion, one of the many false savings you can make in 
cluster design.  It's a false saving in that you will likely eat up the 
cost/effort difference between the two variants in excess labor, 
self-hair removal, ...

Really ... it's not worth the pain.  Go with two nets.

FWIW: most of the server-class Supermicro boards (the Nehalems) now come 
with IPMI and KVM-over-IP built in, on a separate NIC.  Some do share 
the NIC; we simply avoid using those boards in most cases.
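
Getting that dedicated NIC onto the management net is a few lines of 
ipmitool.  A sketch only: the channel number (often 1, but board-dependent) 
and the addresses are placeholders:

    ipmitool lan set 1 ipsrc static
    ipmitool lan set 1 ipaddr 10.1.0.101
    ipmitool lan set 1 netmask 255.255.255.0
    ipmitool lan set 1 defgw ipaddr 10.1.0.1
    ipmitool lan print 1        # verify the settings took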

Note also: for real lights-out capability, we configure alternative 
management paths.  Again, it saves you time/effort/resources down the 
road for a modest/minimal investment up front.  Switched PDUs and a 
serial port concentrator (or our management node with lots of serial 
ports ...).  It makes life *sooo* much better when (b) strikes, and you 
need to de-wedgify a node or three, and you are too far away to drive in.
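
When that happens, the first resort over the independent IPMI net looks 
roughly like this (host and credentials are again placeholders); the 
switched PDU is the fallback for when the BMC itself is the thing that is 
wedged:

    ipmitool -I lanplus -H 10.1.0.101 -U admin -P changeme chassis power status
    ipmitool -I lanplus -H 10.1.0.101 -U admin -P changeme chassis power cycle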

There is a lot to be said for real lights-out capability.  Park one crash 
cart in a corner, and hope you will never have to use it.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


