[Beowulf] recommendation on crash cart for a cluster room: full cluster KVM is not an option I suppose?

Joe Landman landman at scalableinformatics.com
Sat Oct 3 09:54:57 PDT 2009


Rahul Nabar wrote:
> On Fri, Oct 2, 2009 at 10:13 PM, Skylar Thompson <skylar at cs.earlham.edu> wrote:
>> Rahul Nabar wrote:
>>> On Wed, Sep 30, 2009 at 9:09 AM, Joe Landman
>>> <landman at scalableinformatics.com> wrote:
> 
> 
>> In addition to the console, the other really useful feature of IPMI is
>> remote power cycling. That's useful when the console itself is totally
>> wedged.
>>
> 
> True. That's a useful feature. But that "could" be done by sending
> "magic packets" to a eth card as well, right? I say "can" because I
> don't have that running on all my servers but had toyed with that on
> some. I guess, just many ways of doing the same thing.

Hmmm...

If I were building a cluster of anything more than 4 machines (not 
racks, machines), I would be insisting upon IPMI 2.0 with a working SOL 
and KVM over IP capability built in.

For the 250-300 machine system you are looking at, you *want* IPMI 2.0 
with KVM over IP.  You *want* switched remotely accessible PDUs, for 
those times when IPMI itself gets wedged (rarer these days, but it does 
still happen).  IMO you *want* this IPMI on a separate network. You 
*want* a serial concentrator type system to provide a redundant path in 
the event of an IPMI failure.  Problems don't go away just because IPMI 
stopped working.  You *need* an inexpensive crash cart that just works, 
and plugs into your PDUs.
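
To make the IPMI part concrete, here is a minimal sketch of driving a 
BMC from an admin node.  It assumes ipmitool is installed, the BMCs 
speak IPMI 2.0 (the "lanplus" interface), and the password is exported 
in the IPMI_PASSWORD environment variable.  The function names, host 
name and user name are made up for illustration.

```python
import subprocess

def ipmi_argv(bmc_host, user, *command):
    """Build an ipmitool command line for an IPMI 2.0 (lanplus) session.

    -E makes ipmitool read the password from the IPMI_PASSWORD
    environment variable, so it does not show up in the process list.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", *command]

def power_cycle(bmc_host, user):
    # Power cycle through the BMC. This does not depend on the node's OS,
    # but it does depend on the BMC itself still responding.
    return subprocess.run(ipmi_argv(bmc_host, user, "chassis", "power", "cycle"), check=True)

def serial_console(bmc_host, user):
    # Attach to the node's serial-over-LAN console (interactive session).
    return subprocess.run(ipmi_argv(bmc_host, user, "sol", "activate"))

if __name__ == "__main__":
    # Hypothetical BMC host name and user; substitute your own.
    print(" ".join(ipmi_argv("node001-ipmi", "admin", "chassis", "power", "status")))
```

If the BMC does not answer, none of this helps, which is where the 
switched PDUs and the serial concentrator come in.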

Understand that administration time could scale linearly with the number 
of nodes if you are not careful, so you want to (carefully) use tools 
which significantly help reduce administrative load.  IPMI 2.0 is one 
such tool.
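
One way to keep per-node work from adding up is to run the same action 
against every node at once.  The following is only a sketch under 
stated assumptions: fan_out and its arguments are hypothetical names, 
and "action" stands in for whatever per-node call you use (an IPMI 
power status query, for example).

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(nodes, action, max_workers=32):
    """Run action(node) for every node in parallel.

    Returns a dict mapping each node to its result, or to the exception
    it raised, so one dead node does not hide the state of the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {node: pool.submit(action, node) for node in nodes}
        for node, future in futures.items():
            try:
                results[node] = future.result()
            except Exception as exc:
                results[node] = exc
    return results

if __name__ == "__main__":
    # Hypothetical node names; the lambda is a placeholder for a real check.
    nodes = ["node%03d" % i for i in range(1, 301)]
    status = fan_out(nodes, lambda node: "ok")
    failed = [n for n, r in status.items() if isinstance(r, Exception)]
    print(len(status), "nodes checked,", len(failed), "failed")
```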

Sending "magic" bytes to an eth won't work if the OS/machine is wedged. 
You are (likely) thinking of power-on when traffic shows up on the LAN 
(wake-on-LAN).  This is a very different beast.

If you could simply toggle the power state of a server by sending "magic 
bytes" to the eth port, lots of people would be very unhappy about the 
never-ending denial of service attack this opens up.
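
For reference, a wake-on-LAN "magic packet" is nothing more than 6 
bytes of 0xFF followed by the target MAC address repeated 16 times. 
The sketch below builds and broadcasts one.  The function names are 
mine, and UDP port 9 is a common convention rather than a requirement. 
Note that there is no "power off" or "reset" variant of this packet: 
it asks a sleeping or powered-down NIC to wake the machine, nothing 
more.

```python
import socket

def magic_packet(mac):
    """Build a wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("expected a 48-bit MAC address")
    return b"\xff" * 6 + mac_bytes * 16

def send_wol(mac, broadcast="255.255.255.255", port=9):
    # Broadcast the packet over UDP; port 9 is an assumption (common default).
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))

if __name__ == "__main__":
    # Placeholder MAC address, for illustration only.
    print(len(magic_packet("00:11:22:33:44:55")), "bytes")
```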

Take it as a given that you want functional IPMI 2.0 with operational 
SOL, and you really do want remote KVM over IP built in.  The latter is 
my opinion, but it is again based on experience over the last decade+ in 
building/supporting these things.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


