[Beowulf] Re: Beowulf Digest, Vol 68, Issue 44
Greg at keller.net
Mon Oct 26 09:57:11 PDT 2009
On Oct 26, 2009, at 10:55 AM, beowulf-request at beowulf.org wrote:
> Message: 6
> Date: Mon, 26 Oct 2009 10:50:26 -0500
> From: Rahul Nabar <rpnabar at gmail.com>
> Subject: Re: [Beowulf] any creative ways to crash Linux?: does a
> shared NIC IMPI always remain responsive?
> To: Bogdan Costescu <bcostescu at gmail.com>
> Cc: Beowulf Mailing List <beowulf at beowulf.org>
> <c4d69730910260850w5daf7de0ue26340adf8589da1 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> On Mon, Oct 26, 2009 at 8:11 AM, Bogdan Costescu
> <bcostescu at gmail.com> wrote:
>> On Sat, Oct 24, 2009 at 11:13 PM, Rahul Nabar <rpnabar at gmail.com>
>>> What surprised me was that even if I take down my eth interface
>>> with a
>>> ifdown the IPMI still works. How does it do that ?
>> The IPMI traffic is IP (UDP) based and by inspecting the IP header
>> can make a difference between packets with the same MAC and different
> Actually, the MAC is different too. I have one NIC but it responds to
> two MACs. I guess one is transparent to the OS and the other is
> handled by the BMC.
Correct. Some blades used to share the MAC, but I don't think
anyone does that anymore. The BMC MAC/IP is hidden from the OS and stays
functional regardless of the OS state. IPMI drivers can also talk to the
chip through the OS, if need be, once you start the appropriate service or
load the kernel modules, but that's usually only useful for configuring the
card, since you'll use the network interface in most situations.
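
To make the two access paths concrete, here is a rough Python sketch that only builds the ipmitool command lines; it does not run them. The helper name ipmitool_argv, the example address, and the choice of "lanplus" as the default LAN interface are my assumptions (older BMCs may need "lan" instead). In-band access through "-I open" assumes the ipmi_si and ipmi_devintf kernel modules are loaded.

```python
import shlex


def ipmitool_argv(subcommand, host=None, user=None, lan_interface="lanplus"):
    """Build an ipmitool argument list (hypothetical helper, for illustration).

    host=None  -> in-band: talk to the BMC through the local kernel driver.
    host given -> out-of-band: talk to the BMC's own IP over the network.
    """
    if host is None:
        # In-band path; needs the ipmi_si and ipmi_devintf modules loaded.
        argv = ["ipmitool", "-I", "open"]
    else:
        # Out-of-band path; works regardless of the OS state.
        argv = ["ipmitool", "-I", lan_interface, "-H", host]
        if user is not None:
            argv += ["-U", user]
        # -E reads the password from the IPMI_PASSWORD environment variable.
        argv.append("-E")
    return argv + shlex.split(subcommand)
```

For example, ipmitool_argv("lan print 1") gives the in-band command you would use to check the BMC's LAN settings, and ipmitool_argv("sol activate", host="10.0.0.5", user="admin") gives the network form. The resulting list can be handed to subprocess.run.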
>> taken down, it's the Linux networking stack that doesn't see any
>> packet coming in, however the BMC's network stack will still be
>> active. That's the whole point of the BMC being a separate entity from
>> the main system, so that its functionality remains undisturbed when
>> something bad happens to the main system.
> I see. So I assume the BMC's network stack is something that's
> hardware or firmware implemented. It's funny that in spite of this the
> IPMI gets hung sometimes (like Gerry says in his reply). I guess I can
> just attribute that to bad firmware coding in the BMC.
"A Rich feature set" includes these issues :)
>>> Another mysterious observation was this: Whenever I took eth down
>>> via the OS there is a latent period when the IPMI stops
>>> responding but then somehow it magically resurrects itself and
>>> starts working again.
>> Without claiming that this is the best explanation: it's possible
>> the Linux driver talks to the hardware and takes down the link at the
>> physical level. The BMC driver then detects this and brings the link
>> back up so that it can continue to receive the IPMI packets.
> You are probably right. The explanation sounds reasonable to me. A
> similar observation is for accessing the BIOS as well. The BMC stack
> is not responsive right from the power-up. It does become responsive
> for a bit but then the system drags it down (maybe when the BIOS hands
> over to PXE). If I manage to "ipmitool sol activate" within this
> correct window then I am able to see the BIOS. But that's pretty much
> trial and error.
You will probably also notice that the BMC only brings the link up at
100 Mb/s, while the OS brings it up at 1 Gb/s. Switches can add some lag
here too if Spanning Tree is enabled. Turning off Spanning Tree or turning
on "Port Fast" will help. Otherwise there is a period of up to about 40
seconds during which the link is "up" but the switch hasn't started passing
traffic (it is checking that there's no Ethernet loop). This has cost many
cluster deployments hours of head banging.
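
If you would rather script around the dead window than hit it by trial and error, something like the following Python sketch may help. Both function names are made up for illustration. The 15 second forward delay is the classic 802.1D default as I remember it (listening plus learning, so about 30 seconds before the port forwards); your switch's timers may well differ, so treat it as an assumption.

```python
import time


def stp_blackout_seconds(forward_delay=15):
    """Rough time a classic 802.1D port spends before forwarding.

    The port sits in listening, then learning, one forward delay each.
    The default of 15 s is an assumption; check your switch config.
    """
    return 2 * forward_delay


def retry_until(probe, timeout, interval=2.0,
                clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or the timeout runs out.

    Returns the number of attempts on success, or None on timeout.
    clock and sleep are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts
        # Give up if the next attempt would land past the deadline.
        if clock() + interval > deadline:
            return None
        sleep(interval)
```

The probe could be anything that returns True once the BMC answers, for instance a subprocess.run of "ipmitool ... chassis status" that checks for a zero return code. A timeout somewhat above stp_blackout_seconds() leaves room for link negotiation on top of the Spanning Tree delay.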