[Beowulf] Best Practices SOL vs Cyclades ACS


Marian Marinov mm at yuhu.biz
Fri Oct 9 22:33:23 PDT 2009


On Saturday 10 October 2009 08:09:45 Mark Hahn wrote:
> > We have more than 400 machines. Every month there is one machine that we
> > cannot reboot using IPMI or on which SOL is not working.
>
> we have something like 2500 nodes, mostly HP dl145g2's, and have a
> BMC-wedge probably 6-12 times/year.  can I ask what brand/model has such
> flakey IPMI? if you run "ipmi mc reset" on the node, does it resolve the
> problem? I wonder whether flakiness might also correspond to some config or
> usage pattern.  (ours dhcp from a local server - actually all the traffic
> is local.)

These are all Dell machines used for shared hosting.

Usually these problems appear during a DoS/DDoS attack or under very high
system resource usage (for example, a load over 100 on a machine with 4 cores).

Our problem is that in such situations IPMI is sometimes unreliable: you can
neither connect to the serial console nor reboot the machine.

-- 
Best regards,
Marian Marinov


