[Beowulf] PowerEdge SC 1435: Unexplained Crashes.
Rahul Nabar rpnabar at gmail.com
Fri Oct 17 08:37:17 PDT 2008
On Fri, Oct 17, 2008 at 10:22 AM, Nifty niftyompi Mitch <niftyompi at niftyegg.com> wrote:
> Check the baseboard management controller log (Ctrl+E).
>
> Tell us what software distribution you are running and any changes that might have
> been made (no matter how small). What is the default run level (is X11 active/not active).
> Are power saving options enabled in the BIOS?

Distro: CentOS 5.2.

Linux node03 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

No changes made to standard kernel. X11 not active. Power saving not enabled.

> Also what hardware monitor software are you running. I have seen system admins add
> their own package to systems only to find that RHEL has an equivalent package
> that uses different device drivers for the same hardware with impossible to diagnose
> results. Custom kernel?

I am not sure what you mean by "hardware monitor software". I do not recall installing anything special.

> Disable cpuspeed, hardware monitor and hardware control software to see if stability changes.

There are a bunch of Dell utilities that come up at boot time: BMC, RAID, Broadcom PXE, remote management controllers. You want me to disable those?

Stability has already changed, after I swapped motherboard+cpu. No more dead nodes in over 2 weeks now (yay!). But I just want to make sure this won't be a recurring problem with these SC1435s before we go in for our next expansion.

> What additional hardware is in the chassis?

None other than what came with the original Dell units. These are only 2 months old now. They do have dual NICs and no CD-ROMs. They have disks. Linked to a Dell KVM via a SIP module. No mix-n-matching of hardware. It was a monolithic Dell order.

> The "poweredge indicator turning orange" tells me that the problem was detected by the
> system and there should be a hint in the log. The orange state is sticky and
> needs to be cleared....

Funny. It wasn't sticky for me. When I rebooted, the orange light cleared. I did not need to reset it via the BIOS. Unfortunately the SC series does not have the tiny LCD for an error display.

-- 
Rahul
