[Beowulf] Re: PowerEdge SC 1435: Unexplained Crashes.

Rob Lines rlinesseagate at gmail.com
Fri Oct 10 08:19:14 PDT 2008


On Fri, Oct 10, 2008 at 10:54 AM, Rahul Nabar <rpnabar at gmail.com> wrote:
>>Have you checked in the baseboard management log to see if it is
>>throwing an error?
>
> Apparently the SC1435 does not have OpenManage. "Simple Computing" is
> too simple to warrant that, I was told. They do have DSET to look at
> the ESM logs but not for CentOS nor Fedora. Red Hat is their
> "validated" [sic] OS. That's the only one they support. So I'm sort of
> stuck there.

We have a cluster of SC1435s without the DRAC card, but they still have
the baseboard management controller.  It should be accessible right at
the end of the POST by hitting Ctrl+E; normally it shows a little splash
screen with the IP it has been configured with and what keys to press.
It gives some great information, especially if you are running into
RAM issues.  I don't remember if we had to turn on the logging or if it
was on from the factory.
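
Once the OS is up and the IPMI kernel modules are loaded, the same event
log can usually be read from the running system with ipmitool instead of
waiting for a reboot. Below is a minimal sketch in Python, assuming that
`ipmitool sel elist` is installed and prints pipe-separated fields. The
column layout varies between BMC firmware versions, and the keyword list
and function names are only illustrative, not anything Dell documents.

```python
import subprocess

# Keywords that tend to appear in memory-related SEL entries.
# This list is an assumption; adjust it to what your BMC actually logs.
MEMORY_KEYWORDS = ("memory", "ecc")


def parse_sel(text):
    """Split pipe-separated `ipmitool sel elist` output into lists of fields."""
    entries = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 4:  # skip blank or malformed lines
            entries.append(fields)
    return entries


def memory_events(entries):
    """Keep only entries that mention one of MEMORY_KEYWORDS in any field."""
    hits = []
    for fields in entries:
        joined = " ".join(fields).lower()
        if any(keyword in joined for keyword in MEMORY_KEYWORDS):
            hits.append(fields)
    return hits


def read_sel():
    """Run ipmitool locally (typically needs root and the IPMI drivers)."""
    result = subprocess.run(
        ["ipmitool", "sel", "elist"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def main():
    # Print every SEL entry that looks memory-related.
    for fields in memory_events(parse_sel(read_sel())):
        print(" | ".join(fields))
```

Calling main() on a node prints only the entries that look
memory-related, which is handy to paste into a support ticket.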

>
> I'll try ipmi. I was trying lm_sensors but apparently it does not have
> a driver for this chipset / motherboard combination. Not sure if it's
> an AMD Opteron specific driver issue or a
> vendor-not-releasing-motherboard-specs issue (heard both versions on
> the net). Anybody else had success using lm_sensors on the SC1435?

We never tried to deal with lm_sensors on these machines because IPMI
had more options.  We are using CentOS 5 on our cluster machines.
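
As a rough stand-in for lm_sensors, the temperature readings can be
pulled through the BMC. The sketch below is in Python and assumes that
`ipmitool sdr type Temperature` prints five pipe-separated columns with
a reading such as "23 degrees C" in the last one. That layout, the
function names, and the threshold are assumptions, so check them against
what your nodes actually print.

```python
import subprocess


def parse_temperatures(text):
    """Return (name, entity_id, degrees_c) tuples from ipmitool sdr output.

    Assumes five pipe-separated columns; sensors without a numeric
    "<value> degrees C" reading are skipped. A list is used rather than
    a dict because some boards report several sensors with the same name.
    """
    readings = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue
        name, entity_id, reading = fields[0], fields[3], fields[4]
        parts = reading.split()
        if len(parts) == 3 and parts[1:] == ["degrees", "C"]:
            try:
                readings.append((name, entity_id, float(parts[0])))
            except ValueError:
                pass  # non-numeric placeholder in the value column
    return readings


def hot_sensors(readings, limit_c):
    """Return the readings whose temperature is above limit_c."""
    return [r for r in readings if r[2] > limit_c]


def read_temperatures():
    """Run ipmitool locally (typically needs root and the IPMI drivers)."""
    out = subprocess.run(
        ["ipmitool", "sdr", "type", "Temperature"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_temperatures(out)
```

Something like hot_sensors(read_temperatures(), 70) run from cron on
each node would give a crude record of whether crashes line up with
heat; the 70 here is an arbitrary example, not a vendor limit.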

Here are a few links we used when working on IPMI.
OpenIPMI
http://www.barryodonovan.com/index.php/2007/04/11/dell-ipmi/
http://66.102.9.104/search?q=cache:IoPmIimMExQJ:www.barryodonovan.com/index.php/2007/04/11/dell-ipmi/+linux+ipmi+temperature+sensor+read&hl=en&gl=us&strip=1
http://www.hollenback.net/index.php/LinuxServerManagementIpmi
http://openipmi.sourceforge.net/
http://ipmitool.sourceforge.net/
http://linux.dell.com/files/presentations/Red_Hat_Summit_May_2006/ipmi_presentation-redhat_summit.pdf
http://buttersideup.com/docs/howto/IPMI_on_Debian.html

> We used to run Fedora. Now run CentOS. Same issues. They only support
> Red Hat. I have a hard time being 100% certain but the more I see it
> the more I am convinced it is the hardware.

The real key is to go back to the salesperson you purchased them
from, tell them the problems you are having, and have them help
escalate the situation.  I am not sure what diagnostics you are
running, but they have a bootable CD for doing a memory test at the
link below.  Even when they have tried to point at an OS problem, if I
can get an error on that CD they really have nothing to hide behind.

http://support.dell.com/support/downloads/format.aspx?c=us&cs=19&l=en&s=dhs&deviceid=196&libid=13&releaseid=R169189&vercnt=5&formatcnt=0&SystemID=PWE_XEO_1435SC&servicetag=85PG2F1&os=LIN4&osl=en&catid=-1&impid=-1

Best of luck,
Rob


