[Beowulf] PowerEdge SC 1435: Unexplained Crashes.
hearnsj at googlemail.com
Fri Oct 10 01:17:27 PDT 2008
2008/10/10 John Hearns <hearnsj at googlemail.com>
> (1) Tell your Dell salesman that you have asked for help on this problem
> on a public mailing list for High Performance Computing. Tell him/her that
> you need high level Dell support on this.
ps. By high level support I mean that you are put in direct contact with one
of the engineers responsible for the design of these systems in the factory.
You do not want to talk to the normal support chain, and be asked if you
have run some diagnostics program downloaded from the Dell site, etc.
As a general observation, not particularly aimed at Dell, I have seen this
type of behaviour before.
Beowulf clusters are built from COTS systems - servers which are designed
for a workload of webserving, or running farms of virtual servers. I really
think that the manufacturers would be surprised at the typical HPC workload
- running all cores flat out at 100%, having applications which can
comfortably use a couple of hundred gigs of RAM.
My feeling is that the PSUs are specced to cope with these loads, but any
out of spec ones will be on the edge. RAM timings too - if you are stressing
all the RAM in a system you'll show up any weaknesses.
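
For illustration only, the kind of load described above (every core busy while memory is written and read back) could be sketched roughly as follows. This is a minimal sketch in Python 3 using only the standard library, not a replacement for a proper burn-in or memory test tool: the function names, buffer size, and byte patterns are all assumptions made up for this example, and a user-space script cannot guarantee that every physical page of RAM is touched.

```python
import multiprocessing as mp
import os


def check_memory(size_bytes, pattern=0xA5):
    """Fill a buffer with one byte pattern, read it back, return the mismatch count."""
    buf = bytearray([pattern]) * size_bytes
    # count() scans the whole buffer, so every byte written is read back
    return size_bytes - buf.count(pattern)


def burn(args):
    """Worker: repeat the memory check to keep one core busy; return total mismatches."""
    size_bytes, rounds = args
    errors = 0
    for i in range(rounds):
        # alternate two complementary patterns so each bit is stored as both 0 and 1
        errors += check_memory(size_bytes, 0xA5 if i % 2 == 0 else 0x5A)
    return errors


def stress(size_bytes=16 * 1024 * 1024, rounds=4, workers=None):
    """Run one worker per core (hypothetical defaults) and sum the mismatches."""
    workers = workers or os.cpu_count() or 1
    with mp.Pool(workers) as pool:
        return sum(pool.map(burn, [(size_bytes, rounds)] * workers))
```

Calling `stress()` from under an `if __name__ == "__main__":` guard should return 0 on healthy hardware; a non-zero result, or a node that locks up while it runs, would point at the marginal PSU or RAM behaviour described above. Raising `size_bytes` and `rounds` makes the load heavier and longer.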