[Beowulf] OOM errors when running HPL
Prentice Bisbal prentice at ias.edu
Mon Dec 22 05:52:44 PST 2008
- Previous message: [Beowulf] mpich2 1.0.8 package for Debian 5.0 Lenny
- Next message: [Beowulf] using Nagios to monitor compute nodes: NPRE vs check_by_ssh
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Alan Louis Scheinine wrote:
> A year ago large memory jobs would cause AMD nodes to crash
> on the cluster for which I was system administrator.
> /var/log/messages showed out of memory errors before the crash.
> I can't say that the problem has been solved, I refer to last
> year because I changed jobs.
>
> In order to understand if the problem is a known bug (as in the
> case cited above) please specify the main board, the amount of
> memory, the number of cores and the version of the kernel.
>
> You wrote:
>> I used to run hpl jobs much bigger than this on my cluster w/o a
>> problem.
>
> How does the amount of memory on the new cluster compare to the cluster
> in which you did not have a problem. In particular, the amount of
> memory per core, assuming all cores were used in your testing.

Alan, thanks for the reply. It's the same cluster - jobs that ran on it a
few weeks ago are no longer running. There have been no hardware changes,
so I don't think it's a hardware problem. The only difference I can think
of is that I'm now using SGE to launch these jobs, which I may not have
been doing the last time I ran a job this big. The only other possible
software changes are kernel package updates that may have occurred since
the last successful run of a job this big.

--
Prentice
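For anyone following the memory-per-core question above, here is a rough sketch of the arithmetic. It assumes only that HPL stores one N x N matrix of 8-byte doubles spread evenly across all processes, and it ignores workspace and OS overhead. The function name and the example values of N and core count are hypothetical and are not taken from this thread.

```python
def hpl_memory_per_core_gib(n, total_cores):
    """Approximate GiB of HPL matrix storage per core.

    Assumes an N x N double-precision matrix divided evenly
    across total_cores processes (one process per core).
    """
    bytes_total = 8 * n * n  # 8 bytes per double-precision element
    return bytes_total / total_cores / 2**30


# Hypothetical run: N = 40000 on 32 cores (values chosen for illustration only)
print(f"{hpl_memory_per_core_gib(40000, 32):.2f} GiB per core")
```

Comparing that figure with the physical memory per core, and with any per-job memory limit the batch system may apply, is one way to tell whether a given N should fit.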
