[Beowulf] hang-up of HPC Challenge

Peter St. John peter.st.john at gmail.com
Tue Aug 19 15:12:36 PDT 2008


I surely don't know the problem, but can anyone tell me (or point me to...)
how "unlimited" stacksize works?
Peter

On 8/18/08, Mikhail Kuzminsky <kus at free.net> wrote:
>
> I ran a set of HPC Challenge benchmarks on ONE dual-socket quad-core
> Opteron2350 (Rev. B3) based server (8 logical CPUs).
> RAM size is 16 Gbytes. The tests were run under SuSE 10.3/x86-64, with
> LAM MPI 7.1.4 and MPICH 1.2.7 from the SuSE distribution, using Atlas 3.9.
> Unfortunately there is only one such cluster node, so I can't reproduce the
> run on another node :-(
>
> For N (matrix size) up to 10000 all looks OK. But for larger N
> (15000/20000/...) hpcc execution (mpirun -np 8 hpcc) leads to a Linux hang-up.
>
> In the "top" output I see 8 hpcc instances, each using about 100% of a CPU,
> with reasonable amounts of virtual and RSS memory per hpcc process and no
> swap usage. Usually there are no PTRANS results in the hpccoutf.txt
> results file, but in a few cases (when I actively watched the hpcc
> run by issuing ps/top) I saw reasonable PTRANS results but
> no HPLinpack results. Once I obtained PTRANS, HPL and DGEMM
> results for N=20000, but the hang-up came later, during the STREAM tests.
> Maybe this is simply because the output buffer had not yet been written
> to the output file on disk at the time of the hang-up.
>
> One possible cause of the hang-ups is a memory hardware problem, but what
> about possible software causes?
> The hpcc executable is 64-bit and dynamically linked.
> /etc/security/limits.conf is empty. The stacksize limit (for the user
> issuing mpirun) is "unlimited", the main memory limit is about 14 GB, and
> the virtual memory limit is about 30 GB. Atlas was compiled for 32-bit
> integers, but that is enough for such N values. Even
> /proc/sys/kernel/shmmax is 2^63-1.
>
> What else could be causing the hang-up?
>
> Mikhail Kuzminskiy
> Computer Assistance to Chemical Research Center
> Zelinsky Institute of Organic Chemistry
> Moscow
>
>
>  _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
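
On the "unlimited" stacksize question, one small thing that can be checked directly: the shell prints "unlimited" when the soft RLIMIT_STACK value equals RLIM_INFINITY, and each process inherits its limits from whatever launched it. The following is only a rough sketch using Python's standard `resource` module (Unix only); the helper names `describe_limit` and `stack_and_memory_limits` are made up for illustration and are not part of hpcc, LAM or MPICH. It reports the limits seen by the process that runs it and does not explain the hang itself.

```python
import resource


def describe_limit(value):
    """Render an rlimit value (in bytes) roughly the way `ulimit` would."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} kB"


def stack_and_memory_limits():
    """Return (soft, hard) limits for the stack, address space and data segment."""
    wanted = {
        "stack": resource.RLIMIT_STACK,
        "address space": resource.RLIMIT_AS,
        "data": resource.RLIMIT_DATA,
    }
    return {
        name: tuple(describe_limit(v) for v in resource.getrlimit(res))
        for name, res in wanted.items()
    }


if __name__ == "__main__":
    for name, (soft, hard) in stack_and_memory_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

Running it from the same shell that issues mpirun shows what that shell passes on to its children; processes started through a separate daemon may see different values, so it may be worth running it under mpirun as well.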