[Beowulf] hang-up of HPC Challenge
Mikhail Kuzminsky kus at free.net
Tue Aug 19 16:45:43 PDT 2008
To help localize the problem, I ran the pure HPL test instead of HPCC. HPL writes its output directly to the screen instead of to a file. Using MPICH with np=8, I obtained a normal HPL result for N=35000, including the 3 "PASSED" strings for the ||Ax-b|| checks. BUT! Linux hangs immediately after these strings are printed.

Mikhail

In message from "Mikhail Kuzminsky" <kus at free.net> (Mon, 18 Aug 2008 22:20:16 +0400):
>I ran a set of HPC Challenge benchmarks on ONE dual-socket quad-core
>Opteron 2350 (Rev. B3) based server (8 logical CPUs).
>RAM size is 16 Gbytes. The tests were performed under SuSE
>10.3/x86-64, with LAM MPI 7.1.4 and MPICH 1.2.7 from the SuSE
>distribution, using Atlas 3.9. Unfortunately there is only one such
>cluster node, and I can't reproduce the run on another node :-(
>
>For N (matrix size) up to 10000 all looks OK. But for larger N
>(15000/20000/...) hpcc execution (mpirun -np 8 hpcc) leads to a Linux
>hang.
>
>In the "top" output I see 8 hpcc instances, each eating about 100% of
>CPU, with reasonable amounts of virtual and RSS memory per hpcc
>process and no swap usage. Usually there are no PTRANS results in the
>hpccoutf.txt results file, but in a few cases (when I was actively
>watching the hpcc run by issuing ps/top) I saw reasonable PTRANS
>results but no HPLinpack results. Once I obtained PTRANS, HPL and
>DGEMM results for N=20000, but the hang came later, during the STREAM
>tests. Maybe this is simply because the output buffer had not yet been
>flushed to the output file on disk at the time of the hang.
>
>One possible reason for the hangs is a memory hardware problem, but
>what about possible software reasons?
>The hpcc executable is 64-bit and dynamically linked.
>/etc/security/limits.conf is empty. The stacksize limit (for the user
>issuing mpirun) is "unlimited", the main memory limit is about 14 GB,
>and the virtual memory limit is about 30 GB. Atlas was compiled for
>32-bit integers, but that is enough for such N values.
>Even /proc/sys/kernel/shmmax is 2^63-1.
>
>What else may be the reason for the hang?
>
>Mikhail Kuzminskiy
>Computer Assistance to Chemical Research Center
>Zelinsky Institute of Organic Chemistry
>Moscow
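
As a rough sanity check on the numbers quoted above, the following sketch estimates the memory taken by the HPL coefficient matrix for the problem sizes mentioned in the thread, and checks whether N*N still fits in a signed 32-bit integer. It is only an illustration under stated assumptions: the matrix is taken to be N x N double-precision values (8 bytes each), "16 Gbytes" is read as 16 GiB, and the function names are hypothetical. Whether Atlas ever indexes the whole matrix with a single 32-bit integer is also an assumption; with 8 MPI processes each process holds only part of the matrix.

```python
# Back-of-the-envelope sizing for the N values mentioned in the thread.
# Assumptions (not from the original message): N x N doubles at 8 bytes
# each, and 16 GiB of RAM on the node.

INT32_MAX = 2**31 - 1
RAM_BYTES = 16 * 2**30  # assumed reading of "16 Gbytes"


def hpl_matrix_bytes(n, bytes_per_element=8):
    """Total bytes for an n x n matrix of double-precision values."""
    return n * n * bytes_per_element


def fits_int32(n):
    """True if the element count n*n fits in a signed 32-bit integer."""
    return n * n <= INT32_MAX


for n in (10000, 15000, 20000, 35000):
    size = hpl_matrix_bytes(n)
    print(
        f"N={n}: {size / 2**30:.2f} GiB total, "
        f"{size / RAM_BYTES:.0%} of RAM, "
        f"N*N fits in int32: {fits_int32(n)}"
    )
```

Under these assumptions the N=35000 matrix needs 9.8e9 bytes in total, which is below both the 14 GB memory limit and the physical RAM, and N*N is still below 2^31-1. This is consistent with the statement that 32-bit integers are enough for such N, and it suggests that simply running out of memory is unlikely to explain the hang.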
