[Beowulf] Performance degrading
Gus Correa gus at ldeo.columbia.edu
Tue Dec 15 11:36:51 PST 2009
Hi Jorg

If you have single quad-core nodes as you said, then top shows that you
are oversubscribing the cores: five nwchem processes are running.

In my experience, oversubscription only works in relatively light MPI
programs (say, the example programs that come with OpenMPI or MPICH).
Real-world applications tend to be very inefficient, and can even hang,
on oversubscribed CPUs.

What happens when you launch four or fewer processes on a node
instead of five?

My $0.02.
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Jörg Saßmannshausen wrote:
> Dear all,
>
> I am scratching my head but apart from getting splinters into my fingers I
> cannot find a good answer for the following problem:
> I am running a DFT program (NWChem) in parallel on our cluster (AMD Opterons,
> single quad cores in the node, 12 GB of RAM, Gigabit network) and at certain
> stages of the run top is presenting me with that:
>
> top - 15:10:48 up 13 days, 22:20, 1 user, load average: 0.26, 0.24, 0.19
> Tasks: 106 total, 1 running, 105 sleeping, 0 stopped, 0 zombie
> Cpu0 : 8.0% us, 2.7% sy, 0.0% ni, 82.7% id, 0.0% wa, 1.3% hi, 5.3% si
> Cpu1 : 4.1% us, 1.4% sy, 0.0% ni, 94.6% id, 0.0% wa, 0.0% hi, 0.0% si
> Cpu2 : 2.7% us, 0.0% sy, 0.0% ni, 97.3% id, 0.0% wa, 0.0% hi, 0.0% si
> Cpu3 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
> Mem: 12250540k total, 5581756k used, 6668784k free, 273396k buffers
> Swap: 16779884k total, 0k used, 16779884k free, 3841688k cached
>
>   PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> 16885 sassy 15  0 3928m 1.7g 1.4g S    4 14.4 312:19.92 nwchem
> 16886 sassy 15  0 3928m 1.7g 1.4g S    4 14.5 313:08.77 nwchem
> 16887 sassy 15  0 3920m 1.7g 1.4g S    3 14.4 316:18.24 nwchem
> 16888 sassy 15  0 3923m 1.6g 1.3g S    3 13.3 316:13.55 nwchem
> 16890 sassy 15  0 2943m 1.7g 1.7g S    3 14.8 104:32.33 nwchem
>
> It is not a few seconds it does it, it appears to be for a prolonged period of
> time. I checked it randomly for say 1 min and the performance is well below
> 50 % (most of the time around 20 %). I have not noticed that when I am
> running the job within one node.
>
> I have the suspicion that the Gigabit network is the problem, but I really
> would like to pinpoint that so I can get my boss to upgrade to a better
> network for parallel computing (hence my previous question about Open-MX).
> Now how, as I am not an admin of that cluster, would I be able to do that?
>
> Thanks for your comments.
>
> Best wishes from Glasgow!
>
> Jörg
>
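
The oversubscription check described in the reply can be sketched in a few lines of Python. This is only an illustration, not something from the original thread: the helper names `count_processes` and `is_oversubscribed` are hypothetical, the process rows are copied from the quoted top output, and the core count of 4 is taken from the "single quad cores" description.

```python
def count_processes(top_lines, command):
    """Count process rows of top-style output whose COMMAND column equals `command`."""
    count = 0
    for line in top_lines:
        fields = line.split()
        # A process row has 12 columns here:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) >= 12 and fields[0].isdigit() and fields[-1] == command:
            count += 1
    return count


def is_oversubscribed(n_procs, n_cores):
    """True when more processes than cores are competing on one node."""
    return n_procs > n_cores


# Process rows copied from the quoted top output.
TOP_ROWS = [
    "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
    "16885 sassy 15 0 3928m 1.7g 1.4g S 4 14.4 312:19.92 nwchem",
    "16886 sassy 15 0 3928m 1.7g 1.4g S 4 14.5 313:08.77 nwchem",
    "16887 sassy 15 0 3920m 1.7g 1.4g S 3 14.4 316:18.24 nwchem",
    "16888 sassy 15 0 3923m 1.6g 1.3g S 3 13.3 316:13.55 nwchem",
    "16890 sassy 15 0 2943m 1.7g 1.7g S 3 14.8 104:32.33 nwchem",
]

N_CORES = 4  # assumption: one quad-core CPU per node, as stated in the post

n_procs = count_processes(TOP_ROWS, "nwchem")
print(n_procs, is_oversubscribed(n_procs, N_CORES))  # prints: 5 True
```

The count only says that five nwchem processes share four cores; it does not by itself show whether the network or the oversubscription is responsible for the low CPU usage.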
