[Beowulf] torque: 4GB resources_used.mem limit
Bernd Schubert bernd-schubert at gmx.de
Wed Jun 29 03:01:08 PDT 2005
Hello,

I already posted this to torqueusers at supercluster.org, but I think that list has rather little traffic, and I guess more people are subscribed to this list who might already have run into this problem.

We have a cluster running a combination of torque + maui (for those who might not know, torque is a more recent fork of openpbs). In principle it's running fine; we have only one pretty annoying problem: torque does not detect jobs that use more than 4GB. For such jobs, qstat always shows only 'actual_size - 4GB' in resources_used.mem.

If it were only a problem with qstat, we wouldn't care. Unfortunately, it also prevents torque from killing improperly specified jobs. So it can happen, and has already happened several times, that one job needed all the memory on a node, but torque happily started another job on that node, simply because a user didn't properly specify how much memory his/her jobs required and torque didn't kill those jobs automatically. Of course, this results in heavy swap usage and slows down both jobs dramatically.

We hoped this issue would be solved by installing the 64-bit version of torque (it's a 32/64-bit biarch debian system), but this didn't help.

Does anyone here have an idea what's going on, how to debug it, or even how to solve it? I'm pretty unfamiliar with torque + maui (we don't maintain the basic stuff ourselves) and also haven't looked into the source code. Thinking in terms of C, I can only imagine that someone has explicitly declared the memory variable as a 32-bit integer, but who would do this?

The torque version is 1.2.0p3 and maui is 3.2.6p11-2.

Thanks in advance,
Bernd

--
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert at pci.uni-heidelberg.de
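
To illustrate the 32-bit hypothesis from the message: a minimal sketch, assuming the memory figure is held as a byte count in an unsigned 32-bit field somewhere along the way. That is only a guess, not something confirmed from the torque source, and the function names here (`truncate_to_uint32`, `would_be_killed`) are hypothetical. Under that assumption, a job between 4GB and 8GB would be reported as 'actual_size - 4GB', and a limit check on the reported value would miss it.

```python
# Sketch only: assumes memory usage passes through an unsigned 32-bit
# byte counter. This is a hypothesis, not torque's actual code.

GIB = 1024 ** 3
UINT32_MASK = 0xFFFFFFFF  # largest value an unsigned 32-bit integer can hold


def truncate_to_uint32(value_bytes: int) -> int:
    """Return the value as it would read back from an unsigned 32-bit field."""
    return value_bytes & UINT32_MASK


def would_be_killed(actual_bytes: int, limit_bytes: int) -> bool:
    """Hypothetical limit check that only sees the truncated value."""
    return truncate_to_uint32(actual_bytes) > limit_bytes


if __name__ == "__main__":
    for actual_gib in (2, 5, 6):
        shown = truncate_to_uint32(actual_gib * GIB)
        print(f"actual {actual_gib} GiB, reported {shown // GIB} GiB")

    # A job limited to 3 GiB but really using 6 GiB appears to use 2 GiB,
    # so this check never fires.
    print(would_be_killed(6 * GIB, 3 * GIB))
```

If the observed qstat values match this pattern (for example, roughly 1GB shown for a 5GB job), that would be consistent with a wraparound at 2^32 bytes, although it does not show where in the chain the truncation happens.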
