Jobs hanging after a period of time
Jeff Layton (jeffrey.b.layton at lmco.com), Wed Oct 30 06:29:42 PST 2002
Good morning,

We've got a Linux cluster with 64 nodes of dual PIII/850s connected with Fast Ethernet. We've also installed OpenPBS 2.3pl2-1 throughout the cluster. We have one node (penguin46) that has only the exechost RPM installed. Jobs that include penguin46 will work for a day or two, and then all of a sudden any job using penguin46 will just hang. The actual code starts running, but it hangs after a period of time.

The PBS logs show nothing except job start and job end (or job kill if we do a qdel). I also see nothing strange at all in /var/log/messages. I know this is a rather nebulous error description, but it is the best we have been able to find out in over two weeks of checking.

We have reinstalled and reconfigured PBS on penguin46 several times with no change in behavior. We have checked the NIC by flooding it with ping packets (I know it's not an extensive test, but it's a start :) and we are starting to look at the switch port as well.

Thanks!

Jeff

-- 
Jeff Layton
Senior Engineer
Lockheed-Martin Aeronautical Company - Marietta
Aerodynamics & CFD

"Is it possible to overclock a cattle prod?" - Irv Mullins

This email may contain confidential information. If you have received this email in error, please delete it immediately, and inform me of the mistake by return email. Any form of reproduction, or further dissemination of this email is strictly prohibited. Also, please note that opinions expressed in this email are those of the author, and are not necessarily those of the Lockheed-Martin Corporation.
More information about the Beowulf mailing list
