[Beowulf] Any Gaussian users out there?
Rafael R. Pappalardo rafapa at us.es
Tue Jan 9 00:33:42 PST 2007
- Previous message: [Beowulf] Any Gaussian users out there?
- Next message: [Beowulf] OT: Software RAID & Multipath
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Monday 08 January 2007 04:49, Joe Landman wrote:
> I found a neat ... feature ... of Linux while getting g03 running in SMP
> on cluster nodes. Long story, but the folks I am doing this for don't
> have/want to use Linda. They asked us to help them get g03 operational
> in SMP parallel. This wasn't painful. Have it integrated into SGE and
> our SICE interface now as well.
>
> Basic idea is that we are getting a kernel exception in the VFS layer
> only when running with 2 or more CPUs on an SMP node. Shows up only on
> SuSE 9.3 nodes. The other nodes are RHEL 3 based (2.4 kernel, but hey,
> it's really stable).
>
> I don't want to post a nasty-looking trap here.
>
> The problem occurs with both xfs and jfs. Haven't had the chance to try
> ext3 yet, though if the issue is in the vfs layer, I can't see how
> changing the underlying block device is going to alter the layers (VFS)
> above it.
>
> The net effect of this is that it runs great on the 2.4 based machines,
> but gets SIGKILLs when running on the 2.6 based SuSE 9.3 machines.
> Looks like the app is tickling the OS bug. I can repeatably cause this
> trap, though it seems to occur at "random" places, well, not really.
> The way Gaussian runs, it has "links" which are binary modules which
> execute a particular portion of the calculation (it's pretty neat
> really). Each link is read in from the disk. This VFS bug gets
> triggered regardless of local or remote FS.
>
> Any Gaussian users out there see that? Does a kernel upgrade fix it?
> Inquiring minds want to know ...

Don't know if it's threads related but...
Sometimes setting LD_ASSUME_KERNEL to 2.4.1 in the environment solves this
kind of problem.
There are other possible values; have a look at:
http://people.redhat.com/drepper/assumekernel.html

Best regards,
Rafael

--
Dr. Rafael R. Pappalardo
Dept. Physical Chemistry, Univ. de Sevilla (Spain)
e-mail: rafapa at us.es
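To make the LD_ASSUME_KERNEL suggestion concrete, here is a minimal sketch of a wrapper that sets the variable only for the child process, so the rest of the session is unaffected. The helper names, the "g03" command line and the input file name are placeholders of mine, not something taken from the thread, and whether 2.4.1 is the right value (or whether the installed glibc still honours the variable at all) would have to be checked on the affected SuSE 9.3 nodes.

```python
import os
import subprocess


def env_with_assumed_kernel(version="2.4.1", base_env=None):
    """Return a copy of the environment with LD_ASSUME_KERNEL set.

    The original mapping is left untouched, so the setting only
    applies to processes started with the returned environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_ASSUME_KERNEL"] = version
    return env


def run_with_assumed_kernel(command, version="2.4.1"):
    """Run `command` (a list of arguments) with LD_ASSUME_KERNEL set.

    Returns the child's exit code. A negative value means the child
    was terminated by a signal (for example -9 for SIGKILL).
    """
    result = subprocess.run(command, env=env_with_assumed_kernel(version))
    return result.returncode


# Hypothetical usage; "g03" and "input.com" are placeholders:
#   rc = run_with_assumed_kernel(["g03", "input.com"])
#   print("exit code:", rc)
```

Running the same job once with and once without the wrapper, and comparing exit codes, would be one way to see whether the SIGKILLs go away. On a system whose libraries do not support the requested kernel version, the child may simply fail to start, so this is worth trying on a single test job first.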
