[Beowulf] Weird problem with mpp-dyna

Joshua Baker-LePain jlb17 at duke.edu
Wed Mar 14 11:14:31 PDT 2007


On Wed, 14 Mar 2007 at 10:05am, Michael Will wrote

> You mentioned your own code does not exhibit the issue but mpp-dyna
> does.

Yep.

> What does the support team from the software vendor think the problem
> could be?

They say that we have an academic license which does not entitle us to any 
support, but they'll look at the issue if/when they have some spare time. 
Which, really, is fair, given what we pay for it.

> Do you use a statically linked binary or did you relink it with your
> mpich?

Agh.  I forgot to mention this little wrinkle.  LSTC's software distribution 
is... interesting.  For mpp-dyna, they ship dynamically linked binaries 
compiled against a specific version of LAM/MPI (7.0.3 in this case), and 
they provide the matching pre-compiled LAM/MPI libraries on their site. 
To complicate matters, RHEL/CentOS ships LAM/MPI 7.0.6, but the spec file 
in the RHEL/CentOS RPM does *not* include the --enable-shared flag.  IOW, 
the OS vendor's LAM/MPI package has no .so files.

It seems like it'd be worth rebuilding the CentOS LAM RPM with shared 
libraries enabled and running against those to see if it helps.
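
Before and after such a rebuild, it may help to confirm which MPI shared 
libraries the binary actually resolves.  A minimal sketch, assuming the 
check is done by parsing `ldd` output; the helper name mpi_libs, the 
library names, and the paths in SAMPLE are made up for illustration and 
are not taken from a real mpp-dyna binary:

```python
import re

# Matches lines of `ldd` output such as:
#   libfoo.so.0 => /usr/lib/libfoo.so.0 (0x00002aaaaaaab000)
#   libbar.so.0 => not found
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def mpi_libs(ldd_output, patterns=("mpi", "lam")):
    """Return {library name: resolved path or None} for MPI-looking libraries."""
    found = {}
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if not m:
            continue  # skip lines without a "name => target" mapping
        name, target = m.groups()
        if any(p in name.lower() for p in patterns):
            found[name] = None if target == "not found" else target
    return found


# Hypothetical sample for illustration only; library names and paths are assumptions.
SAMPLE = """\
	libmpi.so.0 => not found
	liblam.so.0 => /opt/lam-7.0.3/lib/liblam.so.0 (0x00002b0000000000)
	libc.so.6 => /lib64/libc.so.6 (0x00002b0000400000)
"""

if __name__ == "__main__":
    for name, path in mpi_libs(SAMPLE).items():
        print(name, "->", path or "NOT FOUND")
```

In practice the input would come from running `ldd` on the binary (for 
example via subprocess.run(["ldd", path], capture_output=True, text=True)) 
with the same LD_LIBRARY_PATH the job uses.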

> We have run lstc ls-dyna mpp970 and mpp971 across more than 16 nodes
> without any issues on Scyld CW4 which is also centos 4 based.

We can run straight structural sims across as many nodes/CPUs as we've 
tried, and ditto for straight thermal sims.  It's just on coupled 
structural/thermal sims that this issue crops up.  That, to me, rather 
points to a bug in dyna itself.  But the fact that the bug manifests 
itself (at least in part) as the MPI job trying to talk to a different 
network interface than the one that was 'lamboot'ed is what's throwing me 
off a bit.
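
One thing that could be ruled out cheaply is name resolution: whether any 
node name resolves to an address outside the network the job was booted 
on.  This is only a sketch of that check, not a diagnosis of the problem 
above.  The node names, addresses, and the 192.168.1.0/24 network are 
hypothetical, as are the helper names resolve and off_network:

```python
import ipaddress
import socket


def resolve(hosts):
    """Map each hostname to the IPv4 address the local resolver returns."""
    return {h: socket.gethostbyname(h) for h in hosts}


def off_network(resolved, cluster_net):
    """Return the hosts whose address falls outside the intended network."""
    net = ipaddress.ip_network(cluster_net)
    return sorted(
        host for host, ip in resolved.items()
        if ipaddress.ip_address(ip) not in net
    )


if __name__ == "__main__":
    # Hypothetical data; on a real cluster use resolve([...node names...]).
    resolved = {
        "node01": "192.168.1.1",
        "node02": "192.168.1.2",
        "node03": "10.0.0.3",
    }
    print(off_network(resolved, "192.168.1.0/24"))  # ['node03']
```

Running this on each node, with the hostnames from the boot schema, would 
show whether any of them sees a peer on an unexpected interface.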

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
