su segfault (core dump) problem.
Robert G. Brown rgb at phy.duke.edu
Mon Jan 22 14:18:41 PST 2001
On Mon, 22 Jan 2001, Georgia Southern Beowulf Cluster Project wrote:

> Hello,
>
> I'm running 15 diskless nodes attached to another node using NFS to export
> to each node: a / filesystem, a shared /home filesystem, and a shared /usr
> filesystem. This is a new setup I just implemented (mostly sharing /usr as
> before each node had its own instead of sharing). I'm running RH 6.2 with
> PVM3 and I've been testing it with pvmpovray. As of right now, I can rsh
> into all nodes from any other node (security is not an issue). However,
> when I su to become root and do administrative tasks all nodes will now
> segfault and sometimes (not always) cause a core dump. Now, I'm far from
> knowledgeable about cores, but does anyone have any suggestions or previous
> experiences with similar problems? This did not happen before, when all
> nodes had individual /usr directories (this made software addition
> horrible).

Are there any key files, opened by the programs you are doing maintenance on, that are inadvertently shared by and used by all the different nodes? Unfortunately there are plenty of non-FHS-compliant programs out there. Anything you are running that (for example) opens a file in /usr (presumed shared and static) instead of /var (presumed local and volatile) is a candidate for exactly this kind of problem if node a reads from and writes to the file while node b is doing the same thing. A shared /dev can be equally evil in the same way.

Don't assume that even RH binaries are all "clean" in this regard. Remember that they package source code from literally hundreds of folks, and there is no guarantee that it is all FHS compliant or even sane. Plenty of programs assume that you can write to anything as if it were local, because on the developer's system it is!

Second point is that you might want to check out e.g.:

  ftp://ftp.yellowdoglinux.com/pub/yellowdog/software/yup/

(from a recent freshmeat posting). yup promises to be the long-awaited automagical package maintainer for RPMs.
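A rough way to test the shared-/usr theory is to look on the NFS server for files under the exported /usr tree whose modification times keep advancing, since a static, shared /usr should hardly ever change except when software is installed. Below is a minimal Python 3 sketch under that assumption; the function name `recently_modified` and the time-window approach are illustrative, not part of any existing tool. It only catches writes that update a file's mtime, so it will not flag a program that merely opens a file read-write without changing it.

```python
import os
import stat


def recently_modified(root, since):
    """Return (path, mtime) pairs for regular files under `root` whose
    modification time is later than the Unix timestamp `since`,
    newest first."""
    hits = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: inspect the entry itself, not a symlink target.
                st = os.lstat(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if stat.S_ISREG(st.st_mode) and st.st_mtime > since:
                hits.append((path, st.st_mtime))
    # Most recently changed files first: the likeliest suspects.
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits
```

For example, after doing a few su/admin operations on the nodes, `recently_modified("/usr", time.time() - 3600)` run on the server would list files under /usr changed in the last hour. Anything showing up there that is not part of a deliberate package install is worth a closer look.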
It is easy enough to install RH via kickstart so that all nodes are identical: just use a common kickstart file and use dhcpd to distribute node identities on the basis of MAC address. It is harder to keep them that way. yup allows you to automagically synchronize RPM-based host descriptions and handles that eternally annoying dependency tree problem for you. It goes and finds new packages AND all their dependencies and updates everything necessary. I've only seen it demonstrated a few times (and haven't installed it myself yet), but the demos were awesome. It makes managing updates of installed packages absolutely trivial and automatic.

In other words, there are new tools on the horizon that will radically improve one's ability to scalably manage RPM-based clusters of all sorts, from beowulfs up to simple departmental clusters of workstations.

The last remark is that you might want to look over the core dumps themselves to see what binary is producing them. In elder days I could do this with adb. gdb doesn't seem to have a command that tells you the name of the crashed program (or if it does, I don't know it and haven't been able to figure it out; please feel free to Enlighten me, anybody who knows). However, "strings core | less" will usually work well enough. It certainly reveals to me that most of my personal local core files come from netscape (sigh).

    rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
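The "strings core | less" trick suggested above amounts to pulling every run of printable characters out of the binary core image and eyeballing the result for a program name. A minimal Python 3 equivalent is sketched below; the names `printable_strings` and `core_strings` are hypothetical, and the four-character minimum mirrors the default of the `strings` utility. It reads the whole file into memory, which is fine for an illustration but not for very large cores.

```python
import re


def printable_strings(data, min_len=4):
    """Yield runs of at least `min_len` printable ASCII characters
    (tab plus space through '~') found in the bytes object `data`."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def core_strings(path, min_len=4):
    """Read a core file as raw bytes and return its printable strings."""
    with open(path, "rb") as f:
        return list(printable_strings(f.read(), min_len))
```

Paging through the first few dozen entries of `core_strings("core")` is the same exercise as running `strings core | less`.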
More information about the Beowulf mailing list
