[Beowulf] New person to building a beowulf cluster
Robert G. Brown rgb at phy.duke.edu
Fri Nov 12 12:02:23 PST 2004
On Tue, 9 Nov 2004, Andrew wrote:

> Recently took on a project doing a Beowulf cluster and I have configured the
> files necessary to run MPICH-1.2.6 to run on 3 computers using Red Hat
> 7.2(following directions). I am running into a problem where I can not run
> it on my two slave nodes(yet I can run it on my master node) it will pause
> for a while and then says p4_error: Could not get host by name for host
> node0.home.net I think it might have something to do with the way I
> configured my hosts file in which the first line is the node name, second
> line is local host, and third is my master node (node0). Anyone have any
> other suggestions or comments as to what I should check?

No, but a possibly apropos meta-remark. Let's see:

  RH 7.2   <- your version
  RH 7.3
  RH 8
  RH 9
  (say) FC 1
  FC 2     <- current stable
  FC 3     <- current new

You are years and years out of sync with current linux distros (with
similar divergences between 7.2 and other flavors of linux distro). In
the meantime the kernel has radically changed, glibc has radically
changed, the compiler(s) have radically changed, the support libraries
have radically changed, and the general user interface has radically
changed. That's a lot of radical change.

To even get good help from this list you'll likely need to upgrade to
something more current, as otherwise nobody will be able to tell if your
problem is in MPICH as distributed in 7.2 (unless they can remember that
far back), MPICH as currently distributed but BUILT on 7.2 (old library
bugs or incompatibilities), the compiler, the kernel, the networking
stack...

True, it kind of looks like your problem is likely to be at the
administrative level -- having the correct format for /etc/hosts.
It should look something like:

  # /etc/hosts for rgb's private home network
  #
  # This is required for loopback access to localhost
  127.0.0.1       localhost localhost.localdomain
  #
  # The inside/server/gateway/firewall address of Eden
  192.168.1.1     node1.private.net node1
  192.168.1.2     node2.private.net node2
  192.168.1.3     node3.private.net node3
  ...

When it is correctly set up, you should be able to ping hosts by name
as:

  ping node1.private.net

or

  ping node1

(via the alias defined as the third entry on each line of /etc/hosts).

This hosts file should likely be on all nodes, not just the head node.
You may also need to check /etc/nsswitch.conf, to make sure "files" is
listed as an entry for "hosts:".

Hope this helps...

   rgb

-- 
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
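
As a companion to the /etc/hosts and /etc/nsswitch.conf checks described in the message, here is a minimal Python sketch of how one might verify both on each node. It is only an illustration: the names parse_hosts, files_enabled, check_nodes and SAMPLE_HOSTS are hypothetical, the sample text simply mirrors the example hosts file, and the parsing assumes the usual "address, canonical name, aliases" layout with "#" comments.

```python
import socket

def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # keep the first entry seen for a name
    return table

def files_enabled(nsswitch_text):
    """Return True if the 'hosts:' line of nsswitch.conf text lists 'files'."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            return "files" in line[len("hosts:"):].split()
    return False

def check_nodes(names):
    """Resolve each name through the system resolver and report the result."""
    for name in names:
        try:
            print(f"{name:25s} -> {socket.gethostbyname(name)}")
        except OSError as err:
            print(f"{name:25s} FAILED: {err}")

# Hypothetical sample mirroring the example hosts file in the message.
SAMPLE_HOSTS = """\
# loopback
127.0.0.1   localhost localhost.localdomain
192.168.1.1 node1.private.net node1
192.168.1.2 node2.private.net node2
"""
```

On a real node one would pass the contents of /etc/hosts and /etc/nsswitch.conf to the first two functions, then call something like check_nodes(["node1.private.net", "node1"]) on every machine in the cluster. A name that fails there would be expected to fail for MPICH as well, assuming it relies on the same system lookup.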
