Host/interface naming and network path selection
Josip Loncaric josip at icase.edu
Thu Jan 25 13:25:30 PST 2001
- Previous message: PBS multi-cpu sends
- Next message: Host/interface naming and network path selection
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Some of our machines now have multiple network interfaces, which leads to the following question:

Say machines A, B, C, ... can communicate over multiple networks labeled 1, 2, ...; and say you have a parallel application which launches processes on A, B and C. How does your parallel application know which communication paths to use?

Of course, routing is done based on IP addresses, so the choice of path is actually made when names are resolved to IP addresses. Several weird situations can arise.

(1) Say that network 2 is faster than network 1 but that there is no A2 interface. We could globally identify A=A1, B=B2, C=C2. Now, paths C2<->B2 and B1,C1->A1 work fine, but A1->B2,C2 requires a gateway (very bad). One might change /etc/hosts on A such that B=B1 and C=C1 (on A only), but this is not a globally consistent naming scheme. Some software needs globally unique machine name -> IP address mappings (it gets confused when A thinks B=B1 but C thinks B=B2).

(2) The message passing model does not care which interface is used -- it just wants to talk to some process on some host. A sensible expectation is that gethostbyname(A) would return a prioritized list of A's interface IP addresses. This is not what happens. If /etc/hosts is used, gethostbyname(A) returns the IP address of the first match; if DNS is used and A is associated with multiple IP addresses, gethostbyname(A) returns the address list BUT rotates the IP addresses on each invocation (the aim is to provide load sharing for web sites, I guess). This fails to prioritize paths and can confuse applications which assume globally unique name<->address mappings.

(3) Another problem can arise in naming public/private interfaces.
The /etc/hosts file can look like this:

    192.168.1.1   A-1.domain   A-1   A    # fast network, private
    128.2.2.2     A.domain     A-2   A    # slow network, public

Locally, A resolves to the fast private network while the FQDN form A.domain gives the slow public interface, but unfortunately A and A.domain resolve to different addresses...

I'm sure other related examples can be found. On our system, we were forced to do the following:

(i) All hosts within the cluster use the same primary network 1, so that canonical names resolve to A=A1, B=B1, etc.
(ii) Secondary names like A2, B2, ... are used where appropriate.
(iii) Parallel codes use either hostnames A, B, ... (network 1) or A2, B2, ... (network 2), but almost never a mixture of the two.

This situation begs for a better solution. One approach (not universally followed) is to name interfaces A-1, A-2, ... and then derive the canonical hostname A by truncating each name at the '-' character (some software packages use this procedure). Some kind of consensus on whether we are talking about hosts or interfaces is needed, particularly since we'd like parallel codes to be portable between clusters. Administrator tools to prioritize the addresses returned by gethostbyname() would also be nice.

Any suggestions?

Josip

P.S. My personal preference would be to use canonical hostnames like A and let the local system figure out what's the best IP address to use. This would imply that parallel applications should identify participating hosts by canonical hostnames, not by IP addresses (a host could have several). Interface naming could follow the A-1, A-2, ... style, but unfortunately this style is not a standard.

--
Dr. Josip Loncaric, Senior Staff Scientist    mailto:josip at icase.edu
ICASE, Mail Stop 132C                         PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center                  mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA                   Tel. +1 757 864-2192  Fax +1 757 864-6134
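The two ideas raised in the message (deriving a canonical hostname by truncating an interface name at '-', and putting the addresses returned by a lookup into a preferred order) could be sketched roughly as follows in Python. This is only an illustration, not an existing tool: the function names are hypothetical, and the /24 prefix lengths in the usage note are assumptions, since the message gives only the two host addresses. The only system call used is socket.gethostbyname_ex(), which returns the full IPv4 address list for a name.

```python
import ipaddress
import socket


def canonical_hostname(interface_name):
    """Derive a host name by truncating an interface name at the first '-'.

    Only meaningful where the A-1, A-2, ... convention is followed; a host
    whose real name contains '-' would be truncated incorrectly.
    """
    return interface_name.split("-", 1)[0]


def prioritize_addresses(addresses, preferred_networks):
    """Order addresses so that those on earlier-listed networks come first.

    Addresses on none of the listed networks keep their relative order
    and are placed at the end.
    """
    networks = [ipaddress.ip_network(n) for n in preferred_networks]

    def rank(addr):
        ip = ipaddress.ip_address(addr)
        for position, net in enumerate(networks):
            if ip in net:
                return position
        return len(networks)  # not on any preferred network

    return sorted(addresses, key=rank)  # sorted() is stable


def resolve_prioritized(hostname, preferred_networks):
    """Look up all IPv4 addresses for hostname and return them prioritized."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return prioritize_addresses(addresses, preferred_networks)
```

With the example addresses from the message, prioritize_addresses(["128.2.2.2", "192.168.1.1"], ["192.168.1.0/24", "128.2.2.0/24"]) puts 192.168.1.1 first, whatever order the resolver returned them in, and canonical_hostname("A-1") gives "A". A wrapper like this only fixes the ordering seen by one application on one host; it does not make name -> address mappings globally consistent across the cluster.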
More information about the Beowulf mailing list
