[Beowulf] IP address mapping for new cluster
Larry Stewart larry.stewart at sicortex.com
Wed Aug 1 20:47:58 PDT 2007
Carsten Aulbert wrote:
> Hi,
>
> <scheme for assigning IP addresses to cluster components>

I clicked reply to say this seems like a lot of trouble to go through just to make it easy to get from IP address to location and function, but it turns out that we do something very similar in our machines.

A 972 node SC5832 uses a class B IP address like A.B.y.z/16. The interconnect fabric isn't solely an IP network, but we emulate Ethernet/IP on it using IP addresses like A.B.200+<module ID>.100+<node ID>/18.

Each node also has a second IP address that doesn't depend on the interconnect (the control plane network), with an address like A.B.0+<module ID>.100+<node ID>/18. These are like your IPMI ports in function but are actually serial point-to-point IP links, the other end of which is an interface on a module service processor that does booting and so forth.

The module service processors each have a control plane IP address like A.B.0.100+<module ID>/24. Then the fans, power supplies and so forth have addresses in A.B.0.20-99/24. The main service processor has A.B.0.1 as its interface on the control network.

The system has a third IP network connecting some gateway nodes on some modules to the service processor. These interfaces have addresses like A.B.150.X/24.

I was going to say "how often do you really deal with the A.B.C.D rather than DNS names anyway?" but I've just spent a couple of weeks doing just that, and it really is convenient when you are in the weeds.

One comment is that nearly all software that deals with dotted quads prints them in decimal, which makes binary encodings of the meaning awkward. Using 4-bit fields for the X and Y coordinates is hard to translate in your head. Instead, making the third octet be (row*20)+column would be a lot easier on the brain, and it supports 12 rows. This is why we do things like A.B.200+<module ID>.100+<node ID>/18. It's a little awkward to get started, but then it is trivial to map in your brain from IP to function and position.
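To make the decimal encoding concrete, here is a minimal Python sketch of the two mappings described above. The prefix 10.1 is only a stand-in for the elided A.B, and the function names and the example module/node IDs are made up for illustration; this is not our actual tooling.

```python
import ipaddress

# Hypothetical stand-in for the "A.B" prefix, which the text does not give.
PREFIX = (10, 1)

FABRIC_BASE = 200   # third-octet offset for the emulated Ethernet/IP fabric
CONTROL_BASE = 0    # third-octet offset for the control plane network
NODE_BASE = 100     # fourth-octet offset for the node ID
COLUMNS = 20        # multiplier in the (row*20)+column suggestion


def node_address(module_id, node_id, base=FABRIC_BASE):
    """Build A.B.<base+module ID>.<100+node ID> as an IPv4Address."""
    third = base + module_id
    fourth = NODE_BASE + node_id
    if not (0 <= third <= 255 and 0 <= fourth <= 255):
        raise ValueError("module or node ID does not fit in an octet")
    return ipaddress.IPv4Address(f"{PREFIX[0]}.{PREFIX[1]}.{third}.{fourth}")


def decode_node_address(addr, base=FABRIC_BASE):
    """Recover (module ID, node ID) from an address built by node_address."""
    octets = ipaddress.IPv4Address(addr).packed
    return octets[2] - base, octets[3] - NODE_BASE


def grid_octet(row, column):
    """Encode a rack position as (row*20)+column so it reads well in decimal."""
    if not (0 <= column < COLUMNS):
        raise ValueError("column must be in 0..19")
    octet = row * COLUMNS + column
    if not (0 <= octet <= 255):
        raise ValueError("row does not fit in an octet")
    return octet


def decode_grid_octet(octet):
    """Split an octet back into (row, column)."""
    return divmod(octet, COLUMNS)


if __name__ == "__main__":
    print(node_address(3, 5))                      # fabric address
    print(node_address(3, 5, base=CONTROL_BASE))   # control plane address
    print(decode_grid_octet(grid_octet(2, 7)))
```

The point of the decimal offsets is that the decode step is subtraction you can do in your head, which is not true of packed 4-bit fields.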
The next issue is how all this gets initialized. Pretty much the only way to do it is to have the DHCP servers configured to map MAC addresses to IP addresses in a stable way. We don't really have that problem because pretty much the only interfaces that have random MAC addresses are the module service processors. The MAC address maps to the manufacturing serial number, which is essential for tracking faults, but the position (slot ID/module ID) is reported in the DHCP request in a <vendor> field and the DHCP server knows what to do.

It seems like when you install something, you will have to enter its MAC addresses into the DHCP server database and map them to a stable IP address, given database knowledge of the position and function of the device.

It also seems like, as boxes get pulled out of a rack for service, replaced by spares, and later put back in service somewhere else, you should maintain a database of MAC address to device serial number so you can recognize a lemon when it comes back with a different IP address but the same symptoms. The database will have to be clever to support coherent views of FRUs in cases like an interconnect card being moved from one flaky motherboard to another, changing the MAC binding but not the failures.

For us, there were a number of benefits in going to "IP address maps to function":

* Humans can debug given the IP addresses alone
* No DNS lookups required in performance critical paths
* Higher level configuration files for things like SLURM can be nearly static

Nevertheless, is mapping IP to physical location really worth it? Trying to maintain this, given the probable frequency of swapping out boxes, will cause trouble with DHCP and ARP. Either you make the leases short and wait for them to expire before powering on a replacement, or you have to go around manually flushing leases and ARP tables. Ugh.
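As a rough illustration of the MAC-to-serial bookkeeping mentioned above, here is a minimal Python sketch using an in-memory SQLite table. The table layout, function names, serial numbers, and locations are all hypothetical; a real inventory would also need to track FRUs such as interconnect cards separately from the boxes they sit in.

```python
import sqlite3


def open_inventory():
    """Create an in-memory log of where each device was seen and what it did."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE sightings (
               seen_at  TEXT,   -- ISO date, so string order is time order
               mac      TEXT,
               serial   TEXT,   -- manufacturing serial number
               location TEXT,   -- e.g. rack/slot
               ip       TEXT,
               symptom  TEXT    -- NULL when the device was healthy
           )"""
    )
    return db


def record(db, seen_at, mac, serial, location, ip, symptom=None):
    """Log one observation of a device at a location."""
    db.execute(
        "INSERT INTO sightings VALUES (?, ?, ?, ?, ?, ?)",
        (seen_at, mac, serial, location, ip, symptom),
    )


def history_by_serial(db, serial):
    """Everything known about one serial number, oldest first."""
    rows = db.execute(
        "SELECT seen_at, mac, location, ip, symptom FROM sightings "
        "WHERE serial = ? ORDER BY seen_at",
        (serial,),
    )
    return rows.fetchall()


def suspected_lemons(db, min_locations=2):
    """Serials that showed the same symptom in several distinct locations."""
    rows = db.execute(
        "SELECT serial, symptom, COUNT(DISTINCT location) FROM sightings "
        "WHERE symptom IS NOT NULL "
        "GROUP BY serial, symptom "
        "HAVING COUNT(DISTINCT location) >= ? "
        "ORDER BY serial",
        (min_locations,),
    )
    return rows.fetchall()
```

Keying the history on serial number rather than on IP or MAC is what lets a box be recognized when it comes back in a different slot with a different address.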
Instead, it may make more sense to give a type of device a stable IP address without regard to position, and to maintain a database mapping MAC/IP to location separately. For a few thousand devices, grepping the location file will be faster than walking over to the right rack anyway.

We have this problem with modules. The service guys want to swap modules in the backplane to see if a problem follows them, and it has cost us some DHCP hackery to let the addressing respond smoothly.

-Larry
