[Beowulf] What services do you run on your cluster nodes?
Robert G. Brown rgb at phy.duke.edu
Tue Sep 23 07:41:00 PDT 2008
On Tue, 23 Sep 2008, John Hearns wrote:

> 2008/9/23 Robert G. Brown <rgb at phy.duke.edu>
>
>> This meant that there could be hundreds or even thousands of machines
>> that saw every packet produced by every other machine on the LAN,
>> possibly after a few ethernet bridge hops. This made conditions ripe
>> for what used to be called a "packet storm" (a term that has been
>> subverted as the name and trademark of a company, I see, but alas there
>> is no wikipedia article on same and even googled definitions seem
>> scarce, so it is apparently on its way to being a forgotten concept).
>
> Bob, the packet storm is not a forgotten concept. I've seen many a packet
> storm, and not that long ago.
> On Beowulf clusters. Just think what happens if your Spanning Tree protocol
> goes wonky.
> That's a reason why I'm no great lover of Ganglia too - it just sprays
> multicast packets all over your network.
> Which really should be OK - but if you have switches which don't perform
> well with multicast you get problems.

It isn't a forgotten concept as long as there are Old Guys still around,
for sure, but the switch DID all but eliminate "collisions" and the
timing problems that led to the worst pathology, and it also dropped the
CPU load associated with global network traffic from "tightly coupled"
to "minimally coupled". Multicast/broadcast traffic has always been a
problem (remember DECNET and Appletalk? They couldn't tie their own
shoes without broadcasting to the entire network, so they didn't NEED
external nucleation; being on one of those networks beyond a certain
size was sort of like living in a perpetual storm:-) but it is much
BETTER than it used to be, at least.

I don't know why ganglia uses multicasts. Old Guys also remember rwhod,
a very early Sun daemon that basically was part of the "network is the
computer" thing they had going. Every few seconds, it would wake up and
broadcast more or less what one got out of the "uptime" command, and on
clients anyone could enter the "ruptime" command and basically get
uptime from the entire LAN (inside the broadcast radius of the nearest
broadcast-blocking device). This was fine for tiny LANs with network
isolation. For big flat LANs with few bridges, lots of hosts, forwarding
of broadcasts (necessary if servers spanned the LAN), well, let's just
say it didn't scale and leave it at that. So it isn't like people
haven't KNOWN better than to use a broadcast simply forever.

Besides, the nodes in most clusters -- or clients in most LANs -- don't
all need to know what the other nodes/clients/servers are doing.
Management and monitoring is intrinsically master/slavelike -- one host
(the one I'm sitting at) wants to access all the information from the
nodes/clients/servers. It is intrinsically a serial bottleneck.
Persistent network connections and round-robin will intrinsically
optimize at least PART of the bottleneck associated with gathering the
information. Dumping out multicasts doesn't mean that the toplevel
monitoring host isn't serially bottlenecked, it only means it doesn't
have any control over what gets sent, over collisions, or over when it
handles incoming information. If there were FIFTY hosts each needing
information about all the others, multicasts would be good, but when
there is just one it seems like they would be bad.

Truthfully, one thing I learned writing xmlsysd is that a monitoring
system for a cluster IS a parallel application. Different IPC designs
have very different scaling. Master-slave is ideal for certain regimes.
Multicast or tree structures might well scale better for other regimes.
I wrote xmlsysd to work well for relatively small clusters -- out to
somewhere over 100 nodes -- or most LANs. I have no idea how well it
would work at 1024 nodes -- maybe terribly.
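To make the "persistent connections and round-robin" idea concrete, here
is a minimal Python sketch of a monitor polling its nodes in turn over
connections that stay open between rounds. It is NOT how xmlsysd is
implemented; the names node_daemon, poll_round_robin and demo, the
one-line "poll" request and the "load=" reply format are all made up for
illustration, and local socket pairs plus threads stand in for real
nodes so the example runs on a single machine.

```python
import socket
import threading


def node_daemon(conn, name, load):
    """Hypothetical per-node daemon: reply once to each 'poll' request."""
    with conn, conn.makefile("r") as requests:
        for line in requests:
            if line.strip() == "poll":
                conn.sendall(f"{name} load={load}\n".encode())


def poll_round_robin(connections, rounds):
    """Query each node in turn; connections are reused across rounds."""
    readers = {name: conn.makefile("r") for name, conn in connections.items()}
    results = []
    try:
        for _ in range(rounds):
            for name, conn in connections.items():
                # The monitor decides when each node talks, so replies
                # arrive one at a time instead of all at once.
                conn.sendall(b"poll\n")
                results.append(readers[name].readline().strip())
    finally:
        for reader in readers.values():
            reader.close()
    return results


def demo(n_nodes=3, rounds=2):
    """Start fake nodes on local socket pairs and poll them."""
    connections = {}
    for i in range(n_nodes):
        monitor_end, node_end = socket.socketpair()
        name = f"node{i}"
        threading.Thread(
            target=node_daemon, args=(node_end, name, 0.5 * i), daemon=True
        ).start()
        connections[name] = monitor_end
    try:
        return poll_round_robin(connections, rounds)
    finally:
        for conn in connections.values():
            conn.close()


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The monitor is still a serial bottleneck in this design (one request,
one reply, next node), but it controls the schedule; with broadcast or
multicast every host receives every status packet whether it wants it or
not.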
And there are still a few small flaws in xmlsysd that I'd like to work
on one day, and probably will if anybody other than myself and the four
or five other people that I know of starts using it...;-) But nothing
show-stopping that I know of -- I use it routinely over days of
continuous monitoring and it seems to work just fine.

   rgb

--
Robert G. Brown                        Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
