disadvantages of a linux cluster
Robert G. Brown rgb at phy.duke.edu
Sat Nov 9 16:21:32 PST 2002
On Wed, 6 Nov 2002, Jim Lux wrote:

> > b) Uptime, measured as (total time systems are booted into the OS and
> > available for numerical tasks/total amount of time ALL systems have been
> > around).
> >
> > This means that if you have 9 systems booted and a hot spare, the best
> > you can count for uptime is 90%. It also means that if a system crashes
> > in the middle of the night and you don't get around to fixing it until
> > the next day, you lose eight or twelve hours, not the ten minutes it
> > eventually takes you to fix it after discovering the crash, pulling the
>
> If the cluster were claimed to have 9 processors worth of processing
> capability, and the OS and scheduler allow transparent use of the hot
> spare, then, you could get 100% uptime as long as you only had 1 failure.

Well sure, but then I can claim that my ganesh cluster (which currently has only 13 nodes out of 16 running) is really only a "12 node cluster". My OS and scheduler (the latter being "me" :-) not only allow transparent use of the hot spares, they allow transparent use of the hot spares even when one of the 12 nodes hasn't died yet, and I get to count that time up against eventual node death.

So now I can laugh at mere nines of uptime -- I'm well over 100%! Well over 100% cumulative duty cycle, too. Hooray! I'm now even more efficient than CTC!

Of course this isn't correct. I really have 16 nodes and think keeping hot spares sitting around idle is silly. I could choose to leave one, two, or even four idle but configured and call them "hot spares" to pump up my "uptime", if my goal was to show that I can keep at least 12 nodes running out of a pool of 16 or to be able to issue a nifty press release about "99.99846% uptime". For the purpose of getting work done, a moment of reflection will surely convince you that this is a cosmically silly and somewhat dishonest thing to do :-).
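For readers who want the quoted uptime definition in concrete form, here is a minimal Python sketch. The function name `fleet_uptime`, the 720-hour month, and the per-node hour lists are illustrative assumptions and not part of the original message; only the ratio itself (hours booted and available for work, divided by hours all systems have existed) comes from the quoted definition.

```python
def fleet_uptime(available_hours, lifetime_hours):
    """Uptime as defined in the quoted text: total hours nodes were booted
    and available for numerical work, divided by the total hours ALL nodes
    (including idle spares) have been around."""
    if len(available_hours) != len(lifetime_hours):
        raise ValueError("need one entry per node in both lists")
    total_lifetime = sum(lifetime_hours)
    if total_lifetime <= 0:
        raise ValueError("total lifetime must be positive")
    return sum(available_hours) / total_lifetime


MONTH = 720  # hours in a 30-day month (illustrative assumption)

# Nine nodes working all month plus one idle hot spare: the spare exists
# for 720 hours but contributes 0 hours of work.
nine_plus_spare = fleet_uptime([MONTH] * 9 + [0], [MONTH] * 10)

# Ten nodes working, one of which crashes overnight and stays down for
# 12 hours before anyone notices and fixes it.
ten_with_crash = fleet_uptime([MONTH] * 9 + [MONTH - 12], [MONTH] * 10)

print(f"9 nodes + idle spare:     {nine_plus_spare:.4f}")
print(f"10 nodes, one 12 h crash: {ten_with_crash:.4f}")
```

Under this definition the idle spare costs far more uptime than the overnight crash does, which is the point of counting every node's lifetime in the denominator.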
For people interested in getting work done, the ONLY THING THAT MATTERS is the aggregate work accomplished during the useful lifetime of the cluster, which (as has been discussed repeatedly on this list) is somewhere in the ballpark of two or three years. (Some claim only one year and can even back up the claim with some real numbers; some -- like me -- use nodes for as long as five years anyway because any CPU that ain't dead yet can still contribute cycles, and there are all sorts of opportunity cost and infrastructure nonlinearities that make simple answers for the ideal optimax wrong ;-)

If one has 10 nodes and deliberately leaves one node idle, the MOST work one can get done is 90% of the work one could have gotten done with all 10 nodes cranking away all of the time. Sure, you can lose any one node and not get any WORSE, but you achieve this at the expense of basically being bad all the time. It can be sliced and diced any way one wants, but the bottom line is that one has wasted 10% of the resource even BEFORE a failure occurred, and one will never, ever, recover the work that could have been done by the deliberately idled node. The real failure is in the brain, which left a valuable system, already paid for, doing no work while its useful lifetime and time under warranty frittered away.

For this reason, in my opinion, one counts hot spares as a dead loss FROM THE BEGINNING in any fair (that is, not deliberately stupid) assessment of cluster "uptime". One does not get to "pad" one's uptime just because one can quickly insert a node you've paid for but are leaving idle (that is to say, DOWN) unless/until you can show that there exists ANY circumstance where you are likely to get more net work done, per dollar spent, that way instead of just artificially bumping some otherwise irrelevant numbers.

CBA, CBA, CBA, with a clear statement of one's work goals and one's total means to accomplish them. That's the only way to do fair comparisons.
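The 10-node argument above can be checked with a small Python sketch. The model is a deliberate simplification and every part of it is an assumption made for illustration: identical nodes that each do one unit of work per hour, exactly one node failure during the cluster's lifetime, no repair, and an instantaneous swap-in of the hot spare. The function names are hypothetical.

```python
def work_all_running(n_nodes, lifetime_hours, failure_hour):
    """Node-hours of work when every node runs from hour 0 and one node
    dies (and is never repaired) at failure_hour."""
    if not 0 <= failure_hour <= lifetime_hours:
        raise ValueError("failure must happen within the lifetime")
    before = n_nodes * failure_hour
    after = (n_nodes - 1) * (lifetime_hours - failure_hour)
    return before + after


def work_with_hot_spare(n_nodes, lifetime_hours):
    """Node-hours of work when one node is held idle as a hot spare and
    swapped in instantly at the single failure: n_nodes - 1 nodes are
    busy for the whole lifetime, whenever the failure happens."""
    return (n_nodes - 1) * lifetime_hours


LIFETIME = 3 * 365 * 24  # a three-year useful lifetime, in hours

for failure_hour in (0, LIFETIME // 2, LIFETIME):
    everyone = work_all_running(10, LIFETIME, failure_hour)
    spare = work_with_hot_spare(10, LIFETIME)
    print(f"failure at hour {failure_hour:>6}: "
          f"all running = {everyone}, hot spare = {spare}, "
          f"work given up by idling = {everyone - spare}")
```

In this model the work given up by idling the spare equals the number of hours before the failure, so the hot-spare configuration never comes out ahead and only ties if a node dies at hour zero.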
Otherwise we might as well all buy Crays, because they are big, expensive, and come with hot and cold hardware elves. Well, maybe we might as well NOT all buy Crays because most of us just plain can't afford them!

I think that one can make a strong economic case for NEVER purchasing a service contract for a cluster, and NEVER buying and holding idle spare parts (beyond, perhaps, a hard disk and a DIMM or two), and for (in fact) having the cluster "eat its own dead" -- using dead nodes to repair nodes as they die and gradually permitting the number of nodes to shrink. This is because of Moore's Law, which rather brutally punishes node repair compared to purchasing new nodes pretty much anytime after the typical one year warranty of a node expires.

This is probably a bit too extreme for true optimax behavior -- replacing a memory DIMM or a power supply or a hard drive for on the order of $100 or less to get another year's use out of a node is almost certainly worth it in the first or second year of a node's existence, but maybe by the third and certainly by the fourth it is a waste of time and money -- you're better off putting the $100 into the kitty for a new node that is likely 8x as powerful as the node you're replacing.

Now, before anyone brings it up, I will cheerfully admit that there are SOME cases of parallel computation that really might get unhappy if the cluster is supposed to have "16" nodes and one dies and the number available goes to "15" (or 256 nodes goes down to 255 or whatever). Those cases are rare, of course, and invariably are fine grained synchronous cases where the computation itself globally fails when any node goes down (so "failover" mechanisms other than checkpointing of the code itself are a waste of time), but they exist.
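The repair-versus-replace reasoning above can be put into rough numbers with a Python sketch. Only the $100 repair buying one more year and the "8x as powerful" figure come from the message. Everything else is an assumption chosen for illustration: a performance doubling time of one year (picked so that a three-year-old node faces an 8x replacement), a hypothetical new-node price of $1500, a three-year service life for the new node, and all of the function names.

```python
def speedup_of_new_node(node_age_years, doubling_years=1.0):
    """Speed of a node bought today relative to one bought node_age_years
    ago, ASSUMING performance doubles every doubling_years."""
    return 2.0 ** (node_age_years / doubling_years)


def repair_cost_per_unit(repair_cost, extra_years):
    """Dollars per old-node-year of compute obtained by repairing."""
    return repair_cost / extra_years


def replace_cost_per_unit(new_node_cost, node_age_years, service_years,
                          doubling_years=1.0):
    """Dollars per old-node-year of compute obtained from a new node,
    which delivers `speedup` old-node-years of work per calendar year."""
    speedup = speedup_of_new_node(node_age_years, doubling_years)
    return new_node_cost / (speedup * service_years)


REPAIR_COST = 100.0      # from the message: roughly $100 per repair
EXTRA_YEARS = 1.0        # from the message: one more year of use
NEW_NODE_COST = 1500.0   # HYPOTHETICAL price, not from the message
SERVICE_YEARS = 3.0      # HYPOTHETICAL useful life of the new node

for age in (1, 2, 3, 4):
    repair = repair_cost_per_unit(REPAIR_COST, EXTRA_YEARS)
    replace = replace_cost_per_unit(NEW_NODE_COST, age, SERVICE_YEARS)
    cheaper = "repair" if repair < replace else "replace"
    print(f"node age {age} y: repair ${repair:.2f}/unit, "
          f"replace ${replace:.2f}/unit -> {cheaper}")
```

With these made-up prices the crossover falls between the second and third year, which is consistent with the message's rule of thumb, but the crossover point moves with the assumed node price and doubling time, so this is an illustration of the shape of the argument and not a cost-benefit analysis.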
EVEN THEN one would have to do the CBA to convince me that hot spares are a net-productivity-increasing investment, but at least then I'd be willing to accept the possibility, especially if the computation was using all (say) 256 nodes and couldn't be restarted from its last checkpoint until there are 256 nodes to run on again.

In all other cases, especially in the typical coarse grained or embarrassingly parallel applications or applications that don't use all N nodes of the cluster anyway, I'd just have to say "Is that a Sears poncho or a real poncho? Hmmm. No fooling."

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email: rgb at phy.duke.edu
More information about the Beowulf mailing list
