[Beowulf] RE: Capitalization Rates - How often should you replace a cluster? (resent - 1st sending wasn't posted ).
Mark Hahn hahn at mcmaster.ca
Fri Jan 16 17:10:43 PST 2009
> The question was raised as "When should all these servers be upgraded or
> replaced again?"

3-5 years, IMO. if you replace hardware in <3 years, you're obviously burning money. that's defensible sometimes, but always pretty dubious - or else you need an inflated sense of self-importance like oh, say the dearly departed financial industry ;)

when pushing past 5 years, it's not terrible to keep the cluster running, perhaps by sacrificing a few nodes, but it becomes increasingly unattractive to _use_. that is, after 5 years, "current" machines will have noticeably better clock, flops, cache, memory bw and latency, memory size, disk, interconnect, power efficiency, etc. there are still people who will use a 7-year old cluster, but they're outliers, one way or another ;)

> But there are other factors - over time the "older systems" are harder to
> maintain.... don't run newer licenses of SW products,

what is this "license" thing of which you speak? ;)

seriously, the industry isn't changing _that_ fast. yes, you can probably find some software which doesn't bother providing ia32 versions, only x86_64, but the latter has been around for quite a while (6 years?). the big changes have been mainly cache and cores and memory.

> need spare parts, some of which are hard or very hard to find (e.g. old
> RAM modules - on Ebay?!).

ddr2 has been pretty long-lived, and it's still pretty easy to find early (low-clock) modules. that already takes you back something like 5 years.

> Sometimes the newer technology uses less power and is cheaper to
> operate....(anyone ever create a KW/MFLOP vs. Time curve? has that really
> gone down? )

unambiguously, w/flop has been drastically improved, through both process and architectural changes. 65W quad-core, 4flops/cycle is pretty amazing, considering that only 3 years ago, 95W single-core 2flops/cycle was reasonable. (dual-core at the time compromised clock fairly seriously).
3-5 years ago, the main action was clock scaling from about 1.4 -> 3 GHz, but that was generally more flops for more power, rather than 4x flops for 30% _less_ power.

> After several years (e.g. 6, 7, or 8) the systems Admin costs on the older
> systems may be higher - e.g. more labor, specialized training, unique
> tools ..

past 5 years is, imo, more of a museum curation effort, rather than really running a compute facility.

> If we know the required life is a long time our customer insists on
> tracking end of life points and buying spares to have on hand. That costs
> more for longer time spans.

you can keep doing the same thing for 6+ years, and I don't see why the costs would blow up if you're smart about spares and/or cannibalization. but you have to ask yourself: is it worth using this old stuff which is so slow and inefficient, relative to cheap new stuff?

consider a car analogy - old cars can be pretty neat. but keeping a 1957 chevy running makes the most sense in, say, Cuba, where labor is cheap, replacements are difficult, and where the weather and other climate make you not care so much about whether you have side impact airbags or AWD or 40 MPG at 70 MPH or builtin gps/handsfree/blu-ray-player.

> Do operating costs really go up as a cluster ages? What other factors are
> there?

well, I think the main thing is opportunity costs of a new, faster, more efficient cluster. power and admin overhead don't really add up very fast, maybe 5% of purchase cost/year. I've heard people quote MUCH higher numbers than that - no doubt their sysadmins make more than me, and might have UPS for compute nodes, etc.

> For some the upgrade is necessary to tackle a larger or more complex
> problem - but others can just let the system churn a bit longer to get the
> answer. Is that the real driver?

"a bit longer" is fine - I assume you mean a base2 order of magnitude ;)

the cluster closest to my chair is around .035 Gflops/W.
a modern replacement would be about .256, or a factor of 7.4x better. IMO, an honest figure of merit would be higher than this, since the new machine would also give the answer faster.

> I recall that one Beowulf user facility operated both a new and an old
> production cluster, and replaced the "old one" with a "newer" one (the new
> "new one" ) on a regular basis.

my organization has a large variety of clusters, installed since 2001. in fact, we're just now decommissioning our original 2001 stuff (compaq es40's). we could have kept it going, and it still got some use. the main problems with it were that we have 20 other clusters and no excess staff, and that it's been some time since anyone produced a modern distro for alphas (which you can think of as a reiteration of the staff issue.) in the end, the fact that it was ~100 4p nodes that provided 6.6 Gflops for ~600W also factored into the decision.

> Is that common? Do most of you as users see the old system just chucked
> out as the new one is brought online?

we overlap our systems. partly because government funding is ah, "whimsical".
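
for anyone who wants to redo the efficiency arithmetic above with their own numbers, here is a rough Python sketch. the function names are made up for illustration. it assumes the 6.6 Gflops / ~600W figure for the es40's is per node (the post doesn't say so explicitly), and it treats the "maybe 5% of purchase cost/year" overhead as a flat, non-compounding rate. with the rounded .035 and .256 figures the ratio comes out near 7.3, close to the 7.4x quoted above, which presumably came from less-rounded inputs.

```python
def gflops_per_watt(gflops, watts):
    """Energy efficiency of a machine in Gflops per watt."""
    return gflops / watts


def efficiency_ratio(old_gflops_per_watt, new_gflops_per_watt):
    """How many times more flops per watt the new machine delivers."""
    return new_gflops_per_watt / old_gflops_per_watt


def operating_cost_fraction(years, annual_fraction=0.05):
    """Cumulative power+admin overhead as a fraction of purchase cost.

    Assumption: a flat annual rate, not compounded.
    """
    return years * annual_fraction


# Figures quoted in the post (rounded).
old_cluster = 0.035  # Gflops/W, "the cluster closest to my chair"
new_cluster = 0.256  # Gflops/W, "a modern replacement"

ratio = efficiency_ratio(old_cluster, new_cluster)

# Assumption: 6.6 Gflops for ~600 W is a per-node figure for the es40's.
es40 = gflops_per_watt(6.6, 600.0)

# Overhead accumulated over a 5-year life at ~5% of purchase cost per year.
five_year_overhead = operating_cost_fraction(5)

print(f"new vs old efficiency: {ratio:.1f}x")
print(f"es40 efficiency: {es40:.3f} Gflops/W")
print(f"5-year overhead: {five_year_overhead:.0%} of purchase cost")
```

note this only compares flops per watt. as said above, an honest figure of merit would favor the new machine more, since it also returns the answer sooner.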
