Building a beowulf with old computers
Robert G. Brown rgb at phy.duke.edu
Mon Mar 10 05:37:08 PST 2003
On Sun, 9 Mar 2003, Robert Myers wrote:

> Robert G. Brown wrote:
>
> >The sad truth is that cluster nodes have an ECONOMICALLY useful lifetime
> >of somewhere between 18 months and 3 years, depending on lots of things,
> >although one can arguably get work done out to 5 years on nodes that
> >require no human time to run or repair that other people are paying to
> >feed and cool.
>
> That makes a strong argument for considering energy consumption when
> building a cluster in the first place. Lower energy consumption = Lower
> energy cost, longer economically useful life = Lower TCO/year.

It absolutely is unwise to ignore energy consumption (for power and cooling), renovation costs, "rent" on the physical space, human management costs and so forth -- really all the infrastructure and management costs -- when building a cluster. That doesn't stop many folks from doing so.

Let us do a little review of cluster economics. When computing TCO for clusters being inserted into existing facilities and run by existing personnel (as opposed to ones being created from whole cloth, with everything a line item and a full-time enterprise), a lot of the expense is "opportunity cost". (Opportunity cost is an economic term for the value of the time, space, and power that people spend building and managing a cluster that THEY COULD HAVE SPENT ON SOMETHING ELSE. Or more particularly, it is the cost of the something else. This needs to be compared to the BENEFIT of building the cluster and diverting all of those resources.)

If you have a suitable space and preexisting LAN infrastructure handy, some low priority/low return tasks that can easily be displaced or put off, and a decent "return" from building a cluster (in terms of whatever goals you might have), then cluster infrastructure, except for power, can be nearly "free".

If you have to renovate a space in a crowded building that displaces or prevents other work from being done, and manage the cluster with the time of an ALREADY overworked systems manager who has to delay other tasks that reduce the productivity of the environment, using a LAN infrastructure built and managed from the ground up just for the cluster and task at hand, the cost of a cluster can be high, much higher than you likely estimated when considering building the cluster.

MANY of the original beowulfish clusters, and many clusters today, have a very low relative TCO because much of their indirect (non-hardware) expense is/was opportunity cost labor and other resources that wouldn't otherwise be able to produce anything of comparable value, OR because they had dedicated resources but those resources were still far less costly than alternative approaches to getting the same work done with big iron. Or both.

Frankly, this is still very much true, but as clusters get bigger the differences between their infrastructure requirements and those of big iron clusters get smaller. When clusters have more than 8-16 nodes, the recurring costs for operating them start to get too large to safely ignore.

I think the $1/watt/year figure comes as a bit of a surprise to a lot of folks (it's based on electrical costs of $0.08 per kilowatt-hour and 8760 hours per year, or about $0.70 per watt per year for the electricity up front, plus another estimated $0.30 to remove the heat with an air conditioner with a coefficient of performance in the range of 2-3, so you can see that there is nothing up my sleeve). Of course it could be off by a factor of 2 either way depending on energy costs and AC efficiency in your actual environment. I remain very aware of these numbers as I pay them out of pocket for my home beowulf. It's "different" when it is your money...;-)

> Same argument works for server blades, and I'm amazed that energy costs
> don't come up as a consideration more often.
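As an aside, the $1/watt/year arithmetic above is easy to check against your own rates. The following is a minimal Python sketch, not a definitive cost model. The $0.08/kWh price and the 8760 hours per year are the figures quoted above; the function name `cost_per_watt_year` and the default coefficient of performance of 2.33 are illustrative assumptions (the text only gives a range of 2-3).

```python
HOURS_PER_YEAR = 24 * 365  # 8760, as in the text


def cost_per_watt_year(price_per_kwh=0.08, cop=2.33):
    """Return (electricity, cooling, total) in dollars per watt per year.

    price_per_kwh: electricity price in $/kWh (0.08 is the figure in the text).
    cop: air conditioner coefficient of performance, i.e. watts of heat
         removed per watt of electricity spent on cooling. 2.33 is an
         ASSUMED value inside the 2-3 range mentioned in the text.
    """
    # One watt drawn continuously for a year is 8.76 kWh.
    kwh_per_watt_year = HOURS_PER_YEAR / 1000.0
    electricity = price_per_kwh * kwh_per_watt_year
    # Every watt consumed becomes heat; removing it costs 1/COP watts more.
    cooling = electricity / cop
    return electricity, cooling, electricity + cooling


if __name__ == "__main__":
    for cop in (2.0, 2.33, 3.0):
        elec, cool, total = cost_per_watt_year(cop=cop)
        print(f"COP {cop:.2f}: electricity ${elec:.2f} + cooling ${cool:.2f}"
              f" = ${total:.2f} per watt per year")
```

Substituting your own utility rate and AC efficiency shows how far the rule of thumb drifts in your environment.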
>
> A researcher at LANL has built a cluster based on Transmeta chips called
> Green Destiny, making the energy cost argument, which is documented in
>
> http://public.lanl.gov/feng/Bladed-Beowulf.pdf
>
> He claims a much lower TCO for his Transmeta-based system, but only a
> small part of the claimed savings is electricity costs.

I'm not convinced that these save THAT much money (or any at all) for the following reasons:

a) From on-list discussions in the past it does not appear that one saves THAT much energy PER FLOP (or aggregate MHz, or bogomip, or whatever measure of performance you like). Indeed, it seems likely that in a lot of cases one will LOSE energy per unit of actual work done.

This is for a lot of reasons, the most fundamental of which is that it takes a certain amount of energy to switch the state of a flip-flop, and a certain amount of energy to hold the state of a flip-flop. That energy isn't scale invariant in either the spatial domain or the temporal domain. Also, there is the energy cost of shared infrastructure for the CPU -- the number of disks, the amount of memory, the number and kind of peripheral cards supported PER UNIT OF WORK DONE (not per CPU).

I'm not certain that anybody has done a systematic study of the scaling laws for WORK done for typical numerical tasks, but my feeling (which could be incorrect) is that one's total energy cost per unit of work done by a non-idle system actually decreases with e.g. CPU clock and VLSI generation. As in my guesstimate for the P5s vs P6s, a 2.4 GHz P6-class CPU might well get 24 times the work done as a 200 MHz P5-class CPU, but I don't think that there is any way in hell that it draws 24 times as much power. More like 2-3.

If we are generous and presume four times as much power for twenty-four times as much work over six years, this suggests VERY CRUDELY that there is a Moore's Law-like scaling law that decreases power cost per unit of work done with a time constant maybe twice that of Moore's Law itself.

Then there is Amdahl's law, which dictates that it is (nearly, barring accidental superlinear speedup) ALWAYS less efficient to use three 800 MHz CPUs than one 2400 MHz CPU, all things being equal. Some computer-science-economist out there may have done the math and published it, or somebody out there may read this and find a master's thesis or senior honors thesis...;-)

So I'm by no means convinced that blades significantly lower total power consumed per unit of work done relative to much higher clock, hotter, but faster node units, and wouldn't be horribly surprised if it were the opposite.

b) On top of that, blades are very, very expensive per raw FLOP (or whatever measure du jour you like/need). At any given point in time, there is some hardware combination out there that gives you optimum work accomplished per dollar spent for your particular task. For a CPU-bound task, that is likely to currently be something in the lowball Celeron/Duron/Athlon/P4 family, in a tower (cheapest but space inefficient) or rack case (if space is an issue), depending on how sensitive your task is to memory type and speed and CPU cache size. For a memory-bound task, likely a P4 or Athlon, on a relatively good motherboard with high FSB clock and with high end memory. For communications-bound parallel tasks, a 64/66 PCI bus and high end communications card is necessary. Since CPU >>prices<< tend to vary highly nonlinearly with clock (and many times work done scales at least approximately with clock), one can just go down the list and pick the most cost efficient CPU clock and packaging. I don't think it will be a blade.
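The two back-of-envelope arguments in (a) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the trend estimate assumes smooth exponential scaling over the six years, and the Amdahl comparison assumes that work on a single CPU scales perfectly with clock ("all things being equal"). The 24x work, 4x power, six years, and three 800 MHz versus one 2400 MHz figures are the ones used in the text.

```python
import math


def doubling_time(factor, years):
    """Years needed to double a quantity that grows by `factor` in `years`,
    assuming smooth exponential growth (an assumption, not a measurement)."""
    return years / math.log2(factor)


def relative_throughput(n_cpus, clock_ratio, parallel_fraction):
    """Throughput of n slow CPUs relative to one fast CPU under Amdahl's law.

    clock_ratio: slow clock divided by fast clock (800/2400 in the text).
    parallel_fraction: fraction of the work that parallelizes perfectly.
    """
    p = parallel_fraction
    # Run time on the slow cluster, in units of the fast CPU's run time:
    # the serial part (1 - p) runs on one slow CPU, the parallel part is
    # split n ways, and everything is slowed down by 1/clock_ratio.
    time_on_cluster = ((1.0 - p) + p / n_cpus) / clock_ratio
    return 1.0 / time_on_cluster


if __name__ == "__main__":
    work_gain, power_gain, years = 24.0, 4.0, 6.0  # figures from the text
    t_work = doubling_time(work_gain, years)
    t_work_per_watt = doubling_time(work_gain / power_gain, years)
    print(f"work doubles every {t_work:.2f} years")
    print(f"work per watt doubles every {t_work_per_watt:.2f} years")
    print(f"ratio of the two time constants: {t_work_per_watt / t_work:.2f}")

    for p in (1.0, 0.95, 0.9, 0.5):
        r = relative_throughput(3, 800.0 / 2400.0, p)
        print(f"parallel fraction {p:.2f}: three 800 MHz CPUs deliver "
              f"{r:.2f}x the work of one 2400 MHz CPU")
```

Under these assumptions the three slow CPUs can only match the single fast one when the task is perfectly parallel, and fall behind for any serial fraction.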

Now, if blade packaging of a relatively low-clock CPU is only TWICE as expensive per unit of work done as a current generation cost-optimum packaging, the energy savings over the lifetime of the unit in no way justify the higher cost. The cost per year to operate a 2.4 GHz CPU is likely to be in the $100-150 range, and over a three year lifetime that is a TOTAL cost of $300-450. The marginal cost of three 800 MHz blades (presuming work done that DOES scale perfectly with clock) is way, way higher than $300, and although they >>may<< draw less total power, they aren't going to draw no power at all, so the actual savings will be smaller still.

c) Even the management/ease of installation element of TCO touted for blades is, in my opinion, very questionable. Or rather, they may well be extremely easy to install and manage, but >>so are plenty of alternative, more traditional hardware configurations<<.

We are well past the "hobby cluster" stage, and Linux has come a long way from when nodes had to be cobbled together and installed "by hand" one at a time, taking perhaps an hour or hours each. The http://www.phy.duke.edu/brahma/linux-mag.html collection is just a small and probably highly incomplete snapshot of installation and management methodologies that reduce the marginal cost per node for installation and management to near zero (once a fixed cost of setting up for the methodology of your choice is paid).

RH archive+DHCP/PXE+Kickstart+Yum, Debian archive+DHCP/PXE+Apt, Scyld, Clustermatic, various turnkey vendors: there are open source and free installation methodologies, shrink-wrapped methodologies that you can buy with support, and turnkey clusters you can buy where you've already paid the fixed cost of installation and setup and even the cost of programming and customizing your primary application(s), usually for a fairly paltry 10-20% of the per-node hardware price.
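Returning to the energy arithmetic in (b) above, here is a minimal Python sketch of the break-even argument. It assumes the $1/watt/year rule of thumb quoted earlier in this message, so the $100-150 per year operating cost corresponds to a node drawing roughly 100-150 W. The function names and the 60 W blade figure are hypothetical and used only for illustration; they are not measurements of any real blade.

```python
def lifetime_energy_cost(watts, years=3, dollars_per_watt_year=1.0):
    """Power plus cooling cost of running `watts` continuously for `years`.

    dollars_per_watt_year defaults to the $1/watt/year rule of thumb."""
    return watts * years * dollars_per_watt_year


def max_justified_premium(node_watts, blades_watts, years=3,
                          dollars_per_watt_year=1.0):
    """Largest extra hardware cost that energy savings alone could repay,
    for blades that do the same work as one conventional node."""
    saved_watts = max(node_watts - blades_watts, 0.0)
    return lifetime_energy_cost(saved_watts, years, dollars_per_watt_year)


if __name__ == "__main__":
    for node_watts in (100.0, 150.0):  # $100-150/year at $1/watt/year
        # Upper bound: pretend the blades draw no power at all.
        bound = max_justified_premium(node_watts, 0.0)
        # HYPOTHETICAL: three blades at 20 W each, for illustration only.
        example = max_justified_premium(node_watts, 3 * 20.0)
        print(f"{node_watts:.0f} W node: energy savings can repay at most "
              f"${bound:.0f} over 3 years (${example:.0f} if the blades "
              f"draw a hypothetical 60 W in total)")
```

If the extra hardware cost of the blades exceeds the printed bound, energy savings by themselves cannot make up the difference within the assumed lifetime.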

Even hardware reliability isn't necessarily a significant differentiator, as one can easily enough buy nodes with 3 year onsite service plans, or maintain a small stock of spare parts to minimize hardware downtime.

The BEST reason to consider a blade cluster, in my opinion, is to save the nonlinear and potentially high capital costs for large spaces and/or renovation in environments where the "cost" of space and renovation is high, or where there is no way to reasonably amortize those costs over (say) 10 years (where they diminish to being a comfortably small fraction of the recurring costs for power and cooling you're paying anyway).

Bottom line: I think one has to do a fairly serious cost-benefit analysis (CBA) for the various alternative ways of building a cluster for accomplishing any particular task in any particular environment. One will very likely need to either ignore vendor claims for their "TCO" or take them with a large, shiny grain of salt and do one's own less biased estimates. Perhaps estimates derived by companies that make roughly equal amounts of money selling all three of bladed, rackmount, and tower/shelf clusters can be trusted, I don't know. However, there is no substitute for running, and fully understanding, the numbers yourself.

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
