cooling
Robert G. Brown rgb at phy.duke.edu
Wed Apr 24 06:07:51 PDT 2002
On Tue, 23 Apr 2002, Robert B Heckendorn wrote:

> We are looking at the facilities issues in installing a beowulf on the
> order of 500 nodes. What facilities is telling us is that it is going
> to almost cost us more to buy the cooling for the machine than to buy
> machine itself. How are people making the air conditioning for their
> machines affordable? Have we miscalculated the HVAC loads? Are we
> being over charged?

No, this is one of the miracles of modern beowulfery.

Our new facility in the physics department here is a modest-sized room, perhaps 5m x 13m. It has 75 kW of power in umpty 20A and 15A (120VAC) circuits. It has a heat exchange unit at one end of the room (unfortunately we were unable to commandeer the small room next door, which would have put it and its noise out of the space itself) that is about 3m x 3m x 3m (to the ceiling, anyway) and that eats an extra half-meter or more on the sides in wasted space (across from the door, fortunately), making the first 3m+ of the room unusable for anything but entrance and AC.

The room did require a certain amount of prep -- old floor out, new floor in, asbestos removal, paint. It did require fairly extensive wiring for all of the nodes -- a couple of large power distribution panels, power poles every couple of meters where they can service clusters of racks, and a nifty thermal kill for the room power (room temperature hits a preset of, say, 30-35C and bammo, all nodes are just shut down the hard way). It did require a certain number of overhead cable trays and so forth. Still, I believe that the AC alone (one capable of removing 75 kW continuously) dominated the cost of the $150K renovation. It was so expensive that we had to really work to convince the University to do it at all, and to share the space with another department to ensure that it is filled as much as possible.
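To sanity-check HVAC numbers like the ones asked about in the quoted question, it helps to convert the electrical load into the units cooling contractors quote in. This is a minimal sketch that assumes all 75 kW of electrical input ends up as heat the AC must remove. It uses the standard conversions 1 W ≈ 3.412 BTU/hr and 1 ton of refrigeration = 12,000 BTU/hr. The function name `cooling_load` is hypothetical, and the sketch ignores lighting, occupants, and heat gains through the building envelope.

```python
# Standard HVAC unit conversions.
WATTS_TO_BTU_PER_HR = 3.412142  # 1 W is about 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12_000     # 1 ton of refrigeration = 12,000 BTU/hr


def cooling_load(heat_watts: float) -> tuple[float, float]:
    """Return (BTU/hr, tons of refrigeration) to remove heat_watts continuously.

    Assumes every watt of electrical input is dissipated as heat in the room;
    no safety margin or non-IT heat sources are included.
    """
    btu_per_hr = heat_watts * WATTS_TO_BTU_PER_HR
    return btu_per_hr, btu_per_hr / BTU_PER_HR_PER_TON


if __name__ == "__main__":
    heat_watts = 75_000  # the room's power capacity, from the post
    btu, tons = cooling_load(heat_watts)
    print(f"{heat_watts / 1000:.0f} kW -> {btu:,.0f} BTU/hr -> {tons:.1f} tons")
```

This prints `75 kW -> 255,911 BTU/hr -> 21.3 tons`. A room like this therefore needs roughly 21 tons of continuous cooling before any margin is added, which is why the AC can rival the hardware in cost.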
Right now we are probably balancing along at the point where the value of the nodes in the room equals the cost of the renovation -- we probably have on the order of $150K worth of systems racked up and shelved. However, we are also ordering new nodes and upgrades pretty steadily as grants and so forth come in, and will likely have well over $250K worth of hardware in the room by the end of the year. That will translate into order 250 CPUs: even buying duals, our nodes (without myrinet, and with only some nodes on gigabit ethernet) are costing roughly $1K/CPU in a 2U dual Athlon rackmount configuration.

By the time the room is FULL (or as full as we can get it), probably in a couple of years, it should have on the order of 500 CPUs. We're highball estimating 150W per CPU, although we're hoping for an average that is more like 100W -- high-end Athlons draw about 70W loaded all by themselves, and then there is the rest of the system. At that point our node investment will likely exceed our renovation expense by 3 to 1 or better, and of course the value to the University in grant-funded research enabled by all of those nodes will be higher still: every postdoc or faculty person grant-supported by research done with the cluster will probably net the University $30K or more in indirect costs.

Overall, I therefore think that this is a solid win for the University and an investment essential to keeping the University current and competitive in its theoretical physics (and statistics, the group with whom we share) research.

The University has at this point some two or three similar facilities in several buildings on campus. Computer Science has a much larger cluster/server facility that it shares with e.g. math (which has at least one large cluster doing imaging research supported by petrochemical companies). I believe that they are considering the construction of an even larger centralized facility to house genomic research and some biomedical engineering clusters.
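A little arithmetic shows that the capacity and cost figures above hang together. This is a minimal sketch using only numbers from the post: 500 CPUs, 150 W or 100 W per CPU, about $1K per CPU, and a $150K renovation. The function names are hypothetical. The sketch also assumes the $1K/CPU price holds for all future purchases.

```python
# Figures quoted in the post; the helper names below are hypothetical.
FULL_ROOM_CPUS = 500          # "order of 500 cpus" when the room is full
WATTS_PER_CPU_HIGHBALL = 150  # pessimistic per-CPU draw, incl. rest of system
WATTS_PER_CPU_HOPED = 100     # hoped-for average per-CPU draw
COST_PER_CPU = 1_000          # ~$1K/CPU for 2U dual Athlon nodes
RENOVATION_COST = 150_000     # total room renovation, dominated by the AC


def room_load_kw(cpus: int, watts_per_cpu: float) -> float:
    """Total electrical load in kW; essentially all of it becomes heat."""
    return cpus * watts_per_cpu / 1000


def hardware_to_renovation_ratio(cpus: int, cost_per_cpu: float,
                                 renovation_cost: float) -> float:
    """How many times the node investment exceeds the one-time room cost."""
    return cpus * cost_per_cpu / renovation_cost


if __name__ == "__main__":
    print(room_load_kw(FULL_ROOM_CPUS, WATTS_PER_CPU_HIGHBALL))  # 75.0
    print(room_load_kw(FULL_ROOM_CPUS, WATTS_PER_CPU_HOPED))     # 50.0
    ratio = hardware_to_renovation_ratio(FULL_ROOM_CPUS, COST_PER_CPU,
                                         RENOVATION_COST)
    print(f"{ratio:.2f} to 1")                                    # 3.33 to 1
```

At the highball estimate, a full room draws about 75 kW, which matches the room's 75 kW of power and cooling capacity. At the hoped-for 100 W per CPU, the load is 50 kW. The resulting ratio of roughly 3.3 to 1 is consistent with the "3 to 1 or better" estimate.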
In a way this is wistfully interesting. Old Guys (tm) will remember well the days of totally centralized compute resources, where huge, expensive facilities housed rows of e.g. IBM 370s. There were high priests who cared for and fed these beasts and acolytes who scurried in and out. One prayed to them in the form of Fortran IV card decks with HASP job control prologues/epilogues, and awaited the granting of one's prayers in the form of green-barred lineprinter output (charged per page, including the bloody header page) placed into the box labelled with one's last initial. It was all very solemn, expensive, and ritualized.

Then first the minicomputer, then the PC, liberated us from all of that. An IBM PC didn't run as fast as a 370, but time on the 370 was billed at something like $1/minute of CPU. Time on the PC, even at a capital cost of $5K for the PC itself (yes, they were expensive little pups), was amortized out over YEARS (at 1440 minutes/day). Even using the PC as a terminal to the 370 allowed one to edit remotely instead of on a timeshare basis (billed at $1/minute, damn it!) and saved one loads of connect time (hence money). And then came Sun workstations, faster PCs, and linux, and somewhere in there computing became almost completely decentralized with a client/server paradigm -- yes, there were a few centralized servers, but most actual computation and presentation was done at the desktop.

Even early beowulfs were largely spread out and not horribly centralized. An 8-node or 16-node system could fit in an office; a 32-node or even 64-node shelved beowulf could fit in a small server room. The beauty of them was that you bought one for YOUR research, you didn't share it (time or otherwise), and once you figured out how to put it all together it didn't require much care and feeding, certainly not at the high priest/acolyte stage (although cooling even 32 nodes starts to be serious business). Alas, we now seem to have come full circle.
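Some simple arithmetic makes the 370-versus-PC amortization argument above concrete. The post says only that the $5K PC was amortized over "YEARS". This sketch assumes a hypothetical three-year service life and, like the post, counts all 1440 minutes of every day. The function name `cost_per_minute` is illustrative.

```python
PC_CAPITAL_COST = 5_000  # dollars, from the post
MAINFRAME_RATE = 1.00    # dollars per CPU-minute on the 370, from the post
MINUTES_PER_DAY = 1440   # from the post: the PC is available around the clock
YEARS = 3                # assumption: the post says only "YEARS"


def cost_per_minute(capital: float, years: float,
                    minutes_per_day: int = MINUTES_PER_DAY) -> float:
    """Capital cost spread evenly over every minute of the service life."""
    return capital / (years * 365 * minutes_per_day)


if __name__ == "__main__":
    pc_rate = cost_per_minute(PC_CAPITAL_COST, YEARS)
    print(f"PC: ${pc_rate:.4f}/min vs. mainframe ${MAINFRAME_RATE:.2f}/min "
          f"({MAINFRAME_RATE / pc_rate:.0f}x cheaper)")
```

Under that assumption, the PC works out to about a third of a cent per minute, roughly 300 times cheaper than billed 370 time. Even a much shorter service life leaves the PC far ahead.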
Beowulfs are indeed COTS supercomputers, but high-density beowulfs are rackmounted and put in centralized, expensive, often shared server rooms, and they strongly resemble those centralized computers from which we once were freed.

I exaggerate the woe, of course. The whole cluster NOW is transparently accessible at gigabit speeds from your desktop across campus (and wouldn't be any MORE accessible if you were sitting at a workstation in the room with it listening to 80 dB worth of AC roar in your ear), linux is excruciatingly stable (when it isn't unstable as hell, of course:-), and once you get the nodes installed and burned in, a human needs to actually visit the cluster room only once in a long while. We've replaced the high priests and acolytes with sysadmin wizards and application/programming gurus, but this is a welcome change, actually (they may appear similar, but philosophically they are very different indeed:-). Still, centralization threatens freedom to a greater or lesser extent -- it puts control much more into the hands of administrators, costs more, and involves more people in decisions.

Not much to do with the original question, sure, but I needed a little philosophical ramble to start my day. Now I have to write an hour exam for my kiddies, which is less fun. Last day of class, though, which IS fun! Hooray! (It isn't only the students that anticipate summer...;-)

rgb

-- 
Robert G. Brown                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
More information about the Beowulf mailing list
