Beowulf Questions
Robert G. Brown rgb at phy.duke.edu
Mon Jan 6 07:53:13 PST 2003
On Mon, 6 Jan 2003, Mark Hahn wrote:

> OK, so grid is just cycle scavenging with its own meta-queueing,
> its own meta-authentication and its own meta-accounting?

And perhaps most important (but not yet significantly implemented, although there is a very serious project here at Duke, called Computers On Demand -- COD, to implement it) -- a meta-OS-environment and meta-sandbox for the distributed users that can be loaded literally on demand (at a suitable time granularity, of course:-). Multiuser/multitasking with a vengeance, where "the network is the computer" on a very broad scale indeed.

Mark, you shouldn't discount the economics of cycle scavenging or refer to it as "just" that. In one sense, all multiuser/multitasking computing is cycle scavenging, but who would deny its benefit?

Even now, things like Scyld can be booted from e.g. floppy on a node, leaving the node's hard disk and primary install intact. Or, people can install two or three or ten bootable images on a modern disk and choose between them with grub. Surely it isn't crazy to develop tools to take the individual craft and handwork out of these one-of-a-kind solutions and make them generally and reliably implementable?

Just to give you a single example of the economics that drive this process, Duke has gone from a couple of clusters (mine and one over in CS) to literally more clusters than the University per se can track over the last five years. From on the order of 10-100 nodes total to thousands of nodes in tens of departments in maybe 100 independent groups.

Some groups (the EP folks like myself) are always and inevitably cycle-hungry. They build all the nodes they can afford and run on them continuously. They don't need much in the way of network. They do need a "known environment" on the nodes -- e.g. PVM or MPI or GSL or ATLAS or ETC libraries, the right OS and release number to support their binaries, appropriate permissions.

Other groups need more nodes than they can ever afford to buy, but only for three short weeks a year. When they need them, they REALLY need them, but the rest of the time they are doing something else (e.g. analyzing the results of those three weeks, thinking, writing, teaching).

Some groups need tightly coupled, synchronous clusters. Others are EP.

The potential benefits that can be obtained by providing a suitable interface to permit these various groups, with their widely disparate needs and usage patterns, to mutually optimize their investment and usage patterns across ALL organization-level (in our case initially departments, eventually perhaps the University itself) cluster resources are significant -- equal or greater in value to the total value of all of those resources, presuming that the AVERAGE resource utilization is likely to be below 50% as things currently stand (a not unreasonable number, BTW).

This may not be Grid Computing (big G, on the RC5/Seti scale), but is rather grid computing (small g, on a purely sensible institutional scale) within a single organization with a single domain of trust, an adequate backbone and other infrastructure, and a unified model and toolset for permitting optimization of the utilization of its resources at the institutional level rather than the individual research group level. Here it makes sense, I think, although time and experience will prove this right or wrong. At any rate, there is a clear economic benefit that drives the development process -- it remains to be seen whether or not it can be realized.

The tools being developed will really revolutionize "node" resource allocation, by the way. In principle they will allow the automated recovery of cycles wasted at night on e.g. computer systems that support the undergraduate physics labs here -- by day they run NT and are in use by students.
By night they are totally wasted and sit idle, consuming electricity that the University pays for to no visible profit, while my work limps along on all the systems I can afford but would greatly benefit from more. On other scales the toolset might control the allocation of a department-wide compute cluster among four or five groups of researchers on an access-granularity scale of hours to days, or reallocation of the undergraduate clusters provided as "terminals" to the network to EP tasks during holidays and breaks.

I expect/hope that over the next two or three years, GPL tools will be developed (some of them here by Justin Moore and Jeff Chase) and perfected that permit a midsized or larger organization to increase their utilization efficiency for a wide range of compute cluster/node resources by a factor of 2-3 with no significant degradation of security and with an overall INCREASE in the productivity of just about everybody associated with a shared group.

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
More information about the Beowulf mailing list
