[Beowulf] Redmond is at it, again
Robert G. Brown rgb at phy.duke.edu
Thu Jun 3 07:39:40 PDT 2004
On Thu, 3 Jun 2004, Laurence Liew wrote:

> Hi,
>
> Thanks for the comments. From the discussion, I think we can
> separate the issue into 4 distinct issues:
>
> (1) providing the OS distribution and updates
>
> (2) providing a cluster toolkit which makes #1 into a working cluster,
> i.e. cluster management tool, mpich, pvm, blas, scalapack etc. (and updates)
>
> (3) OS support
>
> (4) cluster support
>
> For (1), for experienced cluster admins, paying a per-node price does not
> make sense, as they can take ONE copy from a mirror (free or paid) and
> duplicate it to hundreds and thousands of nodes. So in this scenario, the
> price point is basically one copy (per year).
>
> For (2), some (and there are a lot of less capable sysadmins out there)
> would be willing to pay for an easier way to do (1) above.
>
> For (3), it is assumed that in a university setting, the sysadmin would
> be Linux OS aware and can support himself. This is not really true in
> some cases. We do get support calls like "how to set up printing from the
> cluster".
>
> For (4), most Linux sysadmins who manage a cluster are not cluster
> admins, and they cannot answer questions like "how come my MPI code does
> not work" or "why is SGE not able to run my job". Typically we have to
> answer those questions on behalf of the sysadmins. (Of course we do have
> sysadmins that learn quickly and become self-supporting and less reliant
> on us - but not always.)
>
> As clearly articulated by RGB, providing OS distros and updates should
> be minimal cost, as most of the costs have been borne by others. - I agree.
>
> And a cluster distro with some value add like integrated cluster
> management tools, mpich, blas, scalapack... again a small fee is
> acceptable for the work done in integrating and packaging it.

Beautifully and succinctly put. I personally don't do succinct, alas;-) Real added value is worth a fee, although it isn't clear that the fee should scale like cluster size per se.
It is perfectly reasonable, though, for a vendor to set up several price points that effectively give a price break to somebody just getting started with a tiny cluster.

> So I guess for cluster vendors out there what the community would like
> to see in any cluster distro pricing is:
>
> a) low per unit costs if any (say $10-$20/node) for a cluster distro
> WITH VALUE ADDED, OR a one time annual site license for unlimited
> distributions within the campus

I don't view $10 per node as a low unit cost, and don't think that this cost should in general scale with the size of the cluster or LAN or whatever. As has been pointed out several times, one buys "one copy" of the product, which can then be used once or a hundred times. There are no additional marginal costs to you (the seller of the product) if it is used on a two-node cluster or a 2000-node cluster (issues like dealing with the failure of tool X to scale to 2000 nodes aside, but those can be addressed with SUPPORT charges, discussed below, that scale with usage and time).

However, this is really a "what the market will bear" issue where my opinion isn't important. As you note in your summary and Joe has remarked offline, there are a number of cluster markets emerging where the administrators are not cluster experts (and sometimes not even Linux/Unix experts) and who are perfectly happy to pay money on a per-node or workstation LAN client basis as an alternative to hiring an experienced administrator or cluster programmer (which does come at a nontrivial baseline cost of, say, $40-100K). These are groups that might not be able to set up e.g. a repository, PXE, kickstart and yum on their own, groups that cannot build/package/assemble PVM, MPI etc. on their own, groups that don't know how to install and configure some canned top-level cluster package, e.g. BLAST, on their own. They want a turnkey solution that can run "unmanaged" locally or with minimal, fairly poorly trained, local management. They will pay for it.
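The "one copy" argument is easy to put in numbers. Here is a minimal Python sketch assuming the $10/node figure from the quoted proposal and a hypothetical flat annual site license of $1000 (an illustrative number, not any vendor's actual price). It shows the per-node bill growing linearly with cluster size while the flat fee, like the seller's marginal cost per extra node, stays put.

```python
import math


def per_node_total(nodes: int, price_per_node: float) -> float:
    """Annual cost if the distribution is licensed per node."""
    return nodes * price_per_node


def break_even_nodes(site_license: float, price_per_node: float) -> int:
    """Smallest cluster size at which per-node pricing costs at least as
    much as the (hypothetical) flat site license."""
    return math.ceil(site_license / price_per_node)


def compare(cluster_sizes, price_per_node, site_license):
    """Return (nodes, per-node total, flat license) rows for each size.

    The flat-license column never changes with node count, mirroring the
    zero marginal cost to the seller of one more installed copy.
    """
    return [(n, per_node_total(n, price_per_node), site_license)
            for n in cluster_sizes]
```

Under these assumptions, `compare([2, 64, 2000], 10, 1000)` gives per-node totals of $20, $640 and $20,000 against a constant $1000, and `break_even_nodes(1000, 10)` is 100, so any cluster past 100 nodes pays more under per-node pricing.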
Determining whether there are ENOUGH of them to support a company providing the solution at prices they will pay, whether they will remain customers or learn enough to do it on their own and save the money (high prices being a strong incentive to do so), and whether a company can survive as "software only" or the market is best pursued in the Penguin/Scyld model of selling hardware, software, and integrated hardware/software (so one can make money any of the above ways when times are tough and clients are few in any particular dimension) -- well, that's what business and risk are all about.

This particular list, of course, likely contains relatively poor candidates for customers, as most of the longtime members are strictly DIY kinds of people who are used to squeezing dollars until they squeak and who will only pay for things that provide real, measurable value compared to doing things themselves or with staff resources. Universities in particular tend to have both the expertise and the opportunity-cost time available to do a lot of things themselves. However, you also see new list members who in some cases would be ideal customers, as they are clueless and need help.

Remember also that the market for integrated solutions largely exists because of sheer laziness on the part of the NON-clueless cluster persons around the world. If every cluster tool were properly packaged to be auto-rebuildable (and many of them are so packaged, but not all) and hence could be included in an open distribution like Fedora or Debian with "no marginal effort" beyond, e.g., an rpm --rebuild on an upgrade (plus any debugging/patching associated with actual library changes), then ALL open distributions would be cluster distributions. Fedora, for example, comes with pvm and LAM MPI ready to roll (as did RH before it). SGE is lacking, as SGE isn't distributed in cleanly rebuildable source rpm form (why not, I don't know, as the exercise of making it so would likely improve the product). mpich is also missing, which seems silly. blas it has. scalapack it doesn't have. However, it COULD have ALL of these things if somebody who needs them, and builds them into rpms for local distribution anyway, would simply "own" the package and contribute/maintain it for Fedora Core.

This isn't exactly a business opportunity, but to me it makes sense for e.g. the NSF, DOE and NIH to get together and fund a cluster group for the sole purpose of "owning and maintaining" these and other core packages "forever". I don't mean developing MPICH per se -- I mean doing the requisite work of packaging and maintaining MPICH for the open distributions so that its stable version is primarily distributed THERE instead of from its project-specific website. This could be supported by simply adding it to the charge of the various groups developing these packages (many of which are likely supported by government grants already), or by creating and funding a group at some university or government lab, or a consortium of the above.

This is a better idea than the OSCAR/Rocks/etc. approach of building a "cluster distribution", open or closed. What is this "cluster distribution"? Nothing but a collection of packages. These packages can equally well be made a standard part of standard open distributions, so EVERY such distribution is a cluster distribution. The "cluster distributions", even funded commercial ones such as Scyld, tend to lag the open or commercial Linux distributions from which they are inevitably derived by months to years, to the point where their base is insecure, no longer maintained, and missing all sorts of recent software developments and improvements. This is a serious, and largely unnecessary, problem (although one can argue for an exception in the case of Scyld, which does quite a bit of additional work to alter the base distribution fairly radically, with a consequent need for extensive testing and longer-term maintenance and stability).
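The "no marginal effort" claim rests on every tool shipping as a cleanly rebuildable source RPM. Below is a minimal Python sketch of what a package owner's upgrade pass might look like, assuming a local directory of `*.src.rpm` files; the directory and both function names are hypothetical. Modern RPM releases spell the `rpm --rebuild` mentioned above as `rpmbuild --rebuild`, which is what the sketch calls.

```python
import shutil
import subprocess
from pathlib import Path


def rebuild_commands(paths):
    """One `rpmbuild --rebuild` command per source RPM, in sorted order.

    Entries that are not *.src.rpm files are skipped, so a raw directory
    listing can be passed in.
    """
    srpms = sorted(str(p) for p in paths if str(p).endswith(".src.rpm"))
    return [["rpmbuild", "--rebuild", srpm] for srpm in srpms]


def rebuild_all(srpm_dir, dry_run=True):
    """Rebuild every SRPM in srpm_dir (a hypothetical local mirror).

    With dry_run=True the commands are only printed. Returns the SRPMs
    whose rebuild failed, i.e. the ones that need real debugging/patching.
    """
    commands = rebuild_commands(Path(srpm_dir).glob("*.src.rpm"))
    if not dry_run and commands and shutil.which("rpmbuild") is None:
        raise RuntimeError("rpmbuild is not on PATH")
    failures = []
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            failures.append(cmd[-1])
    return failures
```

Running it once with `dry_run=True` shows what would be rebuilt. For a well-packaged tool the failure list stays empty, and anything that lands in it is the "debugging/patching associated with actual library changes" that still needs a human.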
With yum and kickstart the issue is even MORE moot. A "cluster node" fundamentally differs from a "desktop workstation" by nothing more than the selection of packages installed (including a package with a suitable %post to alter the configuration, or the same script run some other way). The packages themselves can be collected into package groups, so that converting a workstation into a cluster node, or into a workstation that is ALSO a cluster node, takes a single command, or a single line in a kickstart file plus a reinstall. Going this route lets not only a cluster but the whole institutional-scale Linux IT environment scale close to the theoretical limit of scalability.

> b) separate out the support charges, so that experienced sysadmins can
> pay less or opt to pay for support (and finger pointing), and not bundle
> the support into the per node price.

HERE is where I think there is a real market. As you say, 1) and 2) "should" be cheap to free under any circumstances, except perhaps where you add real value in the form of locally developed and maintained tools. In most cases any price "should" be institutional with little to no per-unit scaling, but of course this "should" is really controlled by Adam Smith's invisible hand, and if you, or Red Hat, can get folks to pay for it per box, have at it -- it is just like printing money on a color laser printer except that it is legal. In the long run, though, expect to have to deliver real value for real money, and don't expect to get rich quick without a really new idea or killer app, as most people will eventually decide to get a color laser printer of their own.

3) and 4) in your list (and an unlimited-redistribution update feed for a "certified and ready to play" repository) are what I consider to be "support" and worth paying for. I personally think that the feed should be cheap cheap cheap (go for the mass market, viewing free free free Fedora or Debian as the "competition") and should scale only very weakly indeed with institution size: e.g. "personal" at $5/year for yum-driven updates and access to the repository, $15/year for "household/departmental" rsync access so you can mirror the repository for something like my household, up to maybe $1000/year for an institutional feed/mirror, where an "institution" is something like duke.edu, with a couple of sensible intermediate points in between.

However, even at Duke, 4) in particular is a major issue. The University can and does support Linux very well. We have considerable Linux, Unix and general systems expertise on campus, and even have a decent amount of cluster-specific expertise on campus, available on a consultative basis to would-be cluster builders. We have and support local/group-level clusters, department-level clusters, and institutional shared cluster spaces where groups buy machines but don't have to install them, care for them, or feed and cool them, as that is done in local infrastructure with local administrative expertise. So we can solve 1-3 very efficiently -- close to the theoretical limit of efficiency, although we do have to build and package various cluster tools that we need but that aren't yet in distros.

4) we CANNOT solve in this way without maintaining a "cluster programming group" of some sort with a very wide range of expertise indeed, and reselling its services to all takers (or providing them to all takers "for free"). This is unwieldy and not cost-effective. I and several others on campus consult for groups on clustering issues, but I cannot efficiently figure out (say) a parallelized quantum chemistry application, or BLAST, or help somebody in the medical center parallelize an antique Fortran application that is the key to their studies of electron transport processes in human tissue. With that said, I don't know how easy it would be to get the groups doing this sort of thing to buy the required services from an ISV.
Some of them would rather hire a programmer and do it themselves, or are willing to take the time to educate a postdoc or graduate student and then maintain a chain where they educate incoming postdocs and grad students (well-known sources of slave labor, cheap relative to a "real" programmer:-).

Still, I think this is where Joe and others in the business make money. Here you aren't providing a "distribution" per se (although of course you may be). You aren't providing a cluster toolkit per se (although you may be). You may well be providing raw OS support to relatively ignorant groups, although doing so inside a real institution, with its security and other constraints, may or may not be possible in all cases even where the group in question is ignorant (at Duke this would be difficult, for example). You are very definitely providing a valuable service in terms of installing task-specific tools, training people in their use, maintaining them (fixing them when they break, updating them as appropriate), and supporting them (answering the 800 possibly silly questions their use generates per year). And there is plenty of room to do this more cheaply than the institution or group can do it themselves and still make money, assuming e.g. 1/2 an FTE or more to provide the service locally.

How to scale this I don't know. The real costs of providing the service still don't scale well with the number of nodes -- it is more a matter of the number of users, the level of ignorance of the users, and the number of personality disorders in the user base. One really stupid and obnoxious person with a four-node cluster can easily generate more support work than thirty competent and pleasant individuals working on a massive cluster. Some poorly packaged software tools might require per-node effort to install, and providing the service as part of a turnkey (preinstalled) cluster clearly requires per-node effort. Again, the marketplace is what will decide, in the long run, what you can charge for this sort of thing. In that sense it is more like running a consultancy than a software business -- you can make a good living, but you won't get rich, as it is labor-intensive and labor DOESN'T scale all that well; the client is likely to be doing locally all the parts of the labor that do scale well, and this is what is left over.

rgb

-- 
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
More information about the Beowulf mailing list
