[Beowulf] Register article on Opteron - disagree
Robert G. Brown rgb at phy.duke.edu
Mon Nov 22 16:16:13 PST 2004
On Mon, 22 Nov 2004, Joachim Worringen wrote:

> The list is provided in XML for just this purpose. However, the list is
> designed to show the list of the 500 "fastest" computers - and if many
> systems of the same type are among these 500, it is valid information,
> too, isn't it?

Valid yes, particularly useful no, especially when it could be provided in summary form.

> Concerning Geotrace: they probably operate 6 of these clusters
> independently for some reason - why shouldn't they be allowed to list them?

Again, what useful purpose is served by listing any after the first, aside from things like impressing shareholders? And I'd be happy to let EVERYBODY list their clusters, one at a time, if there were decent integrated tools with which I could do searches on a structured database of benchmarks, cluster configuration data, and cluster cost data. Especially if they stopped calling it a "Top 500" list and just provided the database and a variety of measures of performance that can be searched and sorted according to the needs of the user. Open it up to anybody who wants to register a cluster! What is there about "500"? Is there some part of site maintenance for a tiny database that scales, particularly, with the number of sites registered? Oh. That would interfere with the MARKETING, wouldn't it.

> > c) It totally neglects the cost of the clusters. If you have to ask,
> [...]
>
> First, you'd need to define 'cost'. I.e., how would you take into
> account the fact that some of the computers are built by volunteers w/o
> any effective pay? Does this actually make the computer more
> cost-effective? Or the typically much higher maintenance cost for
> clusters opposed to "big irons" that many sites experience?

I'm perfectly happy to accept cost broken down by category, but I'm personally primarily concerned with hardware costs.
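As a rough illustration of the kind of searchable database described above, here is a minimal Python sketch. The record fields, the cost categories, and the search() helper are all hypothetical; nothing like this is published by the Top 500 project. Cost is kept broken down by category so that each user decides which categories count, with hardware alone as the default.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterRecord:
    """One entry in a hypothetical open cluster database (all fields are illustrative)."""
    name: str
    nodes: int
    rmax_gflops: float  # measured benchmark result, e.g. Linpack R_max in GFLOPS
    # Cost broken down by category, e.g. {"hardware": ..., "labor": ...}, in USD.
    costs_usd: dict[str, float] = field(default_factory=dict)

    def cost(self, categories: tuple[str, ...] = ("hardware",)) -> float:
        """Sum only the cost categories the user cares about (missing ones count as 0)."""
        return sum(self.costs_usd.get(c, 0.0) for c in categories)

    def dollars_per_gflop(self, categories: tuple[str, ...] = ("hardware",)) -> float:
        return self.cost(categories) / self.rmax_gflops


def search(records, min_rmax_gflops=0.0, max_nodes=None, categories=("hardware",)):
    """Filter on user-chosen criteria, then sort by $/GFLOPS, cheapest first."""
    hits = [
        r for r in records
        if r.rmax_gflops >= min_rmax_gflops
        and (max_nodes is None or r.nodes <= max_nodes)
    ]
    return sorted(hits, key=lambda r: r.dollars_per_gflop(categories))


if __name__ == "__main__":
    # Made-up numbers, purely to show the mechanics; not real clusters.
    db = [
        ClusterRecord("A", 128, 500.0, {"hardware": 250_000.0, "labor": 50_000.0}),
        ClusterRecord("B", 1024, 4000.0, {"hardware": 4_000_000.0}),
        ClusterRecord("C", 64, 300.0, {"hardware": 120_000.0, "labor": 80_000.0}),
    ]
    for r in search(db):
        print(r.name, round(r.dollars_per_gflop()), "USD/GFLOPS (hardware only)")
```

With the made-up records, sorting on hardware cost alone ranks C, A, B; passing categories=("hardware", "labor") moves A ahead of C. The point is that the user, not the list maintainer, picks the cost model.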
Labor is always available at the local going rate, which will sometimes be opportunity cost from an existing labor pool, will sometimes be hired guns, and will sometimes involve dedicated local personnel. People can do the math for that tradeoff using their own knowledge of their own organization and labor resources. On the other hand, the time required to get each and every hardware configuration under the sun quoted to do hardware cost estimates IS a significant expense in human time in and of itself in cluster design, and an expense that could at least partially be ameliorated by such a database. It would also help to "keep the vendors honest" if everybody published their cluster hardware costs up front. Hidden costs and privately negotiated deals serve nobody (really) but the system vendors, and clusters like BG and the VT cluster look entirely different when true market price is considered.

As for your assertion that "many sites experience" higher maintenance costs for clusters compared to big iron -- I'd like to see that statistically documented beyond "many sites". Unfortunately, I can't, because there is no reliable collection (that I know of) of this sort of data, and everybody relies on anecdotal accounts (likely as good or bad as your last experience with either one) or on vendor TCO horror stories designed to make you feel good about paying them enough money to purchase two or three times the compute power you end up with, in exchange for the promise that what you get will be reliable and "lower your TCO". Note that this isn't stating an opinion on TCO either way. I can tell some stories of my own both ways. In my opinion most big iron supercomputers ARE clusters these days, so what one is really differentiating in any event is the care with which one selects a vendor that can and will properly support the hardware it sells you, at any price and however it is labelled.

> [...]
> > sponsoring institutions are listed).  If they want to do us a real
> > public service, they could do some actual computer science and see if
> > they couldn't come up with some measure a bit richer than just R_max and
> > R_peak....
>
> Do you know Jack Dongarra's 'HPC Challenge' benchmark, and do you track the
> discussions on this, e.g. on the SC04 panel? My proposal is to make
> inclusion of the HPCC results mandatory for new entries in the TOP500,
> but keep the list ranked by Linpack for consistency with the results
> over time.

No, I don't, but your proposal is a sensible one, depending on whether or not HPCC measures at least some things of use and interest to people who do something other than linear algebra, and on the extent to which it includes a useful spectrum of microbenchmark results, which I personally think are a lot more useful than high-end "challenge" results likely to be directly applicable to a relatively small range of cluster designers/users.

As I said in my response rant that's in proctor-limbo, the way things are now somebody could actually have the fastest HPC computer in the world in terms of (say) the balance between CPU and network and memory to support some fine-grained synchronous computation (one that scales out to the largest number of the fastest nodes) and still have the NUMBER of those nodes be only 64 or 128 -- far too small to make a decent Top 500 impression. To that I'll add the proposition that a proper HPC "benchmark suite" should be a lot more concerned with the scaling properties of the suite components than with their utility or "challenge" nature, per se. In many cases, parallel programs have quite predictable scaling properties. One annoying feature of the top 500 is that the linpack scaling properties are so general that it is possible to scale to thousands or tens of thousands of processors (as the current top 100 clearly show) on top of gigabit networks in general. Where are the theoretical curves?
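One crude way to draw such a theoretical curve is sketched below in Python, under loudly stated assumptions: the model is Amdahl's law (a serial fraction that never speeds up) plus a communication cost that grows as log2 of the node count, as a tree-structured reduction or barrier might. The function names and parameters are hypothetical, and real codes and networks will follow their own curves. Dividing a measured speedup by the modeled one gives a number above 1 for a design that beats the curve and below 1 for one that lags it.

```python
import math


def predicted_time(p: int, t1: float, serial_frac: float, comm_per_level: float) -> float:
    """Modeled run time on p nodes (an assumed model, not a measurement).

    Amdahl term: t1 * (serial_frac + (1 - serial_frac) / p), where t1 is the
    single-node run time.
    Communication term: comm_per_level * log2(p), an assumed cost for a
    tree-structured reduction or barrier; zero on a single node.
    """
    compute = t1 * (serial_frac + (1.0 - serial_frac) / p)
    comm = comm_per_level * math.log2(p) if p > 1 else 0.0
    return compute + comm


def predicted_speedup(p: int, t1: float, serial_frac: float, comm_per_level: float) -> float:
    return t1 / predicted_time(p, t1, serial_frac, comm_per_level)


def relative_to_curve(measured_speedup: float, p: int, t1: float,
                      serial_frac: float, comm_per_level: float) -> float:
    """Greater than 1: the design beats the modeled curve; less than 1: it lags it."""
    return measured_speedup / predicted_speedup(p, t1, serial_frac, comm_per_level)


def best_node_count(t1: float, serial_frac: float, comm_per_level: float,
                    max_p: int = 1 << 16) -> int:
    """Power-of-two node count (up to max_p) that minimizes modeled run time."""
    candidates = [1 << k for k in range(max_p.bit_length())]
    return min(candidates, key=lambda p: predicted_time(p, t1, serial_frac, comm_per_level))
```

For example, with an assumed single-node time of 1000 units, no serial fraction, and 10 units of communication per tree level, best_node_count(1000.0, 0.0, 10.0) returns 64: past that point, adding nodes makes this fine-grained, synchronous job slower under the model, so a machine tuned for it could top out at a few dozen nodes and never rank on a Linpack-ordered list.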
What designs are in any sense beating those curves (indicating true innovation) versus lagging them (indicating a sloppy design or perhaps the nature of real engineering)? What designs are living square on the curves, but some at half the hardware cost of others?

And of course, none of this is at all useful to people who live on different curves, which is one of my MAJOR complaints. So please, forget the "challenge" part (an open invitation to marketing hype if ever there was one) and provide carefully selected HPC archetypes that might actually be of use to people trying to engineer predictable performance scaling for problems that RESEMBLE those archetypes. Speaking for all the people who do grid-like, coarse-grained or embarrassingly parallel applications AND for the people who do computations solving ODE sets with long-range interactions AND for the biogenetics people and for the many, many others doing work that is not well represented by Linpack, please include archetypes for those sorts of computations, however "boring" and non-challenge-like they may seem at first glance. Of course, one would have to move away from Linpack as the prime determinant of Top 500 membership for this to really do any good, as a lot of potential competitors might have plenty of compute nodes but not much of a network, or a hell of a network but not that many compute nodes.

> Concerning scaling, R_half gives you some impression already (if
> provided - not all sites do so, probably with a reason).
>
> Apart from this, everyone is free to actually introduce a new kind of list.

True.

> > Now, with that said (and it needed to be said, it did it did) the only
> > thing most real cluster computer buyers care about is price/performance.
> [...]
>
> I know that at least *some* customers also care for things like
> reliability, features like job migration, low maintenance overhead and
> so on, and include this in their discussion.
> This is reflected by the

All of those are a part of price/performance (or, if you prefer, cost/benefit analysis). But no argument. Still, I think that it would be very useful to indicate $/FLOP, if one could ever get honest numbers. Even dishonest numbers would be useful.

> fact that not everything in the TOP500 is about beowulf-style
> clustering (with the lowest $/linpack-FLOPS ratio). Thus, it might
> actually give a fair view of the $/"user value" - if you also, in some
> way, could filter out the usual political bias towards or against
> certain systems (like, for example, institutions in the U.S. that are
> not "allowed" to buy the sort of computer that solves their problems
> most efficiently).

True enough. A lot of this would be improved by just increasing the range of what is reported to include more detail, and maybe not putting it all on one line.

> > SO I'd have to say that I doubt that the authors of the article were
> > particularly well informed, and that AMD is likely to be around and
> > kicking for a few years yet. Look, even the Power series hasn't
> > disappeared and it has almost no top 500 presence at all, if you
> > discount BG itself as IBM showing its marketing clout and finding a use
> > for 700 MHz CPUs in Very Large Quantities...
>
> IBM SP (Power Architecture) has 52 Entries - 10% of the list is just a
> little bit different from "almost no top500 presence at all"!? Or did
> you mean PowerPC?

The latter, sorry. 8 entries (admittedly, four in the top 10:-) -- three of which are big blue, and one of which is the infamous VT cluster.

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
