[Beowulf] Re: vectors vs. loops
Robert G. Brown rgb at phy.duke.edu
Tue May 3 12:11:18 PDT 2005
- Previous message: [Beowulf] quick and dirty method for starting job on another node?
- Next message: [Beowulf] Announcing nettee 0.1.4
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, 3 May 2005, Philippe Blaise wrote:

> Robert G. Brown wrote:
>
> >....
> >
> >Still, the marketplace speaks for itself. It doesn't argue, and isn't
> >legendary, it just is.
>
> But, does the hpc marketplace have a direction ?

Of course it does. At the moment I would say that the direction isn't readily apparent only because the last decade plus has been so overwhelmingly directed towards COTS clustering and away from big iron supercomputers. Clustering has been so fantastically successful that clusters have nearly saturated the available HPC marketplace, enabled new growth in completely new directions (such as bioinformatics and rendering), made inroads into completely new non-scientific disciplines (e.g. economics), started to appear in medical applications (not just research, applications), created whole new genres of games, been the basis for the most persistent and successful part of the internet revolution, and STILL continue to evolve and grow and create new markets and applications.

In the HPC market, any apparent lack of direction is caused by the overwhelming satisfaction and degree of cost-benefit optimality in place in all of the clusters in operation in scientific labs and science departments all over the planet, but especially in places with modest resources or a lack of easy access to expensive centralized "supercomputing" centers. There are clusters in operation at small technical colleges in India where buying ANYTHING to support science and the technical development of students is an agonizing process due to pure and simple lack of resources. There are clusters in operation at huge computer centers in the United States where they buy hundreds to thousands of nodes a year.

The war is over. Clusters won, and will continue to win until a better (more cost-beneficial) paradigm comes along. Even then, clustering is a "super paradigm" that will likely take systems based on the better paradigm and build a cluster out of them.
During this entire period vector systems have never gone completely out of favor, simply because they ARE suitable for a certain class of problems. Some problems in that class are considered "valuable" to both businesses and the scientific community -- valuable enough to justify spending a relatively large amount of money on vector computers in order to accomplish the work faster.

Even here, though, the obvious and frequently readily accessible power of parallelism has really pushed vector systems from being "the" supercomputing model in the form of standalone units with relatively few processors towards being CLUSTERS of processors (with whatever network/memory model/interconnect). Nearly anybody doing useful work at all can do more useful work with several systems working in parallel, due to scaling and physical limits that will ALWAYS limit how much work one can do with a single "computer". And I'm not talking about the problem of "parallelizing a task" with IPCs and everything -- I'm just talking about using a cluster to run tasks in embarrassingly parallel mode, the way nearly anybody can get approximately linear speedup relative to running on a single system.

> Few years ago, some people had a "fantastic vision" to replace the
> vector machines market :
> use big clusters of SMPs with the help of the new paradigm of hybrid
> mpi/openmp programming.
> Then the main vendors (usa), except Cray, were very happy to sell giant
> clusters of smp machines.
>
> Nevertheless, the japanese guys built the "earth simulator" ; which is
> still the most powerful machine in the world
> (don't trust this stupid top500 list).
>
> Then Cray came back ... with vector machines...
>
> Don't underestimate the power of vector machines.
> Yes Fujitsu or NEC vector machines are still very efficient, even with
> non contiguous memory access (!!).
>
> One year ago, the only cpus that sometimes were able to equal vectorial
> cpus were alpha (ev7) and itanum2 with
> big caches and / or fast memory access. Remember that alpha is dead.
> Have a look to the itanium2 market shares.
>
> The marketplace is not a good argument at all.

The "marketplace" is the ONLY argument that matters. The economics of giant "big iron" vector machines is THE dominant force that underlies all current efforts associated with them. Fujitsu, NEC, Cray, SGI, IBM, maybe even Sun -- the surviving big iron companies don't survive on low-margin sales, and never make a massive investment without expecting a profit (a high margin profit) at the end of it. They believe that they can spend a fortune designing a system that will never sell to more than a few dozen companies or operations (at millions of dollars apiece) and still make money. Lots of money.

As long as they are correct, this sort of design will persist. How correct they are depends in part on how well they market BOTH the machines AND the "importance" to society of the relatively few problems that they are supposed to solve better/cheaper/faster than the alternatives permit. I suspect that in several cases the companies are de facto subsidized by various governments or parts of governments interested in supporting the continuing development of powerful vector systems for reasons of their own (military, industrial, economic), where the application space on the face of things might not really justify it.

Remember a point that has been repeatedly made on this list -- the further you are from "commodity" space (measured in mass market units sold and the number of vendors that support it) the more costly things get on the consumer side of things. The economics of system design and marketing is highly nonlinear, highly competitive, and in places survives on razor-thin margins.

Note that the alpha did NOT save DEC or provide much benefit to DEC's successive merger-inhalers.
Itanium has at best been a break even proposition for Intel -- when I've talked to Intel people directly they tend to hem and haw a bit about its future (corporate party line aside) and I suspect that they actually LOST money on Itanium, which isn't really all that surprising considering how expensive it was to build and how little excitement it generated, at least in HPC.

The point being that the market has had opportunities time and again to reward DEC or Intel for building fast memory, big cache processors. Instead the market (and I'm still talking HPC market, not even the general market) chose overwhelmingly to purchase really cheap but "fast enough" Durons, Celerons, PIIIs, P4s, PPros. Even big cache mainstream processors like the Xeons have suffered and lost market share when any significant degree of price premium has been associated with their implementation. The HPC market has >>punished<< high performance departures from the COTS mainstream more often than not for close to a decade now.

So if really expensive high performers are failing, who is succeeding? AMD, with the Opteron, which combines 64 bits, superior floating point performance, and a LOW PRICE (relatively speaking) -- one where the cost-benefit advantage relative to anything else available is slap-in-the-face obvious for most HPC-like tasks I've tested myself or heard of being tested.

The processor and systems design wars that matter (to "most" -- that funny word again -- HPC cycle consumers) aren't being fought in NEC's design chambers; they are being fought between Intel and AMD and involve multicore processors, memory crossbars, heat dissipation, and just how much valuable chip real estate to devote to floating point vs integer vs interrupt handling vs cache vs memory integration to make the biggest market segment as happy as possible and provide the best possible BALANCE of performance.

To Intel, HPC is "important" but far from critical.
To AMD, I think that HPC is maybe critical -- they've successfully defined themselves as a premier HPC platform and clearly invest more resources in floating point. Intel still rules the desktop, where floating point performance isn't terribly important, although AMD continues to fight gamely there.

The thing that continues to speak against vector processors is the fact that aside from task-specific ASICs (DSP and GPU) the "mass market" (including HPC) has yet to support an OTC vector unit that integrates in any OTS system design. Why this is so I'm not certain -- very likely the memory speed requirements are so far beyond what inexpensive OTS DRAM can support that including one is pointless for most applications -- but whatever the nominal reason, since it is clearly POSSIBLE to do so, the REAL reason it doesn't happen is that nobody (thinks they) would make any money from it if they did it.

It's all about economics and the marketplace, nothing else. It would cost more than the mass market is willing to pay, and probably more than the more cost-tolerant HPC market is willing to pay, in order for the implementer to make enough money to justify the effort and risk.

> Vectorization and parallelization are compatible

Absolutely.

> Hybrid mpi/openmp programming is a harder task than mpi/vector programming.
> If you have enough money and if your program is vectorizable, buy a
> vector machine of course.

I'm not sure what the former means. I honestly think that MOST clusters on the planet don't run "real" parallel code, but rather run N instances of a single threaded application in an embarrassingly parallel way, probably at a ratio of 60:40 or even more. Writing real parallel code with nontrivial message passing and barriers IS a fairly difficult task, although MPI enables it to be done reasonably portably (where the algorithmic and parametric tuning per cluster still remains a serious portability issue for many applications).
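As a rough illustration of the distinction drawn above between embarrassingly parallel runs and "real" parallel code, here is a toy timing model in Python. It is only a sketch: the function names (ep_time, coupled_time, speedup) and the cost model (perfectly divisible work plus a fixed communication cost per step) are assumptions made for illustration, not measurements from any cluster.

```python
def ep_time(total_work, n_nodes):
    """Hypothetical model of an embarrassingly parallel run: the jobs are
    independent, so wall-clock time is just the work divided by the nodes."""
    return total_work / n_nodes


def coupled_time(total_work, n_nodes, steps, comm_per_step):
    """Hypothetical model of a message-passing code: the work still divides
    evenly, but every step also pays a fixed communication/barrier cost."""
    if n_nodes == 1:
        return float(total_work)  # a single node has nobody to talk to
    return total_work / n_nodes + steps * comm_per_step


def speedup(serial_time, parallel_time):
    """Speedup relative to running everything on one system."""
    return serial_time / parallel_time


# Assumed numbers, chosen only to keep the arithmetic easy to follow.
serial = 1600.0  # time units for the whole job on one node
ep = speedup(serial, ep_time(serial, 16))
coupled = speedup(serial, coupled_time(serial, 16, steps=100, comm_per_step=1.0))
print(ep, coupled)
```

With these assumed numbers the embarrassingly parallel case gets the full factor of 16, while the communicating code gets only a factor of 8, because it spends as long talking (100 units) as computing (100 units). Real codes need measured values for both terms.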
Vector programming I view as a primarily single-thread issue independent of parallelization. From the parallel programmer's point of view I would think that it only shifts the balance of computation to communication, usually in a way that favors less parallelism in communication-bound code (by decreasing the time spent per work chunk per IPC in a given task partitioning). For EP tasks of course it doesn't matter and the faster the better.

> Cluster of SMPs ? they will remain an efficient and low cost solution,
> (and quite easy to be sold
> by a mass vendor).
> And thanks to cluster of SMPs with the help of linux, the HPC market is
> now "democratic".
>
> Of course, it would be nice to have a true vector unit on a P4 or Opteron.
> But the problem will be the memory access again.

There we are in complete agreement, on both counts. Still a possibility, though, in future designs -- future memory designs in particular look like they'll be much more "democratic" about providing independent (non-CPU mediated) access to the system memory to things like coprocessors. This leaves open as possibilities intriguing models of single system parallelism where sub-tasks are run on an attached processor while the main CPU does its general purpose thing.

   rgb

>
> Bye,
>
> Phil.
>

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
