[Beowulf] Multicore Is Bad News For Supercomputers
Vincent Diepeveen diep at xs4all.nl
Fri Dec 5 09:15:01 PST 2008
Well, for every scientist who says he needs a lot of RAM: ECC DDR2 RAM costs next to nothing right now. You can very cheaply build nodes with something like 4 cheap CPUs and 128 GB of RAM inside. There is no excuse for those who beg for big RAM not to buy a bunch of those nodes.

What happens each time is that the moment the price of some sort of RAM finally drops (note that ECC registered DDR RAM never got cheap, much to my disappointment), a newer generation of RAM is there, which is again really expensive.

I tend to believe that many algorithms that require really a lot of RAM can do with a bit less and profit from today's huge CPU power, using some clever tricks and enhancements and/or new algorithms (sometimes it is difficult to define what a new algorithm is, if it looks so much like a previous one with just a few new enhancements), which are probably far from trivial. Usually, programming the 'new' algorithm efficiently at a low level is the killer problem that keeps it from being used (as there is no budget to hire people who specialize in this, or simply because they work for some other company or government body).

I would really argue that sometimes you have to give industry some time to mass-produce memory: just design a new-generation CPU around the RAM that's there now and read massively in parallel from that RAM. That also gives HUGE bandwidth.

If some older GPU based on GDDR3 RAM claims 106 GB/s of bandwidth to RAM, while today's Nehalem claims 32 GB/s and achieves 17 to 18 GB/s, then obviously it wasn't important enough for Intel to give us more bandwidth to RAM. If nvidia/AMD GPUs could do it years earlier, and the latest CPU is a factor of 4+ off, then discussions about bandwidth to RAM are quite artificial.

The reason for that is the limit SPEC puts on RAM consumption. They design a benchmark years beforehand to use an amount of RAM that is "common" now.
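As a rough check on the figures quoted above, the gap can be worked out directly. This is only a sketch using the numbers given in this message; the function and constant names are illustrative and are not taken from any benchmark.

```python
# Bandwidth figures quoted in the message, in GB/s.
GPU_CLAIMED = 106.0              # older GPU, claimed bandwidth to RAM
NEHALEM_CLAIMED = 32.0           # Nehalem, claimed
NEHALEM_ACHIEVED = (17.0, 18.0)  # Nehalem, achieved range


def bandwidth_gap(gpu_gbs, cpu_gbs):
    """Return how many times larger the GPU figure is than the CPU figure."""
    return gpu_gbs / cpu_gbs


if __name__ == "__main__":
    print(f"GPU vs claimed Nehalem: {bandwidth_gap(GPU_CLAIMED, NEHALEM_CLAIMED):.1f}x")
    for achieved in NEHALEM_ACHIEVED:
        gap = bandwidth_gap(GPU_CLAIMED, achieved)
        print(f"GPU vs achieved {achieved:.0f} GB/s: {gap:.1f}x")
```

Against the claimed Nehalem figure the gap is about 3.3x; against the achieved 17 to 18 GB/s it is roughly 5.9x to 6.2x.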
I would argue that those most hungry for bandwidth and per-core crunching power are the scientific world and/or safety research (the aircraft and car industries).

Note that I'm speaking of streaming bandwidth above. Most scientists do not know the difference between bandwidth and latency, basically because they are right that, from a theoretical viewpoint, in the end it is all bandwidth related. Yet in practice there are so many factors influencing latency. Intel/AMD/IBM are of course making big efforts to reduce latency a lot. Maybe 95% of all their work on a CPU (a blindfolded guess from a computer science guy, so not a hardware designer)?

In the end it is all about the test sets in SPEC. If we manage to finally get a bunch of really WELL OPTIMIZED low-level codes that eat gigabytes of RAM into SPEC, then within years AMD and Intel will show up with some really fast CPUs for scientific workloads. If all the "professor" types like RGB make a lot of noise worldwide to get that done, then they have to follow.

As for criticism of Intel and AMD along the lines of "why not do this and that": I'm doing it all the time too, but at the same time, if you look at what happens in SPEC, it is only about "who has the best compiler and the biggest L2 cache that can nearly contain the entire working set of this tiny-RAM program". Get some serious software into SPEC, I'd argue.

To start by looking at myself: the reason I didn't donate Diep is that competitors could then also obtain my code, whereas I don't care if all those compiler and hardware manufacturers have my program's source code.

Vincent

On Dec 5, 2008, at 2:44 PM, Mark Hahn wrote:

>> (Well, duh).
>
> yeah - the point seems to be that we (still) need to scale memory
> along with core count. not just memory bandwidth but also concurrency
> (number of banks), though "ieee spectrum online for tech insiders"
> doesn't get into that kind of depth :(
>
> I still usually explain this as "traditional (ie Cray) supercomputing
> requires a balanced system." commodity processors are always less
> balanced than ideal, but to varying degrees. intel dual-socket
> quad-core was probably the worst for a long time, but things are
> looking up as intel joins AMD with memory connected to each socket.
>
> stacking memory on the processor is a red herring IMO, though they
> appear to assume that the number of dram banks will scale linearly
> with cores. to me that sounds more like dram-based per-core cache.
