[Beowulf] Multicore Is Bad News For Supercomputers

Fri Dec 5 09:15:01 PST 2008

Well, to every scientist who says he needs a lot of RAM:
ECC DDR2 RAM costs next to nothing right now.

You can now very cheaply build nodes with four cheap CPUs
and 128 GB of RAM inside.

There is no excuse for those who beg for big RAM not to buy a bunch
of those nodes.

What happens each time is that, just as the price of some sort of RAM
finally drops (note that ECC-registered DDR RAM never got cheap, much
to my disappointment), a newer generation of RAM arrives that is again
really expensive.

I tend to believe that many algorithms that require a lot of RAM can
make do with a bit less and profit from today's huge CPU power, using
some clever tricks and enhancements and/or new algorithms, which are
probably far from trivial. (Sometimes it is difficult to define what
counts as a new algorithm when it looks so much like a previous one
with just a few enhancements.)

Usually, programming the 'new' algorithm efficiently at a low level is
the big killer problem that keeps it from being used (there is no
budget to hire people who specialize in this, or they simply work for
some other company or government body).

I would really argue that sometimes you have to give industry some
time to mass-produce memory: just design a new-generation CPU around
the RAM that's available now and read from that RAM massively in
parallel. That also gives HUGE bandwidth.

If some older GPU based upon DDR3 RAM claims 106 GB/s bandwidth to RAM,
while today's Nehalem claims 32 GB/s and achieves 17 to 18 GB/s,
then obviously it wasn't important enough for Intel to give us more
bandwidth to RAM.

If NVIDIA/AMD GPUs could do it years earlier, and the latest CPU is a
factor of 4+ off, then discussions about bandwidth to RAM are quite
artificial.
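
To put rough numbers on that gap, here is a small sketch in Python that
uses only the figures quoted above (106 GB/s for the GPU, 32 GB/s
claimed and 17 to 18 GB/s achieved for Nehalem). Nothing in it is
measured, and the function and variable names are made up for
illustration.

```python
# Figures quoted in this post, in GB/s; nothing here is measured.
GPU_CLAIMED = 106.0
NEHALEM_CLAIMED = 32.0
NEHALEM_ACHIEVED = (17.0, 18.0)


def bandwidth_gap(gpu_gbs, cpu_gbs):
    """Return how many times larger the GPU figure is than the CPU figure."""
    return gpu_gbs / cpu_gbs


gap_vs_claimed = bandwidth_gap(GPU_CLAIMED, NEHALEM_CLAIMED)
gap_vs_achieved = [bandwidth_gap(GPU_CLAIMED, b) for b in NEHALEM_ACHIEVED]

print(f"vs claimed:  {gap_vs_claimed:.1f}x")
print("vs achieved: " + " to ".join(f"{g:.1f}x" for g in sorted(gap_vs_achieved)))
```

With those inputs the gap comes out at about 3.3x against the claimed
figure and roughly 5.9x to 6.2x against the achieved one.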

The reason for that is the limit SPEC puts on RAM consumption. They
design a benchmark years in advance, sized to use an amount of RAM
that is "common" at that moment.

I would argue that those most hungry for bandwidth/core crunching
power are the scientific world and/or safety research (aircraft and
car industry).

Note that I'm speaking of streaming bandwidth above. Most scientists
do not know the difference between bandwidth and latency, basically
because they are right that, from a theoretical viewpoint, it is all
bandwidth-related in the end.

Yet in practice there are so many factors influencing latency.
Intel/AMD/IBM are of course making big efforts to reduce latency a
lot. Maybe 95% of all their work on a CPU (a blindfolded guess from a
computer science guy, so not a hardware designer)?
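
A minimal sketch of why the two differ in practice, assuming the
simplest possible model: every request costs a fixed latency plus its
size divided by the streaming bandwidth. The 100 ns latency is a
made-up illustrative value, the 17 GB/s is the achieved Nehalem figure
mentioned earlier in this post, and the function names are
hypothetical.

```python
# Assumed, illustrative numbers: not measurements of any particular machine.
LATENCY_S = 100e-9        # hypothetical time until the first byte arrives
BANDWIDTH_BPS = 17e9      # the "achieved" Nehalem streaming figure from this post


def transfer_time(size_bytes, latency_s=LATENCY_S, bandwidth_bps=BANDWIDTH_BPS):
    """Simple model: fixed latency plus size divided by streaming bandwidth."""
    return latency_s + size_bytes / bandwidth_bps


def effective_bandwidth(size_bytes, **kw):
    """Bytes per second delivered when each request waits for the previous one."""
    return size_bytes / transfer_time(size_bytes, **kw)


for size in (64, 4096, 1 << 30):   # one cache line, one page, a 1 GiB stream
    t = transfer_time(size)
    share = LATENCY_S / t
    print(f"{size:>10} bytes: {effective_bandwidth(size) / 1e9:6.2f} GB/s, "
          f"latency is {share:.0%} of the time")
```

Under these assumptions, dependent cache-line-sized reads deliver well
under 1 GB/s, while a large streaming read gets essentially the full
17 GB/s. So in this model it is "all bandwidth" only once the requests
are big enough or overlapped enough to hide the latency term.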

In the end it is all about the test sets in SPEC. If we finally manage
to get a bunch of really WELL OPTIMIZED low-level codes that eat
gigabytes of RAM into SPEC, then within years AMD and Intel will show
up with some really fast CPUs for scientific workloads.

If all the "professors" of the RGB type make a lot of noise worldwide
to get that done, then the vendors will have to follow.

As for any criticism of Intel and AMD along the lines of "why not do
this and that": I do it all the time myself, but at the same time, if
you look at what happens in SPEC, SPEC is only about "who has the best
compiler and the biggest L2 cache that can nearly contain the entire
working set of this tiny-RAM program".

Get some serious software into SPEC, I'd argue.

To start by looking at myself: the reason I didn't donate Diep is that
competitors could then also obtain my code, whereas I don't care if
all those compiler and hardware manufacturers have my proggies' source
code.

Vincent

On Dec 5, 2008, at 2:44 PM, Mark Hahn wrote:

>> (Well, duh).
>
> yeah - the point seems to be that we (still) need to scale memory
> along with core count.  not just memory bandwidth but also concurrency
> (number of banks), though "ieee spectrum online for tech insiders"
> doesn't get into that kind of depth :(
>
> I still usually explain this as "traditional (ie Cray) supercomputing
> requires a balanced system."  commodity processors are always less
> balanced than ideal, but to varying degrees.  intel dual-socket
> quad-core was probably the worst for a long time, but things are
> looking up as intel joins AMD with memory connected to each socket.
>
> stacking memory on the processor is a red herring IMO, though they
> appear to assume that the number of dram banks will scale linearly
> with cores.  to me that sounds more like dram-based per-core cache.