[Beowulf] Multicore Is Bad News For Supercomputers
diep at xs4all.nl
Fri Dec 5 09:15:01 PST 2008
Well, for every scientist who says he needs a lot of RAM now:
ECC DDR2 RAM costs next to nothing right now.
You can very cheaply build nodes now with something like 4 cheap CPUs
and 128 GB of RAM inside.
There is no excuse for those who beg for big RAM to not buy a bunch
What happens each time is that, just when the price of some sort
of RAM finally drops (note that ECC-Registered DDR RAM never has gotten
to my disappointment), a newer generation of RAM is there which
again is really
I tend to believe that many algorithms that require a really large
amount of RAM can make do with a bit less and profit from today's huge
CPU power, using some clever tricks and/or new algorithms (sometimes it
is difficult to define what counts as a new algorithm if it looks so
much like a previous one with just a few new enhancements),
are far from trivial.
Usually, programming the 'new' algorithm efficiently at a low level is
the big killer problem that explains why
it doesn't get used yet (as there is no budget to hire people who are
or simply because they work for some other company or other
I would really argue that sometimes you have to give industry some
time to mass-produce memory: just design a new-generation CPU around
the RAM that's available now and read from that RAM massively in
parallel. That also gives HUGE bandwidth.
If some older GPU based on GDDR3 RAM claims 106 GB/s of bandwidth to
RAM, while today's Nehalem claims 32 GB/s and achieves 17 to 18 GB/s,
then obviously it wasn't important enough for Intel to give us more
bandwidth to RAM.
If NVIDIA/AMD GPUs could do it years earlier and the latest CPU is a
factor of 4+ off, then discussions about bandwidth to RAM are quite
artificial.
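As a rough illustration, the figures quoted above can be turned into
ratios. This is only a sketch using the numbers from this post; the
variable and function names are made up for the example, and nothing
here is a measurement.

```python
# Bandwidth figures quoted in the post, in GB/s.
# Names are illustrative; none of this is measured here.
GPU_CLAIMED = 106.0
NEHALEM_CLAIMED = 32.0
NEHALEM_ACHIEVED = (17.0, 18.0)  # quoted range, low to high


def gap(gpu_gbps, cpu_gbps):
    """How many times larger the GPU figure is than the CPU figure."""
    return gpu_gbps / cpu_gbps


claimed_gap = gap(GPU_CLAIMED, NEHALEM_CLAIMED)
achieved_gaps = [gap(GPU_CLAIMED, bw) for bw in NEHALEM_ACHIEVED]
achieved_fraction = [bw / NEHALEM_CLAIMED for bw in NEHALEM_ACHIEVED]

print(f"GPU claim vs Nehalem claim:    {claimed_gap:.1f}x")
print(f"GPU claim vs Nehalem achieved: "
      f"{achieved_gaps[1]:.1f}x to {achieved_gaps[0]:.1f}x")
print(f"Nehalem achieved / claimed:    "
      f"{achieved_fraction[0]:.0%} to {achieved_fraction[1]:.0%}")
```

On the claimed numbers the gap is a bit over 3x, and on the achieved
numbers it is about 6x, so the "factor 4+" sits between the two.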
The reason for that is SPEC's limit on RAM consumption.
They design a benchmark years beforehand to use an amount of RAM that
is "common" at design time.
I would argue that those most hungry for bandwidth and per-core
crunching power are the scientific world
and/or safety research (aircraft and car industry).
Note that I'm speaking of streaming bandwidth above. Most scientists
do not know the difference between bandwidth and latency, basically
because they are right that in the end it is all bandwidth related
from a theoretical viewpoint.
Yet in practice there are so many factors influencing latency.
Intel/AMD/IBM are of course making big efforts to reduce latency a
lot. Maybe 95% of all their work on a CPU (a blindfolded guess
from a computer science guy, so not a hardware designer)?
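One hedged way to see why streaming bandwidth and latency differ in
practice: if every load depends on the previous one, only one cache
line is in flight at a time, so throughput is the line size divided by
the latency. The sketch below assumes a 64-byte cache line and a 60 ns
memory latency; both are illustrative assumptions, not figures from
this thread. Only the 17 GB/s streaming number comes from the post.

```python
CACHE_LINE_BYTES = 64       # ASSUMPTION: typical x86 cache line size
ASSUMED_LATENCY_NS = 60.0   # ASSUMPTION: illustrative DRAM latency
STREAMING_GBPS = 17.0       # achieved Nehalem figure quoted in the post


def dependent_chain_gbps(line_bytes, latency_ns):
    """Throughput when each load must finish before the next starts.

    Bytes per nanosecond is numerically the same as GB/s (10**9 bytes/s).
    """
    return line_bytes / latency_ns


def lines_in_flight_needed(target_gbps, line_bytes, latency_ns):
    """Little's law: concurrency = bandwidth * latency / transfer size."""
    return target_gbps * latency_ns / line_bytes


chain = dependent_chain_gbps(CACHE_LINE_BYTES, ASSUMED_LATENCY_NS)
needed = lines_in_flight_needed(STREAMING_GBPS, CACHE_LINE_BYTES,
                                ASSUMED_LATENCY_NS)

print(f"dependent loads: {chain:.2f} GB/s")
print(f"cache lines in flight for {STREAMING_GBPS:.0f} GB/s: {needed:.1f}")
```

With these assumed numbers a dependent chain of loads gets about
1 GB/s, and it takes roughly 16 cache lines in flight to reach 17 GB/s,
which is why the same RAM can look fast to a streaming code and slow to
a pointer-chasing one.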
In the end it is all about the test sets in SPEC. If we finally manage
to get a bunch of real WELL OPTIMIZED low-level codes that eat
gigabytes of RAM into SPEC, then within years AMD and Intel will show
up with some really fast CPUs for scientific workloads.
If all the RGB-type "professors" make a lot of noise worldwide to get
that done, then they have to follow.
As for criticism of Intel and AMD along the lines of "why not do this
and that": I'm doing it all the time too. But at the same time, if you
look at what happens in SPEC, SPEC is only about "who has the best
compiler and the biggest L2 cache that can nearly contain the entire
working set of this tiny-RAM program".
Get some serious software into SPEC, I'd argue.
To start with myself: the reason I didn't donate Diep is that
competitors could also obtain my code, whereas I don't care if all
those compiler and hardware manufacturers have my proggies
On Dec 5, 2008, at 2:44 PM, Mark Hahn wrote:
>> (Well, duh).
> yeah - the point seems to be that we (still) need to scale memory
> along with core count. not just memory bandwidth but also concurrency
> (number of banks), though "ieee spectrum online for tech insiders"
> doesn't get into that kind of depth :(
> I still usually explain this as "traditional (ie Cray) supercomputing
> requires a balanced system." commodity processors are always less
> than ideal, but to varying degrees. intel dual-socket quad-core
> was probably the worst for a long time, but things are looking up
> as intel
> joins AMD with memory connected to each socket.
> stacking memory on the processor is a red herring IMO, though they
> to assumed that the number of dram banks will scale linearly with
> to me that sounds more like dram-based per-core cache.