Intel is finally shipping the 64-bit Itanium

Sat May 26 13:24:18 PDT 2001

> The Itaniums will make for nice SMP clusters though with its fast front 
> side bus. Multiple IA-64s sharing a FSB along with many GB of shared 
> memory and Infiniband for very fast interconnects will be nice.

choosing a shared FSB (shared remote DRAM) is a fairly bold statement
about the expected uses of a machine.  for instance, it makes sense
if you expect very good local cache hitrates.  and it makes sense if 
you expect zero cache hitrates.  the alternative (local DRAM) is really
an explicitly-managed, much larger, CPU-local cache.

I'm guessing (and this is definitely a WAG) that Itanium will be pessimal
for compute clusters.  suppose, in 1-2 years, we have this scenario:

1. Itanium machines, perhaps 8-way, with CPUs sitting on a 
3.2 to 6.4 GB/s bus, talking to DRAM.  each CPU is roughly
the same speed as a P4/2GHz, and has some small number of MB
of local cache.

2. Athlon machines, with each CPU connected to its own 2.1-3.2 GB/s
DRAM array, using 6.4 GB/s HyperTransport to maintain coherence, etc.

now, which do you think will perform better?  the AMD approach has
a HUGE advantage if your working set (as seen by a single CPU)
is more like 2^30 bytes, rather than 2^20.  *and* assuming that 
you can arrange this data reasonably locally.
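
to put rough numbers on the two scenarios above, here is a minimal
back-of-envelope sketch in Python.  it assumes the worst case for the
shared bus (every CPU missing cache at once, with the bus split evenly)
and ignores coherence traffic entirely.  the function name is made up
for illustration, and the bandwidth figures are just the guesses above.

```python
def shared_bus_per_cpu(bus_gbs, ncpus):
    """GB/s seen by each CPU when ncpus evenly share one bus.

    Assumption: all CPUs stream from DRAM at the same time, so the
    bus bandwidth is simply divided by the CPU count.
    """
    return bus_gbs / ncpus


NCPUS = 8  # the hypothetical 8-way Itanium machine

# scenario 1: one shared bus of 3.2 to 6.4 GB/s
shared_low = shared_bus_per_cpu(3.2, NCPUS)
shared_high = shared_bus_per_cpu(6.4, NCPUS)

# scenario 2: each CPU owns a 2.1 to 3.2 GB/s DRAM array,
# so per-CPU bandwidth does not depend on the CPU count
local_low, local_high = 2.1, 3.2

# range of the local-DRAM advantage under these assumptions
worst_ratio = local_low / shared_high
best_ratio = local_high / shared_low

print(f"shared bus, per CPU: {shared_low:.2f} to {shared_high:.2f} GB/s")
print(f"local DRAM, per CPU: {local_low:.2f} to {local_high:.2f} GB/s")
print(f"local advantage:     {worst_ratio:.1f}x to {best_ratio:.1f}x")
```

the point is only that the shared-bus figure shrinks as ncpus grows while
the local-DRAM figure does not; real machines land somewhere in between,
depending on hit rates and how much traffic has to go remote.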

personally, I'd much prefer the optimistic architecture that 
scales my DRAM bandwidth with ncpus.  in fact, this is really the 
whole idea of clustering, at a different scale.  I believe that 
many-way Itaniums are aimed at "commercial" applications, which
seem to be mainly pumping blocks from one place to another.  
clearly if your DRAM is mainly just a staging area for disk/net IO,
these working-set issues are pretty irrelevant.  afaict, this is 
the rationale for Intel's current 8x Xeon high-end, which would seem
to suck rocks for any computational purpose (2 clusters of 4 cpus
starving on a measly little 0.8 GB/s bus!)

who knows, maybe a 4M local cache really is enough to make up for 
the fact that big-SMP machines have always delivered pathetic DRAM
latency (I recall >500 ns for Sun's high-end of a year or so ago,
versus 150 ns for local/uniprocessor)...
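
one way to frame the "is the cache enough?" question is the usual
average-access-time estimate: hit_rate * cache_latency plus
(1 - hit_rate) * dram_latency.  the sketch below uses the 500 ns and
150 ns figures recalled above; the 10 ns cache latency and the 90%
uniprocessor hit rate are pure assumptions picked for illustration, and
both function names are hypothetical.

```python
def avg_access_ns(hit_rate, cache_ns, dram_ns):
    """Average memory access time for a single-level cache model."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * dram_ns


def hit_rate_needed(target_ns, cache_ns, dram_ns):
    """Hit rate at which avg_access_ns() equals target_ns."""
    return (dram_ns - target_ns) / (dram_ns - cache_ns)


CACHE_NS = 10.0      # assumed cache latency, not from any datasheet
LOCAL_DRAM_NS = 150.0
SMP_DRAM_NS = 500.0
UNI_HIT_RATE = 0.90  # assumed hit rate on the uniprocessor

# what the uniprocessor delivers with local DRAM
uni_ns = avg_access_ns(UNI_HIT_RATE, CACHE_NS, LOCAL_DRAM_NS)

# hit rate the big-SMP cache must reach to match that average
smp_hit = hit_rate_needed(uni_ns, CACHE_NS, SMP_DRAM_NS)

print(f"uniprocessor average: {uni_ns:.1f} ns")
print(f"big-SMP needs a hit rate of about {smp_hit:.3f} to match")
```

under these made-up inputs, the big-SMP machine has to cut its miss rate
to well under a third of the uniprocessor's just to break even, which is
the sense in which a bigger cache has to "make up for" the slower DRAM.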

I can't imagine Itanium being a mass-market item for years, if ever.
and I pledge allegiance to the Orthodox Church of Beowulf, which
holds that if it's not mass-market, it's not cluster-Kosher ;)

regards, mark hahn.