[Beowulf] Re: OT: informatics software for linux clusters

Mark Hahn hahn at physics.mcmaster.ca
Tue May 16 08:47:41 PDT 2006

> > That is an issue with this code.  The Athlon has a 256k L2 last I
> > remember, and a 128k L1.  Rather hard to keep lots of stuff in cache.

for their time (now well past), 384 KB was a decent cache capacity:
128K of L1 plus 256K of L2.  (remember that AMD has traditionally used
an exclusive cache mechanism, so what is in L1 is not duplicated in L2,
unlike Intel, and the two capacities add.)

> Barton cores had 512k L2 as well as a faster front side bus.

I speculate that AMD will follow Intel to 2M/core caches as soon
as they start producing 65 nm chips.  hopefully they'll also add
better _compute_ units, at least matching Intel's Core2 FP
capabilities.

> > Right now the big issue we are running into for another aspect of this
> > project is the lack of a vector max/min function in SSE*.  (If anyone

I'm a complete SSE virgin (almost), but isn't this largely just
a matter of doing a packed comparison, then using the resulting
per-element mask to select and merge the larger (or smaller) values?
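
a rough sketch of that compare-and-merge idea, in C with SSE2 intrinsics.
this is only an illustration under assumptions: the thread doesn't say
which element type is wanted, so I assume signed 32-bit integer lanes
(for which SSE2 has no single packed-max instruction), and the name
max_epi32_sse2 is made up for the example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: lane-wise max of four signed 32-bit ints.
 * Assumes SSE2 only; the element type is my assumption, not the thread's. */
static __m128i max_epi32_sse2(__m128i a, __m128i b)
{
    /* Packed compare: each lane becomes all-ones where a > b, else zero. */
    __m128i mask   = _mm_cmpgt_epi32(a, b);
    /* Keep a's lanes where the mask is set ... */
    __m128i from_a = _mm_and_si128(mask, a);
    /* ... and b's lanes where it is clear: (~mask) & b. */
    __m128i from_b = _mm_andnot_si128(mask, b);
    /* Merge the two disjoint selections. */
    return _mm_or_si128(from_a, from_b);
}
```

swapping the operands of the compare (or of the two selections) gives
the min instead.  the cost is one extra register for the mask, plus a
temporary for one of the selections.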

> > from AMD/Intel is listening, this is a *big* issue, and I even have a
> > rough idea how to do it "quickly" in SSE at the expense of many SSE
> > registers.

I'd think you'd need one reg to hold the current max, one to load 
candidates into, and probably another to do the flag-vector-merge thing.
at the end you do a "horizontal" min/max to get the final result.
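
something like the following, as a minimal sketch rather than a tuned
implementation.  assumptions, all mine: signed 32-bit elements, SSE2
only, and an array length that is a positive multiple of 4 so there is
no tail handling.  the names select_max and array_max_i32 are
hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: lane-wise max via compare, then and/andnot/or. */
static __m128i select_max(__m128i a, __m128i b)
{
    __m128i mask = _mm_cmpgt_epi32(a, b);            /* the flag vector */
    return _mm_or_si128(_mm_and_si128(mask, a),      /* a where a > b   */
                        _mm_andnot_si128(mask, b));  /* b elsewhere     */
}

/* Max of n signed 32-bit ints.  Sketch only: n must be a positive
 * multiple of 4 (an assumption made to keep the example short). */
int32_t array_max_i32(const int32_t *v, size_t n)
{
    assert(n >= 4 && n % 4 == 0);

    /* One register holds the running per-lane max. */
    __m128i best = _mm_loadu_si128((const __m128i *)v);

    for (size_t i = 4; i < n; i += 4) {
        /* A second register holds the next four candidates. */
        __m128i cand = _mm_loadu_si128((const __m128i *)(v + i));
        /* The third (the mask) lives inside select_max. */
        best = select_max(best, cand);
    }

    /* "Horizontal" max: fold the upper half onto the lower half,
     * then fold neighbouring lanes, so lane 0 ends up with the result. */
    best = select_max(best, _mm_shuffle_epi32(best, _MM_SHUFFLE(1, 0, 3, 2)));
    best = select_max(best, _mm_shuffle_epi32(best, _MM_SHUFFLE(2, 3, 0, 1)));

    return _mm_cvtsi128_si32(best);  /* extract lane 0 */
}
```

the horizontal step runs once per array, so its shuffles cost little
next to the main loop.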
