[Beowulf] Re: OT: informatics software for linux clusters
hahn at physics.mcmaster.ca
Tue May 16 08:47:41 PDT 2006
> > That is an issue with this code. The Athlon has a 256k L2 last I
> > remember, and a 128k L1. Rather hard to keep lots of stuff in cache.
For their time (now well past), 384 KB was a decent cache capacity.
(Remember that AMD has traditionally used an exclusive cache design,
so a line held in L1 is not also kept in L2, unlike Intel.)
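If the hierarchy is treated as strictly exclusive (an idealization), the 384 KB figure is simply the sum of the two levels. A strictly inclusive hierarchy of the same sizes would hold only as much distinct data as its L2:

```latex
\begin{aligned}
C_{\text{excl}} &= C_{L1} + C_{L2} = 128\ \text{KB} + 256\ \text{KB} = 384\ \text{KB} \\
C_{\text{incl}} &= C_{L2} = 256\ \text{KB}
\end{aligned}
```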
> Barton cores had 512k L2 as well as a faster front side bus.
I speculate that AMD will follow Intel to 2 MB/core caches as soon
as they start producing 65 nm chips. Hopefully they'll also add
better _compute_ units, at least matching Intel's Core2 FP
capabilities.
> > Right now the big issue we are running into for another aspect of this
> > project is the lack of a vector max/min function in SSE*. (If anyone
I'm a complete SSE virgin (almost), but isn't this largely just
a matter of doing a packed comparison, then using the resulting
per-element mask to select and merge?
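A minimal sketch of the compare-and-merge idea follows. It assumes signed 32-bit integer data and plain SSE2. For single-precision floats, SSE already provides an elementwise MAXPS/MINPS (`_mm_max_ps`/`_mm_min_ps`). SSE2 has no packed signed 32-bit max, though (PMAXSD only arrived with SSE4.1), so this is where the trick earns its keep. The function names `max_epi32_sse2` and `max_epi32_arrays` are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: elementwise max of four signed 32-bit lanes.
   The packed compare yields a per-lane mask, which then selects
   between the two inputs. */
static __m128i max_epi32_sse2(__m128i a, __m128i b)
{
    /* Each lane is all ones where a > b, all zeros otherwise. */
    __m128i mask = _mm_cmpgt_epi32(a, b);
    /* (mask & a) | (~mask & b): take a where a > b, else b. */
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
}

/* Hypothetical wrapper: out[i] = max(a[i], b[i]).
   This sketch assumes n is a multiple of 4. */
void max_epi32_arrays(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), max_epi32_sse2(va, vb));
    }
}
```

With `a = {1, -5, 7, 100}` and `b = {3, -9, 7, -100}`, the output is `{3, -5, 7, 100}`. Ties take `b`, which is harmless for a max.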
> > from AMD/Intel is listening, this is a *big* issue, and I even have a
> > rough idea how to do it "quickly" in SSE at the expense of many SSE
> > registers.
I'd think you'd need one register to hold the current max, one to load
candidates into, and probably another to hold the comparison mask for
the merge. At the end you do a "horizontal" min/max across the lanes of
the running-max register to get the final result.
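Here is a sketch of that three-register reduction under the same assumptions (signed 32-bit ints, SSE2 only). One register holds the running max, one holds the loaded candidates, and one holds the compare mask. Two shuffle-and-max steps then fold the four lanes into one. The names `max4` and `max_reduce_epi32` are hypothetical. The scalar tail loop for lengths that are not a multiple of 4 is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics (also provides _MM_SHUFFLE) */

/* Compare-and-merge max of four signed 32-bit lanes. */
static __m128i max4(__m128i a, __m128i b)
{
    __m128i mask = _mm_cmpgt_epi32(a, b); /* the mask register */
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
}

/* Hypothetical: maximum of n signed 32-bit ints; requires n > 0. */
int32_t max_reduce_epi32(const int32_t *x, size_t n)
{
    assert(n > 0);
    __m128i vmax = _mm_set1_epi32(INT32_MIN); /* running max, 4 lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i cand = _mm_loadu_si128((const __m128i *)(x + i)); /* candidates */
        vmax = max4(vmax, cand);
    }
    /* Horizontal max: fold lanes 2,3 onto 0,1, then lane 1 onto lane 0. */
    vmax = max4(vmax, _mm_shuffle_epi32(vmax, _MM_SHUFFLE(1, 0, 3, 2)));
    vmax = max4(vmax, _mm_shuffle_epi32(vmax, _MM_SHUFFLE(2, 3, 0, 1)));
    int32_t best = _mm_cvtsi128_si32(vmax);
    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; i++)
        if (x[i] > best)
            best = x[i];
    return best;
}
```

A min reduction is the same code with the compare operands swapped and `INT32_MAX` as the starting value.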