[Beowulf] [email@example.com: CCL:dual-core
kus at free.net
Mon Jul 4 10:48:28 PDT 2005
In message from Vincent Diepeveen <diep at xs4all.nl> (Mon, 04 Jul 2005
>Of course we take a large buffer. Around 400MB is the working set
>the hashtable which i use for my chess software (which is reading
>a 8-64 bytes from the cache).
> single cpu A64 :  91 ns (cl2 memory)
> single cpu P4  : 220 ns (cl2 memory, bus overclocked)
> dual opteron   : 120 ns
> quad opteron   : 133 ns
> dual xeon      : 280 ns (800 MHz bus)
> dual xeon      : 400 ns (533 MHz bus)
The latencies should depend on the processor frequencies (although the
RAM part is much larger),
so what were the frequencies of the A64/P4/Opteron/Xeon?
And do I understand you correctly that you have 1/2/4 threads which
perform "random" reads of a few bytes from main memory?
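
To make clear what kind of test I mean, here is a minimal single-thread
sketch of a random-read latency measurement. It is only my assumption of
the general technique (a dependent pointer chase through a buffer much
larger than L2), not Vincent's actual code. The names build_random_cycle
and chase_latency_ns are hypothetical, and clock() is used only because
it is plain standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Small xorshift64 generator, so the sketch does not depend on RAND_MAX. */
static uint64_t rng_state = 88172645463325252ULL;

static uint64_t next_rand(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Fill next[] with one random cycle over all n slots (Sattolo's algorithm).
 * Following next[] from any slot then visits every slot exactly once
 * before returning, so the chase cannot get stuck in a short loop. */
void build_random_cycle(size_t *next, size_t n)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(next_rand() % i);   /* j in [0, i-1] */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` dependent loads, starting at slot 0.
 * Each load address depends on the previous load, so the loads cannot
 * overlap. Returns the average time per load in nanoseconds; the final
 * slot is stored in *end_out so the loop is not optimized away. */
double chase_latency_ns(const size_t *next, size_t steps, size_t *end_out)
{
    size_t p = 0;
    clock_t t0 = clock();
    for (size_t s = 0; s < steps; s++)
        p = next[p];
    clock_t t1 = clock();
    *end_out = p;
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

For a buffer of about 400MB one would allocate
n = ((size_t)400 << 20) / sizeof(size_t) slots, call build_random_cycle
once, and then time a few tens of millions of steps. For the 2/4-thread
case each thread would need its own chain and its own timer; that part is
not shown here.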
>So obviously things that do not fit in L2 cache, the opteron runs
>it. Only if the executable is optimized in question by the intel c++
>compiler it will have done stuff to run it faster at intel processors
>than at opteron,
>then results do not look too bad for P4.
If the results above are for a "bad" (poorly optimizing) compiler,
then in some sense it's the compiler's problem :-) Yes, old binary
software will run slowly. But many, many HPC applications may be
BTW, the better results are for icc++ only; do you know
anything about the PGI and PathScale compilers?
> Yet that's a matter of
>it for opteron better, which most software dudes do NOT do, as intel
>delivers good support and AMD historically didn't deliver *any* kind
>support (they are improving now, but even then their math libraries
>pathetic compared to the ease of the intel libraries that i can
>least *that* part of
ACML 2.1 gives me a set of good results for Opteron in comparison