question about Intel P4 versus Alpha's

Fri Jan 10 10:51:24 PST 2003

On Wed, Jan 08, 2003 at 09:49:26AM -0600, Henderson, TL Todd wrote:

> I have an app, that scaled out to the 30ish XP1000's we had very
> nicely, basically linearly.  If you looked at top, it was almost
> always in the 95-100%cpu utilization range.  However, now we have a
> cluster of P4's and when I look at top, it is more like 70-80% cpu
> usage.  This is using the same number of cpu's, the same switch,
> same code, and same problem for the code.  The jobs are still
> completing in about 30-40% less time, so we are getting a speed up.

> My guesstimate was that this was an indication of the memory
> bandwidth and speed to memory.  I know the XP1000's have a nice
> memory subsystem.  Was/is it that much better than the 533/2.4 ghz
> P4's?

Assuming that hyperthreading is off, I believe that if your processes
are only showing 70-80% utilization then your P4's are not memory
bound.  My experience is that memory bound processes show 99%+
utilization - when the processor stalls, say on a TLB miss or a cache
line load, your process is still "billed" for that whole time.  When
the STREAM benchmark runs, for example, it uses up all available memory
bandwidth and shows 99%+ cpu utilization.
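
To see the "billing" effect for yourself, here is a rough sketch (not
taken from your application or from any benchmark) that reports cpu
time divided by wall-clock time, which is roughly what top shows.  The
function names, buffer size and sleep time are made up for
illustration.  A large buffer copy stands in for memory bound work, and
a sleep stands in for a process blocked waiting on the network.

```python
import time

def cpu_utilization(work):
    """Return cpu time / wall time for one call of work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall

def memory_copy(size=32 * 1024 * 1024, repeats=20):
    # Hypothetical stand-in for memory bound work: copy a buffer that is
    # much larger than cache, so the cpu mostly waits on memory.
    src = bytearray(size)
    dst = bytearray(size)
    for _ in range(repeats):
        dst[:] = src

def blocked_wait(seconds=0.3):
    # Hypothetical stand-in for a process blocked on a network receive:
    # the process is asleep and is not billed for the time.
    time.sleep(seconds)

copy_util = cpu_utilization(memory_copy)
wait_util = cpu_utilization(blocked_wait)
print(f"memory copy : {100 * copy_util:.0f}% cpu")
print(f"blocked wait: {100 * wait_util:.0f}% cpu")
```

On an otherwise idle machine I would expect the copy to show close to
full utilization and the wait to show close to none, but the exact
figures will depend on the machine and its load.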

Likewise, if your processor is stalling on FPU access, I'd guess
utilization would show 99%+ as well.

FWIW, the memory subsystem on 533/2.x GHz P4's is quite a bit better
than an XP1000, at least as far as the STREAM benchmark shows.  The
"COPY" number posted at the STREAM website for Compaq_XP1000 is
900 MB/sec.  I just ran it on a 533/2.26 P4 (PC1066 RDRAM) and measured
2024 MB/sec.
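
For reference, the arithmetic behind those numbers can be sketched as
follows.  STREAM counts COPY as one 8-byte read plus one 8-byte write
per array element and reports MB as 10^6 bytes.  The helper name, array
length and timing in the last line are hypothetical, chosen only to
show the formula; the two COPY figures are the ones quoted above.

```python
def copy_bandwidth_mb_s(n_elements, seconds, bytes_per_element=8):
    """COPY bandwidth the way STREAM counts it: one read plus one write."""
    bytes_moved = 2 * bytes_per_element * n_elements
    return 1.0e-6 * bytes_moved / seconds

# COPY figures quoted above, in MB/sec.
xp1000_copy = 900.0
p4_copy = 2024.0
ratio = p4_copy / xp1000_copy
print(f"P4 / XP1000 COPY ratio: {ratio:.2f}")

# Hypothetical timing: 2,000,000 doubles copied in 16 ms.
print(f"{copy_bandwidth_mb_s(2_000_000, 0.016):.0f} MB/sec")
```

So by this one measure the P4 moves a bit more than twice as much data
per second as the XP1000.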

Perhaps your XP1000 cluster was close to reaching the limits of your
network, and now with the faster P4's your jobs are bound by network
I/O.  One way to check: run a simple cycle-eating process at a low
priority on each node at the same time as your job, so that cpu
utilization reaches 100%.  If that doesn't affect the job's throughput,
then the idle time is not time your job could have used for computing,
and the communications are suspect.
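
A minimal sketch of such a cycle eater, assuming a Unix-like node where
os.nice is available (the function name and the default niceness of 19
are my choices, nothing standard):

```python
import os
import time

def soak_cpu(seconds, niceness=19):
    """Spin at low priority for `seconds`; return the loop count."""
    if hasattr(os, "nice"):
        try:
            # Raise our niceness so the real job always wins the cpu.
            os.nice(niceness)
        except OSError:
            pass  # could not change priority; spin at normal priority
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations

# Example: soak_cpu(600) on each node for the duration of a test run.
```

Start one per cpu on every node, launch the job, and compare its
run time against a run without the cycle eaters.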

Don Holmgren
Fermilab