[Beowulf] Benchmark between Dell Poweredge 1950 And 1435
Peter St. John
peter.st.john at gmail.com
Thu Mar 8 09:14:43 PST 2007
Great, thanks. That was clear, and the takeaway is that I should pay attention
to the number of memory channels per core (which may be less than 1.0)
besides the number of cores and the RAM/core.
What is the "ncpu" column in Table 1 (for example)? Does the 4 refer to 4
cores, and the 1 and 2 cases don't use all the cores on the motherboard? Or
is "ncpu" an application parameter? I read it as "number of CPUs"? I noted
that the heart simulation didn't have an ncpu column, but that was why I
thought you had multiple nodes going.
Thanks very much,
P.S. and then where does the billiard cue go?
On 3/8/07, Joshua Baker-LePain <jlb17 at duke.edu> wrote:
> On Thu, 8 Mar 2007 at 11:33am, Peter St. John wrote
> > Those benchmarks are quite interesting and I wonder if I interpret them
> > all correctly.
> > It would seem that the Intel outperforms its advantage in clockspeed
> > faster, but ballpark 1/3 better performance?) so the question would be
> > performance gain per dollar cost (which is fine); however, for that
> > simulation towards the end, it looks like the AMD scales up with
> > nodecount enormously better, and with several nodes actually outperforms
> > faster Intel.
> > Should I guess at relatively poor performance of the networking on the
> > motherboard used with the Intel chip, or does that have anything to do
> > with the CPU itself?
> Each benchmark was run on a single system with 4 CPUs (or, rather, 4 cores
> in 2 sockets) -- there was no network involved. The difference (IMO) lies
> in the memory subsystems of the 2 architectures.
> Opterons have 1 memory controller per socket (on the CPU, shared by the 2
> cores) attached to a dedicated bank of memory via a HyperTransport link
> (referred to from here on as HT). That socket is connected to the other
> CPU socket (and its HT-connected memory bank) by HT.
> Xeons (still) have a single memory controller hub to which the CPUs
> communicate via the front side bus (FSB). That single hub has 2 channels
> to memory.
> So, yes, clock-for-clock (and for my usage) Xeon 51xxs are faster than
> Opterons. But, if your code hits memory *really hard* (which that heart
> model does), then the multiple paths to memory available to the Opterons
> allow them to scale better.
> This situation has existed for a long time on the Intel side. For P4
> based Xeons it was crippling. The new Core based Xeons, however, don't
> suffer nearly as badly (due to their big cache, maybe?). E.g. the thermal
> simulations in that same file are pretty memory intensive themselves, and
> P4 based Xeons scaled *horribly* on them. But the 51xx Xeons still scale
> very well on them (which surprised me).
> Joshua Baker-LePain
> Department of Biomedical Engineering
> Duke University
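
For anyone reading "scales better" off an ncpu column, here is a minimal sketch of the usual arithmetic: speedup relative to the smallest run, and parallel efficiency (speedup divided by the increase in core count). The function name `scaling_table` and the timings in `example` are hypothetical placeholders, not numbers from the benchmark tables discussed above.

```python
def scaling_table(times_by_ncpu):
    """Map each ncpu to (speedup, efficiency) relative to the smallest-ncpu run.

    times_by_ncpu: dict of {ncpu: wall-clock seconds}.
    """
    base_ncpu = min(times_by_ncpu)
    base_time = times_by_ncpu[base_ncpu]
    table = {}
    for ncpu in sorted(times_by_ncpu):
        # Speedup: how many times faster than the baseline run.
        speedup = base_time / times_by_ncpu[ncpu]
        # Efficiency: 1.0 means perfect scaling; memory-bound code
        # typically falls well below that as cores are added.
        efficiency = speedup * base_ncpu / ncpu
        table[ncpu] = (speedup, efficiency)
    return table


# Placeholder timings in seconds (made up for illustration only).
example = {1: 100.0, 2: 60.0, 4: 50.0}

for ncpu, (speedup, efficiency) in scaling_table(example).items():
    print(f"ncpu={ncpu}  speedup={speedup:.2f}  efficiency={efficiency:.2f}")
```

Comparing the efficiency at ncpu=4 between two machines separates "faster per core" from "scales better", which is the distinction drawn in the quoted reply.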