[Beowulf] Benchmark between Dell Poweredge 1950 And 1435

Peter St. John peter.st.john at gmail.com
Thu Mar 8 09:14:43 PST 2007


Joshua,
Great, thanks. That was clear, and the takeaway is that I should pay
attention to the number of memory channels per core (which may be less than
1.0) in addition to the number of cores and the RAM/core.

What is the "ncpu" column in Table 1 (for example)? Does the 4 refer to 4
cores, and the 1 and 2 cases don't use all the cores on the motherboard? Or
is "ncpu" an application parameter? I read it as "number of CPUs"? I noted
that the heart simulation didn't have an ncpu column, but that was why I
thought you had multiple nodes going.

Thanks very much,
Peter

P.S. and then where does the billiard cue go?


On 3/8/07, Joshua Baker-LePain <jlb17 at duke.edu> wrote:
>
> On Thu, 8 Mar 2007 at 11:33am, Peter St. John wrote
>
> > Those benchmarks are quite interesting and I wonder if I interpret them
> > at all correctly.
> > It would seem that the Intel outperforms its advantage in clockspeed
> > (1/6th faster, but ballpark 1/3 better performance?) so the question
> > would be performance gain per dollar cost (which is fine); however, for
> > that heart simulation towards the end, it looks like the AMD scales up
> > with increasing nodecount enormously better, and with several nodes
> > actually outperforms the faster Intel.
> > Should I guess at relatively poor performance of the networking on the
> > motherboard used with the Intel chip or does that have anything to do
> > with the CPU itself?
>
> Each benchmark was run on a single system with 4 CPUs (or, rather, 4 cores
> in 2 sockets) -- there was no network involved.  The difference (IMO) lies
> in the memory subsystems of the 2 architectures.
>
> Opterons have 1 memory controller per socket (on the CPU, shared by the 2
> cores) attached to a dedicated bank of memory via a HyperTransport link
> (referred to from here on as HT).  That socket is connected to the other
> CPU socket (and its HT connected memory bank) by HT.
>
> Xeons (still) have a single memory controller hub to which the CPUs
> communicate via the front side bus (FSB).  That single hub has 2 channels
> to memory.
>
> So, yes, clock-for-clock (and for my usage) Xeon 51xxs are faster than
> Opterons.  But, if your code hits memory *really hard* (which that heart
> model does), then the multiple paths to memory available to the Opterons
> allow them to scale better.
>
> This situation has existed for a long time on the Intel side.  For P4
> based Xeons it was crippling.  The new Core based Xeons, however, don't
> suffer nearly as badly (due to their big cache, maybe?).  E.g. the thermal
> simulations in that same file are pretty memory intensive themselves, and
> P4 based Xeons scaled *horribly* on them.  But the 51xx Xeons still scale
> very well on them (which surprised me).
>
> --
> Joshua Baker-LePain
> Department of Biomedical Engineering
> Duke University
>
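
The scaling argument in the quoted message can be illustrated with a toy model. This is only a sketch: it assumes a purely memory-bound code, ignores caches and remote (cross-socket) accesses, and every bandwidth figure below is a made-up number, not a measurement of either machine. The function name and its parameters are hypothetical.

```python
def bandwidth_bound_speedup(cores_per_path, demand_per_core, bandwidth_per_path):
    """Speedup over a single core for a purely memory-bound code.

    cores_per_path: number of active cores sharing each independent memory path.
    demand_per_core: bandwidth one core would consume if unconstrained (GB/s).
    bandwidth_per_path: bandwidth one memory path can deliver (GB/s).
    """
    # Each path delivers what its cores ask for, capped at the path's limit.
    delivered = sum(
        min(n * demand_per_core, bandwidth_per_path) for n in cores_per_path
    )
    # Baseline: one core running alone on one path.
    single = min(demand_per_core, bandwidth_per_path)
    return delivered / single


# Hypothetical figures chosen only to show the shape of the effect.
DEMAND = 4.0  # GB/s wanted by each core (assumption)

# All 4 cores behind one shared path, as in the single-hub description.
shared = bandwidth_bound_speedup([4], DEMAND, 6.0)

# 2 cores behind each of 2 independent paths, as in the per-socket description.
split = bandwidth_bound_speedup([2, 2], DEMAND, 5.0)

print(f"shared hub: {shared:.1f}x, per-socket paths: {split:.1f}x")
```

With these assumed numbers the shared path saturates first, so the per-socket layout scales further on 4 cores even though each of its paths is individually slower. A code that fits in cache would put little pressure on either layout, which is consistent with the remark about the Core-based Xeons.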