[Beowulf] Woodcrest Memory bandwidth

Peter Kjellstrom cap at nsc.liu.se
Tue Aug 15 10:21:02 PDT 2006


On Tuesday 15 August 2006 17:25, Richard Walsh wrote:
> Mark Hahn wrote:
> >>> Good point which makes perfect sense to me.
> >>> Given that the theoretical maximum is actually 21.3 GB/s
> >>> the real maximum Triad number must be 21.3/3 = 7.1 GB/s.
> >
> > I don't get this - triad does two reads and one write.
> > if you don't use store-through ('nt' versions of mov),
> > then the write also implies a read for write-allocate
> > (filling the cache line).
> > without store-through, the peak theoretical number reported by
> > stream should be 3*peak/4.  the 4 is because there are 3r+1w,
> > and the 3 because stream doesn't give credit for write-allocate.
>
> That looks right.  So, one socket, with write allocate, >>should<< show:
>
>       10.5 GB/sec * .75 or 7.875 GBytes/sec
>
> and two sockets 15.75 GBytes/sec.  The problem could be related
> to  competitive/ineffective use of the shared L2 cache or a bottleneck
> in the North bridge.  It would seem that a look at how the performance
> grows as you add cores within versus across sockets should reveal this.

Here you go (Dell 2950 with 8 memory modules, STREAM compiled with icc-9.1 -O3):

[root@tbox3 streamd]# hostname ; date ; for i in 1 2 3 4 5 ; do export OMP_NUM_THREADS=$i ; ./streamd | egrep "Total memory re|Number of Th|Function |Copy:|Scale:|Add:|Triad:"; done
tbox3
Fri Aug 11 17:59:22 CEST 2006
Total memory required = 457.8 MB.
Number of Threads requested = 1
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        3945.5494       0.0812       0.0811       0.0813
Scale:       2914.9758       0.1098       0.1098       0.1099
Add:         3227.5618       0.1488       0.1487       0.1489
Triad:       3219.5307       0.1492       0.1491       0.1493
Total memory required = 457.8 MB.
Number of Threads requested = 2
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        4324.2058       0.0741       0.0740       0.0742
Scale:       2999.9626       0.1068       0.1067       0.1069
Add:         3309.2733       0.1451       0.1450       0.1452
Triad:       3309.7031       0.1451       0.1450       0.1452
Total memory required = 457.8 MB.
Number of Threads requested = 3
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        5422.5441       0.0590       0.0590       0.0590
Scale:       4102.8364       0.0780       0.0780       0.0781
Add:         4487.2464       0.1070       0.1070       0.1070
Triad:       4487.7465       0.1070       0.1070       0.1070
Total memory required = 457.8 MB.
Number of Threads requested = 4
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        6023.2969       0.0532       0.0531       0.0533
Scale:       4862.4855       0.0658       0.0658       0.0659
Add:         5264.1973       0.0912       0.0912       0.0913
Triad:       5268.1782       0.0911       0.0911       0.0911
Total memory required = 457.8 MB.
Number of Threads requested = 5
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        5504.9004       0.0582       0.0581       0.0582
Scale:       4318.9044       0.0786       0.0741       0.1147
Add:         4705.1016       0.1042       0.1020       0.1216
Triad:       4705.2885       0.1038       0.1020       0.1184
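
For comparison, the write-allocate arithmetic quoted above can be set against these Triad rates. This is only a rough sketch: the 10.5 GB/s per-socket peak and the 3/4 factor come from the quoted discussion, the Triad rates are copied from the run above, and treating 1 GB/s as 1000 MB/s is my assumption. The names (stream_triad_ceiling, measured_triad_mbs) are made up for illustration, and which cores or sockets each thread ran on is not known from this output.

```python
# Peak per-socket bandwidth as quoted in the thread (GB/s).
PEAK_PER_SOCKET_GBS = 10.5
SOCKETS = 2


def stream_triad_ceiling(peak_gbs, write_allocate=True):
    """Upper bound on the Triad rate STREAM can report for a given peak.

    Triad does 2 reads + 1 write per element. With write-allocate the
    destination cache line is read before it is written, so the memory
    system moves 4 units of traffic while STREAM gives credit for 3.
    """
    counted = 3
    actual = 4 if write_allocate else 3
    return peak_gbs * counted / actual


# Triad rates in MB/s from the run above, keyed by OMP_NUM_THREADS.
measured_triad_mbs = {
    1: 3219.5307,
    2: 3309.7031,
    3: 4487.7465,
    4: 5268.1782,
    5: 4705.2885,
}

one_socket = stream_triad_ceiling(PEAK_PER_SOCKET_GBS)
two_sockets = stream_triad_ceiling(PEAK_PER_SOCKET_GBS * SOCKETS)
print(f"ceiling, one socket:  {one_socket:.3f} GB/s")
print(f"ceiling, two sockets: {two_sockets:.3f} GB/s")

for threads, rate_mbs in measured_triad_mbs.items():
    # ASSUMPTION: 1 GB/s == 1000 MB/s when comparing the two figures.
    fraction = (rate_mbs / 1000.0) / two_sockets
    print(f"{threads} thread(s): {rate_mbs:9.1f} MB/s = "
          f"{fraction:.0%} of the two-socket ceiling")
```

Under those assumptions even the best case (4 threads) stays well below the 15.75 GB/s two-socket figure.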

> Two cores on separate sockets should show higher numbers if it's
> an L2 cache issue.  If they are the same as those for 2 cores on one
> socket then you have a problem with the North bridge or getting
> full bandwidth from the FB-DIMMs.
>
> A complication in this test could be that in the one core per socket case
> the whole L2 cache is allocated to a single core.  Watching performance
> change as the array sizes grow should reveal this.
>
> rbw
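
For the array-size experiment suggested above, a small sketch like the following could help pick sizes on both sides of the L2 capacity. The 4 MiB shared L2 per socket is an assumption not stated in this thread, and the 20,000,000-element figure is inferred from the "Total memory required = 457.8 MB" line (three arrays of 8-byte doubles), not read from the benchmark source. The helper names are hypothetical.

```python
BYTES_PER_DOUBLE = 8
STREAM_ARRAYS = 3               # STREAM works on three arrays: a, b, c
L2_BYTES = 4 * 1024 * 1024      # ASSUMPTION: 4 MiB shared L2 per socket


def working_set_bytes(n_elements):
    """Total bytes touched by STREAM for arrays of n_elements doubles."""
    return STREAM_ARRAYS * n_elements * BYTES_PER_DOUBLE


def sweep_sizes(min_exp=14, max_exp=25):
    """Power-of-two array lengths with their working set and L2 fit."""
    rows = []
    for exp in range(min_exp, max_exp + 1):
        n = 1 << exp
        ws = working_set_bytes(n)
        rows.append((n, ws, ws <= L2_BYTES))
    return rows


for n, ws, fits in sweep_sizes():
    label = "fits in L2" if fits else "exceeds L2"
    print(f"N = {n:>9d}  working set = {ws / 2**20:8.2f} MiB  ({label})")

# Inferred size of the run above: 3 arrays x 20,000,000 doubles.
print(f"{working_set_bytes(20_000_000) / 2**20:.1f} MiB")
```

Rebuilding STREAM at a few of these lengths and rerunning the thread sweep should show where the rates stop being cache-assisted.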