[Beowulf] Woodcrest Memory bandwidth


Kozin, I (Igor) i.kozin at dl.ac.uk
Tue Aug 15 10:57:46 PDT 2006


Interesting...
Given that Add and Triad are virtually the same
it's surprising that Copy and Scale are so different.
IMHO Scale should be more like Copy. Compiler effect?
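
For reference, a minimal sketch of why Copy and Scale would be expected to track each other: STREAM counts the same memory traffic per iteration for both (one array read, one array written), just as it does for Add and Triad (two read, one written). The kernel expressions follow the standard STREAM definitions; the dictionary layout and helper names are only illustrative, and the rates are the 1-thread figures from the quoted run below.

```python
# Standard STREAM kernels and the number of arrays each one reads and writes.
# Layout and helper names are illustrative, not part of STREAM itself.
ELEM_BYTES = 8  # double precision

KERNELS = {
    "Copy":  {"expr": "c[i] = a[i]",            "reads": 1, "writes": 1},
    "Scale": {"expr": "b[i] = q * c[i]",        "reads": 1, "writes": 1},
    "Add":   {"expr": "c[i] = a[i] + b[i]",     "reads": 2, "writes": 1},
    "Triad": {"expr": "a[i] = b[i] + q * c[i]", "reads": 2, "writes": 1},
}

# 1-thread rates (MB/s) copied from the quoted run.
RATES_1T = {
    "Copy": 3945.5494,
    "Scale": 2914.9758,
    "Add": 3227.5618,
    "Triad": 3219.5307,
}


def bytes_per_iter(name):
    """Bytes of memory traffic STREAM counts per loop iteration."""
    k = KERNELS[name]
    return ELEM_BYTES * (k["reads"] + k["writes"])


def rate_ratio(num, den, rates=RATES_1T):
    """Ratio of two reported rates, e.g. Scale relative to Copy."""
    return rates[num] / rates[den]


if __name__ == "__main__":
    for name in KERNELS:
        print(f"{name:6s} {bytes_per_iter(name):2d} bytes/iter  {KERNELS[name]['expr']}")
    print(f"Scale/Copy = {rate_ratio('Scale', 'Copy'):.3f}")
    print(f"Triad/Add  = {rate_ratio('Triad', 'Add'):.3f}")
```

With the 1-thread numbers, Scale comes out at roughly 74% of Copy while Triad is within 1% of Add, even though each pair moves the same number of bytes per iteration.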


> here you go (dell 2950 with 8 modules and streams compiled with icc-9.1 -O3):
>
> [root@tbox3 streamd]# hostname ; date ; for i in 1 2 3 4 5 ; do export OMP_NUM_THREADS=$i ; \
>   ./streamd | egrep "Total memory re|Number of Th|Function |Copy:|Scale:|Add:|Triad:" ; done
> tbox3
> Fri Aug 11 17:59:22 CEST 2006
> Total memory required = 457.8 MB.
> Number of Threads requested = 1
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        3945.5494       0.0812       0.0811       0.0813
> Scale:       2914.9758       0.1098       0.1098       0.1099
> Add:         3227.5618       0.1488       0.1487       0.1489
> Triad:       3219.5307       0.1492       0.1491       0.1493
> Total memory required = 457.8 MB.
> Number of Threads requested = 2
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        4324.2058       0.0741       0.0740       0.0742
> Scale:       2999.9626       0.1068       0.1067       0.1069
> Add:         3309.2733       0.1451       0.1450       0.1452
> Triad:       3309.7031       0.1451       0.1450       0.1452
> Total memory required = 457.8 MB.
> Number of Threads requested = 3
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        5422.5441       0.0590       0.0590       0.0590
> Scale:       4102.8364       0.0780       0.0780       0.0781
> Add:         4487.2464       0.1070       0.1070       0.1070
> Triad:       4487.7465       0.1070       0.1070       0.1070
> Total memory required = 457.8 MB.
> Number of Threads requested = 4
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        6023.2969       0.0532       0.0531       0.0533
> Scale:       4862.4855       0.0658       0.0658       0.0659
> Add:         5264.1973       0.0912       0.0912       0.0913
> Triad:       5268.1782       0.0911       0.0911       0.0911
> Total memory required = 457.8 MB.
> Number of Threads requested = 5
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        5504.9004       0.0582       0.0581       0.0582
> Scale:       4318.9044       0.0786       0.0741       0.1147
> Add:         4705.1016       0.1042       0.1020       0.1216
> Triad:       4705.2885       0.1038       0.1020       0.1184
> 
> > Two cores on separate sockets should show higher numbers if it's
> > an L2 cache issue.  If they are the same as those for 2 cores on one
> > socket then you have a problem with the North bridge or getting
> > full bandwidth from the FB-DIMMs.
> >
> > A complication in this test could be that in the one core per socket
> > case the whole L2 cache is allocated to a single core.  Watching
> > performance change as the array sizes grow should reveal this.
> >
> > rbw
> 
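
A rough sketch of the array-size sweep suggested in the quoted text, limited to the arithmetic of which sizes would fit in L2. The 4 MB shared L2 per socket is an assumption about the Woodcrest parts in this box, the array length of 20,000,000 is inferred from the "Total memory required = 457.8 MB" line, and the function names are hypothetical.

```python
ELEM_BYTES = 8             # double precision
NUM_ARRAYS = 3             # STREAM works on three arrays: a, b, c
L2_BYTES = 4 * 1024 ** 2   # ASSUMPTION: 4 MB shared L2 per socket


def working_set_bytes(n):
    """Total bytes touched by the three STREAM arrays of length n."""
    return NUM_ARRAYS * n * ELEM_BYTES


def working_set_mb(n):
    """Same figure in MB (2**20 bytes), the unit STREAM prints."""
    return working_set_bytes(n) / 1024 ** 2


def sweep(start=1 << 12, stop=1 << 25):
    """Doubling array lengths with a flag for whether they fit in L2."""
    rows = []
    n = start
    while n <= stop:
        rows.append((n, working_set_bytes(n) <= L2_BYTES))
        n *= 2
    return rows


if __name__ == "__main__":
    # Inferred array length for the quoted run.
    print(f"N=20000000 -> {working_set_mb(20_000_000):.1f} MB")
    for n, fits in sweep():
        where = "fits in L2" if fits else "spills to memory"
        print(f"N={n:>9d}  {working_set_mb(n):8.2f} MB  {where}")
```

Rebuilding STREAM at array lengths on either side of the fits/spills boundary, once with two threads on one socket and once with one thread per socket, is the comparison the quoted text describes.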



