[Beowulf] bizarre scaling behavior on a Nehalem

Mikhail Kuzminsky kus at free.net
Fri Aug 14 08:08:32 PDT 2009


In message from Bill Broadley <bill at cse.ucdavis.edu> (Thu, 13 Aug 2009 
17:09:24 -0700):
>Tom Elken wrote:
>> To add some details to what Christian says, the HPC Challenge version of
>> STREAM uses dynamic arrays and is hard to optimize.  I don't know what's
>> best with current compiler versions, but you could try some of these that
>> were used in past HPCC submissions with your program, Bill:
>
>Thanks for the heads up, I've checked the specbench.org compiler options for
>hints on where to start with optimization flags, but I didn't know about the
>dynamic stream.
>
>Is the HPC challenge code open source?

Yes, it is open source.

>
>> PathScale 2.2.1 on Opteron:
>> Base OPT flags: -O3 -OPT:Ofast:fold_reassociate=0 
>> STREAMFLAGS=-O3 -OPT:Ofast:fold_reassociate=0 -OPT:alias=restrict:align_unsafe=on -CG:movnti=1
>
>Alas my PathScale license expired and I believe with SiCortex's death (RIP)
>I can't renew it.

Now I understand that I was wise :-)
(we purchased a perpetual academic license). BTW, does anybody know
about the future of the PathScale compilers (if there is one)?

Mikhail

>
>I tried open64-4.2.2 with those flags and on a nehalem single socket:
>
>$ opencc -O4 -fopenmp stream.c -o stream-open64 -static
>$ opencc -O4 -fopenmp stream-malloc.c -o stream-open64-malloc -static
>
>$ ./stream-open64
>Total memory required = 457.8 MB.
>Function      Rate (MB/s)   Avg time     Min time     Max time
>Copy:       22061.4958       0.0145       0.0145       0.0146
>Scale:      22228.4705       0.0144       0.0144       0.0145
>Add:        20659.2638       0.0233       0.0232       0.0233
>Triad:      20511.0888       0.0235       0.0234       0.0235
>
>Dynamic:
>$ ./stream-open64-malloc
>
>Function      Rate (MB/s)   Avg time     Min time     Max time
>Copy:       14436.5155       0.0222       0.0222       0.0222
>Scale:      14667.4821       0.0218       0.0218       0.0219
>Add:        15739.7070       0.0305       0.0305       0.0305
>Triad:      15770.7775       0.0305       0.0304       0.0305
>
>> Intel C/C++ Compiler 10.1 on Harpertown CPUs:
>> Base OPT flags:	 -O2 -xT -ansi-alias -ip -i-static
>> Intel recently used
>> Intel C/C++ Compiler 11.0.081 on Nehalem CPUs:
>> 	 -O2 -xSSE4.2 -ansi-alias -ip
>> and got good STREAM results in their HPCC submission on their Endeavor cluster.
>
>$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream.c -o stream-icc
>$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream-malloc.c -o stream-icc-malloc
>
>$ ./stream-icc | grep ":"
>STREAM version $Revision: 5.9 $
>Copy:       14767.0512       0.0022       0.0022       0.0022
>Scale:      14304.3513       0.0022       0.0022       0.0023
>Add:        15503.3568       0.0031       0.0031       0.0031
>Triad:      15613.9749       0.0031       0.0031       0.0031
>$ ./stream-icc-malloc | grep ":"
>STREAM version $Revision: 5.9 $
>Copy:       14604.7582       0.0022       0.0022       0.0022
>Scale:      14480.2814       0.0022       0.0022       0.0022
>Add:        15414.3321       0.0031       0.0031       0.0031
>Triad:      15738.4765       0.0031       0.0030       0.0031
>
>So ICC does manage zero penalty, alas no faster than open64 with the penalty.
>
>I'll attempt to track down the HPCC stream source code to see if their dynamic
>arrays are any friendlier than mine (I just use malloc).
>
>In any case many thanks for the pointer.
>
>Oh, my dynamic tweak:
>$ diff stream.c stream-malloc.c
>43a44
>> # include <stdlib.h>
>97c98
>< static double	a[N+OFFSET],
>---
>> /* static double	a[N+OFFSET],
>99c100,102
>< 		c[N+OFFSET];
>---
>> 		c[N+OFFSET]; */
>>
>> double *a, *b, *c;
>134a138,142
>>
>>     a=(double *)malloc(sizeof(double)*(N+OFFSET));
>>     b=(double *)malloc(sizeof(double)*(N+OFFSET));
>>     c=(double *)malloc(sizeof(double)*(N+OFFSET));
>>
>283c291,293
><
>---
>>     free(a);
>>     free(b);
>>     free(c);
>