Impressive stream results (nvidia nforce/crush)


Bill Broadley bill at math.ucdavis.edu
Tue Feb 26 00:34:03 PST 2002


I'm building some thin clients and picked up a micro-ATX Abit board
with the NVIDIA nForce (Crush) chipset.  Just for fun I ran STREAM
on an array twice as big as stock:

gcc -O1
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         715.3312       0.0895       0.0895       0.0897
Scale:        683.3954       0.0937       0.0937       0.0938
Add:          819.3646       0.1172       0.1172       0.1172
Triad:        677.4304       0.1419       0.1417       0.1422

gcc -O2
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         715.8281       0.0895       0.0894       0.0899
Scale:        425.0063       0.1507       0.1506       0.1507
Add:          926.7837       0.1036       0.1036       0.1037
Triad:        552.4164       0.1739       0.1738       0.1742

gcc -O3
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         716.1721       0.0895       0.0894       0.0896
Scale:        424.9328       0.1507       0.1506       0.1507
Add:          926.4625       0.1037       0.1036       0.1040
Triad:        552.5436       0.1739       0.1737       0.1740

gcc -O4 (probably just sampling noise vs -O3; gcc treats -O4 the same as -O3)
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         742.4336       0.0889       0.0862       0.0899
Scale:        443.9945       0.1495       0.1441       0.1507
Add:          958.3234       0.1030       0.1002       0.1037
Triad:        570.7626       0.1728       0.1682       0.1740
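
As a sanity check on the tables above: STREAM derives each rate from the
bytes moved per pass divided by the minimum time. Below is a minimal
sketch of that calculation. It assumes the stock array size was 2,000,000
doubles (so 4,000,000 here) and 8-byte elements; neither figure is stated
in this post, and stream_rate_mb_s is a hypothetical helper name.

```python
# Sketch: reconstruct STREAM rates from the reported minimum times.
# Assumptions (not stated in the post): N = 4,000,000 elements
# (twice an assumed stock size of 2,000,000) and 8-byte doubles.

N = 4_000_000
BYTES_PER_ELEMENT = 8

# Number of arrays each kernel reads or writes per iteration.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, min_time_s):
    """Return MB/s (1 MB = 1e6 bytes) for one pass of the given kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * N
    return bytes_moved / min_time_s / 1e6


# Minimum times copied from the gcc -O1 table.
o1_min_times = {"Copy": 0.0895, "Scale": 0.0937, "Add": 0.1172, "Triad": 0.1417}

for kernel, t in o1_min_times.items():
    print(f"{kernel:6s} {stream_rate_mb_s(kernel, t):8.1f} MB/s")
```

Under those assumptions the computed rates land within a fraction of a
percent of the reported ones (the printed times are rounded to four
decimals), which is consistent with the assumed array size.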

I was pretty impressed with the numbers, then I remembered I
don't even have a decent CPU:
model name	: AMD Duron(tm) processor
cpu MHz		: 952.172

Keep in mind this CPU has a 100 MHz FSB (DDR), so it can only
place a request every 10 ns.  It also lacks the special memory
prefetch circuitry available in the Palominos, though I believe
the NVIDIA Crush chipset has similar functionality.
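
The 10 ns figure follows directly from the clock period. Here is a rough
sketch of that arithmetic and the resulting theoretical peak bus
bandwidth. It assumes a 64-bit (8-byte) wide FSB transferring on both
clock edges; the bus width is an assumption, not something stated above.

```python
# Sketch: FSB clock period and theoretical peak bandwidth.
# Assumption (not from the post): the FSB is 64 bits (8 bytes) wide.

FSB_CLOCK_HZ = 100e6       # 100 MHz, as stated in the post
TRANSFERS_PER_CLOCK = 2    # DDR: data moves on both clock edges
BUS_WIDTH_BYTES = 8        # assumed 64-bit bus

clock_period_ns = 1e9 / FSB_CLOCK_HZ
peak_mb_s = FSB_CLOCK_HZ * TRANSFERS_PER_CLOCK * BUS_WIDTH_BYTES / 1e6


def fraction_of_peak(rate_mb_s):
    """Share of the theoretical FSB peak that a measured rate represents."""
    return rate_mb_s / peak_mb_s


print(f"clock period: {clock_period_ns:.0f} ns")
print(f"peak FSB bandwidth: {peak_mb_s:.0f} MB/s")
print(f"best Add rate (958.3234 MB/s): {fraction_of_peak(958.3234):.0%} of peak")
```

If the bus-width assumption holds, the best Add result sits at roughly
60% of what the FSB could carry in theory.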

If you want a killer, cheap node, I'd at least take a look:
it has onboard Ethernet, video, EIDE, and USB.  Oh, and it PXE
boots, making it diskless-friendly.

In any case, these are some of the best numbers I've seen
for an Athlon.


-- 
Bill Broadley
Mathematics/Institute of Theoretical Dynamics
UC Davis



More information about the Beowulf mailing list