
Anyone have information on latest LSU beowulf?


Craig Tierney ctierney at hpti.com
Tue Oct 8 09:54:47 PDT 2002


On Tue, Oct 08, 2002 at 01:35:27PM +0100, Daniel Kidger wrote:
> 
> On Mon, 23 Sep 2002, Craig Tierney wrote:
> > http://www.phys.lsu.edu/faculty/tohline/capital/beowulf.html
> > 
> > Their HPL result is 2.2 Tflops!  Very impressive.
> > 
> > (lines deleted) 
> > What HPL settings did they use to achieve their result?
> 
> I think that many people would be interested in their settings for xhpl,
> in particular what percentage of peak they managed for a single-CPU run.
> 
> A 1.8GHz P4 has a theoretical peak of 3.6 GFlops, but so far I have only
> seen figures of around 60% of this for Linpack. Compare this with 75%+ for
> Alpha nodes (and of course 95%+ for vector processors).
> 
> So, in terms of single processor performance:
> 
>    Which compiler did they use? (icc version 7 perhaps)
>    And which compiler options?
>    Did they use MKL or Atlas for the BLAS?
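For reference, the peak and efficiency figures quoted above can be reproduced with a few lines of Python. This is only a sketch: the value of 2 double-precision flops per cycle is an assumption inferred from the quoted 3.6 GFlops at 1.8 GHz, and the function names are made up for illustration.

```python
def peak_gflops(clock_ghz, flops_per_cycle=2):
    """Theoretical peak in GFlops.

    flops_per_cycle=2 is an assumption inferred from the quoted
    3.6 GFlops peak for a 1.8 GHz P4.
    """
    return clock_ghz * flops_per_cycle


def efficiency(measured_gflops, clock_ghz, flops_per_cycle=2):
    """Fraction of theoretical peak achieved by a measured result."""
    return measured_gflops / peak_gflops(clock_ghz, flops_per_cycle)


p4_peak = peak_gflops(1.8)
# Sustained rate implied by "around 60%" of peak on a single CPU.
p4_sustained = 0.60 * p4_peak

if __name__ == "__main__":
    print(f"peak:      {p4_peak:.2f} GFlops")
    print(f"60% of it: {p4_sustained:.2f} GFlops")
```

The same efficiency() helper can be applied to any single-CPU xhpl result to compare it against the 60% and 75%+ figures mentioned in the quote.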

Probably Atlas.  I tested both and got about 70-80% more speed
out of Atlas built with gcc-2.91 (the recommended compiler).

>    What value of NB did they settle on? (80 and 160 seem common choices)
>    Any other non-default values in HPL.dat?

Why are 80 and 160 common choices?  I do know that they used 160
for their run.  I also retested my setup at 160, and it is much
slower than 64.  Someone at UTK told me that NB should be a multiple
of the block size that fits in the L1 cache, and that double that
size works well.  For the P4 Xeon's 8 KB L1 cache and 8-byte doubles,
that gives NB = sqrt(8 * 1024 / 8) = 32.  I tried 64, and that has
been the best for a single-node run.
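The arithmetic behind that rule of thumb can be written out as a small Python sketch. It assumes the rule means "the largest square block of 8-byte doubles that fits in L1", which is one reading of the advice rather than an official HPL formula; the function name is hypothetical.

```python
import math


def nb_from_l1(l1_kb, bytes_per_double=8):
    """Side length of the largest square block of doubles that fits in L1.

    Assumes the rule of thumb is: NB * NB doubles <= L1 cache size.
    """
    doubles_in_l1 = l1_kb * 1024 // bytes_per_double
    return math.isqrt(doubles_in_l1)


# 8 KB L1 data cache, as stated for the P4 Xeon in the post.
base_nb = nb_from_l1(8)

# The base value and its double, i.e. the two candidates discussed.
candidates = [base_nb * k for k in (1, 2)]

if __name__ == "__main__":
    print("base NB:", base_nb)
    print("candidates:", candidates)
```

The candidates are only starting points; as noted, the best NB still has to be found by measurement.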

One thing I have not yet determined is whether a larger NB becomes
more effective at large node counts.  So far I have cycled through
NB values from 32 to 160 on 64 processors, and NB=64 was still the
best.
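A sweep like this can be run in a single xhpl invocation by listing several NB values in HPL.dat. The sketch below generates the two relevant lines, assuming the stock HPL.dat layout in which the values come first on each line and the trailing text is just a label. The step of 32 between 32 and 160 is an assumption (the exact values tried are not stated), and nb_lines is a hypothetical helper.

```python
def nb_lines(nbs):
    """Return the two HPL.dat lines that describe the NB sweep.

    Assumes the stock HPL.dat layout: a count line followed by a line
    of space-separated block sizes, each with a trailing label.
    """
    values = " ".join(str(nb) for nb in nbs)
    return [
        f"{len(nbs):<12d} # of NBs",
        f"{values:<12s} NBs",
    ]


# Assumed sweep: 32 to 160 in steps of 32.
sweep = list(range(32, 161, 32))
lines = nb_lines(sweep)

if __name__ == "__main__":
    for line in lines:
        print(line)
```

The two lines replace the corresponding "# of NBs" and "NBs" lines in an existing HPL.dat; the rest of the file is left as it is.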

I wonder if having more memory (2 GB vs. 1 GB per node) could
drastically improve scaling.  Does anyone know?
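One way to frame the memory question is through the problem size N that fits in memory. The sketch below uses the common guideline of filling roughly 80% of total memory with the N x N matrix of doubles; that fraction, the node count of 32, and the function name are all assumptions for illustration, not settings from either cluster.

```python
import math


def max_problem_size(nodes, gb_per_node, fraction=0.8, nb=64):
    """Largest N (rounded down to a multiple of nb) whose N x N matrix
    of 8-byte doubles fits in `fraction` of the total memory.

    fraction=0.8 is a commonly quoted guideline, used here as an assumption.
    """
    total_bytes = nodes * gb_per_node * 1024**3
    n = math.isqrt(int(fraction * total_bytes / 8))
    return n - n % nb


# Hypothetical 32-node cluster with 1 GB and then 2 GB per node.
n_1gb = max_problem_size(32, 1)
n_2gb = max_problem_size(32, 2)

if __name__ == "__main__":
    print("N with 1 GB/node:", n_1gb)
    print("N with 2 GB/node:", n_2gb)
    print("ratio:", round(n_2gb / n_1gb, 3))
```

Doubling memory raises N by a factor of about sqrt(2). Since the work grows as N^3 while the data to be communicated grows as N^2, a larger N generally improves the compute-to-communication ratio, though how much that helps here would have to be measured.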

Craig

-- 
Craig Tierney (ctierney at hpti.com)


> 
> Yours,
> Daniel.
> 
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
> ----------------------- www.quadrics.com --------------------



