Fortran compilers for Linux/mpich


Robert G. Brown rgb at phy.duke.edu
Sun Nov 25 08:32:25 PST 2001


On Fri, 23 Nov 2001, Don Holmgren wrote:

> At the very bottom of the page,
>    http://qcdhome.fnal.gov/sse/
> I have a table with cycle counts posted for a number of matrix-matrix
> and matrix-vector routines as measured on a P-III (Coppermine), P4, and
> an Athlon MP.  Times are posted for both a pure-C version of each
> routine, built with gcc, as well as for an SSE version.  The sources
> for each are available at
>    http://qcdhome.fnal.gov/sse/catalog.html
> 
> The results are a mixed bag, with each flavor processor sometimes first,
> second, or third.  I'm using only a small subset of SSE - mostly shufps,
> addps, mulps, with a few xorps, movaps, and movups thrown in.  I haven't
> timed individual instructions on all three processors.
> 
> Don Holmgren
> Fermilab

Awesomely useful, Don, thanks.

Do you have any idea what the overall marginal benefit is of using your
hand-optimized routines when working on large datasets (too big to fit
into cache)?  In particular, does performance devolve to
memory-bandwidth-bound behavior (and hence end up being the same for
MILC and SSE and dominated by the memory bus speed)?
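
One rough way to probe this, offered as a sketch and not taken from Don's
sources: time a simple streaming kernel over working sets that grow from
well inside cache to well past it, and watch where the rate flattens out.
The kernel below is a plain saxpy stand-in, not one of the matrix routines
from the page. The names saxpy_mflops, working_set_bytes, and run_sweep,
the array sizes, and the use of clock() for timing are all assumptions made
for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* y[i] += a * x[i]: two loads and one store for every two flops, so the
   loop is limited by memory traffic once x and y no longer fit in cache. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Bytes held in the two n-element arrays (the working set of one pass). */
size_t working_set_bytes(size_t n)
{
    return 2 * n * sizeof(float);
}

/* MFLOP/s for `reps` passes over n-element arrays, or -1.0 if
   allocation fails.  clock() is coarse, so keep n * reps large. */
double saxpy_mflops(size_t n, int reps)
{
    float *x = malloc(n * sizeof *x);
    float *y = malloc(n * sizeof *y);
    if (!x || !y) {
        free(x);
        free(y);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        x[i] = 1.0f;
        y[i] = 0.0f;
    }

    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        saxpy(n, 0.5f, x, y);
    clock_t t1 = clock();

    /* Read a result so the compiler cannot discard the timed loop. */
    volatile float sink = y[n / 2];
    (void)sink;

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    free(x);
    free(y);
    if (secs <= 0.0)
        secs = 1.0 / CLOCKS_PER_SEC;   /* below timer resolution */
    return 2.0 * (double)n * reps / secs / 1e6;
}

/* Sweep working sets from 8 KB to 128 MB, doing the same total number
   of element updates at each size so the rows are comparable. */
void run_sweep(void)
{
    const size_t total = (size_t)1 << 27;
    for (size_t n = (size_t)1 << 10; n <= ((size_t)1 << 24); n <<= 2) {
        int reps = (int)(total / n);
        printf("%10zu bytes  %8.1f MFLOP/s\n",
               working_set_bytes(n), saxpy_mflops(n, reps));
    }
}
```

Calling run_sweep() from main() prints one line per working-set size.  If
the rate drops to a plateau once the arrays spill out of cache, and a
hand-tuned version of the same kernel lands on the same plateau, that would
suggest the memory bus rather than the instruction mix is setting the pace.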

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu






More information about the Beowulf mailing list