[Beowulf] EM64T Clusters

Don Holmgren djholm at fnal.gov
Wed Jul 28 21:12:39 PDT 2004



On Wed, 28 Jul 2004, Bill Broadley wrote:

> > We've just brought up a test stand with both PCI-X and PCI-E Infiniband
> > host channel adapters.  Some very preliminary (and sketchy, sorry) test
> > results which will be updated occasionally are available at:
> >
> >    http://lqcd.fnal.gov/benchmarks/newib/
>
> Interesting, the listed:
>     * PCI Express: 4.5 microsec
>     * PCI-X, "HPC Gold": 7.4 microsec
>     * PCI-X, Topspin v2.0.0_531: 7.3 microsec
>
> Seem kind of slow to me, I suspect it's mostly the nodes (not pci-x).


I suspect that you're right.  Usually I've heard that I should see only
a 1 microsecond improvement in moving from PCI-X to PCI-Express.  The
numbers I'm reporting for the E7500 implementation of PCI-X are
consistent with what I measured last September on an E7501 cluster using
an older Netpipe (version 2.3) - see http://lqcd.fnal.gov/ib/.  My data
files from those runs show 7 microseconds, reported by Netpipe only to
that precision.  E7500/E7501 is getting pretty old, I suppose - these
nodes are 3 years old now, and if I recall correctly E7500 was the first
PCI-X chipset from Intel (i860 was just PCI 64/66, maybe?). E7500 was
also the first well-performing PCI bus from Intel after the terrible
PCI bandwidths on i840/i850/i860.

> I'm using dual opterons, PCI-X, and "HPC Gold" and getting 0.62 seconds:
>
> compute-0-0.local compute-0-1.local
> size=    1, 131072 hops, 2 nodes in  0.62 sec (  4.7 us/hop)    826 KB/sec
>
> My benchmark just does a MPI_Send<->MPI_Recv of a single integer,
> increments the integer and passes it along in a circularly linked list
> of nodes.  What exact command line arguments did you use with netpipe?
> I'd like to compare results.
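
For readers comparing against the Netpipe latencies, the per-hop figure in
the quoted output line can be re-derived from the hop count and the elapsed
time.  The following is only a sketch of that arithmetic, not the benchmark
itself; the 4-byte payload is an assumption (one MPI_INT per message), and
the function names are made up for illustration.

```python
# Figures copied from the quoted output line:
#   size= 1, 131072 hops, 2 nodes in 0.62 sec ( 4.7 us/hop) 826 KB/sec
HOPS = 131072        # messages passed around the ring
ELAPSED_S = 0.62     # elapsed time as printed (already rounded)
PAYLOAD_BYTES = 4    # assumption: a single 4-byte integer per message


def per_hop_latency_us(hops, elapsed_s):
    """Average time per message hop, in microseconds."""
    return elapsed_s / hops * 1e6


def throughput_kib_per_s(hops, elapsed_s, payload_bytes):
    """Payload throughput in units of 1024 bytes per second."""
    return hops * payload_bytes / elapsed_s / 1024


if __name__ == "__main__":
    print("latency:    %.2f us/hop" % per_hop_latency_us(HOPS, ELAPSED_S))
    print("throughput: %.0f KB/sec"
          % throughput_kib_per_s(HOPS, ELAPSED_S, PAYLOAD_BYTES))
```

Under those assumptions the result matches the printed 4.7 us/hop and
826 KB/sec to within the rounding of the elapsed time.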


I've added the commands used for each of the Netpipe runs shown in the
graphs to the web page (http://lqcd.fnal.gov/benchmarks/newib/).  All of
these runs are vanilla (no additional switches), except I suppose for
the "-t rdma_write" on the "verbs" run where bandwidth is greatly
improved versus the default.  I have the results of many other switch
combinations as well, but I haven't had a chance to digest them yet.

>
> > The PCI Express nodes are based on Abit AA8 motherboards, which have x16
> > slots.  We used the OpenIB drivers, as supplied by Mellanox in their
> > "HPC Gold" package, with Mellanox Infinihost III Ex HCA's.
> >
> > The PCI-X nodes are a bit dated, but still capable.  They are based on
> > SuperMicro P4DPE motherboards, which use the E7500 chipset.  We used
> > Topspin HCA's on these systems, with either the supplied drivers or the
> > OpenIB drivers.
> >
> > I've posted NetPipe graphs (MPI, rdma, and IPoIB) and Pallas MPI
> > benchmark results.  MPI latencies for the PCI Express systems were about
>
> Are the raw results for your netpipe runs available?

Yes.  I've added links to the raw results to the web page.

>
> > 4.5 microseconds; for the PCI-X systems, the figure was 7.3
> > microseconds.  With Pallas, sendrecv() bandwidths peaked at
> > approximately 1120 MB/sec on the PCI Express nodes, and about 620 MB/sec
>
> My pci-x nodes do about midway between those numbers:
> # Benchmarking Sendrecv
> #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
>  524288           80      1249.87      1374.87      1312.37       727.34
> 1048576           40      2499.78      2499.78      2499.78       800.07
> 2097152           20      4999.55      5499.45      5249.50       727.35
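
The Mbytes/sec column above can be cross-checked from the other columns.
The sketch below assumes, based only on the numbers in the quoted table,
that Sendrecv counts both the sent and the received bytes, divides by
t_max, and uses 2^20 bytes per Mbyte.  The function name is hypothetical.

```python
MIB = 2 ** 20  # assumption: "Mbytes" in the table means 2^20 bytes

# (bytes, t_min usec, t_max usec, reported Mbytes/sec), copied from the table
ROWS = [
    (524288, 1249.87, 1374.87, 727.34),
    (1048576, 2499.78, 2499.78, 800.07),
    (2097152, 4999.55, 5499.45, 727.35),
]


def sendrecv_mbytes_per_sec(nbytes, t_max_usec):
    """Bandwidth counting bytes sent plus bytes received, over t_max."""
    return 2 * nbytes / MIB / (t_max_usec * 1e-6)


if __name__ == "__main__":
    for nbytes, _t_min, t_max, reported in ROWS:
        computed = sendrecv_mbytes_per_sec(nbytes, t_max)
        print("%8d bytes: computed %.2f, reported %.2f"
              % (nbytes, computed, reported))
```

With these assumptions all three rows reproduce the reported column to
within rounding, so the slowest rank (t_max) appears to set the figure.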
>
>
> > I don't have benchmarks for our application posted yet but will do so
> > once we add another pair of PCI-E nodes.
>
> I have 10 PCI-X dual opterons and should have 16 real soon if you want
> to compare Infiniband+pci-x on nodes that are closer to your pci-express
> nodes.

Yes, I would be very interested in lattice QCD application benchmarks on
your dual Opterons.  I should have access next week to about 16 dual
Xeon PCI Express nodes with Infiniband - the comparison should be very
enlightening.  Are you using libnuma?

Don Holmgren
Fermilab
