very high bandwidth, low latency manner?
sp at scali.com
Fri Apr 12 13:41:23 PDT 2002
On Fri, 12 Apr 2002, Patrick Geoffray wrote:
> Steffen Persvold wrote:
> >>I figured the list cost on a 256 node system at about $2000 before, for
> >>basic hardware. I was wrong. I reworked it and it is $1500 for
> >>256 (and would be the same for 512 and 1024).
> > So what is wrong with my calculations:
> > 256 node L9/2MB/133MHz config:
> > Node cost = $2,195
> > and for a L9/2MB/200MHz config:
> > Node cost = $2,495
> Nothing, it's right for 256 nodes. However:
> 128 nodes L9/133 MHz config:
> Node cost = $1,595
> 128 nodes L9/200 MHz config:
> Node cost = $1,895
> For more than 128 ports, the number of switches increases to keep a
> guaranteed full bisection, which adds about $500 per node. However, up to
> 128 nodes, you need only one switch, and the numbers I gave are correct.
Yes, I was just questioning Craig's numbers. I was actually surprised that
the Myrinet node cost didn't increase more when going from 128 to 256
nodes, since it basically involves a lot more hardware (i.e. 4 additional
switch enclosures, and 64 additional "spine" cards).
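
For reference, here is a minimal sketch of the arithmetic behind the per-node
figures quoted above. The prices are the ones given in this thread; the
dictionary layout, the shortened config labels and the helper names are
only illustrative assumptions, and discounts, support and software are
not modelled.

```python
# Per-node list prices (USD) as quoted in this thread, keyed by
# (number of nodes, config). The config labels are shorthand.
node_cost = {
    (128, "L9/133MHz"): 1595,
    (256, "L9/133MHz"): 2195,
    (128, "L9/200MHz"): 1895,
    (256, "L9/200MHz"): 2495,
}

def per_node_increase(config):
    """Extra per-node cost when going from 128 to 256 nodes."""
    return node_cost[(256, config)] - node_cost[(128, config)]

def total_cost(nodes, config):
    """Total interconnect list cost for a cluster of the given size."""
    return nodes * node_cost[(nodes, config)]

for config in ("L9/133MHz", "L9/200MHz"):
    print(config,
          "increase per node:", per_node_increase(config),
          "total at 128:", total_cost(128, config),
          "total at 256:", total_cost(256, config))
```

With these figures the per-node price differs by $600 between the 128 and
256 node configurations, for both card speeds.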
> The switchless cost model makes sense for configs larger than the biggest
> switch size for switched technologies, i.e. 128 ports for Quadrics and
> Myrinet. Surprisingly, the largest SCI cluster is, AFAIK, 132 nodes ;-)
The largest SCI cluster (at least switchless) is indeed 132 nodes.
> > Now we have price comparisons for the interconnects (SCI, Myrinet and
> > Quadrics). What about performance? Does anyone have NAS/PMB numbers for
> > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132
> > node Athlon 760MP based SCI cluster, and I guess also an 81 node PIII ServerWorks
> > HE-SL based cluster).
> Ok, I will say again what I think about these comparisons: it's already
> hard to compare dollars (what about discount, what about support, what
> about software, etc.) even though they are the same dollars; it's a waste
> of time to do that for micro-benchmarks. It's something you do when you
> want to publish something in a conference next to a beach.
> When a customer asks me about performance, I don't give him my NAS or
> PMB numbers, he doesn't care. He wants access to an XXX-node machine to
> play with and run his set of applications, or he gives a list of codes
> to the vendors for the bid and the vendors guarantee the results because
> it's used officially in the bid process. If someone buys a machine
> because the NAS numbers look pretty and his CFD code sucks, this guy will
> take his stuff and look for a new job.
> Do you spend time to tune NAS? I don't. People already told me that the
> NAS LU test sucks on MPICH-GM. Well, the LU algorithm in HPL is much
> better. How many applications behave like the NAS LU, how many like HPL?
> If a customer comes to me because his code behaves like NAS LU, I will
> tell him what to tune in his code to be more efficient.
> The pitfall with benchmarks is that you want to tune your MPI
> implementation to look good on them. In the real world, you cannot expect
> to run a code efficiently on a machine without tuning it, especially with
I think that most people on this list agree that it is really the
customer's application that counts, not NAS or PMB numbers (and no, I
don't spend much time tuning NAS; it was a bad example). I also agree with
most of your other statements, however I still think that at least an MPI
specific benchmark such as PMB (don't know if it's available for PVM...)
will give the customers an initial feeling for which interconnect they need
(if they know how their application is architected).
> My 2 pennies
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:sp at scali.com | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency