[Beowulf] dual-core benefits?

Mark Hahn hahn at physics.mcmaster.ca
Fri Sep 23 06:43:17 PDT 2005


> > is this scalability assuming a slow interconnect like gigabit?
> Yes, gigabit on a Pentium 4 cluster.

well, the e1000 is a decent nic, but it is sometimes configured
for too much interrupt mitigation to suit HPC.  even so, it's nowhere
near the performance of a real cluster interconnect (~2 us latency,
400-1800 MB/s bandwidth).

> > have you considered when it would be appropriate to go to something fast?
> Yes, that is probably something that we will consider after trying gigabit
> and two network interfaces per motherboard.

dual-port has a reputation for not helping much.  it's only a small
boost in bandwidth in the ideal case.

> > on a multiprocessor system, you effectively have a pretty fast, if small,
> > interconnect.  if your code can take advantage of that, then going
> > dual-core could well be a win.  for instance, if your code is limited
> > by short-message, point-to-point latency, then increasing "SMP-ness"
> > should help a lot, especially if you are assuming mere gigabit.
> 
> Well, actually I'm still not sure about this. The CPUs inside the node will
> communicate fast, but then the network will be a bottleneck?

I was careful in what I said, but perhaps not explicit enough.  if your
code sends a lot of short point-to-point messages, then MPI will avoid
the nic for messages between processes on the same machine (at least
myrinet and quadrics do).  that's a significant win: on an older opteron
cluster I have, intra-node is .8 us/800 MB/s vs 3.5 us/240 MB/s for
inter-node.

so if you're only scaling to 8 processes and you use dual-dual nodes
(two dual-core sockets per node), half your messages will be very fast.
if the inter-node fabric is gigabit, that means .8 us/800 MB/s vs
30 us/80 MB/s.
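one way to see why latency dominates for short messages is the usual first-order model, time = latency + size/bandwidth.  this model is an assumption on my part for illustration; the (latency, bandwidth) pairs are the figures quoted above, and the function name is made up.

```python
# First-order message-cost model (an assumption, not a measurement):
#   time = latency + size / bandwidth
# With MB = 10**6 bytes, 1 MB/s is exactly 1 byte/us, so
# size_bytes / bandwidth_mb_s comes out directly in microseconds.
def transfer_time_us(size_bytes: int, latency_us: float, bandwidth_mb_s: float) -> float:
    return latency_us + size_bytes / bandwidth_mb_s

# (latency in us, bandwidth in MB/s) pairs quoted in the post
LINKS = {
    "intra-node (opteron)": (0.8, 800.0),
    "inter-node (opteron)": (3.5, 240.0),
    "gigabit": (30.0, 80.0),
}

if __name__ == "__main__":
    # a short message vs a 1 MB message over each path
    for size in (64, 1_000_000):
        for name, (lat, bw) in LINKS.items():
            t = transfer_time_us(size, lat, bw)
            print(f"{size:>9} B  {name:<22} {t:10.2f} us")
```

for a 64-byte message the size term is well under 1 us on every path, so the cost is essentially the latency; only for large messages does bandwidth take over.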

so just looking at an 8p cluster, assuming only point-to-point messages
with uniformly distributed destinations:
	two 2x2's will see .8/800 on half of all messages and 30/80
	on the rest, or 15.4/440 on average.

	four 2x1's will see .8/800 on a quarter and 30/80 on the rest,
	or 22.7/260.
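the arithmetic above is just a weighted average over the intra- and inter-node figures.  a minimal sketch, assuming (as the estimate above does) that a message's destination node is uniform over all nodes, so 1/nodes of messages stay on-node; the names are hypothetical.

```python
# Rough mix model: a fraction of point-to-point messages stays inside a
# node (shared-memory path), the rest crosses the gigabit fabric.
INTRA = (0.8, 800.0)  # (latency in us, bandwidth in MB/s), intra-node
INTER = (30.0, 80.0)  # (latency in us, bandwidth in MB/s), gigabit

def blended(nodes: int) -> tuple[float, float]:
    """Weighted-average latency and bandwidth, assuming destination nodes
    are uniformly distributed, so 1/nodes of messages stay on-node."""
    f = 1.0 / nodes
    lat = f * INTRA[0] + (1 - f) * INTER[0]
    bw = f * INTRA[1] + (1 - f) * INTER[1]
    return lat, bw

if __name__ == "__main__":
    for label, nodes in [("two 2x2", 2), ("four 2x1", 4)]:
        lat, bw = blended(nodes)
        print(f"{label}: {lat:.1f} us / {bw:.0f} MB/s")
```

this prints 15.4 us / 440 MB/s and 22.7 us / 260 MB/s.  counting only distinct peers instead (3 of 7 on-node for 2x2, 1 of 7 for 2x1) shifts the numbers somewhat but not the ranking.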

this is pretty rough, of course: depending on your communication pattern,
you could do better or worse.  and if you use collectives, you'll
always be limited at least by inter-node performance.

regards, mark hahn.
