<span style="font-family:arial,sans-serif;font-size:13px">Vincent,</span><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">You are changing the item being tested. You disputed my statement that switches can have a latency as low as 100-150 ns. I described how to test the latency of a single hop (I neglected to say that the two NICs must be connected to the same switch chip, i.e. blade, crossbar, etc.). You can additionally measure multi-hop links the same way by choosing your ports correctly.</div>

<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">Please don't change rules because you cannot admit you are wrong.</div><div style="font-family:arial,sans-serif;font-size:13px">

<br></div><div style="font-family:arial,sans-serif;font-size:13px">Scott</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">(reposted to the whole group)</div>

<div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Nov 9, 2012 at 3:40 PM, Vincent Diepeveen <span dir="ltr"><<a href="mailto:diep@xs4all.nl" target="_blank">diep@xs4all.nl</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

that's not how fast you can get the data at each core.<br>

<br>

The benchmark i wrote is actually a reflection of how a hashtable works for Game Tree Search in general.<br>

the speedup of it is exponential, so doing it in a different way we can PROVE (as in mathematical proof)<br>

that you will have trouble getting the same exponent (which we call the branching factor).<br>

<br>

So practical testing of what you can achieve from core to core is what matters.<br>

<br>

The first disappointment then happens with the new opteron cores actually, namely that AMD has designed<br>

a memory controller which just doesn't scale if you use all cores.<br>

<br>

Joel Hruska performed some tests there (not sure where he posted it online).<br>

We see then that the bulldozer type architecture still scales ok if you run benchmarks single core.<br>

Sure no real good latency but still...<br>

<br>

Yet if you move from measuring with 4 processes to measuring with 8 processes,<br>

on a single chip we already land at nearly 200 ns, which is really slow.<br>

<br>

The same effect happens when at a big supercomputer you run at full throttle with all cores.<br>

<br>

Manufacturers can claim whatever, but it is always paper math.<br>

<br>

If they ever release a number, it's some sort of single-core figure, whereas in the first place that<br>

box didn't get ordered to work single core.<br>

<br>

You don't want the performance at a single core in a lab with temperatures near 0 Kelvin,<br>

you want to see that the box you got performs like this with all cores running :)<br>

<br>

And on the number posted you already start losing at Cray, starting with the actual CPU's that suck when you use all cores.<div class="HOEnZb"><div class="h5"><br>

<br>

On Nov 9, 2012, at 8:38 PM, atchley <a href="http://tds.net" target="_blank">tds.net</a> wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Vincent, it is easy to measure.<br>

<br>

1. Connect two NICs back-to-back.<br>

2. Measure latency<br>

3. Connect machines to switch<br>

4. Measure latency<br>

5. Subtract (2) from (4)<br>
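<br>
The subtraction in steps 1 to 5 can be sketched roughly as follows. This is only an illustration, not the tooling used at Myricom or ORNL: the function name, the use of a median over repeated samples, and the sample values are all assumptions made up for the example.<br>

```python
from statistics import median


def switch_latency_ns(back_to_back_samples_ns, through_switch_samples_ns):
    """Estimate the latency added by the switch.

    Takes the median of each set of latency samples (a hypothetical choice,
    the original steps only say "measure latency") and subtracts the
    back-to-back baseline from the through-switch figure.
    """
    baseline = median(back_to_back_samples_ns)
    with_switch = median(through_switch_samples_ns)
    if baseline > with_switch:
        raise ValueError("switch path measured faster than the back-to-back path")
    return with_switch - baseline


# Hypothetical latency samples in nanoseconds, invented purely for illustration.
back_to_back = [1210, 1195, 1202, 1200, 1198]
through_switch = [1330, 1318, 1325, 1322, 1340]

print(switch_latency_ns(back_to_back, through_switch))
```

Both sets of samples must be taken with the same NICs, message size and software stack, so that everything except the switch cancels out in the subtraction.<br>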

<br>

That is how we did it at Myricom and that is how we do it at ORNL.<br>

<br>

Try it sometime.<br>

<br>

Scott<br>

<br>

<br>

On Fri, Nov 9, 2012 at 2:36 PM, Vincent Diepeveen <<a href="mailto:diep@xs4all.nl" target="_blank">diep@xs4all.nl</a>> wrote:<br>

<br>

On Nov 9, 2012, at 7:31 PM, atchley <a href="http://tds.net" target="_blank">tds.net</a> wrote:<br>

<br>

Modern switches need 100-150 ns per hop.<br>

<br>

yeah that's BS when you have software that actually measures that with all cores busy.<br>

<br>

I wrote a benchmark to measure that with all cores busy.<br>

<br>

The SGI box back then, which had 50 ns switches, would 'in theory' have a latency of 480 ns @ 500 CPUs,<br>

so 960 ns for a blocked read; I couldn't get it down to less than 5.8 us on average.<br>
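<br>
To put those figures side by side, here is a small sketch of the arithmetic. It only uses the numbers quoted above (480 ns one way, doubled for a blocked read, 5.8 us measured); the function and variable names are made up for the example.<br>

```python
def slowdown_factor(measured_ns, theoretical_ns):
    """Return how many times slower the measured latency is than the paper figure."""
    if theoretical_ns == 0:
        raise ValueError("theoretical latency must be non-zero")
    return measured_ns / theoretical_ns


theoretical_one_way_ns = 480                              # 'in theory' figure from the message
theoretical_blocked_read_ns = 2 * theoretical_one_way_ns  # request plus reply, 960 ns
measured_blocked_read_ns = 5800                           # 5.8 us measured with all cores busy

factor = slowdown_factor(measured_blocked_read_ns, theoretical_blocked_read_ns)
print(round(factor, 1))  # prints 6.0
```

So the measured blocked read is roughly six times the theoretical figure.<br>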

<br>

<br>

<br>

<br>

There are some things that do not scale per hop, such as traversing the PCIe link from socket to NIC and back. So, I see it as 1.2 us to go to the router and back and 100 ns per hop.<br>
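<br>
As a rough sketch of that model, assuming the "1.2" is in microseconds and is a fixed cost independent of hop count (the function and parameter names are invented for illustration):<br>

```python
def estimated_latency_ns(hops, fixed_ns=1200, per_hop_ns=100):
    """Fixed cost (socket to NIC to router and back) plus a per-hop switch cost.

    fixed_ns=1200 assumes the "1.2" above means 1.2 microseconds;
    per_hop_ns=100 is the per-hop figure from the same sentence.
    """
    if hops not in range(0, 1000):
        raise ValueError("hops must be a small non-negative integer")
    return fixed_ns + hops * per_hop_ns


for hops in range(1, 6):
    print(hops, estimated_latency_ns(hops))
```

With five hops this gives 1700 ns, which lines up with the 1.7 microsecond maximum quoted from the article further down, rather than with the 5 * 1.7 = 8.5 microsecond reading.<br>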

<br>

Scott<br>

<br>

<br>

On Fri, Nov 9, 2012 at 11:17 AM, Vincent Diepeveen <<a href="mailto:diep@xs4all.nl" target="_blank">diep@xs4all.nl</a>> wrote:<br>

The latency estimate taking 5 hops seems a tad optimistic to me<br>

except when i read the English wrong and they mean 1.7 microseconds a<br>

hop making it for a 5 hop 5 * 1.7 = 8.5 microseconds in total.<br>

<br>

"Not every node is only one hop away, of course. On a fully<br>

configured system, you are five hops away maximum from any socket,<br>

so there is some latency. But the delta is pretty small with<br>

Dragonfly, with a minimum of about 1.2 microseconds for a short hop,<br>

an average of 1.5 microseconds on average, and a maximum of 1.7<br>

microseconds for the five-hop jump, according to Bolding."<br>

<br>

On Nov 8, 2012, at 7:13 PM, Hearns, John wrote:<br>

<br>

> Well worth a read:<br>

><br>

><br>

><br>

> <a href="http://www.theregister.co.uk/2012/11/08/cray_cascade_xc30_supercomputer/" target="_blank">http://www.theregister.co.uk/2012/11/08/cray_cascade_xc30_supercomputer/</a><br>

><br>

><br>

><br>

> John Hearns | CFD Hardware Specialist | McLaren Racing Limited<br>

> McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK<br>

><br>

><br>

> T:  +44 (0) 1483 262000<br>

><br>

> D:  +44 (0) 1483 262352<br>

><br>

> F:  +44 (0) 1483 261928<br>

> E:  <a href="mailto:john.hearns@mclaren.com" target="_blank">john.hearns@mclaren.com</a><br>

><br>

> W: <a href="http://www.mclaren.com" target="_blank">www.mclaren.com</a><br>

><br>

><br>

><br>

> The contents of this email are confidential and for the exclusive<br>

> use of the intended recipient. If you receive this email in error<br>

> you should not copy it, retransmit it, use it or disclose its<br>

> contents but should return it to the sender immediately and delete<br>

> your copy.<br>

><br>

> _______________________________________________<br>

> Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin<br>

> Computing<br>

> To change your subscription (digest mode or unsubscribe) visit<br>

> <a href="http://www.beowulf.org/mailman/listinfo/beowulf" target="_blank">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>

<br>


<br>

<br>

<br>

</blockquote>

<br>

</div></div></blockquote></div><br></div>