[Beowulf] Register article on Cray Cascade

Sat Nov 10 04:54:44 PST 2012

Vincent,

You are changing the item being tested. You disputed my statement that
switches can have a latency as low as 100-150 ns. I described how to test
the latency of a single hop (I neglected to say that the two NICs must be
connected to the same switch chip, i.e. blade, crossbar, etc.). You can
additionally measure multi-hop links the same way by choosing your ports
correctly.
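
A minimal sketch of the subtraction method from the quoted steps below
(through-switch latency minus back-to-back latency, divided by the number
of switch chips crossed). The function name and the sample numbers are
illustrative assumptions, not measurements from this thread.

```python
def switch_latency_ns(back_to_back_ns, through_switch_ns, hops=1):
    """Estimate per-hop switch latency by subtraction.

    back_to_back_ns:   latency with the two NICs cabled directly.
    through_switch_ns: latency of the same test through `hops` switch chips.
    """
    if hops < 1:
        raise ValueError("hops must be at least 1")
    added_ns = through_switch_ns - back_to_back_ns
    if added_ns < 0:
        raise ValueError("switched latency is lower than back-to-back latency")
    return added_ns / hops


# Hypothetical numbers, for illustration only.
print(switch_latency_ns(1200.0, 1320.0))          # 120.0
print(switch_latency_ns(1200.0, 1560.0, hops=3))  # 120.0
```

Both measurements have to use the same hosts, NICs and message size, so
that everything except the switch cancels out in the subtraction.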

Please don't change the rules because you cannot admit you are wrong.

Scott

(reposted to the whole group)

On Fri, Nov 9, 2012 at 3:40 PM, Vincent Diepeveen <diep at xs4all.nl> wrote:

> that's not how fast you can get the data at each core.
>
> The benchmark i wrote is actually a reflection of how a hashtable works
> for Game Tree Search in general.
> the speedup of it is exponential, so doing it in a different way we can
> PROVE (as in mathematical proof)
> that you will have trouble getting the same exponent (which we call
> branching factor).
>
> So practical testing then what you can achieve from core to core is what
> matters.
>
> The first disappointment then happens with the new opteron cores actually,
> namely that AMD has designed
> a memory controller which just doesn't scale if you use all cores.
>
> Joel Hruska performed some tests there (not sure where he posted it
> online).
> We see then that the bulldozer type architecture still scales ok if you
> run benchmarks single core.
> Sure no real good latency but still...
>
> Yet if you then move from measuring with 4 processes to measuring with 8
> processes on one chip, we already land at nearly 200 ns, which is really
> slow.
>
> The same effect happens when at a big supercomputer you run at full
> throttle with all cores.
>
> Manufacturers can claim whatever, but it is always paper math.
>
> If they ever release a number, it is some sort of single-core figure,
> whereas that box was not ordered to run on a single core in the first
> place.
>
> You don't want the performance of a single core in a lab with temperatures
> near 0 Kelvin,
> you want to see that the box you got performs like this with all cores
> running :)
>
> And on the number posted you already start losing at Cray, starting with
> the actual CPU's that suck when you use all cores.
>
>
> On Nov 9, 2012, at 8:38 PM, atchley tds.net wrote:
>
>> Vincent, it is easy to measure.
>>
>> 1. Connect two NICs back-to-back.
>> 2. Measure latency
>> 3. Connect machines to switch
>> 4. Measure latency
>> 5. Subtract (2) from (4)
>>
>> That is how we did it at Myricom and that is how we do it at ORNL.
>>
>> Try it sometime.
>>
>> Scott
>>
>>
>> On Fri, Nov 9, 2012 at 2:36 PM, Vincent Diepeveen <diep at xs4all.nl> wrote:
>>
>> On Nov 9, 2012, at 7:31 PM, atchley tds.net wrote:
>>
>> Modern switches need 100-150 ns per hop.
>>
>> yeah that's BS when you have software that measures it with all
>> cores busy.
>>
>> I wrote a benchmark to measure that with all cores busy.
>>
>> The SGI box back then had 50 ns switches and would 'in theory' have a
>> latency of 480 ns @ 500 CPUs, so 960 ns for a blocked read; I couldn't
>> get it down to less than 5.8 us on average.
>>
>>
>>
>>
>> There are some things that do not scale per hop, such as traversing the
>> PCIe link from socket to NIC and back. So, I see it as 1.2 us to go to the
>> router and back and 100 ns per hop.
>>
>> Scott
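
A rough sketch of the estimate just above: a fixed cost for the trip from
socket to router and back, plus a per-hop switch cost. Reading the "1.2"
as microseconds is an assumption, and the constant and function names are
hypothetical. With five hops the model gives 1700 ns, in line with the
1.7 microsecond maximum in the article quote further down.

```python
BASE_NS = 1200     # assumed fixed cost: socket to router and back
PER_HOP_NS = 100   # assumed per-hop switch latency


def estimated_latency_ns(hops, base_ns=BASE_NS, per_hop_ns=PER_HOP_NS):
    """Linear model: fixed endpoint cost plus a constant cost per hop."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return base_ns + hops * per_hop_ns


for hops in range(1, 6):
    print(hops, estimated_latency_ns(hops))  # last line printed: 5 1700
```

This is only a model of an idle network; it says nothing about latency
under load, which is the other half of the disagreement in this thread.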
>>
>>
>> On Fri, Nov 9, 2012 at 11:17 AM, Vincent Diepeveen <diep at xs4all.nl>
>> wrote:
>> The latency estimate for 5 hops seems a tad optimistic to me,
>> unless I read the English wrong and they mean 1.7 microseconds per
>> hop, making a 5-hop path 5 * 1.7 = 8.5 microseconds in total.
>>
>> "Not every node is only one hop away, of course. On a fully
>> configured system, you are five hops away maximum from any socket,
>> so there is some latency. But the delta is pretty small with
>> Dragonfly, with a minimum of about 1.2 microseconds for a short hop,
>> an average of 1.5 microseconds on average, and a maximum of 1.7
>> microseconds for the five-hop jump, according to Bolding."
>>
>> On Nov 8, 2012, at 7:13 PM, Hearns, John wrote:
>>
>> > Well worth a read:
>> >
>> >
>> >
>> > http://www.theregister.co.uk/2012/11/08/cray_cascade_xc30_supercomputer/
>> >
>> >
>> >
>> > John Hearns | CFD Hardware Specialist | McLaren Racing Limited
>> > McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK
>> >
>> >
>> > T:  +44 (0) 1483 262000
>> >
>> > D:  +44 (0) 1483 262352
>> >
>> > F:  +44 (0) 1483 261928
>> > E:  john.hearns at mclaren.com
>> >
>> > W: www.mclaren.com
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> > Computing
>> > To change your subscription (digest mode or unsubscribe) visit
>> > http://www.beowulf.org/mailman/listinfo/beowulf
>>