[Beowulf] Woodcrest - Shared L2 cache
Renato S. Silva rssr at lncc.br Wed Aug 16 08:39:31 PDT 2006
- Previous message: [Beowulf] Memory latency (was woodcrest)
- Next message: [Beowulf] Woodcrest - Shared L2 cache
Hi folks,
Does anyone have information about how the L2 cache is shared between the two cores?
Thanks,
Renato Silva
Richard Walsh wrote:
> Mark Hahn wrote:
>
>>>> Good point which makes perfect sense to me.
>>>> Given that the theoretical maximum is actually 21.3 GB/s
>>>> the real maximum Triad number must be 21.3/3 = 7.1 GB/s.
>>>
>>
>> I don't get this - triad does two reads and one write.
>> if you don't use store-through ('nt' versions of mov),
>> then the write also implies a read for write-allocate
>> (filling the cache line).
>> without store-through, the peak theoretical number reported by
>> stream should be 3*peak/4. the 4 is because there are 3r+1w,
>> and the 3 because stream doesn't give credit for write-allocate.
>
> That looks right. So, one socket, with write allocate, >>should<< show:
>
> 10.5 GB/sec * .75 or 7.875 GBytes/sec
>
> and two sockets 15.75 GBytes/sec. The problem could be related
> to competitive/ineffective use of the shared L2 cache or a bottleneck
> in the North bridge. It would seem that a look at how the performance
> grows as you add cores within versus across sockets should reveal this.
>
> Two cores on separate sockets should show higher numbers if it's
> an L2 cache issue. If they are the same as those for 2 cores on one
> socket then you have a problem with the North bridge or getting
> full bandwidth from the FB-DIMMs.
> A complication in this test could be that in the one core per socket case
> the whole L2 cache is allocated to a single core. Watching performance
> change as the array sizes grow should reveal this.
> rbw
>
>
>>
>>> Then how do you explain a dual Opteron with two 6.4 GB/sec (peak)
>>> memory systems, 12.8 GB/sec total per node, managing 9-10 GB/sec?
>>>
>>> 12.8/3 = 4.27 GB/sec. People are seeing well over twice that.
>>
>>
>> since pathscale does write-through, the peak really should be 12.8,
>> so achieving 9-10 is decent but not paradoxical. (the peak would
>> correspond to 1.07 Gflops, significantly below the peak theoretical
>> pipeline rate of 2*clock flops...)
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>
>
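The bandwidth arithmetic in the quoted exchange can be checked with a short Python sketch. It assumes Mark Hahn's model: the STREAM triad kernel (a[i] = b[i] + q*c[i]) is credited with 24 bytes per iteration, but with ordinary stores the write first reads the cache line (write-allocate), so 32 bytes actually move; with non-temporal ("nt") stores only 24 bytes move. The function names are hypothetical, and the 10.5 and 12.8 GB/s inputs are the peak figures quoted in the thread, not measurements.

```python
# Sketch of the STREAM triad ceiling model discussed above (assumed model,
# hypothetical helper names). Triad: a[i] = b[i] + q*c[i] on 8-byte doubles.
CREDITED_BYTES = 24  # STREAM counts read b, read c, write a
FLOPS_PER_ITER = 2   # one multiply, one add


def triad_reported_peak(peak_gbs, write_allocate=True):
    """Highest triad figure STREAM could report if the memory system moves
    peak_gbs GB/s of real traffic. With write-allocate, 32 bytes move per
    iteration (3 reads + 1 write) but only 24 are credited."""
    actual_bytes = 32 if write_allocate else 24
    return peak_gbs * CREDITED_BYTES / actual_bytes


def triad_gflops(reported_gbs):
    """Flop rate implied by a STREAM triad figure (GB/s -> Gflop/s)."""
    return reported_gbs / CREDITED_BYTES * FLOPS_PER_ITER


if __name__ == "__main__":
    one_socket = triad_reported_peak(10.5)
    print(f"Woodcrest, 1 socket:  {one_socket:.3f} GB/s")      # 7.875
    print(f"Woodcrest, 2 sockets: {2 * one_socket:.3f} GB/s")  # 15.750
    opteron = triad_reported_peak(12.8, write_allocate=False)
    print(f"Dual Opteron (nt stores): {opteron:.1f} GB/s, "
          f"{triad_gflops(opteron):.2f} Gflop/s")               # 12.8, 1.07
```

Under this model the reported ceiling with write-allocate is 3/4 of the raw peak rather than 1/3, and the last line reproduces the 1.07 Gflops figure: 12.8 GB/s divided by 24 bytes is about 0.53 billion iterations per second, at two flops each.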
