[Beowulf] Stroustrup regarding multicore

Eric Thibodeau kyron at neuralbs.com
Tue Aug 26 11:30:55 PDT 2008


Perry E. Metzger wrote:
> "Robert G. Brown" <rgb at phy.duke.edu> writes:
>   
>> On Tue, 26 Aug 2008, Michael H. Frese wrote:
>>     
>>> C is not much better.  I once worked a young computational
>>> programmer for almost a week to get him to prove to himself that a C
>>> source program couldn't walk through a 2-d array the hard way as
>>> fast as a Fortran source program unless the stepping was coded by
>>> hand. He didn't believe that a 2-d array in C is syntactically a 1-d
>>> array of pointers to 1-d arrays, and the row pointers must be
>>> fetched from memory!  And separate compilation of functions
>>>       
>
> As I said already, he's wrong....
>
>   
>> Perhaps, but don't most C programmers allocate such an array as a single
>> vector and then repack the indices?
>>     
>
> I've never seen anyone allocate "as a single vector and repack the
> indices", though I'm sure that a counterexample exists in someone's
> code out there somewhere. In any case, one has no need to do such a
> thing.
>
> (This is not to say that when one calls malloc, if you're calling
> malloc to allocate an array, that you don't pass it a single size_t
> indicating what you're looking for, but that's a different issue.)
>
>   
I am not sure if this is what you mean, but anyone who has been 
programming in C long enough (hrm...to use malloc at least once ;) ) 
_should_ know that malloc reserves X bytes of memory and neither cares 
nor needs to know what the memory is used for.
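
A minimal sketch of that point, assuming a row-major matrix of doubles: 
malloc is handed one byte count, and the 2-d interpretation lives 
entirely in the index arithmetic. The names matrix_alloc, idx and 
matrix_sum are hypothetical helpers made up for this illustration, not 
anything from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: reserve rows * cols doubles as ONE contiguous
 * block.  malloc only ever sees a byte count; it has no idea that the
 * caller intends to treat the memory as a 2-d array. */
double *matrix_alloc(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    /* Refuse sizes whose byte count would overflow size_t. */
    if (rows > SIZE_MAX / cols / sizeof(double))
        return NULL;
    return malloc(rows * cols * sizeof(double));
}

/* Row-major mapping from (i, j) to an offset in the flat block. */
static inline size_t idx(size_t i, size_t j, size_t cols)
{
    return i * cols + j;
}

/* Walk the matrix row by row, i.e. in increasing address order. */
double matrix_sum(const double *m, size_t rows, size_t cols)
{
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += m[idx(i, j, cols)];
    return sum;
}
```

A caller would do something like m = matrix_alloc(3, 4), write through 
m[idx(i, j, 4)], and free(m) when done. No row pointers are stored or 
fetched anywhere.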

As for the contiguous nature of the allocation, doing otherwise would be 
horrendously inefficient, given that most processors assume this memory 
layout in order to optimize cache usage and pre-fetches (noting that 
actual memory allocation is done by the OS in pages). I am currently 
unable to dig out the reference, but there are very few processors (none 
that are Beowulf COTS material, iirc) that implement any sort of 
semantics to detect the actual fetching pattern (i.e., to understand that 
the data is being fetched in strides of x bytes). I am also currently 
comparing the use of data structures (or not) in some simple C code to 
see whether the use of structures has a significant impact on the 
cache's fetching capabilities....efforts which might become useless 
(stay readable to the human and let the compiler do its work), 
since (from GCC-4.3.1's manpage):

       -fipa-struct-reorg
           Perform structure reorganization optimization, that change
           C-like structures layout in order to better utilize spatial
           locality.  This transformation is affective for programs
           containing arrays of structures.  Available in two compilation
           modes: profile-based (enabled with -fprofile-generate) or static
           (which uses built-in heuristics).  Require -fipa-type-escape to
           provide the safety of this transformation.  It works only in
           whole program mode, so it requires -fwhole-program and -combine
           to be enabled.  Structures considered cold by this
           transformation are not affected (see --param
           struct-reorg-cold-struct-ratio=value).
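
To make the spatial-locality argument concrete, here is a hand-written 
sketch of the kind of layout change such an optimization aims for. It is 
an illustration only, not a claim about what GCC actually emits; the 
struct names, the 56-byte cold payload and N are all assumptions chosen 
for the example.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024  /* arbitrary element count for the illustration */

/* Array-of-structures layout: the hot field x of consecutive elements
 * is separated by cold payload (a 64-byte stride on typical 64-bit
 * targets), so a loop over x uses only a small part of each cache
 * line it pulls in. */
struct particle_aos {
    double x;         /* hot: read in the inner loop */
    char   cold[56];  /* cold: rarely touched */
};

/* Hand-split layout: the hot values are packed back to back, so the
 * same loop walks memory with an 8-byte stride. */
struct particles_soa {
    double x[N];
    char   cold[N][56];
};

double sum_x_aos(const struct particle_aos *p, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p[i].x;
    return sum;
}

double sum_x_soa(const struct particles_soa *p, size_t n)
{
    assert(n <= N);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p->x[i];
    return sum;
}
```

Both functions compute the same result; only the distance between 
successive loads differs, which is what one would time to see whether 
the layout matters on a given machine.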
> Perry
>   
Eric Thibodeau
