[Beowulf] Stroustrup regarding multicore
kyron at neuralbs.com
Tue Aug 26 11:30:55 PDT 2008
Perry E. Metzger wrote:
> "Robert G. Brown" <rgb at phy.duke.edu> writes:
>> On Tue, 26 Aug 2008, Michael H. Frese wrote:
>>> C is not much better. I once worked a young computational
>>> programmer for almost a week to get him to prove to himself that a C
>>> source program couldn't walk through a 2-d array the hard way as
>>> fast as a Fortran source program unless the stepping was coded by
>>> hand. He didn't believe that a 2-d array in C is syntactically a 1-d
>>> array of pointers to 1-d arrays, and the row pointers must be
>>> fetched from memory! And separate compilation of functions
> As I said already, he's wrong....
>> Perhaps, but don't most C programmers allocate such an array as a single
>> vector and then repack the indices?
> I've never seen anyone allocate "as a single vector and repack the
> indices", though I'm sure that a counterexample exists in someone's
> code out there somewhere. In any case, one has no need to do such a
> (This is not to say that when one calls malloc, if you're calling
> malloc to allocate an array, that you don't pass it a single size_t
> indicating what you're looking for, but that's a different issue.)
I am not sure if this is what you mean, but anyone who has been
programming in C long enough (hrm... to use malloc at least once ;) )
_should_ know that malloc reserves X bytes of memory and neither cares
nor needs to know what the memory is used for.
As for the contiguous nature of the allocation, doing otherwise would be
horrendously inefficient, given that most processors take this memory
layout for granted when optimizing cache usage and prefetches (noting that
the actual memory allocation is done by the OS in pages). I am currently
unable to dig out the reference, but there are very few processors (none
that are Beowulf COTS material, iirc) that implement any sort of
mechanism to detect the actual fetching pattern (i.e., to recognize that
the data is being fetched in strides of x bytes). I am also doing
some studies comparing the use of data structures (or not) in some
simple C code to see whether the use of structures has a significant
impact on the cache's fetching capabilities... efforts which might
become useless (stay readable to the human and let the compiler do its
work), since from GCC 4.3.1's manpage (the -fipa-struct-reorg entry):
Perform structure reorganization optimization, that change C-like
structures layout in order to better utilize spatial locality. This
transformation is affective for programs containing arrays of
structures. Available in two compilation modes: profile-based (enabled
with -fprofile-generate) or static (which uses built-in heuristics).
Require -fipa-type-escape to provide the safety of this transformation.
It works only in whole program mode, so it requires -fwhole-program and
-combine to be enabled. Structures considered cold by this
transformation are not affected (see