[Beowulf] Good demo applications for small, slow cluster
Max R. Dechantsreiter
max at performancejones.com
Wed Aug 21 12:46:15 PDT 2013
R. R. Coveyou (Knuth "Seminumerical Algorithms" 1981, 26-27):
u % 4 == 2
u[k+1] = u[k]*(u[k] + 1) % (1<<e)
Coveyou's generator is NOT parallelizable.
(It is very useful, however, for generating large seeds
from a small one, for use in another generator, thanks
to quadratic growth.)
On Wed, 21 Aug 2013, Peter St. John wrote:
> Remarkable, thanks! I surely agree that in binary, doubling is fast. So you
> sort of bypass computing low powers, with an ancient method (?!) of
> computing high powers efficiently. Very cool. So, everything is
> parallelizable :-)
> On Wed, Aug 21, 2013 at 3:22 PM, Max R. Dechantsreiter <
> max at performancejones.com> wrote:
>> Hi Peter,
>> That's interesting, where can I read about "giant-stepping the generator"?
>>> The wiki article
>>> mention distributed processing.
>> The term "giant-stepping" may not be in general use....
>> The idea is to find an efficient way to compute an RNG state
>> far ahead of the current one.
>> For a multiplicative generator like that of the NPB:
>> x[k+1] = a * x[k] mod p
>> (p not necessarily prime), observe that
>> x[k+2] = a * x[k+1] mod p = a**2 * x[k] mod p
>> x[k+m] = (a**m mod p) * x[k] mod p
>> a**m mod p may be efficiently computed by Russian Peasant:
>> x[k+2*m] = (a**m mod p) * x[k+m] mod p
>> et cetera. Single-threaded operations fill in the rest.
>> For LFSR-based generators, the multiplier is a bit matrix,
>> so powers of that matrix would need to be computed (also by
>> Russian Peasant).
>> The scheme would be particular to each generation algorithm.
>> Surprisingly many RNG can be parallelized this way; the big
>> advantage over (say) simply giving each computational thread
>> private RNG parameters is that with care the results of a
>> parallel computation could match those of a serial one.
More information about the Beowulf