[Beowulf] Good demo applications for small, slow cluster
Max R. Dechantsreiter
max at performancejones.com
Wed Aug 21 12:22:26 PDT 2013
> That's interesting, where can I read about "giant-stepping the generator"?
> The wiki article
> http://en.wikipedia.org/wiki/Linear_congruential_generator doesn't
> mention distributed processing.
The term "giant-stepping" may not be in general use....
The idea is to find an efficient way to compute an RNG state
far ahead of the current one.
For a multiplicative generator like that of the NPB:
x[k+1] = a * x[k] mod p
(p not necessarily prime), observe that
x[k+2] = a * x[k+1] mod p = a**2 * x[k] mod p
x[k+m] = (a**m mod p) * x[k] mod p
a**m mod p may be efficiently computed by Russian Peasant:
x[k+2*m] = (a**m mod p) * x[k+m] mod p
et cetera. Single-threaded operations fill in the rest.
For LFSR-based generators, the multiplier is a bit matrix,
so powers of that matrix would need to be computed (also by
The scheme would be particular to each generation algorithm.
Surprisingly many RNG can be parallelized this way; the big
advantage over (say) simply giving each computational thread
private RNG parameters is that with care the results of a
parallel computation could match those of a serial one.
More information about the Beowulf