[Beowulf] Random numbers (was Re: /dev/random entropy on stateless/headless nodes)

Robert G. Brown rgb at phy.duke.edu
Mon Feb 28 12:03:33 PST 2011


On Mon, 28 Feb 2011, Heiko Bauke wrote:

> Hi,
>
> On Sun, 27 Feb 2011 11:48:28 -0500 (EST)
> "Robert G. Brown" <rgb at phy.duke.edu> wrote:
>
>> The solution for nearly anyone needing large numbers of fast, high
>> quality random numbers is going to be:  Use a fast, high quality
>> random number generator from e.g. the GNU Scientific Library, and
>> >>seed<< it from /dev/random, ensuring uniqueness of the seed(s)
>> across the cluster
>
> I agree that for Monte Carlo simulations a fast, high quality (pseudo)
> random number generator (PRNG) is more appropriate than /dev/{u}random.
> However, seeding a PRNG randomly is imho a misconception. Even though
> Monte Carlo algorithms utilize a pseudo random resource the final
> result of a Monte Carlo simulation should be deterministic and
> reproducible. Therefore, for scientific Monte Carlo applications one
> should use a known seed. Parallel Monte Carlo applications may derive
> streams of pseudo random numbers from a common base sequence by
> splitting and leap frogging, see also
> http://arxiv.org/abs/cond-mat/0609584

Interesting paper.
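
For anyone following along who hasn't met the term: leapfrogging hands
rank k of p processes the elements k, k+p, k+2p, ... of one common base
sequence.  A minimal sketch, assuming a toy multiplicative congruential
generator with the Park-Miller "minimal standard" constants (picked only
because its skip-ahead is one line of arithmetic; it is NOT the
generator from the paper, and the function names are made up for
illustration):

```python
from itertools import islice

# Toy generator: x_{n+1} = A * x_n mod M (Park-Miller "minimal standard").
# Illustrative only; far too weak for real Monte Carlo work.
M = 2**31 - 1
A = 16807


def base_sequence(seed):
    """Yield the common base sequence shared by all processes."""
    x = seed
    while True:
        x = (A * x) % M
        yield x


def leapfrog_stream(seed, rank, nprocs):
    """Yield elements rank, rank+nprocs, ... (0-indexed) of the base sequence."""
    # Step forward to this rank's first element of the base sequence.
    x = seed
    for _ in range(rank + 1):
        x = (A * x) % M
    yield x
    # Jumping nprocs steps at once is a multiplication by A**nprocs mod M.
    stride = pow(A, nprocs, M)
    while True:
        x = (stride * x) % M
        yield x


# First three numbers seen by rank 1 of 4, all derived from one known seed.
first_three = list(islice(leapfrog_stream(12345, 1, 4), 3))
```

Every rank starts from the same known seed, so the whole parallel run
stays reproducible, which I take to be the point being made.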

Well, SOMETIMES one cares if they are reproducible, sometimes not.  I
agree in that I always save the seed in the node/run metadata to be ABLE
to check for iid uniqueness and be ABLE to repeat a run, although quite
honestly I can't recall ever actually repeating a run with the same seed
a single time in CPU-centuries of computation time.  Why would one wish
to?  When you're averaging tens of thousands of runs, you aren't going to
be rerunning the sample average generated from seed = 3023216177 very
often, although perhaps when debugging the simulation code it can be
useful (in which case I always use simple seeds such as seed =
1,2,3...).
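
To make the bookkeeping concrete, here is a minimal sketch of drawing
seeds from the kernel, forcing them to be unique, and saving each one
with the run so the run CAN be repeated.  Assumptions: Python's
os.urandom stands in for reading /dev/random directly, the seeds are
handed out from one place (e.g. a head node) rather than drawn on each
node, and the helper names and the JSON record are hypothetical, not
anything I actually run:

```python
import json
import os
import random


def draw_seed(nbytes=4):
    """Return an unsigned integer seed taken from the OS entropy source."""
    return int.from_bytes(os.urandom(nbytes), "big")


def unique_seeds(n, nbytes=4):
    """Draw n distinct seeds, redrawing whenever a collision turns up."""
    seeds = set()
    while len(seeds) < n:
        seeds.add(draw_seed(nbytes))
    return sorted(seeds)


def run_metadata(node, seed):
    """Serialize the seed with the run so the run can be repeated later."""
    return json.dumps({"node": node, "seed": seed})


def sample_mean(seed, n=1000):
    """Stand-in for one simulation run: mean of n uniform deviates."""
    rng = random.Random(seed)  # CPython's random.Random is an MT19937
    return sum(rng.random() for _ in range(n)) / n
```

Calling sample_mean() twice with a seed read back from the saved record
returns the identical value, which is all that "able to repeat a run"
requires.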

As for seeding randomly, I think (even after reading your paper:-) that
how important this is in a parallel computation depends on the
characteristics of the random number generator you are using.  R250, of
course, is shooting a crippled duck in a barrel, but it isn't so clear
that a Mersenne Twister or (super) KISS generator will have the same
problem -- those are pretty good generators, and pass all of the diehard
tests and most of L'Ecuyer's tests all the way out to where the tests
themselves start to fail for numerical reasons, and (depending on the
implementation) they may well have very long periods (as in MT19937 has
a period of 2^19937 - 1, roughly 10^6001, and is equidistributed in 623
dimensions).  Given that MT19937 is also damn fast (though KISS is
faster) it isn't an unreasonable choice.
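
The order of magnitude of that period is easy to check, since the
period of MT19937 is 2^19937 - 1.  A quick sketch (the variable names
are mine):

```python
import math

# log10 of the period 2**19937 - 1, to floating-point accuracy.
log10_period = 19937 * math.log10(2)

# Exact count of decimal digits, via Python's arbitrary-precision integers.
digits = len(str(2**19937 - 1))
```

The logarithm comes out a bit above 6001 and the exact integer has 6002
decimal digits, i.e. the period is a few times 10^6001.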

Generators with very long periods compared to the number of random
numbers consumed in a parallel computation (and that otherwise pass all
of the cranked-up dieharder tests) are still very likely to give one
sufficiently independent results with random seeds "out of the (GSL)
box" so to speak.  However, I completely agree that this is a useful
thing to be able to explicitly test, and the latest version of dieharder
lets one construct a "supergenerator" at the command line that might
consist, e.g. of four MTs started with adjacent or random seeds
providing returns in round robin fashion (or via a shuffle table) and
then use it as direct input to dieharder.  Dieharder will then (one
hopes) be able to detect bit-level or hyperplanar correlations not just
within a sequence but between sequences generated by various types of
correlated or decorrelated seeds.
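
The round-robin idea itself is simple enough to sketch outside of
dieharder.  What follows is NOT dieharder's own supergenerator feature,
just a stand-in under stated assumptions: Python's random.Random
(an MT19937) plays the part of the GSL generator, its seeding procedure
differs from GSL's so "adjacent seeds" do not mean the same states, and
the function names are hypothetical:

```python
import random
import struct


def round_robin(seeds):
    """Interleave 32-bit outputs from one MT19937 per seed, in turn."""
    gens = [random.Random(s) for s in seeds]
    while True:
        for g in gens:
            yield g.getrandbits(32)


def write_raw(stream, seeds, n_words):
    """Write n_words interleaved 32-bit words to a binary stream."""
    source = round_robin(seeds)
    for _ in range(n_words):
        # Little-endian unsigned 32-bit word; the byte order is an assumption.
        stream.write(struct.pack("<I", next(source)))
```

Something like write_raw(sys.stdout.buffer, [1, 2, 3, 4], 10**8) piped
into a test suite that accepts a raw binary stream on stdin would then
test the interleaved stream rather than any single sequence.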

It would also be very interesting to try feeding one of your MRG or
YARN-type generators into dieharder and running the full series of tests
on it (there are actually several diehard tests in dieharder that look
for hyperplanar correlations as well as the long tail "monkey" tests).
If I ever have time I'll try to give it a shot.  Are the library and/or a
CLI available anyplace easy to find?

    rgb

>
>
> 	Regards,
>
> 	Heiko
>
>
> -- 
> -- Number Crunch Blog @ http://numbercrunch.de
> --  Cluster Computing @ http://www.clustercomputing.de
> --     Random numbers @ http://trng.berlios.de
> --        Heiko Bauke @ http://www.mpi-hd.mpg.de/personalhomes/bauke
>
>

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
