Random Number Generation.

Tue Jan 7 12:09:12 PST 2003

On 7 Jan 2003, Randall Jouett wrote:

> Hi folks,
> 
> One last thing before I do some serious parallel and beowulf
> study. I was wondering if you could improve the latency generator
> I described earlier by doing something like this:
> 
>   When compute nodes are idle, rather than just sit there, they
>   could open up /dev/random and generate random streams that could
>   be added to the entropy pool? As soon as they'd get some real
>   work to do, they'd stop doing this and get on with the task
>   at hand. Other types of random generation could be used in
>   place of /dev/random, of course.
> 
> Hmmm. How about a random-number generating screen blanker in
> a grid environment while we're at it, too? :^)
> 
> All my replies will be off list.

If that's a PROMISE I'll make one last reply on-list.

First of all, before addressing random numbers, which is a subject most
Ph.D.-wielding scientists and mathematicians are woefully ignorant upon,
you need to bring yourself up to at least their level of woeful
ignorance.  To help you, I can only suggest that you read Knuth or
Marsaglia for starters, or a chapter in a good book on numerical methods
(e.g. Forsythe, Malcom and Moler -- a classic although alas in Fortran)
or for a more up to date review visit Marsaglia's website for the
Diehard (Battery of Random Number Tests) at:

   http://stat.fsu.edu/~geo/diehard.html

The diehard sources come with a tool that can generate UD sequences from
some dozen different UD algorithms (in many cases, with several distinct
common configuration/initialization choices) for direct testing with
diehard, and diehard itself contains 15 distinct tests for randomness.
It is trivial to apply diehard to any sequence of data provided only
that you can generate an appropriately formatted file copy of the data.

Alas, I could find no copyright or copyleft -- I wrote Dr. Marsaglia
suggesting that he formally GPL the sources to make it a bit easier to
port them out of their current f2c translated state (ugly, ugly, ugly)
and package them up to make them easier to build.  I'm hoping/assuming
he's still alive -- he was writing papers on this back in 1968 -- but he
has not yet replied.

Anyway, the answer to your "could you improve..." question above is --
who knows?  Don't ask the list, find out!  Get diehard, build it, study
the generators and learn a bit about how they work, learn how to apply
the whole suite of tools and generators to problems.  Test the output
stream from /dev/random with diehard -- I have no idea myself whether or
not it "passes" Marsaglia's suite, but he mentions in the diehard
documentation that not even hardware devices tend to do well on the
entire suite.

Only when you've taken the time to become at least competent in the
basic X_n = F(X_n-m,X_n-m+1...X_n)mod a idea underlying most UDG's and
familiar with the characteristics of random processes can you even THINK
about being able to answer your question, and nobody else is likely to
answer it for you.  You also might want to do a bit of a literature
search on the statistics of network latencies.  Until you understand
what "poissonian" means and how it differs from "uniform" or
"exponential" or "gaussian", until you understand correlation and
covariance, you also won't be able to answer your own question.

To summarize (and reiterate my previous reply), throwing open-ended
questions out to the list, especially on topics like random number
generation, isn't likely to be a productive way to learn cluster
computing OR about random number generation, compared to reading a
chapter or two in any decent book on the subjects.  

In some cases (RNG's in particular) you are likely to find that you have
to start with a decent knowledge of e.g. probability and statistics or
networking in order to even properly frame your own question, and
reading one book will (and should) lead you to the next, and so on.
However, there is NO SUBSTITUTE for trying things out on your own, or
for taking a relevant course at e.g. a university or technical school,
where advanced technical subjects are concerned.  Nobody on-list has the
time to summarize a course in statistics and random number generation
theory (and iterated maps, and chaos and fractals, and random processes,
and all the rest of the connected concepts).  Hell, nobody on-list
probably COULD summarize all of these subjects -- lots of us know
something about some of the topics, but unless UDGs are your speciality
why would you know about them all?

Now remember, you promised!

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu