[Beowulf] OT: public random numbers?

Thu Aug 25 17:27:18 PDT 2011

On Aug 25, 2011, at 2:11 PM, Robert G. Brown wrote:

> On Thu, 25 Aug 2011, Vincent Diepeveen wrote:
>
>> I noticed that most generated semi-random numbers with software  
>> generators,
>> had the habit to truely adress a search space of n always in O (n  
>> log n).
>>
>> So if you draw from most software RNG's a number and do it modulo n,
>> with n being not too tiny, say quite some millions or even  
>> billions, then every
>> slot in your 'hashtable' will get hit at least once by the RNG,  
>> whereas data
>> in reality simply happens to not have that habit simply.
>>
>> So true random numbers versus generated noise is in this manner easy
>> to distinguish by this. Now i didn't study literature whether some  
>> other chap
>> some long time ago already had invented this. That would be most  
>> interesting
>> to know.
>
> Some other chap named George Marsaglia (and to some extent another  
> chap
> named Donald Knuth) have already invented this.  A number of tests of
> the tails of random number generators are already in dieharder.  All
> "good" modern rngs pass these tests.
>
> The Martingale betting system you are looking at is even older (at  
> least
> Marsaglia and Knuth are still alive).  It dates back to the 18th
> century, and is well known to be flawed for a variety of reasons, not
> the least of which is that gamblers don't have the infinite wealth
> necessary to make this >>even<< a zero-sum strategy and casinos have

 From mathematical viewpoint it makes perfect cash.
As statistica odds is you already have build up considerable profit
when a worst case (that you hit the 10 times practical double limit)
hits you.

The simulations are of course using the practical limit.

Note that the European casino's have a single zero.
In USA there is even more greedy mafia controlling all the casino's,
there are 2 zero's there. 0 and 00.

The simulations were for European casino's.

> betting limits that de facto make it impossible to pursue the  
> requisite
> number of steps and in roulette in particular have 0 and/or 00  
> slots and
> aren't zero-sum to begin with.  You can read a decent analysis of
> outcomes based on the presumed binomial distribution of a zero-sum  
> game
> here:
>
>   http://en.wikipedia.org/wiki/Martingale_%28betting_system%29
>

You're not allowed to use a system in a casino, so we speak about
theory. Probably first evening they let you try. Second day you'll  
get on the blacklist.

> Your test below is interesting, though.  The only real problems I can
> see with actually using it in dieharder are:
>

Yeah more interesting than the billion times discussed roulette  
system which
has been analyzed completely flat.

>   a) One would need a theoretical estimate of the distribution of
> filling given n log n draws on an n-slotted table (for largish n).   
> That
> is, for a perfect rng, what SHOULD the distribution of success/failure
> be.

As we figured out by now in Artificial Intelligence the statistical
assumptions made in the past they simply do not hold.

For Artificial Intelligence we need a new sort of theoretical theory.

As for the distribution problem, generatiors having a spread that's  
too accurate,
the way to deliver a proof would be for example build a simple device.

Build an old fashioned box where you can draw balls. Remember what  
you coud
see on TV some 20 years ago or so (not sure it was like that in USA).

A big basked with balls. The basket, in fact it's looking like this:

http://www.rateyours.com/blog/uploaded_images/lottery_machine-727064.jpg

But now a much bigger machine like this with inside different means  
of randomizing the balls,
actually also randomly modifying the inside  obstacles of shaking of  
the balls.

After a ball has been drawn you automatically have it annotated and  
the ball immediately goes back
into the machine. For a full minute you have the balls in the machine  
shaken again and you draw
again a ball. It is important to do this randomizing of the balls  
inside the machine for quite some time.
I would propose a minute.

Of course you have to do this with quite some balls.  Say a thousand.

Then you draw balls until all numbers have been drawn at least once.

This cool experiment can be easily build. Of course the expected  
running time of a single experiment
will be a few weeks.

You can produce a number of those drawing machines though and have a  
look.

Theories that seemingly work for small n, n being the number of balls,
are much harder to maintain at bigger n's, as we also see in prime  
number research.

The way how the machine gets designed of course is total crucial. I  
would propose a design that
really shakes the balls really a lot through each other and really  
very thoroughly.

Just like we nowadays know how flawed a big number of card shaking  
machines are that are popular to use.

Such a lottery with realy a lot of balls would be very interesting to  
see the outcomes from.

In fact i would prefer having produced number of those machines, so  
that it's possible to really have a lot of outcomes
and then analyze them very well.

>
>   b) One would then need the CDF for this distribution, to be able to
> turn the results of N trials (of n log n pulls each) into a p-value
> under the null hypothesis -- the probability of obtaining the  
> particular
> number of successes and failures presuming a perfectly random  
> generator.
>
> That way dieharder could apply it rigorously to its 70 or 80 embedded
> rngs or to any user's outboard generator.  There probably is  
> theoretical
> statistical support for the PD and/or CDF -- you're analyzing the  
> tails
> of a poissonian process -- but finding it or doing it yourself (or
> myself), aye, that's the rub.  One cannot just say "high degree of
> certainty that it is an RNG" (by which one means that the rng in
> question fails the test for randomness) in the test.  HOW high?   
> Perfect
> rngs or perfectly random processes will sometimes fill your table, but
> how often?

If we assume that reality of life represents randomness, which is  
another
rather good question in how far that theory is plausible, then using  
that
assumption i'm very sure that the RNG's i investigated so far
have a distribution which is too perfect, more perfect than i have seen
in any reality.

In fact most RNG's fill all slots faster than O ( n log n ), yet it's  
O ( n log n )
that they follow.

This is RNG's that have come through all tests as being a good and
very acceptabe RNG to be used.

Realize i'm no RNG expert, so all the names of all those tests.

For me it's just push button technology. I just designed a test
and found it very odd that all RNG's have such perfect distributions
that they don't even miss a single slot.

I'd argue the only test that would be interesting to me to see how it
might be in reality is the lottery machine test - yet with really a lot
of balls. I'd prefer 10k balls over a 1000 in fact - yet for practical
reasons i would agree with a number of above a 1000.

Paper fiddling is really not interesting to me there to prove anything,
as what i've seen in reality in randomness is total different from how
RNG's model that.

Regards,
Vincent

>   How can you differentiate an "accident" when one does from
> an actual failure?  All of those questions require a more rigorous
> theory and quantitative result embedded in a test that can be
> systematically cranked up to more clearly resolve failures until they
> are unambiguous, not marginal maybe yes maybe no.
>
> I suspect that the failures this test would reveal are already more  
> than
> covered in dieharder, in particular by the bit distribution tests and
> the monkey tests, but I'm not terribly happy with the monkey tests and
> would be perfectly thrilled to have a simpler to compute test that
> revealed precisely this sort of flaw, systematically.  And it doesn't
> hurt at all to have partially or fully redundant tests as long as the
> test themselves are rigorously valid.  If you can find or compute the
> CDF for your test below, I'd be happy to wrap it up and add it to
> dieharder, in other words.  One can always SIMULATE a CDF, of course,
> but that requires a known good generator and sort of begs the question
> if you don't think that e.g. AES or threefish or KISS are good
> generators that would actually pass your test.
>
> Even hardware/quantum sources of random bits are suspect -- they often
> are generated by a process that leaves in the traces of an underlying
> distribution.  I'm not convinced that >>any<< process in the real  
> world
> is >>truly<< random.  Physics is ambiguous on the issue -- the quantum
> description of a closed system is just as deterministic as the  
> classical
> one, and Master equation unpredictability on open subsets of a large
> closed system reflects entropy/ignorance, not actual randomness (hence
> Einstein's famous "doesn't play dice" remark).  But lots of this are
> sufficiently random that one cannot detect any failure of randomness,
> modern crypto class generators being a prime example.
>
>    rgb
>
>>
>> In semi pseudo code, let's take an array of size a billion as an  
>> example,
>> though usually a few million is more than ok:
>>
>> n = 2^30; // 2 to the power 30
>>
>> Function TestNumbersForRandomness(RNG,n) {
>> declare array hashtable[size n];
>>
>> guessednlogn = 2 * (log n / log 2) * n;
>>
>> for( i = 0 ; i < n ; i++ )
>>   hashtable[i] = FALSE;
>>
>> ndraws = filledn = 0;
>> while( ndraws  < guessednlogn ) {
>>    randomnumber = RNG();
>>    r = randomnumber % n; //     randomnumber =  r  (mod n)
>>    if( hashtable[r] == FALSE ) {
>>       hashtable[r] = TRUE;
>>       filledn++;
>>       if( filledn >= n )
>>         break;
>>
>>   }
>>   ndraws++;
>> }
>>
>> if( filledn >= n )
>>    print "With high degree of certainty data generated by a RNG\n");
>>  else
>>    print "Not so sure it's a RNG\n";
>>
>> }
>>
>>
>>
>>
>>
>> Regards,
>> Vincent
>>
>>
>>
>>
>>> -- both unpredictable and
>>> flat/decorrelated at all orders, and even though there aren't really
>>> enough of them for my purposes, I've used them as one of the (small)
>>> "gold standard" sources for testing dieharder even as I test  
>>> them.  For
>>> all practical purposes threefish or aes are truly random as well and
>>> they are a lot faster and easier to use as gold standard generators,
>>> though.
>>> I don't quite understand why the single site restriction is  
>>> important --
>>> this site has been up for years and I don't expect it to go away  
>>> soon;
>>> it is quite reliable.  I don't think there is anything secret  
>>> about how
>>> the numbers are generated, and I'll certify that the numbers it  
>>> produces
>>> don't make dieharder unhappy.  So 1 is fixable with a bit of  
>>> effort on
>>> your part; 6 I don't really understand but the guy who runs the  
>>> site is
>>> clearly willing to construct a custom feed for cash customers, if  
>>> there
>>> is enough value in whatever it is you are trying to do to pay for
>>> access.  If it's just a lottery, well, lord, I can think of a  
>>> dozen ways
>>> to make numbers so random that they'd be unimpeachable for any  
>>> sort of
>>> lottery, both unpredictable and uncorrelated, and they don't any  
>>> of them
>>> require any significant amount of entropy to get started.
>>> I will add one warning -- "randomness" is a rather stringent
>>> mathematical criterion, and is generally tested against the null
>>> hypothesis.  Amateurs who want to make random number generators  
>>> out of
>>> supposedly "random" data streams or fancy algorithms almost  
>>> invariably
>>> fail, sometimes spectacularly so.  There are a half dozen or more
>>> really, really good pseudorandom number generators out there and  
>>> it is
>>> easy to hotwire them together into an xor-based high entropy  
>>> stream that
>>> basically never repeats (feeding it a bit of real entropy now and  
>>> then
>>> as it operates).  I would strongly counsel you against trying to  
>>> take
>>> e.g. weather data and make something "random" out of it.  Unless you
>>> really know what you are doing, you will probably make something  
>>> that
>>> isn't at all random and may not even be unpredictable.  Even most
>>> sources of "quantum" randomness (which is at least possibly "truly
>>> random", although I doubt it) aren't flat, so that they carry the
>>> signature of their generation process unless/until you manage to
>>> transform them into something flat (difficult unless you KNOW the
>>> distribution they are producing).  Pseudorandom number generators  
>>> have
>>> the serious advantage of being amenable to at least some theoretical
>>> analysis (so you can "guarantee" flatness out to some high
>>> dimensionality, say) as well as empirical testing with e.g.  
>>> dieharder.
>>> HTH,
>>>
>>>     rgb
>>>> Thanks,
>>>> David Mathog
>>>> mathog at caltech.edu
>>>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>>> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
>>> Duke University Dept. of Physics, Box 90305
>>> Durham, N.C. 27708-0305
>>> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
>>> Computing
>>> To change your subscription (digest mode or unsubscribe) visit  
>>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>
>