Yet another RNG
Random number generators are frequently posted on the network; my colleagues and
I posted ULTRA in 1992 and, from the number of requests for releases to use it in
software packages, it seems to be widely used.
I have long been interested in RNG's and several of my early ones are used as sy-
stem generators or in statistical packages.
So why another one? And why here?
Because I want to describe a generator, or rather, a class of generators, so pro-
mising I am inclined to call it
The Mother of All Random Number Generators
and because the generator seems promising enough to justify shortcutting the many
months, even years, before new developments are widely known through publication
in a journal.
This new class leads to simple, fast programs that produce sequences with very long
periods. They use multiplication, which experience has shown does a better job of
mixing bits than do +,- or exclusive-or, and they do it with easily-implemented ar-
ithmetic modulo a power of 2, unlike arithmetic modulo a prime. The latter, while
satisfactory, is difficult to implement. But the arithmetic here modulo 2^16 or 2^32
does not suffer the flaws of ordinary congruential generators for those moduli:
trailing bits too regular. On the contrary, all bits of the integers produced by
this new method, whether leading or trailing, have passed extensive tests of random-
ness.
Here is an idea of how it works, using, say, integers of six decimal digits from wh-
ich we return random 3-digit integers. Start with n=123456, the seed.
Then form a new n=672*456+123=306555 and return 555.
Then form a new n=672*555+306=373266 and return 266.
Then form a new n=672*266+373=179125 and return 125,
and so on. Got it? This is a multiply-with-carry sequence x(n)=672*x(n-1)+ carry
mod b=1000, where the carry is the number of b's dropped in the modular reduction.
The resulting sequence of 3-digit x's has period 335,999. Try it.
No big deal, but that's just an example to give the idea. Now consider the sequence
of 16-bit integers produced by the two C statements:
k=30903*(k&65535)+(k>>16); return(k&65535);
Notice that it is doing just what we did in the example:
multiply the bottom half (by 30903, carefully chosen), add the top half and return
the new bottom.
That will produce a sequence of 16-bit integers with period > 2^29, and if we conc-
atenate two such:
k=30903*(k&65535)+(k>>16);
j=18000*(j&65535)+(j>>16);
return((k<<16)+j);
we get a sequence of more than 2^59 32-bit integers before cycling.
The following segment in a (properly initialized) C procedure will generate more than
2^118 32-bit random integers from six random seed values i,j,k,l,m,n:
k=30903*(k&65535)+(k>>16);
j=18000*(j&65535)+(j>>16);
i=29013*(i&65535)+(i>>16);
l=30345*(l&65535)+(l>>16);
m=30903*(m&65535)+(m>>16);
n=31083*(n&65535)+(n>>16);
return((k+i+m)>>16)+j+l+n);
And it will do it much faster than any of several widely used generators designed to
use 16-bit integer arithmetic, such as that of Wichman-Hill that combines congruential
sequences for three 15-bit primes (Applied Statistics, v31, p188-190, 1982), period
about 2^42.
I call these multiply-with-carry generators. Here is an extravagant 16-bit example that
is easily implemented in C or Fortran. It does such a thorough job of mixing the bits
of the previous eight values that it is difficult to imagine a test of randomness it
could not pass:
x[n]=12013x[n-8]+1066x[n-7]+1215x[n-6]+1492x[n-5]+1776x[n-4]
+1812x[n-3]+1860x[n-2]+1941x[n-1]+carry mod 2^16.
The linear combination occupies at most 31 bits of a 32-bit integer. The bottom 16 is
the output, the top 15 the next carry. It is probably best to implement with 8 case
segments. It takes 8 microseconds on my PC. Of course it just provides 16-bit random
integers, but awfully good ones. For 32 bits you would have to combine it with another,
such as
x[n]=9272x[n-8]+7777x[n-7]+6666x[n-6]+5555x[n-5]+4444x[n-4]
+3333x[n-3]+2222x[n-2]+1111x[n-1]+carry mod 2^16.
Concatenating those two gives a sequence of 32-bit random integers (from 16 random 16-
bit seeds), period about 2^250. It is so awesome it may merit the Mother of All RNG's
title.
The coefficients in those two linear combinations suggest that it is easy to get long-
period sequences, and that is true. The result is due to Cemal Kac, who extended the
theory we gave for add-with-carry sequences: Choose a base b and give r seed values x[1],
...,x[r] and an initial 'carry' c. Then the multiply-with-carry sequence
x[n]=a1*x[n-1]+a2*x[n-2]+...+ar*x[n-r]+carry mod b,
where the new carry is the number of b's dropped in the modular reduction, will have
period the order of b in the group of residues relatively prime to m=ar*b^r+...+a1b^1-1.
Furthermore, the x's are, in reverse order, the digits in the expansion of k/m to the
base b, for some 0<k<m.
In practice b=2^16 or b=2^32 allows the new integer and the new carry to be the bottom
and top half of a 32- or 64-bit linear combination of 16- or 32-bit integers. And it
is easy to find suitable m's if you have a primality test: just search through candi-
date coefficients until you get an m that is a safeprime---both m and (m-1)/2 are prime. Then the period of the multiply-with-
carry sequence will be the prime (m-1)/2. (It can't be m-1 because b=2^16 or 2^32 is a
square.)
Here is an interesting simple MWC generator with period> 2^92, for 32-bit arithmetic:
x[n]=1111111464*(x[n-1]+x[n-2]) + carry mod 2^32.
Suppose you have functions, say top() and bot(), that give the top and bottom halves of
a 64-bit result. Then, with initial 32-bit x, y and carry c, simple statements such as
y=bot(1111111464*(x+y)+c)
x=y
c=top(y)
will, repeated, give over 2^92 random 32-bit y's.
Not many machines have 64 bit integers yet. But most assemblers for modern CPU's permit
access to the top and bottom halves of a 64-bit product.
I don't know how to readily access the top half of a 64-bit product in C. Can anyone
suggest how it might be done? (in integer arithmetic)
/* Mother *****************************************************************
| George Marsaglia's The mother of all random number generators producing
| uniformly distributed pseudo random 32 bit values with period about 2^250.
|
| The text of Marsaglia's posting is appended at the end of the function.
|
| The arrays mother1 and mother2 store carry values in their first element,
| and random 16 bit numbers in elements 1 to 8.
|
| These random numbers are moved to elements 2 to 9 and a new carry and
| number are generated and placed in elements 0 and 1.
|
| The arrays mother1 and mother2 are filled with random 16 bit values on
| first call of Mother by another generator. mStart is the switch.
|
| Returns:
| A 32 bit random number is obtained by combining the output of the two
| generators and returned in *pSeed. It is also scaled by 2^32-1 and re-
| turned as a double between 0 and 1
|
| SEED:
| The inital value of *pSeed may be any long value
|
| Bob Wheeler 8/8/94
**************************************************************************/
static double Mother(unsigned long *pSeed)
{
unsigned long number, number1, number2;
short n, *p;
unsigned short sNumber;
/* Initialize motheri with 9 random values the first time */
if (mStart)
{
sNumber= *pSeed&m16Mask; /* The low 16 bits */
number= *pSeed&m31Mask; /* Only want 31 bits */
p=mother1;
for (n=18;n--;)
{
number=30903*sNumber+(number>>16); /* One line multiply-with-cary */
*p++=sNumber=number&m16Mask;
if (n==9)
p=mother2;
}
/* make cary 15 bits */
mother1[0]&=m15Mask;
mother2[0]&=m15Mask;
mStart=0;
}
/* Move elements 1 to 8 to 2 to 9 */
memmove(mother1+2,mother1+1,8*sizeof(short));
memmove(mother2+2,mother2+1,8*sizeof(short));
/* Put the carry values in numberi */
number1=mother1[0];
number2=mother2[0];
/* Form the linear combinations */
number1+=1941*mother1[2]+1860*mother1[3]+1812*mother1[4]+1776*mother1[5]+
1492*mother1[6]+1215*mother1[7]+1066*mother1[8]+12013*mother1[9];
static int ssleay_rand_bytes(unsigned char *buf, int num)
{
static volatile int stirred_pool = 0;
int i,j,k,st_num,st_idx;
int num_ceil;
int ok;
long md_c[2];
unsigned char local_md[MD_DIGEST_LENGTH];
EVP_MD_CTX m;
#ifndef GETPID_IS_MEANINGLESS
pid_t curr_pid = getpid();
#endif
int do_stir_pool = 0;
#ifdef PREDICT
if (rand_predictable)
{
static unsigned char val=0;
for (i=0; i<num; i++)
buf[i]=val++;
return(1);
}
#endif
if (num <= 0)
return 1;
EVP_MD_CTX_init(&m);
/* round upwards to multiple of MD_DIGEST_LENGTH/2 */
num_ceil = (1 + (num-1)/(MD_DIGEST_LENGTH/2)) * (MD_DIGEST_LENGTH/2);
/*
* (Based on the rand(3) manpage:)
*
* For each group of 10 bytes (or less), we do the following:
*
* Input into the hash function the local 'md' (which is initialized from
* the global 'md' before any bytes are generated), the bytes that are to
* be overwritten by the random bytes, and bytes from the 'state'
* (incrementing looping index). From this digest output (which is kept
* in 'md'), the top (up to) 10 bytes are returned to the caller and the
* bottom 10 bytes are xored into the 'state'.
*
* Finally, after we have finished 'num' random bytes for the
* caller, 'count' (which is incremented) and the local and global 'md'
* are fed into the hash function and the results are kept in the
* global 'md'.
*/
CRYPTO_w_lock(CRYPTO_LOCK_RAND);
/* prevent ssleay_rand_bytes() from trying to obtain the lock again */
CRYPTO_w_lock(CRYPTO_LOCK_RAND2);
locking_thread = CRYPTO_thread_id();
CRYPTO_w_unlock(CRYPTO_LOCK_RAND2);
crypto_lock_rand = 1;
if (!initialized)
{
RAND_poll();
initialized = 1;
}
if (!stirred_pool)
do_stir_pool = 1;
ok = (entropy >= ENTROPY_NEEDED);
if (!ok)
{
/* If the PRNG state is not yet unpredictable, then seeing
* the PRNG output may help attackers to determine the new
* state; thus we have to decrease the entropy estimate.
* Once we've had enough initial seeding we don't bother to
* adjust the entropy count, though, because we're not ambitious
* to provide *information-theoretic* randomness.
*
* NOTE: This approach fails if the program forks before
* we have enough entropy. Entropy should be collected
* in a separate input pool and be transferred to the
* output pool only when the entropy limit has been reached.
*/
entropy -= num;
if (entropy < 0)
entropy = 0;
}
if (do_stir_pool)
{
/* In the output function only half of 'md' remains secret,
* so we better make sure that the required entropy gets
* 'evenly distributed' through 'state', our randomness pool.
* The input function (ssleay_rand_add) chains all of 'md',
* which makes it more suitable for this purpose.
*/
int n = STATE_SIZE; /* so that the complete pool gets accessed */
while (n > 0)
{
#if MD_DIGEST_LENGTH > 20
# error "Please adjust DUMMY_SEED."
#endif
#define DUMMY_SEED "...................." /* at least MD_DIGEST_LENGTH */
/* Note that the seed does not matter, it's just that
* ssleay_rand_add expects to have something to hash. */
ssleay_rand_add(DUMMY_SEED, MD_DIGEST_LENGTH, 0.0);
n -= MD_DIGEST_LENGTH;
}
if (ok)
stirred_pool = 1;
}
state_index+=num_ceil;
if (state_index > state_num)
state_index %= state_num;
/* state[st_idx], ..., state[(st_idx + num_ceil - 1) % st_num]
* are now ours (but other threads may use them too) */
md_count[0] += 1;
/* before unlocking, we must clear 'crypto_lock_rand' */
crypto_lock_rand = 0;
CRYPTO_w_unlock(CRYPTO_LOCK_RAND);
while (num > 0)
{
/* num_ceil -= MD_DIGEST_LENGTH/2 */
j=(num >= MD_DIGEST_LENGTH/2)?MD_DIGEST_LENGTH/2:num;
num-=j;
MD_Init(&m);
#ifndef GETPID_IS_MEANINGLESS
if (curr_pid) /* just in the first iteration to save time */
{
MD_Update(&m,(unsigned char*)&curr_pid,sizeof curr_pid);
curr_pid = 0;
}
#endif
MD_Update(&m,local_md,MD_DIGEST_LENGTH);
MD_Update(&m,(unsigned char *)&(md_c[0]),sizeof(md_c));
#ifndef PURIFY
MD_Update(&m,buf,j); /* purify complains */
#endif
k=(st_idx+MD_DIGEST_LENGTH/2)-st_num;
if (k > 0)
{
MD_Update(&m,&(state[st_idx]),MD_DIGEST_LENGTH/2-k);
MD_Update(&m,&(state[0]),k);
}
else
MD_Update(&m,&(state[st_idx]),MD_DIGEST_LENGTH/2);
MD_Final(&m,local_md);
for (i=0; i<MD_DIGEST_LENGTH/2; i++)
{
state[st_idx++]^=local_md[i]; /* may compete with other threads */
if (st_idx >= st_num)
st_idx=0;
if (i < j)
*(buf++)=local_md[i+MD_DIGEST_LENGTH/2];
}
}
EVP_MD_CTX_cleanup(&m);
if (ok)
return(1);
else
{
RANDerr(RAND_F_SSLEAY_RAND_BYTES,RAND_R_PRNG_NOT_SEEDED);
ERR_add_error_data(1, "You need to read the OpenSSL FAQ, "
"http://www.openssl.org/support/faq.html");
return(0);
}
}
static void ssleay_rand_add(const void *buf, int num, double add)
{
int i,j,k,st_idx;
long md_c[2];
unsigned char local_md[MD_DIGEST_LENGTH];
EVP_MD_CTX m;
int do_not_lock;
/*
* (Based on the rand(3) manpage)
*
* The input is chopped up into units of 20 bytes (or less for
* the last block). Each of these blocks is run through the hash
* function as follows: The data passed to the hash function
* is the current 'md', the same number of bytes from the 'state'
* (the location determined by in incremented looping index) as
* the current 'block', the new key data 'block', and 'count'
* (which is incremented after each use).
* The result of this is kept in 'md' and also xored into the
* 'state' at the same locations that were used as input into the
* hash function.
*/
/* check if we already have the lock */
if (crypto_lock_rand)
{
CRYPTO_r_lock(CRYPTO_LOCK_RAND2);
do_not_lock = (locking_thread == CRYPTO_thread_id());
CRYPTO_r_unlock(CRYPTO_LOCK_RAND2);
}
else
do_not_lock = 0;
if (!do_not_lock) CRYPTO_w_lock(CRYPTO_LOCK_RAND);
st_idx=state_index;
/* use our own copies of the counters so that even
* if a concurrent thread seeds with exactly the
* same data and uses the same subarray there's _some_
* difference */
md_c[0] = md_count[0];
md_c[1] = md_count[1];
for (k=0; k<j; k++)
{
/* Parallel threads may interfere with this,
* but always each byte of the new state is
* the XOR of some previous value of its
* and local_md (itermediate values may be lost).
* Alway using locking could hurt performance more
* than necessary given that conflicts occur only
* when the total seeding is longer than the random
* state. */
state[st_idx++]^=local_md[k];
if (st_idx >= STATE_SIZE)
st_idx=0;
}
}
EVP_MD_CTX_cleanup(&m);
if (!do_not_lock) CRYPTO_w_lock(CRYPTO_LOCK_RAND);
/* Don't just copy back local_md into md -- this could mean that
* other thread's seeding remains without effect (except for
* the incremented counter). By XORing it we keep at least as
* much entropy as fits into md. */
for (k = 0; k < sizeof md; k++)
{
md[k] ^= local_md[k];
}
if (entropy < ENTROPY_NEEDED) /* stop counting when we have enough */
entropy += add;
if (!do_not_lock) CRYPTO_w_unlock(CRYPTO_LOCK_RAND);