Parallel Random Number Generation¶
There are five strategies implemented that can be used to produce repeatable pseudo-random numbers across multiple processes (local or distributed). The final method is suitable for parallel applications where reproducibility is not a concern.
Using a SeedSequence¶
Using a SeedSequence
is a universal way to initialize many streams.
When using with a bit generator with a moderately large state (say 256-bits), the probability
of a collision is negligible. This said, some generators can have difficult to detect
correlation when producing many streams in corner cases.
import numpy as np
from randomgen import SFC64
NUM_STREAMS = 8196
seed_seq = SeedSequence(603484028490308141)
children = seed_seq.spawn(NUM_STREAMS)
streams = [SFC64(child) for child in children]
Distinct Keys in Cryptographic Generators¶
The cryptographic pseudo-random number generators (PRNGs) support using distinct
keys to produce distinct sequence.
Generators that support this form of parallelization include AESCounter
,
ChaCha
, HC128
,
ThreeFry
, Philox
, and
SPECK128
.
This example shows how many streams can be created by passing in different index values in the second input while using the same seed in the first.
import numpy as np
from randomgen import AESCounter, random_entropy
NUM_STREAMS = 8196
keys = random_entropy(4 * NUM_STREAMS).view(np.uint64)
keys = set([tuple(key.tolist()) for key in keys.reshape((-1, 2))])
# Essentially 0 probability this is needed
while len(keys) < NUM_STREAMS:
new_keys = random_entropy(4 * (NUM_STREAMS - len(keys))).view(np.uint8)
new_keys = set([tuple(key.tolist()) for key in new_keys.reshape((-1, 2))])
keys.update(new_keys)
# Distinct 128-bit numbers as a key encoded as 2 uint64 values
streams = [AESCounter(key=np.array(key,dtype=np.uint64)) for key in keys]
Advancing the PRNG’s state¶
Most of the cryptographic PRNGs are counter-based, and so support advancing which increments the counter. Advancing a PRNG updates the underlying PRNG state as if a given number of calls to the underlying PRNG have been made. In general there is not a one-to-one relationship between the number output random values from a particular distribution and the number of draws from the core PRNG. This occurs for two reasons:
The random values are simulated using a rejection-based method and so, on average, more than one value from the underlying PRNG is required to generate a single draw.
The number of bits required to generate a simulated value differs from the number of bits generated by the underlying PRNG. For example, two 16-bit integer values can be simulated from a single draw of a 32-bit PRNG.
Advancing the PRNG state resets any pre-computed random numbers. This is required to ensure exact reproducibility.
import numpy as np
from randomgen import SPECK128, SeedSequence
PHI = (np.sqrt(5) - 1) / 2
STEP = int(PHI * 2**96)
NUM_STREAMS = 8196
seed_seq = SeedSequence(603484028490308141)
base = SPECK128(seed_seq)
streams = [base]
for i in range(1, NUM_STREAMS):
next_gen = SPECK128(seed_seq)
streams.append(next_gen.advance(i * STEP))
In addition to the cryptographic PRNGs,
the PCG-based generators also support advance
: PCG64
,
LCG128Mix
, and PCG32
.
Note that HC128
is based on a stream cipher and so
does not support advancing a counter.
Jumping the PRNG state¶
jumped
advances the state of the PRNG as if a large number of random
numbers have been drawn, and returns a new instance with this state. Jumping
is more universal than advance
since the multiplier needed to jump some PRNGs
is expensive to compute. However, this multiplier can be pre-computed for a fixed
step size when the PRNG does not support an arbitrary advance which enables the jump.
The specific number of as if draws varies by PRNG, and ranges from around \(2^{64}\) to
\(2^{512}\). Additionally, the as if draws also depend on the size of
the default random number produced by the specific PRNG. The PRNGs that
support jumped
, along with the period of the PRNG, the size of the jump
and the bits in the default unsigned random are listed below.
PRNG |
Period |
Jump Size |
Bits |
---|---|---|---|
AESCounter |
\(2^{128}\) |
\(2^{64}\) |
64 |
ChaCha |
\(2^{128}\) |
\(2^{64}\) |
64 |
LCG128Mix |
\(2^{128}\) |
\(\phi\) |
64 |
DSFMT |
\(2^{19937}\) |
\(2^{128}\) |
53 |
LXM |
\(2^{256}\) |
\(2^{128}\) |
64 |
MT19937 |
\(2^{19937}\) |
\(2^{128}\) |
32 |
PCG32 |
\(2^{64}\) |
\(\phi\) |
32 |
PCG64 |
\(2^{128}\) |
\(\phi\) |
64 |
Philox |
\(2^{256}\) |
\(2^{128}\) |
64 |
SFMT |
\(2^{19937}\) |
\(2^{128}\) |
64 |
SPECK128 |
\(2^{128}\) |
\(2^{64}\) |
64 |
ThreeFry |
\(2^{256}\) |
\(2^{128}\) |
64 |
Xoroshiro128 |
\(2^{128}\) |
\(2^{64}\) |
64 |
Xorshift1024 |
\(2^{1024}\) |
\(2^{512}\) |
64 |
Xoshiro256 |
\(2^{256}\) |
\(2^{128}\) |
64 |
Xoshiro512 |
\(2^{512}\) |
\(2^{256}\) |
64 |
jumped
can be used to produce long blocks that are long enough to not
overlap. A jump size of \(\phi\) is the integer value of \(\sqrt{5}/2 - 1\), the
golden ratio, times the full period.
from randomgen.entropy import random_entropy
from randomgen import Xoshiro512, SeedSequence
NUM_STREAMS = 8196
seed = SeedSequence()
blocked_rng = []
last_rng = rng = Xoshiro512(seed)
for i in range(NUM_STREAMS):
blocked_rng.append(last_rng)
last_rng = last_rng.jumped()
Weyl Sequences¶
SFC64
uniquely supports generating streams using distinct
Weyl sequences. Distinct sequences are produced by setting unique values in k`
when initializing SFC64
. These increments must be odd.
There is some indication that sparse bit patterns in the Weyl sequence with 50%
or fewer non-zero bits perform best.
import numpy as np
from randomgen.entropy import random_entropy
from randomgen import SFC64, SeedSequence
NUM_STREAMS = 8196
seed_seq = SeedSequence()
# Helper function to generate values with 32 non-zero bits
# The number of non-zero bits is configurable
weyl_inc = SFC64(seed_seq).weyl_increments(1000)
streams = [SFC64(seed_seq, k=k) for k in weyl_inc]
Non-reproducible Sequences¶
randomgen.rdrand.RDRAND
uses a hardware based random number generator to
produce non-reproducible sequences on random numbers. These can be used on any
system that supports the RDRAND instruction, subject to the key limitation that
results cannot be reproduced without storing the values generated.
from randomgen import RDRAND
NUM_STREAMS = 8196
streams = [RDRAND() for _ in range(NUM_STREAMS)]