Parallel Random Number Generation

There are five strategies implemented that can be used to produce repeatable pseudo-random numbers across multiple processes (local or distributed). The final method is suitable for parallel applications where reproducibility is not a concern.

Using a SeedSequence

Using a SeedSequence is a universal way to initialize many streams. When using with a bit generator with a moderately large state (say 256-bits), the probability of a collision is negligible. This said, some generators can have difficult to detect correlation when producing many streams in corner cases.

import numpy as np
from randomgen import SFC64

NUM_STREAMS = 8196
seed_seq = SeedSequence(603484028490308141)
children = seed_seq.spawn(NUM_STREAMS)

streams = [SFC64(child) for child in children]

Distinct Keys in Cryptographic Generators

The cryptographic pseudo-random number generators (PRNGs) support using distinct keys to produce distinct sequence. Generators that support this form of parallelization include AESCounter, ChaCha, HC128, ThreeFry, Philox, and SPECK128.

This example shows how many streams can be created by passing in different index values in the second input while using the same seed in the first.

import numpy as np
from randomgen import AESCounter, random_entropy

NUM_STREAMS = 8196
keys = random_entropy(4 * NUM_STREAMS).view(np.uint64)
keys = set([tuple(key.tolist()) for key in keys.reshape((-1, 2))])
# Essentially 0 probability this is needed
while len(keys) < NUM_STREAMS:
    new_keys = random_entropy(4 * (NUM_STREAMS - len(keys))).view(np.uint8)
    new_keys = set([tuple(key.tolist()) for key in new_keys.reshape((-1, 2))])
    keys.update(new_keys)

# Distinct 128-bit numbers as a key encoded as 2 uint64 values
streams = [AESCounter(key=np.array(key,dtype=np.uint64)) for key in keys]

Advancing the PRNG’s state

Most of the cryptographic PRNGs are counter-based, and so support advancing which increments the counter. Advancing a PRNG updates the underlying PRNG state as if a given number of calls to the underlying PRNG have been made. In general there is not a one-to-one relationship between the number output random values from a particular distribution and the number of draws from the core PRNG. This occurs for two reasons:

  • The random values are simulated using a rejection-based method and so, on average, more than one value from the underlying PRNG is required to generate a single draw.

  • The number of bits required to generate a simulated value differs from the number of bits generated by the underlying PRNG. For example, two 16-bit integer values can be simulated from a single draw of a 32-bit PRNG.

Advancing the PRNG state resets any pre-computed random numbers. This is required to ensure exact reproducibility.

import numpy as np
from randomgen import SPECK128, SeedSequence

PHI = (np.sqrt(5) - 1) / 2
STEP = int(PHI * 2**96)
NUM_STREAMS = 8196

seed_seq = SeedSequence(603484028490308141)
base = SPECK128(seed_seq)
streams = [base]
for i in range(1, NUM_STREAMS):
    next_gen = SPECK128(seed_seq)
    streams.append(next_gen.advance(i * STEP))

In addition to the cryptographic PRNGs, the PCG-based generators also support advance: PCG64, LCG128Mix, and PCG32. Note that HC128 is based on a stream cipher and so does not support advancing a counter.

Jumping the PRNG state

jumped advances the state of the PRNG as if a large number of random numbers have been drawn, and returns a new instance with this state. Jumping is more universal than advance since the multiplier needed to jump some PRNGs is expensive to compute. However, this multiplier can be pre-computed for a fixed step size when the PRNG does not support an arbitrary advance which enables the jump. The specific number of as if draws varies by PRNG, and ranges from around \(2^{64}\) to \(2^{512}\). Additionally, the as if draws also depend on the size of the default random number produced by the specific PRNG. The PRNGs that support jumped, along with the period of the PRNG, the size of the jump and the bits in the default unsigned random are listed below.

PRNG

Period

Jump Size

Bits

AESCounter

\(2^{128}\)

\(2^{64}\)

64

ChaCha

\(2^{128}\)

\(2^{64}\)

64

LCG128Mix

\(2^{128}\)

\(\phi\)

64

DSFMT

\(2^{19937}\)

\(2^{128}\)

53

LXM

\(2^{256}\)

\(2^{128}\)

64

MT19937

\(2^{19937}\)

\(2^{128}\)

32

PCG32

\(2^{64}\)

\(\phi\)

32

PCG64

\(2^{128}\)

\(\phi\)

64

Philox

\(2^{256}\)

\(2^{128}\)

64

SFMT

\(2^{19937}\)

\(2^{128}\)

64

SPECK128

\(2^{128}\)

\(2^{64}\)

64

ThreeFry

\(2^{256}\)

\(2^{128}\)

64

Xoroshiro128

\(2^{128}\)

\(2^{64}\)

64

Xorshift1024

\(2^{1024}\)

\(2^{512}\)

64

Xoshiro256

\(2^{256}\)

\(2^{128}\)

64

Xoshiro512

\(2^{512}\)

\(2^{256}\)

64

jumped can be used to produce long blocks that are long enough to not overlap. A jump size of \(\phi\) is the integer value of \(\sqrt{5}/2 - 1\), the golden ratio, times the full period.

from randomgen.entropy import random_entropy
from randomgen import Xoshiro512, SeedSequence

NUM_STREAMS = 8196
seed = SeedSequence()
blocked_rng = []
last_rng = rng = Xoshiro512(seed)
for i in range(NUM_STREAMS):
    blocked_rng.append(last_rng)
    last_rng = last_rng.jumped()

Weyl Sequences

SFC64 uniquely supports generating streams using distinct Weyl sequences. Distinct sequences are produced by setting unique values in k` when initializing SFC64. These increments must be odd. There is some indication that sparse bit patterns in the Weyl sequence with 50% or fewer non-zero bits perform best.

import numpy as np
from randomgen.entropy import random_entropy
from randomgen import SFC64, SeedSequence

NUM_STREAMS = 8196


seed_seq = SeedSequence()
# Helper function to generate values with 32 non-zero bits
# The number of non-zero bits is configurable
weyl_inc = SFC64(seed_seq).weyl_increments(1000)
streams = [SFC64(seed_seq, k=k) for k in weyl_inc]

Non-reproducible Sequences

randomgen.rdrand.RDRAND uses a hardware based random number generator to produce non-reproducible sequences on random numbers. These can be used on any system that supports the RDRAND instruction, subject to the key limitation that results cannot be reproduced without storing the values generated.

from randomgen import RDRAND

NUM_STREAMS = 8196
streams = [RDRAND() for _ in range(NUM_STREAMS)]