Quality Assurance
Core Testing
The values in the table below are the maximum output sizes at which a bit generator, or a sequence of bit generators, passed PractRand. A – indicates that the configuration is not relevant, and failures are marked with FAIL. Most bit generators were tested only in their default configuration. Non-default configurations are indicated by listing the keyword arguments passed to the bit generator. Two sets of tests were performed. The first tested all configurations with 128GB of data using PractRand's extended set of tests and additional bit folding. The second used 4TB of data with the standard set of tests and folding.
All bit generators have been tested using the same SeedSequence, initialized with the same 256 bits of entropy taken from random.org.
Method | Seed Sequence, 1 stream | Seed Sequence, 4 streams | Seed Sequence, 8196 streams | Jumped, 4 streams | Jumped, 8196 streams |
---|---|---|---|---|---|
AESCounter | 4TB | 4TB | 4TB | 4TB | 4TB |
ChaCha(rounds=20) | 4TB | 4TB | 4TB | 4TB | 4TB |
ChaCha(rounds=8) | 4TB | 4TB | 4TB | 4TB | 4TB |
DSFMT⁴ | 4TB | FAIL at 64 GB¹ | 4TB | FAIL at 64 GB¹ | FAIL at 64 GB¹ |
EFIIX64 | 4TB | 4TB | 4TB | – | – |
HC128 | 4TB | 4TB | 4TB | – | – |
JSF | 4TB | 4TB | 4TB | – | – |
JSF(seed_size=3) | 4TB | 4TB | 4TB | – | – |
LCG128Mix(output=upper) | 4TB | 4TB | 4TB | 4TB | 4TB |
LXM | 4TB | 4TB | 4TB | 4TB | 4TB |
MT19937⁴,⁵ | 4TB | FAIL at 64 GB¹ | 4TB | FAIL at 64 GB¹ | 4TB |
PCG64DXSM² | 4TB | 4TB | 4TB | 4TB | 4TB |
PCG64(variant=dxsm-128) | 4TB | 4TB | 4TB | 4TB | 4TB |
PCG64⁵ | 4TB | 4TB | 4TB | 4TB | 4TB |
Philox⁵ | 4TB | 4TB | 4TB | 4TB | 4TB |
Romu | 4TB | 4TB | 4TB | – | – |
Romu(variant=trio) | 4TB | 4TB | 4TB | – | – |
SFC64⁵ | 4TB | 4TB | 4TB | – | – |
SFC64(k=3394385948627484371) | 4TB | 4TB | 4TB | – | – |
SFC64(k=Weyl)³ | 4TB | 4TB | 4TB | – | – |
SFMT⁴ | 4TB | FAIL at 64 GB¹ | 4TB | FAIL at 64 GB¹ | FAIL at 4 TB¹ |
SPECK128 | 4TB | 4TB | 4TB | 4TB | 4TB |
ThreeFry | 4TB | 4TB | 4TB | 4TB | 4TB |
Xoshiro256 | 4TB | 4TB | 4TB | 4TB | 4TB |
Xoshiro512 | 4TB | 4TB | 4TB | 4TB | 4TB |
Notes
¹ Failures at or before 128GB were produced by tests that used the expanded set of tests and extra bit folding (-te 1 and -tf 2). Failures at sample sizes above 128GB were produced using the default configuration (-te 0 and -tf 1).
² PCG64DXSM and PCG64(variant=dxsm) are identical, so the latter is not reported separately.
³ SFC64(k=Weyl) uses distinct Weyl increments that have 50% or fewer non-zero bits.
⁴ In several of the multi-stream configurations, the Mersenne Twisters begin to fail at 64GB. This is a known limitation of MT-family generators, and they should not be used in large studies except when backward compatibility is required.
⁵ Identical output to the version included in NumPy 1.19.
Example Configuration
All configurations are constructed using the same template. The code below tests a configuration using 8,196 streams of AESCounter. The other configurations change only the bit generator, BIT_GENERATOR_KWARGS, JUMPED, or STREAMS.
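```python
import numpy as np
import randomgen as rg

# 256 bits of entropy from random.org, shared by every configuration
ENTROPY = 86316980830225721106033794313786972513572058861498566720023788662568817403978
JUMPED = False  # True: streams from jumped(); False: streams from SeedSequence.spawn()
STREAMS = 8196  # number of streams whose output is combined
BIT_GENERATOR_KWARGS = {}  # non-default keyword arguments, e.g. {"rounds": 8} for ChaCha

SEED_SEQ = np.random.SeedSequence(ENTROPY)
BASE_GEN = rg.AESCounter(SEED_SEQ, **BIT_GENERATOR_KWARGS)
if STREAMS == 1:
    # A single stream uses the base bit generator directly
    bit_gens = [BASE_GEN]
elif JUMPED:
    # Each stream is the previous stream advanced by jumped()
    bit_gens = [BASE_GEN]
    for _ in range(STREAMS - 1):
        bit_gens.append(bit_gens[-1].jumped())
else:
    # Each stream is seeded from its own child SeedSequence
    bit_gens = []
    for child in SEED_SEQ.spawn(STREAMS):
        bit_gens.append(rg.AESCounter(child, **BIT_GENERATOR_KWARGS))
# Presumably the width, in bits, of each output value
output = 64
```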
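The template above hard-codes AESCounter. The following is a minimal sketch of the same construction, parameterized by the bit generator class. It uses NumPy's own PCG64 as a stand-in so that it runs without randomgen, and `make_streams` is a hypothetical helper, not part of NumPy or randomgen.

```python
import numpy as np

ENTROPY = 86316980830225721106033794313786972513572058861498566720023788662568817403978


def make_streams(bit_gen_cls, entropy, streams, jumped, **kwargs):
    """Hypothetical helper mirroring the template for any bit generator class."""
    seed_seq = np.random.SeedSequence(entropy)
    base = bit_gen_cls(seed_seq, **kwargs)
    if streams == 1:
        return [base]
    if jumped:
        # Each stream is the previous stream advanced by one jump
        gens = [base]
        for _ in range(streams - 1):
            gens.append(gens[-1].jumped())
        return gens
    # Each stream is seeded from its own child SeedSequence
    return [bit_gen_cls(child, **kwargs) for child in seed_seq.spawn(streams)]


# PCG64 stands in for the randomgen bit generators used in the tests
spawned = make_streams(np.random.PCG64, ENTROPY, 4, jumped=False)
jumped_gens = make_streams(np.random.PCG64, ENTROPY, 4, jumped=True)
```

Passing the class and its keyword arguments separately keeps every configuration tied to the same ENTROPY, so only the generator, its arguments, and the stream layout vary between runs.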
Additional Experiments
The best practice for using any of the bit generators is to initialize a single SeedSequence with a reasonably random seed and then use this seed sequence to initialize all bit generators. Additional experiments were run to check that the quality of the output streams is not excessively sensitive to usage that deviates from this best practice.
Correlated Seeds
While the recommended practice is to use a SeedSequence, it is natural to worry about bad seeds. A common sequence of bad seeds is one in which each seed has a single non-zero bit: 1, 2, 4, 8, 16, and so on. By default, bit generators use a SeedSequence to transform seed values into an initial state for the bit generator. SeedSequence is itself a random number generator that always escapes low-entropy states (that is, states with many 0s or 1s) immediately.
All bit generators were tested with 8 streams using seeds of the form \(2^i\) for \(i = 0, 1, \ldots, 7\). Only three bit generators failed this experiment: DSFMT, MT19937, and SFMT. These are all members of the Mersenne Twister family, which commonly fail BRank tests.
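The seeding behavior described above can be checked directly in NumPy. The following is a small sketch that assumes NumPy's PCG64 is representative of how integer seeds are handled. It shows that an integer seed produces the same state as an explicit SeedSequence of that integer, and that the single-bit seeds \(2^i\) map to 256-bit states that differ from one another in roughly half of their bits. The helpers `state_words` and `hamming` are hypothetical.

```python
import numpy as np


def state_words(seed, n_words=4):
    """Initial state words that SeedSequence derives from an integer seed."""
    return np.random.SeedSequence(seed).generate_state(n_words, dtype=np.uint64)


def hamming(a, b):
    """Number of differing bits between two sequences of uint64 words."""
    return sum(bin(int(x) ^ int(y)).count("1") for x, y in zip(a, b))


# An integer seed is converted to a SeedSequence by the bit generator
same = np.random.PCG64(1).state == np.random.PCG64(np.random.SeedSequence(1)).state

# Seeds 2**i, i = 0..7, each have a single set bit, but their derived states do not
states = [state_words(2**i) for i in range(8)]
distances = [hamming(states[i], states[i + 1]) for i in range(len(states) - 1)]
print(same, distances)
```

For well-mixed states, each distance should fall near 128 of the 256 bits. The Mersenne Twister failures are therefore better explained by the generators themselves than by the seeds.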
Sequential Seeds
The recommended practice for constructing multiple Generator objects is to use the spawn() method of SeedSequence.
```python
from numpy.random import default_rng, Generator, SeedSequence
from randomgen import Romu

NUM_STREAMS = 2**15
seed_seq = SeedSequence(5897100938578919857511)

# To use the default bit generator, which is not guaranteed to be stable
generators = [default_rng(child) for child in seed_seq.spawn(NUM_STREAMS)]

# To use a specific bit generator
generators = [Generator(Romu(child)) for child in seed_seq.spawn(NUM_STREAMS)]
```
It is common to see examples that use sequential seeds, such as:
```python
generators = [default_rng(i) for i in range(NUM_STREAMS)]
```
This practice was examined with all bit generators using 8,196 streams seeded with 0, 1, 2, …, 8,195, with the outputs of the generators interleaved. None of the generators failed these tests.
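The following is a reduced-scale sketch of this construction. Sequential seeds 0, 1, …, n−1 each seed one stream, and the streams are interleaved round-robin, so that position k·n + j of the combined output holds the k-th value of stream j. NumPy's PCG64 stands in for the randomgen bit generators, and far fewer streams are used. The round-robin ordering is an assumption about how the outputs were interleaved, and `interleave` is a hypothetical helper.

```python
import numpy as np


def interleave(bit_gens, values_per_stream):
    """Round-robin interleave: row k holds the k-th raw output of every stream."""
    blocks = [bg.random_raw(values_per_stream) for bg in bit_gens]
    # column_stack puts stream j in column j; ravel (C order) walks row by row
    return np.column_stack(blocks).ravel()


NUM_STREAMS = 16  # the experiment used 8,196 streams
bit_gens = [np.random.PCG64(seed) for seed in range(NUM_STREAMS)]
out = interleave(bit_gens, values_per_stream=1024)
```

Writing `out.tobytes()` to standard output in a loop is one way such a combined stream could be piped into PractRand (for example, `RNG_test stdin64`).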
Zero (0) Seeding
Bit generators use a SeedSequence, which always escapes low-entropy states immediately, to transform seed values into an initial state for the bit generator. To check that seeding with 0 is not an issue, all bit generators were tested using 4, 32, or 8196 streams with 128GB of data in PractRand, using the expanded tests and extra folding. The table below reports only the configurations that failed; a – indicates that the configuration passed. All failures were Mersenne Twister-class generators, so they are attributable to the bit generator and not to the seeding. All other generators passed these tests.
Method | 4 streams | 32 streams | 8196 streams |
---|---|---|---|
DSFMT | FAIL at 64 GB | FAIL at 64 GB | – |
MT19937 | FAIL at 64 GB | FAIL at 64 GB | – |
SFMT | FAIL at 64 GB | FAIL at 64 GB | – |
The non-failures at 8196 streams are due to the relatively short length of each tested sequence. Because each value is 8 bytes, 128GB shared across 8196 streams samples only about \(2^{37}/(8196\times2^{3})\approx2^{37}/(2^{13}\times2^{3})=2^{21}\) values from each stream.
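A few lines of Python reproduce this arithmetic and, for comparison, give the per-stream sample sizes in the 4- and 32-stream runs. As above, 128GB is taken as \(2^{37}\) bytes and each value as 8 bytes. The helper `values_per_stream` is hypothetical.

```python
import math

TOTAL_BYTES = 2**37  # 128GB, taken as 2**37 bytes
BYTES_PER_VALUE = 8  # each output value is a 64-bit word


def values_per_stream(total_bytes, streams):
    """Values drawn from each stream when total_bytes is split evenly across streams."""
    return total_bytes / (streams * BYTES_PER_VALUE)


for streams in (4, 32, 8196):
    exponent = math.log2(values_per_stream(TOTAL_BYTES, streams))
    print(f"{streams} streams: about 2^{exponent:.2f} values per stream")
```

With 4 or 32 streams, each stream contributes \(2^{32}\) or \(2^{29}\) values, far more than the roughly \(2^{21}\) values per stream in the 8196-stream run.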