randomgen.generator.ExtendedGenerator.multivariate_normal

ExtendedGenerator.multivariate_normal(mean, cov, size=None, check_valid='warn', tol=1e-8, *, method='svd')

Draw random samples from a multivariate normal distribution.

The multivariate normal, multinormal or Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (standard deviation, or “width,” squared) of the one-dimensional normal distribution.

Parameters:
mean : array_like

Mean of the distribution. Must have shape (m1, m2, …, mk, N) where (m1, m2, …, mk) would broadcast with (c1, c2, …, cj).

cov : array_like

Covariance matrix of the distribution. It must be symmetric and positive-semidefinite for proper sampling. Must have shape (c1, c2, …, cj, N, N) where (c1, c2, …, cj) would broadcast with (m1, m2, …, mk).

size : int or tuple of ints, optional

Given a shape of, for example, (m,n,k), m*n*k samples are generated, and packed in an m-by-n-by-k arrangement. Because each sample is N-dimensional, the output shape is (m,n,k,N). If no shape is specified, a single (N-D) sample is returned.

check_valid : {‘warn’, ‘raise’, ‘ignore’}, optional

Behavior when the covariance matrix is not positive semidefinite: ‘warn’ issues a warning, ‘raise’ raises an exception, and ‘ignore’ proceeds silently.

tol : float, optional

Tolerance used when checking the singular values of the covariance matrix. cov is cast to double before the check.

method : {‘svd’, ‘eigh’, ‘cholesky’, ‘factor’}, optional

The cov input is used to compute a factor matrix A such that A @ A.T = cov. This argument selects the method used to compute A. The default, ‘svd’, is the slowest. ‘cholesky’ is the fastest but less robust than ‘svd’. ‘eigh’ uses an eigendecomposition to compute A and is faster than ‘svd’ but slower than ‘cholesky’. ‘factor’ assumes that cov has already been factored, so no transformation is applied.
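As a rough illustration of what the first three methods compute, the sketch below builds a factor A with plain NumPy in three ways and checks that A @ A.T reproduces cov. It is not randomgen's internal code; the example matrix and the variable names are assumptions made for illustration.

```python
import numpy as np

# Example covariance (symmetric, positive definite); chosen for illustration.
cov = np.array([[4.0, 1.2], [1.2, 2.0]])

# 'svd'-style factor: cov = U diag(s) V^T, and U equals V for a symmetric
# positive-definite matrix, so A = U * sqrt(s) satisfies A @ A.T = cov.
u, s, _ = np.linalg.svd(cov)
a_svd = u * np.sqrt(s)

# 'eigh'-style factor: cov = V diag(w) V^T, so A = V * sqrt(w).
w, v = np.linalg.eigh(cov)
a_eigh = v * np.sqrt(w)

# 'cholesky'-style factor: lower-triangular L with L @ L.T = cov.
a_chol = np.linalg.cholesky(cov)

for name, a in [("svd", a_svd), ("eigh", a_eigh), ("cholesky", a_chol)]:
    print(name, np.allclose(a @ a.T, cov))
```

The three factors generally differ from one another, since A is not unique, but each one reproduces the same covariance. Going by the description above, a factor of this kind would presumably be what is passed in place of cov when method=‘factor’ is used; the expected orientation (A versus A.T) should be confirmed against the library before relying on it.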

Returns:
out : ndarray

The drawn samples, of shape determined by broadcasting the leading dimensions of mean and cov with size, if not None. The final dimension is always N.

In other words, each entry out[i,j,...,:] is an N-dimensional value drawn from the distribution.
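The broadcasting rule for the leading dimensions can be sketched with NumPy's shape utilities. This only computes the shape that the description above implies when size is None; it does not call the sampler, and the array shapes are assumptions chosen for illustration.

```python
import numpy as np

N = 2
mean = np.zeros((3, 1, N))                   # leading dimensions (3, 1)
cov = np.broadcast_to(np.eye(N), (4, N, N))  # leading dimensions (4,)

# Broadcast the leading dimensions of mean and cov, then append N.
lead = np.broadcast_shapes(mean.shape[:-1], cov.shape[:-2])
expected_shape = lead + (N,)
print(expected_shape)  # (3, 4, 2)
```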

Notes

The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution.

Covariance indicates the level to which two variables vary together. From the multivariate normal distribution, we draw N-dimensional samples, \(X = [x_1, x_2, \ldots, x_N]\). The covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\) (i.e. its “spread”).
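The meaning of the entries \(C_{ij}\) can be checked empirically by comparing the sample covariance of many draws with the matrix that was requested. The sketch below uses NumPy's own Generator as a stand-in sampler (an assumption made so the example depends only on NumPy); the seed and the covariance values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(12345)  # NumPy generator used as a stand-in
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
samples = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)

# rowvar=False: each column is one variable x_i, each row is one sample.
sample_cov = np.cov(samples, rowvar=False)
print(np.round(sample_cov, 2))
```

With this many draws the diagonal of sample_cov should sit close to the variances 2.0 and 1.0, and the off-diagonal entries close to 0.5.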

Instead of specifying the full covariance matrix, popular approximations include:

  • Spherical covariance (cov is a multiple of the identity matrix)

  • Diagonal covariance (cov has non-negative elements, and only on the diagonal)

The geometry implied by the covariance can be seen in two dimensions by plotting generated data points:

>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]]  # diagonal covariance

Diagonal covariance means that the point cloud is aligned with the x- or y-axis:

>>> from randomgen import ExtendedGenerator
>>> erg = ExtendedGenerator()
>>> import matplotlib.pyplot as plt
>>> x, y = erg.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x')
>>> plt.axis('equal')
>>> plt.show()

Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.
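One way to test a candidate matrix before sampling is to inspect its eigenvalues. The helper below is hypothetical (is_psd is not part of randomgen) and uses an eigenvalue test, which is not necessarily the check the library performs internally.

```python
import numpy as np

def is_psd(cov, tol=1e-8):
    """Hypothetical helper: True if cov is symmetric with no eigenvalue below -tol."""
    cov = np.asarray(cov, dtype=float)
    if not np.allclose(cov, cov.T):
        return False
    # eigvalsh assumes a symmetric matrix and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(cov).min() >= -tol)

print(is_psd([[1, 0], [0, 100]]))  # True
print(is_psd([[1, 2], [2, 1]]))    # False: the eigenvalues are 3 and -1
```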

References

[1] Papoulis, A., “Probability, Random Variables, and Stochastic Processes,” 3rd ed., New York: McGraw-Hill, 1991.

[2] Duda, R. O., Hart, P. E., and Stork, D. G., “Pattern Classification,” 2nd ed., New York: Wiley, 2001.

Examples

>>> from randomgen import ExtendedGenerator
>>> erg = ExtendedGenerator()
>>> mean = (1, 2)
>>> cov = [[1, 0], [0, 1]]
>>> x = erg.multivariate_normal(mean, cov, (3, 3))
>>> x.shape
(3, 3, 2)

Because cov is the identity matrix here, each component of x[0,0,:] - mean is a standard normal draw, so each comparison below is True with a probability of roughly 0.73:

>>> list((x[0,0,:] - mean) < 0.6)
[True, True] # random
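How likely the result above is can be worked out from the standard normal CDF. With the identity covariance used in this example, each component has unit standard deviation and the two components are independent. The helper name std_normal_cdf is an assumption introduced for this sketch.

```python
import math

def std_normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_one = std_normal_cdf(0.6)  # P(component - mean < 0.6)
p_both = p_one ** 2          # both components, by independence
print(round(p_one, 2), round(p_both, 2))  # 0.73 0.53
```

So [True, True] is the most likely single outcome, but it is expected only a little more than half of the time.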