Variable transformation by hand (covariance matrix)

I am new to pymc, but with a lot of experience in Bayesian methods and sampling in cosmology.

In one project, we want to apply a very specific prior to one set of parameters while sampling a different set – this is analogous to PyMC's built-in transformations, but one that we would like to apply by hand. In this case we know the Jacobian, so all we need to do is add its log-determinant to the log-likelihood. The docs are slightly unclear about this case. Should we do this via pymc.model.core.Potential?

For the aficionados, the specific problem that we want to solve is the following:

  • Parameters: the entries of the covariance matrix of a set of multivariate Gaussian random variables.
  • Prior: uniform on the entries Σᵢⱼ, constrained so that the matrix remains positive definite (yes, this really is what we want – not negotiable).
  • Transformation: We take the Cholesky decomposition L of Σ and sample (a) the off-diagonal entries of L and (b) the log of the diagonal entries of L (since Lᵢᵢ>0). This transformation makes it easy to implement the positive-definiteness constraint, since it corresponds to an unconstrained prior on (a) and (b).
  • We know the Jacobian of this transformation and can apply it directly. This is the part that needs the extra term.

Alternatively, we could possibly sample different parameters, e.g., using the LKJ prior on the correlation (not covariance) matrix along with a simple half-uniform prior on the univariate variances, but we would still need to add a Jacobian determinant (which would be straightforward once I understand the machinery).

The short answer is yes, you can do this all via pm.Potential.

The longer answer is to look here, which is where PyMC actually forms the log-probability. It adds together all the terms from the priors (basic_RVs), the likelihood (observed_RVs), and the potentials (potentials). Every potential term in the model is simply added to the logp, one at a time. The only transformation that happens is that any random variables appearing in the potential terms are replaced by their unconstrained value variables (which may also introduce a Jacobian term).

If all you need to do is add a Jacobian term, you can use a Potential. If you need both a logp and a log-det-Jacobian term, you can write them as a single expression inside one Potential, or you can put each in its own Potential – the result is the same, since potentials are simply summed into the logp.

Re: LKJ Corr, we have it, but the transform we use is bugged, see here. There’s a PR to fix it here, but it needs a little push over the finish line.

I didn’t see this earlier, but this is exactly how Stan deals with positive-definite parameters. It’s a little trickier to get the normalization required for a correlation matrix (unit-norm rows of the Cholesky factor), but it’s similar. After you apply the Jacobian, the distribution is an improper uniform distribution on covariance matrices with (N choose 2) + N parameters. In the correlation-matrix case, it’s a proper uniform distribution, since the entries are bounded in (-1, 1) with only (N choose 2) free parameters.
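For concreteness, for lower-triangular $L$ with $\Sigma = LL^\top$ and diagonal $L_{ii} = \exp(x_i)$, the log absolute Jacobian determinant works out to (as in Stan's reference manual, with 1-based indices):

$$\log|J| = N\log 2 + \sum_{i=1}^{N} (N - i + 2)\, x_i$$

The factor $2^N \prod_i L_{ii}^{\,N-i+1}$ comes from the map $L \mapsto LL^\top$, and the extra $\prod_i L_{ii}$ comes from the exp reparameterization of the diagonal.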

The correlation matrix and its Cholesky forms are in the same section.

Ideally, you will never create a full covariance matrix and can work only with the Cholesky factors. For example, (inverse) Wisharts can be defined efficiently using the Cholesky factors and so can multivariate normal distributions. Sticking to Cholesky factors reduces the otherwise cubic solves to quadratic.

Or you can do what I think you’re suggesting and just put priors directly on the off-diagonal entries and the diagonals of the Cholesky factor, but it’s hard to know how to do that in an informed way. We like the LKJ prior because it shrinks the correlation matrix toward the identity and lets you define your own independent distribution on the scales.