Using LKJCorr together with MvNormal

I am having difficulties getting LKJCorr to be useful and am wondering if I’m missing something.

I want to sample from an MvNormal with unit standard deviations but with an unknown correlation matrix.
To do that, however, I first have to construct the full matrix from the rather cumbersome output of LKJCorr, using something like:

import numpy as np
import pymc as pm
import pytensor.tensor as at

def LKJCorrMat(name, n, eta=1):
    # LKJCorr returns a flat vector of the n*(n-1)/2 upper-triangular correlations
    c_arr = pm.LKJCorr(name + "_raw", eta=eta, n=n)
    tri = at.zeros((n, n))
    tri = at.set_subtensor(tri[np.triu_indices(n, 1)], c_arr)
    # Symmetrize and add the unit diagonal to get the full correlation matrix
    return pm.Deterministic(name, tri + tri.T + at.eye(n))

However, when I feed that into MvNormal as cov, it complains about LKJCorr not supporting the Interval transform.
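Concretely, what I’m trying looks roughly like this (a sketch; the data and dimensions are just placeholders):

import numpy as np
import pymc as pm

# Placeholder data; LKJCorrMat is the helper defined above
data = np.random.default_rng(0).normal(size=(100, 3))
with pm.Model():
    corr = LKJCorrMat("corr", n=3, eta=2)
    y = pm.MvNormal("y", mu=np.zeros(3), cov=corr, observed=data)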

My current workaround is to use LKJCholeskyCov with sd_dist=pm.LogNormal.dist(sigma=1e-5), but that has the obvious problem of introducing unnecessary extra randomness.
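Roughly (again a sketch with placeholder data; the tiny sigma is only there to pin the standard deviations near 1):

import numpy as np
import pymc as pm

data = np.random.default_rng(0).normal(size=(100, 3))
with pm.Model():
    # chol is (almost) the Cholesky factor of a pure correlation matrix
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=3, eta=2, sd_dist=pm.LogNormal.dist(sigma=1e-5)
    )
    y = pm.MvNormal("y", mu=np.zeros(3), chol=chol, observed=data)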

So I’m quite sure I’m missing something obvious. Can anyone help me here?

Hi Velochy!

Can you show a minimal example that produces the Interval transform error with this function? There should be nothing wrong with the way you are using LKJCorr.

Also, what versions of PyMC and PyTensor are you running?

I think the Interval issue was the one reported in Lkjcorr returning error; updating my PyMC to the newest version fixed it, and it now seems to be working.

Still, is it meant to be this cumbersome, so that you have to write your own convoluted wrapper for such a simple use case?

I don’t know if I’d call a 3-line helper function “cumbersome” or “convoluted”, but we accept PRs :slight_smile: There’s a helper function pm.math.expand_packed_triangular for covariance matrices. Perhaps it could get some additional arguments to allow it to handle unpacking correlation matrices as well?
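For reference, the usual pattern with that helper is something like this (a sketch; note that the packed vector it expects includes the diagonal, i.e. has length n*(n+1)/2, which is why it doesn’t directly apply to LKJCorr’s n*(n-1)/2-length output):

import pymc as pm

with pm.Model():
    # Packed Cholesky of a covariance matrix (lower triangle, diagonal included)
    packed = pm.LKJCholeskyCov(
        "packed", n=3, eta=2, sd_dist=pm.Exponential.dist(1.0), compute_corr=False
    )
    chol = pm.math.expand_packed_triangular(3, packed, lower=True)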

Let me put the question this way:

Is there something useful you can do with the vector it outputs without converting it to matrix form? I can’t think of a use case, but that does not mean there isn’t one.

But if there isn’t, wouldn’t it make sense to include the helper function code inside the distribution and have it output the matrix?

@Velochy I think you’re right and other people have brought up that issue recently: ENH: LKJCorr should return matrix rather than vector · Issue #7074 · pymc-devs/pymc · GitHub

If you’re interested, that would be a great contribution.


Now, in an ideal world I would like to get the Cholesky factorization of just the correlation matrix, as that would allow me to choose between centered and non-centered parametrizations.
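For concreteness, what I’d like to be able to write is a non-centered setup like the sketch below. Here I get the correlation Cholesky by decomposing the full matrix from my helper above, which is exactly the extra step I’d like to avoid:

import pymc as pm
import pytensor.tensor as at
from pytensor.tensor.slinalg import cholesky

with pm.Model():
    corr = LKJCorrMat("corr", n=3, eta=2)
    chol_corr = cholesky(corr)
    # Non-centered: standard normal draws rotated by the correlation Cholesky
    z = pm.Normal("z", 0.0, 1.0, shape=(100, 3))
    y = pm.Deterministic("y", at.dot(z, chol_corr.T))  # rows ~ MvNormal(0, corr)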

It seems the easiest way to approach that would be to allow LKJCholeskyCov to take a constant as sd_dist: providing a constant 1 there effectively gives you the correlation matrix in either Cholesky or full form, making a separate LKJCorr redundant (especially if that constant 1 were the default value).

I tried looking at the code for LKJCholeskyCov to see if I could make that change and do a PR for it.
Unfortunately, it seems to be over my head, as parsing the code requires understanding a lot of PyMC internals that are new to me.

I do actually need this functionality though, so I’m inclined to still give it a try if someone is willing to hold my hand a bit and explain some of the more complex bits.

I took some time to try and see how far I could get.

Getting it to run with a PyTensor vector of ones turned out not to be very hard and required only minor modifications: disabling the type check in _LKJCholeskyCov, circumventing change_dist_size in favor of a regular resize, and setting the logp to zero on the sd_vals part seemed to do the trick.

What I’m stuck on is that even after all those steps, the end result somehow comes back with sd values between 0.9 and 1.1 instead of exactly 1.0 each time, and I don’t understand where that variability could come in. I guess this is related to why LKJCorr has an IntervalTransform introduced as a default, but just putting np.clip(C, -1, 1) into _LKJCholeskyCovBaseRV.rng_fn, which I’d expect to do the same thing, does not change anything, so it’s clear I’m missing something pretty big.

Any chance you could help me figure out what is going on @ricardoV94 ?

Can you show what code you modified and how, as well as the code you are using to test your changes?

IIRC the CholeskyRV doesn’t really sample from sd_dist but marginalizes it out / accounts for its logp directly after factoring out the standard deviation from the value. I don’t think the constants would even be used in the logp?

The naive way to get a fixed sd_dist would be to use pm.DiracDelta.dist(1.0), which obviously will fail because pm.logp(sd_dist, sd_vals) will always be -inf. But maybe that helps understanding how things are defined internally?
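For example, this is easy to check directly (just an illustration, not part of any fix):

import pymc as pm

# DiracDelta(1.0) has logp 0 at exactly 1.0 and -inf everywhere else,
# so any recovered sd value that is not exactly 1.0 makes the model logp -inf
sd_dist = pm.DiracDelta.dist(1.0)
print(pm.logp(sd_dist, 1.0).eval())    # 0.0
print(pm.logp(sd_dist, 0.999).eval())  # -inf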

I’m starting to realize just how much out of my depth I am as I’m having trouble even understanding your response.

But I think I kind of realize why my naive approach does not work as expected.
So just to make sure I understand things correctly, I’d like to state some things that are probably obvious and ask you to correct me if I’m wrong:

a) rng_fn is actually not used in sampling as that is purely defined by initial conditions and logp
b) If something is part of an RV’s parameters, it is automatically included in the sampling as a variable

If these statements are more or less correct then I think I get it.
Because I was trying to fix rng_fn to do what I wanted and just bypass pm.logp by simply setting it to zero when using a scalar, which now seems incredibly stupid if (a) and (b) are true :slight_smile:

rng_fn is used only for forward sampling (pm.draw, pm.sample_prior_predictive, and pm.sample_posterior_predictive).

logp is used for backward/logp-based sampling (pm.sample, pm.fit, pm.find_MAP).
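A minimal illustration of that split (plain PyMC usage, nothing LKJ-specific):

import pymc as pm

# Forward sampling: pm.draw goes through the RV's rng_fn
x = pm.Normal.dist(mu=0.0, sigma=1.0)
print(pm.draw(x, draws=3))

# Backward sampling: pm.sample only evaluates the model logp;
# the rng_fn of free RVs like `mu` is never called here
with pm.Model():
    mu = pm.Normal("mu", 0.0, 1.0)
    idata = pm.sample(draws=100, tune=100, chains=1, progressbar=False)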

b) If something is part of an RV’s parameters, it is automatically included in the sampling as a variable

Not sure what you mean here. If it helps, anytime you have a dist input it usually means the RV will account for its logp during sampling. This may involve marginalizing a random variable like in pm.Mixture, or just transforming it and accounting for the effects of the transformation like in pm.Censored. I’m not quite sure which bucket the sd_dist of the CholeskyRV falls into, but I guess the latter, because the sds are still “recoverable” from the Deterministics.

The way I think about it, the sd is not really an input to the Cholesky RV but something that it spits out (more obvious in backward sampling), together with the rest of the matrix, when given a parametrized prior distribution. The sd is intertwined with the whole matrix in a way that unconstrains MCMC sampling.
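Concretely (a sketch): the stds and corr you get back are Deterministics recovered from the packed Cholesky draw, not separate free variables that the sampler works on:

import pymc as pm

with pm.Model():
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=3, eta=2, sd_dist=pm.Exponential.dist(1.0)
    )
    # chol, corr and stds are all derived from the single packed free variable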

I think I get it now.

But next question. LKJCholeskyCov has 4 degrees of indirection as opposed to the regular 2 (RV and then dist). One of the layers is the following:

# _LKJCholeskyCovBaseRV requires a properly shaped `D`, which means the variable can't
# be safely resized. Because of this, we add the thin SymbolicRandomVariable wrapper
class _LKJCholeskyCovRV(SymbolicRandomVariable):
    default_output = 1
    _print_name = ("_lkjcholeskycov", "\\operatorname{_lkjcholeskycov}")

    def update(self, node):
        return {node.inputs[0]: node.outputs[0]}

Am I correct in assuming this is because it has to return a square matrix of size n? In that case, a direct corollary is that if we want LKJCorr to also return a matrix, the same structure would need to be replicated?

I don’t think it has to do with shapes.

The layers are:

  1. User-facing function with helpful deterministics: LKJCholeskyCov.
  2. Pure PyMC distribution _LKJCholeskyCov. This distribution combines sd_dist with a _LKJCholeskyCovBaseRV. We could have implemented a _LKJCholeskyCovRV directly as a PyTensor RandomVariable Op, but that Op would need to know how to sample sd_dist in its rng_fn, which is not very composable, since sd_dist can be an arbitrary user-provided distribution (or even a symbolic distribution, see the next sentence). It’s easier to combine them “symbolically” as a PyTensor graph, which internally in PyMC we do with these SymbolicRandomVariable: pymc/pymc/distributions/distribution.py at 7bb2ccd4202e76a051c4b65fb2bbfc864441af04 · pymc-devs/pymc · GitHub
    Anyway, this PyMC distribution class’s sole purpose is to create these symbolic graphs, which is basically: resize sd_dist + pass it as a symbolic input to _LKJCholeskyCovBaseRV.
  3. A _LKJCholeskyCovBaseRV that is a PyTensor RandomVariable Op. This is the other building block, besides sd_dist, that _LKJCholeskyCov.rv_op needs. It is only needed for forward sampling, which is actually more complicated than backward sampling for this distribution. Anyway, this Op takes some pre-drawn sd_dist values as inputs (called D internally) and returns a CholeskyCov random draw from them.

Just like the SymbolicRandomVariable part could have been avoided with more work, so could the RandomVariable part, by instead implementing the whole forward graph symbolically inside the SymbolicRandomVariable. It was just much easier at the time to implement it as an Op that uses numpy routines (some of which, like np.einsum, are not even available in PyTensor), and, more importantly, that allowed us to reuse LKJCorrRV._random_corr_matrix, which is also just numpy code and also not trivial to implement in PyTensor. Also, this code already existed, so reimplementing it would have been doubly hard!

Hope that makes some sense?

Note that logp is implemented only for the level 2 RV.

It sounds like you would like to work at level 3 (where you provide pre-drawn sd_dist values as inputs). For that you would need to implement the correct logp for what LKJCholeskyCovBaseRV represents. Even if you can figure out the logp it may be hard to sample from it with MCMC unless you can come up with a nice transformation.

To help making it more concrete, your PyMC model would then look something like:

sd_dist_vals = ... # prior or constant
packed_chol = pm._LKJCholeskyCovBase("packed_chol", ..., D=sd_dist_vals)   # not a thing now
chol, corr, stds = pm.LKJCholeskyCov.helper_deterministics(n, packed_chol)

CC @aseyboldt in case he has ideas, as he was the architect of the whole thing :smiley:

I think trying to modify LKJCholeskyCov is the wrong approach. The original paper by Lewandowski et al. (2009), in section 3.2, “Algorithm for generating random correlation matrices”, notes that you can work with Cholesky-decomposed matrices to generate the final correlation matrix. Looking at our implementation here, it looks like P might already be the Cholesky-factorized correlation matrix. It might be as easy as adding a flag to the LKJCorr RV with default chol=False and returning P instead of C if chol=True.

It might also need adjustments in the logp function to compute C there if chol=True, not sure. But it seems like an easier refactor target than LKJCholeskyCov.


That means you would need to define the logp for P directly as well, and hope that it doesn’t change the sampling constraints (or that there’s an easy transform to unbound the values of P).

If chol=True, can’t we just compute C from P in the logp function and keep everything else as-is?

I would be surprised, unless there is no change of volume / integration needed between P and C. Given the einsums and extra y/z draws, this seems unlikely.

It’s like if you had a distribution that generated log-normal draws from normal draws internally, and asked it to optionally return the normal draws instead: you wouldn’t be able to just reuse the logp of LogNormal. You would need to undo/skip the Jacobian adjustment that must be there somewhere in the definition.
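A tiny numerical version of that analogy (arbitrary value, standard Normal/LogNormal):

import numpy as np
import pymc as pm

# The logp of the inner normal draw x equals the LogNormal logp of exp(x)
# plus the log-Jacobian of the exp transform (which is just x here)
x = 0.7
print(pm.logp(pm.Normal.dist(0.0, 1.0), x).eval())
print(pm.logp(pm.LogNormal.dist(0.0, 1.0), np.exp(x)).eval() + x)  # same value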


You could also just sample it by recreating the generative graph:

        beta = eta - 1.0 + n / 2.0
        r12 = 2.0 * stats.beta.rvs(a=beta, b=beta, size=flat_size, random_state=rng) - 1.0
        P = np.full((flat_size, n, n), np.eye(n))
        P[..., 0, 1] = r12
        P[..., 1, 1] = np.sqrt(1.0 - r12**2)

Replace that stats.beta.rvs with a PyMC Beta prior and the deterministic numpy operations with PyTensor ones.
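For n=2 that would look roughly like this (a sketch; names like r12_raw and corr_chol are just illustrative):

import pymc as pm
import pytensor.tensor as at

n, eta = 2, 1.0
with pm.Model():
    beta_param = eta - 1.0 + n / 2.0
    # Beta prior replacing stats.beta.rvs, rescaled to (-1, 1)
    r12_raw = pm.Beta("r12_raw", alpha=beta_param, beta=beta_param)
    r12 = 2.0 * r12_raw - 1.0
    # PyTensor set_subtensor replacing the numpy assignments
    P = at.eye(n)
    P = at.set_subtensor(P[0, 1], r12)
    P = at.set_subtensor(P[1, 1], at.sqrt(1.0 - r12**2))
    corr_chol = pm.Deterministic("corr_chol", P)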

I don’t know if there’s some indeterminacy in those beta draws, though.