PPS (edited to rule out `Interpolated`): the copula example (and the original post) does point to a rather general solution, provided one can find a pymc distribution that approximates the marginals well enough and has the `icdf` method implemented. Unfortunately, this rules out `Interpolated`, since it does not implement `icdf`. So here is how it would go:
1. After running modelA, identify marginal distributions that describe the posterior of modelA well – in the general case, that could be the `Interpolated` distribution, but it lacks the `icdf` method, so you need to find another distribution such as `LogNormal`.
2. Transform the variables using the `cdf` method of that distribution, to obtain uniformly distributed variables.
3. Transform again with the normal distribution's inverse cdf method (`ppf` in scipy, `icdf` in pymc).
4. Fit a multivariate normal distribution to the result of 3 => the resulting `mu` and `cov` (or `chol`) will be your prior in modelB.

Note that steps 1-4 need not be done in pymc.
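A minimal sketch of steps 1-4 using scipy/numpy only (no pymc needed at this stage). The posterior samples from modelA are faked here with correlated lognormal draws; in practice you would use `trace.posterior` values:

```python
# Sketch of steps 1-4 outside pymc; the "posterior samples" are
# illustrative stand-ins for modelA's actual trace.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 5000
# stand-in for modelA's posterior: two correlated, long-tailed variables
z_latent = rng.multivariate_normal([0.0, 0.0],
                                   [[1.0, 0.6], [0.6, 1.0]], size=n)
samples = np.exp(z_latent)  # lognormal-ish marginals, shape (n, 2)

# Step 1: fit a parametric marginal (LogNormal) to each variable
marginals = [stats.lognorm.fit(samples[:, j], floc=0) for j in range(2)]

# Step 2: marginal cdf -> uniformly distributed variables
u = np.column_stack([stats.lognorm.cdf(samples[:, j], *marginals[j])
                     for j in range(2)])
u = np.clip(u, 1e-10, 1 - 1e-10)  # avoid +/- inf in the ppf

# Step 3: standard normal inverse cdf (ppf) -> normally distributed variables
z = stats.norm.ppf(u)

# Step 4: fit a multivariate normal -> mu and cov (or its Cholesky factor)
mu = z.mean(axis=0)
cov = np.cov(z, rowvar=False)
chol = np.linalg.cholesky(cov)
```

Because the fake data is exactly a Gaussian copula with lognormal marginals, the recovered `cov` should be close to the latent correlation of 0.6.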
And in modelB (this needs to be done within a pymc model):

5. Define an `MvNormal` distribution with the `mu` and `chol` parameters obtained in 4.
6. Transform the variables back to their original, custom shape by reverting 3 (use the normal distribution's `cdf` method) and 2 (use the chosen marginal distribution's `icdf` method).
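To make steps 5 and 6 concrete, here is a numerical sketch outside pymc using scipy; the `mu`, `chol`, and marginal parameters are illustrative stand-ins for the step-4 output. Inside modelB, step 5 would be a `pm.MvNormal`, and the step-6 transforms would need graph-level equivalents of the normal cdf and the marginal icdf so that gradients flow through them:

```python
# Numerical check of steps 5-6 outside pymc; all parameter values
# below are illustrative, not output of an actual step 4.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
chol = np.linalg.cholesky(np.array([[1.0, 0.6], [0.6, 1.0]]))
marginal_params = [(1.0, 1.0), (1.0, 1.0)]  # (s, scale) per LogNormal marginal

# Step 5: draw from MvNormal(mu, chol @ chol.T)
z = mu + rng.standard_normal((5000, 2)) @ chol.T

# Step 6a: revert step 3 with the standard normal cdf -> uniforms
u = stats.norm.cdf(z)

# Step 6b: revert step 2 with the marginal icdf (ppf) -> original scale
x = np.column_stack([stats.lognorm.ppf(u[:, j], s, scale=scale)
                     for j, (s, scale) in enumerate(marginal_params)])
```

The resulting `x` has lognormal marginals with the Gaussian copula dependence structure. (Whether recent pymc versions expose a graph-level inverse cdf for the distribution you pick is worth checking before committing to this route.)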
It feels a bit involved, and I wonder whether 5) and 6) could cause convergence issues (assuming they are tracked in the pymc graph and NUTS has to calculate gradients through them) – the notebook linked above mentions convergence issues – but it could be tried. In practical settings, it is probably best to apply the transformations only to those variables that have a long tail and are clearly not normally distributed, and leave the rest untransformed.
EDIT PPPS: in the case of a `LogNormal` distribution, it's enough to just take the logarithm of that variable and use that in the sampling, without the intermediate steps of converting to a uniform distribution.
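The shortcut works because the log of a `LogNormal` variable is exactly normal, so steps 2-3 collapse into a single `np.log`. A quick check with illustrative data:

```python
# LogNormal shortcut: log(samples) is normal, no cdf/ppf round-trip needed.
# The lognormal parameters here are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
samples = rng.lognormal(mean=0.5, sigma=0.8, size=10000)  # stand-in posterior draws

z = np.log(samples)           # directly normally distributed
mu, sigma = z.mean(), z.std()  # feed these (or the joint cov) into modelB
```

With enough draws, `mu` and `sigma` recover the underlying normal parameters (0.5 and 0.8 here).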