Sampler initialization error with a model containing an LKJCholeskyCov distribution

Hi, as reported initially in this thread (Dynamic shaping, "round" function, JAX, and a "few" more questions - #21 by caclav), the code below, which is essentially identical to this example (LKJ Cholesky Covariance Priors for Multivariate Normal Models — PyMC example gallery), raises an error when the sampler's initvals parameter is initialized with model.initial_point().

import numpy as np
import pymc as pm

# configuration
RANDOM_SEED = 12345
rng = np.random.default_rng(RANDOM_SEED)
N = 10000

# true parameters
mu_actual = np.array([1.0, -2.0])
sigmas_actual = np.array([0.7, 1.5])
Rho_actual = np.array([[1.0, -0.4], [-0.4, 1.0]])
Sigma_actual = np.diag(sigmas_actual) @ Rho_actual @ np.diag(sigmas_actual)

# simulated data generation
x = rng.multivariate_normal(mu_actual, Sigma_actual, size=N)

# PyMC Model definition
coords = {"axis": ["y", "z"],
          "axis_bis": ["y", "z"],
          "obs_id": np.arange(N)}
with pm.Model(coords=coords,) as model:
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0, shape=2)
    )
    cov = pm.Deterministic("cov", chol.dot(chol.T), dims=("axis", "axis_bis"))
    
    mu = pm.Normal("mu", 0.0, sigma=1.5, dims="axis")
    obs = pm.MvNormal("obs", mu, chol=chol, observed=x, dims=("obs_id", "axis"))

init_pts = model.initial_point()
if False:  # set to True to keep the internal key: AN ERROR IS RAISED IN THAT CASE
    pass
else:  # the internal key is deleted: THE ISSUE DOESN'T HAPPEN IN THIS CASE
    del init_pts["chol_cholesky-cov-packed__"]
    # the line below is optional
    # init_pts['chol'] = np.array([0.70931675, -0.61463295,  1.36559879])

# NUTS MCMC sampling and inference - THE ERROR HAPPENS HERE!
with model:
    idata = pm.sample(initvals=init_pts,
        idata_kwargs={"dims": {"chol_stds": ["axis"],
                               "chol_corr": ["axis", "axis_bis"]}},
    )

The error is:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Traceback (most recent call last):

  File ~/InstProg/anaconda3/envs/myenv/lib/python3.10/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File ~/Bureau/detector_work/IPC/_bug_test.py:49
    idata = pm.sample(initvals=init_pts,

  File ~/InstProg/anaconda3/envs/myenv/lib/python3.10/site-packages/pymc/sampling/mcmc.py:682 in sample
    initial_points, step = init_nuts(

  File ~/InstProg/anaconda3/envs/myenv/lib/python3.10/site-packages/pymc/sampling/mcmc.py:1327 in init_nuts
    initial_points = _init_jitter(

  File ~/InstProg/anaconda3/envs/myenv/lib/python3.10/site-packages/pymc/sampling/mcmc.py:1204 in _init_jitter
    ipfns = make_initial_point_fns_per_chain(

  File ~/InstProg/anaconda3/envs/myenv/lib/python3.10/site-packages/pymc/initial_point.py:86 in make_initial_point_fns_per_chain
    make_initial_point_fn(

  File ~/InstProg/anaconda3/envs/myenv/lib/python3.10/site-packages/pymc/initial_point.py:134 in make_initial_point_fn
    sdict_overrides = convert_str_to_rv_dict(model, overrides or {})

  File ~/InstProg/anaconda3/envs/myenv/lib/python3.10/site-packages/pymc/initial_point.py:47 in convert_str_to_rv_dict
    initvals[rv] = model.rvs_to_transforms[rv].backward(initval, *rv.owner.inputs)

  File ~/InstProg/anaconda3/envs/myenv/lib/python3.10/site-packages/pymc/distributions/transforms.py:157 in backward
    return pt.set_subtensor(value[..., self.diag_idxs], pt.exp(value[..., self.diag_idxs]))

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I hope I haven’t made any stupid mistakes in this code sample and that I won’t waste your time.

(By the way, I think that in the example mentioned - which is extremely useful for a project I'm working on! - the rng_seeder parameter passed to pm.Model is obsolete in recent versions of PyMC.)
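
For reference, here is how I seed now instead (a quick sketch, assuming recent PyMC where seeding moved from model construction to sampling time):

with model:
    idata = pm.sample(random_seed=RANDOM_SEED)  # instead of pm.Model(..., rng_seeder=...)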

Thank you for your help!

My apologies, I should've read your question more carefully before answering. As you mentioned, the problem is with the initial values you assigned. I can run the model with no issues using the sampler's default arguments:

with pm.Model(coords=coords,) as model:
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1, shape=2)
    )
    cov = pm.Deterministic("cov", chol.dot(chol.T), dims=("axis", "axis_bis"))

    mu = pm.Normal("mu", 0.0, sigma=1.5, dims="axis")
    obs = pm.MvNormal("obs", mu, chol=chol, observed=x, dims=("obs_id", "axis"))

with model:
    idata = pm.sample(1000)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [chol, mu]
100.00% [8000/8000 00:26<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 46 seconds.

import arviz as az
az.summary(idata)
                  mean     sd  hdi_3%  ...  ess_bulk  ess_tail  r_hat
mu[y]            0.995  0.007   0.983  ...    3905.0    3155.0    1.0
mu[z]           -1.977  0.015  -2.007  ...    3709.0    3044.0    1.0
chol[0]          0.708  0.005   0.699  ...    4736.0    3434.0    1.0
chol[1]         -0.606  0.014  -0.632  ...    4359.0    3084.0    1.0
chol[2]          1.362  0.010   1.345  ...    5933.0    3117.0    1.0
chol_corr[0, 0]  1.000  0.000   1.000  ...    4000.0    4000.0    NaN
chol_corr[0, 1] -0.406  0.008  -0.422  ...    4562.0    3382.0    1.0
chol_corr[1, 0] -0.406  0.008  -0.422  ...    4562.0    3382.0    1.0
chol_corr[1, 1]  1.000  0.000   1.000  ...    3124.0    3434.0    1.0
chol_stds[0]     0.708  0.005   0.699  ...    4736.0    3434.0    1.0
chol_stds[1]     1.491  0.011   1.472  ...    5046.0    3070.0    1.0
cov[y, y]        0.501  0.007   0.488  ...    4736.0    3434.0    1.0
cov[y, z]       -0.429  0.012  -0.449  ...    3899.0    3118.0    1.0
cov[z, y]       -0.429  0.012  -0.449  ...    3899.0    3118.0    1.0
cov[z, z]        2.223  0.031   2.167  ...    5046.0    3070.0    1.0

Results seem sensible; maybe you don't really need to assign initial values. Is there a specific reason for using them?

Thank you for investigating where this error comes from. Yes, according to the PyMC example mentioned above and the documentation of LKJCholeskyCov (see the sd_dist parameter), shape should be equal to n. Indeed, it seems there is an issue with the initialization based on model.initial_point(), which, unless I am mistaken, shouldn't happen.

Yes, I’m sorry, I edited my answer a second time :sweat_smile: . I didn’t read your question carefully. So I was just asking whether you actually need to assign initial values.

So, just to properly answer your question: the problem is the name assigned to the LKJ prior. When you use init_pts = model.initial_point(), you get the internal (transformed) name.

init_pts = model.initial_point()

init_pts
Out[39]: {'chol_cholesky-cov-packed__': array([0., 0., 0.]), 'mu': array([0., 0.])}
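
To see why that key fails: judging from the last frames of your traceback, the sampler does try to handle the transformed name, but the transform's backward() (transforms.py:157 in the traceback) then indexes your raw numpy initval with its symbolic diag_idxs, and numpy rejects that. A minimal sketch of the failing pattern, with diag_idxs as a hypothetical symbolic stand-in:

import numpy as np
import pytensor.tensor as pt

value = np.zeros(3)                  # the numpy initval for 'chol_cholesky-cov-packed__'
diag_idxs = pt.ivector("diag_idxs")  # stand-in for the transform's symbolic indices
value[..., diag_idxs]                # raises the same IndexError as in your traceback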

Though the documentation states that the names of transformed variables should be used, it actually works with the name of the assigned variable:

init_pts = {'chol': np.array([1, 1, 1]), 'mu': np.array([1, 1])}  # note: I use 1 as the initial value; with 0 it may break

with model:
    idata = pm.sample(initvals=init_pts,
        idata_kwargs={"dims": {"chol_stds": ["axis"],
                               "chol_corr": ["axis", "axis_bis"]}},
    )
    
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [chol, mu]
100.00% [8000/8000 00:54<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 74 seconds.

Thank you again @Simon for taking the time to understand where the error in this code sample came from and for the solution you provided!

This is also what I ended up doing, i.e. using the names of the untransformed parameters in the initialization dictionary supplied to the sampler, even though the documentation suggests otherwise.

Of course the most important thing is that it works, but it is still a bug I think, albeit a minor one.
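
In case it helps anyone else: if you do want to start from model.initial_point(), the sketch below converts its transformed-space dict into one keyed by the untransformed names, which the sampler accepts. It only uses the calls visible in the traceback (model.rvs_to_transforms, transform.backward); the f-string that rebuilds the transformed key is my assumption about PyMC's internal naming (it matches the 'chol_cholesky-cov-packed__' key shown above), so treat this as a workaround sketch rather than an official API.

import numpy as np
import pytensor.tensor as pt

def initvals_in_rv_space(model):
    # Map model.initial_point() (keyed by transformed names) to a dict
    # keyed by the untransformed RV names, as suggested in this thread.
    point = model.initial_point()
    initvals = {}
    for rv in model.free_RVs:
        transform = model.rvs_to_transforms[rv]
        if transform is None:
            initvals[rv.name] = point[rv.name]
            continue
        # assumed naming convention, e.g. 'chol_cholesky-cov-packed__'
        key = f"{rv.name}_{transform.name}__"
        # Wrap the numpy value in a tensor first: calling backward() on a
        # raw numpy array is exactly what raised the IndexError above.
        value = transform.backward(pt.as_tensor_variable(point[key]), *rv.owner.inputs)
        initvals[rv.name] = np.asarray(value.eval())
    return initvals

init_pts = initvals_in_rv_space(model)
with model:
    idata = pm.sample(initvals=init_pts)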
