In this case I cannot use NUTS unfortunately. But still, you’re right, tuning for Slice or Metropolis or what have you should also do a fine job of adjusting the slice width or proposal variance or whatever. To that end, I’m wondering how to think about in which cases a transformed variable will give an easier time for the proposal method versus a harder time for the proposal method. I am curious about this theoretical question in general…
But in my case, tuning is indeed when the black-box likelihood fails. Here’s a MWE using that black-box likelihood, but it requires the external library pyddm.
import arviz as az
import numpy as np
import pymc as pm
import pytensor.tensor as pt
# Generate our "observed" data,
# 500 reaction time samples from a model with linear collapsing bound of bslope=1
samp = pyddm.Model(bound=pyddm.BoundCollapsingLinear(B=1, t=1)).solve().resample(500)
# Create op for the log likelihood (to be passed to a Potential). The only free parameter
# of this model is "t", the slope of the collapsing decision bound.
m = pyddm.Model(bound=pyddm.BoundCollapsingLinear(B=1, t=pyddm.Fittable()))
neg_logp = pyddm.get_model_loss(m, sample=samp, lossfunction=pyddm.LossRobustLikelihood)
# pymc model
with pm.Model() as model:
bslope = pm.HalfNormal("bslope", sigma=5)
obs = pm.Potential("obs", ddm_lik_op(bslope))
The log-likelihood given the synthetic data looks like this,
It’s a drift-diffusion model with a linear collapsing bound. The external library is solving a stochastic differential equation and some other things to get us the model likelihood. With its default settings for numerical precision, the likelihood will throw an exception for
Running sampling on the model above, an exception is thrown in tuning because (for some reason I’m still trying to understand exactly) really large values get proposed for bslope. This occurs reliably for all of Slice, Metropolis, and DEMetropolisZ. But I’ve been mostly working with slice sampling.
idata = pm.sample(step=pm.Slice(), chains=1, cores=1)
The issue is avoided if I set a tighter prior.
sigma=1 seems to mostly work. However, I’m surprised that any sigma much larger than
1 is an issue.
HalfNormal(sigma=5) has a 94% HDI of somewhere around [0.00…, 9.4]. It’s unexpected to me that values as high as
200 should be at all probable (and in this case, the proposals are often many orders of magnitude larger, sometimes overflowing to inf).
Further, sampling seems to have a much easier time if I don’t transform the
bslope variable. Even with
transform=None gives no problems during sampling/tuning! And thus my question…
FWIW it's also surprising to me that the exception has been occurring for me *only* during tuning. Setting `tune=0` in `pm.sample()` has more or less avoided the issue. I'm not sure if this is a red herring or not.
Also, indeed, setting a small initval helps a little bit but doesn’t avoid the issue. It mostly just delays how many steps occur before the proposal value shoots up.