Ideas for reparameterizing models/changing priors to avoid divergences

Hey everyone

I have been looking into solving problems related to divergences in my model, following different tips such as Diagnosing Biased Inference with Divergences; Why hierarchical models are awesome, tricky, and Bayesian; and Cookbook — Bayesian Modelling with PyMC3.

I wanted to ask whether defining a prior as the logarithm of its possible values can help NUTS, since priors over large values would then be on the same scale, or closer to the same scale, as priors over smaller values. As an example, take the slope and intercept of a linear function (I am using uniform priors because the limits are easy to translate to the log scale):

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as linmodel:
    sigma = pm.HalfCauchy('sigma', beta=10, testval=1.)
    slope = pm.Uniform('slope', lower=2.0, upper=4.0)
    # sample the intercept on the log scale (example limits)
    logintercept = pm.Uniform('logintercept',
                              lower=np.log(1e2), upper=np.log(1e4))

    likelihood = pm.Normal('y', mu=tt.exp(logintercept) + slope * x,
                           sigma=sigma, observed=y)

    trace = pm.sample(1000)
```

Does such a change actually help the sampler? Even though the parameters are closer to the same scale, isn't the sampler more sensitive to changes in the intercept, since a small step in log space corresponds to a large step on the original scale?
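To make the scale argument concrete, here is a small sketch (the magnitudes are made up for illustration): a log transform compresses a parameter living in the thousands down to single digits, putting it within an order of magnitude of the slope, which is the kind of scaling NUTS's default (diagonal) mass-matrix adaptation handles well.

```python
import numpy as np

# hypothetical magnitudes: intercept in the thousands, slope of order one
intercept = 1000.0
slope = 3.0

log_intercept = np.log(intercept)
print(log_intercept)  # ~6.91, now comparable in scale to the slope
```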

Another question I have is related to hierarchical models. In the examples I have seen, like the ones linked above, the parameters are centered on 0, but what if they are not? Can we specify a non-centered model around another mu value? As an example, if mu=5, could we write the Eight Schools model as:

```python
with pm.Model() as NonCentered_eight:
    mu = pm.Normal('mu', mu=5, sigma=5)
    tau = pm.HalfCauchy('tau', beta=5)
    theta_tilde = pm.Normal('theta_t', mu=5, sigma=1, shape=J)
    theta = pm.Deterministic('theta', mu + tau * theta_tilde)
    obs = pm.Normal('obs', mu=theta, sigma=sigma, observed=y)
```

If anyone has any tips or sources for examples like these it would be greatly appreciated.

Hi Bob!
As for your second question, I'm not sure, but I don't think you can do that: the point of the non-centered parametrization is to pull the average effect (`mu`) out of the prior, so `tau * theta_tilde` must be just a deviation from the baseline.
But if `tau * theta_tilde` also contains a baseline value, then in my opinion the model will be overparametrized: an infinite number of different pairs of `mu` and `tau * theta_tilde` produce the same sum `mu + tau * theta_tilde`.
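To illustrate the non-identifiability numerically (a minimal sketch with made-up values): shifting any constant from `mu` into `theta_tilde` leaves `theta = mu + tau * theta_tilde` unchanged, so the likelihood cannot distinguish the two parameterizations.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 2.5
theta_tilde = rng.normal(size=8)  # standard-normal deviations

mu_a = 5.0
theta_a = mu_a + tau * theta_tilde

# move a baseline of c from mu into theta_tilde: different parameters...
c = 3.0
mu_b = mu_a + c
theta_tilde_b = theta_tilde - c / tau
theta_b = mu_b + tau * theta_tilde_b

# ...but exactly the same theta, hence the same likelihood
print(np.allclose(theta_a, theta_b))  # True
```

This is why the usual recipe is to keep `theta_tilde` as a standard Normal(0, 1) and put any nonzero location in the prior for `mu` instead.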