Hey everyone
I have been trying to solve problems with divergences in my model and have been looking at different resources such as Diagnosing Biased Inference with Divergences, Why hierarchical models are awesome, tricky, and Bayesian, and the Cookbook — Bayesian Modelling with PyMC3.
I wanted to ask whether defining a prior on the logarithm of its possible values can help NUTS, since parameters with large values would then be on the same scale, or at least closer to the same scale, as parameters with smaller values. As an example, take the slope and intercept of a linear function (I am using uniform priors because the limits are easy to translate to the log scale):
from pymc3 import Model, HalfCauchy, Uniform, Normal, sample
import theano.tensor as tt

# x, y are the observed data
with Model() as linmodel:
    sigma = HalfCauchy('sigma', beta=10, testval=1.)
    # prior on the log of the intercept so it is on a scale closer to the slope
    logintercept = Uniform('logintercept', lower=tt.log(100.0), upper=tt.log(200.0))
    slope = Uniform('slope', lower=2.0, upper=4.0)
    likelihood = Normal('y', mu=tt.exp(logintercept) + slope * x,
                        sigma=sigma, observed=y)
    trace = sample(1000)
Does such a change actually help the sampler? Even though the parameters are closer to the same scale, is the sampler not more sensitive to changes in the intercept, since a small step in the log parameter becomes a large change once it is exponentiated?
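To make the comparison concrete, this is the un-transformed version I would test it against (same imports and data as above; the names linmodel_raw and trace_raw are just mine for the comparison). The idea would be to run both and compare the number of divergences:

# same model without the log transform, for comparison
with Model() as linmodel_raw:
    sigma = HalfCauchy('sigma', beta=10, testval=1.)
    intercept = Uniform('intercept', lower=100.0, upper=200.0)
    slope = Uniform('slope', lower=2.0, upper=4.0)
    likelihood = Normal('y', mu=intercept + slope * x,
                        sigma=sigma, observed=y)
    trace_raw = sample(1000)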
Another question I have is related to hierarchical models. In the examples I have seen, like the ones linked above, the parameters are centred on 0, but what if they are not? Can we simply shift the non-centred model to another mu value? As an example, if mu=5, could we write the following for the Eight Schools model?
import pymc3 as pm

# J, y, sigma are the usual Eight Schools data (number of schools, estimates, standard errors)
with pm.Model() as NonCentered_eight:
    mu = pm.Normal('mu', mu=5, sigma=5)
    tau = pm.HalfCauchy('tau', beta=5)
    theta_tilde = pm.Normal('theta_t', mu=5, sigma=1, shape=J)
    theta = pm.Deterministic('theta', mu + tau * theta_tilde)
    obs = pm.Normal('obs', mu=theta, sigma=sigma, observed=y)
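Or should the shift only enter through the mu hyperprior, with theta_tilde kept as a standard Normal(0, 1) as in the linked examples? A sketch of the alternative I have in mind (the model name NonCentered_eight_std is just mine; same J, y, sigma as above):

# alternative: theta_tilde stays a standard normal and mu carries the location
with pm.Model() as NonCentered_eight_std:
    mu = pm.Normal('mu', mu=5, sigma=5)
    tau = pm.HalfCauchy('tau', beta=5)
    theta_tilde = pm.Normal('theta_t', mu=0, sigma=1, shape=J)
    theta = pm.Deterministic('theta', mu + tau * theta_tilde)
    obs = pm.Normal('obs', mu=theta, sigma=sigma, observed=y)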
If anyone has any tips or sources for examples like these, it would be greatly appreciated.