Hi There!
I am trying to build a hierarchical Bayesian model to infer the shape of the distribution of e_model from a set of observed data arrays [observed_r, std_r, observed_c, std_c, observed_ars, N], where N is the length of the other arrays. The code works completely as expected (with both simulated and real data) when the Potential term "weight" is excluded, but produces upwards of 90% divergences when it is included, even though it still outputs the desired result (just with extremely low effective sample size). It looks very similar to the thread "Good results but many divergent samples: what, me worry?", which ended with a change to the underlying code, so I am not sure whether the same is needed here.
Here is the full model (note I am using pymc v5.10.3):
import numpy as np
import pymc as pm
import pytensor.tensor as pt

def calc_implied_c(e_model, w_model, r_model):
    # implied c from e, w and r
    g = (1 + e_model * pt.sin(w_model)) / pt.sqrt(1 - e_model**2)
    return r_model * g**3
def prob_weighting(e_model, w_model, ars_model):
    # hard geometric constraint: e must stay below 1 - 1/ars
    constraint = e_model < (1 - 1/ars_model)
    weighting = pm.math.switch(
        constraint,
        1/ars_model * ((1 + e_model * pt.sin(w_model)) / (1 - e_model**2)),
        -np.inf,
    )
    return weighting
def log_likelihood(observed_r, std_r, observed_c, std_c, observed_ars, N):
    with pm.Model() as model:
        # hyperpriors on the Beta shape of the eccentricity distribution
        alpha_e = pm.Uniform('alpha', 0, 10, initval=2)
        beta_e = pm.Uniform('beta', 0, 10, initval=5)
        e_model = pm.Beta('e_model', alpha=alpha_e, beta=beta_e, shape=N)
        w_model = pm.Uniform('w_model', lower=0, upper=2*np.pi, shape=N)
        # geometric weighting applied as a Potential
        weight = pm.Potential('weight', prob_weighting(e_model, w_model, observed_ars))
        # per-object r, sampled in log space
        log_r = pm.Normal('log_r', np.log(observed_r), std_r/observed_r, shape=N)
        # implied log(c) from the sampled e, w and r
        log_c = pm.Deterministic('log_c', pt.log(calc_implied_c(e_model, w_model, pt.exp(log_r))))
        log_obs_c = pm.Normal('log_obs_c', mu=log_c, sigma=std_c/observed_c, observed=np.log(observed_c), shape=N)
        idata = pm.sample(10000, tune=40000, chains=4, cores=4, target_accept=0.99, return_inferencedata=True)
    return idata
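For context, this is roughly how I call it on simulated data (the specific numbers below are placeholders; the real arrays come from my data pipeline):

rng = np.random.default_rng(0)
N = 50
# draw "true" parameters and push them through the same relation as calc_implied_c
e_true = rng.beta(2, 5, size=N)
w_true = rng.uniform(0, 2 * np.pi, size=N)
observed_ars = rng.uniform(5, 20, size=N)   # placeholder ars values
r_true = rng.uniform(0.5, 1.5, size=N)      # placeholder r values
g = (1 + e_true * np.sin(w_true)) / np.sqrt(1 - e_true**2)
c_true = r_true * g**3
# add measurement noise and record the assumed uncertainties
std_r = 0.01 * r_true
std_c = 0.05 * c_true
observed_r = rng.normal(r_true, std_r)
observed_c = rng.normal(c_true, std_c)

idata = log_likelihood(observed_r, std_r, observed_c, std_c, observed_ars, N)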
I also experimented with the value used when the constraint is not met: the model produces no divergences when the penalty is above -1e3 (i.e. -9.999e2 and higher are fine, although that is obviously not what I want mathematically), while anything at -1e3 or below results in mass divergences.
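Concretely, this is the shape of the finite-penalty variant I was testing (the function name and the penalty argument here are just for illustration; only the failure branch changes from the version above):

def prob_weighting_finite(e_model, w_model, ars_model, penalty=-9.999e2):
    # same constraint as before, but with a finite penalty instead of -np.inf
    constraint = e_model < (1 - 1/ars_model)
    weighting = pm.math.switch(
        constraint,
        1/ars_model * ((1 + e_model * pt.sin(w_model)) / (1 - e_model**2)),
        penalty,  # -9.999e2 and higher sample cleanly; -1e3 and below diverge
    )
    return weighting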
I have debugged this for countless hours: the parallel plots gave me nothing useful, I switched the implementation between pytensor operations, pymc.math, and np.inf, and I reparameterised the variables (including the Potential function) into log space, but nothing works. Interestingly, even a very simple setup such as the following:
with pm.Model() as model:
    x = pm.Normal('x', mu=0., sigma=1.)
    pm.Potential('obs_x', pt.switch(x < 0., -pt.inf, 0.))
    idata = pm.sample()
results in near 100% divergences, whether I use -pt.inf, -np.inf, or any finite penalty all the way down to -1e3.
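For reference, I am counting divergences in both cases with the standard InferenceData fields, so I don't think the bookkeeping itself is the issue:

n_div = int(idata.sample_stats["diverging"].sum())
n_total = idata.sample_stats["diverging"].size
print(f"{n_div} / {n_total} transitions diverged")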
I am not extremely familiar with PyMC, so if anyone knows why this is occurring, or what is special about the boundary between -9.9999…e2 and -1e3 for the Potential penalty, please let me know!
Thanks.