Hi Tyfair, that is the expected behavior when using boundary conditions and potentials.
Using a switch condition to trigger a penalty introduces a discontinuity in the posterior surface. It’s like a steep, abrupt drop off a cliff. We can visualize the drop-off with a simple setup.
import pymc as pm

with pm.Model() as divergence_demo_1:
    a = pm.Beta('a', 1, 1)
    # if a is less than 0.5, no problem (penalty of log(1) = 0)
    # if a is greater than 0.5, medium sized problem (penalty of log(0.1) ≈ -2.3)
    pm.Potential('discontinuity', pm.math.log(pm.math.switch(pm.math.lt(a, 0.5), 1, 0.1)))
    trace = pm.sample(chains=1)
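A quick way to actually see the cliff is to plot a histogram of the samples of a; the density should drop sharply at 0.5. A minimal sketch, assuming pm.sample returned an InferenceData (the default in PyMC ≥ 4; with an older MultiTrace you would use trace['a'] instead):

import matplotlib.pyplot as plt

samples = trace.posterior['a'].values.ravel()  # flatten chains and draws
plt.hist(samples, bins=50)
plt.axvline(0.5, linestyle='--')  # location of the discontinuity
plt.show()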
Discontinuities can be tough on gradient-based samplers like NUTS, which evaluate the log posterior and its gradient at intermediate locations between each sample. If the value of the posterior at the new location departs widely from what you would expect based on following the gradients, that triggers a divergence. So if the value of the posterior is -np.inf, that is always too surprising a jump and always triggers a divergence.
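To see that extreme case, here is a minimal sketch of the same model with a hard wall instead of a finite drop (the model and variable names here are mine); any proposal that lands past the wall is an infinitely surprising jump and should be flagged as divergent:

import numpy as np
import pymc as pm

with pm.Model() as divergence_demo_2:
    a = pm.Beta('a', 1, 1)
    # -inf log posterior past a = 0.5: a hard wall rather than a steep slope
    pm.Potential('hard_wall', pm.math.switch(pm.math.lt(a, 0.5), 0.0, -np.inf))
    trace = pm.sample(chains=1)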
The reason you find such unusual behaviour, especially the bit about -9.99e2 vs. -1e3, is that NUTS comes with a hard-coded threshold for what counts as too surprising a jump (a divergence) and what doesn’t; if I remember correctly, the default is an energy change of 1000. It just so happens that the difference in penalties between -9.99e2 and -1e3 straddles that threshold.
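You can check this directly with two otherwise identical models (a sketch, assuming the default threshold mentioned above): the first penalty jump stays just under it, the second reaches it:

with pm.Model() as just_below_threshold:
    a = pm.Beta('a', 1, 1)
    # A jump of 9.99e2 in log posterior: big, but under the limit
    pm.Potential('penalty', pm.math.switch(pm.math.lt(a, 0.5), 0.0, -9.99e2))
    trace_ok = pm.sample(chains=1)

with pm.Model() as at_threshold:
    a = pm.Beta('a', 1, 1)
    # A jump of 1e3: reaches the limit, expect divergence warnings
    pm.Potential('penalty', pm.math.switch(pm.math.lt(a, 0.5), 0.0, -1e3))
    trace_div = pm.sample(chains=1)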
