Perhaps there’s a better place to continue this discussion, but how often is it the case that for a condition (event) set \mathcal{A} \subseteq \mathcal{X} you can define some differentiable distance d(x, \mathcal{A})? This would enable annealing on a Gibbs potential

\ell \propto \mathrm{pdf}(x)\prod_{j=1}^k \exp\left\{-\tau \times d(x, \mathcal{A}_j)\right\}

which would give higher-order methods gradients pointing towards the condition set.
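For a set defined by an inequality, \mathcal{A} = \{x : g(x) \le 0\}, one simple differentiable choice is a squared hinge distance: zero inside the set, growing smoothly outside. A minimal NumPy sketch (the set, penalty form, and function names here are illustrative assumptions, not anything from a library):

```
import numpy as np

def soft_distance(x, g):
    """Squared-hinge 'distance' to {x : g(x) <= 0}: zero inside, smooth outside."""
    return np.maximum(g(x), 0.0) ** 2

def log_potential(x, taus_and_sets):
    """Sum of Gibbs penalties -tau * d(x, A_j) over the condition sets."""
    return -sum(tau * soft_distance(x, g) for tau, g in taus_and_sets)

# Example: condition y = x^2 < 10, i.e. g(x) = x^2 - 10
g = lambda x: x ** 2 - 10
print(log_potential(2.0, [(5.0, g)]))  # inside the set: penalty is 0.0
print(log_potential(4.0, [(5.0, g)]))  # outside: -tau * (x^2 - 10)^2 = -180.0
```

The gradient of this penalty is zero inside \mathcal{A} and points back towards it outside, which is exactly what a gradient-based sampler needs.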

@zemlyansky This is probably a better way to use `pm.Potential` too (I find that this works about two orders of magnitude faster than a flat potential: `382.20it/s` vs `18.93it/s`):

```
import pymc3 as pm

tau = 5.
with pm.Model() as model:
    x = pm.Normal('x', mu=-10, sd=5)
    y = pm.Deterministic('y', x * x)
    d = pm.Deterministic('d', (y - 10) ** 2)
    # zero potential inside the condition set, quadratic penalty outside
    pm.Potential('cond', pm.math.switch(y < 10, 0, -tau * d))
    # result = pm.sample(model=model, step=pm.Metropolis(), draws=1000)
    result = pm.sample(1000, tune=100, chains=3, cores=1,
                       nuts_kwargs={'target_accept': 0.95})
pm.traceplot(result, varnames=['x']);
```

Only ~2% of samples violate the condition, and the maximum value is `y=10.764...`. Increasing \tau too much, as you noticed, tends to break the sampler; but if the number of violations is small they can be fixed by a quick rejection pass at the very end.
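The final rejection pass can be as simple as masking the trace. A sketch over a plain array of samples (the variable names and threshold are illustrative):

```
import numpy as np

def reject_violations(y_samples, threshold=10.0):
    """Drop the few samples that still violate the condition y < threshold."""
    y = np.asarray(y_samples)
    return y[y < threshold]

samples = np.array([9.1, 8.7, 10.76, 9.9, 10.2])
kept = reject_violations(samples)
# the two samples with y >= 10 are dropped; the rest pass through unchanged
```

Since the violating fraction is small, this barely changes the effective sample size while guaranteeing the condition holds exactly.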

@colcarroll What’s missing for this approach is a callback in the sampler to do things like change `tau` during tuning, ramping it to its final value.
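Absent such a callback, one workaround is to ramp `tau` manually across staged short runs, re-running tuning at each value. A sketch of a geometric schedule (the stage count and endpoints are arbitrary assumptions):

```
import numpy as np

def tau_schedule(tau_start, tau_final, n_stages):
    """Geometric ramp from a weak to the final penalty strength."""
    return np.geomspace(tau_start, tau_final, n_stages)

taus = tau_schedule(0.1, 5.0, 4)
# e.g. rebuild the model with each tau in turn, running a short tune/sample
# pass and initializing each stage from the previous stage's trace
```

A geometric ramp keeps the relative increase per stage constant, which tends to be gentler on the sampler than a linear one when the final `tau` is large.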