Decreasing R-hat for a Black-Box Likelihood Model

When you say “it’s PDF”, do you mean you have a PDF for the external model? Or do you mean you have an assumed distribution over the data, with the external model acting as a (very complicated) deterministic transformation of the parameter samples?

If it’s the latter, it might be nice to look into the simulation API if you haven’t. It does likelihood-free parameter calibration by matching summary statistics (e.g. sample moments) of the simulated data to those of the observed data.
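Roughly, a pm.Simulator setup looks like the sketch below. The normal simulator is just a stand-in for your external model, and the priors, summary statistic, and tolerance are placeholders you would need to tune for your problem:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
observed = rng.normal(1.0, 2.0, size=500)  # stand-in for your real data


def simulate(rng, mu, sigma, size=None):
    # replace this with a call into your external model
    return rng.normal(mu, sigma, size=size)


with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 5.0)
    sigma = pm.HalfNormal("sigma", 5.0)
    pm.Simulator(
        "sim",
        simulate,
        params=(mu, sigma),
        sum_stat="mean",   # compare summary statistics of simulated vs. observed data
        epsilon=0.1,       # ABC tolerance; needs tuning for your problem
        observed=observed,
    )
    idata = pm.sample_smc()  # Simulator likelihoods are sampled with SMC-ABC
```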

If it’s the former, is it possible to use approximate numerical derivatives? Statsmodels uses this approach to avoid trying to differentiate through the Kalman filter in its Bayesian estimation example (the example code is in PyMC3, unfortunately; there are some changes to the way you would implement it now, especially w.r.t. the use of DensityDist). The way they do it in the example is a bit opaque, but the model.score function calls numdifftools. NUTS + approximate gradients might do better than Metropolis. It’s worth a try at any rate.
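For what it’s worth, here is my sketch of that pattern in current PyMC (not the statsmodels code itself): wrap the black-box log-likelihood in a PyTensor Op whose grad calls numdifftools, then attach it with pm.Potential instead of the old DensityDist. The black_box_loglike below is just a toy normal stand-in for your external model:

```python
import numpy as np
import numdifftools as nd
import pymc as pm
import pytensor.tensor as pt
from pytensor.graph import Op


def black_box_loglike(theta, data):
    # toy stand-in: replace with a call into your external model's PDF
    mu, sigma = theta
    return -0.5 * np.sum(((data - mu) / sigma) ** 2 + np.log(2 * np.pi * sigma**2))


class LogLikeGrad(Op):
    itypes = [pt.dvector]  # parameter vector
    otypes = [pt.dvector]  # numerical gradient of the log-likelihood w.r.t. theta

    def __init__(self, data):
        self.data = data

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        grad_fn = nd.Gradient(lambda t: black_box_loglike(t, self.data))
        outputs[0][0] = grad_fn(theta)


class LogLike(Op):
    itypes = [pt.dvector]  # parameter vector
    otypes = [pt.dscalar]  # scalar log-likelihood

    def __init__(self, data):
        self.data = data
        self.loglike_grad = LogLikeGrad(data)

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.array(black_box_loglike(theta, self.data))

    def grad(self, inputs, output_grads):
        (theta,) = inputs
        # chain rule: upstream scalar gradient times the numerical gradient
        return [output_grads[0] * self.loglike_grad(theta)]


data = np.random.default_rng(0).normal(1.0, 2.0, size=100)
loglike_op = LogLike(data)

with pm.Model():
    mu = pm.Normal("mu", 0.0, 5.0)
    sigma = pm.HalfNormal("sigma", 5.0)
    theta = pt.stack([mu, sigma])
    pm.Potential("loglike", loglike_op(theta))
    idata = pm.sample()  # NUTS now has (approximate) gradients to work with
```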

I recently had a highly nonlinear model I couldn’t get nice gradients for, and I had some success using the emcee package to do ensemble sampling. You can use PyMC to compile a likelihood function, then hand it to emcee to draw samples without gradients. The tradeoff is that you have to use a ton of chains (walkers).
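Roughly, the handoff looks like this; the toy normal model stands in for the black-box likelihood, and note that the compiled logp works on the model’s unconstrained value variables (e.g. sigma_log__), so the draws come back on that scale and need back-transforming:

```python
import numpy as np
import emcee
import pymc as pm

rng = np.random.default_rng(0)
data = rng.normal(1.0, 2.0, size=200)  # stand-in for your real data

with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 5.0)
    sigma = pm.HalfNormal("sigma", 5.0)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)

# compile_logp gives a function of a point dict keyed by the model's value
# variables; these live on the unconstrained space (e.g. "sigma_log__")
logp_fn = model.compile_logp()
value_names = [v.name for v in model.value_vars]
ndim = len(value_names)


def log_prob(flat_theta):
    point = dict(zip(value_names, flat_theta))
    logp = logp_fn(point)
    return logp if np.isfinite(logp) else -np.inf


nwalkers = 32  # emcee needs at least 2 * ndim walkers; in practice use many more
start = rng.normal(0.0, 0.5, size=(nwalkers, ndim))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(start, 5_000, progress=True)
flat_samples = sampler.get_chain(discard=1_000, flat=True)  # unconstrained draws
```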

Just some thoughts, but they might not be helpful.