Have you tried sampling with cores=1? This disables PyMC3's parallel sampling, so it cannot conflict with your custom forward computation (or it will at least surface the bug, if there is one).
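For reference, here is a minimal sketch of what I mean. The helper name sample_single_process and the `model` argument are hypothetical (I am assuming `model` is your already-built pm.Model containing the custom forward computation); the draws, tune, and chains values are placeholders. Only the cores=1 argument is the actual point.

```python
# Sampler settings for a single-process run. cores=1 makes PyMC3 run the
# chains one after another in the main process instead of spawning workers.
SINGLE_PROCESS_KWARGS = dict(draws=500, tune=500, chains=2, cores=1)


def sample_single_process(model, **overrides):
    """Sample `model` (hypothetical: your own pm.Model) without multiprocessing."""
    # Imported here so the settings above can be inspected without PyMC3 loaded.
    import pymc3 as pm

    kwargs = {**SINGLE_PROCESS_KWARGS, **overrides}
    with model:
        return pm.sample(**kwargs)
```

You would call it as trace = sample_single_process(model). If an error is raised inside your forward computation, the traceback should then come straight from the main process, which is usually easier to read than one passed back from a worker.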