Implementing rounding (by manual integration) more efficiently

You could try timing the logp and dlogp functions (use `model.compile_logp()` and `model.compile_dlogp()`).
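
For example, something along these lines (a minimal sketch; the toy Gamma model here is just a placeholder for the actual model with the custom distribution):

```python
# Sketch: time a model's compiled logp/dlogp at a single point.
import timeit

import pymc as pm

with pm.Model() as model:
    mu = pm.HalfNormal("mu", 1.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Gamma("y", mu=mu, sigma=sigma, observed=[1.0, 2.0, 3.0])

logp_fn = model.compile_logp()
dlogp_fn = model.compile_dlogp()

# Dict of values on the (transformed) sampling space
point = model.initial_point()

print("logp :", timeit.timeit(lambda: logp_fn(point), number=1_000))
print("dlogp:", timeit.timeit(lambda: dlogp_fn(point), number=1_000))
```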

Yeah, it looks like dlogp calls are ~3500 times slower with my custom distribution compared to just using pm.Gamma. logp calls are “only” 60 times slower. Is there a way to inspect the dlogp graph or manually define the dlogp function? I don’t have high hopes that I could remedy the situation, but it would be interesting to at least check what the graph looks like. I found `pytensor.dprint(pymc.logp())` but couldn’t find an equivalent function for gradients.
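
One thing I could imagine trying (just a sketch, assuming `model` is the model in question and using PyTensor's `grad` and `dprint`):

```python
# Sketch: build the symbolic gradient of the joint logp and print its graph.
import pytensor

logp = model.logp()                            # symbolic joint log-probability
dlogp = pytensor.grad(logp, model.value_vars)  # gradients wrt the free (value) variables
pytensor.dprint(dlogp)                         # print the (unoptimized) gradient graph
```

Note that this prints the graph before compilation; the function returned by `compile_dlogp` goes through graph rewrites, so the actual computation may look somewhat different.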

Since `Kinetic` is a Normal, the truncation can only result from the logp term. In general, these should be transformed to an unconstrained space during sampling.
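
A minimal sketch of what that looks like (attribute names assume a recent PyMC version):

```python
# Sketch: a positive-only parameter is sampled on an unconstrained space
# via an automatic log transform.
import pymc as pm

with pm.Model() as m:
    sigma = pm.HalfNormal("sigma", 1.0)  # constrained to be positive

print(m.value_vars)          # [sigma_log__]  <- the sampler works on this unconstrained variable
print(m.rvs_to_transforms)   # maps the HalfNormal RV to its log transform (PyMC >= 5)
```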

Can there be a transform for the logp in addition to the variable transformations?