You could try timing the logp and dlogp functions (use model.compile_logp and model.compile_dlogp).
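Something along these lines should work (a minimal sketch; the Gamma variable is just a placeholder for your model, and the initial point is only used as a convenient test input):

```python
import timeit

import pymc as pm

with pm.Model() as model:
    x = pm.Gamma("x", alpha=2.0, beta=1.0)  # placeholder, swap in your model

# Compiled point functions for the log-density and its gradient
logp_fn = model.compile_logp()
dlogp_fn = model.compile_dlogp()

# Evaluate both at the model's initial point and time them
point = model.initial_point()
print("logp: ", timeit.timeit(lambda: logp_fn(point), number=100))
print("dlogp:", timeit.timeit(lambda: dlogp_fn(point), number=100))
```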
Yeah, it looks like dlogp calls are ~3500 times slower with my custom distribution compared to just using pm.Gamma. logp calls are “only” 60 times slower. Is there a way to inspect the dlogp graph or manually define the dlogp function? I don’t have high hopes that I could remedy the situation, but it would be interesting to at least check what the graph looks like. I found pytensor.dprint(pymc.logp()) but couldn’t find an equivalent function for gradients.
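For reference, this is roughly how I print the logp graph; presumably one could also take the gradient manually with pytensor.grad and dprint that, or dprint the whole model’s gradient (an untested sketch, the Gamma variable is a stand-in for my custom distribution):

```python
import pymc as pm
import pytensor
import pytensor.tensor as pt

with pm.Model() as model:
    x = pm.Gamma("x", alpha=2.0, beta=1.0)  # stand-in for the custom distribution

# Elementwise logp graph for a single value
value = pt.scalar("value")
logp_expr = pm.logp(x, value)
pytensor.dprint(logp_expr)

# Gradient of the logp with respect to the value, printed the same way
dlogp_expr = pytensor.grad(logp_expr, wrt=value)
pytensor.dprint(dlogp_expr)

# Alternatively, the gradient graph of the full model logp
pytensor.dprint(model.dlogp())
```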
Since Kinetic is a normal, the truncation can only result from the logp term. In general, these should be transformed to an unconstrained space during sampling.
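For example, a constrained variable normally gets a transform attached automatically so the sampler works on an unconstrained space; you can inspect which transform each free variable ended up with (a minimal sketch, the variables here are placeholders):

```python
import pymc as pm

with pm.Model() as model:
    # Positive-support variable: gets an automatic log transform
    tau = pm.Gamma("tau", alpha=2.0, beta=1.0)
    # Unbounded normal: no transform needed by default
    kinetic = pm.Normal("Kinetic", mu=0.0, sigma=tau)

# Transformed value variables show up with suffixed names, e.g. tau_log__
print([v.name for v in model.value_vars])
```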
Can there be a transform for the logp in addition to the variable transformations?