I’m probably missing something, but I would think that if you can derive p at each time step, you can also marginalize it. Not with pm.marginalize, because that assumes a constant transition probability, but the algorithm itself doesn’t assume that?
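To make that concrete, here is a minimal NumPy sketch of the forward algorithm where each time step gets its own transition matrix. The function name `forward_loglik` and the array layout are my own assumptions for illustration, not anything from PyMC. It only shows that the recursion itself doesn't care whether the matrix changes over time.

```python
import numpy as np


def forward_loglik(log_init, log_trans, log_emit):
    """Log marginal likelihood of an HMM with time-varying transitions.

    Hypothetical helper, shapes are assumptions:
      log_init:  (K,)         log P(z_0 = k)
      log_trans: (T-1, K, K)  log P(z_t = j | z_{t-1} = i), one matrix per step
      log_emit:  (T, K)       log P(y_t | z_t = k)
    """
    # Joint log-probability of the first observation and each initial state.
    log_alpha = log_init + log_emit[0]
    for t in range(1, log_emit.shape[0]):
        # Sum out the previous state i using the matrix for this step only.
        log_pred = np.logaddexp.reduce(log_alpha[:, None] + log_trans[t - 1], axis=0)
        log_alpha = log_pred + log_emit[t]
    # Sum out the final state to get log P(y_0, ..., y_{T-1}).
    return np.logaddexp.reduce(log_alpha)
```

In a PyMC model I'd guess the same recursion would have to be written with `pytensor.scan` and added to the model through `pm.Potential`, but I haven't tested that.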