I want to show that a model with multiple observations and a Bernoulli distribution is equivalent to one using the Binomial distribution. The real models are a bit more complex and involve mixtures, but I think it boils down to the example below. The logp is probably different because of the constant in the binomial PMF, so the two numbers shouldn't be the same, but is there a way to use “something” that would show that they are equivalent?
import pymc as pm

with pm.Model() as m1:
    x = pm.Normal("x", mu=0, sigma=1)
    pm.Bernoulli("y", logit_p=x, observed=[0, 1, 0, 1, 1])

m1.logp().eval({'x': 0})  # returns -4.38467444
with pm.Model() as m2:
    x = pm.Normal("x", mu=0, sigma=1)
    pm.Binomial("y", logit_p=x, n=5, observed=3)

m2.logp().eval({'x': 0})  # returns -2.08208934
I don’t think there is a straightforward way to show that two models are equivalent.
I think you are right about the constant in the binomial PMF. I am inclined to think that, when optimizing the logp computation, PyTensor decides to ignore any constant factors.
In your example, the constant in the binomial PMF is the binomial coefficient (n choose k) = (5 choose 3) = 5! / (3! * 2!) = 10, and log(10) ≈ 2.3026, which is exactly the difference between the logp values you get from the two models.
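You can verify that arithmetic quickly in plain Python (math.comb has been in the standard library since Python 3.8):

import math

# log of the binomial coefficient C(5, 3)
print(math.log(math.comb(5, 3)))    # ≈ 2.302585
# difference between the two logp values reported above
print(-2.08208934 - (-4.38467444))  # ≈ 2.302585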
Maybe you can try to force PyTensor not to apply any optimizations to the computational graph and compare the logp values again:
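Something along these lines might work; this is a sketch assuming that .eval forwards extra keyword arguments to pytensor.function, and that the built-in "FAST_COMPILE" mode applies only a minimal set of graph rewrites:

# Compile both logp graphs with minimal rewrites and compare.
# mode="FAST_COMPILE" keeps the graphs close to what PyMC built.
print(m1.logp().eval({'x': 0}, mode="FAST_COMPILE"))
print(m2.logp().eval({'x': 0}, mode="FAST_COMPILE"))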
So they differ only by a constant that does not depend on $\theta$.
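Spelling out the algebra, with $k = \sum_i y_i$ successes in $n$ trials (a standard identity, not specific to PyMC):

$$\log \text{Binomial}(k \mid n, \theta) = \log \binom{n}{k} + k \log \theta + (n - k) \log(1 - \theta)$$

$$\sum_{i=1}^{n} \log \text{Bernoulli}(y_i \mid \theta) = k \log \theta + (n - k) \log(1 - \theta)$$

so the difference is exactly $\log \binom{n}{k} = \log \binom{5}{3} = \log 10 \approx 2.3026$.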
I'm not sure that showing the derivatives are the same is enough; for HMC to be guaranteed to produce the same results, we need the likelihoods themselves to be the same.
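For what it's worth, the derivative part can be checked numerically; if I remember correctly, Model.compile_dlogp compiles the gradient of the model logp with respect to the free variables, and the additive constant should vanish under differentiation:

# The constant log C(5, 3) does not depend on x, so the gradients
# of the two logps with respect to x should agree exactly.
dlogp1 = m1.compile_dlogp()
dlogp2 = m2.compile_dlogp()
print(dlogp1({'x': 0.5}), dlogp2({'x': 0.5}))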