Hi everyone,
I’m working on comparing models in PyMC, especially ones where I’ve marginalized out latent discrete variables. Is it straightforward to use the log-likelihood output from PyMC for this comparison, or are there additional steps needed?
Does anyone have experience or insights on this? Also, are there any references you could recommend on this topic?
Thanks in advance for your help!
Depending on what types of latent variables you’re marginalizing, you could try the new automatic marginalization features in pymc-experimental. They now support automatic un-marginalization as well, so you can fit the model and then automatically recover the posterior distributions over the discrete variables. The .unmarginalize() method is currently undocumented, but there’s a PR for an example notebook here.
I’m a big dummy; the PR was already merged. The example notebook to check out is here.
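For intuition about what the automatic machinery is doing (this is a hand-rolled NumPy/SciPy sketch of the underlying math, not the pymc-experimental API, and the weights, means, and data are invented for illustration): marginalizing a discrete latent z just means replacing p(y | z) in the likelihood with the mixture sum over its states, so the sampler never sees a discrete variable.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

# Hypothetical two-component mixture: z ~ Categorical(w), y | z=k ~ Normal(mu[k], 1)
w = np.array([0.3, 0.7])        # mixing weights p(z = k)
mu = np.array([-2.0, 2.0])      # component means
y = np.array([-1.8, 2.1, 1.9])  # observed data

# Marginal log-likelihood: log p(y) = log sum_k p(z=k) p(y | z=k),
# computed stably with logsumexp over the component axis. This is the
# quantity a marginalized model hands to the sampler.
log_components = np.log(w) + norm.logpdf(y[:, None], loc=mu, scale=1.0)
marginal_logp = logsumexp(log_components, axis=1)
print(marginal_logp.sum())
```

These per-observation marginal log-likelihood values are also what model-comparison tools (e.g. LOO/WAIC) operate on.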
You can of course recover things by hand by using the logp values directly. There’s an example of doing this at the end of this blog post (scroll down to “recovering mixture indexes”, though the whole post is worth reading if you’re going to go this route).
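A minimal sketch of the by-hand recovery, under assumed toy numbers (the weights, posterior draws, and data below are invented, not from the blog post): for each posterior draw, the posterior over the index is p(z=k | y, theta) proportional to w_k p(y | z=k, theta), i.e. a softmax of the per-component log-probabilities, which you then average over draws.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

# Toy setup: 500 posterior draws of the two component means for a
# two-component Gaussian mixture with fixed weights (all numbers invented).
rng = np.random.default_rng(0)
w = np.array([0.3, 0.7])
mu_draws = rng.normal(loc=[-2.0, 2.0], scale=0.1, size=(500, 2))  # (draw, component)
y = np.array([-1.8, 2.1])                                          # observed points

# Per draw and data point: log w_k + log p(y | mu_k), normalized over k.
# Broadcast shape is (draw, observation, component).
logp_k = np.log(w) + norm.logpdf(y[None, :, None], loc=mu_draws[:, None, :], scale=1.0)
post_z = np.exp(logp_k - logsumexp(logp_k, axis=-1, keepdims=True))

# Average over posterior draws to get each observation's index posterior.
p_z = post_z.mean(axis=0)  # shape (n_obs, n_components)
print(p_z.round(3))
```

The same pattern works with draws pulled from a fitted trace instead of the synthetic `mu_draws` here.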
Big thanks for the quick and insightful reply! That is very useful for my work. Can’t wait to try the automatic marginalization features.