I am trying to put together a hidden markov model Custom Distribution in PYMC. I am heavily referencing this guide: https://www.pymc.io/projects/docs/en/stable/contributing/implementing_distribution.html
I have had a good amount of success and a lot of my tests seem to work as I expect. However, I am running into an issue where a distribution I put together works when I condition it on some observed data but will fail if I treat it as a prior. I am getting an error message that I can’t seem to decipher. You can see my tests here.
I am looking for some advice on how to decipher the error messages returned by PYMC so that I can figure out where the issue is arising.
Have you seen the markov chain RV implemented in pymc-experimental? It seems quite close to what you’re shooting for. Example with an HMM here.
We’re actively working on automatically marginalizing the hidden states of HMM, so if you’re keen to work on this class of models we have some open issues related to it on the pymc-experimental github page (this one for example)
Thank you @jessegrabowski for the quick response. Yes, I have been wanting to contribute to PYMC, however, I still feel like I don’t really understand what is going on behind the scenes. The current problem I am facing of not being able to decipher the error message is evidence of this. Thank you for the linked examples I will go through and see if I can figure out why my implementation is failing. I would like to implement this myself so that I can learn how things work at a deeper level.
Okay, I was able to figure out what was causing the issue. The default sampler that PYMC was using for the Discrete Markov distribution was the metropolis sampler, but what I needed was the BinaryGibbsMetropolis sampler. After telling it sample the hidden Markov states prior with the BinaryGibbsMetropolis sampler it now works as expected. I am wondering if it is possible to define within either the RandomVariable class or the Distribution class that the default sample should be the BinaryGibbsMetropolis sampler instead of the Metropolis sampler?
EDIT:
It is the CategoricalGibbsMetropolis not BinaryGibbs that I needed. After switching to it everything is working correctly. There is a weird check in PYMC that does not let you use the CategoricalGibbsMetropolis unless the distribution is specifically an instance of a CategoricalRV. I am not sure why this is in place since the distribution itself is inheriting from the Discrete Class?
Thank you, @ricardoV94. I wasn’t sure if I was doing something wrong with how I set up the Custom Distribution.
Thank you, @ricardoV94! I will try it once it is merged.
This is really necessary in order to sample well—discrete sampling of Markov states in HMMs is very inefficient for all the usual reasons that sampling discrete parameters is inefficient. I provide an example in the Stan User’s Guide for change-point models, but the arguments are similar.
I wrote a blog post on Gelman’s blog about a nice reference for the algorithms for HMMs which also contain the matrix derivatives you need: https://statmodeling.stat.columbia.edu/2019/12/16/beautiful-paper-on-hmms-and-derivatives/
There’s also the Wikipedia on marginalization, you can implement directly with autodiff: Forward algorithm - Wikipedia
Or you can go one step further and reduce to sufficient statistics: Forward–backward algorithm - Wikipedia
If you’re OK with a conjugate Dirichlet prior on the rows of the stochastic matrix, the forward-backward algorithm also lets you do an analytic update for the posterior of the transition matrix by exploiting the conjugacy.
3 Likes
By the way, we have already implemented automatic marginalization of both HMM (for now restricted to simple cases of univariate emissions - except Categorical), as well as finite discrete variables, such as in the change-point model: Automatic marginalization of discrete variables — PyMC example gallery
We also automate the plumbing to recover the marginalized variables, but as @jessegrabowski mentioned we haven’t done that yet for HMMs.
Anyway that PR still makes sense, because we don’t forbid sampling of discrete variables, and there’s no reason to have a dumb Metropolis instead of the finitely less dumb Categorical one
I do suggest trying to marginalize it as well