I am trying to fit a hidden Markov model with hierarchical emissions to 4 million data points. The emissions consist of categorical and Bernoulli distributions. Sub-sampling seems like a bad idea because mini-batching and stochastic gradients do not seem to work with NUTS (see Minibatch and NUTS).
Is it a good idea to run the NUTS implementation in pymc4 on a GPU? The HMC of pymc4 doesn’t have mass matrix adaptation. I worry that if I use HMC, the emissions would all get stuck near 0 and 1.
The NUTS in TFP (the backend of pymc4) is still very experimental and there are big changes coming. If you want to use it, I would suggest waiting a bit.
Meanwhile, you can try the HiddenMarkovModel in TFP https://www.tensorflow.org/probability/api_docs/python/tfp/distributions/HiddenMarkovModel
When I look at the hidden Markov model class in tfp, it calculates the logp of the emissions conditioned on the hidden state for all observations before feeding the big array of logp over all time steps to scan. That big array will take far too much memory if I run 100 chains. The observation is 2 GB if I store it as
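To make the memory concern concrete, here is a minimal NumPy sketch of a forward-algorithm log-likelihood that computes emission log-probabilities one chunk of time steps at a time instead of for the whole series up front. It is not how tfp's HiddenMarkovModel is implemented. The function name `hmm_loglik_chunked`, the `chunk` parameter, and the plain Bernoulli emission (one success probability per hidden state) are all assumptions made for illustration.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def hmm_loglik_chunked(obs, log_init, log_trans, p_emit, chunk=1024):
    """Log-likelihood of a Bernoulli-emission HMM via the forward algorithm.

    obs:       (T,) array of 0/1 observations
    log_init:  (K,) log initial state probabilities
    log_trans: (K, K) log transition matrix, rows = from-state
    p_emit:    (K,) success probability per hidden state (hypothetical setup)
    chunk:     number of time steps whose emission logp is held at once
    """
    obs = np.asarray(obs)
    log_alpha = None
    for start in range(0, len(obs), chunk):
        x = obs[start:start + chunk, None]  # (c, 1)
        # Emission logp for this chunk only: shape (c, K), not (T, K).
        log_emit = x * np.log(p_emit) + (1 - x) * np.log1p(-p_emit)
        for le in log_emit:
            if log_alpha is None:
                log_alpha = log_init + le
            else:
                # Sum over the previous state (axis 0), then add the emission term.
                log_alpha = logsumexp(log_alpha[:, None] + log_trans, axis=0) + le
    return logsumexp(log_alpha, axis=0)
```

With `chunk=1024` and K hidden states, the emission array holds 1024 × K floats at a time rather than T × K. This only bounds memory for evaluating the logp. The intermediates kept for a gradient are a separate matter and are not addressed here.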
In pymc3, I can index a random variable with a draw from a categorical distribution. I can’t do that with tfp’s categorical distribution and keep the overall logp differentiable.
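One common way around a non-differentiable discrete index is to sum the index out of the likelihood, so the logp depends only on continuous parameters. This is a different technique from indexing by a sampled categorical, not a tfp or pymc3 API. The sketch below uses NumPy, and the function name `marginal_bernoulli_logp` and the Bernoulli likelihood are assumptions for illustration.

```python
import numpy as np


def marginal_bernoulli_logp(x, log_w, p):
    """Log-likelihood of x with the categorical index summed out.

    Model (hypothetical): z ~ Categorical(exp(log_w)), x | z ~ Bernoulli(p[z]).
    x:     (N,) array of 0/1 observations
    log_w: (K,) log category weights
    p:     (K,) Bernoulli success probability per category
    """
    x = np.asarray(x)[:, None]                              # (N, 1)
    log_lik = x * np.log(p) + (1 - x) * np.log1p(-p)        # (N, K)
    a = log_w + log_lik                                     # log w_k + log p(x | k)
    m = a.max(axis=1, keepdims=True)
    per_obs = m[:, 0] + np.log(np.exp(a - m).sum(axis=1))   # logsumexp over k
    return per_obs.sum()
```

Because the result is a smooth function of `log_w` and `p`, the same expression written in an autodiff framework would have a gradient with respect to both.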
tfp’s mcmc has a form of mass matrix adaptation that only seems to work for a logp function with a single argument. But a hidden Markov model has multiple arrays of parameters.
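If the single-argument restriction does hold, one generic workaround is to pack all parameter arrays into one flat vector and unpack them inside the logp. The NumPy sketch below shows the idea only. The helper names `make_packer` and `single_arg` are hypothetical and not part of tfp.

```python
import numpy as np


def make_packer(shapes):
    """Build pack/unpack helpers for a fixed list of array shapes."""
    sizes = [int(np.prod(s)) for s in shapes]
    offsets = np.cumsum([0] + sizes)

    def pack(arrays):
        # Flatten each array and join them into one 1-D vector.
        return np.concatenate([np.ravel(a) for a in arrays])

    def unpack(flat):
        # Slice the flat vector back into arrays of the original shapes.
        return [
            flat[offsets[i]:offsets[i + 1]].reshape(shapes[i])
            for i in range(len(shapes))
        ]

    return pack, unpack


def single_arg(logp, unpack):
    """Wrap a multi-argument logp so it takes one flat parameter vector."""
    return lambda flat: logp(*unpack(flat))
```

For an HMM, the shapes might be those of the initial distribution, the transition matrix, and the emission parameters, for example `[(K,), (K, K), (K,)]`. Any constraints on those parameters would still have to be handled separately.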