Can you marginalize a mixture model where the draws from the different components are not independent?

The transition kernel would have to depend on the fraction of previous observations that are in the 1 state, which introduces a long-range dependence.
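To make that concrete, here is a rough sketch of what such a kernel might look like (the logistic form, the target, and the names here are my own invention, not the actual model): the probability that the next trial is a 1 gets pulled down whenever the running fraction of 1s drifts above the target.

import numpy as np

rng = np.random.default_rng(0)

def sample_constrained_states(n_trials=100, target=0.5, strength=5.0):
    # Hypothetical kernel: the probability of the next state being 1 depends on
    # the running fraction of previous 1s, i.e. on the entire history so far
    states = []
    for t in range(n_trials):
        frac_ones = np.mean(states) if states else target
        p_one = 1.0 / (1.0 + np.exp(strength * (frac_ones - target)))
        states.append(rng.random() < p_one)
    return np.array(states, dtype=int)

print(sample_constrained_states().mean())  # stays close to the 0.5 target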

I'm presenting some work based on this model to an audience with lots of Stan users next week, so it would be nice to know whether this model could be run in Stan, where (IIUC) it's impossible to sample from discrete variables. It seems like it would be very difficult in that case.

Yes, for the constraint to hold exactly, there must necessarily be global correlations of some sort. If the first 50% of trials are all 1, the last 50% must be 0 (and of course it shouldn't matter which subset you choose). My idea was just to provide a hacky approximation to these global dependencies by introducing a soft local kernel, which, if implemented over enough n_lags, may not be too terrible.

Here is the idea with a single lag, which MarginalModel could marginalize automatically for you (but unfortunately not with more lags):

import pymc as pm
import pymc_experimental as pmx

with pm.Model() as m:
    # Two-state chain that tends to switch state on every step, which softly
    # keeps the fraction of 1s near 0.5 over the sequence
    idx = pmx.distributions.DiscreteMarkovChain("idx", P=[[0.1, 0.9], [0.9, 0.1]], shape=(10,))
    emission = pm.Normal("emission", pm.math.where(idx, 1, -1), sigma=0.1, shape=(10,))
    # iid mixture with the same marginal emission distribution, for comparison
    iid_emission = pm.NormalMixture("iid_emission", w=[0.5, 0.5], mu=[1, -1], sigma=0.1, shape=(10,))

# Compare how much the fraction of positive emissions varies across draws:
# the Markov-chain version keeps it much closer to 0.5 than the iid mixture
print((pm.draw(emission, draws=100) > 0).mean(-1).std(), (pm.draw(iid_emission, draws=100) > 0).mean(-1).std())
# 0.063 0.153
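For completeness, here is a rough sketch of how the single-lag version might be wrapped in a MarginalModel so the discrete chain gets summed out before sampling; the observed data y_obs is just a placeholder and the exact API details may differ:

import numpy as np
import pymc as pm
import pymc_experimental as pmx

y_obs = np.concatenate([np.ones(5), -np.ones(5)])  # placeholder data

with pmx.MarginalModel() as marginal_m:
    idx = pmx.distributions.DiscreteMarkovChain("idx", P=[[0.1, 0.9], [0.9, 0.1]], shape=(10,))
    emission = pm.Normal("emission", pm.math.where(idx, 1, -1), sigma=0.1, observed=y_obs)

marginal_m.marginalize(["idx"])  # sums the discrete chain out of the logp
with marginal_m:
    idata = pm.sample()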

Yes, without sampling discrete variables it seems computationally very challenging, although I'm sure someone will find a math trick of some sort. It's a very interesting problem btw!


🏅 to PyMC for allowing me to sample from the model without math tricks 😄

I had one final question. One downside of the Potential trick is that I can't sample new draws from this distribution directly, as sample_posterior_predictive(trace, var_names=["group_assignment", "y"]) ignores the Potential. I've solved this in a hacky way by setting p to 0 and 1 to sample from the pure distribution, and then combining such samples manually. But it would of course be nice to be able to sample from the model directly. Is it possible to rewrite it in some way to enable this? My understanding is that it's impossible, since the Potential is by necessity downstream of the group assignment. But am I missing something clever?

That warning about the Potential is just a blanket warning. In your case you are sampling conditioned on the posterior indicator variables, so the Potential doesn't matter anymore, right? It would only matter if you were interested in resampling the indicator variables (or in prior_predictive).

Right, but I do want to resample the group_assignment variable.

Right, in that case you can't, because you haven't defined a pure generative model. You can do some rejection-based sampling of the indicator variables; you just have to be careful to match the Potential penalty term if you want to be consistent.
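For instance, something along these lines could work (a sketch only; the Bernoulli prior and the quadratic penalty are stand-ins for whatever the actual model and Potential are). Because the penalty is non-positive, exp(penalty) is a valid acceptance probability, so the accepted draws target prior(z) * exp(penalty(z)):

import numpy as np

rng = np.random.default_rng(0)

def potential_logp(z, lam=10.0, target=0.5):
    # Stand-in for the model's Potential: a non-positive soft penalty on imbalance
    return -lam * (z.mean() - target) ** 2

def rejection_sample_assignments(n=100, p=0.5, max_tries=10_000):
    for _ in range(max_tries):
        z = (rng.random(n) < p).astype(float)  # propose from the "pure" prior
        if np.log(rng.random()) < potential_logp(z):  # accept with prob exp(penalty)
            return z.astype(int)
    raise RuntimeError("no draw accepted; the penalty may be too strong")

print(rejection_sample_assignments().mean())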
