Hi!

I’m trying to do a model comparison in a hierarchical meta model using a dummy variable that encodes model choice. Sampling from the full model doesn’t work well because the chains get me stuck in local optima: every chain “picks” one model and sticks to it, updating only the corresponding parameters correctly.

To work around this, I have tried to sample the posteriors for all parameters first (training each submodel independently), and to then infer the the posterior of the model variable from based on these posteriors. However, I’m not really sure how to do that correctly:

- When using
`sample_posterior_predictive`

, the model variable is sampled from its prior, not its posterior. (I was able to get this to work in numpyro using its`Predictive`

class). - I’ve instead tried to use an alternative version of the meta model, in which the parameter priors are the previously estimated posteriors. That works but I don’t think it’s correct since the observations are the same as before, so I essentially observe the same data twice. I can see that the sampled parameter values are more narrowly distributed than before.
- Using pyro, I was able to get Variational Inference to work (numpyro gives me weird NaNs with the same model). This gives me the same results as the predictive approach in numpyro.

Any ideas, how to do this? (By the way, I’m aware of other forms of model comparison, but this is intended as a demonstration of the principles of Bayesian modeling, not an actual application, so I think showing the meta-model / Bayes-factor approach would be cool.)

Here is the code that I’m using. I’m trying to model the sizes of intervals of consecutive notes in polyphonic music. The interval size is always estimated as a geometric distribution (non-negative integers), but the three competing models use different predictors:

- model 1 assumes a globally constant parameter
- model 2 assumes different parameters for each voice (4 voices, the pieces are string quartets)
- model 3 assumes the parameter to depend on the pitch of the preceding note, or the “register” in musical terms (logistic)

You can see this reflected in the meta model.

(The non-flat prior on the model is just there to show that `sample_posterior_predictive`

indeed samples from the prior.)

```
# given data
observations = [...] # observed step sizes
staff = [...] # the staff/voice of each datapoint
p0 = [...] # the pitch of the first note corresponding to each datapoint
with pm.Model() as model_meta:
# model choice
model_choice = pm.Categorical("model_choice", [0.5, 0.3, 0.2])
# global model
theta_global = pm.Beta("theta_global", 0.5, 0.5)
# voice model
theta_voice = pm.Beta("theta_voice", 0.5, 0.5, shape=4)
# register model
a = pm.Normal("a_register", 0, 10)
b = pm.Normal("b_register", 0, 10)
theta_register = pm.math.sigmoid(p0*a + b)
# observation
theta = ptn.tensor.stack((
ptn.tensor.fill(p0, theta_global),
theta_voice[staff],
theta_register,
))
pm.Geometric("obs", p=theta[model_choice], observed=observations+1)
```

I use the following auxiliary model to obtain posterior samples for the parameters:

```
with pm.Model() as model_joint:
# global model
theta_global = pm.Beta("theta_global", 0.5, 0.5)
pm.Geometric("obs_global", p=theta_global, observed=observations+1)
# voice model
theta_voice = pm.Beta("theta_voice", 0.5, 0.5, shape=4)
pm.Geometric("obs_voice", p=theta_voice[staff], observed=observations+1)
# register model
a = pm.Normal("a_register", 0, 10)
b = pm.Normal("b_register", 0, 10)
theta_register = pm.math.sigmoid(p0*a + b)
pm.Geometric("obs_register", p=theta_register, observed=observations+1)
# draw samples
idata_joint = pm.sample(5_000, chains=4)
```

I then try to infer `model_choice`

like this:

```
with model_meta:
idata_model_choice_meta = pm.sample_posterior_predictive(idata_joint, var_names=["model_choice"])
az.plot_posterior(idata_model_choice_meta.posterior_predictive);
```

Which gives me samples from the prior (0.5, 0.3, 0.2)

Any advice is welcome.