I want to try SMC on my model, which is known to have a bimodal posterior for the given dataset. However, I'm having trouble adapting the example here to use an already-written generative model. Is that even possible, or do I have to write the likelihood function from scratch?
Furthermore, I don't quite understand the role of X, the Uniform in the example. Is this the prior?
Maybe there is another example/tutorial to look at?
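Here is a minimal NumPy sketch of what SMC does with a model. It assumes X plays the role of the prior, as the question suggests. Particles are first drawn from the prior. The likelihood is then raised to a temperature beta, which rises from 0 to 1 through reweighting, resampling and Metropolis moves. This toy is written for illustration and is not PyMC's `sample_smc` implementation. The Uniform(-10, 10) prior, the bimodal likelihood and the rule for picking the next beta (keep the effective sample size at half the particles) are all assumptions. The prior and the likelihood enter as two separate pieces, so in principle an already-written model that defines both carries everything the algorithm needs.

```python
import numpy as np

LOW, HIGH = -10.0, 10.0  # hypothetical Uniform prior bounds (the role X plays in the example)


def log_prior(x):
    # log density of Uniform(LOW, HIGH); -inf outside the support
    inside = (x > LOW) & (x < HIGH)
    return np.where(inside, -np.log(HIGH - LOW), -np.inf)


def log_likelihood(x, mu=5.0, sigma=0.5):
    # hypothetical bimodal likelihood with equal-mass modes at -mu and +mu
    return np.logaddexp(-0.5 * ((x - mu) / sigma) ** 2, -0.5 * ((x + mu) / sigma) ** 2)


def next_beta(loglik, beta, target_ess):
    # bisection for the largest new temperature whose incremental weights keep ESS >= target_ess
    def ess(b):
        lw = (b - beta) * loglik
        w = np.exp(lw - lw.max())
        return w.sum() ** 2 / (w ** 2).sum()

    if ess(1.0) >= target_ess:
        return 1.0
    lo, hi = beta, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if ess(mid) >= target_ess:
            lo = mid
        else:
            hi = mid
    return lo


def smc(n=2000, ess_frac=0.5, n_mh=10, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(LOW, HIGH, size=n)  # stage 0: particles drawn from the prior
    beta = 0.0
    while beta < 1.0:
        ll = log_likelihood(x)
        new_beta = next_beta(ll, beta, ess_frac * n)
        # reweight by the likelihood increment, then resample (multinomial)
        lw = (new_beta - beta) * ll
        w = np.exp(lw - lw.max())
        x = x[rng.choice(n, size=n, p=w / w.sum())]
        beta = new_beta
        # Metropolis moves targeting prior * likelihood**beta, proposal scaled by the population spread
        scale = 2.38 * x.std()
        for _ in range(n_mh):
            prop = x + scale * rng.standard_normal(n)
            log_ratio = (log_prior(prop) + beta * log_likelihood(prop)) - (
                log_prior(x) + beta * log_likelihood(x)
            )
            accept = np.log(rng.uniform(size=n)) < log_ratio
            x = np.where(accept, prop, x)
    return x


x = smc()
print(f"fraction in right mode: {np.mean(x > 0):.2f}")
```

On this easy 1-D toy the final particles are split roughly evenly between the two modes. Real models with more dimensions and narrower modes are much harder.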
Bad mixing. It uses Metropolis under the hood. As far as I understand from the code, the covariance matrix is estimated from the whole population at every temperature, and a global Gaussian proposal is made with that matrix. Given this, and the fact that I have two well-separated modes, bad mixing is not surprising. On a different note, I've managed to get DEMetropolis going, and it just stands still, with nothing being accepted. Oh well.
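Here is a small NumPy illustration of the reasoning above. It uses a hypothetical 1-D target with two equal-mass modes at -5 and +5 (sd 0.5) and is not PyMC's SMC kernel. It estimates the proposal variance from a population split across both modes, as described, and then makes one Metropolis step with that variance. For an equal-weight mixture the variance is sigma² + mu². The proposal width is therefore set by the distance between the modes rather than by their width. Most proposals land in the gap or far in the tails and are rejected.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 5.0, 0.5, 4000  # hypothetical: equal-mass modes at -mu and +mu with width sigma

# a population of particles already split between the two modes
pop = np.concatenate([-mu + sigma * rng.standard_normal(n // 2),
                      mu + sigma * rng.standard_normal(n // 2)])

# the population variance is dominated by the mode separation:
# sigma**2 + mu**2 = 0.25 + 25 = 25.25 for this equal-weight mixture
pop_var = pop.var()


def log_target(x):
    # unnormalized log density of the bimodal target
    return np.logaddexp(-0.5 * ((x - mu) / sigma) ** 2, -0.5 * ((x + mu) / sigma) ** 2)


# one Metropolis step for every particle with the global proposal N(x, pop_var)
prop = pop + np.sqrt(pop_var) * rng.standard_normal(pop.size)
accept = np.log(rng.uniform(size=pop.size)) < log_target(prop) - log_target(pop)
print(f"population var {pop_var:.2f} vs within-mode var {sigma ** 2:.2f}")
print(f"acceptance rate {accept.mean():.2f}")
```

With these settings the acceptance rate comes out well below 0.3, so most particles stay put at each step. Real kernels also rescale the proposal, so the exact numbers will differ.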
For this model and dataset, the best results I've had are from emcee (with its parallel tempering implementation), but it takes a week of sampling to get there, and it also mixes poorly.
It is sort of obvious that with multimodal posteriors, anything that uses a global proposal is doomed to fail, whatever tempering regime is chosen. I wish there were some sort of ensemble NUTS implementation that would occasionally consider moving a particle to the exact location of another particle based on their odds ratio.
Apparently there’s still a long way to the inference button.
And it is not possible to use NUTS? What kind of model are you trying to sample?
If you run NUTS with lots of chains and different starting points, it can act as a kind of ensemble method. Technically this is not completely correct: if the modes are well separated, the weighting will not be right, because each chain just gets stuck in one mode.
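Here is a toy NumPy illustration of the weighting caveat. It assumes two well-separated modes of equal mass at -5 and +5, and four chains, three of which happen to start near +5. The chains are simulated as draws around their starting mode. This stands in for a sampler that never crosses between modes and is not actual NUTS output. Pooled together, the chains estimate the mass of the right-hand mode as the share of chains that started there (0.75) instead of the true 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.5                     # hypothetical within-mode sd
starts = [5.0, 5.0, 5.0, -5.0]  # hypothetical: 3 of 4 chains start in the right-hand mode
n_draws = 1000

# Stand-in for chains that never leave their starting mode: each chain draws only
# around its own mode, as a well-tuned sampler trapped there would.
chains = [s + sigma * rng.standard_normal(n_draws) for s in starts]
pooled = np.concatenate(chains)

# the true mass of the right-hand mode is 0.5; pooling reports the share of chains that started there
est_mass_right = np.mean(pooled > 0)
print(f"estimated mass of right mode: {est_mass_right:.2f}")
```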
It is possible. NUTS works perfectly when the data yield a unimodal posterior, but I have one particularly important dataset (real clinical data) that happens to give two modes with approximately the same mass. What happens is that each chain gets poorly initialized, and sampling is inefficient even after relatively long tuning. I suppose this is because, in this multimodal situation, ADVI estimates something that is neither the first nor the second mode.
Just to clarify: this is for an academic article. It is not a major issue, since I already have the answer from emcee. But in the article I'm writing it would be sort of silly to say that I used one sampler for this and another for that, so I want to find something robust and fast that can be cited. This is the proverbial 90% of the work that is going to bring about 10% of the results.
I can show you my model, and you will see why NUTS is struggling (I'm really abusing it). But you have to confirm that you have the spare time and interest, because explaining what is happening in the model will take quite a bit of my time, which is scarce at the moment.
As for the question in the topic, I'm glad that I tried SMC and DEMetropolis, thanks to you (!), and (sort of) understood that they are a dead end.
Sounds like you already have answer, so I will just comment on the following point:
In this case, try the new initialization jitter+adapt_diag. If you know where the modes are, you can supply good starting values (i.e., one mode per chain), and NUTS should explore each mode in a separate chain quite well.
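Here is a sketch of the "one mode per chain" idea. It assumes the mode locations are already known (e.g. from the emcee run mentioned earlier) and uses a hypothetical 1-D target with modes at -5 and +5. A plain random-walk Metropolis chain stands in for NUTS. It only illustrates that a local sampler started in a mode explores that mode well. In PyMC3 the per-chain starting points would be passed to `pm.sample` through `start` as a list of dicts keyed by variable name, one per chain. Newer PyMC versions call this argument `initvals`, so check the docs for your version.

```python
import numpy as np


def log_target(x, mu=5.0, sigma=0.5):
    # hypothetical bimodal posterior: equal-mass modes at -mu and +mu
    return np.logaddexp(-0.5 * ((x - mu) / sigma) ** 2, -0.5 * ((x + mu) / sigma) ** 2)


def local_chain(x0, n_draws, step, rng):
    # random-walk Metropolis with a *local* step size: a stand-in for NUTS, not NUTS itself
    x, lp = x0, log_target(x0)
    out = np.empty(n_draws)
    for i in range(n_draws):
        prop = x + step * rng.standard_normal()
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        out[i] = x
    return out


rng = np.random.default_rng(2)
known_modes = [-5.0, 5.0]  # hypothetical: mode locations found beforehand
n_chains = 4
# one start per chain, cycling through the modes, with a small jitter around each
starts = [known_modes[c % len(known_modes)] + rng.uniform(-0.1, 0.1) for c in range(n_chains)]
chains = [local_chain(s, 2000, step=1.0, rng=rng) for s in starts]

for s, ch in zip(starts, chains):
    print(f"start {s:+.2f}: mean {ch.mean():+.2f}, sd {ch.std():.2f}")
```

Each chain stays in the mode it started in. Pooling them gives both modes equal weight only because the starts were split evenly. That matches the roughly equal masses reported above, but it is an assumption the chains themselves do not check.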