Greetings, community. I’m a new practitioner of PyMC, and of probabilistic programming and data science in general.
I once applied Bayesian linear regression to my data. After a long inference run (~40 minutes), my posterior plot appeared to be multimodal (two peaks, to be exact). I was able to work around it with feature engineering, but the problem kept bugging me.
So I came up with a toy dataset, hoping to re-create the two peaks in the posterior. Here’s what my dataset looks like:
I then fit a linear model to it. To my surprise, however, the posteriors of both the slope and the intercept converged to the mean of the two lines, instead of showing the two peaks I had hoped for.
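For concreteness, here is a minimal NumPy sketch of this kind of toy setup (the slopes, intercept, noise level, and seed are my own assumptions, not the original data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: two groups of points on two different lines
# (slopes +2 and -2, shared intercept 1), pooled into one dataset.
x = np.linspace(0, 1, 50)
y1 = 1 + 2 * x + rng.normal(0, 0.1, size=x.size)  # line 1
y2 = 1 - 2 * x + rng.normal(0, 0.1, size=x.size)  # line 2
x_all = np.concatenate([x, x])
y_all = np.concatenate([y1, y2])

# A single straight-line fit lands on the average of the two lines:
slope, intercept = np.polyfit(x_all, y_all, 1)
print(slope, intercept)  # slope near 0 (the mean of +2 and -2), intercept near 1
```

A Bayesian linear regression on the pooled data behaves the same way: the posterior concentrates around this averaged line rather than splitting into two modes.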
How should I interpret this result? And how should I view the two-peaked posterior I observed that one time?
The result you observe here is what I would expect. The mean of the two lines explains the data better than the mean of either line (assuming something like a Normal likelihood which “penalizes squared errors”). The multimodality you were seeing in your original model could have come from:
- Two chains that did not converge (each one found a separate mode)
- Two parameters that were redundant in that you could explain your data by increasing one and reducing the other or vice versa (although that usually leads to flat posteriors, not bimodal ones)
- Something else (really hard to say without knowing the data/model where you found that issue originally)
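The squared-error point above can be checked numerically; a quick sketch with made-up numbers of my own (two noise-free lines with slopes ±2):

```python
import numpy as np

# Toy data: half the points on y = 2x, half on y = -2x.
x = np.linspace(0, 1, 20)
y = np.concatenate([2 * x, -2 * x])
x_all = np.concatenate([x, x])

def sse(slope):
    """Sum of squared errors of the line y = slope * x against the pooled data."""
    return np.sum((y - slope * x_all) ** 2)

# The mean of the two slopes (0) beats either individual slope (+2 or -2),
# which is why a Normal likelihood pulls the posterior to the averaged line.
print(sse(0.0), sse(2.0), sse(-2.0))
```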
+1 to @ricardoV94, it is difficult to construct a simple example with a multimodal posterior. The prior is usually easily overwhelmed by the likelihood (given that it is a simple model), and on top of that, if your likelihood is not multimodal (which is the case when you are using a Normal), you won’t see multimodality in the posterior.
Not using a Gaussian, together with few data points, usually makes it a bit easier to construct a multimodal posterior:
```python
import pymc as pm

with pm.Model():
    b = pm.StudentT("b", nu=4.0, mu=-5.0, sigma=1.0)
    obs = pm.StudentT("observed", nu=4.0, mu=b, sigma=1.0, observed=5.0)
    idata = pm.sample()
```
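You can verify that this model’s posterior really is bimodal without sampling at all, by evaluating the unnormalized log posterior on a grid (a sketch using `scipy.stats.t`, which parameterizes the Student-t by df/loc/scale):

```python
import numpy as np
from scipy import stats

# Unnormalized log posterior of b for the model above:
# prior b ~ StudentT(nu=4, mu=-5, sigma=1), likelihood 5 ~ StudentT(nu=4, mu=b, sigma=1).
grid = np.linspace(-10.0, 10.0, 2001)
log_post = (stats.t.logpdf(grid, df=4, loc=-5, scale=1)
            + stats.t.logpdf(5.0, df=4, loc=grid, scale=1))

# Count interior local maxima of the log posterior on the grid.
interior = log_post[1:-1]
is_mode = (interior > log_post[:-2]) & (interior > log_post[2:])
modes = grid[1:-1][is_mode]
print(modes)  # two modes: one negative (near the prior), one positive (near the data)
```

The heavy tails are what make this work: each Student-t term can "give up" on the other's location cheaply, so the density dips between the prior mean and the observation instead of averaging them the way two Normals would.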
If I understand @junpenglao correctly, the way to produce the multimodal posterior was to provide a biased prior together with very different ‘observed’ data, in order to… confuse the inference?
With that in mind: I had used almost exclusively Gaussian priors in my multidimensional linear model. And as @ricardoV94 suspected, at the end of my (very long) sampling run there were messages warning me that sampling had not converged. On closer inspection, most of the posteriors were also flat across multiple orders of magnitude, which (correct me if I’m wrong) signals that the data do not provide enough evidence for the inference. Removing some of the redundant features resulted in 10x faster sampling as well as tighter posteriors.
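The redundant-feature situation can be illustrated with plain least squares: when two columns of the design matrix carry the same information, the individual coefficients are not identified, only their sum is (a hypothetical example, not the original model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix with a redundant feature: column 2 duplicates column 1.
x = rng.normal(size=100)
X = np.column_stack([np.ones(100), x, x])  # intercept, feature, duplicate feature
y = 1.0 + 3.0 * x + rng.normal(0, 0.1, size=100)

# The design matrix is rank-deficient: 3 columns but only rank 2, so only the
# *sum* of the two slope coefficients is identified, not each one separately.
rank = np.linalg.matrix_rank(X)
print(rank, X.shape[1])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef[1] + coef[2])  # the identified sum, close to the true slope 3
```

In a Bayesian model this ridge of equally good solutions shows up as the wide, flat marginal posteriors described above, and it forces the sampler to wander along the ridge, which is part of why dropping the redundant features sped things up so much.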
Many thanks for the replies!