Greetings, community. I’m a new practitioner of PyMC, and of probabilistic programming and data science in general.
I once applied Bayesian linear regression to my data. After a long inference run (~40 minutes), my posterior plot appeared to be multimodal (two peaks, to be exact). I was able to work around it with feature engineering, but the problem kept bugging me.
So I came up with a toy dataset, hoping to re-create the two peaks in the posterior. Here’s what my dataset looks like:
I then fit a linear model to it. To my surprise, however, the posteriors of both the slope and the intercept converged to the mean of the two lines, instead of showing the two peaks I had hoped for.
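For concreteness, here is a minimal NumPy sketch of this kind of toy setup (the slopes, intercept, noise level, and seed are my own assumptions, not the original data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: two groups of points on two different lines
# (slopes +2 and -2, shared intercept 1), pooled into one dataset.
x = np.linspace(0, 1, 50)
y1 = 1 + 2 * x + rng.normal(0, 0.1, size=x.size)  # line 1
y2 = 1 - 2 * x + rng.normal(0, 0.1, size=x.size)  # line 2
x_all = np.concatenate([x, x])
y_all = np.concatenate([y1, y2])

# A single straight-line fit lands on the average of the two lines:
slope, intercept = np.polyfit(x_all, y_all, 1)
print(slope, intercept)  # slope near 0 (the mean of +2 and -2), intercept near 1
```

A Bayesian linear regression on the pooled data behaves the same way: the posterior concentrates around this averaged line rather than splitting into two modes.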
How should I interpret this result? And how should I view the two-peaked posterior I observed that one time?
The result you observe here is what I would expect. The mean of the two lines explains the data better than the mean of either line (assuming something like a Normal likelihood which “penalizes squared errors”). The multimodality you were seeing in your original model could have come from:
- Two chains that did not converge (each one found a separate mode)
- Two parameters that were redundant in that you could explain your data by increasing one and reducing the other or vice versa (although that usually leads to flat posteriors, not bimodal ones)
- Something else (really hard to say without knowing the data/model where you found that issue originally)
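The squared-error point above can be checked numerically; a quick sketch with made-up numbers of my own (two noise-free lines with slopes ±2):

```python
import numpy as np

# Toy data: half the points on y = 2x, half on y = -2x.
x = np.linspace(0, 1, 20)
y = np.concatenate([2 * x, -2 * x])
x_all = np.concatenate([x, x])

def sse(slope):
    """Sum of squared errors of the line y = slope * x against the pooled data."""
    return np.sum((y - slope * x_all) ** 2)

# The mean of the two slopes (0) beats either individual slope (+2 or -2),
# which is why a Normal likelihood pulls the posterior to the averaged line.
print(sse(0.0), sse(2.0), sse(-2.0))
```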
+1 to @ricardoV94, it is difficult to construct a simple example with a multimodal posterior. The prior is usually easily overwhelmed by the likelihood (given that it is a simple model), and on top of that, if your likelihood is not multimodal (which is the case when you are using a Normal), you won’t see multimodality in the posterior.
Not using a Gaussian, together with few data points, usually makes it a bit easier to construct a multimodal posterior:
```python
import pymc as pm

with pm.Model():
    b = pm.StudentT("b", nu=4.0, mu=-5.0, sigma=1.0)
    obs = pm.StudentT("observed", nu=4.0, mu=b, sigma=1.0, observed=5.0)
    idata = pm.sample()
```
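You can verify that this model’s posterior really is bimodal without sampling at all, by evaluating the unnormalized log posterior on a grid (a sketch using `scipy.stats.t`, which parameterizes the Student-t by df/loc/scale):

```python
import numpy as np
from scipy import stats

# Unnormalized log posterior of b for the model above:
# prior b ~ StudentT(nu=4, mu=-5, sigma=1), likelihood 5 ~ StudentT(nu=4, mu=b, sigma=1).
grid = np.linspace(-10.0, 10.0, 2001)
log_post = (stats.t.logpdf(grid, df=4, loc=-5, scale=1)
            + stats.t.logpdf(5.0, df=4, loc=grid, scale=1))

# Count interior local maxima of the log posterior on the grid.
interior = log_post[1:-1]
is_mode = (interior > log_post[:-2]) & (interior > log_post[2:])
modes = grid[1:-1][is_mode]
print(modes)  # two modes: one negative (near the prior), one positive (near the data)
```

The heavy tails are what make this work: each Student-t term can "give up" on the other's location cheaply, so the density dips between the prior mean and the observation instead of averaging them the way two Normals would.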
If I understand @junpenglao correctly, the way to produce the multimodal posterior was to provide a biased prior together with very different ‘observed’ data, in order to… confuse the inference?
With that in mind: I had used almost exclusively Gaussian priors in my multidimensional linear model. And as @ricardoV94 suspected, at the end of my (very long) sampling run there were messages warning me that sampling had not converged. On closer inspection, most of the posteriors were also flat across multiple orders of magnitude, which (correct me if I’m wrong) signals that the data do not provide enough evidence for the inference. Removing some of the redundant features resulted in 10x faster sampling as well as tighter posteriors.
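The redundant-feature situation can be illustrated with plain least squares: when two columns of the design matrix carry the same information, the individual coefficients are not identified, only their sum is (a hypothetical example, not the original model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix with a redundant feature: column 2 duplicates column 1.
x = rng.normal(size=100)
X = np.column_stack([np.ones(100), x, x])  # intercept, feature, duplicate feature
y = 1.0 + 3.0 * x + rng.normal(0, 0.1, size=100)

# The design matrix is rank-deficient: 3 columns but only rank 2, so only the
# *sum* of the two slope coefficients is identified, not each one separately.
rank = np.linalg.matrix_rank(X)
print(rank, X.shape[1])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef[1] + coef[2])  # the identified sum, close to the true slope 3
```

In a Bayesian model this ridge of equally good solutions shows up as the wide, flat marginal posteriors described above, and it forces the sampler to wander along the ridge, which is part of why dropping the redundant features sped things up so much.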
Many thanks for the replies!