Trouble Using SMC sampler for Marginal Likelihood/Bayes Factor Computation

Hi,

I’m testing the Sequential Monte Carlo sampler given the example in the docs here

I tried doing this with a simple linear regression model like this:

#generate data
np.random.seed(99)
x=np.random.choice(np.arange(1, 15, 1),20 , replace=True)
x=np.array(sorted(x))
y=(3*x)+np.random.normal(loc=0.0, scale=2.0, size=len(x))

test_folder=mkdtemp(‘Test’)
nchains=1000

true model for the data

true_model=pm.Model()
with true_model:

true_int=pm.Normal('true_int', mu=0, sd=10)
sig_true_int=pm.HalfNormal('sig_true_int', sd=2)

true_b1=pm.Normal('true_b1', mu=0, sd=10)
sig_b_grp=pm.HalfNormal('sig_true_b1', sd=2)

#specify the simple linear model
true_mu=true_int+(true_b1*x)

true_eps = pm.HalfCauchy('eps', 5)

Y_obs = pm.Normal('Y_obs', mu=true_mu, sd=true_eps, observed=y)
true_trace = smc.sample_smc(samples=2000,
                           n_chains=nchains,
                           progressbar=True,
                           homepath=test_folder,
                           stage=0,
                           random_seed=42)

#null model
null_folder=mkdtemp(‘Test2’)
#null model for contrast
null_model=pm.Model()

#add in conrast factors for each level of the factors
with null_model:

null_int=pm.Normal('null_int', mu=np.mean(y), sd=10)
sig_null_int=pm.HalfNormal('sig_null_int', sd=5)

#specify the simple linear model
null_mu=null_int

null_eps=pm.HalfCauchy('eps', 5)

Y_obs = pm.Normal('Y_obs', mu=null_mu, sd=null_eps, observed=y)

null_trace = smc.sample_smc(samples=2000,
                           n_chains=nchains,
                           progressbar=True,
                           homepath=null_folder,
                           stage=0,
                           random_seed=42)

Then I compute the Bayes Factor just like in the documentation:

BF=true_model.marginal_likelihood/null_model.marginal_likelihood

However, this code gives me the wrong answer, in fact it returns strong evidence for the null. Can someone please help find out whats wrong with this code, or suggest another way to get the Bayes Factor for each model?

Attached is a copy of the script.regression_example.py (1.9 KB)

Thanks!

I suspected that the marginal likelihood computed from the SMC is not correct (see https://junpenglao.xyz/Blogs/posts/2017-11-22-Marginal_likelihood_in_PyMC3.html#Computing-marginal-likelihood-using-SMC for another example). However, I have not yet have time to diagnose and validate the implementation.
cc @aloctavodia. Maybe we should take down the doc before we can make sure the implementation is correct?

Hi @pwitkow sorry for the inconveniences. Even when I was able to reproduce values of BFs for some models it seems that there is a problem with the current marginal likelihood computations implemented in SMC. BTW, notice that there are some variables in your models like sig_true_int that are not really part of the model as they are not contented to the data or to other priors, you should remove them to compute Bayes Factors.

@junpenglao notice that the marginal likelihoods computed by SMC are only proportional to the correct marginal likelihood (so they only makes sense as BFs), anyway you are totally right and we should remove BFs computed using SMC from the docs until we are sure the implementation is correct. I discussed this problem with @agustinaarroyuelo and we are going to tackle it during GSoC.

1 Like

Hi, thanks for such a quick response! @aloctavodia Thanks for pointing that out about the model too. Its a hold over from when I was playing around with multilevel models.

1 Like