Combine multiple chains

When the model has a highly multimodal posterior and a single MCMC run can only sample from one mode, sampling from the different modes can still be achieved by running multiple chains with random initializations (e.g. the default jitter seems to work well in my cases). However, after sampling this way I want to combine all the chains so as to get a posterior close to the real one - and of course just concatenating all the samples together doesn’t do the right thing. E.g. if the region around one mode is twice as likely (has higher logp) as another mode, it has to be represented by twice as many samples. If it’s 10^5 times more likely, there should be essentially no samples from the other mode. And so on - I hope you get the idea.
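To make the failure mode concrete, here is a toy sketch (with made-up numbers): two equal-length chains, each stuck in one mode of a .75/.25 mixture. Naive concatenation represents the modes 50/50 instead of 75/25:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy posterior: 75% of the mass around -5, 25% around +5;
# two chains of equal length, each stuck in one mode
chain_a = rng.normal(-5, 1, 1000)  # mode holding 75% of the mass
chain_b = rng.normal(5, 1, 1000)   # mode holding 25% of the mass

naive = np.concatenate([chain_a, chain_b])
# naive concatenation gives each mode equal weight, so the
# combined mean comes out near 0 instead of the true
# 0.75 * (-5) + 0.25 * 5 = -2.5
print(naive.mean())
```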

So my question is: how do I combine the chains correctly in this sense? Has anyone done this?

I get what you are trying to do, but I don’t think weighting the trace post hoc is a good solution. You might want to have a look at the SMC sampler instead.

It doesn’t seem to act as a drop-in replacement for pm.sample() (yet?): SMC gives me the error TypeError: 'NoneType' object is not callable, while normal NUTS sampling goes fine. And, as I understand from reading that page, SMC requires more time than several simple chains, is that right?

And what’s wrong with a weighted combination of chains, assuming all posterior modes were sampled in at least one of the chains? The weights could be something like exp(logp) averaged across each chain’s samples.

Yes, you are right - you need to set transform=None in the declaration of RVs in the model.

Yep, it takes a while.

The problem is that you don’t know the weight of each mode unless you have a theoretical analysis: that information cannot be recovered from the trace alone.
Imagine the posterior is the mixture .75 * Normal(-5, 1) + .25 * Normal(5, 1), and you run two traces that each explore one of the modes. To weight the traces and combine the two into the true posterior properly, you need some way to know the weights (.75, .25). If you do have that information, you can subsample each trace accordingly:

with pm.Model():
    # ... model definition containing a variable 'mu' ...
    trace = pm.sample(1000)

# one array of samples per chain
x = trace.get_values('mu', combine=False)
# 750 draws from the first chain, 250 from the second;
# np.concatenate (not np.array) joins the unequal-length arrays
xpost = np.concatenate([x[0][:750], x[1][:250]])

But since this information [.75, .25] is difficult to get, I would suggest not doing it this way.

Yes, you are right - you need to set transform=None in the declaration of RVs in the model.

But I guess for my model this will worsen the sampling significantly, as it has both variables restricted to be > 0 and Gaussian processes, which are automatically represented by transformed f_rotated_ variables for efficiency. I will try it anyway.

The problem is that you don’t know the weight of each mode unless you have a theoretical analysis: that information cannot be recovered from the trace alone.

Yes, from the trace samples alone it’s impossible to get. But we can compute model.logp for all trace samples and average them somehow to get a single value per chain (I’m not sure how exactly for best results, but it seems possible).
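A sketch of that idea with made-up data (the chain values and the log(3) weight gap are placeholders, and the weighting rule itself is a heuristic: a chain’s mean logp ignores the volume of its mode, so this is illustrative, not a guaranteed-correct reweighting):

```python
import numpy as np

def combine_chains(chains, mean_logps, n_total, seed=0):
    """Resample chains in proportion to exp(mean logp) weights."""
    rng = np.random.default_rng(seed)
    logw = np.asarray(mean_logps, dtype=float)
    logw -= logw.max()                    # stabilize before exponentiating
    w = np.exp(logw)
    w /= w.sum()                          # normalized per-chain weights
    counts = rng.multinomial(n_total, w)  # how many draws from each chain
    parts = [rng.choice(c, size=k, replace=True)
             for c, k in zip(chains, counts)]
    return np.concatenate(parts)

rng = np.random.default_rng(1)
chain_a = rng.normal(-5, 1, 2000)  # chain stuck in the first mode
chain_b = rng.normal(5, 1, 2000)   # chain stuck in the second mode
# suppose chain A's average logp is log(3) higher -> 3:1 weights
post = combine_chains([chain_a, chain_b], [np.log(3.0), 0.0], n_total=4000)
```

About three quarters of the combined draws then come from the first chain, matching the 3:1 weight ratio.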

That’s conceptually the procedure in SMC actually :wink:
It adds a logp into the model

and then there is a weighting process on the model logp further down the line.

I would also like to try SMC for my problem, but currently it gives the following error (regular sampling works fine with both NUTS and Metropolis; transform=None is set for all variables that gave corresponding errors without it):

---> 36     trace = pm.smc.sample_smc(20, homepath='tmp_smc', step=pm.SMC())
     37 #     trace = pm.sample(step=pm.Metropolis())

~/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/ in sample_smc(n_steps, n_chains, step, start, homepath, stage, n_jobs, tune_interval, tune, progressbar, model, random_seed, rm_flag)
    568             else:
    569                 step.covariance = step.calc_covariance()
--> 570                 step.proposal_dist = choose_proposal(step.proposal_name, scale=step.covariance)
    571                 step.resampling_indexes = step.resample()
    572                 step.chain_previous_lpoint = step.get_chain_previous_lpoint(mtrace)

~/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/ in choose_proposal(proposal_name, scale)
     53     class:`pymc3.Proposal` Object
     54     """
---> 55     return proposal_dists[proposal_name](scale)

~/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/ in __init__(self, s)
     48             raise ValueError("Covariance matrix is not symmetric.")
     49         self.n = n
---> 50         self.chol = scipy.linalg.cholesky(s, lower=True)
     52     def __call__(self, num_draws=None):

~/anaconda3/lib/python3.6/site-packages/scipy/linalg/ in cholesky(a, lower, overwrite_a, check_finite)
     79     """
     80     c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=True,
---> 81                             check_finite=check_finite)
     82     return c

~/anaconda3/lib/python3.6/site-packages/scipy/linalg/ in _cholesky(a, lower, overwrite_a, clean, check_finite)
     28     c, info = potrf(a1, lower=lower, overwrite_a=overwrite_a, clean=clean)
     29     if info > 0:
---> 30         raise LinAlgError("%d-th leading minor not positive definite" % info)
     31     if info < 0:
     32         raise ValueError('illegal value in %d-th argument of internal potrf'

LinAlgError: 100-th leading minor not positive definite

Seems like the covariance matrix of your proposal distribution is not positive definite.
How many chains do you use for sampling?
We could add a check for that and make the proposal positive definite in case it is not.
However, this is usually the result of not having enough chains for sampling. Depending on the complexity of your solution space/model, I would suggest using at least 1000 chains. If the model is simple, a few hundred might be enough.

And back to the original question - combining several chains with the correct weights. I could think of the following way: create separate traces, one for each chain, and run compare on them. Then combine the chains using the weights from compare. Does it make sense at all?

Yes, that’s the right thing to do in the context of model averaging, i.e. using multiple models for prediction.
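As a sketch of that final combination step, assuming you already have one weight per chain (e.g. the weight column that compare reports; the weights and the constant toy chain values below are made up):

```python
import numpy as np

# hypothetical per-chain weights, e.g. taken from compare's weight column
weights = np.array([0.6, 0.3, 0.1])
# one array of posterior samples per chain (constant toy values here)
chains = [np.full(1000, -5.0), np.full(1000, 0.0), np.full(1000, 5.0)]

rng = np.random.default_rng(42)
n_total = 3000
# pick a chain for every draw according to the weights...
idx = rng.choice(len(chains), size=n_total, p=weights)
# ...then draw one sample (with replacement) from the chosen chain
combined = np.array([rng.choice(chains[i]) for i in idx])
```

With these toy values the combined mean lands near the weighted average 0.6 * (-5) + 0.3 * 0 + 0.1 * 5 = -2.5.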
