Sample posterior predictive with a vengeance

My model gives some nice results. Now I would like to investigate what would happen if the value of some RVs were to be changed, e.g. by an outside intervention. In particular I would like a function that is much like sample_posterior_predictive(), but has an additional parameter, something like this:

    trace, var_names=['results'], 
    intervention={'mu': lambda x: x+1, 'sigma': lambda x: x * 0.5}

meaning run the posterior intervention, but instead of using the value of mu in the trace, use mu + 1, and instead of using the value of sigma, use sigma * 0.5.

I appreciate that pymc3.sampling has nothing like my imagined sample_posterior_intervention() already. How could I implement this behavior using the existing pymc3.sampling functions?

Or is this a nonsensical thing to do?

What you can do is use shared variables in Theano to run inference with certain values, and then adjust them after inference, and then run posterior predictive simulations.

There is an example in the docs linked below. Let us know if you have questions!

1 Like

I understand that you want to modify posterior samples for some of the variables in the posterior. In that case you can edit the posterior xarray Dataset of your result (given it’s stored as inferencedata) and then pass it to sample_posterior_predictive

You can also choose to also combine this with the shared objects and pm.Data to modify predictors instead of posterior samples, but doing only one of the 2 is also possible.


Or is this a nonsensical thing to do?

You most certainly are, but be aware that the resulting “posterior predictive” samples don’t have any clear statistical meaning - after all, they are not drawn from the posterior predictive distribution.


Can a variable be defined as both a shared variable and an RV that is fit to the data in the original sample? I don’t see any examples of this in the documentation.

Maybe a shared variable can be defined for the intervention, with the original RV as a term in a newly created deterministic. So instead of this

with pm.Model() as model:
    mu = pm.Normal('mu', 0, 10)

an intervention-ready mu is defined like this:

with pm.Model() as model:
    mu = pm.Deterministic('mu', mu_original + mu_intervention)
    mu_original = pm.Normal('mu_original', 0, 10)
    mu_intervention = theano.shared(0.0)

Then after the original sampling, the intervention is run and the posterior is re-sampled, like this:

with model:
    intervention_trace = sample_posterior_predictive(
        trace, var_names=['results'])

Your understanding is correct @OriolAbril.

I think you are suggesting I change the MultiTrace object that is provided by sample(). (If not, please correct my misunderstanding.)

How can the MultiTrace object be changed? trace['mu'] provides an numpy array of the values of mu, one element in the array for each draw. But changing those values seems to have no effect on the original trace. It appears that trace['mu'] creates a new copy of the array.


You have to use arviz.InferenceData instead of pm.MultiTrace. I’d also recommend trying to switch from MultiTrace to InferenceData as it should be more robust and flexible. PyMC3 is progressively working towards integration with InferenceData, you can now use return_inferencedata=True in pm.sample (which will become the default in the long run) and pm.sample_posterior_predictive already accepts both xarray dataset and InferenceData in addition to multitrace.

Here are some references that could be useful:


For someone reading this in the future, here’s the resulting code. (I have not actually run this. My actual model is more complex than this simple example. So there may be a bug in my simplified example here. Caveat emptor.)

import xarray as xr

def change_mu_and_sigma_in_posterior(posterior_ds):
    """Create new posterior dataset with mu and sigma altered"""
    new_post_ds = posterior_ds.copy(deep=False)
    new_post_ds['mu'] = xr.apply_ufunc(
        lambda mu_da: mu_da + 1, 
    new_post_ds['sigma'] = xr.apply_ufunc(
        lambda sigma_da: sigma_da  * 0.5, 
    return new_post_ds

with model:
    trace_with_intervention =
    posterior_post_intervention_results = pm.sample_posterior_predictive(
1 Like

There is a possible third approach, in addition to the two suggested by @RavinKumar and @OriolAbril. In a post about applying an existing model to new data, @lucianopaz suggested creating a new model with a model factory. I think his model factory approach could also work for solving this problem: create a new model using the existing data and fit to newly altered posterior samples of the old model.

Note: I haven’t tried this approach.