Deterministic posterior predictive?

Hi, this is another posterior predictive sampling question.

I am pretty sure that some time ago,

import pymc as pm

with pm.Model() as model:
    a = pm.Normal('a', 0, 1)
    trace = pm.sample()
    trace_post = pm.sample_posterior_predictive(trace, var_names=['a'])

print("posterior", trace.posterior.a.sel(draw=22, chain=1).values)
print("resampled in posterior predictive", trace_post.posterior_predictive.a.sel(draw=22, chain=1).values)

would have resulted in the exact same trace for a in trace_post (deterministic resampling). But it seems that in version 5 (I'm not sure when it started; I tried 5.6 and 5.8.2), a is resampled in a way similar to sample_prior_predictive. Is there any way to get the old behaviour back? That is, reuse anything already present in the trace deterministically, instead of actually resampling it.

The use case is a rather large model with a limited number of random variables but a very large number of diagnostic variables, which I calculate on-demand with the sample_posterior_predictive function. I want to avoid doing any resampling, and simply use the deterministic machinery of pymc to get what I need. E.g.

def make_model(diag_b=False, **other_diag_flags):  # more on/off switches like diag_b
    with pm.Model() as model:
        a = pm.Normal('a', 0, 1)
        if diag_b:
            # only built when the diagnostic is requested
            pm.Deterministic('b', 3 * a)
        ...
    return model
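
Roughly, the workflow I have in mind is the following (just a sketch: I sample the base model once, then rebuild it with the diagnostic switched on and ask sample_posterior_predictive only for b, hoping that a is taken from the trace rather than redrawn):

with make_model(diag_b=False) as base_model:
    trace = pm.sample()

# rebuild with the diagnostic enabled; 'a' should come from the trace,
# and 'b' should just be evaluated from those stored draws
with make_model(diag_b=True):
    trace_diag = pm.sample_posterior_predictive(trace, var_names=['b'])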

Disclaimer: I did not go through all posts and issues on this question, but I hope the example is simple enough, and surprising enough given past behaviour, to be considered nevertheless. Hi @ricardoV94 :slight_smile: (issue 5973).

Thanks.

We considered that behavior useless; it would just be copying a value from the posterior to the posterior_predictive group. You can do that directly with the inference data:

idata.posterior_predictive["a"] = idata.posterior["a"]

You can still compute deterministics just fine: pass them to var_names (but not the RVs on which they depend).

import pymc as pm
import numpy as np

with pm.Model() as m:
    x = pm.Normal("x")
    det = pm.Deterministic("det", x + 1)
    idata = pm.sample()
    pp = pm.sample_posterior_predictive(idata, var_names=["det"])

np.testing.assert_allclose(idata.posterior["x"] + 1, pp.posterior_predictive["det"])

Hi @ricardoV94,
thanks for the lightning-fast reply.
Ok, good to know. It's a little disturbing that the specification of var_names has so much impact on the behaviour.
I tried your example with var_names=["det", "x"] and the assert fails (x is then resampled).

Not just “useless”: a copy of x in the posterior_predictive group wouldn't really be the posterior predictive distribution of x, I guess. Right now posterior_predictive is trying to do way too much, I think. We should have a function with another name for computing expressions based on variables in a trace.


var_names specifies which variables to sample. If you sample an RV, you will get new draws from the posterior predictive (which for root RVs is the prior).
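
Continuing the example above, a quick sketch of the difference (pp_resampled is just an illustrative name):

with m:
    # "x" is listed in var_names, so it gets fresh draws (from its prior, since it is a root RV),
    # and "det" is then computed from those fresh draws
    pp_resampled = pm.sample_posterior_predictive(idata, var_names=["det", "x"])

# The redrawn "x" no longer matches the posterior draws, so a comparison like
# np.testing.assert_allclose(idata.posterior["x"], pp_resampled.posterior_predictive["x"])
# is expected to fail.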


Right now posterior_predictive is trying to do way too much, I think. We should have a function with another name for computing expressions based on variables in a trace.

Yes, I can agree with that. There are cases where one wants proper posterior predictive sampling to compare with the observations, and cases where one simply needs to calculate quantities with the model.
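
For the second case, a minimal sketch of the kind of thing I mean: since the posterior group is an xarray Dataset, an expression like b = 3*a can be computed directly from the stored draws, with no resampling at all (reusing trace and a from my first example):

# compute b = 3*a directly from the posterior draws of a
b_draws = 3 * trace.posterior["a"]

# optionally store it next to the other posterior variables
trace.posterior["b"] = b_draws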

About var_names and sample_posterior_predictive, knowing it is enough. I went back and found that what I called “breaking” behaviour already occurred in v4, and I had to go back to v3 to see the deterministic behaviour (anything already present in the trace is not resampled). I certainly see powerful uses of it, and at times I needed such behaviour but misunderstood the purpose of sample_posterior_predictive. Anyway, thanks for clarifying once more.