Bug in fast sample posterior predictive?

bridgeland · March 4, 2021, 10:40pm

I encountered a bug in pm.fast_sample_posterior_predictive(). Or maybe it is intended behavior that I don’t understand.

Consider this simple model with two normals and three deterministics:

import pymc3 as pm
import numpy as np
import xarray as xr
import arviz as az

with pm.Model() as model:
    n1 = pm.Normal('n1', 0, 1)
    d1 = pm.Deterministic('d1', n1)
    
    n2 = pm.Normal('n2', 0, 1)
    d2 = pm.Deterministic('d2', n2)
    d3 = pm.Deterministic('d3', d2)

After sampling, all five variables have posterior distributions whose means are close to 0.0, and standard deviations are close to 1.0:

def mean_and_std(result):
    vals = result.values
    return np.mean(vals), np.std(vals)

Now a new trace incr_trace is created by copying the existing trace and then incrementing all samples of n1 by 1.0, and also incrementing all samples of d2 by 1.0.

def increment_both_distrs(posterior_ds):
    new_ds = posterior_ds.copy(deep=False)
    increment_distr(new_ds, 'n1')
    increment_distr(new_ds, 'd2')
    return new_ds

def increment_distr(dataset, var_name):
    dataset[var_name] = xr.apply_ufunc(
        lambda da: da+1, dataset[var_name])

incr_trace = trace.map(increment_both_distrs, groups='posterior')

As expected, the samples for n1 in incr_trace now have a mean of about 1.0, as do the samples for d2.

So far so good. Now pm.fast_sample_posterior_predictive() is run on the new trace, and d1 and d3 are sampled.

with model:
    pp_trace = az.from_pymc3(
        posterior_predictive=pm.fast_sample_posterior_predictive(
            trace=incr_trace, var_names=['d1', 'd3']))

The d1 samples in the new posterior_predictive trace pp_trace have a mean of about 1.0, as expected. But the d3 samples in pp_trace have a mean of about 0.0, rather than 1.0.

It’s as if pm.fast_sample_posterior_predictive() ignores the incremented values for d2 in incr_trace and uses the original non-incremented values. But how does it even know about the old, non-incremented values?

Is this a bug? Or perhaps is this a misunderstanding on my part?

Note that this bug—if it is in fact a bug—is not unique to pm.fast_sample_posterior_predictive(). The old, slow pm.sample_posterior_predictive() exhibits the same behavior.

PyMC3 version 3.9.3

ricardoV94 · March 5, 2021, 6:12am

Pymc does not / cannot condition on deterministics, so when you change d2 that information cannot trickle around to d3 (nor to n2 for that matter). On the other hand changing n1 does affect d1 as expected in your example.

In addition, if I understand it correctly, the goal of ppc is to resample observed or new variables given as input the parents points in a trace (e.g, the likelihood, new input x or new/missing groups). You don’t usually resample unobserved variables as you seem to be trying in your example. That’s because the posterior for unobserved variables may not look like any known distribution and therefore we cannot just use random number generation routines (you would need mcmc sampling again).

Finally, just to make sure, the deterministics in your model are not random samples from n1 and n2, they are just the values that n1 and n2 took at each step in the sampling. In other words d2 and d3 and n2 are exactly the same as each other, and n1 and d1 are the same as each other. None of these variables is supposed to change during ppc (and they don’t, as you can see from them having the exact same mean and std, except for the offset of 1 in those that you changed).

bridgeland · March 5, 2021, 11:50am

Finally, just to make sure, the deterministics in your model are not random samples from n1 and n2, they are just the values that n1 and n2 took at each step in the sampling. In other words d2 and d3 and n2 are exactly the same as each other, and n1 and d1 are the same as each other. None of these variables is supposed to change during ppc (and they don’t, as you can see from them having the exact same mean and std, except for the offset of 1 in those that you changed).

Correct. I created this little model to isolate this weird behavior. As you point out, there are only two RVs in this model: n1 and n2. Prior to the increment, n1 and d1 have exactly the same distributions, and n2, d2, and d3 have exactly the same distributions. The trace is then modified to increment the distributions of n1 and d2. I expected the posterior predictive sample to propagate those changes to the downstream deterministics d1 and d3. It did propagate the change to d1, but somehow not to d3. And my question is why? And is this a bug or the intended behavior?

bridgeland · March 5, 2021, 11:57am

In addition, if I understand it correctly, the goal of ppc is to resample observed or new variables given as input the parents points in a trace (e.g, the likelihood, new input x or new/missing groups). You don’t usually resample unobserved variables as you seem to be trying in your example. That’s because the posterior for unobserved variables may not look like any known distribution and therefore we cannot just use random number generation routines (you would need mcmc sampling again).

Yes, my little example is a bit pathological. I distilled it from a much larger model that is more sensible, but that also exhibits this strange behavior.

Note that in my larger model, I am using the super-useful (and under-appreciated) model factory idiom described by @lucianopaz here and here.

But perhaps I am misunderstanding your point?

bridgeland · March 5, 2021, 12:00pm

Pymc does not / cannot condition on deterministics, so when you change d2 that information cannot trickle around to d3 (nor to n2 for that matter). On the other hand changing n1 does affect d1 as expected in your example.

I am not sure what you mean. My understanding of posterior predictive sampling is that it is conceptually very simple: it just resamples the named variables given the values of all the other variables in the trace. Am I misunderstanding posterior predictive sampling?

ricardoV94 · March 5, 2021, 12:29pm

Let me see if I can be more helpful:

First, On the deterministic / random variable distinction. What I meant to say is that there is no (statistical) link from a deterministic child to its parent rv. That’s why changing d2 did not change neither n2 nor d3. This is to say that in the following model:

with pm.Model() as m:
  x = pm.normal('x')
  y1 = pm.Deterministic('y1', x)
  y2 = pm.Deterministic('y1', x+100)

Adding 100 to y2 will not do anything to either x or y1 (in either prior_predictive, normal sampling or posterior_predictive). I am emphasizing this, because I think it is a separate issue from the ppc question.

Second,

PPC resamples the variables that are missing in the provided trace, conditioned on the values of the parent variables or on the prior if it’s a top level rv. Under normal circumstances it only looks for observed variables as that is it’s main intended behavior, and resamples them given the trace points and the likelihood distribution.

Now, when you specify you want variables other than observed ones it behaves a bit funky. If these exist in the trace, the values are simply copied. If they don’t exist, then it samples them, conditioning on the trace values or the prior (if they are a top level rv):

 with pm.Model() as m: 
    x1 = pm.Normal('x1', 0, 1) 
    x2 = pm.Normal('x2', x1 + 50, 1) 

with m: 
    trace = pm.sample() 

with m: 
    ppc = pm.sample_posterior_predictive(trace)
ppc  # {} empty dictionary

with m: 
     ppc = pm.sample_posterior_predictive(trace, var_names=['x2'])
(ppc['x2'] == trace['x2']).mean()  # 1.0  same values as in the trace

# Now we add a new variable
with m: 
     x3 = pm.Normal('x3', x1 - 50, 1) 

with m: 
     ppc = pm.sample_posterior_predictive(trace, var_names=['x3'])                                                             
ppc['x3'].mean()  # -50.0173504935316

# And run it again
with m: 
     ppc = pm.sample_posterior_predictive(trace, var_names=['x3']) 
ppc['x3'].mean()  # -50.00807725408419  The values change

ricardoV94 · March 5, 2021, 12:36pm

As I understand it, PPC is helpful in 3 situations:

Sample from the observed distributions to see whether the model conclusions actually resemble your data. To do this, a dataset as large as the original one is sampled using the same likelihood distribution and conditioning on the old inputs and the posterior parameters that were just estimated. This is the standard behavior of the pymc method. In the simplest linear model with gaussian likelihood, it’s in its essence a shortcut to:

ppc_ys = scipy.stats.norm(trace['intercept'] + old_xs * trace['slope'], trace['sigma']).rvs()

Sample from the observed distribution, to see what predictions would be made given new inputs, but keeping the same posterior values unchanged (which may be unrealistic). This is a spin on the case above, where you change your input data, say the xs in a linear model, to see how the ys would look like given the other posterior parameters in the trace. It’s a shortcut to:

new_ys = scipy.stats.norm(trace['intercept'] + new_xs * trace['slope'], trace['sigma']).rvs()

Sample a complete new unobserved distribution that depends on posterior parameters. Say you want to see what a new group in a hierarchical model would look like. This is similar to the example above. (I didn’t know pymc3 could do this until very recently!) In a simple normal hierarchical model, this is a shortcut for:

new_group = scipy.stats.norm(trace['group_mu'], trace['group_sigma']).rvs()

When is it not helpful (because it simply can’t, that’s why mcmc was used in the first place):

See how the posterior of unobserved variables would change if you changed your data (including adding a new observation group in a hierarchical model)
See how the posterior of unobserved variables would change if you changed the prior or posterior of other parameters (as in your example where you changed the mean of a group in a hierarchical model).

bridgeland · March 10, 2021, 3:45pm

That’s a nice ontology of PPC usages, Ricardo. You are correct. I want to do #5.

In another post on a similar issue, @OriolAbril suggested the technique of modifying values in an arviz.InferenceData trace, and then running pm.sample_posterior_predictive() to generate new distributions for unobserved variables.

That suggestion was made on a somewhat different problem, as I wanted to model intervention on RVs. In this case, I am trying to model intervention on deterministics.

So there are three possible interpretation of the behavior of the simple model in the original post, above:

It is a bug. pm.fast_sample_posterior() should be able to work on modified samples of a deterministic.
It’s not a bug. pm.fast_sample_posterior() can handle modified samples of RVs, but not deterministics.
It’s off-label usage of pm.fast_sample_posterior(), and should not be relied on. It happens to work for RVs today, but that is more or less an accident. I should find a different technique to model interventions.

I think you are claiming #3.

ricardoV94 · March 10, 2021, 5:14pm

It’s definitely #3. The posterior predictive has no ability to do “new inferences”.

The only thing it can do is to sample new values that follow a standard closed form distribution given the parameter values in the trace.

What do you mean by:

lucianopaz · March 14, 2021, 5:51pm

Sorry for showing up late to the thread! This behavior was built-in to solve this issue. We basically made sample_posterior_predictive ignore all the deterministics that are stored in the trace. That means that you cannot apply changes to deterministics in the trace and see the intervention results in the posterior predictive.
I’m not up to date with the latest developments in v4, but i think that having the forward sampling graph in aesara will make it easier to do these sorts of interventions on any kind of node.

Topic		Replies	Views
Deterministic and observed RV behaviour when using sample_posterior_predictive Questions	5	1085	January 24, 2019
Deterministic posterior predictive?	5	544	November 14, 2023
Adding a Deterministic variable after sampling v5	6	593	November 4, 2023
New bug in fast sample posterior predictive Questions	3	532	July 11, 2021
Posterior predictive distributions of parameter values differs a lot from posterior?	18	556	November 23, 2023

Bug in fast sample posterior predictive?

Related topics