sample_ppc fails on traces containing only a subset of variables

GitHub issue #2561

Calling sample_ppc with a trace that does not contain all of the model’s RVs fails with “theano.gof.fg.MissingInputError: Undeclared input”.

Basically, I want to sample the posterior predictive distribution for a new dataset using a hierarchical model that I have already sampled on a training dataset. The top-level and group-level variables are therefore present in the trace, but the latent variables for the individual units are not.

I am open to the argument that this is a misuse of sample_ppc, since this isn’t really a check so much as an attempt at inference. The functionality needed is essentially the same as in sample_ppc, though: drawing random values in depth-first order, so that variables missing from the trace are drawn before the variables that depend on them, seems to fix the problem.
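To make the depth-first idea concrete, here is a rough sketch in plain Python. It is not PyMC3 code and not the actual patch: the MODEL dictionary, gamma_mu_sd, draw and sample_predictive are all hypothetical names made up for illustration, and the toy model only mimics the structure of the failing example. Each variable that is missing from a trace point is drawn from its distribution after its parents have been resolved, recursively.

```python
import random

# Hypothetical toy stand-in for the model (NOT PyMC3's API): each entry maps
# a variable name to (parent names, function drawing a value given parents).
def gamma_mu_sd(rng, mu, sd):
    # Gamma parameterised by mean and standard deviation.
    shape = (mu / sd) ** 2
    scale = sd ** 2 / mu
    return rng.gammavariate(shape, scale)

MODEL = {
    "a": ((), lambda rng: gamma_mu_sd(rng, 10.0, 2.0)),
    "b": (("a",), lambda rng, a: gamma_mu_sd(rng, a, 2.0)),
    "c": (("b",), lambda rng, b: gamma_mu_sd(rng, b, 1.0)),
    "d": (("c", "a"), lambda rng, c, a: rng.gauss(c, a)),
}

def draw(name, point, rng):
    """Return a value for `name`, drawing missing ancestors depth-first."""
    if name not in point:
        parents, sampler = MODEL[name]
        # Resolve every parent first; values already in the point are reused.
        parent_values = [draw(p, point, rng) for p in parents]
        point[name] = sampler(rng, *parent_values)
    return point[name]

def sample_predictive(trace_points, var_names, seed=0):
    """Draw `var_names` once per trace point, filling in absent variables."""
    rng = random.Random(seed)
    samples = []
    for trace_point in trace_points:
        point = dict(trace_point)  # copy so the trace itself is not mutated
        samples.append({v: draw(v, point, rng) for v in var_names})
    return samples

# The "trace" holds only a; b is drawn on demand when c is requested.
print(sample_predictive([{"a": 9.5}, {"a": 11.0}], ["c", "d"]))
```

Because drawn values are cached in the per-sample point, d reuses the same c (and the same a from the trace) that was used earlier in that sample, which is the behaviour I would want from the real fix as well.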

If this seems likely to be of sufficiently general interest, I’d be happy to send a pull request for the fix that I have in mind.

Failing example:

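import pymc3 as pm

with pm.Model() as model:
    a = pm.Gamma('a', mu=10.0, sd=2.0)
    b = pm.Gamma('b', mu=a, sd=2.0)

    # Record only a (and its log-transformed counterpart); b is left out of the trace.
    trace = pm.sample(trace=[model.a, model.a_log__])
    assert len(trace.varnames) == 2

    # Variables added after sampling; c depends on b, which the trace lacks.
    c = pm.Gamma('c', mu=b, sd=1.0)
    d = pm.Normal('d', mu=c, sd=a)

    # Raises theano.gof.fg.MissingInputError
    ppc = pm.sample_ppc(trace, 100, vars=[c, d])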