Adding a Deterministic variable after sampling

I have a model with a number of variables of interest that are derived from the parameters, but declaring them all as Deterministic so they appear in the InferenceData output makes sampling slower than leaving them undeclared. I know I can compute them by hand from the xarray representations of the posterior variables, but I’m curious whether there’s a more straightforward way: adding them as new Deterministic variables to the model object after sampling and calling something akin to pm.sample_posterior_predictive(trace) (but without any stochasticity). Or is there genuinely no option but to compute them by hand with xarray?

You can indeed use sample_posterior_predictive without any stochasticity (unless the Deterministic depends on an observed variable; does it?).

Here is a snippet:

import pymc as pm
import numpy as np

from pymc.model.fgraph import clone_model

with pm.Model() as m:
    x = pm.Beta("x", 1, 1)
    idata = pm.sample(progressbar=False)
    
with clone_model(m) as clone_m:
    det = pm.Deterministic("det", clone_m["x"] + 1)
    pp = pm.sample_posterior_predictive(idata, var_names=["det"], progressbar=False)
    
assert np.all(pp.posterior_predictive["det"] == idata.posterior["x"] + 1)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [x]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2 seconds.
Sampling: []

Note the second Sampling: [] line, which corresponds to sample_posterior_predictive: the empty list means nothing stochastic is being sampled.

I used clone_model to avoid modifying the original model, but if you’re not concerned about that, you can add the Deterministic to the original model after sampling.

If you need a Deterministic that depends on the values of the observed variable, you can replace the observed variable with its data using pm.do instead of just using clone_model.


Do the new Deterministic variables have to be functions of parameter variables or other existing Deterministic variables, as in this case? In my case, I have a whole chain of derived variables, and I only want the values of the last one after sampling, akin to:

with pm.Model() as model:
    # priors
    ...
    # derived variables
    ...
    interesting_dv = ... # interesting dv computed from earlier uninteresting vars
    # likelihood
    ...

Yes, because the values from the trace are used as inputs to the posterior predictive function. Intermediate quantities that aren’t stored in the trace are simply recomputed from those inputs.

If avoiding code duplication is the concern, you can always write a helper function that computes the uninteresting variables from the model variables.


Hm, I’m getting different values even for Deterministic quantities when I declare them this way. Here’s a minimal example:

import pymc as pm
from pymc.model.fgraph import clone_model

with pm.Model() as model:
    data = pm.ConstantData('data',[0])
    mu = pm.Normal('mu', mu=0, sigma=1)
    mu_squared1 = pm.Deterministic('mu_squared1', mu**2)
    y = pm.Normal('y', mu=mu, sigma=1, observed=data)


with clone_model(model) as model2:
    mu_squared2 = pm.Deterministic('mu_squared2', mu**2)


with model2:
    trace = pm.sample_prior_predictive(
        samples = 1
        , var_names = ['mu','mu_squared1','mu_squared2']
    )

Which yields:

>>> trace['prior']['mu']
array([[-1.44118118]])

>>> trace['prior']['mu_squared1']
array([[2.0770032]])

>>> trace['prior']['mu_squared2']
array([[0.25616674]])

Oh, I failed to notice that when adding variables to a cloned model, you can’t simply refer to the original Python variables (like mu); you have to access them as elements of the cloned model object (like model2['mu']). This works as expected:

import pymc as pm
from pymc.model.fgraph import clone_model

with pm.Model() as model:
    data = pm.ConstantData('data',[0])
    mu = pm.Normal('mu', mu=0, sigma=1)
    mu_squared1 = pm.Deterministic('mu_squared1', mu**2)
    y = pm.Normal('y', mu=mu, sigma=1, observed=data)


with clone_model(model) as model2:
    mu_squared2 = pm.Deterministic('mu_squared2', model2['mu']**2)


with model2:
    trace = pm.sample_prior_predictive(
        samples = 1
        , var_names = ['mu','mu_squared1','mu_squared2']
    )