Do Operator not working correctly with deterministic function

CausalTruth · November 15, 2023, 8:23pm

Hi, I have used the new do operator as in the PyMC Labs blog post Causal analysis with PyMC: Answering "What If?" with the new do operator, and adapted it to my needs. However, sample_posterior_predictive doesn't seem to use the values set via the do operator when the model contains a pm.Deterministic.
The following code recreates the workflow and the resulting problem, with the minimum number of variables to avoid unnecessary complexity.

import numpy as np
import pandas as pd
import pymc as pm
import arviz as az

with pm.Model(coords_mutable={"i": [0], "campaign_dim": np.arange(2)}) as model_generative:
    campaign = pm.Categorical("campaign", p=[0.5, 0.5], dims="i")

    # Campaign 0 has prior mean 0, campaign 1 has prior mean 10
    alpha_y_campaign = pm.Normal("alpha_y_campaign", mu=[0, 10], sigma=1, dims="campaign_dim")
    alpha_campaign = pm.Deterministic("alpha_campaign", alpha_y_campaign[campaign])
    y = pm.Normal("y", mu=alpha_campaign, dims="i")

N = 100
with model_generative:
    simulate = pm.sample_prior_predictive(samples=N)

observed = {
    "campaign": simulate.prior.campaign.values.flatten(),
    "y": simulate.prior.y.values.flatten(),
}

df = pd.DataFrame(observed)

This creates a DataFrame in which y is around 10 when campaign = 1 and around 0 when campaign = 0.
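For readers without PyMC at hand, the data-generating process above can be mimicked in plain NumPy. This is a sketch that assumes each prior-predictive draw samples fresh values of every variable, which is what pm.sample_prior_predictive does; simulate_campaign_data is a hypothetical helper, not part of the original workflow. It also shows why y is only around 0 or 10: within each campaign, y combines the spread of alpha_y_campaign and of the likelihood noise, so its variance is about 1 + 1 = 2.

```python
import numpy as np


def simulate_campaign_data(n, seed=0):
    """Plain-NumPy sketch of the prior-predictive draws of model_generative.

    Every one of the n draws samples new parameters and new noise,
    mirroring how each prior-predictive sample is generated.
    """
    rng = np.random.default_rng(seed)
    # campaign ~ Categorical(p=[0.5, 0.5])
    campaign = rng.integers(0, 2, size=n)
    # alpha_y_campaign ~ Normal(mu=[0, 10], sigma=1): one pair per draw
    alpha_y_campaign = rng.normal(loc=[0.0, 10.0], scale=1.0, size=(n, 2))
    # alpha_campaign = alpha_y_campaign[campaign] (the Deterministic)
    alpha_campaign = alpha_y_campaign[np.arange(n), campaign]
    # y ~ Normal(mu=alpha_campaign, sigma=1)
    y = rng.normal(loc=alpha_campaign, scale=1.0)
    return campaign, y


campaign, y = simulate_campaign_data(10_000)
for k in (0, 1):
    # Group mean is close to 0 or 10; group variance is close to 2
    print(k, y[campaign == k].mean().round(2), y[campaign == k].var().round(2))
```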

Next, I set the prior mean (mu) of alpha_y_campaign to 0 for both campaign levels and let the model try to recover the parameters that were used to generate the df above and estimate the causal effect.

with pm.Model(coords_mutable={"i": [0], "campaign_dim": np.arange(2)}) as model_generative:
    campaign = pm.Categorical("campaign", p=[0.5, 0.5], dims="i")

    # Same structure as before, but with a prior mean of 0 for both campaigns
    alpha_y_campaign = pm.Normal("alpha_y_campaign", mu=0, sigma=1, dims="campaign_dim")
    alpha_campaign = pm.Deterministic("alpha_campaign", alpha_y_campaign[campaign])
    y = pm.Normal("y", mu=alpha_campaign, dims="i")

# Condition on the simulated data and resize dim "i" to the N observations
model_inference = pm.observe(
    model_generative,
    {"campaign": df["campaign"].values, "y": df["y"].values},
)
model_inference.set_dim("i", N, coord_values=np.arange(N))

with model_inference:
    idata = pm.sample(random_seed=1)
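Because the priors on alpha_y_campaign are Normal and the likelihood noise is fixed at sigma=1, the posterior that pm.sample approximates here also has a closed form (Normal-Normal conjugacy); the observed campaign only selects which level each y informs. For level k with n_k observations, the posterior is Normal with precision 1 + n_k and mean (sum of y in group k) / (1 + n_k). The sketch below computes this as a sanity check; conjugate_posterior is a hypothetical helper, and its results could be compared with idata.posterior["alpha_y_campaign"].mean(("chain", "draw")).

```python
import numpy as np


def conjugate_posterior(campaign, y, n_levels=2, prior_mu=0.0, prior_sigma=1.0, noise_sigma=1.0):
    """Closed-form Normal-Normal posterior for each alpha_y_campaign[k].

    Assumes y_i ~ Normal(alpha[campaign_i], noise_sigma) with known noise_sigma
    and independent priors alpha[k] ~ Normal(prior_mu, prior_sigma).
    Returns (posterior means, posterior standard deviations).
    """
    campaign = np.asarray(campaign)
    y = np.asarray(y, dtype=float)
    prior_prec = 1.0 / prior_sigma**2
    noise_prec = 1.0 / noise_sigma**2
    means = np.empty(n_levels)
    sds = np.empty(n_levels)
    for k in range(n_levels):
        y_k = y[campaign == k]
        # Precisions add: prior precision + n_k * likelihood precision
        post_prec = prior_prec + y_k.size * noise_prec
        # Precision-weighted average of the prior mean and the data
        means[k] = (prior_prec * prior_mu + noise_prec * y_k.sum()) / post_prec
        sds[k] = np.sqrt(1.0 / post_prec)
    return means, sds


means, sds = conjugate_posterior([0, 0, 1, 1], [0.0, 0.0, 10.0, 10.0])
print(means, sds)
```

With N = 100 (about 50 observations per campaign), the prior shrinks each group mean toward 0 only by a factor of about n_k / (1 + n_k), roughly 0.98, so the sampled posterior for alpha_y_campaign should sit close to 0 and 10.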

So far so good. Now I create two intervened models: in one, all campaign values are set to 0; in the other, all are set to 1.

model_z0 = pm.do(model_inference, {"campaign": np.zeros(N, dtype="int32")}, prune_vars=True)
model_z1 = pm.do(model_inference, {"campaign": np.ones(N, dtype="int32")}, prune_vars=True)
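Conceptually, the intervention replaces the Categorical for campaign with fixed values, so under do(campaign = c) every posterior draw of alpha_y_campaign should produce y ~ Normal(alpha_y_campaign[c], 1). The NumPy sketch below computes those intervened predictions by hand. It assumes alpha_draws has the (chain, draw, campaign_dim) layout of idata.posterior["alpha_y_campaign"].values; predict_under_intervention and the stand-in draws are hypothetical, not PyMC functions or real results. Because it always recomputes the indexing step from the intervened campaign, it can serve as a cross-check on what sample_posterior_predictive returns.

```python
import numpy as np


def predict_under_intervention(alpha_draws, campaign_value, n, seed=0):
    """Hand-rolled predictive draws of y under do(campaign = campaign_value).

    alpha_draws: posterior draws of alpha_y_campaign, shape (chain, draw, 2).
    Returns y with shape (chain, draw, n).
    """
    rng = np.random.default_rng(seed)
    # The intervention: every unit gets the same fixed campaign value
    campaign = np.full(n, campaign_value, dtype=int)
    # Recompute alpha_campaign = alpha_y_campaign[campaign] for every draw
    alpha_campaign = alpha_draws[..., campaign]
    # y ~ Normal(alpha_campaign, 1)
    return rng.normal(loc=alpha_campaign, scale=1.0)


# Stand-in posterior draws (4 chains x 250 draws) centred near [0, 10]
rng = np.random.default_rng(42)
alpha_draws = rng.normal(loc=[0.0, 10.0], scale=0.15, size=(4, 250, 2))

y_z0 = predict_under_intervention(alpha_draws, 0, n=100, seed=0)
y_z1 = predict_under_intervention(alpha_draws, 1, n=100, seed=1)
print(y_z0.shape)  # (4, 250, 100)
print((y_z1.mean(axis=-1) - y_z0.mean(axis=-1)).mean())  # close to 10
```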

Now comes the interesting part. When I sample from the posterior predictive and pass only var_names=["y"], I don't get a causal effect:

idata_z0 = pm.sample_posterior_predictive(
    idata,
    model=model_z0,
    predictions=True,
    var_names=["y"]
)
idata_z1 = pm.sample_posterior_predictive(
    idata,
    model=model_z1,
    predictions=True,
    var_names=["y"]
)
az.plot_posterior(idata_z1.predictions.y.reduce(np.mean, dim="i") - idata_z0.predictions.y.reduce(np.mean, dim="i"))
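The quantity passed to az.plot_posterior is a per-draw average treatment effect: for each posterior draw, y is averaged over the units in dim "i" under each intervention, and the two averages are subtracted. Below is a NumPy equivalent of .reduce(np.mean, dim="i") followed by the subtraction, assuming predictions are plain arrays shaped (chain, draw, i); average_treatment_effect, central_interval and the stand-in arrays are hypothetical. When both intervened models draw y around the same means, this quantity is centred on 0, which is the "no causal effect" result described above.

```python
import numpy as np


def average_treatment_effect(y_treated, y_control, unit_axis=-1):
    """Per-draw ATE: mean over the unit axis (dim "i"), then difference."""
    return np.mean(y_treated, axis=unit_axis) - np.mean(y_control, axis=unit_axis)


def central_interval(samples, prob=0.94):
    """Equal-tailed interval; az.plot_posterior shows an HDI by default instead."""
    tail = (1.0 - prob) / 2.0
    return np.quantile(samples, [tail, 1.0 - tail])


rng = np.random.default_rng(0)
# Stand-in predictions shaped (chain, draw, i) for the two interventions
y_control = rng.normal(loc=0.0, scale=1.0, size=(4, 250, 100))
y_treated = rng.normal(loc=10.0, scale=1.0, size=(4, 250, 100))

ate = average_treatment_effect(y_treated, y_control)
print(ate.shape)  # (4, 250)
print(ate.mean(), central_interval(ate.ravel()))
```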

However, when I include "campaign" in var_names, I get the correct causal effect:

idata_z0 = pm.sample_posterior_predictive(
    idata,
    model=model_z0,
    predictions=True,
    var_names=["campaign","y"]
)
idata_z1 = pm.sample_posterior_predictive(
    idata,
    model=model_z1,
    predictions=True,
    var_names=["campaign","y"]
)

az.plot_posterior(idata_z1.predictions.y.reduce(np.mean, dim="i") - idata_z0.predictions.y.reduce(np.mean, dim="i"))

The problem seems to arise when I use pm.Deterministic inside the model. Can somebody explain why that makes a difference? I don't have the issue when I don't use pm.Deterministic, i.e. when I use a model like this:

with pm.Model(coords_mutable={"i": [0], "campaign_dim": np.arange(2)}) as model_generative:
    campaign = pm.Categorical("campaign", p=[0.5, 0.5], dims="i")

    alpha_y_campaign = pm.Normal("alpha_y_campaign", mu=[0, 10], sigma=1, dims="campaign_dim")
    y = pm.Normal("y", mu=alpha_y_campaign[campaign], dims="i")

ricardoV94 · November 15, 2023, 9:44pm

Probably the same issue reported here: "Posterior predictive doesn't resample intermediate Deterministics of intervened variables", Issue #6977, pymc-devs/pymc on GitHub.
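The behaviour named in that issue can be illustrated with a toy NumPy model. The assumption, based on the issue title, is that when only "y" is requested, the values of alpha_campaign stored in idata.posterior (computed from the observed campaign during pm.sample) are reused instead of being recomputed from the intervened campaign, so model_z0 and model_z1 draw y around the same means. predictive_y and its cached_alpha_campaign switch are hypothetical; this is not how PyMC is implemented internally.

```python
import numpy as np


def predictive_y(alpha_y_draws, campaign, seed, cached_alpha_campaign=None):
    """Toy forward sampler for y ~ Normal(alpha_y_campaign[campaign], 1).

    If cached_alpha_campaign is given, it is used as-is, mimicking a
    Deterministic whose stored trace values are reused. Otherwise the
    Deterministic is recomputed from the (intervened) campaign values.
    """
    rng = np.random.default_rng(seed)
    if cached_alpha_campaign is None:
        alpha_campaign = alpha_y_draws[..., campaign]
    else:
        alpha_campaign = cached_alpha_campaign
    return rng.normal(loc=alpha_campaign, scale=1.0)


rng = np.random.default_rng(1)
n = 100
observed_campaign = rng.integers(0, 2, size=n)
# Stand-in posterior draws of alpha_y_campaign, shape (chain, draw, 2)
alpha_y_draws = rng.normal(loc=[0.0, 10.0], scale=0.15, size=(4, 250, 2))
# Values stored for alpha_campaign while sampling: indexed by the OBSERVED campaign
trace_alpha_campaign = alpha_y_draws[..., observed_campaign]

z0 = np.zeros(n, dtype=int)
z1 = np.ones(n, dtype=int)

# Stale: the stored Deterministic ignores the intervention
stale = (
    predictive_y(alpha_y_draws, z1, seed=1, cached_alpha_campaign=trace_alpha_campaign).mean(-1)
    - predictive_y(alpha_y_draws, z0, seed=0, cached_alpha_campaign=trace_alpha_campaign).mean(-1)
)
# Fresh: the Deterministic is recomputed from the intervened campaign
fresh = (
    predictive_y(alpha_y_draws, z1, seed=1).mean(-1)
    - predictive_y(alpha_y_draws, z0, seed=0).mean(-1)
)
print(stale.mean(), fresh.mean())  # about 0 vs. about 10
```

In the model without pm.Deterministic there is no stored alpha_campaign to reuse, so the indexing is always rebuilt from the intervened campaign, which is consistent with that version of the model giving the expected effect.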
