Do operator on multiple variables in model

I have been playing a bit with the new do and observe operators. I am working on a problem where I essentially want to perform an A/B test. However, I believe that there are important variables that contribute to my outcome of interest in the form of a linear model. There is an imbalance of data in my two conditions, so my initial thoughts are to build a hierarchical model with the two conditions being levels in the hierarchy (where the levels are essentially treatment indicators).

Suppose I have some data

X = some data (N x d)
y = (N x 1)
level_idx = (N x 1) .. [0, 0, 1, 0, ...]

where X is a mix of continuous and categorical variables and y is a continuous variable, like revenue or points. What I really want to know is

P(y | X, do(level=1)) - P(y | X, do(level=0))

So if I define a model:

coords = {"obs_idx": np.arange(N),
          "level": [0,1],
          "feats": [feat_1, feat_2, feat_3, feat_4, ... , feat_d]}

with pm.Model(coords=coords) as model:
    beta0  = pm.Normal("beta0", mu=0, sigma=10)
    sigma0 = pm.HalfNormal("sigma0", 1)
    betas = pm.Normal("betas", mu=beta0, sigma=sigma0 , dims=["level", "feats"])
    mu  = pm.Deterministic("mu", (X * betas["level"]).sum(axis=-1), dims="obs_idx")
    err =  ...
    like = pm.Normal("like",mu=mu, sigma=err, observed=y, dims="obs_idx")

I understand how I could follow this blog post (although it uses the pm.observe method in lieu of the observed kwarg) and then use to intervene.

My questions are:

  1. Is it fair/correct to intervene on the treatment if the treatment is a level in my hierarchical model? Should it just be a predictor, like the z variable in the referenced blog? Can I just pm.sample_posterior_predictive and then compute the ATE?

  2. What I really want to know is the ATE when X takes certain values. So assume X is something like:

X = [a, b, c, d]
a = continuous RV
b = continuous RV
c = categorical [red, blue, green]
d = categorical [open, closed]

then I want to know:

P(y | a=12, b=42, c=red, d=open , do(level=1)) - P(y | a=12, b=42, c=red, d=open, do(level=0))

My initial thought was to intervene on level and then compute the ATE conditional on the other variables, but this feels like I am just comparing conditional distributions like in this example and missing the true causal impact.

I can’t really get into the specifics of the data, but I think this might be a good analogy. Lets say I am in charge of a concert tour that has a mix of acts at each location. I want to know if having fireworks increases ticket sales. Fireworks up until now are just something that are decided by the city. However, going forward we have the choice to do them or not. so we will want to test this for certain venues and certain acts. So I have something like

acts = [a, b, c, d, ....]
city = [city1, city2, city3, ...]
temperature = [72, 95, 86, ...]
humidity = [  ... ]
indoor = [0, 1, 0, 1, 1, ...]
fireworks = [1, 1, 0, 0, 1]

As part of my planning, I want to know the impact of fireworks given that I am in a certain city, the venue is indoors, and that a certain act is playing. However, the temperature/humidity are directly correlated with indoor/outdoor. So the data from before would be X = [acts.T, city.T, temperature.T, humidity.T, indoor.T]

I hope that example helps clarify my questions.


To answer only one of your questions, you can certainly intervene on any model variables, not just predictors.

1 Like

Awesome, thanks. I did try it with the blog post example and was able to compute an ATE with multiple variable interventions, but I wasn’t sure if that was valid from a causal influence perspective.

Is it possible to perform the intervention but with variables being sampled from a distribution? For example, in my example, there is a mix of categorical and continuous variables. I would fix the categorical variables, but when I do(temperature=t) or do(humidity=h), those might not be fixed values but instead sampled from some narrow distribution around a desired intervention value.

You can swap out distributions in a predictive simulation to achieve what you describe. This is the general procedure:

  1. Opening a new model context, with a new model name
  2. For the RV you want to replace with a new distribution, give it a new name
  3. For RVs whose posteriors you want to recycle, give them pm.Flat priors witht the same name
  4. Do your model computations as normal
  5. Call pm.sample_posterior_predictive, passing your old idata.

This article goes deeper into the details on how to do it, with nice examples.


Awesome, thanks for point this out. I’ve glanced through that but now I read it in more detail. There are a few things that are still not clear to me:

  1. It seems to me that if if I only want to swap out distributions for some of my RVs, then its probably best to explicitly write out the equation for my likelihood mean than write it in matrix form, right? e.g.,
    instead of (X * betas["level"]).sum(axis=-1) I would likely be better to do something like
T*beta_T + H*beta_H + a*beta_A + ... 


  1. I am still not clear if that example notebook completely applies. For example, I believe my model still stands as is. Therefore, all I need to do is swap out the data for the specific conditions I am interested in. So going back to my example, I defined
acts = [a, b, c, d, ....]
city = [city1, city2, city3, ...]
temperature = [72, 95, 86, ...]
humidity = [  ... ]
indoor = [0, 1, 0, 1, 1, ...]
fireworks = [1, 1, 0, 0, 1]

In my specific case, I have seen all of conditions in the data, just not necessarily the specific combinations. i.e, I have data for every city, act, indoor/outdoor/, and a wide range of temp/humidity.

Now I want to draw N samples but I want to fix things that I already know or believe to be true e.g.

acts        := a
city        := city3
humidity    := normal(50, 10) 
temperature := normal(72, 10)
indoor      := 0

and now I want to compute the ATE for do(fireworks=1) - do(fireworks=0)

Is it as simple as wrapping all my original data in pm.MutableData containers and then using pm.set_data?

Something like?

acts        := N*[a]
city        := N*[city3]
humidity    := pm.draw(pm.Normal.dist(50, 10), draws=N) 
temperature := pm.draw(pm.Normal.dist(72, 10), draws=N)
indoor      := np.zeros_like(temperature)

or what if I want temp/humidity to be draws from the original obervations of temp, humidity | city?

How does pm.set_data compare to pm.observe in the context of using the do operator?

Sorry for so many questions but I sincerely appreciate the assistance.

Yes: Distributional do-operator? - #2 by ricardoV94
We should add it to the docstrings.

pm.observe affects the logp graph, whereas do affects the generative graph. The former won’t have any meaningful effect when doing prior/posterior predictive sampling. or pm.MutableData simply modify the underlying model. Sometimes what you want can be easily achieved by this, other times it’s too cumbersome.

The approach shown in that article is the most general possible: you manually write the new model and feed the inferred trace to it. You can do whatever do and set_data achieve and much more. It’s just a question of how complex your “intervention” / “conditional predictions” look like, which will determine the best approach.


NOTE: I accidentally created a post above that was incomplete. Instead of editing I accidentally deleted.

Thanks for all your replies. I actually ran into this same issue when I was trying to use do and observe on an already sampled model. I’ve had success with pm.set_data for out of sample predictions. This helps me understand the difference to some degree, but I am not 100% sure.

I think I will write out an example notebook to test the different options. I will post here and may ask for some feedback on whether or not the approach is correct/valid.

Thanks to everyone so far!