Do operator on multiple variables in model

JAB · October 13, 2023, 5:32am

I have been playing a bit with the new do and observe operators. I am working on a problem where I essentially want to perform an A/B test. However, I believe that there are important variables that contribute to my outcome of interest in the form of a linear model. There is an imbalance of data in my two conditions, so my initial thoughts are to build a hierarchical model with the two conditions being levels in the hierarchy (where the levels are essentially treatment indicators).

Suppose I have some data

X = some data (N x d)
y = (N x 1)
level_idx = (N x 1) .. [0, 0, 1, 0, ...]

where X is a mix of continuous and categorical variables and y is a continuous variable, like revenue or points. What I really want to know is

P(y | X, do(level=1)) - P(y | X, do(level=0))

So if I define a model:

coords = {"obs_idx": np.arange(N),
          "level": [0,1],
          "feats": [feat_1, feat_2, feat_3, feat_4, ... , feat_d]}

with pm.Model(coords=coords) as model:
    
    beta0  = pm.Normal("beta0", mu=0, sigma=10)
    sigma0 = pm.HalfNormal("sigma0", 1)
    betas = pm.Normal("betas", mu=beta0, sigma=sigma0 , dims=["level", "feats"])
    
    mu  = pm.Deterministic("mu", (X * betas["level"]).sum(axis=-1), dims="obs_idx")
    err =  ...
    like = pm.Normal("like",mu=mu, sigma=err, observed=y, dims="obs_idx")

I understand how I could follow this blog post (although it uses the pm.observe method in lieu of the observed kwarg) and then use pm.do to intervene.

My questions are:

Is it fair/correct to intervene on the treatment if the treatment is a level in my hierarchical model? Should it just be a predictor, like the z variable in the referenced blog? Can I just pm.sample_posterior_predictive and then compute the ATE?
What I really want to know is the ATE when X takes certain values. So assume X is something like:

X = [a, b, c, d]
a = continuous RV
b = continuous RV
c = categorical [red, blue, green]
d = categorical [open, closed]

then I want to know:

P(y | a=12, b=42, c=red, d=open , do(level=1)) - P(y | a=12, b=42, c=red, d=open, do(level=0))

My initial thought was to intervene on level and then compute the ATE conditional on the other variables, but this feels like I am just comparing conditional distributions like in this example and missing the true causal impact.

I can’t really get into the specifics of the data, but I think this might be a good analogy. Lets say I am in charge of a concert tour that has a mix of acts at each location. I want to know if having fireworks increases ticket sales. Fireworks up until now are just something that are decided by the city. However, going forward we have the choice to do them or not. so we will want to test this for certain venues and certain acts. So I have something like

acts = [a, b, c, d, ....]
city = [city1, city2, city3, ...]
temperature = [72, 95, 86, ...]
humidity = [  ... ]
indoor = [0, 1, 0, 1, 1, ...]
fireworks = [1, 1, 0, 0, 1]

As part of my planning, I want to know the impact of fireworks given that I am in a certain city, the venue is indoors, and that a certain act is playing. However, the temperature/humidity are directly correlated with indoor/outdoor. So the data from before would be X = [acts.T, city.T, temperature.T, humidity.T, indoor.T]

I hope that example helps clarify my questions.

Thanks!

ricardoV94 · October 13, 2023, 8:07am

To answer only one of your questions, you can certainly intervene on any model variables, not just predictors.

JAB · October 13, 2023, 2:51pm

Awesome, thanks. I did try it with the blog post example and was able to compute an ATE with multiple variable interventions, but I wasn’t sure if that was valid from a causal influence perspective.

Is it possible to perform the intervention but with variables being sampled from a distribution? For example, in my example, there is a mix of categorical and continuous variables. I would fix the categorical variables, but when I do(temperature=t) or do(humidity=h), those might not be fixed values but instead sampled from some narrow distribution around a desired intervention value.

jessegrabowski · October 13, 2023, 3:33pm

You can swap out distributions in a predictive simulation to achieve what you describe. This is the general procedure:

Opening a new model context, with a new model name
For the RV you want to replace with a new distribution, give it a new name
For RVs whose posteriors you want to recycle, give them pm.Flat priors witht the same name
Do your model computations as normal
Call pm.sample_posterior_predictive, passing your old idata.

This article goes deeper into the details on how to do it, with nice examples.

JAB · October 13, 2023, 6:53pm

Awesome, thanks for point this out. I’ve glanced through that but now I read it in more detail. There are a few things that are still not clear to me:

It seems to me that if if I only want to swap out distributions for some of my RVs, then its probably best to explicitly write out the equation for my likelihood mean than write it in matrix form, right? e.g.,
instead of (X * betas["level"]).sum(axis=-1) I would likely be better to do something like

T*beta_T + H*beta_H + a*beta_A + ...

right?

I am still not clear if that example notebook completely applies. For example, I believe my model still stands as is. Therefore, all I need to do is swap out the data for the specific conditions I am interested in. So going back to my example, I defined

acts = [a, b, c, d, ....]
city = [city1, city2, city3, ...]
temperature = [72, 95, 86, ...]
humidity = [  ... ]
indoor = [0, 1, 0, 1, 1, ...]
fireworks = [1, 1, 0, 0, 1]

In my specific case, I have seen all of conditions in the data, just not necessarily the specific combinations. i.e, I have data for every city, act, indoor/outdoor/, and a wide range of temp/humidity.

Now I want to draw N samples but I want to fix things that I already know or believe to be true e.g.

acts        := a
city        := city3
humidity    := normal(50, 10) 
temperature := normal(72, 10)
indoor      := 0

and now I want to compute the ATE for do(fireworks=1) - do(fireworks=0)

Is it as simple as wrapping all my original data in pm.MutableData containers and then using pm.set_data?

Something like?

N=10_000
acts        := N*[a]
city        := N*[city3]
humidity    := pm.draw(pm.Normal.dist(50, 10), draws=N) 
temperature := pm.draw(pm.Normal.dist(72, 10), draws=N)
indoor      := np.zeros_like(temperature)

or what if I want temp/humidity to be draws from the original obervations of temp, humidity | city?

How does pm.set_data compare to pm.observe in the context of using the do operator?

Sorry for so many questions but I sincerely appreciate the assistance.

ricardoV94 · October 16, 2023, 9:11am

Yes: Distributional do-operator? - #2 by ricardoV94
We should add it to the docstrings.

pm.observe affects the logp graph, whereas do affects the generative graph. The former won’t have any meaningful effect when doing prior/posterior predictive sampling.

pm.do or pm.MutableData simply modify the underlying model. Sometimes what you want can be easily achieved by this, other times it’s too cumbersome.

The approach shown in that article is the most general possible: you manually write the new model and feed the inferred trace to it. You can do whatever do and set_data achieve and much more. It’s just a question of how complex your “intervention” / “conditional predictions” look like, which will determine the best approach.

JAB · October 17, 2023, 2:27pm

NOTE: I accidentally created a post above that was incomplete. Instead of editing I accidentally deleted.

Thanks for all your replies. I actually ran into this same issue when I was trying to use do and observe on an already sampled model. I’ve had success with pm.set_data for out of sample predictions. This helps me understand the difference to some degree, but I am not 100% sure.

I think I will write out an example notebook to test the different options. I will post here and may ask for some feedback on whether or not the approach is correct/valid.

Thanks to everyone so far!

Topic		Replies	Views
Hierarchical model, categorical outcome Questions	0	537	September 15, 2020
Multilevel model: how to model outcome at the group level Questions	3	946	December 28, 2018
Distributional do-operator? v5 modeling	1	492	September 16, 2023
Incorporating multiple variables in Hierarchial model v5 hierarchical	0	482	December 30, 2022
Hierarchical Model with Multiple Predictors	5	1664	August 15, 2023

Do operator on multiple variables in model

Related topics