I have been playing a bit with the new do and observe operators. I am working on a problem where I essentially want to perform an A/B test. However, I believe there are important variables that contribute to my outcome of interest in the form of a linear model. There is an imbalance of data between my two conditions, so my initial thought is to build a hierarchical model with the two conditions as levels in the hierarchy (where the levels are essentially treatment indicators).
Suppose I have some data:

X = some data (N x d)
y = (N x 1)
level_idx = (N x 1), e.g. [0, 0, 1, 0, ...]
where X is a mix of continuous and categorical variables and y is a continuous variable, like revenue or points. What I really want to know is
P(y | X, do(level=1)) - P(y | X, do(level=0))
So if I define a model:
coords = {
    "obs_idx": np.arange(N),
    "level": [0, 1],
    "feats": [feat_1, feat_2, feat_3, feat_4, ..., feat_d],
}

with pm.Model(coords=coords) as model:
    beta0 = pm.Normal("beta0", mu=0, sigma=10)
    sigma0 = pm.HalfNormal("sigma0", 1)
    betas = pm.Normal("betas", mu=beta0, sigma=sigma0, dims=["level", "feats"])
    # index by the per-observation integer array, not the dim name:
    # betas[level_idx] has shape (N, d)
    mu = pm.Deterministic("mu", (X * betas[level_idx]).sum(axis=1), dims="obs_idx")
    err = pm.HalfNormal("err", 1)  # e.g.; any positive prior on the noise scale
    like = pm.Normal("like", mu=mu, sigma=err, observed=y, dims="obs_idx")
I understand how I could follow this blog post (although it uses the pm.observe method in lieu of the observed kwarg) and then use pm.do to intervene.
My questions are:

1. Is it fair/correct to intervene on the treatment if the treatment is a level in my hierarchical model? Should it just be a predictor, like the z variable in the referenced blog post?
2. Can I just use pm.sample_posterior_predictive and then compute the ATE?
What I really want to know is the ATE when X takes certain values. So assume X is something like:
X = [a, b, c, d]
a = continuous RV
b = continuous RV
c = categorical [red, blue, green]
d = categorical [open, closed]
then I want to know:
P(y | a=12, b=42, c=red, d=open, do(level=1)) - P(y | a=12, b=42, c=red, d=open, do(level=0))
My initial thought was to intervene on level and then compute the ATE conditional on the other variables, but this feels like I am just comparing conditional distributions, as in this example, and missing the true causal impact.
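For the conditional (covariate-specific) effect, one way to sketch the computation: encode the row a=12, b=42, c=red, d=open and push it through each level's coefficient draws, draw by draw. Everything here is a stand-in — the posterior arrays are random placeholders for what you would actually pull out of idata.posterior["betas"], and the column ordering of the one-hot encoding is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, d = 1000, 7  # hypothetical: 7 columns after one-hot encoding c and d

# Stand-ins for posterior draws of betas[0] and betas[1]; in practice these
# come from idata.posterior["betas"] (stacked over chain and draw).
betas_l0 = rng.normal(size=(n_draws, d))
betas_l1 = rng.normal(loc=0.3, size=(n_draws, d))

# Design row for a=12, b=42, c=red, d=open.
# Hypothetical column order: [a, b, c_red, c_blue, c_green, d_open, d_closed]
x_row = np.array([12.0, 42.0, 1.0, 0.0, 0.0, 1.0, 0.0])

# Per-draw difference in expected outcome; this array is the posterior
# distribution of the conditional treatment effect at that covariate row.
cate_draws = betas_l1 @ x_row - betas_l0 @ x_row
lo, hi = np.percentile(cate_draws, [2.5, 97.5])
print(f"CATE mean {cate_draws.mean():.2f}, 95% interval ({lo:.2f}, {hi:.2f})")
```

Because you keep the whole cate_draws array rather than a point estimate, you get a full posterior for the effect at those covariate values, not just a mean difference.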
I can't really get into the specifics of the data, but I think this might be a good analogy. Let's say I am in charge of a concert tour that has a mix of acts at each location. I want to know whether having fireworks increases ticket sales. Up until now, fireworks have just been something decided by the city. However, going forward we have the choice to do them or not, so we will want to test this for certain venues and certain acts. So I have something like
acts = [a, b, c, d, ....]
city = [city1, city2, city3, ...]
temperature = [72, 95, 86, ...]
humidity = [ ... ]
indoor = [0, 1, 0, 1, 1, ...]
fireworks = [1, 1, 0, 0, 1]
As part of my planning, I want to know the impact of fireworks given that I am in a certain city, the venue is indoors, and a certain act is playing. However, temperature and humidity are directly correlated with indoor/outdoor. So, in terms of the data from before, X = [acts.T, city.T, temperature.T, humidity.T, indoor.T]
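For completeness, here is one way to assemble that mixed continuous/categorical X while keeping the treatment out of the covariates. The rows are made up, and the column names produced by pandas get_dummies are just its defaults:

```python
import pandas as pd

# Hypothetical rows of the concert-tour data
df = pd.DataFrame({
    "act":         ["a", "b", "a", "c", "b"],
    "city":        ["city1", "city2", "city1", "city3", "city2"],
    "temperature": [72, 95, 86, 78, 90],
    "humidity":    [0.30, 0.70, 0.50, 0.40, 0.60],
    "indoor":      [0, 1, 0, 1, 1],
    "fireworks":   [1, 1, 0, 0, 1],
})

# Keep fireworks out of X: it is the treatment, not a covariate.
X = pd.get_dummies(df.drop(columns="fireworks"),
                   columns=["act", "city"], dtype=float)
treatment = df["fireworks"].to_numpy()
print(list(X.columns))
```

Note that temperature and humidity stay in X even though they correlate with indoor/outdoor; whether they should (or whether they are colliders/mediators for the fireworks decision) is exactly the causal-graph question, not something the encoding decides for you.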
I hope that example helps clarify my questions.
Thanks!