Error modelling Collider -- -- bayesian Regression

Hello I am trying to model a collider process from some pre generated data but when i try to subtract mu from the observed dist i get this error.

# collider 

# generating data
np.random.seed(2025)
beta1 = 3
beta2 = 2
x = np.random.normal(loc=10, scale=3, size=100)
y = np.random.normal(loc=2, scale=3, size=100)
z = x*beta1 + y*beta2 + np.random.normal(loc=0, scale=0.1, size=100)
data = pd.DataFrame({'x':x,'z': z,'y': y})

#WRONG MODEL INCLUDING COLLIDER

with pm.Model() as include_collider_model:
	x = pm.Data('x',data['x'].values)
	z = pm.Data('z',data['z'].values)
	y = pm.Data('y',data['y'].values)

	alpha = pm.Normal('intercept',0,1)
	beta_x = pm.Normal('slope_x',0,1)
	beta_z = pm.Normal('slope_z',0,1)
	sigma = pm.Exponential('error',lam=1)

	mu = alpha + beta_x*x + beta_z*z
	y_hat = pm.Normal('y_hat',0,sigma,observed=y-mu)
	trace_include_collider = pm.sample(1000)
# Figure 11

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [147], in <cell line: 3>()
     11 sigma = pm.Exponential('error',lam=1)
     13 mu = alpha + beta_x*x + beta_z*z
---> 14 y_hat = pm.Normal('y_hat',0,sigma,observed=y-mu)
     15 trace_include_collider = pm.sample(500,tune=1500,chains=2, cores=2, target_accept=0.95)

File ~\.conda\envs\pymc_env\lib\site-packages\pymc\distributions\distribution.py:271, in Distribution.__new__(cls, name, rng, dims, initval, observed, total_size, transform, *args, **kwargs)
    267 if resize_shape:
    268     # A batch size was specified through `dims`, or implied by `observed`.
    269     rv_out = change_rv_size(rv=rv_out, new_size=resize_shape, expand=True)
--> 271 rv_out = model.register_rv(
    272     rv_out,
    273     name,
    274     observed,
    275     total_size,
    276     dims=dims,
    277     transform=transform,
    278     initval=initval,
    279 )
    281 # add in pretty-printing support
    282 rv_out.str_repr = types.MethodType(str_for_dist, rv_out)

File ~\.conda\envs\pymc_env\lib\site-packages\pymc\model.py:1372, in Model.register_rv(self, rv_var, name, data, total_size, dims, transform, initval)
   1366 else:
   1367     if (
   1368         isinstance(data, Variable)
   1369         and not isinstance(data, (GenTensorVariable, Minibatch))
   1370         and data.owner is not None
   1371     ):
-> 1372         raise TypeError(
   1373             "Variables that depend on other nodes cannot be used for observed data."
   1374             f"The data variable was: {data}"
   1375         )
   1377     # `rv_var` is potentially changed by `make_obs_var`,
   1378     # for example into a new graph for imputation of missing data.
   1379     rv_var = self.make_obs_var(rv_var, data, dims, transform)

TypeError: Variables that depend on other nodes cannot be used for observed data.The data variable was: Elemwise{sub,no_inplace}.0

Is there an alternative approach to use in pymc4? I have made this work in pymc3 b4.

Thanks!!!

-Blaise

for reference the following model should exclude the confounding collider but I get the same error

with pm.Model() as exclude_collider_model:
    x = pm.Data('x', np.array(data['x'].values,dtype=float))
    y = pm.Data('y', np.array(data['y'].values,dtype=float))
    # y= 
    alpha = pm.Normal('intercept',0,1)
    beta_x = pm.Normal('slope_x',0,1)
    sigma = pm.Exponential('error',lam=1)
    
    mu = alpha + beta_x*x
    y_hat = pm.Normal('y_hat',0,sigma,observed=y-mu)
    trace_exclude_collider = pm.sample(1000)

I’m looking at this quickly, but I believe that you can only provide data into the observed argument whereas mu has some tensor variables and whatnot in there. Perhaps try the following for y_hat:

y_hat = pm.Normal("y_hat", mu, sigma, observed=y)

Let me know if this helps (or not)!

1 Like

hmm, in pymc3 we could subtract tensor variables from the observed to isolate the cause of x on y given x has some relationship to an unobserved Z the mu is supposed to control for this . I have ran with just the data in y but im unsure if this is mathematically sound for controlling the confounder.

Welcome!

I am also a bit confused about what you are trying to do. Your likelihood suggests that you are “observing” the difference between y and mu and that you expect this difference to be zero on average (y_hat has a mean of zero). I’m not sure how this interacts with your collider example, but as @larryshamalama suggests, conventional linear models generally flip the specification:

mu = alpha + beta_x*x
y_hat = pm.Normal('y_hat', mu, sigma, observed=y)
3 Likes

ok cool thanks for the explanation ! that makes sense !