Hi there, I am trying to code essentially perform regression on x for multiple outputs, y1, y2, y3… to arbitrarily large numbers of predictors. My data could look something like this for two targets
. This data is called dummy_data. I would like the regression coefficients to come from a population represented by a hyperprior. I would like a dimension in the Inference data object predictor index and observation index, which I encode as:
coords = {"obs_id":np.arange(1000), "target":np.arange(2)}
I can successfully program model to do what I need as:
with pm.Model(coords=coords) as multi_target_regression:
obs_idx = pm.Data("obs_idx", dummy_data.index.values, dims="obs_id" )
target_idx = pm.Data("target_idx", np.arange(2), dims="target")
x = pm.Data("x", dummy_data.x, dims="obs_id")
b_eff_rate = pm.HalfStudentT('b_eff_rate', sigma=1, nu=4)
b_eff = pm.Exponential("b_eff",1/b_eff_rate, dims="target")
lamd1 = pm.Deterministic("lamd1", x[obs_idx] * b_eff[0], dims=["obs_id"])
y1 = pm.Poisson("y1", mu= lamd1[obs_idx], observed=dummy_data.y1 , dims=["obs_id"] )
lamd2 = pm.Deterministic("lamd2", x[obs_idx] * b_eff[1], dims=["obs_id"])
y2 = pm.Poisson("y2", mu= lamd2[obs_idx], observed=dummy_data.y2 , dims=["obs_id"] )
The visual of the Bayesian network looks something like this:
This becomes an unwieldy way to code the model when I want to run from y1 to yn where say n = 100, instead of just up to y2 like in this model.
I attempted the following to be cleaner:
with pm.Model(coords=coords) as cell_regression:
obs_idx = pm.Data("obs_idx", dummy_data.index.values, dims="obs_id" )
target_idx = pm.Data("target_idx", np.arange(2), dims="target")
x = pm.Data("x", dummy_data.x, dims="obs_id")
b_eff_rate = pm.HalfStudentT('b_eff_rate', sigma=1, nu=4)
b_eff = pm.Exponential("b_eff",1/b_eff_rate, dims="target")
lamd = pm.Deterministic("lamd", x[obs_idx] * b_eff[target_idx], dims=["cell"])
y = pm.Poisson("exp", mu= lamd[obs_idx,target_idx], observed=dummy_data[["y1","y2"]].values , dims=["obs_id","target"] )
I get the following error:
Input dimension mis-match. (input[0].shape[0] = 1000, input[1].shape[0] = 2)
I could probably get this to work by unravelling the data and giving an indicator about which regression coefficient to use, but this seemed really inelegant and unreadable. Is there a clean/readable/intuitive way to perform multi-target regression which exploits the dimensions in the way I am trying above?