Elegant way to code multi target/output regression with Inference data dims?

Hi there, I am trying to code essentially perform regression on x for multiple outputs, y1, y2, y3… to arbitrarily large numbers of predictors. My data could look something like this for two targets
image. This data is called dummy_data. I would like the regression coefficients to come from a population represented by a hyperprior. I would like a dimension in the Inference data object predictor index and observation index, which I encode as:

coords = {"obs_id":np.arange(1000), "target":np.arange(2)}

I can successfully program model to do what I need as:

with pm.Model(coords=coords) as multi_target_regression:
    obs_idx = pm.Data("obs_idx", dummy_data.index.values, dims="obs_id" )
    target_idx = pm.Data("target_idx", np.arange(2), dims="target")
    x = pm.Data("x", dummy_data.x, dims="obs_id")
    b_eff_rate = pm.HalfStudentT('b_eff_rate', sigma=1, nu=4)
    b_eff = pm.Exponential("b_eff",1/b_eff_rate, dims="target")
    lamd1 = pm.Deterministic("lamd1", x[obs_idx] * b_eff[0], dims=["obs_id"])
    y1 = pm.Poisson("y1", mu= lamd1[obs_idx], observed=dummy_data.y1 , dims=["obs_id"] )
    lamd2 = pm.Deterministic("lamd2", x[obs_idx] * b_eff[1], dims=["obs_id"])
    y2 = pm.Poisson("y2", mu= lamd2[obs_idx], observed=dummy_data.y2 , dims=["obs_id"] )

The visual of the Bayesian network looks something like this:

This becomes an unwieldy way to code the model when I want to run from y1 to yn where say n = 100, instead of just up to y2 like in this model.

I attempted the following to be cleaner:

with pm.Model(coords=coords) as cell_regression:
    obs_idx = pm.Data("obs_idx", dummy_data.index.values, dims="obs_id" )
    target_idx = pm.Data("target_idx", np.arange(2), dims="target")
    x = pm.Data("x", dummy_data.x, dims="obs_id")
    b_eff_rate = pm.HalfStudentT('b_eff_rate', sigma=1, nu=4)
    b_eff = pm.Exponential("b_eff",1/b_eff_rate, dims="target")
    lamd = pm.Deterministic("lamd", x[obs_idx] * b_eff[target_idx], dims=["cell"])
    y = pm.Poisson("exp", mu= lamd[obs_idx,target_idx], observed=dummy_data[["y1","y2"]].values , dims=["obs_id","target"] )

I get the following error:
Input dimension mis-match. (input[0].shape[0] = 1000, input[1].shape[0] = 2)

I could probably get this to work by unravelling the data and giving an indicator about which regression coefficient to use, but this seemed really inelegant and unreadable. Is there a clean/readable/intuitive way to perform multi-target regression which exploits the dimensions in the way I am trying above?

So the first model works, but the problem is it get’s tricky to write when you have many outputs? How about adding a loop in your model? This isn’t a full minimum working example, so I can’t test it, but something along the lines of replacing the last 4 lines in your model definition with something like this…

for i in range(100):
    lamd = pm.Deterministic(f"lamd{i}", x[obs_idx] * b_eff[i], dims=["obs_id"])
    pm.Poisson(f"y{i}", mu=lamd[obs_idx], observed=dummy_data[f"y{i}"] , dims=["obs_id"] )

PS. I’m not convinced you need any of the [obs_idx] stuff. If you’ve got a sensible index in your data frame I don’t think you need it.

Thank’s for this. The for loop solution works quite well and does more or less what I was looking for :slight_smile:

1 Like