ModelBuilder with two output variables possible?

Hey guys, I’m pretty new to probabilistic modelling and I’ve hit a wall. I’m trying to use ModelBuilder to implement a model I got from my professor, which predicts football goals for both the home team and the away team. The model ends in two Poisson observations.

[...]
# observed
pm.Poisson("home_goals", observed=y_data_home, mu=mu_home, dims="match")
pm.Poisson("away_goals", observed=y_data_away, mu=mu_away, dims="match")

After sampling, I try to predict home_goals as well as away_goals on unseen data using the predict_posterior function from the ModelBuilder class. Unfortunately, the function expects me to define a single output_var and checks whether it is present in the posterior predictive samples:

if self.output_var not in posterior_predictive_samples:
    raise KeyError(
        f"Output variable {self.output_var} not found in posterior predictive samples."
    )

Both predictions are present in my xarray output, i.e.:

<xarray.DataArray 'home_goals' (chain: 2, draw: 2000, match: 1)> Size: 32kB
array([[[1],
        ...,
       [[2],
        ...,
        [1]]], dtype=int64)
Coordinates:
  * chain    (chain) int32 8B 0 1
  * draw     (draw) int32 8kB 0 1 2 3 4 5 6 ... 1994 1995 1996 1997 1998 1999
  * match    (match) int32 4B 51


<xarray.DataArray 'away_goals' (chain: 2, draw: 2000, match: 1)> Size: 32kB
array([[[1],
        ...,
        [4]],
       [[1],
        ...,
        [0]]], dtype=int64)
Coordinates:
  * chain    (chain) int32 8B 0 1
  * draw     (draw) int32 8kB 0 1 2 3 4 5 6 ... 1994 1995 1996 1997 1998 1999
  * match    (match) int32 4B 51

My problem is that I can’t return both predictions: I can’t set output_var to a list or a tuple of both names, because the if statement only checks for a single key. Is there a way to still get both predictions, or am I misunderstanding what is going on here? Glad for any help.

tyvm

References: ModelBuilder

source code to output_var check

I don’t think it’s currently possible, but is there any reason you need ModelBuilder? The predict method is just setting the data containers and calling sample_posterior_predictive.

Good question. I’m trying to build a reproducible pipeline with kedro and mlflow. My initial plan was to save the model, and ModelBuilder seemed to be the only way to do so. Unfortunately I couldn’t integrate it properly, so at the moment I’m just tracking the pipeline and metrics without saving the model. So I might not need it anymore, if I don’t save and load the model anyway?

You could try using a single likelihood with an index for home and away:

pm.Poisson("goals", observed=y_data, mu=mu[home_away_idx])

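where home_away_idx is a 0/1 integer array indicating home or away for each observation and mu is a vector of size 2. (Use integers rather than booleans here, since a boolean array used as an index is treated as a mask.)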
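A small NumPy illustration of the indexing idea, with made-up rates (the values in mu are hypothetical and only there to show the shapes). The data is assumed to be in long format, one row per match and side:

```python
import numpy as np

# Hypothetical expected goals: index 0 = home, index 1 = away.
mu = np.array([1.6, 1.1])

# Long format: one entry per (match, side); 0 = home, 1 = away.
home_away_idx = np.array([0, 1, 0, 1, 0, 1])

# Integer indexing picks one rate per observation.
mu_per_obs = mu[home_away_idx]

# Starting from a boolean flag, either cast it to int or select explicitly.
is_away = home_away_idx.astype(bool)
mu_from_flag = np.where(is_away, mu[1], mu[0])
```

Both mu_per_obs and mu_from_flag have one entry per observation, which is the shape a single Poisson likelihood over all goals needs.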


Thank you very much, the single likelihood seems to work fine with dims/an index as you mentioned. I just had to make some adjustments to my data containers. In the end it looked something like this, with output_var returning “goals”:

y_data = pm.Data(
    "y_data",
    self.y,
    dims=["match", "y"],
)

[...]
pm.Poisson("goals", observed=y_data, mu=mu, dims=["match", "y"])
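For anyone reading later, the data adjustment presumably amounts to stacking the two goal columns into one (match, y) array and splitting the predictions again afterwards. A minimal NumPy sketch, assuming column 0 is home and column 1 is away (that ordering and the fake draws are my assumptions, not taken from the post):

```python
import numpy as np

# Hypothetical observed goals for three matches.
y_home = np.array([2, 0, 1])
y_away = np.array([1, 1, 3])

# Wide format with dims ("match", "y"): column 0 = home, column 1 = away.
y = np.column_stack([y_home, y_away])

# Stand-in for posterior predictive draws of "goals",
# with dims (chain, draw, match, y).
rng = np.random.default_rng(0)
goals_draws = rng.poisson(lam=1.4, size=(2, 100, 3, 2))

# Split the last axis back into home and away predictions.
home_draws = goals_draws[..., 0]
away_draws = goals_draws[..., 1]
```

With the real predictions the same split can be done by selecting along the "y" dimension of the xarray result.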