ModelBuilder with 2 outputvars possible?

bf-malefiz · January 15, 2025, 5:01pm

Hey guys, I’m pretty new to probabilistic modelling and I’ve hit a wall. I’m trying to use the ModelBuilder to implement a model I got from my professor that predicts football goals for the home team and for the away team. The model ends in two Poisson observations.

[...]
# observed
pm.Poisson("home_goals", observed=y_data_home, mu=mu_home, dims="match")
pm.Poisson("away_goals", observed=y_data_away, mu=mu_away, dims="match")

After sampling, I try to predict home_goals as well as away_goals on unseen data using the predict_posterior function from the ModelBuilder class. Unfortunately, the function expects me to define a single output_var and checks whether it is in the posterior predictive samples:

if self.output_var not in posterior_predictive_samples:
    raise KeyError(
        f"Output variable {self.output_var} not found in posterior predictive samples."
    )

Both predictions are present in my xarray output, i.e.:

<xarray.DataArray 'home_goals' (chain: 2, draw: 2000, match: 1)> Size: 32kB
array([[[1],
        ...,
       [[2],
        ...,
        [1]]], dtype=int64)
Coordinates:
  * chain    (chain) int32 8B 0 1
  * draw     (draw) int32 8kB 0 1 2 3 4 5 6 ... 1994 1995 1996 1997 1998 1999
  * match    (match) int32 4B 51


<xarray.DataArray 'away_goals' (chain: 2, draw: 2000, match: 1)> Size: 32kB
array([[[1],
        ...,
        [4]],
       [[1],
        ...,
        [0]]], dtype=int64)
Coordinates:
  * chain    (chain) int32 8B 0 1
  * draw     (draw) int32 8kB 0 1 2 3 4 5 6 ... 1994 1995 1996 1997 1998 1999
  * match    (match) int32 4B 51

My problem is that I can’t return both predictions: I can’t set output_var to a list or a tuple of both, because the if statement only checks for a single key. Is there a way to still get both predictions, or am I misunderstanding what is going on here? Glad for any help.

tyvm

References: ModelBuilder, source code of the output_var check

ricardoV94 · January 16, 2025, 7:29am

I don’t think it’s currently possible, but is there any reason you need ModelBuilder? The predict method just sets the data containers and calls sample_posterior_predictive.

bf-malefiz · January 16, 2025, 8:19am

Good question. I’m trying to build a reproducible pipeline with kedro and mlflow. My initial plan was to save the model, and ModelBuilder seemed to be the only way to do so. Unfortunately I couldn’t integrate it properly, so I’m just tracking the pipeline and metrics atm, without saving the model. So I might not need it anymore if I don’t save and load it anyway?

fonnesbeck · January 16, 2025, 4:58pm

You could try using a single likelihood with an index for home and away:

pm.Poisson("goals", observed=y_data, mu=mu[home_away_idx])

where home_away_idx is an array of 0/1 integer indices indicating home or away (a boolean array would be treated as a mask rather than an index) and mu is a vector of size 2.
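As a rough NumPy-only illustration of the indexing idea (the goal counts and mu values are made up), the home and away goals can be stacked into one long vector, with a 0/1 integer index picking the matching rate for each row. An integer index is used here because NumPy treats a boolean array as a mask when indexing:

```python
import numpy as np

# Hypothetical goal counts for three matches.
home_goals = np.array([2, 0, 1])
away_goals = np.array([1, 1, 3])

# One long vector of observations: home rows first, then away rows.
y_data = np.concatenate([home_goals, away_goals])

# 0 = home, 1 = away, one entry per row of y_data.
home_away_idx = np.concatenate(
    [np.zeros(len(home_goals), dtype=int), np.ones(len(away_goals), dtype=int)]
)

# Hypothetical rates: mu[0] for home, mu[1] for away.
mu = np.array([1.6, 1.2])

# Integer indexing expands mu to one rate per observation.
mu_per_obs = mu[home_away_idx]
```

In the model, mu_per_obs plays the role of mu[home_away_idx], so a single Poisson likelihood covers both kinds of goals.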

bf-malefiz · January 20, 2025, 8:56pm

Thank you very much, the likelihood seems to work fine with dims/index as you mentioned. I just had to make some adjustments to my data containers. In the end it looked something like this, with output_var returning “goals”:

y_data = pm.Data(
    "y_data",
    self.y,
    dims=["match", "y"],
)

[...]
pm.Poisson("goals", observed=y_data, mu=mu, dims=["match", "y"])
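To make the two-dimensional layout concrete, here is a small NumPy sketch of how the data might be arranged and how the two predictions could be split apart again. It assumes that column 0 of the "y" dimension holds home goals and column 1 holds away goals, which the post does not state, and it uses random Poisson draws as a stand-in for real posterior predictive samples:

```python
import numpy as np

# Hypothetical goal counts for three matches.
home_goals = np.array([2, 0, 1])
away_goals = np.array([1, 1, 3])

# One row per match, one column per side: shape (match, y) = (3, 2).
y = np.column_stack([home_goals, away_goals])

# Stand-in for posterior predictive draws of "goals" with dims
# (chain, draw, match, y); real draws would come from the sampled model.
rng = np.random.default_rng(0)
n_chains, n_draws = 2, 5
fake_draws = rng.poisson(lam=[1.6, 1.2], size=(n_chains, n_draws, y.shape[0], 2))

# Split the single output variable back into the two predictions.
home_pred = fake_draws[..., 0]  # shape (chain, draw, match)
away_pred = fake_draws[..., 1]
```

With named coordinates on the "y" dimension, the same split could be done by label on the xarray output instead of by position.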
