Rewriting Likelihood with DensityDist() outputs MissingInputError

As a first project in PyMC3, I am starting from this example: A Hierarchical model for Rugby prediction — PyMC3 3.11.2 documentation In this moment, I am trying to re-write the likelihood using DensityDist(), but I must be missing something. The model seems to work fine, but it fails at the very last step. This is the code to read the data:

import pandas as pd
import pymc3 as pm

df_all = pd.read_csv(pm.get_data("rugby.csv"), index_col=0)
df = df_all[["home_team", "away_team", "home_score", "away_score"]]
teams = df.home_team.unique()
teams = pd.DataFrame(teams, columns=["team"])
teams["i"] = teams.index

df = pd.merge(df, teams, left_on="home_team", right_on="team", how="left")
df = df.rename(columns={"i": "i_home"}).drop("team", 1)
df = pd.merge(df, teams, left_on="away_team", right_on="team", how="left")
df = df.rename(columns={"i": "i_away"}).drop("team", 1)

observed_home_goals = df.home_score.values
observed_away_goals = df.away_score.values

home_team = df.i_home.values
away_team = df.i_away.values

num_teams = len(df.i_home.drop_duplicates())
num_games = len(home_team)

This is the model:

with pm.Model() as model:
    # global model parameters
    home = pm.Flat("home")
    sd_att = pm.HalfStudentT("sd_att", nu=3, sigma=2.5)
    sd_def = pm.HalfStudentT("sd_def", nu=3, sigma=2.5)

    # team-specific model parameters
    atts_star = pm.Normal("atts_star", mu=0, sigma=sd_att, shape=num_teams-1)
    atts_last = -tt.sum(atts_star, keepdims=True) + 1
    atts = pm.Deterministic('atts', pm.math.concatenate([atts_star, atts_last], axis=0))
    defs = pm.Normal("defs_star", mu=0, sigma=sd_def, shape=num_teams)
    
    # likelihood of observed data
    mu_home = tt.exp( home + atts[home_team] + defs[away_team] )
    mu_away = tt.exp( atts[away_team] + defs[home_team] )

                                                         
    # Original likelihood                                       
    #  home_points = pm.Poisson("home_points", mu=mu_home, observed=observed_home_goals)
    #  away_points = pm.Poisson("away_points", mu=mu_away, observed=observed_away_goals)


    # Ind. Poisson
    def logp(home_goal, away_goal, mu_home, mu_away):
        base = home_goal * pm.math.log(mu_home) - mu_home + away_goal * pm.math.log(
            mu_away) - mu_away - pm.distributions.dist_math.factln(home_goal) - pm.distributions.dist_math.factln(
            away_goal)
        return base.sum()

    # Re-written likelihood
    res = pm.DensityDist('res', logp=logp, observed={'home_goal':observed_home_goals, 'away_goal':observed_home_goals,
                                                    'mu_home': mu_home, 'mu_away':mu_away})
                                                         
    trace = pm.sample(4000, chains=3, tune=1000, return_inferencedata=True, target_accept=0.85, random_seed=123456)

Everything seems to work when I use the original likelihood (see the commented lines). However, when I use the DensityDist() specification,the sampler seems to fail after drawing the samples. I get the following error:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (3 chains in 4 jobs)
NUTS: [defs_star, atts_star, sd_def, sd_att, home]
█Sampling 3 chains for 1_000 tune and 4_000 draw iterations (3_000 + 12_000 draws total) took 10 seconds.
Traceback (most recent call last):
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3427, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-131353e97490>", line 55, in <module>
    trace = pm.sample(4000, chains=3, tune=1000, return_inferencedata=True, target_accept=0.85, random_seed=123456)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/pymc3/sampling.py", line 639, in sample
    idata = arviz.from_pymc3(trace, **ikwargs)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/arviz/data/io_pymc3.py", line 563, in from_pymc3
    return PyMC3Converter(
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/arviz/data/io_pymc3.py", line 171, in __init__
    self.observations, self.multi_observations = self.find_observations()
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/arviz/data/io_pymc3.py", line 184, in find_observations
    multi_observations[key] = val.eval() if hasattr(val, "eval") else val
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/graph/basic.py", line 554, in eval
    self._fn_cache[inputs] = theano.function(inputs, self)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/compile/function/__init__.py", line 337, in function
    fn = pfunc(
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/compile/function/pfunc.py", line 524, in pfunc
    return orig_function(
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/compile/function/types.py", line 1970, in orig_function
    m = Maker(
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/compile/function/types.py", line 1584, in __init__
    fgraph, additional_outputs = std_fgraph(inputs, outputs, accept_inplace)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/compile/function/types.py", line 188, in std_fgraph
    fgraph = FunctionGraph(orig_inputs, orig_outputs, update_mapping=update_mapping)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/graph/fg.py", line 162, in __init__
    self.import_var(output, reason="init")
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/graph/fg.py", line 330, in import_var
    self.import_node(var.owner, reason=reason)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/graph/fg.py", line 383, in import_node
    raise MissingInputError(error_msg, variable=var)
theano.graph.fg.MissingInputError: Input 0 of the graph (indices start from 0), used to compute AdvancedSubtensor1(defs_star, TensorConstant{[4 5 3 0 5..1 3 4 0 5]}), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

As far as I understand, the logp function I wrote should match the original implementation exactly – I even included the normalization. I am having a hard time understanding the error and I would appreciate any guidance.

A couple of quick thoughts that might help:

  1. From quickly eyeballing your model spec vs that in the Rugby notebook, it looks like you’re missing the intercept in home_theta and away_theta so those linear submodels will be misspecified
  2. You don’t have to contain the logp() within the model context stack, and in fact the stack trace suggests that theano is having a hard time building the graph, so I’d move it outside of the context
  3. In fact I’d be tempted to genericize that function and call it twice for two observed likelihoods (as does the Rugby example), since they’re not really connected
  4. You could also try using a Potential, rather than a DensityDist (I find using Potentials a little easier)

@jonsedar Thanks for your suggestions. Based on your points 2. and 3., I rewrote the model as:

def logp(goal,  mu):
    base = pm.Poisson.dist(mu=mu).logp(goal)
    return base.sum()


with pm.Model() as model:
    # global model parameters
    home = pm.Flat("home")

    # team-specific model parameters
    atts_star = pm.Normal("atts_star", mu=0, sigma=5, shape=num_teams - 1)
    atts_last = -tt.sum(atts_star, keepdims=True) + 1
    atts = pm.Deterministic('atts', pm.math.concatenate([atts_star, atts_last], axis=0))
    defs = pm.Normal("defs_star", mu=0, sigma=5, shape=num_teams)
    # likelihood of observed data
    mu_home = tt.exp(home + atts[home_team] + defs[away_team])
    mu_away = tt.exp(atts[away_team] + defs[home_team])

    # Original likelihood
    # home_points = pm.Poisson("home_points", mu=mu_home, observed=observed_home_goals)
    # away_points = pm.Poisson("away_points", mu=mu_away, observed=observed_away_goals)

    res_home = pm.DensityDist('res_home', logp=logp, observed={'goal': observed_home_goals,   'mu': mu_home})
    res_away = pm.DensityDist('res_away', logp=logp, observed={'goal': observed_away_goals,   'mu': mu_away})


    trace = pm.sample(4000, chains=3, tune=1000, return_inferencedata=True, target_accept=0.85, random_seed=123456)

Unfortunately, the issue persists. At this point, the logp function is just a pointer to the original pm.Poisson logp() method (see code above). So, I believe the problem is not there – but I can be completely wrong.

As for

  1. From quickly eyeballing your model spec vs that in the Rugby notebook, it looks like you’re missing the intercept in home_theta and away_theta so those linear submodels will be misspecified

I think the model is well specified. The scale is given by this constraint (which I am copying from the literature):

atts_last = -tt.sum(atts_star, keepdims=True) + 1

In fact, the model works as long as I use pm.Poisson specification and not the DensityDist() specification, but they really should be equivalent.

I replicated the error in the most minimal settings. I am only modeling the home-team goals as a standard Poisson GLM where the only coefficients are an intercept and an indicator term for each team (six in total):

def logp(goal,  mu):
    base = pm.Poisson.dist(mu=mu).logp(goal)
    return base.sum()


with pm.Model() as model:
    # global model parameters
    intercept = pm.Normal('alpha', mu=0, sigma=10)
    # team-specific model parameters
    atts = pm.Normal("atts_star", mu=0, sigma=10, shape=num_teams)
    # likelihood of observed data
    mu_home = tt.exp(intercept + atts[home_team])
    # Original likelihood
    # home_points = pm.Poisson("home_points", mu=mu_home, observed=observed_home_goals)
    # Rewritten
    res_home = pm.DensityDist('res_home', logp=logp, observed={'goal': observed_home_goals,   'mu': mu_home})
    trace = pm.sample(4000, chains=3, tune=1000, return_inferencedata=True, target_accept=0.85, random_seed=123456)

Even this very minimal model gives the same error:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (3 chains in 4 jobs)
NUTS: [atts_star, alpha]
█Sampling 3 chains for 1_000 tune and 4_000 draw iterations (3_000 + 12_000 draws total) took 37 seconds.
Traceback (most recent call last):
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3427, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-30-6c5c56840078>", line 41, in <module>
    trace = pm.sample(4000, chains=3, tune=1000, return_inferencedata=True, target_accept=0.85, random_seed=123456)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/pymc3/sampling.py", line 639, in sample
    idata = arviz.from_pymc3(trace, **ikwargs)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/arviz/data/io_pymc3.py", line 563, in from_pymc3
    return PyMC3Converter(
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/arviz/data/io_pymc3.py", line 171, in __init__
    self.observations, self.multi_observations = self.find_observations()
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/arviz/data/io_pymc3.py", line 184, in find_observations
    multi_observations[key] = val.eval() if hasattr(val, "eval") else val
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/graph/basic.py", line 554, in eval
    self._fn_cache[inputs] = theano.function(inputs, self)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/compile/function/__init__.py", line 337, in function
    fn = pfunc(
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/compile/function/pfunc.py", line 524, in pfunc
    return orig_function(
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/compile/function/types.py", line 1970, in orig_function
    m = Maker(
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/compile/function/types.py", line 1584, in __init__
    fgraph, additional_outputs = std_fgraph(inputs, outputs, accept_inplace)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/compile/function/types.py", line 188, in std_fgraph
    fgraph = FunctionGraph(orig_inputs, orig_outputs, update_mapping=update_mapping)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/graph/fg.py", line 162, in __init__
    self.import_var(output, reason="init")
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/graph/fg.py", line 330, in import_var
    self.import_node(var.owner, reason=reason)
  File "/home/non1987/anaconda3/envs/football_analytics/lib/python3.8/site-packages/theano/graph/fg.py", line 383, in import_node
    raise MissingInputError(error_msg, variable=var)
theano.graph.fg.MissingInputError: Input 0 of the graph (indices start from 0), used to compute AdvancedSubtensor1(atts_star, TensorConstant{[0 1 2 2 3..4 5 3 1 2]}), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

You need to use idata_kwargs={"density_dist_obs": False}. For some reference, see az.from_pymc3 docs and this (:warning: closed! :warning: ) PR [WIP] Add givens argument to DensityDist by OriolAbril · Pull Request #4433 · pymc-devs/pymc3 · GitHub.

As for specifics of rugby analytics, you may be interested in taking a look at this notebook which implements several parametrizations of the same model from a predictive task point of view, one of them is a densitydist similar to what you are trying to do I believe. The main difference is that it does not use variables of the posterior as observed kwargs.

1 Like

Yes! I would have never guessed that, since I am not at all practical with Arziv. @OriolAbril thanks (again) for your help and kindness, as well as for the references to the documentation/issues/notebook.

A little notice about this debugging experience. In the PR you linked, there is some talk about “a brief notebook that shows the different compatible ways of using the DensityDist”. I would really have loved such a reference, because I could not find a single example showing what I needed to do. Actually, I found the use of DensityDist() in Probability Distributions in PyMC3 — PyMC3 3.11.2 documentation beyond confusing (how is \lambda passed?). Moreover, for a reason I do not understand, the model written in the API documentation on DensityDist (Distribution utility classes and functions — PyMC3 3.11.2 documentation) does not need theidata_kwargs={"density_dist_obs": False} for sampling. This is very confusing for a novice and a simple walk-through about DensityDist would really help. May I suggest that the use of idata_kwargs should be mentioned in the documentation?

That said, thank you again for your help.

As you’ll have seen in the issue, there has been many people confused by that, however, we have not taken any definitive steps (as you’ll have seen we have closed a pr that would have fixed this) because everything should be fixed from the root with the next major version which will be released soon, see the timeline.

I joined the project long after DensityDist was introduced, so I don’t really know, but to me it looks like this is a bug that over time has become a feature. If you try to pass mu_home as the observed kwarg of any other distribution, you’ll get an error at model creation, even before starting to sample, because all the other distributions check the object they receive as observed to make sure it’s not another variable in the model. And in fact, DensityDist does too when it gets a single input. This coherence check is only skipped when observed is a dict instance, somehow sampling works but then when converting the result to inferencedata, where we process the posterior, sample stats and observed data, there are “observed” variables that are not observed, they are posterior variables and everything crashes. We added this argument to the converter to indicate we want to avoid processing the observed argument of DensityDist and I have been announcing it here on Discourse but it never made the docs.

You’ll find plenty of examples of DensityDist that work perfectly without setting this to False because they don’t pass posterior variables as part of the observed kwarg dict.

Thanks for the explanation. Everything makes more sense now.
Indeed, it took me a while to figure out how to pass the mu_home parameter, since it is not truly observed. I figure it out based on this topic DensityDist from scipy.stats distribution - #3 by balarsen . However, in the very same topic, the DensityDist is sampled without passing any extra argument to the sampler. I guess that was correct when the topic was written, but the same code would not work now, if I understand correctly. This can be very odd for someone who does not know how PyMC3 is evolving. So, I reiterate that it would be nice to mention this issue with sampling in the documentation – and to have an up-to-date working notebook on the different uses of DensityDist. Anyway, thank you again for your availability and help.

A few months ago I would have encouraged you to submit a pull request to fix this, or even done one myself, but I first linked to the timeline because PyMC3 3.x will stay like this, bugs, limitations and all. It will only receive critical bug fixes for a short while until PyMC3>=4.0 has had some real life testing. We are no longer accepting PRs to master, only to v4 and are focusing ll our efforts on the next major release where we can fix all these from the initial design.

We are updating the example notebooks though, here you can contribute, Using a “black box” likelihood function — PyMC3 3.11.2 documentation for example uses DensityDists and has been broken for a while. Here is the relevant issue to fix this: Blackbox likelihood · Issue #48 · pymc-devs/pymc-examples · GitHub

2 Likes

Nice one, thanks Oriol!