Initial evaluation failed with OrderedLogistic and model.debug() exceptions

I’m fairly sure this comes down to an error in my understanding of what I’m doing, but I’m struggling to even figure out where to start looking to solve what I believe is the issue.
I started with this model:

n1_c = 300; n2_c = 300; n3_c = 300
cluster1 = np.random.randn(n1_c) + -1
cluster2 = np.random.randn(n2_c) + 0
cluster3 = np.random.randn(n3_c) + 2

x = np.concatenate((cluster1, cluster2, cluster3))
y = np.concatenate((1*np.ones(n1_c),
                    2*np.ones(n2_c),
                    3*np.ones(n3_c))) - 1

with pm.Model() as model:
    cutpoints = pm.Normal("cutpoints", mu=[-1,1], sigma=10, shape=2,
                          transform=pm.distributions.transforms.ordered)
    y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=x, observed=y)
    idata = pm.sample()

This is taken from the documentation of the OrderedLogistic distribution, and it runs as expected. I then adapt it to my problem with random data:

import numpy as np
import pymc as pm
from scipy.stats import randint

points = randint.rvs(0, 82, size=100)
hfa = randint.rvs(0, 2, size=100)

all_possible_points = np.arange(0,82)
cutpoints_init = np.linspace(all_possible_points.min(), all_possible_points.max(), len(all_possible_points) - 1)

with pm.Model() as model:
    cutpoints = pm.Normal("cutpoints", mu=10, sigma=1, shape=len(all_possible_points) - 1, initval=cutpoints_init)
    #    cutpoints = pm.Normal("cutpoints", mu=0, sigma=1, shape=len(all_possible_points) - 1,
    #     transform=pm.distributions.transforms.ordered, initval=cutpoints_init)
    hfa_effect = pm.Normal('hfa_effect', mu=0, sigma=1)
    alpha = pm.Normal('alpha', mu=25, sigma=14)

    mu = alpha + hfa_effect * hfa

    points_lik = pm.OrderedLogistic('points_lik', cutpoints=cutpoints, eta=mu, observed=points)
    
    trace = pm.sample()

With or without the initval I get the failure. As far as I can tell the models are the same, except that when I run cutpoints.eval() to get a random sample of the distribution, the values in the array are unexpectedly unordered, whereas the same call after the first model returns ordered values. model.debug() tells me that the values are associated with a non-finite logp, but it’s not clear why, and then the debug itself throws an exception:
IndexError                                Traceback (most recent call last)
in <cell line: 1>()
----> 1 model.debug()

/usr/local/lib/python3.10/dist-packages/pymc/model/core.py in debug(self, point, fn, verbose)
   1824                 )
   1825                 mask = ~np.isfinite(rv_fn_eval)
-> 1826                 for value, fn_eval in zip(values[mask], rv_fn_eval[mask]):
   1827                     print_(f"  value = {value} -> {fn} = {fn_eval}")
   1828                     print_()

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

In a couple of final attempts, I tried changing the cutpoint distribution based on examples I’ve seen in the gallery and in other posts on this forum, without success.

with pm.Model() as model:
    cutpoints = pm.Dirichlet("cutpoints", a=np.ones(len(all_possible_points) - 1)/81, shape=(len(all_possible_points) - 1))

    hfa_effect = pm.Normal('hfa_effect', mu=0, sigma=1)
    alpha = pm.Normal('alpha', mu=0, sigma=1)

    mu = alpha + hfa_effect * hfa

    points_lik = pm.OrderedLogistic('points', cutpoints=cutpoints, eta=mu, observed=points)
    
    trace = pm.sample()

and


import pytensor.tensor as pt

def constrainedUniform(N, min=0, max=1):
    return pm.Deterministic(
        "cutpoints",
        pt.concatenate(
            [
                np.ones(1) * min,
                pt.extra_ops.cumsum(pm.Dirichlet("cuts_unknown", a=np.ones(N - 2))) * (max - min)
                + min,
            ]
        ),
    )

K = len(all_possible_points)

with pm.Model() as model:
    cutpoints = constrainedUniform(K, 0, K)

    hfa_effect = pm.Normal('hfa_effect', mu=0, sigma=1)
    alpha = pm.Normal('alpha', mu=0, sigma=1)

    mu = alpha + hfa_effect * hfa

    points_lik = pm.OrderedLogistic('points', cutpoints=cutpoints, eta=mu, observed=points)
    
    trace = pm.sample()

And at this point I’ve admitted defeat. Could anyone help point me in the right direction as to where I’m going wrong?

In your first examples, the cutpoints must be ordered. If you use the ordered transform this will happen during MCMC sampling and should be fine, although you need to provide an initval.

However, if you call .eval() you will not see ordered values, because the transform only plays a role during MCMC sampling; it’s not part of the generative random graph that you are accessing when you call .eval().
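A minimal sketch of the distinction (my illustration, assuming PyMC 5.x; initial_point() returns the starting values in the transformed space):

import numpy as np
import pymc as pm

with pm.Model() as m:
    cutpoints = pm.Normal("cutpoints", mu=[-1, 1], sigma=10, shape=2,
                          transform=pm.distributions.transforms.ordered,
                          initval=np.array([-1.0, 1.0]))

# Forward (generative) draw: the transform is ignored here, so the two
# values can come out in any order.
print(cutpoints.eval())

# The starting point for MCMC lives in the transformed space, where
# ordering is enforced by construction.
print(m.initial_point())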

These pages give more details on transforms: Transformations — PyMC dev documentation

Linked issue about a proposed alternative to ordered transforms: Implement `Ordered` distribution factory · Issue #7297 · pymc-devs/pymc · GitHub

As for why the constrained uniform or the other attempts are failing, I’ll have to take a look. Just wanted to clear up a possible source of confusion.

I continued to dig in and did a bit more research. My thinking is that partitioning into 81 cutpoints produces near-zero probabilities for some of the categories, and there’s some kind of rounding issue?

Even declaring the cutpoints directly with
cutpoints = np.linspace(all_possible_points.min(), all_possible_points.max(), K - 1)
seemed to cause the same issues. model.debug() ran this time, however, and indicates that eta receives these values:
0: [1.38879439e-11 2.43382550e-11 6.69904046e-11 1.84389320e-10
5.07526737e-10 1.39695394e-09 3.84507884e-09 1.05834779e-08
2.91307423e-08 8.01815905e-08 2.20697645e-07 6.07463939e-07
1.67202465e-06 4.60217483e-06 1.26671460e-05 3.48643278e-05
9.59505883e-05 2.64006391e-04 7.25951142e-04 1.99272674e-03
5.44409078e-03 1.46821817e-02 3.82545146e-02 9.13255537e-02
1.78972492e-01 2.45683137e-01 2.12517008e-01 1.21923146e-01
5.41693168e-02 2.13091105e-02 7.97648150e-03 2.92994100e-03
1.06875182e-03 3.88854688e-04 1.41349579e-04 5.13635535e-05
1.86621774e-05 6.78032013e-06 2.46337796e-06 8.94971791e-07
3.25152213e-07 1.18130963e-07 4.29181162e-08 1.55925632e-08
5.66492753e-09 2.05812234e-09 7.47735429e-10 2.71659584e-10
9.86966064e-11 3.58573171e-11 1.30273570e-11 4.73288075e-12
1.71951342e-12 6.24833518e-13 2.26929586e-13 8.23785484e-14
2.99760217e-14 1.08801856e-14 3.99680289e-15 1.33226763e-15
6.66133815e-16 2.22044605e-16 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00],

Simplifying the model even further for debugging:

import numpy as np
import pymc as pm
from scipy.stats import randint

points = randint.rvs(0, 82, size=100)
hfa = randint.rvs(0, 2, size=100)

all_possible_points = np.arange(0,82)

K = len(all_possible_points)

with pm.Model() as model:
  cutpoints = np.linspace(all_possible_points.min(), all_possible_points.max(), K - 1)

  alpha = pm.Normal('alpha', mu=50, sigma=20)

  mu = alpha

  points_lik = pm.OrderedLogistic('points', cutpoints=cutpoints, eta=mu, observed=points)
    
  trace = pm.sample()

After ensuring that alpha is centered within the range of the cutpoints, the model finally sampled, albeit with divergences. I don’t know where this leaves me though. When I iterate and make the model more appropriate again, do I just have to be aware that I have to structure my priors to capture more of the values possible in points?
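For what it’s worth, a prior predictive check (a hedged sketch using the standard sample_prior_predictive API) probably would have surfaced the mismatch between my priors and the observed range earlier:

with model:
    prior = pm.sample_prior_predictive(draws=500)

# Compare the span of simulated points to the observed 0..81 range.
sim = prior.prior_predictive["points"]
print(int(sim.min()), int(sim.max()))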

You seem to have many cutpoints, and you’re fixing them? Which of those choices were just for debugging? Asking because they are somewhat odd choices.

Sorry I should’ve started with my overall problem statement.

I’m trying to model the points scored in an American football game. Because points are typically scored in increments of 3 and 7 (occasionally 6, 8, and 2, but not very frequently), scoring tends to cluster around certain totals, which, as I understand it, makes Poisson or Negative Binomial regressions a poor choice. To capture this trait of the data, I chose an OrderedLogistic likelihood with 81 cutpoints to start, covering the 82 possible totals from 0 up to 81, the most points ever scored.

That’s interesting.

Okay, so the first thing is that these cutpoints are not on the outcome scale, they are on the logistic scale, so spacing them linearly may be bad.
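To make that concrete, one common alternative (my sketch, not something prescribed by PyMC) is to anchor the cutpoints at the logit of the empirical cumulative frequencies, so they already live on the logistic scale:

import numpy as np

K = 82  # categories 0..81
counts = np.bincount(points, minlength=K)

# Empirical CDF at each of the K-1 interior boundaries, clipped away
# from 0 and 1 so the logit stays finite.
cum_prop = np.clip(np.cumsum(counts)[:-1] / counts.sum(), 1e-6, 1 - 1e-6)
cutpoints_init = np.log(cum_prop / (1 - cum_prop))  # logit of the ECDF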

Another thing is you may want to define the model on different quantities, for instance the total points and the ratio of home/away points.

And even ignoring the clumping may be fine. It depends on your use case: if it’s to predict actual outcomes it’s not great, but if you are interested in latent quantities it may be fine.

My use case is just gaining practice playing with PyMC. Earlier in this project I modeled my outcomes as both Poisson and Negative Binomial, even modeling counts of the different types of scoring events. This just presented an opportunity to play with an outcome distribution I hadn’t gotten to work with very much and a unique approach I wanted to try.

In terms of the linear approach to the cutpoints, I really was just doing what I could to get the model to sample, to see if I could gain some understanding of how the model was working. I wanted to circle back to using a different distribution; I just couldn’t get either of them to sample, or the debug function to shed light on what was going wrong.


It may help if you describe the latent parameters you’re interested in as well.

I was working on a similar model last year with some friends (I’m very fun at parties). If you want to build the scoragami structure into your model, it might make more sense to model a game as a collection of plays, in which each play ends in a discrete number of points (maybe just 0, 3, or 7 to start); see the sketch below. The outcome of each play could then depend on latent offense/defense parameters for a given match-up. I guess the number of plays in a game would also be a random variable, but as a first pass I’d set it to the average.
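Something like this, as a very rough sketch (the numbers and names here are hypothetical, just to illustrate the structure):

import numpy as np
import pymc as pm
import pytensor.tensor as pt

n_plays = 65  # fixed at a rough league average, as a first pass
score_values = pt.as_tensor([0, 3, 7])

with pm.Model() as play_model:
    # Probability that a single play ends in 0, 3, or 7 points; in a full
    # model this would depend on latent offense/defense parameters.
    p = pm.Dirichlet("p", a=np.array([10.0, 1.0, 1.0]))
    outcomes = pm.Categorical("outcomes", p=p, shape=n_plays)
    total_points = pm.Deterministic("total_points", score_values[outcomes].sum())

Prior draws of total_points would then concentrate on totals reachable as sums of 3s and 7s, which is exactly the clumping the OrderedLogistic was meant to capture.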

I understand your objective here is entirely pedagogical, but maybe there’s a “meta point” to take the generative process as seriously as possible when modeling? The observed frequency distribution of points doesn’t arise ex nihilo; you gain a lot by trying to describe the generative process.

On this note, if you haven’t seen this excellent blog post by @DanWeitzenfeld, it might be interesting to you. It’s quite out of date with respect to PyMC syntax, but the model is still very cool.


Right now I’m more interested in getting the OrderedLogistic functioning and evaluating how much better this model works vs. the other models I’ve trained than in inference just yet. FWIW, here’s a snippet from the Poisson model I was toying with, just as an example:

defense_idx, defense = feat_df['defense_team_id'].factorize()
coords['defense'] = defense

with pm.Model(coords=coords) as offense_defense_partial_pooling:
  #priors
  hfa_effect = pm.Normal('hfa_effect', sigma=.1)

  offense_mu = pm.Normal('offense_mu', mu=0, sigma=.5)
  offense_sigma = pm.HalfNormal('offense_sigma', sigma=.1)
  offense_offset = pm.Normal('offense_offset', mu=0, sigma=.5, dims='offense')
  offense_effect = pm.Deterministic('offense_effect', offense_mu + offense_sigma * offense_offset, dims='offense')

  defense_mu = pm.Normal('defense_mu', mu=0, sigma=.5)
  defense_sigma = pm.HalfNormal('defense_sigma', sigma=.1)
  defense_offset = pm.Normal('defense_offset', mu=0, sigma=.5, dims='defense')
  defense_effect = pm.Deterministic('defense_effect', defense_mu + defense_sigma * defense_offset, dims='defense')

  #likelihood
  mu = offense_effect[offense_idx] - defense_effect[defense_idx] + hfa_effect * feat_df['home_ind']
  points_rate = pm.Deterministic('points_rate', pm.math.exp(mu), dims='game_id')
  points = pm.Poisson('points', mu=points_rate, observed=feat_df['points'], dims='game_id')

pm.model_to_graphviz(offense_defense_partial_pooling)

I would probably end up running the same kind of hierarchical model against the OrderedLogistic likelihood, and then I can look at all kinds of things like the impact of distance traveled to away games on offensive/defensive ability, coach effects, FA spending, etc., treating the offensive and defensive effects as things to estimate based on previous seasons and offseason turnover. Just playing with different questions. Is it safe to assume that it’s probably going to be infeasible to approach this problem the way I described in my original post?

That is a very cool blog post. My work has been more around CFB, but I would imagine it would still hold up. I’m feeling a bit inspired to try to recreate it, with some interesting things to work out around levels of competition.
