Blackbox version of ordered logit

Has anyone seen / written a “blackbox” i.e. self-written version of an ordered logit (or ordered response) model?
i.e. like Using a “black box” likelihood function (numpy) — PyMC example gallery but for ordered logit.

This would be very helpful to get me started with my project, but looks like it will take me plenty of stumbling around for a long while without help.

Thank you!
a newbie

Why do you need a blackbox?

My model is a peculiar discrete distribution and I need to build it up by individual probability of each discrete value. Part of it looks similar to an ordered logit, so that would be a good template to start with.

Have you seen Nathaniel Forde’s case study on ordered categorical models? If it’s not exactly what you are looking for, it might still a framework to make the stumbling a bit easier.

1 Like

Yes, I’ve seen it, thanks. It uses the built-in distributions, no insight for me to build something custom.

That doesn’t explain why a blackbox is needed.

It’s very rare for PyMC nodels to require a blackbox, usually you can write everything more easily with pymc primitives, so that’s why I am asking.

Usually you use a blackbox when there is some complex operation that can’t be expressed in PyMC (almost anything you can write in numpy you can write in PyMC) or very expensive that uses some optimized routine written in another framework.

If you really feel you need a blackbox I think you’ll need to share more details about what parts are you struggling to understand.

Note that there are two other spinoffs of the tutorial focused in JAX. They may give you further clues about how this is done:

1 Like

Continued thanks!
That also looks very complicated. I can write my model down in a couple lines of mathematical notation, yet I’ve still no idea how to even start coding it in PyMC.
Maybe, then, if someone had written an ordered logit implementation without using the pm.OrderedLogistic built-in, that could also be a useful template for me to see how “almost anything you can write in numpy you can write in PyMC” might work.

1 Like

Right. So maybe have a look at the source code of OrderedLogistic. You’ll see it just creates a Categorical distribution with a specific probability vector p. That’s just there for user convenience, but you could have done it yourself. And it’s only PyMC/PyTensor basic operations:

You could have computed those p and then just pass them to an observed pm.Categorical to define your likelihood.

1 Like

Ooooh! :slight_smile: That does look readable!!
Thank you. I’ll chew on this for a bit!

Hi all

Ironically I’ve been working on something like this just today, just to build some intuition around ordinal regressions again as a refresher. Like @ceebeelee I am trying to build up the model from bare-bones without relying on OrderedLogistic. I’m following the model McElreath describes in Statistical Rethinking 2nd ed, page 385, using the ordinal data from Kruschke’s DBDA2 ('OrdinalProbitData-1grp-1'). To that data I’ve just added a column of 1’s for the intercept.

I think I have this right, prior predictive has the correct dimensions as far as I can see, but when sampling I encounter a Rewrite failure due to: local_IncSubtensor_serialize AssertionError. I am a bit lost as to how to solve this one! Here is my code - I know this is a bit inefficient but just helps to cement knowledge for now:

c = {'K': range(d['y'].nunique() - 1),
     'Kt': range(d['y'].nunique()),
     'N': range(len(d))

with pm.Model(coords=c) as ordinal:
    # Priors for cutpoints
    cut = pm.Normal('cut',
                    mu=np.linspace(0, 5, num=len(c['K'])),
    # Terms of the linear model
    β = pm.Normal('β', mu=0, sigma=1)
    # linear predictor
    η = pm.Deterministic('η', β * d['intercept'].values, dims='N') # just the column of 1's I added
    η1 = pm.Deterministic('η1', cut - η[:, None], dims=('N', 'K'))
    Η = pm.Deterministic('Η', pm.math.invlogit(η1), dims=('N', 'K'))
    # Calculate the category probabilities from this
    p = pt.concatenate(
            1 - Η[:, [0]],
            Η[:, :-1] - Η[:, 1:],
            Η[:, [-1]]
    # Likelihood
    pm.Categorical('y', p=p, observed=d['Y'].values - 1, dims='N')
    # inferencebutton
    idata = pm.sample()

I’ve tried compiling this with pymc.sampling_jax.sample_numpyro_nuts, and I get an incompatible shape for broadcasting error - one tensor has dims (1, 6) and requested shape is (100, 6). I think the problem seems to stem from the concatenation line where I compute the category probabilities directly, but I’ve used this in some Item Response Theory models which have basically the same logic as ordinal regression (graded response model), and it works just fine! Any tips really welcome here, I’m not sure if this is just me missing something.

Edit: I am on PyMC 5.6.

1 Like

I fixed it myself with a code rewrite, borrowing some from the docs of OrderedLogistic. The following samples just fine, though I did notice in my obsessive checking that univariate_ordered didn’t actually order the cutpoints, but multivariate_ordered did. The following intercept-only model seems to work fine.

c = {'K': range(d['y'].nunique() - 1),
     'Kt': range(d['y'].nunique()),
     'N': range(len(d))

with pm.Model(coords=c) as ordinal:
    # Priors for cutpoints
    cut = pm.Normal('cut',
                    mu=np.linspace(-3, 3, num=len(c['K'])),
    # Prior for intercept
    β0 = pm.Normal('β0', mu=0, sigma=1)
    # Linear predictor
    η = pm.Deterministic('η', pm.invlogit(cut - (β0 * d['intercept'].to_numpy()[:, None])), dims=('N', 'K'))
    # Get probabilities now via subtraction
    pa = pt.concatenate(
            pt.zeros_like(η[:, [0]]),
            pt.ones_like(η[:, [0]])
    p = pm.Deterministic('p', 
                         pa[..., 1:] - pa[..., :-1], 
                         dims=('N', 'Kt'))
    # Likelihood
    pm.Categorical('y', p=p, observed=d['y'].to_numpy())

Ordinal regression is destined to be my nemesis.


Hi @alj I reproducibly get the following, using your code:

SamplingError: Initial evaluation of model at starting point failed!                                             
Starting values:                                                                                                 
{'cut_ordered__': array([-2.9111608 ,  0.40300104,  0.31628445, -0.30367939,  0.51787646,                        
        0.2265195 ]), 'β0': array(-0.42200085)}                                                                  
Logp initial evaluation results:                                                                                 
{'cut': 0.85, 'β0': -1.01, 'y': -inf}     

I think I might know what causes that and my fault for not being clear - the Categorical distribution at the heart of all of this requires the observed data to be 0, 1, … K. In the above data I subtracted 1 from the observed responses and put it in the dataframe as ‘y’. Force of habit at this point! Try subtracting one from the data that go into observed.

Thanks! So using the data file I found online, I would start your code with:

 # Load data
    df = pd.read_csv('OrdinalProbitData-1grp-1.csv').rename(columns={'Y': 'y'})-1
    df['intercept'] = 1

Right, that would work! Mine went something like

df = (pd.read_csv('OrdinalProbitData-1grp-1.csv')
      .assign(y=lambda x: x['Y'] - 1, intercept=1)

I’m surprised you have an intercept and a full set of cut points. Isn’t the intercept redundant with one of your cut points in this formulation?

I’m trying to generalize your code to more than one predictive variable dimension.

You’re right here that the intercept isn’t necessary. Its more of a placeholder for the terms of a linear model that I stuck in for mental clarity. McElreath’s formulation is to use the inverse logit, estimate all the K-1 cutpoints, and put in the linear predictor coefficients.

So in my example above you could expand the deterministic eta line with terms:

η = pm.Deterministic('η', pm.invlogit(cut - β1*x1 + β2*x2))

Kruschke on the other hand fixes two cut points (first and last) to constants on the scale of the responses, and estimates the others. The sigma of the latent distribution is also estimated and the mean is the linear combination. I’m trying to get thoughts clear on all of this and will try to do a blog post or similar soon.

I wrote a pretty long post here about ordinal regression models in PyMC. Its got a lot of background which might be basic but hopefully somtething of use in there. Any comments appreciated!


@alj Thanks for sharing the post. You might have a confusion about the value of K in the very first section (e.g. “(e.g. 0, 1, 2, 3, 4, K being 4)”).

@ricardoV94 Thank you very much for the help. I certainly didn’t need a blackbox for my model. Just pm.Categorical and some help from the source code for OrderedLogit !