Struggling with ordinal data and `OrderedProbit`

Hey @drbenvincent no problem. I’ve been working on this on and off, and not getting too much further along!

My main confusion stems from what appears to be a reasonable model causing massive divergences. I modified the above model a little to try to approximate the Kruschke version, with an ordered-normal prior on the cutpoints and pinning the left and right extreme cuts:

```python
import numpy as np
import pymc as pm
import pytensor.tensor as at

with pm.Model() as try_again:

    # Priors for the latent mean and scale
    μ = pm.Normal('μ', mu=3, sigma=3)
    σ = pm.HalfNormal('σ', sigma=3)

    # Interior cutpoints get an ordered-normal prior; the two extreme
    # cutpoints are pinned at 0.5 and 5.5
    cutpts = at.concatenate([
        np.ones(1) * 0.5,
        pm.Normal('theta', mu=[1, 2, 3, 4], sigma=3, shape=4,
                  transform=pm.distributions.transforms.ordered),
        np.ones(1) * 5.5,
    ])

    cuts = pm.Deterministic('cuts', cutpts)

    # Ordinal likelihood on the 0-indexed responses
    pm.OrderedProbit('ll', eta=μ, sigma=σ, cutpoints=cuts,
                     compute_p=False, observed=df.Y.values - 1)

    trace = pm.sample(tune=2000)
```

This samples and appears to converge, but produces lots of divergences and again fails to recover the true mean of 1.

You mention in the discussion that the min and max arguments can perhaps shift the bounds of the constrained uniform; I will try that.

I guess my question at this point is more about the practicalities, as I am just trying to grasp the mechanics. The constrained uniform approach explicitly sets the first cutpoint to zero, and implicitly the final cutpoint to one. The cumulative sum guarantees the ordering of the cutpoints, but why does this sample so much better than the transformed-normal version? The cumulative sum over the Dirichlet is very neat but unintuitive to me. I also wondered about the `dims` argument in the call to `pm.Deterministic`: it seems essential, but I can't figure out why it is needed.
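To make sure I understand the construction, here is a minimal NumPy sketch of how a Dirichlet draw plus a cumulative sum yields strictly ordered cutpoints by construction (the bounds 0.5 and 5.5 here just mirror the pinned extreme cutpoints in my model above; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

K = 6            # number of ordinal categories
lo, hi = 0.5, 5.5  # fixed outer cutpoints, matching the pinned values above

# One draw from a flat Dirichlet over the K-1 interval widths;
# the widths are positive and sum to 1.
widths = rng.dirichlet(np.ones(K - 1))

# The cumulative sum of positive widths is strictly increasing,
# so dropping the final 1.0 leaves K-2 ordered interior points in (0, 1)...
interior = np.cumsum(widths)[:-1]

# ...which are then rescaled onto (lo, hi).
cutpoints = lo + (hi - lo) * interior
```

Because the Dirichlet widths are strictly positive, the ordering constraint is satisfied for every draw, with no rejection or transform needed, which I assume is part of why that parameterization samples so smoothly.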

Sorry for the mass of comments here and thank you!