# Sampling a hierarchical ordered logistic regression

I am sampling a hierarchical ordered logistic regression. Each observation has a response `y` in 0-11, and reflects some covariates and a geographic variable, state. The coefficient for each state is represented by the `"a"` variable.

I have a few concerns and questions. I’ve read a bunch of the pymc3 docs and forum posts, but I’m not sure how to address these issues.

Here is the code and diagram.

``````with pymc3.Model() as model:
# State level priors.
# Mean across states.
mu_a = pymc3.Normal('mu_a', mu=8, sigma=2)
# Variation across states.
sigma_a = pymc3.HalfCauchy('sigma_a', beta=1)

# State-level intercept.
a = pymc3.Normal('a', mu=mu_a, sigma=sigma_a, shape=num_states)

# Covariates.
beta = pymc3.Normal('beta', mu=0, sd=2, shape=x.shape-num_states)

# Combine the state level parameters with the covariates.
mu =  pymc3.math.dot(x[:, :-num_states], beta) + pymc3.math.dot(x[:, -num_states:], a)

# Apply the invlogit transformation.
theta = 1 / (1 + pymc3.math.exp(-mu))

# Prior on cutpoints, one between each consecutive outcome option.
cutpoints_prior = [b+.5 for b in sorted(set(y))[:-1]]
# Cutpoints are ordered from least to greatest.
cutpoints = pymc3.Normal("cutpoints",
mu=cutpoints_prior,
sd=np.array([0.1 for _ in cutpoints_prior]),
shape=len(cutpoints_prior),
transform=pymc3.distributions.transforms.ordered)

y_ = pymc3.OrderedLogistic("y", cutpoints=cutpoints, eta=theta, shape=x.shape, observed=y)
``````

## The test point has a very extreme `y` value.

The `y` value is extreme. Is there any reason to worry about that or try to get it closer to 0?

``````model.check_test_point()
mu_a                         -1.61
sigma_a_log__                -1.14
a                           -46.87
beta                         -4.84
cutpoints_ordered__          13.84
y                     -70721145.93
Name: Log-probability of test_point, dtype: float64
``````

## The prior-predictive `y` values are not close to `8`.

``````>>> prior_predictive['y'].mean(axis=0).mean()
1.2820398066533978
``````

I would’ve expected it to be around `8.0` as specified in the `mu_a` prior. Mostly the values are close to `1.0`. I guess that means the prior predictive values have no state (and so they don’t get the boost of the `a` coefficient). But that’s an unrealistic observation – every real observation is from a real state. Is there a way to get prior predictions in light of that information?

## Sampling takes a long time.

``````%%time

with model:
trace = pymc3.sample(2000, tune=2000, step=pymc3.NUTS())
``````

Multiprocess sampling (4 chains in 4 jobs)
NUTS: [cutpoints, beta, a, sigma_a, mu_a]
Sampling 4 chains: 1%|▏ | 216/16000 [03:31<4:20:46, 1.01draws/s]

It can go for four hours or more. Is this to be expected? It’s difficult to be efficient when I have to wait 4 hours between attempts.

I tried a shorter run, but the chains didn’t mix well. A selection from the trace plot:

## ADVI raises a `FloatingPointError`

The discussions I’ve read on the forum and blogs mention that ADVI can be faster, so I tried it but it raised an exception.

``````with model:

---------------------------------------------------------------------------
FloatingPointError                        Traceback (most recent call last)
<timed exec> in <module>

~/tools/miniconda/envs/project/lib/python3.7/site-packages/pymc3/variational/inference.py in fit(n, local_rv, method, model, random_seed, start, inf_kwargs, **kwargs)
788                         'or Inference instance' %
789                         set(_select.keys()))
--> 790     return inference.fit(n, **kwargs)

~/tools/miniconda/envs/project/lib/python3.7/site-packages/pymc3/variational/inference.py in fit(self, n, score, callbacks, progressbar, **kwargs)
132         with tqdm.trange(n, disable=not progressbar) as progress:
133             if score:
--> 134                 state = self._iterate_with_loss(0, n, step_func, progress, callbacks)
135             else:
136                 state = self._iterate_without_loss(0, n, step_func, progress, callbacks)

~/tools/miniconda/envs/project/lib/python3.7/site-packages/pymc3/variational/inference.py in _iterate_with_loss(self, s, n, step_func, progress, callbacks)
216                     except IndexError:
217                         pass
--> 218                     raise FloatingPointError('\n'.join(errmsg))
219                 scores[i] = e
220                 if i % 10 == 0:
FloatingPointError: NaN occurred in optimization.
The current approximation of RV `a`.ravel() is NaN.
The current approximation of RV `a`.ravel() is NaN.
...
``````

## Question

Is it possible to try more things out without waiting many hours between attempts? How would you suggest I proceed? Thank you.

1 Like

Hi Mina,
I don’t know your use case and data, and I’m not very experienced with ordered logistic regression, but here are some pointers:

• Have you tried with different priors to see if this affects your prior predictive checks?
• Sampling and ADVI issue: have you standardized your predictors? Predictors that are on a very large scale often mess up MCMC sampling.
• You could find this notebook useful – it’s an adaptation to PyMC3 of McElreath’s Rethinking chapter 11 (first edition).

Sorry I can’t give you a more detailed answer, but I hope this already helps you Hi AlexAndorra,

• Have you tried with different priors to see if this affects your prior predictive checks?

I tried adjusting the priors a bit but didn’t see any notable differences.

• Sampling and ADVI issue: have you standardized your predictors? Predictors that are on a very large scale often mess up MCMC sampling.

The predictors have values between 0 and 10 so I didn’t think those scale issues would apply here.

• You could find this notebook useful – it’s an adaptation to PyMC3 of McElreath’s Rethinking chapter 11 (first edition).

Thanks. I think my code is similar to that code, but the hierarchical aspect in my model does complicate things.

I tried using an OLS instead of the ordered logit and that seemed to work better – at least it converges. I can stick with OLS for now, but if anybody has suggestions on how to make the ordered logit work I remain interested.

I realise I’m two years too late, but I’m posting more for visibility as I was running into issues myself and I’ve managed to get a hierarchical ordered logistic regression working and sampling in PyMC (PyMC 4.0.0b1), so it may be helpful for others. I’ve been working through a problem in the book noted above, chapter 12 problem 12H2. Which asks the reader to fit a hierarchical ordered logistic regression model onto the Trolley data set in the book. See code below:

``````d = pd.read_csv('Data/Trolley.csv', delimiter=';')
d['id_factorised'] = pd.factorize(d.id)
action = shared(d['action'].values)
intention = shared(d['intention'].values)
contact = shared(d['contact'].values)
N_ids = d['id'].unique().shape

with pm.Model() as m_12h2_null:
a = pm.Normal(
'a', 0., 10.,
transform=pm.distributions.transforms.ordered,
shape=6, testval=np.arange(6) - 2.5)

resp_obs = pm.OrderedLogistic(
'resp_obs', 0., a,
observed=d.response.values - 1
)
trace_m12h2_null = pm.sample(2000)

with pm.Model() as m_12h2_flexible:
sigma = pm.HalfCauchy('sigma', 1.)
a_id = pm.Normal('a_id', 0, sigma, shape=N_ids)

a = pm.Normal(
'a', 0., 10.,
transform=pm.distributions.transforms.ordered,
shape=6,
initval=trace_m12h2_null['posterior']['a'].mean(dim=['chain', 'draw'])
)

bA = pm.Normal('bA', 0., 10.)
bI = pm.Normal('bI', 0., 10.)
bC = pm.Normal('bC', 0., 10.)
phi  = a_id[d.id_factorised.values] + bA * action + bI * intention + bC * contact

resp_obs = pm.OrderedLogistic(
'resp_obs', phi, a,
observed=d.response - 1
)
trace_m12h2_flexible = pm.sample(4000, tune=1500)
``````

The key thing to get it sampling was to set a sensible `initval` parameter in the prior for `a`. This was done by fitting a null model onto the data and using the trace of `a` from that null model as an initial value for `initval`. Sampling takes about 9 mins for this set-up and I think the model probably needs to be tweaked to improve sampling effeciency, but otherwise this will hopefully be a good start for anyone wanting to try and fit one of these models.