Multivariate ordered logistic regression model won't run - how can I fix it?


#1

I’m trying to build a Bayesian multivariate ordered logit model using PyMC3. I have gotten a toy multivariate logit model working based on the examples in this book. I’ve also gotten an ordered logistic regression model running based on the example at the bottom of this page.

However, I cannot get an ordered, multivariate logistic regression to run.

Here’s my code:

Data prep for MWE:

import pymc3 as pm
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(return_X_y=False)
iris = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                     columns=iris['feature_names'] + ['target'])

iris = iris.rename(index=str, columns={'sepal length (cm)': 'sepal_length', 'sepal width (cm)': 'sepal_width', 'target': 'species'})

Here is a working multivariate (binary) logit:

df = iris.loc[iris['species'].isin([0, 1])]
y = pd.Categorical(df['species']).codes
x = df[['sepal_length', 'sepal_width']].values

with pm.Model() as model_1:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=2, shape=x.shape[1])
    mu = alpha + pm.math.dot(x, beta)
    theta = 1 / (1 + pm.math.exp(-mu))
    y_ = pm.Bernoulli('yl', p=theta, observed=y)
    trace_1 = pm.sample(5000)

Here is a working ordered logit (with one independent variable):

x = iris['sepal_length'].values
y = pd.Categorical(iris['species']).codes

with pm.Model() as model:
    cutpoints = pm.Normal("cutpoints", mu=[-2,2], sd=10, shape=2,
                          transform=pm.distributions.transforms.ordered)

    y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=x, observed=y)
    tr = pm.sample(1000)

Here is my attempt at a multivariate ordered logit, which breaks:

x = iris[['sepal_length', 'sepal_width']].values
y = pd.Categorical(iris['species']).codes

with pm.Model() as model:
    cutpoints = pm.Normal("cutpoints", mu=[-2,2], sd=10, shape=2,
                          transform=pm.distributions.transforms.ordered)

    y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=x, observed=y)
    tr = pm.sample(1000)

The error I get is: “ValueError: all the input array dimensions except for the concatenation axis must match exactly.”

This suggests it’s a data problem (x, y), but the data looks the same as it does for the multivariate logit, which works.
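
A quick shape check makes the difference between the two ordered-logit attempts concrete. The sketch below uses random stand-in arrays rather than the real iris columns (an assumption, so that it runs with NumPy alone), and the coefficient vector `beta` is an arbitrary illustrative value, not a fitted one. It shows that the single-predictor version passes a 1-D array as `eta`, the two-predictor version passes a 2-D one, and that multiplying by a coefficient vector brings it back to one value per observation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 150  # same number of rows as the iris data

# Stand-ins for the iris columns (hypothetical data, not the real measurements)
x_single = rng.normal(size=n_obs)       # like iris['sepal_length'].values
x_multi = rng.normal(size=(n_obs, 2))   # like iris[['sepal_length', 'sepal_width']].values
y = rng.integers(0, 3, size=n_obs)      # like the species codes 0, 1, 2

print(x_single.shape, x_multi.shape, y.shape)

# Collapsing the two columns with a coefficient vector gives one value per row
beta = np.array([0.5, -0.3])            # arbitrary illustrative coefficients
eta = x_multi @ beta
print(eta.shape)
```

So the data itself is fine for a regression; what differs is that the second attempt hands the raw two-column matrix to `eta` instead of a single predictor value per row.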

How can I fix the ordered multivariate logit so it will run?


#2

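That’s not how it is meant to be used. The ordered logistic works on the linear predictor, so eta has to be a single value per observation: combine the predictors with coefficients first, then pass the result as eta. Something like:

x = iris[['sepal_length', 'sepal_width']].values
y = pd.Categorical(iris['species']).codes
with pm.Model() as model_1:
    # alpha shifts all cutpoints together, so it is only pinned down by its
    # prior; it can be dropped if sampling is slow
    alpha = pm.Normal('alpha', mu=0, sd=10)
    # One coefficient per predictor column
    beta = pm.Normal('beta', mu=0, sd=2, shape=x.shape[1])
    # Linear predictor: one unbounded value per observation
    mu = alpha + pm.math.dot(x, beta)
    cutpoints = pm.Normal("cutpoints", mu=[-2,2], sd=10, shape=2,
                          transform=pm.distributions.transforms.ordered)

    # OrderedLogistic applies the logistic link itself, so pass mu directly
    y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=mu, observed=y)
    trace = pm.sample(5000)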
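
To see what the likelihood does with a linear predictor, here is a minimal NumPy sketch of the category probabilities. It assumes the usual ordered-logit parameterization, P(y <= k) = sigmoid(c_k - eta), which is my reading of what pm.OrderedLogistic computes. The helper `ordered_logit_probs`, the three example rows, and the `beta` values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ordered_logit_probs(eta, cutpoints):
    """Category probabilities for an ordered logit (hypothetical helper).

    eta: 1-D linear predictor, one value per observation.
    cutpoints: increasing thresholds; K - 1 cutpoints give K categories.
    """
    eta = np.asarray(eta, dtype=float)
    cutpoints = np.asarray(cutpoints, dtype=float)
    # Cumulative probabilities P(y <= k), shape (n_obs, n_cutpoints)
    cum = sigmoid(cutpoints[None, :] - eta[:, None])
    # Pad with 0 on the left and 1 on the right, then difference
    zeros = np.zeros((eta.size, 1))
    ones = np.ones((eta.size, 1))
    cum = np.concatenate([zeros, cum, ones], axis=1)
    return np.diff(cum, axis=1)

# Three made-up rows with two predictors each
x = np.array([[5.1, 3.5], [6.4, 3.2], [7.7, 2.8]])
beta = np.array([1.0, -0.5])   # arbitrary illustrative coefficients
eta = x @ beta                 # one value per row

probs = ordered_logit_probs(eta, cutpoints=[-2.0, 2.0])
print(probs.shape)             # one row per observation, one column per category
print(probs.sum(axis=1))       # each row sums to 1
```

The point of the sketch is that the cutpoints are compared against eta on the unbounded scale, so eta needs exactly one value per observation and the logistic transform happens inside the likelihood.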

#3

This worked perfectly, thank you!