I’m trying to build a Bayesian multivariate ordered logit model using PyMC3. I have gotten a toy multivariate logit model working based on the examples in this book. I’ve also gotten an ordered logistic regression model running based on the example at the bottom of this page.
However, I cannot get an ordered, multivariate logistic regression to run.
Here’s my code:
Data prep for MWE:
import pymc3 as pm
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris(return_X_y=False)
iris = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=iris['feature_names'] + ['target'])
iris = iris.rename(index=str, columns={'sepal length (cm)': 'sepal_length',
                                       'sepal width (cm)': 'sepal_width',
                                       'target': 'species'})
Here is a working multivariate (binary) logit:
df = iris.loc[iris['species'].isin([0, 1])]
y = pd.Categorical(df['species']).codes
x = df[['sepal_length', 'sepal_width']].values
with pm.Model() as model_1:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=2, shape=x.shape[1])
    mu = alpha + pm.math.dot(x, beta)
    theta = 1 / (1 + pm.math.exp(-mu))
    y_ = pm.Bernoulli('yl', p=theta, observed=y)
    trace_1 = pm.sample(5000)
Here is a working ordered logit (with one independent variable):
x = iris['sepal_length'].values
y = pd.Categorical(iris['species']).codes
with pm.Model() as model:
    cutpoints = pm.Normal("cutpoints", mu=[-2,2], sd=10, shape=2,
                          transform=pm.distributions.transforms.ordered)
    y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=x, observed=y)
    tr = pm.sample(1000)
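For reference, here is my understanding of what the ordered logit likelihood computes for a single observation, written in plain Python so it runs without PyMC3. The helper names `sigmoid` and `ordered_logit_probs` are my own and not part of any library. This is a sketch of the standard cumulative-logit formula, not PyMC3's actual implementation.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(eta, cutpoints):
    """Category probabilities for one observation (hypothetical helper).

    eta is a single number (the linear predictor) and cutpoints is an
    increasing list of K-1 thresholds, giving K ordered categories.
    """
    # Cumulative probabilities P(y <= k) = sigmoid(cutpoint_k - eta)
    cumulative = [sigmoid(c - eta) for c in cutpoints]
    # Pad with 0 and 1, then difference to get per-category probabilities
    bounds = [0.0] + cumulative + [1.0]
    return [hi - lo for lo, hi in zip(bounds[:-1], bounds[1:])]

# Two cutpoints give three categories, matching the three iris species
probs = ordered_logit_probs(eta=0.0, cutpoints=[-2.0, 2.0])
```

Note that `eta` here is one scalar per observation, which is what the single-predictor model above supplies.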
Here is my attempt at a multivariate ordered logit, which breaks:
x = iris[['sepal_length', 'sepal_width']].values
y = pd.Categorical(iris['species']).codes
with pm.Model() as model:
    cutpoints = pm.Normal("cutpoints", mu=[-2,2], sd=10, shape=2,
                          transform=pm.distributions.transforms.ordered)
    y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=x, observed=y)
    tr = pm.sample(1000)
The error I get is: “ValueError: all the input array dimensions except for the concatenation axis must match exactly.”
This suggests it’s a data problem (x, y), but the data looks the same as it does for the multivariate logit, which works.
How can I fix the ordered multivariate logit so it will run?
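One difference I can see between the two models is that the binary logit collapses `x` to one value per row with `pm.math.dot(x, beta)`, while the ordered attempt passes the two-column `x` straight in as `eta`. The NumPy-only sketch below illustrates the shape consequence. The random `x` is a stand-in for the two sepal columns and the `beta` values are made up; I am assuming the likelihood subtracts `eta` from the cutpoints along a trailing axis, which I have not confirmed against the PyMC3 source.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(150, 2))      # stand-in for the (150, 2) sepal columns
cutpoints = np.array([-2.0, 2.0])
beta = np.array([0.5, -1.0])       # hypothetical coefficients, one per column

# Using x directly as eta: every feature is treated as its own predictor,
# so the cutpoint differences pick up an extra axis.
raw = cutpoints - x[..., None]
print(raw.shape)                   # (150, 2, 2)

# Collapsing to a linear predictor first: one eta per observation.
eta = x @ beta
collapsed = cutpoints - eta[:, None]
print(eta.shape, collapsed.shape)  # (150,) (150, 2)
```

If that reading is right, the PyMC3 counterpart would presumably be to add a `beta` with `shape=x.shape[1]`, as in `model_1`, and pass `eta=pm.math.dot(x, beta)` to `pm.OrderedLogistic`. I have not verified that this samples correctly.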