Softmax Regression with shared coefficients between classes

I would strongly recommend trying to generate data with pymc3 first, since doing so surfaces all the niggling little shape errors in the model.

To be concrete: this fails:

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

x1 = [1., 2., 3., 4., 5.]
x2 = [5., 5., 0., 0., -3.]
x3 = [0., -2., 3., 6., 0.]

with pm.Model() as m2:
    # one coefficient per feature, shared across all three classes
    b = pm.Normal('b', 0, 20, shape=len(x1))
    mu1 = pm.math.dot(x1, b)
    mu2 = pm.math.dot(x2, b)
    mu3 = pm.math.dot(x3, b)
    mus = tt.transpose(tt.stack((mu1, mu2, mu3)))
    theta = pm.Deterministic('theta', tt.nnet.softmax(mus))
    yl = pm.Categorical('y', p=theta)
    spp = pm.sample_prior_predictive(10)

but the error suggests that there is a shape issue. Removing yl and rerunning the prior predictive shows that

spp['theta'].shape
# (10, 1, 3)
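
The extra dimension comes from tt.nnet.softmax itself: it works row-wise on matrices, so a vector input is promoted to a single-row matrix. A quick standalone check (separate from the model, just for illustration):

v = tt.vector('v')
# softmax promotes the 1-D input to a (1, n) matrix, so the result is 2-D
tt.nnet.softmax(v).ndim
# 2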

So theta comes out as a (1, 3) matrix instead of a length-3 vector. Reshaping it back to a vector leads to:

with pm.Model() as m2:
    b = pm.Normal('b', 0, 20, shape=len(x1))
    mu1 = pm.math.dot(x1, b)
    mu2 = pm.math.dot(x2, b)
    mu3 = pm.math.dot(x3, b)
    mus = tt.transpose(tt.stack((mu1, mu2, mu3)))
    # flatten the (1, 3) softmax output back to a length-3 probability vector
    theta = pm.Deterministic('theta', tt.nnet.softmax(mus)).reshape((3,))
    yl = pm.Categorical('y', p=theta)
    spp = pm.sample_prior_predictive(10)
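
which works fine. As a quick sanity check (the shapes below are what the model implies, not output copied from a run), the prior predictive now gives one class label per draw:

spp['y'].shape
# (10,)

Note that spp['theta'] itself still comes out as (10, 1, 3), because the Deterministic records the softmax output before the reshape; only the tensor fed into the Categorical is flattened.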

The shape of theta under the prior predictive also tells you what to do when you lump all your observations together:

x1 = np.random.uniform(size=(8, 4))
x2 = np.random.uniform(-1, 1, size=(8, 4))
x3 = np.random.uniform(0, 2, size=(8, 4))

with pm.Model() as m2:
    b = pm.Normal('b', 0, 20, shape=x1.shape[1])
    # each mu is now a length-8 vector: one linear predictor per observation
    mu1 = pm.math.dot(x1, b)
    mu2 = pm.math.dot(x2, b)
    mu3 = pm.math.dot(x3, b)
    mus = tt.transpose(tt.stack((mu1, mu2, mu3)))
    # softmax acts row-wise here, giving one probability vector per observation
    theta = pm.Deterministic('theta', tt.nnet.softmax(mus)).reshape((x1.shape[0], 3))
    yl = pm.Categorical('y', p=theta, shape=x1.shape[0])
    spp = pm.sample_prior_predictive(10)
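
Here the prior predictive draw for y should have shape (10, 8): one row of class labels per sample. That also makes it easy to follow the advice at the top and generate data with pymc3 first: take one prior predictive draw as synthetic observations, condition the same model on it, and check that sampling recovers b. A minimal sketch, where m3 and the choice of spp['y'][0] are illustrative assumptions rather than part of the original code:

# treat one prior predictive draw as synthetic observations
y_obs = spp['y'][0]  # shape (8,): one class label per observation

with pm.Model() as m3:
    b = pm.Normal('b', 0, 20, shape=x1.shape[1])
    mu1 = pm.math.dot(x1, b)
    mu2 = pm.math.dot(x2, b)
    mu3 = pm.math.dot(x3, b)
    mus = tt.transpose(tt.stack((mu1, mu2, mu3)))
    theta = tt.nnet.softmax(mus)  # already (8, 3): one probability row per observation
    # with observed data, the shape of y is inferred from y_obs
    yl = pm.Categorical('y', p=theta, observed=y_obs)
    trace = pm.sample()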