I want to build a softmax regression model in which the coefficients are shared across classes and only the input features differ from class to class. So far I have only got it working for two classes, using a logistic regression model, and even there I still have problems with the intercept. I cannot get it to work with what I thought was the equivalent softmax regression model. I would like to figure this out before attempting to build a model for more classes.
Here is how I generate the data:
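import numpy as np
from scipy.special import softmax         # assumed source of softmax
from statsmodels.api import add_constant  # assumed source of add_constant

n = 100
betas = [0, 2, 0]   # true coefficients: intercept, then two slopes
k = len(betas) - 1  # number of non-constant features

# Features for class 0: uniform on [-2.5, 2.5), plus a leading column of ones
x1 = add_constant(np.random.rand(n, k) * 5 - 2.5)
mu1 = np.dot(x1, betas)

# Features for class 1, generated the same way
x2 = add_constant(np.random.rand(n, k) * 5 - 2.5)
mu2 = np.dot(x2, betas)

# Per-observation class probabilities, then sampled labels
softprobs = [softmax([t1, t2]) for t1, t2 in zip(mu1, mu2)]
y = [np.random.choice((0, 1), p=p) for p in softprobs]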
Here is my almost-working Logistic Regression model (the intercept is underdefined):
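import pymc3 as pm  # assuming PyMC3 (Theano backend)

with pm.Model() as m1:
    b = pm.Normal('b', 0, 20, shape=len(betas))
    mu1 = pm.math.dot(x1, b)
    mu2 = pm.math.dot(x2, b)
    # P(y = 1) as a function of the difference in linear predictors
    theta = pm.math.sigmoid(mu2 - mu1)
    yl = pm.Bernoulli('y', p=theta, observed=y)
    trace_m1 = pm.sample(1000)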
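The intercept problem can be reproduced outside the sampler. Both design matrices carry the same column of ones and are multiplied by the same `b`, so the intercept adds the same amount to `mu1` and `mu2` and drops out of `mu2 - mu1`. Below is a minimal NumPy sketch of that cancellation; the helper `logit_diff`, the seed, and the intercept values are illustrative assumptions, not part of the models above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 100

# Two design matrices with a leading column of ones, mirroring x1 and x2
x1 = np.column_stack([np.ones(n), rng.uniform(-2.5, 2.5, size=(n, 2))])
x2 = np.column_stack([np.ones(n), rng.uniform(-2.5, 2.5, size=(n, 2))])

def logit_diff(b):
    """Return mu2 - mu1, the quantity passed to the sigmoid in m1."""
    return x2 @ b - x1 @ b

# Same slopes, very different intercepts (hypothetical values)
d_a = logit_diff(np.array([0.0, 2.0, 0.0]))
d_b = logit_diff(np.array([50.0, 2.0, 0.0]))

# The two differences agree up to floating-point error
intercept_cancels = np.allclose(d_a, d_b)
print(intercept_cancels)
```

If this reasoning is right, the data carry no information about `b[0]`, and its posterior would simply reflect the `Normal(0, 20)` prior.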
Here is what I expected to be the equivalent Softmax model:
import theano.tensor as tt

with pm.Model() as m2:
    b = pm.Normal('b', 0, 20, shape=len(betas))
    mu1 = pm.math.dot(x1, b)
    mu2 = pm.math.dot(x2, b)
    # Stack into an (n, 2) matrix: one row per observation, one column per class
    mus = tt.transpose(tt.stack((mu1, mu2)))
    theta = tt.nnet.softmax(mus)
    yl = pm.Categorical('y', p=theta, observed=y)
    trace_m2 = pm.sample(1000)
I can, however, recover the same results as the first model if I take just one column of the softmax output and use it with a Bernoulli likelihood:
with pm.Model() as m3:
    b = pm.Normal('b', 0, 20, shape=len(betas))
    mu1 = pm.math.dot(x1, b)
    mu2 = pm.math.dot(x2, b)
    mus = tt.transpose(tt.stack((mu1, mu2)))
    theta = tt.nnet.softmax(mus)
    # Use only the class-1 column of the softmax output
    yl = pm.Bernoulli('y', p=theta[:, 1], observed=y)
    trace_m3 = pm.sample(1000)
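That m3 matches m1 is what the algebra predicts: for two classes, the class-1 column of the softmax of `(mu1, mu2)` equals `sigmoid(mu2 - mu1)`. The sketch below checks this numerically in plain NumPy. The helpers `softmax_rows` and `sigmoid` and the random test values are assumptions made for illustration and do not use PyMC3 or Theano.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax of a 2-D array, shifted by the row max for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(1)  # arbitrary seed
mu1 = rng.normal(scale=3.0, size=100)  # stand-ins for the linear predictors
mu2 = rng.normal(scale=3.0, size=100)

probs = softmax_rows(np.column_stack([mu1, mu2]))  # shape (100, 2)

# The class-1 column agrees with the logistic of the difference
same_as_sigmoid = np.allclose(probs[:, 1], sigmoid(mu2 - mu1))
rows_sum_to_one = np.allclose(probs.sum(axis=1), 1.0)
print(same_as_sigmoid, rows_sum_to_one)
```

On paper, then, the Categorical likelihood in m2 describes the same model as m1 and m3. This suggests, without proving it, that the discrepancy lies in how `p` or `y` reaches `pm.Categorical` rather than in the model itself.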
Any idea what is going on? How may I go about fixing the Softmax model (as well as the intercept being underdefined in the Logistic model)?
I am grateful for any input you can give.