Ordinal model does not find true means and variance

So I’ve tried using the approach explained in that notebook (which matches perfectly what I wanted to do). Unfortunately, I’m assuming that the underlying distribution is a Beta distribution, and it seems that the logcdf of pymc3.Beta can only handle scalar values at the moment. To work around that, I tried splitting up the thresholds, replacing:

probs1 = tt.exp(pm.Beta.dist(alpha=alpha, beta=beta).logcdf(d1))

by:

    prob1 = tt.exp(pm.Beta.dist(alpha=alpha, beta=beta).logcdf(0.05))
    prob2 = tt.exp(pm.Beta.dist(alpha=alpha, beta=beta).logcdf(0.25))
    prob3 = tt.exp(pm.Beta.dist(alpha=alpha, beta=beta).logcdf(0.50))
    prob4 = tt.exp(pm.Beta.dist(alpha=alpha, beta=beta).logcdf(0.75))
    prob5 = tt.exp(pm.Beta.dist(alpha=alpha, beta=beta).logcdf(0.95))

    probs1 = tt.stack([prob1, prob2, prob3, prob4, prob5], axis=1)

(the values are the particular thresholds of my study).
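For reference, here is a plain NumPy/SciPy sketch of what the stacked version is meant to compute: the Beta CDF evaluated at each of the five thresholds above. The `alpha` and `beta` values are placeholders for illustration, not values from my model:

```python
import numpy as np
from scipy import stats

# Placeholder shape parameters -- in the model these come from mu and sigma
alpha, beta = 2.0, 5.0

# The study-specific thresholds from above
thresholds = np.array([0.05, 0.25, 0.50, 0.75, 0.95])

# CDF at each threshold, i.e. exp(logcdf(t)) for each t
cdf_vals = stats.beta(alpha, beta).cdf(thresholds)
```

Since the CDF is monotone, these five values are strictly increasing and lie in (0, 1).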

Unfortunately, I ran into this error with that approach. So I had to fall back on the as_op decorator that I used in my original model, as follows:

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from scipy import stats
from theano.compile.ops import as_op

@as_op(itypes=[tt.dscalar, tt.dscalar], otypes=[tt.dvector])
def outcome_probabilities(alpha, beta):
    # Probability mass in each of the 6 categories cut by the thresholds in theta
    out = np.empty(6, dtype=np.float64)
    n = stats.beta(alpha, beta)
    out[0] = n.cdf(theta[0])
    out[1] = n.cdf(theta[1]) - n.cdf(theta[0])
    out[2] = n.cdf(theta[2]) - n.cdf(theta[1])
    out[3] = n.cdf(theta[3]) - n.cdf(theta[2])
    out[4] = n.cdf(theta[4]) - n.cdf(theta[3])
    out[5] = 1 - n.cdf(theta[4])
    return out
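As a sanity check, the same computation in plain NumPy/SciPy (with the thresholds from above and placeholder shape parameters) always yields six non-negative probabilities that sum to one, as the Multinomial likelihood requires:

```python
import numpy as np
from scipy import stats

theta = np.array([0.05, 0.25, 0.50, 0.75, 0.95])

def outcome_probabilities_np(alpha, beta):
    """Probability mass in each of the 6 intervals cut by the thresholds."""
    cdf = stats.beta(alpha, beta).cdf(theta)
    # Consecutive differences of the CDF, padded with 0 on the left and 1 on the right
    return np.diff(np.concatenate([[0.0], cdf, [1.0]]))

probs = outcome_probabilities_np(2.0, 5.0)  # placeholder alpha, beta
```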


with pm.Model() as model1:

    theta = d1  # the fixed thresholds
    mu = pm.Uniform('mu', 0, 1)
    # sigma is the variance here, hence the mu*(1-mu) upper bound
    sigma = pm.Uniform('sigma', 0, mu * (1 - mu))
    # Method-of-moments mapping from (mu, sigma) to the Beta shape parameters
    alpha = ((1 - mu) / sigma - 1 / mu) * (mu ** 2)
    beta = alpha * (1 / mu - 1)

    probs1 = pm.Deterministic("probs1", outcome_probabilities(alpha, beta))

    pm.Multinomial("counts1", p=probs1, n=c1.sum(), observed=c1.values)
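The mu/sigma-to-(alpha, beta) mapping in the model is the usual method-of-moments reparameterization of the Beta distribution, where sigma plays the role of the variance (which is why it is bounded above by mu*(1-mu)). A quick check with example values that the mapping round-trips:

```python
from scipy import stats

mu, var = 0.3, 0.01  # example mean and variance; var must be < mu*(1-mu)

# Same formulas as in the model above
alpha = ((1 - mu) / var - 1 / mu) * (mu ** 2)
beta = alpha * (1 / mu - 1)

dist = stats.beta(alpha, beta)
# dist.mean() recovers mu and dist.var() recovers var
```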

This works much better than the model I started with, both with respect to posterior predictive checks and recovery of the true parameter values. The sampling is still quite slow, since it still isn’t a pure Theano implementation, but at least the estimates are much better now. Thanks again for the pointer, @cluhmann !
