Shape error after pm.math.dot in multivar logistic regression model

I’m trying to create a multivariate logistic regression model using one-hot encoded data. The model runs through find_MAP fine, but stalls at the start of sampling. I’m thinking this might be related to this issue:

and perhaps:

The data X is a one-hot encoded numpy.array with shape (5800, 170), and y is a binary array with shape (5800,).

I’ve tried it with the NUTS and Metropolis samplers, to no avail. Any ideas or suggestions?

The model is defined below:

# X: (5800, 170) one-hot design matrix, y: (5800,) binary labels, D = X.shape[1]
with pm.Model():
    betas = pm.Normal(name='beta', mu=0, sigma=3, shape=D)
    alpha = pm.Normal(name='alpha', mu=0, sigma=3)
    theta = pm.Deterministic('theta', pm.math.sigmoid(alpha + pm.math.dot(X, betas)))
    obs = pm.Bernoulli('obs', p=theta, observed=y)
    print("Finished GLM")
    start = pm.find_MAP()
    print("Finished MAP")
    print(start)
    step = pm.NUTS(scaling=start, step_scale=.25)
    trace = pm.sample(1000, step, start=start)

Setting scaling=start will only hurt you. Just use the default settings of pm.sample(), i.e. just

trace = pm.sample()

or maybe

trace = pm.sample(tune=1000, draws=1000, target_accept=0.9)

Usually, you should only change tune, draws, cores, chains and target_accept.

Are there any NaNs in y or strange values in X?
Are the columns of X strongly collinear? If so, you could fix that using a QR decomposition.
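
A rough sketch of how you could check both things with plain NumPy. The helper names check_design and qr_reparam are made up for illustration, and the demo data is synthetic, not your dataset. One thing to watch for with one-hot data: if every level of a variable is kept, its columns sum to one and are exactly collinear with the intercept alpha, which shows up as a rank smaller than the number of columns.

```python
import numpy as np


def check_design(X, y):
    """Basic sanity checks on the design matrix X and the binary target y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    report = {
        "nan_in_X": bool(np.isnan(X).any()),
        "nan_in_y": bool(np.isnan(y).any()),
        "y_is_binary": bool(np.isin(y[~np.isnan(y)], [0.0, 1.0]).all()),
        # Columns that never vary carry no information beyond the intercept.
        "constant_columns": np.flatnonzero(X.std(axis=0) == 0).tolist(),
        "n_columns_with_intercept": X.shape[1] + 1,
        "rank_with_intercept": None,  # left as None if X contains NaNs
    }
    if not report["nan_in_X"]:
        # Prepend a column of ones to stand in for the intercept alpha.
        design = np.column_stack([np.ones(X.shape[0]), X])
        report["rank_with_intercept"] = int(np.linalg.matrix_rank(design))
    return report


def qr_reparam(X):
    """Thin QR of X, rescaled so the columns of Q are not tiny."""
    X = np.asarray(X, dtype=float)
    scale = np.sqrt(X.shape[0] - 1)
    Q, R = np.linalg.qr(X)  # reduced mode: Q is (n, D), R is (D, D)
    return Q * scale, R / scale


# Synthetic demo data (assumption: stands in for the real X and y).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
y_demo = rng.integers(0, 2, size=50)
print(check_design(X_demo, y_demo))

Q, R = qr_reparam(X_demo)
# In the model, pm.math.dot(Q, theta) would replace pm.math.dot(X, betas),
# with the prior placed on theta. The original coefficients are recovered
# afterwards by solving R @ betas = theta.
theta_demo = rng.normal(size=4)
betas_demo = np.linalg.solve(R, theta_demo)
```

If rank_with_intercept is smaller than n_columns_with_intercept, R is singular and the solve step has no unique answer. In that case dropping one reference level per one-hot variable is probably the simpler fix. Also note that a prior on theta is not the same as the original Normal(0, 3) prior on betas.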
