PyMC3 variational inference for multi-level logistic regression returns a NaN approximation

I am trying to use variational inference via ADVI to sample from a posterior with a lot of parameters. The data is simulated using the following code:

import math

import numpy as np

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

sigmoid_vec = np.vectorize(sigmoid)

side_cov = np.random.normal(0, 1, size=(1000, 20))   # group-level covariates
betas = np.random.normal(0, 1, size=(20, 30))
res = np.dot(side_cov, betas)
main_cov = np.random.normal(0, 1, size=(1000, 30))   # observation-level covariates
thetas = sigmoid_vec(np.sum(np.multiply(res, main_cov), axis=1))
obs_y = (thetas > 0.5).astype(int)

Using this model:

import pymc3 as pm
import theano
import theano.tensor as tt

with pm.Model() as model:
    # priors
    tilda = pm.Normal('tilda', mu=0.01, tau=1,
                      shape=(side_cov.shape[1], main_cov.shape[1]))

    # optional debug print of the current value of tilda
    tilda_print = theano.printing.Print("tilda")(tilda)

    beta = pm.Deterministic('beta', tt.dot(side_cov, tilda))

    theta = pm.Deterministic('p', pm.math.sigmoid((beta * main_cov).sum(axis=1)))

    y = pm.Bernoulli('y', p=theta, observed=obs_y)


with model:
  advi_fit = pm.fit(method='advi', n=30000)

The error I am receiving is this:

NaN occurred in optimization. 
The current approximation of RV `tilda`.ravel()[0] is NaN.
The current approximation of RV `tilda`.ravel()[1] is NaN.
The current approximation of RV `tilda`.ravel()[2] is NaN.
The current approximation of RV `tilda`.ravel()[3] is NaN.
The current approximation of RV `tilda`.ravel()[4] is NaN.
The current approximation of RV `tilda`.ravel()[5] is NaN.

I suspect this may have something to do with the fact that there are 600 entries of tilda to be sampled, but I would appreciate any guidance. I have tried making the priors on tilda more informative, but it didn't seem to help.


You are probably hitting numerical issues: the sigmoid evaluates to exactly 0 or 1 in floating point, while the corresponding observation in obs_y is the opposite, so the Bernoulli log-likelihood is not finite.
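A standalone illustration of the saturation (not part of the model itself): in double precision, the sigmoid of a moderately large input rounds to exactly 1.0, and the log-likelihood term log(1 - p) for a y = 0 observation is then undefined.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# exp(-40) ~ 4e-18 is below double-precision epsilon (~2.2e-16),
# so 1 + exp(-40) rounds to 1.0 and the sigmoid saturates exactly:
p = sigmoid(40)
print(p == 1.0)   # True

# A y=0 observation needs log(1 - p) = log(0), which is not finite,
# and the NaN propagates into the ELBO gradient:
print(1.0 - p)    # 0.0
```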

You can try the logit_p argument of pm.Bernoulli to pass the probability directly on the logit scale, without going through pm.math.sigmoid:

logit_theta = pm.Deterministic('logit_theta', (beta * main_cov).sum(axis=1))
y = pm.Bernoulli('y', logit_p=logit_theta, observed=obs_y)

Alternatively, you can use pm.math.invlogit instead of pm.math.sigmoid, which clamps theta away from exactly 0 and 1. This buys stability at the cost of a little precision, so I would recommend the first solution if it solves your problem.

theta = pm.Deterministic('p', pm.math.invlogit((beta * main_cov).sum(axis=1)))
y = pm.Bernoulli('y', p=theta, observed=obs_y)
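To see why the clamping helps, here is a plain-NumPy sketch of an epsilon-clamped inverse logit in the spirit of pm.math.invlogit (the eps value here is illustrative, not PyMC3's default):

```python
import numpy as np

def invlogit_clamped(x, eps=1e-6):
    """Inverse logit squeezed into [eps, 1 - eps] so that log(p) and
    log(1 - p) in the Bernoulli log-likelihood stay finite."""
    return (1.0 - 2.0 * eps) / (1.0 + np.exp(-x)) + eps

x = np.array([-50.0, 0.0, 50.0])
p = invlogit_clamped(x)

# Even at extreme inputs, p never reaches exactly 0 or 1,
# so both log terms remain finite:
print(np.log(p))
print(np.log1p(-p))
```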

Thank you so much!!!