# PyMC3 variational inference for multi-level logistic regression returning an approximation equal to NaN

I am trying to use variational inference via ADVI to sample from a posterior with a lot of parameters. The data is simulated using the following code:

``````import math
import numpy as np

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

sigmoid_vec = np.vectorize(sigmoid)

side_cov = np.random.normal(0, 1, size=(1000, 20))
betas = np.random.normal(0, 1, size=(20, 30))
res = np.dot(side_cov, betas)
main_cov = np.random.normal(0, 1, size=(1000, 30))
thetas = sigmoid_vec(np.sum(np.multiply(res, main_cov), axis=1))
obs_y = (thetas > 0.5).astype(int)
``````

Using this model:

``````import pymc3 as pm
import theano

with pm.Model() as model:
    # priors
    tilda = pm.Normal('tilda', mu=0.01, tau=1,
                      shape=(side_cov.shape[1], main_cov.shape[1]))

    tilda_print = theano.printing.Print("tilda")(tilda)

    beta = pm.Deterministic('beta', pm.math.dot(side_cov, tilda))

    theta = pm.Deterministic('p', pm.math.sigmoid((beta * main_cov).sum(axis=1)))

    y = pm.Bernoulli('y', p=theta, observed=obs_y)

with model:
    approx = pm.fit(method='advi')
``````

The error I am receiving is this:

``````NaN occurred in optimization.
The current approximation of RV `tilda`.ravel() is NaN.
The current approximation of RV `tilda`.ravel() is NaN.
The current approximation of RV `tilda`.ravel() is NaN.
The current approximation of RV `tilda`.ravel() is NaN.
The current approximation of RV `tilda`.ravel() is NaN.
The current approximation of RV `tilda`.ravel() is NaN.
``````

I suspect this may have something to do with the fact that there are 600 entries of `tilda` to estimate (the 20 × 30 matrix), but I would appreciate any guidance. I have tried making the priors on `tilda` more informative, but it didn't seem to help.


You are probably hitting numerical issues: the `sigmoid` saturates to exactly 0 or 1 in floating point, and when the corresponding observation in `obs_y` is the opposite value, the Bernoulli log-likelihood becomes `-inf`.

You can try the `logit_p` argument of `pm.Bernoulli` to pass the probability directly on the logit scale, without going through `pm.math.sigmoid`:

``````logit_theta = pm.Deterministic('logit_theta', (beta * main_cov).sum(axis=1))
y = pm.Bernoulli('y', logit_p=logit_theta, observed=obs_y)
``````

Or you can use `pm.math.invlogit` instead of `pm.math.sigmoid`: its `eps` argument keeps `theta` strictly inside (0, 1). This ensures stability at the cost of a little precision, so I would recommend the first solution if it solves your problem.

``````theta = pm.Deterministic('p', pm.math.invlogit((beta * main_cov).sum(axis=1)))
y = pm.Bernoulli('y', p=theta, observed=obs_y)
``````

Thank you so much!!!