Pymc3 variational inference for multi-level logistic regression returning approximation equal to NaN

Zhangyi_Fan · April 11, 2021, 10:16pm

I am trying to use variational inference via ADVI to sample from a posterior with a lot of parameters. The data is simulated using the following code:

import math

def sigmoid(x):
  return 1 / (1 + math.exp(-x))

sigmoid_vec = np.vectorize(sigmoid)

side_cov = np.random.normal(0,1,size = (1000,20))
betas = np.random.normal(0,1,size = (20,30))
res = np.dot(side_cov,betas)
main_cov = np.random.normal(0,1,size = (1000,30))
thetas = sigmoid_vec(np.sum(np.multiply(res,main_cov), axis = 1))
obs_y = (thetas > 0.5).astype(int)

Using this model:

with pm.Model() as model:
  # priors
  tilda = pm.Normal('tilda', mu=0.01, tau=1,shape = (side_cov.shape[1], main_cov.shape[1]))

  tilda_print = tt.printing.Print("tilda")(tilda)

  beta = pm.Deterministic('beta', tt.dot(side_cov,tilda))

  theta = pm.Deterministic('p', pm.math.sigmoid((beta * main_cov).sum(axis=1)))

  y = pm.Bernoulli('y',p = theta, observed=obs_y)


with model:
  advi_fit = pm.fit(method='advi', n=30000)

The error I am receiving is this:

NaN occurred in optimization. 
The current approximation of RV `tilda`.ravel()[0] is NaN.
The current approximation of RV `tilda`.ravel()[1] is NaN.
The current approximation of RV `tilda`.ravel()[2] is NaN.
The current approximation of RV `tilda`.ravel()[3] is NaN.
The current approximation of RV `tilda`.ravel()[4] is NaN.
The current approximation of RV `tilda`.ravel()[5] is NaN.

I suspect that this may have something to do with the fact that there are 3000 tildas to be sampled, but would appreciate any guidance. I have tried making the priors on tilda more informative but it didn’t seem to work.

ricardoV94 · April 12, 2021, 12:57pm

You are probably observing numerical issues due to the sigmoid evaluating to either 0 or 1, and having an observation in obs_y that is the opposite.

You can try to use the argument logit_p in the pm.Bernoulli to pass directly the probability in logit scale without having to do pm.math.sigmoid

logit_theta = pm.Deterministic((beta * main_cov).sum(axis=1))
y = pm.Bernoulli('y', logit_p=logit_theta, observed=obs_y)

Or you can use the pm.math.invlogit instead of pm.math.sigmoid, which avoids a theta of 0 or 1. This will ensure stability at the cost of less precision, so I would recommend the first solution if it solves your problem.

theta = pm.Deterministic(pm.math.invlogit(beta * main_cov).sum(axis=1))
y = pm.Bernoulli('y', p=theta, observed=obs_y)

Zhangyi_Fan · April 12, 2021, 9:25pm

Thank you so much!!!

Topic		Replies	Views
NaN occurred in optimization with ADVI Questions	5	4486	August 6, 2019
How to use pymc3.fit() method Questions	16	3680	May 7, 2018
Variational inference: diagnosing convergence	5	104	October 9, 2024
NaN occurred in optimization at first Iteration with ADVI Questions	3	728	February 18, 2020
Sampling a gaussian using ADVI Questions	3	651	April 19, 2018

Pymc3 variational inference for multi-level logistic regression returning approximation equal to NaN

Related topics