I am trying to use variational inference via ADVI to sample from a posterior with a lot of parameters. The data is simulated using the following code:
import math
def sigmoid(x):
return 1 / (1 + math.exp(-x))
sigmoid_vec = np.vectorize(sigmoid)
side_cov = np.random.normal(0,1,size = (1000,20))
betas = np.random.normal(0,1,size = (20,30))
res = np.dot(side_cov,betas)
main_cov = np.random.normal(0,1,size = (1000,30))
thetas = sigmoid_vec(np.sum(np.multiply(res,main_cov), axis = 1))
obs_y = (thetas > 0.5).astype(int)
Using this model:
with pm.Model() as model:
# priors
tilda = pm.Normal('tilda', mu=0.01, tau=1,shape = (side_cov.shape[1], main_cov.shape[1]))
tilda_print = tt.printing.Print("tilda")(tilda)
beta = pm.Deterministic('beta', tt.dot(side_cov,tilda))
theta = pm.Deterministic('p', pm.math.sigmoid((beta * main_cov).sum(axis=1)))
y = pm.Bernoulli('y',p = theta, observed=obs_y)
with model:
advi_fit = pm.fit(method='advi', n=30000)
The error I am receiving is this:
NaN occurred in optimization.
The current approximation of RV `tilda`.ravel()[0] is NaN.
The current approximation of RV `tilda`.ravel()[1] is NaN.
The current approximation of RV `tilda`.ravel()[2] is NaN.
The current approximation of RV `tilda`.ravel()[3] is NaN.
The current approximation of RV `tilda`.ravel()[4] is NaN.
The current approximation of RV `tilda`.ravel()[5] is NaN.
I suspect that this may have something to do with the fact that there are 3000 tildas to be sampled, but would appreciate any guidance. I have tried making the priors on tilda more informative but it didn’t seem to work.