How to use pymc3.fit() method

PyMC3 supports various Variational Inference techniques, and the main entry point is pymc3.fit(). But I don't know how to apply it effectively, and when I tried to use it, I got the following error:

Average Loss = 4.2499e+08:   0%|          | 19/10000 [00:02<22:09,  7.51it/s]
Traceback (most recent call last):

FloatingPointError: NaN occurred in optimization.

Did you have a look at the related discussion here?

Thank you, I saw the links you gave. To get the logp of each RV, I used

for RV in model.basic_RVs:
    print(RV.name, RV.logp(model.test_point))

which gives:

sigma_w_log__ -1.3068528194400542
rho 3.6862316527834187
omega 23.716498079566385
mu1 3.6862316527834187
xi_log__ 0.2240229904027684
b 276.72931195787453
sigma_log__ -0.7698925914732451
y -173353788.92710122

There is no -inf. Then I used

approx = pm.fit(n=10000, method='advi', model=model,
        obj_optimizer=pm.adagrad_window(learning_rate=2e-4))

which gives:

FloatingPointError: NaN occurred in optimization.

I can't find the problem; maybe I'm too dumb to understand the solution from reading those posts. Any help would be greatly appreciated.

FYI, you can check the logp easier now if you are on master branch:
model.check_test_point()

Usually, if you get NaN occurred in optimization after the optimization has started, there are some invalid parameters in your model - typically things like a negative sigma for a Normal distribution, or a negative alpha or beta for a Beta distribution. These problems are more case by case, and we don't yet have a good way to diagnose them - maybe you can share the code/data of the model you are working on.
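For instance (a hypothetical toy model, not your code), the following kind of prior choice is enough to trigger it:

import pymc3 as pm

with pm.Model() as bad_model:
    # A Normal prior lets sigma wander negative during optimization,
    # which makes the likelihood below undefined and produces NaNs.
    sigma = pm.Normal('sigma', mu=0.1, sd=1.0)
    y = pm.Normal('y', mu=0.0, sd=sigma, observed=[0.1, -0.2, 0.3])

with pm.Model() as good_model:
    # A positive-support prior such as HalfNormal avoids this failure mode.
    sigma = pm.HalfNormal('sigma', sd=1.0)
    y = pm.Normal('y', mu=0.0, sd=sigma, observed=[0.1, -0.2, 0.3])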

Thank you very much for your advice! My code is as follows:

import numpy as np
import pymc3 as pm
from theano import tensor as tt


N = 200      # the number of observed individuals
n = 11       # each individual was observed 11 times

e=np.ones(n)
mu0 = np.linspace(0.0, 0.0, num=n)

# simulate observation data (AR structure)
C2=np.zeros((n,n))
for i in range(0,n):
    for j in range(0,n):
        C2[i,j]=10.*0.4**np.abs(i-j)
dataSet2 = np.random.multivariate_normal(mu0, C2, size=N)


time_observed = [-5.0,-4.0,-3.0,-2.0,-1.0,0.0,1.0,2.0,3.0,4.0,5.0]
time = np.array(time_observed)

H0 = np.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        H0[i, j] = np.abs(time[j] - time[i])

with pm.Model() as model:
    sigma_w = pm.InverseGamma('sigma_w', 1.0, 0.1)
    rho = pm.Normal('rho', 0.4, 0.01)
    cov0 = sigma_w**2 * tt.pow(rho, H0)

    omega = pm.MvNormal('omega', mu=mu0, cov=cov0, shape=(1, n))

    mu1 = pm.Normal('mu1', mu=0.0, sd=0.01)
    xi = pm.InverseGamma('xi', 11.0, 0.1)
    b = pm.Normal('b', mu=0.0, sd=tt.sqrt(xi), shape=(N, 1))

    mu2 = (mu1 + b)*e + omega
    sigma = pm.HalfNormal('sigma', 0.01)

    y = pm.Normal('y', mu=mu2, sd=sigma, observed=dataSet2)

The problem is rho, as it must be bounded within [0, 1] for the approximation to be valid. You can try:

BoundNormal = pm.Bound(pm.Normal, lower=0., upper=1.)
with pm.Model() as model:
    sigma_w = pm.InverseGamma('sigma_w', 1.0, 0.1)
    rho = BoundNormal('rho', mu=0.4, sd=0.01)
    ...

which bounds rho within the appropriate range.


That's great :tada:, NaN no longer occurs in optimization; your advice has enabled my code to work properly. I used

with model:
    mean_field = pm.fit(method='advi',
                        callbacks=[pm.callbacks.CheckParametersConvergence(diff='absolute', tolerance=2e-4)])

which gives:

Average Loss = 1.492e+06: 100%|██████████| 10000/10000 [22:18<00:00,  7.47it/s]
Finished [100%]: Average Loss = 1.4909e+06

It looks like the Average Loss is too big, and the most important problem is that there is no convergence. How can I make it converge?

Check the ELBO history:

plt.plot(mean_field.hist)

The convergence checks we currently have are only heuristics - you should train your model much longer and check your results extensively. For example, have a look at this discussion here: Poor Accuracy of ADVI for Linear Regression


Thank you. Following your suggestion, I increased the training time and checked the ELBO history; the result is as follows:

import matplotlib.pyplot as plt

with model:
    mean_field = pm.fit(n=50000, method='advi',
                        callbacks=[pm.callbacks.CheckParametersConvergence(diff='absolute', tolerance=2e-4)])
    plt.plot(mean_field.hist)

which gives:

Average Loss = 16,246: 100%|██████████| 50000/50000 [2:05:21<00:00,  6.65it/s]    
Finished [100%]: Average Loss = 16,246

[ELBO history plot]

It looks like it took a long time because n=50000. But the output did not tell me how many iterations it took to converge, because the training never stopped early. Does this mean that after 50000 iterations it still has not converged?

The scale of the y-axis of the ELBO plot is rather large. Try plotting the [10000:] iterations - if that part looks rather flat, then further training probably won't help.

According to your suggestion, I modified the code:

import matplotlib.pyplot as plt

with model:
    mean_field = pm.fit(n=20000, method='advi',
                        callbacks=[pm.callbacks.CheckParametersConvergence(diff='absolute', tolerance=2e-4)])

    plt.figure(figsize=(10, 6))
    plt.subplot(221)
    plt.plot(mean_field.hist)
    plt.subplot(222)
    plt.plot(mean_field.hist[7500:])

and got:

Average Loss = 16,442: 100%|██████████| 20000/20000 [46:54<00:00,  7.11it/s]    
Finished [100%]: Average Loss = 16,443

The ELBO plots are as follows:
[ELBO history plots: full run and iterations 7500 onward]
Does this mean that after 20000 iterations it still has not converged? How can I make it converge? Any suggestions would be greatly appreciated.

It has not converged according to the threshold you set (i.e., diff='absolute', tolerance=2e-4); however, many things could cause this. You can also try increasing the number of Monte Carlo samples used to compute the gradient, and try a different optimizer. For more detail, see the discussion in Poor Accuracy of ADVI for Linear Regression

Thank you! This post has been shared with me before, and I have studied the methods mentioned in it, but I may not have fully understood them. I tried

with model:
    mean_field = pm.fit(20000, method='advi',
                        callbacks=[pm.callbacks.CheckParametersConvergence()],
                        obj_n_mc=100,
                        obj_optimizer=pm.adagrad(learning_rate=0.02))

Although there was no error, the run was very slow - I would probably have to wait a few hours for it to finish.

obj_n_mc=100 is probably too many - the default is 1.
My suggestion is to try 5 or 10, and also to increase the learning_rate in pm.adagrad(learning_rate=0.02).
It all feels a bit like "change something and see what happens", but that is the reality, as the solutions are domain specific; when you have a new question/model, you need to try different strategies until something works.
One good thing at least is that you can initialize the approximation class with different optimizers, train each for the same small number of iterations (say 10000), and compare the ELBO/parameters to see which one decreases the fastest. Then continue training using the best one.
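As a rough sketch of what that comparison could look like (the optimizer settings below are only placeholders, and model refers to the model defined earlier in the thread):

import numpy as np
import pymc3 as pm

candidates = {
    'adagrad_lr0.02': pm.adagrad(learning_rate=0.02),
    'adagrad_lr0.1': pm.adagrad(learning_rate=0.1),
    'adagrad_window': pm.adagrad_window(learning_rate=2e-3),
}

scores = {}
with model:
    for name, opt in candidates.items():
        inference = pm.ADVI()
        approx = inference.fit(n=10000, obj_n_mc=5, obj_optimizer=opt)
        # hist stores the loss (negative ELBO), so lower is better;
        # average the tail of the history as a rough score
        scores[name] = np.mean(approx.hist[-1000:])

print(scores, 'best:', min(scores, key=scores.get))
# Keep the inference object of the best run and call its .fit() again
# to continue training from where it stopped.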

Thank you for answering my questions. In addition to adjusting the training parameters, I have also changed how the distribution of the variable omega is expressed; the specific modifications are as follows:

The original expression of the variable omega (omega follows a multivariate normal distribution):

omega=pm.MvNormal('omega',mu=mu0,cov=cov0,shape=(1,6))

The modified expression of the variable omega (write out the distribution of each component of omega, then combine the components to obtain omega):

omega_0=pm.Normal('omega_0',mu=0.,sd=sigma_w)
omega_1=pm.Normal('omega_1',mu=omega_0*rho,sd=sigma_w*tt.sqrt(1-rho**2))
omega_2=pm.Normal('omega_2',mu=omega_1*rho,sd=sigma_w*tt.sqrt(1-rho**2))
omega_3=pm.Normal('omega_3',mu=omega_2*rho,sd=sigma_w*tt.sqrt(1-rho**2))
omega_4=pm.Normal('omega_4',mu=omega_3*rho,sd=sigma_w*tt.sqrt(1-rho**2))
omega_5=pm.Normal('omega_5',mu=omega_4*rho,sd=sigma_w*tt.sqrt(1-rho**2))

omega=[omega_0,omega_1,omega_2,omega_3,omega_4,omega_5]

I am puzzled as to whether the modified omega is the same as the original omega - whether it has the same dimensions and type.

It depends on the original covariance matrix cov0, which in turn depends on H0 in your model since cov0 = sigma_w**2 * tt.pow(rho, H0). I think the two are the same, as this is an autoregressive term, right? We also have an autoregressive distribution in PyMC3 that might help you simplify the code.
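As a sketch of what that could look like with pm.AR1 (double-check the parameterization: k is the lag coefficient and tau_e the innovation precision, so the stationary variance is 1 / (tau_e * (1 - k**2))):

with pm.Model() as ar_model:
    sigma_w = pm.InverseGamma('sigma_w', 1.0, 0.1)
    rho = pm.Bound(pm.Normal, lower=0., upper=1.)('rho', mu=0.4, sd=0.01)
    # innovation sd is sigma_w * sqrt(1 - rho**2), so the innovation
    # precision is 1 / (sigma_w**2 * (1 - rho**2))
    tau_e = 1.0 / (sigma_w**2 * (1.0 - rho**2))
    # note: this gives omega with shape (n,), not (1, n) as in the
    # original MvNormal version; reshape if your likelihood needs that
    omega = pm.AR1('omega', k=rho, tau_e=tau_e, shape=n)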

Yes, it's an autoregressive structure; maybe I didn't express myself clearly. In my last question, I was wondering whether the shape of the modified omega is the same as that of the original omega, and I want to keep their shape unchanged (the original omega's shape is (1, 6)).