What's the difference between pm.fit(..., method='advi') and pm.ADVI()?

Hi!

I am new to PyMC and there are a few questions concerning ADVI that I cannot figure out by myself.

I was using PSIS to diagnose ADVI's performance, following the method proposed in this paper. My model is a GMM, and the implementation of PSIS is based on Junpeng's notebook. You can find my notebook here. Junpeng only implemented the mean-field ADVI case, so I extended his implementation to pm.FullRankADVI(). The core part of the code is listed below:

#PSIS
import numpy as np
import scipy.stats as st
import pymc3 as pm
from pymc3.math import logsumexp
from scipy.special import logsumexp as sp_logsumexp
from pymc3.distributions.dist_math import rho2sd

def log_important_ratio(approx, nsample):
    logp_func = approx.model.logp

    # in ADVI there is only one approximation group
    approx_group = approx.groups[0]
    if approx.short_name == "mean_field":
        # q(theta) is a diagonal Gaussian parameterized by (mu, rho)
        mu_q = approx_group.params[0].eval()
        std_q = rho2sd(approx_group.params[1]).eval()
        logq_func = st.norm(mu_q, std_q)
    elif approx.short_name == "full_rank":
        # q(theta) is a multivariate Gaussian parameterized by
        # (packed Cholesky factor, mu)
        packed_chol_q = approx_group.params[0]
        mu_q = approx_group.params[1].eval()
        dim = mu_q.shape[0]
        chol_q = pm.expand_packed_triangular(dim, packed_chol_q, lower=True).eval()
        cov_q = np.dot(chol_q, chol_q.T)
        logq_func = st.multivariate_normal(mu_q, cov_q)

    dict_to_array = approx_group.bij.map

    p_theta_y = []
    q_theta = []
    samples = approx.sample_dict_fn(nsample)  # type: dict
    points = ({name: records[i] for name, records in samples.items()}
              for i in range(nsample))

    for point in points:
        p_theta_y.append(logp_func(point))
        q_theta.append(np.sum(logq_func.logpdf(dict_to_array(point))))
    p_theta_y = np.asarray(p_theta_y)
    q_theta = np.asarray(q_theta)
    # log importance ratios: log p(theta, y) - log q(theta)
    return p_theta_y - q_theta

However, I found that using pm.fit(..., method='advi') and pm.ADVI() gives me different k values. It is the same case with pm.fit(..., method='fullrank_advi') and pm.FullRankADVI(). The k values are as follows:

nsample = 10000
ratio_advi_mf = log_important_ratio(advi_mf.approx, nsample) # via pm.ADVI()
_, k = pm.stats._psislw(ratio_advi_mf[:, None], 1)
k # array([0.17293144])

ratio_advi_mf2 = log_important_ratio(advi_mf2, nsample) # via pm.fit(..., method='advi')
_, k2 = pm.stats._psislw(ratio_advi_mf2[:, None], 1)
k2 # array([0.42639058])

ratio_advi_fr = log_important_ratio(advi_fr.approx, nsample) # via pm.FullRankADVI()
_, k_fr = pm.stats._psislw(ratio_advi_fr[:, None], 1)
k_fr # array([0.9365361])

ratio_advi_fr2 = log_important_ratio(advi_fr2, nsample) # via pm.fit(..., method='fullrank_advi')
_, k_fr2 = pm.stats._psislw(ratio_advi_fr2[:, None], 1)
k_fr2 # array([1.37102098])
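
For reference, the four approximations above were obtained roughly like this (just a sketch; the iteration count is a placeholder, and model is the GMM from my notebook):

with model:
    # object-based API: the fitted approximation lives on .approx
    advi_mf = pm.ADVI()
    advi_mf.fit(50000)
    advi_fr = pm.FullRankADVI()
    advi_fr.fit(50000)

    # functional API: pm.fit returns the approximation directly
    advi_mf2 = pm.fit(50000, method='advi')
    advi_fr2 = pm.fit(50000, method='fullrank_advi')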

Furthermore, with pm.fit we can set obj_optimizer, but there is no such option in pm.ADVI(). This made me doubt whether pm.fit and pm.ADVI() give the same result, and which one follows the original ADVI algorithm.
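
For example (just a sketch; pm.adam is only a placeholder for whatever optimizer one wants to use):

# obj_optimizer can be passed to pm.fit(...)
approx = pm.fit(50000, method='advi', obj_optimizer=pm.adam(learning_rate=0.01))

# but the pm.ADVI() constructor takes no obj_optimizer argument
advi = pm.ADVI()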

Lastly, could you please check whether my implementation of the FullRankADVI case is correct, since the algorithm is giving me a k value larger than 1?

Many thanks!

I think it is more likely due to the stochasticity in model fitting. PSIS was very sensitive when I was playing around with it. Do you see any difference in the estimation?

The estimation gives me similar results on the toy data sets. But in general, pm.fit() seems to give a larger k value than pm.ADVI() / pm.FullRankADVI() after running it several times.

BTW, sometimes pm.stats._psislw() gives me a negative k value, but isn't the shape parameter of the Pareto distribution always positive?

Hmmm, my reasoning would be that a different optimizer is used, which returns a different estimate from the approximation.
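
One way to check that (a sketch, assuming both code paths accept the same obj_optimizer keyword and that pm.adagrad_window is exposed at the top level as the shared default optimizer):

# pin the optimizer and the number of iterations on both paths, then re-check khat
opt = pm.adagrad_window(learning_rate=1e-2)

with model:
    approx_fit = pm.fit(50000, method='advi', obj_optimizer=opt)
    inference = pm.ADVI()
    approx_obj = inference.fit(50000, obj_optimizer=opt)

nsample = 10000
_, k_fit = pm.stats._psislw(log_important_ratio(approx_fit, nsample)[:, None], 1)
_, k_obj = pm.stats._psislw(log_important_ratio(approx_obj, nsample)[:, None], 1)

If the two k values still differ a lot with the optimizer pinned, then the remaining difference should just be the stochasticity of the fit.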