# What's the difference between pm.fit(..., method = 'ADVI' ) and pm.ADVI()

Hi!

I am new to PyMC and there are a few questions concerning ADVI that I cannot figure out by myself.

I was using PSIS to diagnose the ADVI’s performance, where the method is proposed in this paper. My model is a GMM, and the implementation of PSIS is based on Junpeng’s notebook. You could find my notebook here. In addition, Junpeng only implemented for ADVI(meanfiled) case, and I expanded Junpeng’s implementation to the situations of ADVI.FullRank(). The core part of codes are listes as follows:

``````#PSIS
from pymc3.math import logsumexp
from scipy.special import logsumexp as sp_logsumexp
from pymc3.distributions.dist_math import rho2sd

def log_important_ratio(approx, nsample):
logp_func = approx.model.logp

# in ADVI there are only 1 group approximation
approx_group = approx.groups[0]
if approx.short_name == "mean_field":
mu_q = approx_group.params[0].eval()
std_q = rho2sd(approx_group.params[1]).eval()
logq_func = st.norm(mu_q, std_q)
elif approx.short_name == "full_rank":
packed_chol_q = approx_group.params[0]
mu_q = approx_group.params[1].eval()
dim = mu_q.shape[0]
chol_q = pm.expand_packed_triangular(dim, packed_chol_q, lower=True).eval()
cov_q = np.dot(chol_q, chol_q.T)
logq_func = st.multivariate_normal(mu_q, cov_q)

dict_to_array = approx_group.bij.map

p_theta_y = []
q_theta = []
samples = approx.sample_dict_fn(nsample)  # type: dict
points = ({name: records[i] for name, records in samples.items()}
for i in range(nsample))

for point in points:
p_theta_y.append(logp_func(point))
q_theta.append(np.sum(logq_func.logpdf(dict_to_array(point))))
p_theta_y = np.asarray(p_theta_y)
q_theta = np.asarray(q_theta)
return p_theta_y-q_theta

``````

However, I found that using `pm.fit(..., method = 'ADVI')` and `pm.ADVI()` gives me different k value. It is the same case with `pm.fit(.., method='fullrank_advi') and`pm.FullRank ()`. the k values are as followed:

``````nsample = 10000
_, k = pm.stats._psislw(ratio_advi_mf[:, None], 1)
k # array([0.17293144])

_, k2 = pm.stats._psislw(ratio_advi_mf2[:, None], 1)
k2 # array([0.42639058])

_, k_fr = pm.stats._psislw(ratio_advi_fr[:, None], 1)
k_fr # array([0.42639058]) # array([0.9365361])

_, k_fr2 = pm.stats._psislw(ratio_advi_fr2[:, None], 1)
k_fr2 # array([1.37102098])
``````

Furthermore, using `pm.fit`, we could set `obj_optimizer`, but there is no such option in `ADVI`. And this made me doubt whether `pm.fit` and `pm.ADVI()` gives the same result, and which one is based on the original ADVI algorithm.

Lastly, could you please check if my implemetation for FullRankADVI case is correct, since the algo is giving me k value that is larger than 1.

Many thanks!

I think it is more likely the stochastic in model fitting. PSIS is very sensitive when I was playing around with it. Do you see any difference in the estimation?

The estimation gives me the similar results on the toy data sets. But in general, pm.fit() seems to give a larger k value than pm.ADVI() / pm.FullRankADVI(), after running it for several times.

BTW, sometimes `pm.stats._psislw()` is giving me negative k value, but isn’t the shape parameter of pareto distribution always positive?

Hmmm, my reasoning would be that different optimizer is used, which returns different estimation from approx model.