Hi!
I am new to PyMC and there are a few questions concerning ADVI that I cannot figure out by myself.
I was using PSIS to diagnose the ADVI’s performance, where the method is proposed in this paper. My model is a GMM, and the implementation of PSIS is based on Junpeng’s notebook. You could find my notebook here. In addition, Junpeng only implemented for ADVI(meanfiled) case, and I expanded Junpeng’s implementation to the situations of ADVI.FullRank(). The core part of codes are listes as follows:
#PSIS
from pymc3.math import logsumexp
from scipy.special import logsumexp as sp_logsumexp
from pymc3.distributions.dist_math import rho2sd
def log_important_ratio(approx, nsample):
logp_func = approx.model.logp
# in ADVI there are only 1 group approximation
approx_group = approx.groups[0]
if approx.short_name == "mean_field":
mu_q = approx_group.params[0].eval()
std_q = rho2sd(approx_group.params[1]).eval()
logq_func = st.norm(mu_q, std_q)
elif approx.short_name == "full_rank":
packed_chol_q = approx_group.params[0]
mu_q = approx_group.params[1].eval()
dim = mu_q.shape[0]
chol_q = pm.expand_packed_triangular(dim, packed_chol_q, lower=True).eval()
cov_q = np.dot(chol_q, chol_q.T)
logq_func = st.multivariate_normal(mu_q, cov_q)
dict_to_array = approx_group.bij.map
p_theta_y = []
q_theta = []
samples = approx.sample_dict_fn(nsample) # type: dict
points = ({name: records[i] for name, records in samples.items()}
for i in range(nsample))
for point in points:
p_theta_y.append(logp_func(point))
q_theta.append(np.sum(logq_func.logpdf(dict_to_array(point))))
p_theta_y = np.asarray(p_theta_y)
q_theta = np.asarray(q_theta)
return p_theta_y-q_theta
However, I found that using pm.fit(..., method = 'ADVI')
and pm.ADVI()
gives me different k value. It is the same case with pm.fit(.., method='fullrank_advi') and
pm.FullRank ()`. the k values are as followed:
nsample = 10000
ratio_advi_mf = log_important_ratio(advi_mf.approx, nsample) # via pm.ADVI()
_, k = pm.stats._psislw(ratio_advi_mf[:, None], 1)
k # array([0.17293144])
ratio_advi_mf2 = log_important_ratio(advi_mf2, nsample) # via pm.fit(..., method='advi')
_, k2 = pm.stats._psislw(ratio_advi_mf2[:, None], 1)
k2 # array([0.42639058])
ratio_advi_fr = log_important_ratio(advi_fr.approx, nsample) # via pm.FullRankADVI()
_, k_fr = pm.stats._psislw(ratio_advi_fr[:, None], 1)
k_fr # array([0.42639058]) # array([0.9365361])
ratio_advi_fr2 = log_important_ratio(advi_fr2, nsample) # via pm.fit(..., method='fullrank_advi')
_, k_fr2 = pm.stats._psislw(ratio_advi_fr2[:, None], 1)
k_fr2 # array([1.37102098])
Furthermore, using pm.fit
, we could set obj_optimizer
, but there is no such option in ADVI
. And this made me doubt whether pm.fit
and pm.ADVI()
gives the same result, and which one is based on the original ADVI algorithm.
Lastly, could you please check if my implemetation for FullRankADVI case is correct, since the algo is giving me k value that is larger than 1.
Many thanks!