Fix Bridge sampling for variables with shape>1?

Hi! I’m trying to use @junpenglao’s implementation of the bridge sampling.
I’m trying to solve an error I get when computing the median effective sample size. First, there seems to be a deprecated method throwing an error.

# median effective sample size (scalar)
neff = pm.stats.dict2pd(neff_list,'temp').median()

AttributeError: module ‘pymc3.stats’ has no attribute ‘dict2pd’

Using neff = np.median(list(neff_list.values())) instead works for variables with shape=1, but fails for variables with shape > 1.
Here’s a full example (look for the FIXME comment at line 81):

# %%
import numpy as np
import pandas as pd
import patsy
import pymc3 as pm
from pymc3.model import modelcontext
from scipy import dot
from scipy.linalg import cholesky as chol
import scipy.special as spf
import scipy.stats as st
import warnings

# %%
# Dummy Data
mu1 = -1
mu2 = 2.5
sigma1 = 1
sigma2 = 2

df = pd.DataFrame({'A': np.random.normal(loc=mu1, scale=sigma1, size=30),
                   'B': np.random.normal(loc=mu2, scale=sigma2, size=30)
                  })

coords = {'var': ['A', 'B'],
          'obs_id': np.arange(len(df))}

# %%
# Model
with pm.Model(coords=coords) as model:
    # Priors.
    mu_a = pm.Normal('mu_a', mu=0, sigma=5)
    # Assume a positive difference between A and B.
    mu_diff = pm.Gamma('mu_diff', mu=1, sigma=3)
    mu_b = pm.Deterministic('mu_b', mu_a + mu_diff)

    # Stack priors, because data is not in long format.
    theta = pm.math.stack((mu_a, mu_b)).T
    # Model error. Shape=2. This is causing problems in bridge sampler.
    sigma = pm.Exponential("sigma", lam=1.0, dims='var')
    # Observed variable.
    obs_data = pm.Data("y_obs", df[coords['var']], dims=('obs_id', 'var'))
    # Using user_idx to index theta somehow tells which prior belongs to which user.
    y = pm.Normal("y", mu=theta, sigma=sigma, observed=obs_data, dims=('obs_id', 'var'))

# %%
with model:
    mtrace = pm.sample(2000, tune=400, target_accept=0.9)

# %%
# Bridge Sampling
r0, tol1, tol2 = 0.5, 1e-10, 1e-4
maxiter=1000

model = modelcontext(model)
logp = model.logp_array
# free_RVs might be autotransformed. 
# if that happens, there will be a model.deterministics entry with the first part of the name that needs to be used 
# instead of the autotransformed name below in stats.ess
# so we need to replace that variable with the corresponding one from the deterministics
vars = model.free_RVs
det_names=[d.name for d in model.deterministics]
det_names.sort(key=lambda s:-len(s)) # sort descending by length

def recover_var_name(name_autotransformed):
    for dname in det_names:
        if dname==name_autotransformed[:len(dname)]:
            return dname
    return name_autotransformed

# Split the samples into two parts  
# Use the first 50% for fiting the proposal distribution and the second 50% 
# in the iterative scheme.
len_trace = len(mtrace)
nchain = mtrace.nchains

N1_ = len_trace // 2
N1 = N1_*nchain
N2 = len_trace*nchain - N1

neff_list = dict() # effective sample size

arraysz = model.bijection.ordering.size
samples_4_fit = np.zeros((arraysz, N1))
samples_4_iter = np.zeros((arraysz, N2))
# matrix with already transformed samples
for var in vars:
    varmap = model.bijection.ordering.by_name[var.name]
    # for fitting the proposal
    x = mtrace[:N1_][var.name]
    samples_4_fit[varmap.slc, :] = x.reshape((x.shape[0], np.prod(x.shape[1:], dtype=int))).T
    # for the iterative scheme
    x2 = mtrace[N1_:][var.name]
    samples_4_iter[varmap.slc, :] = x2.reshape((x2.shape[0], np.prod(x2.shape[1:], dtype=int))).T
    # effective sample size of samples_4_iter, scalar
    orig_name=recover_var_name(var.name)
    neff_list.update(pm.stats.ess(mtrace[N1_:], var_names=[orig_name]))

# median effective sample size (scalar)
neff = np.median(list(neff_list.values()))  # FIXME: Crashes here because of shape sigma > 1!



# %%
# get mean & covariance matrix and generate samples from proposal
m = np.mean(samples_4_fit, axis=1)
V = np.cov(samples_4_fit)
L = chol(V, lower=True)

# Draw N2 samples from the proposal distribution
gen_samples = m[:, None] + np.dot(L, st.norm.rvs(0, 1, size=samples_4_iter.shape))

# Evaluate proposal distribution for posterior & generated samples
q12 = st.multivariate_normal.logpdf(samples_4_iter.T, m, V)
q22 = st.multivariate_normal.logpdf(gen_samples.T, m, V)

# Evaluate unnormalized posterior for posterior & generated samples
q11 = np.asarray([logp(point) for point in samples_4_iter.T])
q21 = np.asarray([logp(point) for point in gen_samples.T])

# Iterative scheme as proposed in Meng and Wong (1996) to estimate
# the marginal likelihood
def iterative_scheme(q11, q12, q21, q22, r0, neff, tol, maxiter, criterion):
    l1 = q11 - q12
    l2 = q21 - q22
    lstar = np.median(l1) # To increase numerical stability, 
                            # subtracting the median of l1 from l1 & l2 later
    s1 = neff/(neff + N2)
    s2 = N2/(neff + N2)
    r = r0
    r_vals = [r]
    logml = np.log(r) + lstar
    criterion_val = 1 + tol

    i = 0
    while (i <= maxiter) & (criterion_val > tol):
        rold = r
        logmlold = logml
        numi = np.exp(l2 - lstar)/(s1 * np.exp(l2 - lstar) + s2 * r)
        deni = 1/(s1 * np.exp(l1 - lstar) + s2 * r)
        if np.sum(~np.isfinite(numi))+np.sum(~np.isfinite(deni)) > 0:
            warnings.warn("""Infinite value in iterative scheme, returning NaN. 
            Try rerunning with more samples.""")
        r = (N1/N2) * np.sum(numi)/np.sum(deni)
        r_vals.append(r)
        logml = np.log(r) + lstar
        i += 1
        if criterion=='r':
            criterion_val = np.abs((r - rold)/r)
        elif criterion=='logml':
            criterion_val = np.abs((logml - logmlold)/logml)

    if i >= maxiter:
        return dict(logml = np.NaN, niter = i, r_vals = np.asarray(r_vals))
    else:
        return dict(logml = logml, niter = i)

# Run iterative scheme:
tmp = iterative_scheme(q11, q12, q21, q22, r0, neff, tol1, maxiter, 'r')
if ~np.isfinite(tmp['logml']):
    warnings.warn("""logml could not be estimated within maxiter, rerunning with 
                    adjusted starting value. Estimate might be more variable than usual.""")
    # use geometric mean as starting value
    r0_2 = np.sqrt(tmp['r_vals'][-2]*tmp['r_vals'][-1])
    tmp = iterative_scheme(q11, q12, q21, q22, r0_2, neff, tol2, maxiter, 'logml')

result = dict(logml = tmp['logml'], niter = tmp['niter'], method = "normal", 
              q11 = q11, q12 = q12, q21 = q21, q22 = q22)

Trying to get the median throws:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

neff_list's content:

{‘mu_a’: <xarray.DataArray ‘mu_a’ ()>
array(2530.95070228),
‘mu_diff’: <xarray.DataArray ‘mu_diff’ ()>
array(2404.32502243),
‘sigma’: <xarray.DataArray ‘sigma’ (var: 2)>
array([3993.40673043, 3797.273313 ])
Coordinates:
* var (var) <U1 ‘A’ ‘B’}

As you can see, sigma is of shape 2, causing the error.
How can this be fixed?

Splitting sigma into separate variables in the model may be reasonable for shape==2, but with more this would generate a lot of boilerplate. Is there a general solution?

I replaced the call in question with
neff = np.median(np.concatenate([x.values.reshape((1,)) if x.shape==() else x for x in neff_list.values()]))
and that seems to work for now. Is that a reasonable thing to do?

Maybe it is better to use pm.summary and read out the median effective sample size column. You can supply a list to var_names so that the ordering is what you want https://arviz-devs.github.io/arviz/generated/arviz.summary.html#arviz.summary

Thanks for the suggestion. For clarification: Is my approach wrong?
With the arviz.summary I get ess_mean, ess_sd, ess_bulk, and ess_tail, but not the median. I found az.stats.ess(idata, method='median'), but that gives me an xarray and not the scalar I need for neff.
Could you break down the steps on how to receive the correct value? Do I need to supply only the free_RVs to var_names? And then flatten all and take the median?

Sorry, I know I’m in over my head with this stuff. N_1 is used to determine the s_1 and s_2 constants in the denominator of the optimal bridge function. Why is N_1 replaced with the median effective sample size (neff) ? I didn’t see it in the tutorial on bridge sampling. In the R code, if the effective sample size can’t be calculated, the number of samples is used. I suppose that’s N_1? What happens, if I do this instead?

Your approach is fine as is - looking at it since bridge sampling needs variables in the latent unbounded space, probably more straightforward just do this by hand without Arviz.

Using number of samples might not be very accurate as MCMC samples are not iid - that said, if the mcmc chains are well mixed with low autocorrelations, you can use number of samples directly. the median is used here because we dont have a good way to estimate the effective number of samples for all random variables.