Tacking ISRS onto SVGD

I’ve been playing around with the framework here quite a bit. I love SVGD, but it has some performance issues: first, it can be brutally slow with a moderate number of particles, and second, the temperature parameter really impacts the posterior estimates. See

https://ferrine.github.io/blog/2017/06/04/asvgd-sanity-check/

What SVGD does really well is track particles to posterior modes – and as far as I’m concerned this is the most substantial challenge in Bayesian modeling. If you know where the modes are and have just an inkling of the local density, one or two iterations of importance resampling can recover the posterior to a high degree of accuracy.

I’ve thrown together some code to tack KDE-based resampling onto an initial model sample:

import numpy as np
import pandas as pd  # used below in kde_isrs_model


def sample_unnormalized(p_prop, size):
    # draw `size` indices with probability proportional to the unnormalized weights p_prop
    p_cum = np.cumsum(p_prop)
    unis = np.random.uniform(size=size) * p_cum[-1]
    return np.greater.outer(unis, p_cum).sum(axis=1)


def sample_kde(dataset, n):
    # draw n samples from a Gaussian KDE fitted to the particle matrix `dataset`
    bw = dataset.shape[0] ** (-1./2.)
    # deal with numerics a little bit for high-dimensional settings
    kde_cov = bw * (np.cov(dataset.T) + np.diag((bw ** 2) * np.ones(dataset.shape[1])))
    kde_L = np.linalg.cholesky(kde_cov)
    idx = np.random.choice(dataset.shape[0], n, replace=True)
    return dataset[idx,:] + np.dot(np.random.normal(size=(n, dataset.shape[1])), kde_L.T)


def kde_isrs_(X, logp, N, n):
    # one round of importance resampling: propose N points from the KDE,
    # weight them by the model log-probability, and keep n of them
    Snew = sample_kde(X, N)
    liks = np.array([logp(Snew[i,:]) for i in range(N)])
    probs = np.exp(liks - np.max(liks))
    return Snew[sample_unnormalized(probs, n), :]


def kde_isrs(X, logp, N, n, iters=2):
    for j in range(iters):
        X = kde_isrs_(X, logp, N, n)
    return X
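
For a sense of how kde_isrs behaves on its own, here is a minimal, hypothetical sketch with a toy 2-D Gaussian target and random noise standing in for SVGD particles (not part of the original workflow):

# Toy check (hypothetical, not from the original workflow): resample a rough
# particle cloud towards a standard 2-D Gaussian target.
import numpy as np
from scipy import stats

target = stats.multivariate_normal(mean=np.zeros(2), cov=np.eye(2))
particles = np.random.normal(scale=2.0, size=(50, 2))   # stand-in for SVGD particles
resampled = kde_isrs(particles, target.logpdf, N=5000, n=1000, iters=2)
print(resampled.mean(axis=0))   # should land near the target mean (0, 0)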


def vec2vardict(fit, vec, vars_):
    # split a flat parameter vector back into a dict keyed by variable name,
    # using the column layout of the fit/trace object
    vd, offset = dict(), 0
    for vname in vars_:
        vcols = fit[vname].shape[1] if len(fit[vname].shape) > 1 else 1
        vd[vname] = vec[offset:(offset+vcols)]
        if len(vd[vname]) == 1:
            vd[vname] = vd[vname][0]
        offset += vcols
    return vd
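
As a quick illustration of the unpacking (hypothetical variable names, only the shapes matter):

# Hypothetical example of the flat-vector -> dict unpacking above.
fit = {'x': np.zeros(100), 'beta': np.zeros((100, 2))}
vec2vardict(fit, np.array([0.5, 1.0, -1.0]), ['x', 'beta'])
# -> {'x': 0.5, 'beta': array([ 1., -1.])}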


def kde_isrs_model(model, model_fit, required_vars, N, n, iters=2):
    # stack the fitted draws for the required variables into one particle matrix
    X = np.hstack([pd.DataFrame(model_fit[v]) for v in required_vars])
    logp = model.logp
    def model_lik(Xvec):
        vd = vec2vardict(model_fit, Xvec, required_vars)
        return logp(**vd)
    Z = kde_isrs(X, model_lik, N, n, iters)
    # unpack the resampled matrix back into per-variable arrays
    vardict, offset = dict(), 0
    for vname in required_vars:
        vcols = model_fit[vname].shape[1] if len(model_fit[vname].shape) > 1 else 1
        vardict[vname] = Z[:, offset:(offset+vcols)]
        offset += vcols
    return vardict

And its application to the example from the Variational API Quickstart:

isrs_res = kde_isrs_model(model, svgd_approx.sample(50), ['x'], 5000, 1000, 1)
ax = sns.kdeplot(trace['x'], label='NUTS');
sns.kdeplot(approx.sample(10000)['x'], label='ADVI');
sns.kdeplot(svgd_approx.sample(2000)['x'], label='SVGD');
sns.kdeplot(isrs_res['x'][:,0], label='SVGD+KDE-ISRS')

[Figure: kernel density estimates of the posterior for x from NUTS, ADVI, SVGD, and SVGD+KDE-ISRS]

My issue is that this is entirely manual (outside of pymc3). The Interpolated trick used here is not well suited to higher dimensions. Is there a pymc3 approach to this?


Interesting - so you use the smoothed histogram of the SVGD particles to do importance sampling and get the posterior right? I am not sure there is any good solution for high-dimensional interpolation, but what might be possible is to do a k-means clustering of the SVGD particles and then run another VI with a Gaussian mixture as the approximation; we would only need to estimate the weights, since the mus and sds would already be available from k-means.
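
A rough sketch of that clustering step (my reading of the suggestion, using scikit-learn's KMeans; the weight-only VI pass is left out):

# Rough sketch (assumes scikit-learn): seed a Gaussian mixture from the SVGD
# particles, so that only the mixture weights remain to be fit by VI.
import numpy as np
from sklearn.cluster import KMeans

particles = np.random.normal(size=(1000, 2))   # stand-in for the SVGD particle matrix
km = KMeans(n_clusters=5).fit(particles)
mus = km.cluster_centers_                                 # candidate mixture means
sds = np.array([particles[km.labels_ == k].std(axis=0)   # candidate per-component sds
                for k in range(5)])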

also cc @ferrine

Wow, it is really impressive. My concern is that SVGD is not expected to perform well in higher dimensions.

Maybe if we totally ignore the accuracy of the approximate inference and focus on the models we obtain, this might help us do better Bayesian averaging. But my experience + intuition suggests trying SGHMC with a cyclic learning rate, as proposed in https://arxiv.org/abs/1902.03932. Can we do that?