Tacking ISRS onto SVGD

I’ve been playing around with the framework here quite a bit. I love SVGD but it has some performance issues – first, it can be brutally slow with a moderate number of particles, and second the temperature parameter really impacts the posterior estimates. See


What SVGD does really well is track particles to posterior modes – and as far as I’m concerned this is the most substantial challenge in Bayesian modeling. If you know where the modes are and have just an inkling of the local density, one or two iterations of importance resampling can recover the posterior to a high degree of accuracy.

I’ve thrown together some code to tack on KDE-based resampling to an initial model sample:

def sample_unnormalized(p_prop, size):
    p_cum = np.cumsum(p_prop)
    unis = np.random.uniform(size=size) * p_cum[-1]
    return np.greater.outer(unis, p_cum).sum(axis=1)

def sample_kde(dataset, n):
    bw = dataset.shape[0] ** (-1./2.)
    # deal with numerics a little bit for high-dimensional settings
    kde_cov = bw * (np.cov(dataset.T) + np.diag((bw ** 2) * np.ones(dataset.shape[1])))
    kde_L = np.linalg.cholesky(kde_cov)
    idx = np.random.choice(dataset.shape[0], n, replace=True)
    return dataset[idx,:] + np.dot(np.random.normal(size=(n, dataset.shape[1])), kde_L.T)

def kde_isrs_(X, logp, N, n):
    Snew = sample_kde(X, N)
    liks = np.array([logp(Snew[i,:]) for i in xrange(N)])
    probs = np.exp(liks - np.max(liks))
    return Snew[sample_unnormalized(probs, n), :]

def kde_isrs(X, logp, N, n, iters=2):
    for j in xrange(iters):
        X = kde_isrs_(X, logp, N, n)
    return X

def vec2vardict(fit, vec, vars_):
    vd, offset = dict(), 0
    for vname in vars_:
        vcols = fit[vname].shape[1] if len(fit[vname].shape) > 1 else 1
        vd[vname] = vec[offset:(offset+vcols)]
        if len(vd[vname]) == 1:
            vd[vname] = vd[vname][0]
        offset += vcols
    return vd

def kde_isrs_model(model, model_fit, required_vars, N, n, iters=2):
    X = np.hstack([pd.DataFrame(model_fit[v]) for v in required_vars])
    logp = model.logp
    def model_lik(Xvec):
        vd = vec2vardict(model_fit, Xvec, required_vars)
        return logp(**vd)
    Z = kde_isrs(X, model_lik, N, n, iters)
    vardict, offset = dict(), 0
    for vname in required_vars:
        vcols = svgd_fit[vname].shape[1] if len(model_fit[vname].shape) > 1 else 1
        vardict[vname] = Z[:, offset:(offset+vcols)]
        offset += vcols
    return vardict

And its application to the Variational API Quickstart

isrs_res = kde_isrs_model(model, svgd_approx.sample(50), ['x'], 5000, 1000, 1)
ax = sns.kdeplot(trace['x'], label='NUTS');
sns.kdeplot(approx.sample(10000)['x'], label='ADVI');
sns.kdeplot(svgd_approx.sample(2000)['x'], label='SVGD');
sns.kdeplot(isrs_res['x'][:,0], label='SVGD+KDE-ISRS')


My issue is that this is entirely manual (outside of pymc3). The Interpolation trick used here is not particularly happy for higher-dimensions. Is there a pymc3 approach to this?

1 Like

Interesting - so you use the smoothed histogram of SVGD particle to do an importance sampling to get the posterior right? I am not sure there are any good solution for high dimension interpolation, but what might be possible is to do a k-mean clustering of the SVGD particles, and do another VI using Gaussian mixture as approximations and we just need to estimate the weight (since mu and sd would be available from k-mean already).

also cc @ferrine

Wow, it is really imptessive. My conserns are that in higher dimensionality svgd does is not expected to perform well.

Maybe if we totally ignore accuracy of approximate inference and focus on models we obtain, this might help to do better Bayesian averaging. But my experience+intuition suggests to try sghmc with cyclic learning rate as proposed in https://arxiv.org/abs/1902.03932. Can we do that?