Prior propagation by nonparametric copulas


After taking part in several conversations about updating priors, I started thinking about the best one can do when only a sample \hat \theta_k \sim \theta | D_1, \dots, D_k is observed and the goal is to compute P[\theta | D_1, \dots, D_{k+1}].

In effect, the question is how to extend

https://docs.pymc.io/notebooks/updating_priors.html

in some reasonable way for multivariate parameters. Previous topics have suggested using multiple pm.Interpolated distributions:

or parametric approximations:

The former provides only marginal information, and the latter is parametric.

Clearly what is needed is a multivariate version of pm.Interpolated, and this is a step towards such a class. The approach is straightforward:

  1. Convert the margins of \theta to be normal using the empirical distribution
  2. Estimate the inverse mapping g: Z_\theta^{(i)} \rightarrow \theta^{(i)} using interpolation
  3. Let \theta = g(Z) with Z \sim N(\hat E [Z_\theta], \hat{\mathrm{Var}}[Z_\theta])
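The three steps above can be sketched in plain NumPy, outside of any PyMC model. This is only an illustration under stated assumptions: the helper names `gaussianize` and `sample_copula_prior` are hypothetical, the quantiles use `(rank + 0.5) / n` rather than clipping, the inverse map uses piecewise-linear `np.interp` instead of polynomial interpolation, and `statistics.NormalDist` stands in for a normal quantile function.

```python
import numpy as np
from statistics import NormalDist

def gaussianize(samples):
    """Step 1: map each column to normal scores via its empirical ranks."""
    n = samples.shape[0]
    ranks = np.argsort(np.argsort(samples, axis=0), axis=0)
    quant = (ranks + 0.5) / n  # strictly inside (0, 1), so the ppf stays finite
    ppf = np.vectorize(NormalDist().inv_cdf)
    return ppf(quant)

def sample_copula_prior(samples, n_draws, rng):
    """Draw from a Gaussian-copula approximation of the sampled distribution."""
    z = gaussianize(samples)
    mean = z.mean(axis=0)
    cov = np.cov(z, rowvar=False)
    # Step 3: Z ~ N(E[Z_theta], Var[Z_theta])
    z_new = rng.multivariate_normal(mean, cov, size=n_draws)
    out = np.empty_like(z_new)
    for i in range(samples.shape[1]):
        # Step 2: inverse map g_i from matched order statistics.
        # np.interp clamps outside the observed range (no extrapolation).
        out[:, i] = np.interp(z_new[:, i], np.sort(z[:, i]), np.sort(samples[:, i]))
    return out

rng = np.random.default_rng(0)
# A correlated toy "posterior sample" with one skewed (log-normal) margin
base = rng.multivariate_normal([0., 0.], [[1., 0.8], [0.8, 1.]], size=2000)
theta_hat = np.column_stack([np.exp(base[:, 0]), base[:, 1]])
draws = sample_copula_prior(theta_hat, 2000, rng)
```

The draws keep the skewed margin of the first column while the dependence between columns is carried entirely by the covariance of the normal scores.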

Much of this can be further optimized and cleaned up. There are doubtless bugs, corner cases, and failures even for “reasonable” distributions. For one thing, the polynomial interpolation is not guaranteed to be nondecreasing (there are ways to enforce this using QP, but…). The major items here are:

a) A theano implementation of polynomial interpolation

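import numpy as np
import scipy as sp
import scipy.stats  # makes sp.stats available
import theano
import theano.tensor as tt
import pymc3 as pm
import seaborn as sbn

def interp(xnew, x0, y0, degree=3, n_interp=None, lib='theano'):
    """\
    Produce the values of an interpolated curve for (`x0`, `y0`) on the values `xnew`.
    `x0` and `y0` are each sorted before fitting (independently of one another)
    to ensure sortedness.
    
    :input xnew:     a theano variable (or a numpy array when `lib` is not `theano`)
    :input x0:       a numpy array
    :input y0:       a numpy array
    :input degree:   the number of coefficients of each local polynomial
                     (so the polynomial degree is `degree - 1`)
    :input n_interp: the number of points to use for interpolation
    :input lib:      `theano` for theano, anything else for numpy
    
    :returns:  The predicted values `y` from the interpolation
    
    """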
    i = np.argsort(x0)
    iy = np.argsort(y0)
    if n_interp:
        i = np.array([i[0]] + [i[int((j+1)*(len(i)-1)/(n_interp-1))] for j in range(n_interp-1)])
        iy = np.array([iy[0]] + [iy[int((j+1)*(len(iy)-1)/(n_interp-1))] for j in range(n_interp-1)])
    x, y = x0[i], y0[iy]
    # construct the polynomial matrix
    A = np.vstack([x ** (degree-k-1) for k in range(degree)]).T
    # extract the *sequence* of coefficients for interpolation
    alpha = list()
    for j in range(1, x.shape[0] - degree + 1):
        A_ = A[(j-1):(j+degree-1),:]
        y_ = y[(j-1):(j+degree-1)]
        a = np.linalg.solve(A_, y_)
        alpha.append(a)
    alpha = np.array(alpha)
    if lib == 'theano':
        x_, alpha_ = theano.shared(x), theano.shared(alpha)
        Anew = tt.stack([xnew ** (degree-k-1) for k in range(degree)], 0).T
        i = tt.extra_ops.searchsorted(x_, xnew)
        i = tt.switch(tt.gt(i, alpha_.shape[0]-1), alpha_.shape[0] - 1, i)
        ynew = (Anew * alpha_[i,:]).sum(-1)
    else:
        Anew = np.vstack([xnew ** (degree-k-1) for k in range(degree)]).T
        i = np.searchsorted(x, xnew)
        i[i >= alpha.shape[0]] = alpha.shape[0]-1
        ynew = (Anew * alpha[i,:]).sum(-1)
    return ynew
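Since the local polynomials above are not guaranteed to be nondecreasing, one simple (if less smooth) fallback is piecewise-linear interpolation through the sorted points, which is monotone by construction. The sketch below is a swapped-in alternative in NumPy, not the scheme used in `interp`; the name `monotone_inverse_map` and its `n_knots` argument are hypothetical.

```python
import numpy as np

def monotone_inverse_map(z_samples, x_samples, n_knots=50):
    """Build a nondecreasing map z -> x from matched order statistics."""
    z_sorted = np.sort(z_samples)
    x_sorted = np.sort(x_samples)
    # Thin the sorted points to evenly spaced knots (first and last included)
    idx = np.linspace(0, len(z_sorted) - 1, n_knots).astype(int)
    z_k, x_k = z_sorted[idx], x_sorted[idx]

    def g(z_new):
        # np.interp is piecewise linear and clamps outside [z_k[0], z_k[-1]]
        return np.interp(z_new, z_k, x_k)

    return g

rng = np.random.default_rng(1)
z = rng.normal(size=500)
x = np.exp(z)                      # a skewed margin paired with its z-scores
g = monotone_inverse_map(z, x)
grid = np.linspace(-4., 4., 201)
vals = g(grid)
```

The trade-off is that the resulting map has kinks at the knots and a flat extrapolation, so its gradient is piecewise constant; whether NUTS tolerates that well is not something this sketch checks.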

b) Tracking variable indexes through concatenation (there is likely some easier way to do this with a trace and a pm.Model context?). This only works for scalar- and vector-valued variables (no matrices).

def transform_variables(trace, varnames):
    # given a trace, pull the `varnames` variables out of them
    # and Gaussian-transform them using the empirical quantiles
    var_blocks = dict()
    sampled_dists = list()
    emp_zscores = list() # will eventually make a matrix
    idx = 0
    for v in varnames:
        dat = trace[v]
        if dat.ndim > 1:
            quant = np.argsort(np.argsort(dat, axis=0), axis=0)/dat.shape[0]
            dat_sorted = np.sort(dat, axis=0)
            var_blocks[v] = np.arange(idx, idx+quant.shape[1])
            idx += quant.shape[1]
        else:
            quant = np.argsort(np.argsort(dat))/dat.shape[0]
            quant = quant.reshape((quant.shape[0],1))
            dat_sorted = np.sort(dat)
            dat_sorted = dat_sorted.reshape((dat_sorted.shape[0],1))
            var_blocks[v] = np.arange(idx, idx+1)
            idx += 1
        n = dat.shape[0]
        quant[quant < 1./(2*n)] = 1./(2*n)
        quant[quant > (1. - 1./(2*n))] = 1. - 1./(2*n)
        emp_zscores.append(sp.stats.norm.ppf(quant))
        sampled_dists.append(dat_sorted)
    emp_zscores = np.hstack(emp_zscores)
    sampled_dists = np.hstack(sampled_dists)
    return var_blocks, sampled_dists, emp_zscores
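The double `argsort` and the clipping in `transform_variables` are easy to check on a tiny example. The sketch below follows the same arithmetic on four made-up values; it uses `statistics.NormalDist().inv_cdf` in place of `sp.stats.norm.ppf` only to stay self-contained.

```python
import numpy as np
from statistics import NormalDist

dat = np.array([3.2, -1.0, 0.5, 7.1])
n = dat.shape[0]

# argsort of argsort gives the rank of each value in its original position
ranks = np.argsort(np.argsort(dat))          # [2, 0, 1, 3]
quant = ranks / n                            # [0.5, 0.0, 0.25, 0.75]

# The smallest value gets quantile 0, whose normal score is -inf,
# so quantiles are clipped to [1/(2n), 1 - 1/(2n)] before the ppf
quant = np.clip(quant, 1. / (2 * n), 1. - 1. / (2 * n))
z = np.array([NormalDist().inv_cdf(q) for q in quant])
```

The z-scores stay in the original sample order, which is what makes the later `np.corrcoef` call meaningful, while the sorted data are kept separately for the inverse map.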

c) A helper function to extract an empirical Copula from a trace.

def catapp(lst):
    arr = lst.pop(0)
    for v in lst:
        arr = np.append(arr, v)
    return arr

def posterior_to_prior(trace, varnames, interp_kwargs):
    # extract the empirical CDFs and transformed variables
    blocks, mars, zscores = transform_variables(trace, varnames)
    # compute the correlation of transformed variables
    pcor = np.corrcoef(zscores.T)
    print(pcor)
    L_p = np.linalg.cholesky(pcor)
    pr_lat_z = pm.Normal('pr_copula_z__', 0., 1., shape=(mars.shape[1],))
    pr_lat = pm.Deterministic('pr_copula__', tt.dot(L_p, pr_lat_z))
    # transform the margins using a smoothed empirical transformation
    # Z_i --> X_i
    tx_vars = tt.zeros(mars.shape[1], dtype=theano.config.floatX)
    for i in range(mars.shape[1]):
        zmin, zmax = np.min(zscores[:,i]), np.max(zscores[:,i])
        xmin, xmax = np.min(mars[:,i]), np.max(mars[:,i])
        xrange, zrange = xmax - xmin, zmax - zmin
        # assume that the range ratio holds constant so that a 1 z-score increase
        # on either end proportionally increases the xrange.
        zlim_l, zlim_u = zmin - 0.5, zmax + 0.5
        xlim_l, xlim_u = xmin - 0.5 * xrange/zrange, xmax + 0.5 * xrange/zrange
        x = catapp([xlim_l, mars[:,i], xlim_u])
        z = catapp([zlim_l, zscores[:,i], zlim_u])
        ni = interp_kwargs.get('n_interp', 50)
        deg = interp_kwargs.get('degree', 5)
        # pass the string 'theano' so that interp takes its theano branch
        tx_vars = tt.set_subtensor(tx_vars[i], interp(pr_lat[i], z, x, n_interp=ni, degree=deg, lib='theano'))
    # reconstitute the variables as deterministic
    varmap = dict()
    for vn in varnames:
        varmap[vn] = pm.Deterministic(vn, tx_vars[blocks[vn]])
    return varmap

This doesn’t handle bounded distributions, and it makes assumptions about extrapolation that are generally unwarranted (the extrapolation really needs work, to be honest).
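To see why bounded variables are a problem, consider the lower extrapolation knot for a strictly positive parameter. The sketch below applies a simplified version of the range-ratio rule to raw draws and to their logs; the helper `lower_limit` is hypothetical, and the z-range is computed from the clipped quantiles rather than from an actual trace. Working on the log scale, as the sequential example does with `err_log`, keeps the extrapolated knot inside the support.

```python
import numpy as np
from statistics import NormalDist

def lower_limit(x):
    """Lower extrapolation knot: xmin - 0.5 * xrange / zrange."""
    n = x.shape[0]
    zmin = NormalDist().inv_cdf(1. / (2 * n))   # smallest quantile after clipping
    zmax = NormalDist().inv_cdf(1. - 1. / n)    # largest rank-based quantile
    return x.min() - 0.5 * (x.max() - x.min()) / (zmax - zmin)

rng = np.random.default_rng(2)
# Stand-in for posterior draws of a positive scale parameter
sd_samples = np.abs(rng.normal(0., 1., size=1000)) + 1e-3

lo_raw = lower_limit(sd_samples)                 # extrapolating on the raw scale
lo_pos = np.exp(lower_limit(np.log(sd_samples))) # extrapolating on the log scale
```

On the raw scale the knot falls below zero, outside the support; on the log scale it is mapped back through `exp` and is positive by construction.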

Simulating a very basic linear model:

mu = np.array([0.75, -0.3])
S = np.array([[2., 0.35], [0.35, 2.3]])
L_S = np.linalg.cholesky(S)
alpha = 1.3
beta = np.array([2., 1.])
err_sd = 1.0

def get_data(k):
    Z = np.random.normal(0., 1., size=(2,k))
    Z = mu[:,None] + np.dot(L_S, Z)
    y = alpha + np.dot(beta, Z) + np.random.normal(0., err_sd, size=(k,))
    return y, Z

I can compare fitting on 50 datapoints to fitting on 5 datasets of 10 points each:

def get_params(y, X, ns=250):
    X_ = theano.shared(X.T)
    y_ = theano.shared(y)
    with pm.Model() as mod:
        a = pm.Normal('Intercept', 0., 1.)
        b = pm.Normal('Coeffs', 0., 1., shape=(2,))
        y_pred = pm.Deterministic('yp', tt.dot(X_, b) + a)
        s = pm.HalfNormal('err_sd', 1.)
        y_obs = pm.Normal('y_obs', mu=y_pred, sd=s, observed=y_)
        lik = pm.Deterministic('likelihood', pm.Normal.dist(y_pred, s).logp_sum(y_))
        trace = pm.sample(ns)
    return trace

n_sets = 5
set_size = 10
def gen_sets(N, n):
    # split range(N) into n contiguous index blocks
    starts = [int(i * N/n) for i in range(n)]
    ends = [int((i+1)*N/n) for i in range(n)]
    return [np.arange(x,y) for x, y in zip(starts, ends)]

dat_big = get_data(n_sets * set_size)


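# get_data returns only (y, Z), so the third element (unpacked as `fx` in the
# sequential loop and otherwise unused) is rebuilt here as the noise-free mean;
# this is an assumption about what the original third return value was
subsets = [(dat_big[0][ix_], dat_big[1][:, ix_], alpha + np.dot(beta, dat_big[1][:, ix_]))
            for ix_ in gen_sets(n_sets*set_size, n_sets)]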

posterior = get_params(dat_big[0], dat_big[1], 1000)
margins = [get_params(dat[0], dat[1], 500) for dat in subsets]

Here the colored curves are each subposterior, and the black curve is the combined [concatenated] density

image

This contrasts the concatenated (i.e. marginal) density with the true posterior:

image

Sequentially fitting with the NPC:

ns = 1500
traces = list()
trace = None
for i, (y, X, fx) in enumerate(subsets):
    X_ = theano.shared(X.T)
    y_ = theano.shared(y)
    with pm.Model() as mod:
        if trace is None:
            a = pm.Normal('Intercept', 0., 1.)
            b = pm.Normal('Coeffs', 0., 1., shape=(2,))
            sl = pm.Normal('err_log', 0., 1.)
        else:
            mm = posterior_to_prior(trace, ['Intercept', 'Coeffs', 'err_log'],
                                   interp_kwargs={'n_interp': 150, 'degree': 3})
            a = mm['Intercept']
            b = mm['Coeffs']
            sl = mm['err_log']
        y_pred = pm.Deterministic('yp', tt.dot(X_, b) + a)
        s = pm.Deterministic('err_sd', tt.exp(sl))
        y_obs = pm.Normal('y_obs', mu=y_pred, sd=s, observed=y_)
        lik = pm.Deterministic('likelihood', pm.Normal.dist(y_pred, s).logp_sum(y_))
        trace = pm.sample(ns)
        traces.append(trace)

NUTS: [err_log, Coeffs, Intercept]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:02<00:00, 2817.83draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:04<00:00, 1980.07draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:06<00:00, 1147.32draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:08<00:00, 916.81draws/s]

This shows a sequence of posteriors that converge nicely to (very close to) the true posterior:

c0 = list()  # collects the first-coefficient draws from each stage
for tr in traces:
    sbn.kdeplot(tr['Coeffs'][:,0])
    c0.append(tr['Coeffs'][:,0])
sbn.kdeplot(posterior['Coeffs'][:,0], color='black')

image

The copula correlations are entirely reasonable:

[[ 1.         -0.31719993  0.11464361 -0.06592688]
 [-0.31719993  1.         -0.00966127 -0.02764248]
 [ 0.11464361 -0.00966127  1.         -0.09797898]
 [-0.06592688 -0.02764248 -0.09797898  1.        ]]

Please play around with this, and help get it into a more general and much improved form.
