Prior propagation by nonparametric copulas

Having participated in a number of conversations about updating priors, I thought about the best one can do when one observes only a sample \hat\theta_k \sim \theta \mid D_1, \dots, D_k and seeks to compute P[\theta \mid D_1, \dots, D_{k+1}].

In effect, the question is how to extend

https://docs.pymc.io/notebooks/updating_priors.html

in some reasonable way for multivariate parameters. Previous topics have suggested using multiple pm.Interpolated distributions, or parametric approximations. The former only provides marginal information; the latter is, by construction, parametric.

Clearly what is needed is a multivariate version of pm.Interpolated, and this is a step towards such a class. The approach is straightforward (a toy sketch of these steps follows the list):

  1. Convert the margins of \theta to be normal using the empirical distribution
  2. Estimate the inverse mapping g: Z_\theta^{(i)} \rightarrow \theta^{(i)} using interpolation
  3. Let \theta = g(Z) with Z \sim N(\hat E [Z_\theta], \hat{\mathrm{Var}}[Z_\theta])
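
To make the idea concrete, here is a toy numpy/scipy sketch of the three steps for a single scalar parameter. It is only an illustration (np.interp stands in for the polynomial interpolation used below; in the multivariate case the correlations among the normal scores are additionally carried through a Gaussian copula):

import numpy as np
import scipy.stats as st

theta = np.random.gamma(2., 1., size=2000)      # stand-in for a posterior sample of theta

# 1. convert the margin to normal scores via the empirical CDF
u = st.rankdata(theta) / (len(theta) + 1.)       # empirical quantiles in (0, 1)
z = st.norm.ppf(u)

# 2. the inverse map g: z -> theta is given by the sorted (z, theta) pairs;
#    a piecewise-linear interpolant stands in for the polynomial one below
order = np.argsort(z)
g = lambda z_new: np.interp(z_new, z[order], theta[order])

# 3. draw fresh normal scores and push them through g
z_new = np.random.normal(z.mean(), z.std(ddof=1), size=2000)
theta_new = g(z_new)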

Much of this can be further optimized and cleaned up. There are doubtless bugs, corner cases, and failures for even “reasonable” distributions (for one thing, the polynomial interpolation is not guaranteed to be nondecreasing! There are ways to enforce this using QP, but…). The major items here are:

a) A theano implementation of polynomial interpolation

# imports assumed throughout the rest of this post
import numpy as np
import scipy as sp
import scipy.stats          # so that sp.stats is available
import theano
import theano.tensor as tt
import pymc3 as pm
import seaborn as sbn

def interp(xnew, x0, y0, degree=3, n_interp=None, lib='theano'):
    """\
    Produce the values of a piecewise-polynomial interpolant of (`x0`, `y0`),
    evaluated at `xnew`. The x and y values are each sorted so that the
    interpolated map is (approximately) nondecreasing.
    
    :input xnew:     a theano variable (or a numpy array if `lib` is not 'theano')
    :input x0:       a numpy array
    :input y0:       a numpy array
    :input degree:   the number of polynomial coefficients per piece
                     (i.e. a polynomial of degree `degree-1`)
    :input n_interp: the number of points to use for interpolation
    :input lib:      `theano` for theano, anything else for numpy
    
    :returns:  The predicted values `y` from the interpolation
    
    """
    i = np.argsort(x0)
    iy = np.argsort(y0)
    if n_interp:
        # thin down to (roughly) evenly spaced order statistics
        i = np.array([i[0]] + [i[int((j+1)*(len(i)-1)/(n_interp-1))] for j in range(n_interp-1)])
        iy = np.array([iy[0]] + [iy[int((j+1)*(len(iy)-1)/(n_interp-1))] for j in range(n_interp-1)])
    x, y = x0[i], y0[iy]
    # construct the polynomial design matrix
    A = np.vstack([x ** (degree-k-1) for k in range(degree)]).T
    # extract the *sequence* of coefficients for interpolation
    alpha = list()
    for j in range(1, x.shape[0] - degree + 1):
        A_ = A[(j-1):(j+degree-1),:]
        y_ = y[(j-1):(j+degree-1)]
        a = np.linalg.solve(A_, y_)
        alpha.append(a)
    alpha = np.array(alpha)
    if lib == 'theano':
        x_, alpha_ = theano.shared(x), theano.shared(alpha)
        Anew = tt.stack([xnew ** (degree-k-1) for k in range(degree)], 0).T
        i = tt.extra_ops.searchsorted(x_, xnew)
        i = tt.switch(tt.gt(i, alpha_.shape[0]-1), alpha_.shape[0] - 1, i)
        ynew = (Anew * alpha_[i,:]).sum(-1)
    else:
        Anew = np.vstack([xnew ** (degree-k-1) for k in range(degree)]).T
        i = np.searchsorted(x, xnew)
        i[i >= alpha.shape[0]] = alpha.shape[0]-1
        ynew = (Anew * alpha[i,:]).sum(-1)
    return ynew
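
As a quick sanity check (not from the original post), the numpy branch can be compared against the true values of a monotone curve:

# sanity check of the numpy branch on a monotone curve
x0 = np.linspace(0., 4., 200)
y0 = np.exp(x0)
xq = np.linspace(0.5, 3.5, 25)
yq = interp(xq, x0, y0, degree=3, n_interp=50, lib='numpy')
print(np.max(np.abs(yq - np.exp(xq)) / np.exp(xq)))   # relative error should be well under 1%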

b) Tracking variable indices through concatenation (there is likely an easier way to do this with a trace and a pm.Model context?). This only works for scalar- and vector-valued variables (no matrices)

def transform_variables(trace, varnames):
    # given a trace, pull the `varnames` variables out of them
    # and Gaussian-transform them using the empirical quantiles
    var_blocks = dict()
    sampled_dists = list()
    emp_zscores = list() # will eventually make a matrix
    idx = 0
    for v in varnames:
        dat = trace[v]
        if dat.ndim > 1:
            quant = np.argsort(np.argsort(dat, axis=0), axis=0)/dat.shape[0]
            dat_sorted = np.sort(dat, axis=0)
            var_blocks[v] = np.arange(idx, idx+quant.shape[1])
            idx += quant.shape[1]
        else:
            quant = np.argsort(np.argsort(dat))/dat.shape[0]
            quant = quant.reshape((quant.shape[0],1))
            dat_sorted = np.sort(dat)
            dat_sorted = dat_sorted.reshape((dat_sorted.shape[0],1))
            var_blocks[v] = np.arange(idx, idx+1)
            idx += 1
        n = dat.shape[0]
        # clip the empirical quantiles away from 0 and 1 so the normal scores stay finite
        quant[quant < 1./(2*n)] = 1./(2*n)
        quant[quant > (1. - 1./(2*n))] = 1. - 1./(2*n)
        emp_zscores.append(sp.stats.norm.ppf(quant))
        sampled_dists.append(dat_sorted)
    emp_zscores = np.hstack(emp_zscores)
    sampled_dists = np.hstack(sampled_dists)
    return var_blocks, sampled_dists, emp_zscores
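
A quick illustration of the return values (again, not from the original post): since transform_variables only indexes trace[v], a plain dict of arrays can stand in for a real trace:

fake_trace = {'Intercept': np.random.normal(size=1000),
              'Coeffs': np.random.normal(size=(1000, 2))}
blocks, mars, zscores = transform_variables(fake_trace, ['Intercept', 'Coeffs'])
print(blocks)                      # {'Intercept': array([0]), 'Coeffs': array([1, 2])}
print(mars.shape, zscores.shape)   # (1000, 3) (1000, 3)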

c) A helper function to extract an empirical Gaussian copula from a trace and reconstitute the variables as a prior.

def catapp(lst):
    # concatenate a list of scalars/arrays into a single 1-d array
    arr = lst.pop(0)
    for v in lst:
        arr = np.append(arr, v)
    return arr

def posterior_to_prior(trace, varnames, interp_kwargs):
    # extract the empirical CDFs and transformed variables
    blocks, mars, zscores = transform_variables(trace, varnames)
    # compute the correlation of transformed variables
    pcor = np.corrcoef(zscores.T)
    print(pcor)
    L_p = np.linalg.cholesky(pcor)
    pr_lat_z = pm.Normal('pr_copula_z__', 0., 1., shape=(mars.shape[1],))
    pr_lat = pm.Deterministic('pr_copula__', tt.dot(L_p, pr_lat_z))
    # transform the margins using a smoothed empirical transformation
    # Z_i --> X_i
    tx_vars = tt.zeros(mars.shape[1], dtype=theano.config.floatX)
    for i in range(mars.shape[1]):
        zmin, zmax = np.min(zscores[:,i]), np.max(zscores[:,i])
        xmin, xmax = np.min(mars[:,i]), np.max(mars[:,i])
        xrange, zrange = xmax - xmin, zmax - zmin
        # assume that the range ratio holds constant so that a 1 z-score increase
        # on either end proportionally increases the xrange.
        zlim_l, zlim_u = zmin - 0.5, zmax + 0.5
        xlim_l, xlim_u = xmin - 0.5 * xrange/zrange, xmax + 0.5 * xrange/zrange
        x = catapp([xlim_l, mars[:,i], xlim_u])
        z = catapp([zlim_l, zscores[:,i], zlim_u])
        ni = interp_kwargs.get('n_interp', 50)
        deg = interp_kwargs.get('degree', 5)
        tx_vars = tt.set_subtensor(tx_vars[i], interp(pr_lat[i], z, x, n_interp=ni, degree=deg, lib='theano'))
    # reconstitute the variables as deterministic
    varmap = dict()
    for vn in varnames:
        varmap[vn] = pm.Deterministic(vn, tx_vars[blocks[vn]])
    return varmap

This doesn’t handle bounded distributions, and it makes assumptions about extrapolation that are generally unwarranted (the extrapolation really needs work, to be honest).

Simulating a very basic linear model:

mu = np.array([0.75, -0.3])
S = np.array([[2., 0.35], [0.35, 2.3]])
L_S = np.linalg.cholesky(S)
alpha = 1.3
beta = np.array([2., 1.])
err_sd = 1.0

def get_data(k):
    Z = np.random.normal(0., 1., size=(2,k))
    Z = mu[:,None] + np.dot(L_S, Z)
    y = alpha + np.dot(beta, Z) + np.random.normal(0., err_sd, size=(k,))
    return y, Z
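
For reference, a quick shape check (not from the original post) showing the return convention of y with shape (k,) and a 2-by-k design matrix:

y, X = get_data(50)
print(y.shape, X.shape)   # (50,) (2, 50)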

I can compare fitting on 50 datapoints to fitting on 5 datasets of 10 points each:

def get_params(y, X, ns=250):
    X_ = theano.shared(X.T)
    y_ = theano.shared(y)
    with pm.Model() as mod:
        a = pm.Normal('Intercept', 0., 1.)
        b = pm.Normal('Coeffs', 0., 1., shape=(2,))
        y_pred = pm.Deterministic('yp', tt.dot(X_, b) + a)
        s = pm.HalfNormal('err_sd', 1.)
        y_obs = pm.Normal('y_obs', mu=y_pred, sd=s, observed=y_)
        lik = pm.Deterministic('likelihood', pm.Normal.dist(y_pred, s).logp_sum(y_))
        trace = pm.sample(ns)
    return trace

n_sets = 5
set_size = 10
def gen_sets(N, n):
    # split range(N) into n contiguous index blocks
    starts = [int(i * N/n) for i in range(n)]
    ends = [int((i+1) * N/n) for i in range(n)]
    return [np.arange(x, y) for x, y in zip(starts, ends)]

dat_big = get_data(n_sets * set_size)


subsets = [(dat_big[0][ix_], dat_big[1][:, ix_])
           for ix_ in gen_sets(n_sets*set_size, n_sets)]

posterior = get_params(dat_big[0], dat_big[1], 1000)
margins = [get_params(dat[0], dat[1], 500) for dat in subsets]

Here the colored curves are each subposterior, and the black curve is the combined [concatenated] density

[image: subposterior KDEs (colored) vs. the combined density (black)]

This contrasts the concatenated (i.e. marginal) density with the true posterior:

[image: concatenated marginal density vs. the true posterior]

Sequentially fitting with the nonparametric copula (NPC):

ns = 1500
traces = list()
trace = None
for i, (y, X) in enumerate(subsets):
    X_ = theano.shared(X.T)
    y_ = theano.shared(y)
    with pm.Model() as mod:
        if trace is None:
            a = pm.Normal('Intercept', 0., 1.)
            b = pm.Normal('Coeffs', 0., 1., shape=(2,))
            sl = pm.Normal('err_log', 0., 1.)
        else:
            mm = posterior_to_prior(trace, ['Intercept', 'Coeffs', 'err_log'],
                                   interp_kwargs={'n_interp': 150, 'degree': 3})
            a = mm['Intercept']
            b = mm['Coeffs']
            sl = mm['err_log']
        y_pred = pm.Deterministic('yp', tt.dot(X_, b) + a)
        s = pm.Deterministic('err_sd', tt.exp(sl))
        y_obs = pm.Normal('y_obs', mu=y_pred, sd=s, observed=y_)
        lik = pm.Deterministic('likelihood', pm.Normal.dist(y_pred, s).logp_sum(y_))
        trace = pm.sample(ns)
        traces.append(trace)

NUTS: [err_log, Coeffs, Intercept]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:02<00:00, 2817.83draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:04<00:00, 1980.07draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:06<00:00, 1147.32draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:08<00:00, 916.81draws/s]

This shows a sequence of posteriors that converges nicely to (something very close to) the true posterior:

for tr in traces:
    sbn.kdeplot(tr['Coeffs'][:,0])
sbn.kdeplot(posterior['Coeffs'][:,0], color='black')

[image: sequential NPC posteriors for Coeffs[0], with the full-data posterior in black]

The copula correlations are entirely reasonable (rows/columns ordered as Intercept, Coeffs[0], Coeffs[1], err_log):

[[ 1.         -0.31719993  0.11464361 -0.06592688]
 [-0.31719993  1.         -0.00966127 -0.02764248]
 [ 0.11464361 -0.00966127  1.         -0.09797898]
 [-0.06592688 -0.02764248 -0.09797898  1.        ]]

Please play around with this, and help get it into a more general and much-improved form.
