# Prior propagation by nonparametric copulas

Having participated in a number of conversations regarding updating priors, I started thinking about the best one can do when only a sample \hat\theta_k \sim \theta | D_1, \dots, D_k is available, and the goal is to compute P[\theta | D_1, \dots, D_{k+1}].

In effect, the question is how to extend

https://docs.pymc.io/notebooks/updating_priors.html

in some reasonable way for multivariate parameters. Previous topics have suggested using multiple pm.Interpolated distributions, or parametric approximations to the posterior.

The former only captures marginal information; the latter is restricted to a parametric family.

Clearly what is needed is a multivariate version of pm.Interpolated; and this is a step towards such a class. The approach is straightforward:

1. Convert the margins of \theta to be normal using the empirical distribution
2. Estimate the inverse mapping g: Z_\theta^{(i)} \rightarrow \theta^{(i)} using interpolation
3. Let \theta = g(Z) with Z \sim N(\hat E [Z_\theta], \hat{\mathrm{Var}}[Z_\theta])
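
To make the three steps concrete, here is a minimal single-margin sketch in plain NumPy/SciPy (the `draws` array is a hypothetical stand-in for posterior samples of one parameter, and `np.interp` stands in for the polynomial interpolation implemented below):

```python
import numpy as np
import scipy.stats

draws = np.random.gamma(2.0, 1.0, size=2000)   # stand-in for posterior draws of one parameter
n = draws.shape[0]

# 1. empirical CDF -> clipped quantiles -> normal scores
quant = np.argsort(np.argsort(draws)) / n
quant = np.clip(quant, 1. / (2 * n), 1. - 1. / (2 * n))
z = scipy.stats.norm.ppf(quant)

# 2. estimate the inverse map g: Z -> theta by interpolating the matched (z, theta) pairs
order = np.argsort(z)
g = lambda znew: np.interp(znew, z[order], draws[order])

# 3. theta = g(Z), with Z drawn from a normal fitted to the empirical z-scores
znew = np.random.normal(z.mean(), z.std(), size=2000)
theta_new = g(znew)
```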

Much of this can be further optimized and cleaned up. There are doubtless bugs, corner cases, and failures for even "reasonable" distributions (for one thing, the polynomial interpolation is not guaranteed to be nondecreasing! There are ways to enforce monotonicity using QP, but…). The major items here are:

a) A theano implementation of polynomial interpolation

```python
import numpy as np
import scipy as sp
import scipy.stats
import theano
import theano.tensor as tt
import pymc3 as pm


def interp(xnew, x0, y0, degree=3, n_interp=None, lib='theano'):
    """\
    Produce the values of a piecewise-polynomial interpolant of (x0, y0),
    evaluated at xnew. x0 and y0 are each sorted to ensure monotonicity.

    :input xnew:     a theano variable (or a numpy array if lib != 'theano')
    :input x0:       a numpy array
    :input y0:       a numpy array
    :input degree:   the degree of the polynomial approximation
    :input n_interp: the number of points to use for interpolation
    :input lib:      'theano' for theano, anything else for numpy

    :returns: the predicted values y from the interpolation
    """
    i = np.argsort(x0)
    iy = np.argsort(y0)
    if n_interp:
        # thin the sorted points down to n_interp knots
        i = np.array([i[0]] + [i[int((j+1)*(len(i)-1)/(n_interp-1))]
                               for j in range(n_interp-1)])
        iy = np.array([iy[0]] + [iy[int((j+1)*(len(iy)-1)/(n_interp-1))]
                                 for j in range(n_interp-1)])
    x, y = x0[i], y0[iy]
    # construct the polynomial design matrix
    A = np.vstack([x ** (degree-k-1) for k in range(degree)]).T
    # extract the *sequence* of local coefficients for interpolation
    alpha = list()
    for j in range(1, x.shape[0] - degree + 1):
        A_ = A[(j-1):(j+degree-1), :]
        y_ = y[(j-1):(j+degree-1)]
        a = np.linalg.solve(A_, y_)
        alpha.append(a)
    alpha = np.array(alpha)
    if lib == 'theano':
        x_, alpha_ = theano.shared(x), theano.shared(alpha)
        Anew = tt.stack([xnew ** (degree-k-1) for k in range(degree)], 0).T
        i = tt.extra_ops.searchsorted(x_, xnew)
        i = tt.switch(tt.gt(i, alpha_.shape[0]-1), alpha_.shape[0] - 1, i)
        ynew = (Anew * alpha_[i, :]).sum(-1)
    else:
        Anew = np.vstack([xnew ** (degree-k-1) for k in range(degree)]).T
        i = np.searchsorted(x, xnew)
        i[i >= alpha.shape[0]] = alpha.shape[0]-1
        ynew = (Anew * alpha[i, :]).sum(-1)
    return ynew
```
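
A quick sanity check of the NumPy branch (toy inputs, not from the original post): the interpolant should roughly reproduce a smooth monotone curve.

```python
x0 = np.linspace(-3., 3., 200)
y0 = np.tanh(x0)
xnew = np.array([-1.5, 0.0, 1.5])
print(interp(xnew, x0, y0, degree=3, n_interp=50, lib='numpy'))
# expected to be close to np.tanh(xnew), roughly [-0.905, 0.000, 0.905]
```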


b) Tracking variable indices through concatenation (there is likely an easier way to do this with a trace and a pm.Model context?). This only works for scalar- and vector-valued variables (no matrices).

```python
def transform_variables(trace, varnames):
    # given a trace, pull the `varnames` variables out of it
    # and Gaussian-transform them using the empirical quantiles
    var_blocks = dict()
    sampled_dists = list()
    emp_zscores = list()  # will eventually make a matrix
    idx = 0
    for v in varnames:
        dat = trace[v]
        if dat.ndim > 1:
            quant = np.argsort(np.argsort(dat, axis=0), axis=0)/dat.shape[0]
            dat_sorted = np.sort(dat, axis=0)
            var_blocks[v] = np.arange(idx, idx+quant.shape[1])
            idx += quant.shape[1]
        else:
            quant = np.argsort(np.argsort(dat))/dat.shape[0]
            quant = quant.reshape((quant.shape[0], 1))
            dat_sorted = np.sort(dat)
            dat_sorted = dat_sorted.reshape((dat_sorted.shape[0], 1))
            var_blocks[v] = np.arange(idx, idx+1)
            idx += 1
        n = dat.shape[0]
        # clip the empirical quantiles away from 0 and 1 before the probit
        quant[quant < 1./(2*n)] = 1./(2*n)
        quant[quant > (1. - 1./(2*n))] = 1. - 1./(2*n)
        emp_zscores.append(sp.stats.norm.ppf(quant))
        sampled_dists.append(dat_sorted)
    emp_zscores = np.hstack(emp_zscores)
    sampled_dists = np.hstack(sampled_dists)
    return var_blocks, sampled_dists, emp_zscores
```
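
Because `transform_variables` only ever indexes `trace[v]`, a plain dict of posterior draws is enough for a quick shape check (hypothetical toy draws and names):

```python
fake_trace = {'Intercept': np.random.normal(size=1000),
              'Coeffs': np.random.normal(size=(1000, 2))}
blocks, mars, zs = transform_variables(fake_trace, ['Intercept', 'Coeffs'])
print(blocks)                 # {'Intercept': array([0]), 'Coeffs': array([1, 2])}
print(mars.shape, zs.shape)   # (1000, 3) (1000, 3)
```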


c) A helper function to extract an empirical copula from a trace and turn it into a prior.

```python
def catapp(lst):
    # concatenate a list of scalars/arrays into one flat array
    arr = lst.pop(0)
    for v in lst:
        arr = np.append(arr, v)
    return arr


def posterior_to_prior(trace, varnames, interp_kwargs):
    # extract the empirical CDFs and transformed variables
    blocks, mars, zscores = transform_variables(trace, varnames)
    # compute the correlation of the transformed variables
    pcor = np.corrcoef(zscores.T)
    print(pcor)
    L_p = np.linalg.cholesky(pcor)
    pr_lat_z = pm.Normal('pr_copula_z__', 0., 1., shape=(mars.shape[1],))
    pr_lat = pm.Deterministic('pr_copula__', tt.dot(L_p, pr_lat_z))
    # transform the margins using a smoothed empirical transformation
    # Z_i --> X_i
    tx_vars = tt.zeros(mars.shape[1], dtype=theano.config.floatX)
    for i in range(mars.shape[1]):
        zmin, zmax = np.min(zscores[:, i]), np.max(zscores[:, i])
        xmin, xmax = np.min(mars[:, i]), np.max(mars[:, i])
        xrange, zrange = xmax - xmin, zmax - zmin
        # assume that the range ratio holds constant so that a 1 z-score increase
        # on either end proportionally increases the x range
        zlim_l, zlim_u = zmin - 0.5, zmax + 0.5
        xlim_l, xlim_u = xmin - 0.5 * xrange/zrange, xmax + 0.5 * xrange/zrange
        x = catapp([xlim_l, mars[:, i], xlim_u])
        z = catapp([zlim_l, zscores[:, i], zlim_u])
        ni = interp_kwargs.get('n_interp', 50)
        deg = interp_kwargs.get('degree', 5)
        tx_vars = tt.set_subtensor(tx_vars[i],
                                   interp(pr_lat[i], z, x,
                                          n_interp=ni, degree=deg, lib='theano'))
    # reconstitute the variables as deterministics
    varmap = dict()
    for vn in varnames:
        varmap[vn] = pm.Deterministic(vn, tx_vars[blocks[vn]])
    return varmap
```


This doesn't handle bounded distributions, and it makes assumptions about extrapolation that are generally unwarranted (the extrapolation really needs work, to be honest).

Simulating a very basic linear model:

```python
mu = np.array([0.75, -0.3])
S = np.array([[2., 0.35], [0.35, 2.3]])
L_S = np.linalg.cholesky(S)
alpha = 1.3
beta = np.array([2., 1.])
err_sd = 1.0

def get_data(k):
    Z = np.random.normal(0., 1., size=(2, k))
    Z = mu[:, None] + np.dot(L_S, Z)
    y = alpha + np.dot(beta, Z) + np.random.normal(0., err_sd, size=(k,))
    return y, Z
```
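
For example, drawing one dataset of 50 points (note the design is 2 × k and gets transposed inside the model code below):

```python
y, X = get_data(50)
print(y.shape, X.shape)   # (50,) (2, 50)
```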



I can compare fitting on 50 datapoints to fitting on 5 datasets of 10 points each:

```python
def get_params(y, X, ns=250):
    X_ = theano.shared(X.T)
    y_ = theano.shared(y)
    with pm.Model() as mod:
        a = pm.Normal('Intercept', 0., 1.)
        b = pm.Normal('Coeffs', 0., 1., shape=(2,))
        y_pred = pm.Deterministic('yp', tt.dot(X_, b) + a)
        s = pm.HalfNormal('err_sd', 1.)
        y_obs = pm.Normal('y_obs', mu=y_pred, sd=s, observed=y_)
        lik = pm.Deterministic('likelihood', pm.Normal.dist(y_pred, s).logp_sum(y_))
        trace = pm.sample(ns)
    return trace

n_sets = 5
set_size = 10

def gen_sets(N, n):
    starts = [int(i * N/n) for i in range(n)]
    ends = [int((i+1) * N/n) for i in range(n)]
    return [np.arange(x, y) for x, y in zip(starts, ends)]

dat_big = get_data(n_sets * set_size)

subsets = [(dat_big[0][ix_], dat_big[1][:, ix_])
           for ix_ in gen_sets(n_sets*set_size, n_sets)]

posterior = get_params(dat_big[0], dat_big[1], 1000)
margins = [get_params(dat[0], dat[1], 500) for dat in subsets]
```


Here the colored curves are the individual subposteriors, and the black curve is the combined (concatenated) density.

This contrasts the concatenated (i.e., marginal) density with the true posterior.
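
The plotting code for these figures isn't shown above; here is a rough sketch of how such a comparison could be produced from the `margins` and `posterior` traces (seaborn assumed imported as `sbn`, as in the plots further down):

```python
import seaborn as sbn

# colored curves: each subposterior's marginal for Coeffs[0]
for tr in margins:
    sbn.kdeplot(tr['Coeffs'][:, 0])

# black curve: the concatenated draws across subposteriors
cat = np.concatenate([tr['Coeffs'][:, 0] for tr in margins])
sbn.kdeplot(cat, color='black')

# the true posterior from the single 50-point fit, for comparison
sbn.kdeplot(posterior['Coeffs'][:, 0], color='grey')
```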

Sequentially fitting with the nonparametric copula (NPC) prior:

```python
ns = 1500
traces = list()
trace = None
for i, (y, X) in enumerate(subsets):
    X_ = theano.shared(X.T)
    y_ = theano.shared(y)
    with pm.Model() as mod:
        if trace is None:
            a = pm.Normal('Intercept', 0., 1.)
            b = pm.Normal('Coeffs', 0., 1., shape=(2,))
            sl = pm.Normal('err_log', 0., 1.)
        else:
            mm = posterior_to_prior(trace, ['Intercept', 'Coeffs', 'err_log'],
                                    interp_kwargs={'n_interp': 150, 'degree': 3})
            a = mm['Intercept']
            b = mm['Coeffs']
            sl = mm['err_log']
        y_pred = pm.Deterministic('yp', tt.dot(X_, b) + a)
        s = pm.Deterministic('err_sd', tt.exp(sl))
        y_obs = pm.Normal('y_obs', mu=y_pred, sd=s, observed=y_)
        lik = pm.Deterministic('likelihood', pm.Normal.dist(y_pred, s).logp_sum(y_))
        trace = pm.sample(ns)
    traces.append(trace)
```

```
NUTS: [err_log, Coeffs, Intercept]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:02<00:00, 2817.83draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:04<00:00, 1980.07draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:06<00:00, 1147.32draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:08<00:00, 916.81draws/s]
```



This shows a sequence of posteriors that converges nicely to (something very close to) the true posterior:

```python
c0 = list()
for tr in traces:
    sbn.kdeplot(tr['Coeffs'][:, 0])
    c0.append(tr['Coeffs'][:, 0])
sbn.kdeplot(posterior['Coeffs'][:, 0], color='black')
```


The copula correlations (rows and columns ordered as Intercept, Coeffs[0], Coeffs[1], err_log) are entirely reasonable:

```
[[ 1.         -0.31719993  0.11464361 -0.06592688]
 [-0.31719993  1.         -0.00966127 -0.02764248]
 [ 0.11464361 -0.00966127  1.         -0.09797898]
 [-0.06592688 -0.02764248 -0.09797898  1.        ]]
```


Please play around with this, and help get it into a more general and much improved form.
