Having participated in a number of conversations about updating priors, I got to thinking about the best one can do when one observes only samples \hat \theta_k \sim \theta | D_1, \dots, D_k and seeks to compute P[\theta|D_1, \dots, D_{k+1}].
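For concreteness, the identity being exploited (assuming the data batches are conditionally independent given \theta) is the usual sequential update
P[\theta | D_1, \dots, D_{k+1}] \propto P[D_{k+1} | \theta] \, P[\theta | D_1, \dots, D_k],
so all that is needed from the first k batches is a usable representation of P[\theta | D_1, \dots, D_k] to serve as the prior for the next fit.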
In effect, the question is how to extend
https://docs.pymc.io/notebooks/updating_priors.html
in some reasonable way to multivariate parameters. Previous topics have suggested using multiple pm.Interpolated distributions (one per parameter), or fitting parametric approximations to the posterior. The former captures only the marginal distributions, discarding any dependence between parameters; the latter forces the posterior into a parametric family. Clearly what is needed is a multivariate version of pm.Interpolated, and this is a step towards such a class. The approach is straightforward:
- Convert the margins of \theta to approximately standard normal using the empirical CDF
- Estimate the inverse mapping g: Z_\theta^{(i)} \rightarrow \theta^{(i)} using interpolation
- Let \theta = g(Z) with Z \sim N(\hat E [Z_\theta], \hat{\mathrm{Var}}[Z_\theta])
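As a minimal sketch of the first step (the array names here are mine, not part of the code below; the same transformation appears in transform_variables further down), Gaussianizing a single margin amounts to pushing the draws through a clipped empirical CDF and the normal quantile function:
import numpy as np
import scipy.stats as st

theta_draws = np.random.gamma(2., 1., size=1000)          # stand-in for posterior draws of one margin
n = theta_draws.shape[0]
quant = np.argsort(np.argsort(theta_draws)) / n            # empirical quantiles in [0, 1)
quant = np.clip(quant, 1. / (2 * n), 1. - 1. / (2 * n))    # keep the normal ppf finite
z = st.norm.ppf(quant)                                     # Gaussianized margin, roughly N(0, 1)
# g is then the monotone map from z back to np.sort(theta_draws), estimated below by interpolation.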
Much of this can be further optimized and cleaned up. There are doubtless bugs, corner cases, and failures even for “reasonable” distributions (for one thing, the polynomial interpolation is not guaranteed to be nondecreasing! There are ways to enforce that using QP, but…). The major items here are:
a) A theano implementation of piecewise polynomial interpolation:
import numpy as np
import scipy as sp
import scipy.stats  # makes sp.stats available below
import theano
import theano.tensor as tt
import pymc3 as pm
import seaborn as sbn

def interp(xnew, x0, y0, degree=3, n_interp=None, lib='theano'):
    """\
Produce the values of an interpolated curve for (`x0`, `y0`) on the values `xnew`.
`x0` and `y0` are sorted independently to ensure sortedness of the interpolation knots.
:input xnew: a theano variable
:input x0: a numpy array
:input y0: a numpy array
:input degree: the degree of the polynomial approximation
:input n_interp: the number of points to use for interpolation
:input lib: `theano` for theano, anything else for numpy
:returns: The predicted values `y` from the interpolation
"""
    # sort x and y independently; the target map is assumed monotone nondecreasing
    i = np.argsort(x0)
    iy = np.argsort(y0)
    if n_interp:
        # thin the knots to roughly `n_interp` evenly spaced order statistics
        i = np.array([i[0]] + [i[int((j+1)*(len(i)-1)/(n_interp-1))] for j in range(n_interp-1)])
        iy = np.array([iy[0]] + [iy[int((j+1)*(len(iy)-1)/(n_interp-1))] for j in range(n_interp-1)])
    x, y = x0[i], y0[iy]
# construct the polynomial matrix
A = np.vstack([x ** (degree-k-1) for k in range(degree)]).T
# extract the *sequence* of coefficients for interpolation
alpha = list()
for j in range(1, x.shape[0] - degree + 1):
A_ = A[(j-1):(j+degree-1),:]
y_ = y[(j-1):(j+degree-1)]
a = np.linalg.solve(A_, y_)
alpha.append(a)
alpha = np.array(alpha)
    if lib == 'theano':
x_, alpha_ = theano.shared(x), theano.shared(alpha)
Anew = tt.stack([xnew ** (degree-k-1) for k in range(degree)], 0).T
        # locate the interpolation segment for each xnew, clamping the index so that
        # out-of-range values extrapolate with the first/last local polynomial
        i = tt.extra_ops.searchsorted(x_, xnew)
        i = tt.switch(tt.gt(i, alpha_.shape[0]-1), alpha_.shape[0] - 1, i)
        ynew = (Anew * alpha_[i,:]).sum(-1)
else:
Anew = np.vstack([xnew ** (degree-k-1) for k in range(degree)]).T
i = np.searchsorted(x, xnew)
i[i >= alpha.shape[0]] = alpha.shape[0]-1
ynew = (Anew * alpha[i,:]).sum(-1)
return ynew
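As a quick sanity check of the numpy branch (my own example, not from the original post), interpolating a smooth monotone curve and evaluating it between the knots should roughly recover the underlying function:
x0 = np.linspace(-3., 3., 200)
y0 = np.tanh(x0)
xnew = np.array([-1., 0., 1.])
print(interp(xnew, x0, y0, degree=3, lib='numpy'))   # expect values close to np.tanh(xnew)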
b) Tracking variable indexes through concatenation (there is likely an easier way to do this with a trace and a pm.Model context?). This only handles scalar- and vector-valued variables (no matrix-valued variables):
def transform_variables(trace, varnames):
# given a trace, pull the `varnames` variables out of them
# and Gaussian-transform them using the empirical quantiles
var_blocks = dict()
sampled_dists = list()
emp_zscores = list() # will eventually make a matrix
idx = 0
for v in varnames:
dat = trace[v]
if dat.ndim > 1:
quant = np.argsort(np.argsort(dat, axis=0), axis=0)/dat.shape[0]
dat_sorted = np.sort(dat, axis=0)
var_blocks[v] = np.arange(idx, idx+quant.shape[1])
idx += quant.shape[1]
else:
quant = np.argsort(np.argsort(dat))/dat.shape[0]
quant = quant.reshape((quant.shape[0],1))
dat_sorted = np.sort(dat)
dat_sorted = dat_sorted.reshape((dat_sorted.shape[0],1))
var_blocks[v] = np.arange(idx, idx+1)
idx += 1
        n = dat.shape[0]
        # clip the empirical quantiles away from 0 and 1 so that norm.ppf stays finite
        quant[quant < 1./(2*n)] = 1./(2*n)
        quant[quant > (1. - 1./(2*n))] = 1. - 1./(2*n)
emp_zscores.append(sp.stats.norm.ppf(quant))
sampled_dists.append(dat_sorted)
emp_zscores = np.hstack(emp_zscores)
sampled_dists = np.hstack(sampled_dists)
return var_blocks, sampled_dists, emp_zscores
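As an illustration (using a fake "trace" built from a plain dict of arrays, which works here because the function only indexes trace[v] and inspects array shapes):
fake_trace = {'Intercept': np.random.normal(size=2000),
              'Coeffs': np.random.normal(size=(2000, 2))}
blocks, mars, zscores = transform_variables(fake_trace, ['Intercept', 'Coeffs'])
print(blocks)         # {'Intercept': array([0]), 'Coeffs': array([1, 2])}
print(mars.shape)     # (2000, 3): sorted marginal draws, one column per scalar component
print(zscores.shape)  # (2000, 3): Gaussianized empirical quantiles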
c) Helper functions to extract an empirical (Gaussian) copula from a trace and rebuild the variables as deterministic transforms:
def catapp(lst):
    # concatenate a mixed list of scalars and 1-d arrays into a single 1-d array
    arr = lst.pop(0)
for v in lst:
arr = np.append(arr, v)
return arr
def posterior_to_prior(trace, varnames, interp_kwargs):
# extract the empirical CDFs and transformed variables
blocks, mars, zscores = transform_variables(trace, varnames)
# compute the correlation of transformed variables
pcor = np.corrcoef(zscores.T)
print(pcor)
L_p = np.linalg.cholesky(pcor)
pr_lat_z = pm.Normal('pr_copula_z__', 0., 1., shape=(mars.shape[1],))
pr_lat = pm.Deterministic('pr_copula__', tt.dot(L_p, pr_lat_z))
# transform the margins using a smoothed empirical transformation
# Z_i --> X_i
tx_vars = tt.zeros(mars.shape[1], dtype=theano.config.floatX)
for i in range(mars.shape[1]):
zmin, zmax = np.min(zscores[:,i]), np.max(zscores[:,i])
xmin, xmax = np.min(mars[:,i]), np.max(mars[:,i])
xrange, zrange = xmax - xmin, zmax - zmin
# assume that the range ratio holds constant so that a 1 z-score increase
# on either end proportionally increases the xrange.
zlim_l, zlim_u = zmin - 0.5, zmax + 0.5
xlim_l, xlim_u = xmin - 0.5 * xrange/zrange, xmax + 0.5 * xrange/zrange
x = catapp([xlim_l, mars[:,i], xlim_u])
z = catapp([zlim_l, zscores[:,i], zlim_u])
ni = interp_kwargs.get('n_interp', 50)
deg = interp_kwargs.get('degree', 5)
tx_vars = tt.set_subtensor(tx_vars[i], interp(pr_lat[i], z, x, n_interp=ni, degree=deg, lib=tt))
# reconstitute the variables as deterministic
varmap = dict()
for vn in varnames:
varmap[vn] = pm.Deterministic(vn, tx_vars[blocks[vn]])
return varmap
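In copula terms, what posterior_to_prior builds is \theta^{(i)} = g_i(Z_i) with Z = L_P Z^*, Z^* \sim N(0, I), and P the empirical correlation matrix of the Gaussianized margins (whose means and variances are approximately 0 and 1 by construction): a Gaussian copula with smoothed empirical margins.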
This doesn't handle bounded distributions, and it makes assumptions about extrapolation that are generally unwarranted (the extrapolation really needs work, to be honest).
Simulating a very basic linear model:
mu = np.array([0.75, -0.3])
S = np.array([[2., 0.35], [0.35, 2.3]])
L_S = np.linalg.cholesky(S)
alpha = 1.3
beta = np.array([2., 1.])
err_sd = 1.0
def get_data(k):
    Z = np.random.normal(0., 1., size=(2,k))
    Z = mu[:,None] + np.dot(L_S, Z)
    fx = alpha + np.dot(beta, Z)   # noiseless linear predictor
    y = fx + np.random.normal(0., err_sd, size=(k,))
    return y, Z, fx
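A quick shape check (my own snippet, assuming the three-value return above, where the third element is the noiseless predictor):
y, Z, fx = get_data(50)
print(y.shape, Z.shape, fx.shape)   # (50,) (2, 50) (50,)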
I can compare fitting on 50 datapoints to fitting on 5 datasets of 10 points each:
def get_params(y, X, ns=250):
X_ = theano.shared(X.T)
y_ = theano.shared(y)
with pm.Model() as mod:
a = pm.Normal('Intercept', 0., 1.)
b = pm.Normal('Coeffs', 0., 1., shape=(2,))
y_pred = pm.Deterministic('yp', tt.dot(X_, b) + a)
s = pm.HalfNormal('err_sd', 1.)
y_obs = pm.Normal('y_obs', mu=y_pred, sd=s, observed=y_)
lik = pm.Deterministic('likelihood', pm.Normal.dist(y_pred, s).logp_sum(y_))
trace = pm.sample(ns)
return trace
n_sets = 5
set_size = 10
def gen_sets(N, n):
    starts = [int(i * N/n) for i in range(n)]
    ends = [int((i+1) * N/n) for i in range(n)]
    return [np.arange(x, y) for x, y in zip(starts, ends)]
dat_big = get_data(n_sets * set_size)
subsets = [(dat_big[0][ix_], dat_big[1][:, ix_], dat_big[2][ix_])
           for ix_ in gen_sets(n_sets*set_size, n_sets)]
posterior = get_params(dat_big[0], dat_big[1], 1000)
margins = [get_params(dat[0], dat[1], 500) for dat in subsets]
Here the colored curves are the individual subposteriors, and the black curve is the density of the concatenated subposterior samples.
This contrasts the concatenated (i.e. pooled-sample) density with the true posterior from fitting all 50 points at once.
Sequentially fitting with the nonparametric copula (NPC) prior:
ns = 1500
traces = list()
trace = None
for i, (y, X, fx) in enumerate(subsets):
X_ = theano.shared(X.T)
y_ = theano.shared(y)
with pm.Model() as mod:
if trace is None:
a = pm.Normal('Intercept', 0., 1.)
b = pm.Normal('Coeffs', 0., 1., shape=(2,))
sl = pm.Normal('err_log', 0., 1.)
else:
mm = posterior_to_prior(trace, ['Intercept', 'Coeffs', 'err_log'],
interp_kwargs={'n_interp': 150, 'degree': 3})
a = mm['Intercept']
b = mm['Coeffs']
sl = mm['err_log']
y_pred = pm.Deterministic('yp', tt.dot(X_, b) + a)
s = pm.Deterministic('err_sd', tt.exp(sl))
y_obs = pm.Normal('y_obs', mu=y_pred, sd=s, observed=y_)
lik = pm.Deterministic('likelihood', pm.Normal.dist(y_pred, s).logp_sum(y_))
trace = pm.sample(ns)
traces.append(trace)
NUTS: [err_log, Coeffs, Intercept]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:02<00:00, 2817.83draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:04<00:00, 1980.07draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:06<00:00, 1147.32draws/s]
NUTS: [pr_copula]
Sampling 4 chains: 100%|██████████| 8000/8000 [00:08<00:00, 916.81draws/s]
This shows a sequence of posteriors that converge nicely to (very close to) the true posterior:
c0 = list()
for tr in traces:
    sbn.kdeplot(tr['Coeffs'][:,0])
    c0.append(tr['Coeffs'][:,0])
sbn.kdeplot(posterior['Coeffs'][:,0], color='black')
The copula correlations are entirely reasonable (rows and columns are ordered as Intercept, Coeffs[0], Coeffs[1], err_log, following varnames):
[[ 1. -0.31719993 0.11464361 -0.06592688]
[-0.31719993 1. -0.00966127 -0.02764248]
[ 0.11464361 -0.00966127 1. -0.09797898]
[-0.06592688 -0.02764248 -0.09797898 1. ]]
Please play around with this, and help get it into a more general and much-improved form.