I’m trying to model a linear trend subject to a (more or less) known spatial covariance, and I have a lot of points, so I’d like to model the trend as a line, model the spatial covariance with a 2D GP, and use minibatching to parcel everything out into friendlier data sizes.
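For reference, the data going in are just flat arrays of matching length; the values below are placeholders, but the shapes are what I’m actually working with (N = 1784, which is the number that shows up in the error further down):

import numpy as np
import pymc3 as pm

# Illustrative stand-ins only -- the real values come from my data
N = 1784                           # total number of points
indep = np.random.randn(N)         # 'x' for the linear model
dep = np.random.randn(N)           # observations
spatial_X = np.random.randn(N, 2)  # 2D spatial coordinates for the GP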
Following the example of this tutorial, I create minibatches of the spatial data, the variable underlying the linear model, and the observations:
indep_m = pm.Minibatch(indep)          # 'x' for linear model
dep_m = pm.Minibatch(dep)              # observations
spatial_X_m = pm.Minibatch(spatial_X)  # spatial coords
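(If I’m reading the docs right, pm.Minibatch defaults to a batch size of 128, which is where the 128 in the error further down comes from; it can also be set explicitly, e.g. pm.Minibatch(spatial_X, batch_size=128).)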
I then create a model that uses the minibatches:
with pm.Model() as metgrad_linearmodel_mini:
    # Define priors on linear regression
    sigma = pm.HalfCauchy('logOH12-sigma', beta=.5, testval=.1)
    intercept = pm.Normal('logOH12-x0', 8.5, sigma=20)
    x_coeff = pm.Normal('logOH12-slope', 0, sigma=20)

    # Define priors on spatial covariance
    psf_ls = 1.5
    psf_eta = pm.HalfCauchy('PSF-eta', beta=.1, testval=.01)

    # Define the GP itself
    psf_cov = psf_eta**2. * pm.gp.cov.ExpQuad(input_dim=2, ls=psf_ls)
    psf_gp = pm.gp.Latent(cov_func=psf_cov)
    noise_from_psf = psf_gp.prior('PSF', spatial_X_m, shape=dep_m.shape.eval())

    met_at_indep = pm.Deterministic(
        'logOH12-at-indep',
        intercept + (x_coeff * indep_m))

    # Define likelihood
    likelihood = pm.Normal(
        'met', mu=met_at_indep + noise_from_psf, sigma=sigma,
        observed=dep_m, total_size=len(dep))  # total_size = size of the full dataset

    approx = pm.fit(
        100000,
        callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)])
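As a quick sanity check on the fit, I look at the loss history that pm.fit records (a minimal sketch, assuming matplotlib is available):

import matplotlib.pyplot as plt

plt.plot(approx.hist)    # negative-ELBO trace recorded by pm.fit
plt.xlabel('iteration')
plt.ylabel('loss (-ELBO)')
plt.show()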
This is where things get hairy…
I thought I would be able to define a similar model for the full data, but one that referred to indep, dep, and spatial_X (instead of indep_m, dep_m, and spatial_X_m):
with pm.Model() as metgrad_linearmodel:
    # Define priors on linear regression
    sigma = pm.HalfCauchy('logOH12-sigma', beta=.5, testval=.1)
    intercept = pm.Normal('logOH12-x0', 8.5, sigma=20)
    x_coeff = pm.Normal('logOH12-slope', 0, sigma=20)

    # Define priors on spatial covariance
    psf_ls = 1.5
    psf_eta = pm.HalfCauchy('PSF-eta', beta=.1, testval=.01)

    # Define the GP itself
    psf_cov = psf_eta**2. * pm.gp.cov.ExpQuad(input_dim=2, ls=psf_ls)
    psf_gp = pm.gp.Latent(cov_func=psf_cov)
    noise_from_psf = psf_gp.prior('PSF', spatial_X)

    met_at_indep = pm.Deterministic(
        'logOH12-at-indep',
        intercept + (x_coeff * indep))

    # Define likelihood
    likelihood = pm.Normal(
        'met', mu=met_at_indep + noise_from_psf, sigma=sigma, observed=dep)

    # Inference!
    step = pm.NUTS(scaling=approx.cov.eval(), is_cov=True)
    start_full = approx.sample()[0]
    trace = pm.sample(500, tune=500, cores=1, step=step, start=start_full)
which throws the error:
ValueError: Bad shape for start argument:
Expected shape (1784,) for var 'PSF_rotated_', got: (128,)
The error itself makes sense, since the Latent GP describing the covariate noise is now being evaluated at a larger number of points (all 1784 of them rather than the 128 in a minibatch). For that matter, approx.cov also has the wrong shape. So even if I try to initialize using only the entries from start_full for logOH12-slope, logOH12-x0, and logOH12-sigma, I have to ignore the covariance matrix that the minibatch fit worked so hard to find, which seems to defeat the purpose.
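For concreteness, this is roughly what that fallback looks like (a sketch: I drop every entry belonging to the GP and give up on the scaling matrix entirely):

# Keep only the linear-model entries of start_full and fall back to
# pm.sample's default mass matrix, i.e. throw away approx.cov
start_linear = {k: v for k, v in start_full.items() if not k.startswith('PSF')}
trace = pm.sample(500, tune=500, cores=1, start=start_linear)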
I’m wondering (a) if minibatching is even a valid thing to do in the GP context, and if so, (b) whether there’s a workaround that retains the benefits of knowing the covariance and being able to scale NUTS accordingly.