GPs & minibatch: shape mismatch?

I’m trying to model a linear trend subject to a (more or less) known spatial covariance. Since I have a lot of points, I’d like to model the trend as a line, model the spatial covariance with a 2D GP, and use minibatching to parcel everything out into friendlier data sizes.

Following the example of this tutorial, I create minibatches of the spatial data, the variable underlying the linear model, and the observations:

indep_m = pm.Minibatch(indep)          # 'x' for the linear model
dep_m = pm.Minibatch(dep)              # observations
spatial_X_m = pm.Minibatch(spatial_X)  # spatial coordinates
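(For context, my understanding is that each evaluation of a Minibatch variable pulls a fresh random slice of rows from the full array, with a default batch size of 128 in PyMC3. A rough plain-NumPy sketch of that behaviour, not the actual implementation:)

```python
import numpy as np

rng = np.random.default_rng(0)

def minibatch_draw(data, batch_size=128):
    # Sketch only: pm.Minibatch re-draws a random subset of rows each
    # time the graph is evaluated; this mimics a single such draw.
    idx = rng.integers(0, len(data), size=batch_size)
    return data[idx]

full = np.arange(1784 * 2).reshape(1784, 2)  # stand-in for spatial_X
batch = minibatch_draw(full)
print(batch.shape)  # (128, 2)
```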

I then create a model that uses the minibatches:

with pm.Model() as metgrad_linearmodel_mini:
    # Define priors on linear regression
    sigma = pm.HalfCauchy('logOH12-sigma', beta=.5, testval=.1)
    intercept = pm.Normal('logOH12-x0', 8.5, sigma=20)
    x_coeff = pm.Normal('logOH12-slope', 0, sigma=20)
    
    # Define priors on spatial covariance
    psf_ls = 1.5
    psf_eta = pm.HalfCauchy('PSF-eta', beta=.1, testval=.01)
    
    # Define the GP itself
    psf_cov = psf_eta**2. * pm.gp.cov.ExpQuad(input_dim=2, ls=psf_ls)
    psf_gp = pm.gp.Latent(cov_func=psf_cov)
    noise_from_psf = psf_gp.prior('PSF', spatial_X_m, shape=dep_m.shape.eval())
    
    met_at_indep = pm.Deterministic(
        'logOH12-at-indep',
        intercept + (x_coeff * indep_m))

    # Define likelihood
    likelihood = pm.Normal(
        'met', mu=met_at_indep + noise_from_psf, sigma=sigma,
        observed=dep_m, total_size=dep.shape[0])  # scale up to the full dataset
    
    approx = pm.fit(100000, callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)])

This is where things get hairy…

I thought I would be able to define a similar model for the full data, but one that referred to indep, dep, and spatial_X (instead of indep_m, dep_m, and spatial_X_m):

with pm.Model() as metgrad_linearmodel:
    # Define priors on linear regression
    sigma = pm.HalfCauchy('logOH12-sigma', beta=.5, testval=.1)
    intercept = pm.Normal('logOH12-x0', 8.5, sigma=20)
    x_coeff = pm.Normal('logOH12-slope', 0, sigma=20)
    
    # Define priors on spatial covariance
    psf_ls = 1.5
    psf_eta = pm.HalfCauchy('PSF-eta', beta=.1, testval=.01)
    
    # Define the GP itself
    psf_cov = psf_eta**2. * pm.gp.cov.ExpQuad(input_dim=2, ls=psf_ls)
    psf_gp = pm.gp.Latent(cov_func=psf_cov)
    noise_from_psf = psf_gp.prior('PSF', spatial_X)
    
    met_at_indep = pm.Deterministic(
        'logOH12-at-indep',
        intercept + (x_coeff * indep))

    # Define likelihood
    likelihood = pm.Normal(
        'met', mu=met_at_indep + noise_from_psf, sigma=sigma, observed=dep)

    # Inference!
    step = pm.NUTS(scaling=approx.cov.eval(), is_cov=True)
    start_full = approx.sample()[0]
    trace = pm.sample(500, tune=500, cores=1, step=step, start=start_full)

which throws the error

ValueError: Bad shape for start argument:
Expected shape (1784,) for var 'PSF_rotated_', got: (128,)

The error itself makes sense: the Latent GP describing the covariate noise is now being evaluated at all 1784 points in the full dataset, while the minibatch fit only ever saw 128 at a time (the default Minibatch batch size). For that matter, approx.cov also has the wrong shape. So even if I try to initialize using only the entries from start_full for logOH12-slope, logOH12-x0, and logOH12-sigma, I have to throw away the covariance matrix that the minibatch fit worked so hard to find, which seems to defeat the purpose.
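To make that last point concrete, here is a minimal sketch (plain Python, with made-up values standing in for approx.sample()[0]) of what filtering start_full down to the non-GP entries would look like; the variable names mirror my model, and 'PSF_rotated_' is the transformed GP variable from the error message:

```python
# Hypothetical stand-in for approx.sample()[0]: a dict mapping
# free-variable names to values from the minibatch fit.
start_full = {
    'logOH12-slope': 0.05,
    'logOH12-x0': 8.5,
    'logOH12-sigma': 0.1,
    'PSF_rotated_': [0.0] * 128,  # GP values at the 128 minibatch points
}

# Keep only the entries whose shapes still make sense in the full model;
# the GP entry has shape (128,) but the full model wants (1784,).
keep = {'logOH12-slope', 'logOH12-x0', 'logOH12-sigma'}
start_partial = {k: v for k, v in start_full.items() if k in keep}

print(sorted(start_partial))  # the GP variable is gone
```

But this still leaves NUTS without the fitted covariance for scaling, which is the part I'd most like to reuse.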

I’m wondering (a) if minibatching is even a valid thing to do in the GP context, and if so, (b) whether there’s a workaround that retains the benefits of knowing the covariance and being able to scale NUTS accordingly.