Question generating posterior predictive samples with Latent GP

Hello all! I have a question about generating posterior predictive samples for out of sample values using a Latent GP and Binomial likelihood.

The data is like:
x = time points
z = number of trials of the Binomial at time x
y = number of successes at time x.

I split my data into train and test sets then:

My model is:

with pm.Model() as model:
    model.add_coord("obs_id", z_train.flatten(), mutable=True)

    ## Data containers to enable prediction
    trials = pm.MutableData("trials", z_train.flatten(), dims="obs_id")
    successes = pm.MutableData("successes", y_train.flatten(), dims="obs_id")
    t = pm.MutableData("t", x_train)

    # Seasonal component.
    ls = pm.Gamma(name='ls_1', alpha=2.0, beta=1.0)
    period = pm.Gamma(name='period', alpha=52.1429, beta=2)
    gp_seasonal = pm.gp.Latent(cov_func=pm.gp.cov.Periodic(input_dim=1, period=period, ls=ls))

    # Linear trend.
    c_3 = pm.Normal(name='c_3', mu=1, sigma=2)
    gp_linear = pm.gp.Latent(cov_func=pm.gp.cov.Linear(input_dim=1, c=c_3))

    # Define gaussian process.
    gp = gp_linear + gp_seasonal
    logit_prob = gp.prior("logit_prob", X=t)

    # Likelihood.
    likelihood = pm.Binomial("likelihood", logit_p=logit_prob, n=trials, observed=successes)

Then we sample:

with model:
    # Sample.
    trace = pm.sample(draws=500, chains=2, tune=500, cores=1) 

I have to do cores=1 because I am on a Mac M1.

Now I want to do posterior predictive sampling on the test data. (and for the training data to check in-sample fit).

I tried:

with model:
    pm.set_data(
        {
            "t": x_train,
            "trials": z_train.flatten(),
            "successes": y_train.flatten()
        },
        coords={"obs_id": z_train.flatten()}
    )
    y_train_pred_samples = pm.sample_posterior_predictive(
        trace,
        var_names=['logit_prob'],
        return_inferencedata=True
    )

    pm.set_data(
        {
            "t": x_test,
            "trials": z_test.flatten(),
        },
        coords={"obs_id": z_test.flatten()}
    )
    y_test_pred_samples = pm.sample_posterior_predictive(
        trace,
        var_names=['logit_prob'],
        predictions=True,
        return_inferencedata=True
    )

but I get an error because the length of y_train is the old obs_id and not the length of y_test.

What is the correct format to do OOS posterior predictions for this type of model?

Thanks!

If I am not wrong you need to expand/recreate your model with .conditional as in Gaussian Processes: Latent Variable Implementation — PyMC example gallery unless you use the HSGP approximation, in which case the set_data approach suffices.

2 Likes

Thanks! That example very closely aligns to what I need.