Hello all! I have a question about generating posterior predictive samples for out of sample values using a Latent GP and Binomial likelihood.
The data is like:
x =
time points
z =
number of trials of the Binomial at time x
y =
number of successes at time x.
I split my data into train and test sets then:
My model is:
with pm.Model() as model:
model.add_coord("obs_id", z_train.flatten(), mutable=True)
## Data containers to enable prediction
trials = pm.MutableData("trials", z_train.flatten(), dims="obs_id")
successes = pm.MutableData("successes", y_train.flatten(), dims="obs_id")
t = pm.MutableData("t", x_train)
# Seasonal component.
ls = pm.Gamma(name='ls_1', alpha=2.0, beta=1.0)
period = pm.Gamma(name='period', alpha=52.1429, beta=2)
gp_seasonal = pm.gp.Latent(cov_func=pm.gp.cov.Periodic(input_dim=1, period=period, ls=ls))
# Linear trend.
c_3 = pm.Normal(name='c_3', mu=1, sigma=2)
gp_linear = pm.gp.Latent(cov_func=pm.gp.cov.Linear(input_dim=1, c=c_3))
# Define gaussian process.
gp = gp_linear + gp_seasonal
logit_prob = gp.prior("logit_prob", X=t)
# Likelihood.
likelihood = pm.Binomial("likelihood", logit_p=logit_prob, n=trials, observed=successes)
Then we sample:
with model:
# Sample.
trace = pm.sample(draws=500, chains=2, tune=500, cores=1)
I have to do cores=1
because I am on a Mac M1.
Now I want to do posterior predictive sampling on the test data. (and for the training data to check in-sample fit).
I tried:
with model:
pm.set_data(
{
"t": x_train,
"trials": z_train.flatten(),
"successes": y_train.flatten()
},
coords={"obs_id": z_train.flatten()}
)
y_train_pred_samples = pm.sample_posterior_predictive(
trace,
var_names=['logit_prob'],
return_inferencedata=True
)
pm.set_data(
{
"t": x_test,
"trials": z_test.flatten(),
},
coords={"obs_id": z_test.flatten()}
)
y_test_pred_samples = pm.sample_posterior_predictive(
trace,
var_names=['logit_prob'],
predictions=True,
return_inferencedata=True
)
but I get an error because the length of y_train
is the old obs_id
and not the length of y_test
.
What is the correct format to do OOS posterior predictions for this type of model?
Thanks!