# Value error in chains for a Gaussian process

I am working with a Gaussian process (following up on the question https://discourse.pymc.io/t/help-with-a-gaussian-process-for-very-few-data/9818/4). The arrays I use are:

```python
import numpy as np

X = np.array([32, 38, 94, 83, 99, 78])
y = np.array([1702, 1514, 683, 269, 900, 86])
y = (y - np.mean(y)) / np.std(y)
X = (X - np.mean(X)) / np.std(X)
```
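As a quick sanity check, the standardization above can be verified with plain NumPy (no PyMC needed); the `_s` suffixes are just illustrative names:

```python
import numpy as np

X = np.array([32, 38, 94, 83, 99, 78], dtype=float)
y = np.array([1702, 1514, 683, 269, 900, 86], dtype=float)

X_s = (X - X.mean()) / X.std()
y_s = (y - y.mean()) / y.std()

# Standardized arrays should have (approximately) zero mean and unit std.
print(np.allclose(X_s.mean(), 0.0), np.allclose(X_s.std(), 1.0))
```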

my code is the following (I copy all of it because I am now using PyMC v4 instead of v3):

```python
with pm.Model() as d_model:
    # Specify the covariance function.
    vert = pm.HalfNormal("vert", sigma=1)
    l = pm.HalfNormal("l", sigma=1)
    cov_func = vert**2 * pm.gp.cov.Matern32(1, ls=l)

    # Specify the GP. The default mean function is zero.
    gp = pm.gp.Marginal(cov_func=cov_func)

    # Place a GP prior over the function f and add the observation noise:
    sigma = pm.HalfNormal("sigma", sigma=1)
    y_ = gp.marginal_likelihood("y_", X=X, y=y.flatten(), noise=sigma)

    # MCMC:
    trace = pm.sample(10000, chains=3, tune=1000, target_accept=0.99)
```

And now, when sampling the posterior predictive, the error appears:

```python
X_new = np.linspace(-1.5, 1.5, 1500)
X_new = X_new.reshape(-1, 1)

with d_model:
    f_p = gp.conditional("f_p", X_new)
    pred_samples = pm.sample_posterior_predictive(trace, samples=10, keep_size=False)
```

I get: `ValueError: conflicting sizes for dimension 'chain': length 1 on the data but length 3 on coordinate 'chain'`. Note that because `sample_posterior_predictive` is very slow (it was considerably faster for this same problem in PyMC3), I am drawing only 10 samples (which requires `keep_size=False`).

The error can be avoided by passing `return_inferencedata=False` to `sample_posterior_predictive`, but that feels like going back to v3 behaviour, and I doubt it is the true solution. Thanks.

I think this is a bug in `pm.sample_posterior_predictive`. Would you mind filing an issue in the PyMC GitHub repo? In the short term, you can subsample `trace`, pass the smaller `trace` to `pm.sample_posterior_predictive`, and leave out both the `samples=10` and `keep_size=False` arguments.

Setting the error aside, the reason it is slow is that you have 1500 points in `X_new`. GP predictions also scale poorly with the number of prediction points, so reducing this will give a big speed-up.
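For intuition on why prediction cost grows with the size of `X_new`: the GP conditional has to build (among other things) the cross-covariance between every prediction point and every training point. A minimal NumPy sketch of the Matérn-3/2 kernel used in the model (the `matern32` helper here is my own illustration, not PyMC's API):

```python
import numpy as np

def matern32(a, b, ls=1.0, eta=1.0):
    # Matern 3/2: eta^2 * (1 + sqrt(3)*r/ls) * exp(-sqrt(3)*r/ls), r = |a - b|
    r = np.abs(a[:, None] - b[None, :])
    s = np.sqrt(3.0) * r / ls
    return eta**2 * (1.0 + s) * np.exp(-s)

X_train = np.linspace(-1.5, 1.5, 6)      # 6 training points, as in the question
X_new = np.linspace(-1.5, 1.5, 1500)     # 1500 prediction points

K_cross = matern32(X_new, X_train)
print(K_cross.shape)  # (1500, 6): one row per prediction point
```

Shrinking `X_new` shrinks this matrix (and the even larger `n_new × n_new` predictive covariance) proportionally.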

EDIT: See @OriolAbril's answer below; that is the correct one.


Thanks, I'll post it in the GitHub repo ASAP. Let me clarify the changes for future users (myself included):

```python
with pm.Model() as d_model:
    # Specify the covariance function.
    vert = pm.HalfNormal("vert", sigma=1)
    l = pm.HalfNormal("l", sigma=1)
    cov_func = vert**2 * pm.gp.cov.Matern32(1, ls=l)

    # Specify the GP. The default mean function is zero.
    gp = pm.gp.Marginal(cov_func=cov_func)

    # Place a GP prior over the function f and add the observation noise:
    sigma = pm.HalfNormal("sigma", sigma=1)
    y_ = gp.marginal_likelihood("y_", X=X, y=y.flatten(), noise=sigma)

    # MCMC:
    trace = pm.sample(1000, chains=3, tune=1000, target_accept=0.99)

X_new = np.linspace(-1.5, 1.5, 100)
X_new = X_new.reshape(-1, 1)

with d_model:
    f_p = gp.conditional("f_p", X_new)
    # Important: pass var_names in the next call!
    pred_samples = pm.sample_posterior_predictive(trace, var_names=["f_p"])
```

If I can add to the conversation: do not use the `samples` argument in v4. The plan is to remove it; see Remove `samples` and `keep_size` from posterior_predictive · Issue #5775 · pymc-devs/pymc · GitHub. If you want to generate posterior predictive samples for only a subset of the posterior, you should provide a sliced posterior to `sample_posterior_predictive`, as shown in the API docs example (bottom of the page): pymc.sample_posterior_predictive — PyMC 4.1.2 documentation
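The slicing the docs describe can be done on the `InferenceData` itself. A small self-contained sketch using a dummy posterior (the variable name `l` and the shapes are stand-ins for the real trace, and this assumes `arviz` is available):

```python
import numpy as np
import arviz as az

# Dummy posterior: 3 chains x 1000 draws of a variable "l".
idata = az.from_dict(posterior={"l": np.random.randn(3, 1000)})

# Keep every 5th draw; .sel slices every group that has a "draw" dimension.
thinned = idata.sel(draw=slice(None, None, 5))
print(thinned.posterior["l"].shape)  # (3, 200)

# Then, inside the model context:
# pred_samples = pm.sample_posterior_predictive(thinned, var_names=["f_p"])
```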
