I am unable to save traces to file using to_netcdf, when gp.marginal_likelihood is conditioned on random variables. To see this, consider the following toy example:
import numpy as np
import pymc3 as pm
x = np.linspace(-1, 1, 1000)
y = np.random.normal(loc=0.0, scale=0.001, size=len(x)) + 0.1 + np.sin(x)
with pm.Model() as model:
    mean = pm.Normal("mean", mu=0.1, sd=0.5)
    gp = pm.gp.Marginal(cov_func=pm.gp.cov.Linear(input_dim=1, c=1))
    y_pred = gp.marginal_likelihood("y_pred", X=x.reshape(-1, 1), y=y.flatten() - mean, noise=0.1)

    # Sample
    trace = pm.sample(return_inferencedata=True, tune=100, draws=300)

trace.to_netcdf("test")
This raises the following error:
ValueError: unable to infer dtype on variable 'gp'; xarray cannot serialize arbitrary Python objects
Has anyone encountered this problem before, does anyone know how it is meant to be solved in PyMC, or does anyone have a suggestion for a workaround?
Thank you all very much in advance.
You should be able to fix this by adding trace.posterior = trace.posterior.astype(float) between creating the trace and saving it.
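For context, here is a minimal sketch of what that cast does, using plain NumPy rather than the actual trace object: values stored with object dtype cannot be serialized by xarray's netCDF writer, while the same values cast to float have a numeric dtype that serializes fine.

```python
import numpy as np

# Values stored with object dtype (as in the problematic trace)
# cannot be inferred by xarray's serializer.
values = np.array([0.1, 0.2, 0.3], dtype=object)
print(values.dtype)  # object

# Casting to float produces an ordinary numeric array.
cast = values.astype(float)
print(cast.dtype)  # float64
```

The same astype(float) call on trace.posterior applies this cast to every variable in the Dataset at once.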
Still, can you share the output of print(trace.posterior)? It is very strange that it gets assigned an object dtype instead of a numeric one; that must come from earlier on. When converting to InferenceData, ArviZ leaves the input arrays completely untouched, and you can, for example, pass dask arrays instead of numpy arrays without any problem (xarray supports both).
Not sure what is going on here, maybe @michaelosthege could help? It looks like gp.marginal_likelihood registers an observed variable that is not retrieved properly.
My only guess is that it might be because no other distribution allows FreeRVs to be passed via the observed kwarg, so the converter doesn't support this case. I'm not sure why it would be allowed here, though.
Yes, your y parameter to the gp.marginal_likelihood is a TensorVariable.
It looks like PyMC can deal with that when it comes to differentiation and sampling, but, as Oriol correctly pointed out, the code that converts to InferenceData never expected that the observed data might be symbolic.
For PyMC v3 this will be a wontfix bug, but there's a chance the problem persists in v4, in which case something should be done about it.
However, there may be a valid reparametrization of the model as a pm.gp.Latent Gaussian process.
Essentially something in the direction of

cov = pm.gp.cov.Linear(input_dim=1, c=1)
gp = pm.gp.Latent(cov_func=cov)
pred = gp.prior("pred", X=x.reshape(-1, 1))
L = pm.Normal("likelihood", mu=pred + mean, sigma=0.1, observed=y)

Here the mean shift is moved into the likelihood's mu, so the observed kwarg receives the plain numpy array y rather than a symbolic expression. A non-zero mean_func=pm.gp.mean.Constant(mean) should also work here.