Xarray serialization error when marginal likelihood depends on random variables

dgegen · April 13, 2022, 2:47pm

I am unable to save traces to file using to_netcdf, when gp.marginal_likelihood is conditioned on random variables. To see this, consider the following toy example:

import numpy as np
import pymc3 as pm

x = np.linspace(-1, 1, 1000)
y = np.random.normal(loc=0.0, scale=0.001, size=len(x)) + 0.1 + np.sin(x)

with pm.Model() as model:
    mean = pm.Normal("mean", mu=0.1, sd=0.5)

    gp = pm.gp.Marginal(cov_func=pm.gp.cov.Linear(input_dim=1, c=1))

    y_pred = gp.marginal_likelihood('y_pred', X=x.reshape(-1, 1), y=y.flatten() - mean, noise=0.1)

    # Sample
    trace = pm.sample(return_inferencedata=True, tune=100, draws=300)
    trace.to_netcdf("test")

This raises the error

ValueError: unable to infer dtype on variable 'gp'; xarray cannot serialize arbitrary Python objects

Has anyone encountered this problem before, knows how it is meant to be solved in PyMC, or has a suggestion for a workaround?
Thank you all very much in advance.

cluhmann · April 13, 2022, 3:37pm

@OriolAbril

OriolAbril · April 13, 2022, 4:23pm

You should be able to fix this adding trace.posterior = trace.posterior.astype(float) between creation and saving.

Still, can you share the output of print(trace.posterior)? It is very strange that it gets assigned an object dtype instead of a numeric one, and that must come from earlier on, when converting to inferencedata arviz leaves the input arrays completely untouched and you can for example pass dask arrays instead of numpy without any problem (xarray supports both).

dgegen · April 13, 2022, 4:47pm

print(trace.posterior) delivers

<xarray.Dataset>
Dimensions:  (chain: 2, draw: 300)
Coordinates:
  * chain    (chain) int64 0 1
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 ... 292 293 294 295 296 297 298 299
Data variables:
    mean     (chain, draw) float64 0.9928 0.997 0.999 ... 0.9992 1.02 0.9917
Attributes:
    created_at:                 2022-04-13T16:41:41.419915
    arviz_version:              0.12.0
    inference_library:          pymc3
    inference_library_version:  3.11.5
    sampling_time:              25.90023398399353
    tuning_steps:               100

The problem is, however, not with the posterior, but with trace.observed_data.
Calling trace.observed_data delivers

AttributeError: 'TensorVariable' object has no attribute 'item'

while the following works perfectly fine, but doesn’t save the observed data.

trace.to_netcdf(filename='test', groups=["posterior", "log_likelihood", "sample_stats"])

Something like trace.observed_data.astype(np.array) or trace.observed_data.astype(float) doesn’t work, raising

Cannot interpret '<built-in function array>' as a data type

OriolAbril · April 20, 2022, 10:53am

Not sure what is going on here, maybe @michaelosthege could help? It looks like gp.marginal_likelihood registers an observed variable that is not retrieved properly.

My only guess is it might be because all other distributions do not allow FreeRVs as to be passed with observed kwarg and therefore the converter doesn’t support this case, not sure why it would be allowed here though.

michaelosthege · April 22, 2022, 8:43pm

Yes, your y parameter to the gp.marginal_likelihood is a TensorVariable.
Looks like PyMC can deal with that when it comes to the differentiation & sampling, but like Oriol correctly pointed out, the code that converts to InferenceData never expected that the observed may be symbolic.

When it comes to PyMC v3, this will be a wontfix bug, but there’s a chance the problem persists with v4 in which case something should be done about it.

However, there may be a valid reparametrization of the model as a pm.gp.Latent gaussian process.
Essentially something in the direction of

gp = pm.gp.Latent(mean, cov)
pred = gp.prior(X=x.reshape(-1, 1))
L = pm.Normal("likelihood", mu=pred + mean, sigma=0.1, observed=y)

A non-zero mean_func=pm.gp.mean.Constant(mean) should also work.here.

Topic		Replies	Views
Use xarray for traces Development	5	1742	June 15, 2017
2D Gaussian Process: error generating posterior predictive samples v5 bug , gaussian_process	2	554	April 3, 2023
pyMC v 5.0.1 fails test because of lazylinker v5 bug	4	1525	January 6, 2023
Problem with Deterministic variable Questions	7	1017	March 22, 2021
The predict function of GP Questions	10	1218	June 15, 2021

Xarray serialization error when marginal likelihood depends on random variables

Related topics