I have been trying to compute deviance out-of-sample. The following code works okay in-sample,
waic = pm.stats.waic(trace_, model_)
deviance = waic.WAIC - waic.p_WAIC
where by in-sample I am referring to a model whose data are held in Theano shared variables set to the training dataset, and which is then estimated via NUTS.
When I call the set_value method on my Theano shared variables to swap in the values from the test dataset and then repeat the above code snippet, I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/Dropbox/data_science/workspace/python/pymc3-example-project/venv/lib/python3.6/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
902 outputs =\
--> 903 self.fn() if output_subset is None else\
904 self.fn(output_subset=output_subset)
ValueError: Input dimension mis-match. (input[3].shape[0] = 3340, input[4].shape[0] = 835)
which suggests to me that the test data has not been set correctly (the test dataset has 835 observations, while the training dataset has 3340). Note that I observe sample_ppc
correctly sampling from the posterior predictive distribution for the test dataset.
I experience the same phenomenon when I 'borrow' the general approach found here: Evaluate logposterior at sample points.
If someone could point me in the right direction, it would be greatly appreciated!