Out-of-sample random-walk time series prediction

Looking at the prophet example (see here), I notice that when the author makes an out-of-sample prediction at the end, he does it by manually drawing his own samples and using the quantities inferred by the model to extrapolate the trend into the future.

My question is this: Is there a way to do out-of-sample (OOS) time-series prediction for models with a trend (e.g. the random-walk deep net here) by relying on the same OOS ‘trick’ of using tensor.set_value() that is common in models that aren’t based on time series? Or must one always do it ‘manually’ (as in the prophet example)?

I’ve already tried to do this myself, but I run into shape problems: the in-sample random walk was inferred on data of size T, whereas I’m trying to predict on data of size 1. I’m also not sure whether the random walk can take the t=T value it got from the in-sample inference and use it as the prior for the t=T+1 value.
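To make the t=T to t=T+1 hand-off concrete, here is a minimal sketch of the ‘manual’ route in plain Python. It assumes you already have posterior draws of the last in-sample state and of the random-walk step size; the function name and the numbers below are made up for illustration and are not taken from the prophet example or from PyMC3.

```python
import random


def forecast_random_walk(last_state_draws, sigma_draws, horizon, seed=0):
    """Simulate future random-walk states, one path per posterior draw.

    last_state_draws: hypothetical posterior draws of the final in-sample state z_T
    sigma_draws:      hypothetical posterior draws of the step standard deviation
    """
    rng = random.Random(seed)
    paths = []
    for z_T, sigma in zip(last_state_draws, sigma_draws):
        state = z_T
        path = []
        for _ in range(horizon):
            # z_{t+1} = z_t + Normal(0, sigma): the t=T draw seeds the t=T+1 value
            state = state + rng.gauss(0.0, sigma)
            path.append(state)
        paths.append(path)
    return paths


# Made-up stand-ins for posterior draws; in practice they would come from the trace.
z_T_draws = [0.86, 0.88, 0.87, 0.89]
sigma_draws = [0.03, 0.05, 0.04, 0.02]
paths = forecast_random_walk(z_T_draws, sigma_draws, horizon=3)
```

Each inner list is one simulated future trajectory, so quantiles taken across paths at a given step give a forecast interval. This sidesteps the shape problem because nothing of size T is resized, but it is still the manual approach.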

[There are quite a few questions on time-series prediction, so it’s possible this has already been asked and I missed it.]

EDIT: Maybe my question is phrased a bit confusingly. What I’m really asking is: Is there a way to make an out-of-sample time-series prediction using pm.sample_posterior_predictive() with a model that has serial autocorrelation (e.g. GaussianRandomWalk)?

Does this help? If your observed variable is a pandas series or data frame column with missing values, pymc3 will automatically ‘interpolate’, which in this case is actually ‘predicting’:


Yes! This is great!

Are there any docs anywhere I can read about this automatic interpolation?

@DanWeitzenfeld thanks for the notebook and the suggestion. I’ve tried to implement it with a simple linear regression model using GaussianRandomWalk, and it fails with the error below.

`Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [z, tau, mu]
Sampling 4 chains:   0%|          | 0/10000 [00:00<?, ?draws/s]/.../lib/python3.7/site-packages/numpy/core/fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
INFO (theano.gof.compilelock): Waiting for existing lock by process '26249' (I am process '26250')
INFO (theano.gof.compilelock): Waiting for existing lock by process '26249' (I am process '26252')
INFO (theano.gof.compilelock): To manually release the lock, delete /Users/.../.theano/compiledir_Darwin-18.7.0-x86_64-i386-64bit-i386-3.7.4-64/lock_dir
INFO (theano.gof.compilelock): To manually release the lock, delete /Users/.../.theano/compiledir_Darwin-18.7.0-x86_64-i386-64bit-i386-3.7.4-64/lock_dir
.../lib/python3.7/site-packages/numpy/core/fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
Sampling 4 chains:   0%|          | 0/10000 [00:14<?, ?draws/s]
Bad initial energy, check any log probabilities that are inf or -inf, nan or very small:
y   NaN`

My response variable is a theano.shared tensor with values:

[0.3373, 0.39189999999999997, 0.36560000000000004, 0.42700000000000005, 0.5428000000000001, 0.58, 0.636, 0.6937000000000001, 0.7287, 0.7345, 0.7517, 0.7874, 0.8003, 0.8062, 0.81, 0.8393999999999999, 0.8432, 0.8794, nan, nan, nan]
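One plausible reading of the `y   NaN` line, assuming the nan entries reach the likelihood unmasked: a single nan in the observed values makes the summed log-probability nan. The sketch below is plain Python, not PyMC3 code, and the mean and standard deviation are arbitrary illustrative values.

```python
import math


def normal_logp(y, mu, sigma):
    """Log-density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)


# Tail of the response variable above, including the values to be predicted.
y = [0.8394, 0.8432, 0.8794, float("nan"), float("nan")]
mu, sigma = 0.85, 0.05  # arbitrary values, for illustration only

# Summing over every entry: one nan is enough to make the total nan.
total_unmasked = sum(normal_logp(v, mu, sigma) for v in y)

# Masking the missing entries: only observed values enter the likelihood.
observed = [v for v in y if not math.isnan(v)]
total_masked = sum(normal_logp(v, mu, sigma) for v in observed)

print(total_unmasked, total_masked)
```

As far as I understand, masking is what lets the missing entries be treated as unknowns to sample rather than as data, which would explain why the form in which the nulls are passed matters.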

How are you setting the null values?

The trick, AFAICT, is to pass either a pandas series or a pandas data frame column as observed.


@DanWeitzenfeld, thanks! That seems to do it, although it feels odd given that theano.shared tensors or numpy arrays are preferred in most cases.