Out-of-sample random-walk time series prediction

mrbayes · July 29, 2019, 8:43pm

Looking at the Prophet example (see here), I notice that at the end, when he makes an out-of-sample prediction, he does it by manually drawing his own samples and using the quantities inferred by the model to project the trend into the future.

My question is this: is there a way to do OOS time-series prediction for models with a trend (e.g. the random-walk deep net here) by relying on the same OOS ‘trick’ of using tensor.set_value() that is common in models that aren’t time-series based? Or must one always do it ‘manually’, as in the Prophet example?

I’ve already tried this myself, but I run into shape problems: the in-sample random-walk inference was on data of length T, while I’m trying to predict on data of length 1. I’m also not sure whether the random walk can take the t=T value it inferred in-sample and use it as the prior for the t=T+1 value.
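Under a Gaussian random walk, the value at t=T does act as the starting point for t=T+1, and conditioning on the last in-sample state gives the forecast distribution in closed form. A sketch, assuming innovations with scale σ and an optional per-step drift μ (μ = 0 recovers a driftless walk):

```latex
x_t = x_{t-1} + \mu + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)

x_{T+h} = x_T + h\mu + \sum_{k=1}^{h} \varepsilon_{T+k}
\quad\Longrightarrow\quad
x_{T+h} \mid x_T, \mu, \sigma \sim \mathcal{N}\!\left(x_T + h\mu,\; h\sigma^2\right)
```

Averaging this over the posterior draws of (x_T, μ, σ) gives the out-of-sample predictive distribution, so the forecast only needs the final state rather than a length-T tensor.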

[There are quite a few questions on time-series prediction, so it’s possible I have missed an existing answer to this.]

EDIT: Maybe my question is phrased a bit confusingly. What I’m really asking is: is there a way to make an out-of-sample time-series prediction using pm.sample_posterior_predictive() with a model that has serial autocorrelation (e.g. GaussianRandomWalk)?
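Doing this ‘manually’, as in the Prophet example, amounts to pushing each posterior draw forward through the random-walk recursion. The sketch below is NumPy-only and assumes you have already pulled 1-D arrays of posterior draws for the last latent state and the innovation scale out of the trace (in PyMC3, something like `trace['walk'][:, -1]`, where `walk` is a hypothetical variable name; if the walk was parameterized by `tau`, use `sigma = 1 / np.sqrt(tau)`). The function name `forecast_random_walk` and the stand-in draws are illustrative, not part of any PyMC API.

```python
import numpy as np


def forecast_random_walk(last_level, sigma, horizon, drift=None, rng=None):
    """Extend a Gaussian random walk `horizon` steps past the end of the data.

    last_level : (n_draws,) posterior draws of the final in-sample state x_T
    sigma      : (n_draws,) posterior draws of the innovation scale
    drift      : optional (n_draws,) posterior draws of a per-step drift
    Returns an (n_draws, horizon) array of simulated future paths.
    """
    rng = np.random.default_rng() if rng is None else rng
    last_level = np.asarray(last_level, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n_draws = last_level.shape[0]
    drift = np.zeros(n_draws) if drift is None else np.asarray(drift, dtype=float)
    # One fresh innovation per draw and per future step, scaled by that draw's sigma
    steps = rng.standard_normal((n_draws, horizon)) * sigma[:, None] + drift[:, None]
    # The cumulative sum turns the innovations into a walk that starts from x_T
    return last_level[:, None] + np.cumsum(steps, axis=1)


# Stand-in posterior draws (synthetic, for illustration only; use values from your trace)
rng = np.random.default_rng(0)
fake_last = rng.normal(0.0, 1.0, size=1000)
fake_sigma = np.abs(rng.normal(0.1, 0.01, size=1000))
paths = forecast_random_walk(fake_last, fake_sigma, horizon=3, rng=rng)
print(paths.shape)  # (1000, 3)
```

If the observed series is modeled as noisy observations of the walk, adding a draw of that observation noise to each path turns these latent-state forecasts into predictive draws for the observed variable itself.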

DanWeitzenfeld · August 5, 2019, 7:27pm

Does this help? If your observed variable is a pandas Series or DataFrame column with missing values, PyMC3 will automatically ‘interpolate’ them, which in this case actually means ‘predicting’:

Gist: extrapolating random walk.ipynb
https://gist.github.com/DanielWeitzenfeld/bbfe8246dc20df77eee8e48ac12d65de


mrbayes · August 5, 2019, 8:15pm

Yes! This is great!

Are there any docs anywhere I can read about this automatic interpolation?

dorian821 · September 23, 2019, 10:38am

@DanWeitzenfeld thanks for the notebook and the suggestion. I’ve tried to implement it in a simple linear regression model using GaussianRandomWalk, and it fails with the error below:

```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [z, tau, mu]
Sampling 4 chains:   0%|          | 0/10000 [00:00<?, ?draws/s]/.../lib/python3.7/site-packages/numpy/core/fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
INFO (theano.gof.compilelock): Waiting for existing lock by process '26249' (I am process '26250')
INFO (theano.gof.compilelock): Waiting for existing lock by process '26249' (I am process '26252')
INFO (theano.gof.compilelock): To manually release the lock, delete /Users/.../.theano/compiledir_Darwin-18.7.0-x86_64-i386-64bit-i386-3.7.4-64/lock_dir
INFO (theano.gof.compilelock): To manually release the lock, delete /Users/.../.theano/compiledir_Darwin-18.7.0-x86_64-i386-64bit-i386-3.7.4-64/lock_dir
.../lib/python3.7/site-packages/numpy/core/fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
Sampling 4 chains:   0%|          | 0/10000 [00:14<?, ?draws/s]
Bad initial energy, check any log probabilities that are inf or -inf, nan or very small:
y   NaN
```

My response variable is a theano.shared tensor with values:

[0.3373, 0.39189999999999997, 0.36560000000000004, 0.42700000000000005, 0.5428000000000001, 0.58, 0.636, 0.6937000000000001, 0.7287, 0.7345, 0.7517, 0.7874, 0.8003, 0.8062, 0.81, 0.8393999999999999, 0.8432, 0.8794, nan, nan, nan]

How are you setting the null values?
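The `y   NaN` line in the traceback above is consistent with the NaNs being treated as ordinary observations: a log-likelihood term evaluated at a NaN is NaN, and so is the total. A NumPy-only illustration, using the tail of the posted series and arbitrary, illustrative values for `mu` and `sigma` (this mimics the arithmetic, not PyMC3's actual code path; `normal_loglik` is a hypothetical helper):

```python
import numpy as np


def normal_loglik(y, mu, sigma):
    """Summed Gaussian log-likelihood of the observations y."""
    y = np.asarray(y, dtype=float)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2)


# Last few values from the post, plus the three trailing NaNs
y = np.array([0.8062, 0.81, 0.8394, 0.8432, 0.8794, np.nan, np.nan, np.nan])

print(normal_loglik(y, mu=0.8, sigma=0.05))               # nan: one NaN poisons the sum
print(normal_loglik(y[~np.isnan(y)], mu=0.8, sigma=0.05))  # finite once NaNs are excluded
```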

DanWeitzenfeld · September 23, 2019, 2:02pm

The trick, AFAICT, is to pass either a pandas Series or a pandas DataFrame column as `observed`.

dorian821 · September 23, 2019, 3:10pm

@DanWeitzenfeld, thanks! That seems to do it, but it seems odd given the general preference for theano.shared tensors or NumPy arrays.
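The PyMC3 documentation says missing values are handled when the observed data is a NumPy masked array or a pandas object containing NaN, so a plain NumPy array should also work once its NaNs are masked explicitly. A `theano.shared` tensor carries no mask, which would explain why the NaNs in it were not imputed. A sketch using the values from the post above (rounded to four decimals); the model line is left as a comment with hypothetical names `walk` and `obs_sigma`, since it depends on the rest of the model:

```python
import numpy as np

# Response values from the post: 18 observed points, then 3 future steps left as NaN
values = [0.3373, 0.3919, 0.3656, 0.427, 0.5428, 0.58, 0.636, 0.6937, 0.7287,
          0.7345, 0.7517, 0.7874, 0.8003, 0.8062, 0.81, 0.8394, 0.8432, 0.8794,
          np.nan, np.nan, np.nan]

# Mask every NaN/inf so that PyMC3 can treat those entries as missing
y_masked = np.ma.masked_invalid(np.asarray(values, dtype=float))

print(y_masked.mask.sum())  # 3 masked (missing) entries
print(y_masked.count())     # 18 observed entries

# Inside a PyMC3 model this would then be passed as, e.g. (hypothetical names):
#   pm.Normal("y", mu=walk, sigma=obs_sigma, observed=y_masked)
```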

Ali_Khalili · February 27, 2023, 1:44am

Are there any tricks for doing something similar with PyMC (v5)?
I get the following error when I try the above in PyMC v5:

```
NotImplementedError: Automatic inputation is only supported for univariate RandomVariables. obs of type <class 'pymc.distributions.timeseries.RandomWalkRV'> is not supported.
```

thanks

DanWeitzenfeld · May 5, 2023, 9:33pm

see here
