What's the best way to implement an autoregressive moving average of a time series with a variable lag and window size?

I have a time series consisting of some times t at which I’ve measured RVs a and b. I want to model b as being dependent on some lagged moving average of a with unknown window size W and lag time T. Given that both the window size and the lag time can vary, what’s the best way to model this?
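To make that concrete, the quantity I want b to depend on is something like

\frac{1}{W}\sum_{i=0}^{W-1} a_{t - T - i},

i.e. the mean of a over the W samples ending T samples before time t, with both W and T unknown.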

I thought about using pymc.AR but that distribution depends on a tensor of autoregressive coefficients; here, I’d like to model b as depending on a simple moving average of a over some unknown lagging window. Is the most natural way to do this to write a custom distribution?

Quick update on this - I’ve successfully written a model that I think should compute a lagged moving average. Here’s my code:

import numpy as np
import pymc as pm
import pytensor as pt
from pytensor.compile.sharedvalue import SharedVariable
from pytensor.tensor import conv


def moving_average(
    t: SharedVariable,
    width: pm.Discrete,
    lag: pm.Discrete,
) -> pt.tensor.TensorLike:
    """Compute the moving average signal t."""
    # Reshape into 4d arrays to make conv2d happy
    kernel_4d = (pm.math.ones((width,), dtype=float)/width).reshape((1, 1, 1,-1))
    temperature_4d = t.reshape((1, 1, 1, -1))

    # Only run conv2d on valid area; set boundary points which only partially
    # overlap the kernel to np.nan
    result = pm.math.full_like(t, np.nan, dtype=float)
    pt.tensor.set_subtensor(
        result[width//2:-width//2],
        conv.conv2d(temperature_4d, kernel_4d, border_mode='valid').flatten()
    )
    return lagged(result, lag)

def lagged(data: SharedVariable, lag: pm.Discrete) -> pt.tensor.TensorLike:
    """Lag an array by some amount."""
    # Keep the shape the same; the region where there is no lagged data is set
    # to np.nan.
    result = pm.math.full_like(data, np.nan, dtype=float)
    pt.tensor.set_subtensor(
        result[:-lag],
        data[lag:]
    )
    return result


def my_model(t_data, c_data, c0, a, b, g, w0, l0, w1, l1, l2):
    return (
        c0 +
        a*moving_average(t_data, w0, l0) +
        b*moving_average(t_data, w1, l1) -
        g*lagged(c_data, l2)
    )

def main():
    data = get_data()

    with pm.Model() as model:
        t_data = pm.MutableData("t_data", data["t"])
        c_data = pm.MutableData("c_data", data["c"])

        # Uninformative uniform prior for the initial number c.
        c0 = pm.DiscreteUniform("c0", lower=0, upper=1000)

        alpha = pm.HalfNormal("alpha", sigma=10)
        beta = pm.HalfNormal("beta", sigma=10)
        gamma = pm.HalfNormal("gamma", sigma=10)
        width_0 = pm.DiscreteUniform("width_0", 1, 200)
        width_1 = pm.DiscreteUniform("width_1", 1, 200)
        lag_0 = pm.DiscreteUniform("lag_0", lower=180, upper=545)
        lag_1 = pm.DiscreteUniform("lag_1", lower=550, upper=910)
        lag_2 = pm.DiscreteUniform("lag_2", lower=915, upper=1275)

        c_mu = pm.Deterministic(
            "c_mu",
            my_model(
                t_data,
                c_data,
                c0,
                alpha,
                beta,
                gamma,
                width_0,
                lag_0,
                width_1,
                lag_1,
                lag_2,
            ),
        )

        c_likelihood = pm.Poisson("c_likelihood", mu=c_mu, observed=c_data)
        idata = pm.sample(discard_tuned_samples=False)

Turns out that I didn't need to use pymc.AR or anything like that. Here, my observed c_data is modeled as a Poisson-distributed RV whose mu depends on moving averages over two windows some number of days in the past. The trickiest part so far was rewriting numpy-like array operations as valid pytensor operations. Even though the model now builds, I'm still having trouble getting the sampler past the initial evaluation:

pymc.exceptions.SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'c0': array(500), 'alpha_log__': array(2.30258509), 'beta_log__': array(2.30258509), 'gamma_log__': array(2.30258509), 'width_0': array(100), 'width_1': array(100), 'lag_0': array(362), 'lag_1': array(730), 'lag_2': array(1095)}

Logp initial evaluation results:
{'c0': -6.91, 'alpha': -0.73, 'beta': -0.73, 'gamma': -0.73, 'width_0': -5.3, 'width_1': -5.3, 'lag_0': -5.9, 'lag_1': -5.89, 'lag_2': -5.89, 'c_likelihood': -inf}
You can call `model.debug()` for more details.

I suspect the sampler is having trouble with the fact that I only have valid time series data for certain stretches, and I fill in nan for all the dates where I don't have any data. This makes computing the moving average simpler, but it might also be why the sampler can't produce a valid initial evaluation.

Are all values of c_mu strictly positive at the starting point? What does the output of model.debug() look like?
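(If model.debug() isn't enough, one quick check is to pull a draw of the deterministic. This samples the free parameters from their priors rather than using the exact starting point, but any nans baked into the data will show up either way. Untested sketch:)

import numpy as np
import pymc as pm

# one random draw of the deterministic c_mu defined in your model
c_mu_draw = pm.draw(model["c_mu"])
print("any nan in c_mu:", np.isnan(c_mu_draw).any())
print("any negative value:", (c_mu_draw < 0).any())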

Are all values of c_mu strictly positive at the starting point?

Yep! I opened up the debugger after loading the data:

(Pdb++) np.all(data['c'] >= 0)
True

What does the output of model.debug() look like?


point={'c0': array(500), 'alpha_log__': array(2.30258509), 'beta_log__': array(2.30258509), 'gamma_log__': array(2.30258509), 'width_0': array(100), 'width_1': array(100), 'lag_0': array(362), 'lag_1': array(730), 'lag_2': array(1095)}

The variable c_likelihood has the following parameters:
0: Alloc [id A] <Vector(float64, shape=(?,))> 'c_mu'
 ├─ Sub [id B] <Vector(float64, shape=(1,))>
 │  ├─ Add [id C] <Vector(float64, shape=(1,))>
 │  │  ├─ ExpandDims{axis=0} [id D] <Vector(int64, shape=(1,))>
 │  │  │  └─ c0 [id E] <Scalar(int64, shape=())>
 │  │  ├─ Mul [id F] <Vector(float64, shape=(1,))>
 │  │  │  ├─ [nan] [id G] <Vector(float64, shape=(1,))>
 │  │  │  └─ Exp [id H] <Vector(float64, shape=(1,))>
 │  │  │     └─ ExpandDims{axis=0} [id I] <Vector(float64, shape=(1,))>
 │  │  │        └─ alpha_log__ [id J] <Scalar(float64, shape=())>
 │  │  └─ Mul [id K] <Vector(float64, shape=(1,))>
 │  │     ├─ [nan] [id G] <Vector(float64, shape=(1,))>
 │  │     └─ Exp [id L] <Vector(float64, shape=(1,))>
 │  │        └─ ExpandDims{axis=0} [id M] <Vector(float64, shape=(1,))>
 │  │           └─ beta_log__ [id N] <Scalar(float64, shape=())>
 │  └─ Mul [id O] <Vector(float64, shape=(1,))>
 │     ├─ [nan] [id G] <Vector(float64, shape=(1,))>
 │     └─ Exp [id P] <Vector(float64, shape=(1,))>
 │        └─ ExpandDims{axis=0} [id Q] <Vector(float64, shape=(1,))>
 │           └─ gamma_log__ [id R] <Scalar(float64, shape=())>
 └─ Shape_i{0} [id S] <Scalar(int64, shape=())>
    └─ t_data [id T] <Vector(float64, shape=(?,))>
The parameters evaluate to:
0: [nan nan nan ... nan nan nan]
This does not respect one of the following constraints: mu >= 0

Here’s the model graph: [model graph image omitted]

Oh, set_subtensor is not an in-place operation, so your lagged function just returns the all-nan array it starts with.
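Roughly, you need to assign the result back, e.g. (keeping your imports):

def lagged(data, lag):
    """Lag an array by some amount."""
    result = pm.math.full_like(data, np.nan, dtype=float)
    # set_subtensor returns a new tensor; keep it instead of discarding it
    result = pt.tensor.set_subtensor(result[:-lag], data[lag:])
    return result

The same fix applies to the set_subtensor call in moving_average.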

You cannot have missing values in your parameters. PyMC can do automatic imputation when the observed data has missing values (i.e. c_data), but you have to define values for everything your model computes with, for example by assigning random variables to the missing values. The correct way to handle lagging is to create an (unobserved) initial state vector and prepend it to the data before convolution. See this notebook, cell 38 (“VAR as 2d convolution”) for an example.
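In your case that might look roughly like this (just a sketch, with the prior and the padding length guessed from your model):

max_history = 910 + 200   # largest moving-average lag plus largest window in your priors
# unobserved "history" of the signal; the Normal prior is only a placeholder
init_t = pm.Normal("init_t", mu=0, sigma=1, shape=max_history)
t_padded = pm.math.concatenate([init_t, t_data])
# run the moving-average convolution on t_padded instead of t_data, then slice the
# result so it lines up with c_data; the lagged c_data term needs the same treatment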

Thanks so much for the reading - it took me a few days to get through it and think it over. A few things about VAR models make them particularly challenging to adapt to my problem:

  1. I want the number of lags to be a parameter of the model - I don’t know a priori what this should be. It’s hard to imagine how a function like the one in cell 38 from the notebook you mentioned could handle a varying n_lags, since the shape of the matrix involved in the convolution depends on this:
def build_VAR(df, n_lags, intercept = True, model_full_cov = True, model_initial_states=True):
                 #  ⮤ this int needs to be passed to the VAR model to be instantiated
  2. Similarly, I also need a model parameter for which lags to include in the convolution kernel. For example, I’m not just considering a model where (using the AR notation from your notebook)
y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2},

I also want to allow for

y_t = \phi_0 + \phi_4 y_{t-4} + \phi_5 y_{t-5} + \phi_6 y_{t-6}

etc. So what I really need is a way to define a model in which the set of included lag terms is itself a parameter. These two reasons are what steered me away from using VAR models in the beginning, but maybe there’s a way to make them work for my system that I just haven’t understood.

:point_up: I’ll try fixing my set_subtensor use, and I’ll need to be more careful about edge effects where my data is missing. I’ll post an update if I can make progress on this. Thanks again for the advice!

You can handle multiple lags the way it’s done in some stick-breaking processes. Assume a reasonable maximum number of lags and “disable the unused ones” with a masking operation.

Something like that is done here for another kind of model: Modeling temporal data with an unknown number of changepoints | Christopher Krapu
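Concretely, the masking trick could look something like this (a rough sketch, all names made up):

import pymc as pm
import pytensor.tensor as ptt

max_lags = 10
with pm.Model():
    phi = pm.Normal("phi", mu=0, sigma=1, shape=max_lags)               # one coefficient per candidate lag
    n_active = pm.DiscreteUniform("n_active", lower=1, upper=max_lags)  # how many lags are actually used
    # lags beyond n_active are multiplied by zero, so the shape never changes
    phi_masked = pm.Deterministic("phi_masked", phi * (ptt.arange(max_lags) < n_active))

The discrete n_active would then get a Metropolis-type step while NUTS handles the continuous parameters.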

PyMC doesn’t really allow sampling of parameters that vary in shape, VAR or otherwise.

Whether any of this will be feasible to parametrize, I have no idea.

People frequently ask about this, and I really don’t think it’s necessary. Just put a regularizing prior on your coefficient matrix. There’s no difference between a huge, elaborate model with dynamic shapes and one that simply puts a zero in front of the lags it judges unnecessary.
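For example, something like a horseshoe-style shrinkage prior over a fixed-size coefficient vector (sketch only, names made up):

import pymc as pm

max_lags = 10
with pm.Model():
    tau = pm.HalfCauchy("tau", beta=1)                   # global shrinkage scale
    lam = pm.HalfCauchy("lam", beta=1, shape=max_lags)   # per-lag local scales
    # coefficients for unnecessary lags get pulled toward zero
    phi = pm.Normal("phi", mu=0, sigma=tau * lam, shape=max_lags)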