Generalising a Custom AR(1) Model to Multiple Dimensions

Hi Everyone, I’m fairly new to pymc, but I’m already a big fan.

At the moment I’m using it to model some multidimensional data as independant AR(1) processes. I would like the parameters rho and sigma to vary based on (1) the timestep and eventually (2) the dimension too (although this is not yet in my code). I’m implementing this using pm.CustomDist(), since I don’t think that pm.AR() can handle both (1) and (2) simultaneously (feel free to correct me!).

What I have written (code below) seems to work for 1D data as you can see if you set obs = df["x"].values, but with 3D data I get an error (it’s quite long, so I’ve only given what I believe to be the relevant part):

Traceback (most recent call last):
  File "pytensor\\scan\\scan_perform.pyx", line 418, in pytensor.scan.scan_perform.perform
pymc.logprob.utils.ParameterValueError: sigma > 0

I expect I’m making a simple mistake that boils down to a misunderstanding of how pytensor.scan() works and passing the wrong value as sigma, as I’m fairly sure the priors I’ve chosen should rule out invalid values for sigma.

Of course since it works in 1D, I could use a separate distribution for each dimension, but eventually I want this to work on a very large number of dimensions and I expect handling them together will be more performant than looping over them. For the life of me I can’t find the problem though, so any help would be much appreciated!

import arviz as az
import numpy as np
import pandas as pd
import pymc as pm
import pytensor
import pytensor.tensor as pt
from pymc.pytensorf import collect_default_updates

az.style.use("arviz-darkgrid")

# import test data
df = pd.read_csv(input("Please enter a file name for source data: ") + ".csv")
time = df['t'].values
obs = df[['x','y','z']].values
n = len(time)

# Define AR(1) distribution
def ar_dist(init, rho, sigma, size):
    def timestep(rho, sigma, x):
        y = pm.Normal.dist(mu=rho*x, sigma=sigma)
        return y, collect_default_updates([y])

    ar_innov, _ = pytensor.scan(
        fn=timestep,
        outputs_info=init,
        sequences=[rho, sigma],
        n_steps=n-1,
        strict=True,
    )
    
    return ar_innov

with pm.Model() as model:
    # base params
    mu = pt.exp(pm.Flat("logMu")) #=1
    naturalRange = pt.exp(pm.Flat("logNaturalRange")) #=1

    # derived params
    rho = pt.exp(-mu*np.diff(time))
    sigma = naturalRange*pt.sqrt(1-rho**2)

    # AR(1) Model
    ar_init = pm.Normal("init", mu=0, sigma=naturalRange, observed=obs[0])#we can use this structure to auto include a prediction!
    ar_innov = pm.CustomDist("AR", ar_init, rho, sigma, dist=ar_dist, observed=obs[1:])

with model:
    trace = pm.sample(2000, tune=1000)  # Adjust as needed

pm.plot_trace(trace)

AR1 should allow multiple independent dimensions if that’s wbat you’re looking for. Just pass parameters that “batch to the left” or extra dimensions on the left. If you want a multivariate AR (and not just multiple ARs) process you can use CustomDist but it didn’t look like that was what you were after

Hi there, you’re right that I want multiple ARs rather than a multivariate (vector) AR. I did try and use pm.AR(), but I found that it can’t cope with the case where rho and sigma depend on both the dimension and the timestep. In other words, I don’t think it can cope with rho and sigma being 2D (# dimensions by # timesteps).

This code, for instance gives an error:

import pymc as pm
import pytensor.tensor as pt

x = pm.AR.dist(
  init_dist=pm.DiracDelta.dist(0.0), 
  rho=pt.exp(pm.Normal.dist(shape=(3, 39))), # 40 times means 39 timesteps
  sigma = pt.exp(pm.Normal.dist(shape=(3, 39))),
  ar_order=1,
  shape=(3, 40), # 3 dimensions, 40 samples
)

pm.draw(x)

but if instead I use rho=pt.exp(pm.Normal.dist(shape=(3, 1))), sigma=pt.exp(pm.Normal.dist()),, it runs fine.

As an aside, it complains if I use rho=pt.exp(pm.Normal.dist(shape=(3, 1))), sigma=pt.exp(pm.Normal.dist(shape=(3, 1))),, but from what you’re saying, perhaps it shouldn’t?

Either way, this apparent limitation lead me to try using CustomDist and scan and I’m having the problem in the original post. Any thoughts?

Yes the parameters aren’t allowed to change over steps. The title of the issue didn’t make it obvious this was the goal.

A CustomDist should work for that case then. I’ll have to try some code to see where the problem is.