Hi Everyone, I’m fairly new to pymc, but I’m already a big fan.
At the moment I’m using it to model some multidimensional data as independant AR(1) processes. I would like the parameters rho and sigma to vary based on (1) the timestep and eventually (2) the dimension too (although this is not yet in my code). I’m implementing this using pm.CustomDist()
, since I don’t think that pm.AR()
can handle both (1) and (2) simultaneously (feel free to correct me!).
What I have written (code below) seems to work for 1D data as you can see if you set obs = df["x"].values
, but with 3D data I get an error (it’s quite long, so I’ve only given what I believe to be the relevant part):
Traceback (most recent call last):
File "pytensor\\scan\\scan_perform.pyx", line 418, in pytensor.scan.scan_perform.perform
pymc.logprob.utils.ParameterValueError: sigma > 0
I expect I’m making a simple mistake that boils down to a misunderstanding of how pytensor.scan()
works and passing the wrong value as sigma, as I’m fairly sure the priors I’ve chosen should rule out invalid values for sigma.
Of course since it works in 1D, I could use a separate distribution for each dimension, but eventually I want this to work on a very large number of dimensions and I expect handling them together will be more performant than looping over them. For the life of me I can’t find the problem though, so any help would be much appreciated!
import arviz as az
import numpy as np
import pandas as pd
import pymc as pm
import pytensor
import pytensor.tensor as pt
from pymc.pytensorf import collect_default_updates
az.style.use("arviz-darkgrid")
# import test data
df = pd.read_csv(input("Please enter a file name for source data: ") + ".csv")
time = df['t'].values
obs = df[['x','y','z']].values
n = len(time)
# Define AR(1) distribution
def ar_dist(init, rho, sigma, size):
def timestep(rho, sigma, x):
y = pm.Normal.dist(mu=rho*x, sigma=sigma)
return y, collect_default_updates([y])
ar_innov, _ = pytensor.scan(
fn=timestep,
outputs_info=init,
sequences=[rho, sigma],
n_steps=n-1,
strict=True,
)
return ar_innov
with pm.Model() as model:
# base params
mu = pt.exp(pm.Flat("logMu")) #=1
naturalRange = pt.exp(pm.Flat("logNaturalRange")) #=1
# derived params
rho = pt.exp(-mu*np.diff(time))
sigma = naturalRange*pt.sqrt(1-rho**2)
# AR(1) Model
ar_init = pm.Normal("init", mu=0, sigma=naturalRange, observed=obs[0])#we can use this structure to auto include a prediction!
ar_innov = pm.CustomDist("AR", ar_init, rho, sigma, dist=ar_dist, observed=obs[1:])
with model:
trace = pm.sample(2000, tune=1000) # Adjust as needed
pm.plot_trace(trace)