AR dims/coords/shape in v4

I have multiple time series of the same length that share the same AR parameters, and I’m struggling to figure out the shapes/dims situation in v4.

Here is a toy example:

import pandas as pd
import pymc as pm
import numpy as np
import arviz as az

def create_ar_series(length, rho, sigma):
    res = []
    lastval = 0
    for i in np.arange(length):
        nextval = lastval * rho + np.random.normal(loc=0, scale=sigma)
        lastval = nextval
        res.append(lastval)
    return res

# multiple series, same AR params
df = pd.DataFrame({'res1': create_ar_series(10000, .3, .1),
                   'res2': create_ar_series(10000, .3, .1), })

with pm.Model(coords={"time": df.index.values, "series": [0, 1]}) as m:
    rho = pm.Normal("rho", shape=1)
    init = pm.Normal.dist(0, shape=1)
    sigma = pm.HalfNormal('sigma', size=1)
    ar1 = pm.AR("ar1", rho=rho, sigma=sigma, init_dist=init,
                constant=False, observed=df[['res1', 'res2']].values, dims=("time", "series",))

with m:
    trace = pm.sample()

… results in this error:

~/.virtualenvs/m6_v3/lib/python3.9/site-packages/aesara/tensor/type.py in filter_variable(self, other, allow_convert)
    261                 return other2
    262 
--> 263         raise TypeError(
    264             f"Cannot convert Type {other.type} "
    265             f"(of Variable {other}) into Type {self}. "

TypeError: Cannot convert Type TensorType(float64, (10000, 2)) (of Variable ar1{[[-0.08402..06180672]]}) into Type TensorType(float64, (1, None)). You can try to manually convert ar1{[[-0.08402..06180672]]} into a TensorType(float64, (1, None)).

I’ve read the dimensionality docs but still couldn’t solve this.

time has to be the last dimension, but it seems you added it as the first?
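
For reference, a minimal single-series model following that convention might look something like this (an untested sketch; the variable names are mine):

with pm.Model(coords={"time": df.index.values}) as m_single:
    rho = pm.Normal("rho")
    sigma = pm.HalfNormal("sigma")
    init = pm.Normal.dist(0, 1)
    # single series: "time" is the only, and therefore last, dimension
    ar1 = pm.AR("ar1", rho=rho, sigma=sigma, init_dist=init,
                observed=df["res1"].values, dims=("time",))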

Thank you, Ricardo, for the reply. When I put time as the last dimension, I get a different error. (Note that I’m transposing the observed data, so its shape is now (2, 10000).) Using the same imports and data-generating code as above:

with pm.Model(coords={"time": df.index.values, "series": [0, 1]}) as m:
    rho = pm.Normal("rho", shape=1)
    init = pm.Normal.dist(0, shape=1)
    sigma = pm.HalfNormal('sigma', size=1)
    ar1 = pm.AR("ar1", rho=rho, sigma=sigma, init_dist=init,
                constant=False, observed=df[['res1', 'res2']].values.T, dims=("series", "time",))

with m:
    trace = pm.sample()

results in

TypeError: Cannot convert Type TensorType(float64, (2, 10000)) (of Variable ar1{[[-0.12119..07470894]]}) into Type TensorType(float64, (1, None)). You can try to manually convert ar1{[[-0.12119..07470894]]} into a TensorType(float64, (1, None)).

I am pretty naive when it comes to time series modeling, but I think AR is expecting parameters for each series you are observing. If you want to use a single set of parameters for all observed series and do it in a single likelihood, then maybe something like this?

import aesara.tensor as at

with pm.Model(coords={"time": df.index.values, "series": [0, 1]}) as m:
    rho = pm.Normal("rho")
    init = pm.Normal.dist(shape=2)
    sigma = pm.HalfNormal('sigma')
    ar1 = pm.AR("ar1",
                rho=at.stack([rho, rho]),
                sigma=at.stack([sigma, sigma]),
                init_dist=init,
                observed=df[['res1', 'res2']].to_numpy().T,
                dims=("series", "time")
    )

Of course, you should also be able to build 2 separate likelihoods, which should do the same thing, but then you wouldn’t have everything in one giant data structure with series as a dimension.

Thank you for the suggestion. I just tried that, and pymc interprets it as an AR2 model (the length of rho sets the lag order, so stacking two copies of rho reads as two lags).

Hm, I suppose that makes sense. Then I would vote for 2 likelihoods if you need to get something running in the meantime. @RavinKumar might have further suggestions.
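
If it helps, a two-likelihood version might look something like this (an untested sketch; the names are mine):

with pm.Model(coords={"time": df.index.values}) as m:
    rho = pm.Normal("rho")
    sigma = pm.HalfNormal("sigma")
    # one AR likelihood per series, sharing rho and sigma;
    # each AR gets its own unnamed init distribution
    ar_res1 = pm.AR("ar_res1", rho=rho, sigma=sigma, init_dist=pm.Normal.dist(0, 1),
                    observed=df["res1"].values, dims=("time",))
    ar_res2 = pm.AR("ar_res2", rho=rho, sigma=sigma, init_dist=pm.Normal.dist(0, 1),
                    observed=df["res2"].values, dims=("time",))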

I’ll preface this by saying that I struggle to use pm.AR in general, which is a personality flaw. Therefore, my approach here might be shooting a fly with a bazooka.

That said, parallel AR is a special case of a VAR, with all the cross-equation terms set to zero. A VAR can be represented as a 2D convolution, which aesara has nice built-in tools to handle “easily”, once you’ve wrapped your head around the filter shape stuff. I go over the details of this here, if anyone is interested.

Here’s how I attacked your problem as posed:

order = 1      # AR lag order (this example is AR(1))
n_series = 2   # number of parallel series

with pm.Model(coords={"time": df.index.values[order:], "series": [0, 1]}) as m:

    rho = pm.Normal("rho", shape=order)

    # conv2d filter shape is (output_channels, input_channels, filter_rows, filter_cols)
    # for this timeseries setup:
    #   output_channels is n_series (one output per equation)
    #   input_channels is 1 (the whole data matrix enters as a single channel)
    #   filter_rows is n_lags
    #   filter_cols is n_series
    # A diagonal filter (zero cross-equation terms) makes this parallel ARs rather than a full VAR.
    VAR_filters = at.expand_dims(at.eye(n_series) * rho, axis=(1, -2))
    
    init = pm.Normal('x0', mu=0, sigma=1, shape=n_series)
    sigma = pm.HalfNormal('sigma', size=1)
    
    data = pm.Data('data', df.values[order:, :], mutable=True)
    extended_data = at.concatenate([init[None], data], axis=0)
    
    # Each convolution produces the prediction for time t+1, so we have to drop the last row of the
    # data, since we don't have an observation that goes with it. For example, with 4 lags and 3
    # series, the first 4 x 3 block of data makes a prediction for t=5.

    # Flip the lag axis of the filter so its first row is the 1st lag, rather than the last.
    # I think it's more intuitive that way.
    VAR_equations = at.nnet.conv2d(input=at.expand_dims(extended_data[:-1], axis=(0, 1)),  # add batch and input-channel dims (both 1)
                                   filters=VAR_filters[:, :, ::-1, :],
                                   border_mode='valid', 
                                   filter_flip=False)
    
    mean = VAR_equations[0, :, :, 0].T
    obs = pm.Normal('obs', mu=mean, sigma=sigma, observed=data, dims=['time', 'series'])
    
with m:
    trace = pm.sample()

In this case I’ve restricted both time series to a shared AR parameter, but this framework would let you estimate an arbitrary number of series and parameters. It could even handle heterogeneous AR order between the series, if you had the patience to build the filter tensor.
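
For example, the heterogeneous-order case could zero-pad the shorter lag polynomials (a numpy sketch with made-up coefficients; the lag ordering would have to match the flipping convention used in the model above):

import numpy as np

max_lag, n_series = 2, 2
rho_s0 = np.array([0.3, 0.1])  # hypothetical AR(2) coefficients for series 0
rho_s1 = np.array([0.5])       # hypothetical AR(1) coefficient for series 1

# filter shape: (output_channels=n_series, input_channels=1, filter_rows=max_lag, filter_cols=n_series)
filters = np.zeros((n_series, 1, max_lag, n_series))
filters[0, 0, :len(rho_s0), 0] = rho_s0  # series 0 loads only on its own lags
filters[1, 0, :len(rho_s1), 1] = rho_s1  # series 1 likewise, zero-padded up to max_lag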

Here’s the output to show that it recovers the true parameters:

           mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
rho[0]    0.293  0.006   0.281    0.305      0.000    0.000    3291.0    1730.0    1.0
x0[0]     0.408  0.330  -0.195    1.033      0.006    0.005    3380.0    1503.0    1.0
x0[1]    -0.546  0.325  -1.131    0.090      0.006    0.005    2669.0    1435.0    1.0
sigma[0]  0.100  0.001   0.099    0.101      0.000    0.000    3142.0    1526.0    1.0

I think you were seeing this bug: Implied dimensions of RVs are not broadcasted by dims or observed · Issue #5993 · pymc-devs/pymc · GitHub

If you pass shape explicitly (shape=(2, 1000)) in addition to dims, I think it will work.

Alternatively, if you remove all the shape=1 arguments (except rho?) from the inputs, it might work without specifying shape at all… basically, without dims and observed, the shape of the AR would be (2000,) and not (1, 2000).

That didn’t seem to work for me.

Did specifying shape work?

Specifying shape=2 worked, but that’s a different model.

I assume you mean shape=(2, 1000).

It is not a different model; why do you say that?

Oh, no, I didn’t try explicitly specifying the shape of the observed data. If ditching the coords were an option, I would just use 2 AR likelihoods, each with the same parameters.

You can use dims; the problem is that, due to the linked bug, shape is not being properly inferred from them, so you need to be redundant and specify shape as well.

Code will benefit from vectorization if you use a single likelihood, so I would advise it.

Specifying both dims=("series", "time") and shape=(2, 1000), I get: AssertionError: Steps do not match last shape dimension

That’s because I missed a zero. That works!

with pm.Model(coords={"time": df.index.values, "series": [0, 1]}) as m:
    rho = pm.Normal("rho")
    init = pm.Normal.dist()
    sigma = pm.HalfNormal('sigma')
    ar1 = pm.AR("ar1",
                rho=rho,
                sigma=sigma,
                init_dist=init,
                observed=df[['res1', 'res2']].to_numpy().T,
                dims=("series", "time"),
                shape=(2, 10000)
    )
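
Sampling and checking the recovered parameters as before (az is the arviz import from the top of the thread):

with m:
    trace = pm.sample()

az.summary(trace, var_names=["rho", "sigma"])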

Thanks everyone.

In the thread for this bug, @michaelosthege writes: "As a general advice to everybody: The best way to avoid shape issues (like the one this issue originally was about) is to not rely on “smart” broadcasting, but do all broadcasts/resizes intentionally."

I’m trying to grok what that would mean in this case.

Well, it’s a PyMC bug; I disagree with the statement.

Users shouldn’t have to overthink things just to work around bugs. In the meantime, using an explicit shape will fix it, but in the future your original example will be perfectly valid.