Linear models with ARMA residuals

Here is a link to a paper I co-wrote that uses PyMC3 to fit linear models whose residuals are modelled by an autoregressive moving average, or ARMA(p, q), process, that is, by any stationary process (per Wold's theorem).

Quantifying influences on intragenomic mutation rate

This may be of interest to others, as it allows values of p and q greater than 1, which do not appear to be implemented as native PyMC3 distributions at this point. The code is available at:

https://github.com/helmutsimon/ProbPolymorphism/blob/master/shared/recombination.py

Note that these models require reasonably informative prior distributions to converge (these can be derived, e.g., using statsmodels). This is likely because using p + q ARMA parameters is redundant: the number of parameters can be reduced to m = max(p, q+1) (Chib, Siddhartha, and Edward Greenberg. "Bayes inference in regression models with ARMA(p, q) errors." Journal of Econometrics 64.1-2 (1994): 183-206).


Thanks for sharing! Any reason you are using Metropolis to sample some of the parameters? NUTS should work out of the box.
For reference, I have been working on some ARIMA models, drawing mostly on the Stan implementation in https://github.com/asael697/varstan/blob/master/inst/stan/Sarima.stan

I don't think this understanding is correct: you still need p + q parameters, but you can express the model with m = max(p, q+1) linear functions in state-space form (this is what StatsModels does under the hood). For more information see chapter 3.4 of Time Series Analysis by State Space Methods.
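To illustrate the point (a hand-rolled sketch with made-up coefficients, not the StatsModels code itself): Harvey's state-space form of an ARMA(p, q) uses m = max(p, q+1) state variables but still carries all p + q coefficients, and it reproduces the direct recursion exactly.

```python
import numpy as np

p, q = 2, 2
phi = np.array([0.5, -0.3])   # AR coefficients (illustrative values)
theta = np.array([0.4, 0.2])  # MA coefficients (illustrative values)
m = max(p, q + 1)             # state dimension: 3, not p + q = 4

rng = np.random.default_rng(0)
n = 200
eps = rng.normal(size=n)

# Direct ARMA(p, q) recursion (y and eps taken as zero before t = 0).
y = np.zeros(n)
for t in range(n):
    y[t] = eps[t]
    y[t] += sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
    y[t] += sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)

# Harvey state-space form: m states, still p + q free coefficients.
T = np.zeros((m, m))
T[:p, 0] = phi              # first column holds the AR coefficients
T[:-1, 1:] = np.eye(m - 1)  # superdiagonal shifts the state down
R = np.zeros(m)
R[0] = 1.0
R[1:q + 1] = theta          # MA coefficients load the noise into the state

alpha = np.zeros(m)
y_ss = np.zeros(n)
for t in range(n):
    alpha = T @ alpha + R * eps[t]
    y_ss[t] = alpha[0]      # observation picks out the first state

assert np.allclose(y, y_ss)  # both recursions generate the same series
```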

My recollection is that I found NUTS very slow compared to Metropolis for the ARMA parameters - I made a comment on Issue 1304. As far as parameterisation was concerned, I was hoping that some more parsimonious approach would allow models using uniform prior distributions to converge. Have you come across similar issues with prior distributions in your ARIMA work?

The autoregressive parameters need to be somewhat constrained, otherwise you might get a degenerate time series. I am using either Uniform(-1, 1) or a Normal with a small scale as the prior, and those tend to work well.
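As a quick illustration of the degeneracy (pure numpy, toy values): the same noise driven through an AR(1) with |rho| < 1 fluctuates around zero, while |rho| > 1 diverges geometrically, which is why priors such as Uniform(-1, 1) keep the sampler in the stationary region.

```python
import numpy as np

def ar1(rho, eps):
    """Simulate y_t = rho * y_{t-1} + eps_t starting from y_0 = 0."""
    y = np.zeros(len(eps))
    for t in range(1, len(eps)):
        y[t] = rho * y[t - 1] + eps[t]
    return y

rng = np.random.default_rng(1)
eps = rng.normal(size=500)
stable = ar1(0.9, eps)      # |rho| < 1: stationary, bounded in probability
explosive = ar1(1.05, eps)  # |rho| > 1: degenerate, blows up
print(np.abs(stable).max(), np.abs(explosive).max())
```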

Hi @junpenglao, I was hoping you could provide some guidance on how to use NUTS to fit the simplest form of an ARMA(1,1) model. In that case I am only considering a fixed theta parameter equal to -1, since the process is then the first difference of an AR(1) process*. The solution proposed by @Helmut is not satisfactory to me because a) I'd prefer to use NUTS and b) I do not need to actually simulate the noise; describing the logp would be enough (and is much faster than actually sampling the AR(1) residuals). So far, for a simple AR(1), I found the following to work:

import pymc as pm
import pytensor

y = ...  # my observed data, presumably some trend + auto-correlated noise

with pm.Model():
    ...  # here define the priors and the expression for the trend, `model`
    ...  # here define priors for rho and sigma, and define init_dist
    ar1dist = pm.AR.dist(rho=rho, sigma=sigma, steps=y.size - 1, ar_order=1, init_dist=init_dist)
    logp = pm.logp(ar1dist, pytensor.shared(y) - model)
    pm.Potential('ar1', logp)
    trace = pm.sample()

Now I am embarrassed about how to define a proper distribution for the modified ARMA(1,1) with the MA(1) parameter fixed at theta = -1. I tried to do it from scratch, and I could even work out an expression for the covariance matrix, but doing matrix operations for a time series of size 100 results in an error (I tried using pytensor's Cholesky factors, pinv etc. to define an expression for the logp, to no avail). I presume this is not the way to go.
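For what it's worth, since theta = -1 makes the residual the first difference of a stationary AR(1), the covariance matrix has a closed form, so the Gaussian logp can be checked outside PyMC first. A minimal numpy/scipy sketch (the parameter values and the observed vector are placeholders):

```python
import numpy as np
from scipy.stats import multivariate_normal

phi, sigma = 0.7, 1.0  # illustrative AR(1) parameters
n = 100

# Stationary AR(1) autocovariance: gamma(h) = sigma^2 * phi^|h| / (1 - phi^2)
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
cov_x = sigma**2 * phi**lags / (1 - phi**2)

# Covariance of the differenced series r_t = x_t - x_{t-1}
D = np.eye(n)[1:] - np.eye(n)[:-1]  # (n-1) x n first-difference operator
cov_r = D @ cov_x @ D.T

# Gaussian logp of an observed residual vector (placeholder zeros here)
r_obs = np.zeros(n - 1)
logp = multivariate_normal(mean=np.zeros(n - 1), cov=cov_r).logpdf(r_obs)
```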

I have been wondering whether subclassing the AR class and rewriting the rv_op would be more appropriate (here), since there are no additional parameters.

Thanks for any hint.

*I could just use an AR(1) model on the cumulative sum, but the series has missing values, so it would need to be split into contiguous segments, and the correlation between two distinct segments would be lost. That's why I have been trying to work on the differenced series.