PyMC3 vs Statsmodels AR(1) model parameters estimation

Hello everyone :slight_smile:

I am struggling to wrap my head around the discrepancy I am observing between the estimates provided by statsmodels and those from an AR(1) model I wrote in PyMC3. Given this toy simulated data:

import numpy as np

n = 300

x = np.random.normal(31, 1, n)
x_ar = np.array([x[i] if i == 0 else x[i] + (x[i-1] * .6) for i in range(n)])

y = np.random.normal(30, 1, n)
y_ar = np.array([y[i] if i == 0 else y[i] + (y[i-1] * .6) for i in range(n)])

delta = x_ar - y_ar

I would like to reliably estimate the expected value of the difference of the two AR processes, delta (which I know is 1). Calculating the empirical median gives, as expected, a biased estimate, while statsmodels' AR seems to retrieve the right values for all the parameters with reasonable success.
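For reference, here is a minimal sketch of the kind of fit statsmodels performs: `AutoReg(y, lags=1, trend="c")` reduces to an OLS regression of the series on a constant and its own first lag. The toy parameter values and variable names below are mine, and the data is a textbook AR(1) rather than the thread's delta:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
const_true, rho_true = 1.0, 0.6  # toy values, not the thread's data

# simulate a textbook AR(1): y[t] = const + rho * y[t-1] + noise
y = np.empty(n)
y[0] = const_true / (1 - rho_true)  # start at the stationary mean
for t in range(1, n):
    y[t] = const_true + rho_true * y[t - 1] + rng.normal()

# OLS regression of y[t] on (1, y[t-1]), which is what AutoReg with
# lags=1 and trend="c" estimates
X = np.column_stack([np.ones(n - 1), y[:-1]])
const_hat, rho_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]

# implied long-run (stationary) mean of the process
mean_hat = const_hat / (1 - rho_hat)  # should be close to 1.0 / 0.4 = 2.5
```

Note that the `const` statsmodels reports here is the regression intercept, and the stationary mean has to be recovered as `const / (1 - rho)`.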

On the contrary, my PyMC3 model

import pymc3 as pm

with pm.Model() as ar_model:
    const = pm.Normal("const", 0, 1)
    rhos = pm.Normal("rho", 0, .3, shape=1)
    sigma = pm.HalfNormal("sigma", 1)    
    obs = pm.AR(
        "obs",
        rho=rhos,
        sigma=sigma,
        observed=delta - const,
    )
    trace = pm.sample()

seems to converge to a biased estimate for const.

Am I missing something in the specification of my model / my understanding of how AR works in PyMC3?

This has been solved by using the AR implementation in PyMC V4, specifying constant=True and letting rho include the value of the constant as its first element.

I assume (please correct me if I am wrong) the model specification above is fundamentally different and the constant is not included in the AR process.

The PyMC V4 AR distribution indeed includes the constant as the 0th rho. Here is some relevant code from the AR distribution that shows how the constant is handled:

        def step(*args):
            *prev_xs, reversed_rhos, sigma, rng = args
            if constant_term:
                mu = reversed_rhos[-1] + at.sum(prev_xs * reversed_rhos[:-1], axis=0)
            else:
                mu = at.sum(prev_xs * reversed_rhos, axis=0)
            next_rng, new_x = Normal.dist(mu=mu, sigma=sigma, rng=rng).owner.outputs
            return new_x, {rng: next_rng}

You can see in the first branch of the if statement that, when there is a constant term, the last “reversed_rho” (or the first regular rho) is used as the constant value.
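To make the rho ordering concrete, here is a plain-numpy sketch of that mu computation (a hypothetical helper of mine, not PyMC code): with constant=True the rho vector is [constant, rho_1, ..., rho_p], so after reversal the constant sits at the end.

```python
import numpy as np

def step_mu(prev_xs, reversed_rhos, constant_term):
    # mirrors the mu computation in AR's step function, in plain numpy
    prev_xs = np.asarray(prev_xs, dtype=float)
    reversed_rhos = np.asarray(reversed_rhos, dtype=float)
    if constant_term:
        # last reversed rho (= first regular rho) acts as the intercept
        return reversed_rhos[-1] + np.sum(prev_xs * reversed_rhos[:-1], axis=0)
    return np.sum(prev_xs * reversed_rhos, axis=0)

# AR(1) with constant: rho = [1.0, 0.6] -> reversed_rhos = [0.6, 1.0]
mu = step_mu(prev_xs=[2.0], reversed_rhos=[0.6, 1.0], constant_term=True)
# mu = 1.0 + 0.6 * 2.0 = 2.2
```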

There is a slight difference between the PyMC implementation and the statsmodels one, because statsmodels uses the “Regression with ARIMA errors” formulation, as explained here. Rob Hyndman writes about the difference here. PyMC’s implementation would be an example of what Hyndman calls the ARMAX model, where \beta is the constant coefficient, x is a vector of ones, and the MA coefficients are all zero. So you have to do a bit of algebra to convert from one to the other.
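For the AR(1) case, that bit of algebra is a single rescaling (toy numbers below, assuming |rho| < 1): statsmodels' constant in the regression-with-ARIMA-errors form is the process mean mu, while the ARMAX-style intercept PyMC estimates is alpha = mu * (1 - rho).

```python
# regression with AR(1) errors (statsmodels):
#     y[t] = mu + eta[t],  with  eta[t] = rho * eta[t-1] + e[t]
# ARMAX / PyMC form:
#     y[t] = alpha + rho * y[t-1] + e[t]
# Substituting eta[t] = y[t] - mu into the first form gives
#     alpha = mu * (1 - rho)

rho = 0.6  # toy AR coefficient
mu = 2.5   # statsmodels-style constant (the stationary mean)

alpha = mu * (1 - rho)       # the intercept that PyMC's rho[0] corresponds to
mu_back = alpha / (1 - rho)  # and back again

print(alpha, mu_back)  # 1.0 2.5
```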

In general it’s advisable to difference away the constant term when working with ARIMA class models, if you can. For one thing it avoids these headaches. Just centering and scaling the data will also let you omit the constant term.


This is extremely insightful thanks!

However, in this specific case I wanted a reliable estimate of the mean of the process, and I thought an AR(1) model with a constant term was the way to go.

Do you have any suggestions for a more appropriate approach here?

Using the AR distribution with constant=True then is a fine way to go. Just be cautious in your interpretation, as noted in the blog post I linked.
