Linear Regression Model with GARCH Residuals (understanding the GARCH11 random variable in PYMC)

Hi there,

I would like to combine a linear regression model with GARCH(1,1) noise, and I have seen that there is already a built-in pm.GARCH11 distribution in PyMC. I started with a standard linear regression model and replaced the standard-deviation parameter of the likelihood with np.sqrt(σ_t ** 2), where σ_t is a random variable from pm.GARCH11.

Unfortunately, sampling is extremely slow even for a very simple dummy model, and I was wondering if someone has advice on improving the model below. I assume it has something to do with my standard-deviation formulation np.sqrt(σ_t ** 2) for the data, but I could not figure out what I did wrong.

Data Generation:

import numpy as np, pandas as pd, matplotlib.pyplot as plt
import pymc as pm, arviz as az

np.random.seed(42)
n, phi, sigma = 500, 0.8, 1
X1, X2 = np.linspace(0, 10, n), np.linspace(5, 15, n)
Y_linear = 1.0 + 0.25 * X1 + 0.5 * X2

# Generate AR(1) noise
ar_noise = np.zeros(n)
ar_noise[0] = np.random.normal(0, sigma)  # First element in AR(1) series
for i in range(1, n):
    ar_noise[i] = phi * ar_noise[i - 1] + np.random.normal(0, sigma)

Y = Y_linear + ar_noise

df_target = pd.DataFrame(Y, columns = ['Target'])
df_features = pd.concat([pd.Series(X1, name="Feature1"), pd.Series(X2, name="Feature2")], axis=1)
coords = {"T_features": df_features.index, "N_features": df_features.columns, "N_target": df_target.columns}

Model Parameterization:

with pm.Model() as LinearRegressionGARCH:
    # Data for features and target
    feature_array = pm.Data("feature", df_features.values, mutable=True, dims=("T_features", "N_features"))
    target_array = pm.Data("target", df_target.values.squeeze(), mutable=True, dims="T_features")

    # Linear Regression Part
    β0 = pm.Normal("β0", mu=0.0, sigma=10)
    β = pm.Normal("β", mu= 0, sigma=10, dims="N_features")
    mu_t = pm.Deterministic("mu_t", β0 + pm.math.dot(feature_array, β), dims="T_features")

    # GARCH Parameter
    # GARCH(1,1) Model for volatility
    ω = pm.HalfNormal("ω", sigma=5.0)  # Constant innovation variance
    
    ## the sum of these two parameters must be less than 1
    α = pm.Beta("α", alpha=1, beta=1)  # ARCH parameter (constrained between 0 and 1)
    β_garch = pm.Uniform('β_garch', 0, (1-α)) # GARCH parameter (constrained between 0 and 1)
    
    initial_vol = pm.HalfNormal("initial_vol", sigma=5.0)
    σ_t = pm.GARCH11("σ_t", omega=ω, alpha_1=α, beta_1=β_garch, initial_vol=initial_vol, dims="T_features")

## Evaluate Model    
    observed = pm.Normal("observed", mu=mu_t, sigma= np.sqrt( (σ_t) ** 2 ), observed=target_array, dims = "T_features")

Sampling:

draws = 1000
tune = 650
cores = 4
target_accept = 0.90
DEFAULT_RNG = np.random.default_rng(123)

with LinearRegressionGARCH:
    trace = pm.sample(draws, tune=tune, cores=cores, target_accept = target_accept, random_seed = DEFAULT_RNG)
    trace_predictive_oos = pm.sample_posterior_predictive(trace, predictions=True, var_names = ['observed'])

For speed alone, there are a few things you can try and check.
First, did you install using conda? The conda packages come with an optimized BLAS, rather than falling back to the slow NumPy implementation.
Second, you can try pip installing jax, jaxlib, and numpyro, and then sample like this:

import pymc.sampling_jax as pjax
import pymc as pm

with pm.Model() as model1:

    ...  # your model definition goes here

    idata = pjax.sample_numpyro_nuts(
        draws, tune=tune, chains=cores, target_accept=target_accept, random_seed=DEFAULT_RNG
    )

This will give you a potentially massive speed boost.
Third, you can gain even more speed if you leverage the GPU. Make sure you’re working on a Linux machine with an NVIDIA GPU, install the NVIDIA CUDA toolkit, install the g++ compiler if it doesn’t happen automatically, then install JAX via pip install jax[cuda12] (or whatever CUDA version you’re capable of running).

Happy speed boosting :slight_smile:


You can also get the numpyro (and blackjax, and nutpie) samplers without any extra imports, via pm.sample(nuts_sampler='numpyro')
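For example, reusing the sampling settings from the original post (a minimal sketch; the external sampler itself, e.g. numpyro, still has to be installed separately):

with LinearRegressionGARCH:
    trace = pm.sample(
        draws,
        tune=tune,
        cores=cores,
        target_accept=target_accept,
        random_seed=DEFAULT_RNG,
        nuts_sampler="numpyro",  # or "blackjax" / "nutpie"
    )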


Thank you both!

I will have a look at whether a different sampler speeds up the sampling process. I tried the JAX version, but was unable to get it going with the GARCH11 random variable:

NotImplementedError: JAX does not support slicing arrays with a dynamic
slice length.

@jessegrabowski I saw your really useful posts about ARMA-GARCH models in PyMC, and might just take that for my problem. It still baffles me that the rather simple example above does not work. My PyMC version is from a new environment, version 5.16.1.

Looking at your model more closely, I don’t think it does what you think it does. GARCH11 isn’t a distribution that produces an autoregressive sigma; it produces observed data that follows a GARCH process. From the docstring:

    GARCH(1,1) with Normal innovations. The model is specified by

    .. math::
        y_t \sim N(0, \sigma_t^2)

    .. math::
        \sigma_t^2 = \omega + \alpha_1 * y_{t-1}^2 + \beta_1 * \sigma_{t-1}^2

    where \sigma_t^2 (the error variance) follows a ARMA(1, 1) model.

What you get back is the y_t values, not the \sigma_t^2. This is evident if you do pm.draw(σ_t) – you will see that draws are both positive and negative, which is doubly impossible for a variance (one because it’s a variance, and two because it’s squared!). The model is sampling slowly because it’s wrong.
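A quick way to see this for yourself (a minimal sketch with arbitrary parameter values):

# Draws from GARCH11 are the y_t values, not variances, so negative values are expected
garch_draws = pm.draw(
    pm.GARCH11.dist(omega=1.0, alpha_1=0.25, beta_1=0.65, initial_vol=1.0, shape=500)
)
print((garch_draws < 0).any())  # True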

This is also a good object lesson in why it’s a good idea to do prior predictive sampling! I actually don’t do this kind of “build the model once in numpy and once in PyMC” workflow anymore. I recommend you make a model with no data, sample the prior, then use pm.observe to add observed data to the model. An example of this workflow is here, starting from the “motivating example” section. If you had followed this workflow, you would have discovered that your priors are all NaN.
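For the linear part of your model, that workflow would look roughly like this (a sketch only; the variable names are placeholders, and the exact pm.observe call is something to verify against your PyMC version):

# Build a purely generative model (no observed data), inspect the prior,
# then condition on the data with pm.observe
with pm.Model(coords=coords) as generative_model:
    β0 = pm.Normal("β0", mu=0.0, sigma=10)
    β = pm.Normal("β", mu=0.0, sigma=10, dims="N_features")
    mu_t = β0 + pm.math.dot(df_features.values, β)
    y = pm.Normal("y", mu=mu_t, sigma=1.0, dims="T_features")

    prior = pm.sample_prior_predictive()  # look at this before touching the data

observed_model = pm.observe(generative_model, {"y": df_target.values.squeeze()})
with observed_model:
    idata = pm.sample()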

What you actually want is a GARCH process with a non-zero mean, and for that you will have to follow something like this gist, which I guess you’ve already seen. Just note that you should not use the register_rv api that I used in that example – this is out of date. Instead, use a CustomDist, as in this AR from scratch example.


Thanks a lot for the input! I didn’t use prior predictive checks, but used the squared data of this GARCH11 process as the variance for the observed data.

# Running on PyMC v5.16.1
σ_t = pm.GARCH11("σ_t", omega=ω, alpha_1=α, beta_1=β_garch, initial_vol=initial_vol, dims="T_features")
observed = pm.Normal("observed", mu=mu_t, sigma= np.sqrt( (σ_t) ** 2 ), observed=target_array, dims = "T_features")

I realize my naming convention for this was not great, apologies for that. But looking at it, this seems to be a reasonable alternative:

N_draws = 1000
init = pm.HalfNormal.dist()
σ_dist = pm.GARCH11.dist(omega=1.0, alpha_1=0.25, beta_1=0.65, initial_vol=init, shape = N_draws)
data = pm.draw(σ_dist)

plt.plot( pd.DataFrame(data), label = 'data' )
plt.plot( np.sqrt( pd.DataFrame(data) ** 2 ), label = 'volatility parameter' )
plt.legend()
plt.show()

This makes me realize, though, that I could directly define the residuals df_target - mu_t as a GARCH11 process; then I would not need any transformation. However, it seems that PyMC does not allow me to use a Deterministic as ‘observed’ in the model. Is there any way around that?

with pm.Model() as LinearRegressionGARCH2:
    # Data for features and target
    feature_array = pm.Data("feature", df_features.values, mutable=True, dims=("T_features", "N_features"))
    target_array = pm.Data("target", df_target.values.squeeze(), mutable=True, dims="T_features")

    # Linear Regression Part
    β0 = pm.Normal("β0", mu=0.0, sigma=10)
    β = pm.Normal("β", mu= 0, sigma=10, dims="N_features")

    mu_t = pm.Deterministic("mu_t", β0 + pm.math.dot(feature_array, β), dims="T_features")
    residual_t = pm.Deterministic("residual_t", target_array - mu_t, dims="T_features")
        
    # GARCH Parameter
    ω = pm.HalfNormal("ω", sigma=5.0)  # Constant innovation variance
    α = pm.Beta("α", alpha=1, beta=1)  # ARCH parameter (constrained between 0 and 1)
    β_garch = pm.Uniform('β_garch', 0, (1-α)) # GARCH parameter (constrained between 0 and 1)    
    initial_vol = pm.HalfNormal("initial_vol", sigma=5.0)

    observed = pm.GARCH11("residual_t", omega=ω, alpha_1=α, beta_1=β_garch, initial_vol=initial_vol, observed=residual_t, dims="T_features")

#TypeError: Variables that depend on other nodes cannot be used for observed data.The data variable was: residual_t

Modelling the residual directly as observed would be quite convenient, as I could also extend this to something that has non-zero mean later on.

You don’t observe residuals, so it doesn’t make sense to set them as observed.

You can force a model to work this way by working directly with logp terms via pm.Potential, but we do not plan to allow it in general.
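For reference, a rough sketch of that Potential route inside the earlier model context (not what I’d recommend; the shape and summation details are assumptions you would need to check):

# Add the GARCH(1,1) log-likelihood of the residuals by hand, reusing
# mu_t, ω, α, β_garch, initial_vol and target_array from the model above
residual_t = target_array - mu_t
garch_dist = pm.GARCH11.dist(
    omega=ω, alpha_1=α, beta_1=β_garch, initial_vol=initial_vol, shape=len(df_target)
)
pm.Potential("garch_loglike", pm.logp(garch_dist, residual_t).sum())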

I understand that you forced the outputs of the GARCH11 process to be strictly positive. But again, these are realizations of y_t \sim N(0, \sigma_t^2), where \sigma_t^2 follows the GARCH(1,1) recursion. They are not the latent variance terms of the GARCH process, and don’t make sense as a time series of variances.

Thanks for the clarification!

So to implement my case, I should follow your ARMA-GARCH notebook and define this as a CustomDist as shown in your link.

To add the regression weights (mu_t from my example) to the model, I need to find a way to replace the scalar mean parameter of the ARMA part with my vector mu_t?

You can compute mu = X @ beta outside of the scan part, then pass it in as a sequence argument to the scan. Alternatively, you can pass in X as the sequence, then compute mu = x @ beta at each time step inside the inner loop of the scan.

Have a try and post if you hit any snags so I can help debug.
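Roughly, the two options look like this (just a sketch; the function names are placeholders, and note that scan feeds sequences to the step function first, then the recurrent state, then the non_sequences):

import pytensor
import pymc as pm
from pymc.pytensorf import collect_default_updates

# Option 1 (sketch): compute mu = X @ beta once, outside the scan, and pass it as a sequence
def step_mu_outside(mu_t, y_tm1, rho, sigma):
    y_t = mu_t + rho * y_tm1 + pm.Normal.dist(sigma=sigma)
    return y_t, collect_default_updates(y_t)

# Option 2 (sketch): pass the feature rows as the sequence and compute mu at every step
def step_mu_inside(x_t, y_tm1, beta, rho, sigma):
    y_t = x_t @ beta + rho * y_tm1 + pm.Normal.dist(sigma=sigma)
    return y_t, collect_default_updates(y_t)

def ar_with_mu_dist(y0, mu, rho, sigma, size):
    ar_steps, _ = pytensor.scan(
        fn=step_mu_outside,              # option 2: fn=step_mu_inside, sequences=[X],
        sequences=[mu],                  #           non_sequences=[beta, rho, sigma]
        outputs_info=[{"initial": y0}],
        non_sequences=[rho, sigma],
        strict=True,
    )
    return ar_steps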


@jessegrabowski Thanks for all the information! I started with a simple AR(1) process from the docs, adding a scalar mean as a proxy for the linear regression part, and I can recover the parameters from simulated data (so all good!). Here is the code for anyone who needs it:

# Get Packages
import numpy as np, pandas as pd, matplotlib.pyplot as plt
import pymc as pm, arviz as az
import pytensor
import pytensor.tensor as pt
from pymc.pytensorf import collect_default_updates

# Generate the AR process
n, rho, sigma, omega = 500, [.65], 0.5, 2.0
ar_process = np.zeros(n)
ar_process[0] = 2*omega + np.random.normal(omega * rho[0], sigma)  # Initial values
for t in range(1, n):
    ar_process[t] = omega + rho[0] * ar_process[t-1] + np.random.normal(0, sigma)

lags = len(rho)
df_target = pd.DataFrame(ar_process, columns = ['Target'])[lags:]
df_features = pd.DataFrame( np.ones((df_target.shape[0], 2)) )

## Define Custom Distribution
def ar_step(data, mu, rho, sigma):
    mu_t = mu + data * rho
    data_new = mu_t + pm.Normal.dist(sigma=sigma)
    return data_new, collect_default_updates(data_new)

# Here we actually "loop" over the time series.
def ar_dist(data, mu, rho, sigma, size):
    ar_innov, _ = pytensor.scan(
        fn = ar_step,
        outputs_info = [{"initial": data,}],
        non_sequences = [ mu , rho, sigma,],
        n_steps = len(df_features.index),
        strict = True,
    )
    return ar_innov

## Define Coordinates
coords = {
    "lags": range(-lags, 0),
    "T_features": df_features.index, 
    "N_features": df_features.columns, 
    "N_target": df_target.columns,
    "steps": range( len(df_features.index) )
}

with pm.Model(coords=coords, check_bounds=False) as model:
## Data for features and target
    feature_array = pm.Data("feature", df_features.values, mutable=True, dims=("T_features", "N_features"))
    observed = pm.Data("observed", df_target.values.squeeze(), mutable=True, dims="T_features")

## AR Parameter
    ρ = pm.Normal(name="ρ", mu=0, sigma=0.5, )
    σ = pm.HalfNormal(name="σ", sigma=0.5)
    ar0 = pm.Normal(name="ar0", sigma=0.5, ) #dims=("lags",))

# Obtain Mean weights
    μ = pm.Normal(name="μ", sigma=5.0)

## Observations - Assign Custom Distribution for LinReg-AR distribution
    ar_innovations = pm.CustomDist("ar_dist",
        ar0, μ, ρ, σ,
        dist = ar_dist,
        observed = observed,
    )

## Print some of the output
    ar_trajectories = pm.Deterministic(
        name="ar_trajectories", var=pt.concatenate([[ar0], ar_innovations], axis=-1),
    )

pm.model_to_graphviz(model)

# Sampling
with model:
    trace = pm.sample(500, tune=500, cores=4, target_accept = 0.75, random_seed = np.random.default_rng(123))
    trace_predictive_oos = pm.sample_posterior_predictive(trace, predictions=True, var_names = ['observed'])

Now to my problem: as suggested, I try to change the mu term of the AR process to a sequence input. As a proxy for the linear regression, I just take this sequence as feature_array @ β, where feature_array is a NumPy array of ones. Hence, the sum of the β terms should be exactly the same as the scalar mu from my initial code.

I kept the step function as before and only adjusted the custom distribution, changing the mu argument from a non-sequence to a sequence, and adjusted the dimensions in the model. Here is the MWE:

def ar_dist2(data, mu, rho, sigma, size):
    ar_innov, _ = pytensor.scan(
        fn = ar_step,
        outputs_info = [{"initial": data,}],
        sequences = [mu],
        non_sequences = [ rho, sigma,],
        n_steps = len(df_features.index),
        strict = True,
    )
    return ar_innov

with pm.Model(coords=coords, check_bounds=False) as model2:
## Data for features and target
    feature_array = pm.Data("feature", df_features.values, mutable=True, dims=("T_features", "N_features"))
    observed = pm.Data("observed", df_target.values.squeeze(), mutable=True, dims="T_features")

## Other AR Parameter
    ρ = pm.Normal(name="ρ", mu=0, sigma=0.2, )
    σ = pm.HalfNormal(name="σ", sigma=0.2)
    ar0 = pm.Normal(name="ar0", sigma=0.5, )

# Obtain Mean weights
    β = pm.Normal("β", mu = 0, sigma=10.0, dims="N_features")
    μ_t = pm.Deterministic("μ_t", feature_array @ β, dims="T_features")

## Observations - Assign Custom Distribution for LinReg-AR distribution
    ar_innovations = pm.CustomDist(
        "ar_dist",
        ar0, μ_t, ρ, σ,
        dist = ar_dist2,
        observed = observed,
    )

## Print some of the output
    ar_trajectories = pm.Deterministic(
        name="ar_trajectories", var=pt.concatenate([[ar0], ar_innovations], axis=-1),
    )

pm.model_to_graphviz(model2)
#

I can get a DAG that looks reasonable to me, but I cannot recover the correct parameters (the sum of the betas should be mu, but it looks like I only sample from the prior here). Is there anything wrong with my understanding of this CustomDist? Also, is there a “better” way to check whether my implementation of scan and CustomDist works, i.e. assign a set of parameters and sample from it (other than sampling from the prior predictive)?

Comparing prior predictive checks between the two models, the scale of the samples is really off in the second model.

Even just replacing the Regression weights with something simple like:

    μ_t = pm.Normal("μ_t", mu = 2., sigma = 1, dims="T_features")

often causes the prior predictive samples to explode. This makes me think that I did not correctly apply the “sequences” argument for the custom distribution. It seems like the mean term is added at every step instead of only once as in the scalar case, but I am somewhat out of ideas on how to tackle that. I tried to change the mu term and express it as differences via:

    Δμ_t = pm.Deterministic("Δμ_t", pt.concatenate([[0], pt.diff(μ_t)] ), dims="T_features")

but unfortunately I still cannot recover the parameters. It looks like I just sample the βs from the prior, without incorporating any information from the data.

The inputs to the inner function of scan need to be ordered based on which scan argument they are passed into, in the following order:

  1. sequences
  2. outputs_info
  3. non_sequences

So when you “promote” mu from a non_sequence to a sequence, you will need to re-arrange the signature of your step function to ar_step(mu, data, rho, sigma)
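For your MWE that means something like this (a minimal sketch, keeping your variable names):

def ar_step2(mu, data, rho, sigma):
    # mu comes from `sequences`, data is the previous output, rho/sigma are non_sequences
    mu_t = mu + data * rho
    data_new = mu_t + pm.Normal.dist(sigma=sigma)
    return data_new, collect_default_updates(data_new)

def ar_dist2(data, mu, rho, sigma, size):
    ar_innov, _ = pytensor.scan(
        fn=ar_step2,
        sequences=[mu],
        outputs_info=[{"initial": data}],
        non_sequences=[rho, sigma],
        n_steps=len(df_features.index),
        strict=True,
    )
    return ar_innov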

You will also not be able to identify the two betas, because each feature vector is just ones. This is equivalent to a model y_t = \alpha_1 + \alpha_2 + \rho y_{t-1} + \epsilon_t. You cannot identify both \alpha_1 and \alpha_2, because there are infinitely many combinations of two numbers that add up to whatever the true intercept should be.

Using the \Delta\mu_t approach, you only have one data point to estimate each parameter. My expectation here is that you will always get the prior back.


Thanks, this solved my issues and I can recover the parameters again! I will add the other MA-GARCH parameters now.

@jessegrabowski - sorry to cc you again, I just cannot come up with a solution myself based on what I have read.

I had a look at the unconditional posterior predictive samples of the model above, which look good to me. What is now the best way to obtain conditional posterior predictive samples (samples conditioned on the observed data series, instead of propagating the initial state forward over the whole sequence; similar to, e.g., statsmodels)?

I would like to keep the initial state for the first observation, and then use the observed sequence in the step function. Based on what I understood from the tutorials, I need to assign the observed data as “sequences” in the scan function with taps = [-1], and the initial state in outputs_info:

## Define Custom Distribution
def ar_step(data, mu, rho, sigma):
    mu_t = mu + data * rho
    data_new = mu_t + pm.Normal.dist(sigma=sigma)
    return data_new, collect_default_updates(data_new)

# Here we actually "loop" over the time series. -> This will recover the marginal Distribution of the Posterior
def ar_dist(data, data0, mu, rho, sigma, size): #data0, 
    ar_innov, _ = pytensor.scan(
        fn = ar_step,
        sequences = [dict(input = data, taps = [-1]) ],
        outputs_info = [dict(initial = data0, )],
        non_sequences = [ mu, rho, sigma,],
        n_steps = len(df_features.index),
        strict = True,
    )
    return ar_innov

However, it seems like the ar_step function now needs 5 instead of 4 arguments in the PyMC model, so my implementation is incorrect. From the code above, I believed that either sequences or outputs_info would be passed to the step function, but not both.

What is the recommended way in PyMC to obtain posterior predictive samples for both the unconditional and the conditional time series (given the observed target sequence)?

Edit: I managed to get the conditional posterior by creating a new CustomDistribution. Here is the code for it:

# Here we condition on the observed data by passing it through the `sequences` argument.
def conditional_ar_dist(data, mu, rho, sigma, size):
    ar_innov, _ = pytensor.scan(
        fn=ar_step,
        sequences=[{"input": data, "taps": -lags}],
        non_sequences=[mu, rho, sigma],
        n_steps = len(df_features.index),
        strict=True,
    )
    return ar_innov

and then adding it to the model at the end of the expression:

## Define Coordinates
coords = {
    "lags": range(-lags, 0),
    "T_features": df_features.index, 
    # 'T_features_prediction': df_features.index[1:],
    "N_features": df_features.columns, 
    "N_target": df_target.columns,
    "steps": range( len(df_features.index) )
}
# print(coords)

with pm.Model(coords=coords) as model: #, check_bounds=False
    
## Data for features and target
    feature_array = pm.Data("feature", get_values(df_features, squeeze = False), dims = ("T_features", "N_features") )
    target_array = pm.Data("target", get_values(df_target, squeeze = True), dims = "T_features" )

## AR Parameter
    # ρ = pm.Normal(name="ρ", mu=0, sigma=0.5, )
    ρ = pm.Uniform('ρ', -1, 1)  
    σ = pm.HalfNormal(name="σ", sigma=0.5)
    ar0 = pm.Normal(name="ar0", sigma=0.5,)

# Obtain Mean weights
    μ = pm.Normal(name="μ", sigma=1.0)

## Unconditional Posterior #Observations - Assign Custom Distribution for LinReg-AR distribution
    observed = pm.CustomDist("observed",
        ar0, 
        μ, ρ, σ,
        dist = ar_dist,
        observed = target_array,
        dims = "T_features",
    )

## Conditional Posterior Based on Observed Sequence
    observed_conditional = pm.CustomDist("observed_conditional",
        pt.concatenate([[ar0], target_array], axis=-1),
        μ, ρ, σ,
        dist = conditional_ar_dist,
        dims = "T_features",
    )

  1. The current implementation works, but it made the sampling step roughly 5-10 times slower. I still recover the parameters I used to simulate the target data, and the marginal/conditional posteriors look correct, so I believe the model is right now. Why do I see such a steep increase in sampling time? I believe I would just need to “draw” the trajectory for observed_conditional, nothing else. Is there a way I can do that without hurting performance or creating a new model?

  2. I also managed to implement a forecast distribution, this time just using the final data point as the initial data point for the CustomDist. Below is an MWE in case someone needs it:

# Assign distribution:
def forecast_ar_dist(data, mu, rho, sigma, n_steps,  size):
    ar_innov, _ = pytensor.scan(
        fn=ar_step,
        outputs_info=[{"initial": data,}], # "taps": -lags}], #[{"initial": forecast_initial_state, "taps": range(-lags, 0)}],
        # sequences=[{"input": data, "taps": -lags}],
        non_sequences=[mu, rho, sigma],
        n_steps = n_steps, #len(df_features.index),
        strict=True,
    )
    return ar_innov

# Add to model
with pm.Model(coords=coords) as model: #, check_bounds=False
... #code from before

## Forecast Distribution based on last -lags target data as initial values
    forecasts = pm.CustomDist("forecasts",
        get_values(df_target.iloc[-lags:, :], squeeze = True),
        μ, ρ, σ,
        len(coords['forecast_steps']),
        dist = forecast_ar_dist,
        dims = "forecast_steps",
    )

This worked as well :D! Unfortunately, I now have 3 distributions in my model (observed / observed_conditional / forecasts), of which only the first is needed to fit the model; the other 2 are only used for simulation. Is there a way/keyword so that observed_conditional and forecasts are just drawn given the current sample and otherwise ignored during the pm.sample step?