How to keep beta non-negative in Rolling Regression?

Hello, I want to migrate a GLM model to Rolling Regression Model, but I’m encountering some issues.

I don’t know how to keep all factors non-negative in MvNormal. My current approach is to use pm.math.exp . This actually changes the distribution of the sample, and I’m not sure if it’s a good idea. At the same time, pm.Truncated is not implemented for multivariate distributions, and MvHalfNormal seems to be unavailable.

Here is my current code, and I’m hoping someone could help me.

signal_num = data.shape[1]-1

COORDS = {'coeffs':data.iloc[:,:-1].columns.tolist(),'time':data.index.values}
model = pm.Model(coords = COORDS)
with model:
    sigma = pm.HalfNormal("sigma", sigma=0.1)
    X = pm.MutableData("X",data.iloc[:,:-1].values,dims=('time','coeffs'))
    beta_sigma = pm.Exponential("beta_sigma", 10.0,dims='coeffs')
    beta = pm.MvGaussianRandomWalk("beta", 
                                mu=np.zeros(signal_num),
                                dims=("time","coeffs"),
                                cov=pytensor.tensor.diag(beta_sigma),
                                init_dist=pm.MvNormal.dist(0.1, 0.01*np.eye(signal_num)))
    
    Y = pm.MutableData("Y",data['Y'].values)
    mu = pm.math.sum(X*pm.math.exp(beta),axis=1)
    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=Y)
    idata = jax.sample_numpyro_nuts(chains=1,draws=1000,tune=1000,postprocessing_backend="cpu",)

My original GLM model is

COORDS = {'coeffs':data.iloc[:,:-1].columns.tolist()}
model = pm.Model(coords = COORDS)
with model:
    sigma = pm.HalfNormal("sigma", sigma=0.1)
    X = pm.MutableData("X",data.iloc[:,:-1].values)
    Y = pm.MutableData("Y",data['Y'].values)
    beta = pm.TruncatedNormal(name = 'beta',mu=0.1,sigma=0.1,lower=0,dims="coeffs")
    mu = pm.math.dot(X,beta)
    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=Y)
    idata = jax.sample_numpyro_nuts(chains=4,draws=10000,tune=10000)

I would exponentiate just like you did, or after multiplying by X, if X can be non-positive.

Otherwise you could use a LogNormal likelihood and model mu without the exponentiation.

Thank you for your reply! After adding the time dimension, the in-sample model performance has significantly improved, but overfitting is also apparent. I would be grateful if you could show me how to add an L1 (L2) penalty.

In a bayesian setting you don’t usually worry about overfitting as you are not doing maximization, but integration over parameter uncertainty.

The more complex your model is, the more parameters it will (usually) have and more ways to spread out uncertainty when intregating with MCMC.

If your model seems “too certain” that could mean the opposite: you don’t have enough structure in it and you are underfitting. For example, you are just fitting the mean, but ignoring covariates, and with enough datapoints there is not much uncertainty about what the mean could be. However your predictions should still be uncertain, not overconfident.

That means you may need to try a more complex model that more closely resembles the process that could generate your data. Or you may need a more “spread out” likelihood function like a student-t instead of a normal.

In addition, if you want to be “skeptical” of large effects you can make your priors more concentrated around the “no-effects” region, say by using a normal more concentrated around zero as a prior for your coefficients, if zero means “no effect” in your model. This is somewhat equivalent to the use of regularizing coefficients in Maximum Likelihood Estimation.

The only other source of “overfitting” would be if you have very narrow priors around a specific non-desirable configuration. You can assess this by doing prior predictive sampling and seeing if the range of predicted values seems reasonable before you observed any data. You can make them more diffuse in that case. This is not usually a big problem if you have a lot of data, as the likelihood will quickly swamp your prior information.

1 Like

Hi ricardo, thank you for your reply and I really appreciate your help!

My project mainly aims to predict stock prices using several economic indicators. I believe that the t-distribution is more reasonable in practice. However I only have one graphic card and it takes me an hour to draw 2000 samples, I’ll come back later and update how it works.

The reason I mentioned that the model was overfit is that the (mean of) beta changes too quick among time. From a practical standpoint, I hope the beta for last month might be 1 and this month it could be 0.95 or 0.9. However, the actual model gave very fast changes in betas, where it could be 1 today and 50 tomorrow. This did indeed improve the in-sample prediction accuracy, but it also brought significant uncertainty.

In addition, I only have limited data (20-40 factors, 5~10K days), therefore, I am not sure if I can design a more complex model, and I have not focused on the covariate matrix - I’m not able to set a good prior.

I will try to use some methods to smooth Y or modify the prior. If you don’t mind, I would appreciate hearing your thoughts on this.

You can try to force the innovations to be smaller by having your beta_sigma prior closer to zero.

The StudentT likelihood should also allow the model to jump less when it sees temporary spikes, as those can be accounted by the larger tails… at the cost of more uncertainty when doing predictions.

Stock market predictions are notoriously hard to predict in the short range (and may be altogether unpredictable), so you may have to accept something unsatisfactory in the end. It’s a fine balance between curve fitting and keeping predictive ability.

Thank you for your response! I have a limited understanding of Bayesian statistics and am currently working on learning more. I was hoping you could correct any misconceptions I have, and your assistance has certainly helped in this regard. Once again, I sincerely appreciate your help.

Actually I realize I was talking before about underfitting (with overconfidence) and not overfitting, which you asked for.

I didn’t expect your model could overfit, and I think it can’t, so I just started blabbing about underfitting. I called it overfitting by mistake.

The RandomWalk can’t really overfit your data since it is just a random walk. What it can do is get very confident about the innovation scales in a way that a mean-only model can get very confident about the mean of the data.

I agree it is difficult to add much more structure to this kind of models, but you could at least allow for covariance between the coefficients?

Maybe Vector Autoregression models may give you ideas on how to add more structure to timeseries data? Bayesian Vector Autoregressive Models — PyMC example gallery