I'm new to PyMC and have been working through the examples on the PyMC website. The Rolling Regression page in the PyMC example gallery has been very useful to me.
However, I want to apply multivariate regression to this case. For example, I store all my Xs in a DataFrame,
and I would like to use them to predict y.
Here's the demo code for the case where you have only one variable:
with pm.Model(coords={"time": prices.index.values}) as model_randomwalk:
    # std of random walk
    sigma_alpha = pm.Exponential("sigma_alpha", 50.0)
    sigma_beta = pm.Exponential("sigma_beta", 50.0)
    alpha = pm.GaussianRandomWalk("alpha", sigma=sigma_alpha, dims="time")
    beta = pm.GaussianRandomWalk("beta", sigma=sigma_beta, dims="time")
    # Define regression
    regression = alpha + beta * prices_zscored.GFI.values
    # Assume prices are Normally distributed, the mean comes from the regression.
    sd = pm.HalfNormal("sd", sigma=0.1)
    likelihood = pm.Normal("y", mu=regression, sigma=sd, observed=prices_zscored.GLD.to_numpy())
    trace_rw = pm.sample(tune=2000, target_accept=0.9)
My question is: how can I adapt this code to multivariate regression? I know one approach is to build beta_1, beta_2, beta_3 and change the regression formula to
regression = alpha + beta_1 * x_1.values + beta_2 * x_2.values + beta_3 * x_3.values
Could I instead define ONE beta with three dimensions (one per variable) to do the job? I have ~40 variables, and defining 40 separate betas would be very tedious…
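For reference, here is a minimal sketch of what I have in mind. It is not a confirmed solution. It assumes a recent PyMC version (v4/v5), where, as far as I understand, GaussianRandomWalk puts the random-walk axis last, so beta gets dims ("predictor", "time"). The names regression_mean, build_multivariate_rw_model and the "predictor" coordinate are my own hypothetical choices, and sigma_beta is shared across all predictors to keep things simple.

```python
import numpy as np


def regression_mean(alpha, beta, X):
    """Return alpha_t + sum_k beta[k, t] * X[t, k] for every time step t.

    alpha: shape (T,); beta: shape (K, T); X: shape (T, K).
    Written with plain operators so it should work on NumPy arrays and,
    by assumption, on PyMC/PyTensor variables as well.
    """
    return alpha + (beta * X.T).sum(axis=0)


def build_multivariate_rw_model(X, y, time_index, predictor_names):
    """Hypothetical multivariate version of the rolling-regression model."""
    # Imported lazily so the NumPy helper above can be used without PyMC.
    import pymc as pm

    coords = {"time": time_index, "predictor": predictor_names}
    with pm.Model(coords=coords) as model:
        # std of the random walks (one shared value for all betas here)
        sigma_alpha = pm.Exponential("sigma_alpha", 50.0)
        sigma_beta = pm.Exponential("sigma_beta", 50.0)
        alpha = pm.GaussianRandomWalk("alpha", sigma=sigma_alpha, dims="time")
        # ONE beta holding K random walks: time is assumed to be the last axis
        beta = pm.GaussianRandomWalk(
            "beta", sigma=sigma_beta, dims=("predictor", "time")
        )
        mu = regression_mean(alpha, beta, X)
        sd = pm.HalfNormal("sd", sigma=0.1)
        pm.Normal("y", mu=mu, sigma=sd, observed=y, dims="time")
    return model
```

With a DataFrame of z-scored predictors (call it X_df, a placeholder name), this would be called as build_multivariate_rw_model(X_df.to_numpy(), y, X_df.index.values, X_df.columns.values), followed by pm.sample(tune=2000, target_accept=0.9) inside "with model:". The sum over axis 0 of beta * X.T is meant to be the same quantity as beta_1 * x_1 + beta_2 * x_2 + ... written out by hand.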