Hi guys,
I have a large data set “factors” with a shape of roughly (2.4 million, 1200), to which I added an intercept column. The data set has a column “era” that groups samples belonging to the same time period. Note: the number of samples differs from era to era.
What I am trying to achieve is a rolling regression in which the regression coefficients stay constant within each era but are allowed to vary across eras. Below is the code I have so far, but I am now stuck.
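import pymc3 as pm
import theano.tensor as tt

# One feature matrix and one target vector per era (row counts differ between eras)
features_per_era = [factors[factors.era == era].drop(columns=["era", "target"]).to_numpy() for era in factors.era.unique()]
targets_per_era = [factors[factors.era == era].target.to_numpy() for era in factors.era.unique()]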
init_model = pm.Model()
with init_model:
    priors = list()
    for feature in ["intercept"] + features:
        sigma = pm.Exponential(f"sigma_{feature}", 50.0)
        beta = pm.GaussianRandomWalk(f"beta_{feature}", sigma=sigma, shape=len(features_per_era))
        priors.append(beta)
    betas = tt.stack(priors)
with init_model:
    regression = pm.math.dot(features_per_era, betas)
    sd = pm.HalfNormal("sd", sd=0.1)
    likelihood = pm.Normal("target", mu=regression, sd=sd, observed=targets_per_era)
The regression calculation raises an error, because features_per_era is a list of arrays with different lengths rather than a single array:
TypeError: Unsupported dtype for TensorType: object
How can I compute the regression so that each era uses its corresponding betas?
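To make concrete what I mean by "corresponding betas per era", here is a small NumPy sketch of the mean I want for every sample. It is not PyMC code. The data, the era labels, and the coefficient values are all made up for illustration, and the coefficient array here has shape (n_eras, n_features), which is the transpose of the stacked betas in my model.

```python
import numpy as np

# Toy stand-in for "factors": 3 eras of unequal length, intercept plus 2 features.
# All values below are invented for illustration.
era_labels = np.array(["era1", "era1", "era2", "era2", "era2", "era3"])
X = np.array([
    [1.0, 0.5, 2.0],
    [1.0, 1.5, 0.0],
    [1.0, 2.0, 1.0],
    [1.0, 0.0, 3.0],
    [1.0, 1.0, 1.0],
    [1.0, 4.0, 2.0],
])  # shape (n_samples, n_features); first column is the intercept

# Map every row to the integer index of its era.
unique_eras, era_idx = np.unique(era_labels, return_inverse=True)

# One coefficient vector per era, shape (n_eras, n_features).
betas = np.array([
    [0.1, 1.0, 0.0],
    [0.2, 0.0, 1.0],
    [0.3, 1.0, 1.0],
])

# Select each row's coefficient vector by integer indexing,
# then take the row-wise dot product with the features.
mu = (X * betas[era_idx]).sum(axis=1)

# Cross-check against the explicit per-era loop over unequal-length blocks.
mu_loop = np.empty(len(X))
for k in range(len(unique_eras)):
    mask = era_idx == k
    mu_loop[mask] = X[mask] @ betas[k]
```

Both versions give the same per-sample mean, and the indexed version never needs a list of differently sized arrays. I have not verified whether the same integer indexing carries over to the model tensors.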
Cheers