I’m fairly new to Bayesian modeling, but I’m hoping to incorporate more of it into my data analysis routine. I have a large dataset from an experimental neuroscience study with multiple subjects and time-series data.
For this specific example, I have a 34081x32 design matrix containing samples from 8 subjects (mice). I was hoping to fit this as a hierarchical model with random slopes and intercepts.
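For context, the inputs referenced below look roughly like this (these are placeholder stand-ins, not my real data; mice holds the subject labels and group_idx maps each row of the design matrix to its subject):

import numpy as np
import pandas as pd
import pymc as pm

rng = np.random.default_rng(0)
n_obs, n_pred, n_mice = 34081, 32, 8

x_copy = pd.DataFrame(rng.normal(size=(n_obs, n_pred)))  # design matrix (placeholder)
y = rng.normal(size=n_obs)                               # response (placeholder)
mice = [f"mouse_{i}" for i in range(n_mice)]             # subject labels
group_idx = rng.integers(0, n_mice, size=n_obs)          # row -> subject index

My model is: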
coords = {
    'Mice': pd.Series(mice),
    'obs_id': x_copy.index.to_numpy(),
    'predictors': pd.Series(x_copy.columns),
}
with pm.Model(coords=coords) as model:
    data = pm.MutableData('x', x_copy, dims=("obs_id", "predictors"))
    g = pm.MutableData("group", group_idx, dims="obs_id")

    # Hyperpriors
    mu_a = pm.Normal('mu_a', mu=0., sigma=10)
    sigma_a = pm.HalfNormal('sigma_a', 5.)
    mu_b = pm.Normal('mu_b', mu=0., sigma=10)
    sigma_b = pm.HalfNormal('sigma_b', 5.)
    sigma_hyper = pm.HalfNormal('sigma_hyper', 5.)

    # Non-centered reparameterization for group-level parameters
    z_betas = pm.Normal('z_betas', mu=0., sigma=1., dims=("Mice", "predictors"))
    z_intercept = pm.Normal('z_intercept', mu=0., sigma=1., dims="Mice")
    betas = pm.Deterministic('betas', mu_a + z_betas * sigma_a, dims=("Mice", "predictors"))
    intercept = pm.Deterministic('intercept', mu_b + z_intercept * sigma_b, dims="Mice")

    # Model error (one sigma per observation, i.e. 34081 noise parameters)
    sigma = pm.HalfNormal('sigma', sigma=sigma_hyper, dims="obs_id")

    # Likelihood: per-subject intercept plus per-subject slopes summed over predictors
    mu = pm.Deterministic('mu', intercept[g] + (data * betas[g]).sum(axis=-1), dims="obs_id")
    likelihood = pm.Normal('y', mu=mu, sigma=sigma, observed=y, dims="obs_id")
I used the non-centered parameterization and ADVI to initialize the model before NUTS sampling, but because of the size and complexity of the model, it is taking an extremely long time to draw enough samples. So I was hoping to get some input from more seasoned Bayesians: do I abandon some of the things I want in my model? Do I put this on a supercomputing cluster and block off a week to sample?
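For reference, this is roughly how I’m initializing and sampling (a sketch; the draw/tune counts here are placeholders, not my actual settings):

with model:
    idata = pm.sample(
        draws=1000,
        tune=1000,
        init="advi+adapt_diag",  # use ADVI to initialize NUTS
        target_accept=0.9,
    )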
All help is appreciated. I apologize if anything looks weird. Like I said, I’m new to this.