Interesting, thank you! I wonder if your approach would also work with a hierarchical model (as described in this thread)?
```python
import pymc3 as pm

# n_predictors, n_groups, group_idx, Q_train, R_train_inv, and y_train
# come from the QR setup earlier in the thread
with pm.Model() as model:
    # std for likelihood
    s = pm.HalfNormal('sigma', 2.)
    # covariance matrix (Cholesky factor)
    packed_L_Omega = pm.LKJCholeskyCov('packed_L_Omega', n=n_predictors,
                                       eta=1, sd_dist=pm.HalfNormal.dist(1.))
    L_Omega = pm.expand_packed_triangular(n_predictors, packed_L_Omega)
    # group mean
    mu = pm.Normal('mu', 0., 5., shape=(n_predictors, n_groups))
    # group slopes (non-centered parameterization)
    beta_base = pm.Normal('beta_base', 0., 1., shape=(n_predictors, n_groups))
    beta = pm.Deterministic('beta', mu + pm.math.dot(L_Omega, beta_base))
    # slopes back on the original X scale
    t_beta = pm.Deterministic('t_beta', pm.math.dot(R_train_inv, beta))
    # group intercepts (one per group, indexed per sample in the likelihood)
    alpha = pm.Normal('alpha', 0., 5., shape=n_groups)
    # likelihood:
    # group_idx is a `(n_samples,)` array of integer group indices
    y_hat = alpha[group_idx] + (Q_train * beta[:, group_idx].T).sum(axis=-1)
    y = pm.StudentT('y', nu=2, mu=y_hat, sd=s, observed=y_train)
```
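To check my own reasoning on the `t_beta` line: since `X_train = Q_train @ R_train`, the slopes on the `Q` scale and the back-transformed slopes should give the same linear predictor. A quick NumPy sanity check (dimensions made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_predictors = 20, 3
X_train = rng.normal(size=(n_samples, n_predictors))

# thin QR decomposition, as in the QR-reparameterized regression
Q_train, R_train = np.linalg.qr(X_train)
R_train_inv = np.linalg.inv(R_train)

beta = rng.normal(size=n_predictors)
t_beta = R_train_inv @ beta  # slopes on the original X scale

# Q_train @ beta and X_train @ t_beta are the same linear predictor
assert np.allclose(Q_train @ beta, X_train @ t_beta)
```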
The `n_groups` doesn't change between train and test time. `group_idx` does change, though, in accordance with the new `X_test`. Thoughts?
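To illustrate what I mean: given a single posterior draw of `alpha` and `beta`, prediction with a new `group_idx` is just the same indexing expression applied to the test inputs (`Q_test` and `group_idx_test` here are placeholders for the new design):

```python
import numpy as np

rng = np.random.default_rng(1)
n_predictors, n_groups, n_test = 3, 4, 6

# one posterior draw (placeholder values)
alpha_draw = rng.normal(size=n_groups)
beta_draw = rng.normal(size=(n_predictors, n_groups))

# test-time inputs: same n_groups, but a new design and new group indices
Q_test = rng.normal(size=(n_test, n_predictors))
group_idx_test = rng.integers(n_groups, size=n_test)

# same expression as in the likelihood, just with test inputs
y_hat_test = alpha_draw[group_idx_test] + (Q_test * beta_draw[:, group_idx_test].T).sum(axis=-1)
print(y_hat_test.shape)  # (6,)
```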