@zult,
For #1 and #2, you are correct. The dot product doesn't work because beta now has shape=(n_features, n_observations).
As I understand it, by removing the group offset parameter (`a` in @lucianopaz's example), you end up effectively sampling from the prior of the group offset (a `pm.Normal(0, 1)`), which will generally be around 0. That drops out the `sigma_b` "group spread" term, leaving beta as just `mu_beta`. Very nice 
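A quick numeric sketch of that intuition (names and values are hypothetical, not from the model above): if the offset `a` is drawn fresh from its prior `Normal(0, 1)` rather than taken from the posterior, it averages to 0, so `beta = mu_beta + sigma_b * a` collapses to `mu_beta` in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_beta, sigma_b = 2.5, 1.3  # made-up group mean and spread

# Draws of the offset straight from its prior, a ~ Normal(0, 1)
a = rng.standard_normal(100_000)

# Non-centered parameterization: beta = mu_beta + sigma_b * a
beta = mu_beta + sigma_b * a

# The offset term averages out, so beta's mean sits at mu_beta
print(beta.mean())
```

The spread `sigma_b * a` still adds variance to individual draws, but its contribution to the mean vanishes, which is why the predictions center on `mu_beta`.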
Assuming you are using `theano.shared` variables, is there any problem with just reusing the model after the sampling step, instead of re-instantiating a new model via the model factory? Like so:
```python
with model_factory(X=train_X,
                   y=train_Y,
                   site_shared=train_site_shared,
                   n_site=ntrain_site) as train_model:
    train_trace = pm.sample()
    df = pm.trace_to_dataframe(train_trace,
                               varnames=['mu_beta', 'sigma_beta', 'a'],
                               include_transformed=True)
    # SET YOUR SHARED INPUTS HERE, then...
    ppc = pm.sample_posterior_predictive(trace=df.to_dict('records'),
                                         samples=len(df))
```
I'm assuming that all the data from sampling is contained inside the trace object, and that a fresh model from the model factory is equivalent to the model after `pm.sample()`?