I have a model, shown below, that regresses previous data for a unit/context in a hierarchical model onto newer data. I'm using the `alpha0` coefficient fit from historical data (`N_0`, `c_0`), centering it, and feeding it into a regression. My question: will `mu0` and `a0` get unintentionally (partially) fit on the new data (`N_`, `c_`) as well as the old data? Is there a way to avoid this and fit `mu0` and `a0` only on the historical data (`N_0`, `c_0`)?
```python
with pm.Model() as m1:
    # Model historical data
    mu0 = pm.Normal("mu0", -1.85, 0.5, shape=2)
    a0 = hierarchical_normal("a0", (n_units, 2), mu0)
    p0 = pm.math.invlogit(a0)
    conv_pre = pm.Binomial("conv_pre", N_0, p0[i_, v_], observed=c_0)

    # Model new data
    theta = pm.HalfNormal("theta", 1)
    mu_a = pm.Normal("mu", -1.85, 0.5, shape=2)
    a = hierarchical_normal("a", (n_units, 2), mu_a)

    ## Center historical data
    centeredX = pm.Deterministic("centeredX", a0[i_] - a0[i_].mean())

    ## Regress new data onto historical data
    p = pm.math.invlogit(a[i_, v_] + centeredX * theta)
    conv = pm.Binomial("conv", N_, p, observed=c_)

    trace = pm.sample(target_accept=0.95, tune=2000, return_inferencedata=True)
```
Side question: could someone sanity-check the centering (mean-zeroing) in this model? For hierarchical models, should I subtract the global mean from each context, or each context's own mean?
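For concreteness, the two centering options can be written out in plain NumPy (the 3-unit array below is made up, purely for illustration):

```python
import numpy as np

# Hypothetical a0 values: rows are units/contexts, columns are the 2 variants
a0 = np.array([[-2.0, -1.5],
               [-1.8, -1.2],
               [-2.2, -1.9]])

# Option 1: subtract the single global (grand) mean from every entry
grand_centered = a0 - a0.mean()

# Option 2: subtract each context's own mean from that context's values
within_centered = a0 - a0.mean(axis=1, keepdims=True)

print(grand_centered.mean())         # ~0: zero mean overall
print(within_centered.mean(axis=1))  # ~[0, 0, 0]: zero mean within each context
```

Option 1 keeps between-context differences in the predictor; option 2 removes them, so only within-context (here, between-variant) variation is left.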
The posteriors of `a0` will definitely reflect the new data as well: `conv` depends on `p`, which depends on `centeredX`, which depends on `a0` (which depends on `mu0`). I don't quite understand what your model is doing, so I can't necessarily make recommendations about how to revise it.
As a side note, you have defined `conv` twice, which will prevent pymc3 from building this particular model.
I was trying to make a hierarchical extension to CUPED (for variance reduction in experimentation) to reduce the variance of the global parameter `mu`.

That said, I just realized (it feels like it was insanely obvious and I looked straight past it) that priors would have accomplished the same variance reduction, so why bother with the complicated model above?
Anyway, it's still an interesting question: is there a way to force a model to fit things sequentially, or would I have to use separate model contexts?
If you are looking to estimate a parameter (or more than one) on one data set and use the estimate (posterior) in a subsequent analysis, you might want to check out this notebook and the `from_posterior()` function specifically. That function takes posterior samples and converts them into a distribution that can then be used in a subsequent model. I'm not super confident that this is your best bet, but it might point you in a useful direction.
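If it helps, the core of that helper can be sketched without PyMC: it KDE-smooths the posterior samples onto a grid and pads the support with zero-density endpoints, and the resulting `(x, y)` grid is what gets handed to `pm.Interpolated` in the next model. The grid size and padding below are assumptions on my part, so check the notebook for the exact values:

```python
import numpy as np
from scipy import stats

def posterior_to_grid(samples, n=100):
    """Convert posterior samples into an (x, y) density grid, roughly what
    from_posterior() computes before calling pm.Interpolated."""
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, n)
    y = stats.gaussian_kde(samples)(x)  # smooth density estimate on the grid
    # Pad with zero-density endpoints so the interpolated prior assigns
    # essentially no mass far outside the sampled range.
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return x, y

# Pretend these are posterior draws of mu0 from a historical-data-only model
rng = np.random.default_rng(0)
x, y = posterior_to_grid(rng.normal(-1.85, 0.5, size=2000))
# In the second model: mu0 = pm.Interpolated("mu0", x, y)
```

This gives you the sequential workflow you asked about: fit the historical model in one model context, summarize its posterior this way, and use it as the prior in a second, separate model context fit only to the new data.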
@cluhmann This is fantastic thank you!