# Fit parameters exclusively on one observational model, then use as covariate for another observational model

I have a model, shown below, that regresses newer data for a unit/context on its previous data in a hierarchical model. I'm taking the `a0` (alpha0) coefficients fit from the historical data (`N_0`, `c_0`), centering them, and feeding them into a regression. My question: will `mu0` and `a0` unintentionally be partially fit on the new data (`N_`, `c_`) as well as the old data? Is there a way to avoid this and fit `mu0` and `a0` only on the historical data (`N_0`, `c_0`)?

``````
import pymc3 as pm

# The data (N_0, c_0, N_, c_, i_, v_, n_units) and the helper
# hierarchical_normal are defined elsewhere and not shown here.
with pm.Model() as m1:

    # Model historical data
    mu0 = pm.Normal("mu0", -1.85, 0.5, shape=2)
    a0 = hierarchical_normal("a0", (n_units, 2), mu0)
    p0 = pm.math.invlogit(a0)
    conv_pre = pm.Binomial("conv_pre", N_0, p0[i_, v_], observed=c_0)

    # Model new data
    theta = pm.HalfNormal("theta", 1)
    mu_a = pm.Normal("mu", -1.85, 0.5, shape=2)
    a = hierarchical_normal("a", (n_units, 2), mu_a)
    # Center historical data
    centeredX = pm.Deterministic("centeredX", a0[i_] - a0[i_].mean())
    # Regress new data onto historical data
    p = pm.math.invlogit(a[i_, v_] + centeredX * theta)
    conv = pm.Binomial("conv", N_, p, observed=c_)

    # Sample inside the model context
    trace = pm.sample(target_accept=0.95, tune=2000, return_inferencedata=True)
``````

Side question: could someone sanity-check the centering (mean-zeroing) in this model? For hierarchical models, should I subtract the global mean from each context, or the context mean from each context?

The posteriors of `mu0` and `a0` will definitely reflect `c_`, because `conv` depends on `p`, which depends on `centeredX`, which depends on `a0` (which in turn depends on `mu0`). I don't quite understand what your model is doing, so I can't really make a recommendation about how to revise it.
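To make the dependency concrete, here is a minimal grid-approximation sketch for a single unit, using only the standard library. The counts are made up, and `a`, `theta`, and the centering constant `center` are held fixed purely to keep the example one-dimensional (in `m1` they are random variables or derived quantities), so this illustrates the mechanism rather than reproducing the model above.

```python
import math

def invlogit(x):
    return 1.0 / (1.0 + math.exp(-x))

def binom_loglik(successes, trials, p):
    # Binomial log-likelihood, dropping the constant binomial coefficient
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

# Made-up counts for one unit (assumption, not data from the thread)
N_0, c_0 = 200, 30   # historical trials / conversions
N_, c_ = 200, 60     # new trials / conversions

# Held fixed for illustration only (assumption)
a, theta, center = -1.85, 1.0, -1.85

# Candidate values of a0 on a grid covering [-4, 2]
grid = [-4.0 + 0.01 * k for k in range(601)]

def posterior_mean_a0(include_new):
    """Posterior mean of a0 on the grid, with or without the new-data likelihood."""
    log_w = []
    for a0 in grid:
        # Prior on a0 plus the historical likelihood
        lp = normal_logpdf(a0, -1.85, 0.5) + binom_loglik(c_0, N_0, invlogit(a0))
        if include_new:
            # The new-data likelihood also depends on a0 through the regression term
            p = invlogit(a + theta * (a0 - center))
            lp += binom_loglik(c_, N_, p)
        log_w.append(lp)
    top = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - top) for lw in log_w]
    return sum(g * wi for g, wi in zip(grid, w)) / sum(w)

hist_only = posterior_mean_a0(include_new=False)
joint = posterior_mean_a0(include_new=True)
print(f"a0 posterior mean, historical data only: {hist_only:.2f}")
print(f"a0 posterior mean, both likelihoods:     {joint:.2f}")
```

With these assumed numbers the second mean sits noticeably higher than the first, because the new-data likelihood also gets a say in `a0`.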

As a side note, you have defined `conv` twice, which will prevent PyMC3 from building this particular model.

I was trying to make a hierarchical extension to CUPED (for variance reduction in experimentation) to reduce variance in the global parameter mu.

That being said, I just realized (it feels insanely obvious, and I looked straight past it) that priors would have accomplished the same variance reduction, so why bother with the complicated model above?

Anyway, it's still an interesting question: is there a way to force a model to fit things sequentially, or would I have to use separate model contexts?

If you are looking to estimate a parameter (or more than one) on one data set and use the estimate (posterior) in a subsequent analysis, you might want to check out this notebook, and the `from_posterior()` function specifically. That function takes posterior samples and converts them into a distribution that can then be used as a prior in a subsequent model. I'm not super confident that this is your best bet, but it might point you in a useful direction.
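As a rough illustration of the idea behind `from_posterior()`, the sketch below turns posterior samples into a density evaluated on a grid, which is the kind of input `pm.Interpolated(name, x_points, pdf_points)` accepts. It is not the notebook's code: the helper name `posterior_to_grid`, the hand-rolled Gaussian kernel density estimate with a rule-of-thumb bandwidth, and the fake samples are all assumptions made so that the example needs only NumPy.

```python
import numpy as np

def posterior_to_grid(samples, n_points=100, pad=3.0):
    """Hypothetical helper: approximate the density of 1-D samples on a grid."""
    samples = np.asarray(samples, dtype=float)
    lo, hi = samples.min(), samples.max()
    width = hi - lo

    # Rule-of-thumb bandwidth for a Gaussian kernel (an assumption)
    bw = 1.06 * samples.std(ddof=1) * samples.size ** (-0.2)

    # Gaussian KDE evaluated on an evenly spaced grid over the sampled range
    x = np.linspace(lo, hi, n_points)
    z = (x[:, None] - samples[None, :]) / bw
    y = np.exp(-0.5 * z**2).sum(axis=1) / (samples.size * bw * np.sqrt(2 * np.pi))

    # Extend the support beyond the sampled range, with density falling to zero,
    # so values that were never sampled are unlikely but not impossible
    x = np.concatenate([[lo - pad * width], x, [hi + pad * width]])
    y = np.concatenate([[0.0], y, [0.0]])
    return x, y

# Fake "posterior samples" standing in for one element of mu0 from a first fit
rng = np.random.default_rng(0)
fake_mu0_samples = rng.normal(-1.8, 0.2, size=2000)

x, y = posterior_to_grid(fake_mu0_samples)

# In a second, separate model context one could then write something like:
#     mu0 = pm.Interpolated("mu0", x, y)
```

Note that this treats each parameter independently, so any posterior correlation between parameters (for example between `mu0` and `a0`) is lost when the marginals are carried over this way.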


@cluhmann This is fantastic, thank you!
