# Understanding params in partially pooled model

Hi,

I am working on a randomized controlled trial model to estimate effects for trial/control and for each distribution (a trial and a control email sent on the same date). I would expect the distribution beta estimates to be ~0 rather than ~0.1, since the test data controls for the distribution effect. It seems that some of the explanatory power of the trial/control betas is leaking into the distribution betas. I believe this is caused by partial pooling between the trial/control and distribution betas (e.g. the trial and control groups being sent as a single distribution lets some of the trial effect leak into the distribution estimate).

Is there a recommended approach to modeling that would avoid this leakage?

See below for the testing data, model code, and trace summary. Thank you for any feedback!

``````Data Example:
   trial_control  distribution   sent  opens
0              0             0    100     20
1              1             0    500    150
2              0             1   1000    200
3              1             1   5000   1500
4              0             2  10000   2000
5              1             2  50000  15000
``````
``````# Model code
import numpy as np
import pandas as pd
import pymc3 as pm

data = pd.DataFrame({'trial_control': [0, 1, 0, 1, 0, 1],
                     'distribution': [0, 0, 1, 1, 2, 2],
                     'sent': [100, 500, 1000, 5000, 10000, 50000],
                     'opens': [20, 150, 200, 1500, 2000, 15000]})

# Set number of distributions and trial/control and distribution indexes
tc_idx = data['trial_control'].values
n_tc = data['trial_control'].nunique()
dist_idx = data['distribution'].values
n_dist = data['distribution'].nunique()

# Priors
base_open_rate_mu = -np.log(4)  # logistic(-np.log(4)) = 20%
base_open_rate_sigma = 0.125
trial_control_mu = 0
trial_control_sigma = 0.5
distribution_mu = 0
distribution_sigma = 0.5

def logistic(x):
    return 1 / (1 + np.exp(-x))

with pm.Model() as open_model:
    # Intercept for logistic fn (baseline open rate: 0.2 = logistic(-ln(4))).
    b0 = pm.Normal('b0', base_open_rate_mu, base_open_rate_sigma)

    tc_beta = pm.Normal('trial_control_beta', trial_control_mu, trial_control_sigma, shape=n_tc)
    dist_beta = pm.Normal('distribution_beta', distribution_mu, distribution_sigma, shape=n_dist)

    # Logistic link (outputs P(open))
    theta = pm.Deterministic('theta', logistic(b0 + tc_beta[tc_idx] + dist_beta[dist_idx]))

    # Data likelihood
    likelihood_opens = pm.Binomial('obs', p=theta, n=data['sent'].values, observed=data['opens'].values)

# Inference to approximate the posterior distribution
with open_model:
    trace = pm.sample(draws=2000, tune=1000, init='auto')

print(pm.summary(trace))
pm.traceplot(trace)
``````
``````# Results
mean        sd  mc_error   hpd_2.5  hpd_97.5  \
b0                    -1.369596  0.120983  0.002495 -1.604596 -1.133555
trial_control_beta__0 -0.122449  0.237752  0.005715 -0.599465  0.325807
trial_control_beta__1  0.416473  0.236954  0.005724 -0.060768  0.861267
distribution_beta__0   0.102000  0.234017  0.005448 -0.342699  0.575411
distribution_beta__1   0.105062  0.226738  0.005434 -0.321203  0.581093
distribution_beta__2   0.105482  0.225409  0.005391 -0.346519  0.549681
theta__0               0.199793  0.014457  0.000144  0.172692  0.228046
theta__1               0.299519  0.018342  0.000197  0.265566  0.336318
theta__2               0.199953  0.005813  0.000063  0.189028  0.211798
theta__3               0.299875  0.006056  0.000067  0.288166  0.311875
theta__4               0.199985  0.003849  0.000042  0.192620  0.207582
theta__5               0.299932  0.002052  0.000020  0.295843  0.303866

n_eff      Rhat
b0                     2735.656425  0.999767
trial_control_beta__0  1840.009249  0.999842
trial_control_beta__1  1832.556874  0.999840
distribution_beta__0   1961.204705  0.999970
distribution_beta__1   1834.922021  0.999880
distribution_beta__2   1829.021279  0.999896
theta__0               7264.240538  0.999848
theta__1               7027.712167  0.999860
theta__2               7794.364056  0.999833
theta__3               7843.006894  0.999815
theta__4               8801.569805  0.999757
theta__5               8131.675057  0.999869

``````

I don’t think this is related to `trial_control` at all, but to the normal prior on the intercept. The problem goes away if I replace that with a `b0 = pm.Flat('b0')`.
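For intuition on why the intercept prior matters: the model is only weakly identified, because adding a constant to `b0` while subtracting the same constant from every `distribution_beta` leaves every `theta` unchanged, so the priors decide how the total effect is split between the intercept and the betas. A minimal numpy sketch of this, using values close to the posterior means in the summary above (illustrative only, not exact posterior draws):

```python
import numpy as np

def logistic(x):
    return 1 / (1 + np.exp(-x))

# Values near the posterior means reported in the trace summary
b0 = -1.37
tc_beta = np.array([-0.12, 0.42])
dist_beta = np.array([0.10, 0.10, 0.10])

tc_idx = np.array([0, 1, 0, 1, 0, 1])
dist_idx = np.array([0, 0, 1, 1, 2, 2])

theta = logistic(b0 + tc_beta[tc_idx] + dist_beta[dist_idx])

# Move 0.10 from every distribution beta into the intercept:
# the linear predictor, and hence theta, is identical.
theta_shifted = logistic((b0 + 0.10) + tc_beta[tc_idx] + (dist_beta - 0.10)[dist_idx])

print(np.allclose(theta, theta_shifted))  # True: the likelihood cannot tell them apart
```

With a tight `Normal(-log(4), 0.125)` prior on `b0`, the posterior prefers to keep `b0` near -log(4) and park the leftover shift in the distribution betas; a flat prior on `b0` removes that pull, which is why the ~0.1 offsets disappear.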


Thank you! That worked perfectly.