Hi,
I’m experimenting with a hierarchical logistic regression model to evaluate the differences between various pooling schemes. At the moment I’m trying to understand the results from the no-pooling setup, where each group is modelled independently.
I’m getting unexpected results, however, that seem to depend on whether multiple groups are modelled in parallel in the same pm.Model
or not, and not on whether there is pooling, shared priors, or some other form of information sharing between the group models. That is not something I would expect to matter.
The data is very simple: there is just one binary variable/feature called “exposure”.
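In case it helps, here is a toy version of how such data could be simulated (the group names, sizes, and rates below are made up purely for illustration; my real data just has the same columns):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# purely illustrative: two groups, a binary exposure, and a binary target
# whose rate depends on group and exposure
rows = []
for grp, (p_unexposed, p_exposed) in {"group_1": (0.2, 0.6), "group_2": (0.4, 0.8)}.items():
    exposure = rng.integers(0, 2, size=500)
    p = np.where(exposure == 1, p_exposed, p_unexposed)
    rows.append(pd.DataFrame({"group": grp, "exposure": exposure, "target": rng.binomial(1, p)}))
data = pd.concat(rows, ignore_index=True)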
To demonstrate the issue that’s confusing me, I’ve coded two models. The first is a single model that takes as input a dataframe containing only one group:
import aesara.tensor as at
import numpy as np
import pandas as pd
import pymc as pm

def create_single_model(X, y):
    with pm.Model() as model:
        # define data
        regressors = pm.ConstantData("regressors", X)
        approval = pm.ConstantData("approval", y)
        # define priors
        intercept = pm.Normal("intercept", mu=0, sigma=5)
        betas = pm.Normal("betas", mu=0, sigma=5, shape=X.shape[1])
        # calc logit
        logit = intercept + (regressors * betas).sum(axis=-1)
        # get probabilities
        p_i = pm.Deterministic("p_i", pm.math.invlogit(logit))
        # define the bernoulli likelihood
        y_observed = pm.Bernoulli("y_obs", p=p_i, observed=approval)
        trace = pm.sample(
            1000, tune=1000, cores=2, nuts=dict(max_treedepth=15, target_accept=0.97)
        )
    return dict(model=model, trace=trace)
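For reference, I call it on one group at a time, roughly like this (using the column names described above):

grp1 = data[data.group == "group_1"]
single_result = create_single_model(grp1[["exposure"]], grp1["target"])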
The other function iterates over the groups to create N independent sub-models inside a single pm.Model, which are then sampled together:
def create_no_pooling_iter_model(
    data, model_features, samples=1000, burn_in=1000, target_accept=0.97, n_chains=2
):
    group_idxs, groups = pd.factorize(data.group)
    n_features = len(model_features)
    coords = {
        "group": groups,
        "group_id": np.arange(len(group_idxs)),
        "feature_id": np.arange(len(model_features)),
    }
    with pm.Model(coords=coords) as model:
        intercepts = {}
        betas = {}
        logits = []
        targets = []
        for name, group in data.groupby("group"):
            grp_data = pm.ConstantData(name, group[model_features])
            targets.append(group.target)
            # independent priors per group -- nothing is shared between groups
            intercepts[name] = pm.Normal(f"intercept_{name}", mu=0, sigma=5)
            betas[name] = pm.Normal(f"betas_{name}", mu=0, sigma=5, shape=n_features)
            # calc logit for this group
            ind_logit = intercepts[name] + (grp_data * betas[name]).sum(axis=-1)
            logits.append(ind_logit)
        # concatenate the per-group logits and targets in the same (groupby) order
        logit = at.concatenate(logits)
        target = pm.ConstantData("target", pd.concat(targets))
        # get probabilities
        p_i = pm.Deterministic("p_i", pm.math.invlogit(logit))
        # define the bernoulli likelihood
        y_observed = pm.Bernoulli("y_obs", p=p_i, observed=target)
        trace = pm.sample(
            samples,
            tune=burn_in,
            cores=n_chains,
            nuts=dict(max_treedepth=15, target_accept=target_accept),
        )
    return dict(model=model, trace=trace)
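The parallel version is then called on the whole dataframe in one go:

parallel_result = create_no_pooling_iter_model(data, model_features=["exposure"])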
Here I’ve simply grouped by the group and exposure variables, showing the mean of the trace summary of p_i alongside the mean of the actual observed variable, target.
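Concretely, that comparison is built roughly like this (shown for the single-group model, reusing grp1 and single_result from the call above, where the row order of the data matches the order of p_i):

import arviz as az

# posterior means of p_i, in the same row order as grp1
p_i_means = az.summary(single_result["trace"], var_names=["p_i"])["mean"].to_numpy()
comparison = (
    grp1.assign(p_i=p_i_means)
    .groupby(["group", "exposure"])[["p_i", "target"]]
    .mean()
)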
When running the first model on group_1, we get the following results for the p_i probability variable. The model is essentially just matching the data, as should be expected given the simplicity of the conditional distribution of the target given the exposure.
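To spell out why I expect that: with a single binary regressor the model has one free probability per exposure level,

$$
P(\text{target}=1 \mid \text{exposure}=0) = \operatorname{invlogit}(\alpha), \qquad
P(\text{target}=1 \mid \text{exposure}=1) = \operatorname{invlogit}(\alpha + \beta),
$$

so with priors this wide the posterior means of p_i should land very close to the observed frequencies.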
When I run the “parallel” model, however, I get the following results:
If we look at the results for group 1, the mean posterior probability of the target given the exposure is nowhere near the actual observed probability, unlike in the previous model.
I don’t understand why this is. Given that the intercepts and coefficients for each group’s model are completely independent, I would expect these results to mirror the first model’s fairly closely.
What’s going on here?