Divergences in varying intercepts hierarchical model

I have a small, relatively simple dataset, but fitting a partial pooling model runs into a lot of divergences, even with a non-centered parameterization (PyMC3 version 3.8).

I really want to leverage partial pooling since I have more features to include, but given the divergences I wanted to start simple with just three features.

def hierarchical_normal(name, shape, μ=0., sig=1., centered=False):
    if not centered:
        Δ = pm.Normal('Δ_{}'.format(name), 0., 1., shape=shape)
        σ = pm.HalfCauchy('σ_{}'.format(name), sig)
        return pm.Deterministic(name, μ + Δ * σ)
    else:
        mu = pm.Normal('μ_{}'.format(name), μ, 1)
        σ = pm.HalfCauchy('σ_{}'.format(name), sig)
        return pm.Normal('α_{}'.format(name), mu, σ, shape=shape)
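For reference, the `centered=False` branch is the standard non-centered reparameterization: drawing `z ~ Normal(0, 1)` and forming `μ + z * σ` gives the same distribution as drawing from `Normal(μ, σ)` directly, while changing the geometry the sampler sees. A minimal NumPy sketch of that equivalence (made-up `mu` and `sigma` values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0

# Centered draw: x ~ Normal(mu, sigma) sampled directly.
x_centered = rng.normal(mu, sigma, size=100_000)

# Non-centered draw: z ~ Normal(0, 1), then shift and scale, mirroring
# the Deterministic(name, μ + Δ * σ) line in the function above.
z = rng.normal(0.0, 1.0, size=100_000)
x_noncentered = mu + z * sigma

# Same distribution either way; only the parameterization the sampler
# explores is different.
```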

with pm.Model() as model1:
    β0 = pm.Normal('β0', 0., 1., testval=df.yes.sum()/df['N'].sum())
    α_age = hierarchical_normal('age', n_age, sig=3, centered=False)
    α_income = hierarchical_normal('income', n_income, sig=3, centered=False)
    α_gender = hierarchical_normal('gender', n_gender, sig=3, centered=False)


    η = β0 \
        + α_age[age_]\
        + α_income[income_]\
        + α_gender[gender_]

    p = pm.Deterministic('p', pm.math.invlogit(η))
    obs = pm.Binomial('obs', df.N.values, p, observed=df.yes.values)

with model1:
    trace = pm.sample(
        init='advi', 
        draws=1000, 
        tune=5000, 
        target_accept=0.95,
        random_seed=99
    )

I’m surprised to see it fit so poorly, since it’s such a simple dataset.

Trace plots below.

And full code attached as a .py file

test.py (5.8 KB)

I dug a little further and attempted to specify some correlation between income-age and income-gender ( :frowning: ) in the hope of helping the model fit, but it turned out to be a much worse fit.

In some sense I get it, because I now have 235 parameters for a 60-entry dataset. On the other hand, it’s aggregated data, so it isn’t really 60 entries; it’s more like 205,000 observations. Shouldn’t this not be a big deal?
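The aggregation point can be checked directly: the Binomial likelihood on an aggregated count differs from the product of the underlying per-person Bernoulli likelihoods only by a binomial coefficient that does not involve p, so the fit really is informed by all the underlying observations. A sketch with made-up numbers (N, k, p here are illustrative, not from the dataset):

```python
import math

# Hypothetical single aggregated cell: N trials with k successes at
# success probability p.
N, k, p = 1000, 430, 0.42

# Log-likelihood of the aggregated Binomial observation.
binom_ll = math.log(math.comb(N, k)) + k * math.log(p) + (N - k) * math.log(1 - p)

# Log-likelihood of the same data as N separate Bernoulli observations.
bern_ll = k * math.log(p) + (N - k) * math.log(1 - p)

# The difference is the constant log C(N, k), which does not depend on p,
# so both forms give the same posterior.
diff = binom_ll - bern_ll
```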

with pm.Model() as model3:
    # fixed priors
    g = pm.Normal("g", mu=0.0, sd=1.0, shape=n_income)
    sd_dist = pm.Exponential.dist(1.0)
    chol_age, _, _ = pm.LKJCholeskyCov(
        "chol_age", n=n_income, eta=4, sd_dist=sd_dist, compute_corr=True
    )
    chol_gender, _, _ = pm.LKJCholeskyCov(
        "chol_gender", n=n_income, eta=4, sd_dist=sd_dist, compute_corr=True
    )

    # adaptive priors, centered
    # α_age = pm.MvNormal("α_age", mu=0.0, chol=chol_age, shape=(n_age, n_income))
    # α_gender = pm.MvNormal("α_gender", mu=0.0, chol=chol_gender, shape=(n_gender, n_income))

    # adaptive priors, non-centered
    z_age = pm.Normal("z_age", 0.0, 1.0, shape=(n_income, n_age))
    α_age = pm.Deterministic("α_age", pm.math.dot(chol_age, z_age))
    z_gender = pm.Normal("z_gender", 0.0, 1.0, shape=(n_income, n_gender))
    α_gender = pm.Deterministic("α_gender", pm.math.dot(chol_gender, z_gender))

    # linear predictor
    η = g[income_] + α_age[income_, age_] + α_gender[income_, gender_]
    p = pm.Deterministic('p', pm.math.invlogit(η))

    # fit
    obs = pm.Binomial('obs', df.N.values, p, observed=df.conversion.values)

    trace3 = pm.sample(
        init='advi',
        draws=1000,
        tune=1000,
        target_accept=0.8,
        random_seed=99,
    )

Hopefully someone with more experience of Bayesian modeling will step in, but I’d bet the problem is that your covariates are highly correlated. In particular, I would expect age and income to be strongly positively correlated. Does it help if you remove one of them, or use more informative priors? Also try plotting scatter plots of the posterior samples from the MCMC; that should show quite clearly whether there is high correlation between your parameters.
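A quick way to quantify that, assuming the posterior draws are available as flat arrays (the arrays below are synthetic stand-ins; with a real fit you would use columns of the trace, e.g. `trace['age'][:, 0]` and `trace['income'][:, 0]`, names assumed here):

```python
import numpy as np

# Synthetic stand-ins for posterior draws of two parameters.
rng = np.random.default_rng(1)
a = rng.normal(size=4000)
b = 0.9 * a + rng.normal(scale=0.3, size=4000)  # strongly correlated draws

# The correlation coefficient (or a scatter plot of the same two arrays)
# makes strong posterior dependence easy to spot.
corr = np.corrcoef(a, b)[0, 1]
```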

Just wondering: why do you have such a large burn-in relative to draws, why do you initialize with advi rather than the default, and why did you increase target_accept so much? (I assume the last is because of the divergences?)

It turns out that dropping the gender-income interaction and instead encoding gender as a fixed effect, along with 10000 samples and target_accept=0.95, led me to no divergences! The model still samples a little more slowly than I’d like, but the posterior predictive checks look somewhat reasonable. I’m still curious whether there’s a better model specification, however.

@bayes The large burn-in is to make sure the parameter space is properly explored first, and I’ve noticed that advi initialization helps avoid chain failures and bad starting points when I try models with only varying intercepts.