Many Divergences in NUTS Sampler with Beta Binomial Regression

Basically I’m trying to run a Beta-Binomial regression with categorical input features that I factorize & add to coords, and some purely numerical features that are multiplied with beta.

I’ve been fiddling with this model for the past couple of weeks, and every time I run it the tuning samples don’t diverge, but once the first 500 tuning iterations are done the majority of the post-tuning draws diverge.

I’ve tried playing around with the NUTS sampler by increasing target_accept, to no avail. I’d be happy to re-parameterize if that would help, but I don’t know which part of my code is causing the issue.

You will find the piece of code below (apologies, this is a class method on an object I haven’t included). Can anyone give me a hint as to where I should look? As input I pass the following:

  • self.draws = 500
  • 7x numerical features that are multiplied with beta after being standardized
  • 16x categorical features that are turned into "_alpha"s, the largest of which has a cardinality of 39
# Assumes the usual imports for this snippet, e.g.:
#   import logging, numpy as np, pandas as pd, pymc3 as pm
#   from pymc3 import math as pymc3_math
# Factorize categorical features
cat_idxs = dict()
for feat in self.cat_feats:
    cat_idxs[feat], cat_vals = pd.factorize(X[feat])
    if len(self.coords) == 0:
        self.coords = {"$obs_id$": np.arange(len(cat_idxs[feat])), "$num_feats$": self.num_feats, feat: cat_vals}
    else:
        self.coords[feat] = cat_vals

# Construct model
with pm.Model(coords=self.coords) as model:

    # Construct Numerical Beta Priors
    beta, x_ = None, None
    if self.num_feats:
        beta = pm.Normal(name="beta", mu=0, sigma=20, dims="$num_feats$")
        # Construct Data Containers
        x_ = pm.Data("X", X[self.num_feats])

    # Construct Categorical Alpha Priors
    cat_alphas = dict()
    for feat, cat_vals in self.coords.items():
        if feat not in ["$obs_id$", "$num_feats$"]:
            cat_idxs[feat] = pm.Data(f"{feat}_idx", cat_idxs[feat], dims="$obs_id$")
            cat_alphas[feat] = pm.Normal(name=f"{feat}_alpha", mu=0, sigma=20, dims=feat)

    # Construct "variance" prior(s)
    # When kappa is large, the Beta is less variable around mu
    # (Var[ps] = mu * (1 - mu) / (kappa + 1))
    kappa = pm.HalfNormal("kappa", sigma=20)

    # Construct Deterministic Target Mean
    if not self.cat_feats:
        alpha = pm.Normal(name="alpha", mu=0, sigma=20)
        mu = pymc3_math.dot(x_, beta) + alpha
    else:
        if self.num_feats:
            mu = pymc3_math.dot(x_, beta)
        else:
            mu = 0
        for feat in self.cat_feats:
            # Add the categorical alphas to mu
            mu += cat_alphas[feat][cat_idxs[feat]]

    # Define mean with sigmoid transform (-inf, inf) -> (0, 1)
    mu = pm.Deterministic(name="mu", var=pm.math.invlogit(mu))
    # Pass this sigmoid mean to beta distribution
    ps = pm.Beta('ps', mu*kappa, (1-mu)*kappa, dims="$obs_id$")
    # Pass beta to binomial likelihood w observations
    _ = pm.Binomial('target', n=10, p=ps, observed=y)

    logging.info("Start Sampling...")
    step = pm.NUTS(
        target_accept=.95
    )
    self.trace = pm.sample(draws=self.draws, step=step, tune=self.draws, n_init=200_000)
    self.burned_trace = self.trace[int(0.2*self.draws):]

This is a bit difficult to debug without something to reproduce locally. Can you share something that I could run and that shows this behavior?

Anyway, from what I’m seeing currently:

    # Pass this sigmoid mean to beta distribution
    ps = pm.Beta('ps', mu*kappa, (1-mu)*kappa, dims="$obs_id$")
    # Pass beta to binomial likelihood w observations
    _ = pm.Binomial('target', n=10, p=ps, observed=y)

You can use a pm.BetaBinomial here instead of those two separate distributions. That should at least speed things up, though I’m not sure whether it has anything to do with your problem.
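For reference, a minimal sketch of that replacement, reusing the mu, kappa, and y from your snippet (the explicit ps variable disappears because the Beta is marginalized out):

    # Marginalize ps out by using the Beta-Binomial likelihood directly,
    # with the same shape parameters as the explicit Beta above
    _ = pm.BetaBinomial(
        'target',
        alpha=mu * kappa,
        beta=(1 - mu) * kappa,
        n=10,
        observed=y,
    )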

Also, just trace = pm.sample(target_accept=0.95) should be fine (see the sketch after the quoted code below). You also don’t need n_init or to manually remove burn-in; that is already handled automatically (those are the tuning steps).

    logging.info("Start Sampling...")
    step = pm.NUTS(
        target_accept=.95
    )
    self.trace = pm.sample(draws=self.draws, step=step, tune=self.draws, n_init=200_000)
    self.burned_trace = self.trace[int(0.2*self.draws):]
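
Something like this minimal sketch, keeping your self.draws attribute, should do the same job:

    logging.info("Start Sampling...")
    # target_accept is forwarded to NUTS, so no explicit step object is needed
    self.trace = pm.sample(
        draws=self.draws,
        tune=self.draws,
        target_accept=0.95,
    )
    # tuning draws are discarded automatically, so no manual burn-in slice is needed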

Another pro tip: you can use the mean/sd parameterization of the beta distribution directly in PyMC; no need to transform to shape parameters (note that sigma there is the distribution’s standard deviation, so kappa would then act as an sd rather than a concentration). In your case,

    ps = pm.Beta('ps', mu=mu, sigma=kappa, dims="$obs_id$")