Improving sampling / parameterization of complex mixture model

This is a follow-up to the issue I described in this post.

That specific problem has been resolved and I’ve been able to build a model that outputs posteriors that (kind of) make sense.

For the model below, I’m not including the code for coming up with the parameter values I specify, but in brief the process was:

  1. Use this dataset, which is roughly trimodal, to identify priors:
  2. Divide up the data into 3 buckets based on (somewhat arbitrary) thresholds.
  3. Use the fitter package to identify the distribution and parameter values that best describe each bucket based on SSE (roughly as sketched below).
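For step 3, the fitting looked roughly like this (just a sketch; the distribution list and the example bucket below are illustrative, not my actual thresholds or data):

import numpy as np
from fitter import Fitter

# illustrative bucket from step 2 (not my actual data or thresholds)
bucket = np.array([2.02, 2.03, 2.04, 2.13, 2.17, 2.17])

f = Fitter(bucket, distributions=['beta', 'logistic', 'norm'])
f.fit()
f.summary()                                  # candidate fits ranked by SSE
best = f.get_best(method='sumsquare_error')  # e.g. {'beta': {'a': ..., 'b': ..., 'loc': ..., 'scale': ...}}
print(best)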

There are problems with the model below. Even after setting target_accept to 0.97 and max_treedepth to 40, I still get divergences, the r_hat values are above 1, and the ESS values are below 100. The traceplots look very messed up.

import numpy as np
import matplotlib.pyplot as plt
import pymc as pm
import arviz as az

def dist_beta(a, b, loc, scale, size):
    # shifted/scaled beta: a standard Beta(a, b) rescaled to the interval [loc, loc + scale]
    return pm.Beta.dist(a, b, size=size) * scale + loc

def logp_beta(x, a, b, loc, scale):
    # log-density of the shifted/scaled beta evaluated at x
    return pm.logp(dist_beta(a, b, loc, scale, size=x.shape), x)

sd_mult = 0.01

data = np.array([2.02, 2.03, 2.04, 2.13, 2.17, 2.17])

with pm.Model() as m:
    
    # tight priors centred on the values fitted in step 3
    dist1_a = pm.Normal('dist1_a', 1.63, 1.63 * sd_mult)
    dist1_b = pm.Normal('dist1_b', 0.92, 0.92 * sd_mult)
    dist1_loc = pm.Normal('dist1_loc', 2.00, 2.00 * sd_mult)
    dist1_scale = pm.Normal('dist1_scale', 0.13, 0.13 * sd_mult)

    dist2_a = pm.Normal('dist2_a', 0.94, 0.94 * sd_mult)
    dist2_b = pm.Normal('dist2_b', 0.91, 0.91 * sd_mult)
    dist2_loc = pm.Normal('dist2_loc', 2.14, 2.14 * sd_mult)
    dist2_scale = pm.Normal('dist2_scale', 0.10, 0.10 * sd_mult)

    logistic_mu = pm.Normal('logistic_mu', 2.29, 2.29 * sd_mult)
    logistic_scale = pm.Normal('logistic_scale', 0.01, 0.01 * sd_mult)

    beta_left = pm.CustomDist.dist(
        dist1_a,
        dist1_b,
        dist1_loc,
        dist1_scale,
        dist=dist_beta,
        logp=logp_beta,
        class_name='beta_left'
    )

    beta_middle = pm.CustomDist.dist(
        dist2_a,
        dist2_b,
        dist2_loc,
        dist2_scale,
        dist=dist_beta,
        logp=logp_beta,
        class_name='beta_middle'
    )

    log = pm.Logistic.dist(logistic_mu, logistic_scale)

    w = pm.Dirichlet('w', np.ones(3))

    resp = pm.Mixture('resp', w=w, comp_dists=[beta_left, beta_middle, log], observed=data)

    trace = pm.sample(draws=2000, step=[pm.NUTS(target_accept=0.97, max_treedepth=40)])

az.plot_trace(trace)
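
For reference, this is roughly how I’m checking those diagnostics (assuming pm.sample returns an InferenceData object, which is the default in recent PyMC versions):

summary = az.summary(trace)
print(summary[['r_hat', 'ess_bulk', 'ess_tail']])    # r_hat and ESS for every parameter
print('divergences:', int(trace.sample_stats['diverging'].sum()))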

My questions:

  1. What can I do to improve this model? I understand re-parameterization is an option. The thing is that the mixture is composed of 2 non-standard beta distributions and 1 logistic, so I’d have to figure out how to reparameterize up to 10 parameters (4 + 4 + 2); see the sketch after the posterior predictive code below for the kind of thing I mean. I’ve seen this post, but I’m not following where @chartl gets the gamma distribution or the hardcoded values he uses for defining the sigma with the gamma.
     Bigger picture, I’m not sure whether this is the path I need to take or whether there’s a better one.
  2. Where is PyMC getting these extreme values for the posterior? Neither the priors nor the observed data have values below 1.75 or above 2.75.
     I realize this may become a moot point if I improve the model, but I’d still like to understand it.
with m:
    prior = pm.sample_prior_predictive()
    post_pred = pm.sample_posterior_predictive(trace)

plt.hist(post_pred.posterior_predictive['resp'].values.flatten())
plt.title('Posterior predictive distribution')
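
To make question 1 concrete, this is the kind of reparameterization I understand to be an option, sketched only for the first beta component and not something I’ve actually tried: sample the strictly positive parameters on the log scale and map them back with exp, so the sampler works in an unconstrained space.

import numpy as np
import pymc as pm

sd_mult = 0.01

with pm.Model() as m_reparam:
    # unconstrained draws on the log scale, mapped back to the positive scale with exp();
    # for a small sd_mult this keeps roughly the same relative spread as the original priors
    dist1_log_a = pm.Normal('dist1_log_a', np.log(1.63), sd_mult)
    dist1_a = pm.Deterministic('dist1_a', pm.math.exp(dist1_log_a))
    dist1_log_scale = pm.Normal('dist1_log_scale', np.log(0.13), sd_mult)
    dist1_scale = pm.Deterministic('dist1_scale', pm.math.exp(dist1_log_scale))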

Interestingly enough, if I feed in the observed data divided by 1e6, the model runs exceptionally fast (3 minutes versus 35 minutes) with no divergences and good ESS and r_hat values. The posterior predictions look sensible, too. That seems really strange to me.

For reference, this is how the prior predictive distribution looks, to verify that it’s close enough to the original distribution used to model the priors:
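
(The plot below is produced roughly like this, assuming the InferenceData layout from the sample_prior_predictive call above.)

plt.hist(prior.prior_predictive['resp'].values.flatten())
plt.title('Prior predictive distribution')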