Improving sampling / parameterization of complex mixture model

This is a follow-up to the issue I described in this post.

That specific problem has been resolved and I’ve been able to build a model that outputs posteriors that (kind of) make sense.

For the model below, I’m not including the code for coming up with the parameter values I specify, but in brief the process was:

  1. Use this dataset, which is roughly trimodal, to identify priors:
  2. Divide up the data into 3 buckets based on (somewhat arbitrary) thresholds.
  3. Use the fitter package to identify the distribution and parameter values that best describe each bucket based on SSE (roughly as sketched below).
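For step 3, the fitting looked roughly like this (just a sketch; the distribution list and the example bucket below are illustrative, not my actual thresholds or data):

import numpy as np
from fitter import Fitter

# illustrative bucket from step 2 (not my actual data or thresholds)
bucket = np.array([2.02, 2.03, 2.04, 2.13, 2.17, 2.17])

f = Fitter(bucket, distributions=['beta', 'logistic', 'norm'])
f.fit()
f.summary()                                  # candidate fits ranked by SSE
best = f.get_best(method='sumsquare_error')  # e.g. {'beta': {'a': ..., 'b': ..., 'loc': ..., 'scale': ...}}
print(best)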

There are problems with the model below. Even after setting target_accept to 0.97 and max_treedepth to 40, I still get divergences, the r_hat values are above 1, and the ESS values are below 100. The traceplots look very messed up.

import numpy as np
import matplotlib.pyplot as plt
import pymc as pm
import arviz as az

def dist_beta(a, b, loc, scale, size):
    # shifted/scaled beta: a standard Beta(a, b) rescaled to the interval [loc, loc + scale]
    return pm.Beta.dist(a, b, size=size) * scale + loc

def logp_beta(x, a, b, loc, scale):
    # log-density of the shifted/scaled beta evaluated at x
    return pm.logp(dist_beta(a, b, loc, scale, size=x.shape), x)

sd_mult = 0.01

data = np.array([2.02, 2.03, 2.04, 2.13, 2.17, 2.17])

with pm.Model() as m:
    
    # tight priors centred on the values fitted in step 3
    dist1_a = pm.Normal('dist1_a', 1.63, 1.63 * sd_mult)
    dist1_b = pm.Normal('dist1_b', 0.92, 0.92 * sd_mult)
    dist1_loc = pm.Normal('dist1_loc', 2.00, 2.00 * sd_mult)
    dist1_scale = pm.Normal('dist1_scale', 0.13, 0.13 * sd_mult)

    dist2_a = pm.Normal('dist2_a', 0.94, 0.94 * sd_mult)
    dist2_b = pm.Normal('dist2_b', 0.91, 0.91 * sd_mult)
    dist2_loc = pm.Normal('dist2_loc', 2.14, 2.14 * sd_mult)
    dist2_scale = pm.Normal('dist2_scale', 0.10, 0.10 * sd_mult)

    logistic_mu = pm.Normal('logistic_mu', 2.29, 2.29 * sd_mult)
    logistic_scale = pm.Normal('logistic_scale', 0.01, 0.01 * sd_mult)

    beta_left = pm.CustomDist.dist(
        dist1_a,
        dist1_b,
        dist1_loc,
        dist1_scale,
        dist=dist_beta,
        logp=logp_beta,
        class_name='beta_left'
    )

    beta_middle = pm.CustomDist.dist(
        dist2_a,
        dist2_b,
        dist2_loc,
        dist2_scale,
        dist=dist_beta,
        logp=logp_beta,
        class_name='beta_middle'
    )

    log = pm.Logistic.dist(logistic_mu, logistic_scale)

    w = pm.Dirichlet('w', np.ones(3))

    resp = pm.Mixture('resp', w=w, comp_dists=[beta_left, beta_middle, log], observed=data)

    trace = pm.sample(draws=2000, step=[pm.NUTS(target_accept=0.97, max_treedepth=40)])

az.plot_trace(trace)
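
For reference, this is roughly how I’m checking those diagnostics (assuming pm.sample returns an InferenceData object, which is the default in recent PyMC versions):

summary = az.summary(trace)
print(summary[['r_hat', 'ess_bulk', 'ess_tail']])    # r_hat and ESS for every parameter
print('divergences:', int(trace.sample_stats['diverging'].sum()))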

My questions:

  1. What can I do to improve this model? I understand re-parameterization is an option. The thing is that the mixture is composed of 2 non-standard beta distributions and 1 logistic, so I’d have to figure out how to reparameterize up to 10 parameters (4 + 4 + 2); see the sketch after the posterior predictive code below for the kind of thing I mean. I’ve seen this post, but I’m not following where @chartl gets the gamma distribution or the hardcoded values he uses for defining the sigma with the gamma.
     Bigger picture, I’m not sure whether this is the path I need to take or whether there’s a better one.
  2. Where is PyMC getting these extreme values for the posterior? Neither the priors nor the observed data have values below 1.75 or above 2.75.
     I realize this may become a moot point if I improve the model, but I’d still like to understand it.
with m:
    prior = pm.sample_prior_predictive()
    post_pred = pm.sample_posterior_predictive(trace)

plt.hist(post_pred.posterior_predictive['resp'].values.flatten())
plt.title('Posterior predictive distribution')
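
To make question 1 concrete, this is the kind of reparameterization I understand to be an option, sketched only for the first beta component and not something I’ve actually tried: sample the strictly positive parameters on the log scale and map them back with exp, so the sampler works in an unconstrained space.

import numpy as np
import pymc as pm

sd_mult = 0.01

with pm.Model() as m_reparam:
    # unconstrained draws on the log scale, mapped back to the positive scale with exp();
    # for a small sd_mult this keeps roughly the same relative spread as the original priors
    dist1_log_a = pm.Normal('dist1_log_a', np.log(1.63), sd_mult)
    dist1_a = pm.Deterministic('dist1_a', pm.math.exp(dist1_log_a))
    dist1_log_scale = pm.Normal('dist1_log_scale', np.log(0.13), sd_mult)
    dist1_scale = pm.Deterministic('dist1_scale', pm.math.exp(dist1_log_scale))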

Interestingly enough, if I feed in the observed data divided by 1e6, the model runs exceptionally fast (3 minutes versus 35 minutes) with no divergences and good ESS and r_hat values. The posterior predictions look sensible, too. That seems really strange to me.

For reference, this is how the prior predictive distribution looks, to verify that it’s close enough to the original distribution used to model the priors:
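
(The plot below is produced roughly like this, assuming the InferenceData layout from the sample_prior_predictive call above.)

plt.hist(prior.prior_predictive['resp'].values.flatten())
plt.title('Prior predictive distribution')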