How to improve sampling or re-parametrize model

Hi all,

This is my first time posting here, so please forgive me if the formatting is off…

I think I need some help with either re-parametrizing my model or otherwise better understanding sampling, as right now my chains fail to converge.

My model is like this:

$$
\begin{aligned}
\sigma_{\text{stress}} &\sim \text{TruncatedNormal}(\mu=1, \sigma=2) \text{ on } [0.001, 10] \\
q_{90,\text{stress}} &\sim \text{TruncatedNormal}(\mu=1, \sigma=2) \text{ on } [0.1, 2] \\
\mu_{\text{stress}} &= \log(q_{90,\text{stress}}) - 1.64\,\sigma_{\text{stress}} \\
\text{stress}_i &\sim \text{LogNormal}(\mu_{\text{stress}}, \sigma_{\text{stress}}) \\
p_i &= \text{invlogit}(-5 + 4\,\text{stress}_i) \\
\text{failure}_i &\sim \text{Bernoulli}(p_i), \qquad n_{\text{failure}} = \textstyle\sum_i \text{failure}_i
\end{aligned}
$$

or in code:

import numpy as np
import pandas as pd
import pymc as pm

# Data: 500 parts with 10 failures
nparts = 500
nfail = 10
# observed failures (only sum will be important but I want the coordinates)
y = np.concatenate((np.repeat(1, nfail), np.repeat(0, nparts - nfail)))
# bundle data into dataframe
data = pd.DataFrame({"y": y})


coords = {"parts": data.index.values}

with pm.Model(coords=coords) as stress_distribution_model:
    # priors, further re-parametrization necessary...
    sigma_stress = pm.TruncatedNormal("sigma_stress", mu=1, sigma=2, lower=0.001, upper=10)
    q90_stress = pm.TruncatedNormal("q90_stress", mu=1, sigma=2, lower=0.1, upper=2)
    mu_stress = pm.Deterministic("mu_stress", pm.math.log(q90_stress) - 1.64 * sigma_stress)
    # lognormal stress distribution
    stress = pm.LogNormal("stress", mu=mu_stress, sigma=sigma_stress, dims="parts")
    # pfailure is given by invlogit
    x = -5 + 4 * stress
    pfailure = pm.Deterministic("pfailure", pm.math.invlogit(x), dims="parts")
    # likelihood
    failure = pm.Bernoulli("failure", p=pfailure, dims="parts")
    nfailure = pm.Deterministic("nfailure", pm.math.sum(failure))
    # observe only number of failures, not individual parts
    # pm.Potential and pm.DiracDelta do not accept "observed" parameter
    # therefore use normal distribution with small sigma (nfailure is actually discrete)
    pm.Normal("observed", mu = nfailure, sigma = 0.001, observed = nfail)

I already did some re-parametrization, using q90_stress instead of mu_stress as the top-level variable; before that, sampling was even worse. My priors look like this:


and posteriors like this:

Inspection of the traces and energy plots clearly shows lack of convergence:

Is there a way to cleverly re-parametrize my model to improve sampling? Or should I change how the sampler does things? Currently I’m running the sampler pretty much with default settings and it chooses to do

CompoundStep
>NUTS: [sigma_stress, q90_stress, stress]
>BinaryGibbsMetropolis: [failure]

If (instead of using PyMC) I implement a dumb “shotgun” method, drawing 10000 samples from the prior distribution and computing the likelihood myself, I get posteriors like this:
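For concreteness, the idea was roughly this (a sketch rather than my exact code; it uses the fact that, given the hyperparameters, the parts are iid, so the failure count is Binomial with the marginal per-part failure probability):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ndraws = 10_000

# draw hyperparameters from the truncated-normal priors
a, b = (0.001 - 1) / 2, (10 - 1) / 2
sigma_s = stats.truncnorm.rvs(a, b, loc=1, scale=2, size=ndraws, random_state=rng)
a, b = (0.1 - 1) / 2, (2 - 1) / 2
q90_s = stats.truncnorm.rvs(a, b, loc=1, scale=2, size=ndraws, random_state=rng)
mu_s = np.log(q90_s) - 1.64 * sigma_s

# estimate the marginal per-part failure probability for each draw by
# Monte Carlo, then weight the draw by the Binomial likelihood of nfail
weights = np.empty(ndraws)
for i in range(ndraws):
    s = rng.lognormal(mu_s[i], sigma_s[i], size=1000)
    pbar = np.mean(1.0 / (1.0 + np.exp(5.0 - 4.0 * s)))  # invlogit(-5 + 4*s)
    weights[i] = stats.binom.pmf(nfail, nparts, pbar)

# resample prior draws proportionally to likelihood: approximate posterior
idx = rng.choice(ndraws, size=ndraws, p=weights / weights.sum())
posterior_sigma, posterior_q90 = sigma_s[idx], q90_s[idx]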


Welcome!

Just reviewing real quick: it looks like you have count data? If so, I am unclear why you have a Bernoulli likelihood but then sum the expected number of failures and pass that sum to a normally distributed parameter. Why not just model the failure counts as a Binomial?
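(A minimal sketch of what that would look like; the flat Beta(1, 1) prior and the single shared failure probability p are just placeholders:)

with pm.Model() as binomial_model:
    # one shared failure probability for all parts
    p = pm.Beta("p", alpha=1, beta=1)
    # 10 failures out of 500 parts in a single Binomial likelihood
    pm.Binomial("failures", n=nparts, p=p, observed=nfail)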

Hi,

Yes, I have 10 failures in 500 parts, but those devices don’t see the same stress and therefore don’t have the same failure probabilities. I’m trying to estimate the environmental stress distribution from the number of failures, but it would be too restrictive to assume that the stress is the same on all of the devices. The last bit with the normal distribution is just a crutch, because I didn’t know how else to compare the predicted number of failures with the observed one. It would also be good if I could get rid of that, but I’m not sure it will fix the sampling problem.

Instead of comparing the number of failures, the observed data can be “compared” to the Bernoulli-distributed failure:

failure = pm.Bernoulli("failure", p=pfailure, dims="parts", observed=y)

Thanks for the suggestion! I tried it as you described and now get posteriors like this:


I’d say they look better now, or at least closer to what the “shotgun” method produced. Unfortunately, the traces and energy plot are still worrying:

(The reason why I summed the failures in my original model was that I didn’t want to “force” particular parts to be the ones that fail. But now I believe you’re right: “forcing” a subset to be the failures, without otherwise setting that subset apart, doesn’t change the idea behind the model.)

Models with linear combinations of random variables are going to have identification problems, because there are infinitely many combinations of q90_stress and sigma_stress that can produce a given value of mu_stress. This blog post by Michael Betancourt has more information than you’d ever want to know on the subject. This is a hopeless situation in frequentist modeling, but in the Bayesian framework you can use priors to pin down a subset of the probability space to focus on, which can sometimes resolve the situation.
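Concretely, from the definition in your model,

$$\mu_{\text{stress}} = \log(q_{90,\text{stress}}) - 1.64\,\sigma_{\text{stress}},$$

so for any shift c, replacing q90_stress with q90_stress * exp(1.64 c) and sigma_stress with sigma_stress + c leaves mu_stress unchanged; only the priors (and the separate appearance of sigma_stress in the LogNormal) distinguish those combinations.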

My go-to diagnostic in these situations, after seeing the awful trace and energy plot, is to check the pair plots. Here’s what I got when I ran your model (with the modification suggested by @cluhmann):
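(For reference, a plot like this can be produced with ArviZ; idata here is a stand-in for the InferenceData returned by pm.sample:)

import arviz as az

# scatter matrix of the hyperparameters, with divergent transitions highlighted
az.plot_pair(
    idata,
    var_names=["sigma_stress", "q90_stress", "mu_stress"],
    divergences=True,
)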

As you can see, all of the divergences occur when sigma_stress is sufficiently small. You can also kind of see an elliptical cloud between mu_stress of -2 and -4 on the lower right plot, with a long thin tail after that. I might think about focusing on this region, since you have a “degree of freedom” to decide which linear combinations of the variables to consider. Seems like that maps to sigma_stress between around 0.5 and 2, and you might even have some domain knowledge that makes those values more “reasonable”.

Some other assorted thoughts:

  1. You don’t actually care about q90_stress; the value you use in your model is log(q90_stress). It might make sense to model the log value directly on \mathbb R, then exp it after modeling if you want to reason about it.
  2. Normal distributions, including truncated and half variants, put a lot of probability mass right at zero. In cases like yours, where small values cause problems, I like to use Gamma(2, b), because it has no mass at zero and a mode at 1/b. (A sketch combining both suggestions follows this list.)
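A minimal sketch of both changes together (the Gamma(2, 1) and Normal(0, 1) hyperparameters are placeholders to tune, not recommendations):

with pm.Model(coords=coords) as reparam_model:
    # Gamma(2, 1): zero density at 0, mode at 1/b = 1, keeps sigma away from 0
    sigma_stress = pm.Gamma("sigma_stress", alpha=2, beta=1)
    # model log(q90_stress) directly on the real line...
    log_q90_stress = pm.Normal("log_q90_stress", mu=0, sigma=1)
    # ...and recover q90_stress afterwards for reporting
    q90_stress = pm.Deterministic("q90_stress", pm.math.exp(log_q90_stress))
    mu_stress = pm.Deterministic("mu_stress", log_q90_stress - 1.64 * sigma_stress)
    stress = pm.LogNormal("stress", mu=mu_stress, sigma=sigma_stress, dims="parts")
    pfailure = pm.Deterministic("pfailure", pm.math.invlogit(-5 + 4 * stress), dims="parts")
    failure = pm.Bernoulli("failure", p=pfailure, dims="parts", observed=y)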

Anyway, you can try tinkering with this stuff. It might also be worth trying pm.sample_smc, a non-gradient sampler that progressively refines an estimated posterior via importance sampling. It’s true that NUTS is SOTA, but if you have a really degenerate posterior geometry (e.g. a straight line caused by the linear dependence between variables), it might be too hard to do the Hamiltonian simulation NUTS needs, and SMC is a good second best.
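Trying it is a small change (reusing the hypothetical reparam_model from the sketch above):

with reparam_model:
    # SMC needs no gradients and copes better with degenerate geometry
    idata_smc = pm.sample_smc(draws=2000, chains=4, random_seed=42)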

The best thing, though, would be to get some more information, like some features of the devices, that would help you identify the linear function at the heart of your model.


Thank you! I will digest all of this and try out your suggestions. Regarding more “reasonable” parameter regions: the lognormal distribution for the stress shouldn’t be too localized. Maybe I can make use of that.
