Difficulties with a discontinuous posterior

I am seeking to develop a hierarchical model to describe steel wire corrosion / degradation.

We have categorical data (Green/Orange/Red) obtained via a non-destructive testing (NDT) technique that uses ultrasound to identify wires that are corroded and/or cracked. We have results for each wire across multiple tests, conducted over several years. We see clustering of corrosion in wires that are physically close to each other / form the same structural components, which suggests a hierarchical model would be appropriate.

In addition we have an informative prior concerning the properties of the NDT test, e.g. how often it correctly identifies a cracked wire, or misclassifies one. I model these properties via a 3x3 matrix of conditional probabilities P(U|I), where U is the test result (G/O/R) and I the true status (G/O/R) at the time of the test. To encapsulate the logic that the true status of a wire can only degrade with time, I propose a wire degradation model defined via parameters tC (time at onset of corrosion) and tR (time at cracking/rupture) for each wire, with the logic that I=G if t<tC (no corrosion yet), I=O if tC<t<tR, and I=R if tR<t (i.e. a truly broken wire), where t is time, e.g. the wire age at each test date.
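In plain NumPy, that piecewise logic is just (a minimal sketch of the mapping described above):

import numpy as np

def true_status(t, tC, tR):
    """True status code at time t: 0=G (intact), 1=O (corroded), 2=R (cracked)."""
    return np.where(t < tC, 0, np.where(t < tR, 1, 2))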

I have a proof of concept model for a single wire that samples efficiently using Metropolis:

import pymc as pm
import numpy as np
from pytensor.tensor import stack, switch, le

test_cats = ['G','O','R'] # possible test outcomes

# test results
tm_vals = [20.,25.] # wire age [years] at each test date
U_results = ['G','O'] # outcomes for a specific wire
U_codes = [test_cats.index(x) for x in U_results]

# test properties; conditional probabilities
# note these sum to unity as required
p_U_given_IG = np.array([0.90299401, 0.09101796, 0.00598803]) # P(U|I=G)
p_U_given_IO = np.array([0.78832117, 0.16058394, 0.05109489]) # P(U|I=O)
p_U_given_IR = np.array([0.1 , 0.42, 0.48]) # P(U|I=R)

with pm.Model() as model:
    
    model.add_coords({'U_cats':test_cats,
                      'I_cats':['G','O','R'],
                      't':tm_vals,
                      })
    
    # ---------- define priors
    
    # time at onset of corrosion [years]
    tC = pm.Triangular('tC',lower=0.,upper=50.,c=40.)
    
    # time from onset of corrosion to rupture [years]
    ΔtCR = pm.Triangular('ΔtCR',lower=0.,upper=10.,c=5.)
    
    # time at rupture [years]
    tR = pm.Deterministic('tR',tC+ΔtCR)
    
    # ---------- define data -------------------------------------------
    kw = dict(mutable=False)  # treat these as static (non-mutable) data
    
    # t [years] of each test
    tm = pm.Data('t_m',tm_vals,dims=['t'],**kw)
    
    # encoded results from each test
    Uw = pm.Data('U_codes',U_codes,dims=['t'],**kw)
    
    # probabilities P(U|I)
    p_G = pm.Data('P(U|I=G)',p_U_given_IG,dims=['U_cats'],**kw)
    p_O = pm.Data('P(U|I=O)',p_U_given_IO,dims=['U_cats'],**kw)
    p_R = pm.Data('P(U|I=R)',p_U_given_IR,dims=['U_cats'],**kw)
    p = pm.Deterministic('P(U|I)',stack([p_G,p_O,p_R]),
                         dims=['I_cats','U_cats'])
    
    # ---------- define likelihood --------------------------------------
    
    # map each test time to the true status code implied by (tC, tR):
    # 0=G while t<=tC, 1=O while tC<t<=tR, 2=R thereafter
    I_code = pm.Deterministic('I_codes',
                              switch(le(tm/tC,1.),0,
                                     switch(le(tm/tR,1.),1,2)),
                              dims=['t'])
    
    # likelihood of observed test category results
    Uw_code = pm.Categorical('U',
                             p=p[I_code,:],
                             observed=Uw,
                             dims=['t']
                             )

giving this model structure:

[model graph figure]

As mentioned above, I find I get an accurate posterior (despite the warnings) when using Metropolis, without any special configuration; I’ve benchmarked against the exact solution obtained via grid integration over the domain, which is feasible since there are just 2 RVs. E.g.:

with model:
    idata = pm.sample(cores=1,chains=8,step=pm.Metropolis())
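For reference, the grid-integration benchmark looks roughly like this (a sketch reusing the arrays defined at the top of the model script; the grid resolutions are arbitrary):

import numpy as np

# grids over (tC, ΔtCR)
tC_grid = np.linspace(0.01, 50., 500)
dt_grid = np.linspace(0.01, 10., 200)
TC, DT = np.meshgrid(tC_grid, dt_grid, indexing='ij')
TR = TC + DT

def tri_pdf(x, lower, upper, c):
    """Triangular pdf, same parameterisation as pm.Triangular."""
    up = 2*(x - lower)/((upper - lower)*(c - lower))
    down = 2*(upper - x)/((upper - lower)*(upper - c))
    return np.where(x < c, up, down)*((x >= lower) & (x <= upper))

prior = tri_pdf(TC, 0., 50., 40.)*tri_pdf(DT, 0., 10., 5.)

# likelihood: product over tests of P(U_obs | true status at test time)
P = np.stack([p_U_given_IG, p_U_given_IO, p_U_given_IR])  # (I, U)
like = np.ones_like(TC)
for t, u in zip(tm_vals, U_codes):
    status = np.where(t < TC, 0, np.where(t < TR, 1, 2))
    like *= P[status, u]

# normalised posterior density on the grid
post = prior*like
post /= post.sum()*(tC_grid[1]-tC_grid[0])*(dt_grid[1]-dt_grid[0])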

In contrast, NUTS takes ages and doesn’t perform as well. I suspect this is due to the abrupt cliffs in the likelihood function arising from the switch statement: the posterior is discontinuous across these boundaries and the gradient is ill-defined there.

What I want to do, though, is use this as a sub-model within a larger hierarchical model that 1) uses data from all the wires I have data on (which is a lot) and 2) models (tC,tR) for a given wire as draws from a wider parent distribution across all wires (defined via group hyperparameters). Indeed there may be an additional hierarchy above that too, accounting for the physical closeness of wires (I plan to try different modelling approaches, starting with the easier one-level hierarchy across all wires). In its simplest form, this would add free RVs above tC and ΔtCR in the diagram above, defining the parameters of their parent distributions, e.g. mu and sigma parameters for a truncated Normal distribution defining tC for each wire.
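In code, the one-level hierarchy I have in mind would look roughly like this (a sketch only; the hyperprior choices are placeholders, and the per-wire likelihood would be as in the single-wire model above):

import pymc as pm

n_wires = 40_000  # one (tC, ΔtCR) pair per wire

with pm.Model() as hier_model:
    # group-level hyperpriors for corrosion onset time (placeholder choices)
    mu_tC = pm.TruncatedNormal('mu_tC', mu=40., sigma=10., lower=0.)
    sigma_tC = pm.HalfNormal('sigma_tC', sigma=10.)

    # per-wire onset times drawn from the parent distribution
    tC = pm.TruncatedNormal('tC', mu=mu_tC, sigma=sigma_tC,
                            lower=0., shape=n_wires)

    # group-level hyperpriors for the onset-to-rupture duration
    mu_dt = pm.TruncatedNormal('mu_dtCR', mu=5., sigma=3., lower=0.)
    sigma_dt = pm.HalfNormal('sigma_dtCR', sigma=3.)
    dtCR = pm.TruncatedNormal('dtCR', mu=mu_dt, sigma=sigma_dt,
                              lower=0., shape=n_wires)

    tR = pm.Deterministic('tR', tC + dtCR)
    # ... per-wire Categorical likelihood as in the single-wire model ...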

I’m nervous, though, as I will end up with lots of RVs (80000) when I include all wires (40000): effectively a (tC,tR) pair for each wire. My dataset has only 1 to 4 test results per wire, so the data per wire isn’t huge. Even though the wires will be related via their parent distribution, I’m concerned that Metropolis may not prove efficient, and indeed my initial experiments in this regard don’t seem promising.

I’ve been reflecting on whether it might be possible to use CompoundStep: Metropolis sampling for each of the (tC,tR) RVs, but NUTS (or another recommended efficient sampler) for the higher-level RVs. I’ve read the article Compound Steps in Sampling in the PyMC example gallery, but am not so familiar with this approach. Before embarking on it, I hoped to ask advice from others in this community who might have similar experience?
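From the docs it seems this would just be a matter of passing a list of steppers to pm.sample, which builds a CompoundStep automatically. E.g., assuming the variable names from the hierarchical sketch above:

with hier_model:
    # NUTS on the smooth group-level hyperparameters,
    # Metropolis on the per-wire variables whose logp has cliffs
    step_nuts = pm.NUTS([mu_tC, sigma_tC, mu_dt, sigma_dt])
    step_mh = pm.Metropolis([tC, dtCR])
    idata = pm.sample(step=[step_nuts, step_mh], chains=4)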

One other avenue I might pursue would be to eliminate the abrupt discontinuities from the likelihood function by using a continuous function to describe how the P(U|I) conditional probabilities vary with time t, relative to tC and tR. Do you think this would be likely to improve sampling efficiency?

Sorry for the long post - and I hope it makes sense. I’d really appreciate any thoughts/tips that anyone has on this matter.


The Triangular priors are probably not helping either. They also introduce a discontinuity. Can’t you use a Gamma/Normal, possibly truncated?
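Something like this in place of the Triangular priors (the parameter values are placeholders):

tC = pm.Gamma('tC', mu=35., sigma=10.)                        # smooth onset-time prior
ΔtCR = pm.TruncatedNormal('ΔtCR', mu=5., sigma=2., lower=0.)  # smooth duration prior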

Shouldn’t your model have a concept of time? Once it’s corroded it can’t become uncorroded (I assume?). You could try to model it as a hidden Markov model, with transition probabilities proportional to the elapsed time between tests.

The transition matrix would be triangular to reflect that you can’t return to a previous, better state.
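For concreteness, something like this (a rough sketch; lam_GO and lam_OR are hypothetical hazard rates, and the G->R entry is only a coarse approximation over a finite interval):

import numpy as np

def transition_matrix(dt, lam_GO, lam_OR):
    """Upper-triangular transition matrix over an elapsed time dt,
    assuming exponential waiting times in each state."""
    p_GO = 1 - np.exp(-lam_GO*dt)  # prob. of leaving G within dt
    p_OR = 1 - np.exp(-lam_OR*dt)  # prob. of leaving O within dt
    return np.array([
        [1 - p_GO, p_GO*(1 - p_OR), p_GO*p_OR],  # from G
        [0.,       1 - p_OR,        p_OR     ],  # from O
        [0.,       0.,              1.       ],  # from R (absorbing)
    ])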

Edit: I see your model already has this concept, as you clearly wrote. It sounds to me like you’ll need to integrate over those two variables to arrive at a state probability at the time of each test; the Markov model was one way to do this.

PS: This is a very interesting problem.


Thanks for these helpful ideas. I did consider a hidden Markov model, but the issue I couldn’t convince myself about was whether the Markov property (i.e. independence of transition times between states) would apply in my case. The approach I’ve implemented allows the time from corrosion onset to rupture to depend on the time to corrosion onset (e.g. to reflect the physics that some locations may see more corrosive environmental conditions).

At any rate, my model is actually working nicely now. I made a refinement to interpolate the likelihood function so that it no longer has abrupt “cliffs” (i.e. no more switch statements), which has made posterior sampling much more efficient.
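Roughly, the idea is to blend the three P(U|I) rows with smooth weights rather than switching between them. A sketch using a sigmoid blend with a transition width s (my actual interpolation differs in detail); inside the model, in place of the switch-based block:

s = 1.0  # transition width [years]; smaller -> closer to the hard switch
w_O = pm.math.sigmoid((tm - tC)/s)  # weight moving G -> O around tC
w_R = pm.math.sigmoid((tm - tR)/s)  # weight moving O -> R around tR

# smoothly blended row of P(U|I) at each test time; rows still sum to 1
p_t = ((1 - w_O)[:, None]*p_G
       + (w_O*(1 - w_R))[:, None]*p_O
       + (w_O*w_R)[:, None]*p_R)

Uw_code = pm.Categorical('U', p=p_t, observed=Uw, dims=['t'])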

The other thing I realized was that wires with exactly the same priors and data (hence likelihood) will of course have identical posterior distributions. So rather than modelling all wires as separate RVs (leading to a big model with >50000 RVs), I just needed separate RVs for wires with distinct histories of test results (which ends up being more like 400 - much more manageable).
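The deduplication itself is simple, e.g. (a sketch, assuming each wire’s history is encoded as a tuple of (age at test, result) pairs):

from collections import Counter

# hypothetical per-wire histories
histories = [
    ((20., 'G'), (25., 'O')),
    ((20., 'G'), (25., 'O')),  # identical history -> identical posterior
    ((18., 'O'),),
    # ... one entry per wire ...
]

counts = Counter(histories)      # distinct histories and their multiplicities
unique_histories = list(counts)  # model one (tC, tR) pair per entry
# each distinct history then enters the model once, with its likelihood
# weighted by counts[h] (e.g. via pm.Potential)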

Thanks again for the discussion, always appreciated!
