Difficulties with a discontinuous posterior

I am seeking to develop a hierarchical model to describe steel wire corrosion / degradation.

We have categorical data (Green/Orange/Red) obtained via a non-destructive testing (NDT) technique that seeks to identify wires that are corroded and/or cracked using ultrasound. We have results data for each wire across multiple tests, conducted over several years. We see clustering in corrosion at wires that are physically close to each other / forming the same structural components, which suggests a hierarchical model would be appropriate.

In addition we have a informative prior concerning the properties of the NDT test, e.g. how often it correctly identifies a cracked wire, or misclassifies. I model these properties via 3x3 conditional probabilities P(U|I) where U is the test result (G/O/R) and I the true status (G/O/R) at the time of the test. To encapsulate the logic that the true status of a wire can only degrade with time, I propose a wire degradation model defined via parameters tC (time at onset of corrosion) and tR (time at cracking/rupture) for each wire, with the logic that I=G if t<tC (no corrosion yet), I=O if tC<t<tR, and I=R if tR<t (i.e. true broken wire), where t is time e.g. the time of each test date.

I have a proof of concept model for a single wire that samples efficiently using Metropolis:

import pymc as pm
import numpy as np
from pytensor.tensor import stack, outer, switch, le

test_cats = ['G','O','R'] # possible test outcomes

# test results
tm_vals = [20.,25.] # wire age [years] at each test date
U_results = ['G','O'] # outcomes for a specific wire
U_codes = [test_cats.index(x) for x in U_results]

# test properties; conditional probabilities
# note these sum to unity as required
p_U_given_IG = np.array([0.90299401, 0.09101796, 0.00598803]) # P(U|I=G)
p_U_given_IO = np.array([0.78832117, 0.16058394, 0.05109489]) # P(U|I=O)
p_U_given_IR = np.array([0.1 , 0.42, 0.48]) # P(U|I=R)

with pm.Model() as model:
    
    model.add_coords({'U_cats':test_cats,
                      'I_cats':['G','O','R'],
                      't':tm_vals,
                      })
    
    # ---------- define priors
    
    # time at onset of corrosion [years]
    tC = pm.Triangular('tC',lower=0.,upper=50.,c=40.)
    
    # time from onset of corrosion to rupture [years]
    ΔtCR = pm.Triangular('ΔtCR',lower=0.,upper=10.,c=5.)
    
    # time at rupture [years]
    tR = pm.Deterministic('tR',tC+ΔtCR)
    
    # ---------- define data -------------------------------------------
    kw = dict(mutable=False)
    
     # t [years] of each test
    tm = pm.Data('t_m',tm_vals,dims=['t'],**kw)
    
    # encoded results from each test
    Uw = pm.Data('U_codes',U_codes,dims=['t'],**kw)
    
    # probabilities P(U|I)
    p_G = pm.Data('P(U|I=G)',p_U_given_IG,dims=['U_cats'],**kw)
    p_O = pm.Data('P(U|I=O)',p_U_given_IO,dims=['U_cats'],**kw)
    p_R = pm.Data('P(U|I=R)',p_U_given_IR,dims=['U_cats'],**kw)
    p = pm.Deterministic('P(U|I)',stack([p_G,p_O,p_R]),
                         dims=['I_cats','U_cats'])
    
    # # ---------- define likelihood --------------
    
    # get interpolated misclassification probabilities for each measurement
    tm_tC = outer(tm,1/tC)
    tm_tR = outer(tm,1/tR)
    I_code = pm.Deterministic('I_codes',
                              switch(le(tm_tC,1.),0,
                                     switch(le(tm_tR,1.),1,2))[:,0],
                              dims=['t'])
    
    # # likelihood of observed test category results
    Uw_code = pm.Categorical('U',
                             p=p[I_code,:],
                             observed=Uw,
                             dims=['t']
                             )

giving this model structure:

As mentioned above, I find I get an accurate posterior (despite the warnings; I’ve benchmarked versus exact solution obtained via grid integration over the domain, which is possible since just 2 RVs) when using Metropolis, without any special configuration, e.g.:

with model:
    idata = pm.sample(cores=1,chains=8,step=pm.Metropolis())

In contrast NUTS takes ages and doesn’t perform as well. This I would suppose is due to the abrupt cliffs in the likelihood function arising from the switch statement; the posterior has discontinuities across these and ill-defined gradient.

What I want to do though is use this as a sub-model within a larger hierarchical model that 1) uses data from all the wires I have data on (which is lots) and 2) models (tC,tR) for a given wire as being from a wider parent distribution across all wires (defined via group hyperparameters). Indeed there may be an additional hierarchy above that too, accounting for the physical closeness of wires (I plan to try different modelling approaches, starting with the easier one-level hierarchy across all wires). In its simplest form, this would lead to there being free RVs above tC and ΔtCR in the diagram above, defining the parameters of the parent distributions for these. E.g. mu and sigma parameters for a truncated Normal distribution defining tC for each wire.

I’m nervous though as I will end up with lots of RVs (80000) when I include all wires (40000); effectively a (tC,tR) pair for each wire. My dataset has 1 to 4 test results for each wire, so isn’t that huge for each wire. Even though these will be related via their parent distribution, I’m concerned that Metropolis may not prove efficient. Indeed my initial experiments in this regard don’t seem promising.

I’ve been reflecting on whether it might be possible to do use CompoundStep, to use Metropolis sampling for each of the (tC,tR) RVs, but NUTS (or other recommended efficient sampler) for higher level RVs. I’ve read this article Compound Steps in Sampling — PyMC example gallery, but am not so familiar with this approach. Before embarking on this though I hoped to ask advice from others in this community who might have similar experience?

One other avenue I might be able to pursue would be to eliminate the abrupt discontinuities from the likelihood function via using a continuous function to describe how P(U|I) conditional probabilities vary with time t, relative to tC and tR. Would this then likely give improved sampling efficiency do you think?

Sorry for the long post - and I hope it makes sense. I’d really appreciate any thoughts/tips that anyone has on this matter.

1 Like

The Triangular priors are probably not helping either. They also introduce a discontinuity. Can’t you use a Gamma/Normal, possibly truncated?

Shouldn’t your model have a concept of time? Once it’s corroded it can’t become uncorroded (I assume?). You could try to model it as a hidden markov model, with transition probabilities proportional to the elapsed time between tests?

The transition matrix would be triangular to reflect you can’t return to a previous better state

Edit: I see your model has this concept as your wrote clearly. Sounds to me you’ll need to integrate those two variables to arrive at a state probability at the time of test. The markov model was one way to do this.

PS: This is a very interesting problem.

1 Like