Does it make sense to use a HalfStudentT with 2 degrees of freedom in place of the HalfCauchy?
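For concreteness, here is a minimal sketch of the swap I have in mind (the variable name and scale are placeholders, not from my actual model):

```python
import pymc3 as pm

with pm.Model() as model:
    # Current prior: the HalfCauchy is heavy-tailed (it is the
    # HalfStudentT with nu=1) and has no finite moments.
    sigma = pm.HalfCauchy("sigma", beta=1.0)

    # Candidate replacement: nu=2 keeps heavy tails but is slightly
    # better behaved than the HalfCauchy.
    sigma_alt = pm.HalfStudentT("sigma_alt", nu=2, sigma=1.0)
```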
Also, any advice on how seriously to take divergences? In several of my models, the number of divergences does not affect the held-out MAPE much. So is there a rule of thumb, e.g., that 10 or more divergences across 2 chains with 1000 tuning steps and 1000 draws is not ignorable?
Can you show how you're defining β0 and β in your code? And where is τ² being used (as shown, it isn't actually used in the model)? That will help figure out what might be causing the divergences.
To paraphrase comments Aki Vehtari made after his keynote at last year's PyMCon: the rule of thumb is that divergences are nothing to worry about as long as you have fewer than about one of them. In other words, no number of divergences should be ignored.
With a small number of divergences, I typically assume I can fix things with small modifications to the sampling parameters (e.g., the target acceptance rate), but I always verify this by actually making those modifications and re-sampling. With more divergences, you may have a genuinely pathological posterior geometry. PyMC3/arviz provides plenty of tools for investigating divergences so that you can diagnose what is going on.
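For example, the first thing I usually try is raising the target acceptance rate, which makes NUTS take smaller steps (values below are illustrative; assumes your model context is named `model`):

```python
import pymc3 as pm

with model:
    # A higher target_accept shrinks the step size, which often
    # resolves a handful of stray divergences at the cost of
    # somewhat slower sampling.
    trace = pm.sample(draws=1000, tune=2000, target_accept=0.95)
```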
Thanks, that is useful. Is there a particular link I can focus on for examining divergences? I usually only find simple regression models with one covariate used for prior predictive checks; a full example with multiple covariates would be a great starting point.
Empirically, I've found the test performance of predictive models to be remarkably close for 0 versus, say, 5 divergences. So I suspect it matters which parameter(s) the divergences are associated with. Any comments on reparametrization would also be welcome.
You can find the PyMC3 notebook discussing the diagnosis of divergences here. It's slightly out of date, and many of the functions in that notebook have been (or are in the process of being) moved into arviz, but it should nonetheless give you good guidance on the general procedure.
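As a quick starting point in arviz, you can count the divergences and visualize where in parameter space they occurred (a sketch; assumes `trace` came from `pm.sample` and that `a` and `b` are parameter names in your model):

```python
import arviz as az

idata = az.from_pymc3(trace)

# Total number of divergent transitions across all chains.
print(int(idata.sample_stats.diverging.sum()))

# Pairwise scatterplots with divergent transitions highlighted.
# Divergences clustering in one region (e.g., the "funnel" near a
# small hierarchical scale parameter) point to the problematic geometry.
az.plot_pair(idata, var_names=["a", "b"], divergences=True)
```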
Indeed. Ultimately, divergences are simply an indication of what was going on during sampling. They tell you that the sampler was having trouble, and figuring out where in the parameter space sampling was difficult (i.e., where the divergences were observed) is a major part of the investigation. The notebook above specifically addresses this and presents a re-specified model that alleviates the uncovered sampling difficulties. Of course, you can get a stray divergence here or there for no obvious reason, but until you go digging, you'll never know why.
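To give a flavor of the kind of re-specification involved: the classic fix for divergences in a hierarchical model is the non-centered parameterization, which samples a standardized offset instead of sampling group effects directly inside the funnel-shaped prior. A minimal sketch (names, shapes, and data are illustrative, not from your model):

```python
import numpy as np
import pymc3 as pm

y = np.random.normal(size=8)  # placeholder data

with pm.Model() as noncentered:
    mu = pm.Normal("mu", 0.0, 5.0)
    tau = pm.HalfCauchy("tau", beta=2.0)

    # The centered version -- theta = pm.Normal("theta", mu, tau) --
    # often produces a funnel that NUTS diverges in. Instead, sample
    # a standardized offset and rescale it deterministically:
    theta_offset = pm.Normal("theta_offset", 0.0, 1.0, shape=8)
    theta = pm.Deterministic("theta", mu + tau * theta_offset)

    obs = pm.Normal("obs", mu=theta, sigma=1.0, observed=y)
```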