I can't get rid of divergences in Extreme Value Analysis


I’m trying to fit a GEV with the following data, taking from the annual maxima (block maxima approach):

array([ 3.698211 , 12.084102 , 15.341089 ,  9.201157 ,  7.0970197,
       13.50453  , 26.924686 , 16.444876 ,  9.672132 ,  8.024217 ,
       22.364328 , 16.575699 ,  6.1274405,  8.381071 , 16.532887 ,
       14.530084 ,  5.5311694,  8.343545 , 13.957676 ,  5.140601 ,
       23.630177 ], dtype=float32)

Very few data, I know. Another option would be to use Peak Over Threshold instead of Block Maxima, and I think that would add more data, but Peak Over Threshold needs a Generalized Pareto Distribution, and I can’t find that distribution in PyMC right now.

I did, following Generalized Extreme Value Distribution — PyMC example gallery :

import pymc as pm
import numpy as np
import pymc_experimental.distributions as pmx

# Consider return period of 5 years (p=1/5)
p = 1/5

with pm.Model() as model:
    # Priors
    μ = pm.Normal("μ", mu=2, sigma=2)
    σ = pm.HalfNormal("σ", sigma=2)
    ξ = pm.Normal("ξ", mu=1, sigma=1.5)

    # Estimation
    gev = pmx.GenExtreme("gev", mu=μ, sigma=σ, xi=ξ, observed=data)
    # Return level
    z_p = pm.Deterministic("z_p", μ - (σ / ξ) * (1 - (-np.log(1 - p)) ** (-ξ)))

# The inference

with model:
    trace = pm.sample(
        #initvals={"μ": 2, "σ": 1.0, "ξ": 1},
# add trace to existing idata object

I have done many tests changing the priors, the initialization of the priors and I have standardized the data, but I can’t get rid of the so many divergences. Sometimes even the NUTS stops because the initialization doesn’t work.

I could perform a frequentist analysis with “PyExtremes”, so I’d say there should be some combination of parameters allowing also a Bayesian analysis. Any help would be very much appreciated. Thx.

Not knowing much about this type of analysis, I would begin with a prior predictive check. Doing so suggests that your priors (and model) are generating some values that are much, much larger than your observed data (10-15 orders of magnitude larger). This suggests to me that the GEV is not a great fit to your data, at least with the specific priors you have select. I fiddled a tiny bit with xi, cranking it’s prior down to zero with relatively tight sigma, and that seemed to eliminate the divergences. No clue whether this is appropriate, but it’s more evidence that your divergences may have to do with your priors.

1 Like

Thanks, that seems to solve it. I changed xi to

ξ = pm.Normal("ξ", mu=0, sigma=0.01)

and it makes sense (if xi is close to 0 we have a Gumbel type of extreme value distribution). Could you please tell what made you think that the xi could be the key parameter to change to solve the problem?

1 Like

I saw a loc, a scale, and a shape parameter. Give my experience with other 3-parameter distributions, I figured the shape parameter might be causing the insanely long right tail (i.e., I assumed that the scale parameter controlled something a bit more symmetric). So dumb luck basically.

1 Like