Sampling Error: GEV PyMC3

I am modeling Generalised Extreme Value Distribution in PYMC3 but I keep on getting SamplingError: Initial evaluation of model at starting point failed! I am implementing my own distribution. The logp equation is the PDF of GEV. I took the inital values for the shape, location and scale parameters from the best guess by scipystats.

with pm.Model() as model:
#priors
shape =  pm.Normal('shape', mu=shape1, sd=1.8*shape1)
loc = pm.Normal('loc', mu=loc1, sd=1.8*loc1)
scale = pm.Normal('scale', mu=scale1, sd=1.8*scale1)
def logp(value):

    if (shape == 0):
        lp=(np.log(1/scale) + (shape+1) * (-(value-loc/scale))- (-(value-loc/scale))).sum()
    else:
        lp=np.log(1/scale) + ((shape+1)*((-1/shape)*np.log(1+ shape*(value-loc/scale))))-       ((-1/shape)*np.log(1+shape*(value-loc/scale))).sum()
    return lp

gev = pm.DensityDist('gev', logp, observed = {'value':np.array(datin.data1)})
trace = pm.sample(3000, tune=500)

I am getting the following error:

SamplingError: Initial evaluation of model at starting point failed!
  
Starting values:

{'shape': array(-0.07499058), 'loc': array(14.728077), 'scale': array(5.38682189)}

Initial evaluation results:
shape    -inf
loc     -4.20
scale   -3.19
gev       NaN
Name: Log-probability of test_point, dtype: float64

It looks like someone has some working code for a model using the GEV distribution here. Does sampling work if you use this version + your data?

Hi,
Thanks for the reply. I tried this version but it doesnā€™t work. It returns error messages
TypeError: must be real number, not FreeRV

Can you share your code thatā€™s giving you this error?

Also, if youā€™re able to share your data datain that would be useful for debugging.

I am using the exact code you shared. I have tried this in the past with little success. Here is my data
data.txt (203 Bytes)

I think the problem is coming from your priors, you will need to make sure they make sense (Iā€™m not very familiar with the GEV distribution so Iā€™m not of much help). When I was messing with the priors I got the same

SamplingError: Initial evaluation of model at starting point failed!

issue you initially got. But here is a Collab notebook where it seems to be working. Just to note, I was getting a huge number of divergences when I ran inference initially so I set target_accept=0.99. You may want to revisit why these divergences are happening after you sort out your priors.

1 Like

Many thanks!
I agree, there is definitely some problem with my priors. I will try to play with different set of priors and see if it converges.

The GEV shape parameter rarely goes outside +/- 0.3 in practical applications. Itā€™s hard to know what your prior is doing - what is shape1?

Hi,
just wanted to update that I corrected my model by reducing the standard deviation and giving informed priors as calculated by the maximum likelihood. It looks like the model converges well with this set-up. I am attaching the trace plot. Could you please have a look at it if it looks decent. Also, Since I am new to PYMC3, I am confused about the posterior predictive analysis for this model. I tried to search for it and found some code , but it didnā€™t work out very well.
Many Thanks!

If youā€™re interested in the distribution of the prediction at a 10 year return level for example, this is what you could do:

p = 1/10  # probability of exceedance
with pm.Model() as model_gev:
    # Priors
    Ī¼ = pm.Normal('Ī¼', mu=3, sigma=5)
    Ļƒ =pm.Normal('Ļƒ', mu=0.24, sigma=5)
    Ī¾ = pm.Normal('Ī¾', mu=0, sigma=1)
    
    # Estimation
    gev = pm.DensityDist('gev', gev_logp, observed = {'value':data, 'loc':Ī¼, 'scale':Ļƒ, 'xi':Ī¾})
    # Return level
    z_p = pm.Deterministic('z_p', Ī¼ - Ļƒ/Ī¾*(1 - (-np.log(1-p))**(-Ī¾)) )

    trace = pm.sample(2000, cores=4, chains=4, tune=2000, return_inferencedata=True, 
                      idata_kwargs={"density_dist_obs": False}, start={'Ī¼': 0.0, 'Ļƒ': 1.0, 'Ī¾': 0.1}, target_accept=0.99)

Note the high target acceptance probability that is needed due to divergences.

As an aside, the Port Pirie data of Stuart Colesā€™ book is a useful benchmark for your implementation:

# Port Pirie data from Coles, "An Introduction to Statistical Modelling of Extreme Values" 
data = np.array([4.03, 3.83, 3.65, 3.88, 4.01, 4.08, 4.18, 3.8, 
                 4.36, 3.96, 3.98, 4.69, 3.85, 3.96, 3.85, 3.93, 
                 3.75, 3.63, 3.57, 4.25, 3.97, 4.05, 4.24, 4.22, 
                 3.73, 4.37, 4.06, 3.71, 3.96, 4.06, 4.55, 3.79, 
                 3.89, 4.11, 3.85, 3.86, 3.86, 4.21, 4.01, 4.11, 
                 4.24, 3.96, 4.21, 3.74, 3.85, 3.88, 3.66, 4.11, 
                 3.71, 4.18, 3.9, 3.78, 3.91, 3.72, 4, 3.66, 3.62, 
                 4.33, 4.55, 3.75, 4.08, 3.9, 3.88, 3.94, 4.33])
1 Like

@ccaprani Thanks, this is helpful. I was calculating the return levels with numpy. I updated the priors as suggested by Coles to non-informative with large variances. I tried the beta prior with alpha = 6 and beta =9 as well as suggested by . Huard D, Mailhot A, Duchesne S. Bayesian estimation of intensity-duration- frequency curves and of the return period associated to a given rainfall event. Stochastic Environ Res Risk Assess 2010;2(D3):337ā€“47. http://dx.doi.org/ 10.1007/s00477-009-0323-1.
However, I am getting a larger value of the shape parameter compared to the Maximum Likelihood which is reflected in the return level distribution with large uncertainty interval. Also, the return level is conditional on the shape parameter. How can I incorporate that in the pm.Deterministic function?

For example for a 50 year prediction (p=1/50), here is the 95% posterior density for z_p . Clearly, the distribution is heavy-tailed.

Hi @Dawar-812, a few comments on this:

  • that result for the predictand is out by an order of magnitude - did you scale the data beforehand? The 50-year RL should be about 4.5m.
  • the uncertainty in all the parameters is already taken into account in the RL. Although pm.Deterministic is a deterministic function, its inputs are the sampled values of the chain, and so the ensemble of its outputs represents its distribution taking account of both parameter and underlying uncertainty.
  • Iā€™m not convinced the beta distribution on the shape parameter is a good idea without knowing more about the phenomenon: the shape parameter can (in theory) be positive or negative and is usually bounded between +0.5 to -0.5. The prior from that paper, which may be suitable for rainfall intensity, puts most probability mass between +0.2 and +0.6 with nothing negative.
  • Be wary as well that Coles uses one expression of the GEV where a negative shape parameter is Weibull-tailed, while the Scipy and some other implementations are reversed.
  • Are you getting any divergences with your settings? And what is your target acceptance ratio? Those priors I had in the previous code are not very good in fairness!

@ccaprani Thanks for the inputs. Forgot to mention the data used is not from Port Pirie,but rather my own data data.txt (203 Bytes).
I stacked the trace and then plotted the 95% HDI. I agree, based on extensive studies, the shape prior usually lies between -0.5 to 0.5. The choice of Beta prior was also suggested by Martins ES, Stedinger JR. Historical information in a generalized maximum likelihood framework with partial duration and annual maximum series. Water Resour Res 2001;37(10):2559ā€“68. http://dx.doi.org/10.1029/2000WR000009.
As, i am working with a rainfall series, I based my decision of Beta prior just referring to the above papers. Choosing a normally distributed shape parameter gives me 95% HDI [-012, 0.32] with mean 0.09, which is somewhat close to the MLE value of -0.075 while with Beta prior,I get [0.12, 0.48] and mean (0.30) for this data set. I was getting a lot of divergences. I set the target accept to 0.99 and now get no divergences or sometimes may be 3-5.

@Dawar-812 Iā€™ve done a bit more on this today, and certainly the result is incredibly sensitive to the priors - especially on the shape parameter of course. This isnā€™t really surprising though, considering the intended behaviour of the GEV itself. Anyway, because of this sensitivity, it makes it all the more important to use informative priors and to do prior predictive checking. It could also help to center the data.

For the Port Pirie data, I refined my priors using prior predictive checking:

    Ī¼ = pm.Normal('Ī¼', mu=3.8, sigma=0.2)
    Ļƒ = pm.HalfNormal('Ļƒ', sigma=0.3)
    Ī¾ = pm.TruncatedNormal('Ī¾', mu=0, sigma=0.2, lower=-0.6, upper=0.6)

and with some abuse of Arviz back end code (is there a better way or cleaner API for this @aloctavodia ?), I could wrangle a comparison with the frequentist (gasp!) MLE approach - the results are given in Colesā€™ book for comparison (itā€™s good):

_,vals = az.sel_utils.xarray_to_ndarray(trace['posterior'],var_names=['Ī¼', 'Ļƒ', 'Ī¾'])
mle = [azpu.calculate_point_estimate('mode',val) for val in vals]
mle

[3.8669419667301725, 0.19976120556684945, -0.042758936933144004]

trace['posterior'].drop_vars('z_p').to_dataframe().cov().round(6)

	Ī¼ 	Ļƒ 	Ī¾
Ī¼ 	0.000799 	0.000178 	-0.000805
Ļƒ 	0.000178 	0.000455 	-0.000601
Ī¾ 	-0.000805 	-0.000601 	0.008192

Anyway, hope to finish out the GEV development for a v4 PR soon!

1 Like

@ccaprani Thanks, it would certainly make things much simpler if you incorporate GEV in Pymc3.
Is there a way I could get a similar figure for posterior predictive distribution with the inbuilt function for a gev model?
Here is what I did for validating the model. I used scipy stats inbuilt gev function to sample from the posterior trace. Similarly, I used MLE point estimates (shape1, loc1, scale1) from scipy stas and generated equal number of samples for comparing the two.

sample_number = 32000
shape_posterior = stacked.xi.values[:32000]
loc_posterior = stacked.loca.values[:32000]
scale_posterior = stacked.sig.values[:32000]
p_bayes = np.zeros(sample_number)
for i in range(sample_number):   
p_bayes[i]=genextreme.rvs(c=shape_posterior[i],loc=loc_posterior[i], scale=scale_posterior[i], size =1)
q_mle =  genextreme.rvs(c= shape1,loc=loc1, scale=scale1, size=32000)
_= sns.distplot(p_bayes, hist=False, color='red',kde_kws = {'shade': True, 'linewidth': 1})
_=sns.distplot(q_mle, hist=False, color='black',kde_kws = {'shade': True, 'linewidth': 1})

@Dawar-812 Iā€™m pleased to have completed the work now, and have submitted PRs for pymc v4, including an example worked out against Coles.

It would be great if you could examine the code and example and leave some comments on the PRs if you like? Would be also great if you could pull the gev branches and see how they run for your problems and data.

Please let me know of any problems!

@Dawar-812 FYI:

1 Like

@ccaprani Many thanks. Couldnā€™t get any better! For the scale parameter, a normal prior does a decent job as well. However, for my particular data set, I updated the shape parameter to Truncated Normal [-0.5, 0.5], and itā€™s giving some interesting results compared to Normal and Beta priors. I agree with your previous argument, the Beta prior might not be a good choice if we donā€™t have solid site-specific knowledge or knowledge about the characteristic phenomena.

Thanks for nice feedback @Dawar-812! Glad to get this finally done, and getting some good input from the pymc experts in the review too.

I suppose if you do already have a strong prior information on the tail behaviour, it could be better to fit the Frechet, Gumbel, or Weibull directly. I think the benefit of GEV is for those exploratory situations where there may not be a clear phenomenological reason for preferring one over another, and in this case, the prior should probably reflect that by being agnostic on the domain of attraction (i.e. centered at zero).

In any case, getting this into pymc, and seeing how itā€™s used in the wild, and what issues are thrown up, will be interesting. And Iā€™m a little hopeful that the divergences issue might get looked at by one of the HMC experts around here - perhaps there is a transformation that could avoid the ā€œfunnel of doomā€ - and that would be real progress in estimation for extremes.

@ccaprani I really appreciate your work.

I agree with the GEV choice. The choice should be subjective on the intended application. We can expect Gumbel to perform better in some cases, while in other cases Frechet or even Weibull is appropriate. Theoretically, the estimated shape parameter of a ļ¬tted GEV distribution reveals
which one of the three distributions performs better, as all of them emerge for speciļ¬c values of the shape parameter.

1 Like