Ok, well I’m a sucker for a problem, so here goes.
import numpy as np
import pandas as pd
import pymc as pm
import pymc_experimental.distributions as pmx  # provides GenExtreme
import arviz as az
import matplotlib.pyplot as plt

X = np.linspace(1, 276, 276)
X = X.astype(int)
df = pd.read_csv('Datapy.csv')
with pm.Model() as model:
    # Priors
    alpha = pm.Normal('alpha', 20, 5)
    # alpha = alpha.astype(int)
    beta = pm.Normal('beta', 0, 1)
    # beta = beta.astype(int)
    gam = pm.Normal('gam', mu=0.0, sigma=0.1)
    # gam = gam.astype(int)
    σ = pm.HalfNormal("σ", sigma=10)
    ξ = pm.Normal("ξ", mu=0.0, sigma=0.1)
    # Expected value of outcome
    # μ = pm.Normal("μ", mu=20, sigma=5)
    μ = alpha + beta*X + gam*X**2
    # Estimation
    gev = pmx.GenExtreme("gev", mu=μ, sigma=σ, xi=ξ, observed=df.r)
idata = pm.sample_prior_predictive(samples=1000, model=model)
az.plot_ppc(idata, group="prior", figsize=(8, 3))
ax = plt.gca()
ax.set_xlim([-50, 200])
ax.set_ylim([0, 0.04]);
az.plot_posterior(idata, group="prior", var_names=["alpha","beta","gam","σ","ξ"], hdi_prob="hide", point_estimate=None, figsize=(8, 3));
with model:
    trace = pm.sample(
        500,
        cores=4,
        chains=4,
        tune=2000,
        initvals={'alpha': 22, 'beta': -0.01, "gam": 0.0, "σ": 15.0, "ξ": 0.2},
        # initvals={'alpha': 0.92, 'beta': -2.5, "gam": 0.9, "μ": 50, "σ": 3.0, "ξ": -0.1},
        target_accept=0.9,
    )
idata.extend(trace)
az.plot_trace(idata, var_names=["alpha", "beta", "gam", "σ", "ξ"], figsize=(12, 16));
The main thing was to choose more reasonable priors. Building up from a single GEV distribution fit showed that X wasn't so influential: beta is small and gam is close to zero. Also, target_accept should be < 1.0, and it's good to let the sampler select its own starting point, at least initially.

Compare the prior predictive check for the case with no linear dependence against the one for the quadratic dependence with these much tighter priors. It gives a good idea of how important the priors are here.
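For reference, the no-dependence case is just a single GEV fit with a constant location. A minimal sketch of that baseline and its prior predictive (assuming the same df.r data, with priors in the same spirit as above) would be:

```python
# Baseline: a single GEV fit with constant location, no X dependence.
# Prior choices here mirror the quadratic model above; adjust to your data.
with pm.Model() as model_const:
    μ = pm.Normal("μ", mu=20, sigma=5)
    σ = pm.HalfNormal("σ", sigma=10)
    ξ = pm.Normal("ξ", mu=0.0, sigma=0.1)
    gev = pmx.GenExtreme("gev", mu=μ, sigma=σ, xi=ξ, observed=df.r)

idata_const = pm.sample_prior_predictive(samples=1000, model=model_const)
az.plot_ppc(idata_const, group="prior", figsize=(8, 3));
```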
No linear dependence (7 secs): samples great.
Linear dependence (8 secs): needs a sensible prior for alpha and beta.
Quadratic dependence (37 secs): samples badly - there can be lots of divergences, even with tight priors for gam, reasonable initvals and a higher target_accept (see the quick divergence check below).
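A quick way to see how badly the quadratic case samples is to count the divergences and look at where they fall in parameter space, e.g. with ArviZ on the idata from above:

```python
# Count divergent transitions in the sampled trace.
n_div = int(idata.sample_stats["diverging"].sum())
print(f"divergences: {n_div}")

# Pairs plot with divergent draws highlighted; they tend to cluster
# where the posterior geometry is difficult.
az.plot_pair(idata, var_names=["beta", "gam"], divergences=True, figsize=(6, 6));
```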
With integers
Turning even just alpha into an astype(int) causes the sampling to go nuts (haha!) - it takes about 450 seconds and looks just awful. Do you really need these parameters to be ints?
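If you really do need integer values downstream, one option is to keep the sampled variables continuous (so NUTS can use gradients) and record a rounded copy as a Deterministic, rather than casting the random variables themselves. A sketch only, assuming a recent PyMC built on PyTensor (older versions would use aesara.tensor instead):

```python
import pytensor.tensor as pt

with pm.Model() as model_rounded:
    # Sample alpha on the continuous scale so the sampler stays well-behaved...
    alpha = pm.Normal("alpha", 20, 5)
    # ...and keep a rounded copy for reporting; it does not enter the likelihood.
    alpha_int = pm.Deterministic("alpha_int", pt.round(alpha))
```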