How to understand parameter uncertainty and HPD?

Jiangchao1 · April 4, 2019, 4:10am

Hello everyone,
I built a nonlinear regression model Q=a(H-b)^c to fit the observed data (x,y). This model assumes that the observation of H is equal to the true value while the observation of Q is not. I’ve simulated the results. I’m in the process of visualizing the posterior distribution using pm.hpd. I am not sure if I understand and use this function correctly. My code is as follows:

data = data.sort_values('Hgauged')
Hgauged = np.array(data['Hgauged'])
Qgauged = np.array(data['Qgauged'])
sigma_Qmeasurement = data['Qgauged']*(data['uQ[%]']/100)/1.96   
    
#Model description
with pm.Model() as model:           
    #Elicitation of hydraulic priors
    hydraulic_param_a = pm.Normal('hydraulic_param_a', mu=82.16, sd=22.46)
    hydraulic_param_b = pm.Normal('hydraulic_param_b', mu=-0.5, sd=0.25)
    hydraulic_param_c = pm.Normal('hydraulic_param_c', mu=1.67, sd=0.025)
    error_param_gamma = pm.HalfNormal('error_param_gamma', 10, shape=2)
    #Likelihood (total error = structure error plus measurement error)
    Qratingcurve = pm.Deterministic('Qratingcurve', hydraulic_param_a*(Hgauged - hydraulic_param_b)**hydraulic_param_c)
    sigma_Fratingcurve = error_param_gamma[0] + error_param_gamma[1]*Qratingcurve
    sigma_total = np.sqrt(sigma_Fratingcurve**2 + sigma_Qmeasurement**2)
    Q_pred = pm.Normal('Q_pred', mu=Qratingcurve, sd=sigma_total, observed=Qgauged)
    #Inference
    step = pm.NUTS()
    trace_model = pm.sample(20000, step=step, cores=1)  
    
#Diagnostic sampling
    burnin = 10000
    chain_model = trace_model[burnin:]
    pm.traceplot(chain_model, varnames=[hydraulic_param_a, hydraulic_param_b, hydraulic_param_c, error_param_gamma])    
    plt.savefig('G:/Python/Ratingcurve/traceplot', dpi=600)
    
#save trace to a dataframe
    df_trace_chain_model = pm.trace_to_dataframe(chain_model)

#draw a piar figure
    sns.pairplot(df_trace_chain_model, vars = ["hydraulic_param_a","hydraulic_param_b","hydraulic_param_c"], diag_kind="kde")
    plt.savefig('G:/Python/Ratingcurve/pairfigure', dpi=600)
    
#posterior visualization
    plt.plot(Hgauged, Qgauged, 'b.')
    hydraulic_param_a_m = chain_model['hydraulic_param_a'].mean()
    hydraulic_param_b_m = chain_model['hydraulic_param_b'].mean()
    hydraulic_param_c_m = chain_model['hydraulic_param_c'].mean()
    H = list(np.arange(0,8,0.1))
    Qmaxpost = hydraulic_param_a_m*(H - hydraulic_param_b_m)**hydraulic_param_c_m   
    plt.plot(H, Qmaxpost, 'k', linewidth=1.5)
    text = r'$Q={0:.2f}×(H+{1:.2f})^{2:.2f}$'.format(hydraulic_param_a_m, -hydraulic_param_b_m, hydraulic_param_c_m)
    plt.text(0, 1200, text, size=12)
    plt.xlabel('$H[m]$', fontsize=12) 
    plt.ylabel('$Q[m^3/s]$', fontsize=12)
    plt.title('Posterior rating curve - ceshi')
   
    idx = np.argsort(Hgauged)
    Hgauged_ord = Hgauged[idx]
    sig = pm.hpd(chain_model['Qratingcurve'], alpha=0.05)[idx]
    plt.fill_between(Hgauged_ord, sig[:,0], sig[:,1], color='gray')

The result obtained is shown in the following figure:
ratingcurve

The first question: I used pm.Deterministic to define the Qratingcurve which is the “mu” of my likelihood, and in function of pm.hpd(chain_model[‘Qratingcurve’], alpha=0.05)[idx], the data is “Qratingcurve”. Can I interpret the 95% uncertainty of the hpd calculation as a result of the parameter being a random variable, say “parameter uncertainty” ? But look at my likelihood: Q_pred = pm.Normal(‘Q_pred’, mu=Qratingcurve, sd=sigma_total, observed=Qgauged), assuming the observed data follow the normal distribution, the uncertainty is sd=sigma_total. Now, I don’t quit understand the meaning of uncertainty and how to express it using pm.hpd or other methods.

The second question: pm.hpd use the [idx], which has 19 points in my case, and I’ve learned that the function of plt.fill_between simply interpolate linearly between adjacent points. I want make 95% uncertainty envelope smoother, and let [idx]=np.arange(0,8,0.1), but I can not do this successfully. I want to know how this works.

The third question: I used plt.text to show the maxpost power function in my figure, but the exponent 1.66 shows a problem. I really don’t know how to solve it.

Please forgive me for my foolish question. I sincerely hope to be helped. Thank you very much.

Gon_F · April 4, 2019, 5:20pm

The other guys here can elucidate the finer points of Pymc3 better than I will, but, as for the theoretical topic of HPD’s, I can confidently point you to McElreath’s book, which explains them in depth one chapter.

Topic		Replies	Views
Plot of mean and HDI is really off Questions	4	1757	October 20, 2020
Argument for pm.plot_hdi() v5 arviz	2	41	August 12, 2024
Highest Density Regions (HDR) misunderstanding Questions	9	2900	October 5, 2019
The 95% interval of PyMC3 trace doesn't cove the real values Questions	8	1837	July 21, 2017
Visualizing highest posterior density for multiple conditions using arviz plot_hpd Questions	7	3156	November 8, 2021

How to understand parameter uncertainty and HPD?

Related topics