Hello everyone,
I built a nonlinear regression model Q=a(H-b)^c to fit the observed data (x,y). This model assumes that the observation of H is equal to the true value while the observation of Q is not. I’ve simulated the results. I’m in the process of visualizing the posterior distribution using pm.hpd. I am not sure if I understand and use this function correctly. My code is as follows:
data = data.sort_values('Hgauged')
Hgauged = np.array(data['Hgauged'])
Qgauged = np.array(data['Qgauged'])
sigma_Qmeasurement = data['Qgauged']*(data['uQ[%]']/100)/1.96
#Model description
with pm.Model() as model:
#Elicitation of hydraulic priors
hydraulic_param_a = pm.Normal('hydraulic_param_a', mu=82.16, sd=22.46)
hydraulic_param_b = pm.Normal('hydraulic_param_b', mu=-0.5, sd=0.25)
hydraulic_param_c = pm.Normal('hydraulic_param_c', mu=1.67, sd=0.025)
error_param_gamma = pm.HalfNormal('error_param_gamma', 10, shape=2)
#Likelihood (total error = structure error plus measurement error)
Qratingcurve = pm.Deterministic('Qratingcurve', hydraulic_param_a*(Hgauged - hydraulic_param_b)**hydraulic_param_c)
sigma_Fratingcurve = error_param_gamma[0] + error_param_gamma[1]*Qratingcurve
sigma_total = np.sqrt(sigma_Fratingcurve**2 + sigma_Qmeasurement**2)
Q_pred = pm.Normal('Q_pred', mu=Qratingcurve, sd=sigma_total, observed=Qgauged)
#Inference
step = pm.NUTS()
trace_model = pm.sample(20000, step=step, cores=1)
#Diagnostic sampling
burnin = 10000
chain_model = trace_model[burnin:]
pm.traceplot(chain_model, varnames=[hydraulic_param_a, hydraulic_param_b, hydraulic_param_c, error_param_gamma])
plt.savefig('G:/Python/Ratingcurve/traceplot', dpi=600)
#save trace to a dataframe
df_trace_chain_model = pm.trace_to_dataframe(chain_model)
#draw a piar figure
sns.pairplot(df_trace_chain_model, vars = ["hydraulic_param_a","hydraulic_param_b","hydraulic_param_c"], diag_kind="kde")
plt.savefig('G:/Python/Ratingcurve/pairfigure', dpi=600)
#posterior visualization
plt.plot(Hgauged, Qgauged, 'b.')
hydraulic_param_a_m = chain_model['hydraulic_param_a'].mean()
hydraulic_param_b_m = chain_model['hydraulic_param_b'].mean()
hydraulic_param_c_m = chain_model['hydraulic_param_c'].mean()
H = list(np.arange(0,8,0.1))
Qmaxpost = hydraulic_param_a_m*(H - hydraulic_param_b_m)**hydraulic_param_c_m
plt.plot(H, Qmaxpost, 'k', linewidth=1.5)
text = r'$Q={0:.2f}×(H+{1:.2f})^{2:.2f}$'.format(hydraulic_param_a_m, -hydraulic_param_b_m, hydraulic_param_c_m)
plt.text(0, 1200, text, size=12)
plt.xlabel('$H[m]$', fontsize=12)
plt.ylabel('$Q[m^3/s]$', fontsize=12)
plt.title('Posterior rating curve - ceshi')
idx = np.argsort(Hgauged)
Hgauged_ord = Hgauged[idx]
sig = pm.hpd(chain_model['Qratingcurve'], alpha=0.05)[idx]
plt.fill_between(Hgauged_ord, sig[:,0], sig[:,1], color='gray')
The result obtained is shown in the following figure:
The first question: I used pm.Deterministic to define the Qratingcurve which is the “mu” of my likelihood, and in function of pm.hpd(chain_model[‘Qratingcurve’], alpha=0.05)[idx], the data is “Qratingcurve”. Can I interpret the 95% uncertainty of the hpd calculation as a result of the parameter being a random variable, say “parameter uncertainty” ? But look at my likelihood: Q_pred = pm.Normal(‘Q_pred’, mu=Qratingcurve, sd=sigma_total, observed=Qgauged), assuming the observed data follow the normal distribution, the uncertainty is sd=sigma_total. Now, I don’t quit understand the meaning of uncertainty and how to express it using pm.hpd or other methods.
The second question: pm.hpd use the [idx], which has 19 points in my case, and I’ve learned that the function of plt.fill_between simply interpolate linearly between adjacent points. I want make 95% uncertainty envelope smoother, and let [idx]=np.arange(0,8,0.1), but I can not do this successfully. I want to know how this works.
The third question: I used plt.text to show the maxpost power function in my figure, but the exponent 1.66 shows a problem. I really don’t know how to solve it.
Please forgive me for my foolish question. I sincerely hope to be helped. Thank you very much.