Difference between 'jitter+adapt_diag' and 'adapt_diag'?

Jiangchao1 · April 2, 2019, 8:00am

hello everyone,
I am a beginner at Pymc3. I built a nonlinear regression model y=a(x-b)^c to fit the observed data (x,y). This model assumes that the observation of x is equal to the true value while the observation of y is not. I have constructed prior distribution of 5 parameters and the likelihood using my expertise. I found a strange phenomenon in my simulation and I don’t know why. Hoping to get an explanation. My code is as follows:

data = data.sort_values('Hgauged')
Hgauged = np.array(data['Hgauged'])
Qgauged = np.array(data['Qgauged'])
sigma_Qmeasurement = data['Qgauged']*(data['uQ[%]']/100)/1.96
with pm.Model() as model:
    hydraulic_param_a = pm.Normal('hydraulic_param_a', mu=82.16, sd=22.46)
    hydraulic_param_b = pm.Normal('hydraulic_param_b', mu=-0.5, sd=0.25)
    hydraulic_param_c = pm.Normal('hydraulic_param_c', mu=1.67, sd=0.025)
    error_param_gamma = pm.Uniform('error_param_gamma', 0, 10000, shape=2)
    Qratingcurve = pm.Deterministic('Qratingcurve', hydraulic_param_a*(Hgauged - hydraulic_param_b)**hydraulic_param_c)
    sigma_Fratingcurve = error_param_gamma[0] + error_param_gamma[1]*Qratingcurve
    sigma_total = np.sqrt(sigma_Fratingcurve**2 + sigma_Qmeasurement**2)
    Q_pred2 = pm.Normal('Q_pred2', mu=Qratingcurve, sd=sigma_total, observed=Qgauged)
    step = pm.NUTS()
    trace=pm.sample(20000, step=step, cores=1)

At first, I use the default init (init=‘jitter+adapt_diag’) like above，the iteration is very slow at first chain( more than 15 mins) in 1 jobs, while the second chain is obviously faster (about 3 mins). But the traceplot is like this:

Then, I choose the init=‘adapt_diag’, the calculating speed is up to about 3 mins for one chain. The traceplot is like this:

But when I shut down my Spyder(Python 3.7), and turned it on again to run my model with init=‘adapt_diag’, the calculation speed again became very slow, and the result was the same as the first picture.
I repeat the above process several times, and get the same traceplot ficture.
I want to know the difference between ‘jitter+adapt_diag’ and ‘adapt_diag’, and how to set the parameters of pm.sample() to get the satisfied results like the second figure and the iteration speed can also be fast. Thank you very much!

junpenglao · April 2, 2019, 8:12am

jitter+adapt_diag makes a jitter on the starting value, so that different chain started at a different point in the parameter space. It is the recommended option, as it helps you discover bad parameterization. For example, in your case there are indication that some parameter space is difficult to sample from - narrow geometry etc - which makes some chain very slow (as it takes smaller and larger quantities of leapfrog steps).
So unless you have some edge cases where jitter+adapt_diag makes the staring value invalid (nan or inf logp), you should always use jitter+adapt_diag.

In your code, the likely problem is error_param_gamma = pm.Uniform('error_param_gamma', 0, 10000, shape=2) - HalfNormal is usually a much better choice of prior for sigma.

Jiangchao1 · April 2, 2019, 8:47am

thanks junpeng very much! I solved the problem using a HalfNormal distribution to describe the prior of my error parameters. the iteration speed is also good, thanks very much!

Topic		Replies	Views
Initialization energy is NaN or Inf with jitter Questions	4	1315	December 9, 2020
What exactly is `jitter+adapt_diag` and why is it the default now? Questions	3	3376	November 3, 2017
QuadPotentialFullAdapt v5	7	387	July 2, 2022
Change distribution of jitter v5 sampling	16	347	July 11, 2023
Derivatives are zero for jitter but not ADVI initialization Questions	2	527	September 28, 2018

Difference between 'jitter+adapt_diag' and 'adapt_diag'?

Related topics