Apparent pathological behavior

I’ve had the model below running well on a slightly different problem: recovering the hydraulic conductivity field, where the domain is on the surface and observations are made over the entire domain. For this application, recovering the electrical conductivity field, where the domain extends into the ground and observations are made only at the surface, the target distribution appears to exhibit pathological behavior.

import numpy as npy
import pymc3 as pm

with pm.Model() as ERTModel:
    # Note: np is the number of basis coefficients (numpy is imported as npy)
    sigma = pm.HalfNormal('sigma', sd=10)  # observation noise scale
    pr_vec = pm.Normal('pr_vec', mu=0, sd=1., shape=np, testval=npy.ones(np))
    yr = pm.Deterministic('yr', reducedOp(pr_vec))  # reduced-order PDE forward model
    pm.Normal('y_rec', mu=yr, sd=sigma, observed=y_obs)  # likelihood
    trace = pm.sample(cores=2)

Running the above model with 1000 draws and 1000 tuning samples over two cores, I barely get one effective sample per parameter in pr_vec. The model returns the following warnings:

The chain reached maximum tree depth. Increase max_treedepth, increase target_accept or reparametrize.
The Gelman-Rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
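As a sanity check (a minimal sketch; 'depth' is a standard PyMC3 NUTS sampler statistic and 10 is the default max_treedepth), the warnings can be verified directly from the trace:

import numpy as npy
import pymc3 as pm

# Fraction of draws that saturated the tree depth
depth = trace.get_sampler_stats('depth')
print('draws at max tree depth: %.1f%%' % (100 * npy.mean(depth >= 10)))

# Per-parameter Gelman-Rubin statistic; values well above 1 indicate non-convergence
print(pm.gelman_rubin(trace))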

The following model was my attempt to reparametrise the problem. With this formulation, after only 400 draws the Markov transition begins to struggle to explore a region of the typical set, and the sampling speed drops drastically from about 2 s/draw to 70 s/draw or worse.

with pm.Model() as ERTModel:
    prmu = pm.Normal('prmu', mu=0., sd=1., shape=np)  # hyperprior on the coefficient means
    pr_vec = pm.Normal('pr_vec', mu=prmu, sd=1., shape=np)
    tau = pm.HalfCauchy('tau', beta=5.)  # scale of the model-discrepancy term
    yr_tilde = pm.Normal('yr_tilde', mu=0, sd=1, shape=Ns*(Ne-1))  # non-centred discrepancy
    yr = pm.Deterministic('yr', reducedOp(pr_vec) + tau * yr_tilde)
    sigma = pm.HalfNormal('sigma', sd=10, shape=Ns*(Ne-1))  # per-observation noise
    y_rec = pm.Normal('y_rec', mu=yr, sd=sigma, observed=y_obs)
    trace = pm.sample(150, tune=50, cores=2,
                      nuts_kwargs=dict(target_accept=.99, max_treedepth=15))

Below is the output from running the model with 400 draws (with a bigger sample size, the problem becomes intractable).

nick@u64:/media/sf_thesis/model$ run-escript reducedERTMCMC2.py
Electrical Resistivity Tomography Problem!
Number of spatial dimensions:        2
Domain: L x H x W                    3.00 x 1.00 x 2.00
Number of elements:                  [24, 8]
Number of Gaussian points:           [30 10]
Number of electrodes:                5
Number of sources:                   2
Electrode spacing:                   0.25
Node spacing:                        0.12
Electrical conductivity constant:    1.0E-02 S/m
Electrical resistivity constant:     100 Ohm.m
Injection current:                   0.100 A
Smoothness parameter, B:             0.70
H1 norm parameter, mu:               [10.    0.01]
Time stamp: 1532525473
Restart from ./solERT_0000
Restart from ./basesERT_0000
ysig = 0.000000
Checking gradient function...
Gradient function ok!
Only 150 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sigma, yr_tilde, tau, pr_vec, prmu]
Sampling 2 chains: 100%|████████████████████████| 400/400 [2:15:44<00:00, 43.16s/draws]
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.7360035766265859, but should be close to 0.99. Try to increase the number of tuning steps.
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.7897417586183715, but should be close to 0.99. Try to increase the number of tuning steps.
Number of Divergent samples 2
Solution calculations = 492, Gradient calculations = 12
pNorm = 1.000000, recovered-pNorm = 0.483385, std. dev norm = 3.374906
Output error = 0.153559, Parameter error = 0.229116
                 mean        sd  mc_error    ...      hpd_97.5       n_eff      Rhat
prmu__0      0.180292  0.839988  0.050420    ...      1.781476  306.049378  0.998777
prmu__1      0.031708  0.924706  0.058839    ...      1.854453  194.467061  1.004110
prmu__2      0.009131  1.005191  0.063116    ...      1.733507  176.327664  0.998862
prmu__3     -0.012624  0.889399  0.056414    ...      1.707618  185.598000  0.998121
prmu__4      0.008695  0.950955  0.057348    ...      1.650516  220.775217  0.997834
prmu__5      0.000922  0.995495  0.066238    ...      1.958040  149.487278  0.997922
prmu__6     -0.008213  0.950615  0.056124    ...      2.080760  238.562680  0.996699
sigma__0     2.870321  4.231095  0.218610    ...     12.911517  375.645866  0.996663
sigma__1     2.969281  4.225232  0.254199    ...    12.396289  138.982359  0.997647
sigma__2     2.559276  3.400854  0.198133    ...      9.269213  297.359816  0.996678
sigma__3     2.799366  4.020545  0.232845    ...     11.325008  355.386072  0.999065
sigma__4     2.890020  3.647479  0.235467    ...     11.103303  251.418386  0.998504
sigma__5     2.899661  3.784613  0.255042    ...     12.115214  180.291617  1.004820
sigma__6     2.740849  3.897345  0.245689    ...     11.095874  232.092121  0.996708
sigma__7     2.787906  3.648303  0.227267    ...     10.680208  260.942249  1.003896
pr_vec__0    0.464277  0.911246  0.059047    ...      1.965372  242.979151  0.997126
pr_vec__1    0.038997  1.237537  0.079859    ...      2.300441  127.622812  1.006685
pr_vec__2    0.108738  1.390851  0.092525    ...      2.834018  164.573090  1.004918
pr_vec__3    0.062204  1.325542  0.092184    ...      2.351675  161.703008  0.998508
pr_vec__4    0.025080  1.316358  0.088442    ...      2.441259  132.495654  0.997234
pr_vec__5    0.007047  1.347632  0.095244    ...      2.438103  181.239661  0.999442
pr_vec__6   -0.014676  1.337043  0.088856    ...      2.841059  207.026064  1.003519
tau          0.312444  0.363522  0.023918    ...      1.020279  196.710601  0.996771
yr__0        1.300798  0.403933  0.022404    ...      1.994476  292.390998  0.997132
yr__1        1.092699  0.501547  0.032226    ...      1.991827  210.085974  1.000527
yr__2        0.951433  0.359673  0.025742    ...      1.706200  234.349421  0.997904
yr__3        0.765361  0.400890  0.026796    ...      1.471246  197.611358  0.996709
yr__4        0.734420  0.437752  0.026622    ...      1.602707  319.524726  0.997765
yr__5        0.908453  0.401811  0.021367    ...      1.747097  340.693397  1.003103
yr__6        1.131041  0.442815  0.022617    ...      1.898621  317.356550  1.005265
yr__7        1.258169  0.391110  0.021565    ...      2.019219  283.663213  0.996725
yr_tilde__0  0.105831  0.932364  0.055118    ...      1.517819  270.199084  0.998734
yr_tilde__1  0.133401  0.956375  0.060569    ...      2.037100  236.711710  1.005033
yr_tilde__2  0.055703  0.892446  0.059818    ...      1.668348  236.464323  0.999644
yr_tilde__3  0.127867  0.911451  0.060479    ...      1.993409  231.516338  0.996701
yr_tilde__4 -0.075444  0.988053  0.050927    ...      1.811082  365.292256  0.996871
yr_tilde__5  0.063939  0.910760  0.053518    ...      1.678738  339.190489  0.996708
yr_tilde__6  0.014826  0.900815  0.051356    ...      1.664139  337.205537  0.999871
yr_tilde__7  0.069488  0.928890  0.056546    ...      1.871764  211.686041  1.001550

I’m hoping someone can help me appropriately reparametrise my model, as it seems I am doing it wrong. Any assistance is greatly appreciated. I’ve attempted to run some diagnostics; I’m hoping these might shed some light.

[Attached diagnostic plots: step_size, tune, tree_size, mean_tree_accept, step_size_bar, the trace, the posterior for the parameters, and the posterior for the standard deviation of the recovered data yr.]
It’s a bit difficult to diagnose since we don’t know what the function reducedOp does, but looking at the posterior plot there is a lot of weight concentrated around 0 for sigma. I would start with a more informative prior, and change the HalfCauchy to a HalfNormal:

tau = pm.HalfNormal('tau', sd=2.5)
...
sigma = pm.HalfNormal('sigma', sd=2.5, shape=Ns*(Ne-1))
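To see why the change helps (a rough illustration with scipy, not from the original model): HalfCauchy(beta=5) puts substantial prior mass on implausibly large noise scales, which HalfNormal(sd=2.5) essentially rules out:

from scipy import stats

# Prior probability of a noise scale larger than 10
print(stats.halfcauchy(scale=5).sf(10))   # ~0.30 for HalfCauchy(beta=5)
print(stats.halfnorm(scale=2.5).sf(10))   # ~6e-5 for HalfNormal(sd=2.5)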

The function reducedOp(pr_vec) is non-trivial, unfortunately. It approximates the solution to a PDE using parameter-space reduction techniques. The pr_vec variable is a vector of coefficients for a series of parameter basis functions, which are summed to give the parameter field used as the input to the PDE model. The output vector yr is comprised of values extracted from the PDE solution at discrete points. The goal of the MCMC model is to recover the pr_vec that reproduces the measurement data y_obs.
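For readers, the structure is roughly the following (a purely hypothetical sketch: reducedOp_sketch, basis and G are illustrative stand-ins, since the real operator wraps an escript PDE solve):

import numpy as npy
import theano
import theano.tensor as tt

# Illustrative stand-ins; dimensions taken from the run output above
n_cells = 24 * 8                                      # elements in the mesh
basis = theano.shared(npy.random.randn(n_cells, 7))   # np = 7 basis functions
G = theano.shared(npy.random.randn(8, n_cells))       # Ns*(Ne-1) = 8 observations

def reducedOp_sketch(pr_vec):
    field = tt.dot(basis, pr_vec)  # weighted sum of basis functions -> parameter field
    return tt.dot(G, field)        # linear stand-in for solving the PDE and extracting
                                   # the solution at the electrode positions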

I managed to get a decent number of samples to run with your suggestions, and also by reducing target_accept back to the default 0.8. I ran 1000 tuning samples and 2000 draws per chain using 2 cores. Below is the model formulation:

with pm.Model() as ERTModel:
    prmu = pm.Normal('prmu', mu=0., sd=1., shape=np)
    pr_vec = pm.Normal('pr_vec', mu=prmu, sd=2.5, shape=np)  # parameter coefficients, not centred around zero
    tau = pm.HalfNormal('tau', sd=2.5)
    yr_tilde = pm.Normal('yr_tilde', mu=0, sd=2.5, shape=Ns*(Ne-1))
    yr = pm.Deterministic('yr', reducedOp(pr_vec) + tau * yr_tilde)
    sigma = pm.HalfNormal('sigma', sd=2.5, shape=Ns*(Ne-1))
    y_rec = pm.Normal('y_rec', mu=yr, sd=sigma, observed=y_obs)
    trace = pm.sample(2000, tune=1000, cores=2,
                      nuts_kwargs=dict(target_accept=.8, max_treedepth=10))

This model ran a lot faster (2.06 s/draw) but still has very few effective samples, and the Gelman-Rubin statistic is higher than 1.4 for some parameters. Below is the output summary from running the model:

I found the energy_error (1 below) and mean_tree_accept (2 below) plots interesting but couldn’t really make sense of them. It seems that the proportion of proposed transitions that are accepted dramatically decreases once the sampler has finished the tuning process, and I’m not sure why.

[Attached plots: 1. energy_error, 2. mean_tree_accept]
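One way to quantify that drop (a minimal sketch; it assumes the model was sampled with discard_tuned_samples=False, so the tuning draws and their 'tune' flags are still in the trace):

accept = trace.get_sampler_stats('mean_tree_accept')
tuning = trace.get_sampler_stats('tune')

print('mean acceptance during tuning: %.3f' % accept[tuning].mean())
print('mean acceptance after tuning:  %.3f' % accept[~tuning].mean())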

The step_size plot revealed a jump from 0.00055 to 0.00075 at draw 2000.

I also found it interesting that if I ran the same model for far fewer draws and tuning samples, the number of effective samples was much better. Refer to the output below:

Do you think my problem is still poor parametrisation? Or is there potentially some other issue I’m seeing?

The jump you are seeing is not a sudden change of, e.g., energy or step size, but the difference between the two chains. Basically, the different chains are exploring different regions of the parameter space, and the mixing is quite poor (hence the low number of effective samples and the Gelman-Rubin statistic above 1.4). If you look at the trace plot, the chains are likely not overlapping.
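You can check this directly by separating the chains (a minimal sketch; get_sampler_stats takes a chains argument in PyMC3):

import matplotlib.pyplot as plt
import pymc3 as pm

# Plot the step size per chain; the apparent jump at draw 2000 is just
# the boundary between the two concatenated chains.
for chain in trace.chains:
    plt.plot(trace.get_sampler_stats('step_size', chains=[chain]),
             label='chain %d' % chain)
plt.ylabel('step_size')
plt.legend()
plt.show()

# Non-mixing chains will show non-overlapping marginals here.
pm.traceplot(trace, varnames=['tau', 'prmu'])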

PDEs are difficult to fit, cc @aseyboldt and @michaelosthege who have more experience.


Ah right, that’s interesting. Is it possible that increasing the number of chains will help?

Not really; it will further demonstrate the problem with your model, though.
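For example (a hypothetical sketch), extra chains simply give the Gelman-Rubin diagnostic more non-overlapping chains to compare, so it moves further from 1 instead of fixing the mixing:

with ERTModel:
    trace4 = pm.sample(500, tune=500, chains=4, cores=2)

print(pm.gelman_rubin(trace4))  # expect even clearer evidence of non-convergence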

Got to the bottom of it. It turns out my priors weren’t informative enough (as you suggested) and the three-level hierarchy wasn’t necessary. The model below finds a pretty good solution to the inverse problem without sacrificing too much robustness (due to the tight prior information).

with pm.Model() as ERTModel:
    pr_vec = pm.Normal('pr_vec', mu=0., sd=1., shape=np)  # tight prior on the basis coefficients
    yr = pm.Deterministic('yr', reducedOp(pr_vec))
    sigma = pm.HalfNormal('sigma', sd=0.5)  # single, tightly constrained noise scale
    y_rec = pm.Normal('y_rec', mu=yr, sd=sigma, observed=y_obs)
    trace = pm.sample(2000, tune=2000, cores=2,
                      nuts_kwargs=dict(target_accept=.85, max_treedepth=10))
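As a quick sanity check on the recovered solution (a minimal sketch using PyMC3’s sample_ppc), the posterior predictive can be compared against the observations:

with ERTModel:
    ppc = pm.sample_ppc(trace, samples=500)

# Compare the posterior predictive mean at each observation point to the data
print(ppc['y_rec'].mean(axis=0))
print(y_obs)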