PyMC3 evaluates function but doesn't update sample

This is a problem I’ve been observing with sampling in PyMC3. I have a deterministic function (a nonlinear ODE) that I can evaluate directly, and its response appears to be correct.

When I tell PyMC3 to sample the model, it runs, but the parameters never seem to update: it keeps evaluating the function at the same values. My definition of the model is as follows:

# (imports assumed: import theano as tt, import pymc3 as pm, import matplotlib.pyplot as plt)
tstt       = tt.tensor.as_tensor_variable(ts)
straintt   = tt.tensor.as_tensor_variable(strains)
one        = tt.tensor.as_tensor_variable(1)
onehundred = tt.tensor.as_tensor_variable(100)

plt.plot(stress_noise[0,:])

with pm.Model() as model:
    lamda  = pm.Uniform("lambda", lower=10,  upper=100)
    mu     = pm.Uniform("mu",     lower=10,  upper=100)
    sigy_K = pm.Uniform("sigy_K", lower=100, upper=400)
    mod_K  = pm.Uniform("mod_K",  lower=0.,    upper=100)
    sigy_H = pm.Uniform("sigy_H", lower=100, upper=400)
    mod_H  = pm.Uniform("mod_H",  lower=0.,    upper=100)
    
    y = evaluate(tstt,straintt,lamda,mu,tt.tensor.stack([one, sigy_K, mod_K]),\
                 tt.tensor.stack([one, sigy_H, mod_H]),onehundred)
    
    sigma = pm.HalfNormal("sigma",sd=1.0)
    r = pm.Normal("r",mu=y,sd=sigma,observed=stress_noise)
    
    trace = pm.sample(1,njobs=1,chains=1,n_init=1,random_seed=123,progressbar=True)

And my deterministic model is initialized as follows:

@tt.compile.ops.as_op(itypes=[tt.tensor.dvector, tt.tensor.dmatrix, tt.tensor.dscalar, tt.tensor.dscalar, tt.tensor.dvector, tt.tensor.dvector, tt.tensor.bscalar],
                      otypes=[tt.tensor.dmatrix])
def evaluate(ts, strains, lambda_in, mu_in, pmod1, pmod2, pts_per_segment):
    ...

evaluate returns a 6x300 matrix and stress_noise is also a 6x300 matrix. Is anything obviously wrong here?

A bit more info. When I print out the parameter values that are passed in while the samples aren’t updating, this is the output:

lambda __str__ = 55.0
mu __str__ = 55.0
sigy_K __str__ = 250.0
mod_K __str__ = 50.0
sigy_H __str__ = 250.0
mod_H __str__ = 50.0
55.0 55.0 [   1.  250.   50.] [   1.  250.   50.] 100

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Initializing NUTS failed. Falling back to elementwise auto-assignment.
Assigned Slice to lambda_interval__
Assigned Slice to mu_interval__
Assigned Slice to sigy_K_interval__
Assigned Slice to mod_K_interval__
Assigned Slice to sigy_H_interval__
Assigned Slice to mod_H_interval__
Assigned NUTS to sigma_log__
  0%|                                                                                            | 0/2 [00:00<?, ?it/s]

55.0 55.0 [   1.  250.   50.] [   1.  250.   50.] 100
55.0 55.0 [   1.  250.   50.] [   1.          250.           42.89492834] 100
55.0 55.0 [   1.  250.   50.] [   1.          250.           21.65069853] 100
55.0 55.0 [   1.  250.   50.] [   1.          250.            9.22774195] 100
55.0 55.0 [   1.  250.   50.] [   1.          250.            3.60497646] 100
55.0 55.0 [   1.  250.   50.] [   1.          250.            1.35712256] 100
55.0 55.0 [   1.  250.   50.] [   1.         250.           0.5035775] 100
55.0 55.0 [   1.  250.   50.] [  1.00000000e+00   2.50000000e+02   1.85847404e-01] 100
55.0 55.0 [   1.  250.   50.] [  1.00000000e+00   2.50000000e+02   6.84498525e-02] 100
55.0 55.0 [   1.  250.   50.] [  1.00000000e+00   2.50000000e+02   2.51921938e-02] 100
55.0 55.0 [   1.  250.   50.] [  1.00000000e+00   2.50000000e+02   9.26916624e-03] 100
55.0 55.0 [   1.  250.   50.] [  1.00000000e+00   2.50000000e+02   3.41013551e-03] 100

Only mod_H seems to be changing, and it is rapidly going to zero.

This could be the inefficiency of the Slice sampler in high dimensions. Could you try using Metropolis to sample one or two RVs, fixing the rest of the parameters to constants?
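The reduced problem can be sketched outside PyMC3 with a plain NumPy random-walk Metropolis sampler. Everything below is a made-up stand-in for illustration: a linear toy model with only lamda and mu free (the other four parameters treated as fixed constants), hypothetical regressors, and an arbitrary noise level — only the Uniform(10, 100) priors are taken from the original model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the ODE model: only lamda and mu are free.
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
true_lamda, true_mu = 55.0, 30.0
data = true_lamda * x1 + true_mu * x2 + rng.normal(0.0, 5.0, size=50)

def log_post(theta):
    lamda, mu = theta
    # Uniform(10, 100) priors, as in the original model
    if not (10.0 < lamda < 100.0 and 10.0 < mu < 100.0):
        return -np.inf
    resid = data - (lamda * x1 + mu * x2)
    return -0.5 * np.sum((resid / 5.0) ** 2)   # Gaussian log-likelihood

# Random-walk Metropolis over just the two free parameters
theta = np.array([50.0, 50.0])
chain = np.empty((5000, 2))
for i in range(5000):
    prop = theta + rng.normal(0.0, 0.5, size=2)
    if np.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop
    chain[i] = theta

print(chain[1000:].mean(axis=0))   # posterior means, near (55, 30)
```

In PyMC3 itself the analogous move is to replace the RVs you want held fixed with plain floats and pass `step=pm.Metropolis()` to `pm.sample`.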

That does seem to help. Do you have any ideas about what could be taking place? Should I just plan on using Metropolis for these sorts of models?

The Metropolis sampler is kind of the only thing that “works” out of the box right now. You can also try SMC, but that will require some twisting. My suggestion for this kind of model (calling an external function, with no gradient available) is to try likelihood-free inference, e.g., http://elfi.readthedocs.io/en/latest/

The problem is that in high dimensions, random-walk MC is highly inefficient: the sampler often appears to be stuck, as all the proposals are rejected. This is the problem you saw initially, but it often happens with Metropolis as well. You can force the sampler to take smaller steps, but that results in high autocorrelation and few effective samples. It will explore the posterior space so slowly that you will get biased results in finite time.
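The dimensionality effect is easy to demonstrate with a toy experiment in pure NumPy: random-walk Metropolis targeting a standard Gaussian, with a fixed (arbitrary) proposal scale. The same step size that is accepted most of the time in 1-D gets almost every proposal rejected in 100-D.

```python
import numpy as np

def rw_metropolis_acceptance(dim, steps=2000, step_size=0.5, seed=0):
    """Acceptance rate of random-walk Metropolis on N(0, I) in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    logp = lambda v: -0.5 * v @ v   # log-density of N(0, I), up to a constant
    accepted = 0
    for _ in range(steps):
        prop = x + step_size * rng.standard_normal(dim)
        if np.log(rng.random()) < logp(prop) - logp(x):
            x = prop
            accepted += 1
    return accepted / steps

# Acceptance rate drops sharply as the dimension grows
for d in (1, 10, 100):
    print(d, rw_metropolis_acceptance(d))
```

Shrinking `step_size` restores a reasonable acceptance rate in high dimensions, but then consecutive samples are nearly identical — the high-autocorrelation regime described above.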

So you can keep using Metropolis for these models, but be very careful: always validate your model on simulated data (where you know the true parameters) and check the resulting fit using posterior predictive checks.

That lines up with what I was seeing. I will give ELFI a try. Thanks for all of your help. I really like what I’m seeing with PyMC3, and I’m excited to see where it goes.
