Using the stick-breaking algorithm example but receiving a gradient problem

No problem!

So here is my typical workflow for debugging a model:

1, I run the code up to just before sampling; the model compiles without a problem.

2, Try to sample using the default option:

with model:
    trace = pm.sample()

and confirm the same error. (Tip: run with njobs=1, or cores=1 if you are on master, as the error message is much cleaner.)

3, Run with

with model:
    trace = pm.sample(cores=1, init='adapt_diag')

and it can sample. Indeed, as explained above, the jitter is causing the problem in this case (we are trying to find a proper solution for it).
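For reference, on recent PyMC3 versions the default is equivalent to passing the init string explicitly, as in the sketch below (an assumption: the exact default string, 'jitter+adapt_diag' here, depends on your version). The uniform jitter added to the starting point is what can push the chain into a region where logp is -inf:

with model:
    # default behaviour: adapt_diag plus uniform jitter on the start point
    trace = pm.sample(cores=1, init='jitter+adapt_diag')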

4, However, you get lots of warnings:

The chain contains only diverging samples. The model is probably misspecified.
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

That’s quite likely because we have too little data and too many parameters. Mixture models are also difficult to sample; see some of the related posts on this discourse: Gaussian Mixture of regression, Mixture with multiple observations.

5, Try to simplify the model by setting K=2, and get a ValueError: Bad initial energy: inf. The model might be misspecified.

6, Debug following the thread Get nan or inf from model.logp (model.test_point) is an attestation of incorrectly configured model?

for RV in model.basic_RVs:
    print(RV.name, RV.logp(model.test_point))

alpha -1.8378770664093453
beta -1.8378770664093453
gamma -6.443047252397437
delta -6.443047252397437
tau_log__ -2.0
obs -inf

We found a -inf logp for the node obs.
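As an aside, recent PyMC3 versions wrap this loop in a convenience method (assuming your version provides Model.check_test_point):

# same per-variable check in one call; obs again shows -inf
model.check_test_point()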

7, Double-check the inputs to obs and its logp:

# use .tag.test_value to check the current default
w.tag.test_value
mu.tag.test_value
tau.tag.test_value
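
In this case it also helps to sum the weights over the component axis (a quick sketch, assuming w is a 2D deterministic with components along axis 1):

import numpy as np

# each row of w should sum to 1 for a valid mixture; here it does not
np.allclose(w.tag.test_value.sum(axis=1), 1.0)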

The problem is that w does not sum to 1; normalizing it within the model should do the trick:

    w_ = stick_breaking(v)
    w = pm.Deterministic('w', w_/w_.sum(axis=1, keepdims=True))
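
For context, the truncated stick-breaking construction usually looks like the sketch below (an assumption: Theano and a 2D v, as in the dependent density regression example). With a finite truncation K, some stick mass is left over, so the raw weights only sum to 1 approximately, which is why the explicit normalization above is needed:

import theano.tensor as tt

def stick_breaking(v):
    # each stick fraction times the portion of the stick still remaining;
    # truncating at K components leaves leftover mass, so the rows of the
    # result need not sum exactly to 1
    return v * tt.concatenate(
        [tt.ones_like(v[:, :1]),
         tt.extra_ops.cumprod(1 - v, axis=1)[:, :-1]],
        axis=1)

After the fix, re-running the check from step 6 should show a finite logp for obs.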