pm.SMC(): How to deal with "numpy.linalg.LinAlgError: XX-th leading minor of the array is not positive definite"

Dear all,

I have been trying to implement pm.SMC() method in my work by following https://docs.pymc.io/notebooks/ODE_parameter_estimation.html, which is a very nice tutorial. I want to fit the panel data to the same ODE, and indeed I am able to fit it when I have only one common parameter for all instances. However, when I try to fit the parameter which is different to each particular instance, I encounter the following error quite often:

Traceback (most recent call last):
  File "main.py", line 120, in <module>
    trace = pm.sample(1000, progressbar=False, chains=n_chains, step=pm.SMC(), start=startsmc, cores=15)
  File "/home/aakhmetz/anaconda3/lib/python3.6/site-packages/pymc3/sampling.py", line 341, in sample
    random_seed=random_seed)
  File "/home/aakhmetz/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/smc.py", line 202, in sample_smc
    proposal = MultivariateNormalProposal(covariance)
  File "/home/aakhmetz/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/metropolis.py", line 56, in __init__
    self.chol = scipy.linalg.cholesky(s, lower=True)
  File "/home/aakhmetz/anaconda3/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 91, in cholesky
    check_finite=check_finite)
  File "/home/aakhmetz/anaconda3/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 40, in _cholesky
    "definite" % info)
numpy.linalg.LinAlgError: 26-th leading minor of the array is not positive definite

Maybe some of you have seen this also, and you could give me an advice. Could it be connected with my bad initial guess on the parameter values? or could it be some kind of bug in pm.SMC? (e.g. I have found the following discussion https://github.com/scikit-learn/scikit-learn/issues/2640)

Thank you very much in advance for any help

ps: As an example, I have an ODE that describe the growth of bacterial population. I may seek to find the growth rate. What I meant above, if I fit it the same growth rate for all samples, I am able to do. But now, if I try to fit to each particular sample, the inference fails.

I don’t really know the SMC code too well, but from a quick glance it looks like this might be because we don’t use a regularized covariance estimate for the multivariate metropolis proposal distribution.
@aloctavodia Shouldn’t https://github.com/pymc-devs/pymc3/blob/master/pymc3/step_methods/smc_utils.py#L42 use a regularized covariance estimate?

If this really is the problem, then I think you probably have more than 25 parameters. Usually gradient free samplers don’t work for parameter dimensions like that.
But again, not my area of expertise…

@aakhmetz Can you maybe post a reproducible example that raises this exception?

@aseyboldt probably you are right. Adding a small value to the diagonal could help here or something clever is needed?

Regarding the high-dimensional issue, SMC methods should work for larger dimensions compared to Metropolis-Hastings, but not sure how well they scale, in fact this is something I would like to empirically test. I will post results somewhere when ready.

@aakhmetz Could you try again with the version on master? Also if you could post a reproducible example it will be grea. Also FYI we have a GSoC student working on making ODEs easier to use with PyMC3. Here you will find some interesting blog-posts about his work

1 Like