DE-Metropolis with blackbox likelihood does not terminate

I am trying to run DE-MCMC with a Blackbox likelihood function. I have 3 parameters that go into the blackbox likelihood function. The program starts running and gives me the following output. It doesn’t really terminate even when I leave it running for a long time. Any idea why this is taking too long or maybe not terminating at all?

Population sampling (4 chains) DEMetropolis: [a1, a2, a3] 
Attempting to parallelize chains to all cores. You can turn this off with `pm.sample(cores=1)`. 
...
Population parallelization failed. Falling back to sequential stepping of chains.
...

Sampling 4 chains for 0 tune and 1_050 draw iterations (0 + 4_200 draws total) took 1054 seconds.

Hi,
Thanks for your question!
Are you on Windows? Have you tried with pm.sample(cores=1)?

@AlexAndorra I use Ubuntu. When I use trace = pymc3.sample(ndraws, step=step, cores=1), I get the following error.

ValueError: DEMetropolis requires at least 4 chains. For this 3-dimensional model you should use ≥4 chains

Ah ok. Looks like you’re sampling fewer than 4 chains, and the sampler complains about that. Did you try setting the number of chains to at least 4 by passing the kwarg chains=4 to pm.sample?
Also pinging @michaelosthege, as he’s our in-house expert on DEMetropolis :wink:
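For reference, here is a minimal sketch of what passing chains explicitly could look like. It assumes PyMC3 3.9 or newer; the three Normal priors are hypothetical stand-ins for a1, a2 and a3 (the blackbox likelihood itself is not shown in this thread), and the function name is made up for illustration.

```python
def sample_with_enough_chains(draws=200, tune=200, chains=4):
    """Run DEMetropolis on a hypothetical 3-parameter toy model."""
    # Imported lazily so this sketch can be read or imported without PyMC3;
    # calling the function assumes PyMC3 >= 3.9 is installed.
    import pymc3 as pm

    with pm.Model():
        # Hypothetical stand-ins for the three blackbox parameters.
        pm.Normal("a1", mu=0.0, sigma=1.0)
        pm.Normal("a2", mu=0.0, sigma=1.0)
        pm.Normal("a3", mu=0.0, sigma=1.0)

        step = pm.DEMetropolis()
        # chains is set explicitly so it does not depend on the value of cores.
        return pm.sample(
            draws=draws,
            tune=tune,
            step=step,
            chains=chains,
            cores=1,
            return_inferencedata=False,
        )
```

Calling sample_with_enough_chains() should return a MultiTrace with 4 chains; with an expensive likelihood, start with small draws and tune values to check that the setup runs at all.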

If evaluating the likelihood is expensive, it could be due to the conversion to inference data that happens after sampling to calculate ess and rhat. There is a proposal to address the apparent hang here: https://github.com/arviz-devs/arviz/issues/1224.

If so, using idata_kwargs=dict(log_likelihood=False) should solve the problem.

Thanks for the help @AlexAndorra. I got it running after setting the appropriate number of chains.
But I have a question about a rather unrelated topic: why does it say that 0 tune + 4000 draw iterations were executed even though I specified 3000 draws and 1000 burn-in samples? I understand that the 4000 draw iterations = 3000 (#draws) + 1000 (#burn-in). But why does it say 0 tune?

Sampling 16 chains for 0 tune and 4_000 draw iterations (0 + 64_000 draws total) took 785 seconds.

Thanks @OriolAbril. I got the DE-MCMC running with the Blackbox likelihood. The program now completes execution in finite time. Let me know if I understand your point correctly: when I pass the argument idata_kwargs=dict(log_likelihood=False) to the pymc3.sample(...) function, it should stop re-evaluating the model? But when I pass this to the function, the execution time is almost unaffected.

@berakrishn I am aware of two things that can inflate the runtime of pm.sample in non-obvious ways:

  • ArviZ re-evaluates the likelihood of all samples by default. If evaluating your likelihood is expensive, pass idata_kwargs=dict(log_likelihood=False) to turn it off.
  • If the chains did not converge, the convergence diagnostics are slower to compute. Passing compute_convergence_checks=False disables them.

I recommend the latter until you have found the right DE-MCMC(-Z) settings for your model.
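A rough sketch of how the two switches above could be passed together, assuming PyMC3 3.9 or newer. The names POSTPROCESSING_OFF and sample_quickly are hypothetical helpers made up for this example; only the two keyword arguments come from the points above.

```python
# Keyword arguments from the two bullet points, collected for reuse.
POSTPROCESSING_OFF = dict(
    # Skip the convergence diagnostics that run after sampling.
    compute_convergence_checks=False,
    # Skip re-evaluating the likelihood when converting to InferenceData.
    idata_kwargs=dict(log_likelihood=False),
)


def sample_quickly(model, step, draws, tune, chains):
    """Sample an existing model with both post-processing steps disabled."""
    # Imported lazily; calling this function assumes PyMC3 >= 3.9 is installed.
    import pymc3 as pm

    with model:
        return pm.sample(
            draws=draws,
            tune=tune,
            step=step,
            chains=chains,
            **POSTPROCESSING_OFF,
        )
```

Once the DE-MCMC settings look reasonable, dropping compute_convergence_checks=False again brings the diagnostics back.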


Thanks @michaelosthege. This explains the strange behavior of my code.

During sampling, PyMC3 does not store pointwise log likelihood values. Therefore, after sampling finishes, ArviZ iterates over each chain and each draw to compute the pointwise log likelihood for each sample. This requires evaluating the log likelihood again for each sample, 64_000 times in your case, which could take a long time. Using idata_kwargs=dict(log_likelihood=False) tells ArviZ to skip the pointwise log likelihood computation.
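To put a number on that, here is a small back-of-the-envelope sketch. The chain and draw counts are taken from the log line earlier in the thread; the 10 ms cost per likelihood evaluation is purely a hypothetical figure for illustration, so plug in a measured value for your own blackbox function.

```python
def pointwise_reevaluations(chains: int, draws_per_chain: int) -> int:
    """Number of likelihood evaluations needed to revisit every stored sample."""
    return chains * draws_per_chain


# 16 chains with 4_000 draw iterations each, as reported in the sampling log.
total = pointwise_reevaluations(chains=16, draws_per_chain=4_000)

# Hypothetical cost of one blackbox likelihood evaluation, in milliseconds.
cost_ms = 10
extra_seconds = total * cost_ms / 1000

print(total)          # 64000
print(extra_seconds)  # 640.0
```

Under that assumed cost, the re-evaluation alone would add roughly ten minutes after sampling has already finished.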

Note that ArviZ is called after sampling finishes to calculate ess and rhat, so skipping the likelihood computation should not affect the sampling time itself. This is why I suspected it could be the reason behind the apparent non-termination: in cases like this, sampling finishes, but both the conversion to ArviZ and the ess and rhat computation can still take a long time, and neither shows a progress bar, so it looks like nothing is happening.
