DE-Metropolis with blackbox likelihood does not terminate

berakrishn · July 1, 2020, 8:22am

I am trying to run DE-MCMC with a black-box likelihood function that takes 3 parameters. The program starts running and gives me the following output, but it does not terminate even when I leave it running for a long time. Any idea why it takes so long, or why it might not terminate at all?

Population sampling (4 chains) DEMetropolis: [a1, a2, a3] 
Attempting to parallelize chains to all cores. You can turn this off with `pm.sample(cores=1)`. 
...
Population parallelization failed. Falling back to sequential stepping of chains.
...

Sampling 4 chains for 0 tune and 1_050 draw iterations (0 + 4_200 draws total) took 1054 seconds.

AlexAndorra · July 1, 2020, 8:44am

Hi,
Thanks for your question!
Are you on Windows? Have you tried with pm.sample(cores=1)?

berakrishn · July 1, 2020, 12:07pm

@AlexAndorra I use Ubuntu. When I use trace = pymc3.sample(ndraws, step=step, cores=1), I get the following error.

ValueError: DEMetropolis requires at least 4 chains. For this 3-dimensional model you should use ≥4 chains

AlexAndorra · July 2, 2020, 12:08pm

Ah ok. Looks like you’re sampling fewer than 4 chains and the sampler complains about that. Did you try setting the number of chains to at least 4 by passing the kwarg chains=4 to pm.sample?
Also pinging @michaelosthege, as he’s our in-house expert on DEMetropolis
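
To illustrate why DE-Metropolis needs a population of chains at all, here is a minimal pure-Python sketch of the differential-evolution proposal (ter Braak, 2006). It is not PyMC3's implementation: `log_target` is a hypothetical stand-in for the black-box likelihood, and `de_metropolis` and its arguments are names made up for this example. Each chain jumps along the difference between two other chains, so a lone chain has nothing to build a proposal from.

```python
import math
import random


def log_target(x):
    """Hypothetical stand-in for a black-box log-likelihood: a standard normal."""
    return -0.5 * sum(v * v for v in x)


def de_metropolis(logp, ndim, nchains, ndraws, seed=0):
    """Sketch of DE-MC; an illustration only, not PyMC3's DEMetropolis."""
    if nchains < 3:
        raise ValueError("the proposal needs the current chain plus two others")
    rng = random.Random(seed)
    gamma = 2.38 / math.sqrt(2 * ndim)  # commonly used default jump scale
    pop = [[rng.gauss(0, 1) for _ in range(ndim)] for _ in range(nchains)]
    logps = [logp(x) for x in pop]
    draws = [[] for _ in range(nchains)]
    for _ in range(ndraws):
        for i in range(nchains):
            # Pick two distinct other chains; their difference sets the jump.
            others = [j for j in range(nchains) if j != i]
            r1, r2 = rng.sample(others, 2)
            proposal = [
                pop[i][d] + gamma * (pop[r1][d] - pop[r2][d]) + rng.gauss(0, 1e-4)
                for d in range(ndim)
            ]
            lp = logp(proposal)
            # Metropolis accept/reject step; the ratio is capped at 1.
            if rng.random() < math.exp(min(0.0, lp - logps[i])):
                pop[i], logps[i] = proposal, lp
            draws[i].append(list(pop[i]))
    return draws


# 3 parameters and 4 chains, matching the numbers in the error message above.
draws = de_metropolis(log_target, ndim=3, nchains=4, ndraws=200)
print(len(draws), len(draws[0]), len(draws[0][0]))
```

The sketch calls `logp` once per chain per draw, which is also why an expensive black-box likelihood makes every additional chain and draw costly.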

OriolAbril · July 2, 2020, 1:07pm

If evaluating the likelihood is expensive, it could be due to the conversion to inference data that happens after sampling to calculate ess and rhat. There is a proposal to address the apparent hang here: https://github.com/arviz-devs/arviz/issues/1224.

If so, using idata_kwargs=dict(log_likelihood=False) should solve the problem.

berakrishn · July 7, 2020, 4:13am

Thanks for the help @AlexAndorra. I got it running after setting the appropriate number of chains.
But I have a question about a rather unrelated topic: why does it say that 0 tune + 4000 draw iterations were executed when I specified 3000 draws and 1000 burn-in samples? I understand that the 4000 draw iterations = 3000 (#draws) + 1000 (#burn-in). But why does it say 0 tune?

Sampling 16 chains for 0 tune and 4_000 draw iterations (0 + 64_000 draws total) took 785 seconds.

berakrishn · July 7, 2020, 4:29am

Thanks @OriolAbril. I got the DE-MCMC running with the black-box likelihood, and the program now finishes in finite time. Let me check that I understand your point correctly: when I pass the argument idata_kwargs=dict(log_likelihood=False) to pymc3.sample(...), it should stop re-evaluating the model? When I pass it, though, the execution time is almost unaffected.

michaelosthege · July 7, 2020, 8:36am

@berakrishn I am aware of two things that can inflate the runtime of pm.sample in non-obvious ways:

1. ArviZ re-evaluates the likelihood of all samples by default. If evaluation of your likelihood is expensive, pass idata_kwargs=dict(log_likelihood=False) to turn it off.
2. If the chains did not converge, the convergence diagnostics are slower to compute. Passing compute_convergence_checks=False disables them.

I recommend the latter until you have found the right DE-MCMC(-Z) settings for your model.
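
Both options can be passed in the same pm.sample call. A minimal sketch, assuming PyMC3 3.9 or later (where pm.sample accepts idata_kwargs) and a model and DEMetropolis step that already exist; `FAST_EXIT_KWARGS` and `sample_quick` are hypothetical names for this example, and PyMC3 is imported lazily so the helper can be defined without it installed.

```python
# The two post-sampling switches named above, collected in one place.
FAST_EXIT_KWARGS = dict(
    idata_kwargs=dict(log_likelihood=False),  # skip re-evaluating the likelihood
    compute_convergence_checks=False,         # skip ess/rhat after sampling
)


def sample_quick(model, step, draws, chains):
    """Hypothetical helper: run pm.sample with both post-sampling steps disabled."""
    import pymc3 as pm  # lazy import; requires PyMC3 to be installed when called

    with model:
        return pm.sample(draws, step=step, chains=chains, cores=1, **FAST_EXIT_KWARGS)


print(FAST_EXIT_KWARGS)
```

Once the sampler settings look reasonable, the convergence checks can be switched back on by dropping compute_convergence_checks=False.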

berakrishn · July 7, 2020, 9:14am

Thanks @michaelosthege. This explains the strange behavior of my code.

OriolAbril · July 7, 2020, 11:28pm

During sampling, PyMC3 does not store pointwise log likelihood values. Therefore, after sampling finishes, ArviZ iterates over each chain and each draw to compute the pointwise log likelihood for each sample. This requires evaluating the log likelihood again for each sample (64_000 times in your case), which could take a long time. Using idata_kwargs=dict(log_likelihood=False) tells ArviZ to skip the pointwise log likelihood computation.
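
As a rough back-of-the-envelope check, the number of extra evaluations follows directly from the sampling summary (16 chains, 4_000 draw iterations each). The per-evaluation cost below is a hypothetical placeholder, not a measurement from this thread.

```python
chains = 16
draws_per_chain = 4_000

# One extra likelihood evaluation per stored draw.
n_evals = chains * draws_per_chain
print(n_evals)

# Hypothetical cost of one black-box likelihood call, in seconds.
seconds_per_eval = 0.01
extra_seconds = n_evals * seconds_per_eval
print(f"about {extra_seconds:.0f} s of extra work after sampling")
```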

Note that ArviZ is called after sampling finishes to calculate ess and rhat, so skipping the likelihood computation should not affect the sampling time itself. This is why I suspected it could be the reason for the apparent non-termination: in cases like this, sampling finishes, but both the conversion to ArviZ and the ess and rhat computation can still take a long time, and since neither has a progress bar, it looks like nothing is happening.
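
One way to see which stage is actually slow is to time sampling and post-processing separately. The following is a minimal sketch using only the standard library; `stage` and `slow_placeholder` are hypothetical names. In real use, the first block would wrap a pm.sample call with compute_convergence_checks=False and the second would wrap the conversion, for example with az.from_pymc3.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(name, log=print):
    """Announce a stage when it starts and report its wall-clock time when it ends."""
    log(f"[{name}] started")
    start = time.perf_counter()
    try:
        yield
    finally:
        log(f"[{name}] finished in {time.perf_counter() - start:.2f} s")


def slow_placeholder():
    """Stand-in for sampling or conversion; replace with the real call."""
    time.sleep(0.01)


messages = []
with stage("sampling", log=messages.append):
    slow_placeholder()
with stage("conversion and diagnostics", log=messages.append):
    slow_placeholder()
print("\n".join(messages))
```

With the stages separated like this, a long pause after "[sampling] finished" points at post-processing rather than at the sampler.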
