Timeout on pymc3.sampling.sample

fvdnabee · November 12, 2019, 1:47pm

Dear pymc3 community,

I have a use case where we sample from the same pymc3 model for a number of datasets. We share one model between the datasets, but swap out the observed data via pm.set_data(). This is done on top of Python’s multiprocessing, with one worker per dataset. Each worker runs 2 pymc3 sampling chains in parallel via a NoDaemonProcess as detailed here.

The sampling works fine for most of the datasets, but I am seeing slow sampling for 2-5% of them. The majority of the datasets finish sampling in 5-8 minutes, but occasionally there is one dataset that takes up to twenty minutes. In these cases sample() outputs the following when returning, which makes the result unusable anyway (so it wasn’t necessarily worth waiting that long):

The acceptance probability does not match the target. It is 0.9963603118187342, but should be close to 0.95. Try to increase the number of tuning steps.
There were 274 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.3272752463798762, but should be close to 0.95. Try to increase the number of tuning steps.
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

In the interest of speed, I would like sample() to draw a maximum of 500 samples during a time period of X minutes (e.g. X=8). If sample() didn’t finish within the time period, then it should return the samples it gathered so far (plus perhaps a timeout exception).

My question to the community is as follows: does a timeout mechanism like this already exist in pymc3, or does anyone know of similar functionality available for pymc3 elsewhere? If not, advice on patching pymc3 would be appreciated. Unfortunately I cannot share my model or my datasets, but I could make a mock example if this would help illustrate my question.

aseyboldt · November 13, 2019, 8:48am

pymc doesn’t have a timeout mechanism I’m afraid…
I don’t think adding something like that would be all that difficult, and I’d like a feature like that.
It would require a new argument timeout for pm.sample. Then, we need to store the wall time and check after every sample if the timeout is reached.
I’m just not sure what we should do if the timeout is reached. Should we throw an exception or should we return the current samples like we do for a KeyboardInterrupt? If we don’t throw an exception users might miss that the timeout occurred.
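To make the idea concrete, here is a minimal, library-agnostic sketch of "store the wall time and check after every sample". It is not pymc3 code: take_until_deadline and its parameters are hypothetical names, and the draws argument stands in for any iterator that yields one sample at a time. Whether pm.sample could be restructured this way (or whether something like pm.iter_sample could feed it) is an assumption. The sketch returns the partial result together with a flag, which is one possible answer to the "exception or partial samples" question above.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def take_until_deadline(
    draws: Iterable[T],
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[T], bool]:
    """Collect draws until the iterator is exhausted or `timeout` seconds pass.

    Hypothetical helper, not part of pymc3. Returns (collected, timed_out).
    """
    deadline = clock() + timeout  # store the wall time once, up front
    collected: List[T] = []
    iterator = iter(draws)
    while True:
        # Check the clock before asking for the next draw.
        if clock() >= deadline:
            return collected, True
        try:
            draw = next(iterator)
        except StopIteration:
            return collected, False
        collected.append(draw)
```

The caller can then decide what to do with timed_out, for example log a warning or raise its own exception. Note that the check only runs between draws, so a single very slow draw can still overshoot the deadline.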

If you are on Unix I guess you can work around this limitation using signals:

import signal
import pymc3 as pm

def handler(signum, frame):
    # Runs in the main thread when the alarm fires
    raise RuntimeError("Timeout")

# SIGALRM is only available on Unix
signal.signal(signal.SIGALRM, handler)

timeout = 5  # seconds
signal.alarm(timeout)

with pm.Model() as model:
    pm.Normal('y', shape=10000)
    pm.sample(1000)

# Cancel the timeout
signal.alarm(0)
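If the same pattern is needed in several places, it could be wrapped in a context manager so the alarm is always cancelled and the previous handler restored, even when the body raises. This is only a sketch using the standard library: time_limit and SamplingTimeout are hypothetical names, it is Unix-only, and it must run in the main thread of the process. How pm.sample behaves when the exception is raised while chains run in subprocesses is not covered here.

```python
import signal
from contextlib import contextmanager


class SamplingTimeout(Exception):
    """Hypothetical exception type raised when the time limit expires."""


@contextmanager
def time_limit(seconds):
    """Raise SamplingTimeout in the main thread after `seconds` (Unix only)."""

    def handler(signum, frame):
        raise SamplingTimeout(f"timed out after {seconds} s")

    # Install our handler and remember the one that was there before.
    previous = signal.signal(signal.SIGALRM, handler)
    # setitimer accepts fractional seconds, unlike signal.alarm.
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Always cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
```

Usage would look like `with time_limit(8 * 60): trace = pm.sample(500)`, wrapped in a try/except for SamplingTimeout. Unlike the KeyboardInterrupt case mentioned above, nothing in this sketch recovers the samples drawn so far.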
