Timeout on pymc3.sampling.sample

Dear pymc3 community,

I have a use case where we sample from the same pymc3 model for a number of datasets. We share one model between the datasets, but swap out the observed data via pm.set_data(). This is done on top of Python’s multiprocessing, with one worker per dataset. Each worker runs 2 pymc3 sampling chains in parallel via a NoDaemonProcess as detailed here.

Sampling works fine for most of the datasets, but I am seeing slow sampling for 2-5% of them. The majority of the datasets finish sampling in 5-8 minutes, but occasionally one takes up to twenty minutes. In these cases sample() outputs the following when returning, which renders the result unusable anyway (so it wasn’t necessarily worth waiting that long):

The acceptance probability does not match the target. It is 0.9963603118187342, but should be close to 0.95. Try to increase the number of tuning steps.
There were 274 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.3272752463798762, but should be close to 0.95. Try to increase the number of tuning steps.
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

In the interest of speed, I would like sample() to draw a maximum of 500 samples during a time period of X minutes (e.g. X=8). If sample() didn’t finish within the time period, then it should return the samples it gathered so far (plus perhaps a timeout exception).

My question to the community is as follows: does a timeout mechanism like this already exist in pymc3, or does anyone know of similar functionality available for pymc3 elsewhere? If not, advice on patching pymc3 would be appreciated. Unfortunately I cannot share my model or my datasets, but I could make a mock example if that would help illustrate my question.

pymc doesn’t have a timeout mechanism I’m afraid…
I don’t think adding something like that would be all that difficult, and I’d like a feature like that.
It would require a new argument timeout for pm.sample. Then we would need to store the wall time at the start and check after every sample whether the timeout has been reached.
I’m just not sure what we should do if the timeout is reached. Should we throw an exception, or should we return the current samples like we do for a KeyboardInterrupt? If we don’t throw an exception, users might miss that the timeout occurred.
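
To make the idea concrete, here is a minimal sketch of the "store the wall time and check after every sample" logic in plain Python. It is not pymc3 code: draw_with_timeout and draw_one are hypothetical names used only for illustration, and returning a (samples, timed_out) pair is just one possible answer to the exception-versus-return question, assuming the caller is expected to check the flag.

```python
import time


def draw_with_timeout(draw_one, n_draws, timeout):
    """Collect up to n_draws samples, stopping early after `timeout` seconds.

    draw_one is a hypothetical zero-argument callable producing one sample.
    Returns (samples, timed_out) so the caller cannot silently miss a timeout.
    """
    # monotonic() is unaffected by system clock adjustments
    deadline = time.monotonic() + timeout
    samples = []
    for _ in range(n_draws):
        # Check the wall time before each draw
        if time.monotonic() >= deadline:
            return samples, True
        samples.append(draw_one())
    return samples, False


def slow_draw():
    # Stand-in for an expensive sampler step
    time.sleep(0.01)
    return 0.0


samples, timed_out = draw_with_timeout(slow_draw, n_draws=500, timeout=0.05)
if timed_out:
    print(f"Timed out after {len(samples)} of 500 draws")
```

Because the check only happens between draws, a single very slow draw can still overrun the limit; the flag makes the early stop explicit without discarding the partial result.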

If you are on Unix, I guess you can work around this limitation using signals:

import signal
import pymc3 as pm

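def handler(signum, frame):
    # Called when SIGALRM fires; aborts whatever is currently running
    raise RuntimeError("Timeout")


signal.signal(signal.SIGALRM, handler)

timeout = 5  # seconds
signal.alarm(timeout)

try:
    with pm.Model() as model:
        pm.Normal('y', shape=10000)
        pm.sample(1000)
finally:
    # Cancel the pending alarm, whether sampling finished or was interrupted
    signal.alarm(0)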
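
If this is needed in several places, the signal-based workaround above could be wrapped in a small context manager. The following is only a sketch using the standard library: time_limit is a hypothetical helper name, it works on Unix only, and it has to run in the main thread of the process. It uses signal.setitimer instead of signal.alarm so that fractional seconds are possible.

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def time_limit(seconds):
    """Raise TimeoutError if the enclosed block runs longer than `seconds`.

    Hypothetical helper; Unix only, main thread only.
    """
    def _raise_timeout(signum, frame):
        raise TimeoutError(f"block exceeded {seconds} s")

    # Install our handler and remember the previous one
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Disarm the timer and restore the earlier handler
        signal.setitimer(signal.ITIMER_REAL, 0)
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)


# Usage with a stand-in for a long-running call
try:
    with time_limit(0.1):
        time.sleep(2)
    finished = True
except TimeoutError:
    finished = False
```

This only aborts the enclosed block; whether any partial results survive depends on the code that gets interrupted.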