Stopping and restarting sample_smc

Hi, I want to stop, save and restart a sample_smc run with the ABC kernel. For example, I would like to save the generated data every 2000 draws, so that after a power outage I could restart the model from the last saved state instead of starting from the beginning. I need this because the mathematical model is computationally heavy: it takes 4 minutes from the start of one draw to the start of the next, and the whole run may take 2 months.
Below is my code, heavily simplified:

```python
import os

import arviz as az
import pymc3 as pm

# generate_input, read_result and observed_data are defined elsewhere (omitted for brevity)


def simulator(xs1, xs2):
    xs = [xs1, xs2]  # searched model parameters
    generate_input(xs)  # generate the input file for the external software
    os.system("ansys input.inp")  # run the input with the external software
    result = read_result("output.out")  # read the result
    return result


bay_scale_model = pm.Model()

with bay_scale_model:
    # Priors for the simulator model parameters
    BoundedNormal = pm.Bound(pm.Normal, lower=0.0, upper=2.4)
    xs1 = BoundedNormal("xs1", mu=1, sd=0.227689)
    xs2 = BoundedNormal("xs2", mu=1, sd=0.257015)
    sim = pm.Simulator(
        "sim", simulator, params=(xs1, xs2), sum_stat="sort", epsilon=0.003, observed=observed_data
    )
    trace, sim_data = pm.sample_smc(
        2000, kernel="ABC", n_steps=9, parallel=False, save_sim_data=True, cores=1, chains=1, threshold=0.01
    )
    idata = az.from_pymc3(trace, posterior_predictive=sim_data)
```

So is there any way to stop and save such a process and restart it later?

I am afraid there is no implemented way of doing this. You would likely have to modify the source code.
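Short of changing the source, one partial workaround is to make the expensive simulator itself restartable by caching every result on disk. The sketch below is not a PyMC feature; `cached`, `_cache_key` and the `sim_cache` directory are hypothetical names, and it assumes the simulator is deterministic and returns numeric data. It only saves time after a restart if the restarted run proposes exactly the same parameter values (for example with the same `random_seed` on a single core). The wrapper forwards both positional and keyword arguments because I am not sure which calling convention a given pymc3 version uses for `pm.Simulator` functions.

```python
import hashlib
import json
import os

import numpy as np


def _cache_key(args, kwargs):
    """Deterministic file name for one set of simulator inputs."""
    def as_floats(value):
        # repr() of a Python float round-trips exactly, so identical inputs give identical keys
        return [repr(x) for x in np.asarray(value, dtype=float).ravel().tolist()]

    payload = {
        "args": [as_floats(a) for a in args],
        "kwargs": {k: as_floats(v) for k, v in sorted(kwargs.items())},
    }
    text = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cached(simulator, cache_dir="sim_cache"):
    """Wrap an expensive simulator so each result is computed once and stored on disk."""
    os.makedirs(cache_dir, exist_ok=True)

    def wrapper(*args, **kwargs):
        path = os.path.join(cache_dir, _cache_key(args, kwargs) + ".npy")
        if os.path.exists(path):
            return np.load(path)  # already simulated in an earlier (possibly interrupted) run
        result = np.asarray(simulator(*args, **kwargs))
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:  # write to a temporary file first...
            np.save(f, result)
        os.replace(tmp, path)  # ...then rename, so a crash never leaves a half-written result
        return result

    return wrapper
```

It would be used as `pm.Simulator("sim", cached(simulator), ...)`. Writing to a temporary file and renaming it with `os.replace` means a power cut during a write leaves at most a stray `.tmp` file, never a truncated result.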


I have implemented such functionality in my package beat (GitHub - hvasbath/beat: Bayesian Earthquake Analysis Tool), since I have the same problem with long forward-model runtimes. It relies on an older pymc3 version, 3.4.1.

beat.sampler.smc.smc_sample is the function you want. It takes a step object (beat.sampler.SMC), which also has a backend attribute that can be either 'csv' or 'bin'.
You should be able to import and use the sampler just as you are used to with ordinary pymc3 models.

beat.backend also contains a Trace class that supports restarting, diagnostics, and storing samples to the trace every n samples. The source code is easy to read if you want to see how to use it. Let me know if you would like to dive into it.

See also: Example 3: Rectangular source (beat 1.1 documentation).
The sampling backends: Sampling Backends (beat 1.1 documentation).

You could try to port/use some of that functionality.
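The "store samples every n samples" idea can be sketched independently of beat. The class below is a hypothetical illustration, not the API of beat.backend.Trace: it buffers samples in memory, appends them to a CSV file every `n` samples, and can read everything back after a crash. It assumes each sample is a dict mapping variable names to float values.

```python
import csv
import os


class ChunkedCSVTrace:
    """Buffer samples and append them to a CSV file every `n` samples (hypothetical sketch)."""

    def __init__(self, path, varnames, n=100):
        self.path = path
        self.varnames = list(varnames)
        self.n = n
        self.buffer = []
        if not os.path.exists(path):  # new run: write the header once
            with open(path, "w", newline="") as f:
                csv.writer(f).writerow(self.varnames)

    def record(self, point):
        """Add one sample (dict of variable name -> value); flush when the buffer is full."""
        self.buffer.append([point[name] for name in self.varnames])
        if len(self.buffer) >= self.n:
            self.flush()

    def flush(self):
        """Append the buffered samples to disk and force them out of the OS cache."""
        if not self.buffer:
            return
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerows(self.buffer)
            f.flush()
            os.fsync(f.fileno())
        self.buffer = []

    def load(self):
        """Read back every sample written so far, e.g. to resume after a crash."""
        with open(self.path, newline="") as f:
            return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]
```

On restart, `len(trace.load())` tells you how many samples are safely on disk, so at most `n - 1` buffered samples are lost. A binary format would be more compact for large traces; CSV is used here only because it is easy to inspect.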

Cheers!


I will certainly check that out after Christmas and let you know how it goes. Thanks a lot!

I have finally decided to solve my problem in a different way. I chopped the pymc source code into pieces so that the calculation is split into separate stages. First, prior samples are generated and written to txt files as inputs for the external software, which plays the role of the simulator. All of these inputs are then run in parallel by the external software. Next, a separate Python script collects the simulator results, performs the resampling and mutation, and writes the proposals to txt files as new inputs for the external software. These inputs are again run in parallel by the external software, and this cycle is repeated until the computation is finished. This makes it possible both to restart the calculations from any point and to parallelize them on a cluster.
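The file-driven cycle described above can be sketched as a few functions that a short script calls once per cycle. This is a hypothetical illustration, not the code from this post or from pymc: it implements a simple ABC-SMC variant with indicator weights (the tolerance shrinks to a quantile of the current distances, surviving particles are resampled, then moved with a Gaussian random-walk ABC-MCMC step) rather than the tempered scheme pymc3's ABC kernel uses. The prior reuses the bounded normals from the question; computing distances from the external software's output files is left out.

```python
import os

import numpy as np

# Prior from the question: Normal(mu=1, sd) bounded to [0, 2.4] for xs1 and xs2.
MU = np.array([1.0, 1.0])
SD = np.array([0.227689, 0.257015])
LOWER, UPPER = 0.0, 2.4


def log_prior(theta):
    """Unnormalized log prior density for an (n_particles, 2) array."""
    theta = np.atleast_2d(theta)
    inside = np.all((theta >= LOWER) & (theta <= UPPER), axis=1)
    logp = -0.5 * np.sum(((theta - MU) / SD) ** 2, axis=1)
    return np.where(inside, logp, -np.inf)


def accept_proposals(state, proposal_dist, rng):
    """ABC-MCMC accept/reject, run once the external software has simulated the proposals."""
    theta, dist = state["theta"], state["dist"]
    proposals, eps = state["proposals"], float(state["eps"])
    # Symmetric random-walk proposal, so the acceptance ratio is the prior ratio,
    # restricted to proposals whose simulated distance is within the tolerance.
    log_ratio = log_prior(proposals) - log_prior(theta)
    accept = (proposal_dist <= eps) & (np.log(rng.uniform(size=len(theta))) < log_ratio)
    theta = np.where(accept[:, None], proposals, theta)
    dist = np.where(accept, proposal_dist, dist)
    return theta, dist


def resample_and_propose(theta, dist, rng, quantile=0.5):
    """Shrink the tolerance, resample surviving particles, draw new proposals."""
    eps = np.quantile(dist, quantile)  # next tolerance: a quantile of current distances
    survivors = np.flatnonzero(dist <= eps)
    idx = rng.choice(survivors, size=len(theta), replace=True)
    theta, dist = theta[idx], dist[idx]
    step = np.std(theta, axis=0)  # simple heuristic random-walk scale
    proposals = theta + rng.normal(scale=step, size=theta.shape)
    return {"theta": theta, "dist": dist, "proposals": proposals, "eps": eps}


def save_state(path, state):
    """Write the cycle state atomically so a power cut cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        np.savez(f, **state)
    os.replace(tmp, path)


def load_state(path):
    with np.load(path) as data:
        return {key: data[key] for key in data.files}
```

The first cycle calls `resample_and_propose` directly on the prior samples and their distances. Every later cycle is `load_state` → `accept_proposals` with the distances of the freshly simulated proposals → `resample_and_propose` → `save_state` → write `state["proposals"]` out as inputs for the external software. Because the state is saved atomically at the end of each cycle, an interrupted run resumes from the last completed cycle.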
Cheers!

cc @michaelosthege, who is working on some ideas for sampler backends.