The code is very long and the model quite complicated. I am currently working on a minimal example which hopefully reproduces the error. In short, the model looks something like this:
spectrum_parameters = pm.Uniform('spectrum_parameters', 0, 1, shape=n_fit_params, transform=None, testval=np.random.uniform(0.4, 0.6, size=n_fit_params))
x = pm.Uniform("x", 0, 1, shape=4, testval=np.array([0.2, 0.4, 0.6, 0.8]), transform=None)
pm.Potential("order_means", tt.switch(x[1] - x[0] < 0, -np.inf, 0) + tt.switch(x[2] - x[1] < 0, -np.inf, 0) + tt.switch(x[3] - x[2] < 0, -np.inf, 0))
…
pm.Potential("like", logp(reweight_matrix, S, B, f, gumbel_matrix, syst_xmax_scale, syst_energy_scale))
trace = pm.sample_smc(start=start, draws=kw.n_draws, n_steps=25, chains=kw.n_chains,
parallel=True, cores=kw.n_cores, threshold=0.5)
The second potential is my likelihood function, for which I first calculate the parameters reweight_matrix, S, B, f, gumbel_matrix, syst_xmax_scale and syst_energy_scale from x and spectrum_parameters.
Here start is a dictionary of start values. I need it because one potential enforces an ordering on some parameters, and without ordered start values the sampler would not run due to -inf in logp. draws can be anything; the error happens regardless of its value. I have tried various combinations of the chains and cores parameters.
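To illustrate why the start values matter, here is a minimal NumPy sketch of the ordering potential. It is not the actual model code: order_potential is a hypothetical stand-in that mirrors the tt.switch expression above with np.where, and the start dictionary is built under the assumption that sorting a random draw is an acceptable starting point.

```python
import numpy as np

def order_potential(x):
    """NumPy stand-in for the order_means potential (hypothetical helper).

    Adds -inf for every adjacent pair that is out of order, 0 otherwise,
    mirroring the sum of tt.switch terms in the model.
    """
    x = np.asarray(x, dtype=float)
    return float(np.where(np.diff(x) < 0, -np.inf, 0.0).sum())

rng = np.random.default_rng(0)
random_start = rng.uniform(0, 1, size=4)  # ordered only by chance
sorted_start = np.sort(random_start)      # ordered by construction

print(order_potential(random_start))      # -inf unless the draw happens to be sorted
print(order_potential(sorted_start))

# Assumed shape of the start dictionary: variable name -> start value.
start = {"x": sorted_start}
```

Any start point with a finite potential works; sorting is just the simplest way to guarantee one.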
I am just confused why it runs normally except when I submit it via condor. I would also be fine with letting each chain run independently, i.e. submitting with chains=1 multiple times and combining the results afterwards. This already works, but as I said before, pymc then sets cores to 1 and everything becomes slow.
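For the "combine afterwards" route, a rough sketch of the bookkeeping could look like the following. It assumes each job has saved its posterior draws as a plain array of shape (n_draws, n_params). That storage format, the combine_runs helper and the random stand-in data are all hypothetical and not part of the actual code.

```python
import numpy as np

def combine_runs(runs):
    """Stack per-job draws into one array of shape (n_chains, n_draws, n_params).

    Hypothetical helper: every element of `runs` is assumed to hold the draws
    of one independent single-chain job with shape (n_draws, n_params).
    """
    shapes = {np.shape(r) for r in runs}
    if len(shapes) != 1:
        raise ValueError(f"runs have inconsistent shapes: {sorted(shapes)}")
    return np.stack(runs, axis=0)

# Stand-in data in place of draws loaded from three separate jobs.
rng = np.random.default_rng(1)
runs = [rng.uniform(0, 1, size=(500, 4)) for _ in range(3)]

combined = combine_runs(runs)
print(combined.shape)  # (3, 500, 4)

# Pool all chains for summary statistics.
pooled = combined.reshape(-1, combined.shape[-1])
print(pooled.mean(axis=0))
```

Keeping the chain axis separate until the end makes it possible to compare the jobs against each other before pooling. Each job should also use a different random seed, otherwise the runs are not independent.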