Using multiple chains in SMC in PyMC3

I am currently using pymc3 version 3.9.2 and decided I wanted to try out the SMC sampler. I have been studying this notebook:
which apparently uses 3.9.1. In the notebook it is printed “Multiprocess sampling (2 chains in 2 jobs)”, which my SMC sampler never printed. I always just get one chain (I know SMC uses multiple more internal chains, but for the calculation of Rhat for example I’d need several “real” chains). I looked into the source code of pymc on github (pymc/ at main · pymc-devs/pymc · GitHub), where I found that I could change n_chains, but apparently that does not work on my pymc version yet, so I tried updating to 3.11.4, the currently newest version. Nevertheless, the SMC sampler is not updated, so I still cannot change the kernel and the number of chains for example. Am I doing something wrong? I don’t want to update to pymc4 and since I am also using quite some theano calculations, I’d also prefer to not update to pymc-theano.
I’d appreciate any help!
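(For context on why I need several “real” chains: Rhat compares between-chain to within-chain variance, so it needs at least two independent chains. A minimal numpy sketch of the classic, non-split Gelman–Rubin statistic, with a function name of my own choosing:)

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for an array of shape (n_chains, n_draws)."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance estimate
    var_hat = (n - 1) / n * W + B / n       # pooled posterior variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
chains = rng.normal(size=(4, 2000))  # 4 well-mixed "chains" from the same distribution
print(gelman_rubin_rhat(chains))     # close to 1 for well-mixed chains
```

With a single chain the between-chain term is undefined, which is exactly why one SMC run of shape (n_draws,) is not enough here.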

What platform are you on? And what does the output look like for you?

PyMC 3.11.4 uses theano-pymc :thinking: maybe it would be a good idea to try creating a new environment and installing from scratch to make sure you don’t have both theano and theano-pymc, or incompatible links between the two. There are installation instructions in our GitHub wiki: Installation Guide (Linux) · pymc-devs/pymc Wiki · GitHub (also for Mac and Windows)
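Something along these lines (environment name and Python version are just examples; see the wiki for the full instructions):

```shell
# create a clean environment so theano and theano-pymc cannot clash
conda create -n pymc3-clean python=3.8
conda activate pymc3-clean
pip install pymc3==3.11.4   # pulls in theano-pymc, not theano
```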

Thanks for the replies! I am even more confused now though.

I am working on a Linux cluster, in a conda environment with Python 3.7. For me the output looks like this:

Sample initial stage: ...
Stage: 0 Beta: 0.000 Steps: 5 Acce: 1.000

It prints these lines until it finishes and it gives me a trace of shape (n_draws,). So just the line with the number of chains is missing, that I saw in the example.

I tried setting up a fresh environment, but I still have problems (see below). My further questions are:

  1. Since the example I linked in my original post uses pymc 3.9 as well, but it can still sample multiple chains with SMC, I thought that should be possible with pymc 3.9.2. Is there any keyword apart from n_chains that I am missing?

  2. As described above, in the source code of the newest pymc version on GitHub I can see the parameter n_chains and e.g. the IMH kernel in the file. The source code of pymc 3.11.4 and older versions does not contain these updates, however. So, are these not included in any packaged pymc version yet?

  3. I tried installing pymc directly from github to get the newest features, following the instructions given at Installation Guide (Linux) · pymc-devs/pymc Wiki · GitHub in a fresh environment. But, then the installed version is 4.0, which I do not want. It then says: “You are running the v4 development version of PyMC which currently still lacks key features. You probably want to use the stable v3 instead which you can either install via conda or find on the v3 GitHub branch: GitHub - pymc-devs/pymc at v3”, which is precisely where I got the installation from…

So, I am not sure what to do here, I just want to have an SMC sampler which can sample multiple parallel chains and uses the IMH kernel…

The source code you are seeing on GitHub corresponds to V4, which is now the master/main branch. When you follow the wiki installation instructions you will get that version. You probably want V3, which you can get by conda/pip installing pymc3 the usual way.

If you want to install from github (not recommended) you would need to clone the repo and checkout the v3 branch and then install with pip install -e <path_to_cloned_repo>.
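Roughly, the steps described above would look like this (repo URL as in the links earlier in the thread):

```shell
git clone https://github.com/pymc-devs/pymc.git
cd pymc
git checkout v3        # the stable v3 branch, not main
pip install -e .       # editable install of the checked-out branch
```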

The API of sample_smc in V3 is not as harmonized with sample as it will be in V4. Anyway, you have the keyword chains there, and optionally the keyword parallel if you want those multiple chains to be sampled in parallel.

Thanks for the answers! I am working with pymc 3.11.4 now and I’ll probably wait until an official version of pymc 4 is released. The problem was that I had accidentally loaded some packages, among them theano, from outside my conda environment. I now finally have the chains argument and it works fine, so thank you for that.

I have some questions concerning multiprocessing now, though. I saw that sample_smc overwrites the number of cores if I give it only one chain. That was not the case in the older pymc version I used before, where one could compute across multiple cores even with only one SMC chain. Why was this changed? And should I then set cores equal to the number of chains, or is it useful to have even more cores?
Also, one more question, although I am not sure you can help me with it. I am working on a Linux cluster, and everything works fine when I run an SMC sampler with e.g. 3 chains and 16 cores. However, when I submit the same job via condor, I get weird error messages and the trace is not saved. I am not sure whether it is a condor or a pymc error, or whether something in the cluster setup is not working correctly, so I will also ask our administrator about it, but maybe one of you has encountered similar problems and is able to help. I’ll paste the error message below.

Initializing SMC sampler…
Sampling 3 chains in 16 jobs
Traceback (most recent call last):
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/multiprocessing/", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/multiprocessing/", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/pymc3/smc/", line 267, in sample_smc_int
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/pymc3/smc/", line 135, in setup_kernel
    self.likelihood_logp_func = logp_forw([self.model.datalogpt], self.variables, shared)
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/pymc3/", line 1046, in datalogpt
    factors += [tt.sum(factor) for factor in self.potentials]
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/pymc3/", line 1046, in <listcomp>
    factors += [tt.sum(factor) for factor in self.potentials]
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/theano/tensor/", line 3221, in sum
    out = elemwise.Sum(axis=axis, dtype=dtype, acc_dtype=acc_dtype)(input)
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/theano/graph/", line 253, in __call__
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/theano/graph/", line 126, in compute_test_value
    thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/theano/graph/", line 634, in make_thunk
    return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/theano/graph/", line 600, in make_c_thunk
    outputs = cl.make_thunk(
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/theano/link/c/", line 1203, in make_thunk
    cthunk, module, in_storage, out_storage, error_storage = self.compile(
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/theano/link/c/", line 1138, in compile
    thunk, module = self.cthunk_factory(
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/theano/link/c/", line 1634, in cthunk_factory
    module = get_module_cache().module_from_key(key=key, lnk=self)
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/theano/link/c/", line 1157, in module_from_key
    module = self._get_from_hash(module_hash, key)
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/theano/link/c/", line 1060, in _get_from_hash
    key_data.add_key(key, save_pkl=bool(key[0]))
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/theano/link/c/", line 497, in add_key
    assert key not in self.keys

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "…/fit/", line 432, in <module>
    trace = pm.sample_smc(start=start, draws=kw.n_draws, n_steps=kw.n_tune_smc, chains=kw.n_chains,
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/site-packages/pymc3/smc/", line 196, in sample_smc
    results = pool.starmap(
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/multiprocessing/", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/net/scratch/auger/conda/envs/teresa_py38_pymc311/lib/python3.8/multiprocessing/", line 771, in get
    raise self._value

Seems like you are passing some wrong arguments. Can you show the code you are actually trying to run?

The code is very long and the model quite complicated. I am currently working on a minimal example which hopefully reproduces the error. In short, the model looks something like this:

spectrum_parameters = pm.Uniform('spectrum_parameters', 0, 1, shape=n_fit_params, transform=None, testval=np.random.uniform(0.4, 0.6, size=n_fit_params))
x = pm.Uniform("x", 0, 1, shape=4, testval=np.array([0.2, 0.4, 0.6, 0.8]), transform=None)
pm.Potential("order_means", tt.switch(x[1] - x[0] < 0, -np.inf, 0) + tt.switch(x[2] - x[1] < 0, -np.inf, 0) + tt.switch(x[3] - x[2] < 0, -np.inf, 0))

pm.Potential("like", logp(reweight_matrix, S, B, f, gumbel_matrix, syst_xmax_scale, syst_energy_scale))

trace = pm.sample_smc(start=start, draws=kw.n_draws, n_steps=25, chains=kw.n_chains,
                      parallel=True, cores=kw.n_cores, threshold=0.5)

The second potential is my likelihood function, for which I first calculate the parameters reweight_matrix, S, B, f, gumbel_matrix, syst_xmax_scale, and syst_energy_scale from x and spectrum_parameters.
Here start is a dictionary of start values, which I need because I have a potential enforcing an ordering of some parameters, and otherwise the sampler wouldn’t run due to -inf in logp. draws can be anything; the error happens independently of it. I tried various combinations of the chains and cores parameters.

I am just confused why it runs normally except when I submit it via condor. I’d also be fine with letting each chain run independently, i.e. submitting with chains=1 multiple times and then combining the traces afterwards. This already works, but as I said before, pymc then sets cores to 1 and everything becomes slow.
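(For the combine-afterwards idea, a sketch using arviz; the array shapes stand in for the posterior draws of the individual chains=1 runs, and with real pymc3 traces one would presumably convert via az.from_pymc3 first:)

```python
import numpy as np
import arviz as az

# pretend these came from two independent chains=1 SMC runs,
# each with posterior draws of shape (1 chain, 1000 draws)
run1 = az.from_dict(posterior={"mu": np.random.randn(1, 1000)})
run2 = az.from_dict(posterior={"mu": np.random.randn(1, 1000)})

# stack them along the chain dimension so az.rhat / az.summary see two chains
combined = az.concat(run1, run2, dim="chain")
print(combined.posterior.sizes["chain"])
```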

I’ll post this as a new issue, as I think it has nothing much to do with the title anymore. Thanks!