SMC Sampling Issues in PyMC v5 - Extremely Slow Progress and NotImplementedError

Hi PyMC Community,

I’m trying to use SMC sampling in PyMC v5 to compute marginal likelihoods for Bayesian model comparison, but I’m running into persistent issues. I have four different models, and in all cases, SMC sampling either runs extremely slowly (with beta staying at 0 for over an hour) or throws errors. These same models work perfectly fine with NUTS sampling.

My Goal

I want to use SMC to compute marginal likelihoods and Bayes factors for model comparison between four competing models.

The Problem

When I try to use SMC sampling, I encounter one of two issues:

  1. The sampling runs extremely slowly - beta remains at 0 for over an hour with no progress
  2. It throws a NotImplementedError

Code Example

Here’s one of my models that works perfectly with NUTS:

# Assume Theta_normalized is my observed data with shape (n_samples, num_features_theta)
with pm.Model() as skewed_model:
    mu = pm.Uniform('mu', lower=0, upper=1, shape=num_features_theta)
    omega = pm.HalfNormal('omega', sigma=1, shape=num_features_theta)
    alpha = pm.Normal('alpha', mu=0, sigma=2, shape=num_features_theta)
    
    for i in range(num_features_theta):
        pm.SkewNormal(
            f'obs_feature_{i}',
            mu=mu[i],        
            sigma=omega[i],  
            alpha=alpha[i],  
            observed=Theta_normalized[:, i]
        )

NUTS sampling works fine:

with skewed_model:
    trace = pm.sample(8000, tune=4000, nuts={'max_treedepth': 18}, 
                     target_accept=0.9, chains=8, cores=32)
    ppc = pm.sample_posterior_predictive(trace)

SMC sampling fails:

with skewed_model:
    trace_smc = pm.sample_smc(2000, chains=4)

Error Message:

NotImplementedError                       Traceback (most recent call last)
Cell In[29], line 2
      1 with skewed_model:
----> 2     trace_smc = pm.sample_smc(2000, chains=4)

Questions

  1. Are there known issues with SMC sampling in PyMC v5, particularly with models containing multiple observed variables or SkewNormal/Mvnormal distributions?

  2. What could cause beta to remain at 0 for extended periods?

  3. Are there specific model structures or distributions that are incompatible with SMC sampling?

  4. What alternatives would you recommend for computing marginal likelihoods in PyMC v5?

Any guidance would be greatly appreciated! I’m happy to provide more details about my models or environment if needed.

Thank you!

It might be a deadlock from multiprocessing. What happens if you set cores=1?

I changed the code in the SMC section to

with skewed_model:
    trace_smc = pm.sample_smc(2000,cores=1)

The full error message is:

---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
“”"
Traceback (most recent call last):
File “/anaconda3/envs/torch/lib/python3.11/concurrent/futures/process.py”, line 261, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/anaconda3/envs/torch/lib/python3.11/site-packages/pymc/smc/sampling.py”, line 317, in _sample_smc_int
smc._initialize_kernel()
File “/anaconda3/envs/torch/lib/python3.11/site-packages/pymc/smc/kernels.py”, line 249, in _initialize_kernel
initial_point, [self.model.datalogp], self.variables, shared, self.compile_kwargs
^^^^^^^^^^^^^^^^^^^
File “/anaconda3/envs/torch/lib/python3.11/site-packages/pymc/model/core.py”, line 825, in datalogp
return self.observedlogp + self.potentiallogp
^^^^^^^^^^^^^^^^^
File “/anaconda3/envs/torch/lib/python3.11/site-packages/pymc/model/core.py”, line 840, in observedlogp
return self.logp(vars=self.observed_RVs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/anaconda3/envs/torch/lib/python3.11/site-packages/pymc/model/core.py”, line 734, in logp
logp_scalar = variadic_add(
^^^^^^^^^^^^^
File “/anaconda3/envs/torch/lib/python3.11/site-packages/pymc/model/core.py”, line 735, in
*(as_tensor(factor, allow_xtensor_conversion=True).sum() for factor in logp_factors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/anaconda3/envs/torch/lib/python3.11/site-packages/pytensor/tensor/_init_.py”, line 50, in as_tensor_variable
return _as_tensor_variable(x, name, ndim, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/anaconda3/envs/torch/lib/python3.11/functools.py”, line 909, in wrapper
return dispatch(args[0]._class_)(*args, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/anaconda3/envs/torch/lib/python3.11/site-packages/pytensor/tensor/_init_.py”, line 57, in _as_tensor_variable
raise NotImplementedError(f"Cannot convert {x!r} to a tensor variable.“)
NotImplementedError: Cannot convert None to a tensor variable.
“””

The above exception was the direct cause of the following exception:
The above exception was the direct cause of the following exception:

NotImplementedError Traceback (most recent call last)
Cell In[14], line 2
1 with skewed_model:
----> 2 trace_smc = pm.sample_smc(2000,cores=1)
3 # trace_BF_1 = pm.sample(2500, step=pm.SMC())

File ~/anaconda3/envs/torch/lib/python3.11/site-packages/pymc/smc/sampling.py:194, in sample_smc(draws, kernel, start, model, random_seed, chains, cores, compute_convergence_checks, return_inferencedata, idata_kwargs, progressbar, compile_kwargs, **kernel_kwargs)
185 params = (
186 draws,
187 kernel,
188 start,
189 model,
190 )
192 t1 = time.time()
→ 194 results = run_chains(chains, progressbar, params, random_seed, kernel_kwargs, cores)
196 (
197 traces,
198 sample_stats,
199 sample_settings,
200 ) = zip(*results)
202 trace = MultiTrace(traces)

File ~/anaconda3/envs/torch/lib/python3.11/site-packages/pymc/smc/sampling.py:406, in run_chains(chains, progressbar, params, random_seed, kernel_kwargs, cores)
399 # update the progress bar for this task:
400 progress.update(
401 status=f"Stage: {stage} Beta: {beta:.3f**}",
402 task_id=task_id,
403 refresh=True,
404 )
→ 406 return tuple(cloudpickle.loads(r.result()) for r in done)
File ~/anaconda3/envs/torch/lib/python3.11/site-packages/pymc/smc/sampling.py:406, in (.0)
399 # update the progress bar for this task:
400 progress.update(
401 status=f"Stage: {stage} Beta: {beta:.3f
}**",
402 task_id=task_id,
403 refresh=True,
404 )
→ 406 return tuple(cloudpickle.loads(r.result()) for r in done)

File ~/anaconda3/envs/torch/lib/python3.11/concurrent/futures/_base.py:449, in Future.result(self, timeout)
447 raise CancelledError()
448 elif self._state == FINISHED:
→ 449 return self.__get_result()
451 self._condition.wait(timeout)
453 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~/anaconda3/envs/torch/lib/python3.11/concurrent/futures/_base.py:401, in Future.__get_result(self)
399 if self._exception:
400 try:
→ 401 raise self._exception
402 finally:
403 # Break a reference cycle with the exception in self._exception
404 self = None

NotImplementedError: Cannot convert None to a tensor variable.

Hi @jessegrabowski , since you mention about multiprocessing, is there any current issues with multiprocessing? I have a model that was running fine with cores = 4 for version 5.25.1 but did not compile for version 5.26.1 (I got an EOFError).

Looks like it may be a dimensionality issue? It seems to work well as long as num_features_theta is <<10. I have been using SMC for a while now, including multiple observed values and never encountered that error message I’m afraid.

Not that I know of. I’ve seen EOFErrors before, usually they’re masking some other error in one of the processes that you can’t see until you run it on a single core.

In my example, num_features_theta=7, the number of examples is 30, 7 features per example, I suspect that it has something to do with my version of pymc and some libraries, could you share the settings of your python environment when you did the SMC sampling?
My python environment was created on a linux server, python version is 3.11.14 ,pymc version is 5.26.1, pytensor version is 2.35.1, arivz version is 0.22.0, numpy version is 2.3.4.
In this python environment, I tried the example in Sequential Monte Carlo — PyMC example gallery and it doesn’t work either

I’m way behind in terms of versions… :slight_smile:

python 3.11.5

pymc 5.24.1

pytensor 2.31.7

arviz 0.19.0

numpy 1.26.4

I don’t know if this was implemented correctly but this is what I ran:

 import pymc as pm
 import numpy as np

 num_features_theta = 5
 Theta_normalized = np.random.normal(0, 1, size=(1000, num_features_theta))

 with pm.Model() as skewed_model:
 mu = pm.Uniform(“mu”, lower=0, upper=1, shape=num_features_theta)
 omega = pm.HalfNormal(“omega”, sigma=1, shape=num_features_theta)
 alpha = pm.Normal(“alpha”, mu=0, sigma=2, shape=num_features_theta)

 for i in range(num_features_theta):
     pm.SkewNormal(
         f"obs_feature_{i}",
         mu=mu[i],
         sigma=omega[i],
         alpha=alpha[i],
         observed=Theta_normalized[:, i],
     )

 with skewed_model:
     trace_smc = pm.sample_smc(2000, chains=4)

1 Like

If the example is broken, could you please open an issue on GitHub - pymc-devs/pymc-examples: Examples of PyMC models, including a library of Jupyter notebooks.? That’s very no bueno.

1 Like

Thank you!Now that the problem is solved and smc is running properly, I downgraded my numpy version from 2.3.4 to 1.26.4 and that seems to have solved the problem!