What is the correct format/shape of the initial values for Sequential Monte Carlo (SMC)?

How should I format the initial values passed to the `start` parameter of `pymc.smc.sample_smc` for Sequential Monte Carlo (SMC)?

As a concrete example, consider the following Bayesian linear regression model:

```python
import pymc as pm
import numpy as np

def basic_model(observed_data):
    array_sizes = np.array([size for (size, _) in observed_data])
    array_costs = np.array([cost for (_, cost) in observed_data])
    coefficient_sigma = 5

    with pm.Model() as model:
        # HalfNormal priors are sampled on the log scale, so their value
        # variables are named "coefficient0_log__" and "coefficient1_log__"
        coefficient0 = pm.HalfNormal(
            "coefficient0", sigma=coefficient_sigma)
        coefficient1 = pm.HalfNormal(
            "coefficient1", sigma=coefficient_sigma)

        predicted_bounds = coefficient0 + coefficient1 * array_sizes
        observed_costs = pm.Normal("observed_costs", mu=predicted_bounds,
                                   sigma=10, observed=array_costs)
    return model

observed_data = [[1, 1], [2, 2], [4, 3], [8, 4], [16, 7],
                 [32, 10], [64, 13], [128, 17], [256, 18]]

init_nuts = {"coefficient0": 10,
             "coefficient1": 10}

num_draws = 1000
num_chains = 4

# Keys are the log-transformed value variables, so 10 here means coefficient = exp(10)
init_smc = {"coefficient0_log__": np.full((num_draws, num_chains), 10),
            "coefficient1_log__": np.full((num_draws, num_chains), 10)}

with basic_model(observed_data):
    idata = pm.sample_smc(num_draws, start=init_smc,
                          chains=num_chains, random_seed=42)
```

When I run this code with PyMC v5.13.1, I get the warning `UserWarning: More chains (4) than draws (1). Passed array should have shape (chains, draws, *shape)`. I believe it is caused by an incorrect shape or format of the initial values `init_smc` that I pass to `pm.sample_smc`. Here, `init_smc` is a dictionary that maps the variable names `coefficient0_log__` and `coefficient1_log__` to NumPy arrays of shape `(num_draws, num_chains)`.

I couldn’t find any documentation on what format the initial values should have. Does anyone know how to resolve this? Thanks a lot in advance!
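A minimal sketch of one candidate format, assuming (from my reading of the PyMC v5 SMC kernel, not from the docs) that the same `start` dict is handed to every chain and that each array is indexed along its first axis once per particle, so it needs shape `(draws, *var_shape)` with no chain axis. `make_smc_start` is a hypothetical helper, not a PyMC function. It also maps values to the log scale, because `coefficient0_log__` holds log(coefficient0): the `10` in `init_smc` above corresponds to a coefficient of about 22,026, not 10.

```python
import numpy as np


def make_smc_start(init_values, draws, log_transformed=(), jitter=0.0, seed=None):
    """Build a candidate `start` dict for pm.sample_smc (hypothetical helper).

    Assumptions (not confirmed by the PyMC docs):
      * keys are the model's value-variable names, e.g. "coefficient0_log__"
        for a HalfNormal, whose values live on the log scale;
      * each array has shape (draws, *var_shape): one row per particle;
      * the same dict is reused for every chain, so there is no chain axis.
    """
    rng = np.random.default_rng(seed)
    start = {}
    for name, value in init_values.items():
        value = np.asarray(value, dtype=float)
        if name in log_transformed:
            # Store log-transformed variables under "<name>_log__" on the log scale
            key, value = f"{name}_log__", np.log(value)
        else:
            key = name
        # Repeat the value once per particle along a new leading axis
        particles = np.broadcast_to(value, (draws, *value.shape)).copy()
        # Optional Gaussian noise on the unconstrained scale so particles differ
        particles += jitter * rng.standard_normal(particles.shape)
        start[key] = particles
    return start


init_smc = make_smc_start(
    {"coefficient0": 10, "coefficient1": 10},
    draws=1000,
    log_transformed=("coefficient0", "coefficient1"),
    jitter=0.1,
    seed=42,
)
print({k: v.shape for k, v in init_smc.items()})
# {'coefficient0_log__': (1000,), 'coefficient1_log__': (1000,)}
```

If that reading is right, the result would be passed unchanged as `pm.sample_smc(num_draws, start=init_smc, chains=num_chains)` inside the `with basic_model(observed_data):` block. The small `jitter` is a deliberate choice: when `start` is omitted, SMC begins from a population of prior draws, and identical particles leave it nothing to reweight.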

For some reason, if we replace

```python
init_smc = {"coefficient0_log__": np.full((num_draws, num_chains), 10),
            "coefficient1_log__": np.full((num_draws, num_chains), 10)}
```

with randomized initial values

```python
init_smc = {"coefficient0_log__": np.random.normal(size=(num_draws, num_chains)),
            "coefficient1_log__": np.random.normal(size=(num_draws, num_chains))}
```

the warning disappears, even though the arrays inside `init_smc` keep the same shape.
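A plausible explanation, which I haven’t verified against the PyMC source: the warning text matches a check in ArviZ’s `numpy_to_data_array`, which warns whenever an array’s first (chain) axis is longer than its second (draw) axis. My guess is that the offending array is one of SMC’s per-stage sample statistics (such as `beta`), not the posterior. With identical particles every log-likelihood is equal, so an effective-sample-size (ESS) based tempering rule can jump straight from beta = 0 to 1 in a single stage, giving a `(4, 1)` statistic. The sketch below is a simplified re-implementation of such a rule (bisection on beta until ESS is about `threshold * N`, threshold assumed to be 0.5, particles held fixed between stages), not PyMC’s actual code.

```python
import numpy as np


def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def next_beta(loglik, beta, threshold=0.5):
    """Bisect for the next inverse temperature so that ESS is about threshold * N."""
    target = threshold * len(loglik)
    low, high, new = beta, 2.0, beta
    while high - low > 1e-6:
        new = (low + high) / 2.0
        logw = (new - beta) * loglik
        logw = logw - logsumexp(logw)          # normalised log-weights
        ess = np.exp(-logsumexp(2.0 * logw))   # ESS = 1 / sum(w**2)
        if ess < target:
            high = new  # step too large: weights too uneven
        else:
            low = new   # step acceptable: try a larger one
    return min(new, 1.0)


def count_stages(loglik, threshold=0.5):
    """Count tempering stages from beta = 0 to beta = 1 (simplified: no resampling)."""
    beta, stages = 0.0, 0
    while beta < 1.0:
        beta = next_beta(loglik, beta, threshold)
        stages += 1
    return stages


rng = np.random.default_rng(0)
identical = np.full(1000, -3.0)              # every particle has the same log-likelihood
diverse = rng.normal(scale=50.0, size=1000)  # spread-out particles, as with a random start

print(count_stages(identical))      # 1
print(count_stages(diverse) > 1)    # True

# One stage per chain: a per-stage statistic stacks to (chains, stages) = (4, 1),
# which ArviZ reads as "more chains (4) than draws (1)".
print(np.array([[1.0] for _ in range(4)]).shape)  # (4, 1)
```

Under this assumption, the random start only silences the warning because the particles differ, so tempering takes several stages; it says nothing about whether `(num_draws, num_chains)` is the intended shape.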

Shouldn’t the shape be (chains, draws) instead of the other way around?

@ricardoV94 That’s what I initially thought as well. I read somewhere that the shape should be (num_chains, num_draws), although I don’t recall where. But when I tried the shape (num_chains, num_draws) in my code, it crashed, so I had to use (num_draws, num_chains) instead. It would be great if someone could revise the documentation of `sample_smc` to clarify this point.
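To see why the axis order matters, here is a sketch that mimics how, as far as I can tell from the PyMC v5 source, the SMC kernel builds its initial population: for particle `i` it takes `start[name][i]`, so only the first axis is consumed, and it is walked `draws` times. `particles_from_start` is a hypothetical stand-in, not PyMC code, and I haven’t checked it against v5.13.1 specifically.

```python
import numpy as np


def particles_from_start(start, draws):
    """Split a `start` dict into one point per particle (assumed behaviour).

    For particle i this takes start[name][i], i.e. it indexes the FIRST axis
    of every array and never looks for a chain axis.
    """
    return [{name: np.asarray(values)[i] for name, values in start.items()}
            for i in range(draws)]


draws, chains = 1000, 4

# (draws,): each particle gets a scalar, as a scalar HalfNormal expects.
ok = particles_from_start({"coefficient0_log__": np.zeros(draws)}, draws)
print(ok[0]["coefficient0_log__"].shape)  # ()

# (draws, chains): runs, but each particle silently gets a length-4 vector.
wide = particles_from_start({"coefficient0_log__": np.zeros((draws, chains))}, draws)
print(wide[0]["coefficient0_log__"].shape)  # (4,)

# (chains, draws): the first axis has only 4 rows, so particle 4 is out of range.
try:
    particles_from_start({"coefficient0_log__": np.zeros((chains, draws))}, draws)
except IndexError as err:
    print("IndexError:", err)
```

Under that assumption, `(num_chains, num_draws)` runs out of rows after four particles, which would be consistent with the crash described above, while `(num_draws, num_chains)` runs but hands every particle four values for a scalar variable.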

That may actually be a bug or an untested feature. As far as I can see, the only test we have uses a single chain: pymc/tests/smc/test_smc.py in pymc-devs/pymc on GitHub (commit 6761c0c73ba07cc9dc51ec3adab7f1aa5f76b23d).

Would you mind opening an issue on our GitHub repository so we can check whether this behaves as expected and/or update the docs?

Sure, I’ll open an issue on PyMC’s GitHub repository.

@ricardoV94 I’ve reported this on GitHub, asking for the PyMC documentation to be clarified and for the unit test to be updated.
