SMC questions: start values

Hi there,
I’ve been playing around with SMC for a problem with a large number of dimensions. I have a few questions though:

  • it’s apparently not possible to use a specific number of chains but the process can be parallelized on cores. I tried to find some documentation to know more about the algorithm, could you point me to some?
  • I tried to set up start values, but I guess it needs one set of start values (one dict) and not an array corresponding to the various chains, is that correct?
  • in smc.py, I’m wondering whether v in the code below should be self.model.unobserved_RVs rather than self.variables? I can’t seem to make it work properly.
    for i in range(self.draws):
        point = Point({v.name: init_rnd[v.name][i] for v in self.variables}, model=self.model)
        population.append(self.model.dict_to_array(point))
  • as a matter of fact, I wonder whether start values are really useful for SMC, judging from the way the parameter space is sampled. Normally I would choose random start values, possibly different for different chains, but here it seems it wouldn’t be necessary, is that right?

Cheers,
MV

Hi Vianney,
I think the one and only @aloctavodia will be helpful here :wink:

Hi @vian

  • In our implementation the number of draws is the number of chains.
    The docstring of sample_smc has a high-level description of the algorithm. The exact description of the algorithm implemented in PyMC3 has not been published, but it is based on this and this with a few modifications, mostly in how the covariance of the proposal distribution and the number of Metropolis steps are tuned.

  • You need a dictionary with the names of the variables as keys and arrays as values. The arrays have to be of length draws. For example:
    import numpy as np
    import pymc3 as pm

    with pm.Model():
        a = pm.Poisson("a", 5)
        b = pm.HalfNormal("b", 10)
        y = pm.Normal("y", a, b, observed=[1, 2, 3, 4])
        start = {
            "a": np.random.poisson(5, size=500),
            # b_log__ is the log-transformed b, so its start values are on the log scale
            "b_log__": np.log(np.abs(np.random.normal(0, 10, size=500))),
        }
        trace = pm.sample_smc(500, start=start)

  • self.variables is model.vars

  • Unless you have a very good reason to use specific start values, it is better to let SMC initialize from the prior. This is also true for the other samplers in PyMC3: in general you don’t need to initialize the algorithm manually.
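
To make the shape requirement on start concrete, here is a minimal pure-Python sketch of how a dict of per-draw arrays is split into one point per particle, following the same indexing pattern as the loop quoted from smc.py. The helper name start_to_points is hypothetical and is not part of PyMC3. Plain lists stand in for NumPy arrays, and the random values are only placeholders, not draws from the actual priors.

```python
import math
import random


def start_to_points(start, var_names, draws):
    """Hypothetical helper (not PyMC3 API): one dict per particle."""
    for name in var_names:
        if name not in start:
            raise KeyError(f"start is missing variable {name!r}")
        if len(start[name]) != draws:
            raise ValueError(
                f"start[{name!r}] has length {len(start[name])}, expected {draws}"
            )
    # Value i of every variable goes to particle i.
    return [{name: start[name][i] for name in var_names} for i in range(draws)]


random.seed(0)
draws = 500
start = {
    # placeholder integers for the discrete variable "a"
    "a": [random.randint(0, 10) for _ in range(draws)],
    # placeholder values for the transformed variable, on the log scale
    "b_log__": [math.log(abs(random.gauss(0, 10))) for _ in range(draws)],
}

points = start_to_points(start, ["a", "b_log__"], draws)
print(len(points), sorted(points[0]))
```

Passing a single value per variable, or arrays whose length differs from draws, fails the length check here. In the real sampler the same mistake would show up when the population is indexed per draw.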

BTW, I am interested in the behavior of SMC in high dimensions, so if you can share your observations/problems it would be great.


Hi again,
I don’t know if I should create a new topic for this, but while testing SMC, it seems I can’t load the trace that was saved:
works fine: pm.save_trace(trace, directory=tracelocation, overwrite=True)
fails: trace = pm.load_trace(tracelocation, model=model)

  File "[]/lib/python3.7/site-packages/pymc3/backends/ndarray.py", line 81, in load_trace
    straces.append(SerializeNDArray(subdir).load(model))
  File "[]/lib/python3.7/site-packages/pymc3/backends/ndarray.py", line 146, in load
    metadata['_stats'] = [{k: np.array(v) for k, v in stat.items()} for stat in metadata['_stats']]
TypeError: 'NoneType' object is not iterable

Is there a specific procedure to follow?
Cheers,
MV

The different backends are going to be deprecated in favor of ArviZ’s NetCDF functionality, so I’d use that instead. Hope this helps :vulcan_salute:

Great, thanks. I imagine there is a special call for loading the netCDF file as a PyMC3 trace?

edit: sorry, never mind, from_netcdf works just fine, the issue was in my code
edit: although, I get TypeError: 'InferenceData' object is not subscriptable. I’ll look into the current PyMC3 version to see how netCDF files are implemented
edit: if I understand correctly, once the inference is over, one can do az.data.convert_to_inference_data to convert the trace to the arviz format. However, I’m not sure anymore how to do things like pm.sample_posterior_predictive from a loaded trace
final edit: ok, using convert_to_inference_data once inference is done, modifying the code to use this format instead, and I see that the inference data format is handled in the current git version for sample_posterior_predictive. So I guess that answers all my questions. I’ll start making my code ready for the next PyMC3 version
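
For anyone landing here later, a minimal sketch of the netCDF round trip described in these edits, assuming an ArviZ 0.x release where az.from_dict, InferenceData.to_netcdf and az.from_netcdf are available. The random array is only a stand-in for real posterior draws; with an actual PyMC3 trace one would start from convert_to_inference_data instead. The file name trace.nc is arbitrary.

```python
import os
import tempfile

import arviz as az
import numpy as np

# Stand-in for posterior draws with shape (chains, draws).
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"a": rng.normal(size=(2, 100))})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "trace.nc")
    idata.to_netcdf(path)          # save
    loaded = az.from_netcdf(path)  # load back as InferenceData
    # InferenceData is not subscriptable like a MultiTrace: go through a
    # group (here "posterior") and then the variable name.
    a_draws = loaded.posterior["a"].values

print(a_draws.shape)
```

The "not subscriptable" error above comes from writing loaded["a"]. Variables live inside groups such as posterior, so the access is loaded.posterior["a"].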


Exactly :slight_smile: You answered your own questions perfectly :sweat_smile: