# SMC questions: start values

Hi there,
I’ve been playing around with SMC for a problem with a large number of dimensions. I have a few questions though:

• it’s apparently not possible to choose a specific number of chains, but the process can be parallelized across cores. I tried to find documentation to learn more about the algorithm; could you point me to some?
• I tried to set up start values, but I guess it expects one set of start values (one dict) and not an array corresponding to the various chains, is that correct?
• in smc.py, I’m wondering whether `v` in the code below should be `self.model.unobserved_RVs` rather than `self.variables`? I can’t seem to make it work properly.
```python
for i in range(self.draws):
    point = Point({v.name: init_rnd[v.name][i] for v in self.variables}, model=self.model)
    population.append(self.model.dict_to_array(point))
```
• as a matter of fact, I wonder whether start values are really useful for SMC at all, judging from the way the parameter space is sampled. Normally I would choose random start values, possibly different for each chain, but here it seems that wouldn’t be necessary, is that right?

Cheers,
MV

Hi Vianney,
I think the one and only @aloctavodia will be helpful here

Hi @vian

• In our implementation the number of draws is the number of chains.
The docstring of sample_smc has a high-level description of the algorithm. The exact algorithm implemented in PyMC3 has not been published, but it is based on this and this, with a few modifications, mostly in how the covariance of the proposal distribution and the number of Metropolis steps are tuned.

• You need a dictionary with the names of the variables as keys and arrays as values. Each array has to be of length draws. For example:

```python
import numpy as np
import pymc3 as pm

with pm.Model():
    a = pm.Poisson("a", 5)
    b = pm.HalfNormal("b", 10)
    y = pm.Normal("y", a, b, observed=[1, 2, 3, 4])

    start = {
        "a": np.random.poisson(5, size=500),
        "b_log__": np.abs(np.random.normal(0, 10, size=500)),
    }
    trace = pm.sample_smc(500, start=start)
```

• self.variables is model.vars, i.e. the model’s free variables (on the transformed scale), not unobserved_RVs.

• Unless you have a very good reason to use specific start values, it is better to let SMC initialize from the prior. This is also true for the other samplers in PyMC3; in general you don’t need to initialize the algorithm manually.
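To illustrate the draws-as-chains idea: each of the `draws` particles starts at the prior and is advanced together through a sequence of tempered distributions. Below is a minimal pure-NumPy sketch of the tempering/reweighting/resampling loop on a toy 1-D Gaussian model. This is a deliberate simplification I am assuming for illustration, not PyMC3’s code: the actual implementation also runs tuned Metropolis steps after each resampling, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(42)

def tempering_step(particles, loglike, beta_old, beta_new):
    """Reweight particles by the likelihood increment, then resample
    so every particle carries equal weight again."""
    # Incremental importance weights: likelihood^(beta_new - beta_old)
    log_w = (beta_new - beta_old) * loglike(particles)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Systematic resampling
    u = (rng.random() + np.arange(len(particles))) / len(particles)
    idx = np.minimum(np.searchsorted(np.cumsum(w), u), len(particles) - 1)
    return particles[idx]

# Toy model: prior N(0, 5), likelihood N(3, 1); particles drawn from the prior
loglike = lambda x: -0.5 * (x - 3.0) ** 2
particles = rng.normal(0.0, 5.0, size=5000)

# Move from the prior (beta = 0) to the posterior (beta = 1) in four stages
for beta_old, beta_new in zip([0.0, 0.25, 0.5, 0.75], [0.25, 0.5, 0.75, 1.0]):
    particles = tempering_step(particles, loglike, beta_old, beta_new)

# The analytic posterior mean here is 25/26 * 3 ≈ 2.88
print(particles.mean())
```

This also shows why explicit start values rarely matter for SMC: the particle population is initialized from the prior and gradually reweighted toward the posterior, so there is no single per-chain starting point to tune.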

BTW, I am interested in the behavior of SMC in high dimensions, so it would be great if you could share your observations/problems.


Hi again,
I don’t know if I should create a new topic for this, but while testing SMC it seems I can’t load a trace after saving it. Saving works fine:

```python
pm.save_trace(trace, directory=tracelocation, overwrite=True)
```

but loading fails with:

```
File "[]/lib/python3.7/site-packages/pymc3/backends/ndarray.py", line 81, in load_trace
File "[]/lib/python3.7/site-packages/pymc3/backends/ndarray.py", line 146, in load
    metadata['_stats'] = [{k: np.array(v) for k, v in stat.items()} for stat in metadata['_stats']]
TypeError: 'NoneType' object is not iterable
```

Is there a specific procedure to follow?
Cheers,
MV

The different backends are going to be deprecated in favor of ArviZ’s NetCDF functionality, so I’d use that instead. Hope this helps

Great, thanks. I imagine there’s a special call for loading the NetCDF file as a PyMC3 trace?

edit: sorry, never mind, from_netcdf works just fine; the issue was in my code
edit: although, I now get TypeError: ‘InferenceData’ object is not subscriptable. I’ll look into the current PyMC3 version to see how NetCDF files are handled
edit: if I understand correctly, once the inference is over, one can call az.data.convert_to_inference_data to convert the trace to the ArviZ format. However, I’m not sure anymore how to do things like pm.sample_posterior_predictive from a loaded trace
final edit: ok, by using convert_to_inference_data once inference is done and modifying my code to use this format instead, I see that the InferenceData format is handled by sample_posterior_predictive in the current git version. So I guess that answers all my questions. I’ll start getting my code ready for the next PyMC3 release
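For anyone hitting the same wall, the NetCDF round trip described above can be sketched with ArviZ alone. This is an illustrative stand-in, assuming a plain dict of draws in place of a real PyMC3 trace; it also shows the “not subscriptable” point, since InferenceData groups are accessed as attributes rather than with square brackets:

```python
import os
import tempfile

import numpy as np
import arviz as az

# Stand-in for a real trace: a dict of draws with shape (chains, draws)
idata = az.convert_to_inference_data({"a": np.random.poisson(5, size=(2, 500))})

# Save to NetCDF and load it back
path = os.path.join(tempfile.mkdtemp(), "trace.nc")
az.to_netcdf(idata, path)
loaded = az.from_netcdf(path)

# InferenceData is not subscriptable; use attribute access for groups
print(loaded.posterior["a"].shape)  # (chains, draws)
```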
