Help with understanding how sum_stat="sort" works and how it is applied in "smc.py" source code

mjedrz · January 21, 2022, 1:26pm

Hi,
I’ve been reading through the “smc.py” source code recently and found something that bugs me. When the summary statistic sum_stat="sort" is used, np.sort is applied to both observations and sim_data, as follows:

self.observations = self.sum_stat(observations)
(...)
elemwise = self.distance(self.epsilon, self.observations, self.sum_stat(sim_data))

Shouldn’t the sort reorder both arrays according to one of them (the observations)? In many cases this makes no difference, but if, for example, observations = [3.04, 2.99, 1.5] and the corresponding sim_data = [3.02, 3.05, 1.56], then the distance function will not compute the distance between the correct pairs of values.
I would greatly appreciate it if someone could explain whether this sorting procedure is intended to work the way it does or whether it is an error in the source code.
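
To make the two options concrete, here is a minimal sketch using the example values from the question. It only illustrates the pairing and is not taken from “smc.py”; the variable names independent_pairs, order and cosorted_pairs are made up for this illustration.

```python
import numpy as np

observations = np.array([3.04, 2.99, 1.5])
sim_data = np.array([3.02, 3.05, 1.56])

# Independent sorting, as in the quoted lines: each array is ordered on its own,
# so the i-th smallest observation is compared with the i-th smallest simulated value.
independent_pairs = list(zip(np.sort(observations).tolist(),
                             np.sort(sim_data).tolist()))

# Co-sorting (the alternative raised in the question): both arrays are reordered
# by the order of the observations, so the original pairs stay together.
order = np.argsort(observations)
cosorted_pairs = list(zip(observations[order].tolist(),
                          sim_data[order].tolist()))

print(independent_pairs)  # [(1.5, 1.56), (2.99, 3.02), (3.04, 3.05)]
print(cosorted_pairs)     # [(1.5, 1.56), (2.99, 3.05), (3.04, 3.02)]
```

The two pairings differ only for the values 3.02 and 3.05, which is exactly the case described above.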

cluhmann · January 21, 2022, 3:36pm

@ricardoV94 @aloctavodia ?

aloctavodia · January 21, 2022, 5:26pm

Hi @mjed, this was intentional, and I think it is the correct way to approximate the Wasserstein distance between empirical distributions. I can check again or find a good reference if you are interested in the details.
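
A small check of the Wasserstein point, as a sketch rather than a statement about what “smc.py” computes internally: for two one-dimensional samples of equal size, the 1-Wasserstein distance between their empirical distributions is the smallest mean absolute difference over all ways of pairing the points, and that minimum is reached by pairing the sorted values. The helper mean_abs_diff and the brute-force search below are assumptions added for illustration; the data are the values from the question.

```python
import itertools
import numpy as np

observations = np.array([3.04, 2.99, 1.5])
sim_data = np.array([3.02, 3.05, 1.56])


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized arrays, paired by position."""
    return float(np.mean(np.abs(a - b)))


# Pairing by original position (what co-sorting by the observations would keep).
positional = mean_abs_diff(observations, sim_data)

# Pairing after sorting each array independently.
sorted_cost = mean_abs_diff(np.sort(observations), np.sort(sim_data))

# Brute force: try every possible pairing and keep the cheapest one.
best = min(
    mean_abs_diff(observations, sim_data[list(perm)])
    for perm in itertools.permutations(range(len(sim_data)))
)

print(positional, sorted_cost, best)
```

Here the sorted pairing gives about 0.033, which matches the brute-force minimum, while the positional pairing gives about 0.047. The simulated values are draws from a distribution with no inherent correspondence to individual observations, which is why comparing the two samples as distributions is reasonable.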
