Dear Ricardo,
Thank you for your super quick reply!
For my problem this implies that I can solve it in only two ways; please let me know if I understood that correctly. I also wonder whether there is a PyMC trick that would solve this more elegantly, because I am convinced that my approach for the problem at hand - a custom epsilon that is updated with the current parameters on every draw, together with a simulated data array given by the mean of several simulations instead of a single one - could improve prediction accuracy for many problems.
I now understand that sum_stat needs to be a function that is applied once to the observed data and once to the simulated data, computing the same summary statistics on each of them. Both summary statistics are then passed - together with epsilon - to the distance function.
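Just to check my understanding, here is a minimal toy sketch of that mechanism (a hypothetical normal simulator with an unknown mean and made-up placeholder data, not my actual model); the callable sum_stat only illustrates how the same summary is applied to both datasets:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=100)  # placeholder for real observed data

# The simulator function receives an rng, the model parameters, and a size argument.
def normal_sim(rng, mu, size=None):
    return rng.normal(mu, 1.0, size=100)

# A callable sum_stat is applied to the observed data and to each simulated
# dataset alike; both summaries are then passed, together with epsilon,
# to the distance function.
def my_sum_stat(data):
    return np.array([data.mean(), data.std()])

with pm.Model():
    mu = pm.Normal("mu", 0, 5)
    pm.Simulator(
        "sim", normal_sim, mu,
        distance="gaussian", sum_stat=my_sum_stat,
        epsilon=0.1, observed=observed,
    )
```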
In my case, I would instead like to use the raw observed data, but for the simulated data I want to run 10 (or more) simulations, compute the mean and standard deviation per bin (giving two arrays with the same dimensions as the raw observed data), and then use the mean as the input to the distance function while using the std as epsilon.
It seems that in the current implementation of PyMC I have two options to work around this, both using sum_stat="identity":
1. Instead of the original simulator function, I pass a function to pm.Simulator() that calculates the mean of 10 simulations. Since epsilon can only be passed as an np.array, I would need to perform 10 additional simulations beforehand, compute their std, and pass the result as epsilon (see the sketch after this list).
2. Keep the same epsilon as in 1) but use the original simulator, and accept that the distance is computed from a single simulation rather than the mean, even though the mean would probably better represent what happens for the given parameters.
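For option 1, I was thinking of something like the following minimal sketch; `single_simulation`, the Poisson toy model, the fixed pilot parameter value, and `N_REPS = 10` are all placeholders for my real setup:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
observed = rng.poisson(5.0, size=50)  # placeholder for the real binned observations

N_REPS = 10  # number of simulations averaged per draw

# Stand-in for the original single-run simulator.
def single_simulation(rng, rate):
    return rng.poisson(rate, size=50)

# Option 1: wrapper passed to pm.Simulator that returns the per-bin mean
# over N_REPS simulations for the current parameter value.
def mean_simulator(rng, rate, size=None):
    sims = np.stack([single_simulation(rng, rate) for _ in range(N_REPS)])
    return sims.mean(axis=0)

# epsilon has to be a fixed array, so it is computed once from pilot runs
# at a plausible parameter value rather than updated with every draw.
pilot = np.stack([single_simulation(rng, 5.0) for _ in range(N_REPS)])
epsilon = pilot.std(axis=0) + 1e-6  # avoid zero-width bins

with pm.Model():
    rate = pm.HalfNormal("rate", sigma=10)
    pm.Simulator(
        "sim", mean_simulator, rate,
        distance="gaussian", sum_stat="identity",
        epsilon=epsilon, observed=observed,
    )
    idata = pm.sample_smc()
```

Option 2 would be the same model, but with the original single-run simulator passed to pm.Simulator (wrapped only to accept the size argument), while keeping the pilot-run epsilon.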
I am curious to try out both options and see if the results vary significantly in runtime or prediction accuracy.