Hello

Shorter version:

I want to implement a custom epsilon in my simulator to model within-simulation variability for the same parameters.

- I am not sure if this is taken care of in the smc algorithm by default?
- I donâ€™t know what is the right syntax to define this epsilon, if the numpy array is depending on the current parameters and therefore needs to call the simulator function.
- What is the syntax to have additional parameters in the simulator function that should not be predicted by the model?

Instead of `my_simulator(rng,a,b,c,size=None)`

smth like `my_simulator(rng,a,b,c,H,E,L,P,size=None)`

where only a,b,c should be predicted?

Longer version:

- The problem:

I am using the smc_sampler to estimate 3 parameters: a,b and c that shape my data.

While a and b get predicted quite well, the third parameter is estimated quite poorly.

By working on the same problem beforehand with different methods other than pymc (=a self written non-fancy proof-of-concept mcmc, that does the job but takes quite a long time)

I know that - by design of the problem - a better prediction is possible. I am however struggeling to find the right implementation within pymc.

How I got good predictions without pymc:

In my self-written method I utilized the same simulator function to create my data. In each step of my chain I compare the simulated to the observed data.

But instead of computing a distance between one simulated instance under the current suggested parameters to the real data, as it is implemented in the distance functions in pymc (like gaussian and laplace),

I simulate 10 datasets for the same set of the current parameters. From this I obtained confidence intervals and I was able to check if the observed data lies within the uncertainty that is

obtained for the parameters. With this I ended up with a pretty accurate estimate of c.

It seems to me like as if using this variability from simulating the data several times

for a given parameterset is not implemented in the smc_sampler method - is this correct or am I missing something?

Because of this I assume that the pymc smc_sampler struggles to find the best c prediction.

- Possible solutions:

I played around with the epsilon value as a scalar, but it doesnâ€™t seem to do a good job.

For epsilon=1 I already get good predictions for a and b.

For epsilon < 0.5 and smaller the algortihm runs really long and the c prediction does not change significantly.

Now it seems like there is a way to utilize the custom options for sum_stat and epsilon to get closer to my solution:

My idea is to indirectly use the same information as I used to before implementing my problem in pymc:

Give the smc_sampler the standard devition obtained from 10 simulations under the current parameters in the form the an epsilon-array.

I read in the simulator documentation (Simulator â€” PyMC dev documentation)

that feeding the smc_sampler with a custom 1-d numpy array epsilon is possible, in order to reflect different variation

among summary statistics. This implies that I would also need to include summary statistics of the same dimension.

In my case this would actually just be one representative run of the simulated data.

I could not find any example case of a custom numpy array of sum_stat and epsilon within pymc.

Since I need to call my simulator function within the model I am not sure which is the right syntax to use, similiary to

how the simulator function (my_simulator) needs to have a very specific form to be compatible within pm.Simulator.

```
#Choose a set of true parameters for the observed distribution, called 'real_observation'
true_a=0.8
true_b=200
true_c=150
#Defining additional parameters
H=1
E=2
L=3
P=4
real_observation=create_simulation_sample(a,b,c,H,E,L,P)
#Format: the observed data is a 1-dim numpy array of 100 bins.
#The order of the data is important because it represents histogram data, representing a bimodal distribution.
#Define my simulator function as described in the new version of pymc, with parameters in between rng and size.
def my_simulator(rng,a,b,c,size=None):
H=1
E=2
L=3
P=4
sim = create_simulation_sample(a,b,c,H,E,L,P)
return sim
with pm.Model() as model1:
a = pm.Uniform("a", lower=0.0, upper=1.0)
b = pm.DiscreteUniform("b", lower=0, upper=1000)
c = pm.DiscreteUniform("c", lower=0, upper=1000)
#computing mean and std of 10 simulations under the current a,b,c parameters.
#Returns two numpy arrays of same dimension as real_observation
(sum_stat_data,epsilon_data)=mean_and_std_of_my_simulator(a,b,c)
s = pm.Simulator("s", my_simulator, params=[a,b,c],
distance="gaussian", sum_stat=sum_stat_data,
epsilon=epsilon_data,
observed=real_observation)
trace_model1 = pm.sample_smc()
```

- My final question is:

Is there a way to add my additional arguments within my_simulator? Such as

my_simulator(rng,a,b,c,H,E,L,P,size=None)?

I would like to change those parameters for different runs in the future but it seems as if pm.Simulator can only work with this strict definition.