Hello
Shorter version:
I want to implement a custom epsilon in my simulator to model within-simulation variability for the same parameters.
- I am not sure if this is taken care of in the smc algorithm by default?
- I don’t know what is the right syntax to define this epsilon, if the numpy array is depending on the current parameters and therefore needs to call the simulator function.
- What is the syntax to have additional parameters in the simulator function that should not be predicted by the model?
Instead of my_simulator(rng,a,b,c,size=None)
smth like my_simulator(rng,a,b,c,H,E,L,P,size=None)
where only a,b,c should be predicted?
Longer version:
- The problem:
I am using the smc_sampler to estimate 3 parameters: a,b and c that shape my data.
While a and b get predicted quite well, the third parameter is estimated quite poorly.
By working on the same problem beforehand with different methods other than pymc (=a self written non-fancy proof-of-concept mcmc, that does the job but takes quite a long time)
I know that - by design of the problem - a better prediction is possible. I am however struggeling to find the right implementation within pymc.
How I got good predictions without pymc:
In my self-written method I utilized the same simulator function to create my data. In each step of my chain I compare the simulated to the observed data.
But instead of computing a distance between one simulated instance under the current suggested parameters to the real data, as it is implemented in the distance functions in pymc (like gaussian and laplace),
I simulate 10 datasets for the same set of the current parameters. From this I obtained confidence intervals and I was able to check if the observed data lies within the uncertainty that is
obtained for the parameters. With this I ended up with a pretty accurate estimate of c.
It seems to me like as if using this variability from simulating the data several times
for a given parameterset is not implemented in the smc_sampler method - is this correct or am I missing something?
Because of this I assume that the pymc smc_sampler struggles to find the best c prediction.
- Possible solutions:
I played around with the epsilon value as a scalar, but it doesn’t seem to do a good job.
For epsilon=1 I already get good predictions for a and b.
For epsilon < 0.5 and smaller the algortihm runs really long and the c prediction does not change significantly.
Now it seems like there is a way to utilize the custom options for sum_stat and epsilon to get closer to my solution:
My idea is to indirectly use the same information as I used to before implementing my problem in pymc:
Give the smc_sampler the standard devition obtained from 10 simulations under the current parameters in the form the an epsilon-array.
I read in the simulator documentation (Simulator — PyMC dev documentation)
that feeding the smc_sampler with a custom 1-d numpy array epsilon is possible, in order to reflect different variation
among summary statistics. This implies that I would also need to include summary statistics of the same dimension.
In my case this would actually just be one representative run of the simulated data.
I could not find any example case of a custom numpy array of sum_stat and epsilon within pymc.
Since I need to call my simulator function within the model I am not sure which is the right syntax to use, similiary to
how the simulator function (my_simulator) needs to have a very specific form to be compatible within pm.Simulator.
#Choose a set of true parameters for the observed distribution, called 'real_observation'
true_a=0.8
true_b=200
true_c=150
#Defining additional parameters
H=1
E=2
L=3
P=4
real_observation=create_simulation_sample(a,b,c,H,E,L,P)
#Format: the observed data is a 1-dim numpy array of 100 bins.
#The order of the data is important because it represents histogram data, representing a bimodal distribution.
#Define my simulator function as described in the new version of pymc, with parameters in between rng and size.
def my_simulator(rng,a,b,c,size=None):
H=1
E=2
L=3
P=4
sim = create_simulation_sample(a,b,c,H,E,L,P)
return sim
with pm.Model() as model1:
a = pm.Uniform("a", lower=0.0, upper=1.0)
b = pm.DiscreteUniform("b", lower=0, upper=1000)
c = pm.DiscreteUniform("c", lower=0, upper=1000)
#computing mean and std of 10 simulations under the current a,b,c parameters.
#Returns two numpy arrays of same dimension as real_observation
(sum_stat_data,epsilon_data)=mean_and_std_of_my_simulator(a,b,c)
s = pm.Simulator("s", my_simulator, params=[a,b,c],
distance="gaussian", sum_stat=sum_stat_data,
epsilon=epsilon_data,
observed=real_observation)
trace_model1 = pm.sample_smc()
- My final question is:
Is there a way to add my additional arguments within my_simulator? Such as
my_simulator(rng,a,b,c,H,E,L,P,size=None)?
I would like to change those parameters for different runs in the future but it seems as if pm.Simulator can only work with this strict definition.