Inference of Physical Parameters from Binned Data

Hi all,

I am relatively new to PyMC, so this problem might have a very simple solution that I am not seeing.
For my current project I am trying to infer some physical parameters from a set of time-resolved measurements. Essentially, the output of the measurement is a histogram like the one shown in the figure, where counts are binned into time segments:

[Figure: histogram of counts per time bin]

You can see that a) the data follows an almost-exponential distribution and b) the relative error is larger at lower count numbers (obviously, I guess).
I have a physical model that I have translated into PyTensor, so that I can simulate the same decay from a set of parameters. Now I would like to use PyMC to infer the set of parameters that best fits the data.
However, if I just use

Y_obs = pm.Normal('Y_obs', mu = simulation, sigma = sigma, observed = data)

the lower counts at later times are essentially ignored, because they don’t change the overall likelihood too much anymore. Similarly I have tried

Y_obs_log = pm.Normal('Y_obs_log', mu = at.log(simulation), sigma = sigma, observed = np.log(data))

but then the relative change in sigma with time seems to become an issue. Lastly, I have also tried

Y_obs_poiss = pm.Poisson('Y_obs_poiss', mu = simulation/simulation.sum(), observed = data/data.sum() )

and

w = data
Y_obs_poiss2 = pm.Potential('Y_obs_poiss2', w*pm.logp(rv=pm.Poisson.dist(simulation/simulation.sum()), value=data/data.sum()))

but neither has resulted in anything useful so far.

I would appreciate any input on how to combine the continuous model I have with the discrete, binned data in a ‘proper’ way to infer the parameters. Maybe there is something obvious I am missing here.
Thank you very much in advance!

The “correct way” to handle discretized data from a continuous process is to integrate the density between the bin edges, so that each bin contributes the probability mass that falls inside it.

PyMC can give you the probability of “rounded”, “ceiling”, “floored” data, but not arbitrary bins yet.

An example is given in this thread:

For a manual implementation of arbitrary bins you may check this notebook: Estimating parameters of a distribution from awkwardly binned data — PyMC example gallery
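
To make the idea of integrating between bin edges concrete, here is a minimal sketch in plain Python. It assumes a pure exponential decay as a stand-in for your physical model, and the names (expected_counts, poisson_loglike, edges, total) are made up for illustration. A grid search replaces the sampler, so this is not how the notebook does it, only the same principle in the smallest form I could write.

```python
import math

def expected_counts(rate, amplitude, edges):
    """Expected counts per bin for the density amplitude * rate * exp(-rate * t),
    integrated exactly between consecutive bin edges."""
    return [
        amplitude * (math.exp(-rate * lo) - math.exp(-rate * hi))
        for lo, hi in zip(edges[:-1], edges[1:])
    ]

def poisson_loglike(counts, mu):
    """Sum of Poisson log-probabilities of raw integer counts given expected counts."""
    return sum(k * math.log(m) - m - math.lgamma(k + 1) for k, m in zip(counts, mu))

# Hypothetical setup: 20 bins of width 0.5 and a known total amplitude.
edges = [0.5 * i for i in range(21)]
true_rate, total = 0.8, 10_000

# Noise-free "data": the expected counts rounded to integers.
counts = [round(m) for m in expected_counts(true_rate, total, edges)]

# Grid search over the decay rate (amplitude held fixed for simplicity).
grid = [0.5 + 0.01 * i for i in range(61)]
best_rate = max(
    grid, key=lambda r: poisson_loglike(counts, expected_counts(r, total, edges))
)
print(best_rate)
```

Note that the counts stay as raw integers and the expected counts stay on the same scale. A Poisson likelihood is only defined for integer counts, so dividing both data and simulation by their sums, as in your Poisson attempts, no longer gives a valid Poisson model. In PyMC the analogous step would be to pass the unnormalized expected counts as mu to pm.Poisson with the raw counts as observed.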

However, I am not sure you are writing the correct PyMC model to begin with. I suggest you do a parameter recovery study with simulated data before discretization to see whether your model works correctly in that case. Then you can add discretization and repeat the experiment, either with the same continuous likelihood or with the right likelihood for the binned data.
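
A parameter recovery study of this kind could look roughly like the sketch below. It again assumes a plain exponential decay in place of your model, uses only the standard library, and fits by maximum likelihood rather than by sampling. All names and numbers (true_rate, width, n_bins, the grid) are illustrative choices, not something taken from your setup.

```python
import math
import random

rng = random.Random(0)
true_rate = 0.8
# Simulate continuous event times from the known parameter.
times = [rng.expovariate(true_rate) for _ in range(20_000)]

# Step 1: fit the continuous data. For an exponential, the maximum-likelihood
# estimate of the rate is the number of events divided by the sum of the times.
rate_continuous = len(times) / sum(times)

# Step 2: discretize into equal-width bins (events past the last edge are dropped).
width, n_bins = 0.5, 20
counts = [0] * n_bins
for t in times:
    i = int(t / width)
    if i < n_bins:
        counts[i] += 1

def binned_loglike(rate):
    """Multinomial log-likelihood (up to a constant) with bin probabilities obtained
    by integrating the density over each bin, renormalized to the observed window."""
    norm = 1.0 - math.exp(-rate * width * n_bins)
    ll = 0.0
    for i, k in enumerate(counts):
        p = (math.exp(-rate * i * width) - math.exp(-rate * (i + 1) * width)) / norm
        ll += k * math.log(p)
    return ll

# Step 3: refit on the binned data and compare both estimates to the truth.
grid = [0.5 + 0.005 * i for i in range(121)]
rate_binned = max(grid, key=binned_loglike)

print(true_rate, rate_continuous, rate_binned)
```

If both estimates land close to the value used for the simulation, the model and likelihood are at least consistent with each other, and you can move on to the real data with more confidence.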

Sometimes the distortion caused by binning can be ignored altogether if you have a lot of data or a well-identified model.
