How to fit beta distribution from bin and count data?

Kynnemall · September 7, 2022, 2:51pm

I used a lab instrument to acquire millions of data points, and the machine returns a summary file with the count for each bin. I can't expand the counts back into raw values (repeating each bin's value n times, where n is its count), since my laptop can't hold that much data in memory, and I haven't found an example of anyone using a library like Dask or Vaex to get around the memory issue.

So, is it possible to use PyMC to fit a distribution when I only have the bin and count data?

ricardoV94 · September 7, 2022, 10:47pm

CC @ferrine, I think he worked on something similar.

ferrine · September 11, 2022, 6:46am

Hi, you can use this reference implementation to build the log-probability of the observed data from bins and counts.

github.com/pymc-devs/pymc-experimental/blob/main/pymc_experimental/distributions/histogram_utils.py#L133
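
```python
# Excerpt from the end of the function linked above (signature and docstring omitted)
if dask and isinstance(observed, (dask.dataframe.Series, dask.dataframe.DataFrame)):
    # Dask series/dataframes are converted to a Dask array first
    observed = observed.to_dask_array(lengths=True)
if np.issubdtype(observed.dtype, np.integer):
    # Integer data: histogram over the discrete values
    histogram = discrete_histogram(observed, **h_kwargs)
else:
    # Continuous data: quantile-based bins
    histogram = quantile_histogram(observed, **h_kwargs)
if dask is not None:
    # Compute the lazy Dask result; only the small histogram is materialized
    (histogram,) = dask.compute(histogram)
# Each bin contributes count * logp(bin midpoint) to the model log-probability
return pm.Potential(name, pm.logp(dist, histogram["mid"]) * histogram["count"])
```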
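The `pm.Potential` line relies on a simple identity: if every observation in a bin is represented by the bin's midpoint, the log-likelihood of the expanded data equals the sum over bins of count × log-density at the midpoint. Below is a minimal, stdlib-only sketch checking this for a Beta density; `beta_logpdf`, `binned_loglik`, `expanded_loglik` and the numbers are illustrative assumptions, not part of pymc-experimental.

```python
import math


def beta_logpdf(x: float, a: float, b: float) -> float:
    """Log-density of Beta(a, b) at x, valid for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def binned_loglik(mids, counts, a, b):
    """Sum of count * logpdf(mid): one term per bin, like the Potential."""
    return sum(c * beta_logpdf(m, a, b) for m, c in zip(mids, counts))


def expanded_loglik(mids, counts, a, b):
    """Same quantity the memory-hungry way: one term per repeated value."""
    expanded = [m for m, c in zip(mids, counts) for _ in range(c)]
    return sum(beta_logpdf(x, a, b) for x in expanded)


# Hypothetical summary file: bin midpoints and counts
mids = [0.1, 0.3, 0.5, 0.7, 0.9]
counts = [12, 40, 55, 30, 8]

print(binned_loglik(mids, counts, 2.0, 3.0))
print(expanded_loglik(mids, counts, 2.0, 3.0))
```

The two printed values agree up to floating-point rounding, but the binned version needs memory proportional to the number of bins rather than the number of observations. It is exact for the midpoint-expanded data and an approximation of the unbinned data, one that improves as the bins get narrower.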

You need a target distribution, let's say:

```python
dist = pm.Beta.dist(a, b)
```

and your histogram data (bin midpoints and counts), which you pass to the potential:

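```python
import pymc as pm

# `histogram` holds bin midpoints ("mid", inside (0, 1) for a Beta) and counts ("count")
with pm.Model() as model:
    a = pm.Exponential("a", 1)
    b = pm.Exponential("b", 1)
    dist = pm.Beta.dist(a, b)
    # Each bin adds count * logp(midpoint) to the model log-probability
    pm.Potential("obs", pm.logp(dist, histogram["mid"]) * histogram["count"])
```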
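The model above expects a `histogram` mapping with `"mid"` and `"count"` entries. If the instrument's summary file reports bin edges rather than midpoints, the mapping can be built with NumPy as in the minimal sketch below. The helper name `histogram_from_edges` and the example numbers are hypothetical. Because a Beta is only defined on (0, 1), the sketch can also rescale midpoints linearly from an assumed measurement range `[low, high]`; skip that step if the values are already proportions.

```python
import numpy as np


def histogram_from_edges(edges, counts, low=None, high=None):
    """Build {"mid": ..., "count": ...} from bin edges and per-bin counts.

    Hypothetical helper: `edges` must have len(counts) + 1 entries. If `low`
    and `high` are given, midpoints are rescaled from [low, high] onto [0, 1].
    """
    edges = np.asarray(edges, dtype=float)
    counts = np.asarray(counts)
    if edges.shape[0] != counts.shape[0] + 1:
        raise ValueError("expected one more edge than counts")
    # Midpoint of each bin, used as the representative value of its observations
    mid = 0.5 * (edges[:-1] + edges[1:])
    if low is not None and high is not None:
        mid = (mid - low) / (high - low)
    # Empty bins have zero weight in the likelihood, so drop them
    keep = counts > 0
    return {"mid": mid[keep], "count": counts[keep]}


# Hypothetical instrument output on a 0-200 scale
edges = np.linspace(0.0, 200.0, 5)  # 0, 50, 100, 150, 200
counts = np.array([10, 0, 25, 5])
histogram = histogram_from_edges(edges, counts, low=0.0, high=200.0)
print(histogram["mid"])    # midpoints 25, 125, 175 rescaled to 0.125, 0.625, 0.875
print(histogram["count"])  # counts 10, 25, 5
```

The resulting dict can be passed directly to the `pm.Potential` line in the model.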
