Fitting a Lorentzian peak to counting data / Poisson noise?

Hello,

I hope this isn’t a totally novice question. I’ve done some digging around on these forums and elsewhere to check that I’m setting up my problem correctly, but I thought I’d ask here just in case.

I’m trying to fit a function (here, a Lorentzian peak + flat background) to data that is generated by Poisson sampling (e.g. by counts hitting a detector and populating a histogram).

Below I have (hopefully) implemented a Poisson likelihood for the data (obs), with Normal / HalfNormal priors on the fit parameters. I got a good match between the inferred parameters and fit and the true distribution. But my questions are:

  • Is this the correct way to handle Poisson-distributed data? Is it even appropriate to call it ‘Poisson distributed’, or is the distribution function actually my fit equation ‘mu’ (Lorentzian + background)?

  • Do I instead need to define a variable that accounts for Poisson noise? If so, what sort of distribution should I use for the ‘obs’ variable?

Thanks very much for any feedback you may have, whether it’s directly relevant to the problem or general advice on how to implement this better!

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
np.random.seed(12345)

# Parameters for data and function
c0    = 2 # Background constant
amp   = 100 # Peak amplitude
x0    = 1 # Offset
gamma = 20 # FWHM

x = np.linspace(-50, 50, 300)
y_true = c0 + amp * (gamma / 2) / ((x - x0)**2 + (gamma / 2)**2) # Lorentzian peak + flat background
y = np.random.poisson(y_true) # Poisson-sampled counts at each x

fig1 = plt.figure(1, dpi=150, figsize=(4, 4))
plt.scatter(x, y, color='r', s=2, label="Data")
plt.plot(x, y_true, color='b', lw=1, label="True")
plt.legend()
plt.tight_layout()

var_names = ["c0", "amp", "x0", "gamma"]

if __name__ == "__main__":
    with pm.Model() as model:
        # Priors: background must be positive; the rest are weakly informative
        c0    = pm.HalfNormal('c0', sigma=5, testval=1)
        amp   = pm.Normal('amp', mu=100, sigma=20, testval=100)
        x0    = pm.Normal('x0', mu=1, sigma=10, testval=1)
        gamma = pm.Normal('gamma', mu=20, sigma=10, testval=20)
        # Expected count rate at each x: Lorentzian peak + flat background
        mu    = pm.Deterministic('mu', c0 + amp * (gamma / 2) / ((x - x0)**2 + (gamma / 2)**2))
        # Poisson likelihood: each observed count y[i] ~ Poisson(mu[i])
        obs   = pm.Poisson('obs', mu=mu, observed=y)

    with model:
        trace = pm.sample(3000, chains=2, cores=1)
        trace = trace[1000:] # discard the first 1000 draws as burn-in

    with model:
        az.plot_trace(trace, var_names=var_names, figsize=(8, 6))

    with model:
        # Predictive draws of the parameters (not used in the plots below)
        ppc = pm.sample_posterior_predictive(trace, var_names=var_names, random_seed=42)
        # 100 posterior draws of the fitted curve, for plotting the fit uncertainty
        ppc_fit = pm.sample_posterior_predictive(trace, var_names=['mu'], samples=100, random_seed=42)

    y_MCMC = trace['mu'].mean(axis=0) # posterior mean of the fitted curve

    plt.figure(1)
    plt.plot(x, y_MCMC, color='k', lw=2, label='MCMC fit') # plot trace mean for mu
    for draw in ppc_fit['mu'][:100]: # overlay 100 posterior samples of mu
        plt.plot(x, draw, color=[0.5, 0.5, 0.5], alpha=0.1, zorder=0)
    plt.legend()

This looks good to me. The sampling is well behaved and it reproduces your simulation values. It seems to me the questions you’re asking are more philosophical: is it appropriate to call the data ‘Poisson distributed’? I suppose that depends on what you take as your independent variable. Personally, I think of your data as Lorentz distributed with a Poisson variance.

So, for a single x-value on your plot (or, practically speaking, for a narrow bin), you could look at the count distribution of the data in that bin, and it’s reasonable to assume it’s Poisson distributed. But the value of λ for that Poisson depends on your value of x; of course, that corresponds to each value of mu in your obs variable. Basically, your code is fitting len(x) separate Poisson distributions through your obs variable.
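To make that concrete, here’s a quick simulation sketch (plain NumPy; the bin choice and variable names are mine) showing that the counts in one fixed bin, accumulated over many repeated experiments, have the matching mean and variance of a Poisson distribution whose rate λ is the true curve evaluated at that x:

import numpy as np

rng = np.random.default_rng(0)

# True curve from your post: Lorentzian peak + flat background
c0, amp, x0, gamma = 2, 100, 1, 20
x = np.linspace(-50, 50, 300)
y_true = c0 + amp * (gamma / 2) / ((x - x0)**2 + (gamma / 2)**2)

# Simulate many repeated experiments and look at one fixed bin
n_experiments = 10_000
bin_index = 150  # an arbitrary bin near the peak
counts = rng.poisson(y_true[bin_index], size=n_experiments)

# For a Poisson distribution, mean and variance both equal the rate,
# so both should approach y_true[bin_index]
print(f"rate lambda     = {y_true[bin_index]:.3f}")
print(f"sample mean     = {counts.mean():.3f}")
print(f"sample variance = {counts.var():.3f}")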

Alternatively, you could convert your data into a 1D list and use the pm.Cauchy() distribution as your likelihood. For example, say you had the data points y = [(-10.5, 3), (-1, 11), (5.5, 4)], i.e. (x-value, count) pairs; these would be re-expressed as y = [-10.5, -10.5, -10.5, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 5.5, 5.5, 5.5, 5.5]. This 1D representation can then be set as the observed data for an explicitly Lorentz (a.k.a. “Cauchy”) likelihood, i.e.

    obs = pm.Cauchy('obs', alpha=a_prior, beta=b_prior, observed=y)
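If it helps, here’s a minimal sketch of that idea in PyMC3. The toy data, priors, and variable names are mine, and it assumes a pure, background-free peak, since the Cauchy likelihood has no term for your flat background c0:

import numpy as np
import pymc3 as pm

# Expand (x-value, count) pairs into individual samples:
# [(-10.5, 3), (-1, 11), (5.5, 4)] -> [-10.5, -10.5, -10.5, -1, ...]
x_vals  = np.array([-10.5, -1.0, 5.5])
counts  = np.array([3, 11, 4])
samples = np.repeat(x_vals, counts)

with pm.Model() as cauchy_model:
    # alpha is the peak location, beta the half-width at half-maximum
    a_prior = pm.Normal('a_prior', mu=0, sigma=10)
    b_prior = pm.HalfNormal('b_prior', sigma=10)
    obs     = pm.Cauchy('obs', alpha=a_prior, beta=b_prior, observed=samples)
    trace   = pm.sample(1000, chains=2, cores=1)

One thing to note with this approach: since the Cauchy density is normalized, the amplitude drops out entirely, so you only recover the peak location and width, not the overall scale.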

I’m not sure if this answers your question or if it’s even helpful, but perhaps it gives you another way of conceptualizing the problem. All of that said, I like the way you’re doing it.


Thanks very much for the feedback and advice! That was also my interpretation, that the underlying distribution is a Lorentzian and that the counts at each bin or x-value are Poisson distributed, so I’m glad the way I’ve implemented it seems correct.