The issue I am finding seems to be a general one: given a mixture model where one component is a Uniform component (representing some background signal), a sufficiently high background will bias the width parameters of the remaining components (say a Normal peak, representing some signal).
In other words, fitting the mixture (peak signal + uniform background) will result in a wider peak and a lower background weight than expected.
This appears to happen irrespective of the type of sampler/optimizer I am using (ADVI, NUTS, etc).
My question is this: are there any particular hyperparameter-setting strategies or tricks that can help to account for this bias? I have tried adjusting the model priors, but that only gets me so far and has not proven to be a particularly robust solution.
To give you a sense of what I’m talking about I’ve included the following simple example.
Generate some test data
The example I’m using is a two-component model consisting of a Normal distribution centered at zero with some width sigma, and a Uniform background component. The component weights are (w_signal, w_back) = (0.1, 0.9). I generate some test data using scipy:
import numpy as np
import scipy as sp

data_range = [-20000, 20000]
N = 10000
w_back = 0.9
N_back = int(N * w_back)
N_signal = int(N * (1 - w_back))
mu = 0
sigma = 2000

# Uniform background spanning the full data range
background = np.round(sp.stats.uniform.rvs(loc=data_range[0], scale=data_range[1] - data_range[0], size=N_back, random_state=42))
# Normal signal peak centered at mu with width sigma
signal = np.round(sp.stats.norm.rvs(loc=mu, scale=sigma, size=N_signal, random_state=42))
Here I’ve chosen some total number of data points N and assigned them to the peak signal and background according to the background weight w_back. Plotting the above:
import matplotlib.pyplot as plt
plt.hist(np.concatenate([signal,background]), bins=np.linspace(data_range[0], data_range[1], 300))
plt.xlim(data_range)
plt.show()
Here you can see the background is very high, but the peak is still clearly visible.
Fitting PyMC model
In order to bias the fit toward smaller values of sigma, I’m using an Exponential prior for this parameter. I am also biasing the Dirichlet prior on the weights to assign more weight to the background. Here I’m showing the results using ADVI as the optimizer, but the same result occurs with MC samplers like NUTS.
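For reference, here is what those prior choices imply numerically (a quick check with scipy, not part of the model code): the Dirichlet(a=[1, 3]) prior has mean a/sum(a), i.e. (0.25, 0.75) for (w_signal, w_back), and the Exponential(lam=1/2000) prior on sig has mean 2000.

```python
import numpy as np
from scipy import stats

# Dirichlet prior mean for the component weights (w_signal, w_back)
a = np.array([1, 3])
print(a / a.sum())  # [0.25 0.75]

# Mean of the Exponential(lam=1/2000) prior on sig
print(stats.expon(scale=2000).mean())  # 2000.0
```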
import numpy as np
import pymc as pm

with pm.Model() as mod:
    peak_mu = pm.Uniform("mu", lower=data_range[0], upper=data_range[1])
    peak_sig = pm.Exponential("sig", lam=1/2000)
    peak_pdf = pm.Normal.dist(mu=peak_mu, sigma=peak_sig)
    back_pdf = pm.Uniform.dist(lower=data_range[0], upper=data_range[1])
    weights = pm.Dirichlet("w", a=np.array([1, 3]))
    model = pm.Mixture("mod", w=weights, comp_dists=[peak_pdf, back_pdf],
                       observed=np.concatenate([signal, background]))

with mod:
    vi = pm.ADVI()
    ivi = vi.fit(50000)
    idata = ivi.sample(10000)
Fitting results
Here we see that the fitting results show two complementary systematic biases: (1) the peak width sig is biased toward larger values (mean = 2131, whereas the ground truth is sigma = 2000), and (2) the background weight w1 is biased toward smaller values (mean = 0.89, whereas the ground truth is w_back = 0.9).
Effectively, what appears to be happening here is that the optimization is WIDENING the signal peak (sig) to account for some fraction of the uniform background, which results in a DECREASE in the inferred background weight (w1).
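One way to probe this directly is to evaluate the mixture log-likelihood of the data at the ground-truth parameters versus at the posterior means quoted above. The helper below is my own sketch (mixture_loglike is not a library function), holding mu fixed at its true value of zero:

```python
import numpy as np
from scipy import stats

def mixture_loglike(data, w_back, sigma, lo=-20000, hi=20000):
    """Total log-likelihood of a Normal(0, sigma) + Uniform(lo, hi) mixture."""
    dens = (1 - w_back) * stats.norm.pdf(data, loc=0, scale=sigma) \
           + w_back / (hi - lo)
    return np.log(dens).sum()

# Same simulated data as in the generation code above
background = np.round(stats.uniform.rvs(loc=-20000, scale=40000, size=9000, random_state=42))
signal = np.round(stats.norm.rvs(loc=0, scale=2000, size=1000, random_state=42))
data = np.concatenate([signal, background])

print(mixture_loglike(data, w_back=0.9, sigma=2000))   # ground-truth parameters
print(mixture_loglike(data, w_back=0.89, sigma=2131))  # approximate posterior means
```

Comparing the two values on a given draw shows whether the wider-peak solution genuinely attains comparable likelihood on this finite sample, or whether the pull comes from the priors.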
import arviz as az
az.plot_trace(idata);
az.plot_posterior(idata);
A few criticisms I could imagine a reader might present:
- But the background is so high to begin with.
- The bias in the result really isn’t all that significant given that the ground truth values are still within the HDI for each posterior.
Yes, the background is high, but looking at the data the peak is still clearly visible, so I think it should still be possible to infer it correctly. Furthermore, the Dirichlet weights allow for the possibility that all of the data is attributed to the uniform background, so the correct result is well within the model’s possible parameter space.
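To put a number on "clearly visible" (a back-of-the-envelope addition of mine, using the simulation constants from above): within ±2 sigma of the peak, the signal contributes roughly half as many counts again on top of the flat background, a substantial local excess.

```python
from scipy import stats

N, w_back, sigma = 10000, 0.9, 2000
lo, hi = -20000, 20000

# Expected background counts in the +/-2 sigma window around the peak
back_in_window = N * w_back * (4 * sigma) / (hi - lo)
# Expected signal counts in the same window (~95.4% of the Normal mass)
sig_in_window = N * (1 - w_back) * (stats.norm.cdf(2) - stats.norm.cdf(-2))

print(back_in_window)  # 1800.0
print(sig_in_window)   # ~954
```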
Yes, in this case the true value is still within the HDI of each posterior, but this is a consistent systematic bias: it persists no matter how I tune the prior hyperparameters, and it occurs in every mixture model I’ve tried that combines a localized signal with a uniform background. That seems like a big problem.
Also, what I am presenting here is just the simplest example I could think of that reproduces the issue. My actual use case, a more complicated mixture model, shows the same bias to a much more extreme degree, such that the ground-truth parameter values fall way outside the HDIs of their respective posteriors.
Has anyone else seen this correlated bias between background weight and other model parameters? Is there any robust way of adjusting for this? Thanks for any insights!