This is my observed bimodal data:
My model is a mixture of 2 uniform distributions (exactly the same as the ones I used to produce the data), one on [0,5] and the other on [10,15].
My goal is to be able to tell which distribution produced each data sample.
(Which should be obvious: if the sample is in [0,5] then it comes from the first distribution, and if the sample is in [10,15] then it comes from the second distribution. There is no overlap.)
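To make the "obvious" answer explicit, here is a minimal sketch of the labels I expect, using only the standard library. The helper name `label_by_interval` is made up for this illustration and is not part of PyMC3; the data is regenerated here with `random` rather than NumPy just to keep the snippet self-contained.

```python
import random

def label_by_interval(x):
    """Return 0 if x lies in [0, 5], 1 if x lies in [10, 15], else None."""
    if 0.0 <= x <= 5.0:
        return 0
    if 10.0 <= x <= 15.0:
        return 1
    return None  # x belongs to neither component's support

random.seed(0)  # arbitrary seed, only for reproducibility
data = ([random.uniform(0, 5) for _ in range(100)]
        + [random.uniform(10, 15) for _ in range(100)])

# Expected assignment: first 100 samples -> component 0, last 100 -> component 1
labels = [label_by_interval(x) for x in data]
print(labels.count(0), labels.count(1))
```

This is the assignment I would like to recover from the posterior of the mixture weights.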
import numpy as np
import pymc3 as pm

# Test data: 100 draws from U(0, 5) followed by 100 draws from U(10, 15)
data = np.concatenate([
    np.random.uniform(low=0, high=5, size=100),
    np.random.uniform(low=10, high=15, size=100),
])

with pm.Model() as model:
    # Fixed mixture components, identical to the generating distributions
    c1 = pm.Uniform.dist(lower=0, upper=5)
    c2 = pm.Uniform.dist(lower=10, upper=15)
    # One pair of mixture weights per observation
    w = pm.Dirichlet('w', a=np.array([1, 1]), shape=(len(data), 2))
    mix = pm.Mixture('mix', w=w, comp_dists=[c1, c2], observed=data, shape=len(data))
    trace = pm.sample()
Which seems to sample well:
Since both data distributions and the two mixture components (c1 & c2) are totally distinct, I expected the mixture weights w to be nicely separated. But they are not! Let's see:
First let's look at the shape of w:
The first coordinate indexes the posterior draws (the distribution), the second coordinate indexes the observed data sample, and the last coordinate holds the 2 component weights.
So let's choose the first data sample (0) and look at the corresponding distribution of w for that sample.
%matplotlib notebook
import matplotlib.pyplot as plt
sample = 0
plt.hist(trace['w'][:, sample, 0], alpha=0.5)  # w[0] blue
plt.hist(trace['w'][:, sample, 1], alpha=0.5)  # w[1] orange
plt.grid(True)
In my understanding of the problem and the model, these two histograms should be separated… How can it be that these two distributions overlap?
Since c2 is uniform on the interval [10,15], the likelihood of the c2 distribution is 0 at the value of this sample (~2.24). So its log-likelihood is -infinity.
How can the mixture hold a non-zero weight on a component with -infinity log-likelihood!?
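To make the numbers concrete, here is a small sketch of the quantities involved for a single point at 2.24. It assumes the mixture likelihood of one observation is the weighted sum w[0]·p1(x) + w[1]·p2(x); I have not checked that `pm.Mixture` evaluates it exactly this way. The helpers `uniform_pdf` and `mixture_logp` are hypothetical names written for this post, not PyMC3 functions.

```python
import math

def uniform_pdf(x, lower, upper):
    """Density of U(lower, upper) at x (0 outside the support)."""
    return 1.0 / (upper - lower) if lower <= x <= upper else 0.0

def mixture_logp(x, w):
    """Log of w[0]*p1(x) + w[1]*p2(x) for the two fixed components (assumed form)."""
    p = w[0] * uniform_pdf(x, 0, 5) + w[1] * uniform_pdf(x, 10, 15)
    return math.log(p) if p > 0 else -math.inf

x = 2.24  # approximate value of data sample 0

print(uniform_pdf(x, 10, 15))        # density of c2 alone at x
print(mixture_logp(x, (0.5, 0.5)))   # equal weights
print(mixture_logp(x, (0.0, 1.0)))   # all weight on c2
```

Under that assumption, the density of c2 alone is 0 at this point, but the weighted sum stays positive (and its log finite) as long as the weight on c1 is positive; it only drops to -infinity when all the weight is on c2.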
Is there a problem with my understanding, or a problem with the mixture feature?
Any help will be appreciated…