How to improve fit of Poisson Mixture Model in PyMC3?

Doing proper inference for mixture models is not easy. If you search the discourse, there are a few pointers on how to better parameterize your model (e.g., Properly sampling mixture models).

First, you need to make sure the inference is done correctly. Plot the trace and check that there are no divergences, no Rhat or effective sample size warnings, and that in general the trace looks fine.
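For example, a minimal check (assuming trace is whatever pm.sample returned):

import pymc3 as pm

pm.traceplot(trace)        # visual check of the chains: they should mix and overlap
pm.summary(trace)          # numerical check: Rhat close to 1, reasonable n_eff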


@cynthiaw2004 , I’ve tried running that model a few times on separate days. Using your parameters, the model converged successfully after about 6 trials. I’m not sure why it takes so many trials to converge, but I never got it on the first try. When the model didn’t converge (Rhat > 1), it was quite obvious in the visualizations below:
[images: traceplots from the non-converged runs]

But when it converged, you can see the visual alignment:
[image: traceplot from the converged run]

As @Junpenglao suggested, it would be best to separate the model into components, as follows:

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    # separate rate parameters, one per Poisson component
    lam1 = pm.Exponential('lam1', lam=1)
    lam2 = pm.Exponential('lam2', lam=1)
    pois1 = pm.Poisson.dist(mu=lam1)
    pois2 = pm.Poisson.dist(mu=lam2)
    # mixture weights with a flat Dirichlet prior
    w = pm.Dirichlet('w', a=np.array([1, 1]))
    like = pm.Mixture('like', w=w, comp_dists=[pois1, pois2], observed=data)

with model:
    # tune for 10000 steps, draw 5000 samples, drop the first 1000 as extra burn-in
    trace = pm.sample(5000, n_init=10000, tune=10000, random_seed=SEED)[1000:]

Then try:
pm.summary(trace)

or:
pm.forestplot(trace)

to confirm that the model converged.

I think we could help the model converge on the first trial by setting testval=something within pm.Exponential('lam1', lam=1), but I’m not sure what it should be equal to in this case.

Does anyone have a suggestion?
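For example, maybe something data-driven like this (just a sketch of the idea; I have not verified that it helps):

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    # hypothetical starting values: seed each rate near a quantile of the
    # observed counts so the two components start well apart
    lam1 = pm.Exponential('lam1', lam=1, testval=np.percentile(data, 25))
    lam2 = pm.Exponential('lam2', lam=1, testval=np.percentile(data, 75))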


Restricting the order of the location parameters (lam in this case) usually prevents label switching and helps convergence, although your model can still be unidentifiable if you don’t have enough data or if the latent mixture locations are too close together. For more detail, see Mike Betancourt’s case study (I have a PyMC3 port of it).

In this case, applying a similar parameter constraint:

import numpy as np
import pymc3 as pm
import pymc3.distributions.transforms as tr

# chain a log transform with an ordering transform so that lam[0] <= lam[1],
# which prevents label switching between the two components
chain_tran = tr.Chain([tr.log, tr.ordered])

with pm.Model() as model:
    lam = pm.Exponential('lam', lam=1,
                         shape=2,
                         transform=chain_tran,
                         testval=np.asarray([1., 1.5]))
    pois = pm.Poisson.dist(mu=lam)
    w = pm.Dirichlet('w', a=np.array([1, 1]))
    like = pm.Mixture('like', w=w, comp_dists=pois, observed=data)

should make the inference a bit easier.

Thanks for the suggestions. I got the warning "The estimated number of effective samples is smaller than 200 for some parameters", so I increased the draws argument of the sample function to make it go away. I also used traceplot (attached)

and checked the Rhats using summary(trace), but the Rhats are not looking too hot…

As for your suggestion on using tr.Chain, I am getting an attribute error: AttributeError: module 'pymc3.distributions.transforms' has no attribute 'Chain'

I am using the git version of pymc3 (not the pip version) so any ideas what is going on there?

Attaching the result of summary(trace), since I can’t put 2 pictures in one post as a n00b user.

You say you got it to converge after 6 trials. I apologize for the beginner question, but is that a parameter in the sample function? If so, which one? (see the sample API)

Or that you just ran sample 6 times? And on the 6th time it finally converged?

@cynthiaw2004 I meant that I ran it 6 or 7 times until it finally converged. Your summary(trace) results appear to be pretty good, suggesting nice convergence.
If you’d like to visually compare the observed data with the posterior trace results, this should work:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

N = 1000                              # number of observed data points

W = np.array([0.18, 0.81])            # these are the means of the w's in your summary(trace)
W = W / W.sum()                       # renormalize so the weights sum exactly to 1

Lam = np.array([281.91, 1032.00])     # these are the means of the lam's

# Simulate a Poisson mixture from the posterior means
component = np.random.choice(Lam.size, size=N, p=W)
x = np.random.poisson(Lam[component], size=N)

fig, ax = plt.subplots(figsize=(8, 6))

# Histogram of the original, observed data
ax.hist(observed_data, bins=20, density=True, lw=0, alpha=.8, label='Observed data')

# Histogram of the data simulated from the posterior means
ax.hist(x, bins=20, density=True, lw=0, alpha=0.4, color='gray', label='Posterior traced distribution')

sns.kdeplot(x, label='Posterior traced KDE')
sns.kdeplot(observed_data, label='Observed data KDE')
plt.legend()
plt.show()

tr.Chain was merged on master only recently, so try updating your PyMC3 installation.

Thanks. However, the graph still looks wrong (attached below). Also, sorry if I was being ambiguous, but I am using the data in the text file below, which is not exactly the data used in the tutorial above.

The data is here. The first column is the hour and the second column is the count of some event in that hour: data.txt (9.5 KB)

Can you please send me the data in CSV or Excel format?


Thank you for taking the time to look at the data. I hope this csv version helps. data.csv (9.5 KB)

I’m not sure what you’re trying to model. The two columns are two separate distributions:

[images: histograms of the two columns]

One appears to be exponential while the other is uniform. You shouldn’t use the Poisson mixture model for this.

Essentially, the data is supposed to look like this: [attached image: Figure_2]

So in row 21 (hour 20), there is a count of 7; essentially, we add [20, 20, 20, 20, 20, 20, 20] to the distribution.
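For example, expanding the counts into individual samples could look like this (a sketch, assuming the two columns load as hours and counts):

import numpy as np

# load the two columns (hour, count) from the attached file
hours, counts = np.loadtxt('data.csv', delimiter=',', unpack=True)

# repeat each hour by its count: hour 20 with count 7 contributes [20]*7
samples = np.repeat(hours, counts.astype(int))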

Thanks, I did a git pull. However, now I get the error: ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV w_stickbreaking__.ravel()[0] is zero.

I also tried this data with the Chapter 1 code involving switchpoints from Probabilistic Programming and Bayesian Methods for Hackers. This gave me the figure below, which I think is decent. But I would still really like to use mixture models somehow. In addition, there is the possibility that multiple taus exist, and knowing how many beforehand, without looking at the graph ahead of time, is not obvious to me. [attached image: Figure_3]

@cynthiaw2004 you shouldn’t be using a mixture model for this. If you’re interested in seeing where the count had a sudden change as the hours pass by, a switchpoint is a good option. Mixture models are not what you’re looking for.


Do you have any suggestions on how to deal with the possibility of multiple taus? Let’s say I could not look at the graph beforehand. Looking at the graph above is sort of cheating because I already know there are two pieces separated by one tau. Can I account for any number of taus?

You are asking about estimating the number of clusters within a data set. That is complicated; there is no simple way to do it. I would continue reading material on the usage of PyMC3 and Bayesian statistics before you worry too much about this advanced topic. Try to get through more chapters of the book and things will clear up a bit.

As @adam said, you shouldn’t model the data with a mixture, as the data is time dependent, which means the data points are not iid.
In some cases, if the clusters are well separated (i.e., large tau) or you have lots of data, you might still be able to do clustering using Mixture, but in this case you don’t have enough data.
For multiple taus you can have a look at Switch point Metropolis tuning. But with the number of taus unknown, it will be difficult to deal with.
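For reference, a minimal single-tau switchpoint model looks something like this (a sketch; hours and counts stand for the two data columns):

import pymc3 as pm

with pm.Model() as switch_model:
    # the hour at which the Poisson rate changes
    tau = pm.DiscreteUniform('tau', lower=int(hours.min()), upper=int(hours.max()))
    # rates before and after the switchpoint
    lam1 = pm.Exponential('lam1', lam=1.0)
    lam2 = pm.Exponential('lam2', lam=1.0)
    lam = pm.math.switch(hours < tau, lam1, lam2)
    obs = pm.Poisson('obs', mu=lam, observed=counts)
    trace = pm.sample(5000, tune=2000)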