Struggling with Beta mixture models

gmcfad · January 31, 2019, 5:56pm

Hi everyone,

I’m relatively new to PYMC3 - I can follow along with tutorials and examples but haven’t had much luck with my own models - so apologies in advance for any silly questions.

I’m working with some data which takes values between 0 and 1, and which is multi-modal. I decided to use a mixture of Beta distributions. I’d also like to start from a Uniform distribution (I think that this requires the expected value of my priors to be 1 so that I get a Beta(1,1) distribution if I don’t pass any observations).

I’ve generated some data which is reasonably similar to the data I’m working with, as follows:

sample_1 = np.random.beta(1, 200, size=350)
sample_2 = np.random.beta(750, 1500, size=150)
sample_3 = np.random.beta(900, 600, size=100)
sample_4 = np.random.beta(1, 1, size=100)
sample = np.concatenate([sample_1, sample_2, sample_3, sample_4])

I’ve worked around some of the difficulties I had with mixture models (e.g. I can’t iterate over the Beta RV) and have come up with the following model. I was a bit stuck on what to pick as priors - I suspect this could be where my problem is arising…

K=4
    
with pm.Model() as beta_model:
             
    w = pm.Dirichlet('w', a=np.array([1]*K), shape=K)
      
    alpha = pm.Gamma('alpha', alpha=1, beta=1, shape=K, testval=1) 
    beta = pm.Gamma('beta', alpha=1, beta=1, shape=K, testval=1)
    comp_dists=[pm.Gamma.dist(alpha=alpha[i], beta=beta[i]) for i in range(K)]
    obs = pm.Mixture('obs', w=w, comp_dists=comp_dists, observed=sample)

But, it doesn’t seem to do a very good job with the data.

The acceptance probability does not match the target. It is 0.8983302428206118, but should be close to 0.8. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.8817013927564947, but should be close to 0.8. Try to increase the number of tuning steps.
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

And here is the summary of the trace. Neither N_eff nor Rhat looks very happy.

           mean     sd  mc_error  hpd_2.5  hpd_97.5   n_eff   Rhat

w__0       0.05   0.05      0.00     0.00      0.12    1.03   6.12
w__1       0.28   0.28      0.03     0.00      0.58    1.00  28.05
w__2       0.45   0.11      0.01     0.30      0.59    1.02   8.16
w__3       0.22   0.12      0.01     0.08      0.36    1.01   8.92
alpha__0   3.66   3.08      0.29     0.01      8.45    1.12   3.36
alpha__1   0.86   0.55      0.02     0.00      1.65  218.44   1.03
alpha__2  11.05  10.20      1.01     0.81     23.98    1.01   9.37
alpha__3  13.84   7.70      0.75     4.15     25.03    1.04   5.37
beta__0    5.65   4.55      0.43     0.00     12.34    1.10   3.60
beta__1   14.91  13.60      1.35     0.02     31.27    1.01  10.28
beta__2   45.57  17.73      1.71    23.80     71.77    1.05   4.98
beta__3   36.42  27.16      2.67     5.90     73.01    1.02   7.13

I expect something is off with my parameterisation, but I’m a bit stuck on how to proceed. In addition, this is actually better than using my real dataset - with the real data, NUTS samples very slowly (2s per iteration) and I get a bad initial energy error if I use jitter.

Any advice you have would be very welcome!

Thank you.

junpenglao · January 31, 2019, 9:25pm

Mixture model is difficult to sample, have a look at Properly sampling mixture models. You do need care when you parameterize your mixture model.

gmcfad · February 2, 2019, 11:28am

Thank you @junpenglao ! I will work my way through that thread and see if it helps.
I notice you said that in 1D it is usually easy to keep the parameters ordered between chains - could you point me in the right direction of how to achieve that?

junpenglao · February 2, 2019, 12:11pm

See https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/Ports/Bayesian_Mixture_PyMC3.ipynb

Topic		Replies	Views
Difficulties fitting a mixture of dirichlet distributions Questions	4	724	September 24, 2020
Issue with .sample() method used on a dirichlet mixture model with normal prior version agnostic	0	298	April 13, 2023
Improving sampling / parameterization of complex mixture model	0	280	May 25, 2023
How to infer parameters for a continuous mixture model: Poisson-Beta model v5 modeling	2	372	November 24, 2022
Error when trying mixture distribution for likelihood (updated) v3	1	351	January 26, 2023

Struggling with Beta mixture models

Related topics