Mixture model with truncated distributions

Hi everyone,

I just wanted to get some feedback on an issue with my mixture model setup. I'm trying to fit data with a mixture model consisting of two truncated components. Even when I mock up the data myself (by combining random samples from the two mixture components) and hence know the input parameters, the model does not converge. Starting with priors close to the actual values doesn't help much, either.
Even if I skip the truncation and only fit the non-truncated (i.e. regular) distributions to the data, it doesn't really work, so I'm not sure what the issue is.
Any help is appreciated!

import numpy as np
import pymc as pm

with pm.Model() as model:

    # Specify prior distributions
    # For LogNormal
    mu1 = pm.TruncatedNormal('mu1', mu=1, sigma=1, lower=0.01, upper=10)
    tau1 = pm.TruncatedNormal('tau1', mu=1, sigma=1, lower=0.01, upper=5)
    # For Pareto
    alpha3 = pm.TruncatedNormal('alpha3', mu=1, sigma=0.1, lower=0.01)
    mu3 = pm.TruncatedNormal('mu3', mu=5e4, sigma=1e4, lower=0.01)

    # Define mixture components
    lognormal_dist = pm.LogNormal.dist(mu=mu1, tau=tau1)
    pareto_dist = pm.Pareto.dist(m=mu3, alpha=alpha3)
    components = [lognormal_dist, pareto_dist]

    # Weights of mixture components
    w = pm.Dirichlet('w', a=np.array([1, 1]))
    like = pm.Mixture('likelihood', w=w, comp_dists=components, observed=df)

    trace = pm.sample()

Can you provide more details on what is not working?

Hi Ricardo,

Sure. The input parameters are:
mu=2.3 and tau=0.065 for the lognormal, and m=1e4 and alpha=0.65 for the Pareto distribution. The weights of the lognormal and Pareto are 0.2 and 0.8, respectively.
I created the mock data by drawing random samples via the respective scipy implementations of the distributions. There is a slight difference in notation compared to the PyMC implementation, but I think I've figured that part out, especially since I've also run fits to the individual distributions (that is, with no mixture involved), and they converged beautifully.
When fitting the data with the mixture, however, even if I choose priors pretty close to the actual input parameters, the chains do not converge and even move away from the "true" parameters.
I've run the chains for 2h and more. Below are some screenshots of the parameter estimates so far.
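For the record, the scipy-to-PyMC notation mapping can trip people up, so here is a sketch of how the mock data described above could be generated (the sample size and seed are my own choices, not from the post):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 10_000

# Input parameters from the post
mu, tau = 2.3, 0.065          # PyMC LogNormal(mu, tau), where tau = 1 / sigma**2
m, alpha = 1e4, 0.65          # PyMC Pareto(alpha, m)
w = [0.2, 0.8]                # mixture weights (lognormal, Pareto)

# scipy parameterizations:
#   stats.lognorm(s=sigma, scale=exp(mu))  <->  LogNormal(mu, sigma=tau**-0.5)
#   stats.pareto(b=alpha, scale=m)         <->  Pareto(alpha, m)
sigma = tau ** -0.5
lognorm_samples = stats.lognorm.rvs(s=sigma, scale=np.exp(mu), size=n,
                                    random_state=rng)
pareto_samples = stats.pareto.rvs(b=alpha, scale=m, size=n, random_state=rng)

# Draw component membership per observation, then pick samples accordingly
labels = rng.choice(2, size=n, p=w)
data = np.where(labels == 0, lognorm_samples, pareto_samples)
```

The key detail is that scipy's lognorm takes the standard deviation of the underlying normal (s=sigma), while the PyMC model above is parameterized by the precision tau.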

The sampler is definitely struggling with this model. Looking at pair plots may help to see if there are parameters that are very highly correlated. Do you have divergences? If so, you may also want to see in what part of the parameter space they happen.
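To make that check concrete, here is a sketch using ArviZ on a toy InferenceData standing in for the real trace (the parameter draws and divergence flags below are synthetic):

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
# Toy posterior with 2 chains x 500 draws, standing in for the trace
# returned by pm.sample(); values are synthetic, not real fit results
posterior = {
    "mu1": rng.normal(2.3, 0.1, size=(2, 500)),
    "tau1": rng.normal(0.065, 0.01, size=(2, 500)),
}
sample_stats = {"diverging": rng.random((2, 500)) < 0.02}
idata = az.from_dict(posterior=posterior, sample_stats=sample_stats)

# Count divergent transitions across all chains
n_div = int(idata.sample_stats["diverging"].sum())

# Pair plot with divergent draws highlighted, to see where in
# parameter space they cluster
az.plot_pair(idata, var_names=["mu1", "tau1"], divergences=True)
```

With the real trace you would pass all four mixture parameters to var_names; divergences clustering in one region (e.g. small tau1) usually points at the geometry the sampler cannot handle.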