Mixture Model Dirichlet


#1

I have a dataset that describes the wealth index of Rwandan households: wealth.csv (29.0 KB)
Its shape and distribution are:
[image: wealth index distribution]

I am using PyMC3 mixture modeling to divide the wealth index into two separate groups (presumably rich and poor).

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    # Wide uniform priors on each component's location and scale
    hyper_mean = pm.Uniform('hyper_mean', -100, 10)
    hyper_mean1 = pm.Uniform('hyper_mean1', 100, 300)

    hyper_sigma = pm.Uniform('hyper_sigma', 0, 100)
    hyper_sigma1 = pm.Uniform('hyper_sigma1', 0, 150)

    component = pm.Normal.dist(mu=hyper_mean, sd=hyper_sigma)
    component1 = pm.Normal.dist(mu=hyper_mean1, sd=hyper_sigma1)

    # Symmetric Dirichlet prior over the two mixture weights
    w = pm.Dirichlet('w', a=np.array([1, 1]))
    like = pm.Mixture('like', w=w, comp_dists=[component, component1], observed=data)

with model:
    trace = pm.sample(5000, tune=2500, njobs=1)[1000:]

As I am uncertain about the prior parameters of Rwandan wealth, I (think I) am using non-restrictive priors, thus allowing the data to have a strong influence on the posterior.

Resulting in:


When I visualize the posterior values over the observed data, the observed data is flatter around the 100-200 wealth index:
[image: posterior values over the observed data]

I’ve tried using a StudentT distribution for the components instead of the Normal, but it gave the model longer tails and a bigger hump around the 100-200 mark.

With the priors being so non-restrictive, I would have thought the data would take over the posterior.

Do you suggest using a different distribution for the components, reparametrizing, or doing anything else to make the posterior align better?


#2

I would suggest changing the prior of hyper_mean1 - you can see from the shape of the posterior that the value is likely below 100.


#3

Hey Adam,

How is the index computed? This may give us more information about the distribution that should be used!

Also, from personal experience, uniform priors on the parameters of a Gaussian are often not adequate. I would try a Normal for the means and an InverseGamma for the variance (the conjugate priors).


#4

Thank you @rlouf, thank you @junpenglao. I changed the prior for hyper_mean1 and used an InverseGamma for the standard deviation. This gave me better results.

But it got me thinking about a difficulty I’ve been having for a while.

Let’s assume I think the standard deviation of the Gaussian is between 80 and 120. How do I make sure the InverseGamma hyperparameters are appropriate and would contain the ‘true’ Gaussian std values?

For example, take an InverseGamma with alpha=3 and beta=360:

import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

samples = stats.invgamma.rvs(a=3, scale=360, size=data.size, random_state=13)
plt.hist(samples, bins=300, density=True)
sns.kdeplot(samples)
plt.xlim(-50, 300)

[image: histogram and KDE of the InverseGamma(3, 360) draws]
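Complementing the visual check, the relevant summaries of this InverseGamma can be computed in closed form (a scipy sketch; the 80-120 band is the target range for the std mentioned above):

```python
from scipy import stats

alpha, beta = 3, 360
prior = stats.invgamma(a=alpha, scale=beta)

mean = prior.mean()                    # beta / (alpha - 1) = 180
mode = beta / (alpha + 1)              # closed-form mode: 90
mass = prior.cdf(120) - prior.cdf(80)  # prior mass inside the 80-120 target

print(mean, mode, mass)
```

Here the mean (180) and the mode (90) disagree substantially, and only about a quarter of the prior mass falls inside 80-120 - a gap that is easy to miss from the plot alone.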

What should I be looking at (mode, mean, the expected value, the visualization) when considering the hyperparameters?

Same goes for the other commonly used distributions for setting hyperparameters:
Exponential
Gamma
Beta
Normal
Cauchy

The hyperparameter distributions build the prior landscape that allows the data to reveal where the true parameters likely live. So for each of these distributions, which summary (mode, mean, expected value, or a visualization) should I be looking at when building the best prior landscape?
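One way to compare such candidates on an equal footing is to look at a central interval rather than any single point summary. A scipy sketch (all parameter values below are made up for illustration):

```python
from scipy import stats

# Candidate hyperpriors with illustrative (made-up) parameters
candidates = {
    "Exponential(1/100)": stats.expon(scale=100),
    "Gamma(2, 50)": stats.gamma(a=2, scale=50),
    "Beta(2, 5)": stats.beta(a=2, b=5),
    "Normal(100, 30)": stats.norm(loc=100, scale=30),
    "Cauchy(100, 30)": stats.cauchy(loc=100, scale=30),
}

for name, dist in candidates.items():
    lo, hi = dist.ppf(0.05), dist.ppf(0.95)  # central 90% interval
    mean = dist.mean()                       # nan for Cauchy (undefined)
    print(f"{name:>20}: mean={mean:8.2f}, 90% interval=({lo:8.2f}, {hi:8.2f})")
```

The interval usually says more about where the prior actually puts its mass than the mean or mode does - the Cauchy, for example, has no mean at all, yet its 90% interval is perfectly informative.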


#5

That’s a great question - I don’t have a definite answer, but based on recent papers such as “The prior can generally only be understood in the context of the likelihood” and “Visualization in Bayesian workflow”, I think the general principle is to generate from the prior model a couple of times and make sure the range of the generated data is reasonable. The first difficulty is then how to define a reasonable range (I guess that’s case by case).
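For the model in this thread, that prior-predictive check can be sketched with plain numpy (the hyperprior choices below are placeholders, not recommendations):

```python
import numpy as np

rng = np.random.default_rng(0)

def prior_predictive_draw(n_obs=500):
    # Draw one set of hyperparameters from the (assumed) priors...
    mu0 = rng.normal(0, 50)          # placeholder prior for component 0 mean
    mu1 = rng.normal(150, 50)        # placeholder prior for component 1 mean
    sd0 = 1 / rng.gamma(3, 1 / 360)  # InverseGamma(3, 360) via 1/Gamma
    sd1 = 1 / rng.gamma(3, 1 / 360)
    w = rng.dirichlet([1, 1])
    # ...then generate a fake dataset from the implied mixture.
    z = rng.choice(2, size=n_obs, p=w)
    return np.where(z == 0, rng.normal(mu0, sd0, n_obs), rng.normal(mu1, sd1, n_obs))

# Generate a few fake datasets and eyeball their ranges.
for _ in range(3):
    fake = prior_predictive_draw()
    print(fake.min(), fake.max())
```

If the fake datasets routinely span values no Rwandan wealth index could take, the hyperpriors are too loose; if they never cover the observed range, they are too tight.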

If I were to pick one value to represent it, I would pick the mean or expected value; however, that’s likely not enough, as you want to place the majority of the prior volume within the possible range, and sometimes the expectation is not representative of that.


#6

You say the prior is not restrictive, but the model is structurally restrictive because it only has two components. I would try a Dirichlet process prior, which does not fix the number of groups. It could be that there are 3+ groups here, which is not implausible given the nature of the data.
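For intuition, the component weights a Dirichlet process implies can be generated with the truncated stick-breaking construction (a numpy sketch; the truncation level K and concentration alpha are chosen arbitrarily):

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking: w_k = beta_k * prod_{j<k} (1 - beta_j)."""
    betas = rng.beta(1, alpha, size=K)
    remaining = np.concatenate([[1.0], np.cumprod(1 - betas)[:-1]])
    return betas * remaining

rng = np.random.default_rng(42)
w = stick_breaking_weights(alpha=1.0, K=20, rng=rng)
print(w.round(3), w.sum())  # weights decay; the sum approaches 1 as K grows
```

A small alpha concentrates mass on a few components, so the data can effectively "choose" how many groups to use instead of being forced into exactly two.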


#7

Thank you, Chris. Can you recommend one or two good sources where I can learn how to implement the Dirichlet process prior?


#8

Yes, our very own dev Austin R. has an example in the notebooks directory of the repository.