Checking my understanding of Dirichlet Process Mixtures

cynthiaw2004 · May 28, 2018, 12:18am

This is more of a definitional question than a PyMC3 question, but it came up because I read a PyMC3 blog post by Austin Rochford: https://docs.pymc.io/notebooks/dp_mix.html

First some basics on what I understand:

Dirichlet Distribution = multivariate generalization of beta distribution

Probability measure = function that assigns each (measurable) subset of a space a value in [0,1] (a probability)

Dirichlet Process (DP) = P, a random probability measure, is a DP with concentration alpha and base measure P0 if for every finite disjoint partition S1,S2,…Sn of space omega, the following is true:

(P(S1), P(S2),…P(Sn)) has Dirichlet Distribution with parameters alpha * P0(S1), alpha * P0(S2)…alpha * P0(Sn)

where E(P(Si)) = P0(Si), i.e. P0 is the mean of P
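
For anyone who wants to sanity-check this definition numerically, here is a minimal sketch. It is not taken from the blog post; it assumes P0 = Normal(0, 1), alpha = 2, a three-set partition of the real line, and a truncated stick-breaking approximation of the DP (the helper names are made up for illustration). It draws many random measures P, computes (P(S1), P(S2), P(S3)) for each, and compares the average to P0(Si).

```python
import math
import numpy as np


def sample_dp_stick_breaking(alpha, rng, truncation=500):
    """Approximate one draw P ~ DP(alpha, P0) by truncated stick-breaking.

    Assumption: P0 is a standard normal. Returns (atoms, weights), where P
    puts mass weights[k] on the point atoms[k].
    """
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining
    weights /= weights.sum()  # fold the (tiny) truncated tail back in
    atoms = rng.standard_normal(truncation)  # atoms are i.i.d. draws from P0
    return atoms, weights


def normal_cdf(x):
    """CDF of the standard normal, used to compute P0(Si) exactly."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Partition of the real line: S1 = (-inf, -0.5], S2 = (-0.5, 0.5], S3 = (0.5, inf)
edges = np.array([-0.5, 0.5])


def partition_masses(atoms, weights):
    """P(S1), P(S2), P(S3): add up the weights of the atoms falling in each set."""
    set_index = np.searchsorted(edges, atoms)  # 0, 1 or 2 for each atom
    return np.bincount(set_index, weights=weights, minlength=3)


rng = np.random.default_rng(0)
alpha = 2.0

draws = np.array(
    [partition_masses(*sample_dp_stick_breaking(alpha, rng)) for _ in range(2000)]
)

p0_masses = np.array(
    [normal_cdf(-0.5), normal_cdf(0.5) - normal_cdf(-0.5), 1.0 - normal_cdf(0.5)]
)
empirical_mean = draws.mean(axis=0)

print("P0(Si):        ", p0_masses)
print("mean of P(Si): ", empirical_mean)
```

Note that P is only ever evaluated on sets here: each P(Si) is the total weight of the atoms that land in Si.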

Ok, I get it so far. Let’s move onto Dirichlet Process MIXTURES.

Dirichlet Process Mixture = hierarchical model where

xi | theta_i has distribution f_theta

theta_1, theta_2, …, theta_n are each drawn from P

P is a Dirichlet Process with params alpha and P0
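
These three lines can be read as a recipe for generating data. Below is a rough sketch of that recipe, under assumptions that are mine and not from the blog post: f_theta = Normal(theta, 1) with theta being just the mean, P0 = Normal(0, 5^2), alpha = 1, and a truncated stick-breaking approximation of P.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n, truncation = 1.0, 200, 200  # illustrative values only

# Third line: P ~ DP(alpha, P0), approximated by truncated stick-breaking.
betas = rng.beta(1.0, alpha, size=truncation)
weights = betas * np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
weights /= weights.sum()  # fold the (tiny) truncated tail back in
atoms = rng.normal(0.0, 5.0, size=truncation)  # atom locations drawn from P0

# Second line: theta_1, ..., theta_n drawn from P.
# P is discrete, so drawing from it means picking an atom with its weight.
component = rng.choice(truncation, size=n, p=weights)
theta = atoms[component]

# First line: x_i | theta_i ~ f_theta, here Normal(theta_i, 1).
x = rng.normal(loc=theta, scale=1.0)

n_clusters = np.unique(component).size
print("observations:", n, "distinct thetas used:", n_clusters)
```

Because P is discrete, many of the theta_i coincide, and that repetition is what produces the clustering in the mixture.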

So let me check to see if I understand these 3 lines correctly

What the first line is saying:

a point x comes from a distribution, let’s say f_theta = normal with parameters mean = 0 and std dev = 1

What the second and third lines are saying is:

My space omega is partitioned into parameter subsets. So let’s say our components are normals. So theta_1 = mean 0 with std 1, theta_2 = mean 3 with std 5, …

So in essence omega = theta_1 union theta_2 union …

So P(theta_i) = probability of us getting the parameter set i

Since P is a DP,

(P(theta1), P(theta2)…) has distribution Dir(alpha * P0(theta1), alpha * P0(theta2)…)

Am I correct in my understanding of these lines?

cynthiaw2004 · May 28, 2018, 8:01pm

Found the answer here: https://stats.stackexchange.com/questions/348522/understanding-dirichlet-process-mixtures

In essence, P takes SUBSETS of the parameter space (sets of thetas) as input, not the individual thetas directly

narendramukherjee · May 29, 2018, 10:21am

There’s an incredible set of lectures that Tamara Broderick gave on Bayesian non-parametric clustering problems. I found these really useful for understanding the theory behind DPMMs and the Chinese restaurant process:
http://www.tamarabroderick.com/tutorial_2016_mlss_cadiz.html
