Checking my understanding of Dirichlet Process Mixtures

This is more of a definitions question than a PyMC3 question, but it came up because I read a PyMC3 blog post by Austin Rochford:

First some basics on what I understand:

Dirichlet Distribution = multivariate generalization of beta distribution

Probability measure = function that assigns to each (measurable) subset of a space a value in [0,1] (its probability)

Dirichlet Process (DP) = P, a random probability measure, is a DP with concentration parameter alpha and base measure P0 if for every finite partition S1, S2, …, Sn of the space omega, the following is true:

(P(S1), P(S2), …, P(Sn)) has a Dirichlet Distribution with parameters alpha * P0(S1), alpha * P0(S2), …, alpha * P0(Sn)

where E(P(Si)) = P0(Si)
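A quick way to sanity-check this definition numerically is to draw many P's and look at the mass each one puts on a fixed partition. The sketch below is only an illustration under assumptions that are not in the definition itself: P0 is taken to be a standard normal on the real line, the draws use a truncated stick-breaking construction with 200 atoms, and the partition edges at -1 and 1 are arbitrary. If the definition holds, the average of P(Sk) should be close to P0(Sk), and its variance close to P0(Sk) * (1 - P0(Sk)) / (alpha + 1), which is the variance of the corresponding Beta marginal.

```python
import math
import numpy as np

def normal_cdf(x):
    """CDF of the standard normal, used here as the assumed base measure P0."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def draw_dp(alpha, n_atoms, rng):
    """One truncated stick-breaking draw of P ~ DP(alpha, P0), with P0 = N(0, 1)."""
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining            # w_k = beta_k * prod_{j<k} (1 - beta_j)
    atoms = rng.standard_normal(n_atoms)   # atom locations, i.i.d. from P0
    return atoms, weights

rng = np.random.default_rng(0)
alpha = 2.0
edges = [-1.0, 1.0]  # partition: S1 = (-inf, -1), S2 = [-1, 1), S3 = [1, inf)

# Base-measure mass P0(Sk) of each set in the partition.
cdf = [0.0] + [normal_cdf(e) for e in edges] + [1.0]
p0 = np.diff(cdf)

# For each draw of P, P(Sk) is the total weight of the atoms that land in Sk.
samples = []
for _ in range(4000):
    atoms, weights = draw_dp(alpha, 200, rng)
    which_set = np.digitize(atoms, edges)  # 0, 1 or 2
    samples.append(np.bincount(which_set, weights=weights, minlength=3))
samples = np.array(samples)

print("P0(Sk):          ", p0)
print("mean of P(Sk):   ", samples.mean(axis=0))
print("var of P(Sk):    ", samples.var(axis=0))
print("Beta variance:   ", p0 * (1 - p0) / (alpha + 1))
```

Each row of `samples` is one realisation of the vector (P(S1), P(S2), P(S3)), so the whole array is (approximately, because of the truncation) a sample from Dir(alpha * P0(S1), alpha * P0(S2), alpha * P0(S3)).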

Ok, I get it so far. Let’s move onto Dirichlet Process MIXTURES.

Dirichlet Process Mixture = hierarchical model where

x_i | theta_i has distribution f_theta_i

theta_1, theta_2, …, theta_n | P are i.i.d. with distribution P

P is a Dirichlet Process with params alpha and P0
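The three-line hierarchy above can be simulated from the bottom up, which may help in reading it. This is a minimal sketch and not how PyMC3 fits such a model: it assumes theta is just a mean, P0 = Normal(0, 3), f_theta = Normal(theta, 1), and it approximates the DP draw with 100 stick-breaking atoms. All of these choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n_atoms, n_obs = 1.0, 100, 50

# Line 3: draw P ~ DP(alpha, P0) by truncated stick-breaking.
# P0 = Normal(0, 3) over the component mean is an assumption.
betas = rng.beta(1.0, alpha, size=n_atoms)
weights = betas * np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
weights /= weights.sum()                 # renormalise the truncated weights
atoms = rng.normal(0.0, 3.0, size=n_atoms)

# Line 2: theta_1..theta_n | P are i.i.d. draws from P.
# P is discrete, so the same atom is picked repeatedly (ties).
component = rng.choice(n_atoms, size=n_obs, p=weights)
theta = atoms[component]

# Line 1: x_i | theta_i ~ f_theta_i, here Normal(theta_i, 1) (assumed).
x = rng.normal(loc=theta, scale=1.0)

print("observations:         ", n_obs)
print("distinct theta values:", np.unique(theta).size)
```

The number of distinct theta values is the number of mixture components actually used by this sample; it comes out much smaller than the number of observations because many x_i share the same theta.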

So let me check to see if I understand these 3 lines correctly

What the first line is saying:

a point x comes from a distribution, let’s say f_theta = normal with parameters mean = 0 and std dev = 1

What the second and third lines are saying is:

My space omega is partitioned into parameter subsets. Say our components are normals, so theta_1 = mean 0 with std 1, theta_2 = mean 3 with std 5, …

So in essence omega = theta_1 union theta_2 union …

So P(theta_i) = probability of us getting the parameter set i

Since P is a DP,

(P(theta1), P(theta2)…) has distribution Dir(alpha * P0(theta1), alpha * P0(theta2)…)

Am I correct in my understanding of these lines?

Found the answer here:

In essence, P takes SUBSETS of the theta space as its argument, not the individual thetas directly
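One way to see why subsets are needed: if P0 is continuous, a single theta has base-measure mass zero, so alpha * P0(theta) would not be a usable Dirichlet parameter. The small sketch below assumes theta is a scalar mean with P0 = standard normal and uses an arbitrary three-set partition; both are illustrative choices.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal, the assumed base measure P0."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

alpha = 2.0
edges = [-1.0, 1.0]  # partition of the theta axis into three intervals

# P0 of each interval, and the Dirichlet parameters alpha * P0(Sk).
cdf = [0.0] + [normal_cdf(e) for e in edges] + [1.0]
p0_sets = [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]
dirichlet_params = [alpha * p for p in p0_sets]

# P0 of a single point theta = 0: an interval of zero width, so zero mass.
theta = 0.0
p0_point = normal_cdf(theta) - normal_cdf(theta)

print("alpha * P0(Sk):   ", dirichlet_params)
print("sum of parameters:", sum(dirichlet_params))
print("P0({theta}):      ", p0_point)
```

The parameters for the intervals are all positive and add up to alpha, while the single point gets zero.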


There’s an incredible set of lectures that Tamara Broderick gave on Bayesian non-parametric clustering problems. I found these really useful for understanding the theory behind DPMMs and the Chinese restaurant process: