Hi folks, I’m studying Dirichlet process mixtures to estimate the number of clusters in a two-dimensional dataset.
In the tutorial example you assess the number of clusters in one-dimensional data; what goes wrong when I use the same approach on two-dimensional data?
import pymc3 as pm
import theano.tensor as tt

K = 30  # truncation level of the stick-breaking approximation (value not stated in the post)

def stick_breaking(beta):  # mixture weights from the Beta draws, as in the tutorial
    portion_remaining = tt.concatenate([[1], tt.extra_ops.cumprod(1 - beta)[:-1]])
    return beta * portion_remaining

with pm.Model() as model:
    alpha = pm.Gamma('alpha', 1., 1.)
    beta = pm.Beta('beta', 1., alpha, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))
    tau = pm.Gamma('tau', 1., 1., shape=(K, 2))
    lambda_ = pm.Uniform('lambda', 0, 5, shape=(K, 2))
    mu = pm.Normal('mu', 0, tau=lambda_ * tau, shape=(K, 2))
    obs = pm.NormalMixture('obs', w, mu, tau=lambda_ * tau, comp_shape=(K, 2),
                           observed=DATA[['x', 'y']].values)
Explanation:
1. The mixture components should sit on the last dimension, which is the one that gets collapsed.
2. The observations need to be evaluated against each component.
-> Solution: add a placeholder at the last dimension of the observed array, so it can be evaluated against each component (sketched below).
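If I’m reading the placeholder idea right, it would look roughly like this. This is an untested sketch, not the tutorial’s code: the component axis K is moved to the end of every shape, and the observed (N, 2) array gets a trailing length-1 axis so it broadcasts against all K components before the mixture collapses that axis.

with pm.Model() as model_2d:
    alpha = pm.Gamma('alpha', 1., 1.)
    beta = pm.Beta('beta', 1., alpha, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))
    # component axis (K) last, so it is the one the mixture collapses
    tau = pm.Gamma('tau', 1., 1., shape=(2, K))
    lambda_ = pm.Uniform('lambda', 0, 5, shape=(2, K))
    mu = pm.Normal('mu', 0, tau=lambda_ * tau, shape=(2, K))
    obs = pm.NormalMixture('obs', w, mu, tau=lambda_ * tau, comp_shape=(2, K),
                           # placeholder axis: (N, 2) -> (N, 2, 1)
                           observed=DATA[['x', 'y']].values[:, :, None])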
But as you can see, there are a bunch of hyperpriors (alpha, beta, tau, lambda_, mu). I’m running it exactly as you wrote it, because I’m trying to understand how it works before changing any settings.
Just a note to say that my problem wasn’t multidimensional observations; it was that the observations occurred at a higher level, so I couldn’t modify the dimensions of the observed data. But adding a shape argument with an empty extra dimension seemed to do the trick (rough sketch below).
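In case it helps, this is roughly the pattern I mean; the names (n_obs, mu_k, latent) are invented for illustration and are not my actual model:

with pm.Model():
    # mu_k stands in for whatever K component-level quantity the latent
    # variable has to be evaluated against
    mu_k = pm.Normal('mu_k', 0., 10., shape=K)
    # the "empty" extra size-1 dimension in shape is what does the trick:
    # (n_obs, 1) broadcasts against (K,) without reshaping any observed data
    latent = pm.Normal('latent', 0., 1., shape=(n_obs, 1))
    per_component = latent + mu_k   # -> shape (n_obs, K)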