Hello,
I am continuing my series on better understanding how the mixed distribution works in pymc. Based on the discussion (thanks to @ricardoV94, @jessegrabowski) I already can understand the difference between how pymc samples when one uses a MvNormal vs a set of UvNormals of the same shape. I guess in summary, in MvNormal sampling is done by just mixing in the “components” vs in UvNormal mixing is done in both components and other dimensions. So for instance if you equally mix two normals with centers [0.5,0.5] and [2,2] in the Uv case you can get points near [0.5,2], [2,0.5] as well as [0.5,0.5] and [2,2] where as in Mv you just get near your cluster points (see code below).
The code for the models (as a bonus, it also includes the other suggested solution by @ricardoV94 to verify that it samples in the same way Mv):
import pymc as pm
import numpy as np
import pytensor.tensor as pt
import pymc_experimental as pme
mu = [0.5, 0.5, 2, 2]
sd = [0.1, 0.1, 0.1, 0.1]
w = [0.5, 0.5]
n_clusters = 2
lower = 0
upper = 3
coords = {
"cluster": range(n_clusters),
"ndim": range(2),
"obs": range(1),
}
with pm.Model() as model_mv:
components = [pm.MvNormal.dist(mu[:2], sd[:2]*np.eye(2), shape=(2)),
pm.MvNormal.dist(mu[2:4], sd[2:4]*np.eye(2), shape=(2))]
samples_mv = pm.Mixture("mix", w, components)
with pm.Model() as model_uv:
components = [pm.Normal.dist(mu[:2], sd[:2]),
pm.Normal.dist(mu[2:4], sd[2:4])]
samples_uv = pm.Mixture("mix", w, components)
with pme.MarginalModel(coords=coords) as m3:
idx = pm.Categorical("idx", p=np.ones(n_clusters) / n_clusters, dims=("obs",))
mu_x = np.array([mu[0],mu[2]])
mu_y = np.array([mu[1],mu[2]])
mu = pm.math.concatenate([mu_x[..., None], mu_y[..., None]], axis=-1)
sigma = 0.01
def dist(idx, mu, sigma, _):
# Define a mixture where the output x, y depends on the value of idx
rv = pm.Normal.dist(mu[0][:, None], sigma, shape=[2,1])
for i in range(n_clusters - 1):
rv = pm.math.where(
pm.math.le(idx, i),
rv,
pm.Normal.dist(mu[i + 1][:, None], sigma, shape=[2,1])
)
return rv
samples_uv_fixed = pm.CustomDist(
"mix",
idx, mu, sigma,
dist=dist,
dims=("ndim", "obs"),
)
draws_uv = pm.draw(samples_uv, 1000)
draws_mv = pm.draw(samples_mv, 1000)
draws_uv_fixed = pm.draw(samples_uv_fixed, 1000)[:,:,0]
value_uv = pt.vector("value_uv")
rv_logp = pm.logp(samples_uv, value_uv)
print("UV logP(0.5, 0.5):" + str(rv_logp.eval({value_uv: [0.5, 0.5]})))
print("UV logP(2, 2):" + str(rv_logp.eval({value_uv: [2, 2]})))
print("UV logP(0.5, 2):" + str(rv_logp.eval({value_uv: [0.5, 2]})))
print("UV logP(2, 0.5):" + str(rv_logp.eval({value_uv: [2, 0.5]})), end="\n\n")
value_mv = pt.vector("value_mv")
rv_logp = pm.logp(samples_mv, value_mv)
print("MV logP(0.5, 0.5):" + str(rv_logp.eval({value_mv: [0.5, 0.5]})))
print("MV logP(2, 2):" + str(rv_logp.eval({value_mv: [2, 2]})))
print("MV logP(0.5, 2):" + str(rv_logp.eval({value_mv: [0.5, 2]})))
print("MV logP(2, 0.5):" + str(rv_logp.eval({value_mv: [2, 0.5]})))
This prints
UV logP(0.5, 0.5):[0.69049938 0.69049938]
UV logP(2, 2):[0.69049938 0.69049938]
UV logP(0.5, 2):[0.69049938 0.69049938]
UV logP(2, 0.5):[0.69049938 0.69049938]
MV logP(0.5, 0.5):-0.2284391538060554
MV logP(2, 2):-0.2284391538060554
MV logP(0.5, 2):-10.7852919734153
MV logP(2, 0.5):-10.7852919734153
One sees that likelihood of [0.5,2] is much lower than [2,2] in MV but same in UV. But then to me this suggests that a mixture of MV Normals is acting somehow different than how mixture models are classically defined. For a classical mixture of MV Normals, the likelihood is ( MV() stands for pdf of multivariate normal)
This would evaluate to the same value for [0.5, 2] and [0.5, 0.5]. So does that somehow imply that the logp of mixed MvNormals is different from the classical mixture model in pymc’s case? Does it mean that when pymc is evaluating the MV normal mixture model’s logp, it is really choosing a component randomly (based on weights) and just evaluating logp with respect to the chosen cluster? I guess then it would mean this is not a classical mixture model in some sense right?
Also this is perhaps a question about me not understanding dimensions correctly but why is the logp in the UV is coming out as two positive numbers (I would have expect a single negative number)?
Thanks