Model doesn’t contain any free variables error in 2D GMM

I am adapting an old two-dimensional GMM to the new PyMC 5 syntax, and I ran into a strange sampling error similar to this one. Without really understanding what the error is about, I tried the solution proposed there (eliminating the redundant shape argument), but the error remains. Here is the code snippet,

data generation,

import pymc as pm
import numpy as np
import matplotlib.pyplot as plt
import arviz as az
from pytensor.tensor.nlinalg import matrix_inverse

## Generate some data for three groups.
# Means, variances, covariances, and proportions

mus = np.array([[-4,2],[0,1],[6,-2]])
variance1 = [1,.4,1.5]
variance2 = [1,.8,5 ]
covariances = [.5,0, -1]
ps = np.array([0.2, 0.5, 0.3])
D = mus[0].shape[0]

# Total amount of data
N = 1000

# Number of groups
K = 3

# Form covariance matrix for each group
sigmas = [np.array([[variance1[i],covariances[i]],[covariances[i],variance2[i]]]) for i in range(K)]
# Form group assignments
zs = np.array([np.random.multinomial(1, ps) for _ in range(N)]).T
xs = [z[:, np.newaxis] * np.random.multivariate_normal(m, s, size=N) for z, m, s in zip(zs, mus, sigmas)]

# Stack data into single array
data = np.sum(np.dstack(xs), axis=2)
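As a quick sanity check on the generated data (my own addition, not part of the original post), the stacked array should have shape (N, D) and the empirical component proportions should be close to ps:

```python
import numpy as np

np.random.seed(0)

mus = np.array([[-4, 2], [0, 1], [6, -2]])
variance1 = [1, .4, 1.5]
variance2 = [1, .8, 5]
covariances = [.5, 0, -1]
ps = np.array([0.2, 0.5, 0.3])
N, K = 1000, 3

# Same construction as above: one covariance matrix per component
sigmas = [np.array([[variance1[i], covariances[i]],
                    [covariances[i], variance2[i]]]) for i in range(K)]
# One-hot component assignments, transposed to shape (K, N)
zs = np.array([np.random.multinomial(1, ps) for _ in range(N)]).T
# Mask each component's draws by its assignment indicator
xs = [z[:, np.newaxis] * np.random.multivariate_normal(m, s, size=N)
      for z, m, s in zip(zs, mus, sigmas)]
data = np.sum(np.dstack(xs), axis=2)

print(data.shape)          # (1000, 2)
print(zs.sum(axis=1) / N)  # roughly [0.2, 0.5, 0.3]
```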

model construction,

testvals = [[-2,-2],[0,0],[2,2]]

# Model structure
with pm.Model() as mvgmm:
    # Prior over component weights
    p = pm.Dirichlet('p', a=np.array([1.]*K))

    # Prior over component means
    mus = [pm.MvNormal('mu_%d' % i, mu=pm.floatX(np.zeros(D)), tau=pm.floatX(0.1 * np.eye(D)), shape=(D,), initval=pm.floatX(testvals[i])) for i in range(K)]

    # Cholesky decomposed LKJ prior over component covariance matrices

    # chol, corr, stds = [pm.LKJCholeskyCov('packed_L_%d' % i, n=D, eta=2., sd_dist=pm.HalfCauchy.dist(1)) for i in range(K)]
    L = [pm.LKJCholeskyCov('packed_L_%d' % i, n=D, eta=2., sd_dist=pm.HalfCauchy.dist(1))[0] for i in range(K)]

    # Unpack packed_L into full array
    # L = [pm.expand_packed_triangular(D, packed_L[i]) for i in range(K)]

    # Convert L to sigma and tau for convenience
    sigma = [pm.Deterministic('sigma_%d' % i ,L[i].dot(L[i].T)) for i in range(K)]
    tau = [pm.Deterministic('tau_%d' % i,matrix_inverse(sigma[i])) for i in range(K)]

    # Specify the likelihood
    mvnl = [pm.MvNormal.dist(mu=mus[i],chol=L[i]) for i in range(K)]
    Y_obs = pm.Mixture('Y_obs',w=p, comp_dists=mvnl, observed=data)
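The sigma/tau bookkeeping in the Deterministics above mirrors plain linear algebra: sigma is the covariance built from the Cholesky factor, and tau is its inverse (the precision). A quick numpy check of that relationship (my own illustration, not part of the model):

```python
import numpy as np

L = np.array([[1.0, 0.0],
              [0.5, 2.0]])   # a lower-triangular Cholesky factor
sigma = L @ L.T              # covariance, as in the sigma_i Deterministics
tau = np.linalg.inv(sigma)   # precision, as in the tau_i Deterministics

# Covariance and precision are inverses of each other
assert np.allclose(sigma @ tau, np.eye(2))
```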

And this is the sampling,

with pm.Model() as mvgmm:
    trace = pm.sample(chains=8)

And the error is

SamplingError: Cannot sample from the model, since the model does not contain any free variables.

You are overriding the original model here: pm.Model() creates a brand-new, empty model. You should change that line to with mvgmm:

Ahhh, what an idiot I am. I was looking in the most implausible places.
Thank you.

By the way, what is the recommended way to write these models? It is easy to confuse dimensions and components. I was reading about how to better label coordinates and dimensions here, but is it really that different from a list of random variables?


The main difference is that using dimensions/coordinates allows you to slice and dice the resulting InferenceData object using the features available in xarray. If you instead instantiate N unrelated parameters, it is much more difficult to select, filter, and otherwise work with the results. But if it’s easier for you to work with what you have, that’s ok!


Thank you!