In my dataset, the observations are grouped under a few categories (sample_type in the code below), and I want to find clusters within each category. I also believe that some clusters may appear in more than one category. I assumed a Dirichlet distribution for within-cluster variation. With that in mind, I started building my model:

    import numpy as np
    import pymc3 as pm

    n, k, c, t = 30, 6, 2, 5
    sample_type = np.random.choice(range(t), n)

    with pm.Model() as model:
        cluster_profiles = pm.Exponential("Cluster profile ratio", 1, shape=(k, c))
        cluster_weights = pm.Dirichlet("Cluster weights", np.ones((t, c)) / 2, shape=(t, c))
        components = pm.Dirichlet.dist(a=cluster_profiles, shape=(k, c))

Here, k is the dimensionality of my observations, n is how many observations I have, c is how many clusters I am looking for, and t is the number of possible categories. So far so good. Then I added the mixture part:

    with model:
        pm.MixtureSameFamily("Tumor-based prior", w=cluster_weights[sample_type], comp_dists=components, shape=(n, k))

But it throws an error:

    ValueError: Input dimension mis-match. (input.shape = 30, input.shape = 6)

I believe it is because cluster_weights is supposed to have shape (n, c). If that is the case, how can I implement a mixture with observation-specific weights? Am I using the Mixture module correctly?
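
One way to test that hypothesis is to check the shapes outside of PyMC3. The sketch below uses plain NumPy arrays as stand-ins (weights, per_obs_weights and profiles are illustrative names, not the model's random variables), and it assumes that indexing a PyMC3 tensor with an integer array produces the same shape as NumPy fancy indexing does.

```python
import numpy as np

n, k, c, t = 30, 6, 2, 5
rng = np.random.default_rng(0)

# Stand-in for sample_type: one category index per observation.
sample_type = rng.integers(0, t, size=n)

# Stand-in for cluster_weights: one weight vector over c clusters per category.
weights = rng.dirichlet(np.ones(c) / 2, size=t)   # shape (t, c)

# Indexing by category yields one weight vector per observation.
per_obs_weights = weights[sample_type]            # shape (n, c)

# Stand-in for cluster_profiles, with the shape declared in the model.
profiles = rng.exponential(1.0, size=(k, c))      # shape (k, c)

print(per_obs_weights.shape)  # (30, 2)
print(profiles.shape)         # (6, 2)
```

If the indexed weights already have shape (n, c) inside the model as well, the mismatch between 30 and 6 may instead come from how the (k, c) component parameters line up with the (n, k) mixture shape, but that is only a guess.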
I am using PyMC3 v3.11.2 on a Google Colab Linux instance. I have already tried replacing pm.MixtureSameFamily with pm.Mixture, but it did not help.
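
To make the intended model concrete, here is one possible reading of it as a generative process, written in NumPy only. This is a sketch under assumptions, not the PyMC3 model itself: it assumes each cluster has its own length-k Dirichlet concentration vector (so profiles is laid out as (c, k), unlike the (k, c) shape in the code above), and the names profiles, weights, cluster and obs are made up for illustration.

```python
import numpy as np

n, k, c, t = 30, 6, 2, 5
rng = np.random.default_rng(0)

# Category of each observation.
sample_type = rng.integers(0, t, size=n)

# Assumed orientation: one concentration vector of length k per cluster.
profiles = rng.exponential(1.0, size=(c, k))

# One weight vector over the c clusters per category.
weights = rng.dirichlet(np.ones(c) / 2, size=t)   # shape (t, c)

# Pick a cluster for each observation using its category's weights,
# then draw the observation from that cluster's Dirichlet.
cluster = np.array([rng.choice(c, p=weights[s]) for s in sample_type])
obs = np.array([rng.dirichlet(profiles[z]) for z in cluster])  # shape (n, k)

print(obs.shape)  # (30, 6)
```

Each row of obs lies on the k-dimensional simplex, and observations in the same category share the same cluster weights, which is the behaviour the mixture is meant to capture.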