Hello,
I am trying to follow a tutorial for Bernoulli Mixture Models here, but implementing it in PyMC, and I am immediately hitting a shape mismatch problem. I’ve searched extensively here, and this does appear to be a common problem with mixture models. However, most of the answers are either old enough to predate the dims API or specific to NormalMixture rather than the general Mixture class, so I am struggling to apply any of their lessons to my example.
Here is what I have:
import numpy as np
import pymc as pm
from scipy.stats import bernoulli as Bernoulli

# generate synthetic data
p0 = [0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9]
p1 = [0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9]
p2 = [0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
p = np.array([p0, p1, p2])
z = np.random.choice(np.arange(3), p=[1/3, 1/3, 1/3], size=100)
x = Bernoulli.rvs(p[z])

# set up coordinates
N = z.shape[0]  # 100
D = p.shape[1]  # 10
K = 9  # Number of clusters
coords = {"cluster": np.arange(K), "question": np.arange(D)}
coords_mutable = {"candidate": np.arange(N)}

# model
with pm.Model(coords=coords, coords_mutable=coords_mutable) as bmm:
    observations = pm.MutableData("observed_candidates", x, dims=("candidate", "question"))
    R = pm.Dirichlet("R", a=K * [1e-5], dims="cluster")
    Z = pm.Categorical("Z", p=R, dims=("candidate", "cluster"))
    P = pm.Beta("P", alpha=0.5, beta=0.5, dims=("question", "cluster"))
    bernoulli_components = pm.Bernoulli.dist(p=P, shape=(D, K))
    X = pm.Mixture(
        "X",
        w=Z,
        comp_dists=bernoulli_components,
        observed=observations,
        dims=("candidate", "question"),
    )

with bmm:
    trace = pm.sample()
When I sample (from either the posterior or the prior predictive), I get the following error:
ValueError: Input dimension mismatch. One other input has shape[1] = 10, but input[6].shape[1] = 100.
Apply node that caused the error: Elemwise{Composite}(Elemwise{Composite}.0, InplaceDimShuffle{x,0,1}.0, InplaceDimShuffle{x,0,1}.0, Elemwise{Composite}.1, TensorConstant{(1, 1, 1) of -inf}, InplaceDimShuffle{x,x,x}.0, Elemwise{log,no_inplace}.0)
Toposort index: 36
Inputs types: [TensorType(int64, (?, ?, 1)), TensorType(float64, (1, 10, 9)), TensorType(float64, (1, 10, 9)), TensorType(bool, (?, ?, 1)), TensorType(float32, (1, 1, 1)), TensorType(bool, (1, 1, 1)), TensorType(float64, (1, ?, 9))]
Inputs shapes: [(100, 10, 1), (1, 10, 9), (1, 10, 9), (100, 10, 1), (1, 1, 1), (1, 1, 1), (1, 100, 9)]
Inputs strides: [(80, 8, 8), (720, 72, 8), (720, 72, 8), (10, 1, 1), (4, 4, 4), (1, 1, 1), (7200, 72, 8)]
Inputs values: ['not shown', 'not shown', 'not shown', 'not shown', array([[[-inf]]], dtype=float32), array([[[ True]]]), 'not shown']
Outputs clients: [[Max{maximum}{axis=[2]}(Elemwise{Composite}.0), Elemwise{Composite}[(0, 0)](Elemwise{Composite}.0, InplaceDimShuffle{0,1,x}.0, Elemwise{isinf,no_inplace}.0, Elemwise{exp,no_inplace}.0)]]
HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
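The "Inputs shapes" line can be checked with plain NumPy broadcasting. The sketch below copies three of those shapes. Reading the (1, 100, 9) input as the log of the weights built from Z (dims candidate × cluster, left-padded to three axes) is an interpretation of the traceback, not something PyMC reports directly, and the helper `broadcasts` is hypothetical.

```python
import numpy as np

# Shapes copied from the "Inputs shapes" line of the traceback
obs_shape = (100, 10, 1)    # observations with a trailing cluster axis
comp_shape = (1, 10, 9)     # per-question, per-cluster component parameters
logw_shape = (1, 100, 9)    # assumed: log of the weights built from Z


def broadcasts(*shapes):
    """Hypothetical helper: broadcast shape, or None if incompatible."""
    try:
        return np.broadcast_shapes(*shapes)
    except ValueError:
        return None


# Axis 1 has lengths 10, 10 and 100, which cannot be broadcast together
print(broadcasts(obs_shape, comp_shape, logw_shape))   # None

# A weight vector with only a cluster axis is left-padded to (1, 1, 9)
print(broadcasts(obs_shape, comp_shape, (9,)))          # (100, 10, 9)

# Per-candidate weights would need a singleton question axis
print(broadcasts(obs_shape, comp_shape, (100, 1, 9)))   # (100, 10, 9)
```

Under that reading, the candidate axis of Z (length 100) ends up aligned with the question axis (length 10), while a weight array with only a cluster axis, or one shaped (candidate, 1, cluster), broadcasts cleanly.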
Can anyone point me in the right direction?
I can tell that there is some issue broadcasting Z and P in the mixture, but beyond that I am entirely lost. I assume the main culprit is Z, which is 2-dimensional (most examples I’ve come across have a 1-d array of weights).
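`pm.Mixture` takes mixture weights `w` (probabilities over the last, cluster axis that sum to 1). Z here is a draw of integer cluster labels, so it is not a valid weight array regardless of its shape. There may also be a modelling subtlety. As far as I understand `pm.Mixture` with univariate components, each element of the (candidate, question) output is mixed separately, so each answer would pick its own cluster. In a Bernoulli mixture model a candidate's whole answer vector comes from one cluster, so the per-question log-probabilities are summed before mixing over clusters. The NumPy sketch below computes that marginal log-likelihood, assuming x is (N, D), w is (K,) and P is (D, K) as in the model above. `bmm_loglike` is a hypothetical helper, not a PyMC function.

```python
import numpy as np


def bmm_loglike(x, w, P):
    """Hypothetical helper: marginal log-likelihood of a Bernoulli mixture.

    x : (N, D) array of 0/1 answers
    w : (K,) mixture weights summing to 1
    P : (D, K) probability that question d is answered 1 in cluster k
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    P = np.asarray(P, dtype=float)
    # Broadcast (N, D, 1) answers against (1, D, K) probabilities -> (N, D, K)
    xk = x[:, :, None]
    logp = xk * np.log(P)[None] + (1.0 - xk) * np.log1p(-P)[None]
    # One cluster per candidate: sum over questions -> (N, K), add log-weights
    per_cluster = logp.sum(axis=1) + np.log(w)
    # Numerically stable log-sum-exp over the cluster axis -> (N,)
    m = per_cluster.max(axis=1, keepdims=True)
    per_candidate = m[:, 0] + np.log(np.exp(per_cluster - m).sum(axis=1))
    return per_candidate.sum()


x = np.array([[1, 0]])
w = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1], [0.1, 0.9]])
print(bmm_loglike(x, w, P))  # equals log(0.41)
```

For this single candidate the result is log(0.5 * 0.81 + 0.5 * 0.01) = log(0.41), whereas mixing each answer separately would give log(0.5 * 0.5) = log(0.25). Translating this into the PyMC model (for example with `pm.math` operations and `pm.Potential`) is left as an untested assumption here.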
Also, it’s a bit awkward that I can use named dimensions everywhere except in the components, where I have to use the unnamed shape/size parameters. Could there also be a mismatch there? I’ve tried doing it all using shape (abandoning dims entirely), but that didn’t help.
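With the coords above, `dims=("question", "cluster")` and `shape=(D, K)` describe the same (10, 9) shape. That shape also matches the (1, 10, 9) component input in the traceback, so the unnamed shape on the components is unlikely to be the source of the mismatch. The small check below uses `shape_from_dims`, a hypothetical helper that mirrors how named dims map to coordinate lengths; it is not a PyMC function.

```python
import numpy as np

N, D, K = 100, 10, 9
coords = {"cluster": np.arange(K), "question": np.arange(D), "candidate": np.arange(N)}


def shape_from_dims(coords, dims):
    """Hypothetical helper: the shape implied by a tuple of named dims."""
    return tuple(len(coords[name]) for name in dims)


# P and the components: named dims and the explicit shape agree
print(shape_from_dims(coords, ("question", "cluster")) == (D, K))  # True

# Z as declared in the model: (candidate, cluster)
print(shape_from_dims(coords, ("candidate", "cluster")))           # (100, 9)
```

The second line shows where the length-100 axis in the traceback plausibly comes from: Z's candidate axis, not the components.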