My problem is in this line: **how should I define a batch of K Wishart distributions, each with a d\times d scale matrix?**

```
taus = pm.Wishart("taus", nu=d, V=S_inv, shape=(K,))
```

**Detailed description:**

I’m currently constructing a Dirichlet Process Mixture Model with a **multivariate normal** likelihood and a truncation-based definition. Specifically, this is a mixture of K components, each associated with a mean parameter *mu*, an inverse covariance matrix *tau*, and a normal distribution N(mu, tau^{-1}). The data consist of N observations with d features.
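For concreteness, the generative process I have in mind looks like this (plain NumPy for illustration, with toy sizes; not the actual model code):

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, N = 5, 3, 200  # toy truncation level, dimensionality, sample size

# stick-breaking construction of the truncated weights
betas = rng.beta(1.0, 1.0, size=K - 1)
sticks = np.concatenate([betas, [1.0]])
remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)])
w = sticks * remaining  # telescoping product, so the weights sum to 1

# one mean and one precision matrix per component (identity precisions here)
mus = rng.normal(size=(K, d))
taus = np.stack([np.eye(d) for _ in range(K)])  # shape (K, d, d)

# each observation picks a component, then draws from N(mu_k, tau_k^{-1})
category = rng.choice(K, size=N, p=w)
obs = np.stack([
    rng.multivariate_normal(mus[k], np.linalg.inv(taus[k]))
    for k in category
])  # shape (N, d)
```

The point is that `taus` should be a batch of K full d\times d matrices, which is exactly the shape I can’t get PyMC’s `Wishart` to produce.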

In my construction of the model, I want each mixture component to have its own distinct mu and tau. Therefore, I have to specify the batch size, i.e. the number of components, **explicitly** in the definition of the variable, like this:

```
mus = pm.MvNormal("mu", tbar, cov=np.eye(d)/0.1, shape=(K,d))
```

However, this becomes confusing when it comes to the definition of the covariance/precision matrix. I’ve tried several ways to specify the shape parameter, but they all lead to errors: `shape=(K,)`, which I think is simply ignored by the program; and `shape=(K,d)` and `shape=(K,d,d)`, which both raised errors. I also tried changing the shape of the parameter `nu` to carry the batch information **implicitly**, which also raised an error. **How should I specify the batch size for matrix-valued random variables like this?**
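For comparison, the batching behaviour I’m after is what SciPy’s Wishart sampler gives, where K draws with a d\times d scale come back stacked as (K, d, d) (plain SciPy here, not PyMC):

```python
import numpy as np
from scipy.stats import wishart

d, K = 3, 4
scale = np.eye(d)  # stand-in for my S_inv

# K independent draws, each a d x d positive-definite matrix
taus = wishart.rvs(df=d, scale=scale, size=K)
print(taus.shape)  # (4, 3, 3)
```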

When I visualized the model (defined by the chunk below), I found the observations have dimensionality N\times N, which is not what I expected (I’m expecting N\times d). Here is my code:

```
with pm.Model(coords={"component": np.arange(K), "obs_id": np.arange(N), "feature": np.arange(d)}) as model:
    alpha = pm.Gamma("alpha", 1.0, 1.0)
    w = pm.StickBreakingWeights("w", alpha, K-1)
    mus = pm.MvNormal("mu", tbar, cov=np.eye(d)/0.1, shape=(K,d))
    taus = pm.Wishart("taus", nu=d, V=S_inv, shape=(K,))
    category = pm.Categorical("category", p=w, shape=N)
    obs = pm.MvNormal("obs", mu=mus[category], tau=taus[category], observed=df.values)
```
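My suspicion (unverified) is that `shape=(K,)` is ignored, so `taus` stays a single d\times d matrix, and `taus[category]` then indexes its *rows* with an N-long vector instead of selecting whole matrices. A quick NumPy check of that indexing pattern, with hypothetical labels:

```python
import numpy as np

N, K, d = 100, 30, 3
tau = np.eye(d)  # a single d x d matrix, as if shape=(K,) were ignored
# hypothetical component labels (taken mod d so row-indexing stays in bounds)
category = np.random.randint(K, size=N) % d

# row-indexing a (d, d) matrix with an N-vector yields (N, d), not (N, d, d)
print(tau[category].shape)  # (100, 3)

# with a proper (K, d, d) batch, the same indexing selects whole matrices
taus = np.stack([np.eye(d)] * K)
print(taus[category].shape)  # (100, 3, 3)
```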

I’ve been testing the model on a toy example with two true components, each a 3-dimensional Gaussian:

```
import numpy as np
import pandas as pd

N = 100  # number of observations
K = 30   # truncation level
d = 3    # number of features
mu1 = -np.ones(d)
mu2 = np.ones(d)
s1 = np.diag([1, 1, 0.01])
s2 = np.diag([1, 0.01, 1])
t1 = np.random.multivariate_normal(mean=mu1, cov=s1, size=(50,))
t2 = np.random.multivariate_normal(mean=mu2, cov=s2, size=(50,))
t = np.concatenate((t1, t2))
df = pd.DataFrame({
    "v1": t[:, 0],
    "v2": t[:, 1],
    "v3": t[:, 2],
})
# preprocessing: center the data and compute the empirical precision
tbar = np.mean(t, axis=0)
D_center = t - tbar
S = np.matmul(D_center.transpose(), D_center) / N
S_inv = np.linalg.inv(S)
```

Besides, I’ve been wondering whether it’s possible to *collapse* the `category` variable, which is discrete and makes it hard for the program to perform variational inference. Would using `Mixture` in the model help? (This is not my major concern right now though, lol, I’m really struggling with dimensionality!)
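For context, collapsing `category` would mean evaluating the marginal likelihood p(x) = sum_k w_k N(x; mu_k, Sigma_k) directly, which is my understanding of what a mixture likelihood computes. A small NumPy/SciPy sketch of that marginalization (toy weights and parameters, not my actual model):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

rng = np.random.default_rng(1)
K, d, N = 3, 2, 10
w = np.full(K, 1.0 / K)           # toy mixture weights
mus = rng.normal(size=(K, d))     # toy component means
covs = np.stack([np.eye(d)] * K)  # toy component covariances
x = rng.normal(size=(N, d))

# log p(x_i) = logsumexp_k [ log w_k + log N(x_i; mu_k, cov_k) ]
log_comp = np.stack([
    multivariate_normal.logpdf(x, mean=mus[k], cov=covs[k])
    for k in range(K)
])  # shape (K, N)
log_marginal = logsumexp(np.log(w)[:, None] + log_comp, axis=0)
print(log_marginal.shape)  # (10,)
```

No discrete variable remains, so the whole density is differentiable in the continuous parameters.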

**Thanks for any tips or advice!**

P.S. I’ve also posted this question on Stack Overflow; I’ll keep both posts updated. Thanks again!