I am developing a bivariate gaussian mixture model, with invariant component distributions and time-variant mixing coefficients to model a spatiotemporal distribution. The model is defined as follows:

P(\mathcal{D}|\theta; t) = \displaystyle \sum_{i=1}^K a_i(t)\phi(\mu_i, \Sigma_i)

with the priors:

a(t) = (a_1(t), \dots, a_K(t)) \sim \operatorname{Dir}(1)

\mu_i \sim \mathcal{N}(0, 10)

\Sigma_i | \sigma_i\sim \operatorname{LKJ}(2, \sigma_i)

\sigma_i \sim \operatorname{Half-Cauchy}(1)

I have modeled the data for two time point, t=1 and t=2, having 6K and 13K data points respectively. The code for the model is given below:

```
with pm.Model() as model:
n_comps = 5
n_hours = 2
packed_L = []
L = []
sigma = []
mu = []
obs = []
w = []
for i in range(n_comps):
packed_L.append(pm.LKJCholeskyCov('packed_L' + str(i),
n=2,
eta=2.,
sd_dist=pm.HalfCauchy.dist(1)))
L.append(pm.expand_packed_triangular(2, packed_L[-1]))
sigma.append(pm.Deterministic('sigma' + str(i),
L[-1].dot(L[-1].T)))
mu.append(pm.Normal('mu' + str(i), 0., 2., shape=2))
obs.append(pm.MvNormal.dist(mu=mu[-1], chol=L[-1]))
mixt = []
for i in range(n_hours):
w.append(pm.Dirichlet('w'+str(i), a=np.ones(n_comps)))
x, y = df[df.pickup_hour == i][['pickup_latitude', 'pickup_longitude']].values.T
x = (x - mean_x) / std_x
y = (y - mean_y) / std_y
mixt.append(pm.Mixture('mixt' + str(i), w=w[i], comp_dists=obs, observed=list(zip(*np.stack([x,y])))))
```

and the following sampler:

```
with model:
trace = pm.sample(njobs=1)
```

The most I can get from this model is about 1 iteration per second, with GPU acceleration (tested on AWS p2.xlarge). Could anybody point out what I have done wrong or any suggestions to model it better?