I think you’ll find that’s what the model does. Each sample is treated as having an intercept or slope from one of k clusters (with unknown mean and (co)variance).
Your model draws each slope/intercept independently for each sample from the mixture model. The way you would include this in a hierarchical model (with replicates) is by allowing offsets for each of the individuals from each of the clusters. This can quickly lead to over-parameterization, since there are an additional n \times k parameters.
If you have the N \times n indicator matrix mapping observations to sample identifiers sample_ids you can do this:
sample_beta_offsets = pm.Normal('per_sample_offsets', 0., 0.1, shape=(n, k, 2))
beta_sample = pm.Deterministic('per_sample_beta', tt.transpose(tt.dot(tt.transpose(sample_beta_offsets), sample_ids.T)) + beta)
# Each sample has its own (slope, int) for each cluster
# (N, k, 2)
# observe (N, 2) values X
# Multiplying the values for a particular sample gives the cluster mean
# (i, k, 2) * X[i,:] = mean_i
cluster_means = pm.Deterministic('cl_means', tt.stack([tt.dot(beta_sample[i,:,:], X[i,:]) for i in range(sample_ids.shape[0])]))
but it’s really painful…