Oh, I see. You want the slope and intercept to be like a mixed model, for data that looks something like this:

You’re nearly there; but I wouldn’t treat the intercepts and slopes as separate mixture models, but rather I’d treat the “mean” of a cluster as the predicted response for that cluster. That is, each cluster is a “line”, and the standard-deviation is the residual deviation.
I put the intercept term and x into a design matrix X and it looks something like:
with pm.Model() as mod:
# priors
beta_prior = pm.Normal('bprior', 0., 5., shape=(2,))
beta_se = pm.HalfNormal('bse', 5., shape=(2,))
# parameters
weights = pm.Dirichlet('weights', a=np.array([2.]*k))
# non-centered parameterization
beta_raw = pm.Normal('beta_raw', 0., 1., shape=(k, 2))
beta = pm.Deterministic('beta', beta_raw*beta_se + beta_prior)
cluster_means = pm.Deterministic('cl_means', tt.dot(X, tt.transpose(beta))) # (n x 2) x (2 x k)
cluster_sds = pm.HalfNormal('cl_sds', 1., shape=(k,))
lik = pm.NormalMixture('y', w=weights, mu=cluster_means, sd=cluster_sds, observed=y)
res = pm.sample(350, init='jitter+adapt_diag', tune=500, chains=3, cores=1, nuts_kwargs={'max_treedepth': 25})
The true betas are
intercepts = np.array([-3.14, 0.151, 2.681])
slopes = np.array([3.1, -1.2, 0.50])
so basically spot on. Posterior:

