Sampling 12 params from 10^5 binomials

I am trying to learn 12 parameters over a simplex. The data to fit is a matrix P that is (K, 12) in shape and the observation vector (K, 1)-shaped. I am using this model:

with pm.Model() as model:
  f = pm.Dirichlet("f", np.ones(12))
  n_ = pm.Binomial("n", n=2, p=pm.math.dot(P, f), observed=n)
  idata = pm.sample()

K is large, like O(1e5) rows. But still, I think this is a relatively easy problem - and MAP can be computed instantaneously as well. However, sampling from the above model takes hours. Is this normal?

Is it considerably faster if you sample with a single chain?

Seems that way! Using 1 chain instead of 4 makes it ~4 times faster (4 chain run is at 3% now and the estimate is ~2 hours and it keeps increasing; 1 chain run ended in 30 minutes).

How many cores do you have?

For why running in parallel may be slower and what you can do about it check out: Frequently Asked Questions - #19 by ricardoV94