Because the AR model can be vectorized, but the MA cannot. Intuitively: in the AR model, you can calculate the values for epsilon_t in one fell swoop, no matter what the order is. E.g. for order 3:
epsilon = y[3:] - (phi_1 * y[0:-3] + phi_2 * y[1:-2] + phi_3 * y[2:-1])
(Before the recent refactor, the AR model in pymc3 didn’t use scan. )
Try to vectorize the MA model in this way and you’ll see it’s impossible.
Check out the logp method - you’ll see its similar to what I have above, but order 1 and no phi.