Your ADVI has not converged; there are some flat gradients that make training difficult once the ELBO gets to around 2000. Your plot is still on a crazy scale - limit the y-axis to [-5000, 0] and you will see the ELBO is still decreasing.
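Something like the following shows this (a minimal sketch, assuming you are plotting `apprx.hist`, which PyMC3 records as the tracked loss, i.e. the negative ELBO, during `pm.fit`):

```python
import matplotlib.pyplot as plt

# apprx.hist holds the tracked loss (negative ELBO) per iteration;
# flip the sign and clip the y-axis to see the slow late-stage decrease.
plt.plot(-apprx.hist)
plt.ylim(-5000, 0)
plt.xlabel("iteration")
plt.ylabel("ELBO")
plt.show()
```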
I am currently doing:
```python
with regression_model:
    apprx = pm.fit(50000, method='advi', obj_n_mc=25,
                   obj_optimizer=pm.adagrad(learning_rate=50.))
```
And the ELBO of the last 40000 iterations looks like this:
The saddle point of this model is just crazy.
A closer look shows some parameters are on a different scale compared to the others, and the gradient just propagates slowly there.
```python
apprx.groups[0].bij.rmap(apprx.params[0].eval())
```

```
{'sigma_log__': array(4.25433834),
 'w': array([ 12.20237742,   3.06760075,  47.48649899,  36.56623474,
              18.55803309,  12.2358095 , -24.81552255,  27.37072583,
              46.02615981,  25.88801179, 145.41653368])}
```
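To make the scale mismatch concrete, here is a quick check (a minimal sketch, reusing the `apprx` object from the fit above):

```python
import numpy as np

# Pull the fitted means again and compare the weight magnitudes: the largest
# |w| (~145) is about 47x the smallest (~3), which is the scale mismatch
# that makes the gradient move slowly in some dimensions.
means = apprx.groups[0].bij.rmap(apprx.params[0].eval())
w = np.asarray(means['w'])
print(np.abs(w).max() / np.abs(w).min())  # ~47
```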