How do we predict on new unseen groups in a hierarchical model in PyMC3?

I am trying to replicate the toy example to suit my own dataset. It seems that sampling time increases exponentially with the increase in the number of sites and observations.
@lucianopaz 's toy setup indeed samples quite fast (08:52 4.44draws/s). However, my dataset has ~100 observations per group. I tried to run @lucianopaz code with the following setup:

n_features = 6
ntrain_site = 5    # Although I have more groups. But I am testing with just five
ntrain_obs = 500   # 100 observations per group totaling 500 observations 
ntest_site = 1
ntest_obs = 1

Above setup took 2hrs (1.3s/draws) to complete the sampling.

Further increasing the group/observation makes sampling more time-consuming. I was wondering what is the source of this slow-down? How this could be improved if I have many groups and observations in my training data?