Massive slowdown when moving from 3.6 to 3.7

Hi PyMC3 team,
I’ve just upgraded from 3.6 to 3.7 and my models are 100x slower to sample.
It only seems to happen when using the Categorical distribution. Did the default sampler for Categorical change between versions?

Here is a minimal example to replicate:

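    import numpy as np
    import pymc3 as pm

    # Column 0 selects a row of tp1; column 1 is the observed category
    data = np.random.randint(0, 3, size=(1000, 2))

    with pm.Model() as model:
        # 4x4 matrix whose rows are Dirichlet-distributed probability vectors
        tp1 = pm.Dirichlet('tp1', a=np.array([0.25]*4), shape=(4,4))
        obs = pm.Categorical('obs', p=tp1[data[:,0],:], observed=data[:,1])
        trace = pm.sample()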

With pymc3==3.6:

Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [tp1]
Sampling 4 chains: 100%|████████████████| 4000/4000 [00:04<00:00, 887.91draws/s]

With pymc3==3.7:

Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [tp1]
Sampling 4 chains: 6%|▉ | 225/4000 [01:09<1:11:13, 1.13s/draws]
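
For a rough sense of scale, the two rates in the progress bars above can be put on the same footing. This is only back-of-the-envelope arithmetic on the numbers shown: the 3.7 rate was read 6% of the way through sampling, so it may not reflect the whole run.

```python
# Rates copied from the two progress bars above.
rate_36 = 887.91          # draws per second under 3.6
secs_per_draw_37 = 1.13   # seconds per draw under 3.7

# Convert the 3.7 figure to draws per second so the units match.
rate_37 = 1.0 / secs_per_draw_37
slowdown = rate_36 / rate_37

print(f"3.7: {rate_37:.2f} draws/s, slowdown ~{slowdown:.0f}x")
```

On these two readings the gap works out to roughly a thousandfold.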

I am not using the GPU (or at least I don’t think I am).
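
One way to confirm this is to ask Theano, the backend PyMC3 3.x runs on, which device it is configured for. A minimal sketch: the helper name `theano_device` is made up for illustration, and it returns `None` when Theano cannot be imported in the current environment.

```python
def theano_device():
    """Return Theano's configured device string, or None if Theano cannot be imported."""
    try:
        import theano
    except Exception:
        # Theano is missing, or failed while importing in this environment.
        return None
    return theano.config.device


print(theano_device())
```

A value of `cpu` would mean the GPU is not involved.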

This seems to be related to a bug in Categorical logp: https://github.com/pymc-devs/pymc3/issues/3535#issuecomment-508256640
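
For context, the quantity a Categorical log-likelihood evaluates in the model above is the sum over observations of log p[i, observed[i]]. Below is a plain NumPy sketch of that sum. It is not PyMC3's implementation and says nothing about the cause of the bug; the function name `categorical_loglik` and the fixed seed are assumptions made for illustration, and a single Dirichlet draw stands in for `tp1`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same shape of data as the example: column 0 picks a row, column 1 is observed.
data = rng.integers(0, 3, size=(1000, 2))

# One draw of a 4x4 matrix with Dirichlet(0.25, ..., 0.25) rows, standing in for tp1.
tp1 = rng.dirichlet(np.full(4, 0.25), size=4)


def categorical_loglik(p_rows, observed):
    """Sum of log p_rows[i, observed[i]] over all observations."""
    picked = p_rows[np.arange(len(observed)), observed]
    return float(np.log(picked).sum())


p_rows = tp1[data[:, 0], :]          # shape (1000, 4), one probability row per observation
loglik = categorical_loglik(p_rows, data[:, 1])
print(loglik)
```

Evaluating this for 1000 observations is a single indexed lookup plus a sum, which is a useful baseline when judging how expensive the logp ought to be.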

OK, cool, I’m glad it’s a known bug.
I’ll continue on 3.6 for now, with the expectation that the dev branch will be updated shortly :slight_smile: