Poor Performance of PyMC 5 vs PyMC 3 for a large number of variables

Hello there,
I would like to share some findings with the community and get some feedback :slight_smile:

I need to estimate distributions for a large number of variables in a production setting.
While integrating PyMC into the data pipeline I noticed that PyMC 3 (Theano) scales much more gracefully to larger variable counts than PyMC 5 (using JAX). I have a feeling the performance characteristics are similar to NumPyro, which would not be surprising if JAX is to blame.

What I have done:

  • Estimated a negative binomial distribution for sales numbers with a hierarchical model
  • Tested on a Mac M1 Pro with 32 GB memory
  • Tested variable counts (x-axis) of 1000, 2000, 3000, and 3500 (PyMC 5 did not finish with 5000 after hours)
  • The y-axis is runtime in seconds

How does the PyMC 5 default sampler perform? You may also want to have a look at nutpie: GitHub - pymc-devs/nutpie: Python wrapper for nuts-rs

Ricardo, how can I answer this question? The times recorded are the output of the NUTS run.

This was the output of the sample call for the runs with pyMC 5: (I added the var count in front of each line)

1000 : Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 100 seconds.
2000 : Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 224 seconds.
3000 : Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 374 seconds.
3500 : Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 660 seconds.
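For what it's worth, dividing each runtime by its variable count shows the cost per variable is not constant, i.e. the scaling is superlinear, which fits the 5000-variable run never finishing:

```python
# Runtimes reported above for the PyMC 5 runs
counts = [1000, 2000, 3000, 3500]
seconds = [100, 224, 374, 660]

# Seconds per variable: if sampling scaled linearly this would be flat,
# but it grows with model size, i.e. the scaling is superlinear.
per_var = [t / n for t, n in zip(seconds, counts)]
print(per_var)  # roughly [0.10, 0.11, 0.12, 0.19]
```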

Thanks for the link to nutpie. Looks promising.

If you are using pm.sample you are not using JAX, but essentially the same sampler as in PyMC 3. There were some changes between Theano and PyTensor that might lead to performance differences, but so far I have not seen anything like the behavior you are describing.
Is there a model you could share that we could run to further look at what might contribute to the slowdown?

1 Like

That’s a surprise. :slight_smile: What should I use now instead of sample()? The introductory examples all still use pm.sample() if I read correctly?
I can share the model with you and explain the data. What’s the best way to do this here?

pm.sample() should still be fine; there isn't anything wrong with using it. It hasn't changed that much in recent years, which is why I'm a bit surprised by the slowdown you are observing.
We might, in the hopefully not-too-distant future, change pm.sample so that it uses nutpie internally by default. That usually brings quite large speedups on CPUs.

If you want to use the JAX samplers (on a CPU they are typically faster than the current pm.sample, but slower than nutpie) you can do so:

import pymc.sampling_jax

with model:  # the JAX samplers must be called inside a model context
    idata = pymc.sampling_jax.sample_blackjax_nuts()

A big advantage of those is that they can also run on a GPU, and if you have very big datasets that might give you speedups as well.

About the model: The easiest is probably to just post code here that creates some sample data and the model (or as a gist if it is long).

We put together some code to demonstrate the problem. You can find the model and data generation code here:

2 Likes

@thomas_muhlfriedel Thank you for the code!
I can reproduce the performance difference (although it seems to be somewhat smaller on my machine), and have simplified the example a bit:

try:
    import pymc3 as pm
except ImportError:
    import pymc as pm

import pandas as pd

import numpy as np

K = 100_000
N = 10

np.random.seed(42)
data = np.random.randint(100, size=K)
idxs = np.random.randint(N, size=K)
idxs.sort()

with pm.Model() as model:
    mu = pm.Lognormal('mu', mu=1, sigma=1, shape=N)
    alpha = pm.HalfNormal('alpha', sigma=1, shape=N)

    pm.NegativeBinomial("y", mu=mu[idxs], alpha=alpha[idxs], observed=data)

func = model.logp_dlogp_function()
func.set_extra_values({})

# With theano (pymc3) we get
np.random.seed(42)
x = np.random.randn(func.size)
%timeit func._theano_function(x)
# 9.99 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# With pymc and pytensor on the main branch
np.random.seed(42)
x = [
    np.random.randn(N),
    np.random.randn(N),
]
%timeit func._pytensor_function(*x)
# 13.8 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

If I replace the NegativeBinomial with a Normal everything gets much faster, and the remaining performance difference between the two versions is very slight. (Interesting question, by the way: why would the negative binomial be that much slower? Possibly the special functions it uses, or are we generating a worse computation graph for some reason?)
I'd guess one of the changes to error checking, broadcasting, or some other rewrite in PyTensor is responsible? I think computation reuse for the gradient looks a bit different in the two versions.

Got to admit I’m still a bit lost though…
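As a rough check on the special-functions theory, here is a hypothetical NumPy/SciPy sketch of the two densities (not PyMC internals): the negative binomial logp in the mu/alpha parameterization needs three gammaln evaluations per data point, while the normal logp is pure arithmetic.

```python
import numpy as np
from scipy.special import gammaln

# Hypothetical sketch: the NB logp needs three gammaln calls per data
# point, the normal logp only elementary arithmetic.
def negbinom_logp(y, mu, alpha):
    # Mean/dispersion parameterization, as in pm.NegativeBinomial
    return (
        gammaln(y + alpha) - gammaln(alpha) - gammaln(y + 1)
        + alpha * np.log(alpha / (alpha + mu))
        + y * np.log(mu / (alpha + mu))
    )

def normal_logp(y, mu, sigma):
    return -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

y = np.random.default_rng(42).integers(100, size=100_000).astype(float)
# %timeit negbinom_logp(y, 3.0, 1.5)   # cost dominated by gammaln
# %timeit normal_logp(y, 3.0, 1.5)     # typically several times faster
```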

The full computation graphs in the two cases:

main pymc

Sum{acc_dtype=float64} [id A] '__logp' 56
 |MakeVector{dtype='float64'} [id B] 55
   |Sum{acc_dtype=float64} [id C] 25
   | |Elemwise{Composite{(Switch(i0, (((i1 * sqr((i2 - i3))) - i4) - i2), i5) + i2)}} [id D] 'mu_log___logprob' 16
   |   |Elemwise{gt,no_inplace} [id E] 5
   |   | |Elemwise{exp,no_inplace} [id F] 'mu_log___log' 1
   |   | | |mu_log__ [id G]
   |   | |TensorConstant{(1,) of 0.0} [id H]
   |   |TensorConstant{(1,) of -0.5} [id I]
   |   |mu_log__ [id G]
   |   |TensorConstant{(1,) of 1.0} [id J]
   |   |TensorConstant{(1,) of 0...5332046727} [id K]
   |   |TensorConstant{(1,) of -inf} [id L]
   |Sum{acc_dtype=float64} [id M] 54
   | |Elemwise{Composite{(Switch(i0, ((i1 * sqr(i2)) - i3), i4) + i5)}}[(0, 2)] [id N] 'alpha_log___logprob' 53
   |   |Elemwise{ge,no_inplace} [id O] 3
   |   | |Elemwise{exp,no_inplace} [id P] 'alpha_log___log' 0
   |   | | |alpha_log__ [id Q]
   |   | |TensorConstant{(1,) of 0.0} [id H]
   |   |TensorConstant{(1,) of -0.5} [id I]
   |   |Elemwise{exp,no_inplace} [id P] 'alpha_log___log' 0
   |   |TensorConstant{(1,) of 0...5264472738} [id R]
   |   |TensorConstant{(1,) of -inf} [id L]
   |   |alpha_log__ [id Q]
   |Sum{acc_dtype=float64} [id S] 47
     |Elemwise{Composite{Switch(i0, Switch(i1, Switch(i2, i3, (Switch(i4, i5, (i6 * log(i7))) - (i8 + i7))), i9), Switch(i10, (((gammaln(i11) - i8) - gammaln(i12)) + Switch(i13, i5, (i6 * log(i14))) + Switch(i15, Switch(EQ(i16, i3), i17, i9), (i16 * i18))), i9))}}[(0, 11)] [id T] 'y_logprob' 44
       |Elemwise{gt,no_inplace} [id U] 6
       | |AdvancedSubtensor1 [id V] 2
       | | |Elemwise{exp,no_inplace} [id P] 'alpha_log___log' 0
       | | |TensorConstant{[0 0 0 ... 9 9 9]} [id W]
       | |TensorConstant{(1,) of 10..00000000.0} [id X]
       |InplaceDimShuffle{x} [id Y] 33
       | |All [id Z] 24
       |   |Elemwise{ge,no_inplace} [id BA] 14
       |     |AdvancedSubtensor1 [id BB] 4
       |     | |Elemwise{exp,no_inplace} [id F] 'mu_log___log' 1
       |     | |TensorConstant{[0 0 0 ... 9 9 9]} [id W]
       |     |TensorConstant{(1,) of 0} [id BC]
       |Elemwise{mul,no_inplace} [id BD] 22
       | |Elemwise{eq,no_inplace} [id BE] 12
       | | |AdvancedSubtensor1 [id BB] 4
       | | |TensorConstant{(1,) of 0} [id BC]
       | |TensorConstant{[False Fal..lse False]} [id BF]
       |TensorConstant{(1,) of 0} [id BC]
       |Elemwise{eq,no_inplace} [id BE] 12
       |TensorConstant{[-inf -inf..-inf -inf]} [id BG]
       |TensorConstant{[51. 92. 1... 53. 19.]} [id BH]
       |AdvancedSubtensor1 [id BB] 4
       |TensorConstant{[152.40959...33988419]} [id BI]
       |TensorConstant{(1,) of -inf} [id L]
       |InplaceDimShuffle{x} [id BJ] 39
       | |All [id BK] 36
       |   |MakeVector{dtype='bool'} [id BL] 32
       |     |All [id BM] 23
       |     | |Elemwise{gt,no_inplace} [id BN] 13
       |     |   |AdvancedSubtensor1 [id BB] 4
       |     |   |TensorConstant{(1,) of 0} [id BC]
       |     |All [id BO] 19
       |       |Elemwise{gt,no_inplace} [id BP] 9
       |         |AdvancedSubtensor1 [id V] 2
       |         |TensorConstant{(1,) of 0} [id BC]
       |Elemwise{add,no_inplace} [id BQ] 8
       | |TensorConstant{[51. 92. 1... 53. 19.]} [id BH]
       | |AdvancedSubtensor1 [id V] 2
       |SpecifyShape [id BR] 7
       | |AdvancedSubtensor1 [id V] 2
       | |TensorConstant{100000} [id BS]
       |Elemwise{eq,no_inplace} [id BT] 31
       | |Elemwise{true_div,no_inplace} [id BU] 21
       | | |AdvancedSubtensor1 [id BB] 4
       | | |Elemwise{add,no_inplace} [id BV] 11
       | |   |AdvancedSubtensor1 [id BB] 4
       | |   |AdvancedSubtensor1 [id V] 2
       | |TensorConstant{(1,) of 0} [id BC]
       |Elemwise{true_div,no_inplace} [id BU] 21
       |Elemwise{eq,no_inplace} [id BW] 29
       | |Elemwise{true_div,no_inplace} [id BX] 20
       | | |AdvancedSubtensor1 [id V] 2
       | | |Elemwise{add,no_inplace} [id BV] 11
       | |TensorConstant{(1,) of 0} [id BC]
       |AdvancedSubtensor1 [id V] 2
       |TensorConstant{(1,) of 0.0} [id H]
       |Elemwise{log,no_inplace} [id BY] 30
         |Elemwise{true_div,no_inplace} [id BX] 20
Elemwise{Composite{((i0 * i1) + i2)}}[(0, 0)] [id BZ] 'mu_log___grad' 51
 |SpecifyShape [id CA] '(d__logp/dmu_log___log)' 49
 | |SpecifyShape [id CB] 46
 | | |AdvancedIncSubtensor1{inplace,inc} [id CC] 43
 | | | |Elemwise{Composite{Switch(i0, (((-(i1 - i2)) / i3) + (i4 / i3)), i5)}} [id CD] 15
 | | | | |Elemwise{gt,no_inplace} [id E] 5
 | | | | |mu_log__ [id G]
 | | | | |TensorConstant{(1,) of 1.0} [id J]
 | | | | |Elemwise{exp,no_inplace} [id F] 'mu_log___log' 1
 | | | | |TensorConstant{(1,) of -1.0} [id CE]
 | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | |Elemwise{Composite{(-((i0 * i1 * i2) - ((i1 * i3) / i4)))}} [id CF] 41
 | | | | |TensorConstant{(1,) of -1.0} [id CE]
 | | | | |Elemwise{Composite{(i0 + i1 + i2 + Switch(i3, i4, ((-i5) / i6)) + Switch(i7, i4, ((-(((i8 * i9) + sqr(i9)) * i10)) / ((i8 * i6) + (i9 * i6)))))}}[(0, 5)] [id CG] 40
 | | | | | |Elemwise{Composite{Switch(i0, i1, (Switch(i2, i1, (i3 * i4)) / i5))}} [id CH] 35
 | | | | | | |Elemwise{eq,no_inplace} [id BE] 12
 | | | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | | | | |Elemwise{mul,no_inplace} [id BD] 22
 | | | | | | |TensorConstant{[51. 92. 1... 53. 19.]} [id BH]
 | | | | | | |SpecifyShape [id CI] '(d__logp/dy_logprob)' 27
 | | | | | | | |Elemwise{Switch} [id CJ] 18
 | | | | | | | | |Elemwise{gt,no_inplace} [id U] 6
 | | | | | | | | |TensorConstant{(1,) of 1.0} [id J]
 | | | | | | | | |TensorConstant{(1,) of 0.0} [id H]
 | | | | | | | |TensorConstant{100000} [id BS]
 | | | | | | |AdvancedSubtensor1 [id BB] 4
 | | | | | |Elemwise{Composite{Switch(i0, i1, (-i2))}}[(0, 2)] [id CK] 37
 | | | | | | |Elemwise{mul,no_inplace} [id BD] 22
 | | | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | | | | |SpecifyShape [id CI] '(d__logp/dy_logprob)' 27
 | | | | | |Elemwise{Composite{Switch(i0, i1, (i2 / i3))}} [id CL] 38
 | | | | | | |Elemwise{eq,no_inplace} [id BT] 31
 | | | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | | | | |Elemwise{mul,no_inplace} [id CM] 34
 | | | | | | | |TensorConstant{[51. 92. 1... 53. 19.]} [id BH]
 | | | | | | | |SpecifyShape [id CN] 26
 | | | | | | |   |Elemwise{Switch} [id CO] 17
 | | | | | | |   | |Elemwise{gt,no_inplace} [id U] 6
 | | | | | | |   | |TensorConstant{(1,) of 0.0} [id H]
 | | | | | | |   | |TensorConstant{(1,) of 1.0} [id J]
 | | | | | | |   |TensorConstant{100000} [id BS]
 | | | | | | |AdvancedSubtensor1 [id BB] 4
 | | | | | |Elemwise{eq,no_inplace} [id BT] 31
 | | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | | | |Elemwise{mul,no_inplace} [id CM] 34
 | | | | | |Elemwise{add,no_inplace} [id BV] 11
 | | | | | |Elemwise{eq,no_inplace} [id BW] 29
 | | | | | |AdvancedSubtensor1 [id BB] 4
 | | | | | |AdvancedSubtensor1 [id V] 2
 | | | | | |SpecifyShape [id CN] 26
 | | | | |Elemwise{sub,no_inplace} [id CP] 28
 | | | | | |TensorConstant{(1,) of 1.0} [id J]
 | | | | | |Elemwise{true_div,no_inplace} [id BX] 20
 | | | | |AdvancedSubtensor1 [id V] 2
 | | | | |Elemwise{add,no_inplace} [id BV] 11
 | | | |TensorConstant{[0 0 0 ... 9 9 9]} [id W]
 | | |TensorConstant{10} [id CQ]
 | |TensorConstant{10} [id CQ]
 |Elemwise{exp,no_inplace} [id F] 'mu_log___log' 1
 |TensorConstant{(1,) of 1.0} [id J]
Elemwise{Composite{((i0 * i1) + i2)}}[(0, 0)] [id CR] 'alpha_log___grad' 52
 |SpecifyShape [id CS] '(d__logp/dalpha_log___log)' 50
 | |SpecifyShape [id CT] 48
 | | |AdvancedIncSubtensor1{inplace,inc} [id CU] 45
 | | | |Elemwise{Composite{Switch(i0, (-i1), i2)}} [id CV] 10
 | | | | |Elemwise{ge,no_inplace} [id O] 3
 | | | | |Elemwise{exp,no_inplace} [id P] 'alpha_log___log' 0
 | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | |Elemwise{Composite{((((i0 / i1) * i2) + ((i3 * i0 * i4) / i5) + i6 + (psi(i7) * i8) + (i3 * psi(i9) * i8) + Switch(i10, i11, i8) + Switch(i10, i12, (i13 * i8))) - (i14 + i15 + i16))}}[(0, 0)] [id CW] 42
 | | | | |Elemwise{Composite{(i0 + i1 + i2 + Switch(i3, i4, ((-i5) / i6)) + Switch(i7, i4, ((-(((i8 * i9) + sqr(i9)) * i10)) / ((i8 * i6) + (i9 * i6)))))}}[(0, 5)] [id CG] 40
 | | | | |Elemwise{true_div,no_inplace} [id BX] 20
 | | | | |Elemwise{sub,no_inplace} [id CP] 28
 | | | | |TensorConstant{(1,) of -1.0} [id CE]
 | | | | |AdvancedSubtensor1 [id BB] 4
 | | | | |AdvancedSubtensor1 [id V] 2
 | | | | |Elemwise{Composite{(-((i0 * i1 * i2) - ((i1 * i3) / i4)))}} [id CF] 41
 | | | | |Elemwise{add,no_inplace} [id BQ] 8
 | | | | |SpecifyShape [id CN] 26
 | | | | |SpecifyShape [id BR] 7
 | | | | |Elemwise{eq,no_inplace} [id BW] 29
 | | | | |TensorConstant{(1,) of 0.0} [id H]
 | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | | |Elemwise{log,no_inplace} [id BY] 30
 | | | | |Elemwise{Composite{Switch(i0, i1, (Switch(i2, i1, (i3 * i4)) / i5))}} [id CH] 35
 | | | | |Elemwise{Composite{Switch(i0, i1, (-i2))}}[(0, 2)] [id CK] 37
 | | | | |Elemwise{Composite{Switch(i0, i1, (i2 / i3))}} [id CL] 38
 | | | |TensorConstant{[0 0 0 ... 9 9 9]} [id W]
 | | |TensorConstant{10} [id CQ]
 | |TensorConstant{10} [id CQ]
 |Elemwise{exp,no_inplace} [id P] 'alpha_log___log' 0
 |TensorConstant{(1,) of 1.0} [id J]

theano

Sum{acc_dtype=float64} [id A] '__logp'   40
 |MakeVector{dtype='float64'} [id B] ''   39
   |Sum{acc_dtype=float64} [id C] '__logp_mu_log__'   35
   | |Elemwise{Composite{((i0 + (i1 * sqr((i2 + i3))) + i4) - i3)}}[(0, 3)] [id D] ''   32
   |   |TensorConstant{(1,) of -0..5332046727} [id E]
   |   |TensorConstant{(1,) of -0.5} [id F]
   |   |TensorConstant{(1,) of -1.0} [id G]
   |   |Elemwise{log,no_inplace} [id H] ''   10
   |   | |Elemwise{exp,no_inplace} [id I] 'mu'   6
   |   |   |Subtensor{int64:int64:} [id J] 'mu_log__'   2
   |   |     |__args_joined [id K]
   |   |     |Constant{0} [id L]
   |   |     |Constant{10} [id M]
   |   |Subtensor{int64:int64:} [id J] 'mu_log__'   2
   |Sum{acc_dtype=float64} [id N] '__logp_alpha_log__'   38
   | |Elemwise{Composite{(Switch(i0, (i1 + (i2 * sqr(i3))), i4) + i5)}}[(0, 3)] [id O] ''   36
   |   |Elemwise{Composite{Cast{int8}(GE(i0, i1))}} [id P] ''   8
   |   | |Elemwise{exp,no_inplace} [id Q] 'alpha'   4
   |   | | |Subtensor{int64:int64:} [id R] 'alpha_log__'   0
   |   | |   |__args_joined [id K]
   |   | |   |Constant{10} [id M]
   |   | |   |Constant{20} [id S]
   |   | |TensorConstant{(1,) of 0} [id T]
   |   |TensorConstant{(1,) of -0..3526447274} [id U]
   |   |TensorConstant{(1,) of -0.5} [id F]
   |   |Elemwise{exp,no_inplace} [id Q] 'alpha'   4
   |   |TensorConstant{(1,) of -inf} [id V]
   |   |Subtensor{int64:int64:} [id R] 'alpha_log__'   0
   |Sum{acc_dtype=float64} [id W] '__logp_y'   29
     |Elemwise{Composite{Switch(i0, Switch(i1, i2, Switch(i3, ((Switch(i4, i5, (i6 * log(i7))) - i8) - i7), i9)), Switch(i10, (((scalar_gammaln(i11) - i8) - scalar_gammaln(i12)) + Switch(i13, i5, (i6 * log(i14))) + Switch(i15, Switch(EQ(i12, i2), i16, i9), (i12 * i17))), i9))}}[(0, 7)] [id X] ''   26
       |Elemwise{gt,no_inplace} [id Y] ''   11
       | |AdvancedSubtensor1 [id Z] ''   7
       | | |Elemwise{exp,no_inplace} [id Q] 'alpha'   4
       | | |TensorConstant{[0 0 0 ... 9 9 9]} [id BA]
       | |TensorConstant{(1,) of 10..00000000.0} [id BB]
       |Elemwise{mul,no_inplace} [id BC] ''   19
       | |Elemwise{eq,no_inplace} [id BD] ''   15
       | | |AdvancedSubtensor1 [id BE] ''   9
       | | | |Elemwise{exp,no_inplace} [id I] 'mu'   6
       | | | |TensorConstant{[0 0 0 ... 9 9 9]} [id BA]
       | | |TensorConstant{(1,) of 0} [id T]
       | |TensorConstant{[False Fal..lse False]} [id BF]
       |TensorConstant{(1,) of 0} [id T]
       |Elemwise{Composite{Cast{int8}(GE(i0, i1))}} [id BG] ''   16
       | |AdvancedSubtensor1 [id BE] ''   9
       | |TensorConstant{(1,) of 0} [id T]
       |Elemwise{eq,no_inplace} [id BD] ''   15
       |TensorConstant{[-inf -inf..-inf -inf]} [id BH]
       |TensorConstant{[51. 92. 1... 53. 19.]} [id BI]
       |AdvancedSubtensor1 [id BE] ''   9
       |TensorConstant{[152.40959...33988419]} [id BJ]
       |TensorConstant{(1,) of -inf} [id V]
       |Elemwise{Composite{Cast{int8}((GT(i0, i1) * GT(i2, i1)))}} [id BK] ''   14
       | |AdvancedSubtensor1 [id BE] ''   9
       | |TensorConstant{(1,) of 0} [id T]
       | |AdvancedSubtensor1 [id Z] ''   7
       |Elemwise{add,no_inplace} [id BL] ''   12
       | |TensorConstant{[51. 92. 1... 53. 19.]} [id BI]
       | |AdvancedSubtensor1 [id Z] ''   7
       |AdvancedSubtensor1 [id Z] ''   7
       |Elemwise{eq,no_inplace} [id BM] ''   20
       | |Elemwise{true_div,no_inplace} [id BN] ''   17
       | | |AdvancedSubtensor1 [id BE] ''   9
       | | |Elemwise{add,no_inplace} [id BO] ''   13
       | |   |AdvancedSubtensor1 [id BE] ''   9
       | |   |AdvancedSubtensor1 [id Z] ''   7
       | |TensorConstant{(1,) of 0} [id T]
       |Elemwise{true_div,no_inplace} [id BN] ''   17
       |Elemwise{eq,no_inplace} [id BP] ''   21
       | |Elemwise{true_div,no_inplace} [id BQ] ''   18
       | | |AdvancedSubtensor1 [id Z] ''   7
       | | |Elemwise{add,no_inplace} [id BO] ''   13
       | |TensorConstant{(1,) of 0} [id T]
       |TensorConstant{(1,) of 0.0} [id BR]
       |Elemwise{Log}[(0, 0)] [id BS] ''   22
         |Elemwise{true_div,no_inplace} [id BQ] ''   18
IncSubtensor{InplaceInc;int64:int64:} [id BT] '__grad'   37
 |IncSubtensor{InplaceInc;int64:int64:} [id BU] ''   33
 | |Alloc [id BV] ''   5
 | | |TensorConstant{(1,) of 0.0} [id BW]
 | | |Shape_i{0} [id BX] ''   1
 | |   |__args_joined [id K]
 | |Elemwise{Composite{((-(i0 + i1)) + i2 + i3 + (i4 * i5))}}[(0, 4)] [id BY] '(d__logp/dmu_log__)'   30
 | | |TensorConstant{(1,) of -1.0} [id G]
 | | |Elemwise{log,no_inplace} [id H] ''   10
 | | |TensorConstant{(1,) of -1.0} [id G]
 | | |TensorConstant{(1,) of 1.0} [id BZ]
 | | |AdvancedIncSubtensor1{no_inplace,inc} [id CA] '(d__logp/dmu)'   27
 | | | |Alloc [id CB] ''   3
 | | | | |TensorConstant{(1,) of 0.0} [id BW]
 | | | | |TensorConstant{10} [id CC]
 | | | |Elemwise{Composite{(Switch(i0, i1, Switch(i2, Switch(i3, i1, Switch(i4, (i5 / i6), i1)), i1)) + Switch(i2, Switch(i3, i1, Switch(i4, i7, i1)), i1) + Switch(i8, i1, Switch(i9, Switch(i4, i1, (i5 / i6)), i1)) + i10 + i11)}} [id CD] ''   25
 | | | | |Elemwise{eq,no_inplace} [id BD] ''   15
 | | | | |TensorConstant{(1,) of 0} [id T]
 | | | | |Elemwise{Composite{Cast{int8}(GE(i0, i1))}} [id BG] ''   16
 | | | | |Elemwise{mul,no_inplace} [id BC] ''   19
 | | | | |Elemwise{gt,no_inplace} [id Y] ''   11
 | | | | |TensorConstant{[51. 92. 1... 53. 19.]} [id BI]
 | | | | |AdvancedSubtensor1 [id BE] ''   9
 | | | | |TensorConstant{(1,) of -1.0} [id G]
 | | | | |Elemwise{eq,no_inplace} [id BM] ''   20
 | | | | |Elemwise{Composite{Cast{int8}((GT(i0, i1) * GT(i2, i1)))}} [id BK] ''   14
 | | | | |Elemwise{Composite{Switch(i0, i1, Switch(i2, Switch(i3, i1, (i4 / i5)), i1))}} [id CE] ''   23
 | | | | | |Elemwise{eq,no_inplace} [id BM] ''   20
 | | | | | |TensorConstant{(1,) of 0} [id T]
 | | | | | |Elemwise{Composite{Cast{int8}((GT(i0, i1) * GT(i2, i1)))}} [id BK] ''   14
 | | | | | |Elemwise{gt,no_inplace} [id Y] ''   11
 | | | | | |TensorConstant{[-51. -92...-53. -19.]} [id CF]
 | | | | | |Elemwise{add,no_inplace} [id BO] ''   13
 | | | | |Elemwise{Composite{Switch(i0, i1, Switch(i2, Switch(i3, i1, ((-i4) / i5)), i1))}}[(0, 5)] [id CG] ''   24
 | | | |   |Elemwise{eq,no_inplace} [id BP] ''   21
 | | | |   |TensorConstant{(1,) of 0} [id T]
 | | | |   |Elemwise{Composite{Cast{int8}((GT(i0, i1) * GT(i2, i1)))}} [id BK] ''   14
 | | | |   |Elemwise{gt,no_inplace} [id Y] ''   11
 | | | |   |AdvancedSubtensor1 [id Z] ''   7
 | | | |   |Elemwise{add,no_inplace} [id BO] ''   13
 | | | |TensorConstant{[0 0 0 ... 9 9 9]} [id BA]
 | | |Elemwise{exp,no_inplace} [id I] 'mu'   6
 | |Constant{0} [id L]
 | |Constant{10} [id M]
 |Elemwise{Composite{(Switch(i0, (i1 * i2 * i2), i3) + i4 + (i5 * i2))}}[(0, 5)] [id CH] '(d__logp/dalpha_log__)'   34
 | |Elemwise{Composite{Cast{int8}(GE(i0, i1))}} [id P] ''   8
 | |TensorConstant{(1,) of -1.0} [id G]
 | |Elemwise{exp,no_inplace} [id Q] 'alpha'   4
 | |TensorConstant{(1,) of 0} [id T]
 | |TensorConstant{(1,) of 1.0} [id BZ]
 | |AdvancedIncSubtensor1{inplace,inc} [id CI] '(d__logp/dalpha)'   31
 |   |Alloc [id CB] ''   3
 |   |Elemwise{Composite{(Switch(i0, Switch(i1, i2, psi(i3)), i2) + Switch(i0, Switch(i1, i2, (-psi(i4))), i2) + i5 + Switch(i6, i7, Switch(i0, Switch(i1, i7, i8), i7)) + i9 + Switch(i6, i2, Switch(i0, Switch(i1, i2, i10), i2)))}}[(0, 3)] [id CJ] ''   28
 |   | |Elemwise{Composite{Cast{int8}((GT(i0, i1) * GT(i2, i1)))}} [id BK] ''   14
 |   | |Elemwise{gt,no_inplace} [id Y] ''   11
 |   | |TensorConstant{(1,) of 0} [id T]
 |   | |Elemwise{add,no_inplace} [id BL] ''   12
 |   | |AdvancedSubtensor1 [id Z] ''   7
 |   | |Elemwise{Composite{Switch(i0, i1, Switch(i2, Switch(i3, i1, (i4 / i5)), i1))}} [id CE] ''   23
 |   | |Elemwise{eq,no_inplace} [id BP] ''   21
 |   | |TensorConstant{(1,) of 0.0} [id BR]
 |   | |TensorConstant{(1,) of 1.0} [id BZ]
 |   | |Elemwise{Composite{Switch(i0, i1, Switch(i2, Switch(i3, i1, ((-i4) / i5)), i1))}}[(0, 5)] [id CG] ''   24
 |   | |Elemwise{Log}[(0, 0)] [id BS] ''   22
 |   |TensorConstant{[0 0 0 ... 9 9 9]} [id BA]
 |Constant{10} [id M]
 |Constant{20} [id S]

Did you time the [d]logp function directly?

Yes, I timed calls to model.logp_dlogp_function()._theano_function or model.logp_dlogp_function()._pytensor_function.

One thing we changed on the PyMC side is that the value check is done with an Elemwise switch, while the parameter checks are done with an all-or-none switch. In v3 the value check was also all-or-none. I would be surprised if that was the cause, though.
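To make that distinction concrete, here is a hypothetical NumPy sketch of the two check styles (the variable names are mine, not PyMC's): an elementwise switch masks each invalid entry with -inf, while an all-or-none switch discards the whole logp if any single check fails.

```python
import numpy as np

y = np.array([1.0, 4.0, -2.0, 9.0])   # -2.0 violates the value constraint
valid = y > 0
logp_terms = np.log(np.abs(y))        # stand-in for a per-element logp

# Elementwise switch: invalid entries become -inf, the rest survive
elemwise = np.where(valid, logp_terms, -np.inf)

# All-or-none switch: one bad entry poisons the whole vector
all_or_none = logp_terms if valid.all() else np.full_like(logp_terms, -np.inf)

print(elemwise.sum())     # -inf, but the valid entries were still computed
print(all_or_none.sum())  # -inf
```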