Poor Performance of pyMC5 vs pyMC3 for large number of variables

thomas_muhlfriedel · January 19, 2023, 7:51am

Hello there,
I would like to share some findings with the community and get some feedback.

I need to estimate distributions for a large number of variables in a production setting.
While integrating pyMC into the data pipeline, I noticed that pyMC3 (theano) scales much more gracefully to larger variable counts than pyMC5 (using JAX). I have a feeling that the performance characteristics are similar to those of numpyro, which would not be surprising if JAX is to blame.

What I have done:

Estimated a negative binomial distribution for sales numbers with a hierarchical model
Tested on a Mac M1 Pro with 32 GB of memory
Tested variable counts (x-axis) of 1000, 2000, 3000, and 3500 (pyMC5 did not finish with 5000 even after hours)
The y-axis is seconds

ricardoV94 · January 19, 2023, 8:14am

How does the PyMC5 default sampler perform? You may also want to have a look at nutpie (GitHub: pymc-devs/nutpie, a Python wrapper for nuts-rs).

thomas_muhlfriedel · January 19, 2023, 8:44am

Ricardo, how can I answer this question? The times recorded are the output of the NUTS run.

This was the output of the sample call for the runs with pyMC 5: (I added the var count in front of each line)

1000 : Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 100 seconds.
2000 : Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 224 seconds.
3000 : Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 374 seconds.
3500 : Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 660 seconds.

Thanks for the link to nutpie. Looks promising.
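A rough way to quantify the scaling in the four timings quoted above is to compute the time per variable and the local exponent k in t ~ n**k between consecutive runs. This is only a minimal sketch using the numbers from this post; the helper name `local_exponents` is made up for illustration.

```python
import math

# Wall-clock sampling times reported in this post (variable count -> seconds).
timings = {1000: 100, 2000: 224, 3000: 374, 3500: 660}

def local_exponents(timings):
    """Estimate k in t ~ n**k between each pair of consecutive measurements."""
    points = sorted(timings.items())
    result = []
    for (n0, t0), (n1, t1) in zip(points, points[1:]):
        k = math.log(t1 / t0) / math.log(n1 / n0)
        result.append((n0, n1, k))
    return result

for n, t in sorted(timings.items()):
    print(f"{n:>5} variables: {t / n:.3f} s per variable")

for n0, n1, k in local_exponents(timings):
    print(f"{n0} -> {n1}: local exponent k = {k:.2f}")
```

An exponent near 1 would mean linear scaling. Here the first two steps come out at roughly 1.2 and 1.3, while the step from 3000 to 3500 is roughly 3.7. Four points are too few to fit a curve with any confidence, but the last step clearly stands out.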

aseyboldt · January 19, 2023, 6:46pm

If you are using pm.sample, you are not using jax, but basically the same sampler as in pymc3. There were some changes between theano and pytensor that might lead to performance differences, but so far I have not seen behavior like what you are describing.
Is there a model you could share that we could run to further look at what might contribute to the slowdown?

thomas_muhlfriedel · January 20, 2023, 9:19am

That’s a surprise. What should I use now instead of sample()? The introductory examples all still use pm.sample() if I read correctly?
I can share the model with you and explain the data. What’s the best way to do this here?

aseyboldt · January 21, 2023, 1:10am

pm.sample() should still be fine, there isn’t anything wrong with using that. It didn’t change that much in recent years, which is why I’m a bit surprised about that slowdown you are observing.
We might, in the hopefully not-too-distant future, change pm.sample so that it uses nutpie internally by default. That usually brings quite large speedups on CPUs.

If you want to use the jax samplers (on a CPU they are typically faster than the current pm.sample, but slower than nutpie), you can do so like this:

import pymc.sampling_jax
# Call this inside the model context (`model` stands for your own pm.Model).
with model:
    idata = pymc.sampling_jax.sample_blackjax_nuts()

A big advantage of those is that they can also run on a GPU, and if you have very big datasets that might give you speedups as well.

About the model: The easiest way is probably to post code here that creates some sample data and the model (or link a gist if it is long).

thomas_muhlfriedel · January 25, 2023, 1:24pm

We put together some code to demonstrate the problem. You can find the model and the data generation code here:

https://gist.github.com/tomsen-san/66ec9e20594dc89ce73ad77283ff05a0

import os
import numpy as np
import pandas as pd
from scipy.stats import norm, lognorm, halfnorm, nbinom

# Generate NBD-distributed data. Each customer has a variable number of deltas (in days) between purchase events.

num_customers = 10000 # generate data for 10k customers at once
data = []
num_deltas_dist = halfnorm(loc=10, scale=10)

(The gist preview is truncated here; the full file is at the link above.)

aseyboldt · January 25, 2023, 6:51pm

@thomas_muhlfriedel Thank you for the code!
I can reproduce the performance difference (although it seems to be somewhat smaller on my machine), and I simplified the example a bit:

try:
    import pymc3 as pm
except ImportError:
    import pymc as pm

import pandas as pd

import numpy as np

K = 100_000
N = 10

np.random.seed(42)
data = np.random.randint(100, size=K)
idxs = np.random.randint(N, size=K)
idxs.sort()

with pm.Model() as model:
    mu = pm.Lognormal('mu', mu=1, sigma=1, shape=N)
    alpha = pm.HalfNormal('alpha', sigma=1, shape=N)

    pm.NegativeBinomial("y", mu=mu[idxs], alpha=alpha[idxs], observed=data)

func = model.logp_dlogp_function()
func.set_extra_values({})

# With theano (pymc3) we get
np.random.seed(42)
x = np.random.randn(func.size)
%timeit func._theano_function(x)
# 9.99 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# With pymc and pytensor on the main branch
np.random.seed(42)
x = [
    np.random.randn(N),
    np.random.randn(N),
]
%timeit func._pytensor_function(*x)
# 13.8 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

If I replace the NegativeBinomial with a Normal, everything gets much faster, and I still see a performance difference between the two, though now only a very slight one. (Interesting question by the way: Why would negbinom be that much slower? Possibly the special functions it’s using, or are we generating a worse computation graph for some reason?)
My guess is that one of the changes to error checking, broadcasting, or some other rewrite in pytensor is responsible. I think the computation reuse for the gradient also looks a bit different in the two versions.
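To make the "special functions" remark concrete, here is a minimal pure-Python sketch of the negative binomial log-pmf in the (mu, alpha) parameterization next to a normal log-density. It is not the PyMC implementation, only the textbook formula; the function names are made up for illustration. It does not settle whether the special functions are the actual bottleneck.

```python
import math

def negbinom_logp(y, mu, alpha):
    """Log-pmf of a negative binomial with mean mu and dispersion alpha."""
    return (
        math.lgamma(y + alpha)      # log-gamma, depends on the parameter alpha
        - math.lgamma(y + 1)        # log(y!), constant for observed data
        - math.lgamma(alpha)        # log-gamma, depends on the parameter alpha
        + y * math.log(mu / (mu + alpha))
        + alpha * math.log(alpha / (mu + alpha))
    )

def normal_logp(y, mu, sigma):
    """Log-density of a normal distribution: only sqr and log, no gamma functions."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

# Sanity check: the pmf should sum to 1 and have mean mu.
mu, alpha = 5.0, 2.0
probs = [math.exp(negbinom_logp(y, mu, alpha)) for y in range(2000)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
print(total, mean)
```

The two log-gamma terms that depend on alpha have to be evaluated once per observation. In the graphs posted in this thread they appear as gammaln in the logp and as psi (digamma) in the gradient with respect to alpha.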

Got to admit I’m still a bit lost though…

The full computation graphs in the two cases:

main pymc

Sum{acc_dtype=float64} [id A] '__logp' 56
 |MakeVector{dtype='float64'} [id B] 55
   |Sum{acc_dtype=float64} [id C] 25
   | |Elemwise{Composite{(Switch(i0, (((i1 * sqr((i2 - i3))) - i4) - i2), i5) + i2)}} [id D] 'mu_log___logprob' 16
   |   |Elemwise{gt,no_inplace} [id E] 5
   |   | |Elemwise{exp,no_inplace} [id F] 'mu_log___log' 1
   |   | | |mu_log__ [id G]
   |   | |TensorConstant{(1,) of 0.0} [id H]
   |   |TensorConstant{(1,) of -0.5} [id I]
   |   |mu_log__ [id G]
   |   |TensorConstant{(1,) of 1.0} [id J]
   |   |TensorConstant{(1,) of 0...5332046727} [id K]
   |   |TensorConstant{(1,) of -inf} [id L]
   |Sum{acc_dtype=float64} [id M] 54
   | |Elemwise{Composite{(Switch(i0, ((i1 * sqr(i2)) - i3), i4) + i5)}}[(0, 2)] [id N] 'alpha_log___logprob' 53
   |   |Elemwise{ge,no_inplace} [id O] 3
   |   | |Elemwise{exp,no_inplace} [id P] 'alpha_log___log' 0
   |   | | |alpha_log__ [id Q]
   |   | |TensorConstant{(1,) of 0.0} [id H]
   |   |TensorConstant{(1,) of -0.5} [id I]
   |   |Elemwise{exp,no_inplace} [id P] 'alpha_log___log' 0
   |   |TensorConstant{(1,) of 0...5264472738} [id R]
   |   |TensorConstant{(1,) of -inf} [id L]
   |   |alpha_log__ [id Q]
   |Sum{acc_dtype=float64} [id S] 47
     |Elemwise{Composite{Switch(i0, Switch(i1, Switch(i2, i3, (Switch(i4, i5, (i6 * log(i7))) - (i8 + i7))), i9), Switch(i10, (((gammaln(i11) - i8) - gammaln(i12)) + Switch(i13, i5, (i6 * log(i14))) + Switch(i15, Switch(EQ(i16, i3), i17, i9), (i16 * i18))), i9))}}[(0, 11)] [id T] 'y_logprob' 44
       |Elemwise{gt,no_inplace} [id U] 6
       | |AdvancedSubtensor1 [id V] 2
       | | |Elemwise{exp,no_inplace} [id P] 'alpha_log___log' 0
       | | |TensorConstant{[0 0 0 ... 9 9 9]} [id W]
       | |TensorConstant{(1,) of 10..00000000.0} [id X]
       |InplaceDimShuffle{x} [id Y] 33
       | |All [id Z] 24
       |   |Elemwise{ge,no_inplace} [id BA] 14
       |     |AdvancedSubtensor1 [id BB] 4
       |     | |Elemwise{exp,no_inplace} [id F] 'mu_log___log' 1
       |     | |TensorConstant{[0 0 0 ... 9 9 9]} [id W]
       |     |TensorConstant{(1,) of 0} [id BC]
       |Elemwise{mul,no_inplace} [id BD] 22
       | |Elemwise{eq,no_inplace} [id BE] 12
       | | |AdvancedSubtensor1 [id BB] 4
       | | |TensorConstant{(1,) of 0} [id BC]
       | |TensorConstant{[False Fal..lse False]} [id BF]
       |TensorConstant{(1,) of 0} [id BC]
       |Elemwise{eq,no_inplace} [id BE] 12
       |TensorConstant{[-inf -inf..-inf -inf]} [id BG]
       |TensorConstant{[51. 92. 1... 53. 19.]} [id BH]
       |AdvancedSubtensor1 [id BB] 4
       |TensorConstant{[152.40959...33988419]} [id BI]
       |TensorConstant{(1,) of -inf} [id L]
       |InplaceDimShuffle{x} [id BJ] 39
       | |All [id BK] 36
       |   |MakeVector{dtype='bool'} [id BL] 32
       |     |All [id BM] 23
       |     | |Elemwise{gt,no_inplace} [id BN] 13
       |     |   |AdvancedSubtensor1 [id BB] 4
       |     |   |TensorConstant{(1,) of 0} [id BC]
       |     |All [id BO] 19
       |       |Elemwise{gt,no_inplace} [id BP] 9
       |         |AdvancedSubtensor1 [id V] 2
       |         |TensorConstant{(1,) of 0} [id BC]
       |Elemwise{add,no_inplace} [id BQ] 8
       | |TensorConstant{[51. 92. 1... 53. 19.]} [id BH]
       | |AdvancedSubtensor1 [id V] 2
       |SpecifyShape [id BR] 7
       | |AdvancedSubtensor1 [id V] 2
       | |TensorConstant{100000} [id BS]
       |Elemwise{eq,no_inplace} [id BT] 31
       | |Elemwise{true_div,no_inplace} [id BU] 21
       | | |AdvancedSubtensor1 [id BB] 4
       | | |Elemwise{add,no_inplace} [id BV] 11
       | |   |AdvancedSubtensor1 [id BB] 4
       | |   |AdvancedSubtensor1 [id V] 2
       | |TensorConstant{(1,) of 0} [id BC]
       |Elemwise{true_div,no_inplace} [id BU] 21
       |Elemwise{eq,no_inplace} [id BW] 29
       | |Elemwise{true_div,no_inplace} [id BX] 20
       | | |AdvancedSubtensor1 [id V] 2
       | | |Elemwise{add,no_inplace} [id BV] 11
       | |TensorConstant{(1,) of 0} [id BC]
       |AdvancedSubtensor1 [id V] 2
       |TensorConstant{(1,) of 0.0} [id H]
       |Elemwise{log,no_inplace} [id BY] 30
         |Elemwise{true_div,no_inplace} [id BX] 20
Elemwise{Composite{((i0 * i1) + i2)}}[(0, 0)] [id BZ] 'mu_log___grad' 51
 |SpecifyShape [id CA] '(d__logp/dmu_log___log)' 49
 | |SpecifyShape [id CB] 46
 | | |AdvancedIncSubtensor1{inplace,inc} [id CC] 43
 | | | |Elemwise{Composite{Switch(i0, (((-(i1 - i2)) / i3) + (i4 / i3)), i5)}} [id CD] 15
 | | | | |Elemwise{gt,no_inplace} [id E] 5
 | | | | |mu_log__ [id G]
 | | | | |TensorConstant{(1,) of 1.0} [id J]
 | | | | |Elemwise{exp,no_inplace} [id F] 'mu_log___log' 1
 | | | | |TensorConstant{(1,) of -1.0} [id CE]
 | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | |Elemwise{Composite{(-((i0 * i1 * i2) - ((i1 * i3) / i4)))}} [id CF] 41
 | | | | |TensorConstant{(1,) of -1.0} [id CE]
 | | | | |Elemwise{Composite{(i0 + i1 + i2 + Switch(i3, i4, ((-i5) / i6)) + Switch(i7, i4, ((-(((i8 * i9) + sqr(i9)) * i10)) / ((i8 * i6) + (i9 * i6)))))}}[(0, 5)] [id CG] 40
 | | | | | |Elemwise{Composite{Switch(i0, i1, (Switch(i2, i1, (i3 * i4)) / i5))}} [id CH] 35
 | | | | | | |Elemwise{eq,no_inplace} [id BE] 12
 | | | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | | | | |Elemwise{mul,no_inplace} [id BD] 22
 | | | | | | |TensorConstant{[51. 92. 1... 53. 19.]} [id BH]
 | | | | | | |SpecifyShape [id CI] '(d__logp/dy_logprob)' 27
 | | | | | | | |Elemwise{Switch} [id CJ] 18
 | | | | | | | | |Elemwise{gt,no_inplace} [id U] 6
 | | | | | | | | |TensorConstant{(1,) of 1.0} [id J]
 | | | | | | | | |TensorConstant{(1,) of 0.0} [id H]
 | | | | | | | |TensorConstant{100000} [id BS]
 | | | | | | |AdvancedSubtensor1 [id BB] 4
 | | | | | |Elemwise{Composite{Switch(i0, i1, (-i2))}}[(0, 2)] [id CK] 37
 | | | | | | |Elemwise{mul,no_inplace} [id BD] 22
 | | | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | | | | |SpecifyShape [id CI] '(d__logp/dy_logprob)' 27
 | | | | | |Elemwise{Composite{Switch(i0, i1, (i2 / i3))}} [id CL] 38
 | | | | | | |Elemwise{eq,no_inplace} [id BT] 31
 | | | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | | | | |Elemwise{mul,no_inplace} [id CM] 34
 | | | | | | | |TensorConstant{[51. 92. 1... 53. 19.]} [id BH]
 | | | | | | | |SpecifyShape [id CN] 26
 | | | | | | |   |Elemwise{Switch} [id CO] 17
 | | | | | | |   | |Elemwise{gt,no_inplace} [id U] 6
 | | | | | | |   | |TensorConstant{(1,) of 0.0} [id H]
 | | | | | | |   | |TensorConstant{(1,) of 1.0} [id J]
 | | | | | | |   |TensorConstant{100000} [id BS]
 | | | | | | |AdvancedSubtensor1 [id BB] 4
 | | | | | |Elemwise{eq,no_inplace} [id BT] 31
 | | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | | | |Elemwise{mul,no_inplace} [id CM] 34
 | | | | | |Elemwise{add,no_inplace} [id BV] 11
 | | | | | |Elemwise{eq,no_inplace} [id BW] 29
 | | | | | |AdvancedSubtensor1 [id BB] 4
 | | | | | |AdvancedSubtensor1 [id V] 2
 | | | | | |SpecifyShape [id CN] 26
 | | | | |Elemwise{sub,no_inplace} [id CP] 28
 | | | | | |TensorConstant{(1,) of 1.0} [id J]
 | | | | | |Elemwise{true_div,no_inplace} [id BX] 20
 | | | | |AdvancedSubtensor1 [id V] 2
 | | | | |Elemwise{add,no_inplace} [id BV] 11
 | | | |TensorConstant{[0 0 0 ... 9 9 9]} [id W]
 | | |TensorConstant{10} [id CQ]
 | |TensorConstant{10} [id CQ]
 |Elemwise{exp,no_inplace} [id F] 'mu_log___log' 1
 |TensorConstant{(1,) of 1.0} [id J]
Elemwise{Composite{((i0 * i1) + i2)}}[(0, 0)] [id CR] 'alpha_log___grad' 52
 |SpecifyShape [id CS] '(d__logp/dalpha_log___log)' 50
 | |SpecifyShape [id CT] 48
 | | |AdvancedIncSubtensor1{inplace,inc} [id CU] 45
 | | | |Elemwise{Composite{Switch(i0, (-i1), i2)}} [id CV] 10
 | | | | |Elemwise{ge,no_inplace} [id O] 3
 | | | | |Elemwise{exp,no_inplace} [id P] 'alpha_log___log' 0
 | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | |Elemwise{Composite{((((i0 / i1) * i2) + ((i3 * i0 * i4) / i5) + i6 + (psi(i7) * i8) + (i3 * psi(i9) * i8) + Switch(i10, i11, i8) + Switch(i10, i12, (i13 * i8))) - (i14 + i15 + i16))}}[(0, 0)] [id CW] 42
 | | | | |Elemwise{Composite{(i0 + i1 + i2 + Switch(i3, i4, ((-i5) / i6)) + Switch(i7, i4, ((-(((i8 * i9) + sqr(i9)) * i10)) / ((i8 * i6) + (i9 * i6)))))}}[(0, 5)] [id CG] 40
 | | | | |Elemwise{true_div,no_inplace} [id BX] 20
 | | | | |Elemwise{sub,no_inplace} [id CP] 28
 | | | | |TensorConstant{(1,) of -1.0} [id CE]
 | | | | |AdvancedSubtensor1 [id BB] 4
 | | | | |AdvancedSubtensor1 [id V] 2
 | | | | |Elemwise{Composite{(-((i0 * i1 * i2) - ((i1 * i3) / i4)))}} [id CF] 41
 | | | | |Elemwise{add,no_inplace} [id BQ] 8
 | | | | |SpecifyShape [id CN] 26
 | | | | |SpecifyShape [id BR] 7
 | | | | |Elemwise{eq,no_inplace} [id BW] 29
 | | | | |TensorConstant{(1,) of 0.0} [id H]
 | | | | |TensorConstant{(1,) of 0} [id BC]
 | | | | |Elemwise{log,no_inplace} [id BY] 30
 | | | | |Elemwise{Composite{Switch(i0, i1, (Switch(i2, i1, (i3 * i4)) / i5))}} [id CH] 35
 | | | | |Elemwise{Composite{Switch(i0, i1, (-i2))}}[(0, 2)] [id CK] 37
 | | | | |Elemwise{Composite{Switch(i0, i1, (i2 / i3))}} [id CL] 38
 | | | |TensorConstant{[0 0 0 ... 9 9 9]} [id W]
 | | |TensorConstant{10} [id CQ]
 | |TensorConstant{10} [id CQ]
 |Elemwise{exp,no_inplace} [id P] 'alpha_log___log' 0
 |TensorConstant{(1,) of 1.0} [id J]

theano

Sum{acc_dtype=float64} [id A] '__logp'   40
 |MakeVector{dtype='float64'} [id B] ''   39
   |Sum{acc_dtype=float64} [id C] '__logp_mu_log__'   35
   | |Elemwise{Composite{((i0 + (i1 * sqr((i2 + i3))) + i4) - i3)}}[(0, 3)] [id D] ''   32
   |   |TensorConstant{(1,) of -0..5332046727} [id E]
   |   |TensorConstant{(1,) of -0.5} [id F]
   |   |TensorConstant{(1,) of -1.0} [id G]
   |   |Elemwise{log,no_inplace} [id H] ''   10
   |   | |Elemwise{exp,no_inplace} [id I] 'mu'   6
   |   |   |Subtensor{int64:int64:} [id J] 'mu_log__'   2
   |   |     |__args_joined [id K]
   |   |     |Constant{0} [id L]
   |   |     |Constant{10} [id M]
   |   |Subtensor{int64:int64:} [id J] 'mu_log__'   2
   |Sum{acc_dtype=float64} [id N] '__logp_alpha_log__'   38
   | |Elemwise{Composite{(Switch(i0, (i1 + (i2 * sqr(i3))), i4) + i5)}}[(0, 3)] [id O] ''   36
   |   |Elemwise{Composite{Cast{int8}(GE(i0, i1))}} [id P] ''   8
   |   | |Elemwise{exp,no_inplace} [id Q] 'alpha'   4
   |   | | |Subtensor{int64:int64:} [id R] 'alpha_log__'   0
   |   | |   |__args_joined [id K]
   |   | |   |Constant{10} [id M]
   |   | |   |Constant{20} [id S]
   |   | |TensorConstant{(1,) of 0} [id T]
   |   |TensorConstant{(1,) of -0..3526447274} [id U]
   |   |TensorConstant{(1,) of -0.5} [id F]
   |   |Elemwise{exp,no_inplace} [id Q] 'alpha'   4
   |   |TensorConstant{(1,) of -inf} [id V]
   |   |Subtensor{int64:int64:} [id R] 'alpha_log__'   0
   |Sum{acc_dtype=float64} [id W] '__logp_y'   29
     |Elemwise{Composite{Switch(i0, Switch(i1, i2, Switch(i3, ((Switch(i4, i5, (i6 * log(i7))) - i8) - i7), i9)), Switch(i10, (((scalar_gammaln(i11) - i8) - scalar_gammaln(i12)) + Switch(i13, i5, (i6 * log(i14))) + Switch(i15, Switch(EQ(i12, i2), i16, i9), (i12 * i17))), i9))}}[(0, 7)] [id X] ''   26
       |Elemwise{gt,no_inplace} [id Y] ''   11
       | |AdvancedSubtensor1 [id Z] ''   7
       | | |Elemwise{exp,no_inplace} [id Q] 'alpha'   4
       | | |TensorConstant{[0 0 0 ... 9 9 9]} [id BA]
       | |TensorConstant{(1,) of 10..00000000.0} [id BB]
       |Elemwise{mul,no_inplace} [id BC] ''   19
       | |Elemwise{eq,no_inplace} [id BD] ''   15
       | | |AdvancedSubtensor1 [id BE] ''   9
       | | | |Elemwise{exp,no_inplace} [id I] 'mu'   6
       | | | |TensorConstant{[0 0 0 ... 9 9 9]} [id BA]
       | | |TensorConstant{(1,) of 0} [id T]
       | |TensorConstant{[False Fal..lse False]} [id BF]
       |TensorConstant{(1,) of 0} [id T]
       |Elemwise{Composite{Cast{int8}(GE(i0, i1))}} [id BG] ''   16
       | |AdvancedSubtensor1 [id BE] ''   9
       | |TensorConstant{(1,) of 0} [id T]
       |Elemwise{eq,no_inplace} [id BD] ''   15
       |TensorConstant{[-inf -inf..-inf -inf]} [id BH]
       |TensorConstant{[51. 92. 1... 53. 19.]} [id BI]
       |AdvancedSubtensor1 [id BE] ''   9
       |TensorConstant{[152.40959...33988419]} [id BJ]
       |TensorConstant{(1,) of -inf} [id V]
       |Elemwise{Composite{Cast{int8}((GT(i0, i1) * GT(i2, i1)))}} [id BK] ''   14
       | |AdvancedSubtensor1 [id BE] ''   9
       | |TensorConstant{(1,) of 0} [id T]
       | |AdvancedSubtensor1 [id Z] ''   7
       |Elemwise{add,no_inplace} [id BL] ''   12
       | |TensorConstant{[51. 92. 1... 53. 19.]} [id BI]
       | |AdvancedSubtensor1 [id Z] ''   7
       |AdvancedSubtensor1 [id Z] ''   7
       |Elemwise{eq,no_inplace} [id BM] ''   20
       | |Elemwise{true_div,no_inplace} [id BN] ''   17
       | | |AdvancedSubtensor1 [id BE] ''   9
       | | |Elemwise{add,no_inplace} [id BO] ''   13
       | |   |AdvancedSubtensor1 [id BE] ''   9
       | |   |AdvancedSubtensor1 [id Z] ''   7
       | |TensorConstant{(1,) of 0} [id T]
       |Elemwise{true_div,no_inplace} [id BN] ''   17
       |Elemwise{eq,no_inplace} [id BP] ''   21
       | |Elemwise{true_div,no_inplace} [id BQ] ''   18
       | | |AdvancedSubtensor1 [id Z] ''   7
       | | |Elemwise{add,no_inplace} [id BO] ''   13
       | |TensorConstant{(1,) of 0} [id T]
       |TensorConstant{(1,) of 0.0} [id BR]
       |Elemwise{Log}[(0, 0)] [id BS] ''   22
         |Elemwise{true_div,no_inplace} [id BQ] ''   18
IncSubtensor{InplaceInc;int64:int64:} [id BT] '__grad'   37
 |IncSubtensor{InplaceInc;int64:int64:} [id BU] ''   33
 | |Alloc [id BV] ''   5
 | | |TensorConstant{(1,) of 0.0} [id BW]
 | | |Shape_i{0} [id BX] ''   1
 | |   |__args_joined [id K]
 | |Elemwise{Composite{((-(i0 + i1)) + i2 + i3 + (i4 * i5))}}[(0, 4)] [id BY] '(d__logp/dmu_log__)'   30
 | | |TensorConstant{(1,) of -1.0} [id G]
 | | |Elemwise{log,no_inplace} [id H] ''   10
 | | |TensorConstant{(1,) of -1.0} [id G]
 | | |TensorConstant{(1,) of 1.0} [id BZ]
 | | |AdvancedIncSubtensor1{no_inplace,inc} [id CA] '(d__logp/dmu)'   27
 | | | |Alloc [id CB] ''   3
 | | | | |TensorConstant{(1,) of 0.0} [id BW]
 | | | | |TensorConstant{10} [id CC]
 | | | |Elemwise{Composite{(Switch(i0, i1, Switch(i2, Switch(i3, i1, Switch(i4, (i5 / i6), i1)), i1)) + Switch(i2, Switch(i3, i1, Switch(i4, i7, i1)), i1) + Switch(i8, i1, Switch(i9, Switch(i4, i1, (i5 / i6)), i1)) + i10 + i11)}} [id CD] ''   25
 | | | | |Elemwise{eq,no_inplace} [id BD] ''   15
 | | | | |TensorConstant{(1,) of 0} [id T]
 | | | | |Elemwise{Composite{Cast{int8}(GE(i0, i1))}} [id BG] ''   16
 | | | | |Elemwise{mul,no_inplace} [id BC] ''   19
 | | | | |Elemwise{gt,no_inplace} [id Y] ''   11
 | | | | |TensorConstant{[51. 92. 1... 53. 19.]} [id BI]
 | | | | |AdvancedSubtensor1 [id BE] ''   9
 | | | | |TensorConstant{(1,) of -1.0} [id G]
 | | | | |Elemwise{eq,no_inplace} [id BM] ''   20
 | | | | |Elemwise{Composite{Cast{int8}((GT(i0, i1) * GT(i2, i1)))}} [id BK] ''   14
 | | | | |Elemwise{Composite{Switch(i0, i1, Switch(i2, Switch(i3, i1, (i4 / i5)), i1))}} [id CE] ''   23
 | | | | | |Elemwise{eq,no_inplace} [id BM] ''   20
 | | | | | |TensorConstant{(1,) of 0} [id T]
 | | | | | |Elemwise{Composite{Cast{int8}((GT(i0, i1) * GT(i2, i1)))}} [id BK] ''   14
 | | | | | |Elemwise{gt,no_inplace} [id Y] ''   11
 | | | | | |TensorConstant{[-51. -92...-53. -19.]} [id CF]
 | | | | | |Elemwise{add,no_inplace} [id BO] ''   13
 | | | | |Elemwise{Composite{Switch(i0, i1, Switch(i2, Switch(i3, i1, ((-i4) / i5)), i1))}}[(0, 5)] [id CG] ''   24
 | | | |   |Elemwise{eq,no_inplace} [id BP] ''   21
 | | | |   |TensorConstant{(1,) of 0} [id T]
 | | | |   |Elemwise{Composite{Cast{int8}((GT(i0, i1) * GT(i2, i1)))}} [id BK] ''   14
 | | | |   |Elemwise{gt,no_inplace} [id Y] ''   11
 | | | |   |AdvancedSubtensor1 [id Z] ''   7
 | | | |   |Elemwise{add,no_inplace} [id BO] ''   13
 | | | |TensorConstant{[0 0 0 ... 9 9 9]} [id BA]
 | | |Elemwise{exp,no_inplace} [id I] 'mu'   6
 | |Constant{0} [id L]
 | |Constant{10} [id M]
 |Elemwise{Composite{(Switch(i0, (i1 * i2 * i2), i3) + i4 + (i5 * i2))}}[(0, 5)] [id CH] '(d__logp/dalpha_log__)'   34
 | |Elemwise{Composite{Cast{int8}(GE(i0, i1))}} [id P] ''   8
 | |TensorConstant{(1,) of -1.0} [id G]
 | |Elemwise{exp,no_inplace} [id Q] 'alpha'   4
 | |TensorConstant{(1,) of 0} [id T]
 | |TensorConstant{(1,) of 1.0} [id BZ]
 | |AdvancedIncSubtensor1{inplace,inc} [id CI] '(d__logp/dalpha)'   31
 |   |Alloc [id CB] ''   3
 |   |Elemwise{Composite{(Switch(i0, Switch(i1, i2, psi(i3)), i2) + Switch(i0, Switch(i1, i2, (-psi(i4))), i2) + i5 + Switch(i6, i7, Switch(i0, Switch(i1, i7, i8), i7)) + i9 + Switch(i6, i2, Switch(i0, Switch(i1, i2, i10), i2)))}}[(0, 3)] [id CJ] ''   28
 |   | |Elemwise{Composite{Cast{int8}((GT(i0, i1) * GT(i2, i1)))}} [id BK] ''   14
 |   | |Elemwise{gt,no_inplace} [id Y] ''   11
 |   | |TensorConstant{(1,) of 0} [id T]
 |   | |Elemwise{add,no_inplace} [id BL] ''   12
 |   | |AdvancedSubtensor1 [id Z] ''   7
 |   | |Elemwise{Composite{Switch(i0, i1, Switch(i2, Switch(i3, i1, (i4 / i5)), i1))}} [id CE] ''   23
 |   | |Elemwise{eq,no_inplace} [id BP] ''   21
 |   | |TensorConstant{(1,) of 0.0} [id BR]
 |   | |TensorConstant{(1,) of 1.0} [id BZ]
 |   | |Elemwise{Composite{Switch(i0, i1, Switch(i2, Switch(i3, i1, ((-i4) / i5)), i1))}}[(0, 5)] [id CG] ''   24
 |   | |Elemwise{Log}[(0, 0)] [id BS] ''   22
 |   |TensorConstant{[0 0 0 ... 9 9 9]} [id BA]
 |Constant{10} [id M]
 |Constant{20} [id S]
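For readers of the two graphs: AdvancedSubtensor1 is the integer indexing (mu[idxs], alpha[idxs]) that expands the N group parameters to K observations, and AdvancedIncSubtensor1 with inc is the matching gradient step that sums the per-observation gradients back into N slots. A minimal pure-Python sketch of that pair, with made-up helper names and toy data:

```python
def gather(params, idxs):
    """Forward pass: expand N group parameters to K observations (like mu[idxs])."""
    return [params[i] for i in idxs]

def scatter_add(upstream, idxs, n_groups):
    """Backward pass: sum each observation's gradient into its group's slot."""
    grad = [0.0] * n_groups
    for g, i in zip(upstream, idxs):
        grad[i] += g
    return grad

idxs = [0, 0, 1, 2, 2, 2]          # sorted group index per observation
mu = [1.0, 2.0, 3.0]               # one parameter per group
expanded = gather(mu, idxs)

# Toy upstream gradient: pretend d logp / d expanded[k] = 1 for every observation.
grad_mu = scatter_add([1.0] * len(idxs), idxs, n_groups=len(mu))
print(expanded, grad_mu)
```

With K = 100_000 and N = 10 as in the example, both steps touch every observation, so they are part of the per-call cost in either version.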

ricardoV94 · January 25, 2023, 10:08pm

Did you time the [d]logp function directly?

aseyboldt · January 25, 2023, 10:48pm

Yes, I timed calls to model.logp_dlogp_function()._theano_function or model.logp_dlogp_function()._pytensor_function.
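Outside IPython, where %timeit is not available, a similar measurement can be sketched with the standard-library timeit module. The function being timed here is a hypothetical stand-in, not a compiled PyMC function, and `time_call` is a made-up helper name.

```python
import timeit

def time_call(func, *args, repeat=7, number=100):
    """Return the best per-call time in seconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(runs) / number

# Hypothetical stand-in for a compiled logp/dlogp function:
# returns a value and its gradient for a vector input.
def fake_logp_dlogp(x):
    return sum(v * v for v in x), [2 * v for v in x]

best = time_call(fake_logp_dlogp, [0.1] * 1000)
print(f"{best * 1e3:.3f} ms per call")
```

Note that this reports the minimum over the runs, whereas the %timeit output quoted in this thread shows mean ± standard deviation, so the numbers are not directly comparable.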

ricardoV94 · January 25, 2023, 10:55pm

One thing we changed on the PyMC side is that the value check is done with an Elemwise switch, while the parameter checks are done with an all-or-none switch. In V3 the value check was also all-or-none. I would be surprised if that was it, though.
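The difference between the two kinds of check can be illustrated in plain Python. This is only a sketch of the idea, not PyMC's code; the function names and the toy logp are made up.

```python
import math

def logp_elementwise_check(values, logp_fn, is_valid):
    """Per-element switch: only the invalid entries are replaced by -inf."""
    return [logp_fn(v) if is_valid(v) else -math.inf for v in values]

def logp_all_or_none_check(values, logp_fn, is_valid):
    """Single reduction: one invalid entry turns every entry into -inf."""
    if all(is_valid(v) for v in values):
        return [logp_fn(v) for v in values]
    return [-math.inf] * len(values)

values = [0.0, 3.0, -1.0]                  # the last value is outside the support

def toy_logp(v):
    return -0.5 * v * v                    # toy log-density, for illustration only

def non_negative(v):
    return v >= 0

elementwise = logp_elementwise_check(values, toy_logp, non_negative)
all_or_none = logp_all_or_none_check(values, toy_logp, non_negative)
print(elementwise)
print(all_or_none)
```

Both variants sum to -inf when any entry is invalid, so the total logp is the same. What differs is the shape of the check: a boolean array the size of the data in the first case, a single reduced boolean in the second.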
