Memory allocation limit for NUTS with custom `logp` function (but not with VI methods)

The most useful thing here would be the memory profile. You have to set another flag for that; I think it’s mentioned on the doc page about profiling.

Ah yes, thanks!

I reran it with the following flags:

aesara.config.profile = True
aesara.config.profile_memory = True
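
For context, a minimal, self-contained sketch of how such a memory profile can be generated for a standalone Aesara function (the toy function below is illustrative and not part of the model in this thread; API details may vary slightly between Aesara versions):

import aesara
import aesara.tensor as at

# Same flags as above; they must be set before the function is compiled
aesara.config.profile = True
aesara.config.profile_memory = True

# A toy function standing in for the model's logp/dlogp
x = at.vector("x")
f = aesara.function([x], at.exp(x).sum(), profile=True)
f([1.0, 2.0, 3.0])

# Prints the timing and memory summaries for this function
f.profile.summary()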

(Rerunning with these flags produced this warning:)

/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/link/vm.py:1037: UserWarning: CVM does not support memory profiling, using Stack VM.
  warnings.warn("CVM does not support memory profiling, using Stack VM.")

However, it did include more information in the summary. Here’s the memory profile for dlogp:

Memory Profile
(Sparse variables are ignored)
(For values in brackets, it's for linker = c|py
---
    Max peak memory with current setting
        CPU: 17234KB (18008KB)
        GPU: 0KB (0KB)
        CPU + GPU: 17234KB (18008KB)
    Max peak memory with current setting and Aesara flag optimizer_excluding=inplace
        CPU: 18017KB (18712KB)
        GPU: 0KB (0KB)
        CPU + GPU: 18017KB (18712KB)
    Max peak memory if allow_gc=False (linker don't make a difference)
        CPU: 24988KB
        GPU: 0KB
        CPU + GPU: 24988KB
---

    <Sum apply outputs (bytes)> <Apply outputs shape> <created/inplace/view> <Apply node>

     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{((i0 + i1) - i2)}}(TensorConstant{(1,) of -10000.0}, ARange{dtype='int32'}.0, InplaceDimShuffle{x}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{sub,no_inplace}(ARange{dtype='int32'}.0, Elemwise{Composite{(i0 + (i1 / i2))}}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{sub,no_inplace}(ARange{dtype='int32'}.0, InplaceDimShuffle{x}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{sub,no_inplace}(InplaceDimShuffle{x}.0, ARange{dtype='int32'}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{erfc(((i0 * i1) / i2))}}(TensorConstant{(1,) of 0...7932881648}, Elemwise{Composite{((i0 + i1) - i2)}}.0, InplaceDimShuffle{x}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{exp(((i0 * i1 * i1) / i2))}}(TensorConstant{(1,) of -0..0171142714}, Elemwise{Composite{((i0 + i1) - i2)}}.0, Elemwise{sqr,no_inplace}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{true_div,no_inplace}(Elemwise{sub,no_inplace}.0, InplaceDimShuffle{x}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{erfc(((i0 * i1) / i2))}}(TensorConstant{(1,) of 0...7932881648}, Elemwise{sub,no_inplace}.0, InplaceDimShuffle{x}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{erfcx(((i0 * i1) / i2))}}(TensorConstant{(1,) of -0..7932881648}, Elemwise{sub,no_inplace}.0, InplaceDimShuffle{x}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{exp(((i0 * i1 * i1) / i2))}}(TensorConstant{(1,) of -0..0171142714}, Elemwise{sub,no_inplace}.0, Elemwise{sqr,no_inplace}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{erfcx(((i0 * i1) / i2))}}(TensorConstant{(1,) of -0..7932881648}, Elemwise{sub,no_inplace}.0, InplaceDimShuffle{x}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{erfc(((i0 * i1) / i2))}}(TensorConstant{(1,) of 0...7932881648}, Elemwise{sub,no_inplace}.0, InplaceDimShuffle{x}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{true_div,no_inplace}(Elemwise{sub,no_inplace}.0, InplaceDimShuffle{x}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{exp(((i0 * i1 * i1) / i2))}}(TensorConstant{(1,) of -0..0171142714}, Elemwise{sub,no_inplace}.0, Elemwise{sqr,no_inplace}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{(((i0 * i1) / i2) + ((i3 * i1 * i1) / i4))}}(TensorConstant{(1,) of -1..1670955126}, Elemwise{sub,no_inplace}.0, Elemwise{Composite{erfcx(((i0 * i1) / i2))}}.0, TensorConstant{(1,) of -1..5865763297}, InplaceDimShuffle{x}.0)
     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{((i0 / (i1 * i2)) + ((i3 * i4) / i5))}}(TensorConstant{(1,) of -1..1670955126}, Elemwise{Composite{erfcx(((i0 * i1) / i2))}}.0, InplaceDimShuffle{x}.0, TensorConstant{(1,) of -1..5865763297}, Elemwise{sub,no_inplace}.0, Elemwise{sqr,no_inplace}.0)
     {node_outputs_size:9d}B  [(120202,)] i Elemwise{Composite{Switch(i0, (log((i1 * i2)) - (i1 * sqr(i3))), log1p((i4 * i5)))}}[(0, 2)](Elemwise{lt,no_inplace}.0, TensorConstant{(1,) of 0.5}, Elemwise{Composite{erfcx(((i0 * i1) / i2))}}.0, Elemwise{true_div,no_inplace}.0, TensorConstant{(1,) of -0.5}, Elemwise{Composite{erfc(((i0 * i1) / i2))}}.0)
     {node_outputs_size:9d}B  [(120202,)] i Elemwise{Composite{(i0 + (-i1))}}[(0, 1)](TensorConstant{(1,) of 2.0}, Elemwise{Composite{erfc(((i0 * i1) / i2))}}.0)
     {node_outputs_size:9d}B  [(120202,)] i Elemwise{Composite{(((i0 / i1) + i2 + Switch(i3, (log((i4 * i5)) - (i4 * sqr(i6))), log1p((i7 * i8)))) - i9)}}[(0, 6)](Elemwise{sub,no_inplace}.0, InplaceDimShuffle{x}.0, Elemwise{Composite{(i0 * sqr(i1))}}.0, Elemwise{lt,no_inplace}.0, TensorConstant{(1,) of 0.5}, Elemwise{Composite{erfcx(((i0 * i1) / i2))}}.0, Elemwise{true_div,no_inplace}.0, TensorConstant{(1,) of -0.5}, Elemwise{Composite{erfc(((i0 * i1) / i2))}}.0, Elemwise{Composite{Switch(i0, (log((i1 * i2)) - (i1 * sqr(i3))), log1p((i4 * i5)))}}[(0, 2)].0)
     {node_outputs_size:9d}B  [(120202,)] i Elemwise{Composite{(i0 + (-i1))}}[(0, 1)](TensorConstant{(1,) of 2.0}, Elemwise{Composite{erfc(((i0 * i1) / i2))}}.0)
   ... (remaining 245 Apply account for 21987206B/41219526B ((53.34%)) of the Apply with dense outputs sizes)

    <created/inplace/view> is taken from the Op's declaration.
    Apply nodes marked 'inplace' or 'view' may actually allocate memory, this is not reported here. If you use DebugMode, warnings will be emitted in those cases.

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  - Try the Aesara flag floatX=float32
  - Try installing amdlibm and set the Aesara flag lib__amblibm=True. This speeds up only some Elemwise operation.

And for logp:

Memory Profile
(Sparse variables are ignored)
(For values in brackets, it's for linker = c|py
---
    Max peak memory with current setting
        CPU: 1643KB (1721KB)
        GPU: 0KB (0KB)
        CPU + GPU: 1643KB (1721KB)
    Max peak memory with current setting and Aesara flag optimizer_excluding=inplace
        CPU: 1643KB (1721KB)
        GPU: 0KB (0KB)
        CPU + GPU: 1643KB (1721KB)
    Max peak memory if allow_gc=False (linker don't make a difference)
        CPU: 2356KB
        GPU: 0KB
        CPU + GPU: 2356KB
---

    <Sum apply outputs (bytes)> <Apply outputs shape> <created/inplace/view> <Apply node>

     {node_outputs_size:9d}B  [(120202,)] c Elemwise{Composite{(exp(Switch(i0, Switch(i1, (Composite{Switch(LT(i0, i1), (log((i2 * erfcx(((i3 * i4) / i5)))) - (i2 * sqr(i0))), log1p((i6 * erfc(((i7 * i4) / i5)))))}(((i2 - i3) / i4), i5, i6, i7, (i2 - i3), i4, i8, i9) + scalar_log1mexp(((((i3 - i2) / i10) + i11 + Switch(LT(((i2 - i12) / i4), i5), (log((i6 * erfcx(((i7 * (i2 - i12)) / i4)))) - (i6 * sqr(((i2 - i12) / i4)))), log1p((i8 * erfc(((i9 * (i2 - i12)) / i4)))))) - Composite{Switch(LT(i0, i1), (log((i2 * erfcx(((i3 * i4) / i5)))) - (i2 * sqr(i0))), log1p((i6 * erfc(((i7 * i4) / i5)))))}(((i2 - i3) / i4), i5, i6, i7, (i2 - i3), i4, i8, i9)))), Composite{Switch(LT(i0, i1), (log((i2 * erfcx(((i3 * i4) / i5)))) - (i2 * sqr(i0))), log1p((i6 * erfc(((i7 * i4) / i5)))))}(((i2 - i3) / i4), i5, i6, i7, (i2 - i3), i4, i8, i9)), i13)) * erfc(((i9 * ((i14 + i2) - i15)) / i16)))}}(InplaceDimShuffle{x}.0, Elemwise{Composite{GT(i0, (i1 * i2))}}.0, ARange{dtype='int32'}.0, InplaceDimShuffle{x}.0, InplaceDimShuffle{x}.0, TensorConstant{(1,) of -1.0}, TensorConstant{(1,) of 0.5}, TensorConstant{(1,) of -0..7932881648}, TensorConstant{(1,) of -0.5}, TensorConstant{(1,) of 0...7932881648}, InplaceDimShuffle{x}.0, Elemwise{Composite{(i0 * sqr((i1 / i2)))}}.0, Elemwise{Composite{(i0 + (i1 / i2))}}[(0, 1)].0, TensorConstant{(1,) of -inf}, TensorConstant{(1,) of -10000.0}, InplaceDimShuffle{x}.0, InplaceDimShuffle{x}.0)
     {node_outputs_size:9d}B  [(120202,)] c ARange{dtype='int32'}(Elemwise{Floor}[(0, 0)].0, Elemwise{Ceil}[(0, 0)].0, TensorConstant{1})
     {node_outputs_size:9d}B  [(10000, 4)] c Join(TensorConstant{1}, Elemwise{Composite{Switch(i0, Switch(i1, (i2 + i3 + i4 + i5), (i6 - ((i7 * sqr(i8)) / i9))), i10)}}.0, Elemwise{Composite{((Switch(i0, Switch(i1, (Composite{Switch(LT(i0, i1), (log((i2 * erfcx(((i3 * i4) / i5)))) - (i2 * sqr(i0))), log1p((i6 * erfc(((i7 * i4) / i5)))))}((i2 / i3), i4, i5, i6, i2, i3, i7, i8) + scalar_log1mexp(((i9 + i10 + i11) - Composite{Switch(LT(i0, i1), (log((i2 * erfcx(((i3 * i4) / i5)))) - (i2 * sqr(i0))), log1p((i6 * erfc(((i7 * i4) / i5)))))}((i2 / i3), i4, i5, i6, i2, i3, i7, i8)))), Composite{Switch(LT(i0, i1), (log((i2 * erfcx(((i3 * i4) / i5)))) - (i2 * sqr(i0))), log1p((i6 * erfc(((i7 * i4) / i5)))))}((i2 / i3), i4, i5, i6, i2, i3, i7, i8)), i12) + Switch(GT(i13, i14), (log((i5 * erfcx((i8 * i13)))) - (i5 * sqr(i13))), log1p((i7 * erfc((i6 * i13)))))) - i15)}}[(0, 2)].0, Elemwise{Composite{Switch(i0, ((i1 + (i2 * sqr(i3))) - i4), i5)}}[(0, 3)].0, TensorConstant{(10000, 1)..0970631272})
     {node_outputs_size:9d}B  [(10000, 4)] i Elemwise{Add}[(0, 1)](Elemwise{Log}[(0, 0)].0, Join.0)
     {node_outputs_size:9d}B  [(10000, 4)] i Elemwise{Composite{Switch(i0, i1, exp((i2 - i3)))}}[(0, 2)](Elemwise{isinf,no_inplace}.0, Elemwise{exp,no_inplace}.0, Elemwise{Add}[(0, 1)].0, InplaceDimShuffle{0,x}.0)
     {node_outputs_size:9d}B  [(10000, 1)] c Elemwise{sub,no_inplace}(TensorConstant{[[-2265.]
.. [ 7658.]]}, InplaceDimShuffle{x,x}.0)
     {node_outputs_size:9d}B  [(10000, 1)] c Elemwise{Composite{((i0 - i1) / i2)}}(InplaceDimShuffle{x,x}.0, TensorConstant{[[-2265.]
.. [ 7658.]]}, InplaceDimShuffle{x,x}.0)
     {node_outputs_size:9d}B  [(10000, 1)] c Elemwise{Composite{((i0 - i1) / i2)}}(TensorConstant{[[-12265.]..[ -2342.]]}, InplaceDimShuffle{x,x}.0, InplaceDimShuffle{x,x}.0)
     {node_outputs_size:9d}B  [(10000, 1)] c Elemwise{Composite{Switch(LT(((i0 - i1) / i2), i3), (log((i4 * erfcx(((i5 * (i0 - i1)) / i2)))) - (i4 * sqr(((i0 - i1) / i2)))), log1p((i6 * erfc(((i7 * (i0 - i1)) / i2)))))}}(TensorConstant{[[-2265.]
.. [ 7658.]]}, InplaceDimShuffle{0,x}.0, InplaceDimShuffle{x,x}.0, TensorConstant{(1, 1) of -1.0}, TensorConstant{(1, 1) of 0.5}, TensorConstant{(1, 1) of ..7932881648}, TensorConstant{(1, 1) of -0.5}, TensorConstant{(1, 1) of ..7932881648})
     {node_outputs_size:9d}B  [(10000, 1)] c Elemwise{Composite{Switch(i0, Switch(i1, (i2 + i3 + i4 + i5), (i6 - ((i7 * sqr(i8)) / i9))), i10)}}(InplaceDimShuffle{x,x}.0, InplaceDimShuffle{0,x}.0, Elemwise{Composite{(-log(i0))}}[(0, 0)].0, Elemwise{Composite{((i0 - i1) / i2)}}.0, InplaceDimShuffle{0,x}.0, Elemwise{Composite{Switch(LT(((i0 - i1) / i2), i3), (log((i4 * erfcx(((i5 * (i0 - i1)) / i2)))) - (i4 * sqr(((i0 - i1) / i2)))), log1p((i6 * erfc(((i7 * (i0 - i1)) / i2)))))}}.0, Elemwise{Composite{(i0 - log(Abs(i1)))}}.0, TensorConstant{(1, 1) of 0.5}, Elemwise{sub,no_inplace}.0, Elemwise{sqr,no_inplace}.0, TensorConstant{(1, 1) of -inf})
     {node_outputs_size:9d}B  [(10000, 1)] i Elemwise{Composite{((Switch(i0, Switch(i1, (Composite{Switch(LT(i0, i1), (log((i2 * erfcx(((i3 * i4) / i5)))) - (i2 * sqr(i0))), log1p((i6 * erfc(((i7 * i4) / i5)))))}((i2 / i3), i4, i5, i6, i2, i3, i7, i8) + scalar_log1mexp(((i9 + i10 + i11) - Composite{Switch(LT(i0, i1), (log((i2 * erfcx(((i3 * i4) / i5)))) - (i2 * sqr(i0))), log1p((i6 * erfc(((i7 * i4) / i5)))))}((i2 / i3), i4, i5, i6, i2, i3, i7, i8)))), Composite{Switch(LT(i0, i1), (log((i2 * erfcx(((i3 * i4) / i5)))) - (i2 * sqr(i0))), log1p((i6 * erfc(((i7 * i4) / i5)))))}((i2 / i3), i4, i5, i6, i2, i3, i7, i8)), i12) + Switch(GT(i13, i14), (log((i5 * erfcx((i8 * i13)))) - (i5 * sqr(i13))), log1p((i7 * erfc((i6 * i13)))))) - i15)}}[(0, 2)](InplaceDimShuffle{x,x}.0, InplaceDimShuffle{0,x}.0, Elemwise{sub,no_inplace}.0, InplaceDimShuffle{x,x}.0, TensorConstant{(1, 1) of -1.0}, TensorConstant{(1, 1) of 0.5}, TensorConstant{(1, 1) of ..7932881648}, TensorConstant{(1, 1) of -0.5}, TensorConstant{(1, 1) of ..7932881648}, Elemwise{Composite{((i0 - i1) / i2)}}.0, InplaceDimShuffle{0,x}.0, Elemwise{Composite{Switch(LT(((i0 - i1) / i2), i3), (log((i4 * erfcx(((i5 * (i0 - i1)) / i2)))) - (i4 * sqr(((i0 - i1) / i2)))), log1p((i6 * erfc(((i7 * (i0 - i1)) / i2)))))}}.0, TensorConstant{(1, 1) of -inf}, Elemwise{Composite{((i0 - i1) / i2)}}.0, TensorConstant{(1, 1) of 1.0}, Elemwise{Composite{log((i0 * i1))}}[(0, 1)].0)
     {node_outputs_size:9d}B  [(10000, 1)] i Elemwise{Composite{Switch(i0, ((i1 + (i2 * sqr(i3))) - i4), i5)}}[(0, 3)](Elemwise{gt,no_inplace}.0, TensorConstant{(1, 1) of ..5332046727}, TensorConstant{(1, 1) of -0.5}, Elemwise{Composite{((i0 - i1) / i2)}}.0, Elemwise{Log}[(0, 0)].0, TensorConstant{(1, 1) of -inf})
     {node_outputs_size:9d}B  [(10000,)] c Max{maximum}{1}(Elemwise{Add}[(0, 1)].0)
     {node_outputs_size:9d}B  [(10000, 1)] v InplaceDimShuffle{0,x}(max)
     {node_outputs_size:9d}B  [(10000, 1)] c Elemwise{exp,no_inplace}(InplaceDimShuffle{0,x}.0)
     {node_outputs_size:9d}B  [(10000,)] c Sum{axis=[1], acc_dtype=float64}(Elemwise{Composite{Switch(i0, i1, exp((i2 - i3)))}}[(0, 2)].0)
     {node_outputs_size:9d}B  [(10000,)] i Elemwise{Composite{Switch(i0, (i1 + log(i2)), i3)}}[(0, 1)](InplaceDimShuffle{x}.0, max, Sum{axis=[1], acc_dtype=float64}.0, TensorConstant{(1,) of -inf})
     {node_outputs_size:9d}B  [(10000, 1)] c Elemwise{isinf,no_inplace}(InplaceDimShuffle{0,x}.0)
   ... (remaining 91 Apply account for  868B/3373292B ((0.03%)) of the Apply with dense outputs sizes)

    <created/inplace/view> is taken from the Op's declaration.
    Apply nodes marked 'inplace' or 'view' may actually allocate memory, this is not reported here. If you use DebugMode, warnings will be emitted in those cases.

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  - Try the Aesara flag floatX=float32
  - Try installing amdlibm and set the Aesara flag lib__amblibm=True. This speeds up only some Elemwise operation.

It looks like peak memory is around 20 MB in dlogp.

Doesn’t look too bad. Looking back, the original memory error was coming from the numpy.arange call in the tensor arange Op. You could add some print statements in its perform method here to check what the inputs to your arange actually look like: aesara/basic.py at ae182f02f879741e409f927b27e874d8a1a4ef21 · aesara-devs/aesara · GitHub

If this Arange is indeed too large, you might need to implement your integration with a Scan?
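
For illustration, here is a rough sketch of what a Scan-based integration could look like (the integrand, bounds, and step count are placeholders, not the ones from the model in this thread):

import aesara
import aesara.tensor as at

lower = at.dscalar("lower")
upper = at.dscalar("upper")
n_steps = 1000  # a fixed step count keeps the grid (and memory) bounded
step = (upper - lower) / n_steps

def accumulate(i, acc, lower, step):
    # Riemann-sum update for a placeholder integrand exp(-x**2)
    x = lower + i * step
    return acc + at.exp(-x**2) * step

sums, _ = aesara.scan(
    accumulate,
    sequences=[at.arange(n_steps)],
    outputs_info=[at.constant(0.0)],
    non_sequences=[lower, step],
)
integral = sums[-1]

f = aesara.function([lower, upper], integral)
print(f(-5.0, 5.0))  # roughly sqrt(pi) for this toy integrand

Inside a model, the same graph would be built from the model’s variables instead of lower/upper, at the cost of Scan’s per-step overhead.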


I’m going to try this locally now…

I monkey-patched the perform method like this:

        try:
            res = np.arange(start, stop, step, dtype=self.dtype)
        except (MemoryError, ValueError):
            print(f"Failed to allocate arange array with {start=}, {stop=}, {step=}, {self.dtype=}")
            raise
        out[0] = res
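
For completeness, one way such a patch could be applied (this assumes ARange lives in aesara.tensor.basic and that perform receives (start, stop, step) as its inputs, per the source linked above; details may differ between Aesara versions):

import aesara.tensor.basic as atb

_orig_perform = atb.ARange.perform

def debug_perform(self, node, inp, out):
    # Report the requested bounds whenever the allocation fails
    start, stop, step = inp
    try:
        return _orig_perform(self, node, inp, out)
    except (MemoryError, ValueError):
        print(f"Failed to allocate arange array with {start=}, {stop=}, {step=}, {self.dtype=}")
        raise

atb.ARange.perform = debug_perform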

On my machine it raises a ValueError (maximum array size exceeded) rather than a MemoryError:

Failed to allocate arange array with start=-196242792413697.0, stop=2.5870972395998505e+18, step=1, self.dtype='int64'

ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
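
To put those bounds in perspective, a quick back-of-the-envelope check of the allocation they imply (plain NumPy, using the numbers printed above):

import numpy as np

start = -196242792413697.0
stop = 2.5870972395998505e+18
n = stop - start                            # ~2.6e18 elements with step=1
size = n * np.dtype("int64").itemsize       # ~2.1e19 bytes
print(f"{n:.2e} elements, {size / 2**60:.1f} EiB requested")  # roughly 18 EiB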

This might be coming from very extreme integration bounds (or underflow/overflow somewhere, or a bug) that are not seen at the initial point (so your profile did not fail or show anything extreme).

Ruling out those issues, my guess would be that VI never tries such extreme integration bounds, and that’s why it doesn’t fail.


Reducing to _n=3 (tried that at random) seems to fix it. A few notes:

  1. You might want to tweak the integration grid (more or fewer steps, a non-uniform grid, etc.). Numerical integration is tricky. Stricter priors for the standard_deviation could be another option.
  2. If you need this many steps, you might need to integrate with a Scan.
  3. You might want to use some logsumexp tricks to obtain more stable integration. I see you have a log(sum(exp(logcdf(ExGaussian)) * norm_sf)). This would probably work better as logsumexp(logcdf(ExGaussian) + log(norm_sf)); see the sketch below. (This has nothing to do with the out-of-memory issues.)
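
As a rough numerical illustration of why the logsumexp form is more stable (plain NumPy/SciPy, with made-up log-probabilities standing in for logcdf(ExGaussian) and made-up weights standing in for norm_sf):

import numpy as np
from scipy.special import logsumexp

log_p = np.array([-800.0, -805.0, -810.0])  # stand-ins for logcdf(ExGaussian)
w = np.array([0.2, 0.5, 0.3])               # stand-ins for norm_sf

naive = np.log(np.sum(np.exp(log_p) * w))   # exp underflows to 0, giving -inf
stable = logsumexp(log_p + np.log(w))       # stays in log space, stays finite

print(naive, stable)  # -inf vs. roughly -801.6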

@ricardoV94 this is great, thanks! Very illuminating, but I’m now quite confused as to why the arange bounds would ever be so large, even given my choice of priors. Earlier in the thread I provided this example (a literal reproduction of the _min/_max variables in the model, using the hyperparameter values from the priors m0, s0, t0, etc.) to determine the _min and _max values for a certain number of sigma out on each prior:

import numpy as np

Nsig = 10  # how many sigma out on each prior
_n = 10

m0 = -Nsig*2000
s0 = Nsig*5000
t0 = Nsig*500
m2 = Nsig*10000 + 10000
s2 = Nsig*5000

_min = np.floor(min([m0-_n*s0, m2-_n*s2]))
_max = np.ceil(max([m0+_n*np.sqrt(s0**2+t0**2), m2+_n*s2]))
print(f"Length of _x (min:{_min}, max:{_max}): {abs(_min-_max):e}")

Returns:

Length of _x (min:-520000.0, max:610000.0): 1.130000e+06

Here I’m out 10\sigma on each prior (i.e. sampling way out in the tail of each prior distribution; that’s something like a 10^{-5} event based on the CDF for each prior, which shouldn’t be happening very often during model sampling), and even in this case |_min - _max| is only around 10^6. So I don’t see how it’s possible to get the start and stop values you’re getting (greater than 10^{15}) for arange. Am I missing something fundamental about how PyMC is doing the prior sampling?

As an aside, is there no way to evaluate the aesara tensor variables _min and _max in my model directly, instead of having to use the monkey-patched fix you used? (Not to say it wasn’t a clever idea, but it seems like it would be helpful in general to be able to directly evaluate those variables within the model for debugging purposes :sweat_smile:)

You can wrap them in a Print Op to see their values every time the function is evaluated: Debugging Aesara: FAQ and Troubleshooting (Aesara documentation)

I didn’t try that because I don’t know how it works and I was afraid it might affect the final shape of the graph. But it doesn’t hurt to try.

As to why it happens, my best theories are: an Aesara bug, numerical issues (overflow/underflow somewhere), or your data pushing the sampler into more extreme values than seem plausible a priori (or are you just sampling from the prior?).

Hm, yeah, I tried the Print Op based on that example but was unable to get it to work; I apparently don’t understand the syntax. I created the aesara function using the Print object as specified in that example, but it’s not clear to me where I’m supposed to call that function to produce the printing. In my model code above (where the _min variable is defined in comp1_logp()) I included the following:

_min = at.floor(at.min([m0-_n*s0, m2-_n*s2]))

min_print = ae.printing.Print("min val: ")(_min)
f_min_print = ae.function([_min], min_print)
f_min_print([_min])

Which errors. I assume it’s because I’m calling f_min_print incorrectly or in the wrong place (when I comment it out I don’t get an error, but also nothing is printed).

In the example you linked, they call the printing function with explicit numeric values, but in my case, since it’s embedded in the middle of the graph, I don’t know how to access the numeric values that go into computing _min.

Yeah, I think you’re probably right it’s one of those. I’m going to try using scan next as you suggested earlier to see if that works.

For the Print Op, you just use it where you define it in the PyMC model; then, when you call pm.sample, it will be included in whatever functions PyMC compiles and evaluates during sampling.

with pm.Model():
    x = pm.Normal("x")
    x = ae.printing.Print("...")(x)
    y = pm.Normal("y", x)
    pm.sample()

The Scan will take too long with such a large number of steps.

Hm, it’s a surreal experience to search for examples using Print only to find my own post from a couple years ago that uses it successfully in PyMC3:

However, printing is no longer a submodule of the tensor module, and the syntax I used in the post linked here no longer works. This seems to me like an Aesara bug with Print.

As an aside, apparently you were the one that solved my problem in that previous post! :sweat_smile:


Printing was probably moved around but should still exist in a similar form.

Right, yeah I assume you mean this one:

help(ae.printing.Print)
Help on class Print in module aesara.printing:

class Print(aesara.graph.op.Op)
 |  Print(message='', attrs=('__str__',), global_fn=<function _print_fn at 0x2b09717edfc0>)
 |  
 |  This identity-like Op print as a side effect.
 |  
 |  This identity-like Op has the side effect of printing a message
 |  followed by its inputs when it runs. Default behaviour is to print
 |  the __str__ representation. Optionally, one can pass a list of the
 |  input member functions to execute, or attributes to print.

So, in the model I have (changing nothing else):

_min = at.floor(at.min([m0-_n*s0, m2-_n*s2]))
min_print = ae.printing.Print("min val: ")(_min)

And when I sample it:

with mod:
    vi = pm.ADVI(random_seed=42)
    approx = vi.fit(100, progressbar=True)

Nothing is printed (the same is true when sampling with pm.sample()).

To add some further detail: I also tried printing one of the prior variables, in case the issue was that the other Print was being applied inside a user-defined function. So if you include a print of the variable s0_0 just below its definition with the following line:

    s0_0_print = ae.printing.Print("s0 val: ")(s0_0)

This also does not produce a print statement upon sampling.

Ah, so I tried setting _n=3 as you did but I still got a memory error when sampling with NUTS.

...
MemoryError                               Traceback (most recent call last)
MemoryError: Unable to allocate 66.9 TiB for an array with shape (18382417557864,) and data type int32
...

It appears to be a stochastic problem: sometimes when I run the sampling it runs OK for a few hundred iterations, other times it errors almost immediately (which lends credence to the idea that there’s a “no-go zone” in the parameter space). How were you running the sampling with _n=3 that seemed to work?

(And now I’m beginning to feel like I’m bombarding you! My apologies…)

You then need to use that variable in the model, not the original one. Otherwise the Print Op will not be part of the graph.

Do this:

with pm.Model():
    x = pm.Normal("x")
    x_print = ae.printing.Print("...")(x)
    y = pm.Normal("y", x_print)
    pm.sample()

Not this:

with pm.Model():
    x = pm.Normal("x")
    x_print = ae.printing.Print("...")(x)
    y = pm.Normal("y", x)  # x_print is never used, so the Print is not part of the graph
    pm.sample()

No need to apologize. My pleasure to help :slight_smile: