Rolling mean and rolling median in PyMC3 model

Hi,

I’m using PyMC3 for a model that does the following:

  • The full dataset (a time series) is a combination of multiple individual datasets that share a common trend
  • Each dataset has its own offset to subtract and its own white noise term that I add in quadrature with the initial errors
  • Then, inside the model, I want to take the (weighted) rolling median or rolling mean of the full set of residuals, after the offsets are subtracted and the white noise terms applied:
    • The rolling window is based on time, not on a number of points, so I use a boolean mask to keep only the weights and values within a certain time window around each point.
    • For the rolling median, I need tt.argsort to sort the values and calculate the weighted median (or at least I have not found another way so far), and then pymc3.{theanof,aesaraf}.take_along_axis to sort the 2d arrays along the desired axes (see the sketch just after this list).
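
Here is a stripped-down sketch of that weighted-median step (not my exact code): values and weights stand for hypothetical (n_points, n_points) matrices where each row corresponds to one output point, with the weights already zeroed outside that point’s time window.

import theano.tensor as tt
import theano.tensor.extra_ops as tte
from pymc3.theanof import take_along_axis

def weighted_rolling_median(values, weights):
    # Sort each row (i.e. each time window) and reorder the weights the same way.
    order = tt.argsort(values, axis=1)
    sorted_vals = take_along_axis(values, order, axis=1)
    sorted_w = take_along_axis(weights, order, axis=1)
    # Cumulative weight fraction along each sorted row.
    cum_frac = tte.cumsum(sorted_w, axis=1) / tt.sum(sorted_w, axis=1, keepdims=True)
    # Index where the cumulative fraction crosses 0.5 (ignoring the edge case
    # where the very first value already crosses it).
    crossing = tte.diff(tt.sgn(cum_frac - 0.5), axis=1)
    idx = tt.argmax(crossing, axis=1) + 1
    # Pick the (weighted) median value in each row.
    return sorted_vals[tt.arange(sorted_vals.shape[0]), idx]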

Currently, sampling the model with the rolling mean takes about 20 minutes on 4 CPUs for 500 tuning steps and 500 draws, while the rolling median version just keeps adding time to the estimated total (after 40 hours it still shows > 80 hours remaining). I first used a for loop to iterate over all the points in my time series, then tried tt.scan, and finally tried building matrices and performing the calculation along an axis (a rough sketch of that version is below).
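
For reference, here is how the matrix version of the masked, weighted rolling mean roughly looks (variable names and priors are simplified placeholders: t, y, err are the concatenated times, values and errors, dataset_idx labels which dataset each point belongs to, half_window is the half-width of the time window and n_datasets the number of datasets).

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Boolean (n, n) mask: entry [i, j] is True when point j falls inside the
# time window centred on point i. It depends only on the data, so it can
# be built with numpy outside the model.
mask = np.abs(t[:, None] - t[None, :]) <= half_window

with pm.Model() as rolling_model:
    offsets = pm.Normal("offsets", 0.0, 10.0, shape=n_datasets)      # placeholder priors
    extra_sd = pm.HalfNormal("extra_sd", 1.0, shape=n_datasets)

    # Residuals after subtracting each dataset's offset, and the errors
    # inflated in quadrature by each dataset's white noise term.
    resid = tt.as_tensor_variable(y) - offsets[dataset_idx]
    sigma2 = extra_sd[dataset_idx] ** 2 + err ** 2

    # Inverse-variance weights, zeroed outside each point's time window.
    w = tt.switch(mask, 1.0 / sigma2[None, :], 0.0)

    # Weighted rolling mean: one row of w and resid per output point.
    rolling_mean = tt.sum(w * resid[None, :], axis=1) / tt.sum(w, axis=1)

For the rolling median, the last line is replaced by the weighted-median helper sketched above, with the residuals broadcast to the same (n, n) shape as the weights.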

My questions are:

  • Is there a known better way to implement such models in PyMC3?
  • If I read the profiling output below correctly, ArgSort is the main bottleneck, so I was wondering whether there are alternative mechanisms in PyMC3 that would avoid argsort (or use it more efficiently).

I’m very much a beginner with PyMC3, so sorry if this already has an answer somewhere. Let me know if there is any other useful information I could share to help answer the question :slight_smile:.

Thank you!

logpt profile output

[nav] In [225]: rolling_model_pm.profile(rolling_model_pm.logpt).summary()
Function profiling
==================
  Message: /home/vandal/.pyenv/versions/v0cal/lib/python3.9/site-packages/pymc3/model.py:1254
  Time in 1000 calls to Function.__call__: 4.207738e+01s
  Time in Function.fn.__call__: 41.98405909538269s (99.778%)
  Time in thunks: 41.927605390548706s (99.644%)
  Total compile time: 4.491658e-01s
    Number of Apply nodes: 52
    Theano Optimizer time: 3.716309e-01s
       Theano validate time: 3.323317e-03s
    Theano Linker time (includes C, CUDA code generation/compiling): 0.036318063735961914s
       Import time 0.000000e+00s
       Node make_thunk time 3.473973e-02s
           Node Elemwise{mul,no_inplace}(InplaceDimShuffle{x,0}.0, TensorConstant{[[1. 1. 0...1. 1. 1.]]}) time 9.595394e-03s
           Node Elemwise{Composite{((((((((((i0 - (i1 * i2)) - (i3 * i4)) - (i5 * i6)) - (i7 * i8)) - (i9 * i10)) - (i11 * i12)) - (i13 * i14)) - (i15 * i16)) - (i17 * i18)) - (i19 * i20))}}(TensorConstant{[-5.323109...58023765]},
InplaceDimShuffle{x}.0, TensorConstant{[0. 0. 0. .. 0. 0. 1.]}, InplaceDimShuffle{x}.0, TensorConstant{[0. 0. 0. .. 0. 0. 0.]}, InplaceDimShuffle{x}.0, TensorConstant{[0. 0. 0. .. 0. 0. 0.]}, InplaceDimShuffle{x}.0, TensorConstant{
[0. 1. 0. .. 0. 1. 0.]}, InplaceDimShuffle{x}.0, TensorConstant{[0. 0. 0. .. 0. 0. 0.]}, InplaceDimShuffle{x}.0, TensorConstant{[0. 0. 0. .. 0. 0. 0.]}, InplaceDimShuffle{x}.0, TensorConstant{[0. 0. 0. .. 0. 0. 0.]}, InplaceDimShuf
fle{x}.0, TensorConstant{[1. 0. 1. .. 0. 0. 0.]}, InplaceDimShuffle{x}.0, TensorConstant{[0. 0. 0. .. 0. 0. 0.]}, InplaceDimShuffle{x}.0, TensorConstant{[0. 0. 0. .. 1. 0. 0.]}) time 1.569986e-03s
           Node Elemwise{Composite{(i0 * (i1 + (i2 * sqr((i3 + i4)))))}}(TensorConstant{0.5}, TensorConstant{-9.496770537132663}, TensorConstant{-0.0004718..2148866378}, TensorConstant{0.07975531206725212}, gamma1) time 1.4977
46e-03s
           Node InplaceDimShuffle{x}(gamma2) time 9.064674e-04s
           Node Elemwise{Composite{((i0 * i1 * sqr((i2 - (i3 + (((i4 - i5) * (i6 - i3)) / (i7 - i5)))))) + i8)}}[(0, 2)](TensorConstant{(1,) of -1.0}, TensorConstant{[0.8051108...49118222]}, Elemwise{Composite{((((((((((i0 - (i1 *
i2)) - (i3 * i4)) - (i5 * i6)) - (i7 * i8)) - (i9 * i10)) - (i11 * i12)) - (i13 * i14)) - (i15 * i16)) - (i17 * i18)) - (i19 * i20))}}.0, Subtensor{int64}.0, TensorConstant{(1,) of 0.5}, Subtensor{int64}.0, Subtensor{int64}.0, Subt
ensor{int64}.0, TensorConstant{[-2.054652...54881717]}) time 7.774830e-04s

Time in all call to theano.grad() 2.246730e+01s
Time since theano import 36252.758s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  34.7%    34.7%      14.537s       1.45e-02s     Py    1000       1   theano.tensor.sort.ArgSortOp
  29.2%    63.8%      12.224s       3.06e-03s     Py    4000       4   theano.tensor.subtensor.AdvancedSubtensor
  16.3%    80.1%       6.827s       3.59e-04s     C    19000      19   theano.tensor.elemwise.Elemwise
   9.2%    89.3%       3.866s       3.87e-03s     C     1000       1   theano.tensor.extra_ops.CumOp
   4.2%    93.6%       1.773s       1.77e-03s     Py    1000       1   theano.tensor.extra_ops.DiffOp
   4.1%    97.7%       1.738s       5.79e-04s     C     3000       3   theano.tensor.elemwise.Sum
   2.2%    99.9%       0.929s       9.29e-04s     C     1000       1   theano.tensor.basic.Argmax
   0.0%   100.0%       0.016s       1.01e-06s     C    16000      16   theano.tensor.elemwise.DimShuffle
   0.0%   100.0%       0.013s       1.33e-05s     C     1000       1   theano.tensor.basic.Join
   0.0%   100.0%       0.003s       8.15e-07s     C     4000       4   theano.tensor.subtensor.Subtensor
   0.0%   100.0%       0.002s       1.66e-06s     C     1000       1   theano.tensor.opt.MakeVector
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  34.7%    34.7%      14.537s       1.45e-02s     Py    1000        1   ArgSortOp{quicksort, None}
  29.2%    63.8%      12.224s       3.06e-03s     Py    4000        4   AdvancedSubtensor
   9.2%    73.0%       3.866s       3.87e-03s     C     1000        1   CumOp{1, add}
   4.2%    77.3%       1.773s       1.77e-03s     Py    1000        1   DiffOp{n=1, axis=1}
   4.2%    81.4%       1.742s       1.74e-03s     C     1000        1   Elemwise{true_div,no_inplace}
   4.1%    85.6%       1.736s       1.74e-03s     C     1000        1   Sum{axis=[1], acc_dtype=float64}
   3.9%    89.5%       1.655s       1.66e-03s     C     1000        1   Elemwise{sub,no_inplace}
   3.1%    92.6%       1.308s       1.31e-03s     C     1000        1   Elemwise{Composite{sgn(((i0 + i1) - i2))}}[(0, 1)]
   2.5%    95.2%       1.067s       1.07e-03s     C     1000        1   Elemwise{mul,no_inplace}
   2.4%    97.6%       1.026s       1.03e-03s     C     1000        1   Elemwise{Composite{((i0 * i1) / i2)}}[(0, 1)]
   2.2%    99.8%       0.929s       9.29e-04s     C     1000        1   Argmax{axis=(1,)}
   0.0%    99.9%       0.013s       1.33e-05s     C     1000        1   Join
   0.0%    99.9%       0.012s       1.19e-05s     C     1000        1   Elemwise{Composite{((((((((((i0 - (i1 * i2)) - (i3 * i4)) - (i5 * i6)) - (i7 * i8)) - (i9 * i10)) - (i11 * i12)) - (i13 * i14)) - (i15 * i16)) - (i17 * i18)) -
 (i19 * i20))}}
   0.0%    99.9%       0.008s       7.99e-07s     C     10000       10   InplaceDimShuffle{x}
   0.0%    99.9%       0.008s       7.86e-07s     C     10000       10   Elemwise{Composite{(i0 * (i1 + (i2 * sqr((i3 + i4)))))}}
   0.0%   100.0%       0.005s       4.81e-06s     C     1000        1   Elemwise{Composite{((i0 * i1 * sqr((i2 - (i3 + (((i4 - i5) * (i6 - i3)) / (i7 - i5)))))) + i8)}}[(0, 2)]
   0.0%   100.0%       0.003s       3.49e-06s     C     1000        1   Elemwise{add,no_inplace}
   0.0%   100.0%       0.003s       8.15e-07s     C     4000        4   Subtensor{int64}
   0.0%   100.0%       0.003s       1.02e-06s     C     3000        3   InplaceDimShuffle{1,0}
   0.0%   100.0%       0.003s       1.49e-06s     C     2000        2   InplaceDimShuffle{x,0}
   ... (remaining 4 Ops account for   0.02%(0.01s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  34.7%    34.7%      14.537s       1.45e-02s   1000    23   ArgSortOp{quicksort, None}(Elemwise{mul,no_inplace}.0, TensorConstant{1})
  14.6%    49.3%       6.140s       6.14e-03s   1000    25   AdvancedSubtensor(Elemwise{mul,no_inplace}.0, TensorConstant{[[   0]
 [..]
 [1206]]}, ArgSortOp{quicksort, None}.0)
  14.3%    63.6%       5.988s       5.99e-03s   1000    24   AdvancedSubtensor(TensorConstant{[[0.805110..49118222]]}, TensorConstant{[[   0]
 [..]
 [1206]]}, ArgSortOp{quicksort, None}.0)
   9.2%    72.8%       3.866s       3.87e-03s   1000    30   CumOp{1, add}(Elemwise{true_div,no_inplace}.0)
   4.2%    77.0%       1.773s       1.77e-03s   1000    33   DiffOp{n=1, axis=1}(Elemwise{Composite{sgn(((i0 + i1) - i2))}}[(0, 1)].0)
   4.2%    81.2%       1.742s       1.74e-03s   1000    28   Elemwise{true_div,no_inplace}(AdvancedSubtensor.0, InplaceDimShuffle{0,x}.0)
   4.1%    85.3%       1.736s       1.74e-03s   1000    26   Sum{axis=[1], acc_dtype=float64}(AdvancedSubtensor.0)
   3.9%    89.3%       1.655s       1.66e-03s   1000    31   Elemwise{sub,no_inplace}(CumOp{1, add}.0, Elemwise{Composite{((i0 * i1) / i2)}}[(0, 1)].0)
   3.1%    92.4%       1.308s       1.31e-03s   1000    32   Elemwise{Composite{sgn(((i0 + i1) - i2))}}[(0, 1)](TensorConstant{(1, 1) of -0.5}, CumOp{1, add}.0, Elemwise{Composite{((i0 * i1) / i2)}}[(0, 1)].0)
   2.5%    95.0%       1.067s       1.07e-03s   1000    22   Elemwise{mul,no_inplace}(InplaceDimShuffle{x,0}.0, TensorConstant{[[1. 1. 0...1. 1. 1.]]})
   2.4%    97.4%       1.026s       1.03e-03s   1000    29   Elemwise{Composite{((i0 * i1) / i2)}}[(0, 1)](TensorConstant{(1, 1) of 0.5}, AdvancedSubtensor.0, InplaceDimShuffle{0,x}.0)
   2.2%    99.6%       0.929s       9.29e-04s   1000    34   Argmax{axis=(1,)}(DiffOp{n=1, axis=1}.0)
   0.1%    99.8%       0.061s       6.08e-05s   1000    40   AdvancedSubtensor(AdvancedSubtensor.0, TensorConstant{[[   0]
 [..]
 [1206]]}, InplaceDimShuffle{1,0}.0)
   0.1%    99.8%       0.035s       3.45e-05s   1000    39   AdvancedSubtensor(Elemwise{sub,no_inplace}.0, TensorConstant{[[   0]
 [..]
 [1206]]}, InplaceDimShuffle{1,0}.0)
   0.0%    99.9%       0.013s       1.33e-05s   1000    37   Join(TensorConstant{0}, InplaceDimShuffle{x,0}.0, Elemwise{add,no_inplace}.0)
   0.0%    99.9%       0.012s       1.19e-05s   1000    20   Elemwise{Composite{((((((((((i0 - (i1 * i2)) - (i3 * i4)) - (i5 * i6)) - (i7 * i8)) - (i9 * i10)) - (i11 * i12)) - (i13 * i14)) - (i15 * i16)) - (i17 * i18)) - (i19 * i20
))}}(TensorConstant{[-5.323109...58023765]}, InplaceDimShuffle{x}.0, TensorConstant{[0. 0. 0. .. 0. 0. 1.]}, InplaceDimShuffle{x}.0, TensorConstant{[0. 0. 0. .. 0. 0. 0.]}, InplaceDimShuffle{x}.0, TensorConstant{[0. 0. 0. .. 0. 0.
0.]}, InplaceDimShuffle{x}.0, TensorConstant{[0. 1.
   0.0%    99.9%       0.005s       4.81e-06s   1000    47   Elemwise{Composite{((i0 * i1 * sqr((i2 - (i3 + (((i4 - i5) * (i6 - i3)) / (i7 - i5)))))) + i8)}}[(0, 2)](TensorConstant{(1,) of -1.0}, TensorConstant{[0.8051108...4911822
2]}, Elemwise{Composite{((((((((((i0 - (i1 * i2)) - (i3 * i4)) - (i5 * i6)) - (i7 * i8)) - (i9 * i10)) - (i11 * i12)) - (i13 * i14)) - (i15 * i16)) - (i17 * i18)) - (i19 * i20))}}.0, Subtensor{int64}.0, TensorConstant{(1,) of 0.5},
 Subtensor{int64}.0, Subtensor{int64}.0, Subtensor{i
   0.0%    99.9%       0.003s       3.49e-06s   1000    36   Elemwise{add,no_inplace}(TensorConstant{(1, 1) of 1}, InplaceDimShuffle{x,0}.0)
   0.0%    99.9%       0.002s       2.15e-06s   1000    27   InplaceDimShuffle{0,x}(Sum{axis=[1], acc_dtype=float64}.0)
   0.0%    99.9%       0.002s       2.10e-06s   1000    48   Sum{acc_dtype=float64}(Elemwise{Composite{((i0 * i1 * sqr((i2 - (i3 + (((i4 - i5) * (i6 - i3)) / (i7 - i5)))))) + i8)}}[(0, 2)].0)
   ... (remaining 32 Apply instances account for 0.07%(0.03s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  - Try the Theano flag floatX=float32

Can you compute the sorted values before you pass in the data? In other words, for each point for which you want to compute the median, could you sort the weighted subset before you pass everything into the PyMC3 model? The data will be messier to prepare, but you could avoid the sort inside the graph.
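
Roughly something like this, assuming (hypothetically) that the per-window values and weights could be fixed before the model is built:

import numpy as np

# Hypothetical pre-processing: if the per-window values and weights
# matrices were plain data, each row could be sorted once with numpy,
# so the Theano graph would never need ArgSort.
order = np.argsort(values, axis=1)
sorted_vals = np.take_along_axis(values, order, axis=1)
sorted_w = np.take_along_axis(weights, order, axis=1)
# sorted_vals and sorted_w would then be passed into the model as constants.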

Hi, thanks for the suggestion. I don’t think it would work in this case: I have two (or more) datasets with points at each epoch, and the model includes an offset for each dataset that is subtracted before taking the median at each epoch. A change in the individual offsets could therefore shift some points from dataset 1 below points of dataset 2 and vice versa, so I think the sorting needs to happen inside the model, unfortunately.
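
To make that concrete, here is a toy example with made-up numbers showing how the ordering of residuals at a single epoch can flip when the offsets change:

import numpy as np

# Two points at the same epoch, one from each dataset (made-up numbers).
y = np.array([10.2, 10.5])                    # dataset 1, dataset 2
print(np.argsort(y - np.array([0.0, 0.0])))   # [0 1]: dataset 1 below dataset 2
print(np.argsort(y - np.array([0.0, 0.6])))   # [1 0]: a different offset flips the order

So the sort order depends on the sampled offsets and has to be recomputed inside the model.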