How to read model.profile

Hi Discourse community,

I have been trying to profile my model because I am seeing the sampler slow down as the input dataset grows. Here is the output of `model.profile`:

```
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  44.6%    44.6%       0.058s       5.84e-05s   1000    17   Composite{...}(ExpandDims{axis=1}.0, [[-1.]], [[0.5]], ExpandDims{axis=1}.0, [[0.70710679]], [[3.178053 ... 39789527]], ExpandDims{axis=1}.0, [[False] ... [False]], [[0]], [[-3.17805 ... 39789527]], [[6.28318531]])
  26.0%    70.6%       0.034s       3.41e-05s   1000    14   Mul(ExpandDims{axis=0}.0, features_num{[[ 0. ... 2713767 ]]})
  14.4%    85.0%       0.019s       1.88e-05s   1000    15   Sum{axis=1}(Mul.0)
   1.8%    86.8%       0.002s       2.33e-06s   1000    19   Sum{axes=None}(LL_logprob)
   1.5%    88.3%       0.002s       2.01e-06s   1000    13   ExpandDims{axis=0}(theta_D)
   1.3%    89.6%       0.002s       1.69e-06s   1000    16   ExpandDims{axis=1}(Sum{axis=1}.0)
   1.2%    90.7%       0.002s       1.51e-06s   1000    10   Alloc(ExpandDims{axis=1}.0, 1, 2080)
   0.8%    91.6%       0.001s       1.08e-06s   1000     9   ExpandDims{axis=1}(SpecifyShape.0)
   0.8%    92.4%       0.001s       1.03e-06s   1000    18   Reshape{1}(Composite{...}.0, [-1])
   0.7%    93.1%       0.001s       9.78e-07s   1000     2   ExpandDims{axis=1}(SpecifyShape.0)
   0.7%    93.8%       0.001s       9.63e-07s   1000    11   Reshape{1}(Alloc.0, [-1])
   0.7%    94.6%       0.001s       9.45e-07s   1000    12   ExpandDims{axis=1}(Reshape{1}.0)
   0.7%    95.3%       0.001s       9.25e-07s   1000     5   ExpandDims{axis=1}(Reshape{1}.0)
   0.7%    96.0%       0.001s       9.14e-07s   1000     6   Reshape{1}(sigma_D_log__, [-1])
   0.7%    96.7%       0.001s       9.14e-07s   1000     3   Alloc(ExpandDims{axis=1}.0, 1, 2080)
   0.7%    97.3%       0.001s       8.73e-07s   1000     4   Reshape{1}(Alloc.0, [-1])
   0.6%    97.9%       0.001s       7.67e-07s   1000     0   Reshape{1}(mu_D, [-1])
   0.3%    98.3%       0.000s       4.41e-07s   1000     7   Exp(Reshape{1}.0)
   0.3%    98.6%       0.000s       4.02e-07s   1000    26   Composite{((i2 * sqr((i0 * i1))) - i3)}([0.01], theta_D, [-0.5], [5.52410872])
   0.2%    98.8%       0.000s       2.89e-07s   1000    28   MakeVector{dtype='float64'}(Sum{axes=None}.0, Sum{axes=None}.0, sigma > 0, sigma > 0, sigma_D_log___logprob, sigma_P_log___logprob, Sum{axes=None}.0)
   ... (remaining 10 Apply instances account for 1.22%(0.00s) of the runtime)
```

I would be grateful if somebody could help me understand how to read the above, and also point out any potential bottlenecks. For example, I can see that ~45% of the time is going into some Op called ExpandDims… why, and what exactly is it doing? Is it normal to see this Op hogging that much time?

Thanks in advance!

This is the compiled graph with all of the optimization rewrites applied, so it can be hard to read. The first Op you see, `Composite{...}(ExpandDims{axis=1}.0, ...)`, isn't actually an ExpandDims Op: it's a Composite Op, an optimized fusion of several smaller elementwise operations (the ExpandDims outputs are just its *inputs*). Knowing what I know about your model, I think this is the function where you have a bunch of `pt.where` calls and boolean masks going on.

The rest of the Ops look quotidian – the 2nd and 3rd ones (the `Mul` and the `Sum{axis=1}`) look like they jointly form a linear projection X\beta with X.ndim > 2. I'm not sure there's any optimization to be done there.
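To make that concrete, a broadcast `Mul` followed by `Sum{axis=1}` is exactly a row-wise dot product. A plain NumPy sketch (shapes are illustrative; 2080 just matches the Alloc in the profile):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2080, 5))
theta = rng.normal(size=5)

# What the Mul + Sum{axis=1} pair computes: broadcast-multiply, then reduce
mul_sum = (theta[None, :] * X).sum(axis=1)

# The equivalent single matrix-vector product
dot = X @ theta

assert np.allclose(mul_sum, dot)
```

PyTensor's rewrites will often do this replacement for you when the pattern is recognizable, which is why I wouldn't expect much headroom here.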

What I would recommend is to focus on the first bit: compile that piece of the computation into its own PyTensor function with symbolic inputs, then time the function as you scale up the size of the data.

You should also revisit how you're doing that computation and see whether there's a more efficient formulation – for example, by thinking about whether there's anything you can pre-compute once and pass in as an argument to the function.
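As one illustration of the pre-computation idea (purely hypothetical, assuming your boolean mask is fixed across sampler iterations): precompute the selected indices once outside the compiled function, instead of re-evaluating the mask on every call.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=2080)
mask = data > 0              # fixed, data-dependent mask

# Computed once, outside the compiled function
idx = np.nonzero(mask)[0]
subset = data[idx]           # pass this reduced array in as the input

# Same result as masking inside the graph on every call
assert np.allclose(subset, data[mask])
```

The compiled function then only ever sees the already-reduced data, so the `where`/mask work drops out of the per-iteration cost entirely.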