I have trouble interpreting the profiling information. What should I be looking for?
Fast model (Wide form data):
Function profiling
==================
Message: C:\Users\alfoc.ULAVAL\AppData\Local\Continuum\miniconda3\envs\data-science\lib\site-packages\pymc3\model.py:909
Time in 1000 calls to Function.__call__: 4.900503e-02s
Time in Function.fn.__call__: 2.300286e-02s (46.940%)
Time in thunks: 1.700163e-02s (34.694%)
Total compile time: 5.300529e-01s
Number of Apply nodes: 17
Theano Optimizer time: 1.930192e-01s
Theano validate time: 0.000000e+00s
Theano Linker time (includes C, CUDA code generation/compiling): 3.600359e-02s
Import time 1.200104e-02s
Node make_thunk time 3.500342e-02s
Node Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}(Elemwise{Composite{Cast{int8}(GT(i0, i1))}}.0, TensorConstant{(1, 1, 1) of 0.5}, Elemwise{Composite{inv(sqr(i0))}}[(0, 0)].0, TensorConstant{[[[11. 0... 0. 89.]]]}, Elemwise{add,no_inplace}.0, Elemwise{Composite{log((i0 * i1))}}.0, TensorConstant{(1, 1, 1) of -inf}) time 2.000213e-02s
Node Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - log1p(sqr((i3 * i0)))), i4) + i5)}}(σ_avatars_age, TensorConstant{(1, 1, 1) of 0}, TensorConstant{(1, 1, 1) ..8588394566}, TensorConstant{(1, 1, 1) of 0.04}, TensorConstant{(1, 1, 1) of -inf}, σ_avatars_age_log__) time 4.000425e-03s
Node Elemwise{Composite{((i0 + Switch(Cast{int8}((GE(i1, i2) * LE(i1, i3))), i4, i5)) - ((i6 * scalar_softplus((-i7))) + i7))}}[(0, 1)](TensorConstant{(1, 1, 1) ..0185988092}, μ_avatars_age, TensorConstant{(1, 1, 1) of 0.0}, TensorConstant{(1, 1, 1) of 100.0}, TensorConstant{(1, 1, 1) ..0185988092}, TensorConstant{(1, 1, 1) of -inf}, TensorConstant{(1, 1, 1) of 2.0}, μ_avatars_age_interval__) time 4.000187e-03s
Node Elemwise{Composite{inv(sqr(i0))}}[(0, 0)](σ_avatars_age) time 2.000093e-03s
Node Elemwise{Mul}[(0, 1)](TensorConstant{0.5}, Sum{acc_dtype=float64}.0) time 1.000166e-03s
Time in all call to theano.grad() 1.612999e+00s
Time since theano import 1342.864s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
88.2% 88.2% 0.015s 1.36e-06s C 11000 11 theano.tensor.elemwise.Elemwise
11.8% 100.0% 0.002s 4.00e-07s C 5000 5 theano.tensor.elemwise.Sum
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 theano.tensor.opt.MakeVector
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
58.8% 58.8% 0.010s 1.00e-05s C 1000 1 Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}
11.8% 70.6% 0.002s 4.00e-07s C 5000 5 Sum{acc_dtype=float64}
5.9% 76.5% 0.001s 1.00e-06s C 1000 1 Elemwise{exp,no_inplace}
5.9% 82.4% 0.001s 1.00e-06s C 1000 1 Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - log1p(sqr((i3 * i0)))), i4) + i5)}}
5.9% 88.2% 0.001s 1.00e-06s C 1000 1 Elemwise{add,no_inplace}
5.9% 94.1% 0.001s 1.00e-06s C 1000 1 Elemwise{Composite{inv(sqr(i0))}}[(0, 0)]
5.9% 100.0% 0.001s 1.00e-06s C 1000 1 Elemwise{Composite{((i0 + Switch(Cast{int8}((GE(i1, i2) * LE(i1, i3))), i4, i5)) - ((i6 * scalar_softplus((-i7))) + i7))}}[(0, 1)]
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Elemwise{Composite{(i0 * scalar_sigmoid(i1))}}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Elemwise{Composite{(i0 + (i1 * sqr(i2)))}}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Elemwise{Composite{Cast{int8}(GT(i0, i1))}}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Elemwise{Mul}[(0, 1)]
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Elemwise{Composite{log((i0 * i1))}}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 MakeVector{dtype='float64'}
... (remaining 0 Ops account for 0.00%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
58.8% 58.8% 0.010s 1.00e-05s 1000 13 Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}(Elemwise{Composite{Cast{int8}(GT(i0, i1))}}.0, TensorConstant{(1, 1, 1) of 0.5}, Elemwise{Composite{inv(sqr(i0))}}[(0, 0)].0, TensorConstant{[[[11. 0... 0. 89.]]]}, Elemwise{add,no_inplace}.0, Elemwise{Composite{log((i0 * i1))}}.0, TensorConstant{(1, 1, 1) of -inf})
11.8% 70.6% 0.002s 2.00e-06s 1000 14 Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}.0)
5.9% 76.5% 0.001s 1.00e-06s 1000 8 Elemwise{Composite{((i0 + Switch(Cast{int8}((GE(i1, i2) * LE(i1, i3))), i4, i5)) - ((i6 * scalar_softplus((-i7))) + i7))}}[(0, 1)](TensorConstant{(1, 1, 1) ..0185988092}, μ_avatars_age, TensorConstant{(1, 1, 1) of 0.0}, TensorConstant{(1, 1, 1) of 100.0}, TensorConstant{(1, 1, 1) ..0185988092}, TensorConstant{(1, 1, 1) of -inf}, TensorConstant{(1, 1, 1) of 2.0}, μ_avatars_age_interval__)
5.9% 82.4% 0.001s 1.00e-06s 1000 7 Elemwise{Composite{inv(sqr(i0))}}[(0, 0)](σ_avatars_age)
5.9% 88.2% 0.001s 1.00e-06s 1000 5 Elemwise{add,no_inplace}(μ_avatars_age, μ_participant_bias)
5.9% 94.1% 0.001s 1.00e-06s 1000 3 Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - log1p(sqr((i3 * i0)))), i4) + i5)}}(σ_avatars_age, TensorConstant{(1, 1, 1) of 0}, TensorConstant{(1, 1, 1) ..8588394566}, TensorConstant{(1, 1, 1) of 0.04}, TensorConstant{(1, 1, 1) of -inf}, σ_avatars_age_log__)
5.9% 100.0% 0.001s 1.00e-06s 1000 0 Elemwise{exp,no_inplace}(σ_avatars_age_log__)
0.0% 100.0% 0.000s 0.00e+00s 1000 16 Sum{acc_dtype=float64}(MakeVector{dtype='float64'}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 15 MakeVector{dtype='float64'}(__logp_μ_avatars_age_interval__, __logp_σ_avatars_age_log__, __logp_μ_participant_bias, __logp_obs_age)
0.0% 100.0% 0.000s 0.00e+00s 1000 12 Sum{acc_dtype=float64}(Elemwise{Composite{((i0 + Switch(Cast{int8}((GE(i1, i2) * LE(i1, i3))), i4, i5)) - ((i6 * scalar_softplus((-i7))) + i7))}}[(0, 1)].0)
0.0% 100.0% 0.000s 0.00e+00s 1000 11 Elemwise{Composite{log((i0 * i1))}}(TensorConstant{(1, 1, 1) ..4309189535}, Elemwise{Composite{inv(sqr(i0))}}[(0, 0)].0)
0.0% 100.0% 0.000s 0.00e+00s 1000 10 Elemwise{Mul}[(0, 1)](TensorConstant{0.5}, Sum{acc_dtype=float64}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 9 Sum{acc_dtype=float64}(Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - log1p(sqr((i3 * i0)))), i4) + i5)}}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 6 Sum{acc_dtype=float64}(Elemwise{Composite{(i0 + (i1 * sqr(i2)))}}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 4 Elemwise{Composite{Cast{int8}(GT(i0, i1))}}(σ_avatars_age, TensorConstant{(1, 1, 1) of 0})
0.0% 100.0% 0.000s 0.00e+00s 1000 2 Elemwise{Composite{(i0 + (i1 * sqr(i2)))}}(TensorConstant{(1, 1, 1) ..2891277546}, TensorConstant{(1, 1, 1) of -0.04}, μ_participant_bias)
0.0% 100.0% 0.000s 0.00e+00s 1000 1 Elemwise{Composite{(i0 * scalar_sigmoid(i1))}}(TensorConstant{(1, 1, 1) of 100.0}, μ_avatars_age_interval__)
... (remaining 0 Apply instances account for 0.00%(0.00s) of the runtime)
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
- Try the Theano flag floatX=float32
- Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.
Slow model (Long form data) :
Function profiling
==================
Message: C:\Users\alfoc.ULAVAL\AppData\Local\Continuum\miniconda3\envs\data-science\lib\site-packages\pymc3\model.py:909
Time in 1000 calls to Function.__call__: 1.150117e-01s
Time in Function.fn.__call__: 8.600879e-02s (74.783%)
Time in thunks: 8.000898e-02s (69.566%)
Total compile time: 4.070408e-01s
Number of Apply nodes: 17
Theano Optimizer time: 3.190322e-01s
Theano validate time: 2.000093e-03s
Theano Linker time (includes C, CUDA code generation/compiling): 4.100394e-02s
Import time 6.000996e-03s
Node make_thunk time 3.900385e-02s
Node CGemv{inplace}(CGemv{no_inplace}.0, TensorConstant{-1.0}, TensorConstant{[[1. 0. 0...0. 0. 1.]]}, μ_participant, TensorConstant{1.0}) time 1.300120e-02s
Node CGemv{no_inplace}(TensorConstant{[ 9. 11. ... 87. 89.]}, TensorConstant{-1.0}, TensorConstant{[[1. 0. 0...0. 0. 1.]]}, μ_avatars_age, TensorConstant{1.0}) time 1.100111e-02s
Node Elemwise{Composite{((i0 + Switch(Cast{int8}((GE(i1, i2) * LE(i1, i3))), i4, i5)) - ((i6 * scalar_softplus((-i7))) + i7))}}[(0, 1)](TensorConstant{(1,) of 4...0185988092}, μ_avatars_age, TensorConstant{(1,) of 0.0}, TensorConstant{(1,) of 100.0}, TensorConstant{(1,) of -4..0185988092}, TensorConstant{(1,) of -inf}, TensorConstant{(1,) of 2.0}, μ_avatars_age_interval__) time 4.000187e-03s
Node Elemwise{Composite{Switch(Cast{int8}(GT(i0, i1)), (i2 * ((-(Composite{inv(sqr(i0))}(i0) * sqr(i3))) + log((i4 * Composite{inv(sqr(i0))}(i0))))), i5)}}[(0, 0)](CGemv{inplace}.0, TensorConstant{(1,) of 0}, TensorConstant{(1,) of 0.5}, CGemv{inplace}.0, TensorConstant{(1,) of 0...4309189535}, TensorConstant{(1,) of -inf}) time 3.000259e-03s
Node Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - log1p(sqr((i3 * i0)))), i4) + i5)}}[(0, 0)](σ_avatars_age, TensorConstant{(1,) of 0}, TensorConstant{(1,) of -3..8588394566}, TensorConstant{(1,) of 0.04}, TensorConstant{(1,) of -inf}, σ_avatars_age_log__) time 2.000332e-03s
Time in all call to theano.grad() 1.612999e+00s
Time since theano import 1361.139s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
72.5% 72.5% 0.058s 8.29e-06s C 7000 7 theano.tensor.elemwise.Elemwise
26.2% 98.8% 0.021s 7.00e-06s C 3000 3 theano.tensor.blas_c.CGemv
1.2% 100.0% 0.001s 2.00e-07s C 5000 5 theano.tensor.elemwise.Sum
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 theano.tensor.basic.AllocEmpty
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 theano.tensor.opt.MakeVector
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
65.0% 65.0% 0.052s 5.20e-05s C 1000 1 Elemwise{Composite{Switch(Cast{int8}(GT(i0, i1)), (i2 * ((-(Composite{inv(sqr(i0))}(i0) * sqr(i3))) + log((i4 * Composite{inv(sqr(i0))}(i0))))), i5)}}[(0, 0)]
18.7% 83.7% 0.015s 7.50e-06s C 2000 2 CGemv{inplace}
7.5% 91.2% 0.006s 6.00e-06s C 1000 1 CGemv{no_inplace}
2.5% 93.7% 0.002s 2.00e-06s C 1000 1 Elemwise{exp,no_inplace}
1.3% 95.0% 0.001s 1.00e-06s C 1000 1 Elemwise{Composite{(i0 * scalar_sigmoid(i1))}}
1.3% 96.3% 0.001s 1.00e-06s C 1000 1 Elemwise{Composite{(i0 + (i1 * sqr(i2)))}}
1.3% 97.5% 0.001s 1.00e-06s C 1000 1 Elemwise{Composite{((i0 + Switch(Cast{int8}((GE(i1, i2) * LE(i1, i3))), i4, i5)) - ((i6 * scalar_softplus((-i7))) + i7))}}[(0, 1)]
1.3% 98.8% 0.001s 1.00e-06s C 1000 1 Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - log1p(sqr((i3 * i0)))), i4) + i5)}}[(0, 0)]
1.2% 100.0% 0.001s 2.00e-07s C 5000 5 Sum{acc_dtype=float64}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 AllocEmpty{dtype='float64'}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Elemwise{Mul}[(0, 1)]
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 MakeVector{dtype='float64'}
... (remaining 0 Ops account for 0.00%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
65.0% 65.0% 0.052s 5.20e-05s 1000 13 Elemwise{Composite{Switch(Cast{int8}(GT(i0, i1)), (i2 * ((-(Composite{inv(sqr(i0))}(i0) * sqr(i3))) + log((i4 * Composite{inv(sqr(i0))}(i0))))), i5)}}[(0, 0)](CGemv{inplace}.0, TensorConstant{(1,) of 0}, TensorConstant{(1,) of 0.5}, CGemv{inplace}.0, TensorConstant{(1,) of 0...4309189535}, TensorConstant{(1,) of -inf})
12.5% 77.5% 0.010s 1.00e-05s 1000 9 CGemv{inplace}(CGemv{no_inplace}.0, TensorConstant{-1.0}, TensorConstant{[[1. 0. 0...0. 0. 1.]]}, μ_participant, TensorConstant{1.0})
7.5% 85.0% 0.006s 6.00e-06s 1000 4 CGemv{no_inplace}(TensorConstant{[ 9. 11. ... 87. 89.]}, TensorConstant{-1.0}, TensorConstant{[[1. 0. 0...0. 0. 1.]]}, μ_avatars_age, TensorConstant{1.0})
6.2% 91.2% 0.005s 5.00e-06s 1000 5 CGemv{inplace}(AllocEmpty{dtype='float64'}.0, TensorConstant{1.0}, TensorConstant{[[1. 0. 0...0. 0. 1.]]}, σ_avatars_age, TensorConstant{0.0})
2.5% 93.7% 0.002s 2.00e-06s 1000 1 Elemwise{exp,no_inplace}(σ_avatars_age_log__)
1.3% 95.0% 0.001s 1.00e-06s 1000 0 Elemwise{Composite{(i0 * scalar_sigmoid(i1))}}(TensorConstant{(1,) of 100.0}, μ_avatars_age_interval__)
1.3% 96.3% 0.001s 1.00e-06s 1000 8 Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - log1p(sqr((i3 * i0)))), i4) + i5)}}[(0, 0)](σ_avatars_age, TensorConstant{(1,) of 0}, TensorConstant{(1,) of -3..8588394566}, TensorConstant{(1,) of 0.04}, TensorConstant{(1,) of -inf}, σ_avatars_age_log__)
1.3% 97.5% 0.001s 1.00e-06s 1000 7 Elemwise{Composite{((i0 + Switch(Cast{int8}((GE(i1, i2) * LE(i1, i3))), i4, i5)) - ((i6 * scalar_softplus((-i7))) + i7))}}[(0, 1)](TensorConstant{(1,) of 4...0185988092}, μ_avatars_age, TensorConstant{(1,) of 0.0}, TensorConstant{(1,) of 100.0}, TensorConstant{(1,) of -4..0185988092}, TensorConstant{(1,) of -inf}, TensorConstant{(1,) of 2.0}, μ_avatars_age_interval__)
1.3% 98.8% 0.001s 1.00e-06s 1000 3 Elemwise{Composite{(i0 + (i1 * sqr(i2)))}}(TensorConstant{(1,) of -5..2891277546}, TensorConstant{(1,) of -0.04}, μ_participant)
1.2% 100.0% 0.001s 1.00e-06s 1000 14 Sum{acc_dtype=float64}(Elemwise{Composite{Switch(Cast{int8}(GT(i0, i1)), (i2 * ((-(Composite{inv(sqr(i0))}(i0) * sqr(i3))) + log((i4 * Composite{inv(sqr(i0))}(i0))))), i5)}}[(0, 0)].0)
0.0% 100.0% 0.000s 0.00e+00s 1000 16 Sum{acc_dtype=float64}(MakeVector{dtype='float64'}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 15 MakeVector{dtype='float64'}(__logp_μ_avatars_age_interval__, __logp_σ_avatars_age_log__, __logp_μ_participant, __logp_obs_age)
0.0% 100.0% 0.000s 0.00e+00s 1000 12 Sum{acc_dtype=float64}(Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - log1p(sqr((i3 * i0)))), i4) + i5)}}[(0, 0)].0)
0.0% 100.0% 0.000s 0.00e+00s 1000 11 Sum{acc_dtype=float64}(Elemwise{Composite{((i0 + Switch(Cast{int8}((GE(i1, i2) * LE(i1, i3))), i4, i5)) - ((i6 * scalar_softplus((-i7))) + i7))}}[(0, 1)].0)
0.0% 100.0% 0.000s 0.00e+00s 1000 10 Elemwise{Mul}[(0, 1)](TensorConstant{0.5}, Sum{acc_dtype=float64}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 6 Sum{acc_dtype=float64}(Elemwise{Composite{(i0 + (i1 * sqr(i2)))}}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 2 AllocEmpty{dtype='float64'}(TensorConstant{1600})
... (remaining 0 Apply instances account for 0.00%(0.00s) of the runtime)
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
- Try the Theano flag floatX=float32
- Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.