Hi all,
I’ve opened a pull request to incorporate the changes to the `Representation` class here: https://github.com/pymc-devs/pymc-extras/pull/490
I also started merging some of the changes into `structural.py`. Previously, I had been inheriting from the structural component classes instead of modifying them directly. I’ve now tried to fold those changes back into the base classes, but I may have introduced some issues in the process.
Unfortunately, I haven’t been able to fully test this. We’ve been running things on Databricks internally, and I haven’t had much luck getting `pymc-extras` to build cleanly there due to hatch dependency-resolution issues.
On a broader note, I was able to get a more customized model running internally (which wouldn’t make sense to merge as-is). However, it was extremely slow (about a week to train). I was estimating a model with:
- ~1800 time periods,
- `k_endog = 4`,
- time-varying measurement error,
- and a time-varying observation intercept.
The likelihood itself only takes ~0.2 seconds to evaluate, but with gradient calculations included, it looked like sampling would take about a week. When profiling, essentially all of the cost seems to come from inside the `scan`. Am I correct in assuming that the gradient of the scan is the main bottleneck?
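In case it’s useful, this is roughly how I’ve been checking where the time goes. The model below is only a placeholder for the real state-space model (which I can’t share), so treat it as a sketch of the timing and profiling calls rather than of the model itself:

```python
import time

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
y = rng.normal(size=(1800, 4))  # ~1800 periods, k_endog = 4

with pm.Model() as model:
    # Stand-in for the real scan-based state-space likelihood
    x = pm.Normal("x", 0.0, 1.0, shape=(1800, 4))
    pm.Normal("obs", mu=x, sigma=1.0, observed=y)

point = model.initial_point()

logp_fn = model.compile_logp()
dlogp_fn = model.compile_dlogp()
logp_fn(point), dlogp_fn(point)  # warm-up calls so compilation isn't timed

t0 = time.perf_counter()
logp_fn(point)  # forward pass only: ~0.2 s per call on the real model
print("logp :", time.perf_counter() - t0)

t0 = time.perf_counter()
dlogp_fn(point)  # the gradient is what blows sampling up to ~a week
print("dlogp:", time.perf_counter() - t0)

# On the real model, the profile attributes essentially all of the
# runtime to the Scan op(s)
model.profile(model.logp(), n=100).summary()
```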
Interestingly, a functionally similar model using `pymc.GaussianRandomWalk` and `pymc.AR` finishes in about 3 hours, presumably because it avoids `scan`, or at least does not use it in the same way.
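For reference, the faster alternative looks roughly like this. It’s a sketch rather than the actual internal model; the variable names, priors, shapes, and the single shared AR component are all stand-ins:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
T, k = 1800, 4
y = rng.normal(size=(T, k))  # stand-in for the real data

with pm.Model() as alt_model:
    # Random-walk level per series (time on the last axis), standing in
    # for the time-varying observation intercept
    sigma_level = pm.HalfNormal("sigma_level", 0.1)
    level = pm.GaussianRandomWalk(
        "level",
        sigma=sigma_level,
        init_dist=pm.Normal.dist(0.0, 1.0),
        shape=(k, T),
    )

    # A single AR(1) latent shared across series, standing in for the
    # autoregressive dynamics of the state-space version
    rho = pm.Normal("rho", 0.0, 0.5, shape=1)
    sigma_ar = pm.HalfNormal("sigma_ar", 0.1)
    ar = pm.AR(
        "ar",
        rho=rho,
        sigma=sigma_ar,
        init_dist=pm.Normal.dist(0.0, 1.0, size=1),
        steps=T - 1,
    )

    sigma_obs = pm.HalfNormal("sigma_obs", 1.0)
    pm.Normal("obs", mu=level.T + ar[:, None], sigma=sigma_obs, observed=y)
```

Note that the observation noise here is homoskedastic, so this doesn’t capture the time-varying measurement error of the state-space version; it’s only meant to show the `GaussianRandomWalk`/`AR` structure.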
Also, when I try to profile the gradient directly, I run into errors related to missing `.data` attributes. Is it expected that you can’t profile the gradient when `scan` is involved?
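Concretely, this is roughly what I mean by “profile the gradient directly” (reusing `model` from the sketch above; it runs fine on the toy placeholder, but on the real scan-based model this is where I hit the missing-`.data` AttributeErrors):

```python
import pytensor

# Same idea as profiling the logp, but for the gradient graph
model.profile(model.dlogp(), n=100).summary()

# Going through pytensor directly runs into the same kind of failure for me
grad_fn = pytensor.function(
    model.value_vars,
    pytensor.grad(model.logp(), wrt=model.value_vars),
    profile=True,
)
point = model.initial_point()
grad_fn(*[point[v.name] for v in model.value_vars])
grad_fn.profile.summary()
```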
As a result, I’ve had to move away from using the state-space model and put this work on the back burner. If the changes to `structural.py` aren’t useful, feel free to disregard them. I just thought I’d share since you asked.