First of all, I wanted to provide a code example as a starting point, in case somebody else would like to do something similar.

Second, I’d like your feedback on whether this is the way to go or whether there are other/better ways to do it. I noticed that on several runs with more tuning steps, the r_hat of the estimated parameters got very high, sometimes above 1.5. In the one run that I saved that’s sadly not the case, but I ran the notebook several times and the r_hat values were often high, plus there were many divergences. Therefore I suspect something is not quite right with my implementation.

P.S.: I deliberately did not use something like pm.GaussianRandomWalk(), because this implementation is more “transparent” (you see in detail what’s going on) and you can change it to something non-Gaussian if needed. I’d like to keep that level of transparency if possible.
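For readers wondering what “transparent” buys you here: a random walk is just a cumulative sum of i.i.d. innovations, so if each step is written out explicitly, swapping the innovation distribution for a non-Gaussian one is a one-line change. A minimal NumPy sketch of that generative idea (illustrative only, not the PyMC3 model from the notebook):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Gaussian random walk: cumulative sum of Normal innovations.
gauss_steps = rng.normal(loc=0.0, scale=1.0, size=n)
gauss_walk = np.cumsum(gauss_steps)

# Because each innovation is explicit, switching to heavy-tailed
# (non-Gaussian) steps is trivial -- e.g. Student-t innovations:
t_steps = rng.standard_t(df=3, size=n)
t_walk = np.cumsum(t_steps)
```

The same pattern carries over to the PyMC3/theano.scan version: the innovations are the latent variables, and the walk is built from them step by step.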

I work on state space models (SSMs) a lot, mostly using https://www.tensorflow.org/probability/api_docs/python/tfp/sts (which in my opinion also has the best, most transparent design).
From a quick look at your notebook, I think using theano.scan is definitely the way to go. However, the Kalman filtering step doesn’t seem right, as there are no predict and update (filtering) steps (Kalman gain, etc.).

The result does match; I guess I am trying to point out that if you want to implement a Kalman filter that takes advantage of Gaussian conjugacy for updating the parameters, there are a few more steps you need to implement in the update (see @brandonwillard’s post for more detail).
Also note that @brandonwillard’s post uses symbolic-pymc, which has a huge advantage: it can isolate the state update and the state observation into separate theano.scans and flexibly reuse the output tensors for different filtering. In regular pymc3 you will need to merge these steps into one and do a one-pass update (similar to what TFP currently does: https://github.com/tensorflow/probability/blob/621678fff3624f7c347efb5b1d78f2f553eee806/tensorflow_probability/python/distributions/linear_gaussian_ssm.py#L1529-L1613)
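To make the “predict + update with Kalman gain” point concrete, here is a minimal one-pass Kalman filter for a scalar local-level model in plain NumPy (a hypothetical helper for illustration, not code from either notebook; the scan version would carry the same recursions):

```python
import numpy as np

def kalman_filter_local_level(ys, q, r, m0=0.0, p0=1.0):
    """One-pass Kalman filter for the scalar local-level model
        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
        y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns the filtered state means and the total log-likelihood."""
    m, p = m0, p0
    means = []
    loglik = 0.0
    for y in ys:
        # predict step: propagate mean/variance through the transition
        m_pred = m
        p_pred = p + q
        # update step: the Kalman gain weighs prediction vs. observation
        s = p_pred + r                  # innovation variance
        k = p_pred / s                  # Kalman gain
        m = m_pred + k * (y - m_pred)   # filtered mean
        p = (1.0 - k) * p_pred          # filtered variance
        # Gaussian conjugacy gives the marginal likelihood in closed form
        loglik += -0.5 * (np.log(2 * np.pi * s) + (y - m_pred) ** 2 / s)
        means.append(m)
    return np.array(means), loglik
```

These are the extra steps (predict, gain, update) that a scan body needs if the filter is meant to exploit the Gaussian conjugacy rather than just sample the latent states directly.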

In case anyone else comes here looking for Kalman filtering in PyMC3, I just wanted to flag a page that Chad Fulton and I added to the Statsmodels docs that shows how to fit statespace models using a combination of Statsmodels and PyMC3. The link is: Fast Bayesian estimation of SARIMAX models. This approach doesn’t use theano.scan, as it hands the statespace part over to Statsmodels, which simplifies matters considerably. However, I’m not familiar enough with theano to say whether this approach is better or worse in other ways.

Wow, this is awesome, thanks for sharing!
Question: is the score function used to get the gradient only for SARIMAX, or is it a general method that works with any state space model?