Your Gaussian Random Walk implementation is fast, simple, and elegant. It samples in about 30 seconds when I ran it myself, compared to about 90 seconds for my state space model with diffuse initialization (no estimation of x0 or P0).
The parameter estimates are quite different, though. Here are the estimates from your model:
|             | mean  | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|-------------|-------|-------|--------|---------|-----------|---------|----------|----------|-------|
| sigma_trend | 0.038 | 0.017 | 0.016  | 0.069   | 0.002     | 0.001   | 116.0    | 204.0    | 1.03  |
| sigma_mu    | 0.224 | 0.102 | 0.068  | 0.413   | 0.010     | 0.007   | 106.0    | 234.0    | 1.02  |
| sigma_obs   | 0.725 | 0.072 | 0.588  | 0.851   | 0.004     | 0.003   | 322.0    | 880.0    | 1.01  |
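For anyone following the thread, I'm assuming the model behind those numbers looks roughly like the sketch below (placeholder data and priors, PyMC v5-style API; the actual implementation may well differ):

```python
import numpy as np
import pymc as pm

# placeholder data; in the real model `y` is the observed series
y = np.cumsum(np.random.normal(size=200))
n = len(y)

with pm.Model() as grw_model:
    sigma_trend = pm.HalfNormal("sigma_trend", 1.0)
    sigma_mu = pm.HalfNormal("sigma_mu", 1.0)
    sigma_obs = pm.HalfNormal("sigma_obs", 1.0)

    # the slope (trend) itself follows a Gaussian random walk
    trend = pm.GaussianRandomWalk(
        "trend", sigma=sigma_trend, init_dist=pm.Normal.dist(0.0, 1.0), shape=n
    )

    # the level accumulates the slope plus its own innovations (non-centered),
    # which is roughly the local linear trend written without a Kalman filter
    level_innovations = pm.Normal("level_innovations", 0.0, 1.0, shape=n)
    mu = pm.Deterministic("mu", (trend + sigma_mu * level_innovations).cumsum())

    pm.Normal("obs", mu=mu, sigma=sigma_obs, observed=y)

    idata = pm.sample()
```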
And from my Kalman filter model:
|                 | mean  | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|-----------------|-------|-------|--------|---------|-----------|---------|----------|----------|-------|
| sigma_mu[level] | 0.150 | 0.071 | 0.047  | 0.276   | 0.001     | 0.001   | 3495.0   | 2834.0   | 1.0   |
| sigma_mu[trend] | 0.066 | 0.023 | 0.030  | 0.109   | 0.000     | 0.000   | 4628.0   | 2847.0   | 1.0   |
| sigma_obs       | 0.442 | 0.092 | 0.284  | 0.614   | 0.001     | 0.001   | 3822.0   | 2415.0   | 1.0   |
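For reference, the likelihood I'm computing is the standard prediction error decomposition from the Kalman filter, where the contribution of each observation conditions only on earlier observations. Here's a simplified standalone NumPy sketch of the local linear trend case (my actual implementation differs, and I'm approximating the diffuse initialization with a large initial covariance rather than exact diffuse recursions):

```python
import numpy as np

def local_linear_trend_loglike(y, sigma_level, sigma_trend, sigma_obs):
    """Prediction-error-decomposition log-likelihood for a local linear trend
    model. The filter runs forward in time, so the term for y_t conditions
    only on y_1, ..., y_{t-1}."""
    T = np.array([[1.0, 1.0],      # level_t = level_{t-1} + trend_{t-1}
                  [0.0, 1.0]])     # trend_t = trend_{t-1}
    Z = np.array([1.0, 0.0])       # we observe the level
    Q = np.diag([sigma_level**2, sigma_trend**2])
    H = sigma_obs**2

    a = np.zeros(2)                # filtered state mean
    P = np.eye(2) * 1e6            # crude stand-in for diffuse initialization
    loglike = 0.0
    for y_t in y:
        # one-step-ahead prediction
        a_pred = T @ a
        P_pred = T @ P @ T.T + Q
        # prediction error and its variance
        v = y_t - Z @ a_pred
        F = Z @ P_pred @ Z + H
        loglike += -0.5 * (np.log(2.0 * np.pi * F) + v**2 / F)
        # filtering update (incorporates y_t, nothing later)
        K = P_pred @ Z / F
        a = a_pred + K * v
        P = P_pred - np.outer(K, Z @ P_pred)
    return loglike
```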
The biggest difference is in the observation noise, where the HDI ranges barely overlap between the two models. My hypothesis is that this is due to data leakage from the future to the past, although I haven't (yet) coded up the Kalman smoother likelihood to test how this would change the parameter estimates. If that hypothesis is true, it would also lead to quite different estimates of missing data in the time series.

There's also a huge difference in the number of effective samples. Because the Kalman filter exploits the conjugacy of the Gaussian setup to integrate out the latent states analytically, the remaining parameter space is evidently much easier to explore.
Since I asked the “does this matter” question I’ve thought about it more, and I think it does. Using future information to estimate the parameters will certainly lead to over-fitting and worse out-of-sample performance. I think the two-sided (smoothed) treatment is only called for in specific situations, such as missing-value interpolation or time series decomposition, where we don’t necessarily want forecasts.
Another answer, which is over-fit to my personal context, is that I am interested in estimating rational expectations models, where the information set available to the decision makers in the model needs to be carefully controlled. To get valid parameter estimates in this context, I can’t allow any information about the future to seep in.
I am curious whether you think the GaussianRandomWalk setup can generalize to more complex models. The reason I avoided it in the first place is that I’m interested in a much larger family of models than just the local level model (which is known to be equivalent to a GRW). Can you represent an auto-regressive or moving-average process with a clever application of the GRW?
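To make the question concrete, the kind of thing I don't see how to express with a GRW is an ARMA(1,1) in state-space form. A rough sketch using Harvey's representation (the parameter values are just examples, and the matrix names follow the usual textbook notation):

```python
import numpy as np

# ARMA(1,1): y_t = phi * y_{t-1} + eps_t + theta * eps_{t-1}
# state vector alpha_t = [y_t, theta * eps_t]
phi, theta, sigma = 0.8, 0.3, 1.0   # example values

T = np.array([[phi, 1.0],           # transition matrix
              [0.0, 0.0]])
R = np.array([[1.0],                # selection matrix: both states are driven
              [theta]])             # by the single shock eps_t
Q = np.array([[sigma**2]])          # shock variance
Z = np.array([[1.0, 0.0]])          # design matrix: we observe the first state
H = np.array([[0.0]])               # no separate measurement noise

# These matrices drop straight into a Kalman filter log-likelihood like the
# sketch above; adding AR/MA/seasonal components only changes T, R, and Z.
```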
I think there is real demand for being able to conveniently estimate ARIMA, VAR, and GARCH family models in a Bayesian framework, but I don’t see how to do that with just the tools we have now. This is why I’ve started down the route I’m taking now, although I would drop it and switch to something easier if anyone has a better idea. Like I said above, I really don’t know what I’m doing.