You only have two timepoints and you are trying to predict the third time point right? In that case I think the simplest thing to do is adding a new predictor using the score at t-1 to the linear function. I dont know how you parameterize the model, but if you are doing matrix multiplication essentially you are adding a new column into the design matrix.
Another way to think about it is that, at each time point the variance of the observation is divided to 3 part: noise, variance explained by the predictors, and the variance explained by the previous timepoint. You can then model the total variance (eps_total ~ HalfCauchy(10.)), and generate a simplex (e.g., [.01, .2, .79]) to divide the variance. You can have a look at this notebook which demonstrates the idea.