Intuition behind different priors in this spline-like knot scheme for time-variant data

I am trying to understand the intuition and pros/cons of using a multivariate normal prior, a Gaussian random walk prior, and independent normal priors for the coefficients in this context.

Let's assume I am trying to construct a regression model that generalizes with respect to a time attribute such as day of the month.

To do this, I constructed the following model:
y_{t, d} = \beta_0 + \beta_{t, d} x
where t denotes the timestep and d is an index denoting the day of the month, and \beta_{t, d} has the following form:


Now, to my question: let's assume we set a 4-dimensional multivariate normal prior on \beta_1, \beta_2, \beta_3, \beta_4 with a zero mean vector and a covariance matrix filled with 1s, and compare it with a Gaussian random walk prior, \beta_1 \sim \mathcal{N}(0, 1) and \beta_t \sim \mathcal{N}(\beta_{t-1}, 1) for t = 2, \dots, 4. We also compare it with setting an independent prior on each \beta_t according to \beta_t \sim \mathcal{N}(0, 1).
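As a rough illustration (plain NumPy rather than a full probabilistic model), one can draw the four knot coefficients from each of the three priors and compare how much successive coefficients jump. Note that a covariance matrix literally filled with 1s is rank-1, so the MVN draws are perfectly correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 5000

# Independent N(0, 1) priors on beta_1..beta_4.
iid = rng.normal(0.0, 1.0, size=(n_draws, 4))

# Gaussian random walk: beta_1 ~ N(0, 1), beta_t ~ N(beta_{t-1}, 1).
grw = np.cumsum(rng.normal(0.0, 1.0, size=(n_draws, 4)), axis=1)

# Multivariate normal with an all-ones covariance matrix. This matrix is
# singular (rank 1): every draw repeats one value across all four knots,
# i.e. the knots are perfectly correlated a priori.
cov = np.ones((4, 4))
mvn = rng.multivariate_normal(np.zeros(4), cov, size=n_draws, method="svd")

def wiggliness(draws):
    """Mean absolute change between successive knot coefficients."""
    return np.abs(np.diff(draws, axis=1)).mean()

# The all-ones MVN is perfectly smooth; the iid prior jumps by N(0, 2)
# steps while the GRW jumps by N(0, 1) steps, so the iid prior is the
# wiggliest of the three under this measure.
print(wiggliness(mvn), wiggliness(grw), wiggliness(iid))
```

This only probes the priors; the posterior can of course shrink or amplify these differences depending on the data.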

My question is then: what can we say about the effect of these different priors on the posterior? I would assume, e.g., that the Gaussian random walk prior would result in a less wiggly, smoother change in parameter estimates between subsequent \beta's, and that the independent priors would result in the most wiggly behaviour.

Can someone shed some light on this and further develop my thought process? Secondly, if someone has seen resources doing this type of time-variant mapping before, please send me some links.

To make things more interesting, let's say we start to make interactions between the parameters according to the following setup for \beta_t:


This allows datapoints between two day-of-the-month intervals to be a weighted, min-max-like sum of "subsequent" coefficients (does anyone know the name of this type of scheme?). The question stays the same; the priors and the resulting posterior dynamics are also very interesting to me in this case. I would assume that this scheme together with the GRW prior would be the setup resulting in the least wiggly behaviour, thus enforcing some regularization.
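If the weighting is linear in the distance to the two nearest knots (the exact formula is not shown above, so this is an assumption), the scheme amounts to linear interpolation between knot coefficients, i.e. a first-order B-spline / "hat function" basis. A minimal sketch with made-up knot locations and values:

```python
import numpy as np

# Hypothetical knot locations over the month and example coefficients;
# the actual scheme in the post may differ in detail.
knot_days = np.array([1.0, 11.0, 21.0, 31.0])
beta_knots = np.array([0.5, -0.2, 0.8, 0.1])

def beta_at(day):
    """Coefficient for an arbitrary day: a convex combination of the two
    nearest knots (continuous, but non-differentiable at the knots)."""
    return np.interp(day, knot_days, beta_knots)

# A day halfway between two knots mixes them 50/50:
print(beta_at(16.0))  # 0.5 * (-0.2) + 0.5 * 0.8, i.e. approximately 0.3
```

At the knot locations themselves this reduces to the plain knot coefficient, matching the non-interacting scheme.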

I am aware that the notation might be a bit iffy; any suggestions to improve it would also be appreciated.

I guess my first piece of advice is to build the models and sample some prior predictive draws. Here's what I got for a Gaussian random walk and your knotting scheme, with x_t = 1 \quad \forall t. The left plot is the prior predictive distribution, the right plot is the mean of that distribution (I made y_t normally distributed with \sigma \sim \text{HalfNormal}(1)):
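A rough reconstruction of that prior predictive simulation in plain NumPy (the knot-to-day mapping and the GRW scale are assumptions, since the original model code is not shown):

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws, n_days = 500, 31

def one_draw():
    """One prior predictive path over a month, with x_t = 1 for all t."""
    beta0 = rng.normal(0, 1)
    beta = np.cumsum(rng.normal(0, 1, size=4))   # GRW over the four knots
    sigma = np.abs(rng.normal(0, 1))             # HalfNormal(1)
    knot_idx = (np.arange(n_days) * 4) // n_days # assumed: map days to knots
    mu = beta0 + beta[knot_idx]                  # x_t = 1, so mu = beta0 + beta
    return rng.normal(mu, sigma)

prior_pred = np.array([one_draw() for _ in range(n_draws)])
mean_path = prior_pred.mean(axis=0)  # analogue of the right-hand plot
```

Plotting each row of `prior_pred` gives the spaghetti plot on the left; `mean_path` is the right-hand panel.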

As you can see, the knots build in some periodicity based on the day of week. I think your weighting scheme would just smooth this periodicity, it looks something like a weighted moving average.

If you added a prior correlation matrix, you would be assuming that large/small values of parameters co-occur, so, for example, if \beta_1 is large then \beta_2 is likely to be as well (or the inverse).

But you do not need to build correlations into the priors for there to be correlations in the posterior. This can be seen easily by fitting a simple slope-intercept model, y_i = \alpha + \beta x_i. If \beta is large, \alpha must be small, because that’s how lines work (if the slope is steep, the y-intercept must necessarily be lower). This correlation will be captured by the posterior regardless of whether you explicitly model it in the priors.

Knowing nothing about your specific application it’s harder to give more advice, but in general I am leery of putting time dynamics like seasonality into the parameters, rather than as latent components of the model. Thinking about the effect of the temperature on demand for ice cream for example, why would I expect the strength of that connection to vary over time? I more just expect there to be a seasonal pattern in ice cream sales, which is captured by changes in the temperature.

My point is you could probably just include day-of-week effects in whatever model you’re considering and get a more interpretable result.


Thank you for your response. I apologize for my previous laziness in not exploring prior predictive distributions and conducting simulations; I blame it on the lack of a functioning computer at the moment. I was primarily considering regularization, and my lack of intuition there. My assumption was that the multivariate normal and GRW priors would provide more regularization to the model than independent normal priors, even if they resulted in strong correlations in the posterior distribution. Similarly, I believed that the weighting scheme would introduce some regularization into the model, but upon reflection I may have been conceptually mistaken, and the main and only source of regularization in this scenario would come from simply setting more informative priors. Please correct me if my intuition is wrong.

In the case of the ice-cream example, my interest lies in understanding how my independent variables affect the dependent variable with respect to time-varying factors. Let’s assume I can invest in two different ice-cream vendors, and my independent variables are the amounts of money I wish to invest in each vendor. The dependent variable represents the return on my investments. Suppose one vendor specializes in selling hot ice creams (using various chemical and physical techniques), while the other vendor focuses on cold ice creams. My goal is to determine how my investments impact the return on investment considering these time-varying factors. To achieve this, I introduce parameter knots that are linked to time or climate variables, such as temperature, enabling me to plan optimally.

If I were to exclude the interaction between time or climate and my investments, treating them solely as control variables, I would fail to capture the fact that I prefer to invest more in the cold-ice-cream vendor during the summer and more in the hot-ice-cream vendor during the winter. This is because there would be no interaction between the time-varying parameters and my investment inputs.
Since I did not give any context, it was obviously impossible to see this, sorry.
I might have misinterpreted the term latent components in this context, though; I assume you mean to insert them as control variables without interactions with our independent variables. The reasoning behind the weighting scheme is to introduce continuity (much like kernel smoothing), although it remains non-differentiable; one could also extend this to arbitrary kernels weighing in more than just the "nearest" coefficient, etc.

I'll admit I'm having a hard time following without a specific context, but it seems like you have the problem well in hand. You certainly can have time-varying effects, and you can impose any prior structure on them you like. Also, priors are just that: the posterior is free to be whatever it needs to be if the evidence from the data is strong enough (conditional on the model). You seem to have specific domain knowledge which is leading you down this route, which is great. I just wanted to point out that a time-varying effect is different to, say, trend or seasonality.


Hi Jesse,

Your response here set my head spinning; I've seen you responding with a lot of very informative content teaching us beginners about time series, causal inference, etc.

The scenario I depicted above was a very specific one in which our regressors were the sole IVs affecting our output.

However, your response set me off on a spiral of thoughts where you may be able to bring some clarity. Maybe this deserves a separate thread, and maybe it is not even applicable to this forum; if I should create a separate thread for this matter and relate it to Bayesian analysis, just let me know.

Consider the following high-level time series model:
y_t = \text{seasonality}_t + \text{trend}_t + \text{cyclical}_t + \text{regressor_effects}_t

My primary focus lies in the causal analysis of the influence exerted by the regressors.
Let us consider y_t as indicative of total sales, with our regressors signifying distinct investment opportunities.

Acknowledging that these investment opportunities are not the sole determinants of sales, we have chosen to estimate the baseline sales by incorporating \text{seasonality}_t,\text{trend}_t and \text{cyclical}_t.

Herein lies my concern: given that the effects of our regressors are affected by an underlying demand that concurrently affects our baseline sales, and bearing in mind that my ultimate objective is to make informed decisions about the allocation of investments across these varied opportunities, the fluctuating confounder demand could potentially result in misleading estimates.

Understanding that the impacts of our investments are likely to oscillate with seasonal variations due to this underlying demand, I have chosen to employ time-varying parameters that interact with these investment inputs. This composite notion is encapsulated by the variable \text{regressor_effects}_t.

I have refrained from explicitly delineating a low-level model, as I believe that an appropriate response can encompass any suitable model of choice to address my query.

To facilitate a more insightful response to the ensuing question, envision the following scenarios:

  • Our expenditure demonstrates a perfect correlation with the seasonality.
  • Our expenditure is equally split between random and perfect correlation with the seasonality.
  • Our expenditure is entirely random.

Question: How can we gain an intuitive understanding of how a time-series model would differentiate the baseline-sales seasonality from the regressor-effect seasonality under these different expenditure scenarios?

Any related resources would be greatly appreciated.

Thank you in advance, Jesse; I've seen that you have done a lot for this forum and I really appreciate it.

Your problem sounds like a nice candidate for a structural time series model. I’ve been working on a module for pymc_experimental, you can see an example of the API here. I should have a version merged in a couple more days, I just need to finish a few more details (one of which is exogenous regression blocks, which seems relevant to your problem).

The state space formulation is nice because it gives you a natural way to incorporate correlations between the evolution of the different components, by correlating the innovations to their respective hidden states. In your case you have a model that looks something like this:

\begin{align} x_t &= Tx_{t-1} + R\varepsilon_t & \varepsilon_t &\sim N(0, Q) & x_t &= \begin{bmatrix}\alpha_t \\ \gamma_t \\ c_t \\ \beta_t \end{bmatrix} \\ y_t &= \begin{bmatrix} Z & X_t \end{bmatrix} x_t \end{align}

Where \alpha_t is a vector of terms associated with the trend, \gamma_t with the seasonality, c_t with the cycle, and \beta_t are time-varying regression coefficients. T, Z, R, Q are matrices of coefficients, and X_t is a row of your exogenous data at time t.

Typically Q is diagonal, but you could easily consider off-diagonal terms correlating the innovations between \beta_t and \gamma_t, which would be equivalent to saying “when the strength of the seasonal effect changes, the strength of the regression coefficients should change too”. I think this is something like what you are after?
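A toy NumPy illustration of that idea (two hidden random-walk states standing in for a seasonal strength \gamma_t and a regression coefficient \beta_t; this is not the pymc_experimental API, just the mechanics of an off-diagonal Q):

```python
import numpy as np

rng = np.random.default_rng(0)

# Innovation covariance with an off-diagonal term tying the two shocks:
# "when the seasonal strength changes, the regression coefficient tends
# to change in the same direction".
rho = 0.8
Q = np.array([[1.0, rho],
              [rho, 1.0]])

T_steps = 2000
eps = rng.multivariate_normal(np.zeros(2), Q, size=T_steps)
states = np.cumsum(eps, axis=0)  # x_t = x_{t-1} + eps_t (identity transition)

# The innovations are correlated by construction:
print(np.corrcoef(eps.T)[0, 1])  # close to 0.8
```

In a full state space model the identity transition would be replaced by the T matrix above, but the role of Q's off-diagonal terms is the same.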

In general though, I think a model should be able to tease out the three scenarios you propose by looking at estimated correlations between parameters/innovations to parameters.
