Hey PyMC Extras Team:
I have a few questions that may be quick to answer.
- Multiple Regression Components: Does pymc statespace only support the use of one regression component at a time? When trying to add a second regression component, I noticed that ss_mod.coords did not have a coordinate for the second regression component and I couldn’t proceed.
- Cross Validation with Regression Component: When using a regression component, I noticed that I was no longer able to use my implementation of loo-cv to analyze the results of my model. Is this a predictable outcome of having used a regression component in the statespace module? I was getting errors regarding my dimensionality, although it’s been a minute since I saw the error.
- Bringing B-splines into Model Context Directly: Is it possible to bring some b-splines, defined outside the model context, directly into the model context, have them interact with a statespace component (my weekly fourier), and have this interaction included in the observation? I did see where @jessegrabowski did something like that here. However, unlike Jesse’s example, I’d like to bring the b-splines into my model context without using the regression component, since I’m already using the regression component to bring in other exogenous variables unrelated to the interaction on the weekly fourier. Maybe there’s a simpler approach?
Background and Motivation
I am modeling some store sales using exogenous data from a nearby store and some sales data for my own store (248 days’ worth). Right now the store is in an increasing season of sales, and the statespace forecasting is not capturing the increasing amplitude happening on weekends fast enough. So while weekdays get a perfect forecast, my weekends are off.
My next attempt at modeling this was to bring in some b-splines to see if they could capture some signal that my weekly fourier and seasonal (quarterly) fourier weren’t capturing yet, and so help anticipate the increasing amplitude in weekend sales. My thought was to use a b-spline as a modulator on the weekly.
I’m including a photo of the component estimations to demonstrate the increasing amplitude [photo shows component with innovations off]. You’ll see that it is capturing the increasing amplitude, but it does this a little too slowly. Moreover, my understanding is that this component should be capturing a consistent shape and not the quarterly amplitude shifts (which my quarterly fourier is still learning).
Any additional input you may have on the matter would be much appreciated.
Thank you for your time!
-Roy
P.S. I’m an utter noob at this. So if you’re confused by my approach - so am I.
Actually, here’s a few more results to make this a little more plain.
Shows weekly with innovations on.
And here’s some residuals on the weekend when weekly fourier has innovations off.
-Roy
For the three questions:
- This should be allowed. If it doesn’t work it’s a bug – please open an issue, preferably with a super simple model that produces the error that we can use as a unit test.
- This also should work. I’d need to see a specific code snippet. If you’re using az.loo, all it cares about is having the logp values for the observed data; it shouldn’t care about the actual model structure. If there are multiple observed values in the idata (which might be the case with the regression component, I’m not sure), you might need to manually tell arviz which one to use.
- I’d need to see what you have in mind written out more formally. Putting the fourier bases in the transition matrix works because it’s cyclical, so the states essentially rotate at each step. What would the spline matrix do? What would a time update look like? To be clear, everything “inside” the model needs to be of the form x_{t} = f(x_{t-1}), where f is a linear function.
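To make the "states rotate at each step" point concrete, here is a minimal NumPy sketch of a trigonometric seasonal component written as x_{t} = T x_{t-1}. This is the textbook form, not necessarily the exact state layout the statespace module uses internally; the helper name `fourier_transition`, the starting state `x0`, and the choice of two harmonics are all assumptions made for illustration.

```python
import numpy as np


def fourier_transition(period: float, n_harmonics: int) -> np.ndarray:
    """Block-diagonal transition matrix: one 2x2 rotation per harmonic."""
    T = np.zeros((2 * n_harmonics, 2 * n_harmonics))
    for j in range(1, n_harmonics + 1):
        lam = 2 * np.pi * j / period  # angular frequency of harmonic j
        c, s = np.cos(lam), np.sin(lam)
        i = 2 * (j - 1)
        T[i : i + 2, i : i + 2] = [[c, s], [-s, c]]
    return T


# Weekly cycle with two harmonics (illustrative choice).
T = fourier_transition(period=7, n_harmonics=2)

# Hypothetical initial state; no innovations, so the update is purely linear.
x0 = np.array([1.0, 0.0, 0.5, 0.0])
states = [x0]
for _ in range(7):
    states.append(T @ states[-1])  # time update: x_t = T x_{t-1}
states = np.array(states)

# Observation picks the first element of each harmonic pair.
Z = np.array([1.0, 0.0, 1.0, 0.0])
weekly = states @ Z

# After one full period the state has rotated back to where it started.
print(np.allclose(states[7], x0))
```

A spline basis has no analogous fixed matrix that maps the state at t-1 to the state at t, which is why the question of what its time update would look like matters.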
The speed of change is controlled by the size of the innovation sigma. So if you think this should change more quickly, you could try setting a larger prior. I’m not sure what you mean by “consistent shape” – if you don’t want the shape to change, turn off the innovations for this component.
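As a rough illustration of how the innovation sigma sets the speed of adaptation, here is a hand-rolled scalar Kalman filter for a local level model in plain NumPy. It is a simplified stand-in, not the weekly component itself, and the function name `filter_local_level`, the step-change data, and the sigma values are all assumptions chosen for the sketch.

```python
import numpy as np


def filter_local_level(y, sigma_level, sigma_obs, m0=0.0, P0=1.0):
    """Filtered mean of a random-walk level observed with Gaussian noise."""
    m, P = m0, P0
    filtered = np.empty(len(y))
    for t, y_t in enumerate(y):
        P = P + sigma_level**2          # predict: the level may drift by one innovation
        K = P / (P + sigma_obs**2)      # Kalman gain: how much to trust the new point
        m = m + K * (y_t - m)           # update the level estimate
        P = (1.0 - K) * P               # update its variance
        filtered[t] = m
    return filtered


# Synthetic series whose true level jumps from 0 to 1 at t = 20.
y = np.concatenate([np.zeros(20), np.ones(20)])

slow = filter_local_level(y, sigma_level=0.05, sigma_obs=1.0)
fast = filter_local_level(y, sigma_level=0.5, sigma_obs=1.0)

# Six steps after the jump, the larger innovation sigma has moved further
# toward the new level.
print(slow[25], fast[25])
```

The same trade-off applies to a seasonal component: a larger innovation sigma lets the states move more per step, at the cost of chasing noise; with the innovations turned off the shape is fixed by the initial state.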
Hey Jesse:
Thank you for your swift response.
Regarding question 3, where you ask what the spline would do, I’m intending to do a linear regression on the weekly. I’m asking my question because, as it stands, I can only use one regression component. To work around this constraint, I want to manually perform another (second) regression using the splines to modulate my weekly component.
Regarding ‘consistent shape’: my goal is to keep the amplitude of my weekly relatively level and capture the signal causing the rising amplitude in some other component, like my level/trend or my frequency (quarterly) component or some b-splines. [See the included graphic to view the rising amplitude of the weekly.]
Ultimately, this may be a misguided goal if my weekends are showing much higher sales than weekdays because maybe I should want to capture this in my weekly component. However, I have two concerns when I capture the rising amplitude in my weekly component:
- For now, my forecasting might be slow to react to the rising amplitude because it’s forcing the weekly component to adapt (too) quickly. Maybe the rising amplitude should be captured by another component?
- The rising amplitude in the weekly pattern happens once a year (between March and May) and this pattern might not be repeated at any other time. Next year, the model may have to re-learn this pattern instead of referencing a component that captured this rising amplitude and includes this effect in the model, resulting in earlier forecasting accuracy.
Maybe I just take this weekly component and include it as a regression for next year? Maybe you have a suggestion?
Regarding possible bugs: I’ll try to recreate the possible bugs I mentioned in my OP and open an issue in the next few days.
Thank you for your input and time.
-Roy