(Also, I did read the API on seasonal components.)
I became convinced…
[Edit: I shortened my question a lot to save time] Basically, when estimating seasonality, is the recommended approach to inject zeros for the missing days of data (Mondays and Tuesdays in my data), or should I just reduce the ‘n’ parameter in the frequency and time seasonality components to match my data (e.g. not 365 days per year but 261, since I have 261 days of data per year)? Does it become impossible to forecast for Mondays and Tuesdays without those injected zeros?
Definitely don’t put 0. You can put np.nan for the missing days and interpolate the missing values, with the seasonal lengths at 365 for annual and 7 for weekly. But if you drop the missing days, you definitely need to set the seasonal lengths to 261 for annual and 5 for weekly, otherwise the filter is not going to be able to learn the right pattern.
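To make the two layouts concrete, here is a minimal sketch using only the standard library. The function names, the choice of 2023, and the placeholder values of 1.0 are assumptions for illustration, not anything from the original data. It builds a full daily calendar with NaN on closed days, and counts the open days you would be left with if you dropped Mondays and Tuesdays instead.

```python
import math
from datetime import date, timedelta

# Assumption: the business is closed on Mondays and Tuesdays.
# date.weekday() uses Monday=0, Tuesday=1.
CLOSED_WEEKDAYS = {0, 1}


def daily_calendar(start, end):
    """Every calendar day from start to end, inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


def with_nan_gaps(observed, start, end):
    """Full daily series; days with no observation get NaN (not 0)."""
    return [(d, observed.get(d, math.nan)) for d in daily_calendar(start, end)]


start, end = date(2023, 1, 1), date(2023, 12, 31)

# Option B: only the days that actually have data (season lengths 5 and 261).
open_days = [d for d in daily_calendar(start, end)
             if d.weekday() not in CLOSED_WEEKDAYS]
observed = {d: 1.0 for d in open_days}  # placeholder values, not real data

# Option A: full calendar with NaN gaps (season lengths 7 and 365).
full = with_nan_gaps(observed, start, end)
n_missing = sum(math.isnan(v) for _, v in full)

print(len(full), len(open_days), n_missing)  # 365 261 104
```

Whichever layout you pick, the season lengths have to count steps in that layout: 7 and 365 for the NaN-padded series, 5 and 261 for the dropped one.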
You can check whether an approach is right or wrong by making seasonal plots for each option and confirming that the seasonal pattern in the data looks right at the given frequency.
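A rough, plot-free version of that check is to average the data by position within the season and see whether a clean pattern comes out. The sketch below uses a made-up weekly pattern and a hypothetical helper `seasonal_profile`; it is only meant to show why the season length has to match the data layout.

```python
import math


def seasonal_profile(values, season_length):
    """Mean of the series at each position in the season, ignoring NaN."""
    sums = [0.0] * season_length
    counts = [0] * season_length
    for i, v in enumerate(values):
        if math.isnan(v):
            continue  # missing days contribute nothing
        sums[i % season_length] += v
        counts[i % season_length] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


# Synthetic data: a 5-open-day weekly pattern repeated for 14 weeks.
weekly_pattern = [1.0, 2.0, 3.0, 4.0, 5.0]
nan = math.nan

dropped = weekly_pattern * 14               # closed days removed
padded = ([nan, nan] + weekly_pattern) * 14  # closed days kept as NaN

print(seasonal_profile(dropped, 5))  # pattern recovered
print(seasonal_profile(padded, 7))   # pattern recovered, NaN on closed days
print(seasonal_profile(dropped, 7))  # mismatched length: pattern averages out
```

With the mismatched length, every position in the "week" sees each open day equally often, so the profile comes out flat and the weekly pattern disappears.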
Somehow, I think I’m still not following the intended approach with nan’s for missing data while using seasonal components.
The first image below shows a failure to capture the weekly pattern. The second image shows a perfect capture of the weekly and quarterly, with residual noise in the quarterly because, I assume, I haven’t included a measurement error component yet.
If I put nan’s back into my data, I’ll get a seasonal output like this:
The other difference between the two images: for the first, I parameterize the seasonal components as (7, 365) and use data containing nan’s for the missing days. For the second, I use only my observed data (no nans, no zeros) and parameterize the seasonal components with 5 days a week and 261 days a year.
After your reply, I believed I could set my seasonal components to (7, 365) with nans for missing data, and I thought the nan’s would be handled auto-magically by Kalman filter interpolation. But now I’m not sure what to do. Could you advise a little further?
Also, suppose I handle nan’s with seasonals correctly, but my model still has a hard time resolving the weekly variance with interpolated Mondays and Tuesdays. Would you recommend injecting 0’s in place of the NaN’s and then using an exogenous indicator variable called “Days Open”, which might keep Kalman interpolation from creating weird variance patterns?
I’m asking because, before I came to this forum, I noticed that weekly variance kept flip-flopping between quarterly and weekly, with quarterly variance doing the same thing in tandem. I began to believe that weekly variance was being improperly estimated, causing the model to fail to resolve the patterns accurately.
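On what "Kalman filter interpolation" of a NaN means mechanically: the standard treatment of a missing observation is to run the prediction step and skip the update step, so the state estimate is carried forward and its uncertainty grows. Below is a generic sketch for a simple local-level model. It is not the library's implementation, and the function name and parameter values are assumptions for illustration.

```python
import math


def local_level_filter(y, sigma_obs=1.0, sigma_level=0.1, m0=0.0, p0=1e6):
    """Filtered means and variances of a local-level model.

    NaN observations skip the update step (prediction only).
    """
    m, p = m0, p0
    means, variances = [], []
    for obs in y:
        # Predict: the level is a random walk, so only the variance grows.
        p = p + sigma_level ** 2
        if not math.isnan(obs):
            # Update: blend the prediction with the observation.
            k = p / (p + sigma_obs ** 2)  # Kalman gain
            m = m + k * (obs - m)
            p = (1.0 - k) * p
        # On NaN, the predicted mean and variance are kept as they are.
        means.append(m)
        variances.append(p)
    return means, variances


nan = math.nan
means, variances = local_level_filter([10.0, 10.0, nan, nan, 10.0])
print(means)
print(variances)
```

Note that nothing here writes a 0 into the series: a NaN simply means "no information on this day", which is different from observing a value of zero.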