KeyError: 'dayofyear' in sample_posterior_predictive Call in PyMC-Marketing

Title:

KeyError: ‘dayofyear’ in sample_posterior_predictive Call in PyMC-Marketing

Question:

Hello, I’m using PyMC-Marketing to build a Marketing Mix Model (MMM). While running sample_posterior_predictive on my model, I encountered a KeyError: 'dayofyear'. I would appreciate any help or guidance on resolving this issue.

Environment:

  • PyMC-Marketing version: 0.10.0
  • PyMC version: 5.15.1
  • Python version: 3.12
  • System: MacOS/Windows/Linux (specify your system)

Problem Description:

After setting up my model, I attempted to call sample_posterior_predictive using a new DataFrame (temp) as input data. However, I received the following error:

Error Message (partial):


{
	"name": "KeyError",
	"message": "'dayofyear'",
	"stack": "---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/work/.venv/lib/python3.12/site-packages/pymc/model/core.py:1564, in Model.__getitem__(self, key)
   1563 try:
-> 1564     return self.named_vars[self.name_for(key)]
   1565 except KeyError:

KeyError: 'dayofyear'
...
KeyError: 'dayofyear'

Code Snippet:

Here’s the code where the error occurs. I’m using a DataFrame called temp as the input without the dayofyear variable.

mmm.sample_posterior_predictive(temp,
                                original_scale=False,
                                var_names=["control_contributions"])

Attempts to Resolve:

  1. I tried adding a dayofyear variable to temp, but it does not appear to be defined in mmm.named_vars.
  2. I inspected control_contributions but found no explicit reference to dayofyear.
  3. I considered that the model might have defined seasonality based on dayofyear during initialization, but I could not find a clear indication in the setup code.

Questions:

  1. Does PyMC-Marketing require the dayofyear variable for seasonality in posterior predictive sampling, or is there a recommended approach to include this variable if necessary?
  2. I’m unsure why dayofyear is needed in the control_contributions variable, as I couldn’t find a direct reference to it. Could you clarify typical scenarios where this variable might be expected?
  3. Is there a way to configure the model to make dayofyear optional in the posterior predictive sampling?

Additional Details:

  • I intended to include seasonality and trend in the model setup, but dayofyear was not explicitly mentioned.
  • Due to data sensitivity, I can share relevant code snippets if needed.

Thank you for your help!

This is my first time posting a question, so I would appreciate any guidance or patience with my query. If additional information is needed, please feel free to ask.

I have no solution, but encountered a similar problem. Maybe the following can help in understanding what goes wrong and where the issue comes from in your case.

I encountered the same exception as you, albeit when loading a model that was originally created with v0.7.0 and then trying to sample from it.

It seems that build_model, in 0.10.0, adds the dayofyear variable if yearly_seasonality is passed (not None) when the model is instantiated.

It looks like it simply converts the date column to the day of the year and then applies the Fourier transformation.
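To make the "date to day of year to Fourier features" step concrete, here is a minimal standard-library sketch. It is not the library's code: the function name fourier_features, the feature names, and the 365.25-day period are assumptions chosen for illustration. With pandas, the day of year would come from the date column's .dt.dayofyear accessor instead.

```python
import math
from datetime import date

# Assumed period for yearly seasonality; the library may use another constant.
DAYS_IN_YEAR = 365.25


def fourier_features(dates, n_order):
    """Return one dict of sin/cos features per date, up to order n_order.

    Hypothetical helper for illustration only.
    """
    rows = []
    for d in dates:
        dayofyear = d.timetuple().tm_yday  # 1..366, like pandas' .dt.dayofyear
        row = {"dayofyear": dayofyear}
        for k in range(1, n_order + 1):
            angle = 2 * math.pi * k * dayofyear / DAYS_IN_YEAR
            row[f"sin_order_{k}"] = math.sin(angle)
            row[f"cos_order_{k}"] = math.cos(angle)
        rows.append(row)
    return rows


dates = [date(2024, 1, 1), date(2024, 7, 1), date(2024, 12, 31)]
features = fourier_features(dates, n_order=2)
for d, row in zip(dates, features):
    print(d, row["dayofyear"], round(row["sin_order_1"], 3))
```

The point is that dayofyear is derived entirely from the date column, so it never has to be supplied by the user.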

Now, I can load the model, but when sample_posterior_predictive is called, it crashes with the same KeyError you encountered. What happens in the method is that it first enriches the passed data with a dayofyear column, and then passes the enriched data to pm.set_data().

pm.set_data() in turn iterates over the columns of the passed data (including dayofyear!) and tries to assign the column’s data to the respective variable – which throws the mentioned KeyError since the variable is not there.
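The failure mode described above can be reproduced with a toy stand-in, without PyMC installed. Everything here is hypothetical: ToyModel, this set_data function, and the variable names channel_data and control_data only mimic the name-based lookup seen in the traceback (Model.__getitem__); they are not the real implementation.

```python
class ToyModel:
    """Hypothetical stand-in for a model that stores named data variables."""

    def __init__(self, var_names):
        self.named_vars = {name: None for name in var_names}

    def __getitem__(self, key):
        return self.named_vars[key]  # raises KeyError for unknown names


def set_data(new_data, model):
    """Assign every entry of new_data to the model variable of the same name."""
    for name, values in new_data.items():
        model[name]  # lookup fails if the model was built without this variable
        model.named_vars[name] = values


# Model built without yearly seasonality: no "dayofyear" variable exists.
model = ToyModel(["channel_data", "control_data"])

# Data after an internal enrichment step has added a "dayofyear" entry.
enriched = {
    "channel_data": [1.0, 2.0],
    "control_data": [0.5, 0.7],
    "dayofyear": [1, 2],
}

try:
    set_data(enriched, model)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))

# In this toy, passing only names the model knows avoids the failed lookup.
known = {k: v for k, v in enriched.items() if k in model.named_vars}
set_data(known, model)
```

In the toy, the error appears exactly when the data carries a name the model was never built with, which matches the mismatch between an enriched DataFrame and a model without a dayofyear variable. The filtering step only illustrates the mismatch; whether something similar is appropriate for the real library is a question for the maintainers.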

Regarding your questions, I am not sure, but here are my 2 cents:

  1. Does PyMC-Marketing require the dayofyear variable for seasonality in posterior predictive sampling, or is there a recommended approach to include this variable if necessary?

If you enable yearly_seasonality when first creating the model, it should have the required variable. If you don’t enable it, then it should not have that variable, and should not expect it later either.

  2. I’m unsure why dayofyear is needed in the control_contributions variable, as I couldn’t find a direct reference to it. Could you clarify typical scenarios where this variable might be expected?

I don’t think that column is ever expected in any user-passed data. It seems to be completely internal, only being used to construct Fourier features.

  3. Is there a way to configure the model to make dayofyear optional in the posterior predictive sampling?

As mentioned above, if you disabled yearly_seasonality (passed None) during instantiation, it should not be required.

Why it does not work that way in your case, I don’t know. Maybe you changed the yearly_seasonality attribute from None to something after fitting?

I have opened an issue here based on this thread. Thanks for reporting it and thank you @Jonas for investigating the underlying cause.
