Understanding the intuition behind Gaussian process regression in the PyMC Labs blog post "Modelling changes in marketing effectiveness over time"

After reviewing the post Bayesian Media Mix Models: Modelling changes in marketing effectiveness over time - PyMC Labs,
I got intrigued to understand how one would go about this and customize it.
Let’s say we assume a data-generating process y_t = exp(GP_t) * x_t (daily data),
where we want the Gaussian process to generalize per day of the month rather than over the whole dataset. That is, we want to fit the same GP to the daily data within each month, so the GP input satisfies 1 <= x <= 31, and then broadcast it to the corresponding x_t’s. The same Gaussian process would thus apply in every month, fitted to the daily data.
How would one go about this?
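
To make the setup concrete, here is a toy simulation of the kind of data I have in mind (just illustrative; a sine wave stands in for a single GP draw over the day of the month):

import numpy as np

rng = np.random.default_rng(0)
dates = np.arange("2022-01-01", "2022-07-01", dtype="datetime64[D]")
day_of_month = (dates - dates.astype("datetime64[M]")).astype(int) + 1  # 1..31, repeats every month
x = rng.gamma(2.0, 100.0, size=len(dates))                              # e.g. daily spend
f = 0.3 * np.sin(2 * np.pi * day_of_month / 31)                         # stand-in for one GP draw over day of month
y = np.exp(f) * x                                                       # y_t = exp(GP_t) * x_t
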
Second question: does anyone have a smart example of restricting the functions generated by the Gaussian process to values in the interval [0, 1], so that we could use it as the saturation in a power-function saturation setup?

Hi, I’m glad you were intrigued by the post. Sorry, but I didn’t understand your first question. You want to train a GP with monthly aggregate data and then reuse it somehow for daily data?
If that’s the case, I’m not sure how it would be done. I imagine you could work out a way to rewrite the GP amplitude in terms of the amplitude of the GP on the aggregate data, as long as you assume that the length scale is the same and know at which day of the month you assume the aggregate is placed.
About the second question: you can pass the GP through a sigmoid link function, similar to what you did with the exponential. That will give you output in the [0, 1] interval, but setting priors on the GP amplitude and mean will be much harder.
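
A minimal sketch of that idea, assuming X is an (n, 1) array of GP inputs and spend is your daily spend array (both names are just illustrative):

import pymc as pm

with pm.Model():
    cov = pm.gp.cov.ExpQuad(1, ls=10.0)                # illustrative covariance; pick what fits
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=X)                             # unconstrained latent GP
    sat = pm.Deterministic("sat", pm.math.sigmoid(f))  # squashed into (0, 1)
    coeff = pm.HalfNormal("coeff", 1.0)
    spend_ = pm.Data("spend", spend)                   # register the daily spend with the model
    mu = coeff * spend_**sat                           # power-function saturation with GP exponent
    # ... build the likelihood on top of mu

The sigmoid keeps the function in (0, 1), but, as said, priors on the GP amplitude and mean become harder to reason about on that scale.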

I am really looking forward to seeing the follow-up on that post.
My first question was really unclear; I will try to clarify.
Given daily data that spans several months, I would like to capture day-in-month effects by using GPs as parameters/latent variables. E.g. in a sales = coeff_1 * spend^coeff_2 model, this would be the coeff_1 or coeff_2 parameter, which is now time-varying with respect to the day of the month.
I don’t really want to fit the GP over the whole dataset, since I want to generalize the day-in-month effect across all months and use it as a forecast for the upcoming month.
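
Roughly what I am imagining, purely as a sketch (spend is my daily spend and day_idx a hypothetical 0-based day-of-month index for every observation):

import numpy as np
import pymc as pm

with pm.Model():
    cov = pm.gp.cov.ExpQuad(1, ls=5.0)                 # illustrative covariance
    gp = pm.gp.Latent(cov_func=cov)
    # one GP value per possible day of the month, reused in every month
    day_effect = gp.prior("day_effect", X=np.arange(1, 32)[:, None])
    coeff_1 = pm.HalfNormal("coeff_1", 1.0)
    coeff_2 = pm.HalfNormal("coeff_2", 1.0)
    spend_ = pm.Data("spend", spend)                   # daily spend
    mu = coeff_1 * spend_ ** (coeff_2 + day_effect[day_idx])  # day-in-month varying exponent
    # ... likelihood on sales would go here
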
Are there any practical examples in PyMC out there where people have used GPs in such models? I have only seen people utilizing them when regressing on time as the sole regressor. Are there any examples of them being used as latent variables/parameters?

As for your answer to the second question: brilliant, makes sense.

Thanks for taking the time.

Kind regards

Oh, in that case you can do something like this:

  1. Take the whole time series and split it into day and year_month.
  2. Factorize day into the unique days (let’s call that array uday) and an indexing array (day_idx). The unique days array will have at most 31 entries, while the indexing array will have as many entries as there are individual observations.
  3. Do the same factorization for the year_month array. Let’s call the indexing array year_month_idx and the unique year-month pairs X_month (a pandas sketch of these steps follows the list).
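
Assuming the observations live in a DataFrame df with a DatetimeIndex, steps 1-3 could look roughly like this with pandas (names match the sketch below):

import numpy as np
import pandas as pd

day = df.index.day                                 # day of the month per observation
year_month = df.index.to_period("M")               # e.g. 2023-01, 2023-02, ...
day_idx, uday = pd.factorize(day)                  # uday has at most 31 entries
year_month_idx, umonth = pd.factorize(year_month)  # one code per observation, one unique per month
X_month = np.arange(len(umonth))[:, None]          # numeric GP input, one row per unique year-month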

Then you can create a GP using X_month and a separate RV for the days in the month using uday, index both with the indexing arrays, and sum them together. A rough sketch of what I’m saying (pick whatever covariance, mean, and day prior you like) would look like this:

import pymc as pm

with pm.Model(coords={"uday": uday}):
    cov = pm.gp.cov.ExpQuad(1, ls=3.0)                   # use the cov and mean that you want
    latent = pm.gp.Latent(cov_func=cov)
    gp_month = latent.prior("gp_month", X=X_month)       # one value per unique year-month
    day_rv = pm.Normal("day_rv", 0.0, 1.0, dims="uday")  # e.g. a Normal; one value per unique day
    mu = gp_month[year_month_idx] + day_rv[day_idx]      # index back to daily observations and sum
    # Use mu for whatever else you need
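
For the multiplicative model from your first post, one (hypothetical) way to finish would be to continue inside the same model context along these lines, with x and y being your daily regressor and outcome:

    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("y_obs", mu=pm.math.exp(mu) * x, sigma=sigma, observed=y)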