Modelling sales for two cities

Hello, PyMC community!

Please help with the following question.

I have an e-commerce store operating in two cities (call them city A and city B). Both cities share a very similar purchasing pattern, so the two timelines resemble each other in trend and fluctuations (please have a look at the screenshot attached).

I believe that both cities share some common latent dynamic of purchasing behaviour and would like to be able to extract it.

My attempt is to fit both cities within a single model, where:

  • both cities share the same trend
  • both cities share the same weekly seasonality*
  • but have different intercepts

*(the pattern is the same, but I can see that it’s proportionally a bit different, so I add a ‘squeezer’, to account for proportional changes).

Here’s the specification

day_of_week_idx, day_of_week = df["day_of_week"].factorize(sort=True)
date = df["date_day"].to_numpy()

coords = {
    "day_of_week": day_of_week,
    "date": date,
}

with pm.Model(coords=coords) as model:
    # target variables
    sales_city_a = pm.MutableData(name="sales_city_a", value=df['sales_city_a'])
    sales_city_b = pm.MutableData(name="sales_city_b", value=df['sales_city_b'])
    # Intercepts: different for both cities
    intercept_city_a = pm.Normal(name="intercept_city_a", mu=0, sigma=200)
    intercept_city_b = pm.Normal(name="intercept_city_b", mu=0, sigma=200)
    # trend: common for both cities
    general_trend = pm.Normal(name="general_trend", mu=150, sigma=200)
    trend = pm.MutableData(name="trend", value=df['trend'])

    # Weekly seasonality - common for both cities, but in one city is proportional to the other, so we use 'squeeze coefficient'
    b_day_of_week = pm.ZeroSumNormal(name="b_day_of_week", sigma=100, dims="day_of_week")
    day_of_week_idx = pm.MutableData(name="day_of_week_idx", value=day_of_week_idx)
    beta_week_squeeze = pm.Normal(name="beta_week_squeeze", mu=0, sigma=2)

    # Modeled behaviour of two cities
    mu_city_a = pm.Deterministic(name="mu_city_a", var=intercept_city_a + trend*general_trend + b_day_of_week[day_of_week_idx]*beta_week_squeeze)
    mu_city_b = pm.Deterministic(name="mu_city_b", var=intercept_city_b + trend*general_trend + b_day_of_week[day_of_week_idx])

    sigma = pm.HalfNormal(name="sigma", sigma=200)

    # fit two cities
    pm.Normal(name="l_city_a", mu=mu_city_a, sigma=sigma, observed=sales_city_a)
    pm.Normal(name="l_city_b", mu=mu_city_b, sigma=sigma, observed=sales_city_b)

As a result I get a model that fits the data quite well.
However, on some days something happens in both cities at once: the purchasing pattern is still shared across the two cities, but it is not captured by the model (see screenshot below).

So my question is:
What is the best way to tell the model about those cases? I would like to capture not all deviations, but only those that are shared by both cities simultaneously. One option is to mark all those days as dummy variables. Or is it correct to collect all the ‘errors’ to some separate variable? Or can those be modelled as some process (like Gaussian Random Walk, for example?)

Why am I doing that?
We have been running multiple advertising activities in the two cities at different times. So I assume that each city’s timeline can be decomposed to the purchasing pattern (common for both cities) + the impact of advertising.

Therefore, I’d like to extract the common purchasing pattern as accurately as possible, so that the model does not confuse it with the impact of advertising.

Appreciate any recommendations!

PS. When you want to model the impact of advertising on your sales, one simpler thing you can do is just put sales in a city as target variable and put advertising activities as predictors. However, in this case the model over/underestimates the impact, often confusing it with other external factors (which are out of our control and often immeasurable). So we want to take advantage of both cities sharing the same pattern and assume that this common pattern would capture all the external factors.


Just eyeballing it, it looks like the errors of the model have serial correlation – it produces “runs” of under-estimates and over-estimates. Another way to think about it is that the exogenous “innovations” that cause the shared common behavior to move exhibit some time persistence. You could do a more formal test by looking at the (partial) auto-correlations of the model residuals.

This is a long-winded way of saying that you should consider an AR component in your model. You can do it by just including one or two lags of sales in each city as a regression component of mu.

Hello, @jessegrabowski! Appreciate your response!

Did I get it right that an AR component would let the model carry an ‘innovation’ over a longer period rather than just one day? (i.e. once some innovation occurs, the AR component will ‘try’ to propagate it into the following days)

But just to give the model the idea that innovation is happening, should I provide some additional variables?

Thanks in advance!

Yes, with an AR component, a shock/innovation/whatever you want to call it, positive or negative, will have persistence in your model. It means that if sales were higher yesterday, they will, on average, tend to be higher tomorrow.

“Innovations” is just time series jargon for exogenous shocks. You can do fancier stuff to explicitly model them as correlated errors, but for AR it works out to be the same as just including lags of sales as a little regression.

The more variables you bring in that might explain exogenous movements, the better. It’s hard to give suggestions here without domain knowledge, but calendar effects are a good place to start looking. The only downside to including exogenous variables is that it makes forecasting difficult, but that doesn’t seem to be your objective here, so go nuts.

Not really a satisfying response, but what you want to do here is very close to what CausalPy does. I mean that in terms of estimating impacts of treatments (advertising) by comparing actual sales to estimates of what would have happened with no advertising.

Depending on how many cities you have, this feels like a synthetic control / geolift type problem, maybe with a hint of difference-in-differences with staggered treatment effects. I don’t think there’s anything in CausalPy right now that would flat-out solve the problem, but I think it could/should in the future.

Hello, @drbenvincent !

Your comment is very relevant. Indeed, we have been and are using Causal Impact (we are looking at CausalPy with much interest, but Causal Impact is just more established for now).

Such methodologies can indeed give accurate insights about the impact of ads. But they require a set of assumptions to hold, so you have to set up proper conditions for the experiment, and this limits how well the approach scales. It gets very complicated if you have many cities and assume that each of them can have a different impact throughout the year.

So we are thinking of a solution that could extract impacts probably less accurately, but could give insight about:

  1. varying impact of ads across seasons
  2. diminishing value of ads as the market gets saturated
  3. confirming the results of CausalImpact-like tests

Basically, it is a custom Marketing Mix model (we are also watching pymc-marketing).

Actually, I came up with the model in the question while thinking about extending CausalImpact-like tests.


Understood. The way you described the problem initially didn’t make me think of MMM’s, but with that added info I can see how this problem sits in between pymc-marketing and causalpy.

From what you are saying and the data you are showing, I think you want the random fluctuations of both cities to be correlated with each other. To do this, you need to change the last three lines of your model.

Your current model says:

    sigma = pm.HalfNormal("sigma", sigma=200)

    # fit two cities
    pm.Normal(name="l_city_a", mu=mu_city_a, sigma=sigma, observed=sales_city_a)
    pm.Normal(name="l_city_b", mu=mu_city_b, sigma=sigma, observed=sales_city_b)

This means you are assuming that the observations for city A and city B come from independent normal distributions. In other words, without realizing it, you baked into your model the assumption that the fluctuations across cities are uncorrelated!

To fix this, you should change your observed distribution to something that can handle correlations across observations. I suggest that you use an MvNormal and also take the opportunity to actually learn how correlated each city is.

    # Your old sigma is now replaced by an LKJ prior for the correlation,
    # plus independent sigmas for each city, both of which are assumed to come
    # from a HalfNormal with standard deviation 200
    chol, corr, sigma = pm.LKJCholeskyCov(
        "packed_L", n=2, eta=1.0, sd_dist=pm.HalfNormal.dist(200), compute_corr=True
    )
    # fit the two cities jointly, so correlated fluctuations can be learned
    pm.MvNormal(
        name="l_cities",
        mu=pm.math.stack([mu_city_a, mu_city_b], axis=-1),
        chol=chol,
        observed=pm.math.stack([sales_city_a, sales_city_b], axis=-1),
    )
Let us know how this works out. I can’t test it without the actual data.

From what I understand of the problem so far, you also want to:

  • add in your two observed time series of advertising spends as MutableData nodes
  • run that through some linear or non-linear function to model the effect of advertising on sales
  • add those advertising-caused sales to the mu’s

Along with @lucianopaz’s suggestion, that should pretty much solve your problem.

You can then explore counterfactual situations of what-if advertising spend had been zero (or something else) using the kinds of methods in this example notebook Counterfactual inference: calculating excess deaths due to COVID-19 — PyMC example gallery

Hello, @lucianopaz and thanks a lot for this suggestion!

I am excited to try it, but can I please ask you to explain a couple of things?

I am hoping that by capturing the common movements of two cities, I would be able to get the latent demand for my company (which will help me get rid of confounders).

Running the code you suggested, I can see that the correlation between the variables is indeed captured (the posterior correlation matrix shows about 0.8). However, the model still does not capture the “external effects” I was originally asking about. Can you please explain what I should expect from adding the MvNormal and correlations?

Also, am I correct that the sd_dist parameter should reflect my prior belief about the correlation between the two variables? (The lower the variance of the distribution, the stronger the belief about the correlation.)

Also, what do you think about adding lag variables to the model with the MvNormal?
(The ones @jessegrabowski suggested above)

Very much appreciated!