I have read this PyMC blog post and am trying to use the tanh_saturation function in a Media Mix Model (MMM).
I understand that in the post's example, S is set to 100 based on the observed data. In that example, the dependent variable of the MMM is the number of new users (customers acquired), which can be clearly inferred from the data.
Analogously, when we look at sales volume, we could set a reasonable S if we knew the incremental sales driven by ads. How can we split sales into base/organic sales and ad-driven sales? Any practical suggestions? I feel the value of S is a critical assumption to make in an MMM when using tanh_saturation.
Suppose we have two marketing channels and assume 10% of total sales are driven by ads in these two channels. Should I set $S = 0.05 \times \text{total sales}$ for each channel, if they are assumed to be equally important?
I think this depends on the formulation. In the blog post, the author simulates the data using the tanh formulation. You can then define a graphical model to estimate the saturation parameters from that formula. If you have a known factor, sure, just multiply by a constant value. But a better way would be to define a prior over S and use pm.find_constrained_prior to limit it to a plausible range.
If things go correctly, the model should produce reasonable output after fitting (sampling) on your data. As a sanity check, you can use simulated data to test your model's parameter estimates. For more in-depth MMM modeling, I recommend following Dr. Juan’s blog.
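As a rough illustration of the simulated-data check: generate data from known parameter values, fit, and see whether the estimates land near the truth. The sketch below uses a plain least-squares fit with scipy as a quick stand-in for sampling, so it is not the Bayesian fit itself. The functional form S * tanh(x / (S * c)) and all numbers are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def tanh_saturation(x, S, c):
    # Assumed form: initial slope 1/c, levelling off at S for large x.
    return S * np.tanh(x / (S * c))


# Simulate data with known "true" parameters (hypothetical values).
rng = np.random.default_rng(42)
true_S, true_c = 100.0, 1.0
spend = np.linspace(0.0, 300.0, 200)
sales = tanh_saturation(spend, true_S, true_c) + rng.normal(0.0, 2.0, spend.size)

# Least-squares fit, starting deliberately far from the true values.
(S_hat, c_hat), _ = curve_fit(
    tanh_saturation, spend, sales, p0=[50.0, 0.5], bounds=(1e-6, np.inf)
)

print(f"S: true={true_S}, estimated={S_hat:.1f}")
print(f"c: true={true_c}, estimated={c_hat:.2f}")
```

The same idea carries over to the PyMC model: feed it the simulated spend and sales, then check that the posterior for S covers the value you simulated with. Note that recovery is easy here because the simulated spend reaches well into the saturated region; if your real spend never gets close to saturation, S will be weakly identified and the prior will matter much more.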
S was fixed for the purposes of that schematic plot. In real life, though, S is just another unknown parameter that we’d try to infer from the data. Adding any information in the form of priors would of course be useful.
I’d recommend checking out PyMC-Marketing for more info; there are some example notebooks for MMM estimation there.