I believe the budget optimizer is failing to create sensible budgets due to extremely small gradients at the initial conditions. I've tried a couple of methods such as SLSQP, trust-constr, and others from this page, along with different iteration limits, step sizes, and gradient tolerances. The optimizer reports success, but with SLSQP the gradients are about -1e-10 within 1-2 iterations. Here is an example of a minimize_kwargs option I used:
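Something along these lines (the exact values here are illustrative rather than the precise ones from my runs; they are forwarded to scipy.optimize.minimize, so method and options follow its signature):

# Illustrative minimize_kwargs (hypothetical values), passed through to
# scipy.optimize.minimize.
minimize_kwargs = {
    "method": "SLSQP",
    "options": {"maxiter": 1_000, "ftol": 1e-9},
}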
I tried setting various initial conditions with the x0 param, such as giving channel A 30-50% of the budget, channel B 30-50% of the budget, etc., and I still converge in 1-2 iterations with vastly different budget allocations each time. There doesn't seem to be any meaningful direction in the gradients.
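For instance, a starting allocation along these lines (channel names and numbers purely illustrative):

import numpy as np

# Hypothetical starting point: split a toy total budget across three channels.
total_budget = 100_000.0
x0 = np.array([0.40, 0.35, 0.25]) * total_budget  # channels A, B, C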
If I don’t supply budget bounds it gives uniform spends:
These values look more reasonable at first glance. Why are there scaling issues at play here? I thought that the MMM class internally scales the channels and target variables when fitting?
Hey, indeed, the scalers are handled under the hood. Nevertheless, in your case the optimizer's inputs and outputs move in the original scale, meaning the optimizer sees the information on the original scale.
Because your input is probably too far in scale from your output, this can happen. Your modification makes sense, but the easier way would be to do something like:
import pytensor.tensor as pt
# Assumed import path; adjust to wherever _check_samples_dimensionality lives in your version.
from pymc_marketing.mmm.utility import _check_samples_dimensionality


def average_response(
    samples: pt.TensorVariable, budgets: pt.TensorVariable
) -> pt.TensorVariable:
    """Average response of the posterior predictive, rescaled to the budget scale."""
    # Normalize the mean response by its maximum (roughly O(1)), then multiply
    # by the total budget so the objective is on the same scale as the spends.
    return (
        pt.mean(_check_samples_dimensionality(samples)) / pt.max(samples)
    ) * pt.sum(budgets)
No need to go directly into the gradients; just adjust your response function so it is on the same scale as your budgets. One way to do it is shown above: this way the gradient SLSQP sees is not so tiny that it falls under ftol after one iteration.
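To see the effect concretely, here is a small standalone check with made-up numbers (not real posterior samples), reproducing just the arithmetic of the adjusted utility:

import numpy as np
import pytensor.tensor as pt

# Toy values: responses on the original (large) scale, spends on the budget scale.
samples_val = np.array([2.0e6, 2.5e6, 3.0e6])
budgets_val = np.array([40_000.0, 60_000.0])

samples = pt.vector("samples")
budgets = pt.vector("budgets")
objective = (pt.mean(samples) / pt.max(samples)) * pt.sum(budgets)

# ~83_333: the objective is now on the same order of magnitude as the budgets,
# so the gradients are no longer vanishingly small relative to ftol.
print(objective.eval({samples: samples_val, budgets: budgets_val}))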
Note: scaling a model's inputs or outputs when they are multi-dimensional inside an optimization is not easy, given the internal iterations the model performs in each call. If any scaling is to be applied because the magnitudes differ, it is up to the user to decide on the scaling process, since a general rule for all cases can be complicated and not beneficial for certain types of optimization problems.