Gradient Issues During Budget Optimization

This is a follow-up to my previous post: Budget recommendations don't align with saturation curves - #3 by cetagostini

I believe the budget optimizer is failing to produce sensible budgets because the gradients are extremely small at the initial conditions. I've tried a few methods such as SLSQP, trust-constr, and others from this page, along with different iteration limits and step and gradient tolerances. While the optimizer reports success, the gradients are around -1e-10 within 1-2 iterations when using SLSQP. Here is an example of the minimize_kwargs options I used:

allocation_strategy, optimization_result = mmm.optimize_budget(
    response_variable='total_contribution',
    budget=total_budget,
    num_periods=12,
    budget_bounds=budget_bounds,
    minimize_kwargs={
        "method": "SLSQP",
        "options": {
            "ftol": 1e-12,
            "eps": 1e-4,
            "maxiter": 10000
        }
    }
)

With output:

     message: Optimization terminated successfully
     success: True
      status: 0
         fun: -2.6441645213969007e-05
           x: [ 9.493e+03  9.725e+03  2.761e+03  6.144e+03  9.493e+03]
         nit: 2
         jac: [-3.105e-10 -6.196e-10 -3.665e-09 -5.130e-10 -4.423e-10]
        nfev: 3
        njev: 2
 multipliers: [ 8.562e-08]

I tried setting various initial conditions with the x0 param, such as giving channel A 30-50% of the budget, channel B 30-50% of the budget, etc., and I still converge in 1-2 iterations with vastly different budget allocations each time. There doesn't seem to be any meaningful direction in the gradients.
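For concreteness, here is a sketch of the kind of starting point I tried, assuming x0 is passed straight to optimize_budget (the exact 40/40 split is illustrative):

import numpy as np

# Illustrative starting point: channels A and B each get 40% of the budget,
# the remaining 20% is split evenly across the other three channels.
weights = np.array([0.40, 0.40, 0.20 / 3, 0.20 / 3, 0.20 / 3])
x0 = total_budget * weights

allocation_strategy, optimization_result = mmm.optimize_budget(
    response_variable='total_contribution',
    budget=total_budget,
    num_periods=12,
    budget_bounds=budget_bounds,
    x0=x0,  # starting allocation (the x0 param mentioned above)
    minimize_kwargs={"method": "SLSQP", "options": {"maxiter": 10000}},
)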

If I don't supply budget bounds, it gives uniform spends:

message: Optimization terminated successfully
     success: True
      status: 0
         fun: -6.262813287729734e-05
           x: [ 7.523e+03  7.523e+03  7.523e+03  7.523e+03  7.523e+03]
         nit: 1
         jac: [-4.657e-10 -9.294e-10 -5.497e-09 -7.696e-10 -6.635e-10]
        nfev: 1
        njev: 1
 multipliers: [-1.665e-09]

allocation_strategy: [7522.95181563 7522.95181563 7522.95181563 7522.95181563 7522.95181563]

Hey, could you take out the eps and ftol (or tol) parameters and try out the options proposed here?
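For example, a sketch of the same call without those tolerances, so SLSQP falls back to its defaults:

allocation_strategy, optimization_result = mmm.optimize_budget(
    response_variable='total_contribution',
    budget=total_budget,
    num_periods=12,
    budget_bounds=budget_bounds,
    minimize_kwargs={
        "method": "SLSQP",
        # no ftol/eps here: let SLSQP use its default tolerances
        "options": {"maxiter": 10000, "disp": True},
    },
)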

@cetagostini

I did the following:

# Step 1: Create allocator normally
allocator = BudgetOptimizer(
    num_periods=12,
    response_variable='total_contribution',
    model=mmm,
)

# Step 2: Get the original objective function
original_objective = allocator._compiled_functions[allocator.utility_function]["objective_and_grad"]

# Step 3: Create scaled wrapper
SCALE_FACTOR = 1e10

def scaled_objective_and_grad(x):
    obj, grad = original_objective(x)
    return obj * SCALE_FACTOR, grad * SCALE_FACTOR

# Step 4: Replace the compiled function
allocator._compiled_functions[allocator.utility_function]["objective_and_grad"] = scaled_objective_and_grad

# Step 5: Run optimization with the scaled objective
allocation_strategy, optimization_result = allocator.allocate_budget(
    total_budget=total_budget,
    budget_bounds=budget_bounds,
    minimize_kwargs={
        "method": "SLSQP",
        "options": {
            "maxiter": 10000,
            "disp": True
        }
    }
)

With output:

     message: Optimization terminated successfully
     success: True
      status: 0
         fun: -441567.3512655523
           x: [ 3.562e+03  2.327e+04  2.761e+03  3.504e+03  4.522e+03]
         nit: 19
         jac: [-4.657e+00 -9.294e+00 -5.497e+01 -7.696e+00 -6.635e+00]
        nfev: 30
        njev: 19
 multipliers: [-7.696e+00]

These values look more reasonable at first glance. Why are scaling issues at play here? I thought the MMM class internally scales the channels and the target variable when fitting?


Hey, indeed, the scalers are handled under the hood. Nevertheless, the inputs and outputs in your case move on the original scale, meaning the optimizer sees the information on the original scale.

Because your input is probably too far in scale from your output, this can happen. Your modification makes sense, but the easier way would be to write something like:

import pytensor.tensor as pt
from pymc_marketing.mmm.utility import _check_samples_dimensionality  # helper used by the stock utilities

def average_response(
    samples: pt.TensorVariable, budgets: pt.TensorVariable
) -> pt.TensorVariable:
    """Average response, rescaled to the order of magnitude of the budgets."""
    # Normalize by the max sample, then rescale by the total budget so the
    # objective lives on the same scale as the decision variables.
    return (
        pt.mean(_check_samples_dimensionality(samples)) / pt.max(samples)
    ) * pt.sum(budgets)

No need to go directly into the grad; just adjust your response function to be on the same scale as your budgets. One way to do it is shown above: it gives SLSQP a gradient that is not too tiny or below ftol after one iteration.
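If you go this route, wiring it in could look like the following (a sketch, assuming the custom function is passed as the utility_function argument of BudgetOptimizer, the same attribute your snippet reads):

# Sketch: hand the rescaled response function to the optimizer as its utility.
allocator = BudgetOptimizer(
    num_periods=12,
    response_variable='total_contribution',
    model=mmm,
    utility_function=average_response,  # the rescaled version defined above
)

allocation_strategy, optimization_result = allocator.allocate_budget(
    total_budget=total_budget,
    budget_bounds=budget_bounds,
    minimize_kwargs={"method": "SLSQP", "options": {"maxiter": 10000}},
)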

Note: scaling a model's inputs or outputs when they are multi-dimensional is not easy inside an optimization, given the internal iterations the model runs through on each call. If any scaling is to be applied because the magnitudes differ, it is up to the user to decide on the scaling process, since a general scheme for all cases can be complicated and may not be beneficial in certain types of optimization problems.
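For instance, if you do want an explicit user-side scale, a wrapper like this keeps the decision in your hands (a sketch only; make_scaled_utility is hypothetical, and 1e10 just echoes the SCALE_FACTOR from your post above):

import pytensor.tensor as pt
from pymc_marketing.mmm.utility import _check_samples_dimensionality

def make_scaled_utility(scale: float):
    """Build a utility function multiplied by a user-chosen factor."""
    def scaled_average_response(
        samples: pt.TensorVariable, budgets: pt.TensorVariable
    ) -> pt.TensorVariable:
        # scale is a plain float you pick for your problem's magnitudes
        return scale * pt.mean(_check_samples_dimensionality(samples))
    return scaled_average_response

# e.g. BudgetOptimizer(..., utility_function=make_scaled_utility(1e10))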