Constraint or prior on product of pm.MutableData and Random Variable

My question for this approach is: why? You are setting the sigma of a normal to be extremely low; at that point you might as well just plug in the result of the optimizer deterministically. Actually, the value of sigma doesn't matter at all, because the True branch of the switch returns zero and the False branch can never be taken (the normal is defined everywhere on R, so its logp is always finite). The potential is therefore a constant with zero gradient everywhere, so I'm not surprised it makes your model complain.
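If I've understood the potential correctly, it's doing something like the sketch below. This is a guess at your setup (the data and prior are made up, and only the switch-plus-tiny-sigma structure is taken from your post), just to make the zero-gradient point concrete:

```python
import numpy as np
import pymc as pm

# Hypothetical stand-ins for the data in your model
channel_A_spend = np.array([1.0, 2.0, 3.0])
channel_A_uplift = np.array([0.4, 0.9, 1.3])

with pm.Model():
    channel_A_coef = pm.Normal("channel_A_coef", mu=0, sigma=1)

    # Sum-of-squares discrepancy between predicted and observed uplift
    eq_determ = pm.Deterministic(
        "eq_determ",
        pm.math.sum((channel_A_uplift - channel_A_coef * channel_A_spend) ** 2),
    )

    # The normal is defined on all of R, so its logp is always finite, the
    # condition is always True, and the potential is identically zero. A
    # constant potential has zero gradient, so the sampler learns nothing from it.
    tight_logp = pm.logp(pm.Normal.dist(mu=0.0, sigma=0.0001), eq_determ)
    pm.Potential("constraint", pm.math.switch(tight_logp > -np.inf, 0.0, -np.inf))
```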

Algebraically though, the whole thing is a roundabout implementation of a normal logp. Look again at the logp formula for a normal:

\log f(\mu, \sigma | y) \propto -\frac{1}{2\sigma^2} \sum_{i=1}^N \left( y_i - \mu \right)^2

Define \mu_i = \beta \cdot x_i:

\log f(\sigma, \beta | y, X) \propto -\frac{1}{2\sigma^2} \sum_{i=1}^N \left( y_i - \beta \cdot x_i \right)^2

As you can see, your deterministic eq_determ sum of squares appears directly in the formula. Using pm.Normal(mu=channel_A_coef * channel_A_spend, sigma=0.0001, observed=channel_A_uplift) is equivalent to what you've written (except that your switch then effectively multiplies that logp by zero).
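A quick numerical sanity check of that equivalence, with made-up data and arbitrary test values for beta and sigma (the only difference is the additive constant that \propto hides):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 2.0 * x + rng.normal(scale=0.1, size=20)

beta, sigma = 1.7, 0.3  # arbitrary test values
mu = beta * x

# Normal logp summed over the observations
full_logp = pm.logp(pm.Normal.dist(mu=mu, sigma=sigma), y).sum().eval()

# Sum-of-squares term plus the additive constant dropped by "proportional to"
sum_sq = np.sum((y - mu) ** 2)
by_hand = -0.5 * sum_sq / sigma**2 - len(y) * (np.log(sigma) + 0.5 * np.log(2 * np.pi))

print(np.isclose(full_logp, by_hand))  # True
```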

Basically, you need to decide whether you want uncertainty in the value of channel_A_coef or not. If you do (and you should), use a second likelihood to add information about the observed uplift process to the model, as in the sketch below. If you don't, run the optimizer and plug the number in deterministically; there's no need to get fancy with potentials.
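Here's a minimal sketch of the second-likelihood route. The data, priors, and the rest of the model are placeholders I made up; only channel_A_coef, channel_A_spend, and channel_A_uplift are names from your model:

```python
import numpy as np
import pymc as pm

# Toy stand-ins; the real channel_A_spend / channel_A_uplift come from your data
channel_A_spend = np.array([1.0, 2.0, 3.0, 4.0])
channel_A_uplift = np.array([0.4, 0.9, 1.3, 1.6])

with pm.Model() as model:
    # Prior on the coefficient -- this is where "realistic" values get encoded
    channel_A_coef = pm.HalfNormal("channel_A_coef", sigma=1.0)

    # ... your existing likelihood for the main outcome goes here ...

    # Second likelihood: the uplift measurements inform channel_A_coef directly.
    # Let the data decide how tight this constraint is rather than hard-coding
    # sigma=0.0001.
    sigma_uplift = pm.HalfNormal("sigma_uplift", sigma=0.5)
    pm.Normal(
        "uplift_obs",
        mu=channel_A_coef * channel_A_spend,
        sigma=sigma_uplift,
        observed=channel_A_uplift,
    )

    idata = pm.sample()
```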

I’ve never used find_constrained_prior, so I can’t comment on it. For keeping parameters “realistic”, I’d tinker with priors before I reached for hard boundaries.

Adding GRW priors doesn’t change anything in this analysis.