Thanks very much for the solution (I have the exact same problem with a large multidimensional array that benefits from keeping its full shape). Why is this different from the model below:
import numpy as np
import pymc as pm
import pytensor.tensor as pt

with pm.Model() as model:
    beta = pm.Normal('beta', shape=p)
    err_sd = pm.HalfCauchy('err_sd', beta=1)
    y_hat = pm.math.dot(X, beta)
    # zero the residuals wherever mask == 1
    y_misfit = (y_hat - y) * (1 - mask)
    # or alternatively:
    # y_misfit = pt.set_subtensor((y_hat - y)[mask], 0)
    pm.Normal("likelihood", mu=y_misfit, sigma=err_sd, observed=np.zeros(y.shape))
    trace = pm.sample()
The example above does not really work. Why? I think it is related to the fact that the potential below:
likelihood = pm.Potential('likelihood', pm.logp(pm.Normal.dist(mu=y_hat - y, sigma=err_sd), 0) * mask)
works equivalently, as expected. But
likelihood = pm.Potential('likelihood', pm.logp(pm.Normal.dist(mu=(y_hat - y) * mask, sigma=err_sd), 0))
does not. I thought the two would behave similarly, since the resulting logps are the same up to a constant offset, right? But that offset seems to matter.
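Writing out the density, I suspect the offset is not actually constant: each masked entry in the second version still contributes the full logp of a zero residual, -log(err_sd) - 0.5*log(2*pi), which depends on err_sd. Those terms act like extra exact-zero observations and should pull err_sd low. A quick numerical sketch with scipy.stats (toy residuals and a hypothetical mask, with mask == 1 meaning 'keep the term' as in the first potential) seems to confirm that the gap between the two expressions varies with sigma:

import numpy as np
from scipy.stats import norm

resid = np.array([0.5, -1.2, 0.3, 2.0])  # toy residuals (y_hat - y), hypothetical values
mask = np.array([1, 1, 0, 0])            # 1 = keep the logp term, 0 = drop it

for sigma in (0.5, 1.0, 2.0):
    # version 1: mask the logp terms (the potential that works)
    lp_masked_logp = (norm.logpdf(0, loc=resid, scale=sigma) * mask).sum()
    # version 2: mask the residuals but keep every logp term
    lp_masked_mu = norm.logpdf(0, loc=resid * mask, scale=sigma).sum()
    # the gap equals n_masked * norm.logpdf(0, loc=0, scale=sigma), which varies with sigma
    print(sigma, lp_masked_mu - lp_masked_logp, 2 * norm.logpdf(0, loc=0, scale=sigma))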
Anyway, thanks for the clever solution.