Disabling missing data imputation

Thanks so much for the solution (I have the exact same problem: a large multidimensional array that benefits from keeping its full shape). Why is this different from the model below:

import numpy as np
import pymc as pm

# X, y, mask, and p are assumed to be defined as in your example
with pm.Model() as model:
    beta   = pm.Normal('beta', shape=p)
    err_sd = pm.HalfCauchy('err_sd', beta=1)
    y_hat  = pm.math.dot(X, beta)

    # zero out the residual wherever the mask flags an entry
    y_misfit = (y_hat - y) * (1 - mask)
    # or, alternatively:
    # y_misfit = pytensor.tensor.set_subtensor((y_hat - y)[mask.nonzero()], 0)

    pm.Normal("likelihood", mu=y_misfit, sigma=err_sd, observed=np.zeros(y.shape))
    trace = pm.sample()
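For anyone reproducing this, the masking arithmetic itself can be checked outside PyMC with plain NumPy (the toy `X`, `y`, `beta`, and `mask` below are made-up stand-ins, not from the original thread):

```python
import numpy as np

# made-up stand-ins for the X, y, beta, and mask of the example
rng = np.random.default_rng(0)
n, p = 6, 2
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5])
y = X @ beta + rng.normal(scale=0.1, size=n)
mask = np.array([0, 0, 1, 0, 1, 0])  # 1 flags an entry to zero out, matching (1 - mask)

y_hat = X @ beta
y_misfit = (y_hat - y) * (1 - mask)

# flagged entries contribute exactly zero misfit, so the zeros passed
# as `observed` are matched there regardless of beta
print(y_misfit)
```

The flagged entries are identically zero while the array keeps its full shape, which is the whole point of the trick.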

The example above does not actually work, though. Why not? I think it is related to the fact that this Potential:

likelihood = pm.Potential('likelihood', pm.logp(pm.Normal.dist(mu=y_hat - y, sigma=err_sd), 0) * mask)

works equivalently, as expected. But

likelihood = pm.Potential('likelihood', pm.logp(pm.Normal.dist(mu=(y_hat - y) * mask, sigma=err_sd), 0))

does not. I expected this to behave similarly, since the resulting logp is the same up to a constant offset, right? But that offset seems to matter.
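To make the offset concrete, here is a quick SciPy sketch (the toy residuals and mask are my own stand-ins) comparing the two Potential variants pointwise: masking the logp terms versus masking the residuals. Each masked-out point contributes 0 in the first variant but `logpdf(0; 0, sigma)` in the second, and that term changes with `sigma`:

```python
import numpy as np
from scipy.stats import norm

resid = np.array([0.5, -1.2, 2.0])  # stand-in for y_hat - y
mask  = np.array([1, 1, 0])         # 1 keeps a term, matching the * mask factor above

offsets = []
for sigma in (0.5, 2.0):
    # variant 1: mask the logp terms (dropped points contribute exactly 0)
    lp_mask_logp = (norm.logpdf(0.0, loc=resid, scale=sigma) * mask).sum()
    # variant 2: mask the residuals (dropped points contribute logpdf(0; 0, sigma))
    lp_mask_resid = norm.logpdf(0.0, loc=resid * mask, scale=sigma).sum()
    offset = lp_mask_resid - lp_mask_logp
    offsets.append(offset)
    print(sigma, offset)  # the offset tracks logpdf(0; 0, sigma), so it moves with sigma
```

If this sketch is right, the offset is not a true constant: it depends on `err_sd`, which is a free parameter here, so the two Potentials would put different pressure on the posterior of `err_sd`.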

Anyway, thanks for the clever solution.