I’m setting up a model involving censored observations, and I’d like to use pm.MutableData
for out-of-sample predictions. The basic structure of the model is:
import numpy as np
import pymc as pm

normal_lcdf = pm.distributions.dist_math.normal_lcdf

with pm.Model():
    x = pm.MutableData("x", x_data)  # (n_obs, n_features) predictors
    y = pm.MutableData("y", y_data)  # (n_obs,) responses
    beta = ...  # (n_features,) coefficients
    sigma = ...  # std dev
    mu = x @ beta  # (n_obs,) modelled responses

    # Uncensored observations
    mask = np.array([False, True, ...])  # (n_obs,) boolean array
    uncensored = pm.MutableData("uncensored", mask)
    pm.Normal(
        "obs_uncensored", mu=mu[uncensored], sigma=sigma, observed=y[uncensored]
    )

    # Censored observations
    censored = pm.MutableData("censored", ~mask)
    pm.Potential(
        "obs_censored",
        normal_lcdf(
            mu=mu[censored], sigma=sigma, x=y[censored]
        )
    )
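For reference, the pm.Potential term is meant to add the log of the normal CDF, log Φ((y − mu) / sigma), for each censored observation, i.e. the log-probability that the latent response falls at or below the recorded value. Here is a minimal standard-library sketch of that quantity. The helper name normal_logcdf is hypothetical and is not the PyMC function; this naive version also makes no attempt at numerical stability in the tails.

```python
import math


def normal_logcdf(mu: float, sigma: float, x: float) -> float:
    """Log of P(X <= x) for X ~ Normal(mu, sigma). Naive, not tail-stable."""
    z = (x - mu) / sigma
    # Standard normal CDF written with the complementary error function:
    # Phi(z) = 0.5 * erfc(-z / sqrt(2))
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


# A bound placed exactly at the mean leaves half the probability mass below it.
half = normal_logcdf(mu=0.0, sigma=1.0, x=0.0)
```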
Running this model throws TypeError: index must be integers or a boolean mask, presumably because you can’t index with a pm.MutableData variable.
I thought about masking the data before passing it to pm.MutableData, like:
mask = np.array([False, True, ...])  # (n_obs,) boolean array

with pm.Model():
    ...
    y_uncensored = pm.MutableData("y_uncensored", y_data[mask])
    y_censored = pm.MutableData("y_censored", y_data[~mask])
    ...
But I would still have to index into mu, which would mean redefining mu for new data.
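To make the pre-masking idea concrete, here is a NumPy-only sketch with made-up data. It splits the responses with a boolean mask and also derives integer positions via np.flatnonzero, which select the same elements. Whether such integer arrays can be stored in pm.MutableData and used to index mu is an assumption I have not verified.

```python
import numpy as np

# Made-up data purely for illustration.
y_data = np.array([1.2, 0.0, 3.4, 0.0, 2.1])
mask = np.array([True, False, True, False, True])  # True = uncensored

# Split the responses before they would be handed to pm.MutableData.
y_uncensored = y_data[mask]
y_censored = y_data[~mask]

# Integer positions carrying the same information as the boolean mask.
idx_uncensored = np.flatnonzero(mask)
idx_censored = np.flatnonzero(~mask)

# Indexing by position selects the same elements as indexing by mask.
assert np.array_equal(y_data[idx_uncensored], y_uncensored)
assert np.array_equal(y_data[idx_censored], y_censored)
```

New data would need its own mask and therefore its own index arrays, so this alone does not remove the need to update the indices alongside x and y.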
Is there a way to implement masks using pm.MutableData? Is there another way to go about this?
Thanks in advance for any suggestions.