MutableData for boolean masks

davipatti · November 20, 2023, 5:23pm

I’m setting up a model involving censored observations. I’d like to use pm.MutableData for out-of-sample predictions. The basic structure of the model is:

import numpy as np
import pymc as pm

normal_lcdf = pm.distributions.dist_math.normal_lcdf

with pm.Model():
    x = pm.MutableData("x", x_data)  # (n_obs, n_features) predictors
    y = pm.MutableData("y", y_data)  # (n_obs,) responses
    
    beta = ...  # (n_features,) coefficients
    sigma = ...  # std dev
    mu = x @ beta  # (n_obs,) modelled responses

    # Uncensored observations
    mask = np.array([False, True, ...])  # (n_obs,) boolean array
    uncensored = pm.MutableData("uncensored", mask)
    pm.Normal(
        "obs_uncensored", mu=mu[uncensored], sigma=sigma, observed=y[uncensored]
    )

    # Censored observations
    censored = pm.MutableData("censored", ~mask)
    pm.Potential(
        "obs_censored",
        normal_lcdf(
            mu=mu[censored], sigma=sigma, x=y[censored]
        )
    )

This throws TypeError: index must be integers or a boolean mask, presumably because you can’t index with pm.MutableData.
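A possible workaround, not tested in this thread: the error message says the index must be integers or a boolean mask, which suggests the mask reached the indexing operation with some other dtype. One plausible cause (an assumption that may depend on the PyMC version) is that the data container casts non-integer arrays, booleans included, to floats. Storing integer positions instead of a boolean array would sidestep that, since integer indexing selects the same elements. This NumPy sketch with made-up values only checks that equivalence and shows NumPy rejecting a float-cast mask; it does not exercise PyMC itself.

```python
import numpy as np

# Made-up example: 5 observations, True = uncensored
mask = np.array([False, True, True, False, True])
mu = np.array([0.5, 1.0, 1.5, 2.0, 2.5])

# Integer positions of the True / False entries of the mask
uncensored_idx = np.flatnonzero(mask)   # positions 1, 2, 4
censored_idx = np.flatnonzero(~mask)    # positions 0, 3

# Integer indexing selects exactly the same elements as boolean indexing
assert np.array_equal(mu[uncensored_idx], mu[mask])
assert np.array_equal(mu[censored_idx], mu[~mask])

# A mask cast to floats is no longer a valid index: NumPy raises IndexError
float_mask_rejected = False
try:
    mu[mask.astype(float)]
except IndexError as err:
    float_mask_rejected = True
    print("float mask rejected:", err)

# The position arrays keep an integer dtype ('i' kind)
print(uncensored_idx.dtype.kind)
```

If this carries over to PyMC, the model would wrap np.flatnonzero(mask) in pm.MutableData instead of the boolean mask, and new index arrays would be supplied alongside new predictors through pm.set_data; that usage is an untested assumption.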

I thought about applying the mask before passing the data to pm.MutableData, like this:


mask = np.array([False, True, ...])  # (n_obs,) boolean array

with pm.Model():
    ...

    y_uncensored = pm.MutableData("y_uncensored", y_data[mask])
    y_censored = pm.MutableData("y_censored", y_data[~mask])

    ...

But I would still have to index into mu, which would mean redefining mu for new data.

Is there a way to implement masks using pm.MutableData? Is there another way to go about this?

Thanks in advance for any suggestions.

davipatti · November 20, 2023, 5:37pm

The answer occurred to me five minutes after posting. I'll leave it here for posterity.

Just break down the computation of mu into mu_censored and mu_uncensored, and apply the mask before passing the data to pm.MutableData:


with pm.Model():

    ...

    x_uncensored = pm.MutableData("x_uncensored", x_data[mask])
    x_censored = pm.MutableData("x_censored", x_data[~mask])
    
    mu_uncensored = x_uncensored @ beta
    mu_censored = x_censored @ beta
    ...
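Why the split works: each entry of x @ beta depends only on its own row of x, so selecting rows before the product gives the same values as masking mu afterwards. The NumPy sketch below, with made-up data, checks that and then evaluates the two likelihood terms of the original model: a normal log density for uncensored rows and a normal log CDF for censored rows. Using the log CDF treats censored responses as left-censored (the true value is at or below the recorded y), as normal_lcdf implies. The helpers normal_logpdf and normal_logcdf are hypothetical stand-ins written with the standard library, not PyMC functions.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 6 observations, 3 features; True = uncensored
x_data = rng.normal(size=(6, 3))
y_data = rng.normal(size=6)
mask = np.array([True, False, True, True, False, True])
beta = np.array([0.5, -1.0, 2.0])
sigma = 1.3

# Split the predictors first, then compute mu for each group
mu_uncensored = x_data[mask] @ beta
mu_censored = x_data[~mask] @ beta

# Identical to computing mu for every row and masking afterwards
mu = x_data @ beta
assert np.allclose(mu_uncensored, mu[mask])
assert np.allclose(mu_censored, mu[~mask])


def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y (hypothetical helper)."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def normal_logcdf(y, mu, sigma):
    """log P(Y <= y) for Y ~ Normal(mu, sigma) (hypothetical helper).

    Naive formula: loses precision far in the lower tail.
    """
    z = (y - mu) / sigma
    return math.log(0.5 * math.erfc(-z / math.sqrt(2)))


# Uncensored rows contribute a density, censored rows a CDF (left-censoring)
loglik = sum(normal_logpdf(y, m, sigma) for y, m in zip(y_data[mask], mu_uncensored))
loglik += sum(normal_logcdf(y, m, sigma) for y, m in zip(y_data[~mask], mu_censored))
print(loglik)
```

A censored row only says the response lies at or below the recorded value, which is why it contributes a CDF rather than a density; right-censored data would use the log survival function, log(1 - Φ(z)), instead.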
