Why are MaskedArrays not supported anymore, and is there a suggested workaround?

In my old v4 models I used MaskedArrays to handle missing data. Running the same model on v5.5.0 now raises an error saying MaskedArrays are not supported. Is there an explanation for this change?

/opt/conda/envs/pm5/lib/python3.11/site-packages/numpy/ma/core.py:467: RuntimeWarning: invalid value encountered in cast
  fill_value = np.array(fill_value, copy=False, dtype=ndtype)
/root/Codes/Bagger/development/../../Bagger/bagger/models_pm5.py:170: RuntimeWarning: invalid value encountered in cast
  TD_observed = pm.MutableData('TD_observed', dataDict['TD'].astype(int), dims=('sample', 'k4'))
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[4], line 1
----> 1 m = SNV_depth_GLM_1bc_v46(bcDict['trainData'], 8)

File ~/Codes/Bagger/development/../../Bagger/bagger/models_pm5.py:170, in SNV_depth_GLM_1bc_v46(dataDict, mu_pattern_init, mu_pattern_lower, adDis)
    167 m.add_coord('k4', dataDict['dataDF']['k4Str'], mutable=True)
    168 m.add_coord('sample', np.arange(dataDict['nSample']), mutable=True)
--> 170 TD_observed = pm.MutableData('TD_observed', dataDict['TD'].astype(int), dims=('sample', 'k4'))
    171 AD_observed = pm.MutableData('AD_observed', dataDict['AD'].astype(int), dims=('sample', 'k4'))
    172 context_idx = pm.MutableData('context_idx', dataDict['dataDF']['context'].values, dims='k4')

File /opt/conda/envs/pm5/lib/python3.11/site-packages/pymc/data.py:314, in MutableData(name, value, dims, coords, export_index_as_coords, infer_dims_and_coords, **kwargs)
    308     infer_dims_and_coords = export_index_as_coords
    309     warnings.warn(
    310         "Deprecation warning: 'export_index_as_coords; is deprecated and will be removed in future versions. Please use 'infer_dims_and_coords' instead.",
    311         DeprecationWarning,
    312     )
--> 314 var = Data(
    315     name,
    316     value,
    317     dims=dims,
    318     coords=coords,
    319     infer_dims_and_coords=infer_dims_and_coords,
    320     mutable=True,
    321     **kwargs,
    322 )
    323 return cast(SharedVariable, var)

File /opt/conda/envs/pm5/lib/python3.11/site-packages/pymc/data.py:437, in Data(name, value, dims, coords, export_index_as_coords, infer_dims_and_coords, mutable, **kwargs)
    435     mutable = False
    436 if mutable:
--> 437     x = pytensor.shared(arr, name, **kwargs)
    438 else:
    439     x = pt.as_tensor_variable(arr, name, **kwargs)

File /opt/conda/envs/pm5/lib/python3.11/site-packages/pytensor/compile/sharedvalue.py:202, in shared(value, name, strict, allow_downcast, **kwargs)
    199     raise TypeError("Shared variable values can not be symbolic.")
    201 try:
--> 202     var = shared_constructor(
    203         value,
    204         name=name,
    205         strict=strict,
    206         allow_downcast=allow_downcast,
    207         **kwargs,
    208     )
    209     add_tag_trace(var)
    210     return var

File /opt/conda/envs/pm5/lib/python3.11/functools.py:909, in singledispatch.<locals>.wrapper(*args, **kw)
    905 if not args:
    906     raise TypeError(f'{funcname} requires at least '
    907                     '1 positional argument')
--> 909 return dispatch(args[0].__class__)(*args, **kw)

File /opt/conda/envs/pm5/lib/python3.11/site-packages/pytensor/tensor/sharedvar.py:69, in tensor_constructor(value, name, strict, allow_downcast, borrow, shape, target, broadcastable)
     59 r"""`SharedVariable` constructor for `TensorType`\s.
     60 
     61 Notes
   (...)
     66 
     67 """
     68 if isinstance(value, np.ma.MaskedArray):
---> 69     raise NotImplementedError("MaskedArrays are not supported")
     71 if broadcastable is not None:
     72     warnings.warn(
     73         "The `broadcastable` keyword is deprecated; use `shape`.",
     74         DeprecationWarning,
     75     )

NotImplementedError: MaskedArrays are not supported

They were not properly supported and could result in buggy behavior. You can use ConstantData with NaN values, or pass an array containing NaNs directly to observed, to trigger automatic imputation.
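As a minimal sketch of the conversion step, assuming the old input is an integer MaskedArray: the masked entries can be turned into NaN in a plain float array before it is handed to the model. The array `td_masked` and its values are hypothetical, not taken from the original model. Note that integers cannot hold NaN, which is probably what the `invalid value encountered in cast` warnings from `.astype(int)` in the traceback are about.

```python
import numpy as np

# Hypothetical counts with two missing entries, stored as a MaskedArray
# (the v4-style input; names and values are made up for illustration).
td_masked = np.ma.masked_array(
    data=[[10, 12, 0], [7, 0, 9]],
    mask=[[False, False, True], [False, True, False]],
)

# Integer dtypes cannot represent NaN, so cast to float first,
# then replace every masked cell with NaN. The result is a plain ndarray.
td_with_nan = td_masked.astype(float).filled(np.nan)

print(type(td_with_nan).__name__)    # ndarray
print(np.isnan(td_with_nan).sum())   # 2
```

An array like `td_with_nan` is the kind of NaN-containing input the reply above suggests passing to observed.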

In that case, a follow-up question: what if one needs to predict on hold-out data? That is why MutableData was used in the first place.

Out-of-sample prediction with imputed variables is a bit tricky and depends on the specific model structure. If you can share more details about your model and how you want to do out-of-sample predictions, that would be helpful.

Dang, is this still the case?

I have to impute missing values in my test X variables, and I would like to check the model fit by making predictions on the test data.

Is there more info I can give to help answer whether this is possible or not?

Can you share some example of what you need to do?

Certainly,

I have this model (that I’m still figuring out):

with pm.Model() as model:
    
    # Priors
    home_intercept = pm.Poisson('home_intercept', mu=2)
    away_intercept = pm.Poisson('away_intercept', mu=2)
    home_coeff = pm.Normal('home_coeff', mu=0, sigma=1, shape=home_train.shape[1])
    away_coeff = pm.Normal('away_coeff', mu=0, sigma=1, shape=away_train.shape[1])
   
    home_imp = pm.Normal("home_imp", mu=0, sigma=3, observed = home_train)
    away_imp = pm.Normal("away_imp", mu=0, sigma=3, observed = away_train)
    
    # Model error
    eps = pm.HalfCauchy('eps', beta=1)
   
    # Likelihood function
    mu = (home_intercept + pm.math.dot(home_coeff, home_imp.T)) - \
                    (away_intercept + pm.math.dot(away_coeff, away_imp.T))
    
                    
    likelihood = pm.Normal('y', mu=mu, sigma=eps, observed = y_train)
   
    # Inference
    trace = pm.sample(draws=1000, tune=1000, chains=4)

I train it with x_train and y_train, but I would then like to make predictions with the data I have stored as x_test, to investigate how well the model fits y_test.

Is the default use of set_data not sufficient?

https://www.pymc.io/projects/docs/en/latest/api/model/generated/pymc.model.core.set_data.html