Error: "Normal has no finite default value to use" when using data with NaN

I am trying to fit a simple model to data that contains some missing values:

import pymc3 as pm

with pm.Model(coords=coords) as h_model:

    # hyperpriors for the per-column slopes
    mu = pm.TruncatedNormal(name="mu",
                            mu=90.,
                            sigma=20.,
                            lower=0.0)
    sigma = pm.HalfNormal(name="sigma",
                          sigma=10.0)

    # per-column slope
    b = pm.Normal(name="b",
                  mu=mu,
                  sigma=sigma,
                  dims=tuple(coords)[1])

    # hyperprior and per-column intercept
    s = pm.HalfNormal(name="s", sigma=30.0)
    a = pm.HalfNormal(name="a", sigma=s, dims=tuple(coords)[1])

    # observation noise
    eps = pm.HalfCauchy(name="eps", beta=20.0)

    # linear predictor
    weight_est = pm.Deterministic('Linear_Model', a + b * data_x,
                                  dims=tuple(coords))

    # likelihood
    _gaussian = pm.Normal(name='Gaussian',
                          mu=weight_est,
                          sigma=eps,
                          observed=data_y)
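
For context, coords is essentially just the row and column labels of the two frames, something like the sketch below (the dimension names here are placeholders):

coords = {
    "date": data_x["completed_at"],   # one label per row
    "obs_id": data_x.columns[1:],     # the numeric columns 27914, 27915, ...
}

so that tuple(coords)[1] refers to the column dimension and tuple(coords) to the full (row, column) layout.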

data_x and data_y are pandas.DataFrames containing some NaN values:

data_x (just a subset; the overall shape is 34 x 64):

  completed_at     27914     27915     27918     27919
0   2021-06-09  1.318627  0.940294  1.078431  0.543137
1   2021-06-11  1.062745  0.940294  1.093137  1.543137
2   2021-06-14       NaN       NaN  1.008824       NaN
3   2021-06-16  1.837255       NaN  1.236275       NaN
4   2021-06-17       NaN       NaN       NaN       NaN

data_y:

  completed_at  27914  27915  27918  27919
0   2021-06-09  110.4  163.2   21.6  216.0
1   2021-06-11   96.0  192.0  136.8  192.0
2   2021-06-14    NaN    NaN  182.4    NaN
3   2021-06-16  163.2    NaN  168.0    NaN
4   2021-06-17    NaN    NaN    NaN    NaN


(as you can see, values are missing in both the observations and the predictor)

What I get is the following error:

/lib/python3.8/site-packages/pymc3/distributions/distribution.py in get_test_val(self, val, defaults)
    153                     if np.all(np.isfinite(attr_val)):
    154                         return attr_val
--> 155             raise AttributeError(
    156                 "%s has no finite default value to use, "
    157                 "checked: %s. Pass testval argument or "

AttributeError: [unnamed] ~ Normal has no finite default value to use, checked: ('median', 'mean', 'mode'). Pass testval argument or adjust so value is finite.

It seems the NaN values are what cause the problem.
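
As far as I understand, PyMC3 can impute missing observations automatically if they are passed as a numpy masked array rather than a frame with NaNs, roughly like this (just a sketch; I am assuming data_y holds only the numeric columns here):

import numpy as np

# inside the model block, replacing the likelihood above:
# the masked entries become free "Gaussian_missing" variables that get imputed
_gaussian = pm.Normal(name='Gaussian',
                      mu=weight_est,
                      sigma=eps,
                      observed=np.ma.masked_invalid(data_y.values))

But that only covers the observed side; the NaNs in data_x still end up in mu through Linear_Model, which I think is what the error above is actually complaining about.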

I have also tried to use Data Containers:

with pm.Model(coords=coords) as h_model:
    x = pm.Data(name='data_x', value=data_x, dims=tuple(coords))
    y = pm.Data(name='data_y', value=data_y, dims=tuple(coords))
    mu = pm.TruncatedNormal(name="mu",
                            mu=90.,
                            sigma=20.,
                            lower=0.0)
    sigma = pm.HalfNormal(name="sigma", 
                          sigma=10.0)

    b = pm.Normal(name="b",
                  mu=mu,
                  sigma=sigma,
                  dims=tuple(coords)[1])

    s = pm.HalfNormal(name="s", sigma=30.0)
    a = pm.HalfNormal(name="a", sigma=s, dims=tuple(coords)[1])

    eps = pm.HalfCauchy(name="eps", beta=20.0)

    weight_est = pm.Deterministic('Linear_Model', a + b * x,
                                  dims=tuple(coords))

    _gaussian = pm.Normal(name='Gaussian',
                          mu=weight_est,
                          sigma=eps,
                          observed=y)

but when I try to sample the model I get a SamplingError:

with h_model:
    step = pm.Metropolis()
    h_trace = pm.sample(step=step, return_inferencedata=True)

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'mu_lowerbound__': array(0.), 'sigma_log__': array(2.07679374), 'b': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]), 's_log__': array(3.17540603), 'a_log__': array([2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
       2.94961468, 2.94961468, 2.94961468]), 'eps_log__': array(2.99573227)}

Initial evaluation results:
mu_lowerbound__    -13.82
sigma_log__         -0.77
b                 -188.73
s_log__             -0.77
a_log__            -48.50
eps_log__           -1.14
Gaussian              NaN
Name: Log-probability of test_point, dtype: float64
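
The table above is essentially what h_model.check_test_point() returns; the Gaussian term is NaN because pm.Data stores the NaN values verbatim, so they end up both in the observations and, via Linear_Model, in mu.

# per-variable log-probabilities at the starting point, to see which term is NaN
print(h_model.check_test_point())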

The problem is “solved” if I use data_x.fillna(1e-4) and data_y.fillna(1e-4), but I would prefer PyMC3 to fill in the missing values from the prior distribution.
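
What I have in mind is something along these lines (just a sketch: the prior on x is a placeholder I made up, and I am assuming data_x and data_y hold only the numeric columns):

import numpy as np
import pymc3 as pm

with pm.Model(coords=coords) as h_model:
    mu = pm.TruncatedNormal("mu", mu=90., sigma=20., lower=0.0)
    sigma = pm.HalfNormal("sigma", sigma=10.0)
    b = pm.Normal("b", mu=mu, sigma=sigma, dims=tuple(coords)[1])

    s = pm.HalfNormal("s", sigma=30.0)
    a = pm.HalfNormal("a", sigma=s, dims=tuple(coords)[1])

    eps = pm.HalfCauchy("eps", beta=20.0)

    # give the predictor its own distribution and pass it as a masked array,
    # so the missing entries become free "data_x_missing" variables drawn
    # from this (placeholder) prior
    x = pm.Normal("data_x",
                  mu=1.0,
                  sigma=1.0,
                  observed=np.ma.masked_invalid(data_x.values))

    weight_est = pm.Deterministic("Linear_Model", a + b * x,
                                  dims=tuple(coords))

    # masked observations: the missing y values become "Gaussian_missing"
    _gaussian = pm.Normal("Gaussian",
                          mu=weight_est,
                          sigma=eps,
                          observed=np.ma.masked_invalid(data_y.values))

If I understand the masked-array mechanism correctly, the imputed entries then get finite starting values from their priors, so the initial evaluation should no longer be NaN.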
