I am trying to fit a simple model to data that contains some missing values:
with pm.Model(coords=coords) as h_model:
    mu = pm.TruncatedNormal(name="mu",
                            mu=90.,
                            sigma=20.,
                            lower=0.0)
    sigma = pm.HalfNormal(name="sigma",
                          sigma=10.0)
    b = pm.Normal(name="b",
                  mu=mu,
                  sigma=sigma,
                  dims=tuple(coords)[1])
    s = pm.HalfNormal(name="s", sigma=30.0)
    a = pm.HalfNormal(name="a", sigma=s, dims=tuple(coords)[1])
    eps = pm.HalfCauchy(name="eps", beta=20.0)
    weight_est = pm.Deterministic('Linear_Model', a + b * data_x,
                                  dims=tuple(coords))
    _gaussian = pm.Normal(name='Gaussian',
                          mu=weight_est,
                          sigma=eps,
                          observed=data_y)
where data_x and data_y are pandas.DataFrames containing some NaN values:
data_x (just a subset; the overall shape is 34 x 64):
  completed_at     27914     27915     27918     27919
0   2021-06-09  1.318627  0.940294  1.078431  0.543137
1   2021-06-11  1.062745  0.940294  1.093137  1.543137
2   2021-06-14       NaN       NaN  1.008824       NaN
3   2021-06-16  1.837255       NaN  1.236275       NaN
4   2021-06-17       NaN       NaN       NaN       NaN
data_y:
  completed_at  27914  27915  27918  27919
0   2021-06-09  110.4  163.2   21.6  216.0
1   2021-06-11   96.0  192.0  136.8  192.0
2   2021-06-14    NaN    NaN  182.4    NaN
3   2021-06-16  163.2    NaN  168.0    NaN
4   2021-06-17    NaN    NaN    NaN    NaN
(As you can see, values are missing in both the observed variable and the predictor.)
What I get is the following error:
/lib/python3.8/site-packages/pymc3/distributions/distribution.py in get_test_val(self, val, defaults)
153 if np.all(np.isfinite(attr_val)):
154 return attr_val
--> 155 raise AttributeError(
156 "%s has no finite default value to use, "
157 "checked: %s. Pass testval argument or "
AttributeError: [unnamed] ~ Normal has no finite default value to use, checked: ('median', 'mean', 'mode'). Pass testval argument or adjust so value is finite.
It seems that the NaN values are causing the problem.
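My guess at the mechanism, as a minimal NumPy sketch rather than anything taken from the PyMC3 source: a NaN in the predictor propagates through a + b * x, so the mean of the likelihood is not finite wherever data_x is missing. The arrays below are small stand-ins (two columns, with a and b set to arbitrary values), not the real data or the model's parameters.

```python
import numpy as np

# Stand-in for two rows of data_x: one fully observed, one fully missing.
x = np.array([[1.318627, 0.940294],
              [np.nan, np.nan]])

# Arbitrary stand-ins for the per-column parameters a and b.
a = np.full(2, 19.0)
b = np.ones(2)

# Same arithmetic as the 'Linear_Model' deterministic.
mu_like = a + b * x

# The NaN row of x yields a NaN row in the likelihood mean.
print(np.isfinite(mu_like).all())
print(np.isnan(mu_like[1]).all())
```

If this is what happens, the error would come from the predictor side, independently of the NaN values in data_y.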
I have also tried to use Data Containers:
with pm.Model(coords=coords) as h_model:
    x = pm.Data(name='data_x', value=data_x, dims=tuple(coords))
    y = pm.Data(name='data_y', value=data_y, dims=tuple(coords))
    mu = pm.TruncatedNormal(name="mu",
                            mu=90.,
                            sigma=20.,
                            lower=0.0)
    sigma = pm.HalfNormal(name="sigma",
                          sigma=10.0)
    b = pm.Normal(name="b",
                  mu=mu,
                  sigma=sigma,
                  dims=tuple(coords)[1])
    s = pm.HalfNormal(name="s", sigma=30.0)
    a = pm.HalfNormal(name="a", sigma=s, dims=tuple(coords)[1])
    eps = pm.HalfCauchy(name="eps", beta=20.0)
    weight_est = pm.Deterministic('Linear_Model', a + b * x,
                                  dims=tuple(coords))
    _gaussian = pm.Normal(name='Gaussian',
                          mu=weight_est,
                          sigma=eps,
                          observed=y)
but when I try to sample the model I get a SamplingError:
with h_model:
    step = pm.Metropolis()
    h_trace = pm.sample(step=step, return_inferencedata=True)
SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'mu_lowerbound__': array(0.), 'sigma_log__': array(2.07679374), 'b': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]), 's_log__': array(3.17540603), 'a_log__': array([2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468, 2.94961468, 2.94961468,
2.94961468, 2.94961468, 2.94961468]), 'eps_log__': array(2.99573227)}
Initial evaluation results:
mu_lowerbound__ -13.82
sigma_log__ -0.77
b -188.73
s_log__ -0.77
a_log__ -48.50
eps_log__ -1.14
Gaussian NaN
Name: Log-probability of test_point, dtype: float64
The problem is “solved” if I use data_x.fillna(1e-4) and data_y.fillna(1e-4), but I would prefer pymc3 to fill in the missing values from the prior distribution.
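As far as I understand, PyMC3 treats masked entries of an observed variable as missing and imputes them, so one thing I am considering is masking the NaN values explicitly instead of filling them. The sketch below only shows the NumPy side, using the data_y excerpt above as a stand-in; whether the same trick helps for the predictor is an assumption I have not verified, since data_x is not an observed variable in my model.

```python
import numpy as np

# Stand-in for the data_y excerpt shown above (date column dropped).
y = np.array([[110.4, 163.2, 21.6, 216.0],
              [96.0, 192.0, 136.8, 192.0],
              [np.nan, np.nan, 182.4, np.nan],
              [163.2, np.nan, 168.0, np.nan],
              [np.nan, np.nan, np.nan, np.nan]])

# Mask every NaN/inf entry; the mask is True where a value is missing.
y_masked = np.ma.masked_invalid(y)

n_missing = int(y_masked.mask.sum())
n_observed = int(y_masked.count())
print(n_missing, n_observed)

# Assumption: this array would then be passed as observed=y_masked
# in the likelihood instead of the raw DataFrame.
```

For data_x I suppose the predictor would need its own distribution with the masked array as its observed value, but that is a guess on my part.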