Ricardo, thank you for the input. To test this out on a smaller data set I started with the data I generated in step 1 above. Generating 20 observations with 5 missing (a rough sketch of that test data is included just below for reference) and running the following code, I get these results:
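For reference, the small test set from step 1 looks roughly like this on my end. The distribution parameters, seed, and mask positions here are placeholders rather than the exact step-1 values, so treat this as a sketch of the shape of the data rather than the data itself:

import numpy as np
import numpy.ma as ma
import pymc3 as pm

rng = np.random.default_rng(123)
vals = 1.0 / rng.gamma(shape=3.0, scale=1.0, size=20)    # positive, inverse-gamma-like draws (placeholder parameters)
mask = np.zeros(20, dtype=bool)
mask[rng.choice(20, size=5, replace=False)] = True       # mark 5 of the 20 observations as missing
test_masked = ma.masked_array(vals, mask=mask)
test = test_masked                                       # this is what gets passed to observed= below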
with pm.Model():
    init = np.random.randn(ma.count_masked(test_masked)).astype('float32')
    # init = np.abs(np.random.randn(14990, 1))
    alpha_m = pm.HalfNormal('alpha_m', 1, testval=.25)
    beta_m = pm.HalfNormal('beta_m', 1, testval=.25)
    mu_m = pm.Normal('mu_m', 0, 1, testval=0)
    # passing the masked array as observed lets PyMC3 create 'x_missing' automatically
    x = pm.InverseGamma('x', alpha=alpha_m, beta=beta_m, mu=mu_m, observed=test)
    # x_missing = pm.InverseGamma('x_missing', alpha=alpha_m, beta=beta_m, mu=mu_m, shape=init)
    approx = pm.sample(2000, tune=2000, progressbar=True, cores=1, chains=4, target_accept=.9999)
As you can see, I left out the piece that tries to specify x_missing explicitly. However, when I attempt the method you suggested, I get an error telling me that I can't specify it because it is being generated automatically:
with pm.Model():
    init = np.random.randn(ma.count_masked(test_masked)).astype('float32')
    # init = np.abs(np.random.randn(14990, 1))
    alpha_m = pm.HalfNormal('alpha_m', 1, testval=.25)
    beta_m = pm.HalfNormal('beta_m', 1, testval=.25)
    mu_m = pm.Normal('mu_m', 0, 1, testval=0)
    x = pm.InverseGamma('x', alpha=alpha_m, beta=beta_m, mu=mu_m, observed=test)
    # declaring x_missing on top of the masked observed is what triggers the error below
    x_missing = pm.InverseGamma('x_missing', alpha=alpha_m, beta=beta_m, mu=mu_m, shape=init)
    approx = pm.sample(2000, tune=2000, progressbar=True, cores=1, chains=4, target_accept=.9999)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-150-f4bc58ab6bfd> in <module>
8 mu_m=pm.Normal('mu_m',0,1,testval=0)
9 x = pm.InverseGamma('x',alpha=alpha_m,beta=beta_m,mu=mu_m,observed=test)
---> 10 x_missing = pm.InverseGamma('x_missing',alpha=alpha_m,beta=beta_m,mu=mu_m,shape=init)
11
12 approx=pm.sample(2000,tune=2000,progressbar=True,cores=1,chains=4,target_accept=.9999)
~\anaconda3\envs\pymc3gpu2\lib\site-packages\pymc3\distributions\distribution.py in __new__(cls, name, *args, **kwargs)
120 else:
121 dist = cls.dist(*args, **kwargs)
--> 122 return model.Var(name, dist, data, total_size, dims=dims)
123
124 def __getnewargs__(self):
~\anaconda3\envs\pymc3gpu2\lib\site-packages\pymc3\model.py in Var(self, name, dist, data, total_size, dims)
1156 )
1157 self.deterministics.append(var)
-> 1158 self.add_random_variable(var, dims)
1159 return var
1160 elif isinstance(data, dict):
~\anaconda3\envs\pymc3gpu2\lib\site-packages\pymc3\model.py in add_random_variable(self, var, dims)
1194 """Add a random variable to the named variables of the model."""
1195 if self.named_vars.tree_contains(var.name):
-> 1196 raise ValueError(f"Variable name {var.name} already exists.")
1197
1198 if dims is not None:
ValueError: Variable name x_missing already exists.
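Reading that traceback, my understanding is that passing the masked array as observed= already creates a free variable named x_missing behind the scenes, so rather than declaring it a second time I should just be able to pull it out of the model afterwards. This is what I am assuming works (the model handle and the x_missing name are simply taken from the error message above):

with pm.Model() as model:
    alpha_m = pm.HalfNormal('alpha_m', 1, testval=.25)
    beta_m = pm.HalfNormal('beta_m', 1, testval=.25)
    mu_m = pm.Normal('mu_m', 0, 1, testval=0)
    # the masked observed array triggers PyMC3's automatic imputation
    x = pm.InverseGamma('x', alpha=alpha_m, beta=beta_m, mu=mu_m, observed=test)

x_missing = model['x_missing']   # the auto-created imputation variable, no second declaration needed
print(model.free_RVs)            # should include x_missing alongside the (transformed) priors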
So for the moment that issue seems resolved. However, something new has come up: as I increased the size of the test data set, both the number of observed values and the number of missing values, I got a new error. Right now it seems that when there is a small number of missing covariates the method works as it should, as shown above, but when the counts get large I get an error about my starting values. The error below is truncated because there are 1000 x_missing values.
SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'alpha_m_log__': array(-1.14815613), 'beta_m_log__': array(-1.60186875), 'mu_m': array(-0.03255755), 'x_missing': array([-6.82604916e-01, -2.79261278e-01, -7.98570009e-01, -2.89974966e-01,
2.47602847e-01, -3.44261805e-01, -1.44073574e-01, -6.88639315e-01,
1.51019834e-01, -3.44758260e-01, 5.23698558e-01, 1.16905209e-01,
-2.37173179e-01, 1.11750718e+00, -5.73059114e-01, 3.71448114e-01,
1.16254121e+00, -2.11479592e-01, 3.10421730e-01, 1.16562478e+00,
Initial evaluation results:
alpha_m_log__ -1.42
beta_m_log__ -1.85
mu_m -0.92
x_missing 0.00
x -inf
Name: Log-probability of test_point, dtype: float64
Here I would expect the error to be thrown for x_missing, on the grounds that I formulated its components incorrectly and allowed a negative value to be passed somewhere it shouldn't be, resulting in an inf/-inf logp; instead, the -inf is showing up in x itself. Explicitly passing the correct array shapes has no effect either. I saw mention in other threads that there could be an issue with incorrectly passing int/float32/float64 values to different pieces of the code, so I'm curious whether that could be a driver or if it's something far simpler that I'm just missing. A sketch of how I've been inspecting the starting point is below. Thank you again for the feedback!
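In case it helps, here is how I have been poking at the starting point. As far as I can tell, check_test_point gives the same per-variable evaluation that the SamplingError prints, and the explicit start dict for x_missing is just me experimenting with forcing strictly positive initial values; both use the same test/test_masked arrays as above:

with pm.Model() as model:
    alpha_m = pm.HalfNormal('alpha_m', 1, testval=.25)
    beta_m = pm.HalfNormal('beta_m', 1, testval=.25)
    mu_m = pm.Normal('mu_m', 0, 1, testval=0)
    x = pm.InverseGamma('x', alpha=alpha_m, beta=beta_m, mu=mu_m, observed=test)

    # per-variable log-probability at the test point, the same quantity reported in the error above
    print(model.check_test_point())

    # experiment: hand the sampler strictly positive starting values for the imputed entries
    n_missing = ma.count_masked(test_masked)
    start = {'x_missing': np.full(n_missing, 0.5, dtype='float64')}
    approx = pm.sample(2000, tune=2000, cores=1, chains=4,
                       target_accept=.9999, start=start)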
