Simple imputation difficulties

Ricardo, thank you for the input. To test this out on a smaller data set I started from the data I generated in step 1 above, this time with 20 observations, 5 of them missing. With the following code the model samples as expected:

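For reference, here is roughly how the masked test array is built. This is only a minimal sketch, not the exact step-1 code: the gamma draws, the seed, and the aliasing of test to test_masked are my stand-ins.

import numpy as np
import numpy.ma as ma
import pymc3 as pm

rng = np.random.default_rng(0)

# 20 positive observations (InverseGamma support is strictly positive)
values = rng.gamma(shape=2.0, scale=1.0, size=20).astype('float32')

# hide 5 of them at random
mask = np.zeros(20, dtype=bool)
mask[rng.choice(20, size=5, replace=False)] = True

test_masked = ma.masked_array(values, mask=mask)
test = test_masked  # assumption: the model below refers to both names

Then the model:
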
with pm.Model():

    # starting values, one per masked (missing) element
    init = np.random.randn(ma.count_masked(test_masked)).astype('float32')
    #init = np.abs(np.random.randn(14990, 1))

    # priors for the InverseGamma parameters
    alpha_m = pm.HalfNormal('alpha_m', 1, testval=.25)
    beta_m = pm.HalfNormal('beta_m', 1, testval=.25)
    mu_m = pm.Normal('mu_m', 0, 1, testval=0)

    # masked observations: PyMC3 imputes the masked entries automatically
    x = pm.InverseGamma('x', alpha=alpha_m, beta=beta_m, mu=mu_m, observed=test)
    #x_missing = pm.InverseGamma('x_missing', alpha=alpha_m, beta=beta_m, mu=mu_m, shape=init)

    approx = pm.sample(2000, tune=2000, progressbar=True, cores=1, chains=4, target_accept=.9999)
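
(To sanity check the imputation afterward, I look at the posterior of the auto-created x_missing variable; e.g., something along these lines, using the trace from above:)

print(pm.summary(approx, var_names=['x_missing']))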

As you can see, I left out the piece that tries to specify x_missing explicitly. When I attempt the method you suggested, I get an error telling me that I can't specify it, since it is being generated automatically:

with pm.Model():

    init = np.random.randn(ma.count_masked(test_masked)).astype('float32')
    #init = np.abs(np.random.randn(14990, 1))

    alpha_m = pm.HalfNormal('alpha_m', 1, testval=.25)
    beta_m = pm.HalfNormal('beta_m', 1, testval=.25)
    mu_m = pm.Normal('mu_m', 0, 1, testval=0)

    x = pm.InverseGamma('x', alpha=alpha_m, beta=beta_m, mu=mu_m, observed=test)
    # explicitly declaring x_missing as well; this is the line that fails
    x_missing = pm.InverseGamma('x_missing', alpha=alpha_m, beta=beta_m, mu=mu_m, shape=init)

    approx = pm.sample(2000, tune=2000, progressbar=True, cores=1, chains=4, target_accept=.9999)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-150-f4bc58ab6bfd> in <module>
      8     mu_m=pm.Normal('mu_m',0,1,testval=0)
      9     x = pm.InverseGamma('x',alpha=alpha_m,beta=beta_m,mu=mu_m,observed=test)
---> 10     x_missing = pm.InverseGamma('x_missing',alpha=alpha_m,beta=beta_m,mu=mu_m,shape=init)
     11 
     12     approx=pm.sample(2000,tune=2000,progressbar=True,cores=1,chains=4,target_accept=.9999)

~\anaconda3\envs\pymc3gpu2\lib\site-packages\pymc3\distributions\distribution.py in __new__(cls, name, *args, **kwargs)
    120         else:
    121             dist = cls.dist(*args, **kwargs)
--> 122         return model.Var(name, dist, data, total_size, dims=dims)
    123 
    124     def __getnewargs__(self):

~\anaconda3\envs\pymc3gpu2\lib\site-packages\pymc3\model.py in Var(self, name, dist, data, total_size, dims)
   1156                 )
   1157                 self.deterministics.append(var)
-> 1158                 self.add_random_variable(var, dims)
   1159                 return var
   1160         elif isinstance(data, dict):

~\anaconda3\envs\pymc3gpu2\lib\site-packages\pymc3\model.py in add_random_variable(self, var, dims)
   1194         """Add a random variable to the named variables of the model."""
   1195         if self.named_vars.tree_contains(var.name):
-> 1196             raise ValueError(f"Variable name {var.name} already exists.")
   1197 
   1198         if dims is not None:

ValueError: Variable name x_missing already exists.

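That behavior makes sense given PyMC3's automatic imputation: when observed is a masked array, the library registers a free variable named x_missing on its own, so declaring one manually collides with it. A quick way to see the auto-created variable (a sketch reusing the names from above):

with pm.Model() as model:
    alpha_m = pm.HalfNormal('alpha_m', 1, testval=.25)
    beta_m = pm.HalfNormal('beta_m', 1, testval=.25)
    mu_m = pm.Normal('mu_m', 0, 1, testval=0)
    x = pm.InverseGamma('x', alpha=alpha_m, beta=beta_m, mu=mu_m, observed=test_masked)

print(model.free_RVs)  # x_missing is listed without ever being declared by hand
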
So for the moment that issue seems resolved. However, something new has come up. As I increased the size of the test data set, both the number of observed values and the number of missing ones, I hit a new error. Right now it seems that with a small number of missing covariates the method works as it should, as shown above, but with larger counts I get an error about my starting values. The error below is truncated because there are 1000 x_missing values.

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'alpha_m_log__': array(-1.14815613), 'beta_m_log__': array(-1.60186875), 'mu_m': array(-0.03255755), 'x_missing': array([-6.82604916e-01, -2.79261278e-01, -7.98570009e-01, -2.89974966e-01,
        2.47602847e-01, -3.44261805e-01, -1.44073574e-01, -6.88639315e-01,
        1.51019834e-01, -3.44758260e-01,  5.23698558e-01,  1.16905209e-01,
       -2.37173179e-01,  1.11750718e+00, -5.73059114e-01,  3.71448114e-01,
        1.16254121e+00, -2.11479592e-01,  3.10421730e-01,  1.16562478e+00,
        ... (truncated) ...
Initial evaluation results:
alpha_m_log__   -1.42
beta_m_log__    -1.85
mu_m            -0.92
x_missing        0.00
x                -inf
Name: Log-probability of test_point, dtype: float64

Here I would expect an error to be thrown for x_missing, on the theory that I formulated its components incorrectly and allowed a negative value to be passed somewhere it shouldn't be, resulting in an inf/-inf logp; instead the -inf shows up in x itself. Explicitly passing the correct array shapes has no effect either. I saw mention in other threads that there can be issues with int/float32/float64 values being passed inconsistently between different pieces of the code, so I'm curious whether that could be a driver or whether it's something far simpler that I'm just missing. Thank you again for the feedback!
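
P.S. In case it helps to reproduce this, here is how I have been poking at the failing starting point. A sketch, assuming the larger model was built with "with pm.Model() as model:" so the model object is in scope:

import theano

print(model.check_test_point())            # per-variable logp at the start; x comes back -inf
print(model.test_point['x_missing'][:10])  # the raw starting values, which include negatives
print(theano.config.floatX)                # e.g. 'float32' vs 'float64'
print(test_masked.dtype)                   # data dtype, for the float32/float64 question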