Hi All,
I am new to PyMC and would like to thank everyone who contributes to it. I enjoy exploring PyMC and will likely switch to it from R-based Bayesian tools.
I have a specific question about the interpretation of the posterior predictive checks (PPC).
First things first, my model is a simple GP model:
with pm.Model() as model:
    if LIKELIHOOD == 'lognormal':
        Lambda = pm.LogNormal("Lambda", mu_logx, sigma_logx, shape=m)
    elif LIKELIHOOD == 'truncated_normal':
        normal_dist = pm.Normal.dist(mu=mu_x, sigma=sigma_x)
        Lambda = pm.Truncated("Lambda", normal_dist, shape=m, lower=0, upper=10)
    #===============================================================================
    # specify the mean function:
    #===============================================================================
    mu = LinearMean(K=K, Lambda=Lambda)
    eta_sq = pm.Exponential("eta_sq", 1.0)
    rho_sq = pm.Normal("rho_sq", 3.0, 0.25)
    #===============================================================================
    # specify the covariance function:
    #===============================================================================
    cov = eta_sq * pm.gp.cov.Exponential(input_dim=len(Y), ls=rho_sq)
    #===============================================================================
    # specify the GP:
    #===============================================================================
    gp = pm.gp.Marginal(mean_func=mu, cov_func=cov)
    if SIGMA_MODEL == 'halfcauchy':
        sigma = pm.HalfCauchy("sigma", 1)
    else:
        sigma = pm.Exponential("sigma", 2.0)
    Y_ = gp.marginal_likelihood("Y_", X=Dmat, y=Y, sigma=sigma)
    trace = pm.sample(2000, tune=2000, target_accept=0.9, random_seed=RANDOM_SEED,
                      return_inferencedata=True,
                      idata_kwargs={'log_likelihood': True})
Then I did the posterior predictive sampling using:
with model:
    pp = pm.sample_posterior_predictive(trace, var_names=["Y_"],
                                        extend_inferencedata=True, random_seed=RANDOM_SEED)
The result looks like:
My concern is that all observations should be positive, as can be seen from the "observed" curve. My guess is that when new data points were generated in pm.sample_posterior_predictive, the sampling space of some features was automatically extended to negative values. Physically, however, all of my features should be greater than or equal to 0, which leads to positive observations.
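To make the concern concrete, here is a minimal NumPy sketch that is independent of my actual model. It rests on an assumption I have not verified for my case: that the negative values come from a Gaussian observation model, which is unbounded below, rather than from the features themselves. The data (`y_obs`), the noise scale (`sigma`), and the stand-in predictive draws (`y_rep`) are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical strictly positive "observations" (a stand-in for my Y).
y_obs = rng.lognormal(mean=0.0, sigma=0.5, size=169)

# Stand-in for predictive draws with shape (draw, observation):
# a Gaussian centred on each observation with a made-up noise scale.
sigma = 1.0
y_rep = rng.normal(loc=y_obs, scale=sigma, size=(2000, y_obs.size))

# Fraction of simulated values that fall below zero.
frac_negative = float((y_rep < 0).mean())

print(f"smallest observed value: {y_obs.min():.3f}")
print(f"fraction of negative predictive draws: {frac_negative:.3f}")
```

In this toy setting every observation is positive, yet some of the Gaussian draws are negative. If the same mechanism applies to my model, it would produce negative predictive values regardless of the sign of the features.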
Then I wrote my own PPC code (for readability, I took a screenshot of the code in VS Code):
I am not completely sure this is correct, but here I randomly perturbed the original feature matrix "K" (a 169 x 304 matrix with all positive values) to create a new X (features), scaling each entry by a random factor between 0.8 and 1.2:
for j in range(K_copy.shape[0]):
    new_X[j, :] = K_copy[j, :] * np.random.uniform(0.8, 1.2, size=K_copy.shape[1])
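For reference, the row-by-row loop can also be written as a single element-wise multiplication. The sketch below is only an illustration: `K_copy` here is a hypothetical random matrix with the same shape as my K, not my real data, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for K: 169 x 304, all entries positive.
K_copy = rng.uniform(0.1, 5.0, size=(169, 304))

# One independent factor in [0.8, 1.2) per entry, as in the loop.
factors = rng.uniform(0.8, 1.2, size=K_copy.shape)

# Element-wise scaling; no explicit loop over rows is needed.
new_X = K_copy * factors

print(new_X.shape)
print(bool((new_X > 0).all()))
```

Because every factor is positive, new_X stays positive whenever K is positive, so this perturbation cannot by itself introduce negative features.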
My own PPC looks like:
This looks more reasonable.
So my question is: which PPC result (az.plot_ppc or my own) is correct or better? Are both correct, or is something else going on?
Thank you for your help in advance.