Related to : Using WAIC to compute Log Predictive Accuracy
Can someone please explain this behavior.
I used NUTS to train linear regression model with a generated dataset that have 100 data instances and 100 dimensions.
If we compute WAIC for the train data it reduces when increasing the number of samples (samples drawn from the target distribution), as expected. However, when I calculate the WAIC for the test data, at first it gives a very low value and then it increased as shown in the figure below (x axis is time taken and y axis is WAIC).

It should be noted that, eventhough the WAIC is really high when number of samples are increased, the model converges to the true parameters. Even if we compute the MSE for test set, that also converge to the 0 with time as expected. I also checked the traceplot and observe that the variance of the model also reduces when drawing more samples. However, when I ran the same experiment reducing the number of dimensions to 10 then MCMC WAIC converges as expected.
Moreover, when I compute LOO for test data, the process fails with and error during the _psislw computation (this won’t happen when computing LOO with train data).
It is a great help if someone can help me to understand this behavior. Is that means we can’t use current implementation of WAIC and LOO to calculate the predictive accuracy?
I have attached the relevant script for your reference.
import pymc3 as pm
import numpy as np
from sklearn.model_selection import train_test_split
from theano import shared, tensor as tt
from sklearn.metrics import mean_squared_error
seed = 7
np.random.seed(seed)
def operator(x, w, c, noise):
    """
    Operator is used to generate artificial datasets
    :param w:
    :param c:
    :param noise:
    :return:
    """
    return x * w + c + noise
n = 120
d = 100
_n = n
n = int(n + n*0.2)
X = np.matrix([np.random.uniform(0, 1, d) for i in range(n)])
_w = np.matrix(np.random.uniform(-10, 10, d)).T
_c = float(np.random.uniform(-10, 10, 1))
y = operator(x=X, w=_w, c=_c, noise=np.matrix(np.random.normal(scale=0.5, size=n)).T)
y = np.array(y).flatten()
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=int(_n*0.2), shuffle=True)
def adding_intecept(X):
    """
    Adding a column for intercept
    :param X: features
    :return:
    """
    return np.c_[X, np.ones(X.shape[0])]
train_X, test_X = adding_intecept(train_X), adding_intecept(test_X)
shared_X = shared(train_X)
shared_y = shared(train_y)
with pm.Model() as model:
    # Define hyper-prior
    alpha = pm.Gamma("alpha", alpha=1e-2, beta=1e-4)
    # Define priors'
    w = pm.Normal("w", mu=0, sd=alpha, shape=train_X.shape[1])
    sigma = pm.HalfCauchy("sigma", beta=10)
    mu = tt.dot(w, shared_X.T)
    # Define likelihood
    likelihood = pm.Normal("y", mu=mu, sd=sigma, observed=shared_y)
    trace = pm.sample(500, discard_tuned_samples=False, random_seed=seed, njobs=1, chains=1)
trace1 = trace[:50] # 50 samples from tuned samples
trace2 = trace[500:1000] # after discarding 500 tune samples
print("True coefficients")
print(np.array(_w).flatten().tolist()+[_c])
print("Coefficients inferred ")
print(trace1['w'].tolist())
print(trace2['w'].tolist())
shared_X.set_value(test_X)
print("MSE")
print(mean_squared_error(test_y, pm.sample_ppc(trace1, 10000, model)['y'].mean(axis=0)))
print(mean_squared_error(test_y, pm.sample_ppc(trace2, 10000, model)['y'].mean(axis=0)))
shared_X.set_value(train_X)
print("WAIC for train data")
print(pm.waic(trace1, model).WAIC)
print(pm.waic(trace2, model).WAIC)
shared_X.set_value(test_X)
shared_y.set_value(test_y)
print("WAIC for test data")
print(pm.waic(trace1, model).WAIC)
print(pm.waic(trace2, model).WAIC)